James Antill (illiterat) wrote,

Python uses too much memory?

A common complaint leveled against Python applications is that they use too much memory. It comes both from people who understand how the language differs from, say, C, and from people who are just looking at their process list, and it is especially noticeable on the newer x86_64 machines.

So, as the Fedora Python maintainer and a yum developer (yum being an application written mostly in Python), I figured it was worth investigating what the difference really was.

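First I wrote a simple program that just created new yum.YumBase() objects and appended them to a list, reading the numbers from /proc/self/status. The second column is the number of objects created, and peak, size and rss are the VmPeak, VmSize and VmRSS fields. It gave the following results: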

.x86_64     0 peak 219.90MB size 219.90MB rss  13.30MB
.x86_64     1 peak 219.90MB size 219.90MB rss  13.33MB
.x86_64 90001 peak 610.46MB size 610.46MB rss 403.75MB

.i386       0 peak  20.65MB size  20.65MB rss   9.61MB
.i386       1 peak  20.65MB size  20.65MB rss   9.63MB
.i386   90001 peak 212.77MB size 212.77MB rss 201.82MB
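
The original test program isn't included here. A minimal sketch of the same measurement, assuming Linux's /proc/self/status format, might look like the following. `make_object()` is a hypothetical stand-in for `yum.YumBase()`, so its absolute numbers won't match the table.

```python
import os
import re

# The table's columns map to these /proc/<pid>/status fields:
# peak = VmPeak, size = VmSize, rss = VmRSS.
FIELDS = ("VmPeak", "VmSize", "VmRSS")


def parse_status(text):
    """Return {field: megabytes} for the VmPeak/VmSize/VmRSS lines."""
    result = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s+kB", line)
        if m and m.group(1) in FIELDS:
            result[m.group(1)] = int(m.group(2)) / 1024.0  # kB -> MB
    return result


def report(label, count):
    """Print one line shaped like the table above (Linux only)."""
    with open("/proc/self/status") as f:
        mem = parse_status(f.read())
    # Fields missing on unusual kernels are printed as 0.
    print("%s %5d peak %7.2fMB size %7.2fMB rss %7.2fMB"
          % (label, count, mem.get("VmPeak", 0.0),
             mem.get("VmSize", 0.0), mem.get("VmRSS", 0.0)))


def make_object():
    # Hypothetical stand-in: the real test appended yum.YumBase() objects.
    return {"payload": list(range(10))}


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    objects = []
    report(".test", 0)
    for count in range(1, 90002):
        objects.append(make_object())
        if count in (1, 90001):
            report(".test", count)
```

Run directly, it prints lines shaped like the table; `parse_status()` works equally well on /proc/<pid>/status for some other process.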

Those numbers look pretty damning for Python and/or yum on .x86_64: 2x the RSS, and much more for VSZ (over 10x at startup, 219.90MB vs 20.65MB, which is obviously a lot). So I added a pmap call right at the end to find out where that allocated memory was going. The most interesting pieces of the output were:

0000000000601000 449696K rw---    [ anon ]
[...]
00002aaaaab5a000  76136K r----  /usr/lib/locale/locale-archive
[...]
00002aaaafa8d000     20K r-x--  /usr/lib64/python2.5/lib-dynload/stropmodule.so
00002aaaafa92000   2044K -----  /usr/lib64/python2.5/lib-dynload/stropmodule.so
00002aaaafc91000      8K rw---  /usr/lib64/python2.5/lib-dynload/stropmodule.so

...on .x86_64, versus the following on .i386 (using the same shared object as an example):

00c58000             16K r-x--  /usr/lib/python2.5/lib-dynload/stropmodule.so
00c5c000              8K rwx--  /usr/lib/python2.5/lib-dynload/stropmodule.so
[...]
09290000         222296K rwx--    [ anon ]
[...]
b7d23000           2048K r----  /usr/lib/locale/locale-archive

...as you can see, on .x86_64 the shared library has a 2MB hole in the middle of it (the 2044K mapping with no permissions), which is counted towards its VSZ even though it is not readable, writable or executable (and so, I'd assume, isn't using any real memory). This basically means that VSZ is worthless on .x86_64, and even more worthless for Python programs, because they tend to load more shared objects.
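
To see how much of a process's VSZ is made up of holes like this, one could sum the mappings listed in /proc/<pid>/maps (the same data pmap reports) and separate out the ones with no permissions. This is a rough sketch assuming the Linux maps format, not a reimplementation of pmap:

```python
import os


def mapping_sizes(maps_text):
    """Sum mapping sizes in kB from /proc/<pid>/maps text.

    Returns (total_kb, no_access_kb). total_kb is roughly what VSZ counts;
    no_access_kb is the part mapped with no read/write/execute permission
    (shown as "-----" by pmap), such as the 2044K hole above.
    """
    total_kb = no_access_kb = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # First field is "start-end" in hex, second is the permission string.
        start, end = (int(addr, 16) for addr in fields[0].split("-"))
        size_kb = (end - start) // 1024
        total_kb += size_kb
        if fields[1].startswith("---"):
            no_access_kb += size_kb
    return total_kb, no_access_kb


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    with open("/proc/self/maps") as f:
        total, holes = mapping_sizes(f.read())
    print("mapped: %d kB, of which no-access: %d kB" % (total, holes))
```

Fed the three stropmodule.so mappings from the .x86_64 output, this would report 2072K mapped, 2044K of it inaccessible, against 24K for the same object on .i386.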

The locale archive mapping being about 37 times bigger (76136K vs 2048K) is explained by these lines from glibc/locale/loadarchive.c:

      /* Map an initial window probably large enough to cover the header
         and the first locale's data.  With a large address space, we can
         just map the whole file and be sure everything is covered.  */

      mapsize = (sizeof (void *) > 4 ? archive_stat.st_size
                 : MIN (archive_stat.st_size, ARCHIVE_MAPPING_WINDOW));

      result = __mmap64 (NULL, mapsize, PROT_READ, MAP_FILE|MAP_COPY, fd, 0);

...which means any program that uses the C library's locale functions gets an extra ~72MB of VSZ at startup on .x86_64, because the whole archive is mapped instead of just an initial window (2048K on .i386).
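
The mapsize decision in that excerpt can be modelled in a few lines. This sketch assumes ARCHIVE_MAPPING_WINDOW is 2MB, which matches the 2048K mapping seen on .i386; it uses the pointer size of the running Python as a stand-in for sizeof (void *).

```python
import os
import struct

# Assumption: ARCHIVE_MAPPING_WINDOW is 2MB, matching the 2048K
# locale-archive mapping in the .i386 pmap output above.
ARCHIVE_MAPPING_WINDOW = 2 * 1024 * 1024
LOCALE_ARCHIVE = "/usr/lib/locale/locale-archive"


def locale_archive_mapsize(st_size, pointer_size=None):
    """Python model of the mapsize expression quoted from loadarchive.c."""
    if pointer_size is None:
        pointer_size = struct.calcsize("P")  # sizeof(void *) in this process
    if pointer_size > 4:
        return st_size  # large address space: map the whole archive
    return min(st_size, ARCHIVE_MAPPING_WINDOW)


if __name__ == "__main__" and os.path.exists(LOCALE_ARCHIVE):
    size = os.stat(LOCALE_ARCHIVE).st_size
    for ptr in (4, 8):
        print("%d-byte pointers: %d kB mapped"
              % (ptr, locale_archive_mapsize(size, ptr) // 1024))
```

With the 76136K archive from the .x86_64 output, 8-byte pointers map all 76136K while 4-byte pointers map 2048K, a difference of 74088K.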

The next interesting part of the pmap data is that there are roughly 24 "anonymous" mappings on .x86_64 and only 20 on .i386. A little investigation shows that glibc is again the reason: the default M_MMAP_THRESHOLD (the request size at which malloc() gives an allocation its own new mmap() mapping instead of carving it out of the existing heap) doesn't grow with size_t, time_t, etc., which are twice as big on .x86_64. You can confirm this by setting MALLOC_MMAP_MAX_=0 in the environment before running your application; that produces the same number of "anonymous" mappings on .x86_64 as on .i386.
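
One way to try that comparison is to run a child process with and without MALLOC_MMAP_MAX_=0 and count the file-less mappings in its /proc/self/maps. This is a sketch under a few assumptions: the child's 1MB buffers are hypothetical test allocations assumed to exceed the mmap threshold, "[heap]" is counted as anonymous because pmap labels it "[ anon ]", and the kernel may merge adjacent anonymous mappings, so the counts are only a rough indicator.

```python
import os
import subprocess
import sys

# Child script: allocate some large buffers, then dump its own memory map.
# Assumption: 1MB bytearrays are above glibc's default mmap threshold.
CHILD = (
    "bufs = [bytearray(1024 * 1024) for _ in range(8)]\n"
    "print(open('/proc/self/maps').read())\n"
)


def count_anon(maps_text):
    """Count mappings with no backing file, plus [heap] (pmap's "[ anon ]").

    Other bracketed regions such as [stack] or [vdso] are not counted.
    """
    count = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) == 5 or (len(fields) == 6 and fields[5] == "[heap]"):
            count += 1
    return count


def anon_mappings(extra_env=None):
    """Run CHILD with extra environment variables and count its anon maps."""
    env = dict(os.environ)
    env.update(extra_env or {})
    out = subprocess.run([sys.executable, "-c", CHILD], env=env,
                         capture_output=True, text=True, check=True).stdout
    return count_anon(out)


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    print("default:            %d" % anon_mappings())
    print("MALLOC_MMAP_MAX_=0: %d" % anon_mappings({"MALLOC_MMAP_MAX_": "0"}))
```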

So after taking all of the above into account, all of which is glibc's doing and not Python's, the memory numbers work out to a simple doubling, as you'd expect going from 4-byte size_t/time_t/intptr_t/etc. to 8-byte ones.
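
As a quick check of the "simple doubling", the per-object RSS growth can be worked out from the table above. This is just arithmetic on the published numbers, treating the 1-object row as the baseline so startup cost is left out:

```python
import struct

# RSS figures (MB) copied from the table above.
RSS_MB = {
    ".x86_64": {1: 13.33, 90001: 403.75},
    ".i386": {1: 9.63, 90001: 201.82},
}


def growth_per_object_kb(samples):
    """Average RSS growth per object, in kB, between the 1 and 90001 rows."""
    return (samples[90001] - samples[1]) * 1024 / (90001 - 1)


x86_64 = growth_per_object_kb(RSS_MB[".x86_64"])
i386 = growth_per_object_kb(RSS_MB[".i386"])
print("x86_64: %.2f kB/object" % x86_64)  # -> 4.44
print("i386:   %.2f kB/object" % i386)    # -> 2.19
print("ratio:  %.2f" % (x86_64 / i386))   # -> 2.03
# Pointer width of the interpreter running this script (4 or 8 bytes).
print("pointer size here: %d bytes" % struct.calcsize("P"))
```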

Tags: fedora, performance, python, yum