James Antill
May. 6th, 2008
02:22 am - Python uses too much memory?
There's a common attack leveled against Python applications: that they take up too much memory. It comes both from people who understand the language difference (against, say, C) and from people just looking at their process list, and it is often especially evident on the newer x86_64 computers.
So as the Fedora Python maintainer, and a yum developer (an application written mostly in Python), I figured it was probably worth investigating what the difference really was.
First I wrote a simple program which just created new yum.YumBase() objects and appended them to a list (numbers obtained by parsing /proc/self/status), which gave the following results:
    .x86_64     0 peak 219.90MB size 219.90MB rss  13.30MB
    .x86_64     1 peak 219.90MB size 219.90MB rss  13.33MB
    .x86_64 90001 peak 610.46MB size 610.46MB rss 403.75MB
    .i386       0 peak  20.65MB size  20.65MB rss   9.61MB
    .i386       1 peak  20.65MB size  20.65MB rss   9.63MB
    .i386   90001 peak 212.77MB size 212.77MB rss 201.82MB
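For reference, the measurement described above could be sketched roughly as follows. This is not the original script: `parse_status`, `read_self_status` and `format_row` are hypothetical names, the yum.YumBase() allocation loop is left out, and the code assumes the usual Linux `VmPeak`/`VmSize`/`VmRSS` lines (reported in kB) in /proc/self/status.

```python
def parse_status(text):
    """Extract VmPeak, VmSize and VmRSS (converted to MB) from status text."""
    wanted = {"VmPeak": "peak", "VmSize": "size", "VmRSS": "rss"}
    mem = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # The value looks like "  225178 kB"; keep the number only.
            mem[wanted[key]] = int(rest.split()[0]) / 1024.0
    return mem


def read_self_status():
    """Read the current process's own figures (Linux only)."""
    with open("/proc/self/status") as f:
        return parse_status(f.read())


def format_row(label, count, mem):
    """Format one row in the same layout as the results above."""
    return "%s %d peak %.2fMB size %.2fMB rss %.2fMB" % (
        label, count, mem["peak"], mem["size"], mem["rss"])
```

Calling `format_row(".x86_64", len(objs), read_self_status())` after each append to a list `objs` would give one row per measurement point.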
The results above seem pretty damning of Python on .x86_64 and/or yum: 2x for RSS and much more for VSZ (10x at startup, which is obviously a lot). So then I added a "pmap" call right at the end, to find out where that allocated memory was going. The most interesting pieces of data were:
    0000000000601000 449696K rw---   [ anon ]
    [...]
    00002aaaaab5a000  76136K r----  /usr/lib/locale/locale-archive
    [...]
    00002aaaafa8d000     20K r-x--  /usr/lib64/python2.5/lib-dynload/stropmodule.so
    00002aaaafa92000   2044K -----  /usr/lib64/python2.5/lib-dynload/stropmodule.so
    00002aaaafc91000      8K rw---  /usr/lib64/python2.5/lib-dynload/stropmodule.so
...on .x86_64, versus the following on .i386 (taking a single shared object as an example):
    00c58000     16K r-x--  /usr/lib/python2.5/lib-dynload/stropmodule.so
    00c5c000      8K rwx--  /usr/lib/python2.5/lib-dynload/stropmodule.so
    [...]
    09290000 222296K rwx--   [ anon ]
    [...]
    b7d23000   2048K r----  /usr/lib/locale/locale-archive
...as you can see, on .x86_64 the shared library has a 2MB hole in the middle of it, which is counted towards its VSZ even though it is not writable, executable or readable (and so I'd assume is not using any real memory). This basically means that VSZ is worthless on .x86_64, and is even more worthless for Python programs because they tend to load more shared objects.
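To put a number on those holes for a whole process, one could total up the mappings that have no read, write or execute permission. The sketch below is only an illustration: `inaccessible_kib` is a hypothetical helper, and it assumes pmap rows in the "address size perms name" layout shown above.

```python
import re

# One pmap row: hex address, size in KiB, permission flags, mapping name.
ROW = re.compile(r"^([0-9a-f]+)\s+(\d+)K\s+(\S+)\s+(.*)$")


def inaccessible_kib(pmap_text):
    """Sum the size (KiB) of mappings with no r/w/x permission, per name."""
    totals = {}
    for line in pmap_text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header, "[...]" or total line
        _addr, size, perms, name = m.groups()
        if set(perms[:3]) == {"-"}:
            totals[name] = totals.get(name, 0) + int(size)
    return totals
```

Summing the values of the returned dict gives the amount of VSZ that is address space only, under the assumption made above that such regions use no real memory.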
The locale archive being roughly 37 times bigger (76136K vs. 2048K) is explained by these lines from glibc/locale/loadarchive.c:
    /* Map an initial window probably large enough to cover the header
       and the first locale's data.  With a large address space, we can
       just map the whole file and be sure everything is covered.  */
    mapsize = (sizeof (void *) > 4 ? archive_stat.st_size
               : MIN (archive_stat.st_size, ARCHIVE_MAPPING_WINDOW));
    result = __mmap64 (NULL, mapsize, PROT_READ, MAP_FILE|MAP_COPY, fd, 0);
...which means any program that uses the C locale functions gets an extra ~73MB of VSZ at startup on .x86_64.
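The same decision can be mirrored in a few lines of Python to check the arithmetic. This is a sketch under two assumptions that are not stated in the glibc excerpt: that ARCHIVE_MAPPING_WINDOW is 2MB (inferred from the 2048K mapping in the .i386 pmap output) and that the archive file is about as large as the 76136K mapping seen on .x86_64. `initial_map_size` is a hypothetical name.

```python
import struct

# Assumption: 2 MiB, inferred from the 2048K mapping seen on .i386.
ARCHIVE_MAPPING_WINDOW = 2 * 1024 * 1024


def initial_map_size(file_size, pointer_size=struct.calcsize("P")):
    """Mirror the mapsize expression quoted from loadarchive.c."""
    if pointer_size > 4:
        return file_size  # large address space: map the whole file
    return min(file_size, ARCHIVE_MAPPING_WINDOW)


# Assumption: archive size taken from the .x86_64 pmap output.
archive = 76136 * 1024
extra_kib = (initial_map_size(archive, 8) - initial_map_size(archive, 4)) // 1024
```

With those inputs `extra_kib` comes out as 74088, i.e. a little over 72MB, which is close to the figure quoted above.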
The next interesting part of the data from pmap is that there are roughly 24 "anonymous" mappings for .x86_64 and only 20 for .i386. A little investigation shows that glibc is again the reason: the default value for M_MMAP_THRESHOLD (basically the allocation size above which glibc creates a new mapping for the data, instead of reusing the existing heap) doesn't expand with size_t/time_t/etc. (which are twice as big). You can see this by setting MALLOC_MMAP_MAX_=0 in the environment before running your application, which produces the same number of "anonymous" mappings on .x86_64 as on .i386.
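A small helper pair makes that comparison easy to repeat. Both functions are hypothetical names for illustration: one builds an environment with MALLOC_MMAP_MAX_=0 for a child process, the other counts the "[ anon ]" rows in captured pmap output.

```python
import os


def env_without_malloc_mmap(base_env=None):
    """Copy an environment and set MALLOC_MMAP_MAX_=0, as described above."""
    env = dict(os.environ if base_env is None else base_env)
    env["MALLOC_MMAP_MAX_"] = "0"
    return env


def count_anon_mappings(pmap_text):
    """Count pmap rows whose mapping name is "[ anon ]"."""
    return sum(1 for line in pmap_text.splitlines()
               if line.rstrip().endswith("[ anon ]"))
```

The test program can then be launched with something like `subprocess.call(["python", "test.py"], env=env_without_malloc_mmap())`, where `test.py` stands in for whatever script is being measured, and the two counts compared.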
And after taking into account all of the above, which is completely the domain of glibc and not Python, the memory numbers add up as "simple doubling" as you go from 4-byte size_t/time_t/intptr_t/etc. to 8 bytes for the same.
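As a quick sanity check on the "simple doubling" claim, the RSS growth between the first and last measurement can be compared across the two architectures. The figures are copied from the results at the top of the post; the function names are just for illustration.

```python
# RSS in MB at 0 and 90001 objects, copied from the results above.
RSS = {
    "x86_64": {0: 13.30, 90001: 403.75},
    "i386": {0: 9.61, 90001: 201.82},
}


def rss_growth(arch):
    """RSS added between the first and the last measurement, in MB."""
    return RSS[arch][90001] - RSS[arch][0]


def growth_ratio():
    """How much more RSS the same objects cost on x86_64 than on i386."""
    return rss_growth("x86_64") / rss_growth("i386")
```

`growth_ratio()` works out to about 2.03, so the per-object cost in real memory is almost exactly double.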