2008-08-26

Malloc behavior study

While investigating a memory-hungry program that had no memory leak, I wondered how the GNU libc behaves with a bad allocation pattern I was seeing.

The allocation pattern was the following: given an initially allocated area, each time a reallocation is needed, allocate twice the size of the previous area, then free the previous area.

Drawn as a schema, memory evolves as follows:
#
~##
~~~####
~~~~~~~########

Where # represents allocated memory and ~ free memory.

As you can see, a reallocation size factor of 2 is pretty bad: the space freed so far is always smaller than the new allocation, so it can never be reused for it. Of course, that is unless the allocator does something clever to avoid this situation ;)

To find out whether the GNU libc does something clever here, I tested it with the following code:
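#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

/* Print the allocator statistics reported by mallinfo(). */
void printMallinfo(void){
        struct mallinfo myMallinfo = mallinfo();
        printf(
                "Free: %9d, Allocated: %9d, Mmaped space: %9d\n",
                myMallinfo.fordblks,
                myMallinfo.uordblks,
                myMallinfo.hblkhd
                );
}

int main(
        int argc __attribute__((unused)),
        char **args __attribute__((unused))
        ){

        void   *ptrA = NULL;
        void   *ptrB = NULL;

        printMallinfo();
        /* Double the request each round: allocate the new area first,
           then free the previous one. */
        for(size_t size=1; size<1024*1024*50;size*=2){
                ptrB = ptrA;
                ptrA = malloc(size);
                free(ptrB);

                /* size is a size_t, so it is printed with %zu */
                printf("Size: %9zu, ptrA: %p ", size, ptrA);
                printMallinfo();
        }

        free(ptrA);
        printMallinfo();

        return EXIT_SUCCESS;
}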
The result clearly shows that while the allocated size is small enough we experience the behavior described above, but once the allocator switches to mmaped memory each freed area is handed back to the system:
./malloc_behavior_study
Free:         0, Allocated:         0, Mmaped space:         0
Size:         1, ptrA: 0x804b008 Free:    135152, Allocated:        16, Mmaped space:         0
Size:         2, ptrA: 0x804b018 Free:    135152, Allocated:        16, Mmaped space:         0
Size:         4, ptrA: 0x804b008 Free:    135152, Allocated:        16, Mmaped space:         0
Size:         8, ptrA: 0x804b018 Free:    135152, Allocated:        16, Mmaped space:         0
Size:        16, ptrA: 0x804b028 Free:    135144, Allocated:        24, Mmaped space:         0
Size:        32, ptrA: 0x804b040 Free:    135128, Allocated:        40, Mmaped space:         0
Size:        64, ptrA: 0x804b068 Free:    135096, Allocated:        72, Mmaped space:         0
Size:       128, ptrA: 0x804b0b0 Free:    135032, Allocated:       136, Mmaped space:         0
Size:       256, ptrA: 0x804b138 Free:    134904, Allocated:       264, Mmaped space:         0
Size:       512, ptrA: 0x804b240 Free:    134648, Allocated:       520, Mmaped space:         0
Size:      1024, ptrA: 0x804b448 Free:    134136, Allocated:      1032, Mmaped space:         0
Size:      2048, ptrA: 0x804b850 Free:    133112, Allocated:      2056, Mmaped space:         0
Size:      4096, ptrA: 0x804c058 Free:    131064, Allocated:      4104, Mmaped space:         0
Size:      8192, ptrA: 0x804d060 Free:    126968, Allocated:      8200, Mmaped space:         0
Size:     16384, ptrA: 0x804f068 Free:    118776, Allocated:     16392, Mmaped space:         0
Size:     32768, ptrA: 0x8053070 Free:    102392, Allocated:     32776, Mmaped space:         0
Size:     65536, ptrA: 0x805b078 Free:     69624, Allocated:     65544, Mmaped space:         0
Size:    131072, ptrA: 0xb7e12008 Free:    135168, Allocated:         0, Mmaped space:    135168
Size:    262144, ptrA: 0xb7dd1008 Free:    135168, Allocated:         0, Mmaped space:    266240
Size:    524288, ptrA: 0xb7d50008 Free:    135168, Allocated:         0, Mmaped space:    528384
Size:   1048576, ptrA: 0xb7c4f008 Free:    135168, Allocated:         0, Mmaped space:   1052672
Size:   2097152, ptrA: 0xb7a4e008 Free:    135168, Allocated:         0, Mmaped space:   2101248
Size:   4194304, ptrA: 0xb764d008 Free:    135168, Allocated:         0, Mmaped space:   4198400
Size:   8388608, ptrA: 0xb6e4c008 Free:    135168, Allocated:         0, Mmaped space:   8392704
Size:  16777216, ptrA: 0xb5e4b008 Free:    135168, Allocated:         0, Mmaped space:  16781312
Size:  33554432, ptrA: 0xb3e4a008 Free:    135168, Allocated:         0, Mmaped space:  33558528
Free:    135168, Allocated:         0, Mmaped space:         0
This case is basic, and the test program shows that the GNU libc handles it properly. But we also see some interesting specifics. First, on my machine the allocation unit is 8 bytes, with an overhead of 8 bytes per allocation. Then, for requests above 2^16 bytes (from 128 KiB on in this run), the allocator switches to mmaped memory.

I had no clue how this was done in the GNU libc; of course, we can now go directly to the source to understand it in depth. It is interesting to remember that memory allocators have weaknesses, even the standard one. Some large projects have already experienced fragmented memory due to a large number of small allocations. It may be interesting to switch to a custom allocator in that case.

In my case the issue came from a memory-hungry regex, detected thanks to valgrind --tool=massif. It is not always easy to reproduce such conditions.

2008-08-22

Google Insights

Today I discovered a useful tool published by Google: Google Insights. I am sure marketing people will enjoy it ;)

On the open source side it has been used to determine the popularity of Linux across the globe. It is particularly interesting to see which country shows the most interest in which distribution.
My favorite Linux distribution, Gentoo, is not presented in the article, but you may find the related stats here.

The link between term popularity and headlines on the graph is especially interesting, as it gives a good insight into what happened.

2008-08-19

Blog code syntax highlighting added

I have integrated SyntaxHighlighter into my blog to make the code snippets in my posts easier to read. As usual everything is hosted by Google; for the JavaScript files I used googlepages.com

At least it will be consistent in case of failure ;)

An example Python snippet to demonstrate syntax highlighting:

def square(x):
    return x*x

2008-08-14

How to generate breakpoints from the source code?

In order to ease debugging an issue, it may be wise to introduce breakpoints in your code directly at the source code level.

On POSIX-compliant systems, raising the SIGTRAP signal does the job. It is my preferred approach since it is portable across the different OSes implementing signals, and also across architectures. I have already used it successfully on Linux/x86, Linux/x86_64, Linux/ia64, Solaris/Sparc and AIX/PowerPC.
Another popular approach, according to my Google searches, is to trigger interrupt 3 on the x86 architecture (i.e. asm("int $3")). However, it is not as portable as the previous method.

Example (view source repository):
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>

int main(
        int argc __attribute__((unused)),
        char **args __attribute__((unused))
        ){
        /* Adjacent string literals are concatenated,
           hence the trailing space. */
        puts("Something really wrong happend, "
             "break here to help investigation.");
        raise(SIGTRAP); /* Software Breakpoint */
        puts("This sentence is after the breakpoint.");

        return EXIT_SUCCESS;
}


As you can see in this example it is as easy as adding raise(SIGTRAP); where you would like to break.

Now if you run the resulting program, it will stop where you raise the SIGTRAP signal. If you are running the program standalone, it will just stop there with a message like this:
Something really wrong happend, break here to help investigation.
Trace/breakpoint trap


If you have core dumps allowed on your system (see man ulimit especially the -c option for more details), you will see the following:
Something really wrong happend, break here to help investigation.
Trace/breakpoint trap (core dumped)

At that point you have a core dump with all the debugging information you need to investigate your issue. Now you can run gdb yourExecutable core to investigate:
(gdb) bt
#0  0xb7f89410 in __kernel_vsyscall ()
#1  0xb7e65101 in raise () from /lib/libc.so.6
#2  0x0804840d in main () at main.c:29
[...]

This proves very useful if you have difficulties identifying an issue with a live debugging session, but you know you can trigger it in "production" conditions.

Last but not least, you can start your executable directly in gdb. Your breakpoint will then be caught by gdb:
(gdb) run
Starting program: /home/julien/Projects/juliensexperiments/generate_software_breakpoint/generate_software_breakpoint
Something really wrong happend, break here to help investigation.

Program received signal SIGTRAP, Trace/breakpoint trap.
0xb7f67410 in __kernel_vsyscall ()
(gdb) c
Continuing.
This sentence is after the breakpoint.

Program exited normally.

As you can see, I was able to continue past the breakpoint, the same way you would with standard gdb breakpoints.

To sum up, the SIGTRAP signal gives you source-level breakpoints that debuggers interpret as such, and it allows core generation for investigation outside production.

The example code is available through subversion:
svn checkout http://juliensexperiments.googlecode.com/svn/trunk/generate_software_breakpoint generate_software_breakpoint
Just type scons to compile it.

2008-06-26

Repository created

For reliability reasons, I have decided to store my source code on Google Code: http://code.google.com/p/juliensexperiments/

This blog and the corresponding source code should be available and accessible from anywhere, regardless of the state of my Internet connection. If any limitation arises I may move to another hosting, but given the low traffic (i.e. only me so far), it looks perfect!

First Post !

A few months after the creation of this blog, I finally decided to use it.

My motivation is to keep track and share the IT experiments I make. This way they won't get lost in some part of my hard drive. It will also allow me to continue experiments and follow the evolution of my knowledge and of the technologies.

Eventually I hope the visitors' comments will enlighten my vision and allow me to have a broader knowledge.

I hope you will enjoy it as much as I enjoy playing with IT.