Backtraces in PostgreSQL

PostgreSQL 13 has introduced a simple but extremely useful capability to log a stack trace into the server logs when an error is reported. Let's see the details.

There is a GUC to enable stacktrace generation : backtrace_functions. Set it to a comma-separated function names.

SET backtrace_functions TO 'func1,func2';

If the error is thrown from one of these functions, a backtrace will be generated and logged into the server log.

Note that only superusers can set the backtrace_functions GUC. It can be set locally in a session, or can be included in postgresql.conf file to globally set it.

It's easy to see how it would help in a situation where a customer reports an error message. We can find from where it came from by grep'ing for it in the source code. But beyond that, it was all guess work. Not anymore. Now, you can ask the customer to set backtrace_functions to all such functions which are emitting this error message, and get the stack trace. In most cases, the root cause of the error is not in the function which emits the error; its located somewhere in the middle of the stack; hence the stack trace is critical.

This capability is already available in many other databases like MySQL, Greenplum, Oracle.

What's still missing in PostgreSQL - and is present in most of these other databases - is being able to generate stack trace when a server backend crashes with a segmentation fault or other such unexpected signals, or when the server PANICs due to some reason. This capability would make a much bigger difference. We will get rid of having to explain steps to generate core file. More importantly, this helps in situations where the crash happens only randomly. Even with a single unexpected crash, the customer would always be ready with a backtrace. I am hopeful this would be implemented in the next major release of PostgreSQL.

Let's see how a PostgreSQL stack trace log looks like. We will try to use a non-existent type to create a table. Supposing we know that the "type does not exist" error comes from typenameType() in the source code. So we do this :

postgres=# set backtrace_functions TO 'typenameType';
postgres=# create table tab (id invalidtype);
ERROR:  type "invalidtype" does not exist
LINE 1: create table tab (id invalidtype);

Here's a snippet from the server log :
2020-07-28 20:17:01.482 CST [22454] ERROR:  type "invalidtype" does not exist at character 22
2020-07-28 20:17:01.482 CST [22454] BACKTRACE:                                 
    postgres: amit postgres [local] CREATE TABLE(typenameType+0xa4) [0xaaaaafcd2ac4]
    postgres: amit postgres [local] CREATE TABLE(+0x20f550) [0xaaaaafcd4550]   
    postgres: amit postgres [local] CREATE TABLE(transformCreateStmt+0x53c) [0xaaaaafcd7a10]
    postgres: amit postgres [local] CREATE TABLE(+0x44df20) [0xaaaaaff12f20]   
    postgres: amit postgres [local] CREATE TABLE(standard_ProcessUtility+0x16c) [0xaaaaaff1225c]
    postgres: amit postgres [local] CREATE TABLE(+0x44a4e4) [0xaaaaaff0f4e4]   
    postgres: amit postgres [local] CREATE TABLE(+0x44af88) [0xaaaaaff0ff88]   
    postgres: amit postgres [local] CREATE TABLE(PortalRun+0x198) [0xaaaaaff10ed8]
    postgres: amit postgres [local] CREATE TABLE(+0x44764c) [0xaaaaaff0c64c]    
    postgres: amit postgres [local] CREATE TABLE(PostgresMain+0x970) [0xaaaaaff0d3d4]
    postgres: amit postgres [local] CREATE TABLE(+0x3b3be4) [0xaaaaafe78be4]    
    postgres: amit postgres [local] CREATE TABLE(PostmasterMain+0xdc0) [0xaaaaafe79b70]
    postgres: amit postgres [local] CREATE TABLE(main+0x480) [0xaaaaafb82510]  
    /lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffffaac956e0]  
    postgres: amit postgres [local] CREATE TABLE(+0xbd5d8) [0xaaaaafb825d8]    
2020-07-29 18:01:02.726 CST [28776] STATEMENT:  create table tab (id invalidtype);    

Each line of the backtrace has the function name, an offset into that function, and the return address of that frame.

For some stack frames, the function name is not present; instead, the function address is present. These are static functions. For such functions, the function names are not exposed. But we may be able to get their names from their addresses, with the help of addr2line command-line tool :

$ addr2line  0x20f550 0x44df20 -a -f -e `which postgres`
0x000000000020f550
transformColumnDefinition
:?
0x000000000044df20
ProcessUtilitySlow.constprop.0
:?

If it's a debug build, even the file name and offset is printed.



Now let's see how this simple feature is implemented.

In most of the RDBMS's including PostgreSQL, Greenplum, MySQL, the feature is implemented using a simple function backtrace() to generate the stacktrace:

int backtrace(void **buffer, int size);

This function only returns all the return addresses of the frames.  So it should be followed by a call to backtrace_symbols() that converts the addresses returned by backtrace() into strings that describe the addresses using the function names if available :

char **backtrace_symbols(void *const *buffer, int size);

All the details of these functions are nicely described in their man pages. These functions are available in most of the platforms.

Note a couple of points :

1. For the function names to be available for backtrace_symbols(), the executable has to be built using linker options that allow adding all these symbols into a "dynamic symbol table". These options can be given with one of the following ways (these are gcc compiler options) :
gcc -rdynamic
gcc -Wl,-E

2. Sometimes particular stack frames might be missing, when compiled with gcc -O2 or higher optimization level. E.g. check this sample program backtrace.c from the backtrace() man pages.

I compile it without -O2 :
amit:pg:error$ gcc -rdynamic -o backtrace backtrace.c
I get the full stack :
amit:pg:error$ ./backtrace 6
backtrace() returned 11 addresses
./backtrace(myfunc3+0x2c) [0xaaaad6b2edc0]
./backtrace(+0xe84) [0xaaaad6b2ee84]
./backtrace(myfunc+0x2c) [0xaaaad6b2eebc]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(myfunc+0x24) [0xaaaad6b2eeb4]
./backtrace(main+0x60) [0xaaaad6b2ef28]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffff8c5ba6e0]
./backtrace(+0xcc4) [0xaaaad6b2ecc4]

Now I compile it with -O2 :
amit:pg:error$ gcc -O2 -rdynamic -o backtrace backtrace.c
amit:pg:error$ ./backtrace 6
backtrace() returned 4 addresses
./backtrace(myfunc3+0x38) [0xaaaac7183e40]
./backtrace(main+0x4c) [0xaaaac7183cfc]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffffb91286e0]
./backtrace(+0xd38) [0xaaaac7183d38]

There is no frame for myfunc2() and myfunc(). One possibility is that the compiler has replaced the recursive calls of myfunc() and also myfunc2() call with the tail end call myfunc3(), which is called tail call optimization.

The point being: we need to be aware of such missing frames in a few scenarios.


Comments

Popular posts from this blog

PostgreSQL on ARM

Need for external compression methods in PostgreSQL