Posts

A quick sanity testing of pgpool-II on ARM64

pgpool-II is a well-known tool for pooling PostgreSQL connections and load-balancing workloads. In this blog, we will verify whether pgpool works well on the ARM64 architecture.

ARM64 packages

On Ubuntu, the pgpool2 ARM64 Debian package is made available by the Debian PostgreSQL Maintainers:

$ dpkg -s pgpool2
Maintainer: Debian PostgreSQL Maintainers <team+postgresql@tracker.debian.org>
Architecture: arm64
Version: 4.1.4-3.pgdg18.04+2
Replaces: pgpool
Depends: libpgpool0 (= 4.1.4-3.pgdg18.04+2), lsb-base (>= 3.0-3), postgresql-common (>= 26), ucf, libc6 (>= 2.17), libmemcached11, libpam0g (>= 0.99.7.1), libpq5, libssl1.1 (>= 1.1.0)

On CentOS, the ARM64 yum packages are not available, but you can always download the source tarball or clone the git repository and install from source.

Installation from source

Building pgpool requires PostgreSQL to be installed on the same machine. The build uses the 'pg_config' command to locate the required PostgreSQL header files and libraries.
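As a rough sketch of what the source build looks like (the version number and install prefix below are placeholders, not taken from the post), make sure pg_config from your PostgreSQL installation is on the PATH, then run the usual configure/make steps:

$ export PATH=/usr/local/pgsql/bin:$PATH   # so configure can find pg_config
$ tar xzf pgpool-II-4.1.4.tar.gz
$ cd pgpool-II-4.1.4
$ ./configure --prefix=/usr/local/pgpool
$ make
$ sudo make install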

A quick sanity testing of PostgreSQL parallel query on Arm64

Query parallelism has been supported in PostgreSQL for quite a while now. People in the PG community call it "Parallel query", but by now it is not limited to just SELECT queries: index builds leverage multiple cores, and even utilities like VACUUM now make use of parallelism. Furthermore, the community is working on parallelizing COPY and INSERTs.

I was interested in doing a kind of sanity check of this capability, specifically on the ARM64 platform. Let's see how it goes. At the same time, we will try to understand a little bit about how to interpret the parallelism part of the plan output. Subqueries and partitions are not covered in this blog; I will probably cover them in another one.

For running the queries, I generated a scale-5 TPC-H benchmark schema with the help of scripts taken from https://github.com/tvond
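Before the plans themselves, here is a minimal illustration of how to read one (the settings and query are my own, against a TPC-H table; this is not output from the post):

-- Allow up to 4 workers per Gather node for this session.
SET max_parallel_workers_per_gather TO 4;
EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*) FROM lineitem WHERE l_quantity > 40;

In the resulting plan, look for the Gather node and its "Workers Planned" / "Workers Launched" lines; the plan nodes below Gather are executed by each worker, with the leader usually participating as well.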

Need for external compression methods in PostgreSQL

Every modern database system has some way to compress its data at some level. The obvious reason for this feature is to reduce the size of the database, especially in today's world where data is growing exponentially. The less obvious reason is to improve query performance; the idea is: a smaller data size means fewer data pages to scan, which means less disk I/O and faster data access. So, in any case, data decompression should be fast enough not to hamper query performance, if not improve it. Compression is offered at different levels: page compression, row compression, column compression, etc. Columnar databases achieve a very high compression ratio for their columns because of the repetitive pattern of contiguous data in a column. Another case is when, in a row-oriented database, the column values are so large that it makes sense to compress individual values of the column. Such values can even be kept separately if they do not fit in a page.
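As a toy demonstration of value-level compression (my own example, not from the post), PostgreSQL transparently compresses large varlena values, and you can compare the stored size against the raw length:

CREATE TABLE t (doc text);
-- A highly repetitive 400 kB value compresses very well.
INSERT INTO t VALUES (repeat('abcd', 100000));
-- pg_column_size() reports the stored (possibly compressed) size,
-- octet_length() the uncompressed length.
SELECT pg_column_size(doc) AS stored, octet_length(doc) AS raw FROM t;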

Backtraces in PostgreSQL

PostgreSQL 13 has introduced a simple but extremely useful capability to log a stack trace in the server log when an error is reported. Let's see the details. There is a GUC to enable stack trace generation: backtrace_functions. Set it to a comma-separated list of function names:

SET backtrace_functions TO 'func1,func2';

If the error is thrown from one of these functions, a backtrace will be generated and logged in the server log. Note that only superusers can set the backtrace_functions GUC. It can be set locally in a session, or it can be included in the postgresql.conf file to set it globally. It's easy to see how this helps in a situation where a customer reports an error message: we can find where it came from by grepping for it in the source code, but beyond that, it was all guesswork. Not anymore. Now, you can ask the customer to set backtrace_functions to all the functions that emit this error message, and get the stack trace. In most cases, the root cause then becomes apparent from the trace.
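For instance (a made-up session; numeric_in is simply an input function known to raise this error), you could capture a backtrace for a bad-input error like this:

-- Must be run as a superuser.
SET backtrace_functions TO 'numeric_in';
SELECT 'not a number'::numeric;
ERROR:  invalid input syntax for type numeric: "not a number"

The server log will then contain the ERROR entry followed by a BACKTRACE field with the stack trace.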

Leveraging SIMD Vectorization

With the advent of column-store databases, there was an urge to make use of SIMD vector processing, which naturally fits the way table data is arranged. Let's first briefly check what SIMD is. It stands for Single Instruction Multiple Data. Modern CPU instruction sets support this kind of mechanism, where the same instruction is executed simultaneously on multiple data elements. Say you want to double all the column values, or remove the red component of the RGB values of the pixels of an image. For large data, these operations are CPU bottlenecks. SIMD cuts the CPU time significantly by operating simultaneously on 2, 4, 8, 16 or 32 (or more) data elements, depending on the size of each element. So suppose we want to do "arr[i] *= 2" for each element of "int32 arr[]". Normally we would iterate through the elements one by one, and in the generated assembly code a MUL instruction would run on each element. With SIMD, we would process several elements with a single instruction.
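Here is a minimal C sketch of that example using ARM NEON intrinsics (my own illustration, assuming an ARM64 target and an array length divisible by 4):

#include <stdint.h>
#include <arm_neon.h>

/* Double every element of arr[], four int32 lanes at a time. */
void double_all(int32_t *arr, int n)
{
    for (int i = 0; i < n; i += 4)
    {
        int32x4_t v = vld1q_s32(arr + i);  /* load 4 elements */
        v = vshlq_n_s32(v, 1);             /* shift left by 1 == multiply by 2 */
        vst1q_s32(arr + i, v);             /* store 4 elements back */
    }
}

A single vshlq_n_s32 here replaces four scalar multiplications, which is exactly where the CPU-time savings come from.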

PostgreSQL on ARM

In my last blog, I wrote that applications that have been running on x86 might need to undergo some adaptation if they are to be run on a different architecture such as ARM. Let's see what that means exactly. Recently I have been playing around with the PostgreSQL RDBMS on an ARM64 machine. A few months back, I didn't even know whether it could be compiled on ARM, being oblivious to the fact that we have had a regular buildfarm member for ARM64 for quite a while. And now even the PostgreSQL apt repository has started making PostgreSQL packages available for the ARM64 architecture. But the real confidence in the reliability of PostgreSQL-on-ARM came after I tested it in different kinds of scenarios. I started with read-only pgbench tests and compared the results on the x86_64 and ARM64 VMs available to me. The aim was not to compare any specific CPU implementation; the idea was to find scenarios where PostgreSQL on ARM does not perform as well as it does on x86.
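The read-only runs were of this general shape (the scale factor, client counts and duration below are placeholders, not the actual test parameters):

$ pgbench -i -s 100 bench               # initialize a scale-100 dataset
$ pgbench -S -c 16 -j 8 -T 300 bench    # select-only run: 16 clients, 8 threads, 5 minutes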