Mittwoch, 17. September 2014

Ada Direct I/O B-Tree vs SQLite benchmark


1. Abstract

SQLite is actively used in Ada community for storing persistent data. The advantage of SQLite over other RDBMS is that the database is kept in a single file. The database management system is supplied as plain C code which can be statically linked to the application, which makes it ideal for small embedded applications and other applications where a standalone database server looks an overkill. There are drawbacks as well. SQLite is written in C, which feels uncomfortable to use in an Ada application with its focus on safety and maintainability. Worse is that SQLite has a SQL interface with its high relational to host language impedance mismatch. Possible advantages of a relational database are minimal or non-existent for the use case when SQLite comes in question, while the costs of coding, design and performance needed to translate SQL statements and binding parameters are high.
In this post I present a pure Ada alternative to SQLite, based on the standard Ada library package Ada.Direct_IO with an implementation of a transaction layer on top of it and an implementation of B-tree resident in the file. I also provide some benchmark tests to compare its performance against SQLite. The software is a part of the Simple Components for Ada library available under GM GPL for many platforms. It is kept backward compatible to Ada 95, and can be used with Ada 2005 or 2012 as well.

2. The approach

The package Ada.Direct_IO provides system-independent random file access interface. It is a generic package instantiated with the type of the file block. Blocks can be read and written in any order indexed by block number.
The package Persistent.Blocking_Files adds a memory cache to Ada.Direct_IO file. When a block is requested it is read into the cache. Sequential reading and writing occurs on the cache until the block is there. Cached blocks are marked as updated when written and stored back into the file when the block is removed from the cache or else the file is closed.
The package Persistent.Blocking_Files.Transactional extends Persistent.Blocking_Files by adding virtual to physical block mapping while keeping the interface. When a virtual block is updated its physical counterpart is not changed. An unused physical block is allocated and the virtual block is remapped to it. This new physical block is kept overwritten until closing the transaction. The implementation maintains the list of free blocks and the list of disposed blocks which become free upon committing the transaction. Since original blocks are not changed until translation commit, an application or system crash would be equivalent to rollback.
Within the file blocks the package Persistent.Memory_Pools allocates blocks of varying size similarly to dynamic memory pools. The blocks can be allocated and freed like in standard pools. The size of a block is limited by the size of the file block. The package provides input and output streams which allow writing objects larger than the maximal block size. The streams handle chains of blocks transparently to the application. Additionally the package maintain a mutex object that allows concurrent access to the pool.
The package Persistent.Memory_Pools.Streams.Generic_External_B_Tree provides an implementation of B-trees with the buckets (nodes) allocated in the virtual blocks and leaves allocated in the pool. A B-tree is a key to value map, which can effectively searched by the key. It also can be searched for a less-or-equal and greater-or-equal key. The tree leaves can be traversed in the key order. B-trees buckets have fixed size to fit the file blocks.

3. The benchmark

The benchmarks compares performance of the B-tree implementation with an SQLite table with two columns. The SQLite3 from the amalgamation version 3080200 was used for the benchmark. The amalgamation was statically linked to the benchmark application. The SQLite benchmark ran in the full-mutex mode and used the following table:
CREATE TABLE test_table (key TEXT PRIMARY KEY, value TEXT)
The B-tree benchmark used an instance Persistent.Memory_Pools.Streams.Generic_External_B_Tree with String key and String value.
The benchmark was compiled without optimization. Two platforms were used for test environment:
Platform      OS      CPU       GNAT
A     Windows 7 64-bit SP1    Intel i7-2700K 3.5GHz    GPL 2014 (20140331)
B   Fedora 20 64-bit   Intel Core 2 6600 2.4GHz   4.8.3 (20140624)

3.1 Insert test

In this test 1000 randomly generated keys and values pairs were inserted into the table and the tree. Both random sequences used the same seed. SQLite insertion test used this prepared statement:
INSERT INTO test_table VALUES (?, ?)
Both tests committed each insertion.
Insertion test Time, ms
(per operation)
A   B
SQLite 56.068      254.926
B-Tree 0.196   0.278

3.2 Search test

All generated keys were used to extract the corresponding value. SQLite search test used this prepared statement:
SELECT value FROM test_table WHERE key=?
In both tests the result was compared against expected value.
Search test Time, ms
(per operation)
     B
SQLite 0.0449   0.0267
B-Tree 0.0133   0.0204

3.3 Update test

In this test for each generated key the corresponding value was changed to twice longer one. SQLite test used this prepared statement:
UPDATE test_table SET value=? WHERE key=?
Both tests committed each update.
Update test Time, ms
(per operation)
A      B
SQLite 76.077     250.144
B-Tree 0.247   0.293

3.4 Deletion test

In this test a half of generated keys was removed. The SQLite test used the following prepared statement:
DELETE FROM test_table WHERE key=?
Both tests committed each delete.
Delete test Time, ms
(per operation)
A      B
SQLite 94.989    275.070
B-Tree 0.283   0.309

4. Building the benchmark

To build the benchmark from sources download Simple Components for Ada and unpack it into a directory like:
mkdir test
cd test
zcat components_4_5.tgz | tar -xvf -
Go to the subdirectory test_components and compile the test:
cd test_components
gcc -c ../sqlite-sources/sqlite3.c
gnatmake -I.. test_sqlite_benchmark.adb -largs sqlite3.o -ldl
Run the test:
./test_sqlite_benchmark

5. Conclusion

Ada's Direct_IO offers a viable alternative to SQLite for Ada applications. The performance tests show sufficiently better performance of mutable operations. Direct use of B-tree operations and the persistent memory pool based on Ada stream attributes is easier for handling complex Ada types of persistent objects than breaking them into the limited set of types supported by the SQL version of SQLite.