Ada Direct I/O B-Tree vs SQLite benchmark
1. Abstract
SQLite is actively used in Ada community for storing
persistent data. The advantage of SQLite over other RDBMS is that the database is kept in a
single file. The database management system is supplied as plain C code which
can be statically linked to the application, which makes it ideal for small embedded applications
and other applications where a standalone database server looks an overkill. There
are drawbacks as well. SQLite is written in C, which feels uncomfortable to use
in an Ada application with its focus on safety and maintainability. Worse is
that SQLite has
a SQL interface with its high relational to host language impedance mismatch. Possible advantages of a relational database are minimal or non-existent for the
use case when SQLite comes in question, while the costs of coding, design and
performance needed to translate SQL statements and binding parameters are high.
In this post I present a pure Ada alternative to SQLite, based on the standard
Ada library package Ada.Direct_IO with an implementation of a transaction layer
on top of it and an implementation of B-tree resident in the file. I also
provide some benchmark tests to compare its performance against SQLite. The
software is a part of the
Simple Components for Ada library available under GM
GPL for many platforms. It is kept backward compatible to Ada 95, and can be
used with Ada 2005 or 2012 as well.
2. The approach
The package Ada.Direct_IO provides system-independent random file access
interface. It is a generic package instantiated with the type of the file block.
Blocks can be read and written in any order indexed by block number.
The package Persistent.Blocking_Files
adds a memory cache to Ada.Direct_IO file. When a block is requested it is read
into the cache. Sequential reading and writing occurs on the cache until the
block is there. Cached blocks are marked as updated when written and stored back
into the file when the block is removed from the cache or else the file is
closed.
The package
Persistent.Blocking_Files.Transactional
extends Persistent.Blocking_Files
by adding virtual to physical block mapping while keeping the interface. When a
virtual block is updated its physical counterpart is not changed. An unused
physical block is allocated and the virtual block is remapped to it. This new
physical block is kept overwritten until closing the transaction. The
implementation maintains the list of free blocks and the list of disposed blocks
which become free upon committing the transaction. Since original blocks are not
changed until translation commit, an application or system crash would be
equivalent to rollback.
Within the file blocks the package Persistent.Memory_Pools
allocates blocks of varying size similarly to dynamic memory pools. The blocks
can be allocated and freed like in standard pools. The size of a block is
limited by the size of the file block. The package provides input and output
streams which allow writing objects larger than the maximal block size. The
streams handle chains of blocks transparently to the application. Additionally
the package maintain a mutex object that allows concurrent access to the pool.
The package Persistent.Memory_Pools.Streams.Generic_External_B_Tree
provides an implementation of B-trees with the buckets (nodes) allocated in the
virtual blocks and leaves allocated in the pool. A
B-tree is a key to value map, which can effectively searched by the key. It also
can be searched for a less-or-equal and greater-or-equal key. The tree leaves can
be traversed in the key order. B-trees buckets have fixed size to fit the file
blocks.
3. The benchmark
The benchmarks compares performance of the B-tree implementation with an SQLite
table with two columns. The SQLite3 from the
amalgamation version 3080200 was used for the benchmark. The
amalgamation was statically linked to the benchmark application. The SQLite
benchmark ran in the full-mutex mode and used the following table:
CREATE TABLE test_table (key TEXT PRIMARY KEY, value TEXT)The B-tree benchmark used an instance Persistent.Memory_Pools.Streams.Generic_External_B_Tree with String key and String value.
The benchmark was compiled without optimization. Two platforms were used for test environment:
Platform OS CPU GNAT A Windows 7 64-bit SP1 Intel i7-2700K 3.5GHz GPL 2014 (20140331) B Fedora 20 64-bit Intel Core 2 6600 2.4GHz 4.8.3 (20140624)
3.1 Insert test
In this test 1000 randomly generated keys and values pairs were inserted into the table and the tree. Both random sequences used the same seed. SQLite insertion test used this prepared statement:INSERT INTO test_table VALUES (?, ?)Both tests committed each insertion.
Insertion test Time, ms
(per operation)A B SQLite 56.068 254.926 B-Tree 0.196 0.278
3.2 Search test
All generated keys were used to extract the corresponding value. SQLite search test used this prepared statement:SELECT value FROM test_table WHERE key=?In both tests the result was compared against expected value.
Search test Time, ms
(per operation)A B SQLite 0.0449 0.0267 B-Tree 0.0133 0.0204
3.3 Update test
In this test for each generated key the corresponding value was changed to twice longer one. SQLite test used this prepared statement:UPDATE test_table SET value=? WHERE key=?Both tests committed each update.
Update test Time, ms
(per operation)A B SQLite 76.077 250.144 B-Tree 0.247 0.293
3.4 Deletion test
In this test a half of generated keys was removed. The SQLite test used the following prepared statement:DELETE FROM test_table WHERE key=?Both tests committed each delete.
Delete test Time, ms
(per operation)A B SQLite 94.989 275.070 B-Tree 0.283 0.309
4. Building the benchmark
To build the benchmark from sources download
Simple Components for Ada
and unpack it into a directory like:
Go to the subdirectory test_components and compile the test:mkdir test
cd test
zcat components_4_5.tgz | tar -xvf -
Run the test:cd test_components
gcc -c ../sqlite-sources/sqlite3.c
gnatmake -I.. test_sqlite_benchmark.adb -largs sqlite3.o -ldl
./test_sqlite_benchmark
5. Conclusion
Ada's Direct_IO offers a viable alternative to SQLite for Ada applications. The
performance tests show sufficiently better performance of mutable operations.
Direct use of B-tree operations and the persistent memory pool based on Ada
stream attributes is easier for handling complex Ada types of persistent objects
than breaking them into the limited set of types supported by the SQL version of
SQLite.