Wednesday, September 17, 2014

Ada Direct I/O B-Tree vs SQLite benchmark

1. Abstract

SQLite is actively used in the Ada community for storing persistent data. Its advantage over other RDBMSes is that the database is kept in a single file. The database management system is supplied as plain C code which can be statically linked into the application, which makes it ideal for small embedded applications and other applications where a standalone database server would be overkill. There are drawbacks as well. SQLite is written in C, which feels uncomfortable in an Ada application with its focus on safety and maintainability. Worse, SQLite exposes an SQL interface, with its high impedance mismatch between the relational model and the host language. The possible advantages of a relational database are minimal or non-existent in the use cases where SQLite comes into question, while the design, coding and performance costs of translating SQL statements and binding parameters are high.
In this post I present a pure Ada alternative to SQLite, based on the standard Ada library package Ada.Direct_IO, with a transaction layer implemented on top of it and a B-tree resident in the file. I also provide some benchmark tests comparing its performance against SQLite. The software is part of the Simple Components for Ada library, available under GMGPL for many platforms. It is kept backward compatible with Ada 95 and can be used with Ada 2005 or 2012 as well.

2. The approach

The package Ada.Direct_IO provides a system-independent random file access interface. It is a generic package instantiated with the type of the file block. Blocks can be read and written in any order, indexed by block number.
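
As an illustration (this is not code from the library; the block size and file name are arbitrary), a file of fixed-size blocks can be written and read back in random order like this:

```ada
with Ada.Direct_IO;

procedure Direct_IO_Demo is
   type Block is array (1 .. 512) of Character;  --  a fixed-size file block

   package Block_IO is new Ada.Direct_IO (Block); --  instantiated with the block type
   use Block_IO;

   File : File_Type;
   Data : Block := (others => ' ');
begin
   Create (File, Inout_File, "demo.dat");
   Data (1 .. 5) := "hello";
   Write (File, Data, To => 2);    --  write block number 2 directly
   Read  (File, Data, From => 2);  --  read it back, in any order
   Close (File);
end Direct_IO_Demo;
```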
The package Persistent.Blocking_Files adds a memory cache to an Ada.Direct_IO file. When a block is requested it is read into the cache. Subsequent reading and writing goes to the cache as long as the block stays there. Cached blocks are marked as updated when written, and are stored back into the file when the block is evicted from the cache or the file is closed.
The package Persistent.Blocking_Files.Transactional extends Persistent.Blocking_Files by adding virtual-to-physical block mapping while keeping the interface. When a virtual block is updated, its physical counterpart is not changed. Instead, an unused physical block is allocated and the virtual block is remapped to it. All further updates go to this new physical block until the transaction is closed. The implementation maintains a list of free blocks and a list of disposed blocks, which become free upon committing the transaction. Since the original blocks are not changed until the transaction commits, an application or system crash is equivalent to a rollback.
Within the file blocks, the package Persistent.Memory_Pools allocates blocks of varying size, similarly to dynamic memory pools. Blocks can be allocated and freed as in standard pools. The size of a block is limited by the size of the file block. The package provides input and output streams which allow writing objects larger than the maximal block size; the streams handle chains of blocks transparently to the application. Additionally, the package maintains a mutex object that allows concurrent access to the pool.
The package Persistent.Memory_Pools.Streams.Generic_External_B_Tree provides an implementation of B-trees with the buckets (nodes) allocated in the virtual blocks and the leaves allocated in the pool. A B-tree is a key-to-value map which can be searched efficiently by key. It can also be searched for a less-or-equal or greater-or-equal key, and the leaves can be traversed in key order. The B-tree buckets have a fixed size to fit the file blocks.
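
The standard ordered maps offer the same kinds of search in memory and make a convenient analogy. The sketch below is not the library's API (the actual generic formals are defined in Simple Components); it only shows the kind of interface meant: exact lookup plus Floor (less-or-equal) and Ceiling (greater-or-equal) searches on string keys:

```ada
with Ada.Containers.Indefinite_Ordered_Maps;
with Ada.Text_IO; use Ada.Text_IO;

procedure Map_Analogy is
   --  String key to String value map, ordered by key
   package String_Maps is
      new Ada.Containers.Indefinite_Ordered_Maps (String, String);
   use String_Maps;

   Tree : Map;
begin
   Tree.Insert ("b", "second");
   Tree.Insert ("d", "fourth");
   --  Floor: greatest key <= argument; Ceiling: least key >= argument
   Put_Line (Key (Tree.Floor   ("c")));  --  prints "b"
   Put_Line (Key (Tree.Ceiling ("c")));  --  prints "d"
end Map_Analogy;
```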

3. The benchmark

The benchmark compares the performance of the B-tree implementation against an SQLite table with two columns. SQLite3 from the amalgamation version 3080200 was used for the benchmark. The amalgamation was statically linked into the benchmark application. The SQLite benchmark ran in full-mutex mode and used the following table:
CREATE TABLE test_table (key TEXT PRIMARY KEY, value TEXT)
The B-tree benchmark used an instance of Persistent.Memory_Pools.Streams.Generic_External_B_Tree with String keys and String values.
The benchmark was compiled without optimization. Two platforms were used as the test environment:

Platform   OS                     CPU                        GNAT
A          Windows 7 64-bit SP1   Intel i7-2700K 3.5GHz      GPL 2014 (20140331)
B          Fedora 20 64-bit       Intel Core 2 6600 2.4GHz   4.8.3 (20140624)

3.1 Insert test

In this test 1000 randomly generated key/value pairs were inserted into the table and into the tree. Both random sequences used the same seed. The SQLite insertion test used this prepared statement:
INSERT INTO test_table VALUES (?, ?)
Both tests committed each insertion.
Insertion test   Time, ms (per operation)
                 A          B
SQLite           56.068     254.926
B-Tree           0.196      0.278

3.2 Search test

All generated keys were used to extract the corresponding values. The SQLite search test used this prepared statement:
SELECT value FROM test_table WHERE key=?
In both tests the result was compared against the expected value.
Search test      Time, ms (per operation)
                 A          B
SQLite           0.0449     0.0267
B-Tree           0.0133     0.0204

3.3 Update test

In this test, for each generated key the corresponding value was replaced with one twice as long. The SQLite test used this prepared statement:
UPDATE test_table SET value=? WHERE key=?
Both tests committed each update.
Update test      Time, ms (per operation)
                 A          B
SQLite           76.077     250.144
B-Tree           0.247      0.293

3.4 Deletion test

In this test half of the generated keys were removed. The SQLite test used the following prepared statement:
DELETE FROM test_table WHERE key=?
Both tests committed each delete.
Delete test      Time, ms (per operation)
                 A          B
SQLite           94.989     275.070
B-Tree           0.283      0.309

4. Building the benchmark

To build the benchmark from the sources, download Simple Components for Ada and unpack it into a directory, e.g.:
mkdir test
cd test
zcat components_4_5.tgz | tar -xvf -
Go to the subdirectory test_components and compile the test:
cd test_components
gcc -c ../sqlite-sources/sqlite3.c
gnatmake -I.. test_sqlite_benchmark.adb -largs sqlite3.o -ldl
Run the test:
./test_sqlite_benchmark

5. Conclusion

Ada's Direct_IO offers a viable alternative to SQLite for Ada applications. The performance tests show substantially better performance for mutating operations. Direct use of B-tree operations and of the persistent memory pool, based on Ada stream attributes, makes handling complex Ada types of persistent objects easier than breaking them down into the limited set of types supported by SQLite's SQL interface.

Friday, December 28, 2012

Type introspection in Ada

Type introspection is a language feature which allows determining the properties of a type at run time. Ada is a statically typed language, so no introspection is needed most of the time. But when it comes to class-wide types, their instances are not so static.

One might think that Ada does not support type introspection, which is almost, but not quite, true. In fact, Ada (starting with Ada 95) has features which can be used to introspect types: the stream-oriented attributes (RM 13.13.2). The language generates default implementations of T'Read and T'Write based on the type's contents. Of course, the compiler knows the type's components in order to generate these implementations. Can this knowledge be used for run-time introspection? Yes, it can.

We start with the provisions of RM 13.13.2(9):

"... For type extensions, the Write or Read attribute for the parent type is called, followed by the Write or Read attribute of each component of the extension part, in canonical order. ..."
This suggests calling T'Write on the container type and observing what its implementation calls for each component.

First we declare the abstract base type of the container:

   type Container is abstract tagged limited null record;

   procedure Introspect
             (  Stream : access Root_Stream_Type'Class;
                Item   : Container
             )  is null;
   for Container'Write use Introspect;

Then the abstract base type of the members:

   type Member is abstract tagged limited null record;

   procedure Introspect
             (  Stream : access Root_Stream_Type'Class;
                Item   : Member
             );
   for Member'Write use Introspect;

And finally a fake stream type which will collect the information of the container type contents:

   package List_Of_Tags is
      new Ada.Containers.Vectors (Positive, Tag);
   use List_Of_Tags;

   type Scriber is new Ada.Streams.Root_Stream_Type with record
      List : Vector;
   end record;

   procedure Read
             (  Stream : in out Scriber;
                Item   : out Stream_Element_Array;
                Last   : out Stream_Element_Offset
             )  is null;
   procedure Write
             (  Stream : in out Scriber;
                Item   : Stream_Element_Array
             )  is null;

The read and write implementations do nothing. The stream contains a vector of type tags, written from the implementation of the member's write:

   procedure Introspect
             (  Stream : access Root_Stream_Type'Class;
                Item   : Member
             )  is
   begin
      Scriber'Class (Stream.all).List.Append
      (  Member'Class (Item)'Tag
      );
   end Introspect;

This is basically all. Putting the above into a package, e.g. Introspection, it can be used as follows:

with Ada.Tags;       use Ada.Tags;
with Ada.Text_IO;    use Ada.Text_IO;
with Introspection;  use Introspection;

procedure Test is
   type A is new Member with null record;
   type B is new Member with null record;
   type C is new Container with record
      X : A;
      Y : B;
   end record;

   Object   : C;
   Contents : aliased Scriber;
begin
   C'Write (Contents'Access, Object);
   for Index in Contents.List.First_Index
             .. Contents.List.Last_Index
   loop
      Put (Integer'Image (Index));
      Put ("  ");
      Put (Expanded_Name (Contents.List.Element (Index)));
      New_Line;
   end loop;
end Test;

This will print:

 1  TEST.A
 2  TEST.B

Final notes:
  1. The derived types of Container and Member must not override the stream attributes. This makes the compiler generate the default ones.
  2. The method works recursively, that is, you can have container types nested in other containers, all introspected. The trick works as follows: from T'Write of the nested container you can introspect it using another stream object and use the information obtained.
  3. Be careful when introspecting from Initialize. When Initialize is called, the derived type's Initialize has not yet completed, i.e. the type is not fully constructed. This is also the reason why T'Write is better to use than T'Read, even if T'Read does not actually update the object.

Sunday, May 9, 2010

The "Any Competent Programmer" BS

The "Any Competent Programmer" BS: "A columnist wrote an article asking 'Why aren't developers interested in Ada?', which was pretty good, but the first comment on the article kinda got me going.

Scottish Martin's comments do absolutely make some good points, and I had no quibble with them. He ends his comment, though, with one of my pet peeves: 'A professional team can develop quality software whatever the chosen implementation language and toolset.' (And that just set me off--though Martin's just in the wrong place at the wrong time. :-)

That statement is analogous to the 'Any competent programmer can write good code in any language' trope.

The advocated language could be Ada, Lisp, Haskell, or any of many others that face an uphill struggle for acceptance. The advocacy is dismissed with the claim that programming language choice just doesn't make much difference, and after all, a competent programmer can write quality software in any language.

While this claim about the ability to create good code may be true, it's irrelevant, and is usually thrown in the face of a developer who is advocating the use of a programming language that differs from the corporate herd selection, in order to shut them up, which it too often does. The claim, though, begs the question of how much it costs, in time and money, to develop that quality software using a chosen language and toolset. And whether a different choice could lead to quality software being developed faster and more cheaply, thereby encouraging the creation of even more quality software.

I've argued about this before. Programming language choice does matter, programming toolsets do matter. Programming language and development tools are where the bits hit the hardware, and if you want quality work from a developer, you need to use quality tools.

Seriously, do Indy and Formula One racing mechanics get their tools at WalMart and Harbor Freight? Because 'a professional mechanic ...'

Monday, May 12, 2008

Concurrency in Ada

Ada is renowned for its concurrency support. Parallel programming is a difficult
issue in all aspects: it is difficult to learn, to design, to program, to validate, but
above all, it is most difficult to reuse.

Yesterday I finished version 3.0 of the Simple Components for Ada, where I tried
to summarize my experience and ideas in this area. The library, among other things,
contains some basic gear for dealing with concurrency.

Section 9 is devoted to the implementation of some lock-free data
structures, which have become more popular with the new multi-core architectures.
Ada was not designed to provide low-level lock-free primitives, for the
obvious reason that they would be non-portable. It still has the necessary
tools. Here I mean the pragma Atomic, which makes many interesting things possible.
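
As a small taste of what pragma Atomic allows (a sketch of my own, not code from the library), a task can poll a shared flag without any protected object, because reads and writes of the flag are guaranteed to be indivisible:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Atomic_Flag_Demo is
   Stop : Boolean := False;
   pragma Atomic (Stop);  --  reads and writes of Stop are indivisible

   task Worker;
   task body Worker is
      Count : Natural := 0;
   begin
      while not Stop loop   --  polls the atomic flag, no lock taken
         Count := Count + 1;
      end loop;
      Put_Line ("Worker stopped after" & Natural'Image (Count) & " steps");
   end Worker;
begin
   delay 0.01;
   Stop := True;  --  signals the worker without any synchronization primitive
end Atomic_Flag_Demo;
```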

Section 10 contains implementations of locking synchronization primitives.
Protected objects, introduced in Ada 95, are an excellent mechanism of great
power. Especially interesting to explore is the requeue statement. Events,
pulse events, arrays of events, mutexes and arrays of mutexes can all be
implemented as protected objects. Using the requeue statement one can do a lot
of things which appear impossible at first glance. For example, this section
presents a programming pattern for using entry parameters in barriers.
It also discusses how race conditions and deadlocks can be avoided when using
protected objects.
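
To illustrate the mechanism (a minimal sketch under my own naming, not the library's implementation), an event of the kind listed above can be a protected type whose entry barrier is opened by a procedure call:

```ada
procedure Event_Demo is
   protected type Event is
      entry Wait;        --  blocks until the event is signaled
      procedure Signal;  --  opens the barrier for all waiters
   private
      Signaled : Boolean := False;
   end Event;

   protected body Event is
      entry Wait when Signaled is  --  barrier re-evaluated after Signal
      begin
         null;
      end Wait;

      procedure Signal is
      begin
         Signaled := True;
      end Signal;
   end Event;

   Gate : Event;

   task Waiter;
   task body Waiter is
   begin
      Gate.Wait;   --  queued here until Signal is called
   end Waiter;
begin
   delay 0.01;
   Gate.Signal;    --  releases the waiting task
end Event_Demo;
```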

Two classical problems are considered on examples: the checkpoint
synchronization problem, and the dining philosophers one.

It is as free as it only can be: the license is GMGPL. I hope you will enjoy it.

Tuesday, January 22, 2008

Why I hate Gtk+/GNOME

I never liked Gtk+/GNOME all that much - but as an Ada supporter I learned to hate them.

Currently I am trying to build GtkAda for MinGW – from scratch. I already did so for Solaris, so I have a fairly good idea what I am in for – only with MinGW everything is even more tricky.

The largest problem is the huge number of dependencies of Gtk+ – see the Partner Projects page of The GNU Ada Project for details. Note that the list is never quite complete.

On Solaris I only had to upgrade parts of Gtk+. But on MinGW I have nothing to build on and need to start from scratch. And with the very first package I ran into a complete blocker which puts an interesting new angle on the "Hen and Egg problem" I described a few days ago:

Glib2 needs Pkg-Config to compile, and Pkg-Config uses the Glib2 library. And no "--without" or "--disable" configure options are to be seen.

I googled the internet high and low without finding any hint whatsoever on how to break the recursive dependency. As it stands, it seems to be impossible to bootstrap Gtk+ at all.

Now, one might argue that this is a developer problem and developers should be used to grief. But that is not the way it works. Such flaky design will show in the final product as well. I, for one, disabled the "GNOME stable" download channel for openSUSE long ago, with no desire to reactivate it. It left me with an unusable system quite a few times. The problem was – guess what – faulty dependencies.

Monday, January 21, 2008

The Hen and Egg problem

Over time I have built the GNAT compiler for a variety of platforms, and one thing I noticed is the constant encounter with hen-and-egg situations — at least when you leave the safe haven of Linux. Below are some examples:

Texinfo and libiconv

A classic one: the compiler needs Texinfo to create its online help, and you need a working compiler to create Texinfo. Ahh, and Texinfo needs libiconv, and libiconv needs a working C++ compiler.

But these problems are easily solved, since the online help is only an optional extra.


Assembler and linker

Similar to Texinfo, only a little trickier, as a working assembler and linker are an absolute must. And don't expect every platform to come with a sufficiently up-to-date assembler or linker — Solaris springs to my mind here.


C++

C++ is an integral part of the GCC compiler and is needed by libiconv. But that is not the problem. The problem is that the C++ build expects the C header files to exist at prefix/include. Of course, they only exist there after make install.


Ada

For us Ada users the worst one: Ada is self-hosted. Being self-hosted is of course very cool indeed: about 95% of the Ada compiler and runtime are written in Ada. I do pity the poor GNU Fortran maintainers, where most of the compiler and runtime is written in C — Yuck.

Of course, that coolness comes at a price: you need a working Ada compiler to build the Ada compiler.


Why am I telling you all this? Well, if you are planning to build a newer compiler / tool chain for any platform apart from an up-to-date Linux, you should – from the very onset – expect to build the compiler more than once. In fact: plan for 3 to 4 successful iterations (and an uncounted number of unsuccessful ones). Start off with just C and Ada, and add other languages later. Don't worry if, for example, Texinfo is reported as too old; create a newer version later and try again. And remember that libiconv needs an up-to-date C++ compiler.

Wednesday, January 9, 2008

The Fundamental Theory of Ada

Another great article from Marc. After the recent discussions on /. and reddit it is well worth reading, so one actually knows what Ada is all about.

Go ahead have a look.
