How well does a Berkeley database scale?
Once I got the basics of bdb-tool
working, I decided to see how well it (and,
of course, the Berkeley database itself) worked.
The initial runs were very encouraging, averaging over 80,000 record inserts a second even up to 32 million records. So I decided to go for broke and try all 125 million. There were two twists with this test:
- Because raven was busy re-running the 125 million record test with GNU dbm, I ran the test on sparrow.
- Because I don’t have a lot of free disk space on sparrow, I attached a USB-3 drive (hardware-wise, an unusual combination of SSD and spinning platter) and stored the database there.
```
[brian@sparrow bdb-tool]$ DB_FN=/mnt/HDD_1TB_USB3/numbers.db; rm -f $DB_FN
[brian@sparrow bdb-tool]$ RECORDS=125000000; time nice ionice -c3 /var/tmp/numbers.awk -vmax=$RECORDS | pv -ls$RECORDS | ./bdb-tool --newdb $DB_FN
 125M 0:25:23 [82.1k/s] [=======================================================================>] 100%

real	25m23.889s
user	39m17.914s
sys	2m13.861s
```
25 minutes! That’s on a slower computer with a slower hard drive.
I later ran the script on my home server penguin, and it took 19 minutes 20 seconds to complete. The final file size was 21 GiB.
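As a back-of-the-envelope check on the penguin run (not part of the original test harness), the 19m20s wall-clock time and 21 GiB file size imply roughly the following insert rate and on-disk cost per record:

```python
# Rough arithmetic check of the penguin run reported above:
# 125 million inserts in 19 minutes 20 seconds, final file 21 GiB.
records = 125_000_000
seconds = 19 * 60 + 20                     # 1160 s wall clock
rate = records / seconds                   # ~108k inserts/s
bytes_per_record = 21 * 2**30 / records    # ~180 bytes of file per record
print(f"{rate:,.0f} inserts/s, {bytes_per_record:.0f} bytes/record")
```

So penguin lands between sparrow and raven on throughput, and each record costs about 180 bytes on disk once B-tree overhead is included.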
And finally on raven:
```
RECORDS=$((125*1000000)); time ./numbers-to-words $RECORDS|pv -ls$RECORDS|./bdb-tool numbers.bdb
 125M 0:07:01 [ 296k/s] [=======================================================================>] 100%

real	7m2.233s
user	7m10.390s
sys	0m40.810s
```
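The pv-reported rates agree with the `real` wall-clock times; a quick arithmetic cross-check of the two transcripts above:

```python
# Derive inserts/s from the "real" times reported by time(1)
# and compare against pv's running-rate readout.
records = 125_000_000
runs = [("sparrow", 25 * 60 + 23.889),   # pv showed 82.1k/s
        ("raven",    7 * 60 +  2.233)]   # pv showed 296k/s
for host, real_seconds in runs:
    print(f"{host}: {records / real_seconds:,.0f} inserts/s")
```

That works out to about 82k/s on sparrow and 296k/s on raven, so the slower machine plus USB drive cost roughly a factor of 3.6 in insert throughput.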