Numbers-to-words benchmarks on various systems
Contents
- Results for various implementations and computers
- Running the tests
- Perl: the reference implementation
- C: the cumbersome, brittle speed demon
- awk: unexpectedly zippy; mawk even more so
- Java: the amazing slowpoke
- Lua: a respectable performer
- PHP: Pretty-Hyper Performance
- Python: maybe they should have named it Snail
- Ruby: not that much of a gem
- Tcl: not the right tool for this job
- bash: redefining “slow”
Earlier I wrote about stress testing GNU dbm and Berkeley DB databases, and of using a program to generate the required millions of records. For keys the program outputs numbers and for data writes out the numbers as English words:
Number (key) | In English (value) |
---|---|
o | zero |
1 | one |
2 | two |
3 | three |
(over 123 million rows not shown) | |
123456788 | one hundred twenty-three million, four hundred fifty-six thousand, seven hundred eighty-eight |
123456789 | one hundred twenty-three million, four hundred fifty-six thousand, seven hundred eighty-nine |
123456790 | one hundred twenty-three million, four hundred fifty-six thousand, seven hundred ninety |
(and so on) |
I wrote the original numbers-to-words program in awk. When running it to feed records to Berkeley DB I thought it could be a limiting factor to the test; that is, Berkeley DB could possibly insert more records per second than the awk program could generate.
So I re-wrote it in C. It was, unsurprisingly, a lot faster, by 8½ times. As it
turned out the awk program wasn’t the limiting factor: by itself the program
could generate 180,000 records per second, but my bdb-tool
program could insert
only 144,000 per second.
Being curious, I followed up the C program with a Perl program, just to see how it would compare. Perl was about 27% faster than awk.
Then I got carried away. Over the course of the past week I’ve implemented the alogorithm in a bunch of other languages to see how well they performed.
Results for various implementations and computers
The results and my comments are summarized below.
Raw numbers: thousands of lines generated per second
In this table, the “baseline” number is based on timing the Perl program when generating 3 million numbers, then using that time to estimate how many numbers the program can produce in about 30 seconds, rounded to the nearset 100,000.
Most of the numbers below are k/s (thousands of numbers per second), with the exception of a couple suffixed with “/s” (numbers per second.)
Computer | Baseline (n/30s) | C | Awk | Java | Lua | Perl | PHP | Python | Ruby | Tcl | Bash |
---|---|---|---|---|---|---|---|---|---|---|---|
raven | 13,300,000 | 5981 | 377 | 345 | 272 | 434 | 768 | 293 | 279 | 97 | 10.2 |
penguin | 9,300,000 | 3757 | 201 | 242 | 208 | 303 | 411 | 145 | 4.9 | ||
sparrow (F23) | 7,100,000 | 2863 | 349 | 144 | 169 | 243 | 403 | 112 | 134 | 67 | 5.1 |
sparrow (F31) | 7,100,000 | 2840 | 311 | 137 | 164 | 287 | 362 | 116 | 158 | 68 | 6.4 |
duck | 5,400,000 | 2385 | 128 | 122 | 116 | 174 | 290 | 83 | 3.8 | ||
heron | 4,200,000 | 1785 | 97 | 106 | 101 | 151 | 72 | 3.4 | |||
dkb | 4,200,000 | 1264 | 135 | 59 | 137 | 70 | 89 | 2.9 | |||
auk | 3,700,000 | 1521 | 95 | 91 | 131 | 56 | 2.7 | ||||
netvista | 1,900,000 | 653 | 44 | 35 | 40 | 73 | 88 | 27 | 33 | 19 | 1.1 |
oldmary | 1,200,000 | 537 | 21 | 20 | 31 | 49 | 33 | 21 | 12 | 13 | 612/s |
Raspberry Pi 3 | 600,000 | 214 | 39 | 12 | 20 | 26 | 11 | 14 | 6.7 | 519/s | |
Raspberry Pi 1 | 200,000 | 166 | 23 | 7.7 | 11 | 4.0 | 3.9 | 170/s |
Relative performance (Perl=100)
This table shows how fast each implementation ran relative to the Perl program. If the score is greater than 100 it ran faster; less than 100 it ran more slowly.
Computer | C | Awk | Java | Lua | Perl | PHP | Python | Ruby | Tcl | Bash |
---|---|---|---|---|---|---|---|---|---|---|
raven | 1378 | 87 | 79 | 63 | 100 | 177 | 68 | 64 | 22 | 2 |
penguin | 1240 | 66 | 80 | 69 | 100 | 136 | 48 | 2 | ||
sparrow (F23) | 1178 | 144 | 59 | 70 | 100 | 166 | 46 | 55 | 28 | 2 |
sparrow (F31) | 990 | 108 | 48 | 57 | 100 | 126 | 40 | 55 | 24 | 2 |
duck | 1371 | 74 | 70 | 67 | 100 | 167 | 48 | 2 | ||
heron | 1182 | 64 | 70 | 67 | 100 | 48 | 2 | |||
dkb | 923 | 99 | 43 | 100 | 51 | 65 | 2 | |||
auk | 1161 | 73 | 69 | 100 | 43 | 2 | ||||
netvista | 895 | 60 | 48 | 55 | 100 | 121 | 37 | 45 | 26 | 2 |
oldmary | 1096 | 43 | 41 | 63 | 100 | 67 | 43 | 24 | 27 | 2 |
Raspberry Pi 3 | 823 | 150 | 46 | 77 | 100 | 42 | 54 | 26 | 2 | |
Raspberry Pi 1 | 1509 | 209 | 70 | 100 | 36 | 35 | 2 |
Running the tests
A typical test is run like this:
RECORDS=5000000; time ./numbers-to-words.pl $RECORDS >/dev/null
Each implementation can be passed the number of records to generate on the
command line. The output is sent to /dev/null
because we’re interested only
in how many numbers can be generated every second and not the actual output
(which is actually really boring to read!)
Of course, I wasn’t content just to run the programs myself on the command line, primarily because I wanted to run them on almost a dozen different computers. So I put together a fancy shell script.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 |
#!/bin/bash LANGUAGE_LIST='C Awk Java Lua Perl PHP Python Ruby Tcl Bash' TRIAL_COUNT=5 WITH_PIPEVIEW=false # Don't use 'true': it imparts a penalty that varies # with the language being tested ESC=$'\e' LOG_FN="benchmark.log" B_START_TIME=$SECONDS echo "$(date +'%T %F') Benchmmark started (WITH_PIPEVIEW=$WITH_PIPEVIEW)" >>$LOG_FN # First determine a value that Perl can use to run for about 30 seconds. # Benchmark 3 million to start if [ -f .benchmark-baseline ] then BASELINE="$(< .benchmark-baseline)" echo "$(date +%T) Baseline set to $BASELINE/30s from .benchmark-baseline" | tee -a $LOG_FN else echo "Determining baseline (3 million numbers):" START_TIME=$(date +%s.%N) RECORDS=3000000; ./numbers-to-words.pl $RECORDS | pv -ls$RECORDS >/dev/null END_TIME=$(date +%s.%N) ELAPSED=$(echo "scale=2; $END_TIME/1-$START_TIME/1" | bc) BASELINE=$(echo "90000000/$ELAPSED/100000*100000" | bc) echo -n "$ESC[2A$ESC[42C" echo "${ELAPSED} seconds" echo "$ESC[KPerl program can generate approximately $BASELINE numbers in 30 seconds" #BASELINE=$((30000000/ELAPSED/100000*100000)) #echo "Perl program can generate approximately $RECORDS numbers in 10 seconds" echo -n "$ESC[K" echo "$(date +%T) Baseline 3,000,000 records in $ELAPSED seconds: $BASELINE/30s" >>$LOG_FN echo $BASELINE >.benchmark-baseline fi echo # Test the languages for P_LANG in $LANGUAGE_LIST do # Determine how to run the program INTERPRETER="${P_LANG,,}" [ "${P_LANG}" == 'Tcl' ] && INTERPRETER='tclsh' [ "${P_LANG}" == 'C' ] && INTERPRETER='gcc' if ! which $INTERPRETER &>/dev/null then echo "$P_LANG: cannot locate interpreter or compiler '$INTERPRETER'" | tee -a $LOG_FN continue fi # Determine the file extension for the language's program file PROGRAM="numbers-to-words" RECORDS=$BASELINE EXT=".$INTERPRETER" [ "$P_LANG" == 'Bash' ] && EXT='.sh' RECORDS=$((BASELINE/50)) [ "$P_LANG" == 'C' ] && EXT='.c' RECORDS=$((BASELINE*10)) [ "$P_LANG" == 'Java' ] && EXT='.class' [ "$P_LANG" == 'Perl' ] && EXT='.pl' [ "$P_LANG" == 'Python' ] && EXT='.py' [ "$P_LANG" == 'Ruby' ] && EXT='.rb' [ "$P_LANG" == 'Tcl' ] && EXT='.tcl' RECORDS=$((BASELINE/5)) if [ ! "$EXT" ]; then echo "Missing file extension for $P_LANG" continue fi # Some languages need a bit of help AWK_PARAM='' if [ $P_LANG == 'Awk' ]; then INTERPRETER='awk -f' which mawk &>/dev/null && INTERPRETER='mawk -f' AWK_PARAM='-v max=' fi if [ "${P_LANG}" == 'C' ]; then gcc -O3 -o numbers-to-words numbers-to-words.c INTERPRETER='./' EXT='' fi if [ "${P_LANG}" == 'Java' ]; then which javac &>/dev/null && javac numbers-to-words.java PROGRAM='numbersToWords' EXT='' # java assumes .class extension fi [ "${P_LANG}" == 'PHP' -a -x /opt/php/bin/php ] && INTERPRETER='/opt/php/bin/php' $[ "$INTERPRETER" == './' ] || INTERPRETER="$INTERPRETER " # Run a quick test run to ensure the program works RUN="${INTERPRETER}${PROGRAM}${EXT} ${AWK_PARAM}$RECORDS" echo -e "\n$(date +%T) $P_LANG ($RUN):" >>$LOG_FN eval "${RUN/$RECORDS/1} >/tmp/numbers-to-words.tmp" if ! grep -q '1 "one"' /tmp/numbers-to-words.tmp then echo "Error running \"$RUN\" -- skipping" | tee -a $LOG_FN rm -f /tmp/numbers-to-words.tmp continue fi rm -f /tmp/numbers-to-words.tmp # Run five trials TOTAL_SEC=0 for TRIAL in $(seq $TRIAL_COUNT) do echo "$P_LANG trial $TRIAL ($RUN)" TIMEFORMAT="$ESC[$((16*(TRIAL-1)))C%2R seconds$ESC[80D$ESC[A$ESC[K$ESC[2A" START_TIME=$(date +%s.%N) if $WITH_PIPEVIEW then time $RUN | pv -ls$RECORDS >/dev/null else echo " ($(date +%T) Running; please wait)" time $RUN >/dev/null fi END_TIME=$(date +%s.%N) ELAPSED=$(echo "scale=2; $END_TIME/1-$START_TIME/1" | bc) R_S="$(echo "scale=2; $RECORDS/$ELAPSED" | bc)" echo "$(date +%T) Trial $TRIAL: $START_TIME to $END_TIME, elapsed=$ELAPSED, $R_S records/second" >>$LOG_FN TOTAL_SEC=$(echo "scale=2; $TOTAL_SEC+$ELAPSED" | bc) done # Analyse the results and display [ "$TOTAL_SEC" == 0 ] && TOTAL_SEC=-1 TOTAL_NUM=$((RECORDS * TRIAL)) echo -n "$ESC[K$P_LANG:$ESC[10D$ESC[9C" if [ $TOTAL_NUM -lt 1000000 ] then L=${#TOTAL_NUM} P=$((L-3)) [ $L -lt 4 ] && T=$TOTAL_NUM || T="${TOTAL_NUM:0:$P},${TOTAL_NUM:$P}" echo -n "$T numbers in $TOTAL_SEC seconds, " else T="$(echo "scale=1; $TOTAL_NUM/1000000" | bc) million" echo -n "${T/.0/} numbers in $TOTAL_SEC seconds, " fi echo "average $(echo "$TOTAL_NUM/$TOTAL_SEC/1000" | bc)k/sec" echo -n "$ESC[B$ESC[K$ESC[A" R_S="$(echo "$TOTAL_NUM/$TOTAL_SEC" | bc)" echo -e "$(date +%T) ${T/.0/} numbers in $TOTAL_SEC seconds, average $R_S records/second" >>$LOG_FN done echo -e "\n$(date +%T) Benchmark ended; running time $((SECONDS-B_START_TIME)) seconds" >>$LOG_FN echo -e "________________________________________________________________________________\n" >>$LOG_FN |
Perl: the reference implementation
Although I wrote the original program in awk, Perl is my “go to” language for pretty much everything beyond what I consider to be “trivial.” Therefore the reference implementation for my numbers to words program is Perl.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
#!/usr/bin/perl -w # Translate numbers to English words. For example: # 1234567: one million, two hundred thirty-four thousand, five hundred sixty-seven # Default is to generate numbers from 0 to 1,000,000. The last number can be # passed as a parameter to the program. use strict; use integer; my @number = qw( zero one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen ); my @tens = qw(x x twenty thirty forty fifty sixty seventy eighty ninety hundred); my @multiplier = qw(x thousand million billion); # Handle zero as a special case print "store 0 \"zero\"\n"; # Figuring out the thousands/million/billions on every number is expensive, # so we use the approach of looping 1 .. 999; 1,000 .. 999,999; 1,000,000 .. # 999,999,999 etc. $N tracks the number; $exponent tracks the thousands my ($N, $exponent) = (0, 0); # Process the full list of numbers from 1 to last_number my $last_number = $ARGV[0] ? $ARGV[0] : 1000000; while ($N < $last_number) { $exponent++; # Process the current "thousands" (0 .. 999, 1000 .. 9999, etc) my $max = $last_number < 1000**$exponent-1 ? $last_number : 1000**$exponent-1; while ($N++ < $max) { my $text; # Process a group (e.g. for 1,234,567, the groups are '1', '234', and '567') for (my $e = $exponent-1; $e >=0; $e--) { my $group = $N/1000**$e % 1000; $text .= ',' if $group; # Determine the hundreds $text .= ' ' . $number[$group/100] . ' hundred' if $group > 99; # Determine the words for the last two digits of the group my $i = $group % 100; $text .= ' ' . ($i >= 20 ? $tens[$i/10] . ($i%10 ? '-' . $number[$i%10] : '') : $number[$i]) if $i; # Add "thousand" or "million" if needed $text .= " $multiplier[$e]" if $e && $group; } # For speed, the generated text ends up with a preceding ", " (preventing # it requires a lot of checks that slow down the program.) Here we remove # that unwanted text before sending it to output. print "store $N \"", substr($text,2), "\"\n"; } $N--; } |
C: the cumbersome, brittle speed demon
When it comes to this task, there’s no doubt C is the speed winner. Across all computers it generally outperformed the runner-up by a factor of 8 and then some. But that blazing speed comes at a cost: C programs are often prone to nasty bugs due to its lack of memory protection. And in my experience, writing and debugging C programs takes longer due to a more primitive standard library and crashes caused by the aforementioned memory issues.
The code here is noticeably different. Most of the programs take a left-to-right
approach when building the text string. For example, the number 12,345,678
is
typically handled in three groups: 12
, then 345
, and finally 678
. But the
original C program I wrote did it in reverse, staring at 678
, then working with
345
, and finally 12
. It also builds the text string from right to left.
For most languages the left-to-right approach is a few percent faster than the reverse. However, with C the opposite is true: the fastest version I could write for the left-to-right approach was 15% slower than the right-to-left.
/* Print numbers as words */ #include <stdio.h> #include <stdlib.h> #include <string.h> void main(int argc, char **argv) { long max = 1000000; char *number[] = { NULL, "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen" }; char *tens[] = { NULL, NULL, "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety", "hundred" }; char *multiplier[] = { NULL, "thousand,", "million,", "billion," }; char text[128]; long N; int Q, exponent, i, group, l, t, need_comma; if (argc > 1) max = atol(argv[1]); printf("0 \"zero\"\n"); /* Handle zero as a special case */ for (N=1; N <= max; N++) { /* The text field is built starting at the end and going backwards */ memset(text, ' ', 127); text[127] = '"'; t = 127; /* Process the number in groups of three, right-to-left */ exponent = 0; need_comma = 0; Q = N; while (Q) { group = Q % 1000; /* Add "thousand" or "million" if needed */ if (exponent && group) { l = strlen(multiplier[exponent]); if (need_comma) { t -= l+1; memcpy(text+t, multiplier[exponent], l); } else { t -= l-1; memcpy(text+t, multiplier[exponent], l-1); } t--; } /* Determine the words for the last two digits of the group */ i = group % 100; if (i) { if (i < 20) { l = strlen(number[i]); t -= l; memcpy(text+t, number[i], l); } else { if (i%10) { /* i doesn't end with zero */ l = strlen(number[i%10]); t -= l; memcpy(text+t, number[i%10], l); text[--t] = '-'; } l = strlen(tens[i/10]); t -= l; memcpy(text+t, tens[i/10], l); } need_comma = 1; } /* Determine the hundreds */ if (group > 99) { if (i) t--; /* Add a space after "hundred" if last two digits > 00 */ t -= 8; memcpy(text+t, " hundred", 8); l = strlen(number[group/100]); t -= l; memcpy(text+t, number[group/100], l); need_comma = 1; } /* Move to the next group of thousands */ Q /= 1000; exponent++; } /* "printf("%s", string) is actually pretty expensive. You don't notice * it when you're printing only a few lines, but do it a hundred million * times and you'll see it's about 15% slower than the code presented * here. */ /* printf("%li \"%s\n", N, text+t); */ printf("%li \"", N); puts(text+t); } }
awk: unexpectedly zippy; mawk even more so
The very first implementation of numbers-to-words was written in awk. After working on the Perl and C versions, I re-wrote the awk program to make use of what I had learned.
For such a simple language, awk is actually unexpectedly fast, performing at a respectable 55% to 70% the speed of the Perl program.
Then the Raspberry Pi gave me a surprise. The Pi 3 is usually three to four times faster than the Pi 1, but on the Pi 1 the awk program was a touch faster than the 3! Of course I had to find out why, and discovered the Pi 1 was using an implementation called mawk instead of Gnu’s gawk.
mawk is 80% faster than Gnu’s. When I installed it on sparrow to test, it outran the Perl program by nearly 50% (331k records/second vs Perl’s 225.)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
#!/usr/bin/awk -f # You can pass "-v max=<number>" to override the default maximum of 1,000,000 BEGIN { number[0] = "zero"; number[1] = "one"; number[2] = "two"; number[3] = "three" number[4] = "four"; number[5] = "five"; number[6] = "six"; number[7] = "seven" number[8] = "eight"; number[9] = "nine"; number[10] = "ten" number[11] = "eleven"; number[12] = "twelve"; number[13] = "thirteen" number[14] = "fourteen"; number[15] = "fifteen"; number[16] = "sixteen" number[17] = "seventeen"; number[18] = "eighteen"; number[19] = "nineteen" tens[2] = "twenty"; tens[3] = "thirty"; tens[4] = "forty"; tens[5] = "fifty" tens[6] = "sixty"; tens[7] = "seventy"; tens[8] = "eighty"; tens[9] = "ninety" multiplier[1] = "thousand"; multiplier[2] = "million"; multiplier[3] = "billion" print "0 \"zero\"" # Handle zero as a special case N = 0; exponent = 0 last_number = max ? max : 1000000 while (N < last_number) { exponent++; max = last_number < 1000^exponent-1 ? last_number : 1000^exponent-1 while (N++ < max) { text = "" for (e=exponent-1; e>=0; e--) { group = int(N/1000^e) % 1000 if (group) { text = text "," } # Determine the hundreds if (group > 99) { text = text " " number[int(group/100)] " hundred" } # Determine the words for the last two digits of the group i = group % 100; if (i) { text = text " " (i >= 20 ? tens[int(i/10)] (i%10 ? "-" number[i%10] : "") : number[i]) } # Add "thousand" or "million" if needed if (e && group) { text = text " " multiplier[e] } } print N " \"" substr(text, 3) "\"" } N-- } } |
Java: the amazing slowpoke
To be honest, I was stunned to see how badly Java performed at this task. Despite having had years to tweak the language and the VM, Java was in second last place among the major languages I tried–only Python performed worse. (I’m not including Tcl or bash because to my knowledge companies don’t write multi-million line projects in those languages.)
For the Java implementation, I chose to use StringBuilder because the memory is reclaimed when the object gos out of scope after each number, reducing the burden on the garbage collector.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 |
/* To run: * javac numbers-to-words.java # Compile into a class file * java numbersToWords # Run it */ class numbersToWords { public static void main(String argv[]) { String[] number = { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen" }; String[] tens = { null, null, "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety" }; String[] multiplier = { null, "thousand", "million", "billion" }; int lastNumber; try { lastNumber = Integer.parseInt(argv[0]); } catch(ArrayIndexOutOfBoundsException ex) { lastNumber = 1000000; } // Handle zero as a special case System.out.println("0 \"zero\""); int exponent = 0; int N = 0; while (N < lastNumber) { exponent++; int max = lastNumber < (int)Math.pow(1000, exponent)-1 ? lastNumber : (int)Math.pow(1000, exponent)-1; while (N++ < max) { StringBuilder text = new StringBuilder(); for (int e = exponent-1; e >=0; e--) { int group = (int)(N/Math.pow(1000, e)) % 1000; if (group > 0) text.append(","); // Determine the hundreds if (group > 99) text.append(" " + (number[(int)(group/100)] + " hundred")); // Determine the words for the last two digits of the group int i = group % 100; if (i > 0) text.append(" "+(i < 20 ? number[i] : tens[(int)(i/10)] + (i%10 > 0 ? "-"+number[i%10] : ""))); // Add "thousand" or "million" if needed if (e>0 && group>0) text.append(" " + multiplier[e]); } System.out.printf("%s \"%s\"\n", N, text.toString().substring(2)); } N--; } } } |
Lua: a respectable performer
Lua is a nice little scripting language, but is probably not robust enough for serious project work (for example, it doesn’t have an error handling mechanism.) In my testing it performed just a shade behind gawk.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
#!/usr/bin/lua number = { 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen' } tens = { nil, 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety', 'hundred' } multiplier = { 'thousand', 'million', 'billion' } if arg[1] then last_number = arg[1]+0 else last_number = 1000000 end print ("0 \"zero\"") N = 0 exponent = 0 while N < last_number do exponent = exponent + 1 max = last_number < 1000^exponent-1 and last_number or 1000^exponent-1 while N < max do N = N + 1 text = '' for e = exponent-1, -1, -1 do group = math.floor(N/1000^e) % 1000 if group > 0 then text = text..',' end -- Determine the hundreds if group > 99 then text = text..' '..number[math.floor(group/100)]..' hundred' end -- Determine the words for the last two digits of the group i = group % 100 if i > 0 then if i < 20 then text = text..' '..number[i] else text = text..' '..tens[math.floor(i/10)] if i%10 > 0 then text = text..'-'..number[i%10] end end end -- Add "thousand" or "million" if needed (j tells us if we need a trailing comma) if e > 0 and group > 0 then text = text..' '..multiplier[e] end end io.write(N..' "'..string.sub(text,3).."\"\n") end end |
PHP: Pretty-Hyper Performance
For all the disdain shown PHP on tech sites like Slashdot, it’s a powerful if rather quirky language. And for an interpreted language it’s fast, with PHP 7 running 20% to 30% faster than Perl. Perhaps other languages could learn a thing or two from the Zend engine.
PHP 5, not so much. Because it’s running an old version of Fedora, sparrow had only PHP 5 on it. It ran the code below at only half the speed of Perl. As a test, I downloaded PHP 7.3 from php.net and built it. That one performed much better, doing 343k records/second on the same source code, an 84% improvement over PHP 5 and a full 50% faster than Perl.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
#!/usr/bin/php -f <?php $number = array( 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen' ); $tens = array( NULL, NULL, 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety' ); $multiplier = array(NULL, 'thousand', 'million', 'billion'); # Handle zero as a special case echo "0 \"zero\"\n"; $N = 0; $exponent = 0; $last_number = $argv[1] ? $argv[1] : 1000000; while ($N < $last_number) { $exponent++; $max = $last_number < pow(1000, $exponent)-1 ? $last_number : pow(1000, $exponent)-1; while ($N++ < $max) { $text = ''; for ($e = $exponent-1; $e >=0; $e--) { $group = intval($N/pow(1000, $e)) % 1000; if ($group > 0) { $text .= ','; } # Determine the hundreds if ($group > 99) { $text .= ' '.($number[intval($group/100)].' hundred'); } # Determine the words for the last two digits of the group $j = $group % 100; if ($j) { $text .= ' '.($j>=20 ? $tens[intval($j/10)].($j%10 ? '-'.$number[$j%10] : '') : $number[$j]); } # Add "thousand" or "million" if needed if ($e && $group) { $text .= ' '.$multiplier[$e]; } } printf("%s \"%s\"\n", $N, substr($text,2)); } $N--; } ?> |
Python: maybe they should have named it Snail
Python was a big surprise. It’s the “in” thing these days, but for this task it’s at the bottom of the heap (outside of Tcl and bash.) Typically it runs at only half the speed of the Perl program.
This source code is the unuusal right-to-left version. The Python implementation of the more common left-to-right version is even slower.
At least it’s readable. 😊
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
#!/usr/bin/python3 import sys number = [ None, 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'nineteen' ] tens = [ None, None, 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety', 'hundred' ] multiplier = [ None, 'thousand,', 'million,', 'billion,' ] try: last_num = int(sys.argv[1]) except IndexError: last_num = 1000000 # Handle zero as a special case print("0 \"zero\"") for N in range(1, last_num+1): text = "" # Process the number in groups of three, right-to-left exponent = 0 Q = N need_comma = False while Q: group = Q % 1000 # Add "thousand" or "million" if needed (j tells us if we need a trailing comma) if exponent and group: x = multiplier[exponent] text = (x if need_comma else x[0:-1]) + ' ' + text # Determine the words for the last two digits of the group j = group % 100 if j: text = (number[j] if j < 20 \ else (tens[int(j/10)] + ('-'+number[j%10] if j%10 else '')))+' '+text need_comma = True # Determine the hundreds if group > 99: text = number[int(group/100)] + ' hundred ' + text need_comma = True # Move to the next group of thousands Q = int(Q/1000) exponent += 1 print('%i "%s"' % (N, text[0:-1])) |
Ruby: not that much of a gem
Ruby was another disappointing language, typically coming in third last just before Java and Python. On netvista it was actually slower than Python, probably because CentOS 6 (the last 32 bit version of CentOS) packages Ruby version 1.8.7 instead of 2.2.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
#!/usr/bin/ruby number = [ nil, "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen" ] tens = [ nil, nil, "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety", "hundred" ] multiplier = [ nil, "thousand", "million", "billion" ] # Handle zero as a special casse print("0 \"zero\"\n") last_number = ARGV[0].nil? ? 1000000 : ARGV[0].to_i n = 0 exponent = 0 while (n < last_number) exponent += 1 max = last_number < 1000**exponent-1 ? last_number : 1000**exponent-1 while (n < max) n += 1 text = '' e = exponent - 1 while e >= 0 do group = (n/1000**e).to_i % 1000 if (group > 0) then text << ',' end # Determine the hundreds if group > 99 then text << ' ' << number[(group/100).to_i] << ' hundred' end # Determine the words for the last two digits of the group i = group % 100 if i > 0 j = i % 10 text << ' ' << (i < 20 ? number[i] : (tens[(i/10).to_i] + (j > 0 ? "-#{number[j]}" : ''))) end # Add "thousand" or "million" if e > 0 and group > 0 then text << ' ' << multiplier[e] end # Move to the next group of thousands e -= 1 end print "#{n} \"" << text[2..-1] << "\"\n" end end |
Tcl: not the right tool for this job
Despite its feature set and the fact it apparently is compiled down to bytecode before execution, Tcl performed very poorly on this task, running at only a quarter of the speed of the Perl program. Its poorest performance was actually on my fastest computer (raven), where its speed was only 19% that of Perl’s.
Of all the scripting languages here, Tcl has the most unusual syntax. Many
operations and functions are implmented as comands, and getting the result
from a command requires enclosing it and its options in [
and ]
, much
like bash’s $( ... )
syntax.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 |
#!/usr/bin/tclsh set number [list \ nil one two three four five \ six seven eight nine ten eleven \ twelve thirteen fourteen fifteen sixteen \ seventeen eighteen nineteen ] set tens [list \ undef undef twenty thirty forty fifty sixty \ seventy eighty ninety hundred ] set multiplier [list undef thousand million billion] # Handle zero as a special case puts "0 \"zero\"" set last_number [expr { $argc > 0 ? $argv : 1000000 }] incr last_number set N 1 set exponent 0 while { $N < $last_number } { incr exponent set max [expr {$last_number < 1000**$exponent-1 ? $last_number : 1000**$exponent-1}] while { $N < $max } { set text "" set e [expr {$exponent - 1}] while { $e >= 0 } { set group [expr {$N / 1000**$e % 1000}] if { $group > 0 } { append text "," } # Determine the hundreds if { $group > 99 } { append text " " [lindex $number [expr {$group/100}]] " hundred" } # Determine the words for the last two digits of the group set i [expr {$group % 100}] if { $i < 20 } { if { $i > 0 } { append text " " [lindex $number $i] } } else { append text " " [lindex $tens [expr {$i/10}]] if { [expr {$i%10}] > 0 } { append text "-" [lindex $number [expr {$i%10}]] } } # Add "thousand" or "million" if { $e > 0 && $group > 0 } { append text " " [lindex $multiplier $e] } incr e -1 } set text [string trimleft $text ", "] puts "$N \"$text\"" incr N } } |
bash: redefining “slow”
The man page (not the info pages) for bash has this to say:
Bugs
It’s too big and too slow.
“Slow” it is, at least for this task. The bash script I wrote to implement the words to text algorithm sits very solidly in last place on the speed tests, running on averge at a mere 2% of the speed of the Perl program. On raven the compiled C program is 555 times faster than bash.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
#!/bin/bash NUMBER=( undef one two three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen ) TENS=( undef undef twenty thirty forty fifty sixty seventy eighty ninety hundred ) MULTIPLIER=(undef thousand million billion) [ "$1" ] && LAST_NUMBER=$1 || LAST_NUMBER=1000000 # Handle zero as a special case echo '0 "zero"' N=0 EXPONENT=0 while [ $N -lt $LAST_NUMBER ] do EXPONENT=$((EXPONENT+1)) I=$((1000**$EXPONENT-1)) [ $LAST_NUMBER -lt $I ] && MAX=$LAST_NUMBER || MAX=$I while [ $((N++)) -lt $MAX ] do TEXT="" E=$((EXPONENT-1)) while [ $E -ge 0 ] do GROUP=$((N/1000**E % 1000)) [ "$GROUP" -gt 0 ] && TEXT="$TEXT," # Determine the hundreds [ $GROUP -gt 99 ] && TEXT="$TEXT ${NUMBER[$((GROUP/100))]} hundred" # Determine the words for the last two digits of the group I=$((GROUP % 100)) if [ $I -gt 0 ] then if [ $I -lt 20 ] then TEXT="$TEXT ${NUMBER[$I]}" else TEXT="$TEXT ${TENS[$((I/10))]}" I_MOD_10=$((I % 10)) [ $I_MOD_10 -gt 0 ] && TEXT="$TEXT-${NUMBER[$I_MOD_10]}" fi fi # Add "thousand" or "million" if needed ($j tells us if we need a trailing comma) [ $E -gt 0 -a $GROUP != 0 ] && TEXT="$TEXT ${MULTIPLIER[$E]}" E=$((E-1)) done echo "$N \"${TEXT:2}\"" done N=$((N-1)) done |