Bignum.of_string needs to handle different formats for its input. The previous version of the code was trying to parse the common format (floats), and in case of failure, was attempting to use a different format (based on the error). This resulted in the string being parsed twice in some cases.
This version is a complete rewriting of of_string to do the parsing in one step. The new code for to_string encode an automaton and remembers the positions of the various elements of the string (depending on the format).
This feature uses a function which has been upstreamed in the new version of ZArith (1.4) which is a variant of the Zarith.of_string function to work with substrings. This variant alone is responsible for a big part of the performance improvement.
The new version of the code performs better than the original one in all cases. The performance improvement are variable depending on the micro benchmark. See below.
We also tried to implement the lexing engine using OCamllex. This makes for a much more concise description, but the performance are significantly lower. OCamllex produces code which allocates some table and some state, which is avoided in the hand written code. Also, it will allocate the sub strings matched.
New version (patch for ZArith + of_substring, reimplementation of of_string)
┌─────────────────────────────────────────────────────────────────────┬──────────────┬───────────┬──────────┬──────────┬────────────┐ │ Name │ Time/Run │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │ ├─────────────────────────────────────────────────────────────────────┼──────────────┼───────────┼──────────┼──────────┼────────────┤ │ bigint\_bench.ml random │ 48_381.13ns │ 7_166.00w │ 1.24w │ 1.24w │ 45.12% │ │ bigint\_bench.ml:vs. Big\_int plus_self │ 293.96ns │ 72.00w │ │ │ 0.27% │ │ bigint\_bench.ml:vs. Big\_int plus_other │ 807.62ns │ 124.00w │ │ │ 0.75% │ │ bigint\_bench.ml:vs. Big\_int mult_self │ 353.98ns │ 91.00w │ │ │ 0.33% │ │ bigint\_bench.ml:vs. Big\_int mult_other │ 783.78ns │ 128.00w │ │ │ 0.73% │ │ bignum\_bench.ml:Bignum of\_string/to\_string of_string (decimal) │ 14_415.44ns │ 475.00w │ │ │ 13.44% │ │ bignum\_bench.ml:Bignum of\_string/to\_string of_string (scientific) │ 61_363.80ns │ 3_929.00w │ │ │ 57.23% │ │ bignum\_bench.ml:Bignum of\_string/to\_string of_string (fraction) │ 24_957.02ns │ 303.00w │ │ │ 23.28% │ │ bignum\_bench.ml:Bignum of\_string/to\_string to_string (decimal) │ 15_867.52ns │ 1_523.00w │ │ │ 14.80% │ │ bignum\_bench.ml:Bignum of\_string/to\_string to_string (scientific) │ 33_345.31ns │ 4_206.00w │ │ │ 31.10% │ │ bignum\_bench.ml:Bignum of\_string/to\_string to_string (fraction) │ 31_770.26ns │ 3_779.00w │ │ │ 29.63% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp of_sexp (decimal) │ 9_726.82ns │ 380.00w │ │ │ 9.07% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp of_sexp (scientific) │ 28_141.40ns │ 2_059.00w │ │ │ 26.25% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp of_sexp (fraction) │ 70_436.16ns │ 5_541.00w │ │ │ 65.69% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp to_sexp (decimal) │ 27_000.73ns │ 1_994.00w │ │ │ 25.18% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp to_sexp (scientific) │ 66_057.63ns │ 6_217.00w │ │ │ 61.61% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp to_sexp (fraction) │ 107_219.89ns │ 8_097.00w │ │ │ 100.00% │ │ bignum\_bench.ml:Bignum binprot roundtrip compact │ 5_997.81ns │ 581.00w │ │ │ 5.59% │ │ bignum\_bench.ml:Bignum binprot roundtrip classic │ 18_522.20ns │ 779.00w │ │ │ 17.27% │ │ bignum\_bench.ml:round round_decimal:0 │ 8_479.49ns │ 463.00w │ │ │ 7.91% │ │ bignum\_bench.ml:round round_decimal:3 │ 24_621.71ns │ 2_115.00w │ │ │ 22.96% │ │ bignum\_bench.ml:round round_decimal:6 │ 26_896.35ns │ 2_437.00w │ │ │ 25.09% │ │ bignum\_bench.ml:round round_decimal:9 │ 29_428.19ns │ 2_730.00w │ │ │ 27.45% │ │ bignum\_bench.ml:round round │ 8_452.31ns │ 459.00w │ │ │ 7.88% │ └─────────────────────────────────────────────────────────────────────┴──────────────┴───────────┴──────────┴──────────┴────────────┘
Original version
┌─────────────────────────────────────────────────────────────────────┬──────────────┬───────────┬──────────┬──────────┬────────────┐ │ Name │ Time/Run │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │ ├─────────────────────────────────────────────────────────────────────┼──────────────┼───────────┼──────────┼──────────┼────────────┤ │ bigint\_bench.ml random │ 51_218.04ns │ 7_166.00w │ 1.25w │ 1.25w │ 43.26% │ │ bigint\_bench.ml:vs. Big\_int plus_self │ 336.84ns │ 72.00w │ │ │ 0.28% │ │ bigint\_bench.ml:vs. Big\_int plus_other │ 837.73ns │ 124.00w │ │ │ 0.71% │ │ bigint\_bench.ml:vs. Big\_int mult_self │ 411.03ns │ 91.00w │ │ │ 0.35% │ │ bigint\_bench.ml:vs. Big\_int mult_other │ 808.03ns │ 128.00w │ │ │ 0.68% │ │ bignum\_bench.ml:Bignum of\_string/to\_string of_string (decimal) │ 29_650.60ns │ 2_415.00w │ │ │ 25.04% │ │ bignum\_bench.ml:Bignum of\_string/to\_string of_string (scientific) │ 92_495.93ns │ 6_465.00w │ │ │ 78.12% │ │ bignum\_bench.ml:Bignum of\_string/to\_string of_string (fraction) │ 39_482.77ns │ 2_060.00w │ │ │ 33.35% │ │ bignum\_bench.ml:Bignum of\_string/to\_string to_string (decimal) │ 16_195.93ns │ 1_523.00w │ │ │ 13.68% │ │ bignum\_bench.ml:Bignum of\_string/to\_string to_string (scientific) │ 34_227.78ns │ 4_059.00w │ │ │ 28.91% │ │ bignum\_bench.ml:Bignum of\_string/to\_string to_string (fraction) │ 32_856.17ns │ 3_779.00w │ │ │ 27.75% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp of_sexp (decimal) │ 19_745.71ns │ 2_149.00w │ │ │ 16.68% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp of_sexp (scientific) │ 51_024.99ns │ 3_853.00w │ │ │ 43.09% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp of_sexp (fraction) │ 88_884.15ns │ 7_819.00w │ │ │ 75.07% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp to_sexp (decimal) │ 32_812.27ns │ 2_498.00w │ │ │ 27.71% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp to_sexp (scientific) │ 77_518.77ns │ 6_369.00w │ │ │ 65.47% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp to_sexp (fraction) │ 118_402.78ns │ 8_907.00w │ │ │ 100.00% │ │ bignum\_bench.ml:Bignum binprot roundtrip compact │ 8_947.02ns │ 371.00w │ │ │ 7.56% │ │ bignum\_bench.ml:Bignum binprot roundtrip classic │ 22_799.74ns │ 1_039.00w │ │ │ 19.26% │ │ bignum\_bench.ml:round round_decimal:0 │ 8_176.74ns │ 463.00w │ │ │ 6.91% │ │ bignum\_bench.ml:round round_decimal:3 │ 25_798.77ns │ 2_115.00w │ │ │ 21.79% │ │ bignum\_bench.ml:round round_decimal:6 │ 28_561.23ns │ 2_437.00w │ │ │ 24.12% │ │ bignum\_bench.ml:round round_decimal:9 │ 30_861.38ns │ 2_730.00w │ │ │ 26.06% │ │ bignum\_bench.ml:round round │ 8_237.26ns │ 459.00w │ │ │ 6.96% │ └─────────────────────────────────────────────────────────────────────┴──────────────┴───────────┴──────────┴──────────┴────────────┘
Tentative version using OCamllex
┌─────────────────────────────────────────────────────────────────────┬──────────────┬────────────┬──────────┬──────────┬────────────┐ │ Name │ Time/Run │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │ ├─────────────────────────────────────────────────────────────────────┼──────────────┼────────────┼──────────┼──────────┼────────────┤ │ bigint\_bench.ml random │ 48_164.21ns │ 7_166.00w │ 1.25w │ 1.25w │ 39.99% │ │ bigint\_bench.ml:vs. Big\_int plus_self │ 285.84ns │ 72.00w │ │ │ 0.24% │ │ bigint\_bench.ml:vs. Big\_int plus_other │ 768.12ns │ 124.00w │ │ │ 0.64% │ │ bigint\_bench.ml:vs. Big\_int mult_self │ 343.14ns │ 91.00w │ │ │ 0.28% │ │ bigint\_bench.ml:vs. Big\_int mult_other │ 780.00ns │ 128.00w │ │ │ 0.65% │ │ bignum\_bench.ml:Bignum of\_string/to\_string of_string (decimal) │ 26_931.12ns │ 3_108.00w │ │ │ 22.36% │ │ bignum\_bench.ml:Bignum of\_string/to\_string of_string (scientific) │ 79_750.28ns │ 6_599.00w │ 0.11w │ 0.11w │ 66.21% │ │ bignum\_bench.ml:Bignum of\_string/to\_string of_string (fraction) │ 34_988.94ns │ 4_300.00w │ │ │ 29.05% │ │ bignum\_bench.ml:Bignum of\_string/to\_string to_string (decimal) │ 15_958.17ns │ 1_523.00w │ │ │ 13.25% │ │ bignum\_bench.ml:Bignum of\_string/to\_string to_string (scientific) │ 32_495.25ns │ 4_059.00w │ │ │ 26.98% │ │ bignum\_bench.ml:Bignum of\_string/to\_string to_string (fraction) │ 31_802.75ns │ 3_779.00w │ │ │ 26.40% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp of_sexp (decimal) │ 18_742.81ns │ 2_924.00w │ │ │ 15.56% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp of_sexp (scientific) │ 45_282.09ns │ 4_622.00w │ │ │ 37.60% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp of_sexp (fraction) │ 86_907.83ns │ 8_777.00w │ 0.15w │ 0.15w │ 72.16% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp to_sexp (decimal) │ 35_727.73ns │ 4_493.00w │ │ │ 29.66% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp to_sexp (scientific) │ 82_247.61ns │ 8_273.00w │ 0.13w │ 0.13w │ 68.29% │ │ bignum\_bench.ml:Bignum of\_sexp/to\_sexp to_sexp (fraction) │ 120_445.25ns │ 10_688.00w │ 0.12w │ 0.12w │ 100.00% │ │ bignum\_bench.ml:Bignum binprot roundtrip compact │ 6_734.49ns │ 371.00w │ │ │ 5.59% │ │ bignum\_bench.ml:Bignum binprot roundtrip classic │ 21_773.79ns │ 1_890.00w │ │ │ 18.08% │ │ bignum\_bench.ml:round round_decimal:0 │ 8_306.45ns │ 463.00w │ │ │ 6.90% │ │ bignum\_bench.ml:round round_decimal:3 │ 24_714.96ns │ 2_115.00w │ │ │ 20.52% │ │ bignum\_bench.ml:round round_decimal:6 │ 26_894.27ns │ 2_437.00w │ │ │ 22.33% │ │ bignum\_bench.ml:round round_decimal:9 │ 29_343.81ns │ 2_730.00w │ │ │ 24.36% │ │ bignum\_bench.ml:round round │ 8_296.05ns │ 459.00w │ │ │ 6.89% │ └─────────────────────────────────────────────────────────────────────┴──────────────┴────────────┴──────────┴──────────┴────────────┘