OCaml Multicore - September 2020
Welcome to the September 2020 Multicore OCaml report! This update, along with the previous monthly updates, has been compiled by @shakthimaan, @kayceesrk and @avsm.
Big news this month is that the systhreads compatibility support PR has been merged, which means that Dune (and other users of the Thread module) can compile out of the box. You can now compile the multicore OCaml fork conveniently using the new opam compiler plugin (see announcement):
opam update
opam compiler create "ocaml-multicore/ocaml-multicore:no-effect-syntax"
eval $(opam env)
This selects the branch of multicore OCaml that omits the experimental effect syntax, and thus works with the existing ppx ecosystem. It's quite fun to opam install ecosystem packages and see them work out of the box at long last. There are still a few rough edges in the thread compatibility support (mainly in the C compatibility layer, such as registering external C threads with the GC), but these will be worked out in the coming weeks. We'd like to hear about any build failures you encounter in the opam universe with this: please report them on https://github.com/ocaml-multicore/ocaml-multicore/issues
A number of performance improvements to the multicore OCaml GC and the Sandmark benchmarking project have also been completed through September:
- we have now included the Kronecker implementation from the Graph500 benchmarks in Sandmark
- an n-queens benchmark addition is in progress
- benchmark runs now provide a count of the OCaml symbols as a code size metric
- work on building Tezos with multicore OCaml, and on integrating it with the Sandmark benchmarking test suite, has also begun.
We have also begun an effort to port Lwt to take advantage of parallelism via Lwt_preemptive. Code samples have been written and test runs performed, and Sudha has written an introductory blog post about her early results. Note that this work doesn't change the core behaviour of Lwt (a cooperative futures framework with no context switching between bind calls), but allows parallelism via explicit calls to background preemptive threads.
On the upstreaming efforts to OCaml, the 4.12 release will freeze earlier than usual in October, and so we finished submitting the last of the garbage collector colour changes and are aiming for the work on reliable safe points to go into OCaml 4.13. There have been a lot of runtime changes packed into 4.12 already, and so we will issue a call for testing when the release candidate of 4.12 is cut.
On to the details of the PRs. As with the previous updates, the Multicore OCaml updates are listed first, followed by the enhancements to the Sandmark benchmarking project. Finally, the ongoing and completed upstream OCaml updates are listed for your reference.
Multicore OCaml
Ongoing
- ocaml-multicore/domainslib#17 Implement channels using Mutex and Condition Variables
The lib/chan.ml sources have been updated to implement channels using Mutex and Condition Variables, and an LU_decomposition_multicore.exe test has been added for them.
- ocaml-multicore/ocaml-multicore#381 Reimplementing systhreads with pthreads
This PR, which reimplements systhreads using pthreads in Multicore OCaml, is actively being reviewed. It introduces Domain Execution Contexts (DEC), which allow multiple threads to run atop a domain.
- ocaml-multicore/ocaml-multicore#394 Changes to polling placement
Polls are placed at the start of functions and on the back-edges of loops, instead of using Feeley's algorithm. This is a work in progress.
- ocaml-multicore/ocaml-multicore#401 Do not handle interrupts recursively
A domain-local variable is introduced to prevent interrupts from being handled recursively.
- ocaml-multicore/ocaml-multicore#402 Split handle_gc_interrupt into handling remote and polling sections
A caml_poll_gc_work function is introduced, containing the GC polling work that was previously done in caml_handle_gc_interrupt. This allows stw_handler to poll without servicing interrupts, since servicing them there could lead to unwanted recursion.
- ocaml-multicore/ocaml-multicore#403 Segmentation fault when building Tezos on Multicore 4.10.0 with no-effects-syntax
This is an ongoing investigation into why the tezos-embedded-protocol-packer package in Tezos causes a segmentation fault when built with Multicore OCaml.
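The recursion guard in ocaml-multicore#401 lives in the runtime's C code. As a hedged illustration of the idea only, the same pattern can be written in OCaml 5 with a domain-local flag via Domain.DLS. The names handling_interrupt, handle_interrupt and handled are hypothetical, and the nested call stands in for a handler that polls and would otherwise re-enter itself.

```ocaml
(* Hypothetical sketch of a per-domain reentrancy guard (OCaml 5, Domain.DLS).
   It models the idea behind ocaml-multicore#401, not the runtime's C code. *)

(* One flag per domain: true while this domain is inside the handler. *)
let handling_interrupt : bool Domain.DLS.key =
  Domain.DLS.new_key (fun () -> false)

(* Counts how many times the handler body actually ran. *)
let handled = Atomic.make 0

let rec handle_interrupt ~depth =
  if Domain.DLS.get handling_interrupt then
    () (* already handling on this domain: ignore the nested request *)
  else begin
    Domain.DLS.set handling_interrupt true;
    Fun.protect
      ~finally:(fun () -> Domain.DLS.set handling_interrupt false)
      (fun () ->
        Atomic.incr handled;
        (* Simulate a handler that polls and would re-enter itself. *)
        if depth < 3 then handle_interrupt ~depth:(depth + 1))
  end

let () =
  handle_interrupt ~depth:0;
  (* A second domain has its own flag, so its handler runs independently. *)
  Domain.join (Domain.spawn (fun () -> handle_interrupt ~depth:0));
  Printf.printf "handler bodies run: %d\n" (Atomic.get handled)
```

Because the flag is domain-local rather than global, one domain being inside its handler never blocks another domain from handling its own interrupts.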
Completed
Domainslib
- ocaml-multicore/domainslib#19 Finer grain signalling with mutex condvar for Channels
Using finer-grained signalling with mutexes and condition variables, instead of a single mutex for all the signalling, improves performance on larger core counts.
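As a rough illustration of a channel built from Mutex and Condition, as in domainslib#17 and #19, here is a minimal bounded channel for OCaml 5, where Mutex, Condition and Domain are part of the standard library. It is a sketch, not domainslib's Chan implementation, and the function names are chosen for this example. It uses two condition variables so that a send only wakes waiting receivers and a receive only wakes waiting senders; domainslib#19 may split its signalling differently.

```ocaml
(* Minimal bounded channel sketch for OCaml 5. Not domainslib's code. *)
type 'a chan = {
  buf : 'a Queue.t;          (* items waiting to be received *)
  capacity : int;
  lock : Mutex.t;            (* protects buf *)
  not_empty : Condition.t;   (* receivers wait here *)
  not_full : Condition.t;    (* senders wait here *)
}

let make_bounded capacity =
  if capacity <= 0 then invalid_arg "make_bounded: capacity must be positive";
  { buf = Queue.create (); capacity; lock = Mutex.create ();
    not_empty = Condition.create (); not_full = Condition.create () }

let send c v =
  Mutex.lock c.lock;
  (* Re-check after every wake-up: wake-ups can be spurious. *)
  while Queue.length c.buf >= c.capacity do
    Condition.wait c.not_full c.lock
  done;
  Queue.push v c.buf;
  Condition.signal c.not_empty;  (* wake one waiting receiver, if any *)
  Mutex.unlock c.lock

let recv c =
  Mutex.lock c.lock;
  while Queue.is_empty c.buf do
    Condition.wait c.not_empty c.lock
  done;
  let v = Queue.pop c.buf in
  Condition.signal c.not_full;   (* wake one waiting sender, if any *)
  Mutex.unlock c.lock;
  v

let () =
  let c = make_bounded 4 in
  let producer = Domain.spawn (fun () -> for i = 1 to 100 do send c i done) in
  let sum = ref 0 in
  for _ = 1 to 100 do sum := !sum + recv c done;
  Domain.join producer;
  Printf.printf "sum = %d\n" !sum
```

With a capacity of 4 the producer domain regularly blocks on not_full, so the example exercises both waiting paths.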
Multicore OPAM
- ocaml-multicore/multicore-opam#31 Patch dune.2.7.1 for Multicore OCaml
The opam file for dune.2.7.1 has been added, along with a patch to bootstrap.ml to get it working with Multicore OCaml, thanks to Chaitanya Koparkar.
- ocaml-multicore/multicore-opam#32 Add ocamlfind-secondary dependency to dune
Installing dune.2.7.1 requires ocamlfind-secondary as a dependency, which has now been added to the opam file.
Multicore OCaml
- ocaml-multicore/ocaml-multicore#395 Move to SPIN_WAIT for all spins and usleep in SPIN_WAIT
This PR provides the SPIN_WAIT macro for all busy spin-wait loops, which uses caml_plat_spin_wait when busy waiting. This ensures that the same spin strategy is used everywhere in the code.
- ocaml-multicore/ocaml-multicore#397 Relaxation of backup thread signalling
The way the mutator thread signals the backup thread when leaving a blocking section has been relaxed. This reduces potential operating system scheduling when re-entering OCaml.
- ocaml-multicore/ocaml-multicore#400 Demux eventlog for backup thread
Events from the backup thread were emitted with the same process ID as the main thread; this PR separates them.
In the above illustration, the backup threads are active while the main thread is waiting on a condition variable.
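SPIN_WAIT itself is a C macro in the runtime. As a hedged OCaml 5 analogue of the strategy it standardises (spin for a while, then fall back to sleeping), the sketch below spins with Domain.cpu_relax up to a bound and then calls a caller-supplied sleep function. The name spin_wait, the max_spins default and the sleep parameter are assumptions of this sketch, not the runtime's API.

```ocaml
(* Hedged OCaml 5 analogue of a spin-then-sleep wait loop; the real
   SPIN_WAIT is a C macro in the runtime. Names here are illustrative. *)
let spin_wait ?(max_spins = 1000) ~(sleep : unit -> unit) (ready : unit -> bool) =
  let spins = ref 0 in
  while not (ready ()) do
    if !spins < max_spins then begin
      incr spins;
      Domain.cpu_relax ()   (* cheap pause hint while the wait is likely short *)
    end else
      sleep ()              (* back off once spinning has gone on too long *)
  done

let () =
  let flag = Atomic.make false in
  let d = Domain.spawn (fun () -> Atomic.set flag true) in
  spin_wait ~sleep:Domain.cpu_relax (fun () -> Atomic.get flag);
  Domain.join d;
  print_endline "flag observed"
```

In a real program, sleep might be `fun () -> Unix.sleepf 1e-4` (which needs the unix library); passing it in keeps the sketch free of extra dependencies and makes the back-off policy easy to test.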
Benchmarking
Ongoing
- ocaml-bench/sandmark#159 Implement a better way to describe tasklet cpulist
We need a cleaner way to obtain the taskset list of cores for a benchmark run when given a number of domains. We should be able to specify hyper-threaded cores, the NUMA zones to use, and the specific cores to use for the parallel benchmarks.
- ocaml-bench/sandmark#173 Addition of nqueens benchmark to multicore-numerical
A draft version of the classic n-queens benchmark has been added to Sandmark for review. It includes both the single-core and multicore implementations.
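For readers unfamiliar with the benchmark, here is a hedged OCaml 5 sketch of an n-queens solution counter using the common bitmask formulation, plus a naive parallel variant that spawns one domain per column of the first row. It is not the code in sandmark#173; solve, count_seq and count_par are illustrative names.

```ocaml
(* Count n-queens solutions with bitmasks. For the current row, [cols],
   [ld] and [rd] mark squares attacked by column, left diagonal and right
   diagonal; [all] has the low n bits set. *)
let rec solve all cols ld rd =
  if cols = all then 1 (* a queen in every column: one full solution *)
  else begin
    let avail = ref (all land lnot (cols lor ld lor rd)) in
    let count = ref 0 in
    while !avail <> 0 do
      let bit = !avail land (- !avail) in (* lowest free column *)
      avail := !avail lxor bit;
      count := !count + solve all (cols lor bit) ((ld lor bit) lsl 1) ((rd lor bit) lsr 1)
    done;
    !count
  end

let count_seq n = solve ((1 lsl n) - 1) 0 0 0

(* Naive parallel sketch: one domain per first-row column, then sum. *)
let count_par n =
  let all = (1 lsl n) - 1 in
  List.init n (fun c ->
      let bit = 1 lsl c in
      Domain.spawn (fun () -> solve all bit (bit lsl 1) (bit lsr 1)))
  |> List.fold_left (fun acc d -> acc + Domain.join d) 0

let () =
  Printf.printf "8-queens: %d (sequential), %d (parallel)\n"
    (count_seq 8) (count_par 8)
```

Spawning one domain per column is only reasonable for small n; a real benchmark would size the work to the number of available cores, for example with a task pool.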
Completed
- ocaml-bench/ocaml_bench_scripts#11 Add support for configure option and OCAMLRUNPARAM
The ocaml_bench_scripts have been updated to support passing configure options and OCAMLRUNPARAM when building and running the benchmarks in Sandmark.
- ocaml-bench/sandmark#122 Measurements of code size
The .bench JSON file produced by the benchmarks now includes a code size metric: the number of CAML symbols. A sample benchmark output is shown below:
{"name":"knucleotide.", ... ,"codesize":276859.0, ...}
The code size counts for a few of the benchmarks are given below:
| Benchmark  | Count     |
|------------|-----------|
| alt-ergo   | 2_822_040 |
| coqc       | 5_869_305 |
| cpdf       | 1_131_376 |
| nbody.exe  | 276_710   |
| stress.exe | 84_061    |
| fft.exe    | 38_914    |
- ocaml-bench/sandmark#170 Graph500 SEQ
The Graph500 benchmark with a Kronecker graph generator has now been added to Sandmark. It includes three kernels: graph construction, Breadth First Search, and Single Source Shortest Paths.
- ocaml-bench/sandmark#172 Remove Base, Stdio orun dependency for trunk
The orun sources in Sandmark have been updated to remove the dependency on both Base and Stdio, which have been replaced with functions from Stdlib, List, String and Str.
- ocaml-bench/sandmark#174 Cleanup our use of sudo for chrt
The use of sudo has been removed from the Makefile for running parallel benchmarks, to avoid creating output files and directories that require root permissions to access. Setting RUN_BENCH_TARGET=run_orunchrt will execute the benchmarks using chrt -r 1. The user can grant the chrt binary the required permissions using:
$ sudo setcap cap_sys_nice=ep /usr/bin/chrt
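The Graph500 specification generates edges with a Kronecker (R-MAT style) generator: for each bit of a vertex id, an edge falls into one quadrant of the adjacency matrix with probabilities A = 0.57, B = 0.19, C = 0.19 and D = 0.05, and the edge count is edgefactor times 2^scale. The OCaml sketch below shows that generator with simplified versions of the first two kernels (graph construction and BFS). It omits the vertex-label permutation and edge shuffling the specification requires, and the function names are illustrative rather than taken from sandmark#170.

```ocaml
(* Hedged sketch of a Graph500-style Kronecker edge generator plus simplified
   kernel 1 (graph construction) and kernel 2 (BFS). Not sandmark#170's code;
   the spec's vertex permutation and edge shuffling are omitted. *)

(* Each of the [scale] bits of (src, dst) picks a quadrant of the adjacency
   matrix with probabilities a, b, c and d = 1 - a - b - c. *)
let kronecker_edges ?(a = 0.57) ?(b = 0.19) ?(c = 0.19) ~scale ~edgefactor () =
  Array.init (edgefactor lsl scale) (fun _ ->
      let src = ref 0 and dst = ref 0 in
      for bit = 0 to scale - 1 do
        let r = Random.float 1.0 in
        let i, j =
          if r < a then (0, 0)
          else if r < a +. b then (0, 1)
          else if r < a +. b +. c then (1, 0)
          else (1, 1)
        in
        src := !src lor (i lsl bit);
        dst := !dst lor (j lsl bit)
      done;
      (!src, !dst))

(* Kernel 1: undirected adjacency lists, dropping self-loops. *)
let build_graph n edges =
  let adj = Array.make n [] in
  Array.iter
    (fun (u, v) ->
      if u <> v then begin
        adj.(u) <- v :: adj.(u);
        adj.(v) <- u :: adj.(v)
      end)
    edges;
  adj

(* Kernel 2: breadth-first search returning a parent array (-1 = unreached). *)
let bfs adj root =
  let parent = Array.make (Array.length adj) (-1) in
  parent.(root) <- root;
  let q = Queue.create () in
  Queue.push root q;
  while not (Queue.is_empty q) do
    let u = Queue.pop q in
    List.iter
      (fun v ->
        if parent.(v) = -1 then begin
          parent.(v) <- u;
          Queue.push v q
        end)
      adj.(u)
  done;
  parent

let () =
  Random.init 42;
  let scale = 10 in
  let edges = kronecker_edges ~scale ~edgefactor:16 () in
  let parent = bfs (build_graph (1 lsl scale) edges) 0 in
  let reached = Array.fold_left (fun n p -> if p >= 0 then n + 1 else n) 0 parent in
  Printf.printf "%d edges, %d of %d vertices reached from 0\n"
    (Array.length edges) reached (1 lsl scale)
```

The skewed quadrant probabilities are what give Kronecker graphs their heavy-tailed degree distribution, which makes BFS on them a demanding, irregular workload.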
OCaml
Ongoing
- ocaml/ocaml#9876 Do not cache young_limit in a processor register
This PR removes the caching of young_limit in a register for the ARM64, PowerPC and RISC-V ports, as it is problematic when polling for signals and for inter-domain communication in Multicore OCaml.
Completed
- ocaml/ocaml#9756 Garbage collectors colour change
The gray colour has been removed from the garbage collector's colour scheme to facilitate merging with the Multicore OCaml collector. The existing Sandmark benchmarks that overflow the mark stack are shown in the illustration below, and the change has little negative impact on them.
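To make the colour change more concrete, here is a simplified OCaml model of marking with only two colours, where the gray state is implicit: an object is coloured black when it is pushed onto the mark stack, and its fields are scanned when it is popped. This is a toy over an adjacency-list heap with hypothetical names, not the runtime's code, and it does not model mark-stack overflow handling, which is the case the benchmarks mentioned above exercise.

```ocaml
(* Toy two-colour marking: "gray" is implicit (pushed but not yet scanned).
   heap.(i) lists the objects that object i points to. Hypothetical model. *)
type colour = White | Black

let mark (heap : int list array) (roots : int list) =
  let colours = Array.make (Array.length heap) White in
  let stack = Stack.create () in
  (* Blacken an unmarked object and remember to scan its fields later. *)
  let darken v =
    if colours.(v) = White then begin
      colours.(v) <- Black;
      Stack.push v stack
    end
  in
  List.iter darken roots;
  while not (Stack.is_empty stack) do
    List.iter darken heap.(Stack.pop stack)
  done;
  colours

let () =
  (* 0 -> 1 -> 2, and 3 -> 0; only 0 is a root, so 3 stays unreachable. *)
  let heap = [| [1]; [2]; []; [0] |] in
  Array.iteri
    (fun i c -> Printf.printf "%d: %s\n" i (if c = Black then "live" else "garbage"))
    (mark heap [0])
```

Marking an object black before it is pushed also guarantees that each object enters the stack at most once, so cycles terminate without a separate gray check.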
As always, we would like to thank all the OCaml developers and users in the community for their continued support and contributions to the project. Be well!
Acronyms
- ARM: Advanced RISC Machine
- BFS: Breadth First Search
- DEC: Domain Execution Context
- GC: Garbage Collector
- JSON: JavaScript Object Notation
- NUMA: Non-Uniform Memory Access
- OPAM: OCaml Package Manager
- OS: Operating System
- PR: Pull Request
- RISC-V: Reduced Instruction Set Computing - V
- SSSP: Single Source Shortest Path