OCaml Multicore - June 2021
Welcome to the June 2021 Multicore OCaml monthly report! This month's update along with the previous update's have been compiled by @avsm, @ctk21, @kayceesrk and @shakthimaan.
Our overall goal remains on track for generating a preview tree for OCaml 5.0 multicore domains-only parallelism over the summer.
Ecosystem compatibility for 4.12.0+domains
In May's update, I noted that our focus was now on adapting the ecosystem to work well with multicore, and I'm pleased to report that this is progressing very well.
-
The 4.12.0+domains multicore compiler variant has been merged into mainline opam-repo, so you can now
opam switch 4.12.0+domainsdirectly. Thebase-domainspackage is also available to mark your opam project as requiring theDomainsmodule, so you can even publish your early multicore-capable libraries to the mainline opam repository now. -
The OCaml standard library was made safe for parallel use by multiple domains (wiki, issue, fixes); and in particularly the
FormatandRandommodules. These modules were the main sources of incompatibilities we found when running existing OCaml code with multiple domains.
-
The
Domainmodule has had its interface slimmed with the removal ofcritical_section,wait,notifywhich has allowed significant runtime simplification. The GC C-API interface is now implemented and this means that Jane Street'sBase,Core, andAsyncnow compile on4.12+domainswithout modifications; for exampleopam install patdiffworks out of the box on a4.12+domainsswitch! -
Domainslib 0.3.0 has been released which incorporates multiple improvements including the work-stealing deques for task distribution. The performance of reading domain local variables has also been improved with a primitive and a O(1) lookup. The chapter on
Parallel Programming in Multicore OCamlhas been updated to reflect the latest developments with Domainslib.
This means that big application stacks should now compile pretty well with 4.12.0+domains (applications like the Tezos node and patdiff exercise a lot of the dependency trees in opam). If you do find incompatibilities, please do report them on the repository.
4.12.0+domains+effects
Most of our focus has been on getting the domains-only trees (for OCaml 5.0) up to speed, but we have been progressing the direct-style effects-based IO stack as well.
- The
uringbindings to Linux Io_uring are now available on opam-repository, so you can try it out on sequential OCaml too. A good mini-project would be to add a uring backend to the existing Async or Lwt engines, if anyone wants to try a substantial contribution. - The
eiolibrary is fairly usable now, for both filesystem and networking. We've submitted a talk to the OCaml workshop to dive into the innards of it in more detail, so stay tuned for that in the coming months if accepted. The main changes here have been performance improvements, and the HTTP stack is fairy competitive with (e.g.)rust-hyper.
We will soon also have a variant of this tree that removes the custom effect syntax and implements the fibres (the runtime piece) as Obj functions. This will further improve ecosystem compatibility and allow us to build direct-style OCaml libraries that use fibres internally to provide concurrency, but without exposing any use of effects in their interfaces.
Benchmarking and performance
We are always keen to get more benchmarks that exercise multicore features; if you want to try multicore out and help write benchmarks there are some suggestions on the wiki. We've got a private server which runs a Sandmark nightly benchmark pipeline with Jupyter notebooks, which we can give access to anyone who submits benchmarks. We continue to test integration of Sandmark with current-bench for better integration with GitHub PRs.
As always, the Multicore OCaml ongoing and completed tasks are listed first, which are then followed by updates from the ecosystem and their associated libraries. The Sandmark benchmarking and nightly build efforts are then mentioned. Finally, the status of the upstream OCaml Safepoints PR is provided for your reference.
Multicore OCaml
Ongoing
-
ocaml-multicore/ocaml-multicore#573 Backport trunk safepoints PR to multicore
A work-in-progress to backport the Safepoints PR from ocaml/ocaml to Multicore OCaml.
-
ocaml-multicore/ocaml-multicore#584 Modernise signal handling
A patch to bring the Multicore OCaml signals implementation closer to upstream OCaml.
-
ocaml-multicore/ocaml-multicore#598 Do not deliver signals to threads that have blocked them
A draft PR to not deliver signals to threads that are in a blocked state. The without-systhreads case needs to be handled.
-
ocaml-multicore/ocaml-multicore#600 Expose a few more GC variables in headers
The
caml_young_start,caml_young_limitandcaml_minor_heap_wszvariables have been defined in the runtime. -
ocaml-multicore/ocaml-multicore#601 Domain better participants
The iterations
0(Max_domains)from STW signalling and0(n_running_domains)from domain creation have now been removed. -
ocaml-multicore/ocaml-multicore#603 Systhreads tick thread
An initial draft PR for porting the tick thread to Multicore OCaml.
Completed
Enhancements
-
ocaml-multicore/ocaml-multicore#552 Add a
force_instrumented_runtimeoption to configureThe
configurescript now accepts a new--enable-force-instrumented-runtimeoption to facilitate use of the instrumented runtime on linker invocations to obtain event logs. -
ocaml-multicore/ocaml-multicore#558 Refactor
Domain.{spawn/join}to use no critical sectionsThe critical sections in
Domain.{spawn/join}and the use ofDomain.waithave been removed. -
ocaml-multicore/ocaml-multicore#561 Slim down
Domain.Sync: removewait,notify,critical_sectionA breaking change in
Domain.Syncthat removescritical_section,notify,wait,wait_for, andwait_until. This is to remove the need for domain-to-domain messaging in the runtime. -
ocaml-multicore/ocaml-multicore#576 Including Git hash in runtime
A Git hash is now printed in the runtime as shown below:
$ ./boot/ocamlrun -version The OCaml runtime, version 4.12.0+multicore Built with git hash 'ae3fb4bb6' on branch 'runtime_version' with tag '<tag unavailable>' -
ocaml-multicore/ocaml-multicore#579 Primitive for fetching DLS root
A new primitive has been implemented for fetching DLS, and is now a single
movinstruction onamd64.
Upstream
-
ocaml-multicore/ocaml-multicore#555 runtime:
CAML_TRACE_VERSIONis now set to a Multicore specific valueA
CAML_TRACE_VERSIONis defined to distinguish between Multicore OCaml and trunk for the runtime. -
ocaml-multicore/ocaml-multicore#581 Move our usage of inline to
Caml_inlineWe now use
Caml_inlinefor all the C inlining in the runtime to align with upstream OCaml. -
ocaml-multicore/ocaml-multicore#589 Reintroduce
adjust_gc_speedThe
caml_adjust_gc_speedfunction from trunk has been reintroduced to the Multicore OCaml runtime. -
ocaml-multicore/ocaml-multicore#590 runtime: stub
caml_stat_*interfaces in gc_ctrlThe creation of
caml_stat_*stub functions in gc_ctrl.h to introduce a compatibility layer for GC stat utilities that are available in trunk.
Fixes
-
ocaml-multicore/ocaml-multicore#562 Import fixes to the minor heap allocation code from DLABs
The multiplication factor of two used for minor heap allocation has been removed, and the
Minor_heap_maxlimit from config.h is no longer converted to a byte size for Multicore OCaml. -
ocaml-multicore/ocaml-multicore#593 Fix two issues with ephemerons
A patch to simplify ephemeron handover during termination.
-
ocaml-multicore/ocaml-multicore#594 Fix finaliser handover issue
The
caml_finish_major_cycleis used leading to the major GC phasePhase_sweep_and_mark_mainfor the correct handoff of finalisers. -
ocaml-multicore/ocaml-multicore#596 systhreads: do
st_thread_idafter initializing the thread descriptorThe thread ID was set even before initializing the thread descriptor, and this PR fixes the order.
-
ocaml-multicore/ocaml-multicore#604 Fix unguarded
caml_skiplist_emptyincaml_scan_global_young_rootsThe PR introduces a
caml_iterate_global_rootsfunction and fixes a locking bug with global roots.
Cleanups
-
ocaml-multicore/ocaml-multicore#567 Simplify some of the minor_gc code
The
not_alonevariable has been cleaned up with a simplification to the minor_gc.c code. -
ocaml-multicore/ocaml-multicore#580 Remove struct domain
The
caml_domain_stateis now the single source of domain information with the removal ofstruct domain.struct dom_internalis no longer leaking across the runtime. -
ocaml-multicore/ocaml-multicore#583 Removing interrupt queues
The locking of
struct_interruptorwhen receiving interrupts and the use ofstruct interrupthave been removed, simplifying the implementation of domains.
Sundries
-
ocaml-multicore/ocaml-multicore#582 Make global state domain-local in Random, Hashtbl and Filename
The Domain-Local is now set as the default state in
Random,HashtblandFilename. -
ocaml-multicore/ocaml-multicore#586 Make the state in Format domain-local
The default state in
Formatis now set to Domain-Local. -
ocaml-multicore/ocaml-multicore#595 Implement
caml_alloc_dependent_memoryandcaml_free_dependent_memoryDependent memory are the blocks of heap memory that depend on the GC (and finalizers) for deallocation. The
caml_alloc_dependent_memoryandcaml_free_dependent_memoryhave been added to runtime/memory.c.
Ecosystem
Ongoing
-
ocaml-multicore/eventlog-tools#3 Use ocaml/setup-ocaml@v2
An update to
.github/workflows/main.ymlto build for ocaml/setup-ocaml@v2. -
ocaml-multicore/parallel-programming-in-multicore-ocaml#7 Add a section on Domain-Local Storage
The README.md file now includes a section on Domain-Local Storage.
-
ocaml-multicore/eio#26 Grand Central Dispatch Backend
The implemention of the Grand Central Dispatch (GCD) backend for Eio is a work-in-progress.
-
ocaml-multicore/domainslib#34 Fix initial value accounting in
parallel_for_reduceA patch to fix the initial value in
parallel_for_reduceas it was being accounted for multiple times. -
ocaml-multicore/domainslib#36 Switch to default
RandommoduleThe library has been updated to use the default
Randommodule as it stores its state in Domain-Local Storage which can be called from multiple domains. The Sandmark results are given below:

-
ocaml-multicore/multicore-opam#56 Base-effects depends strictly on 4.12
A query on the use of strict 4.12.0 lower bound for OCaml in
base-effects.base/opam. -
ocsigen/lwt#860 Lwt_domain: An interfacet to Multicore parallelism
The
Lwt_domainmodule has been ported to domainslib Task pool for performing computations to CPU cores using Multicore OCaml's Domains. A few benchmark results obtained on an Intel Xeon Gold 5120 processor with 24 isolated cores is shown below:
Completed
Ocaml-Uring
The ocaml-uring repository contains bindings to io_uring for
OCaml.
-
ocaml-multicore/ocaml-uring#21 Add accept call
The
acceptcall has been added to uring along with the inclusion of theunixlibrary as a dependency. -
ocaml-multicore/ocaml-uring#22 Add support for cancellation
A
cancelmethod is added to request jobs for cancellation. The queuing operations and tests have also been updated. -
ocaml-multicore/ocaml-uring#24 Sort out cast
The
Int_valhas been changed toLong_valto remove the need for sign extension instruction on 64-bit platforms. -
ocaml-multicore/ocaml-uring#25 Fix test_cancel
A
with_uringfunction is added with aqueue_depthargument to handle tests for cancellation. -
ocaml-multicore/ocaml-uring#26 Add
openat2The
openat2method has been added giving access to all the Linux open and resolve flags. -
ocaml-multicore/ocaml-uring#27 Fine-tune C flags for better performance
The CFLAGS have been updated for performance improvements. The following results are observed for the noop benchmark:
Before: noop 10000 │ 1174227.1170 ns/run│ After: noop 10000 │ 920622.5802 ns/run│ -
ocaml-multicore/ocaml-uring#28 Don't allow freeing the ring while it is in use
The ring is added to a global set on creation and is cleaned up on exit. Also, invalid cancellation requests are checked before allocating a slot.
-
ocaml-multicore/ocaml-uring#29 Replace iovec with cstruct and clean up the C stubs
The
readvandwritevnow accept a list of Cstructs which allow access to sub-ranges of bigarrays, and to work with multiple buffers. The handling of OOM errors has also been improved. -
ocaml-multicore/ocaml-uring#30 Fix remaining TODOs in API
The
readandwritemethods have been renamed toread_fixedandwrite_fixedrespectively. TheRegion.to_cstructhas been added as an alternative to creating a sub-bigarray. An exception is now raised if the user requests for a larger size chunk. -
ocaml-multicore/ocaml-uring#31 Use
caml_enter_blocking_sectionwhen waitingThe
caml_enter_blocking_sectionandcaml_leave_blocking_sectionare used when waiting, which allows other threads to execute and the GC can run in the case of Multicore OCaml. -
ocaml-multicore/ocaml-uring#32 Compile
uringusing the C flags from OCamlUse the OCaml C flags when building uring, and remove the unused dune file.
-
ocaml-multicore/ocaml-uring#33 Prepare release
The CHANGES.md, README.md, dune-project and uring.opam files have been updated to prepare for a release.
-
ocaml-multicore/ocaml-uring#34 Convert
liburingto subtreeWe now use a subtree instead of a submodule so that the ocaml-uring can be submitted to the opam-repository.
Parallel Programming in Multicore OCaml
-
ocaml-multicore/parallel-programming-in-multicore-ocaml#5
num_domainstonum_additional_domainsThe documentation and code examples have been updated to now use
num_additional_domainsinstead ofnum_domains. -
ocaml-multicore/parallel-programming-in-multicore-ocaml#6 Update latest information about compiler versions
The compiler versions in the README.md have been updated to use 4.12 and its variants.
-
ocaml-multicore/parallel-programming-in-multicore-ocaml#8 Nudge people to the default chunk_size setting
The recommendation is to use the default
chunk_sizewhen usingparallel_for, especially when the number of domains gets larger. -
ocaml-multicore/parallel-programming-in-multicore-ocaml#9 Eventlog section updates
The
eventlog-toolslibrary can now be used for parsing trace files since Multicore OCaml includes CTF tracing support from trunk. The relevant information has been updated in the README.md file.
Eio
The eio library provides an effects-based parallel IO stack for
Multicore OCaml.
Additions
-
ocaml-multicore/eio#41 Add eio.mli file
A
lib_eio/eio.mlifile containing modules forGeneric,Flow,Network, andStdenvhave been added to the repository. -
ocaml-multicore/eio#45 Add basic domain manager
The PR allows you to run a CPU-intensive task on another domain, and adds a mutex to
tracelnto avoid overlapping output. -
ocaml-multicore/eio#46 Add Eio.Time and allow cancelling sleeps
Use
psqinstead ofbheaplibrary to allow cancellations. TheEio.Timemodule has been added tolib_eio/eio.ml. -
ocaml-multicore/eio#53 Add
Switch.sub_optA new
Switch.sub_optimplementation has been added to allow running a function with a new switch. Also,Switch.subhas been modified so that it is not a named argument. -
ocaml-multicore/eio#54 Initial FS abstraction
A module
Dirhas been added to allow file system abstraction along with the ability to create files and directories. On Linux, it usesopenat2andRESOLVE_BENEATH. -
ocaml-multicore/eio#56 Add
with_open_in,with_open_outandwith_open_dirhelpersThe
Eio.Dirmodule now contains awith_open_in,with_open_outandwith_open_dirhelper functions. -
ocaml-multicore/eio#58 Add
Eio_linux.{readv, writev}The
Eio_linux.{readv, writev}functions have been added tolib_eio_linux/eio_linux.mlwhich uses the new OCaml-Uring API. -
ocaml-multicore/eio#59 Add
Eio_linux.noopand a simple benchmarkA
Eio_linux.noopimplementation has been added for benchmarking Uring dispatch. -
ocaml-multicore/eio#61 Add generic Enter effect to simplify scheduler
A
Entereffect has been introduced to simplify the scheduler operations, and this does not have much effect on the noop benchmark as illustrated below:

Improvements
-
ocaml-multicore/eio#38 Rename Flow.write to Flow.copy
The code and documentation have been updated to rename
Flow.writetoFlow.copyfor better clarity. -
ocaml-multicore/eio#36 Use uring for accept
The
enqueue_acceptfunction now usesUring.acceptalong with theeffect Accept. -
ocaml-multicore/eio#37 Performance improvements
Optimisation for
Eunix.freeand process completed events withUring.peekfor better performance results. -
ocaml-multicore/eio#48 Simplify
SuspendoperationThe
Suspendeffect has been simplified by replacing the olderAwaitandYieldeffects with the code from Eio. -
ocaml-multicore/eio#52 Split Linux support out to
eio_linuxlibraryeunixnow has common code that is shared by different backends, andeio_linuxprovides a Linux io-uring backend. The tests and the documentation have been updated to reflect the change. -
ocaml-multicore/eio#57 Reraise exceptions with backtraces
Added support to store a reference to a backtrace when a switch catches an exception. This is useful when you want to reraise the exception later.
-
ocaml-multicore/eio#60 Simplify handling of completions
The PR adds
JobandJob_no_cancelintype io_jobalong with additionalLog.debugmessages.
Cleanups
-
ocaml-multicore/eio#42 Merge fibreslib into eio
The
Fibreslibcode is now merged witheio. You will now need to openEio.Stdinstead of openingFibreslib. -
ocaml-multicore/eio#47 Clean up the network API
The network APIs have been updated with few changes such as renaming
bindtolisten, replacingUnix.shutdown_commandwith our own type in Eio API, and replacingUnix.sockaddrwith a custom type. -
ocaml-multicore/eio#49 Remove
Eio.Private.WaitersandEio.Private.SwitchThe
Eio.Private.WaitersandEio.Private.Switchmodules have been removed, and waiting is now handled using the Eio library. -
ocaml-multicore/eio#55 Some API and README cleanups
The PR has multiple cleanups and documentation changes. The README.md has been modified to use
Eio.Flow.shutdowninstead ofEio.Flow.close, and a Time section has been added. TheEio.Networkmodule has been changed toEio.Net. TheTime.nowandTime.sleep_untilmethods have been added tolib_eio/eio.ml.
Documentation
-
ocaml-multicore/eio#43 Add design note about determinism
The README.md documentation has been updated with few design notes on Determinism.
-
ocaml-multicore/eio#50 README improvements
Updated README.md and added
doc/prelude.mlfor use with MDX.
Handling Cancellation
-
ocaml-multicore/eio#39 Allow cancelling accept operations
The PR now supports cancelling the server accept and read operations.
-
ocaml-multicore/eio#40 Support cancelling the remaining Uring operations
The cancellation request of
connect,wait_readableandawait_writableUring operations is now supported. -
ocaml-multicore/eio#44 Fix read-cancel test
The
ENOENTvalue has been correctly fixed to use -2, and the documentation for cancelling the read request has been updated. -
ocaml-multicore/eio#51 Getting
EALREADYfrom cancel is not an errorHandle
EALREADYcase inlib_eunix/eunix.mlwhere an operation got cancelled while in progress.
Sundries
-
ocaml-multicore/eventlog-tools#2 Add a pausetimes tool
A
eventlog_pausetimestool has been added toeventlog-toolsthat takes a directory of eventlog files and computes the mean, max pause times, as well as the distribution up to the 99.9th percentiles. For example:ocaml-eventlog-pausetimes /home/engil/dev/ocaml-multicore/trace3/caml-426094-* name { "name": "name", "mean_latency": 718617, "max_latency": 33839379, "distr_latency": [191,250,707,16886,55829,105386,249272,552640,1325621,13312993,26227671] } -
ocaml-multicore/kcas#9 Backoff with
cpu_relaxThe
Domain.Sync.{critical_section, wait_for}have now been replaced withDomain.Sync.cpu_relax, which matches the implementation with lockfree. -
ocaml-multicore/retro-httpaf-bench#10 Add Eio benchmark
The Eio benchmark has now been added to the retro-httpaf-bench GitHub repository.
-
ocaml-multicore/retro-httpaf-bench#11 Do a recursive checkout in the CI build
The
build_image.ymlworkflow has been updated to perform a recursive checkout of the submodules for the CI build. -
domainslib#29 Task stealing with Chase Lev deques
The task-stealing Chase Lev deques for scheduling tasks across domains is now merged, and shows promising results on machines with 128 CPU cores.
-
ocaml-multicore/multicore-opam#55 Add 0.3.0 release of domainslib
The opam file for
domainslib.0.3.0has been added to the multicore-opam repository.
Benchmarking
Ongoing
-
ocaml-bench/sandmark-nightly#1 Cannot alter comparison input values
The
TimestampandVariantfields in the dropdown option in theparallel_nightly.ipynbnotebook get reset when recomputing the whole workbook.

-
ocaml-bench/sandmark#230 Build for 4.13.0+trunk with dune.2.8.1
The
ocaml-migrate-parsetree.2.2.0andppxlib.0.22.2packages are now available for 4.13.0+trunk, and we are currently porting the Irmin Layers benchmark in Sandmark from using Irmin 2.4 to 2.6. -
ocaml-bench/sandmark#231 View results for a set of benchmarks in the nightly notebooks
A feature request to filter the list of benchmarks when using the Sandmark Jupyter notebooks.
-
ocaml-bench/sandmark#233 Update pausetimes_multicore to fit with the latest Multicore changes
The pausetimes are now updated for both the 4.12.0 upstream and 4.12.0 Multicore branches to use the new Common Trace Format (CTF). The generated graphs for both the sequential and parallel pausetime results are illustrated below:

-
ocaml-bench/sandmark#235 Update selected benchmarks as a set for baseline benchmark
The baseline benchmark for comparison should only be one from the user selected benchmarks in the Jupyter notebooks.

-
ocaml-bench/sandmark#236 Implement pausetimes support in sandmark_nightly
The sequential and parallel pausetimes graph results need to be implemented in the Sandmark nightly Jupyter notebooks. The results are similar to the Figures 10 and 12 produced in the Retrofitting Parallelism ont OCaml, ICFP 2020 paper.
-
ocaml-bench/sandmark#237 Run sandmark_nightly on a larger machine
The testing of Sandmark nightly sequential and parallel benchmark runs have been done on a 24-core machine, and we would like to deploy the same on a 64+ core machine to benefit from the recent improvements to Domainslib.
-
ocaml-bench/sandmark#241 Switch to default Random module
An on-going discussion on whether to switch to using
Random.Statefor the sequential Minilight, global roots micro-benchmarks and Evolutionary Algorithm.
Completed
-
ocaml-bench/sandmark#232
num_domains->num_additional_domainsThe benchmarks have been updated to now use
num_additional_domains, to be consistent with the naming in Domainslib. -
ocaml-bench/sandmark#239 Port grammatrix to Task pool
The Multicore Grammatrix benchmark has now been ported to use Domainslib Task pool. The time and speedup graphs are given below:

OCaml
Ongoing
-
ocaml/ocaml#10039 Safepoints
The PR is currently being testing and evaluated for both ARM64 and PowerPC architectures, in particular, the branch relaxations applied to
Ipollinstructions.
Our thanks to all the OCaml users, developers and contributors in the community for their continued support to the project. Stay safe!