OCaml Multicore - January 2021

Welcome to a double helping of the multicore monthlies, with December 2020 and January 2021 bundled together (the team collectively collapsed into the end-of-year break for a well-deserved rest). We encourage you to review all the previous monthly updates for 2020, which have been compiled by @shakthimaan, @kayceesrk, and me.

Looking back over 2020, we achieved a number of major milestones towards upstreaming multicore OCaml. The major highlights include the implementation of the eventlog tracing system to make debugging complex parallelism practical, the enormous rebase from OCaml 4.06 to 4.11, a chapter on parallel programming, the publication of "Retrofitting Parallelism onto OCaml" at ICFP 2020, the production use of the Sandmark benchmark, and the implementation of system threading integration. While all this was happening in the multicore code trees, the upstreaming efforts into mainline OCaml also went into full gear, with @xavierleroy leading the efforts from the core team to ensure that the right pieces went into various releases of OCaml with the same extensive code review that any other feature gets.

The end of 2020 saw enhancements and updates to the ecosystem libraries, with more tooling becoming available. In particular, we would like to thank:

  • @mattpallissard for getting merlin and dot-merlin-reader working with Multicore OCaml 4.10. This makes programming using OCaml Platform tools like the VSCode plugin much more pleasant.
  • @eduardorfs for testing the no-effect-syntax Multicore OCaml branch with a ReasonML project.

@kayceesrk also gave a couple of public talks online.

We're really grateful to the OCaml core developers for giving this effort so much of their time and focus in 2020! We're working on a broader plan for 2021's exciting multicore roadmap, which will be included in the next monthly once a core OCaml developers' meeting ratifies it. The broad strategy remains consistent: putting pieces of functionality steadily into each upcoming OCaml release so that each can be reviewed and tested in isolation, ahead of the OCaml 5.0 release which will include domains parallelism.

With OCaml 4.12 out in beta, our January has mainly been spent tackling some of the big pieces needed for OCaml 4.13. In particular, the safe points PR has seen a big update (and corresponding performance improvements), and we have been working on the design and implementation of Domain-Local Allocation Buffers (DLAB). We've also started the process of figuring out how to merge the awesome sequential best-fit allocator with our multicore major GC, to get the best of both worlds in OCaml 5.0. The multicore IO stack has also restarted development, with a focus on Linux's new io_uring kernel interface before retrofitting the old stalwart epoll and kqueue interfaces.

Tooling-wise, the multicore Merlin support begun in December is now merged, thanks to @mattpallissard and @eduardorfs. We continue to work on the enhancements to the Sandmark 2.0 benchmarking suite for an upcoming alpha release. @shakthimaan gave an online seminar about these improvements to the multicore team; it has been recorded and will be available in the next monthly for anyone interested in contributing to our benchmarking efforts.

As with previous reports, the Multicore OCaml updates are listed first for the month of December 2020 and then January 2021. Within each month, the ongoing upstream OCaml work is listed for your reference after the multicore-tree specific pieces.

December 2020

Multicore OCaml

Ongoing

Ecosystem

  • ocaml-multicore/lockfree#6 Current status and potential improvements

    An RFC that lists the current status of the lockfree library, and possible performance improvements for the Kcas dependency, test suite and benchmarks.

  • ocaml-multicore/lockfree#7 Setup travis CI build

    A .travis.yml file, similar to the one in https://github.com/ocaml-multicore/domainslib/, needs to be created for the CI build system.

  • ocaml-multicore/effects-examples#20 Add WebServer example

    An open task to add the httpaf-based webserver implementation to the effects-examples repository.

  • ocaml-multicore/effects-examples#21 Investigate CI failure

    The CI build times out on macOS but runs fine on Linux. The failure is under investigation.

  • ocaml-multicore/multicore-opam#39 Multicore Merlin

    Thanks to @mattpallissard (Matt Pallissard) and @eduardorfs (Eduardo Rafael) for testing merlin and dot-merlin-reader, and for getting them working with Multicore OCaml 4.10! The setup has been tested with VSCode and Atom.

    [Image: PR 39 Multicore Merlin screenshot]

API

  • ocaml-multicore/ocaml-multicore#448 Reintroduce caml_stat_accessors in the C API

    The caml_stat_minor_words, caml_stat_promoted_words, caml_allocated_words, and caml_stat_minor_collections fields are not exposed in Multicore OCaml. This is a discussion of possible ways to address that.

  • ocaml-multicore/ocaml-multicore#459 Replace caml_root API with global roots

    A work-in-progress to convert variables of type caml_root to value, and to register them as global root or generational global root, in order to remove the caml_root API entirely.

Sundries

  • ocaml-multicore/ocaml-multicore#450 "rogue" systhreads and domain termination

    An RFC to discuss the semantics of domain termination for non-empty thread chaining. In Multicore OCaml, a domain termination does not mean the end of a program, and slot reuse adds complexity to the implementation.

  • ocaml-multicore/ocaml-multicore#451 Note for OCaml 5.0: Get rid of compatibility.h

    OCaml Multicore removed modify and initialize from compatibility.h, and this is a tracking issue to remove compatibility.h for OCaml 5.0.

  • ocaml-multicore/ocaml-multicore#458 no-effect-syntax: Remove effects from typedtree

    The PR removes the use of the effect syntax from typedtree.ml, and enables external applications that use the AST to work with domains-only Multicore OCaml.

  • ocaml-multicore/ocaml-multicore#461 Remove stw/leader_collision events from eventlog

    A patch to make viewing and analyzing the logs better by removing the stw/leader_collision log messages.

Completed

  • ocaml-multicore/effects-examples#23 Migrate to dune

    The build scripts were using OCamlbuild, and they have been ported to now use dune.

  • ocaml-multicore/ocaml-multicore#402 Split handle_gc_interrupt into handling remote and polling sections

    The PR includes the addition of caml_poll_gc_work that contains the polling of GC work done in caml_handle_gc_interrupt. This facilitates handling of interrupts recursively without introducing new state.

  • ocaml-multicore/ocaml-multicore#439 Systhread lifecycle work

    The improvement fixes a race condition in caml_thread_scan_roots when two domains are initializing, and rework has been done for improving general resource handling and freeing of descriptors and stacks.

  • ocaml-multicore/ocaml-multicore#446 Collect GC stats at the end of minor collection

    The GC statistics are now collected at the end of a minor collection, and the double buffering of GC sampled statistics has been removed. The change has no impact on the existing benchmark runs, as observed against stock OCaml in the following graph:

    [Image: PR 446 benchmark graph]

  • ocaml-multicore/ocaml-multicore#454 Respect ASM_CFI_SUPPORTED flag in amd64

    The CFI directives in amd64.S are now guarded by ASM_CFI_SUPPORTED, and thus compilation with --disable-cfi will now provide a clean build.

  • ocaml-multicore/ocaml-multicore#455 No blocking section on fork

    A patch to handle the case when a rogue thread attempts to take over the thread masterlock and to prevent a child thread from moving to an invalid state. Dune can now be used safely with Multicore OCaml.

Benchmarking

Ongoing

  • ocaml-bench/rungen#1 Fix compiler warnings and errors for clean build

    The patch provides minor fixes for a clean build of rungen with dune to be used with Sandmark 2.0.

  • ocaml-bench/orun#2 Fix compiler warnings and errors for clean build

    The unused variables and functions have been removed to eliminate all the warnings and errors produced when building orun with dune.

  • ocaml-bench/sandmark#198 Noise in Sandmark

    An RFC to measure the noise between multiple execution runs of the benchmarks to better understand the performance with various hardware configuration settings, and with ASLR turned on and off.

  • ocaml-bench/sandmark#200 Global roots microbenchmark

    The patch includes globroots_seq.ml, globroots_sp.ml, and globroots_mp.ml, which add microbenchmarks to measure the efficiency of global root scanning.

  • We are continuing to integrate the existing Sandmark benchmark test suite with a Sandmark 2.0 native dune build environment for use with an opam compiler switch environment. The existing benchmarks have been ported to it and use their respective dune files. The orun and rungen packages now live in separate GitHub repositories.

Completed

  • ocaml-bench/sandmark#196 Filter benchmarks based on tag

    The benchmarks can now be filtered based on tags instead of custom target .json files. You can now build the benchmarks using the following commands:

    $ TAG='"run_in_ci"' make run_config_filtered.json 
    $ RUN_CONFIG_JSON=run_config_filtered.json make ocaml-versions/4.10.0+multicore.bench
    
  • ocaml-bench/sandmark#201 Fix compiler version in CI

    A minor update in .drone.yml to use ocaml-versions/4.10.0+multicore.bench in the CI for 4.10.0+multicore+serial.
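To illustrate the idea behind tag-based filtering (sandmark#196), here is a minimal OCaml sketch. It is not Sandmark's implementation: the real run configuration is a JSON file processed by the Makefile, and the `benchmark` record, the benchmark names, and the `macro_bench` tag below are hypothetical. Only the `run_in_ci` tag comes from the commands above.

```ocaml
(* Hypothetical, simplified view of a benchmark entry. The real Sandmark
   configuration is JSON and carries more fields than this. *)
type benchmark = { name : string; tags : string list }

(* Keep only the benchmarks that carry the requested tag. *)
let filter_by_tag tag benchmarks =
  List.filter (fun b -> List.mem tag b.tags) benchmarks

(* Made-up entries for illustration only. *)
let benchmarks = [
  { name = "bench_a"; tags = [ "run_in_ci"; "macro_bench" ] };
  { name = "bench_b"; tags = [ "macro_bench" ] };
  { name = "bench_c"; tags = [ "run_in_ci" ] };
]

let selected = filter_by_tag "run_in_ci" benchmarks

let () = List.iter (fun b -> print_endline b.name) selected
```

The point of the design is that a single configuration file can serve several purposes: each run selects a subset by tag rather than maintaining a separate target file per use case.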

OCaml

Ongoing

  • ocaml/ocaml#9876 Do not cache young_limit in a processor register

    This PR, which removes the caching of young_limit in a register for the ARM64, PowerPC and RISC-V ports, is currently under review.

January 2021

Multicore OCaml

Ongoing

  • ocaml-multicore/ocaml-multicore#464 Replace Field_imm with Field

    The patch replaces uses of Field_imm (from the concurrent minor collector) with Field.

  • ocaml-multicore/ocaml-multicore#468 Finalisers causing segfault with multiple domains

    An ongoing investigation of a test case in which finalisers cause segmentation faults with multiple domains.

  • The design and implementation of Domain-Local Allocation Buffers (DLAB) is underway, and the relevant notes are available in the DLAB Wiki.

Completed

Ecosystem

  • ocaml-bench/rungen#1 Fix compiler warnings and errors for clean build

    Minor fixes for a clean build of rungen with dune to be used with Sandmark 2.0.

  • ocaml-bench/orun#2 Fix compiler warnings and errors for clean build

    A patch that removes unused variables and functions so that orun builds with dune without any warnings or errors.

  • ocaml-bench/rungen#2 Added meta files for dune-release lint

    The dune-release lint checks for rungen now pass with the inclusion of CHANGES, LICENSE and updates to rungen.opam files.

  • ocaml-bench/orun#3 Add meta files for dune-release lint

    The CHANGES, LICENSE, README.md and orun.opam files have been added to prepare the sources for an opam.ocaml.org release.

  • ocaml-multicore/multicore-opam#39 Multicore Merlin

    Thanks to @mattpallissard (Matt Pallissard) and @eduardorfs (Eduardo Rafael) for testing merlin and dot-merlin-reader, and for getting them working with Multicore OCaml 4.10! The changes work fine with VSCode and Atom. The corresponding PR#40 is now merged.

  • ocaml-multicore/ocaml-multicore#45 Merlin and OCaml-LSP installation instructions

    The README.md file has been updated to include installation instructions to use Merlin and OCaml LSP Server.

Sundries

  • ocaml-multicore/ocaml-multicore#458 no-effect-syntax: Remove effects from typedtree

    The PR enables external applications that use the AST to work with domains-only Multicore OCaml, and removes the effect syntax use from typedtree.ml.

  • ocaml-multicore/ocaml-multicore#461 Remove stw/leader_collision events from eventlog

    The stw/leader_collision log messages have been cleaned up to make it easier to view and analyze the logs.

  • ocaml-multicore/ocaml-multicore#462 Move from Travis to GitHub Actions

    The continuous integration builds now use GitHub Actions instead of Travis CI, to be similar to the upstream CI.

  • ocaml-multicore/ocaml-multicore#463 Minor GC: Restrict global roots scanning to one domain

    Previously, every live domain scanned all the global roots during a minor collection; the patch restricts the global root scanning to just one domain. The sequential and parallel macro benchmark results are given below:

    [Image: PR 463 OCaml Multicore sequential benchmark results]

    [Image: PR 463 OCaml Multicore parallel benchmark results]

  • ocaml-multicore/ocaml-multicore#467 Disable the pruning of the mark stack

    A PR to disable the pruning of the mark stack, because of a concurrency bug that occurs when remarking a pool in another domain while that domain is also allocating.
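The idea behind #463 above, having only one of several participating domains perform a piece of shared work, can be sketched in user-level OCaml. This is an illustration of the general technique only and not the runtime's actual mechanism (the runtime is written in C). It assumes OCaml 5 for the `Domain` module, and `scan_global_roots` and `minor_collection_step` are hypothetical stand-ins.

```ocaml
(* Requires OCaml 5 (Domain and Atomic modules). *)

(* Flag that the first arriving domain flips to claim the shared work. *)
let claimed = Atomic.make false

(* Counts how many times the stand-in scan actually ran. *)
let scans = Atomic.make 0

(* Hypothetical stand-in for scanning the global roots. *)
let scan_global_roots () = Atomic.incr scans

(* Every domain runs this step, but only the domain that wins the
   compare-and-set performs the scan; the others skip it. *)
let minor_collection_step () =
  if Atomic.compare_and_set claimed false true then scan_global_roots ()

let () =
  let domains = List.init 4 (fun _ -> Domain.spawn minor_collection_step) in
  List.iter Domain.join domains;
  Printf.printf "global roots scanned %d time(s)\n" (Atomic.get scans)
```

Whichever domain arrives first does the scan, so the work is performed once per collection instead of once per domain.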

Benchmarking

Ongoing

  • ocaml-bench/sandmark#202 Add bench clean target in the Makefile

    A benchclean target has been added to the Makefile to only remove _build and _results. The _opam folder is retained with the required packages and dependencies installed, so that the benchmarks can be quickly re-built and executed.

  • ocaml-bench/sandmark#203 Implement ITER support

    ITER support has been implemented: multiple instances of the benchmarks are built, and the benchmark executions are repeated. This makes it possible to average metrics over multiple runs. For example, using ITER=2 produces two .summary.bench files as shown below:

    $ ls _build/
      4.10.0+multicore_1  4.10.0+multicore_2  log
    
    $ ls _results/
      4.10.0+multicore_1.orun.summary.bench  4.10.0+multicore_2.orun.summary.bench
    
  • ocaml-bench/sandmark#204 Adding layers.ml as a benchmark to Sandmark

    The inclusion of the Irmin layers benchmark and its dependencies into Sandmark. This is a work-in-progress.

  • We are continuing the enhancements for Sandmark 2.0, which uses a native dune build to build and execute the benchmarks, and are porting and testing it with the current Sandmark configuration files. The orun and rungen packages have been moved to their respective repositories. A meta header entry in the .summary.bench file, ITER support, and package override features have been implemented.
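As a rough illustration of what repeated runs make possible (the ITER support in sandmark#203, and the noise question raised in sandmark#198), the OCaml sketch below computes the mean, the standard deviation, and the coefficient of variation of a metric across runs. It does not read Sandmark's .summary.bench files; the `run_times` values are made up, and the choice of statistics is an assumption rather than what Sandmark reports.

```ocaml
(* Mean of a non-empty list of samples. *)
let mean xs =
  List.fold_left ( +. ) 0.0 xs /. float_of_int (List.length xs)

(* Population standard deviation of a non-empty list of samples. *)
let stddev xs =
  let m = mean xs in
  sqrt (mean (List.map (fun x -> (x -. m) ** 2.0) xs))

(* Coefficient of variation: spread relative to the mean, a simple
   unit-free indicator of run-to-run noise. *)
let cv xs = stddev xs /. mean xs

(* Hypothetical wall-clock times (seconds) from an ITER=2 run. *)
let run_times = [ 10.0; 12.0 ]

let () =
  Printf.printf "mean=%.2f stddev=%.2f cv=%.3f\n"
    (mean run_times) (stddev run_times) (cv run_times)
```

A low coefficient of variation across iterations suggests that differences between compiler variants are more likely to be real than to be measurement noise.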

Completed

  • ocaml-bench/sandmark#200 Global roots microbenchmark

    The implementation of globroots_seq.ml, globroots_sp.ml, and globroots_mp.ml to measure the efficiency of global root scanning has been added to the microbenchmarks.

OCaml

Ongoing

  • ocaml/ocaml#10039 Safepoints

    An update to the draft Safepoints implementation that uses the prologue eliding algorithm and is now rebased to trunk. The runtime benchmark results on sherwood (an AMD EPYC 7702) and thunderx (a Cavium ThunderX CN8890) are shown below:

    [Image: PR 10039 OCaml benchmark results on sherwood]

    [Image: PR 10039 OCaml benchmark results on thunderx]

Completed

  • ocaml/ocaml#9876 Do not cache young_limit in a processor register

    The PR removes the caching of young_limit in a register for the ARM64, PowerPC and RISC-V ports.

Our thanks to all the OCaml users and developers in the community for their continued support and contribution to the project, and we look forward to working with you in 2021!

Acronyms

  • API: Application Programming Interface
  • ARM: Advanced RISC Machine
  • ASLR: Address Space Layout Randomization
  • AST: Abstract Syntax Tree
  • CFI: Call Frame Information
  • CI: Continuous Integration
  • GC: Garbage Collector
  • ICFP: International Conference on Functional Programming
  • JSON: JavaScript Object Notation
  • OPAM: OCaml Package Manager
  • PR: Pull Request
  • RFC: Request For Comments
  • RISC-V: Reduced Instruction Set Computing - V
  • UI: User Interface