OCaml Compiler - March 2022 to September 2022
I’m happy to publish the sixth issue of the “OCaml compiler development newsletter”. You can find all issues using the tag compiler-newsletter.
Note: the content of the newsletter is by no means exhaustive: only a few of the compiler maintainers and contributors had the time to write something, which is perfectly fine.
Feel free of course to comment or ask questions!
If you have been working on the OCaml compiler and want to say something, please feel free to post in this thread! If you would like me to get in touch next time I prepare a newsletter issue (some random point in the future), please let me know by Discuss message or by email at (gabriel.scherer at gmail).
Context
The Multicore merge is behind us now. We are in the final preparation stages for 5.0 (but this is by no means the end of the Multicore-related work: many things were left to do in 5.1 and further releases). Non-Multicore development has been restarting slowly but surely.
@yallop Jeremy Yallop
We're starting up the modular macros work at Cambridge again, with the aim of adding support for typed, hygienic, compile-time computation to OCaml. Back in 2015 we presented our original design at the OCaml Users and Developers Workshop, and we subsequently went on to develop a prototype in a branch of the OCaml compiler. We're planning to complete, formalise, and fully implement the design in the coming months.
@dra27 David Allsopp
Various bits of house-keeping on the compiler distribution have been managed for 5.0, taking advantage of the major version increment. All the compiler's C symbols are now prefixed caml_, vastly reducing the risks of conflicts with other libraries (#10926 and #11336). The 5.x compiler now installs all of its additional libraries (Unix, Str, etc.) to separate directories, with a friendly warning that you need to specify -I +unix etc., which makes life slightly easier for build systems (#11198), and the compiler now also ships META files for all its libraries by default (#11007 and #11399). Various other bits of 5.0-related poking include a deprecation to allow the possibility in future for sub-commands to the ocaml command. Instead of ocaml script, one should now write ocaml ./script (#11253). The compiler's bootstrap process (this is the mechanism which updates the initial compiler in boot/ocamlc which is used to build the compiler "from cold") is now reproducible (#11149), easing the review process for pull requests which need to include changes to the boot compiler. Previously we required a core developer to re-do the bootstrap and then separately merge the work, where now a core developer can merely pull the branch and check that the committed artefact is reproducible.
Looking beyond the release of OCaml 5.0, I've also been working to resurrect the disabled Cygwin port of OCaml (#11642) and, more importantly, getting the MSVC native Windows port working again (that's still WIP!).
At the OCaml Workshop this year, I demonstrated my "relocatable compiler" project, which aims both to eliminate various kinds of "papercut" when using bytecode executables and, much more importantly, to allow compiler installations to be copied to new locations and still work, which paves the way for faster creation of opam switches everywhere. It was great to be able to meet so many people in person in Slovenia for the first in-person workshop since 2019, but unfortunately that came at the cost of catching COVID, which has slowed me down in the weeks since! The next stage for the relocatable compiler is to have an opam remote which can be added to allow opt-in testing of it with OCaml 4.08-5.0, and then to start opening pull requests, hopefully for inclusion of the required changes in OCaml 5.1 or 5.2.
@sadiqj Sadiq Jaffer
The bulk of my upstream work over the last year in OCaml 5.0 has been on Runtime Events, a new tracing and metrics system that sits inside the OCaml runtime. The initial PR can be found at #10964 and there was a separate documentation PR in #11349. Lucas Pluvinage has followed up with PR #11474 which adds custom application events to Runtime Events and I hope isn't too far off merging. We gave a talk at the OCaml Users and Developers workshop on Runtime Events and I'm hoping there will be a video up on that soon.
@garrigue Jacques Garrigue
We have continued our work on refactoring the type checker for clarity and abstraction.
An interesting result was PR #11027: separate typing of counter-examples from Typecore.type_pat. Namely, around 2015 type_pat was converted to CPS style to allow a more refined way to check the exhaustiveness of GADT pattern-matching. A few more changes made the code more and more complex, but last year in #10311 we could extract a lot of code as case-specific constraints. This in turn made it possible to separate type_pat into two functions: type_pat itself, only used to type patterns in the source program, which doesn't need backtracking, and check_counter_example_pat, a much simpler function which works on counter-examples generated by the exhaustiveness checker.
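To make the role of counter-example typing more concrete, here is a small user-level sketch. The type ty and the functions default_int and name are hypothetical names chosen for illustration; the example only shows the observable behaviour of the exhaustiveness check on GADTs, not the internals of type_pat or check_counter_example_pat.

```ocaml
(* A small GADT: the index records which type each constructor stands for. *)
type _ ty =
  | Int : int ty
  | Bool : bool ty

(* Only Int is handled. The candidate counter-example Bool would need to
   have type int ty, which does not type-check, so the match is accepted
   as exhaustive. *)
let default_int : int ty -> int = function
  | Int -> 0

(* With a polymorphic index, every constructor is a well-typed
   counter-example candidate, so all cases must be handled. *)
let name : type a. a ty -> string = function
  | Int -> "int"
  | Bool -> "bool"

let () =
  assert (default_int Int = 0);
  assert (name Int = "int");
  assert (name Bool = "bool")
```

Deciding that the first match is exhaustive requires typing the candidate pattern Bool against int ty and seeing it fail, which is the kind of check performed on counter-examples.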
I have also added a -safer-matching flag for people who don't want the correctness of compilation to depend on the subtle typing arguments involved in this analysis (#10834).
In another direction, we have reorganized the representation of type parameter variances, to make the lattice involved more explicit (#11018).
We have a few PRs waiting for merging: #11536 introduces some wrapper functions for level management in types, #11569 removes the encoding used to represent the path of hash-types associated with a class, as it was not used in any meaningful way.
There are also a large number of bug fixes (#10738, #10823, #10959, #11109, #11340, #11648). The most interesting of them is #11648, which extends type_desc to allow keeping expansions inside types. This is needed to fix a number of bugs, including #9314, but the change is invasive, and reviewing may take a while.
Coqgen, the Coq backend (previously named ocaml_in_coq), is still progressing, with the addition of constructs such as loops, lazy values and exceptions, and we are trying to include GADTs in a comprehensive way.
@gasche Gabriel Scherer
@nojb Nicolás Ojeda Bär has done very nice work on using tail-modulo-cons for some List functions of the standard library, which I helped review along with Xavier Leroy:
- #11362: List.map, List.mapi, List.map2
- #11402: List.init, List.filter, List.filteri, List.filter_map
Some of those functions were hand-optimized to use non-tail-rec code on small inputs. Nicolás' micro-benchmarks showed that the TMC-transformed version was often a bit slower on very small lists, up to 20% slower on lists of fewer than five elements. We wanted to preserve the performance of the existing code exactly, so we did some manual unrolling in places. (The code is a bit less readable than the obvious versions, but much more readable than what was there before.)
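As a rough illustration of the technique, here is a sketch of a tail-modulo-cons map with one step of manual unrolling. It relies on the [@tail_mod_cons] attribute available since OCaml 4.14. The name map_tmc is hypothetical, and the code is a sketch in the spirit of the changes above rather than a copy of what was merged.

```ocaml
(* Sketch only: map_tmc is a hypothetical name, not the stdlib definition. *)
let[@tail_mod_cons] rec map_tmc f = function
  | [] -> []
  | [ x ] ->
      let y = f x in
      [ y ]
  | x1 :: x2 :: rest ->
      (* Manual unrolling: process two elements per recursive call.
         The let-bindings fix the order in which f is applied. *)
      let y1 = f x1 in
      let y2 = f x2 in
      y1 :: y2 :: map_tmc f rest

let () =
  (* The recursive call sits under constructors only, so the transformed
     function runs in constant stack space even on a long list. *)
  let big = List.init 1_000_000 (fun i -> i) in
  let doubled = map_tmc (fun x -> 2 * x) big in
  assert (List.length doubled = 1_000_000);
  assert (List.nth doubled 10 = 20)
```

Unrolling halves the number of recursive calls, which is one way to recover the cost of the transformation on short lists, at the price of an extra case in the match.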
I worked on fixing a 5.0 performance regression for bytecode-compiled programs (#11337). I started with the intuition that the overhead came from having a concurrent skip list in the 5.x runtime instead of a non-concurrent skip list in the 4.x runtime, and wrote tricky code to use a sequential skip list again. Soon I found out that the performance regression was due to something completely different and had to dive into the minor-root-scanning code.
When I started looking at the multicore runtime, I had no idea how to print a backtrace from a C segfault without using valgrind. I wrote some documentation on debugging in runtime/HACKING.adoc in the hope of helping other people.
I spent some time reading lambda/switch.ml, which compiles shallow-match-on-constructor-tags into conditionals and jump tables. The file contains some references to research papers from the 90s, but it was unclear to me how they connected to the implementation. After a nice discussion with Luc Maranget I could propose a documentation PR #11446 to explain this in the source code itself. Thanks to Vincent Laviron for the nice review -- as always.
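To give an idea of the two target shapes mentioned above, here is a hand-written OCaml analogue. All names (describe, describe_tests, describe_table, table) are hypothetical, and the two lowered versions are illustrations written by hand: the actual compiler works on its intermediate representation and picks a strategy with its own cost heuristics, which this sketch does not attempt to reproduce.

```ocaml
(* Source-level view: a shallow match on integer tags. *)
let describe tag =
  match tag with
  | 0 -> "zero"
  | 1 -> "one"
  | 2 -> "two"
  | 3 -> "three"
  | _ -> "other"

(* Shape 1 (hand-written): a small tree of comparisons. *)
let describe_tests tag =
  if tag < 0 || tag > 3 then "other"
  else if tag < 2 then (if tag = 0 then "zero" else "one")
  else if tag = 2 then "two"
  else "three"

(* Shape 2 (hand-written): a bounds check followed by an indexed jump,
   modelled here with an array lookup. *)
let table = [| "zero"; "one"; "two"; "three" |]

let describe_table tag =
  if tag >= 0 && tag < Array.length table then table.(tag) else "other"

let () =
  (* The three versions agree on a range of inputs. *)
  for tag = -2 to 6 do
    assert (describe tag = describe_tests tag);
    assert (describe tag = describe_table tag)
  done
```

Comparison trees tend to suit sparse or small sets of tags, while tables suit dense ranges; choosing between them is the kind of decision such a compilation pass has to make.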
@gadmm Guillaume Munch-Maccagnoni
(written by @gasche)
Guillaume worked on updating the "GC timing hooks" and caml_scan_roots_hook of the OCaml runtime to be multicore-safe, and added a new hook caml_domain_terminated_hook. (#10965, #11209) We rely on runtime hooks in our experimental boxroot library, and updating hooks for 5.0 was necessary to have a correct 5.0 version of boxroots.
Also related to our boxroot experiments, Guillaume wanted an efficient way to check whether a domain thread was currently holding its runtime lock -- it does not when executing long non-OCaml computations that do not access the OCaml runtime. Guillaume changed the Caml_state macro to provoke an error when accessing the domain state without holding the domain runtime lock -- a programming mistake that could easily go unnoticed before in simple testing and crash in hard-to-debug ways on different inputs -- and introduced a new Caml_state_opt macro that is NULL when the runtime lock is not held. (#11138, #11272, #11506).
Guillaume worked on improving the treatment of asynchronous actions in the new Multicore runtime. (#10915, #11039, #11057, #11095, #11190). Asynchronous actions are callbacks called on an OCaml program by the environment (instead of by an explicit request from the programmer at the point where they happen). They include for example signal handlers, finalizers called by the GC, and Statmemprof callbacks. Supporting them well requires tricky code, because the runtime must ensure that such actions are executed promptly, but in a context where running OCaml code is safe. (For example it is easy to have bugs where an asynchronous action raises an exception in the middle of a runtime function that is not exception-safe.) The 4.x runtime had a lot of asynchronous-action fixes between 4.10 and 4.14, but sadly many of these improvements were not backported to the Multicore branch (they required expert adaptation to a very different runtime codebase), and were thus lost in the Multicore merge. The present work tries to come back to a good state for 5.0 and 5.1 -- some of the fixes were unfortunately not merged in time for 5.0. Statmemprof support is currently disabled for 5.x, and this work will also be useful for Statmemprof.
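For readers less familiar with asynchronous actions, here is a minimal user-level sketch using a GC finaliser from the standard Gc module. The names finalised and allocate_and_drop are hypothetical. The point is only that the program never calls the callback itself: the runtime chooses when to run it, so the printed count is deliberately not assumed to have any particular value.

```ocaml
(* Counts how many times the finaliser has run. *)
let finalised = ref 0

let allocate_and_drop () =
  (* A fresh heap block; Gc.finalise requires a heap-allocated value. *)
  let block = Bytes.create 64 in
  Gc.finalise (fun _ -> incr finalised) block
  (* block becomes unreachable once this function returns. *)

let () =
  allocate_and_drop ();
  (* Nothing here calls the finaliser explicitly. The runtime decides when
     it is safe to run it, typically during or shortly after a collection. *)
  Gc.full_major ();
  Printf.printf "finalisers run so far: %d\n" !finalised
```

Because the callback runs arbitrary OCaml code at a point chosen by the runtime, an exception raised inside it could surface in the middle of unrelated code, which is the class of bug described above.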