OCaml Compiler - October 2021

I’m happy to publish the fourth issue of the “OCaml compiler development newsletter”. (This is by no means exhaustive: many people didn’t end up having the time to write something, and it’s fine.)

Feel free of course to comment or ask questions!

If you have been working on the OCaml compiler and want to say something, please feel free to post in this thread! If you would like me to get in touch next time I prepare a newsletter issue (some random point in the future), please let me know by email at (gabriel.scherer at gmail).

Previous issues:


October 2021 was a special month for some of us, as it was the last month before the Sequential Glaciation -- a multi-months freeze on all features not related to Multicore, to facilitate Multicore integration.

Xavier Leroy (@xavierleroy)

Knowing that winter is coming, I tied some loose ends in preparation for release 4.14, including more deprecation warnings #10675, proper termination of signal handling #10726, and increasing the native stack size limit when the operating system allows #10736. The latter should mitigate the problem of “Stack Overflow” crashing non-tail-recursive code for large inputs that hit operating-system restrictions.

I also worked on reimplementing the Random standard library module using more modern pseudo-random number generation (PRNG) algorithms. In RFC#28, Gabriel Scherer proposed to change the random-number generation algorithm of the standard library Random module to be "splittable", to offer better behavior in a Multicore world. ("Splitting" a random-number generator state gives two separate states that supposedly produce independent streams of random numbers; few RNG algorithms support splitting, and its theory is not well-understood.)

My first proposal was based on the Xoshiro256++ PRNG, which is fast and statistically strong: #10701. However, Xoshiro does not support full splitting, only a limited form called "jumping", and the discussion showed that jumping was not enough. Then a miracle happened: at exactly the same time (OOPSLA conference in october 2021), Steele and Vigna proposed LXM, a family of PRNGs that have all the nice properties of Xoshiro and support full splitting. I promptly reimplemented the Random module using LXM #10742, and I find the result very nice. I hope this implementation will be selected to replace the existing Random module.

Tail-recursion modulo constructors

Gabriel Scherer (@gasche) finished working on the TMC (Tail modulo constructor) PR (#9760) in time for the glaciation deadline, thanks to a well-placed full-day meeting with Pierre Chambart (@chambart), who had done the last review of the work. They managed to get something that we both liked, and the feature is now merged upstream.

Note that this is the continuation of the TRMC work started by Frédéric Bour (@let-def) in #181 in May 2015 (also with major contributions from Basile Clément (@Elarnon)); this merge closed one of the longest-open development threads for the OCaml compiler.

One may now write:

let[@tail_mod_cons] rec map f = function
| [] -> []
| x::xs -> f x :: (map[@tailcall]) f xs

and get an efficient tail-recursive definition of map.

A section of the manual is in progress to describe the feature: #10740.

(On the other hand, there was no progress on the constructor-unboxing work, which will have to wait for 5.0.)

Progress on native code emission and linking

As part of RFC#15: Fast native toplevel using JIT, there was a batch of small changes on native-code emission and linking, and on the native toplevel proposed by @NathanRebours and David @dra27: #10690, #10714, #10715.

Module shapes for easier tooling

Ulysse Gérard, Thomas Refis and Leo White proposed a new program analysis within the OCaml compiler, designed to help external tools understand the structure of implementation files (implementations of OCaml modules), in particular to implement the "locate definition" function -- which is non-trivial in presence of include, open, etc.

The result of their analysis is a "shape" describing the items (values, types, etc.) of a module in an easy-to-process yet richly-structured form.

Florian Angeletti (@Octachron) allowed to merge this PR thanks to his excellent review work, running against the Glaciation deadline.

(The authors of the PR initially wanted to add new kinds of compilation artifacts for OCaml compilation units to store shape information in .cms and .cmsi files, instead of the too-large .cmt files. People were grumpy about it, so this part was left out for now.)

UTF decoding and validation support in the Stdlib

In #10710 support for UTF decoding and validation was added by Daniel Bünzli (@dbuenzli), a long-standing missing feature of the standard library. The API was carefully designed to avoid allocations and exceptions while providing an easy-to-use decoding interface.

Convenience functions for Seq.t

The type Seq.t of on-demand (but non-memoized) sequences of values was contributed by Simon Cruanes (@c-cube) in 2017, with only a minimal set of function, and increased slowly since. A large import of >40 functions was completed just in time before the glacation by François Potter (@fpottier) and Simon, thanks to reviews by @gasche, @dbuenzli and many others. This is work that started in February 2020 thanks to issue #9312 from Yawar Amin.

Behold:

val is_empty : 'a t -> bool
val uncons : 'a t -> ('a * 'a t) option
val length : 'a t -> int
val iter : ('a -> unit) -> 'a t -> unit
val fold_left : ('a -> 'b -> 'a) -> 'a -> 'b t -> 'a
val iteri : (int -> 'a -> unit) -> 'a t -> unit
val fold_lefti : (int -> 'b -> 'a -> 'b) -> 'b -> 'a t -> 'b
val for_all : ('a -> bool) -> 'a t -> bool
val exists : ('a -> bool) -> 'a t -> bool
val find : ('a -> bool) -> 'a t -> 'a option
val find_map : ('a -> 'b option) -> 'a t -> 'b option
val iter2 : ('a -> 'b -> unit) -> 'a t -> 'b t -> unit
val fold_left2 : ('a -> 'b -> 'c -> 'a) -> 'a -> 'b t -> 'c t -> 'a
val for_all2 : ('a -> 'b -> bool) -> 'a t -> 'b t -> bool
val exists2 : ('a -> 'b -> bool) -> 'a t -> 'b t -> bool
val equal : ('a -> 'b -> bool) -> 'a t -> 'b t -> bool
val compare : ('a -> 'b -> int) -> 'a t -> 'b t -> int
val init : int -> (int -> 'a) -> 'a t
val unfold : ('b -> ('a * 'b) option) -> 'b -> 'a t
val repeat : 'a -> 'a t
val forever : (unit -> 'a) -> 'a t
val cycle : 'a t -> 'a t
val iterate : ('a -> 'a) -> 'a -> 'a t
val mapi : (int -> 'a -> 'b) -> 'a t -> 'b t
val scan : ('b -> 'a -> 'b) -> 'b -> 'a t -> 'b t
val take : int -> 'a t -> 'a t
val drop : int -> 'a t -> 'a t
val take_while : ('a -> bool) -> 'a t -> 'a t
val drop_while : ('a -> bool) -> 'a t -> 'a t
val group : ('a -> 'a -> bool) -> 'a t -> 'a t t
val memoize : 'a t -> 'a t
val once : 'a t -> 'a t
val transpose : 'a t t -> 'a t t
val append : 'a t -> 'a t -> 'a t
val zip : 'a t -> 'b t -> ('a * 'b) t
val map2 : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t
val interleave : 'a t -> 'a t -> 'a t
val sorted_merge : ('a -> 'a -> int) -> 'a t -> 'a t -> 'a t
val product : 'a t -> 'b t -> ('a * 'b) t
val map_product : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t
val unzip : ('a * 'b) t -> 'a t * 'b t
val split : ('a * 'b) t -> 'a t * 'b t
val partition_map : ('a -> ('b, 'c) Either.t) -> 'a t -> 'b t * 'c t
val partition : ('a -> bool) -> 'a t -> 'a t * 'a t
val of_dispenser : (unit -> 'a option) -> 'a t
val to_dispenser : 'a t -> (unit -> 'a option)
val ints : int -> int t

A few of the nice contributions from new contributors we received

Dong An (@kirisky) finished a left-open PR from Anukriti Kumar (#9398, #10666) to complete the documentation of the OCAMLRUNPARAM variable.

Dong An also improved the README description of which C compiler should be available on MacOS or Windows to build the compiler codebase: #10685.

Thanks to Wiktor Kuchta, the ocaml toplevel now shows a tip at startup about the #help directive to get help: #10527. (Wiktor is not really a "new" contributor anymore, with many nice contributions over the last few months.)

While we are at it, a PR from @sonologico, proposed in May 2020, was merged just a few months ago (#9621). It changes the internal build system for the ocamldebug debugger to avoid module-name clashes when linking user-defined printing code. Most of the delay came from maintainers arguing over which of the twelve name-conflict-avoidance hacks^Wfeatures should be used.