Flambda is the term used to describe a series of optimisation passes provided by the native code compilers as of OCaml 4.03.
Flambda aims to make it easier to write idiomatic OCaml code without incurring performance penalties.
To use the Flambda optimisers it is necessary to pass the -flambda option to the OCaml configure script. (There is no support for a single compiler that can operate in both Flambda and non-Flambda modes.) Code compiled with Flambda cannot be linked into the same program as code compiled without Flambda. Attempting to do this will result in a compiler error.
Whether or not a particular ocamlopt uses Flambda may be determined by invoking it with the -config option and looking for any line starting with “flambda:”. If such a line is present and says “true”, then Flambda is supported, otherwise it is not.
Flambda provides full optimisation across different compilation units, so long as the .cmx files for the dependencies of the unit currently being compiled are available. (A compilation unit corresponds to a single .ml source file.) However it does not yet act entirely as a whole-program compiler: for example, elimination of dead code across a complete set of compilation units is not supported.
Optimisation with Flambda is not currently supported when generating bytecode.
Flambda should not in general affect the semantics of existing programs. Two exceptions to this rule are: possible elimination of pure code that is being benchmarked (see section 21.14) and changes in behaviour of code using unsafe operations (see section 21.15).
Flambda does not yet optimise array or string bounds checks. Neither does it take hints for optimisation from any assertions written by the user in the code.
Consult the Glossary at the end of this chapter for definitions of technical terms used below.
The Flambda optimisers provide a variety of command-line flags that may be used to control their behaviour. Detailed descriptions of each flag are given in the referenced sections. Those sections also describe any arguments which the particular flags take.
Commonly-used options:
Less commonly-used options:
Advanced options, only needed for detailed tuning:
Flambda operates in rounds: one round consists of a certain sequence of transformations that may then be repeated in order to achieve more satisfactory results. The number of rounds can be set manually using the -rounds parameter (although this is not necessary when using predefined optimisation levels such as with -O2 and -O3). For high optimisation the number of rounds might be set at 3 or 4.
Command-line flags that may apply per round, for example those with -cost in the name, accept arguments of the form:
The flags -Oclassic, -O2 and -O3 are applied before all other flags, meaning that certain parameters may be overridden without having to specify every parameter usually invoked by the given optimisation level.
Inlining refers to the copying of the code of a function to a place where the function is called. The code of the function will be surrounded by bindings of its parameters to the corresponding arguments.
The aims of inlining are:
These goals are often reached not just by inlining itself but also by other optimisations that the compiler is able to perform as a result of inlining.
When a recursive call to a function (within the definition of that function or another in the same mutually-recursive group) is inlined, the procedure is also known as unrolling. This is somewhat akin to loop peeling. For example, given the following code:
let rec fact x = if x = 0 then 1 else x * fact (x - 1) let n = fact 4
unrolling once at the call site fact 4 produces (with the body of fact unchanged):
let n = if 4 = 0 then 1 else 4 * fact (4 - 1)
This simplifies to:
let n = 4 * fact 3
Flambda provides significantly enhanced inlining capabilities relative to previous versions of the compiler.
Inlining is performed together with all of the other Flambda optimisation passes, that is to say, after closure conversion. This has three particular advantages over a potentially more straightforward implementation prior to closure conversion:
In -Oclassic mode the behaviour of the Flambda inliner mimics previous versions of the compiler. (Code may still be subject to further optimisations not performed by previous versions of the compiler: functors may be inlined, constants are lifted and unused code is eliminated all as described elsewhere in this chapter. See sections 21.3.3, 21.8.1 and 21.10. At the definition site of a function, the body of the function is measured. It will then be marked as eligible for inlining (and hence inlined at every direct call site) if:
Non-Flambda versions of the compiler cannot inline functions that contain a definition of another function. However -Oclassic does permit this. Further, non-Flambda versions also cannot inline functions that are only themselves exposed as a result of a previous pass of inlining, but again this is permitted by -Oclassic. For example:
module M : sig val i : int end = struct let f x = let g y = x + y in g let h = f 3 let i = h 4 (* h is correctly discovered to be g and inlined *) end
All of this contrasts with the normal Flambda mode, that is to say without -Oclassic, where:
The Flambda mode is described in the next section.
The Flambda inlining heuristics, used whenever the compiler is configured for Flambda and -Oclassic was not specified, make inlining decisions at call sites. This helps in situations where the context is important. For example:
let f b x = if b then x else ... big expression ... let g x = f true x
In this case, we would like to inline f into g, because a conditional jump can be eliminated and the code size should reduce. If the inlining decision has been made after the declaration of f without seeing the use, its size would have probably made it ineligible for inlining; but at the call site, its final size can be known. Further, this function should probably not be inlined systematically: if b is unknown, or indeed false, there is little benefit to trade off against a large increase in code size. In the existing non-Flambda inliner this isn’t a great problem because chains of inlining were cut off fairly quickly. However it has led to excessive use of overly-large inlining parameters such as -inline 10000.
In more detail, at each call site the following procedure is followed:
Inlining within recursive functions of calls to other functions in the same mutually-recursive group is kept in check by an unrolling depth, described below. This ensures that functions are not unrolled to excess. (Unrolling is only enabled if -O3 optimisation level is selected and/or the -inline-max-unroll flag is passed with an argument greater than zero.)
There is nothing particular about functors that inhibits inlining compared to normal functions. To the inliner, these both look the same, except that functors are marked as such.
Applications of functors at toplevel are biased in favour of inlining. (This bias may be adjusted: see the documentation for -inline-lifting-benefit below.)
Applications of functors not at toplevel, for example in a local module inside some other expression, are treated by the inliner identically to normal function calls.
The inliner will be able to consider inlining a call to a function in a first class module if it knows which particular function is going to be called. The presence of the first-class module record that wraps the set of functions in the module does not per se inhibit inlining.
Method calls to objects are not at present inlined by Flambda.
If the -inlining-report option is provided to the compiler then a file will be emitted corresponding to each round of optimisation. For the OCaml source file basename.ml the files are named basename.round.inlining.org, with round a zero-based integer. Inside the files, which are formatted as “org mode”, will be found English prose describing the decisions that the inliner took.
Inlining typically results in an increase in code size, which if left unchecked, may not only lead to grossly large executables and excessive compilation times but also a decrease in performance due to worse locality. As such, the Flambda inliner trades off the change in code size against the expected runtime performance benefit, with the benefit being computed based on the number of operations that the compiler observes may be removed as a result of inlining.
For example given the following code:
let f b x = if b then x else ... big expression ... let g x = f true x
it would be observed that inlining of f would remove:
Formally, an estimate of runtime performance benefit is computed by first summing the cost of the operations that are known to be removed as a result of the inlining and subsequent simplification of the inlined body. The individual costs for the various kinds of operations may be adjusted using the various -inline-...-cost flags as follows. Costs are specified as integers. All of these flags accept a single argument describing such integers using the conventions detailed in section 21.2.1.
(Default values are described in section 21.5 below.)
The initial benefit value is then scaled by a factor that attempts to compensate for the fact that the current point in the code, if under some number of conditional branches, may be cold. (Flambda does not currently compute hot and cold paths.) The factor—the estimated probability that the inliner really is on a hot path—is calculated as 1/(1 + f)d, where f is set by -inline-branch-factor and d is the nesting depth of branches at the current point. As the inliner descends into more deeply-nested branches, the benefit of inlining thus lessens.
The resulting benefit value is known as the estimated benefit.
The change in code size is also estimated: morally speaking it should be the change in machine code size, but since that is not available to the inliner, an approximation is used.
If the estimated benefit exceeds the increase in code size then the inlined version of the function will be kept. Otherwise the function will not be inlined.
Applications of functors at toplevel will be given an additional benefit (which may be controlled by the -inline-lifting-benefit flag) to bias inlining in such situations towards keeping the inlined version.
As described above, there are three parameters that restrict the search for inlining opportunities during speculation:
These parameters are ultimately bounded by the arguments provided to the corresponding command-line flags (or their default values):
Note in particular that -inline does not have the meaning that it has in the previous compiler or in -Oclassic mode. In both of those situations -inline was effectively some kind of basic assessment of inlining benefit. However in Flambda inlining mode it corresponds to a constraint on the search; the assessment of benefit is independent, as described above.
When speculation starts the inlining threshold starts at the value set by -inline (or -inline-toplevel if appropriate, see above). Upon making a speculative inlining decision the threshold is reduced by the code size of the function being inlined. If the threshold becomes exhausted, at or below zero, no further speculation will be performed.
The inlining depth starts at zero and is increased by one every time the inliner descends into another function. It is then decreased by one every time the inliner leaves such function. If the depth exceeds the value set by -inline-max-depth then speculation stops. This parameter is intended as a general backstop for situations where the inlining threshold does not control the search sufficiently.
The unrolling depth applies to calls within the same mutually-recursive group of functions. Each time an inlining of such a call is performed the depth is incremented by one when examining the resulting body. If the depth reaches the limit set by -inline-max-unroll then speculation stops.
The inliner may discover a call site to a recursive function where something is known about the arguments: for example, they may be equal to some other variables currently in scope. In this situation it may be beneficial to specialise the function to those arguments. This is done by copying the declaration of the function (and any others involved in any same mutually-recursive declaration) and noting the extra information about the arguments. The arguments augmented by this information are known as specialised arguments. In order to try to ensure that specialisation is not performed uselessly, arguments are only specialised if it can be shown that they are invariant: in other words, during the execution of the recursive function(s) themselves, the arguments never change.
Unless overridden by an attribute (see below), specialisation of a function will not be attempted if:
The compiler can prove invariance of function arguments across multiple functions within a recursive group (although this has some limitations, as shown by the example below).
It should be noted that the unboxing of closures pass (see below) can introduce specialised arguments on non-recursive functions. (No other place in the compiler currently does this.)
This function might be written like so:
let rec iter f l = match l with | [] -> () | h :: t -> f h; iter f t
and used like this:
let print_int x = print_endline (Int.to_string x) let run xs = iter print_int (List.rev xs)
The argument f to iter is invariant so the function may be specialised:
let run xs = let rec iter' f l = (* The compiler knows: f holds the same value as foo throughout iter'. *) match l with | [] -> () | h :: t -> f h; iter' f t in iter' print_int (List.rev xs)
The compiler notes down that for the function iter’, the argument f is specialised to the constant closure print_int. This means that the body of iter’ may be simplified:
let run xs = let rec iter' f l = (* The compiler knows: f holds the same value as foo throughout iter'. *) match l with | [] -> () | h :: t -> print_int h; (* this is now a direct call *) iter' f t in iter' print_int (List.rev xs)
The call to print_int can indeed be inlined:
let run xs = let rec iter' f l = (* The compiler knows: f holds the same value as foo throughout iter'. *) match l with | [] -> () | h :: t -> print_endline (Int.to_string h); iter' f t in iter' print_int (List.rev xs)
The unused specialised argument f may now be removed, leaving:
let run xs = let rec iter' l = match l with | [] -> () | h :: t -> print_endline (Int.to_string h); iter' t in iter' (List.rev xs)
The compiler cannot currently detect invariance in cases such as the following.
let rec iter_swap f g l = match l with | [] -> () | 0 :: t -> iter_swap g f l | h :: t -> f h; iter_swap f g t
The benefit of specialisation is assessed in a similar way as for inlining. Specialised argument information may mean that the body of the function being specialised can be simplified: the removed operations are accumulated into a benefit. This, together with the size of the duplicated (specialised) function declaration, is then assessed against the size of the call to the original function.
The default settings (when not using -Oclassic) are for one round of optimisation using the following parameters.
Parameter | Setting |
-inline | 10 |
-inline-branch-factor | 0.1 |
-inline-alloc-cost | 7 |
-inline-branch-cost | 5 |
-inline-call-cost | 5 |
-inline-indirect-cost | 4 |
-inline-prim-cost | 3 |
-inline-lifting-benefit | 1300 |
-inline-toplevel | 160 |
-inline-max-depth | 1 |
-inline-max-unroll | 0 |
-unbox-closures-factor | 10 |
When -O2 is specified two rounds of optimisation are performed. The first round uses the default parameters (see above). The second uses the following parameters.
Parameter | Setting |
-inline | 25 |
-inline-branch-factor | Same as default |
-inline-alloc-cost | Double the default |
-inline-branch-cost | Double the default |
-inline-call-cost | Double the default |
-inline-indirect-cost | Double the default |
-inline-prim-cost | Double the default |
-inline-lifting-benefit | Same as default |
-inline-toplevel | 400 |
-inline-max-depth | 2 |
-inline-max-unroll | Same as default |
-unbox-closures-factor | Same as default |
When -O3 is specified three rounds of optimisation are performed. The first two rounds are as for -O2. The third round uses the following parameters.
Parameter | Setting |
-inline | 50 |
-inline-branch-factor | Same as default |
-inline-alloc-cost | Triple the default |
-inline-branch-cost | Triple the default |
-inline-call-cost | Triple the default |
-inline-indirect-cost | Triple the default |
-inline-prim-cost | Triple the default |
-inline-lifting-benefit | Same as default |
-inline-toplevel | 800 |
-inline-max-depth | 3 |
-inline-max-unroll | 1 |
-unbox-closures-factor | Same as default |
Should the inliner prove recalcitrant and refuse to inline a particular function, or if the observed inlining decisions are not to the programmer’s satisfaction for some other reason, inlining behaviour can be dictated by the programmer directly in the source code. One example where this might be appropriate is when the programmer, but not the compiler, knows that a particular function call is on a cold code path. It might be desirable to prevent inlining of the function so that the code size along the hot path is kept smaller, so as to increase locality.
The inliner is directed using attributes. For non-recursive functions (and one-step unrolling of recursive functions, although @unroll is more clear for this purpose) the following are supported:
For recursive functions the relevant attributes are:
A compiler warning will be emitted if it was found impossible to obey an annotation from an @inlined or @specialised attribute.
module F (M : sig type t end) = struct let[@inline never] bar x = x * 3 let foo x = (bar [@inlined]) (42 + x) end [@@inline never] module X = F [@inlined] (struct type t = int end)
Simplification, which is run in conjunction with inlining, propagates information (known as approximations) about which variables hold what values at runtime. Certain relationships between variables and symbols are also tracked: for example, some variable may be known to always hold the same value as some other variable; or perhaps some variable may be known to always hold the value pointed to by some symbol.
The propagation can help to eliminate allocations in cases such as:
let f x y = ... let p = x, y in ... ... (fst p) ... (snd p) ...
The projections from p may be replaced by uses of the variables x and y, potentially meaning that p becomes unused.
The propagation performed by the simplification pass is also important for discovering which functions flow to indirect call sites. This can enable the transformation of such call sites into direct call sites, which makes them eligible for an inlining transformation.
Note that no information is propagated about the contents of strings, even in safe-string mode, because it cannot yet be guaranteed that they are immutable throughout a given program.
Expressions found to be constant will be lifted to symbol bindings—that is to say, they will be statically allocated in the object file—when they evaluate to boxed values. Such constants may be straightforward numeric constants, such as the floating-point number 42.0, or more complicated values such as constant closures.
Lifting of constants to toplevel reduces allocation at runtime.
The compiler aims to share constants lifted to toplevel such that there are no duplicate definitions. However if .cmx files are hidden from the compiler then maximal sharing may not be possible.
The following language semantics apply specifically to constant float arrays. (By “constant float array” is meant an array consisting entirely of floating point numbers that are known at compile time. A common case is a literal such as [| 42.0; 43.0; |].
Toplevel let-expressions may be lifted to symbol bindings to ensure that the corresponding bound variables are not captured by closures. If the defining expression of a given binding is found to be constant, it is bound as such (the technical term is a let-symbol binding).
Otherwise, the symbol is bound to a (statically-allocated) preallocated block containing one field. At runtime, the defining expression will be evaluated and the first field of the block filled with the resulting value. This initialise-symbol binding causes one extra indirection but ensures, by virtue of the symbol’s address being known at compile time, that uses of the value are not captured by closures.
It should be noted that the blocks corresponding to initialise-symbol bindings are kept alive forever, by virtue of them occurring in a static table of GC roots within the object file. This extended lifetime of expressions may on occasion be surprising. If it is desired to create some non-constant value (for example when writing GC tests) that does not have this extended lifetime, then it may be created and used inside a function, with the application point of that function (perhaps at toplevel)—or indeed the function declaration itself—marked as to never be inlined. This technique prevents lifting of the definition of the value in question (assuming of course that it is not constant).
The transformations in this section relate to the splitting apart of boxed (that is to say, non-immediate) values. They are largely intended to reduce allocation, which tends to result in a runtime performance profile with lower variance and smaller tails.
This transformation is enabled unless -no-unbox-free-vars-of-closures is provided.
Variables that appear in closure environments may themselves be boxed values. As such, they may be split into further closure variables, each of which corresponds to some projection from the original closure variable(s). This transformation is called unboxing of closure variables or unboxing of free variables of closures. It is only applied when there is reasonable certainty that there are no uses of the boxed free variable itself within the corresponding function bodies.
In the following code, the compiler observes that the closure returned from the function f contains a variable pair (free in the body of f) that may be split into two separate variables.
let f x0 x1 = let pair = x0, x1 in Printf.printf "foo\n"; fun y -> fst pair + snd pair + y
After some simplification one obtains:
let f x0 x1 = let pair_0 = x0 in let pair_1 = x1 in Printf.printf "foo\n"; fun y -> pair_0 + pair_1 + y
and then:
let f x0 x1 = Printf.printf "foo\n"; fun y -> x0 + x1 + y
The allocation of the pair has been eliminated.
This transformation does not operate if it would cause the closure to contain more than twice as many closure variables as it did beforehand.
This transformation is enabled unless -no-unbox-specialised-args is provided.
It may become the case during compilation that one or more invariant arguments to a function become specialised to a particular value. When such values are themselves boxed the corresponding specialised arguments may be split into more specialised arguments corresponding to the projections out of the boxed value that occur within the function body. This transformation is called unboxing of specialised arguments. It is only applied when there is reasonable certainty that the boxed argument itself is unused within the function.
If the function in question is involved in a recursive group then unboxing of specialised arguments may be immediately replicated across the group based on the dataflow between invariant arguments.
Having been given the following code, the compiler will inline loop into f, and then observe inv being invariant and always the pair formed by adding 42 and 43 to the argument x of the function f.
let rec loop inv xs = match xs with | [] -> fst inv + snd inv | x::xs -> x + loop2 xs inv and loop2 ys inv = match ys with | [] -> 4 | y::ys -> y - loop inv ys let f x = Printf.printf "%d\n" (loop (x + 42, x + 43) [1; 2; 3])
Since the functions have sufficiently few arguments, more specialised arguments will be added. After some simplification one obtains:
let f x = let rec loop' xs inv_0 inv_1 = match xs with | [] -> inv_0 + inv_1 | x::xs -> x + loop2' xs inv_0 inv_1 and loop2' ys inv_0 inv_1 = match ys with | [] -> 4 | y::ys -> y - loop' ys inv_0 inv_1 in Printf.printf "%d\n" (loop' [1; 2; 3] (x + 42) (x + 43))
The allocation of the pair within f has been removed. (Since the two closures for loop’ and loop2’ are constant they will also be lifted to toplevel with no runtime allocation penalty. This would also happen without having run the transformation to unbox specialise arguments.)
The transformation to unbox specialised arguments never introduces extra allocation.
The transformation will not unbox arguments if it would result in the original function having sufficiently many arguments so as to inhibit tail-call optimisation.
The transformation is implemented by creating a wrapper function that accepts the original arguments. Meanwhile, the original function is renamed and extra arguments are added corresponding to the unboxed specialised arguments; this new function is called from the wrapper. The wrapper will then be inlined at direct call sites. Indeed, all call sites will be direct unless -unbox-closures is being used, since they will have been generated by the compiler when originally specialising the function. (In the case of -unbox-closures other functions may appear with specialised arguments; in this case there may be indirect calls and these will incur a small penalty owing to having to bounce through the wrapper. The technique of direct call surrogates used for -unbox-closures is not used by the transformation to unbox specialised arguments.)
This transformation is not enabled by default. It may be enabled using the -unbox-closures flag.
The transformation replaces closure variables by specialised arguments. The aim is to cause more closures to become closed. It is particularly applicable, as a means of reducing allocation, where the function concerned cannot be inlined or specialised. For example, some non-recursive function might be too large to inline; or some recursive function might offer no opportunities for specialisation perhaps because its only argument is one of type unit.
At present there may be a small penalty in terms of actual runtime performance when this transformation is enabled, although more stable performance may be obtained due to reduced allocation. It is recommended that developers experiment to determine whether the option is beneficial for their code. (It is expected that in the future it will be possible for the performance degradation to be removed.)
In the following code (which might typically occur when g is too large to inline) the value of x would usually be communicated to the application of the + function via the closure of g.
let f x = let g y = x + y in (g [@inlined never]) 42
Unboxing of the closure causes the value for x inside g to be passed as an argument to g rather than through its closure. This means that the closure of g becomes constant and may be lifted to toplevel, eliminating the runtime allocation.
The transformation is implemented by adding a new wrapper function in the manner of that used when unboxing specialised arguments. The closure variables are still free in the wrapper, but the intention is that when the wrapper is inlined at direct call sites, the relevant values are passed directly to the main function via the new specialised arguments.
Adding such a wrapper will penalise indirect calls to the function (which might exist in arbitrary places; remember that this transformation is not for example applied only on functions the compiler has produced as a result of specialisation) since such calls will bounce through the wrapper. To mitigate this, if a function is small enough when weighed up against the number of free variables being removed, it will be duplicated by the transformation to obtain two versions: the original (used for indirect calls, since we can do no better) and the wrapper/rewritten function pair as described in the previous paragraph. The wrapper/rewritten function pair will only be used at direct call sites of the function. (The wrapper in this case is known as a direct call surrogate, since it takes the place of another function—the unchanged version used for indirect calls—at direct call sites.)
The -unbox-closures-factor command line flag, which takes an integer, may be used to adjust the point at which a function is deemed large enough to be ineligible for duplication. The benefit of duplication is scaled by the integer before being evaluated against the size.
In the following code, there are two closure variables that would typically cause closure allocations. One is called fv and occurs inside the function baz; the other is called z and occurs inside the function bar. In this toy (yet sophisticated) example we again use an attribute to simulate the typical situation where the first argument of baz is too large to inline.
let foo c = let rec bar zs fv = match zs with | [] -> [] | z::zs -> let rec baz f = function | [] -> [] | a::l -> let r = fv + ((f [@inlined never]) a) in r :: baz f l in (map2 (fun y -> z + y) [z; 2; 3; 4]) @ bar zs fv in Printf.printf "%d" (List.length (bar [1; 2; 3; 4] c))
The code resulting from applying -O3 -unbox-closures to this code passes the free variables via function arguments in order to eliminate all closure allocation in this example (aside from any that might be performed inside printf).
The simplification pass removes unused let bindings so long as their corresponding defining expressions have “no effects”. See the section “Treatment of effects” below for the precise definition of this term.
This transformation is analogous to the removal of let-expressions whose defining expressions have no effects. It operates instead on symbol bindings, removing those that have no effects.
This transformation is only enabled by default for specialised arguments. It may be enabled for all arguments using the -remove-unused-arguments flag.
The pass analyses functions to determine which arguments are unused. Removal is effected by creating a wrapper function, which will be inlined at every direct call site, that accepts the original arguments and then discards the unused ones before calling the original function. As a consequence, this transformation may be detrimental if the original function is usually indirectly called, since such calls will now bounce through the wrapper. (The technique of direct call surrogates used to reduce this penalty during unboxing of closure variables (see above) does not yet apply to the pass that removes unused arguments.)
This transformation performs an analysis across the whole compilation unit to determine whether there exist closure variables that are never used. Such closure variables are then eliminated. (Note that this has to be a whole-unit analysis because a projection of a closure variable from some particular closure may have propagated to an arbitrary location within the code due to inlining.)
Flambda performs a simple analysis analogous to that performed elsewhere in the compiler that can transform refs into mutable variables that may then be held in registers (or on the stack as appropriate) rather than being allocated on the OCaml heap. This only happens so long as the reference concerned can be shown to not escape from its defining scope.
This transformation discovers closure variables that are known to be equal to specialised arguments. Such closure variables are replaced by the specialised arguments; the closure variables may then be removed by the “removal of unused closure variables” pass (see below).
The Flambda optimisers classify expressions in order to determine whether an expression:
This is done by forming judgements on the effects and the coeffects that might be performed were the expression to be executed. Effects talk about how the expression might affect the world; coeffects talk about how the world might affect the expression.
Effects are classified as follows:
There is a single classification for coeffects:
It is assumed in the compiler that, subject to data dependencies, expressions with neither effects nor coeffects may be reordered with respect to other expressions.
Compilation of modules that are able to be statically allocated (for example, the module corresponding to an entire compilation unit, as opposed to a first class module dependent on values computed at runtime) initially follows the strategy used for bytecode. A sequence of let-bindings, which may be interspersed with arbitrary effects, surrounds a record creation that becomes the module block. The Flambda-specific transformation follows: these bindings are lifted to toplevel symbols, as described above.
Especially when writing benchmarking suites that run non-side-effecting algorithms in loops, it may be found that the optimiser entirely elides the code being benchmarked. This behaviour can be prevented by using the Sys.opaque_identity function (which indeed behaves as a normal OCaml function and does not possess any “magic” semantics). The documentation of the Sys module should be consulted for further details.
The behaviour of the Flambda simplification pass means that certain unsafe operations, which may without Flambda or when using previous versions of the compiler be safe, must not be used. This specifically refers to functions found in the Obj module.
In particular, it is forbidden to change any value (for example using Obj.set_field or Obj.set_tag) that is not mutable. (Values returned from C stubs are always treated as mutable.) The compiler will emit warning 59 if it detects such a write—but it cannot warn in all cases. Here is an example of code that will trigger the warning:
let f x = let a = 42, x in (Obj.magic a : int ref) := 1; fst a
The reason this is unsafe is because the simplification pass believes that fst a holds the value 42; and indeed it must, unless type soundness has been broken via unsafe operations.
If it must be the case that code has to be written that triggers warning 59, but the code is known to actually be correct (for some definition of correct), then Sys.opaque_identity may be used to wrap the value before unsafe operations are performed upon it. Great care must be taken when doing this to ensure that the opacity is added at the correct place. It must be emphasised that this use of Sys.opaque_identity is only for exceptional cases. It should not be used in normal code or to try to guide the optimiser.
As an example, this code will return the integer 1:
let f x = let a = Sys.opaque_identity (42, x) in (Obj.magic a : int ref) := 1; fst a
However the following code will still return 42:
let f x = let a = 42, x in Sys.opaque_identity (Obj.magic a : int ref) := 1; fst a
High levels of inlining performed by Flambda may expose bugs in code thought previously to be correct. Take care, for example, not to add type annotations that claim some mutable value is always immediate if it might be possible for an unsafe operation to update it to a boxed value.
The following terminology is used in this chapter of the manual.