package qcheck-lin
Install
Dune Dependency
Authors
Maintainers
Sources
md5=cddc14e04ad92fa9434916203c853063
sha512=0d3ddfac0a682b8610a74235e4c1ce2e7edddab291b615e6b2ab50e57a2cfd2e2cdd82891a21e051e2c4cb1fe6a1a2f682b5ff092a61e69584b98beebeb545bc
README.md.html
Multicore tests
Property-based tests of (parts of) the OCaml multicore compiler and run time.
This project contains
a randomized test suite of OCaml 5.x, packaged up in
multicoretests.opam
two reusable testing libraries:
Lin
packaged up inqcheck-lin.opam
andSTM
packaged up inqcheck-stm.opam
All of the above build on QCheck, a black-box, property-based testing library in the style of QuickCheck.
The two libraries are already quite helpful
Installation instructions, and running the tests
The multicore test suite requires OCaml 5.x:
opam update
opam switch create 5.0.0
Installing the libraries
The two testing libraries are available as packages qcheck-lin
and qcheck-stm
from the opam repository. The full versions require OCaml 5.x and reduced, non-Domain
versions are available for OCaml 4.12.x to 4.14.x. They can be installed in the usual way:
opam install qcheck-lin
opam install qcheck-stm
Bleeding edge users can pin
and install the latest main
as follows:
opam pin -y https://github.com/ocaml-multicore/multicoretests.git#main
To use the Lin
library in parallel mode on a Dune project, add the following dependency to your dune rule:
(libraries qcheck-lin.domain)
Using the STM
library in sequential mode requires the dependency (libraries qcheck-stm.sequential)
and the parallel mode similarly requires the dependency (libraries qcheck-stm.domain)
.
Running the test suite
We have not released the test suite on the opam repository at this point. The test suite can be built and run from a clone of this repository with the following commands:
opam install . --deps-only --with-test
dune build
dune runtest -j1 --no-buffer --display=quiet
Individual tests can be run by invoking dune exec
. For example:
$ dune exec src/atomic/stm_tests.exe -- -v
random seed: 51501376
generated error fail pass / total time test name
[✓] 1000 0 0 1000 / 1000 0.2s sequential atomic test
[✓] 1000 0 0 1000 / 1000 180.8s parallel atomic test
================================================================================
success (ran 2 tests)
See src/README.md for an overview of the current PBTs of OCaml 5.x.
It is also possible to run the test suite in the CI, by altering .github/workflows/common.yml to target a particular compiler PR:
COMPILER_REF: 'refs/pull/12345/head'
or a particular branch of a particular fork:
COMPILER_REPO: 'login/ocaml'
COMPILER_REF: 'refs/heads/test-me'
Since ocaml/ocaml#13458 the test suite can be triggered on an ocaml/ocaml PR (or on a fork of it) by adding the run-multicoretests
label.
A Linearization Tester
The Lin
module lets a user test an API for sequential consistency, i.e., it performs a sequence of random commands in parallel, records the results, and checks whether the observed results can be linearized and reconciled with some sequential execution. The library offers an embedded, combinator DSL to describe signatures succinctly. As an example, the required specification to test (a small part of) the Hashtbl
module is as follows:
module HashtblSig =
struct
type t = (char, int) Hashtbl.t
let init () = Hashtbl.create ~random:false 42
let cleanup _ = ()
open Lin
let a,b = char_printable,nat_small
let api =
[ val_ "Hashtbl.add" Hashtbl.add (t @-> a @-> b @-> returning unit);
val_ "Hashtbl.remove" Hashtbl.remove (t @-> a @-> returning unit);
val_ "Hashtbl.find" Hashtbl.find (t @-> a @-> returning_or_exc b);
val_ "Hashtbl.mem" Hashtbl.mem (t @-> a @-> returning bool);
val_ "Hashtbl.length" Hashtbl.length (t @-> returning int); ]
end
module HT = Lin_domain.Make(HashtblSig)
;;
QCheck_base_runner.run_tests_main [
HT.lin_test `Domain ~count:1000 ~name:"Lin Hashtbl test";
]
The first line indicates the type of the system under test along with bindings init
and cleanup
for setting it up and tearing it down. The api
then contains a list of type signature descriptions using combinators unit
, bool
, int
, returning
, returning_or_exc
, ... in the style of Ctypes. The functor Lin_domain.Make
expects a description of the tested commands and outputs a module with a QCheck test lin_test
that performs the linearization test.
The QCheck linearization test iterates a number of test instances. Each instance consists of a "sequential prefix" of calls to the above commands, followed by a spawn
of two parallel Domain
s that each call a sequence of operations. Lin
chooses the individual operations and arguments arbitrarily and records their results. The framework then performs a search for a sequential interleaving of the same calls, and succeeds if it finds one.
Since Hashtbl
s are not safe for parallelism, if you run dune exec doc/example/lin_tests.exe
the output can produce the following output, where each tested command is annotated with its result:
Messages for test Lin Hashtbl test:
Results incompatible with sequential execution
|
|
.------------------------------------.
| |
Hashtbl.add t 'a' 0 : () Hashtbl.add t 'a' 0 : ()
Hashtbl.length t : 1 Hashtbl.length t : 1
In this case, the test tells us that there is no sequential interleaving of these calls which would return 1
from both calls to Hashtbl.length
. For example, in the following sequential interleaving the last call should return 2
:
Hashtbl.add t 'a' 0;;
let res1 = Hashtbl.length t;;
Hashtbl.add t 'a' 0;;
let res2 = Hashtbl.length t;;
See src/atomic/lin_tests.ml for another example of testing the Atomic
module.
A Parallel State-Machine Testing Library
STM
contains a revision of qcstm extended to run parallel state-machine tests akin to Erlang QuickCheck, Haskell Hedgehog, ScalaCheck, .... To do so, the STM
library also performs a sequence of random operations in parallel and records the results. In contrast to Lin
, STM
then checks whether the observed results are linearizable by reconciling them with a sequential execution of a model
description. The model
expresses the intended meaning of each tested command. As such, it requires more of the user compared to Lin
. The corresponding code to describe a Hashtbl
test using STM
is given below:
open QCheck
open STM
(** parallel STM tests of Hashtbl *)
module HashtblModel =
struct
type sut = (char, int) Hashtbl.t
type state = (char * int) list
type cmd =
| Add of char * int
| Remove of char
| Find of char
| Mem of char
| Length [@@deriving show { with_path = false }]
let init_sut () = Hashtbl.create ~random:false 42
let cleanup (_:sut) = ()
let arb_cmd (s:state) =
let char =
if s=[]
then Gen.printable
else Gen.(oneof [oneofl (List.map fst s); printable]) in
let int = Gen.nat in
QCheck.make ~print:show_cmd
(Gen.oneof
[Gen.map2 (fun k v -> Add (k,v)) char int;
Gen.map (fun k -> Remove k) char;
Gen.map (fun k -> Find k) char;
Gen.map (fun k -> Mem k) char;
Gen.return Length; ])
let next_state (c:cmd) (s:state) = match c with
| Add (k,v) -> (k,v)::s
| Remove k -> List.remove_assoc k s
| Find _
| Mem _
| Length -> s
let run (c:cmd) (h:sut) = match c with
| Add (k,v) -> Res (unit, Hashtbl.add h k v)
| Remove k -> Res (unit, Hashtbl.remove h k)
| Find k -> Res (result int exn, protect (Hashtbl.find h) k)
| Mem k -> Res (bool, Hashtbl.mem h k)
| Length -> Res (int, Hashtbl.length h)
let init_state = []
let precond (_:cmd) (_:state) = true
let postcond (c:cmd) (s:state) (res:res) = match c,res with
| Add (_,_), Res ((Unit,_),_)
| Remove _, Res ((Unit,_),_) -> true
| Find k, Res ((Result (Int,Exn),_),r) -> r = (try Ok (List.assoc k s) with Not_found -> Error Not_found)
| Mem k, Res ((Bool,_),r) -> r = List.mem_assoc k s
| Length, Res ((Int,_),r) -> r = List.length s
| _ -> false
end
module HT_seq = STM_sequential.Make(HashtblModel)
module HT_dom = STM_domain.Make(HashtblModel)
;;
QCheck_base_runner.run_tests_main
(let count = 200 in
[HT_seq.agree_test ~count ~name:"Hashtbl test sequential";
HT_dom.agree_test_par ~count ~name:"Hashtbl test parallel"; ])
Again this requires a type sut
for the system under test, and bindings init_sut
and cleanup
for setting it up and tearing it down. The type cmd
describes the tested commands.
The type state = (char * int) list
describes with a pure association list the internal state of a Hashtbl
. The init_state
represents the empty Hashtbl
mode and the state transition function next_state
describes how it changes across each cmd
. For example, Add (k,v)
appends the key-value pair onto the association list.
arb_cmd
is a generator of cmd
s, taking state
as a parameter. This allows for state
-dependent cmd
generation, which we use to increase the chance of producing a Remove 'c'
, Find 'c'
, ... following an Add 'c'
. Internally arb_cmd
uses QCheck combinators Gen.return
, Gen.map
, and Gen.map2
to generate one of 5 different commands.
run
executes the tested cmd
over the sut
and wraps the result up in a result type res
offered by STM
. Combinators unit
, bool
, int
, ... allow to annotate the result with the expected type. postcond
expresses a post-condition by matching the received res
, for a cmd
with the corresponding answer from the model
. For example, this compares the Boolean result r
from Hashtbl.mem
with the result from List.mem_assoc
. Similarly precond
expresses a pre-condition.
The module is phrased as functors:
the functor
STM_sequential.Make
produces a module with a functionagree_test
to test whether the model agrees with thesut
across a sequential run of an arbitrary command sequence andthe functor
STM_domain.Make
produces a module with a functionagree_test_par
which tests in parallel byspawn
ing two domains withDomain
similarly toLin
and searches for a sequential interleaving over the model.
When running the above with the command dune exec doc/example/stm_tests.exe
one may obtain the following output:
Messages for test Hashtbl test parallel:
Results incompatible with linearized model
|
|
.------------------------------------.
| |
(Add ('e', 5268)) : () (Add ('!', 4)) : ()
Length : 1 Length : 1
This illustrates how two hashtable Add
commands may interfere when executed in parallel, leaving only 1
entry in the resulting Hashtbl
- which is not reconcilable with the declarative model description.
The above examples are available from the doc/example directory. The doc directory also contains our 2022 OCaml Workshop paper presenting the project in a bit more detail.
Repeatability Efforts
Both Lin
and STM
perform randomized property-based testing with QCheck. When rerunning a test to shrink/reduce the test input, QCheck thus starts from the same Random
seed to limit non-determinism. This is however not suffient for multicore programs where CPU scheduling and garbage collection may hinder reproducibility.
Lin
and STM
primarily uses test repetition to increase reproducibility and it is sufficient that only a single repetition triggers an issue. Currently repeating a non-deterministic QCheck property can be done in two different ways:
a
repeat
-combinator lets you test a property, e.g., 50 times rather than just 1. (Pro: a failure is found faster, Con: wasted, repetitive testing when there are no failures)a
QCheck
PR extendsTest.make
with a~retries
parameter causing it to only perform repetition during shrinking. (Pro: each test is cheaper so we can run more, Con: more tests are required to trigger a race)
Issues
Replacing blocking functions by non-blocking ones caused deadlocks (new, fixed, runtime)
A recently merged PR replacing blocking functions by unblocking ones caused a regression in the form of deadlocks in the parallel Sys
STM test.
Unboxed Dynarray
STM tests segfaults (new, fixed, runtime)
Early STM
tests of the unboxed Dynarray
PR triggered segfaults caused by mixing flat float arrays with boxed arrays.
Race condition in backup thread logic (new, fixed, runtime)
An assertion error revealed a race condition between two atomic updates underlying the coordination between a spawned domain and its backup thread.
Marking color problem when adopting orphaned Ephemerons (new, runtime)
An assertion error during the upstreaming of a mark-delay improvement, revealed a problem with the marking color of orphaned Ephemerons.
Parallel usage of flush
may trigger Sys_error
exception (new, runtime)
The Out_channel
tests found that flush
may raise a Sys_error
exception when used in parallel with a close
.
Registered bytecode fragments leading to bytecode crashes (new, fixed, runtime)
Both the Gc
and Domain.DLS
tests triggered crashes due to bytecode fragments in callbacks not being properly unregistered, fixed in a separate PR.
Out of date Gc.control
documentation (new, fixed, stdlib)
Tests of the Gc
module revealed that Gc.control
records contain constant zero fields ignored by Gc.set
Out of date Gc.quick_stat
documentation (new, fixed, stdlib)
Tests of the Gc
module revealed that Gc.quick_stat
did not return a record with 4 zero fields as documented
Shared heap assertion failure (known, runtime)
New GC tests offered a simple reproducer for consistently triggering a shared heap assertion error
Unsafe GC interaction in Gc.counters
binding (known, fixed, runtime)
New GC tests spotted an issue with unsafe root registration in Gc.counters
in 5.2.0, already fixed upstream
Assertion error s->running
in backup thread termination (new, fixed, runtime)
Tests of In_channel
would trigger an occasional race in a debug assertion, due to a TOCTOU race for incoming interrupts during backup thread termination
Parallel Dynlink
tests under Windows could deadlock or crash (known, fixed, flexdll)
Tests of Dynlink
on Windows revealed that the underlying FlexDLL was unsafe for parallel usage
Sys.rename
regression under MinGW/MSVC (new, fixed, runtime)
Earlier fixes to bring Windows behaviour closer to other platforms introduced an unfortunate cornercase regression if attempting to Sys.rename
a parent directory to an empty child directory
Regression causing a Cygwin configure to fail (new, fixed, configure)
A configure PR accidentally introduced a regression causing a flexlink test to fail for a Cygwin build
Crash and hangs on MinGW (new, fixed, runtime)
We observed crashes and hangs of the threadomain
test under MinGW, which turned out to be due to unsafe systhread yielding.
Regression on output to closed Out_channel
s (new, fixed, runtime)
While revising out Out_channel
tests we discovered a regression when outputting to a closed Out_channel
Failure to build dune
with trunk (new, fixed, dune)
A change to the OCaml compiler's internals revealed that dune
was not using CAML_INTERNALS
according to the OCaml manual
Hard abort regression on 'failure to create domains' (new, fixed, runtime)
The tests found a regression where a failure to create a domain would trigger an abort rather than an exception
Assertion failures in runtime/domain.c
on trunk (new, fixed, runtime)
A PR merged to trunk
reintroduced off-by-one assertion errors in caml_reset_young_limit
Assertion failure triggered in runtime/memprof.c (new, fixed, runtime)
The thread_joingraph
test triggered an assertion boundary case in caml_memprof_renew_minor_sample
from memprof.c
Assertion boundary case in caml_reset_young_limit
(new, fixed, runtime)
The thread_joingraph
test triggered an assertion boundary case in caml_reset_young_limit
which was too strict
Assertion race condition in install_backup_thread
(new, fixed, runtime)
A repro test case submitted upstream from multicoretests
to the ocaml compiler test suite and two separate multicoretests
all triggered an race condition in install_backup_thread
Float register preservation on ppc64 (new, fixed, codegen)
The sequential Float.Array
STM
test revealed that a float register was not properly preserved on ppc64, sometimes resulting in random float
values appearing
Signal-based overflow on ppc64 crash (new, fixed, codegen)
The sequential STM
tests of Array
, Bytes
, and Float.Array
would trigger segfaults on ppc64
Frame pointer Effect
crashes (new, fixed, codegen)
Negative Lin
Effect
tests exercising exceptions for unhandled Effect
s triggered a crash on a frame pointer switch
s390x Effect
crashes (new, fixed, codegen)
Negative Lin
Effect
tests exercising exceptions for unhandled Effect
s also triggered a crash on the newly restored s390x backend
Sys.rename
behaves differently on corner cases under MingW (new, fixed, stdlib)
Sequential STM
tests targeting Sys.rename
found two corner cases where MingW behaves differently
flexdll
contains a race condition in its handling of errors (new, fixed, flexdll)
Parallel Lin
tests of the Dynlink
module found a race condition in accesses to the global variables storing the last error.
Buffer.add_string
contained a race condition (new, fixed, stdlib)
Parallel STM
tests of the Buffer
module found a segfault, leading to the discovery of an assertion failure revealing a race condition in the add_string
function
Parallel Weak
Hashset
usage may crash the runtime (new, fixed, runtime)
Parallel STM
tests found a combination of Weak
Hashset
functions that may cause the run-time to abort
or segfault
Sys.readdir
on MingW disagrees with Linux behavior (new, fixed, stdlib)
Sequential STM
tests of Sys
showed how Sys.readdir
of a non-existing directory on MingW Windows returns an empty array
, thus disagreeing with the Linux and macOS behavior
seek
on a closed in_channel
may read uninitialized memory (new, fixed, runtime)
A failure of Lin
In_channel
tests revealed that seek
on a closed in_channel
may read uninitialized memory
Parallel usage of Weak
could produce weird values (new, fixed, runtime)
Racing Weak.set
and Weak.get
can in some cases produce strange values
Bytecode interpreter can segfault on unhandled Effect
(new, fixed, runtime)
In seldom cases the tests would trigger a segfault in the bytecode interpreter when treating an unhandled Effect
Ephemeron
can fail assert and abort (new, fixed, runtime)
In some cases (even sequential) the Ephemeron
tests can trigger an assertion failure and abort.
Parallel usage of Bytes.escaped
is unsafe (new, fixed, stdlib)
The Bytes
tests triggered a segfault which turned out to be caused by an unsafe Bytes.escaped
definition.
Infinite loop in caml_scan_stack
on ARM64 (known, fixed, runtime)
The tests triggered an apparent infinite loop on ARM64 while amd64 would complete the tests as expected.
Unsafe Buffer
module (new, fixed, stdlib)
The tests found that the Buffer
module implementation is unsafe under parallel usage - initially described in multicoretests#63.
MacOS segfault (new, fixed, runtime)
The tests found an issue causing a segfault on MacOS.
In_channel
and Out_channel
unsafety (new, fixed, runtime)
The tests found a problem with In_channel
and Out_channel
which could trigger segfaults under parallel usage. For details see issue ocaml-multicore/multicoretests#13 and this ocaml/ocaml#10960 comment.
Cornercase issue in Domainslib
(new, fixed, domainslib)
The tests found an issue in Domainslib.parallel_for_reduce
which would yield the wrong result for empty arrays. As of domainslib#100 the Domainslib
tests have been moved to the Domainslib
repo.
Specification of Lockfree.Ws_deque
(new, fixed, lockfree)
The initial tests of ws_deque
just applied the parallelism property agree_prop_par
. However that is not sufficient, as only the original domain (thread) is allowed to call push
, pop
, ..., while a spawn
ed domain should call only steal
.
A custom, revised property test runs a cmd
prefix, then spawn
s a "stealer domain" with steal
, ... calls, while the original domain performs calls across a broder random selection (push
, pop
, ...). As of lockfree#43 this test has now been moved to the lockfree
repo.
Here is an example output illustrating how size
may return -1
when used in a "stealer domain". The first line in the Failure
section lists the original domain's commands and the second lists the stealer domains commands (Steal
,...). The second Messages
section lists a rough dump of the corresponding return values: RSteal (Some 73)
is the result of Steal
, ... Here it is clear that the spawned domain successfully steals 73, and then observes both a -1
and 0
result from size
depending on timing. Size
should therefore not be considered threadsafe (none of the two papers make any such promises though):
$ dune exec src/ws_deque_test.exe
random seed: 55610855
generated error fail pass / total time test name
[✗] 318 0 1 317 / 10000 2.4s parallel ws_deque test (w/repeat)
--- Failure --------------------------------------------------------------------
Test parallel ws_deque test (w/repeat) failed (8 shrink steps):
Seq.prefix: Parallel procs.:
[] [(Push 73); Pop; Is_empty; Size]
[Steal; Size; Size]
+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Messages for test parallel ws_deque test (w/repeat):
Result observations not explainable by linearized model:
Seq.prefix: Parallel procs.:
[] [RPush; (RPop None); (RIs_empty true); (RSize 0)]
[(RSteal (Some 73)); (RSize -1); (RSize 0)]
================================================================================
failure (1 tests failed, 0 tests errored, ran 1 tests)
Segfault in Domainslib (known, fixed, domainslib)
Testing Domainslib.Task
s with one dependency and with 2 work pools found a segfault in domainslib. As of domainslib#100 the domainslib/task_one_dep.ml
test in question has been moved to the Domainslib
repo.
Dead-lock in Domainslib (known, fixed, domainslib)
A reported deadlock in domainslib motivated the development of these tests:
https://github.com/ocaml-multicore/domainslib/issues/47
https://github.com/ocaml-multicore/ocaml-multicore/issues/670
The tests domainslib/task_one_dep.ml
and domainslib/task_more_deps.ml
were run with a timeout to prevent deadlocking indefinitely. domainslib/task_one_dep.ml
could trigger one such deadlock. As of domainslib#100 these tests have been moved to the Domainslib
repo.
The test exhibits no non-determistic behaviour when repeating the same tested property from within the QCheck test. However it fails (due to timeout) on the following test input:
$ dune exec -- src/task_one_dep.exe -v
random seed: 147821373
generated error fail pass / total time test name
[✗] 25 0 1 24 / 100 36.2s Task.async/await
--- Failure --------------------------------------------------------------------
Test Task.async/await failed (2 shrink steps):
{ num_domains = 3; length = 6;
dependencies = [|None; (Some 0); None; (Some 1); None; None|] }
================================================================================
failure (1 tests failed, 0 tests errored, ran 1 tests)
This corresponds to the following program with 3+1 domains and 6 promises. It loops infinitely with both bytecode/native:
...
open Domainslib
(* a simple work item, from ocaml/testsuite/tests/misc/takc.ml *)
let rec tak x y z =
if x > y then tak (tak (x-1) y z) (tak (y-1) z x) (tak (z-1) x y)
else z
let work () =
for _ = 1 to 200 do
assert (7 = tak 18 12 6);
done
let pool = Task.setup_pool ~num_additional_domains:3 ()
let p0 = Task.async pool work
let p1 = Task.async pool (fun () -> work (); Task.await pool p0)
let p2 = Task.async pool work
let p3 = Task.async pool (fun () -> work (); Task.await pool p1)
let p4 = Task.async pool work
let p5 = Task.async pool work
let () = List.iter (fun p -> Task.await pool p) [p0;p1;p2;p3;p4;p5]
let () = Task.teardown_pool pool
This project has been created and is maintained by Tarides.