NOTE TO POTENTIAL CONTRIBUTORS: reach out so I can adjust my work style -- in particular, start using branches for refactorings. Otherwise you face frustration, as the code might be broken. Tagged versions of the code are guaranteed to work as well as the given stage of the project permitted.
NEWS: the upcoming version 0.4.0 has significant design changes around the backend API and synchronization, with CUDA streams exposed as virtual devices. Version 0.4.1 will come soon after, with tests for (and small improvements to) mixed-precision computation; it will require OCaml 5.2.
OCANNL is sponsored by Ahrefs! Visit the Ahrefs website.
The long-term goal is to provide several "low-level" backends, drawing inspiration from projects such as TinyGrad, TVM, and Luminal.
`for` loops. Library users can compile any amount of code into a routine (i.e. a compilation unit). The user decides explicitly what the scope of a compilation unit is, depending on the use case, by putting together the corresponding code.
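As a minimal illustration of the compilation-unit idea (a generic sketch; `compile_routine` below is a hypothetical stand-in passed as a parameter, not an OCANNL API):

```ocaml
(* Sketch only: [compile_routine] is a hypothetical stand-in for a
   backend's compile-and-link entry point. The point is that the user
   chooses the scope of the compilation unit: everything collected into
   [step] is compiled together as one routine, and nothing else is. *)
let make_train_step ~compile_routine ~forward ~backprop ~update =
  let step = [ forward; backprop; update ] in
  compile_routine step
```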
Tensor axes are split into kinds: batch, input and output. Tensor dimensions have optional labels.
OCANNL has full support for the einsum notation, integrated with shape inference. It supports static indexing, with a built-in operation to take a slice of the batch axes, also integrated with shape inference, and it is extensible to more static indexing patterns as needs arise.
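For a flavor of the notation, two illustrative spec strings follow; treat the exact surface syntax as an assumption rather than verified documentation:

```ocaml
(* Illustrative only: einsum specs written in an axis-kinds notation
   where axes are grouped as batch|input->output and [=>] separates the
   argument specs from the result spec. *)
let transpose_spec = "b|i->o => b|o->i" (* swap the input and output axes *)
let matmul_spec = "h->o; i->h => i->o" (* contract the shared axis h *)
```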
OCANNL offers two main levels of abstraction.
- Differentiable computations, centered around the `%op` syntax extension. `%op` stands for "operation"; it's meant to express tensors (`Tensor.t`) and tensor functions.
- Plain computations, centered around the `%cd` syntax extension, which integrates the `arrayjit` backend library with shape inference. `%cd` stands for "code"; it's meant to express assignments (`Assignments.t`).

Support for mixed-precision computations is upcoming.
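A rough sketch contrasting the two levels; everything beyond the `%op`/`%cd` extension points themselves (the parameter-introducing string literals, the `=-` operator, the `.grad` accessor) should be treated as assumptions for illustration:

```ocaml
(* Differentiable level: a %op definition denotes a Tensor.t (here, a
   tensor function). The convention that string literals such as "w" and
   "b" introduce named parameters is assumed for illustration. *)
let%op affine x = ("w" * x) + "b"

(* Plain level: a %cd definition denotes Assignments.t. The tensors [p]
   and [lr], the accumulating operator [=-], and the [.grad] accessor
   are all hypothetical here. *)
let sgd_step ~p ~lr = [%cd p =- lr *. p.grad]
```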
It's a feature, not a bug!
- To multiply a tensor `m` by a number pointwise, use e.g. `2*.m` or `m*.2`.
- Multiplying a tensor `m` by a constant number, e.g. `m*2`, broadcasts the number to the shape of the input axes of the tensor. This results in an output-axes-only tensor (a multi-axis vector) that is the scaled sum over the input axes of the tensor `m`.
- Multiplying a constant number by a tensor `m`, e.g. `2*m`, broadcasts the number to the shape of the output axes of the tensor. This results in a tensor whose inputs have the same shape as the inputs of `m` and whose output shape is 1D (scalar): the scaled sum over the output axes of the tensor `m`.
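To make the three cases concrete, a sketch under stated assumptions: `m` has one input axis (size 3) and one output axis (size 2), so it acts like a 2x3 matrix, and the `%op` syntax is assumed to lift number literals to constant tensors:

```ocaml
(* Assumed setup: m has an input axis of size 3 and an output axis of
   size 2, i.e. it acts like a 2x3 matrix; literals become constants. *)
let%op pointwise = 2 *. m (* same shape as m: every cell doubled *)

let%op over_inputs = m * 2
(* output-axes-only (size 2): per output position,
   2 * the sum over m's input axis *)

let%op over_outputs = 2 * m
(* input axes as in m (size 3), scalar output: per input position,
   2 * the sum over m's output axis *)
```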
A possible route to learning OCANNL:
Read the implementation overview:
To use debugging as provided by configuring `Utils.settings.debug_log_from_routines <- true` with the `cuda` backend, you need to wrap the code scheduling tasks and synchronizing `cuda` devices with `Utils.capture_stdout_logs`. The reason is that CUDA kernels are allowed to use `printf`, but not `fprintf` -- the driver dumps the printing buffer of a device to `stdout` at certain times (e.g. when synchronizing the device). For an example, see the implementation of `Train.example_train_loop`. Specifically, it wraps two sections: the call to `Train.parallel_update`, and the body of the returned `infer_callback`.
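A minimal sketch of the wrapping described above, assuming `Utils.capture_stdout_logs` takes the workload as a thunk (the exact signature is not verified here):

```ocaml
let () =
  (* Enable logging emitted from within compiled routines. *)
  Utils.settings.debug_log_from_routines <- true;
  (* Assumed shape: the thunk runs while stdout is captured, so printf
     output that the CUDA driver flushes (e.g. on device synchronization)
     is routed into the debug logs instead of raw stdout. *)
  Utils.capture_stdout_logs (fun () ->
      (* ... schedule routines and synchronize cuda devices here ... *)
      ())
```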
IMPORTANT: due to potential bugs, debug logging from CUDA in complex settings currently only works as intended for very small computation sizes.
This is very tentative.
- 0.4.1
- 0.5
- 0.6
For more details, see CHANGES.
- v0.3 shape inference, jitted routines: a major rewrite of the whole project.
- v0.2 inching toward GPU:
- v0.1 GCCJIT backend: `Gccjit` backend, single- and double-precision floats, code compiled as a monolithic update step function.

OCANNL follows different design choices than OWL. For example:
- Some aspects are more centralized in OCANNL than in OWL and form the "infrastructure":
  - `Tensor` implements "putting pieces together".
  - `Train` has the optimization "frontend" and utilities.
  - `arrayjit`, which may one day become a standalone library: it generates the code, performs backend-agnostic optimizations (virtual nodes whose computation is inlined), and implements the backends.
- Some aspects that are more core to OWL are less encapsulated in OCANNL, so it should be more natural to extend them.
Although the project is called `ocannl`, the main package is called `neural_nets_lib`, to avoid the (opam linter's) complaint that the name can be confused with other packages. This also clarifies that `ocannl` is composed of `arrayjit` and `neural_nets_lib`.
The dependency on `ocaml-cudajit` is optional, so you have to install it first to enable the Cuda backend.