ocannl

NOTE TO POTENTIAL CONTRIBUTORS: reach out so I can adjust my work style and start using branches for refactoring. Otherwise you may face frustration, as the code might be broken. Tagged versions of the code are guaranteed to work as well as the given stage of the project permitted.

NEWS: the upcoming version 0.4.0 has significant design changes around the backend API and synchronization, with Cuda streams exposed as virtual devices. Version 0.4.1 will come soon after, with tests for (and small improvements to) mixed precision computation; it will require OCaml 5.2.

OCANNL is sponsored by Ahrefs! Visit the Ahrefs website.

OCANNL -- OCaml Compiles Algorithms for Neural Networks Learning

A from-scratch, compiled Deep Learning framework.
Implements backpropagation (i.e. first-order reverse mode autodiff) and shape inference.
The long-term goal is to provide several "low-level" backends, drawing inspiration from projects such as TinyGrad, TVM, and Luminal.
- OCANNL starts with a high-level representation, but can compile everything down to for loops.
The library users can compile any amount of code into a routine (i.e. a compilation unit). The user decides explicitly what the scope of a compilation unit is, by putting together the corresponding code. Depending on the use case:
- the whole training update step can be a single routine,
- or the step can be composed of a gradient update routine (a forward pass and a backprop pass) and a params update routine (e.g. SGD with momentum, ADAM, etc.),
- or the user can compile parts of a model separately, manually composing the corresponding forward pass code and the backprop code.
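As a rough illustration of the second option above, the sketch below models a routine as a plain OCaml closure and composes a gradient routine with a params update routine. It is a toy under stated assumptions: the names `routine`, `grad_update`, `sgd_momentum`, `sequence` and `train_step` are hypothetical and are not part of the OCANNL API, and nothing here is compiled by a backend.

```ocaml
(* Toy model only: a "routine" is a closure over mutable state.
   None of these names come from OCANNL. *)
type routine = unit -> unit

let w = ref 0.0         (* the single parameter *)
let grad = ref 0.0      (* gradient of the loss with respect to w *)
let velocity = ref 0.0  (* momentum buffer *)

(* Gradient routine for the toy loss (w - 3)^2; its derivative is 2 (w - 3). *)
let grad_update : routine = fun () -> grad := 2.0 *. (!w -. 3.0)

(* Params update routine: SGD with momentum. *)
let sgd_momentum ~lr ~momentum : routine =
 fun () ->
  velocity := (momentum *. !velocity) -. (lr *. !grad);
  w := !w +. !velocity

(* Compose routines by running them in order. *)
let sequence (rs : routine list) : routine = fun () -> List.iter (fun r -> r ()) rs

(* The training step is composed of two separately defined routines. *)
let train_step = sequence [ grad_update; sgd_momentum ~lr:0.1 ~momentum:0.5 ]

let () =
  for _step = 1 to 100 do
    train_step ()
  done
```

Running `train_step` repeatedly drives `w` toward 3, the minimizer of the toy loss. In this picture, the "single routine" option would correspond to writing both updates in one closure, which is the kind of scoping decision the text leaves to the user.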
Tensor axes are split into kinds: batch, input and output. Tensor dimensions have optional labels.
- The labels ensure a more precise semantics for dimension matching.
- In the future we might introduce axis labels as an alternative to positional axis selection; this would be a separate naming mechanism.
OCANNL has full support for the einsum notation, integrated with shape inference. It supports static indexing, with a built-in operation to take a slice of the batch axes, also integrated with shape inference. It is extensible to more static indexing patterns as needs arise.
- OCANNL does not have dynamic indexing (using the last axis of one tensor as indices into another tensor). If it's needed, it can be added (we had a prototype once, but removed it to reduce complexity). It would then also be integrated with shape inference.
OCANNL has a suite of tutorials doubling as tests with inline expectations.
OCANNL offers two main levels of abstraction.
- Differentiable computations, centered around the %op syntax extension.
  - %op stands for "operation"; it's meant to express tensors (Tensor.t) and tensor functions.
- Plain computations, centered around the %cd syntax extension. It integrates the arrayjit backend library with shape inference.
  - %cd stands for "code"; it's meant to express assignments (Assignments.t).
The support for mixed-precision computations is upcoming.
- E.g. higher-precision network components, or gradients at a higher precision than values.
- Currently (v0.3), you can select the precision, and individual computation nodes track their precision, but mixing precisions might break things.
Should be easily extensible.
Model surgery should be straightforward (not sure if we are there yet).
It's a feature, not a bug!
- To scale a tensor by a number, always use pointwise-multiplication, e.g. 2*.m or m*.2.
- Matrix-multiplying a tensor m by a constant number, e.g. m*2, broadcasts the number to the shape of the input axes of the tensor. This results in an output-axes-only tensor (multi-axis-vector) that is the scaled sum over the input axes of the tensor m.
- Matrix-multiplying a constant number by a tensor m, e.g. 2*m, broadcasts the number to the shape of the output axes of the tensor. This results in a tensor whose inputs are of the same shape as the inputs of m, and the output shape is 1D (scalar), that is the scaled sum over the output axes of the tensor m.
- The matrix-multiply operation behaves pointwise along the batch axes.
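The three cases above can be spelled out on plain arrays. The sketch below is not OCANNL code and does not use its operators: it assumes a tensor `m` with one output axis and one input axis, stored as `m.(o).(i)`, and the helper names `scale`, `mul_right` and `mul_left` are made up for this illustration.

```ocaml
(* Plain-array illustration; m.(o).(i) has o on the output axis and i on the
   input axis. Assumes m is non-empty and rectangular. *)

(* Pointwise scaling, the analogue of 2*.m or m*.2. *)
let scale c m = Array.map (Array.map (fun x -> c *. x)) m

(* Analogue of m*c: c is broadcast over the input axis and contracted,
   leaving one value per output index. *)
let mul_right m c =
  Array.map (fun row -> Array.fold_left (fun acc x -> acc +. (x *. c)) 0. row) m

(* Analogue of c*m: c is broadcast over the output axis and contracted,
   leaving one value per input index. *)
let mul_left c m =
  let n_in = Array.length m.(0) in
  Array.init n_in (fun i -> Array.fold_left (fun acc row -> acc +. (c *. row.(i))) 0. m)

let m = [| [| 1.; 2.; 3. |]; [| 4.; 5.; 6. |] |]

let scaled = scale 2. m    (* [| [| 2.; 4.; 6. |]; [| 8.; 10.; 12. |] |] *)
let right = mul_right m 2. (* [| 12.; 30. |]: scaled sums over the input axis *)
let left = mul_left 2. m   (* [| 10.; 14.; 18. |]: scaled sums over the output axis *)
```

Only `scale` preserves the shape of `m`; the other two sum an axis away, which is why pointwise multiplication is the one to use for scaling.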

Usage

A possible route to learning OCANNL:

Get some basic grasp of the aims and design of the project by reading or skimming files in test/ and bin/.
Read the syntax extensions documentation lib/syntax_extensions.md.
Read the introductory part of the shape inference documentation lib/shape_inference.md.
Improve your understanding by reading or skimming lib/shape.mli, lib/tensor.mli, lib/operation.ml, lib/train.ml, and (since 0.4.1) lib/nn_blocks.ml.
Read arrayjit/lib/writing_a_backend.md.
Read the implementation overview:
1. Shape inference details lib/shape_inference.md.
2. Backend-independent optimizations arrayjit/lib/lowering_and_inlining.md -- lowering means translating (compiling) from the high-level representation (as assignments) to the low-level representation.
3. More documentation to come.
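
To make the idea of lowering concrete, here is a toy sketch that translates one high-level pointwise assignment into a low-level for loop and renders it as C-like text. The types `high`, `expr` and `low` and the functions `lower` and `render` are invented for this illustration; arrayjit's actual representations are richer and are not reproduced here.

```ocaml
(* Hypothetical toy IR, not arrayjit's actual types. *)

(* High-level: "lhs = rhs1 + rhs2" over 1D arrays of the given size. *)
type high = Pointwise_add of { lhs : string; rhs1 : string; rhs2 : string; size : int }

(* Low-level: explicit loops and indexed reads and writes. *)
type expr = Get of string * string | Add of expr * expr

type low =
  | For of { index : string; from_ : int; to_ : int; body : low }
  | Set of { arr : string; index : string; value : expr }

(* Lowering: make the iteration over the array explicit. *)
let lower (Pointwise_add { lhs; rhs1; rhs2; size }) : low =
  For
    {
      index = "i";
      from_ = 0;
      to_ = size - 1;
      body = Set { arr = lhs; index = "i"; value = Add (Get (rhs1, "i"), Get (rhs2, "i")) };
    }

let rec render_expr = function
  | Get (a, i) -> Printf.sprintf "%s[%s]" a i
  | Add (a, b) -> Printf.sprintf "(%s + %s)" (render_expr a) (render_expr b)

(* Render the low-level code as C-like text. *)
let rec render = function
  | For { index; from_; to_; body } ->
      Printf.sprintf "for (int %s = %d; %s <= %d; ++%s) { %s }" index from_ index to_ index
        (render body)
  | Set { arr; index; value } -> Printf.sprintf "%s[%s] = %s;" arr index (render_expr value)

let c_text = render (lower (Pointwise_add { lhs = "c"; rhs1 = "a"; rhs2 = "b"; size = 4 }))
(* c_text = "for (int i = 0; i <= 3; ++i) { c[i] = (a[i] + b[i]); }" *)
```

Keeping the low-level form as a small data type, rather than emitting text directly, is what would let a backend-independent pass rewrite it before any backend renders it.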

Upcoming milestones

This is very tentative.

0.4.1
- Half precision. Maybe improvements for mixed-precision computations.
- Resolve remaining issues with the new scheduler.
- Initial version of lib/nn_blocks.ml.
0.5
- More primitive numeric operations.
- Useful building blocks for models in lib/nn_blocks.ml.
- A language model example.
0.6
- Getting more out of GPUs: better CUDA generation.

Releases

For more details, see CHANGES.

v0.4 merge buffers, C-syntax backend builder: a significant refactoring of the API.
v0.3 shape inference, jitted routines: a major rewrite of the whole project.
- v0.3.3: continuous integration and opam release.
- v0.3.2: new shape inference feature: tracking leftmost axes -- complete inference for splicing, ellipsis-in-the-middle allowed in einsum notation.
- v0.3.1: sanitizing code inclusion (rootness checks).
- v0.3.0: declarative shape inference; replaced the session interface with a "jitted code routines" API. Cuda defunct.
v0.2 inching toward GPU:
- v0.2.1 naive-cuda: a Cuda backend where blocks and threads are exposed via dedicated axis types.
- v0.2.0 stack-as-device: treating the C function stack as the "device memory".
v0.1 GCCJIT backend:
- v0.1.2: multicore computations using a thread-local "task id" index.
- v0.1.1: inlining scalar constants, improved inlining for virtual nodes.
- v0.1.0: a Gccjit backend, single and double precision floats, code compiled as a monolithic update step function.
v0.0 untagged: basic design around shape inference, high-level and low-level code representation. Now-abandoned Meta-OCaml and OCaml backends.

Why not just use OWL?

OCANNL makes different design choices than OWL. For example:

OCANNL is not functorized.
OCANNL has fewer abstraction layers.
OCANNL has a more powerful shape inference.
OCANNL only supports backpropagation, while OWL supports full forward and backward auto-diff.
Some aspects are more centralized in OCANNL than in OWL and form the "infrastructure":
- Tensor indexing mechanisms are not extensible, other than by changing OCANNL code.
- Shape inference is fully handled by OCANNL and not extensible, other than by changing OCANNL code.
- Tensor implements "putting pieces together".
- Train has the optimization "frontend" and utilities.
- arrayjit, which may one day become a standalone library: generates the code, performs backend-agnostic optimizations (virtual nodes whose computation is inlined), implements the backends.
Some aspects that are more core to OWL are less encapsulated in OCANNL, so it should be more natural to extend them.
- Specifically, Operation and Train are just collections of functions.
OCANNL provides lower-level compilation backends than OWL; in this sense it is more self-contained.

Installation

Although the project is called ocannl, the main package is called neural_nets_lib, to avoid the (opam linter's) complaint that the name can be confused with other packages. This also clarifies that ocannl is composed of arrayjit and neural_nets_lib.

The dependency on ocaml-cudajit is optional, so you have to install it first to enable the Cuda backend.