package hugin

  1. Overview
  2. Docs

doc/CHANGES.html

Changelog

All notable changes to this project will be documented in this file.

  • Only document user-facing changes (features, bug fixes, performance improvements, API changes, etc.)
  • Add new entries at the top of the appropriate section (most recent first)

[1.0.0~alpha3] - Unreleased

This release reshapes raven's foundations. Every package received API improvements, several were rewritten, and two new packages — nx-oxcaml and kaun-board — were built as part of our Outreachy internships.

Highlights

  • Unified tensor typeNx.t and Rune.t are now the same type. Downstream packages no longer need to choose between them or convert at boundaries. Rune is now a pure transformation library (grad, vjp, vmap) over standard Nx tensors.
  • nx-oxcaml (new, Outreachy) — Pure-OCaml tensor backend using OxCaml's unboxed types and SIMD intrinsics. Performance approaches the C backend — in pure OCaml.
  • kaun-board (new, Outreachy) — TUI dashboard for monitoring training runs in the terminal. Live metrics, loss curves, and system stats.
  • quill — Rewritten from the ground up with two interfaces: a terminal UI with syntax highlighting and code completion, and a web frontend via quill serve with a CodeMirror 6 editor, WebSocket-based execution, autocompletion, and diagnostics.
  • brot — The tokenization library formerly known as saga. Complete rewrite with a cleaner API. 1.3-6x faster than HuggingFace Tokenizers on most benchmarks.
  • nx — Redesigned backend interface, RNG with effect-based scoping. Einsum 8-20x faster, matmul dispatch at BLAS parity with NumPy.

Breaking changes

  • nx: Redesigned backend interface with new Nx_buffer type. Removed nx.datasets library. Moved NN functions to Kaun (use Kaun.Fn). Renamed im2col/col2im to extract_patches/combine_patches. RNG uses effect-based implicit scoping instead of explicit key threading. Removed in-place mutation operations (ifill, iadd, isub, imul, idiv, ipow, imod, imaximum, iminimum and _s variants). Removed Symbolic_shape module; shapes are concrete int array throughout. Removed Instrumentation module.
  • rune: Rune.t no longer exists — use Nx.t everywhere. Rune no longer re-exports tensor operations; use open Nx for tensor ops and Rune.grad, Rune.vjp, etc. for autodiff. Remove any Rune.to_nx / Rune.of_nx calls. Removed enable_debug, disable_debug, with_debug; use Rune.debug f x instead.
  • rune: Removed JIT/LLVM backend. This will come back in a future release with a proper ML compiler.
  • kaun: Rewritten core modules API, datasets, and HuggingFace integration. Removed kaun-models.
  • brot: Renamed from saga. Rewritten API focused on tokenization.

Nx

  • Unify Nx.t and Rune.t into a single tensor type. A new nx.effect library (Nx_effect) implements the backend interface with OCaml 5 effects: each operation raises an effect that autodiff/vmap/debug handlers can intercept, falling back to the C backend when unhandled. Nx.t is now Nx_effect.t everywhere — no more type conversions between Nx and Rune.
  • Make transcendental, trigonometric, and hyperbolic operations (exp, log, sin, cos, tan, asin, acos, atan, atan2, sinh, cosh, tanh, asinh, acosh, atanh, erf, sigmoid) polymorphic over all numeric types including complex, matching the backend and effect definitions.
  • Make isinf, isfinite, ceil, floor, round polymorphic (non-float dtypes return all-false/all-true or no-op as appropriate).
  • Redesign backend interface with more granular operations (e.g. dedicated unary and binary kernels). This improves performance by letting backends optimize individual ops directly, and prepares for the JIT pipeline which will decompose composite operations at the compiler level instead of the frontend.
  • Rewrite Nx_buffer module with new interface. The backend now returns Nx_buffer.t instead of raw bigarrays.
  • Add new C kernels for unary, binary, and sort operations, and route new backend ops to C kernels.
  • Add scipy-style correlate, convolve, and sliding window filters.
  • Generalize unfold/fold to arbitrary leading dimensions.
  • Remove neural-network functions from Nx (softmax, log_softmax, relu, gelu, silu, sigmoid, tanh). These now live in Kaun.Fn.
  • Rename im2col/col2im to extract_patches/combine_patches.
  • Remove nx.datasets module. Datasets are now in kaun.datasets.
  • Simplify Nx_io interface. Inline vendor libraries (safetensors, and npy) directly into nx_io.
  • Move the Rng module from Rune into Nx with effect-based implicit scoping. Random number generation uses Nx.Rng.run to scope RNG state instead of explicit key threading.
  • Reduce matmul dispatch overhead to reach BLAS parity with NumPy.
  • Fix Threefry2x32 to match the Random123 standard.
  • Fix save_image crash on multi-dimensional genarray.
  • Pre-reduce independent axes in einsum to avoid OOM on large contractions.
  • Make Nx backends pluggable via Dune virtual libraries. The new nx.backend virtual library defines the backend interface, with the C backend (nx.c) as the default implementation. Alternative backends (e.g., nx-oxcaml) can be swapped in at link time. The Nx_c module is renamed to Nx_backend.
  • Fix .top libraries failing to load in utop with "Reference to undefined compilation unit Parse".
  • Fix OpenMP flag filtering in discover.ml: strip -Xpreprocessor -fopenmp as a pair on macOS to prevent dangling -Xpreprocessor from consuming subsequent flags and causing linker failures. (@Alizter)
  • Add missing bool→low-precision cast support (f16/bf16/fp8) in the C backend.
  • Add UInt32/UInt64 dtypes, rename complex dtypes to Complex64/Complex128, and drop Complex16/QInt8/QUInt8/Int/NativeInt as tensor element dtypes.
  • Remove in-place mutation operations (ifill, iadd, isub, imul, idiv, ipow, imod, imaximum, iminimum and _s variants). Use functional operations instead.
  • Remove Symbolic_shape module; shapes are now concrete int array throughout.
  • Remove Instrumentation module. Nx no longer wraps operations in tracing spans. Debugging tensor operations is handled by Rune's effect-based debug handler.
  • Fix critical correctness issue in fancy slicing (L) where permutations were ignored if the number of indices matched the dimension size (e.g., slice [L [1; 0]] x returned x unmodified).
  • Rewrite slice implementation to use as_strided for contiguous operations, reducing overhead to O(1) for view-based slices and separating gather operations for better performance.
  • Optimize set_slice by replacing scalar-loop index calculations with vectorized coordinate arithmetic, significantly improving performance for fancy index assignments.
  • Improve einsum performance 8–20× with greedy contraction path optimizer (e.g., MatMul 100×100 f32 207.83 µs → 10.76 µs, 19×; BatchMatMul 200×200 f32 8.78 ms → 435.39 µs, 20×)
  • Rewrite diagonal using flatten + gather approach instead of O(N²) eye matrix masking, reducing memory from O(N²) to O(N)
  • Improve error messages for shape operations (broadcast, reshape, blit) with per-dimension detail and element counts.

nx-oxcaml (new)

New pure-OCaml tensor backend that can be swapped in at link time via Dune virtual libraries. Uses OxCaml's unboxed types for zero-cost tensor element access, SIMD intrinsics for vectorized kernels, and parallel matmul. Performance approaches the native C backend — in pure OCaml. Supports the full Nx operation set: elementwise, reductions, matmul, gather/scatter, sort/argsort, argmax/argmin, unfold/fold, pad, cat, associative scan, and threefry RNG. (@nirnayroy, @tmattio)

Rune

  • Unify tensor types: Rune.t is now Nx.t. Rune no longer re-exports the Nx frontend — it is a pure transformation library exporting only grad, grads, value_and_grad, vjp, jvp, vmap, no_grad, detach, and debugging/gradcheck utilities. All tensor creation and manipulation uses Nx directly.
  • Remove Tensor module and Nx_rune backend. Effect definitions moved to the new nx.effect library shared with Nx.
  • Remove Rune.to_nx / Rune.of_nx (no longer needed — types are identical).
  • Remove Rune.enable_debug, Rune.disable_debug, Rune.with_debug. Use Rune.debug f x to run a computation with debug logging enabled.
  • Remove JIT compilation support from Rune. The Rune.Jit module and LLVM/Metal backends have been removed and will be re-introduced later as a standalone package.
  • Update to new Nx_buffer.t type.
  • Propagate new backend operations through effects and autodiff.
  • Rewrite Autodiff module to fix critical JVP correctness issues, enable higher-order derivatives (nested gradients), and introduce vjp as a first-class primitive.
  • Fix pointer-based hashing in autodiff, correcting nested JVP handler behavior.
  • Add autodiff support for as_strided, enabling gradients through slicing and indexing operations
  • Add autodiff support for cummax and cummin cumulative operations
  • Add autodiff support for FFT operations
  • Add autodiff support for some linear algebra operations: QR decomposition (qr), Cholesky decomposition (cholesky), and triangular solve (triangular_solve).

Kaun

  • Simplify and redesign the core API for better discoverability and composability. Layers, optimizers, and training utilities now follow consistent patterns and compose more naturally.
  • Add Fn module with conv1d, conv2d, max_pool, avg_pool — neural network operations that were previously in Nx now live here with a cleaner, more focused API.
  • Redesign datasets and HuggingFace integration with simpler, more composable APIs.
  • Remove kaun-models library. Pre-built models now live in examples.
  • Reinitialize dataset each epoch to avoid iterator exhaustion (#147, @Shocker444, @tmattio)

kaun-board (new)

TUI dashboard for monitoring training runs in the terminal. Displays live metrics, loss curves, and system stats. Extracted from kaun's console module into a standalone package. (#166, #167, #170, @Arsalaan-Alam)

Brot

  • Rename the library from saga to brot.
  • Simplify brot to a tokenization-only library. Remove the sampler, n-gram models, and I/O utilities. The sampler is rewritten with nx tensors and moved to dev/mimir as the seed of an experimental inference engine.
  • Merge brot.tokenizers sub-library into brot.
  • Remove dependency on Nx.
  • Use Buffer.add_substring instead of char-by-char loop in whitespace pre-tokenizer.
  • Compact BPE symbols in-place after merges, avoiding an intermediate array allocation.
  • Replace list cons + reverse with forward List.init in BPE word_to_tokens.
  • Use pre-allocated arrays with Array.blit instead of Array.append in encoding merge and padding, halving per-field allocations.
  • Avoid allocating an unused words array in post-processor encoding conversion.
  • Reduce WordPiece substring allocations from O(n²) to O(n) per word by building the prefixed candidate string once per position.
  • Add encode_ids fast path that bypasses Encoding.t construction entirely when only token IDs are needed.
  • Add ASCII property table for O(1) character classification in pre-tokenizers, replacing O(log n) binary search for is_alphabetic (600 ranges), is_numeric (230 ranges), and is_whitespace (10 ranges). Yields 12-27% speedup on encode benchmarks with ~30% allocation reduction.
  • Add inline ASCII fast paths in all pre-tokenizer loops, skipping UTF-8 decoding and using Buffer.add_char instead of String.sub for single-byte characters. Combined with the property table, yields 20-30% total speedup and 36-55% allocation reduction vs baseline.
  • Parallelize batch encoding with OCaml 5 domains.
  • Optimize BPE merge loop with open-addressing hash, flat arrays, and shift-based heap.
  • Add trie-based WordPiece lookup and normalizer fast path.
  • Remove dependency on str library.
  • Generate unicode data offline, removing runtime dependency on uucp.
  • Remove unused Grapheme module. Grapheme cluster segmentation is not needed for tokenization.
  • Remove uutf dependency in favour of OCaml Stdlib unicode support.

Fehu

  • Simplify and redesign the core API. Environments and training utilities now follow consistent functional patterns that are easier to use and compose.
  • Remove fehu.algorithms — fehu now only depends on rune, and users bring their own algorithms. Examples provided for well-known RL algorithms like DQN and REINFORCE.

Sowilo

  • Cleaner public API — internal implementation split into focused submodules while the public surface stays small.
  • Faster grayscale conversion, edge detection, and gaussian blur.

Quill

Rewritten from the ground up. Terminal UI with syntax highlighting, code completion, and a compact single-line footer. Web frontend via quill serve with a CodeMirror 6 editor, WebSocket-based execution, autocompletion, and diagnostics. Markdown notebook format shared across both interfaces.

Interactive REPL: quill with no file argument launches a toplevel with syntax highlighting, tab completion, persistent history, smart phrase-aware submission, and piped mode.

Hugin

Rewritten from the ground up with a declarative, composable API. Plots are built by combining inert mark descriptions (line, point, bar, hist, heatmap, contour, errorbar, etc.) with layers, decorating them (title, xlabel, legend, etc.), and laying them out (grid, hstack, vstack). A compilation pass resolves data to a Scene IR that separate backends render.

  • New declarative specification API replacing the imperative figure/axes/artist architecture. Marks compose with layers, decorations chain functionally, and grid layouts nest arbitrarily.
  • ucairo — Minimal Cairo FFI bindings (36 C stubs) replacing the cairo2 opam dependency.
  • Dual-backend rendering: Cairo (PNG, PDF, interactive SDL window) and SVG from a shared Scene IR.
  • OKLCH perceptual color space with Color.oklch, Color.hex, named CSS colors, and alpha support.
  • Curated colormaps (Cmap.viridis, plasma, inferno, magma, cividis, turbo, coolwarm, spectral).
  • Theme system with light, dark, and minimal presets.
  • Linear, log, and symlog axis scaling with automatic tick generation.
  • Legend placement with configurable location and multi-column layout.
  • Interactive show with SDL window resizing, Escape/Q to close.
  • Rewritten examples and documentation.

Talon

  • Remove jsont, bytesrw, and csv dependencies from Talon. CSV support is now built-in via the talon.csv sub-library with a minimal RFC 4180 parser.
  • Remove talon.json sub-library.

1.0.0~alpha2 - 2025-11-03

We're excited to announce the release of Raven 1.0.0~alpha2! Less than a month after alpha1, this release notably includes contributions from Outreachy applicants in preparation for the upcoming two internships.

Some highlights from this release include:

  • NumPy-compatible text I/O with Nx_io.{save,load}_text
  • Lots of new functions in Nx/Rune, including neural-net ones dropout, log_softmax, batch_norm, layer_norm, and activation functions like celu and celu, and generic ones like conjugate, index_put, and more.
  • Addition of .top libraries for nx, rune, and hugin that auto-install pretty-printers in the OCaml toplevel. You can run e.g. #require "nx.top".
  • Addition of a visualization API in Fehu via the new fehu.visualize library, supporting video recording.
  • Redesign of Kaun core datastructure and checkpointing subsystem for complete snapshotting.
  • Many, many bug fixes and correctness improvements.

We've also made numerous performance improvements across the board:

  • Nx elementwise ops: 5–50× faster (e.g., Add 50×50 f32 88.81 µs → 1.83 µs, 48×; Mul 100×100 f32 78.51 µs → 2.41 µs, 33×).
  • Nx conv2d: 4–5× faster on common shapes; up to 115× on heavy f64 batched cases (e.g., B16 C64→128 16×16 K3 f64 1.61 s → 13.96 ms).
  • Rune autodiff: 1.2–3.7× faster on core grads (e.g., MatMulGrad Medium 34.04 ms → 11.91 ms, 2.86×; Large 190.19 ms → 50.97 ms, 3.73×).
  • Talon dataframes: big wins in joins and group-bys (Join 805.35 ms → 26.10 ms, 31×; Group-by 170.80 ms → 19.03 ms, ; Filter 9.93 ms → 3.39 ms, ).
  • Brot tokenizers: realistic workloads 4–17% faster (e.g., WordPiece encode single 136.05 µs → 115.92 µs, 1.17×; BPE batch_32 24.52 ms → 22.27 ms, 1.10×)

We're closing 8 user-reported issues or feature requests and are totalling 30 community contributions from 8 unique contributors.

Nx

  • Fix einsum output axis ordering for free axes (e.g., i,jk->jki, ij,klj->kli) by correcting final transpose permutation and intermediate left-axis reordering.
  • Add Nx_io.Cache_dir module with consolidated cache directory utilities respecting RAVEN_CACHE_ROOT, XDG_CACHE_HOME, and HOME fallback, replacing project-specific cache logic across the whole raven ecosystem (#134, @Arsalaan-Alam)
  • Add Nx_io.save_txt / Nx_io.load_txt with NumPy-compatible formatting, comments, and dtype support (#120, @six-shot)
  • Optimize multi_dot for matrix chains, reducing intermediate allocations and improving performance
  • Add public index_put function for indexed updates
  • Clarify reshape documentation to match its view-only semantics
  • Provide nx.top, rune.top, and hugin.top libraries that auto-install pretty printers in the OCaml toplevel and update Quill to load them
  • Add ifill for explicit in-place fills and make fill return a copied tensor
  • Speed up contiguous elementwise ops via vectorized loops
  • Fast-path contiguous single-axis reductions to avoid iterator fallback
  • Speed up float reductions with contiguous multi-axis fast paths
  • Fast-path padding-free unfold to lower conv2d overhead
  • Move neural-network operations (softmax, log_softmax, relu, gelu, silu, sigmoid, tanh) from Kaun to Nx
  • Add public conjugate function for complex number conjugation (#125, @Arsalaan-Alam)
  • Fix complex vdot to conjugate first tensor before multiplication, ensuring correct mathematical behavior (#123, @Arsalaan-Alam)
  • Update comparison and conditional operations to use boolean tensors (#115, @nirnayroy)
  • Add support for rcond parameter and underdetermined systems to lstsq (#102, @Shocker444)
  • Fix matrix_rank/pinv Hermitian fast paths to use eigen-decomposition and match NumPy for complex inputs (#96, @six-shot, @tmattio)
  • Optimize matmul BLAS dispatch for strided tensors, improving matrix multiplication performance
  • Fix slow builds reported since alpha1 (#88, @tmattio)
  • Fix macOS ARM crash when loading extended bigarray kinds
  • Add float16 and bfloat16 support to safetensors I/O, including precise conversions that preserve denormals/NaNs (#84, @six-shot, @tmattio)
  • Refined View internals for leaner contiguity checks and stride handling, cutting redundant materialization on hot paths
  • Merge Lazy_view into the core View API so movement ops operate on a single composed view
  • Documented the reworked View interface
  • Documented the Symbolic_shape interface
  • Added Accelerate framework flag when compiling on macOS, fixing issues in some environments (#129, @nirnayroy)

Hugin

  • Fix random SIGBUS/bus errors on macOS when closing Hugin.show windows by destroying SDL windows with the correct pointer in the finalizer.
  • Let Hugin.show windows close cleanly via the window button or Esc/q, avoiding frozen macOS REPL sessions

Rune

  • Add Rune.no_grad and Rune.detach to mirror JAX stop-gradient semantics
  • Improve gradient performance slightly by replace the reverse-mode tape's linear PhysicalTbl with an identity hash table
  • Fix Rune.Rng.shuffle flattening outputs for multi-dimensional tensors; the shuffle now gathers along axis 0 and keeps shapes intact
  • Replace Rune.Rng.truncated_normal clipping with rejection sampling so samples stay inside the requested interval without boundary spikes
  • Add support for categorical sampling with Rune.Rng.categorical (#89, @nirnayroy)
  • Allow plain llvm-config in discovery, fixing build in some platforms (#71, @stepbrobd)

Kaun

  • Added Similarity and Polysemy analysis to the BERT example (#137, @nirnayroy)
  • Support attention masks via the new Kaun.Attention module
  • Support loading sharded Hugging Face safetensors
  • Fix BERT and GPT‑2 model loading
  • API simplification: removed type parameters from public types; Ptree now supports mixed‑dtype trees via packed tensors with typed getters.
  • Checkpointing overhaul: versioned Train_state with schema tagging, explicit Checkpoint.{Snapshot,Artifact,Manifest,Repository} (retention, tags, metadata), and simple save/load helpers for snapshots and params.
  • Overhaul dataset combinators: derive tensor specs from Rune dtype, fix sampling/window bugs, validate weighted sampling, and respect drop_remainder
  • Make dataset prefetch truly asynchronous with background domains and allow reusing an external Domainslib pool via parallel_map ~pool
  • Use Dataset.iter for epoch batches to reduce overhead
  • Update BERT and GPT-2 tokenizer cache to use Nx.Cache for consistent cache directory resolution (#134, @Arsalaan-Alam)
  • Honor text dataset encodings via incremental Uutf decoding (#122, @Satarupa22-SD).
  • Preserve empty sequential modules when unflattening so indices stay aligned for checkpoint round-tripping
  • Prevent Training.fit/evaluate from consuming entire datasets eagerly and fail fast when a dataset yields no batches, avoiding hangs and division-by-zero crashes
  • Allow metric history to tolerate metrics that appear or disappear between epochs so dynamic metric sets no longer raise during training
  • Make Optimizer.clip_by_global_norm robust to zero gradients and empty parameter trees to avoid NaNs during training
  • Split CSV loader into from_csv and from_csv_with_labels to retain labels when requested (#114, @Satarupa22-SD)
  • Implement AUC-ROC and AUC-PR in Kaun metrics and simplify their signatures (#124, #131, @Shocker444)
  • Add mean absolute percentage error, explained variance, R² (with optional adjustment), KL-divergence, and top-k accuracy to Kaun metrics
  • Add NDCG, MAP, and MRR ranking metrics to Kaun metrics
  • Add BLEU, ROUGE, and METEOR metrics to Kaun for pre-tokenized sequences, removing tokenizer dependencies
  • Add SSIM, IoU, and Dice metrics for vision workloads in Kaun

Talon

  • Remove automatic sentinel-based null detection for numeric columns; explicit masks (via [_opt] constructors) now define missing data semantics
  • Replace join nested loops with hashed join indices, cutting lookup from O(n·m) to near O(n)
  • Reuse a shared Nx-based column reindexer so filter/sample paths avoid repeated array copies
  • Fix fillna to honor column null masks and replacements, restoring expected nullable semantics
  • Preserve null masks when reindexing during joins so sentinel values remain valid data
  • Handle numeric index columns in pivot, preventing distinct keys from collapsing into a single bucket
  • Respect null masks when serializing numeric columns to JSON, emitting JSON null instead of sentinel values
  • Detect big integers as int64 in Talon CSV loader (#121, @Arsalaan-Alam)
  • Allow forcing column types in Talon JSON loader (#104, @nirnayroy)
  • Add documentation to compare Talon and Pandas (#154, Satarupa22-SD)

Saga

  • Remove legacy Normalizers.nmt and Normalizers.precompiled constructors (and their JSON serializers) so the public surface only advertises supported normalizers
  • Tighten template processor JSON parsing: require integer type ids, drop the legacy special-token list format, and ensure multi-id special tokens round-trip with the new record fields
  • Make tokenizer JSON loading tolerant of HuggingFace quirks (missing model.type, string-encoded merges), restoring compatibility with upstream tokenizer.json files
  • Cache byte-level encode/decode lookup tables to avoid rebuilding them during tokenization, trimming avoidable allocations
  • Skip BPE dropout sampling when dropout is disabled, removing redundant RNG work on common hot paths
  • Fix Unigram tokenization so longest matches are emitted without aborting the sequence when a vocab hit occurs
  • Recompute pad token ids when the pad special string changes, preventing padding with stale ids
  • Fix Unigram token_to_id/id_to_token vocabulary lookups (#117, @RidwanAdebosin)
  • Optimize Pre_tokenizers.whitespace to reduce allocations and improve tokenization performance
  • Simplify tokenizers interface

Sowilo

  • Add resize (nearest & bilinear) that works for 2D, batched, and NHWC tensors
  • Update grayscale conversion and RGB/BGR channel swaps to run entirely on Rune ops, keeping batched inputs compatible with JIT backends
  • Make median_blur compute the true median so salt-and-pepper noise is removed as expected
  • Fix erode/dilate so custom structuring elements (e.g. cross vs. square) and batched tensors produce the correct morphology result

Fehu

  • Added snapshot-based save/load for DQN and REINFORCE agents (#127, @RidwanAdebosin, @tmattio)
  • Added typed Render payloads with enforced render_mode selection in Env.create, auto human-mode rendering, and vectorized Env.render accessors so environments consistently expose frames for downstream tooling
  • Introduced the Fehu_visualize library with ffmpeg/gif/W&B sinks, overlay combinators, rollout/evaluation recorders, and video wrappers for single and vectorized environments, providing a cohesive visualization stack for Fehu
  • Added a Fehu.Policy helper module (random/deterministic/greedy) and sink with_* guards so visualization sinks handle directory creation and cleanup automatically
  • Added Buffer.Replay.sample_tensors to streamline batched training loops and exploration handling
  • Reworked Fehu_algorithms.Dqn around init/step/train primitives with functional state, warmup control, and snapshotting helpers
  • Rebuilt Fehu_algorithms.Reinforce on the same init/step/train interface with optional baselines, tensor-based rollouts, snapshot save/load, and updated tests/examples/docs using the new workflow
  • Upgraded the GridWorld environment to return ANSI and RGB-array frames using the new render types, and updated the DQN example to optionally record pre- and post-training rollouts via FEHU_DQN_RECORD_DIR using Fehu_visualize sinks
  • Reworked space sampling to return (value, next_rng) and split keys internally, fixing correlated draws in Box/Multi-discrete/Tuple/Dict/Sequence/Text samplers while adding Space.boundary_values for deterministic compatibility checks
  • Extended vectorized environments to reuse space boundary probes and now store structured final_observation payloads in Info, improving downstream consumption
  • Added Buffer.Replay.add_many and Buffer.Replay.sample_arrays, preserved backing storage on clear, and exposed struct-of-arrays batches for vectorised learners
  • Tightened Env.create diagnostics with contextual error messages and an optional ~validate_transition hook for custom invariants
  • Enriched Wrapper utilities with map_info, Box clip_action/clip_observation, and time-limit info reporting elapsed steps
  • Upgraded Info values to carry int/float/bool arrays with stable JSON round-tripping (handling NaN/∞) and sorted metadata serialization for deterministic diffs
  • Improved training helpers: Welford-based normalization with optional unbiased variance, documented done = terminated || truncated, and returned nan when explained variance is undefined
  • Treat time-limit truncations as terminals when computing rollout advantages and expose the truncated flag in buffer steps
  • Require callers of Training.compute_gae to pass final bootstrapping values and ensure Training.evaluate feeds the current observation to policies
  • Allow Space.Sequence.create to omit max_length, keeping sequences unbounded above while preserving validation and sampling semantics
  • Validate vectorized environments by round-tripping sample actions/observations across every instance, preventing incompatible spaces from slipping through
  • Finish clipped value loss support in Fehu.Training (#119, @nirnayroy)

Nx-datasets

  • Migrate to Nx.Cache for cache directory resolution, enabling consistent behavior. (#133, @Arsalaan-Alam)
  • Fix cache directory resolution to respect RAVEN_CACHE_ROOT (or fall back to XDG_CACHE_HOME/HOME), allowing custom cache locations. (#128, @Arsalaan-Alam)
  • Switch CIFAR-10 loader to the binary archive so parsing succeeds again
  • Add a CIFAR-10 example
  • Standardize dataset examples on Logs
  • Use Logs for dataset loader logging (#95, @Satarupa22-SD)

1.0.0~alpha1 - 2025-10-02

This release expands the Raven ecosystem with three new libraries (Talon, Saga, Fehu) and significant enhancements to existing ones. alpha1 focuses on breadth—adding foundational capabilities across data processing, NLP, and reinforcement learning—while continuing to iterate on core infrastructure.

New Libraries

Talon - DataFrame Processing

We've added Talon, a new DataFrame library inspired by pandas and polars:

  • Columnar data structures that support mixed types (integers, floats, strings, etc.) within a single table (aka heterogeneous datasets)
  • Operations: filter rows, group by columns, join tables, compute aggregates
  • Load and save data in CSV and JSON formats
  • Seamless conversion to/from Nx arrays for numerical operations
Saga - NLP & Text Processing

Saga is a new text processing library for building language models. It provides:

  • Tokenizers: Byte-pair encoding (BPE), WordPiece subword tokenization, and character-level splitting
  • Text generation: Control output with temperature scaling, top-k filtering, nucleus (top-p) sampling, and custom sampling strategies
  • Language models: Train and generate text with statistical n-gram models (bigrams, trigrams, etc.)
  • I/O: Read large text files line-by-line and batch-process corpora
Fehu - Reinforcement Learning

Fehu brings reinforcement learning to Raven, with an API inspired by Gymnasium and Stable-Baselines3:

  • Standard RL environment interface (reset, step, render) with example environments like Random Walk and CartPole
  • Environment wrappers to modify observations, rewards, or episode termination conditions
  • Vectorized environments to collect experience from multiple parallel rollouts
  • Training utilities: Generalized advantage estimation (GAE), trajectory collection and management
  • RL algorithms: Policy gradient method (REINFORCE), deep Q-learning (DQN) with replay buffer
  • Use Kaun neural networks as function approximators for policies and value functions

Major Enhancements

Nx - Array Computing

We've significantly expanded Nx's following early user feedback from alpha0:

  • Complete linear algebra suite: LAPACK-backed operations matching NumPy including singular value decomposition (SVD), QR factorization, Cholesky decomposition, eigenvalue/eigenvector computation, matrix inverse, and solving linear systems
  • FFT operations: Fast Fourier transforms (FFT/IFFT) for frequency domain analysis and signal processing
  • Advanced operations: Einstein summation notation (einsum) for complex tensor operations, extract/construct diagonal matrices (diag), cumulative sums and products along axes
  • Extended dtypes: Machine learning-focused types including bfloat16 (brain floating point), complex16, and float8 for reduced-precision training
  • Symbolic shapes: Internal infrastructure for symbolic shape inference to enable dynamic shapes in future releases (not yet exposed in public API)
  • Lazy views: Array views only copy and reorder memory when stride patterns require it, avoiding unnecessary allocations
Rune - Autodiff & JIT

We've continued iterating on Rune's autodiff capabilities, and made progress on upcoming features:

  • Forward-mode AD: Compute Jacobian-vector products (jvp) for forward-mode automatic differentiation, complementing existing reverse-mode
  • JIT: Ongoing development of LLVM-based just-in-time compilation for Rune computations (currently in prototype stage)
  • vmap: Experimental support for vectorized mapping to automatically batch operations (work-in-progress, not yet stable)
  • LLVM backend: Added compilation backend with support for LLVM versions 19, 20, and 21
  • Metal backend: Continued work on GPU acceleration for macOS using Metal compute shaders
Kaun - Deep Learning

We've expanded Kaun with high-level APIs for deep learning. These APIs are inspired by popular Python frameworks like TensorFlow, PyTorch, and Flax, and should feel familiar to users building models in Python:

  • High-level training: Keras-style fit() function to train models with automatic batching, gradient computation, and parameter updates
  • Training state: Encapsulated training state (TrainState) holding parameters, optimizer state, and step count; automatic history tracking of loss and metrics
  • Checkpoints: Save and load model weights to disk for model persistence and transfer learning
  • Metrics: Automatic metric computation during training including accuracy, precision, recall, F1 score, mean absolute error (MAE), and mean squared error (MSE)
  • Data pipeline: Composable dataset operations (map, filter, batch, shuffle, cache) inspired by TensorFlow's tf.data for building input pipelines
  • Model zoo: Reference implementations of classic and modern architectures (LeNet5 for basic CNNs, BERT for masked language modeling, GPT2 for autoregressive generation) including reusable transformer components
  • Ecosystem integration: Load HuggingFace model architectures (kaun.huggingface), access common datasets like MNIST and CIFAR-10 (kaun.datasets), and use standardized model definitions (kaun.models)

Contributors

Thanks to everyone who contributed to this release:

  • @adamchol (Adam Cholewi) - Implemented the initial associative_scan native backend operation for cumulative operations
  • @akshay-gulab (Akshay Gulabrao)
  • @dhruvmakwana (Dhruv Makwana) - Implemented einsum for Einstein summation notation
  • @gabyfle (Gabriel Santamaria) - Built PocketFFT bindings that replaced our custom FFT kernels
  • @lukstafi (Lukasz Stafiniak) - Major contributions to Fehu and FunOCaml workshop on training Sokoban agents
  • @nickbetteridge
  • @sidkshatriya (Sidharth Kshatriya)

1.0.0~alpha0 - 2025-07-05

Initial Alpha Release

We're excited to release the zeroth alpha of Raven, an OCaml machine learning ecosystem bringing modern scientific computing to OCaml.

Added

Core Libraries
  • Nx - N-dimensional array library with NumPy-like API

    • Multi-dimensional tensors with support for several data types.
    • Zero-copy operations: slicing, reshaping, broadcasting
    • Element-wise and linear algebra operations
    • Swappable backends: Native OCaml, C, Metal
    • I/O support for images (PNG, JPEG) and NumPy files (.npy, .npz)
  • Hugin - Publication-quality plotting library

    • 2D plots: line, scatter, bar, histogram, step, error bars, fill-between
    • 3D plots: line3d, scatter3d
    • Image visualization: imshow, matshow
    • Contour plots with customizable levels
    • Text annotations and legends
  • Quill - Interactive notebook environment

    • Markdown-based notebooks with live formatting
    • OCaml code execution with persistent session state
    • Integrated data visualization via Hugin
    • Web server mode for browser-based editing
ML/AI Components
  • Rune - Automatic differentiation and JIT compilation framework

    • Reverse-mode automatic differentiation
    • Functional API for pure computations
    • Basic JIT infrastructure (in development)
  • Kaun - Deep learning framework (experimental)

    • Flax-inspired functional API
    • Basic neural network components
    • Example implementations for XOR and MNIST
  • Sowilo - Computer vision library

    • Image manipulation: flip, crop, color conversions
    • Filtering: gaussian_blur, median_blur
    • Morphological operations and edge detection
Supporting Libraries
  • Nx-datasets - Common ML datasets (MNIST, Iris, California Housing)
  • Nx-text - Text processing and tokenization utilities

Known Issues

This is an alpha release with several limitations:

  • Quill editor has UI bugs being addressed
  • APIs may change significantly before stable release

Contributors

Initial development by the Raven team. Special thanks to all early testers and contributors.

@axrwl @gabyfle @hesterjeng @ghennequin @blueavee

And to our early sponsors:

@daemonfire300 @gabyfle @sabine