package saga

saga

API

Library saga

  • Saga: Fast tokenization and text processing for ML in OCaml.

Library saga.models

Library saga.tokenizers

  • Saga_tokenizers: Tokenizers library for ML text tokenization. This module provides fast, flexible tokenization for machine learning applications. It supports several algorithms, from simple word-level splitting (WordLevel) to subword tokenization such as BPE, Unigram, and WordPiece. The API is designed to match Hugging Face Tokenizers v0.21 as closely as possible, adapted to idiomatic OCaml: functional style, records for configurations, polymorphic variants for enums, default values for optional arguments, and result types for fallible operations. The central type is Tokenizer.t, which represents a configurable tokenization pipeline.
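
The OCaml conventions named above (records for configuration, polymorphic variants for enums, optional arguments with defaults, and result types for fallible operations) can be illustrated with a small standalone sketch. It uses only the OCaml standard library and does not call Saga: the names case, config, make_config, and tokenize are hypothetical and chosen for illustration, so they should not be read as the actual Saga_tokenizers signatures.

```ocaml
(* Hypothetical sketch of the API style only; none of these names are Saga's. *)

(* A polymorphic variant standing in for an "enum" option. *)
type case = [ `Keep | `Lower ]

(* A record holding the configuration. *)
type config = { case : case; separator : char }

(* Optional arguments with defaults; the final unit lets callers omit both. *)
let make_config ?(case = `Lower) ?(separator = ' ') () : config =
  { case; separator }

(* Simple word splitting. Returns a result instead of raising on bad input. *)
let tokenize (cfg : config) (text : string) : (string list, string) result =
  if String.trim text = "" then Error "empty input"
  else
    let normalized =
      match cfg.case with
      | `Keep -> text
      | `Lower -> String.lowercase_ascii text
    in
    let words =
      normalized
      |> String.split_on_char cfg.separator
      |> List.filter (fun w -> w <> "")
    in
    Ok words

let () =
  match tokenize (make_config ~case:`Keep ()) "Hello OCaml" with
  | Ok words -> print_endline (String.concat "|" words)
  | Error msg -> prerr_endline ("tokenize failed: " ^ msg)
```

Returning a result forces the caller to handle the failure case explicitly, and the closed polymorphic variant lets the compiler reject an unsupported option at the call site.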