package saga

Module Saga_tokenizers.Decoders

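Decoding module for converting tokens back to text. Decoders operate on token strings (see decode below), so mapping token IDs to tokens is a separate step.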

type t

Main decoder type

Constructors

val bpe : ?suffix:string -> unit -> t

Create a BPE decoder.

  • parameter suffix

    Suffix to remove (default: "")
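The following is a minimal stand-alone sketch of one common BPE decoding rule, not the Saga_tokenizers implementation. It assumes, as in Hugging Face tokenizers (where the default suffix is "</w>"), that the suffix marks the end of a word: it becomes a space, except in the last token, where it is simply dropped. The helpers replace_all and bpe_decode are hypothetical.

```ocaml
(* Hypothetical helper: replace every non-overlapping occurrence of [pattern]
   in [s] with [by]. An empty pattern leaves [s] unchanged. *)
let replace_all ~pattern ~by s =
  let plen = String.length pattern and slen = String.length s in
  if plen = 0 then s
  else begin
    let buf = Buffer.create slen in
    let rec go i =
      if i > slen - plen then Buffer.add_string buf (String.sub s i (slen - i))
      else if String.sub s i plen = pattern then begin
        Buffer.add_string buf by;
        go (i + plen)
      end
      else begin
        Buffer.add_char buf s.[i];
        go (i + 1)
      end
    in
    go 0;
    Buffer.contents buf
  end

(* Sketch (assumption): the suffix ends a word, so it becomes a space,
   except in the final token, where it is removed outright. *)
let bpe_decode ?(suffix = "") tokens =
  let n = List.length tokens in
  tokens
  |> List.mapi (fun i tok ->
         replace_all ~pattern:suffix ~by:(if i = n - 1 then "" else " ") tok)
  |> String.concat ""

(* prints "hello world" *)
let () = print_endline (bpe_decode ~suffix:"</w>" [ "hel"; "lo</w>"; "world</w>" ])
```

With the empty default suffix this sketch leaves tokens unchanged and just concatenates them.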

val byte_level : unit -> t

Create a byte-level decoder, which maps byte-level symbols back to the bytes they encode.
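A sketch of how a GPT-2-style byte-level decoder can work, assuming this module follows the usual GPT-2 byte-to-unicode table, in which every byte is represented by a printable code point (for example, the space byte 0x20 becomes 'Ġ', U+0120). Decoding inverts the table and reassembles the bytes. The names unicode_to_byte, code_points and byte_level_decode are hypothetical, and the UTF-8 reader only handles the one- and two-byte sequences this table produces.

```ocaml
(* Bytes that the GPT-2 byte-to-unicode table maps to themselves. *)
let is_direct b =
  (b >= 0x21 && b <= 0x7E) || (b >= 0xA1 && b <= 0xAC) || (b >= 0xAE && b <= 0xFF)

(* Inverse table: code point -> original byte. Bytes that are not mapped
   directly get code points 256, 257, ... in increasing byte order. *)
let unicode_to_byte : (int, int) Hashtbl.t =
  let tbl = Hashtbl.create 256 in
  let next = ref 256 in
  for b = 0 to 255 do
    if is_direct b then Hashtbl.replace tbl b b
    else begin
      Hashtbl.replace tbl !next b;
      incr next
    end
  done;
  tbl

(* Decode a UTF-8 string into code points. Only one- and two-byte
   sequences are needed, since the table never goes above U+0143. *)
let code_points s =
  let len = String.length s in
  let rec go i acc =
    if i >= len then List.rev acc
    else
      let c = Char.code s.[i] in
      if c < 0x80 then go (i + 1) (c :: acc)
      else if c land 0xE0 = 0xC0 && i + 1 < len then
        let cp = ((c land 0x1F) lsl 6) lor (Char.code s.[i + 1] land 0x3F) in
        go (i + 2) (cp :: acc)
      else failwith "code_points: unsupported UTF-8 sequence"
  in
  go 0 []

(* Map every symbol of every token back to its byte and concatenate. *)
let byte_level_decode tokens =
  let buf = Buffer.create 64 in
  List.iter
    (fun tok ->
      List.iter
        (fun cp ->
          match Hashtbl.find_opt unicode_to_byte cp with
          | Some b -> Buffer.add_char buf (Char.chr b)
          | None -> failwith "byte_level_decode: unknown symbol")
        (code_points tok))
    tokens;
  Buffer.contents buf

(* "\xc4\xa0" is the UTF-8 encoding of U+0120 ('Ġ'); prints "Hello world" *)
let () = print_endline (byte_level_decode [ "Hello"; "\xc4\xa0world" ])
```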

val byte_fallback : unit -> t

Create a byte-fallback decoder, which turns byte tokens (such as <0x41>) back into raw bytes.
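A simplified sketch, assuming the SentencePiece-style convention in which a byte without its own vocabulary entry is emitted as a token of the form <0xHH>. The hypothetical byte_fallback_decode turns such tokens back into raw bytes and passes other tokens through unchanged; unlike the Hugging Face decoder, it does not replace invalid UTF-8 sequences with U+FFFD.

```ocaml
let is_hex = function '0' .. '9' | 'a' .. 'f' | 'A' .. 'F' -> true | _ -> false

(* Parse a token of the form "<0xHH>" into the byte value HH. *)
let byte_of_token tok =
  if
    String.length tok = 6
    && String.sub tok 0 3 = "<0x"
    && tok.[5] = '>'
    && is_hex tok.[3]
    && is_hex tok.[4]
  then int_of_string_opt ("0x" ^ String.sub tok 3 2)
  else None

(* Byte tokens become raw bytes; every other token is copied as-is. *)
let byte_fallback_decode tokens =
  let buf = Buffer.create 64 in
  List.iter
    (fun tok ->
      match byte_of_token tok with
      | Some b -> Buffer.add_char buf (Char.chr b)
      | None -> Buffer.add_string buf tok)
    tokens;
  Buffer.contents buf

(* 0xC3 0xA9 is the UTF-8 encoding of 'é'; prints "café" *)
let () = print_endline (byte_fallback_decode [ "caf"; "<0xC3>"; "<0xA9>" ])
```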

val wordpiece : ?prefix:string -> ?cleanup:bool -> unit -> t

Create a WordPiece decoder.

  • parameter prefix

    Prefix marking subword continuations, removed during decoding (default: "##")

  • parameter cleanup

    Whether to clean up tokenization artifacts (default: true)
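A minimal sketch of WordPiece decoding modelled on Hugging Face tokenizers, which may differ in detail from this module: every token after the first either has its prefix stripped (it continues the previous word) or gets a leading space (it starts a new word). When cleanup is on, some spacing artifacts are removed; the hypothetical cleanup_text covers only a subset of the rules Hugging Face applies, and wordpiece_decode is likewise hypothetical.

```ocaml
(* Hypothetical helper: replace every non-overlapping occurrence of [pattern]
   in [s] with [by]. An empty pattern leaves [s] unchanged. *)
let replace_all ~pattern ~by s =
  let plen = String.length pattern and slen = String.length s in
  if plen = 0 then s
  else begin
    let buf = Buffer.create slen in
    let rec go i =
      if i > slen - plen then Buffer.add_string buf (String.sub s i (slen - i))
      else if String.sub s i plen = pattern then begin
        Buffer.add_string buf by;
        go (i + plen)
      end
      else begin
        Buffer.add_char buf s.[i];
        go (i + 1)
      end
    in
    go 0;
    Buffer.contents buf
  end

let starts_with ~prefix s =
  let lp = String.length prefix in
  String.length s >= lp && String.sub s 0 lp = prefix

(* Subset of the Hugging Face cleanup rules: drop the space before common
   punctuation and a couple of English contractions. *)
let cleanup_text s =
  List.fold_left
    (fun acc (pattern, by) -> replace_all ~pattern ~by acc)
    s
    [ (" .", "."); (" ?", "?"); (" !", "!"); (" ,", ","); (" n't", "n't"); (" 's", "'s") ]

let wordpiece_decode ?(prefix = "##") ?(cleanup = true) tokens =
  let plen = String.length prefix in
  tokens
  |> List.mapi (fun i tok ->
         let tok =
           if i = 0 then tok
           else if starts_with ~prefix tok then
             String.sub tok plen (String.length tok - plen)
           else " " ^ tok
         in
         if cleanup then cleanup_text tok else tok)
  |> String.concat ""

(* prints "unbelievable story." *)
let () = print_endline (wordpiece_decode [ "un"; "##believ"; "##able"; "story"; "." ])
```

With ~cleanup:false the same tokens decode to "unbelievable story ." (note the space before the period).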

val metaspace : ?replacement:char -> ?add_prefix_space:bool -> unit -> t

Create a Metaspace decoder.

  • parameter replacement

    Character that replaced spaces during encoding; the decoder turns it back into spaces (default: '▁')

  • parameter add_prefix_space

    Whether a prefix space was added during encoding (default: true)
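A sketch of Metaspace decoding, assuming the replacement symbol is mapped back to a space and, when add_prefix_space is true, the leading space introduced during encoding is dropped. The hypothetical metaspace_decode takes the replacement as a UTF-8 string, because '▁' (U+2581) occupies three bytes and cannot be written as a single OCaml char.

```ocaml
(* Hypothetical helper: replace every non-overlapping occurrence of [pattern]
   in [s] with [by]. An empty pattern leaves [s] unchanged. *)
let replace_all ~pattern ~by s =
  let plen = String.length pattern and slen = String.length s in
  if plen = 0 then s
  else begin
    let buf = Buffer.create slen in
    let rec go i =
      if i > slen - plen then Buffer.add_string buf (String.sub s i (slen - i))
      else if String.sub s i plen = pattern then begin
        Buffer.add_string buf by;
        go (i + plen)
      end
      else begin
        Buffer.add_char buf s.[i];
        go (i + 1)
      end
    in
    go 0;
    Buffer.contents buf
  end

(* "\xe2\x96\x81" is the UTF-8 encoding of '▁' (U+2581). *)
let metaspace_decode ?(replacement = "\xe2\x96\x81") ?(add_prefix_space = true) tokens =
  let s = String.concat "" (List.map (replace_all ~pattern:replacement ~by:" ") tokens) in
  (* Assumption: undo the single space prepended during encoding. *)
  if add_prefix_space && String.length s > 0 && s.[0] = ' ' then
    String.sub s 1 (String.length s - 1)
  else s

(* prints "Hello world" *)
let () = print_endline (metaspace_decode [ "\xe2\x96\x81Hello"; "\xe2\x96\x81world" ])
```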

val ctc : ?pad_token:string -> ?word_delimiter_token:string -> ?cleanup:bool -> unit -> t

Create a CTC decoder.

  • parameter pad_token

    Padding token (default: "<pad>")

  • parameter word_delimiter_token

    Word delimiter token (default: "|")

  • parameter cleanup

    Whether to clean up tokenization artifacts (default: true)
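A sketch of the standard CTC collapse rule, assuming this decoder follows it: merge runs of identical tokens, then drop padding (blank) tokens, then turn the word delimiter into a space. Deduplicating before removing padding is what lets a pad separate a genuine double letter, as in the "hello" example below. The hypothetical ctc_decode omits the cleanup step.

```ocaml
(* Collapse runs of identical adjacent tokens into one. *)
let rec dedup = function
  | a :: (b :: _ as rest) when a = b -> dedup rest
  | a :: rest -> a :: dedup rest
  | [] -> []

let ctc_decode ?(pad_token = "<pad>") ?(word_delimiter_token = "|") tokens =
  tokens
  |> dedup
  |> List.filter (fun tok -> tok <> pad_token)
  |> List.map (fun tok -> if tok = word_delimiter_token then " " else tok)
  |> String.concat ""

(* The pad between the two "l" runs keeps both letters; prints "hello world" *)
let () =
  print_endline
    (ctc_decode
       [ "h"; "h"; "e"; "l"; "l"; "<pad>"; "l"; "o"; "o"; "|"; "w"; "o"; "r"; "l"; "d" ])
```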

val sequence : t list -> t

Combine multiple decoders into one that applies them in order.
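A sketch of how sequencing can be modelled, assuming (as in Hugging Face tokenizers) that each decoder rewrites a token list and that decoding concatenates whatever list remains at the end. The step type and the two example steps are hypothetical; running them in both orders shows that order matters.

```ocaml
(* Hypothetical model: a decoder step rewrites a list of tokens. *)
type step = string list -> string list

(* Apply steps left to right, feeding each step's output to the next. *)
let sequence (steps : step list) : step =
 fun tokens -> List.fold_left (fun acc f -> f acc) tokens steps

(* Final stage: concatenate whatever tokens remain. *)
let decode (s : step) tokens = String.concat "" (s tokens)

(* Example step: turn '_' into ' ' inside every token. *)
let underscores_to_spaces : step =
  List.map (String.map (fun c -> if c = '_' then ' ' else c))

(* Example step: drop one leading space from the first token only. *)
let strip_first_space : step = function
  | first :: rest when String.length first > 0 && first.[0] = ' ' ->
      String.sub first 1 (String.length first - 1) :: rest
  | tokens -> tokens

(* prints "Hello world" *)
let () =
  print_endline
    (decode (sequence [ underscores_to_spaces; strip_first_space ]) [ "_Hello"; "_world" ])
```

Swapping the two steps yields " Hello world": strip_first_space runs before any space exists, so it has nothing to remove.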

val replace : pattern:string -> content:string -> unit -> t

Create a replace decoder.

  • parameter pattern

    Pattern to match

  • parameter content

    Replacement string
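A sketch assuming pattern is matched as a literal string inside each token; the signature does not say whether regular expressions are supported. A typical use is turning '▁' back into a space. replace_decode is hypothetical.

```ocaml
(* Hypothetical helper: replace every non-overlapping occurrence of [pattern]
   in [s] with [by]. An empty pattern leaves [s] unchanged. *)
let replace_all ~pattern ~by s =
  let plen = String.length pattern and slen = String.length s in
  if plen = 0 then s
  else begin
    let buf = Buffer.create slen in
    let rec go i =
      if i > slen - plen then Buffer.add_string buf (String.sub s i (slen - i))
      else if String.sub s i plen = pattern then begin
        Buffer.add_string buf by;
        go (i + plen)
      end
      else begin
        Buffer.add_char buf s.[i];
        go (i + 1)
      end
    in
    go 0;
    Buffer.contents buf
  end

(* Assumption: the replacement is applied within each token, then tokens are joined. *)
let replace_decode ~pattern ~content tokens =
  String.concat "" (List.map (replace_all ~pattern ~by:content) tokens)

(* prints " Hello world" (with a leading space) *)
let () =
  print_endline
    (replace_decode ~pattern:"\xe2\x96\x81" ~content:" " [ "\xe2\x96\x81Hello"; "\xe2\x96\x81world" ])
```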

val strip : ?left:bool -> ?right:bool -> ?content:char -> unit -> t

Create a strip decoder.

  • parameter left

    Whether to strip from the left (default: false)

  • parameter right

    Whether to strip from the right (default: false)

  • parameter content

    Character to strip (default: ' ')
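A sketch assuming the decoder trims every leading and/or trailing occurrence of content from each token; the documentation does not say whether stripping applies per token or to the joined text, nor how many characters are removed. (The Hugging Face Strip decoder, by contrast, removes a fixed number of characters from each end.) strip_token and strip_decode are hypothetical.

```ocaml
(* Trim [content] from the left and/or right end of one token. *)
let strip_token ~left ~right ~content s =
  let n = String.length s in
  let i = ref 0 and j = ref n in
  if left then while !i < n && s.[!i] = content do incr i done;
  if right then while !j > !i && s.[!j - 1] = content do decr j done;
  String.sub s !i (!j - !i)

(* Defaults mirror the documented ones: nothing is stripped unless asked. *)
let strip_decode ?(left = false) ?(right = false) ?(content = ' ') tokens =
  String.concat "" (List.map (strip_token ~left ~right ~content) tokens)

(* prints "hithere" *)
let () = print_endline (strip_decode ~left:true ~right:true [ "  hi "; " there" ])
```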

val fuse : unit -> t

Create a fuse decoder, which merges all tokens into a single token.
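A sketch assuming each decoder maps a token list to a token list and fuse joins all tokens into one. Fusing matters when a later per-token step, such as trimming, should see the whole text rather than individual pieces. fuse and trim_each are hypothetical.

```ocaml
(* Sketch: fuse merges the whole token list into a single token. *)
let fuse (tokens : string list) : string list = [ String.concat "" tokens ]

(* A per-token step whose result depends on token boundaries. *)
let trim_each (tokens : string list) : string list = List.map String.trim tokens

(* prints "a b": only the outer spaces of the fused text are trimmed *)
let () = print_endline (String.concat "|" (trim_each (fuse [ " a"; " b " ])))

(* prints "a|b": without fusing, the inner space is lost as well *)
let () = print_endline (String.concat "|" (trim_each [ " a"; " b " ]))
```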

Operations

val decode : t -> string list -> string

Decode a list of tokens back into text.

Serialization

val to_json : t -> Yojson.Basic.t

Serialize a decoder to JSON.

val of_json : Yojson.Basic.t -> t

Reconstruct a decoder from its JSON representation.