package carton

Decoder of a PACK file.

Throughout this module, the type ('a, 's) io, indexed by a scheduler 's, is needed by the operations that perform a syscall. To use them, the user must create a new type 's which represents the scheduler. For example, with LWT:

module Lwt_scheduler = Make (Lwt)

let scheduler =
  let open Lwt.Infix in
  let open Lwt_scheduler in
  {
    bind = (fun x f -> inj (x >>= fun x -> prj (f x)));
    return = (fun x -> inj x);
  }

The produced module has two functions, inj and prj, which convert from and to an LWT value. The user can use these functions like this:

let fiber =
  let ( >>= ) = scheduler.bind in
  let return = scheduler.return in

  weight_of_offset scheduler ~map t ~weight:null 0L >>= fun weight ->
  let raw = make_raw ~weight in
  of_offset scheduler ~map t raw ~cursor:0L in
prj fiber ;;
- : (Carton.v, [> error ]) Lwt.t = <abstr>
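The inj/prj mechanism above can be modelled in plain OCaml. The following is a minimal sketch, not Carton's actual code: the definitions of io, scheduler and Make are assumptions written for illustration, and the Id module (an identity "monad") stands in for Lwt so that the example has no dependency.

```ocaml
(* Toy model of a scheduler-indexed [io] type. All definitions here are
   illustrative assumptions, not the definitions used by Carton. *)
type ('a, 's) io (* abstract: an ['a] computed under the scheduler ['s] *)

type 's scheduler = {
  bind : 'a 'b. ('a, 's) io -> ('a -> ('b, 's) io) -> ('b, 's) io;
  return : 'a. 'a -> ('a, 's) io;
}

module type FUNCTOR = sig
  type 'a t
end

module Make (T : FUNCTOR) : sig
  type t (* the brand which represents the scheduler *)

  val inj : 'a T.t -> ('a, t) io
  val prj : ('a, t) io -> 'a T.t
end = struct
  type t

  external inj : 'a T.t -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a T.t = "%identity"
end

(* Identity "monad" used here instead of Lwt. *)
module Id = struct
  type 'a t = 'a
end

module Id_scheduler = Make (Id)

let scheduler : Id_scheduler.t scheduler =
  let open Id_scheduler in
  { bind = (fun x f -> f (prj x)); return = (fun x -> inj x) }

let fiber =
  let ( >>= ) = scheduler.bind in
  scheduler.return 20 >>= fun x -> scheduler.return (x + 22)

(* [prj] gives back a plain [int Id.t], which is just an [int]. *)
let result : int = Id_scheduler.prj fiber
```

The brand type t is never inspected; it only prevents values produced under one scheduler from being mixed with values produced under another.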
module W : sig ... end
type weight = private int

Type of weight. A weight is not the length of an object but the number of bytes needed to extract it.

val null : weight

The zero weight.

val weight_of_int_exn : int -> weight

weight_of_int_exn n is the weight of n.
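Because weight is a private int, it can be read as an int with a coercion but can only be built through the functions of the module. The sketch below reproduces that pattern in a standalone module; the Weight module name and the rejection of negative values are assumptions made for the illustration (the _exn suffix only suggests that invalid input raises).

```ocaml
(* Standalone model of a [private int] weight; not Carton's own module. *)
module Weight : sig
  type weight = private int

  val null : weight
  val weight_of_int_exn : int -> weight
end = struct
  type weight = int

  let null = 0

  (* Assumption: negative values are rejected. *)
  let weight_of_int_exn n =
    if n < 0 then invalid_arg "weight_of_int_exn";
    n
end

(* Reading a weight as an [int] requires an explicit coercion. *)
let as_int = (Weight.weight_of_int_exn 10 :> int)

(* Building one directly is rejected by the type-checker:
   let w : Weight.weight = 10   (does not compile) *)
```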

type ('fd, 's) read = 'fd -> bytes -> off:int -> len:int -> (int, 's) Carton__.Sigs.io

Type of read syscall.

module Idx : sig ... end
module Fp (Uid : sig ... end) : sig ... end
type ('fd, 'uid) t

Type of the state used to access any object in a Carton file.

val header_of_entry : map:'fd W.map -> ('fd, 'uid) t -> int64 -> W.slice -> int * int * int * W.slice

val with_z : Bigstringaf.t -> ('fd, 'uid) t -> ('fd, 'uid) t

with_z new t replaces the temporary buffer used by t with new. When the user extracts an object, the internal temporary buffer is used to store the inflated object. Consequently, running two extractions in parallel or concurrently with the same t is unsafe.

This function therefore lets the user create a new t with a dedicated temporary buffer (physically different from the old one) in order to start a parallel/concurrent process.
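The idea can be illustrated with a toy state record. This is only a sketch: the state type below, its fields and the use of Bytes.t in place of Bigstringaf.t are assumptions and do not reflect the real representation of t.

```ocaml
(* Toy state: a shared resource and a private temporary buffer. *)
type state = { fd : string; z : Bytes.t }

(* Same state, new temporary buffer. *)
let with_z z t = { t with z }

let t0 = { fd = "pack"; z = Bytes.create 4096 }

(* One state per worker: each one inflates into its own buffer. *)
let t1 = with_z (Bytes.create 4096) t0

(* The buffers are physically different, the resource is shared. *)
let distinct_buffers = t0.z != t1.z
let same_resource = t0.fd == t1.fd
```

Writing into t1.z can no longer corrupt an extraction which is still reading t0.z.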

val with_w : 'fd W.t -> ('fd, 'uid) t -> ('fd, 'uid) t

with_w w t replaces the table W.t used by t with w. As with with_z, the purpose of this function is to allow several t to be used in parallel.

val with_allocate : allocate:(int -> De.window) -> ('fd, 'uid) t -> ('fd, 'uid) t

with_allocate allocate t replaces the function used to allocate the window needed to inflate objects with allocate. As with with_z, the purpose of this function is to allow several t to be used in parallel.

val fd : ('fd, 'uid) t -> 'fd

fd t returns the underlying fd resource whose parts are mapped into memory. On Unix, even though a mapped memory region can outlive the closing of fd, the resource should stay open as long as the user extracts objects.

type raw

Type of a Carton object as it is stored in a Carton file.

val make_raw : weight:weight -> raw

make_raw ~weight allocates a raw.

val weight_of_raw : raw -> weight
type v

Type of values.

val v : kind:[ `A | `B | `C | `D ] -> ?depth:int -> Bigstringaf.t -> v

v ~kind ?depth raw is the value raw typed by kind. ?depth is an optional value giving the depth at which the object exists in the PACK file it came from (defaults to 1).

val kind : v -> [ `A | `B | `C | `D ]

kind v is the type of the object v.

val raw : v -> Bigstringaf.t

raw v is the contents of the object v.

Note. The Bigstringaf.t can be larger than len v (see len) and contain extra contents. The user should apply Bigstringaf.sub to it with the real length of the object.
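The note above can be illustrated with the standard Bigarray module, since a Bigstringaf.t is a one-dimensional Bigarray of chars. In this sketch, buffer and real_len are hypothetical stand-ins for raw v and len v, and Bigarray.Array1.sub plays the role of Bigstringaf.sub.

```ocaml
(* A 16-byte buffer holding a 5-byte object followed by garbage. *)
let buffer = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 16
let () = Bigarray.Array1.fill buffer '#'

let payload = "hello"
let () = String.iteri (fun i c -> buffer.{i} <- c) payload

(* Stand-in for [len v]. *)
let real_len = String.length payload

(* Restrict the view to the real contents; no copy is made. *)
let contents = Bigarray.Array1.sub buffer 0 real_len

let to_string ba = String.init (Bigarray.Array1.dim ba) (fun i -> ba.{i})

let whole = to_string buffer (* object followed by extra bytes *)
let exact = to_string contents (* only the object *)
```

With Bigstringaf itself the equivalent call would take labelled arguments, for example Bigstringaf.sub (raw v) ~off:0 ~len:(len v).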

val len : v -> int

len v is the length of the object v.

val depth : v -> int

depth v is the depth of the object in the PACK file it came from.

val copy : ?flip:bool -> ?weight:weight -> v -> v

copy v creates a fresh new object which is equal to the given v.

val make : 'fd -> ?sector:int64 -> z:Zl.bigstring -> allocate:(int -> Zl.window) -> uid_ln:int -> uid_rw:(string -> 'uid) -> ('uid -> int64) -> ('fd, 'uid) t

make fd ~z ~allocate ~uid_ln ~uid_rw where returns a state associated with fd, which is the user-defined representation of a Carton file. Some information is needed:

  • z is an underlying buffer used to inflate an object.
  • allocate is an allocator of underlying window used to inflate an object.
  • uid_ln is the length of raw representation of user-defined uid.
  • uid_rw is the cast-function from a string to user-defined uid.
  • where is the function which associates a uid with an offset in the associated Carton file.

Each argument depends on what the user wants. For example, if t is used by Verify.verify, allocate must be thread-safe according to IO. where is not used by Verify.verify. uid_ln and uid_rw depend on the Carton file associated with fd. Each function below describes precisely what it does with t.

Weight of object.

Before extracting an object, we must know the resources needed to extract it. weight_of_offset/weight_of_uid perform a simple analysis and return the largest length needed to store the requested object, such as:

weight_of_offset unix ~map t ~weight:null 0L >>= fun weight ->
assert ((null :> int) <= (weight :> int)) ;
Fmt.epr "Object at %08Lx needs %d byte(s).\n%!" 0L (weight :> int) ;
let resource = make_raw ~weight in
...

An object can need another object (see OBJ_OFS_DELTA and OBJ_REF_DELTA). In this case, the resource must be large enough to store both objects. The analysis is therefore recursive over the delta-chain.

Note. If the given PACK file represented by t is bad, Cycle is raised. It means that an object A refers to an object B which refers back to the object A.

Note. This process is not tail-recursive; at each step it discovers whether it needs to continue along the delta-chain.
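The recursive analysis and the cycle detection can be sketched on a toy PACK representation. Everything below is an assumption made for illustration: the entry type, the association list used as a PACK file and the rule "the weight is the largest length met along the delta-chain" only model the description above and are not Carton's implementation.

```ocaml
exception Cycle

(* Hypothetical entry: a base object, or a delta over another offset. *)
type entry =
  | Base of int (* inflated length *)
  | Delta of int * int64 (* inflated length, offset of the source *)

(* [pack] maps offsets to entries. The result is never lower than the
   given [weight]. Not tail-recursive, like the process it models. *)
let rec weight_of_offset pack ~weight ?(visited = []) offset =
  if List.mem offset visited then raise Cycle;
  match List.assoc offset pack with
  | Base len -> max weight len
  | Delta (len, source) ->
      let visited = offset :: visited in
      max len (weight_of_offset pack ~weight ~visited source)

let pack = [ (12L, Base 100); (40L, Delta (120, 12L)); (90L, Delta (80, 40L)) ]

(* The chain 90 -> 40 -> 12 needs room for its largest member. *)
let weight = weight_of_offset pack ~weight:0 90L

(* A bad PACK file where two deltas refer to each other. *)
let bad = [ (1L, Delta (10, 2L)); (2L, Delta (10, 1L)) ]

let cyclic =
  match weight_of_offset bad ~weight:0 1L with
  | _ -> false
  | exception Cycle -> true
```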

exception Cycle
val weight_of_offset : map:'fd W.map -> ('fd, 'uid) t -> weight:weight -> ?visited:int64 list -> int64 -> weight

weight_of_offset sched ~map t ~weight offset returns the weight of the given object available at offset in t. This function ensures the following property:

weight_of_offset sched ~map t ~weight:a offset >>= fun b ->
assert ((a :> int) <= (b :> int))

Note. This function may partially inflate objects. It can therefore use internal buffers and is not thread-safe.

Note. This function may look up another object if it extracts an OBJ_REF_DELTA object. However, if we suppose that we process a PACKv2, an OBJ_REF_DELTA usually points to an external object (see thin-pack).

val weight_of_uid : map:'fd W.map -> ('fd, 'uid) t -> weight:weight -> ?visited:int64 list -> 'uid -> weight

weight_of_uid sched ~map t ~weight uid returns the weight of the given object identified by uid in t. This function makes the same assumptions as weight_of_offset.

Note. Like weight_of_offset, this function can inflate objects and use internal buffers, and it is not thread-safe.

Note. Unlike weight_of_offset, this function looks up the object from the given reference.

val length_of_offset : map:'fd W.map -> ('fd, 'uid) t -> int64 -> int

Value of object.

val of_offset : map:'fd W.map -> ('fd, 'uid) t -> raw -> cursor:int64 -> v

of_offset sched ~map t raw ~cursor is the object at the offset cursor in t. The function is not tail-recursive. It discovers at each step whether the object depends on another one (see OBJ_REF_DELTA or OBJ_OFS_DELTA).

Note. This function does not allocate large resources (or, at least, only the allocate function given to t is able to allocate a large resource). raw (which should be created with the associated weight given by weight_of_offset) is enough to extract the object.

val of_uid : map:'fd W.map -> ('fd, 'uid) t -> raw -> 'uid -> v

Like of_offset, of_uid sched ~map t raw uid is the object identified by uid in t.

Path of object.

Because of_offset/of_uid are not tail-recursive, another solution exists to extract an object from the PACK file. However, this solution requires a piece of meta-data, the path, to be able to extract an object.

A path is the delta-chain of the object. It assumes that a delta-chain can not be longer than 60 (see Git assumptions). From it, the way to construct an object is well-known and the step that discovers whether an object depends on another one is removed, which ensures that the reconstruction is bounded by our path.

This solution fits well when we want to memoize the extraction.
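The two phases (compute the path once, then rebuild along it) can be sketched on a toy PACK representation. The entry type, the "patch appends a suffix" rule and the association list are assumptions made for illustration only; real deltas and Carton's path type are more involved.

```ocaml
(* Hypothetical entry: a base object, or a patch over another offset
   which appends a suffix to its source. *)
type entry = Base of string | Patch of int64 * string

(* Discovery phase: follow the delta-chain from [cursor] to its base.
   The result starts at the requested object and ends at the base. *)
let path_of_offset pack cursor =
  let rec go acc cursor =
    if List.length acc >= 60 then failwith "delta-chain too long";
    match List.assoc cursor pack with
    | Base _ -> List.rev (cursor :: acc)
    | Patch (source, _) -> go (cursor :: acc) source
  in
  go [] cursor

(* Reconstruction phase: a fold from the base to the requested object.
   The number of steps is the length of the path, nothing is discovered. *)
let of_offset_with_path pack path =
  List.fold_left
    (fun acc offset ->
      match List.assoc offset pack with
      | Base contents -> contents
      | Patch (_, suffix) -> acc ^ suffix)
    "" (List.rev path)

let pack = [ (10L, Base "a"); (20L, Patch (10L, "b")); (30L, Patch (20L, "c")) ]
let path = path_of_offset pack 30L
let value = of_offset_with_path pack path
```

Since the path is plain data, it can be stored and reused, which is what makes memoized extraction convenient.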

type path

The type of paths.

val path_to_list : path -> int64 list

path_to_list path returns the delta-chain of the given path.

val kind_of_path : path -> [ `A | `B | `C | `D ]

kind_of_path path returns the kind of the object associated with the given path. The PACK format assumes that all the objects of a delta-chain have the same type/kind.

val path_of_offset : map:'fd W.map -> ('fd, 'uid) t -> cursor:int64 -> path

path_of_offset sched ~map t ~cursor is the path of the given object available at cursor.

Note. This function may partially inflate objects. It can therefore use internal buffers and is not thread-safe.

Note. This function may look up another object if it extracts an OBJ_REF_DELTA object. However, if we suppose that we process a PACKv2, an OBJ_REF_DELTA usually points to an external object (see thin-pack).

val path_of_uid : map:'fd W.map -> ('fd, 'uid) t -> 'uid -> path

path_of_uid sched ~map t uid is the path of the given object identified by uid in t.

Note. Like path_of_offset, this function can inflate objects and use internal buffers, and it is not thread-safe.

Note. Unlike path_of_offset, this function looks up the object from the given reference.

val of_offset_with_path : map:'fd W.map -> ('fd, 'uid) t -> path:path -> raw -> cursor:int64 -> v

of_offset_with_path sched ~map t ~path raw ~cursor is the object available at cursor in t. This function is tail-recursive and bounded by the given path.

val of_offset_with_source : map:'fd W.map -> ('fd, 'uid) t -> v -> cursor:int64 -> v

of_offset_with_source ~map t source ~cursor is the object available at cursor in t. This function is tail-recursive and uses the given source if the requested object is a patch.

Uid of object.

The unique identifier of an object is a user-defined type which is not described by the format of the PACK file. The way to digest an object is therefore at the user's discretion. For example, Git prepends a header to the value, such as:

let digest v =
  let kind = match kind v with
    | `A -> "commit"
    | `B -> "tree"
    | `C -> "blob"
    | `D -> "tag" in
  let hdr = Fmt.str "%s %d\000" kind (len v) in
  let ctx = Digest.empty in
  feed_string ctx hdr ;
  feed_bigstring ctx (Bigstringaf.sub (raw v) 0 (len v)) ;
  finalize ctx

Of course, the user can decide how to digest a value (see digest). However, two objects with the same contents but different types should have different unique identifiers.
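The requirement that the kind takes part in the identifier can be checked with a small standalone digest. This sketch uses MD5 from the standard Digest module purely as a stand-in hash, and works on a plain string instead of a Bigstringaf.t; both choices are assumptions made to keep the example free of dependencies.

```ocaml
(* Git-like digest: hash a "<kind> <length>\000" header, then the contents.
   MD5 (Stdlib.Digest) is only a stand-in for the real hash function. *)
let digest ~kind contents =
  let kind =
    match kind with
    | `A -> "commit"
    | `B -> "tree"
    | `C -> "blob"
    | `D -> "tag"
  in
  let hdr = Printf.sprintf "%s %d\000" kind (String.length contents) in
  Digest.string (hdr ^ contents)

(* Same contents, different kinds: the identifiers differ. *)
let as_commit = digest ~kind:`A "same contents"
let as_blob = digest ~kind:`C "same contents"
let kinds_are_distinguished = not (Digest.equal as_commit as_blob)

(* Same contents, same kind: the identifier is stable. *)
let stable = Digest.equal as_blob (digest ~kind:`C "same contents")
```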

type 'uid digest = kind:[ `A | `B | `C | `D ] -> ?off:int -> ?len:int -> Bigstringaf.t -> 'uid
val uid_of_offset : map:'fd W.map -> digest:'uid digest -> ('fd, 'uid) t -> raw -> cursor:int64 -> [ `A | `B | `C | `D ] * 'uid
val uid_of_offset_with_source : map:'fd W.map -> digest:'uid digest -> ('fd, 'uid) t -> kind:[ `A | `B | `C | `D ] -> raw -> depth:int -> cursor:int64 -> 'uid
type 'uid children = cursor:int64 -> uid:'uid -> int64 list
type where = cursor:int64 -> int
type 'uid oracle = {
  digest : 'uid digest;
  children : 'uid children;
  where : where;
  weight : cursor:int64 -> weight;
}

Verify.

When the user gets a PACK file, they must generate an IDX file (see Idx) from it to be able to look up objects from their uid. Verify is a process which tries to create an OCaml representation of the IDX file. This process requires some information (see oracle) which can be collected by a first analysis (see Fp). The process then takes the opportunity to parallelize extraction (depending on the IO implementation).

module Verify (Uid : sig ... end) (Scheduler : sig ... end) (IO : sig ... end) : sig ... end
module Ip (Scheduler : sig ... end) (IO : sig ... end) (Uid : sig ... end) : sig ... end