Decoder of a PACK file.
This module implements what is needed to decode a PACKv2 file. It is independent of any scheduler. For cooperation issues, we recommend that you refer to the documentation of Cachet, the library used to read a PACK file. More specifically, Carton is based on the use of Unix.map_file (or an equivalent). Access to a block device or file does not block, but it can take time. In short, the cooperation points have to be added by the user: an atomic operation such as reading the PACK file (via Unix.map_file) cannot be interleaved with cooperation points.
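For instance, here is a minimal sketch of how a user could add such cooperation points between two extractions. It assumes Lwt as the user's scheduler (Carton itself does not require it); extract_all, cursors and t are illustrative names, and size_of_offset, Blob.make and of_offset are described below.

open Lwt.Syntax

(* Extraction never yields by itself, so a cooperation point is added
   explicitly between two extractions. [t] is a ['fd Carton.t]. *)
let extract_all t cursors =
  Lwt_list.map_s
    (fun cursor ->
      let size = Carton.size_of_offset t ~cursor in
      let blob = Carton.Blob.make ~size in
      let value = Carton.of_offset t blob ~cursor in
      (* cooperation point added by the user *)
      let* () = Lwt.pause () in
      Lwt.return value)
    cursors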
The module is divided into 3 parts:
- the First_pass module, for analysing a PACK stream. This is useful when the PACK file is transmitted over the network; in such a case, this analysis can be applied.
- the extraction of objects once the PACK file is accessible via the Unix.map_file function.
module H = H
module Zh = Zh
val bigstring_of_string : string -> Cachet.bigstring
module Kind : sig ... end
A PACK file contains several types of object. According to Git, it contains commits (`A), trees (`B), blobs (`C) and tags (`D). Carton is far enough removed from Git to abstract itself from the actual type of these objects.
module Size : sig ... end
The size is a non-negative number corresponding to the size, in bytes, of a Blob in memory.
module Uid : sig ... end
An object can be identified by a unique identifier that needs to be calculated by an algorithm such as a hash algorithm. These identifiers can be used to refer to a possible source when we have a First_pass.kind.Ref entry.
module First_pass : sig ... end
Once it is possible to use Unix.map_file (or an equivalent) on a PACK file (i.e. once it is available in a file system), it is possible to extract all the objects in this PACK file.
Extraction consists of either:
- decompressing a First_pass.kind.Base entry, or
- reconstructing an entry from a patch and its source.
In both cases, we use bigstrings. Their advantage is that they are not relocated by the OCaml GC; their disadvantage is their allocation (via malloc()), which can take a long time.
Memory usage is also a disadvantage. If an object is 1 GB in size, we are obliged to allocate a bigstring of 1 GB (or more). It is not possible to stream-out all objects: only First_pass.kind.Base objects can be streamed-out.
To limit the use of bigstrings, various functions let you know in advance the size of the buffers needed to extract an object (see size_of_offset and size_of_uid below).
As far as patch entries are concerned (First_pass.kind.Ofs and First_pass.kind.Ref), their source can also be an object from a patch which itself requires an object from a patch. This is referred to as the depth of the object in the PACK file. The maximum depth is 50: in other words, it may be necessary to reconstruct 49 objects upstream in order to build the requested object.
The advantage is, of course, the compression ratio. In addition to compressing the entries with zlib, some objects are just patches against other objects. For example, if the PACK file contains a blob with content A and another blob with content A+B, the latter could be a patch containing only +B and requiring our first blob as a source.
For simple use, the user must first calculate the size of the buffers needed to store the object in memory. They then need to allocate a Blob to hold the object. Finally, the object can be reconstructed according to its position (cursor) in the PACK file, or according to its unique identifier if the user has the IDX file that associates the position of an object in the PACK file with its identifier (see Classeur).
let t = Carton.make ~map fd ~z ~allocate ~ref_length where in
let size = Carton.size_of_offset t ~cursor in
let blob = Carton.Blob.make ~size in
Carton.of_offset t blob ~cursor
val make :
?pagesize:int ->
?cachesize:int ->
map:'fd Cachet.map ->
'fd ->
z:Zl.bigstring ->
allocate:(int -> Zl.window) ->
ref_length:int ->
(Uid.t -> int) ->
'fd t
make ~map fd ~z ~allocate ~ref_length where creates a representation of the PACK file whose read access is managed by the map function. A few arguments are required so that Carton does not allocate buffers arbitrarily but gives the user fine-grained control over its allocation policy (since it essentially involves allocating bigstrings).
- z is required to store a deflated entry
- the allocate function is required to obtain the Zl.window needed to inflate entries
- ref_length is the length of the unique identifiers that can be used to refer to patches. In the case of Git, this value is 20 (the size of a SHA1 hash)
- where may be required to find out the position of an object according to its unique identifier (see Classeur)

Note: If where is provided and exhaustive, the *of_uid* functions can be used.
make calls Cachet.make with the cachesize and pagesize arguments. These must be powers of two. For more details about these arguments and map, please refer to the Cachet documentation.
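As an illustration, here is a minimal sketch of how such a value could be built on top of Unix.map_file. It assumes that 'fd Cachet.map has the shape fd -> pos:int -> int -> bigstring and that no IDX file is available (so where simply fails); map, where and pack are illustrative names.

let map fd ~pos len =
  (* assumption: ['fd Cachet.map] is [fd -> pos:int -> int -> bigstring] *)
  let stat = Unix.fstat fd in
  let len = min len (stat.Unix.st_size - pos) in
  Unix.map_file fd ~pos:(Int64.of_int pos) Bigarray.char Bigarray.c_layout
    false [| len |]
  |> Bigarray.array1_of_genarray

let z = De.bigstring_create De.io_buffer_size
let allocate bits = De.make_window ~bits

(* no IDX file: [where] cannot resolve any unique identifier *)
let where _uid = raise Not_found

let pack fd = Carton.make ~map fd ~z ~allocate ~ref_length:20 where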
val of_cache :
'fd Cachet.t ->
z:Zl.bigstring ->
allocate:(int -> Zl.window) ->
ref_length:int ->
(Uid.t -> int) ->
'fd t
of_cache cache ~z ~allocate ~ref_length where is equivalent to make but uses the cache already available and initialised by the user.
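For example, a sketch assuming that Cachet.make accepts the same map, pagesize and cachesize arguments mentioned above (cache is an illustrative name):

let cache = Cachet.make ~map fd
let t = Carton.of_cache cache ~z ~allocate ~ref_length:20 where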
copy t makes a copy of the PACK file representation, which implies a new empty cache and a copy of the internal buffers. In this way, the result of this copy can be used safely in parallel, even if our first value t attempts to extract objects at the same time.
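For instance, a minimal sketch of parallel extraction with OCaml 5 domains (extract, cursors0 and cursors1 are illustrative names):

let extract t cursor =
  let size = Carton.size_of_offset t ~cursor in
  let blob = Carton.Blob.make ~size in
  Carton.of_offset t blob ~cursor

let in_parallel t cursors0 cursors1 =
  (* [Carton.copy] gives [t'] its own cache and buffers, so both domains
     can extract objects at the same time. *)
  let t' = Carton.copy t in
  let d = Domain.spawn (fun () -> List.map (extract t') cursors1) in
  let r0 = List.map (extract t) cursors0 in
  (r0, Domain.join d)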
val fd : 'fd t -> 'fd
fd t returns the file-descriptor given by the user to make the representation of the PACK file t.
val tmp : 'fd t -> De.bigstring
val ref_length : 'fd t -> int
val map : 'fd t -> cursor:int -> consumed:int -> Cachet.Bstr.t
module Blob : sig ... end
The Blob is a tuple of temporary buffers used to store an object that has been decompressed or reconstructed using a patch and a source.
module Visited : sig ... end
size_of_offset pack ?visited ~cursor returns the size of the buffers (see Blob) required to extract the object located at cursor from the PACK file. This does not correspond to the size of the object.
size_of_uid pack ?visited ~uid returns the size of the buffers (see Blob) required to extract the object identified by uid from the PACK file. This does not correspond to the size of the object.
The given pack must be able to recognize the object's position based on its unique identifier. In other words, pack must be constructed with an exhaustive where function for all the identifiers in the PACK file.
actual_size_of_offset pack ~cursor returns the true size of the object located at cursor in the given PACK file pack.
module Value : sig ... end
of_offset pack blob ~cursor is the object at the offset cursor in the given pack.
Note: This function does not allocate large resources (or, at least, only the allocate function given to t is able to allocate a large resource). blob (which should be created with the associated Size.t given by size_of_offset) is enough to extract the object.
Note: This function is not tail-recursive. In other words, it may discover, step by step, the patches needed to rebuild the object. Even though a well-formed PACK file should not contain objects deeper than 50, if you want to rebuild an object and be sure that the function is tail-recursive, you need to calculate its Path.t first.
As with of_offset, of_uid pack blob ~uid is the object identified by uid in the given pack.
The given pack must be able to recognize the object's position based on its unique identifier. In other words, pack must be constructed with an exhaustive where function for all the identifiers in the PACK file.
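A sketch of this uid-based flow, assuming where is exhaustive (built, for example, from an IDX file via Classeur); uid is an illustrative name:

let extract_by_uid t ~uid =
  (* [t] must have been built with an exhaustive [where] function *)
  let size = Carton.size_of_uid t ~uid in
  let blob = Carton.Blob.make ~size in
  Carton.of_uid t blob ~uid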
Because of_offset/of_uid are not tail-recursive, another solution exists to extract an object from the PACK file. However, this solution requires a piece of metadata, a Path.t, to be able to extract an object.
A Path.t is the delta-chain of the object. It assumes that a delta-chain cannot be larger than 50 (see Git's assumptions). From it, the way to construct an object is well known and the step of discovering whether an object depends on another one is removed: we ensure that the reconstruction is bounded by our Path.t.
module Path : sig ... end
of_offset_with_source ~map t ~path source ~cursor is the object available at cursor in t. This function is tail-recursive and uses the given source if the requested object is a patch.
type identify =
| Identify : 'ctx First_pass.identify -> identify
Carton can be asked to calculate the identifier of an object but does not require the algorithm used (SHA1 or SHA256, for example) to be known. It only handles the result of this calculation, which is represented by a Uid.t. For more details on how to implement identify, please refer to what is explained in the first phase of analysing a PACK file. You then simply need to "surround" your value with Carton.Identify to completely abstract the algorithm used to calculate the object identifier.
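For instance, assuming a value sha1_identify : Digestif.SHA1.ctx First_pass.identify has already been built as described in First_pass, wrapping it is just:

(* the hash algorithm is now fully abstracted behind [identify] *)
let identify = Carton.Identify sha1_identify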
type children = cursor:int -> uid:Uid.t -> int list