Module Zipc

ZIP archives.

Consult the quick start and limitations.

References.

Archive members

type compression =
  | Bzip2
  | Deflate  (* Via Zipc_deflate. *)
  | Lzma
  | Stored  (* No compression. *)
  | Xz
  | Zstd
  | Other of int

The type for compression formats.

Zipc only handles Stored and Deflate, but third-party libraries can be used to support other formats or to plug in an alternate implementation of Deflate.
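
As an illustration, a caller might want to know up front whether a format can be handled by Zipc alone. The following is a minimal sketch; the helper name zipc_handles is hypothetical and only the constructors of compression come from the type above.

```ocaml
(* Hypothetical helper: true iff Zipc itself handles the format,
   false if a third-party library would be needed. *)
let zipc_handles : Zipc.compression -> bool = function
  | Zipc.Stored | Zipc.Deflate -> true
  | Zipc.Bzip2 | Zipc.Lzma | Zipc.Xz | Zipc.Zstd | Zipc.Other _ -> false

let () =
  assert (zipc_handles Zipc.Stored);
  assert (not (zipc_handles (Zipc.Other 99)))
```

Spelling out every constructor rather than using a wildcard lets the compiler flag the match if the type ever gains a case.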

val pp_compression : Format.formatter -> compression -> unit

pp_compression formats compression formats.

module Fpath : sig ... end

File paths and modes.

module Ptime : sig ... end

POSIX time.

module File : sig ... end

Archive file data.

module Member : sig ... end

Archive members.

Archives

type t

The type for ZIP archives.

val empty : t

empty is an empty archive.

val is_empty : t -> bool

is_empty z is true iff z is empty.

val mem : Fpath.t -> t -> bool

mem p z is true iff z has a member with path p.

val find : Fpath.t -> t -> Member.t option

find p z is the member with path p of z (if any).

val fold : (Member.t -> 'a -> 'a) -> t -> 'a -> 'a

fold f z acc folds f over the members of z starting with acc in increasing lexicographic member path order. In particular this means that directory members, if they exist, are folded over before any of their content (assuming paths without relative segments).
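
For instance, fold can be used to collect the member paths of an archive. This is a sketch only: the helper name member_paths is hypothetical, and it relies on Member.path, which this page mentions but does not detail.

```ocaml
(* Hypothetical helper: the member paths of [z] in the order in which
   fold visits them (increasing lexicographic path order). *)
let member_paths (z : Zipc.t) : Zipc.Fpath.t list =
  (* Consing builds the list in reverse visiting order, so it is
     reversed once at the end. *)
  List.rev (Zipc.fold (fun m acc -> Zipc.Member.path m :: acc) z [])

let () = assert (member_paths Zipc.empty = [])
```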

val add : Member.t -> t -> t

add member z is z with member added. Overrides a previous member with the same path in z (if any).

val remove : Fpath.t -> t -> t

remove p z is z with the member with path p removed (if any).

val member_count : t -> int

member_count z is the number of members in z.
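
The query functions above can be exercised on the empty archive without constructing any member. The sketch below assumes that Fpath.t is a plain string, which the "mimetype" default of to_binary_string and the String-keyed maps on this page suggest but which is not stated here.

```ocaml
let () =
  let z = Zipc.empty in
  assert (Zipc.is_empty z);
  assert (Zipc.member_count z = 0);
  (* Assumption: Zipc.Fpath.t is a string. *)
  assert (not (Zipc.mem "mimetype" z));
  assert (Option.is_none (Zipc.find "mimetype" z));
  (* Removing a path that is not in the archive leaves it empty. *)
  assert (Zipc.is_empty (Zipc.remove "mimetype" z))
```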

val to_string_map : t -> Member.t Map.Make(String).t

to_string_map z is z as a map from member paths (see Member.path) to members.

val of_string_map : Member.t Map.Make(String).t -> t

of_string_map map is map as a ZIP archive.

Warning. It is assumed that in map each key k maps to a member m with Member.path m = k. This is not checked by the function.
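
Since the invariant is not checked, callers that build maps by hand may want to verify it themselves. The following is a minimal sketch of such a check; the names Smap and of_string_map_checked are hypothetical, and the code assumes that Member.path returns a string.

```ocaml
module Smap = Map.Make (String)

(* Hypothetical wrapper: verifies that every key equals the path of the
   member it maps to before calling Zipc.of_string_map. *)
let of_string_map_checked (map : Zipc.Member.t Smap.t)
  : (Zipc.t, string) result
  =
  let mismatch k m = not (String.equal (Zipc.Member.path m) k) in
  match Smap.choose_opt (Smap.filter mismatch map) with
  | Some (k, _) ->
      Error (Printf.sprintf "key %S does not match its member path" k)
  | None -> Ok (Zipc.of_string_map map)

let () =
  match of_string_map_checked Smap.empty with
  | Ok z -> assert (Zipc.is_empty z)
  | Error e -> prerr_endline e
```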

Decode

val string_has_magic : string -> bool

string_has_magic s is true iff s has at least 4 bytes and starts with PK\x03\x04 or PK\x05\x06 (empty archive).

val of_binary_string : string -> (t, string) result

of_binary_string s decodes a ZIP archive from s.

Note. The integrity constraints of ZIP archives are unclear. For now, based on sanity and on certain archives found in the wild that the unzip tool supports, the following is done:

  • As a rule of thumb, all member metadata is determined only from the archive's central directory file header; local file headers and data descriptors are ignored.
  • If a directory member pretends to have file data this data is ignored.
  • If a path is defined more than once, the second definition takes over.
  • If the central directory CRC-32 of a file member is 0, we look up and use the value found in its local file header.
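
A typical decoding path checks the magic number before attempting a full decode. The sketch below combines string_has_magic and of_binary_string; the helper name count_members and its error message are hypothetical.

```ocaml
(* Hypothetical helper: decodes [s] and reports its number of members,
   rejecting inputs that do not start with a ZIP magic number. *)
let count_members (s : string) : (int, string) result =
  if not (Zipc.string_has_magic s)
  then Error "not a ZIP archive (bad magic)"
  else Result.map Zipc.member_count (Zipc.of_binary_string s)

let () =
  match count_members "plain text" with
  | Ok n -> Printf.printf "%d members\n" n
  | Error e -> prerr_endline e
```

The magic check is only a cheap filter: an input that passes it can still fail to decode, in which case the error of of_binary_string is returned.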

Encode

val encoding_size : t -> int

encoding_size z is the number of bytes needed to encode z.

val to_binary_string : ?first:Fpath.t -> t -> (string, string) result

to_binary_string z is the encoding of archive z. Error _ is returned with a suitable error message in case z has more members than Member.max.

If a member with path first exists in z then this member's data is written first in the ZIP archive. It defaults to "mimetype" to support the EPUB OCF ZIP container constraint (you are however in charge of making sure this member is not compressed in this case).

Note.

  • Member.mtime values that are before Ptime.dos_epoch are silently truncated to that date.
  • Except for first, member data is encoded in the (deterministic) increasing lexical order of their path.
  • The encoding does not use data descriptors, so bit 3 of File.gp_flags is always set to 0 on encoding.
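
As a small sanity check of the encoder, the empty archive can be encoded and decoded again. This sketch uses only functions documented on this page; that the empty archive round-trips in this way is an expectation derived from the descriptions above rather than a documented guarantee.

```ocaml
let () =
  match Zipc.to_binary_string Zipc.empty with
  | Error e -> prerr_endline e
  | Ok s ->
      (* The encoding has the size announced by encoding_size. *)
      assert (String.length s = Zipc.encoding_size Zipc.empty);
      (* It starts with a ZIP magic number. *)
      assert (Zipc.string_has_magic s);
      (* Decoding it is expected to give back an empty archive. *)
      (match Zipc.of_binary_string s with
       | Ok z -> assert (Zipc.is_empty z)
       | Error e -> prerr_endline e)
```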

val write_bytes : ?first:Fpath.t -> t -> ?start:int -> bytes -> (unit, string) result

write_bytes z ~start b writes the encoding to_binary_string z to b starting at start (defaults to 0).

Raises Invalid_argument if b is too small.
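
To avoid the Invalid_argument case, the buffer can be sized with encoding_size before writing. A minimal sketch, with the hypothetical helper name to_bytes:

```ocaml
(* Hypothetical helper: encodes [z] into a freshly allocated buffer
   that has exactly the size the encoding needs. *)
let to_bytes (z : Zipc.t) : (bytes, string) result =
  let b = Bytes.create (Zipc.encoding_size z) in
  Result.map (fun () -> b) (Zipc.write_bytes z ~start:0 b)

let () =
  match to_bytes Zipc.empty with
  | Ok b -> Printf.printf "wrote %d bytes\n" (Bytes.length b)
  | Error e -> prerr_endline e
```

When writing at a non-zero start into a larger buffer, the buffer needs at least start + encoding_size z bytes.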

Limitations

Up to the limitations listed below Zipc is suitable for the following:

  • Reading and writing the subset of ZIP archives defined by ISO/IEC 21320-1, which is used as a document container for the Office Open XML or OpenDocument file formats. This subset mandates only the stored or deflate compression formats.
  • Reading and writing the EPUB file format, which loosely refers to the previous standard in its definition. These may however be ZIP64 if needed (see below).
  • Reading and writing dozens of other formats that are based on ZIP, like .jar, .usdz (mandates no compression), .kmz, etc. Note that these formats do not always formally restrict the compression formats, but deflate seems to be widely used.

It is not the aim of Zipc to be able to read every ZIP archive out there. The format is quite loose, highly denormalized, has plenty of ways to encode metadata and allows many modern and legacy compression algorithms to be used. Hence take into account the following points:

  • The current implementation is simple: it needs the whole archive in memory for encoding or decoding.
  • The current implementation does not preserve the information about the order of files in the ZIP archive and generally writes members in the lexicographic order of their path, save for the first one, which can be specified with the optional argument first in Zipc.to_binary_string and defaults to "mimetype". This supports the EPUB OCF ZIP container constraint, which is the only format we are aware of that mandates an ordering in ZIP archives. A more general scheme (e.g. a Zipc.Member.order property) could be devised should that be needed.
  • It handles only deflate and stored (no compression) compression formats. It has decent performance but if you find yourself limited by it or need other formats, third-party compression libraries can be easily integrated.
  • It is possible to rewrite an archive without touching or decompressing some of its members, however some metadata like comment fields may be lost in the process. See also of_binary_string.
  • For now it does not handle ZIP64. ZIP64 is needed if your ZIP archive or decompressed file sizes exceed 4GB (2^32-1 bytes) or if you need more than 65535 archive members.
  • It does not handle encrypted ZIP archives. Most standards avoid this anyway.
  • It does not handle multipart archives. Most standards avoid this anyway.
  • On 32-bit platforms one is severely limited by Sys.max_string_size.
  • Compressed and decompressed sizes are uint32 values in ZIP archives but are represented by an OCaml int in Zipc. This is not a problem on 64-bit platforms but can be on 32-bit platforms and js_of_ocaml, where Int.max_int is respectively 2^30-1 and 2^31-1. See Zipc.File.max_size for more information.
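
The last point can be checked on a given platform with the standard library alone. The sketch below does not use Zipc; the name int_holds_uint32 is hypothetical.

```ocaml
(* A uint32 size can be as large as 2^32-1, which needs an int of at
   least 33 bits, one of them being the sign bit. Sys.int_size is 63 on
   64-bit platforms, 31 on 32-bit platforms and 32 under js_of_ocaml. *)
let int_holds_uint32 = Sys.int_size >= 33

let () =
  Printf.printf "int has %d bits, max_int = %d: %s\n" Sys.int_size max_int
    (if int_holds_uint32
     then "every uint32 size is representable"
     else "large uint32 sizes are not representable")
```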