Representing PDF Files in Memory
A stream is either in memory, or at a position and of a length in a Pdfio.input.
PDF objects. An object is a tree-like structure containing various things. A PDF file is basically a directed graph of objects.
You should not expect to manipulate these types and functions directly.
type objectdata =
| Parsed of pdfobject
| ParsedAlreadyDecrypted of pdfobject
| ToParse
| ToParseFromObjectStream of (int, int list) Stdlib.Hashtbl.t
* int
* int
* (int -> int list -> (int * (objectdata Stdlib.ref * int)) list)
This type represents a possibly-parsed, possibly-decrypted, possibly-read-from-an-object-stream object.
type pdfobjmap = (pdfobjmap_key, objectdata Stdlib.ref * int) Stdlib.Hashtbl.t
The object map maps object numbers (pdfobjmap_key) to a reference to the object data and the generation number.
val pdfobjmap_empty : unit -> pdfobjmap
Make an empty object map.
val pdfobjmap_find : pdfobjmap_key -> pdfobjmap -> objectdata Stdlib.ref * int
Find an object in the object map.
type pdfobjects = {
mutable maxobjnum : int;
mutable parse : (pdfobjmap_key -> pdfobject) option;
mutable pdfobjects : pdfobjmap;
mutable object_stream_ids : (int, int) Stdlib.Hashtbl.t;
}
The objects. Again, you won't normally manipulate this directly. maxobjnum is the biggest object number seen yet. parse is a function to parse an object which is not in an object stream, given its object number. pdfobjects is the object map itself. object_stream_ids is a hash table of (object number, was-stored-in-object-stream-number) pairs, which is used to reconstruct stream objects when preserving them upon write.
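The lazy parsing described above can be sketched in isolation. This is a minimal stand-alone model, not CamlPDF's own code: the types `obj`, `data` and `store` and the functions `add` and `lookup` are hypothetical simplifications, and the object type is reduced to two cases.

```ocaml
(* A stand-alone model of a lazily parsed object map. All names here are
   hypothetical simplifications, not the library's definitions. *)
type obj = Integer of int | Name of string

type data =
  | Parsed of obj (* already parsed *)
  | ToParse       (* still to be read on demand *)

type store = {
  mutable maxobjnum : int;                  (* biggest object number seen *)
  parse : (int -> obj) option;              (* parser, if one is available *)
  objmap : (int, data ref * int) Hashtbl.t; (* number -> (data, generation) *)
}

(* Add an object with generation 0, keeping maxobjnum up to date. *)
let add store num data =
  Hashtbl.replace store.objmap num (ref data, 0);
  store.maxobjnum <- max store.maxobjnum num

(* Look up an object, parsing and caching it if required.
   Raises Not_found if the object number is absent. *)
let lookup store num =
  let cell, _gen = Hashtbl.find store.objmap num in
  match !cell with
  | Parsed o -> o
  | ToParse ->
      (match store.parse with
       | None -> failwith "no parser available"
       | Some p ->
           let o = p num in
           cell := Parsed o;
           o)

(* Count parser calls so that the caching is observable. *)
let parse_calls = ref 0

let store =
  { maxobjnum = 0;
    parse = Some (fun n -> incr parse_calls; Integer (n * 10));
    objmap = Hashtbl.create 16 }

let () =
  add store 1 (Parsed (Name "/Catalog"));
  add store 7 ToParse
```

Because the map stores a reference to the object data, the first lookup of object 7 can overwrite `ToParse` with the parsed result, and later lookups do not call the parser again.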
type saved_encryption = {
from_get_encryption_values : Pdfcryptprimitives.encryption
* string
* string
* int32
* string
* string option
* string option;
encrypt_metadata : bool;
perms : string;
}
type deferred_encryption = {
crypt_type : Pdfcryptprimitives.encryption;
file_encryption_key : string option;
obj : int;
gen : int;
key : int array;
keylength : int;
r : int;
}
type t = {
mutable major : int;
mutable minor : int;
mutable root : int;
mutable objects : pdfobjects;
mutable trailerdict : pdfobject;
mutable was_linearized : bool;
mutable saved_encryption : saved_encryption option;
}
A PDF document: major and minor version numbers, the object number of the root, the objects, and the trailer dictionary as a Dictionary pdfobject.
val empty : unit -> t
The empty document (PDF 1.0, no objects, no root, empty trailer dictionary). Note this is not a well-formed PDF.
This exception (PDFError) is raised when a malformation is found in a PDF. That covers quite a wide range of circumstances, and it may be raised from many functions.
val input_pdferror : Pdfio.input -> string -> string
This function, given a Pdfio.input and an ancillary string, builds an error string which includes the source of the Pdfio.input (filename, string, bytes etc.) so we can trace what it was originally built from.
val getstream : pdfobject -> unit
Get a stream from disc if it hasn't already been got. The input is a Stream pdfobject.
Look up an object in a document, parsing it if required. Raises Not_found if the object does not exist.
lookup_fail errtext doc key dict looks up a key in a PDF dictionary or the dictionary of a PDF stream. Fails with PDFError errtext if the key is not found. Follows indirect object links.
Same, but with customised exception.
lookup_direct doc key dict looks up the key, resolving indirections at source and destination, returning an option type.
lookup_immediate key dict looks up the key, returning the value without following indirects at either source or destination.
lookup_chain doc start keys looks up a sequence of keys through nested dictionaries, beginning at start. For example: lookup_chain pdf pdf.Pdf.trailerdict ["/Root"; "/StructTreeRoot"; "/RoleMap"]
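The difference between the lookup variants above is where indirect references are resolved. The sketch below illustrates this on a reduced, hypothetical object model. The `obj` and `doc` types and the `direct` helper are assumptions made for illustration, and the functions only mirror the documented names; this is not the library's implementation.

```ocaml
(* Hypothetical, reduced object model for illustration only. *)
type obj =
  | Integer of int
  | Dictionary of (string * obj) list
  | Indirect of int

(* The "document" is just a table from object number to object. *)
type doc = (int, obj) Hashtbl.t

(* Follow indirect references until a direct object is reached.
   Raises Not_found on a dangling reference. *)
let rec direct (doc : doc) = function
  | Indirect n -> direct doc (Hashtbl.find doc n)
  | o -> o

(* Resolve the dictionary, look up the key, then resolve the result. *)
let lookup_direct doc key dict =
  match direct doc dict with
  | Dictionary entries ->
      (match List.assoc_opt key entries with
       | Some v -> Some (direct doc v)
       | None -> None)
  | _ -> None

(* No resolution at either end. *)
let lookup_immediate key dict =
  match dict with
  | Dictionary entries -> List.assoc_opt key entries
  | _ -> None

(* Walk a list of keys through nested dictionaries. *)
let rec lookup_chain doc start keys =
  match keys with
  | [] -> Some (direct doc start)
  | k :: rest ->
      (match lookup_direct doc k start with
       | Some next -> lookup_chain doc next rest
       | None -> None)

let doc : doc = Hashtbl.create 16

let () =
  Hashtbl.replace doc 1 (Dictionary [ ("/Pages", Indirect 2) ]);
  Hashtbl.replace doc 2 (Dictionary [ ("/Count", Integer 3) ])

let trailer = Dictionary [ ("/Root", Indirect 1) ]
```

In this model, `lookup_immediate "/Root" trailer` yields the unresolved `Indirect 1`, while `lookup_direct doc "/Root" trailer` yields the dictionary stored as object 1.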
Return the object number of an indirect dictionary object, if it is indirect.
Same as lookup_direct, but allow a second, alternative key.
replace_dict_entry dict key value replaces a dictionary entry, raising Not_found if it's not there.
add_dict_entry dict key value adds a dictionary entry, replacing it if already there.
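The contrast between replacing and adding an entry can be shown on a dictionary modelled as an association list. This is a sketch under that assumption: `obj`, `replace_entry` and `add_entry` are hypothetical stand-ins, and the position at which a new entry is placed is a choice of this example, not something the text above specifies.

```ocaml
(* A dictionary modelled as an association list; hypothetical stand-in. *)
type obj = Integer of int | Dictionary of (string * obj) list

(* Replace an existing entry; raise Not_found if the key is absent. *)
let replace_entry dict key value =
  match dict with
  | Dictionary entries ->
      if not (List.mem_assoc key entries) then raise Not_found;
      Dictionary
        (List.map (fun (k, v) -> if k = key then (k, value) else (k, v)) entries)
  | _ -> invalid_arg "replace_entry: not a dictionary"

(* Add an entry, replacing any existing one with the same key. *)
let add_entry dict key value =
  match dict with
  | Dictionary entries ->
      Dictionary ((key, value) :: List.remove_assoc key entries)
  | _ -> invalid_arg "add_entry: not a dictionary"
```

Both functions return a new dictionary rather than mutating the old one, which matches the immutable association-list model used here.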
Make a PDF object direct -- that is, follow any indirect links.
val objcard : t -> int
Return the size of the object map.
val removeobj : t -> int -> unit
Remove the given object.
Parse a PDF rectangle structure into min x, min y, max x, max y.
val parse_matrix : t -> string -> pdfobject -> Pdftransform.transform_matrix
Calling parse_matrix pdf name dict parses a PDF matrix found under key name in dictionary dict into a Pdftransform.transform_matrix. If there is no matrix, the identity matrix is returned.
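The fallback to the identity matrix can be illustrated as follows. This is a stand-alone sketch: the `obj` and `matrix` types are hypothetical, and the function here takes no document argument because the model has no indirect references to resolve. A PDF matrix is written as an array of six numbers.

```ocaml
(* Hypothetical reduced object model and matrix record. *)
type obj =
  | Real of float
  | Integer of int
  | Array of obj list
  | Dictionary of (string * obj) list

type matrix = { a : float; b : float; c : float; d : float; e : float; f : float }

let identity = { a = 1.; b = 0.; c = 0.; d = 1.; e = 0.; f = 0. }

(* Accept both integer and real numbers. *)
let number = function
  | Real x -> x
  | Integer i -> float_of_int i
  | _ -> failwith "not a number"

(* Read a six-element array under [name]; absent key gives the identity. *)
let parse_matrix name dict =
  match dict with
  | Dictionary entries ->
      (match List.assoc_opt name entries with
       | None -> identity
       | Some (Array [ a; b; c; d; e; f ]) ->
           { a = number a; b = number b; c = number c;
             d = number d; e = number e; f = number f }
       | Some _ -> failwith "malformed matrix")
  | _ -> failwith "not a dictionary"
```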
val make_matrix : Pdftransform.transform_matrix -> pdfobject
Build a matrix pdfobject.
Renumber a number of PDF documents so that no object number is shared between them. They can then be merged etc. without clashes.
val unique_key : string -> pdfobject -> string
Given a dictionary and a prefix (e.g. gs), return a name, starting with the prefix, which is not already in the dictionary (e.g. /gs0).
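One way to find such a name is to try successive numeric suffixes. The sketch below assumes the dictionary is an association list and that the smallest unused suffix is chosen; neither assumption is confirmed by the text above, and the argument types differ from the library's signature.

```ocaml
(* Return "/<prefix><n>" for the smallest n >= 0 not already used as a key.
   The dictionary is modelled as an association list (an assumption). *)
let unique_key prefix entries =
  let rec try_from n =
    let candidate = Printf.sprintf "/%s%d" prefix n in
    if List.mem_assoc candidate entries then try_from (n + 1) else candidate
  in
  try_from 0
```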
Iterate over the objects in a document. The iterating function receives both object number and object from the object map.
Iterate over the objects in a document. The iterating function receives object number, generation number and object from the object map.
Map over all PDF objects in a document. Does not include the trailer dictionary.
Iterate over just the stream objects in a document.
val remove_unreferenced : t -> unit
Garbage-collect a pdf document.
These functions were previously undocumented. They are documented here for now, and in the future will be categorised more sensibly.
val page_reference_numbers : t -> int list
List, in order, the page reference numbers of a PDF's page tree.
val objnumbers : t -> int list
List the object numbers in a PDF.
Use the given function on each element of a PDF dictionary.
Similarly for an Array. The function is applied to each element.
val changes : t -> (int, int) Stdlib.Hashtbl.t
Calculate the changes required to renumber a PDF's objects 1..n.
Renumber an object given a change table.
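Renumbering works in two steps: build a table from old to new object numbers, then rewrite every indirect reference with it. The following is a sketch on a hypothetical reduced object type. It assumes new numbers are assigned in ascending order of the old ones, which the text above does not state, and it takes a plain list of object numbers rather than a document.

```ocaml
(* Hypothetical reduced object model. *)
type obj =
  | Integer of int
  | Array of obj list
  | Dictionary of (string * obj) list
  | Indirect of int

(* Build a table mapping each existing object number to its new number,
   so that the numbers in use become 1..n in ascending order. *)
let changes objnumbers =
  let table = Hashtbl.create 16 in
  List.iteri
    (fun i old -> Hashtbl.replace table old (i + 1))
    (List.sort_uniq compare objnumbers);
  table

(* Rewrite every indirect reference inside an object using the table.
   References absent from the table are left unchanged. *)
let rec renumber table = function
  | Indirect n ->
      Indirect (match Hashtbl.find_opt table n with Some n' -> n' | None -> n)
  | Array items -> Array (List.map (renumber table) items)
  | Dictionary entries ->
      Dictionary (List.map (fun (k, v) -> (k, renumber table v)) entries)
  | o -> o
```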
val bigarray_of_stream : pdfobject -> Pdfio.bytes
Fetch a stream, if necessary, and return its contents (with no processing).
val objects_of_list :
(int -> pdfobject) option ->
(int * (objectdata Stdlib.ref * int)) list ->
pdfobjects
Make an objects entry from a parser and a list of (number, object) pairs.
Calling objects_referenced no_follow_entries no_follow_contains pdf pdfobject finds the objects reachable from the given object. Dictionary keys in no_follow_entries are not explored. Dictionaries containing entries in no_follow_contains are not explored.
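The traversal described above is a graph reachability search with two pruning rules. The sketch below models it on a hypothetical reduced object type, with the document as a plain hash table. It assumes that no_follow_contains holds (key, value) pairs and that dangling references are still reported; both are assumptions of this example rather than documented behaviour.

```ocaml
(* Hypothetical reduced object model. *)
type obj =
  | Integer of int
  | Array of obj list
  | Dictionary of (string * obj) list
  | Indirect of int

(* Collect the numbers of all objects reachable from [start].
   Keys listed in [no_follow_entries] are skipped; a dictionary containing
   any (key, value) pair from [no_follow_contains] is skipped entirely.
   The result is sorted in ascending order. *)
let objects_referenced no_follow_entries no_follow_contains doc start =
  let seen = Hashtbl.create 16 in
  let rec visit = function
    | Indirect n ->
        (* Record each object number once, which also breaks cycles. *)
        if not (Hashtbl.mem seen n) then begin
          Hashtbl.replace seen n ();
          match Hashtbl.find_opt doc n with
          | Some o -> visit o
          | None -> ()
        end
    | Array items -> List.iter visit items
    | Dictionary entries ->
        let blocked =
          List.exists (fun e -> List.mem e entries) no_follow_contains
        in
        if not blocked then
          List.iter
            (fun (k, v) -> if not (List.mem k no_follow_entries) then visit v)
            entries
    | Integer _ -> ()
  in
  visit start;
  List.sort compare (Hashtbl.fold (fun n () acc -> n :: acc) seen [])
```

Excluding a key such as /Parent is the typical use in this model: it stops the search from climbing back up a tree and reaching every sibling.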
Generate an ID for a PDF document given its prospective file name (and using the current date and time). If the file name is blank, the ID is still likely to be unique, being based on date and time only. If environment variable CAMLPDF_REPRODUCIBLE_IDS=true is set, the ID will instead be set to a standard value.
val find_indirect : string -> pdfobject -> int option
Find the indirect reference given by the value associated with a key in a dictionary.
Calling nametree_lookup pdf k dict looks up the name in the document's name tree.
Return an ordered list of the key-value pairs in a given name tree.
val change_id : t -> string -> unit
Change the /ID string in a PDF's trailer dictionary.