package camlpdf
Module Pdftext
Parsing fonts and extracting text from content streams and PDF strings
Data Types
type type3_glpyhs = {
  fontbbox : float * float * float * float;
  fontmatrix : Pdftransform.transform_matrix;
  charprocs : (string * Pdf.pdfobject) list;
  type3_resources : Pdf.pdfobject;
}
type fontdescriptor = {
  ascent : float;
  descent : float;
  avgwidth : float;
  maxwidth : float;
  flags : int;
  fontbbox : float * float * float * float;
  italicangle : float;
  capheight : float;
  xheight : float;
  stemv : float;
  fontfile : fontfile option;
  charset : string list option;
  tounicode : (int, string) Hashtbl.t option;
}
type encoding =
  | ImplicitInFontFile
  | StandardEncoding
  | MacRomanEncoding
  | WinAnsiEncoding
  | MacExpertEncoding
  | CustomEncoding of encoding * differences
  | FillUndefinedWithStandard of encoding
type simple_font = {
  fonttype : simple_fonttype;
  basefont : string;
  firstchar : int;
  lastchar : int;
  widths : int array;
  fontdescriptor : fontdescriptor option;
  fontmetrics : fontmetrics option;
  encoding : encoding;
}
type composite_CIDfont = {
  cid_system_info : cid_system_info;
  cid_basefont : string;
  cid_fontdescriptor : fontdescriptor;
  cid_widths : (int * float) list;
  cid_default_width : int;
}
type font =
  | StandardFont of standard_font * encoding
  | SimpleFont of simple_font
  | CIDKeyedFont of string * composite_CIDfont * cmap_encoding
String representations of fonts
val string_of_standard_font : standard_font -> string
Returns a string such as "Times-Bold" for Pdftext.TimesBold etc.
val standard_font_of_name : string -> standard_font option
Parses a string such as "/Times-Bold" or "/TimesNewRoman,Bold" to Pdftext.TimesBold etc.
val string_of_font : font -> string
A debug string for the whole font datatype.
Reading a Font
val read_font : Pdf.t -> Pdf.pdfobject -> font
Read a font from a given document and object
Writing a Font
Write a font to a given document, returning the object number for the main font dictionary
Utility functions
Is a PDF string UTF16BE (i.e. does it have a byte order mark at the beginning)?
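The check described above can be illustrated with a minimal, standard-library-only sketch. The name has_utf16be_bom is hypothetical and is not part of camlpdf; the sketch only assumes that the byte order mark is the two bytes 0xFE 0xFF at the start of the string.

```ocaml
(* Hypothetical helper, not part of camlpdf: true when the string starts
   with the UTF-16BE byte order mark, i.e. the bytes 0xFE 0xFF. *)
let has_utf16be_bom (s : string) : bool =
  String.length s >= 2 && s.[0] = '\xFE' && s.[1] = '\xFF'
```

A string shorter than two bytes can never carry the mark, so the length test comes first and also guards the two index operations.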
val is_identity_h : font -> bool
Is a font Identity H?
A UTF16BE string for a list of unicode codepoints (with BOM)
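As a rough illustration of what building such a string involves, the sketch below uses only the OCaml standard library (Buffer.add_utf_16be_uchar, available from OCaml 4.06). The name utf16be_of_codepoints is hypothetical and this is not camlpdf's own implementation.

```ocaml
(* Hypothetical helper, not part of camlpdf: encode a list of Unicode
   codepoints as UTF-16BE, prefixed with the byte order mark U+FEFF.
   Uchar.of_int raises Invalid_argument on values that are not Unicode
   scalar values (for example lone surrogates). *)
let utf16be_of_codepoints (codepoints : int list) : string =
  let buf = Buffer.create 16 in
  Buffer.add_utf_16be_uchar buf (Uchar.of_int 0xFEFF);
  List.iter
    (fun cp -> Buffer.add_utf_16be_uchar buf (Uchar.of_int cp))
    codepoints;
  Buffer.contents buf
```

Codepoints above U+FFFF are written as surrogate pairs by the standard library, so the caller does not need to split them by hand.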
Text from strings outside page content
Take a pdf string (which will be either pdfdocencoding or UTF16BE) and return a string representing the same unicode codepoints in UTF8
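The UTF16BE half of this conversion can be sketched with the standard library alone. This is an assumption-laden illustration, not camlpdf's code: the name utf8_of_utf16be_with_bom is hypothetical, it needs OCaml 4.14 or later for String.get_utf_16be_uchar, and it deliberately returns None for strings without a byte order mark, because decoding pdfdocencoding needs a lookup table that is not shown here.

```ocaml
(* Hypothetical helper, not part of camlpdf: transcode a UTF-16BE string
   that starts with the byte order mark 0xFE 0xFF into UTF-8.
   Returns None when the mark is absent (the pdfdocencoding case is not
   handled in this sketch). Malformed input decodes to U+FFFD. *)
let utf8_of_utf16be_with_bom (s : string) : string option =
  let n = String.length s in
  if n < 2 || s.[0] <> '\xFE' || s.[1] <> '\xFF' then None
  else begin
    let buf = Buffer.create n in
    let rec loop i =
      if i < n then begin
        let d = String.get_utf_16be_uchar s i in
        Buffer.add_utf_8_uchar buf (Uchar.utf_decode_uchar d);
        (* Advance by the number of bytes this decode consumed. *)
        loop (i + Uchar.utf_decode_length d)
      end
    in
    loop 2;
    Some (Buffer.contents buf)
  end
```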
Take a UTF8 string and convert to pdfdocencoding (if no unicode-only characters are used) or UTF16BE (if they are)
Build a pdf string in pdfdocencoding (if no unicode-only characters are used) or UTF16BE (if they are)
Produce a list of unicode codepoints from a pdfdocencoding or UTF16BE pdf document string
Remake a UTF16BE string into a PDFDocEncoding string if all characters are in PDFDocEncoding
Text from strings inside page content
val text_extractor_of_font : Pdf.t -> Pdf.pdfobject -> text_extractor
Build a text extractor from a document and font object
val text_extractor_of_font_real : font -> text_extractor
Build a text extractor from a font
val codepoints_of_text : text_extractor -> string -> int list
Return a list of unicode codepoints from a given extractor and string (for example from a Pdfpages.Op_Tj or Op_TJ operator).
val glyphnames_of_text : text_extractor -> string -> string list
Return a list of glyph names from a given extractor and string
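Since codepoints_of_text returns an int list, a caller usually wants to turn that list into a printable string. The sketch below shows one way to do so with the standard library only; utf8_of_codepoints is a hypothetical helper and not a camlpdf function, and replacing invalid values with U+FFFD is a design choice of this sketch.

```ocaml
(* Hypothetical helper, not part of camlpdf: encode a list of Unicode
   codepoints (such as the result of codepoints_of_text) as UTF-8.
   Values that are not Unicode scalar values become U+FFFD. *)
let utf8_of_codepoints (codepoints : int list) : string =
  let buf = Buffer.create 16 in
  List.iter
    (fun cp ->
      let u = if Uchar.is_valid cp then Uchar.of_int cp else Uchar.rep in
      Buffer.add_utf_8_uchar buf u)
    codepoints;
  Buffer.contents buf
```

Substituting the replacement character keeps extraction going when a font maps a code to something unusable, at the cost of hiding the original value.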
Building text for strings inside page content
val charcode_extractor_of_font : ?debug:bool -> Pdf.t -> Pdf.pdfobject -> int -> int option
Return the character code for a given unicode codepoint, if it exists in the encoding and font object. If debug is set (default false) missing characters are reported to stderr.
val charcode_extractor_of_font_real : ?debug:bool -> font -> int -> int option
Return the character code for a given unicode codepoint, if it exists in the encoding and font. If debug is set (default false) missing characters are reported to stderr.
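Partially applying either extractor to its font arguments leaves a function of type int -> int option. The sketch below shows how such a function might be used to build a content-stream string. It assumes a simple font whose character codes fit in one byte; encode_simple and ascii_only are hypothetical names, and ascii_only is only a stand-in for a real extractor.

```ocaml
(* Hypothetical helper, not part of camlpdf: map each codepoint to a
   character code with [charcode_of] and pack the codes into a string.
   Assumes single-byte codes (0..255), as for simple fonts. Returns None
   as soon as a codepoint has no code or the code is out of range. *)
let encode_simple (charcode_of : int -> int option) (codepoints : int list)
    : string option =
  let buf = Buffer.create 16 in
  let ok =
    List.for_all
      (fun cp ->
        match charcode_of cp with
        | Some code when code >= 0 && code <= 255 ->
            Buffer.add_char buf (Char.chr code);
            true
        | _ -> false)
      codepoints
  in
  if ok then Some (Buffer.contents buf) else None

(* Stand-in extractor for illustration only: accepts ASCII codepoints. *)
let ascii_only (cp : int) : int option =
  if cp >= 0 && cp < 128 then Some cp else None
```

Returning None for the whole string, rather than dropping the missing characters, lets the caller decide whether to fall back to another font.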
Reverse table of all the entries in an encoding.
val parse_tounicode : Pdf.t -> Pdf.pdfobject -> (int * string) list
Parse a /ToUnicode entry.
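The association list returned by parse_tounicode has the same key and value types as the tounicode field of fontdescriptor, which is an (int, string) Hashtbl.t. A minimal sketch of loading such a list into a hash table for lookup follows; table_of_tounicode is a hypothetical helper, and the sketch makes no assumption about how the string values are encoded.

```ocaml
(* Hypothetical helper, not part of camlpdf: load (code, text) pairs,
   shaped like the result of parse_tounicode, into a hash table.
   Hashtbl.replace means a later pair overrides an earlier one that has
   the same code. *)
let table_of_tounicode (pairs : (int * string) list)
    : (int, string) Hashtbl.t =
  let tbl = Hashtbl.create (List.length pairs) in
  List.iter (fun (code, text) -> Hashtbl.replace tbl code text) pairs;
  tbl
```

A lookup is then Hashtbl.find_opt tbl code, which returns None for codes that have no mapping.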