package ez_search
This module implements full-text search with regexps in a set of files. Two steps are required: in the first step, a database is generated from all the files; in the second step, searches are performed in the database.
module TYPES : sig ... end
index_directory ~db_dir ?db_name ~select DIRECTORY indexes all files in DIRECTORY and stores the index in db_dir. Every top-level directory in DIRECTORY is treated as a file_entry name, and file_name values are paths relative to those top-level directories. select takes a path as argument and returns true if the content of that path should be indexed. WARNING: not reentrant, because the function temporarily changes the current directory (chdir) to the directory.
val index_files :
db_dir:string ->
?db_name:string ->
((file_entry:string -> file_name:string -> file_content:string -> unit) ->
unit) ->
unit
index_files ~db_dir ?db_name f creates an index on disk in directory db_dir with database name db_name. index_files calls f with a function that should be called once for each file to index, with arguments ~file_entry, ~file_name and ~file_content.
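The callback shape of index_files can be sketched without the library itself. In the minimal example below, sample_files, feed, record and recorded are hypothetical names introduced for illustration; only the labelled arguments ~file_entry, ~file_name and ~file_content come from the signature above.

```ocaml
(* Hypothetical in-memory "files": (file_entry, file_name, file_content). *)
let sample_files = [
  ("project-a", "src/main.ml", "let () = print_endline \"hello\"\n");
  ("project-a", "README.md", "# Project A\n");
  ("project-b", "lib/util.ml", "let id x = x\n");
]

(* [feed add] has the shape expected for the argument [f] of [index_files]:
   it receives the function [add] and calls it once per file to index. *)
let feed
    (add : file_entry:string -> file_name:string -> file_content:string -> unit)
  : unit =
  List.iter
    (fun (file_entry, file_name, file_content) ->
       add ~file_entry ~file_name ~file_content)
    sample_files

(* Stand-in for the indexer (an assumption, not the library's code):
   it only records the names it is given, to show what [feed] emits. *)
let recorded : string list ref = ref []

let record ~file_entry ~file_name ~file_content:_ =
  recorded := (file_entry ^ "/" ^ file_name) :: !recorded

let () =
  feed record;
  List.iter print_endline (List.rev !recorded)
```

With the library available, feed would presumably be passed as the last argument, as in index_files ~db_dir feed, with db_dir pointing at a directory of your choice.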
val load_db :
db_dir:string ->
?db_name:string ->
?use_mapfile:bool ->
unit ->
TYPES.db
load_db ~db_dir ?db_name ?use_mapfile () loads the database in memory. use_mapfile controls whether to use a memory-mapped file or to load the file normally. Memory-mapped files are normally more efficient, but support for them may be less stable.
val count_lines_total : db:TYPES.db -> int
count_lines_total ~db counts the number of '\n' characters in the database. This takes some time, since it iterates over the whole text.
val length : db:TYPES.db -> int
length ~db returns the number of chars in the database.
val search :
db:TYPES.db ->
f:(TYPES.occurrence -> bool) ->
?pos:int ->
?last:TYPES.occurrence_file ->
?len:int ->
(pos:int -> len:int -> string -> int) ->
unit
search ~db ~f ?pos ?last ?len find searches with find in the database, starting from pos, from just after the last occurrence last, or from the beginning. It calls f for every occurrence found. f returns a boolean: true if the search should continue, false if it should terminate immediately. len is the string length to use.
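The interplay between find and f can be illustrated with a self-contained sketch. The names make_find and collector are hypothetical, and the semantics of find are assumptions that this page does not confirm: the string argument is taken to be the text to scan, len to bound the scanned region, the result to be the offset of the first match at or after pos, and -1 to mean "no match".

```ocaml
(* [make_find needle] builds a function with the type expected for [find]:
   pos:int -> len:int -> string -> int.
   ASSUMPTIONS (not stated in the documentation): the string is the text to
   scan, [len] limits the scan to text.[0 .. len-1], the result is the offset
   of the first match at or after [pos], and -1 means "no match". *)
let make_find (needle : string) : pos:int -> len:int -> string -> int =
  let n = String.length needle in
  fun ~pos ~len text ->
    let limit = min len (String.length text) - n in
    let rec matches_at i j =
      j = n || (text.[i + j] = needle.[j] && matches_at i (j + 1))
    in
    let rec scan i =
      if i > limit then -1
      else if matches_at i 0 then i
      else scan (i + 1)
    in
    scan (max pos 0)

(* A callback in the style of [f]: collect at most [max_hits] results and
   return false to stop the search once the limit is reached. *)
let collector max_hits =
  let hits = ref [] in
  let f occ =
    hits := occ :: !hits;
    List.length !hits < max_hits
  in
  (f, fun () -> List.rev !hits)

(* Tiny driver standing in for [search], only to show how [find] and [f]
   cooperate; the real function works on a TYPES.db instead of a string. *)
let () =
  let text = "let x = 1\nlet y = 2\nlet z = 3\n" in
  let find = make_find "let" in
  let f, get = collector 2 in
  let rec loop pos =
    let p = find ~pos ~len:(String.length text) text in
    if p >= 0 && f p then loop (p + 1)
  in
  loop 0;
  List.iter (Printf.printf "match at %d\n") (get ())
```

Returning false from f is what ends the loop early here: the third match is never visited because the collector stops after two hits.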
val search_and_count :
db:TYPES.db ->
?is_regexp:bool ->
?is_case_sensitive:bool ->
?ncores:int ->
?maxn:int ->
?find:(pos:int -> len:int -> string -> int) ->
?engine:[ `Re | `Str ] ->
string ->
int * TYPES.occurrence list
search_and_count ~db ?is_regexp ?is_case_sensitive ?ncores ?maxn ?find ?engine term searches term in the database, either using find if provided, or a mix of Str and memmem otherwise (depending on is_regexp and is_case_sensitive). Uses Parmap to split the computation across multiple cores, using at most ncores if provided. Returns a very close approximation of the number of occurrences (exact on 1 core), and a list of at least maxn occurrences.
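The page does not say why the count is only approximate on several cores. One common cause in chunked searches, shown below as a generic illustration and not as the library's actual implementation, is that a match straddling a chunk boundary is seen by neither chunk. The functions count_in_range and count_chunked are hypothetical.

```ocaml
(* Count the matches of [needle] that lie entirely inside
   text.[start .. stop-1] (overlapping matches are counted too). *)
let count_in_range needle text start stop =
  let n = String.length needle in
  let rec go i acc =
    if i + n > stop then acc
    else if String.sub text i n = needle then go (i + 1) (acc + 1)
    else go (i + 1) acc
  in
  go start 0

(* Split the text into [chunks] contiguous ranges and sum the per-range
   counts, as a parallel search might do with one range per core.
   Matches crossing a range boundary are missed. *)
let count_chunked ~chunks needle text =
  let len = String.length text in
  let size = (len + chunks - 1) / chunks in
  let rec go k acc =
    if k >= chunks then acc
    else
      let start = min len (k * size) in
      let stop = min len (start + size) in
      go (k + 1) (acc + count_in_range needle text start stop)
  in
  go 0 0

let () =
  let text = "abcabcabc" in
  Printf.printf "1 chunk: %d, 2 chunks: %d\n"
    (count_chunked ~chunks:1 "abc" text)
    (count_chunked ~chunks:2 "abc" text)
```

With a single chunk all three matches are counted; with two chunks the middle match crosses the boundary and is lost, which mirrors the "exact on 1 core" remark above.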
val occurrence_file : db:TYPES.db -> TYPES.occurrence -> TYPES.occurrence_file
occurrence_file ~db pos returns the file occurrence of the match.
val occurrence_line : db:TYPES.db -> TYPES.occurrence_file -> int
occurrence_line ~db occ returns the line number in the file.
val occurrence_context :
db:TYPES.db ->
line:int ->
TYPES.occurrence_file ->
max:int ->
TYPES.occurrence_context
occurrence_context ~db ~line occ ~max returns the context of the occurrence occ in the file. The line number of the occurrence, as returned by occurrence_line, must be provided. The parameter max controls how many lines should be returned before and after the occurrence.
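The functions above chain naturally: search, map each occurrence to its file occurrence, compute the line, then fetch the context. The sketch below is written as a functor over a signature transcribed from this page, so that it does not assume the name of the library's top-level module. SEARCH, Workflow and contexts are hypothetical names, and the TYPES members are kept abstract as an assumption.

```ocaml
(* Signature transcribed from the documented values; the types in TYPES
   are left abstract here (an assumption about the real definitions). *)
module type SEARCH = sig
  module TYPES : sig
    type db
    type occurrence
    type occurrence_file
    type occurrence_context
  end
  val load_db :
    db_dir:string -> ?db_name:string -> ?use_mapfile:bool -> unit -> TYPES.db
  val search_and_count :
    db:TYPES.db -> ?is_regexp:bool -> ?is_case_sensitive:bool ->
    ?ncores:int -> ?maxn:int -> ?find:(pos:int -> len:int -> string -> int) ->
    ?engine:[ `Re | `Str ] -> string -> int * TYPES.occurrence list
  val occurrence_file :
    db:TYPES.db -> TYPES.occurrence -> TYPES.occurrence_file
  val occurrence_line : db:TYPES.db -> TYPES.occurrence_file -> int
  val occurrence_context :
    db:TYPES.db -> line:int -> TYPES.occurrence_file -> max:int ->
    TYPES.occurrence_context
end

module Workflow (S : SEARCH) = struct
  (* Load the database, search for a literal [term], and return the
     approximate count with (line, context) pairs for the occurrences. *)
  let contexts ~db_dir term =
    let db = S.load_db ~db_dir () in
    let count, occs =
      S.search_and_count ~db ~is_regexp:false ~maxn:10 term
    in
    let ctxs =
      List.map
        (fun occ ->
           let occ_file = S.occurrence_file ~db occ in
           let line = S.occurrence_line ~db occ_file in
           (* Two lines of context before and after the match. *)
           (line, S.occurrence_context ~db ~line occ_file ~max:2))
        occs
    in
    (count, ctxs)
end
```

Applying Workflow to the library's module (whatever its actual name is) would give a contexts function that goes from a database directory and a search term to displayable results.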
val file_content : db:TYPES.db -> TYPES.file -> string
file_content ~db file returns the content of the file, as retrieved from the database.
val files : db:TYPES.db -> TYPES.file array
files ~db returns all the files stored in the database.
val pos : TYPES.occurrence -> int
val text : db:TYPES.db -> string