The entry point to BAP.
This module is an entry point to BAP and serves the two goals:
BAP is designed to be friendly and act as a library, so that it can be seamlessly embedded into user applications in cases when the main frontend of BAP, the bap
utility, does not satisfy the user requirements. Being a guest, BAP will act respectfully toward its host and won't interfere with system utilities unless allowed to, e.g., it won't terminate the program, hijack the control flow, or spam into channels and, in general, will keep quiet and minimize possible side effects.
Embedding is achieved by a simple call to the Bap_main.init ()
procedure, which takes a few optional arguments. By default it will just initialize plugins, picking up configuration from the predefined locations and environment variables (which, in turn, can also be specified). If command line arguments are passed, the init
procedure will evaluate them. See the bap
utility for the description of the command line interface and semantics of command line arguments.
Since BAP relies on dynamic loading, for correct behavior the host program shall provide information to the dynamic loader about the units that are linked into the host program. Failure to do so may result in undefined behavior, with a segmentation fault being the most favorable outcome.
This requirement could be achieved by using the ocamlfind
tool to build the host program and specifying -package findlib.dynload
in the linking command 1
.
Alternatively, if dune
is used, then adding findlib.dynload
to the library dependencies of the host application (e.g., (libraries findlib.dynload)
) should also work 2
.
Finally, if neither approach suffices, the dependencies can be set manually using the Findlib.record_package
function.
1: http://projects.camlcity.org/projects/dl/findlib-1.5.6/doc/ref-html/lib/Fl_dynload.html
2: https://jbuilder.readthedocs.io/en/latest/advanced-topics.html#dynamic-loading-of-packages
It is much more common and recommended to use the bap
utility to initialize and run BAP. The user code can be injected at predefined extension points and will be called by the system with all the necessary input parameters. This approach minimizes the amount of boilerplate that has to be written and lets an analyst inject their analysis in the right place of a pipeline.
There are plenty of extension points in BAP, too many to mention here, but writing a disassembling pass would be a good example. Using the Project.register_pass
function, an analyst can get straight to the point and apply their analysis as a transformation to the project
data structure without being obligated to create this structure in the first place, thus relinquishing to the BAP framework the responsibility of parsing the command line arguments and selecting proper loaders and disassembler parameters.
This approach also establishes a unified interface to BAP making the whole system easier to use and understand.
A plugin is compiled and packed code that can be loaded at runtime. A plugin is a bundle that, in addition to the machine code and bytecode of the extension itself, contains the meta information that describes plugin properties, requirements, and provided features. It may also optionally contain the code of its dependencies, which improves plugin portability, so that it can be loaded when the development environment is no longer available. (Note, by default all dependencies except the bap library itself and core_kernel are packed into the plugin.)
The bapbuild
tool is used to build plugins from OCaml source code. The bapbundle
tool could be used to deploy the plugin into a place where it will be automatically loaded by the framework. In short, to build and deploy a plugin with OCaml code located in a file named example.ml
execute the following two commands:
1. bapbuild example.plugin
2. bapbundle install example.plugin
The bapbuild
tool will scan the dependencies of the example.ml
file and build them automatically if they are present in the current directory, e.g., if example.ml
references the Analysis
module and analysis.ml
is present in the current folder, then it will be automatically built and linked into the final product. A dependency on an external package could be specified using the -pkg
and -pkgs
options (the latter accepts a comma-separated list of dependencies). Under the hood, bapbuild
is the standard OCaml ocamlbuild
tool extended with a few rules that are necessary to build and pack plugins.
The bapbuild
tool has its limitations; for example, only one plugin per folder can be built. When the source base grows very big, it becomes hard to manage it with bapbuild
, so using some configuration system is advised, e.g., OASIS or dune. A plugin, then, could be built as a normal OCaml library and later packed with bapbuild
.
After the plugin is deployed to the place where it could be found by BAP, it will be loaded every time the Bap_main.init
function is called. All toplevel expressions of all modules constituting the plugin will be evaluated. However, a well-behaved plugin shall not evaluate any side-effectful expressions except those that are provided by the Extension
module.
The Extension
module lets an extension:
1) declare configuration parameters;
2) declare command line arguments;
3) declare an extension;
4) declare a command;
5) specify meta attributes such as documentation, features, and requirements.
When an extension is enabled by the framework (see the Features section which describes the process of selection), it is evaluated with the context, which captures the computation environment, passed to it as a function argument.
Commands are a special kind of extension that stands apart because they play the role of the main
function in BAP, i.e., a command is an OCaml function which will be evaluated as the main function, when BAP is run.
Commands can have their own command line arguments, which are then reified into OCaml values and passed to the specified function as arguments.
BAP employs a system of simple semantic tags to denote required and provided capabilities of its various components. This system facilitates fine-grained selection of the components that an application requires to satisfy its needs.
Both the main system and its extensions may explicitly state the set of features that they provide or expect, as well as the set of requirements that they require or implement.
Both features and requirements are intentionally denoted with plain string tags, with no specific constraints on their format.
A feature is a high-level description of an application and its environment. It is used to describe to the extensions what the application is doing and what should be expected by an extension.
Features are specified by the application via the init
function. An extension may define a specific set of features that it expects to be present, and it won't be loaded by applications that do not specify the expected features.
Some general examples of features are user-interface
, interactive
, toplevel
.
Another common use case of features is denoting a tag that is specific to a given application or organization, e.g., my-verification-framework
or cmu.edu
, and specifying it in plugins to ensure that they are loaded only in the specified environments, but not in more general ones.
The more features an application specifies the more general it is, i.e., more extensions will be available for it. The more features an extension specifies, the less general it is, i.e., it could be used in fewer applications.
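The relation between application features and extension features can be sketched with a small, self-contained model. This is only an illustration of the rule stated above, not BAP code: the `extension` record, the `eligible` and `select` functions, and the extension names are hypothetical.

```ocaml
(* An illustrative model only; these types are not part of the BAP API. *)
type extension = { name : string; expects : string list }

(* An extension is eligible when every feature it expects is provided
   by the application. *)
let eligible ~features ext =
  List.for_all (fun f -> List.mem f features) ext.expects

(* Names of the extensions that an application with [features] would get. *)
let select ~features exts =
  List.filter (eligible ~features) exts |> List.map (fun e -> e.name)

let extensions = [
  { name = "core-analysis"; expects = [] };
  { name = "repl-helpers"; expects = ["interactive"; "toplevel"] };
  { name = "gui-view"; expects = ["user-interface"] };
]

let () =
  (* An application that specifies no features gets only the extension
     that expects nothing. *)
  assert (select ~features:[] extensions = ["core-analysis"]);
  (* Specifying more features makes more extensions available. *)
  assert (select ~features:["interactive"; "toplevel"] extensions
          = ["core-analysis"; "repl-helpers"])
```

In this model an extension with an empty `expects` list is the most general one, and each extra expected feature narrows the set of applications that can use it.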
The list of features known to the bap
utility can be obtained by using the bap list features
command.
The requirements are more fine-grained descriptions of system capabilities that are used to define system dependencies without relying on concrete implementations. For example, if an application needs to parse ELF files, it may explicitly define this dependency by adding the elf
tag to the list of its requirements.
By using requirements in this manner it is possible to build an application that loads some minimal set of dependencies.
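A minimal model of requirement-based selection is sketched below. It assumes the rule given for the requires parameter of init (an extension is evaluated if it provides at least one of the required tags). The record type, the `selected` function, and all tags other than elf and loader are hypothetical.

```ocaml
(* An illustrative model only; these types are not part of the BAP API. *)
type extension = { name : string; provides : string list }

(* When [requires] is omitted every extension is selected; otherwise an
   extension must provide at least one of the required tags. *)
let selected ?requires ext =
  match requires with
  | None -> true
  | Some tags -> List.exists (fun t -> List.mem t tags) ext.provides

let extensions = [
  { name = "elf-loader"; provides = ["loader"; "elf"] };
  { name = "macho-loader"; provides = ["loader"; "macho"] };
  { name = "report"; provides = ["printer"] };
]

let names exts = List.map (fun e -> e.name) exts

let () =
  (* Requiring only "elf" loads a minimal set of extensions. *)
  assert (names (List.filter (selected ~requires:["elf"]) extensions)
          = ["elf-loader"]);
  (* Without requirements nothing is filtered out. *)
  assert (names (List.filter (fun e -> selected e) extensions)
          = ["elf-loader"; "macho-loader"; "report"])
```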
Requirements also play an important role in the caching subsystem and, in general, improve the reproducibility of BAP applications by enabling pure functional relationships between BAP components.
Every BAP extension is evaluated in the context, which is captured by a value of type ctxt
that is passed to each extension function. The context is an immutable value that fully describes the set of configuration parameters, command line arguments, and other descriptors of the environment in which the BAP subsystem is evaluated.
It is possible to reduce the context to its cryptographic digest, which, in turn, can be used as a key in some persistent storage, which is useful for implementing caching. However, computing a digest of the whole context can be overly conservative, since it may also capture variables that are irrelevant to a given computation. For that reason, we provide a mechanism to refine the context by specifying a set of tags that relate the computation to the environment.
For example, the disassembler command, provided by the disassemble.plugin
depends on a predefined set of features provided by different plugins, namely, disassembler
, lifter
, symbolizer
, rooter
, reconstructor
, brancher
, and loader
^1
. Therefore, it depends on extensions that provide those features, and when parameters of those extensions change, it is reflected by the context refinement that the disassemble plugin is using to compute the key for storing the disassembled program in the cache storage.
In other words, it is important to specify explicitly features of your extensions, to ensure that any change in their configuration is reflected and propagated to the components that may depend on your extension.
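The effect of refining the context before computing a cache key can be illustrated with a toy model. Everything here is an assumption made for the sake of the example: the `setting` record, the parameter names, and the use of the standard library `Digest` module (MD5) as a stand-in for whatever digest BAP computes.

```ocaml
(* A toy model of a context: each setting records the tag of the
   extension that owns it, a parameter name, and its value. *)
type setting = { tag : string; key : string; value : string }

(* Digest of a list of settings, independent of their order.
   Digest (MD5) is only a stand-in for the real digest function. *)
let digest settings =
  settings
  |> List.map (fun s -> Printf.sprintf "%s:%s=%s" s.tag s.key s.value)
  |> List.sort String.compare
  |> String.concat "\n"
  |> Digest.string
  |> Digest.to_hex

(* Keep only the settings that belong to the given tags. *)
let refine ~tags settings =
  List.filter (fun s -> List.mem s.tag tags) settings

let ctxt = [
  { tag = "lifter"; key = "arch"; value = "x86" };
  { tag = "loader"; key = "backend"; value = "llvm" };
  { tag = "report"; key = "color"; value = "on" };
]

(* The same context with an unrelated setting changed. *)
let ctxt' =
  List.map (fun s ->
      if s.tag = "report" then { s with value = "off" } else s) ctxt

let () =
  let tags = ["lifter"; "loader"] in
  (* The digest of the whole context changes ... *)
  assert (digest ctxt <> digest ctxt');
  (* ... but the key of the refined context stays the same. *)
  assert (digest (refine ~tags ctxt) = digest (refine ~tags ctxt'))
```

The refined key changes only when a setting owned by one of the listed tags changes, which is why an extension that omits its tags would be invisible to caches that depend on it.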
Use the bap list tags
command to list all semantic tags and the plugins that provide them.
^1
: The list is not definitive and may change; consult the plugin documentation for the exhaustive and up-to-date list.
The Bap_main
library provides a few functions that can be used to create composable command line interfaces. The final grammar specification is built from pieces and has the following EBNF definition:
    G =
      | "bap", common-options
      | "bap" "<command1>", command1-grammar, common-options
      ..
      | "bap" "<commandN>", commandN-grammar, common-options
    common-options =
      | ""
      | {"-L", [=], string}
      | {"--load-path", [=], string}
      | {"--plugin-path", [=], string}
      | ["--log-dir", [=], string | "--log-dir", [=], string]
      | "--recipe", recipe-grammar
      | "--version"
      | "--help", [[=], help-format]
      | "--help-<plugin1>", [[=], help-format]
      ...
      | "--help-<pluginN>", [[=], help-format]
      | "--<plugin1>"
      ...
      | "--<pluginN>"
      | "--no-<plugin1>"
      ...
      | "--no-<pluginN>"
      | plugin1-grammar
      ...
      | pluginN-grammar
Each command can define its own syntax and use the full power of the command line (including positional arguments and short keys) as long as it doesn't introduce conflicts with the common-options
grammar.
The common-options
grammar defines the syntax that is used to specify plugin configuration parameters. Each plugin can register its own parameters, but in a restricted way, e.g., no positional arguments are allowed, and all parameter names must be long and will be automatically prefixed with the plugin name. Plugin configuration parameters form the configuration context for each invocation of BAP. These parameters also do not need access to the command line and can be specified via configuration files, the environment, etc.
A couple of predefined rules are added to the common-options
grammar. First of all, for each registered <plugin>
the "--no-<plugin>"
option is added, which, if specified, will disable the plugin. A disabled plugin will still contribute to the command line grammar, but the extensions that are registered with it will not be loaded, unless the extension is the command itself, which will still be evaluated if selected on the command line.
Also, for each registered <plugin>
an option --<plugin>
will be added to enable the backward compatibility with the old style of specifying passes.
Another rule, which is added on a per-plugin basis, is the --help-<plugin>
rule which will render a manual page for the given <plugin>
.
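The per-plugin rules described above can be modeled in a few lines. This is only a sketch over plain string lists: `predefined_options` and `disabled` are hypothetical helpers, not functions of the Bap_main library, and the plugin names are made up.

```ocaml
(* The options that the rules above add for one registered plugin. *)
let predefined_options plugin =
  [ "--" ^ plugin; "--no-" ^ plugin; "--help-" ^ plugin ]

(* Plugins that are disabled by the given command line. *)
let disabled ~plugins argv =
  List.filter (fun p -> List.mem ("--no-" ^ p) argv) plugins

let () =
  assert (predefined_options "cache"
          = ["--cache"; "--no-cache"; "--help-cache"]);
  assert (disabled ~plugins:["cache"; "report"; "optimization"]
            ["bap"; "--no-report"; "--no-cache"]
          = ["cache"; "report"])
```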
The -L
and --logdir
options are preparsed on the command line and are used to specify the plugins search path (which obviously should be specified before we can load plugins) and the logging destination which we would like to know as soon as possible.
The --recipe
option is very special, as it changes the command line itself. Every occurrence of the --recipe
option will parse the provided recipe, which evaluates to a list of arguments that are substituted in place of the specified --recipe
option. See bap recipes
for more information about the recipes.
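The substitution performed by --recipe can be sketched as a rewrite of the argument list. This is a simplified model under stated assumptions: the `load` function is a hypothetical stand-in for the recipe parser, and the model accepts both the `--recipe=NAME` and the `--recipe NAME` spellings, which may be more or less than what the real parser accepts.

```ocaml
let prefix = "--recipe="

(* [has_prefix arg] is true when [arg] is of the form --recipe=NAME. *)
let has_prefix arg =
  let n = String.length prefix in
  String.length arg >= n && String.sub arg 0 n = prefix

(* Replace every occurrence of the --recipe option with the arguments
   that [load] (a hypothetical recipe parser) returns for it. *)
let rec expand_recipes ~load = function
  | [] -> []
  | "--recipe" :: spec :: rest -> load spec @ expand_recipes ~load rest
  | arg :: rest when has_prefix arg ->
    let n = String.length prefix in
    let spec = String.sub arg n (String.length arg - n) in
    load spec @ expand_recipes ~load rest
  | arg :: rest -> arg :: expand_recipes ~load rest

(* A made-up recipe table used only for the demonstration. *)
let load = function
  | "demo" -> ["--opt-a"; "--opt-b=1"]
  | _ -> []

let () =
  assert (expand_recipes ~load ["bap"; "--recipe=demo"; "/bin/ls"]
          = ["bap"; "--opt-a"; "--opt-b=1"; "/bin/ls"])
```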
Finally, the common --version
and --help
options are added with the expected semantics.
For the detailed description of the command line interface read the manual page generated with bap --help
.
Note, the actual parser is less strict than the grammar and may accept inputs that are not recognized by the grammar.
val init :
?features:string list ->
?requires:string list ->
?library:string list ->
?argv:string array ->
?env:(string -> string option) ->
?log:[ `Formatter of Stdlib.Format.formatter | `Dir of string ] ->
?out:Stdlib.Format.formatter ->
?err:Stdlib.Format.formatter ->
?man:string ->
?name:string ->
?version:string ->
?default:(ctxt -> (unit, error) Stdlib.result) ->
unit ->
(unit, error) Stdlib.result
init ()
initializes the BAP framework.
Attention: this function is only needed when the BAP framework is embedded in another application. It shall not be called in BAP plugins. If you're not sure whether you need to call it, then don't call it.
The init ()
expression evaluates to Ok ()
if the system initialization terminated normally and is fully complete. It returns Error condition
if the evaluation terminated abnormally, with the condition
value describing the reasons and consequences of this abnormal termination (note, despite the name, it is not always an error, e.g., a user may have requested the help message using the --help option).
If init ()
terminates with any value other than Ok ()
the BAP framework is considered to be uninitialized and shouldn't be used.
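A minimal host program might look as follows. This is a sketch that relies only on the signature of init shown above; it assumes the program is linked against the library that provides Bap_main together with findlib.dynload, and it cannot be run without a BAP installation.

```ocaml
(* A sketch of a host application that embeds BAP.
   Assumption: the BAP libraries and findlib.dynload are linked in. *)
let () =
  (* Pass Sys.argv so that init evaluates the command line of the host. *)
  match Bap_main.init ~argv:Sys.argv () with
  | Ok () ->
    (* The framework is initialized and can be used from here on. *)
    print_endline "BAP is initialized"
  | Error _ ->
    (* Not necessarily a failure (e.g., --help was requested), but the
       framework must be treated as uninitialized in either case. *)
    prerr_endline "BAP is not initialized"
```

The Error branch deliberately does not distinguish between failures and requests such as --help, since in both cases the framework must not be used afterwards.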
The initialization procedure uses the provided parameters to evaluate command line and environment arguments, loads the requested plugins and dispatches commands if any are requested through the command line.
This function could be invoked only once per lifetime of a process, and consecutive
@parameter features, if specified, denotes a set of features of an application that extensions can expect. Extensions that require a feature which is not in the features
provided by the application will not be evaluated.
@parameter requires, if specified, then only those extensions that provide at least one feature in requires
will be evaluated.
@parameter library specifies a list of folders that will be prepended to the plugins search paths list (which already contains some precompiled location and the value of the BAP_PLUGIN_PATH environment variable, which in turn could also be a list).
@parameter argv is the array of command line arguments, with the first value being the program name. If not specified, then it defaults to [|Sys.progname|]
, i.e., no command line arguments will be evaluated. If you want to let init
process the command line passed to the process, use the Sys.argv
variable.
@parameter env, if specified, then this function will be used to access environment variables. Otherwise, the environment variables are looked up using the Sys.getenv
function.
@parameter log, if specified, then the specified location will be used for logging. If `Formatter ppf
is passed then all log messages will be printed into ppf
(every message is flushed). If `Dir path
is passed, then all log messages will be printed in the Filename.concat path "log"
file. If such a file exists, then it will be renamed to "log~1"; if "log~1", in turn, exists, it will be renamed to "log~2", and so on, until "log~99" is reached, which will be discarded. If the log
parameter is not specified, then the logging will be performed in a directory whose name is obtained either from the command line (via the --logdir
parameter) or from the environment (using the BAP_LOG_DIR
variable). If neither is present, then the logging will be performed in the directory prescribed by the XDG standard for the application, i.e., in `$XDG_STATE_HOME/bap`, where the environment variable XDG_STATE_HOME
defaults to $HOME/.local/state
. If, for some reason, it wasn't possible to create a log file, then logging will fall back to the stderr
channel. Note, a usual log rotating routine will be applied in the log directory, as described above.
@parameter out, if specified, then this channel will be used to report help and other informational messages, if such are requested through the command line.
@parameter err, if specified, then this channel will be used to report error and other diagnostic messages in case of configuration problems. Nothing will be printed in this channel if the initialization procedure completed normally (and evaluated to Ok ()
).
@parameter man is the manual describing the purposes and basic usage of the utility in which bap is embedded. It is useful if the host program is going to use BAP command line parsing facilities, so it will be rendered when the --help
option is specified. A simple markdown syntax is understood, i.e., paragraphs, section headers, itemized lists, and verbatim code sections.
@parameter name is used as the name of the process. If not specified, then Sys.progname
is used.
@parameter version defaults to the BAP Framework version.
@parameter default, if specified, then this function will be invoked when no command was specified in the command line.
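The log rotation scheme described for the log parameter can be modeled as a pure function that plans the renames. This is an illustration of the described behavior only, not the code that BAP uses: the `action` type and the `rotation_plan` function are hypothetical, and `exists` stands for a check of whether a file is present in the log directory.

```ocaml
(* An illustrative model of the rotation scheme; not BAP's actual code. *)
type action =
  | Rename of string * string   (* rename source to destination *)
  | Discard of string           (* drop the oldest backup *)

(* "log" is the current file, "log~N" is its N-th backup. *)
let backup_name n = if n = 0 then "log" else Printf.sprintf "log~%d" n

(* Actions to perform, in order, before a fresh "log" file is created. *)
let rotation_plan ~exists =
  (* Find the first name in log, log~1, ..., log~99 that is not taken. *)
  let rec first_free n =
    if n > 99 then None
    else if exists (backup_name n) then first_free (n + 1)
    else Some n in
  (* Shift files [upto - 1 .. 0] one slot up, starting from the oldest,
     so that no file is overwritten. *)
  let shift upto =
    List.init upto (fun i ->
        let n = upto - 1 - i in
        Rename (backup_name n, backup_name (n + 1))) in
  match first_free 0 with
  | Some free -> shift free
  | None -> Discard (backup_name 99) :: shift 99

let () =
  let exists f = List.mem f ["log"; "log~1"] in
  assert (rotation_plan ~exists
          = [Rename ("log~1", "log~2"); Rename ("log", "log~1")])
```

Planning the renames from the oldest backup to the newest keeps each destination free at the moment it is written.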
module Extension : sig ... end
Writing and declaring BAP extensions.