from_file filename creates a project from a provided input source. The reconstruction is a multi-pass process driven by the following input variables, provided by a user:
- brancher decides instruction successors;
- rooter decides function starts;
- symbolizer decides function names;
- reconstructor provides an algorithm for symtab reconstruction.
The project is built incrementally and iteratively until a fixpoint is reached. The fixpoint is reached when information stops flowing from the input variables.
The overall algorithm can be depicted with the following diagram, where boxes denote data and ovals denote processes:
+---------+   +---------+   +---------+
| brancher|   |code/data|   | rooter  |
+----+----+   +----+----+   +----+----+
     |             |             |
     |             v             |
     |        -----------        |
     +------>(  disasm   )<------+
              -----+-----
                   |
                   v
+----------+  +---------+  +----------+
|symbolizer|  |   CFG   |  | reconstr |
+-----+----+  +----+----+  +----+-----+
      |            |            |
      |            v            |
      |       -----------       |
      +----->( reconstr  )<-----+
              -----+-----
                   |
                   v
              +---------+
              | symtab  |
              +----+----+
                   |
                   v
              -----------
             (  lift IR  )
              -----+-----
                   |
                   v
              +---------+
              | program |
              +---------+
The input variables are represented as streams of values. They can be viewed as cells that depend on some input: when the input changes, the value is recomputed and passed to the stream. Circular dependencies are allowed, so a rooter may actually depend on the program term. In the case of circular dependencies, the above algorithm is run iteratively until a fixpoint is reached. The criterion for the fixpoint is that no data needs to be recomputed; data must be recomputed when its input has changed or itself needs to be recomputed.
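The fixpoint criterion can be illustrated with a small, self-contained sketch. It is a toy model and not the library's implementation: the `fixpoint` helper, the `calls` table and the `step` function are all hypothetical names, and the "rooter depends on the program" cycle is reduced to a set of function starts that grows from the call targets discovered in the previous pass.

```ocaml
(* Toy model only: the analysis state is a set of function starts, and
   [step] recomputes it from its own previous result, which mimics a
   rooter that depends on the program term. *)
module Ints = Set.Make (struct
    type t = int
    let compare = compare
  end)

(* Hypothetical input: calls.(i) lists the addresses called by the
   function that starts at address i. *)
let calls = [| [1; 2]; [3]; []; [1]; [0] |]

(* One pass: every known root contributes its call targets as new roots. *)
let step roots =
  Ints.fold
    (fun r acc -> List.fold_left (fun acc t -> Ints.add t acc) acc calls.(r))
    roots roots

(* Rerun [step] until a pass changes nothing, i.e., until no data
   needs to be recomputed. *)
let rec fixpoint ~equal ~step state =
  let next = step state in
  if equal state next then state else fixpoint ~equal ~step next

let roots = fixpoint ~equal:Ints.equal ~step (Ints.singleton 0)
```

Starting from address 0, the passes discover 1 and 2, then 3, and the next pass adds nothing, so the iteration stops. Address 4 is never reached because nothing calls it.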
User-provided input can depend on any information, but a good start is the information provided by the Info module. It contains several variables that are guaranteed to be defined in the process of reconstruction.
For example, let's assume that a create_source function requires a filename as its input to create a source t. Then the source can be created as easily as:
Stream.map Input.file ~f:create_source
As a more complex example, let's assume that a source now requires that both arch and file are known. We can combine two different streams of information with a merge function:
Stream.merge Input.file Input.arch ~f:create_source, where create_source is a function of type: string -> arch -> t.
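To make the behavior of map and merge concrete, here is a minimal sketch of a push-based stream. It is an assumption-laden toy, not the library's Stream module: the `stream` record, `create`, `observe` and `send` are hypothetical helpers, and `merge` is modeled as pairwise combination, so a value is produced only after both inputs have delivered one.

```ocaml
(* A toy push-based stream: just a list of subscribers. *)
type 'a stream = { mutable subscribers : ('a -> unit) list }

let create () = { subscribers = [] }
let observe s f = s.subscribers <- f :: s.subscribers
let send s x = List.iter (fun f -> f x) s.subscribers

(* [map s ~f] emits [f x] for every [x] emitted by [s]. *)
let map s ~f =
  let out = create () in
  observe s (fun x -> send out (f x));
  out

(* [merge xs ys ~f] pairs values: it emits [f x y] only once both
   streams have delivered a value, then waits for the next pair. *)
let merge xs ys ~f =
  let out = create () in
  let qx = Queue.create () and qy = Queue.create () in
  let flush () =
    while not (Queue.is_empty qx || Queue.is_empty qy) do
      send out (f (Queue.pop qx) (Queue.pop qy))
    done in
  observe xs (fun x -> Queue.push x qx; flush ());
  observe ys (fun y -> Queue.push y qy; flush ());
  out

(* Hypothetical source type and constructor, for illustration only. *)
type arch = X86 | ARM
type source = { file : string; arch : arch }
let create_source file arch = { file; arch }

let file_stream : string stream = create ()
let arch_stream : arch stream = create ()

let basenames = ref [] and sources = ref []

let () =
  observe (map file_stream ~f:Filename.basename)
    (fun name -> basenames := name :: !basenames);
  observe (merge file_stream arch_stream ~f:create_source)
    (fun src -> sources := src :: !sources);
  send file_stream "/bin/ls";   (* map fires; merge still waits for arch *)
  send arch_stream ARM          (* both inputs known: merge emits a source *)
```

In this sketch the mapped stream reacts as soon as the file name arrives, while the merged stream stays silent until the architecture is known as well.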
If the source requires more than two arguments, then Stream.Variadic, a generalization of the merge function, can be used. Suppose that a source of information requires three inputs: architecture, filename and compiler name. Then we first define a list of arguments,
let args = Stream.Variadic.(args Input.arch $ Input.file $ Compiler.name)
and apply them to our function create_source:
Stream.Variadic.(apply ~f:create_source args)
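The argument-list idea behind a variadic merge can be sketched without streams at all. The following toy uses option values as stand-ins for cells that may or may not be defined yet; `args`, `$` and `apply` here are local, hypothetical definitions that only mirror the shape of the interface used above and make no claim about how the library implements it.

```ocaml
(* An argument list is represented as a function that consumes [f] and
   feeds it the collected arguments, yielding [None] if any is missing. *)
let args x = fun f ->
  match x with
  | Some v -> Some (f v)
  | None -> None

(* [a $ x] extends the argument list [a] with one more argument [x]. *)
let ( $ ) a x = fun f ->
  match a f, x with
  | Some g, Some v -> Some (g v)
  | _ -> None

(* [apply ~f a] applies [f] to all collected arguments at once. *)
let apply ~f a = a f

(* Hypothetical three-argument source constructor. *)
type arch = X86 | ARM

let create_source arch file compiler =
  Printf.sprintf "%s:%s:%s"
    (match arch with X86 -> "x86" | ARM -> "arm") file compiler

(* All three inputs are known: the function is called. *)
let ready =
  apply ~f:create_source (args (Some ARM) $ Some "/bin/ls" $ Some "gcc")

(* The file is still unknown: nothing is produced. *)
let pending =
  apply ~f:create_source (args (Some ARM) $ None $ Some "gcc")
```

The point of the design is that the arity of create_source is tracked in the type of the argument list, so the same three combinators work for any number of inputs.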
The sources specified in the examples above will call create_source only when all arguments have changed. This is the expected behavior for the arch and file variables, since they do not change during the program computation. Mixing constant and non-constant (with respect to a computation) variables is not as easy, but can still be achieved using the either and parse combinators. For example, let's assume that a source requires arch and cfg as its input:
Stream.either Input.arch Input.cfg |>
Stream.parse ~init:nil ~f:(fun create -> function
    | First arch -> None, create_source arch
    | Second cfg -> Some (create cfg), create)
In this example, we parse a stream that contains either architectures or control flow graphs, with a state of type cfg -> t Or_error.t. Every time the architecture changes (i.e., a new project is started), we recreate our state by calling the create_source function. Since we can't prove that the architecture will be decided before the cfg, or decided at all, we need to provide an initial nil function. It can return either a bottom value, e.g., let nil _ = Or_error.error_string "expected arch"
or it can just provide empty information.
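The state-threading behind the either and parse example can be traced with a self-contained sketch. It models the stream as a plain list of events and uses the standard result type in place of Or_error.t; the `either`, `cfg` and `source` types and this list-based `parse` are hypothetical stand-ins chosen for illustration, not the library's definitions.

```ocaml
(* Stand-in types, for illustration only. *)
type ('a, 'b) either = First of 'a | Second of 'b
type arch = X86 | ARM
type cfg = { blocks : int }
type source = { arch : arch; size : int }

(* [parse events ~init ~f] threads a state through the events and
   collects the values that [f] decides to emit, in order. *)
let parse events ~init ~f =
  let step (state, out) event =
    match f state event with
    | None, state -> state, out
    | Some x, state -> state, x :: out in
  let _, out = List.fold_left step (init, []) events in
  List.rev out

(* Partially applying [create_source] to an arch yields the state:
   a function from a cfg to a source (or an error). *)
let create_source arch cfg = Ok { arch; size = cfg.blocks }

(* Initial state, used if a cfg arrives before any arch. *)
let nil _ = Error "expected arch"

let sources =
  parse [Second { blocks = 1 }; First ARM; Second { blocks = 7 }]
    ~init:nil
    ~f:(fun create -> function
        | First arch -> None, create_source arch
        | Second cfg -> Some (create cfg), create)
```

Here the first control flow graph arrives before any architecture, so the initial nil state answers with an error. The architecture event emits nothing and only replaces the state, and the second control flow graph is then turned into a source.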