package ppx_nanocaml

  1. Overview
  2. Docs
Framework for writing nanopass-style compilers

Install

Dune Dependency

Authors

Maintainers

Sources

0.1.0.tar.gz
md5=38877f4878af035303e02b3c9a399e98

README.md.html

nanocaml

nanocaml is an implementation of nanopass for OCaml using a PPX preprocessor.

What

Nanopass compilers are compilers that utilize many tiny "passes". Most modern compilers are built with a handful of passes, but nanopass compilers take this to the extreme.

The Nanopass Framework is a toolkit available for Scheme/Racket as a domain specific language aimed at writing nanopass compilers. This framework automates much of the boilerplate involved in writing the various ASTs and functions involved in a compiler's passes. nanocaml makes all of this available in OCaml.

Why

Nanopass compilers have proven to be extremely easy to design and extend. Being that OCaml is already a great language for writing compilers (with tools like OCamllex/OCamlyacc built-in to the standard distribution), nanocaml is just the cherry on top for an already great set of tools for creating compilers.

Compared to other implementations of the Nanopass Framework, nanocaml allows compiler writers to utilize OCaml's static type system in order to typecheck passes without any testing. This increases the safety net for the programmer and enables a quicker workflow by eliminating the need to run the compiler and its test suite whenever incremental changes are made.

How

nanocaml is designed as a simple OCaml preprocessor that takes advantage of many existing features in the language. A nanocaml language is defined using a module with the %language extension point, an entry type (the type of the whole AST) and a set of AST types composed of polymorphic variants. Passes are written using let%pass and return a record with the relevant AST processors (e.g. expr : L0.expr -> L1.expr). For examples of nanocaml compilers, see the examples/ directory.

Example/Tutorial

Part I: Defining your base language

The first step of writing a Nanopass compiler is creating the base language to extend off of. Let's say we are implementing a compiler that transforms a Scheme-like language into the Lambda Calculus. We should define our base language as an AST for Scheme.

module%language Scheme0 = struct
  type expr = (* Define all forms of expressions *)
    [ `Var of string
    (* x *)
    | `Let of string * expr * expr
    (* (let (x y) e) *)
    | `Lambda of string * expr
    (* (lambda (x) e) *)
    | `Apply of expr * expr
    (* (f x) *)
    | `Begin of expr list
    (* (begin e ...) *)
    ]
end

The module represents the language as a whole, while the expr type is the non-terminal <expr>. The right hand side is made up of all the productions for <expr> in the Scheme0 language. Each production is separated by a bar, and is given a constructor name + an optional stored type. Other examples include:

type my_random_nonterminal =
  [ `Unit (* No stored type *)
  | `Int of int
  | `Rec of my_other_nonterminal
  ]
and my_other_nonterminal =
  [ `X of my_random_nonterminal (* Mutual recursion *)
  ]

Part II: Extending your language

After a first language has been established, we must define the language to transform it to. We use the add and del fields to either add or delete productions -- in order to alter an existing production, you must drop the old one and then implement a new production.

module%language SchemeNoLet = struct
  include Scheme0 (* Means that SchemeNoLet extends Scheme0 *)
  type expr =
    { del : [ `Let of string * expr * expr (* Must perfectly match the existing production rule *)
            ]
    } (* Nothing to add *)
end

This new language is just like Scheme0, but it ditches the let production. That means that in order to translate from Scheme0 to SchemeNoLet we need to write a pass that removes the lets.

Part III: Writing your pass

With an input and output language now declared, we can write the first pass. This pass will remove the let expressions and replace them with an equivalent lambda expression. The rest of the pass is autogenerated, so we only need to write the rule for removing the let.

The rule is shown below in pseudo-code:

(let (x y) e) => ((lambda (x) e) y)

Now for the Nanocaml version:

let[@pass Scheme0 => SchemeNoLet (* Declare the type of the pass*)] remove_let =
  [%passes (* Start writing the passes over each non-terminal. In this case, we only have [expr] *)
    let[@entry] rec expr = function (* [@entry] means that the pass function will take an entry and recurse from there *)
      | `Let (id, value [@r] (* [@r] = Recursively apply this pass *), body [@r]) -> (* Matches the Scheme0 let expression *)
        `Apply (`Lambda (id, body), value) (* Must be a SchemeNoLet-compatible AST *)
  ]

Part IV: Using your pass

For the sake of example, I will include a demonstration of applying the pass to a program.

let input_program = (* (let (f (lambda (x) x)) (f f)) *)
  `Let ("f",
    `Lambda ("x", `Var "x"),
    `Apply (`Var "f", `Var "f"))

let expected_result = (* ((lambda (f) (f f)) (lambda (x) x)) *)
  `Apply (`Lambda ("f", `Apply (`Var "f", `Var "f")),
          `Lambda ("x", `Var "x"))

let () =
  let real_result = remove_let input_program in (* Apply the transformation to a Scheme0-compatible AST *)
  if real_result = expected_result
    then print_endline "Test passed!"
    else print_endline "Test failed!"