package yuujinchou

  1. Overview
  2. Docs

Design Principles of Name Modifiers for Hierarchical Names

Here are the most important principles for name modifiers.

Expressiveness

The modifier language should be able to approximate all features in popular programming languages. Language-specific features (such as phases in Racket) can be implemented using the modifier hook.

Explicit Sequencing

One common mistake in other languages is the lack of explicit ordering between effectful operations such as hiding (except), renaming (renaming), and selecting (only). The sequencing operator must be explicit, and we have seq for this purpose.

Implicit Namespaces

We believe hierarchical names are essential in organizing top-level definitions, and perhaps even local definitions and record field names as well. Moreover, it should facilicate API patching: often, a minor revision of the same API only introduces more bindings, and support of older versions can be done by supplying missing functions. For example, if the function is_prefix is newly introduced in the string namespace, the client can supply its own is_prefix to work with both the current and earlier versions. To make this possible, it should be easy to inject bindings into a namespace. If multiple parts of the client are patching the same API, it should also be easy to take the union of all the patches. Many of our decisions are guided by making API patching as easy as possible.

Contrary to most designs, the namespaces in this library are implicit. The "namespace" a is simply a collection of bindings which happen to share the prefix a, or equivalently, the subtree rooted at a. The two names a and a.x can coexist and they are independent of each other. This might be our most surprising move to trivialize API patching. We might adjust this principle in the future, but so far it seems to be okay.

This principle has profound impact on the implementation---in particular, the union operation is effectively recursive. Consider the following pseudocode:

section a {
  def x := 1
}
section b {
  section a {
    def y := 2
  }
}
modify_visible [union [all; renaming ["b"] []]]

Both a.x and a.y will be available after the modification of the visible namespace because b.a and a will be recursively merge. In contrary, the binding a.x would be shadowed if it were OCaml code:

module A =
  struct
    let x = 1
  end
module B =
  struct
    module A =
      struct
        let y = 2
      end
  end
open A (* [A.x] is gone *)

The reason is that the previous pseudocode is equivalent to the following:

def a.x := 1
def b.a.y := 2
modify_visible [union [all; renaming ["b"] []]]

where the union of the original namespace (via all), which contains a.x and b.a.y, and the output of renaming, which contains a.y, results into a namespace with a.x, a.y, and b.a.y. The binding a.y does not hide a.x because there is no explicit namespace a to hide a.x. In comparison, there was an OCaml module A to shadow another module A and its member A.x. This radically different treatment makes it easy to patch a deeply nested namespace with one union operator.

To recover efficient group operations on implicit namespaces as one might expect for explicit namespaces, all modifiers in Language work on subtrees instead of individual bindings. For example, only ["x"] will select all bindings with the prefix x, not just the binding x. Therefore, in most cases, one can pretend that namespaces exist. (We tried providing modifiers for both subtrees and individual bindings, but the ones for individual bindings seem useless.)

Guards against Typos

No No-Ops

Hiding, renaming, selecting, and excluding modifiers must match at least one existing binding. For example, if there is no binding with the prefix x, then selecting the namespace rooted at x should fail or at least produce a warning. The reason is that the lack of matching bindings strongly indicates a typo. In this library, such an error will trigger the effect not_found.

As a consequence of this principle, a modifier cannot be understood as a function from an old name to a set of new names, but a function from an old namespace (implemented as a trie in this library) to a new namespace. Otherwise, it is difficult or impossible to detect whether there are unexpected no-ops.

Explicit Unions

Arbitrary unions should be allowed for easy API patching, but they should not happen without using import, include, section, or a dedicated operator. We have union dedicated to the union operation.

One interesting case is renaming. According to this principle, the renaming from x to y should either (1) drop all bindings with the prefix y or (2) simply err when there exists at least one binding with the prefix y. It is tempting to err for rigorous programming; to facilitate API patching, however, we silently drop bindings with the prefix y so that users do not have to worry about existing bindings in the newer version of the library. In the event that there is accidental dropping due to renaming, it is likely that the expected bindings will be missing and the code will not scope check, and thus the user is still protected.

Conciseness

No Export Control

In many designs, the name modifiers are coupled with export control (public v.s. private). We believe these two concerns can be completely separated. As a proof, we have built Scope that handles visible and export namespaces, on top of the modifier engine Modifier that is oblivious to how it's used.

A Reasonably Small Core

The core language should be relatively small, possibly exploiting the compositionality to remove redundant constructors. For example, the core language does not have none p because it can be defined as the sequencing of the emptiness checking and the union of nothing. This, however, does not mean the number of constructors should be theoretically minimal: In earlier versions of yuujinchou, we have a "split" modifier that applies one modifier to a subtree and another to all bindings outside the subtree, but such a "mega-modifier" only complicates the core language and the engine implementation. To the extreme, one can have one single constructor in the core language to handle all features, but that would not be helpful.

The current design has six constructors in the core languages:

  1. emptiness checking.
  2. scoping: in_.
  3. renaming: renaming but without emptiness checking.
  4. sequencing: seq.
  5. union: union.
  6. hooks: hook.

If not for the emptiness checking, we could have just five constructors. One might consider merging emptiness checking with renaming and implementing only p as seq [renaming p []; renaming [] p], but this would perform two effects when the subtree at p is empty, and moreover the second effect will have the wrong path. (That is, for example only a.b would warn about the emptiness of both the subtree at a.b and the entire tree.) Therefore, the emptiness checking is separated from the renaming in the core language.

Algebraic Effects

We believe algebraic effects are the future!

Side Notes on Modules

There are fundamental differences between ML-style modules (records) and namespaces: A module (or a record) is typed, in the sense that there will be a single type assigned to it. The typedness enables considering an abstract module of a module type. Signatures and functors in Standard ML and OCaml follow this approach. On the other hand, a namespace is untyped, which forbids the notion of abstract namespaces but admits flexible operations. There has been work on supporting flexible operations on modules (e.g., see the ICFP paper An expressive language of signatures by Norman Ramsey, Kathleen Fisher, and Paul Govereau), but we are not obliged to maintain valid signatures for namespaces and can derive a much simpler design. In the context of proof assistants, one should also recognize that top-level definitions might not have a type in an internalized universe anyway. Instead, having separate notions of namespaces and modules/records, as in C++ and our proof assistants, could lead to a better design.

OCaml

Innovation. Community. Security.