Declarative Environments
The ECMAScript5.1 interpreter I have detailed so far is little more than a calculator. Now, using the state monad and references, we will add basic support for environments to the interpreter.
I overview ECMAScript 5.1 environments before discussing Atum’s declarative environment implementation. This covers a lot of code, but I’ve tried to simplify and break code into manageable pieces.
The result is an interpreter with basic environment support, allowing variables to be bound and mutated. Support for scopes is also added, but will not be used until functions are implemented.
Overview
ECMAScript defines two classes of environments. Declarative environments (10.2.1.1) store bindings in an environment record. This is the standard type of environment found in many programming languages, and I’m only going to cover declarative environments in this post. Object environments (10.2.1.2) store bindings on a hosted object. I’ll cover object environments later, as they are used by with
statements and for the global environment.
Environments bind/map identifiers to values. Most programming languages also have some concept of scope, which defines the subset of valid bindings at a given point in a program. Scoping often stores multiple environments in a stack or linked list, with language elements like curly brackets pushing and popping environments onto and off of the stack.
Immutable Binding Environment
In simple languages with only immutable bindings, all bindings can be stored in the current environment. When a new scope is entered, a new environment is pushed onto the environment stack and we copy all of the old bindings into the new environment. Likewise, exiting a scope pops an environment off the stack, restoring the previous environment and all of its bindings.
// Top level environment, block scope JS
// Annotated to show top level env
var a = 1 // [(a, 1)]
var b = 2 // [(b, 2), (a, 1)]
{
var c = 10 // [(c, 10), (b, 2), (a, 1)]
var b = 3 // [(b, 3), (c, 10), (b, 2), (a, 1)]
a = b * c // [(b, 3), (c, 10), (b, 2), (a, 30)]
}
a // 1 // [(b, 2), (a, 1)]
b // 2 // [(b, 2), (a, 1)]
c // undefined // [(b, 2), (a, 1)]
Without additional logic however, this will not work properly with mutable bindings. Since bindings are copied, the a
mutation inside the block would not be seen once the block is exited.
Dynamic Scoping
A refined approach is to maintain a single copy of each unique binding. We still use an environment stack, but instead of coping bindings, environment operations like lookups delegate to lower environments in the stack until finding their target.
An environment operation may change any environment in the stack. Mutating a
for example, skips past the top environment and updates the a
binding in the outer environment.
// Basic Env Stack, block scope JS
// Annotated to show stack
var a = 1 // [[(a, 1)]]
var b = 2 // [[(b, 2), (a, 1)]]
{
var c = 10 // [[(c, 10)], [(b, 2), (a, 1)]]
var b = 3 // [[(b, 3), (c, 10)], [(b, 2), (a, 1)]]
a = b * c // [[[(b, 3), (c, 10)], [(b, 2), (a, 30)]]
}
a // 30 // [(b, 2), (a, 30)]
b // 2 // [(b, 2), (a, 30)]
c // undefined // [(b, 2), (a, 30)]
This mutates a
correctly, but it doesn’t do what we want for function calls.
A function call pushes a new environment onto the stack. And since the stack is used to delegate lookups from higher levels, the calling environment of the function will be visible in the function body. This is a rough version of dynamic scoping.
// Env stack, block scope JS
function f() { return a; };
{
var a = 1;
f() // 1
}
{
var a = 2;
f() // 2
}
Lexical Scoping
A lexical scoping binding is determined by where it appears in the program source, instead of where it is evaluated. The current environment still holds a set of active bindings, but it delegates to the environment in which it was defined. This may or may not be the previous environment in the execution environment stack. Though it does not have block scoping, this is how ECMAScript environments work.
// lexical scope, block scope JS
var a = 10
function f() { return a; };
{
var a = 1;
f() // 10
}
{
a = 2;
f() // 2
}
Each ECMAScript environment holds a reference to a parent/enclosing environment. Multiple chains of environment may exist in a program, creating a spaghetti stack.
Function calls push an environment onto the environment stack, but the new environment delegates to the environment in which the function was defined. We can create closures by allowing functions to capture their defining environment.
Atum Environment Overview
Both declarative and object environments in Atum implement the Environment
interface. Environments hold a set of binding and a reference to a parent/outer environment, creating chains of environments. Environment chains are persistent data structures and, since any environment in a chain may be mutated and a single environment may be a member of multiple chains, references must be used.
ECMAScript identifiers resolve to environment references, which are internal references to values stored in environments. The arithmetic operations must be updated to handle internal references correctly.
The current environment is held in the ECMAScript computation context. A library of computation to query and update environments will be defined. The semantics of ECMAScript will be expressed using the environment operations.
Environment Interface
Declarative and object environments both extend and implement the Environment
interface (source).
var Environment = record.declare(null, [
'outer', // Nullable reference to enclosing environment.
'record']); // Bindings.
Environments are stored using references, so all environment operations must map to computations instead of direct values.
Queries
Environment objects themselves only have methods to query and update their own bindings. The environment operations for chains of environments will be defined later using these methods.
/// Computation returning does this environment have binding for `name`.
Environment.prototype.hasOwnBinding = function(name) { };
/// Computation that returns the bound value for `name`.
Environment.prototype.getBindingValue = function(name) { };
Updates
Mutation operations change an environment. This updates the value stored in memory for the reference to the environment. The ref
parameter is the reference to the environment that the operation is called on.
/// Computation to set a binding for `name` in this environment.
Environment.prototype.setMutableBinding = function(ref, name, value) { };
/// Computation to create a new immutable binding for `name` in the environment.
Environment.prototype.putImmutableBinding = function(ref, name, value) { };
/// Computation to delete the binding for `name` in the environment.
Environment.prototype.deleteBinding = function(ref, name) { };
Declarative Bindings and Environment Records
Declarative environments store their bindings in an environment record object (ECMA 10.2.1). Environment records map (string) identifiers to values (source).
Binding
Atum environment record bindings are stored in a Binding
object. For immutable values like strings and numbers, the binding will be the value object. A bindings to a mutable value, like an object, stores a reference to the object.
var Binding = record.declare(null,[
'value', // Value held.
'mutable']); // Is the environment binding mutable?
Environment Record
Atum uses a hashtrie as its environment record. Hashtries are persistent data structures, with good lookup and update performance.
var emptyEnvironmentRecord = hashtrie.empty;
Query operations:
var getBinding = hashtrie.get;
var hasBinding = hashtrie.has;
var deleteBinding = hashtrie.remove;
var hasMutableBinding = function(name, record) {
return (hasBinding(name, record) && getBinding(name, record).mutable);
};
var getBindingValue = function(name, record) {
return getBinding(name, record).value;
};
Update operations:
var setMutableBinding = function(name, v, record) {
return hashtrie.set(name, Binding.create(v, true), record);
};
var putImmutableBinding = function(name, v, record) {
return hashtrie.set(name, Binding.create(v, false), record);
};
Declarative Environment
DeclarativeEnvironment
holds a set of bindings in an environment record (source).
var DeclarativeEnvironment = record.extend(Environment,
[],
function(outer, record) {
this.outer = outer;
this.record = (record || emptyEnvironmentRecord);
});
Queries
Atum’s declarative environment queries do need to be computations, but queries on object environments do. It is simpler to use a common interface.
DeclarativeEnvironment.prototype.hasBinding = function(name) {
return compute.just(
hasBinding(name, this.record));
};
DeclarativeEnvironment.prototype.getBindingValue = function(name) {
return compute.just(
getBindingValue(name, this.record));
};
Updates
The update operations create new environment records and change the value stored for ref
to a new DeclarativeEnvironment
with the updated record. Both the DeclarativeEnvironment
and environment record are persistent.
DeclarativeEnvironment.prototype.setMutableBinding = function(ref, name, value) {
return ref.setValue(
this.setRecord(
setMutableBinding(name, value, this.record)));
};
DeclarativeEnvironment.prototype.putImmutableBinding = function(ref, name, value) {
return ref.setValue(
this.setRecord(
putImmutableBinding(name, value, this.record)));
};
DeclarativeEnvironment.prototype.deleteBinding = function(ref, name) {
return ref.setValue(
this.setRecord(
deleteBinding(name, this.record)));
};
Creation
Declarative environments are stored in the computation context memory with iref references. All references to environments must go though the iref indirection. Otherwise, you end up in a situation like the immutable binding environment demonstrated before, where an inner environment can not change the value of a binding in an outer environment.
var createDeclativeEnvironment = function(outer) {
return iref.create(
DeclarativeLexicalEnvironment.create(outer));
};
Adding Environments to the State
Now we can start using declarative environments in the interpreter. The DeclarativeEnvironment
operations covered so far only work with a single environment, so a set of operations for working with the entire environment chain will be covered as well.
Execution Context
We added state to the interpreter in a previous post using a ComputeContext
object. But ComputeContext
only holds general computation state.
Atum stores ECMAScript specific state in a ExecutionContext
(ECMA 10.3), stored in the ComputeContext
userData
field. The ExecutionContext
holds all state information required to evaluate ECMAScript source, and the initial execution context will be very basic (source).
var ExecutionContext = record.declare(null, [
'strict', // Is the executing code strict?
'lexicalEnvironment']); // Reference to current environment.
ExecutionContext.empty = ExecutionContext.create(
false,
null);
Changing Execution Environment
The lexicalEnvironment
field holds a reference to the current environment. A small set of operations get and changes the current environment.
var getEnvironment = compute.extract(function(ctx) {
return ctx.lexicalEnvironment;
});
var modifyEnvironment = function(f) {
return compute.modifyContext(function(ctx) {
return ctx.setLexicalEnvironment(f(ctx.lexicalEnvironment));
});
};
var setEnvironment = compose(
modifyEnvironment,
constant);
Internal Reference Operations
Environments in Atum are stored with Iref
. Both Iref
and EnvironmentReference
are internal reference types used to implement the interpreter.
A few helper computations simplify work with Atum internal references (source).
/// Get the value stored for ref.
var getValue = function(ref) {
return (ref instanceof InternalReference ?
ref.getValue() :
compute.just(ref));
};
/// Get the value stored from the result of c.
var getFrom = function(c) {
return compute.bind(c, getValue);
};
dereference
combines bind
and getValue
:
/// Attempt to dereference an internal reference,
/// continuing execution with the result of `f`.
var dereference = function(ref, f) {
return compute.bind(getValue(ref), function(o) {
return f(o, ref);
});
};
/// Attempt to dereference the result of computation `c`.
var dereferenceFrom = function(c, f) {
return compute.bind(c, function(ref) {
return dereference(ref, f);
});
};
Environment References
An EnvironmentReference
is an internal reference to a value held in an environment (source).
During interpretation, when an identifier is encountered, it is not deference to a value immediately. Instead, an EnvironmentReference
is created for the identifier, with the identifier name referencing a binding in the current environment. This binding may not actually exist, but we won’t know this until the EnvironmentReference
is dereferenced.
An unresolvable reference is created when no binding exists for name
in any environment. This creates an EnvironmentReference
without a base
.
var EnvironmentReference = record.declare(new InternalReference, [
'name', // Identifier name
'base', // Internal reference to environment
'strict'], // Is the binding in strict mode?
function(name, base, strict) {
this.name = name + '';
this.base = base;
this.strict = !!strict;
// Can the binding be resolved?
this.isUnresolvable = !base;
});
Lookups
Dereferencing an environment reference attempts to resolve the binding for name
on the base environment.
In strict mode, dereferencing an unresolvable EnvironmentReference
is a runtime error (ECMA Annex C). But regular mode environment reference may be unresolvable, and dereferencing an unresolvable environment reference returns undefined
.
/// Dereference the base environment.
EnvironmentReference.prototype.getBase = function() {
if (this.isUnresolvable)
throw 'Reference error'; // Replaced by hosted throw later
return internal_reference.getValue(this.base);
};
/// Get the value stored for this reference
EnvironmentReference.prototype.getValue = function() {
var name = this.name,
strict = this.strict;
return compute.bind(this.getBase(), function(env) {
return compute.bind(env.hasOwnBinding(name, strict), funciton(has) {
if (has)
return env.getBindingValue(name, strict);
// Regular mode unresolvable reference
return undef.UNDEFINED;
});
});
};
Updates
setValue
updates the binding for name
in environment base
.
Setting a strict unresolvable environment reference is a runtime error. Setting an unresolvable regular mode reference creates new global binding (I’ll cover globals later).
EnvironmentReference.prototype.setValue = function(value) {
if (this.isUnresolvable && this.strict)
throw 'Reference error'; // Replaced by hosted throw later
if (this.isUnresolvable)
throw 'Reference error'; // Replaced by global set later
var name = this.name,
strict = this.strict;
return compute.next(
compute.bind(this.getBase(), function(env) {
return environment.setEnvironmentMutableBinding(
env,
strict,
name,
value);
}),
compute.just(this));
};
Delete is similar to setValue
. Deleting a strict environment reference is always a runtime error, even when the reference is resolvable. Deleting a unresolvable reference in normal mode does nothing.
EnvironmentReference.prototype.deleteReference = function() {
if (this.strict)
throw 'Reference error'; // Replaced by hosted throw later
if (this.isUnresolvable)
return compute.yes;
return environment.deleteEnvironmentBinding(this.base, this.name);
};
Environment Operations
The environment operations operate on chains of environments and bring everything together (source).
Lookups
Environment lookups are used to implement identifiers. A lookup creates an environment reference, while actually resolving name
to a value is handled in EnvironmentReference.prototype.getValue
.
getEnvironmentBinding
starts by checking the top level environment for the reference env
using hasOwnBinding
. If hasOwnBinding
evaluates to true, an environment reference to the environment referenced by env
is created.
Otherwise, the search continues on the outer
environment of env
. If no binding has been found and no more environments remain, an unresolvable reference is returned.
var getEnvironmentBinding = function(env, strict, name) {
return internal_reference.dereference(env, function(lex, ref) {
return compute.bind(lex.hasOwnBinding(name), function(hasBinding) {
if (hasBinding)
return environment_reference.create(name, ref, strict);
// Check outer
if (lex.outer)
return getStrictnessEnvironmentBinding(lex.outer, strict, name);
// unresolvable reference
return environment_reference.create(name, null, strict);
});
});
};
Puts
Puts create a new binding in an environment. This may overwrite an existing binding in the target environment, or hide a binding the outer environments. Put operations are used for variable and function declarations.
var putEnvironmentMutableBinding = function(env, strict, name, value) {
return internal_reference.dereference(env, function(lex, ref) {
return lex.setMutableBinding(ref, name, value);
});
};
Sets
Sets are used by environment references to update an existing binding in an environment.
Like lookups, sets walk the chain of environments until finding the first one with the target binding. But unlike lookups, sets always succeed. If we reach the outermost environment and have not found the target binding yet, we simply create a new binding in it.
var setEnvironmentMutableBinding = function(env, strict, name, value) {
return internal_reference.dereference(env, function(lex, ref) {
// If we have no more environments to check, just create a binding
// in the current environment.
if (!lex.outer)
return putEnvironmentMutableBinding(env, strict, name, value);
return compute.bind(lex.hasOwnBinding(name), function(found) {
if (found)
return lex.setMutableBinding(ref, name, value);
// Check outer
return setEnvironmentMutableBinding(lex.outer, name, value);
});
});
};
Delete
Finally, delete operates much like setEnvironmentMutableBinding
.
var deleteEnvironmentBinding = function(env, name) {
return internal_reference.dereference(env, function(lex, ref) {
// If we have no more environments to check, just delete the binding
// in the current environment. This binding more not actually
// exist.
if (!lex.outer)
return lex.deleteBinding(ref, name);
return compute.bind(lex.hasOwnBinding(name), function(found) {
if (found)
return lex.deleteBinding(ref, name);
// Check outer
return deleteEnvironmentBinding(lex.outer, name);
});
});
};
Environments in the Interpreter
All that remains is to hook up the environment operations to the interpreter (source).
Identifiers
During interpretation, identifiers are mapped to EnvironmentReference
in the current environment using getEnvironmentBinding
(source).
var identifierSemantics = function(name) {
return compute.binary(
execution_context.strict, // get current strictness
getEnvironment, // get current env
function(strict, env) {
return getEnvironmentBinding(env, strict, name);
});
};
Expression Statements and Programs
By supporting identifiers, any expression may evaluate to an internal reference. But we don’t actually want internal references to leak out of the interrupter.
// This code should return `3` when run by the interpreter,
// not an internal reference to `a`.
var a = 3;
a;
Expressions statements must therefore evaluate their expression and deference any resulting internal references to values.
var expressionStatement = internal_reference.getFrom;
Since our programs may contain multiple statements, we also have to add a new production for a program made up of zero of more statements.
var program = function(stmts) {
return foldl(
compute.next,
compute.empty,
stmts);
};
Variable Declarations
The variable declaration is the primary element that creates new bindings (ECMA 12.2). Every variable declarations contains one or more variable declarators (source).
/// Evaluator array of declarators left to right.
var variableDeclaration = function(decls) {
return foldl(
compute.next,
compute.empty,
decls);
};
The ECMAScript variable declaration initialization specification is fairly complex (ECMA 10.5), so I’m going to use simplified logic for now.
When a variable declarator is evaluated, it creates and initializes a new binding in the current environment. Declarators without initial values are initialized to undefined
.
The initializer may evaluate to an internal reference:
// `b` should reference `10`, not the internal reference to `a`.
// Otherwise, changing `a` would change `b` as well.
var a = 10;
var b = a;
All declarators must therefore dereference their initial value.
var variableDeclarator = function(id, init) {
return internal_reference.dereferenceFrom(
init || undefined.UNDEFINED,
function(value) {
return compute.bind(
execution_context.strict, // Current strictness
getEnvironment, // Current environment
function(strict, env) {
return putEnvironmentMutableBinding(
env,
strict,
name,
value);
});
});
});
};
Assignment
Assignment takes a lefthand side computation that evaluates to an internal reference, such as an identifier, and a right hand subexpression for the new value.
Assignment updates the value stored for the LHS, and returns the result of the RHS. It is a runtime error if the LHS is not an internal reference.
/// Error if `ref` is not a reference/
var _dereference = function(ref) {
return compute.bind(ref, function(x) {
if (!(x instanceof internal_reference_value.InternalReference))
throw 'Reference Error'; // Replaced by hosted error later
return compute.just(x);
});
};
var assignment = function(left, right) {
return compute.binary(
_dereference(left),
internal_reference.getFrom(right),
function(l, r) {
return compute.next(
l.setValue(r),
compute.just(r));
});
};
Updating Expressions
Finally, subexpressions like a
in 3 + a
may also evaluate to internal references. None of the lifted operations defined previously handle internal references, so we have to add another level of computations to deference internal references and pass the values to the arithmetic operations.
binaryOperator
takes some binary computation op
and returns a binary computation that deferences the result of two computations resulting in internal references, and forwards the dereferenced values op
var binaryOperator = function(op) {
return function(left, right) {
return compute.binary(
internal_reference.getFrom(left),
internal_reference.getFrom(right),
op);
};
};
Versions of all the arithmetic operations
using binaryOperator
on the lifted operations form the top level mapping of the interpreter.
var subtractOperator = binaryOperator(number.subtract);
var multiplyOperator = binaryOperator(number.multiply);
...
var equalOperator = binaryOperator(compare.equal);
Conclusions
The resulting interpreter supports variable declarations, basic mutations, and lookups. It only uses a single environment, but adding multiple environments and closures will be fairly trivial.
var a = 3;
var b = 4;
a; // 3
a = b * 2;
a; // 8
b; // 4
var a = 6; // rebind a
a; // 8
Running a Program
Running a program requires that an environment be initialized before the program body is evaluated:
/// Running computation `c` with completion `k`.
var evaluate = function(c, k) {
// Set empty environment before running `c`
var prog = compute.next(
compute.bind(
createDeclativeEnvironment(null),
setEnvironment),
c);
return c(
ComputeContext.empty.setUserData(ExecutionContext.empty),
cons(k, NIL));
};
require([
'parse-ecma/lex/lexer',
'parse-ecma/parse/parser'],
function(
parser,
lexer){
var evaluateText = function(input, k) {
var prog = mapSemantics(parser.parse(lexer.lex(input)));
return evaluate(prog, k);
};
})
Next
I’ll start discussing objects and meta objects before moving on to functions.