Jake Donham > Technical Difficulties > Reconstructing TypeScript, part 1
Reconstructing TypeScript, part 1: bidirectional type checking
2021-09-15
This post is part of a series about implementing type checking for a TypeScript-like language. In the last post I gave some background about type checking, TypeScript's type system, and the approach we'll use, called bidirectional type checking.
Now let's write a type checker! We're going to start with a tiny fragment of the language, and build it up incrementally in subsequent posts. In this fragment we'll handle only primitive literals like
7
"TypeScript is cool."
false
object literals like
{ x: 7, y: 9 }
{ foo: "bar", baz: false }
and member expressions like
{ x: 7, y: 9 }.x
(We don't have variables yet, so member expressions look a little weird!)
We'll also handle type ascriptions like
{ x: 7, y: "foo" } as { x: number, y: number }
In actual TypeScript, as
is an unsafe cast, a way to tell the type checker, "trust me, this expression has this type!" Here it is a way to guide the type checker (more on this below), but it's not unsafe—the type checker still ensures that the program doesn't attempt any unsupported operations.
Parsing code into an abstract syntax tree
Recall from part 0 that type checking works on an abstract syntax tree representation of code, where each expression in a program is a node in the tree, with its subexpressions as children. So first we need to parse code strings to ASTs. We use @babel/parser with the TypeScript plugin (see parse.ts):
function parseExpression(input: string): Expression {
return Babel.parseExpression(input, {
plugins: [ 'typescript' ]
});
}
which produces a value of type Expression
from @babel/types. For example, if we call parseExpression
on an object expression
{ x: 7, y: 9 }
we get an ObjectExpression
AST with ObjectProperty
ASTs as children:
{
type: "ObjectExpression",
properties: [
{
type: "ObjectProperty",
key: { type: "Identifier", name: "x" },
value: { type: "NumericLiteral", value: 7 }
},
{
type: "ObjectProperty",
key: { type: "Identifier", name: "y" },
value: { type: "NumericLiteral", value: 9 }
}
]
}
Each node has a type
property saying what kind of syntax it represents, and each kind of node has kind-specific properties for subexpressions and other attributes. (I've trimmed out properties that describe the location of each AST node in the original code string.)
To keep things simple, our type checker will handle only expressions, like:
"TypeScript is cool"
{ x: 7, y: 9 }
x > 0 ? x : -x
not statements, like:
const x = 7
for (let i = 0; i < 10; i++) { ... }
throw new Error("bad thing")
so we only need to parse expressions. (We'll also parse types, see below.)
The full Expression
type is pretty complicated, and it's not always obvious what the different properties mean. So it's really helpful to be able to browse the AST produced by a piece of code. I use the excellent AST Explorer for this. (Set the language to JavaScript
and the parser to @babel/parser.
Then click the gear icon next to @babel/parser
, enable the typescript
plugin, and disable the flow
plugin.)
Be aware that AST Explorer parses programs, not expressions; so if you feed it an expression, the AST is wrapped in Program
and ExpressionStatement
nodes. Usually it's OK to write an expression where a statement is expected, but for object expressions you need to wrap parentheses around them, or else the parser sees {
and tries to parse a block of statements.
Representing types
To synthesize and check types, we need a way to represent them. In this fragment we support only a few types—primitive literals, and objects made up of properties (see types.ts):
type Type = Null | Boolean | Number | String | Object;
type Null = { type: 'Null'; }
type Boolean = { type: 'Boolean'; }
type Number = { type: 'Number'; }
type String = { type: 'String'; }
type Object = {
type: 'Object';
properties: { name: string, type: Type }[];
}
For example, the type
{ x: number, y: number }
is represented by the value
{
type: 'Object',
properties: [
{ name: 'x', type: { type: 'Number' } },
{ name: 'y', type: { type: 'Number' } }
]
}
Constructors
We have predefined constants and helper functions to construct types (see constructors.ts):
const nullType: Null = { type: 'Null' };
const boolean: Boolean = { type: 'Boolean' };
const number: Number = { type: 'Number' };
const string: String = { type: 'String' };
The properties of an object type are represented as an array, but it's often convenient to construct object types by passing an object mapping names to types, like:
const type = object({ x: number, y: number });
so the object
constructor takes either an array of properties or an object mapping names to types:
function object(
properties: { name: string, type: Type }[] | { [name: string]: Type }
): Object {
if (Array.isArray(properties)) {
return { type: 'Object', properties }
} else {
return object(
Object.entries(properties).map(([ name, type ]) => ({ name, type }))
);
}
}
This is a nice example of the flexibility of union types: rather than define a separate function to convert an object argument to an array, we give the argument a union type. If Array.isArray(properties)
is true, we know from the argument type that properties
must be an array of { name: string, type: Type }
objects, so we can directly construct an Object
type. Otherwise it must be an object mapping property names to Type
s, so we walk the object entries to build an array.
Validators
We have validators for distinguishing different kinds of Type
(see validators.ts):
function isNull(t: Type): t is Null { return t.type === 'Null'; }
function isBoolean(t: Type): t is Boolean { return t.type === 'Boolean'; }
function isNumber(t: Type): t is Number { return t.type === 'Number'; }
function isString(t: Type): t is String { return t.type === 'String'; }
function isObject(t: Type): t is Object { return t.type === 'Object'; }
The return type t is Null
(for example) is a type predicate; when a call to the validator returns true
, the type checker narrows t
to type Null
, just as it would have if the test t.type === Null
had appeared directly in the code.
Type module
We wrap these types, constructors, validators, and other type-related functions into a module (see type/index.ts) so we can write Type.Boolean
(the type), Type.boolean
(the constructor), Type.object({ ... })
, Type.isObject
, Type.toString
(see toString.ts), and so on.
Parsing types
For writing tests it's useful to parse a type on its own. Babel doesn't provide a function for this, but we can parse an as
expression and pull out the type:
function parseType(input: string): Type {
const ast = parseExpression(`_ as ${input}`);
if (!AST.isTSAsExpression(ast)) bug(`unexpected ${ast.type}`);
return Type.ofTSType(ast.typeAnnotation);
}
Here Type.ofTSType
(see ofTSType.ts) converts a Babel TSType
AST (which represents a parsed type) to our Type
representation. And bug
(see err.ts) throws an exception indicating a bug in the code.
The call to bug
shows a use of narrowing: the return type of bug
is never
, meaning that the function never returns a value (it throws an exception), so the TypeScript type checker reasons that in the remainder of the function, AST.isTSAsExpression(ast)
must be true; so the type of ast
must be TSAsExpression
(the AST
validators are also type predicates), and we can safely access ast.typeAnnotation
. We use this pattern to handle unexpected cases throughout the code.
Synthesizing types from expressions
Now we have enough machinery in place to do some actual type checking:
Recall from part 0 that to synthesize a type from an expression, we synthesize the types of its subexpressions, then combine them according to the top-level operation of the expression; for atomic expressions like literal values, we return the corresponding type.
First we case over the expression type and dispatch to helper functions (see synth.ts):
function synth(ast: AST.Expression): Type {
switch (ast.type) {
case 'NullLiteral': return synthNull(ast);
case 'BooleanLiteral': return synthBoolean(ast);
case 'NumericLiteral': return synthNumber(ast);
case 'StringLiteral': return synthString(ast);
case 'ObjectExpression': return synthObject(ast);
case 'MemberExpression': return synthMember(ast);
default: bug(`unimplemented ${ast.type}`);
}
}
In each case
of the switch
, the type of ast
is narrowed to the corresponding arm of Expression
(which is a big union of all the kinds of expression), so the helper functions receive the specific arm type.
For primitive literals we return the corresponding type:
function synthNull(ast: AST.NullLiteral): Type { return Type.nullType; }
function synthBoolean(ast: AST.BooleanLiteral): Type { return Type.boolean; }
function synthNumber(ast: AST.NumericLiteral): Type { return Type.number; }
function synthString(ast: AST.StringLiteral): Type { return Type.string; }
For object expressions (of type ObjectExpression
), we synthesize a type for each property value expression, then return an object type that associates property names to types:
function synthObject(ast: AST.ObjectExpression): Type {
const properties =
ast.properties.map(prop => {
if (!AST.isObjectProperty(prop)) bug(`unimplemented ${prop.type}`);
if (!AST.isIdentifier(prop.key)) bug(`unimplemented ${prop.key.type}`);
if (!AST.isExpression(prop.value)) bug(`unimplemented ${prop.value.type}`);
if (prop.computed) bug(`unimplemented computed`);
return {
name: prop.key.name,
type: synth(prop.value)
};
});
return Type.object(properties);
}
The Babel parser parses the full JavaScript expression syntax, but we're only implementing part of it, so we call bug(`unimplemented`)
for cases that we don't want to handle.
For member expressions, we synthesize the type of the left-hand side, check that it's an object and contains the named property (using Type.propType
, see propType.ts), and return the type of the property. We call err
(see err.ts) to raise an exception if these checks fail (indicating an error in the input).
function synthMember(ast: AST.MemberExpression): Type {
const prop = ast.property;
if (!AST.isIdentifier(prop)) bug(`unimplemented ${prop.type}`);
if (ast.computed) bug(`unimplemented computed`);
const object = synth(ast.object);
if (!Type.isObject(object)) err('. expects object', ast.object);
const type = Type.propType(object, prop.name);
if (!type) err(`no such property ${prop.name}`, prop);
return type;
}
Again we skip some cases we don't want to handle.
Subtyping
Recall from part 0 that in some cases we need to check whether a type A
is a subtype of another type B
. I find it really helpful to think of subtyping as an adversarial game: I pass a value of type A
to an opponent, who is allowed to perform any operations on the value that are allowed by type B
. If my opponent can't perform any unsupported operation on the value, I win—A
is a subtype of B
. Otherwise my opponent wins—A
is not a subtype of B
.
My opponent can take the result of an operation and perform further operations on it. So A
is a subtype of B
when all the operations supported on B
are also supported on A
; and further, that the result of each operation on A
is a subtype of the result of the same operation on B
.
Since we have only a handful of types in this fragment, the subtyping function is simple (see isSubtype.ts):
function isSubtype(a: Type, b: Type): boolean {
if (isNull(a) && isNull(b)) return true;
if (isBoolean(a) && isBoolean(b)) return true;
if (isNumber(a) && isNumber(b)) return true;
if (isString(a) && isString(b)) return true;
if (isObject(a) && isObject(b)) {
return b.properties.every(({ name: bName, type: bType }) => {
const aType = propType(a, bName);
if (!aType) return false;
else return isSubtype(aType, bType);
});
}
return false;
}
Each primitive type is a subtype of itself but no other type. An object type A
is a subtype of an object type B
if all the properties we can access on B
can also be accessed on A
(the order of properties doesn't matter), and each of those properties on A
is a subtype of the corresponding property on B
.
For example, this type of rectangles with labelled points:
{
upperLeft: { label: string, x: number, y: number },
lowerRight: { label: string, x: number, y: number },
}
is a subtype of this type of rectangles with unlabelled points:
{
upperLeft: { x: number, y: number },
lowerRight: { x: number, y: number },
}
Checking expressions against types
Recall from part 0 that to check an expression against an expected type, we break down the expression and type and recursively check each expression part against the corresponding type part. When we can't break down the expression or type, we synthesize a type for the expression, then check to see that the synthesized type is a subtype of the expected type.
First we case over the expression and expected type and dispatch to helper functions (see check.ts):
function check(ast: Expression, type: Type) {
if (AST.isObjectExpression(ast) && Type.isObject(type))
return checkObject(ast, type);
const synthType = synth(ast);
if (!Type.isSubtype(synthType, type))
err(`expected ${Type.toString(type)}, got ${Type.toString(synthType)}`, ast);
}
In this fragment, we can only break down object expressions against object types. Otherwise we synthesize a type and compare it to the expected type with Type.isSubtype
.
To check an object expression against an object type, we match expression properties with type properties by name, then check each property value expression against the corresponding property type:
function checkObject(ast: ObjectExpression, type: Type.Object) {
const astProps: { name: string, expr: Expression, key: Identifier }[] =
ast.properties.map(prop => {
if (!AST.isObjectProperty(prop)) bug(`unimplemented ${prop.type}`);
if (prop.computed) bug(`unimplemented computed`);
if (!AST.isIdentifier(prop.key)) bug(`unimplemented ${prop.key.type}`);
if (!AST.isExpression(prop.value)) bug(`unimplemented ${prop.value.type}`);
return {
name: prop.key.name,
expr: prop.value as Expression,
key: prop.key
};
});
type.properties.forEach(({ name }) => {
const astProp = astProps.find(({ name: astName }) => astName === name);
if (!astProp) err(`missing property ${name}`, ast);
});
astProps.forEach(({ name, expr, key }) => {
const propType = Type.propType(type, name);
if (propType) check(expr, propType);
else err(`extra property ${name}`, key);
});
}
We flag an error if the object expression is missing a property present in the type. We also flag an error if it has an extra property that's not present in the type. You might wonder why, since an extra property can't do any harm; in isSubtype
we allow extra properties. But in a literal object expression, an extra property might be a bug: we meant to set an optional property but mistyped it. So we follow actual TypeScript and flag it.
Now that we've defined check
, we can add a case to synth
to synthesize as
expressions by checking the expression against the given type:
function synthTSAs(ast: AST.TSAsExpression): Type {
const type = Type.ofTSType(ast.typeAnnotation);
check(ast.expression, type);
return type;
}
So as
gives us a way to tell the type checker to switch from synthesis to checking. This can be useful for debugging complicated type errors. It's also a way to hide information about the type of an expression (but the language fragment so far is too simple to give a good example of this—more later).
Example
Update: if you don't want to read through this (pretty tedious!) explanation, head down to Try it! and click "check object error" to see the call tree for this example.
Let's walk through an example in detail:
{
x: 7,
y: { a: "foo", b: "bar" }.b
} as { x: number, y: number }
(See here for the full AST.) Type checking starts with a call to synth
:
synth({
type: "TSAsExpression",
expression: { type: "ObjectExpression", properties: [ ... ] }
typeAnnotation: ...
})
The top-level node is TSASExpression
, so we call check
on the expression
property with the given type (first translating the parsed typeAnnotation
to our Type
representation with Type.ofTSType
):
check(
// check this expression
{
type: "ObjectExpression",
properties: [
{
type: "ObjectProperty",
key: { type: "Identifier", name: "x" },
value: { type: "NumericLiteral", value: 7 }
},
{
type: "ObjectProperty",
key: { type: "Identifier", name: "y" },
value: { type: "MemberExpression", ... }
}
]
},
// against this type
{
type: "Object",
properties: [
{ name: "x", type: { type: "Number" } },
{ name: "y", type: { type: "Number" } }
]
}
)
To check an ObjectExpression
against an Object
type, we check each property value expression against the corresponding property type:
// property x
check({ type: "NumericLiteral", value: 7 }, { type: "Number" })
// property y
check({ type: "MemberExpression", ... }, { type: "Number" })
We can't break down either of these further, so for each we synth
a type for the expression, then compare it to the expected type with isSubtype
. For x
we synth
type Number
and check that it's a subtype of Number
. For y
we first synth
a type for { a: "foo", b: "bar" }.b
:
synth({
type: "MemberExpression",
object: {
type: "ObjectExpression",
properties: [
{
type: "ObjectProperty",
key: { type: "Identifier", name: "a" },
value: { type: "StringLiteral", value: "foo" }
},
{
type: "ObjectProperty",
key: { type: "Identifier", name: "b" },
value: { type: "StringLiteral", value: "bar" }
},
]
},
property: { type: "Identifier", name: "b" }
})
To synth
a type for a MemberExpression
, we first synth
a type for the object
property—to synth
a type for an ObjectExpression
we synth types for the property values and return this type:
{
type: "Object",
properties: [
{ name: "a", type: { type: "String" } },
{ name: "b", type: { type: "String" } }
]
}
then project the b
property to get type String
for the whole MemberExpression
. Finally we check that String
is a subtype of Number
, and flag an error because it is not.
Try it!
You can try out the type checker below. In the top box, click on an example button or type an expression (remember that the supported expressions are primitive literals, object expressions, member expressions, and as
ascriptions). In the bottom box you'll see a trace of the type checker execution, ending in a synthesized type (or an error). The trace is a tree of function calls; click on a function call to expand the tree under that call, or mouse over a call to highlight the matching return value.
The plan
For the full code of part 1 see https://github.com/jaked/reconstructing-typescript/tree/part1.
Next time we'll add functions and function calls to the language—we'll add a type environment to track the types of variables, and see how subtyping for functions requires contravariance.
Thanks to Julia Evans, Tony Chow, and Will Lachance for helpful feedback on a draft of this post.
Please email me with comments, criticisms, or corrections.