Jake Donham > Technical Difficulties > Reconstructing TypeScript, part 1

Reconstructing TypeScript, part 1: bidirectional type checking

2021-09-15

This post is part of a series about implementing type checking for a TypeScript-like language. In the last post I gave some background about type checking, TypeScript's type system, and the approach we'll use, called bidirectional type checking.

Now let's write a type checker! We're going to start with a tiny fragment of the language, and build it up incrementally in subsequent posts. In this fragment we'll handle only primitive literals like

7
"TypeScript is cool."
false

object literals like

{ x: 7, y: 9 }
{ foo: "bar", baz: false }

and member expressions like

{ x: 7, y: 9 }.x

(We don't have variables yet, so member expressions look a little weird!)

We'll also handle type ascriptions like

{ x: 7, y: "foo" } as { x: number, y: number }

In actual TypeScript, as is an unsafe cast, a way to tell the type checker, "trust me, this expression has this type!" Here it is a way to guide the type checker (more on this below), but it's not unsafe—the type checker still ensures that the program doesn't attempt any unsupported operations.

Parsing code into an abstract syntax tree

Recall from part 0 that type checking works on an abstract syntax tree representation of code, where each expression in a program is a node in the tree, with its subexpressions as children. So first we need to parse code strings to ASTs. We use @babel/parser with the TypeScript plugin (see parse.ts):

function parseExpression(input: string): Expression {
  return Babel.parseExpression(input, {
    plugins: [ 'typescript' ]
  });
}

which produces a value of type Expression from @babel/types. For example, if we call parseExpression on an object expression

{ x: 7, y: 9 }

we get an ObjectExpression AST with ObjectProperty ASTs as children:

{
  type: "ObjectExpression",
  properties: [
    {
      type: "ObjectProperty",
      key: { type: "Identifier", name: "x" },
      value: { type: "NumericLiteral", value: 7 }
    },
    {
      type: "ObjectProperty",
      key: { type: "Identifier", name: "y" },
      value: { type: "NumericLiteral", value: 9 }
    }
  ]
}

Each node has a type property saying what kind of syntax it represents, and each kind of node has kind-specific properties for subexpressions and other attributes. (I've trimmed out properties that describe the location of each AST node in the original code string.)

To keep things simple, our type checker will handle only expressions, like:

"TypeScript is cool"
{ x: 7, y: 9 }
x > 0 ? x : -x

not statements, like:

const x = 7
for (let i = 0; i < 10; i++) { ... }
throw new Error("bad thing")

so we only need to parse expressions. (We'll also parse types, see below.)

The full Expression type is pretty complicated, and it's not always obvious what the different properties mean. So it's really helpful to be able to browse the AST produced by a piece of code. I use the excellent AST Explorer for this. (Set the language to JavaScript and the parser to @babel/parser. Then click the gear icon next to @babel/parser, enable the typescript plugin, and disable the flow plugin.)

Be aware that AST Explorer parses programs, not expressions; so if you feed it an expression, the AST is wrapped in Program and ExpressionStatement nodes. Usually it's OK to write an expression where a statement is expected, but for object expressions you need to wrap parentheses around them, or else the parser sees { and tries to parse a block of statements.

Representing types

To synthesize and check types, we need a way to represent them. In this fragment we support only a few types—primitive literals, and objects made up of properties (see types.ts):

type Type = Null | Boolean | Number | String | Object;

type Null = { type: 'Null'; }
type Boolean = { type: 'Boolean'; }
type Number = { type: 'Number'; }
type String = { type: 'String'; }

type Object = {
  type: 'Object';
  properties: { name: string, type: Type }[];
}

For example, the type

{ x: number, y: number }

is represented by the value

{
  type: 'Object',
  properties: [
    { name: 'x', type: { type: 'Number' } },
    { name: 'y', type: { type: 'Number' } }
  ]
}

Constructors

We have predefined constants and helper functions to construct types (see constructors.ts):

const nullType: Null = { type: 'Null' };
const boolean: Boolean = { type: 'Boolean' };
const number: Number = { type: 'Number' };
const string: String = { type: 'String' };

The properties of an object type are represented as an array, but it's often convenient to construct object types by passing an object mapping names to types, like:

const type = object({ x: number, y: number });

so the object constructor takes either an array of properties or an object mapping names to types:

function object(
  properties: { name: string, type: Type }[] | { [name: string]: Type }
): Object {
  if (Array.isArray(properties)) {
    return { type: 'Object', properties }
  } else {
    return object(
      Object.entries(properties).map(([ name, type ]) => ({ name, type }))
    );
  }
}

This is a nice example of the flexibility of union types: rather than define a separate function to convert an object argument to an array, we give the argument a union type. If Array.isArray(properties) is true, we know from the argument type that properties must be an array of { name: string, type: Type } objects, so we can directly construct an Object type. Otherwise it must be an object mapping property names to Types, so we walk the object entries to build an array.

Validators

We have validators for distinguishing different kinds of Type (see validators.ts):

function isNull(t: Type): t is Null       { return t.type === 'Null'; }
function isBoolean(t: Type): t is Boolean { return t.type === 'Boolean'; }
function isNumber(t: Type): t is Number   { return t.type === 'Number'; }
function isString(t: Type): t is String   { return t.type === 'String'; }
function isObject(t: Type): t is Object   { return t.type === 'Object'; }

The return type t is Null (for example) is a type predicate; when a call to the validator returns true, the type checker narrows t to type Null, just as it would have if the test t.type === Null had appeared directly in the code.

Type module

We wrap these types, constructors, validators, and other type-related functions into a module (see type/index.ts) so we can write Type.Boolean (the type), Type.boolean (the constructor), Type.object({ ... }), Type.isObject, Type.toString (see toString.ts), and so on.

Parsing types

For writing tests it's useful to parse a type on its own. Babel doesn't provide a function for this, but we can parse an as expression and pull out the type:

function parseType(input: string): Type {
  const ast = parseExpression(`_ as ${input}`);
  if (!AST.isTSAsExpression(ast)) bug(`unexpected ${ast.type}`);
  return Type.ofTSType(ast.typeAnnotation);
}

Here Type.ofTSType (see ofTSType.ts) converts a Babel TSType AST (which represents a parsed type) to our Type representation. And bug (see err.ts) throws an exception indicating a bug in the code.

The call to bug shows a use of narrowing: the return type of bug is never, meaning that the function never returns a value (it throws an exception), so the TypeScript type checker reasons that in the remainder of the function, AST.isTSAsExpression(ast) must be true; so the type of ast must be TSAsExpression (the AST validators are also type predicates), and we can safely access ast.typeAnnotation. We use this pattern to handle unexpected cases throughout the code.

Synthesizing types from expressions

Now we have enough machinery in place to do some actual type checking:

Recall from part 0 that to synthesize a type from an expression, we synthesize the types of its subexpressions, then combine them according to the top-level operation of the expression; for atomic expressions like literal values, we return the corresponding type.

First we case over the expression type and dispatch to helper functions (see synth.ts):

function synth(ast: AST.Expression): Type {
  switch (ast.type) {
    case 'NullLiteral':      return synthNull(ast);
    case 'BooleanLiteral':   return synthBoolean(ast);
    case 'NumericLiteral':   return synthNumber(ast);
    case 'StringLiteral':    return synthString(ast);
    case 'ObjectExpression': return synthObject(ast);
    case 'MemberExpression': return synthMember(ast);

    default: bug(`unimplemented ${ast.type}`);
  }
}

In each case of the switch, the type of ast is narrowed to the corresponding arm of Expression (which is a big union of all the kinds of expression), so the helper functions receive the specific arm type.

For primitive literals we return the corresponding type:

function synthNull(ast: AST.NullLiteral): Type       { return Type.nullType; }
function synthBoolean(ast: AST.BooleanLiteral): Type { return Type.boolean; }
function synthNumber(ast: AST.NumericLiteral): Type  { return Type.number; }
function synthString(ast: AST.StringLiteral): Type   { return Type.string; }

For object expressions (of type ObjectExpression), we synthesize a type for each property value expression, then return an object type that associates property names to types:

function synthObject(ast: AST.ObjectExpression): Type {
  const properties =
    ast.properties.map(prop => {
      if (!AST.isObjectProperty(prop)) bug(`unimplemented ${prop.type}`);
      if (!AST.isIdentifier(prop.key)) bug(`unimplemented ${prop.key.type}`);
      if (!AST.isExpression(prop.value)) bug(`unimplemented ${prop.value.type}`);
      if (prop.computed) bug(`unimplemented computed`);
      return {
        name: prop.key.name,
        type: synth(prop.value)
      };
    });
  return Type.object(properties);
}

The Babel parser parses the full JavaScript expression syntax, but we're only implementing part of it, so we call bug(`unimplemented`) for cases that we don't want to handle.

For member expressions, we synthesize the type of the left-hand side, check that it's an object and contains the named property (using Type.propType, see propType.ts), and return the type of the property. We call err (see err.ts) to raise an exception if these checks fail (indicating an error in the input).

function synthMember(ast: AST.MemberExpression): Type {
  const prop = ast.property;
  if (!AST.isIdentifier(prop)) bug(`unimplemented ${prop.type}`);
  if (ast.computed) bug(`unimplemented computed`);
  const object = synth(ast.object);
  if (!Type.isObject(object)) err('. expects object', ast.object);
  const type = Type.propType(object, prop.name);
  if (!type) err(`no such property ${prop.name}`, prop);
  return type;
}

Again we skip some cases we don't want to handle.

Subtyping

Recall from part 0 that in some cases we need to check whether a type A is a subtype of another type B. I find it really helpful to think of subtyping as an adversarial game: I pass a value of type A to an opponent, who is allowed to perform any operations on the value that are allowed by type B. If my opponent can't perform any unsupported operation on the value, I win—A is a subtype of B. Otherwise my opponent wins—A is not a subtype of B.

My opponent can take the result of an operation and perform further operations on it. So A is a subtype of B when all the operations supported on B are also supported on A; and further, that the result of each operation on A is a subtype of the result of the same operation on B.

Since we have only a handful of types in this fragment, the subtyping function is simple (see isSubtype.ts):

function isSubtype(a: Type, b: Type): boolean {
  if (isNull(a) && isNull(b)) return true;
  if (isBoolean(a) && isBoolean(b)) return true;
  if (isNumber(a) && isNumber(b)) return true;
  if (isString(a) && isString(b)) return true;

  if (isObject(a) && isObject(b)) {
    return b.properties.every(({ name: bName, type: bType }) => {
      const aType = propType(a, bName);
      if (!aType) return false;
      else return isSubtype(aType, bType);
    });
  }

  return false;
}

Each primitive type is a subtype of itself but no other type. An object type A is a subtype of an object type B if all the properties we can access on B can also be accessed on A (the order of properties doesn't matter), and each of those properties on A is a subtype of the corresponding property on B.

For example, this type of rectangles with labelled points:

{
  upperLeft: { label: string, x: number, y: number },
  lowerRight: { label: string, x: number, y: number },
}

is a subtype of this type of rectangles with unlabelled points:

{
  upperLeft: { x: number, y: number },
  lowerRight: { x: number, y: number },
}

Checking expressions against types

Recall from part 0 that to check an expression against an expected type, we break down the expression and type and recursively check each expression part against the corresponding type part. When we can't break down the expression or type, we synthesize a type for the expression, then check to see that the synthesized type is a subtype of the expected type.

First we case over the expression and expected type and dispatch to helper functions (see check.ts):

function check(ast: Expression, type: Type) {
  if (AST.isObjectExpression(ast) && Type.isObject(type))
    return checkObject(ast, type);

  const synthType = synth(ast);
  if (!Type.isSubtype(synthType, type))
    err(`expected ${Type.toString(type)}, got ${Type.toString(synthType)}`, ast);
}

In this fragment, we can only break down object expressions against object types. Otherwise we synthesize a type and compare it to the expected type with Type.isSubtype.

To check an object expression against an object type, we match expression properties with type properties by name, then check each property value expression against the corresponding property type:

function checkObject(ast: ObjectExpression, type: Type.Object) {
  const astProps: { name: string, expr: Expression, key: Identifier }[] =
    ast.properties.map(prop => {
      if (!AST.isObjectProperty(prop)) bug(`unimplemented ${prop.type}`);
      if (prop.computed) bug(`unimplemented computed`);
      if (!AST.isIdentifier(prop.key)) bug(`unimplemented ${prop.key.type}`);
      if (!AST.isExpression(prop.value)) bug(`unimplemented ${prop.value.type}`);
      return {
        name: prop.key.name,
        expr: prop.value as Expression,
        key: prop.key
      };
    });

  type.properties.forEach(({ name }) => {
    const astProp = astProps.find(({ name: astName }) => astName === name);
    if (!astProp) err(`missing property ${name}`, ast);
  });

  astProps.forEach(({ name, expr, key }) => {
    const propType = Type.propType(type, name);
    if (propType) check(expr, propType);
    else err(`extra property ${name}`, key);
  });
}

We flag an error if the object expression is missing a property present in the type. We also flag an error if it has an extra property that's not present in the type. You might wonder why, since an extra property can't do any harm; in isSubtype we allow extra properties. But in a literal object expression, an extra property might be a bug: we meant to set an optional property but mistyped it. So we follow actual TypeScript and flag it.

Now that we've defined check, we can add a case to synth to synthesize as expressions by checking the expression against the given type:

function synthTSAs(ast: AST.TSAsExpression): Type {
  const type = Type.ofTSType(ast.typeAnnotation);
  check(ast.expression, type);
  return type;
}

So as gives us a way to tell the type checker to switch from synthesis to checking. This can be useful for debugging complicated type errors. It's also a way to hide information about the type of an expression (but the language fragment so far is too simple to give a good example of this—more later).

Example

Update: if you don't want to read through this (pretty tedious!) explanation, head down to Try it! and click "check object error" to see the call tree for this example.

Let's walk through an example in detail:

{
  x: 7,
  y: { a: "foo", b: "bar" }.b
} as { x: number, y: number }

(See here for the full AST.) Type checking starts with a call to synth:

synth({
  type: "TSAsExpression",
  expression: { type: "ObjectExpression", properties: [ ... ] }
  typeAnnotation: ...
})

The top-level node is TSASExpression, so we call check on the expression property with the given type (first translating the parsed typeAnnotation to our Type representation with Type.ofTSType):

check(
  // check this expression
  {
    type: "ObjectExpression",
    properties: [
      {
        type: "ObjectProperty",
        key: { type: "Identifier", name: "x" },
        value: { type: "NumericLiteral", value: 7 }
      },
      {
        type: "ObjectProperty",
        key: { type: "Identifier", name: "y" },
        value: { type: "MemberExpression", ... }
      }
    ]
  },
  // against this type
  {
    type: "Object",
    properties: [
      { name: "x", type: { type: "Number" } },
      { name: "y", type: { type: "Number" } }
    ]
  }
)

To check an ObjectExpression against an Object type, we check each property value expression against the corresponding property type:

// property x
check({ type: "NumericLiteral", value: 7 }, { type: "Number" })

// property y
check({ type: "MemberExpression", ... }, { type: "Number" })

We can't break down either of these further, so for each we synth a type for the expression, then compare it to the expected type with isSubtype. For x we synth type Number and check that it's a subtype of Number. For y we first synth a type for { a: "foo", b: "bar" }.b:

synth({
  type: "MemberExpression",
  object: {
    type: "ObjectExpression",
    properties: [
      {
        type: "ObjectProperty",
        key: { type: "Identifier", name: "a" },
        value: { type: "StringLiteral", value: "foo" }
      },
      {
        type: "ObjectProperty",
        key: { type: "Identifier", name: "b" },
        value: { type: "StringLiteral", value: "bar" }
      },
    ]
  },
  property: { type: "Identifier", name: "b" }
})

To synth a type for a MemberExpression, we first synth a type for the object property—to synth a type for an ObjectExpression we synth types for the property values and return this type:

{
  type: "Object",
  properties: [
    { name: "a", type: { type: "String" } },
    { name: "b", type: { type: "String" } }
  ]
}

then project the b property to get type String for the whole MemberExpression. Finally we check that String is a subtype of Number, and flag an error because it is not.

Try it!

You can try out the type checker below. In the top box, click on an example button or type an expression (remember that the supported expressions are primitive literals, object expressions, member expressions, and as ascriptions). In the bottom box you'll see a trace of the type checker execution, ending in a synthesized type (or an error). The trace is a tree of function calls; click on a function call to expand the tree under that call, or mouse over a call to highlight the matching return value.

The plan

For the full code of part 1 see https://github.com/jaked/reconstructing-typescript/tree/part1.

Next time we'll add functions and function calls to the language—we'll add a type environment to track the types of variables, and see how subtyping for functions requires contravariance.

Thanks to Julia Evans, Tony Chow, and Will Lachance for helpful feedback on a draft of this post.

Please email me with comments, criticisms, or corrections.