Jake Donham > Technical Difficulties > Reconstructing TypeScript, part 4

Reconstructing TypeScript, part 4: union types

2021-10-14

This post is part of a series about implementing type checking for a TypeScript-like language. The language fragment we've covered so far (in parts 1, 2, and 3) supports:

primitives, object expressions and member lookups, functions and function applications, and some arithmetic and logical operators; and
primitive, object, function, and singleton types.

In this part we'll add union types like

string | boolean
'red' | 'green' | 'blue'
{ x: number, y: number } | undefined

{ type: 'cartesian', x: number, y: number } |
  { type: 'polar', angle: number, magnitude: number }

to describe collections of values where we know each value satisfies (at least) one arm of the union, but we don't know which one.

What's a union type?

I wrote in part 0 that a type

describes attributes shared by [a collection of] values: what operations are supported; and, for some operations, what result they return.

Now I need to complicate that explanation a little to make sense of union types.

First, a reminder that these are all ways of saying the same thing:

a value supports the operations described by a type
a value satisfies a type
a type contains a value

In what follows I'll use all these ways to describe the same situation, don't be confused!

Values satisfying a type may have additional attributes that aren't described by the type (for example, extra object properties); different values may have different additional attributes; and we may view the same value as different types at different points in a program. For example, in

const v1 = { x: 7, y: 9 };
const v2 = { type: 'cartesian', x: 7, y: 9 };

const magnitude = (v: { x: number, y: number }) =>
  Math.sqrt(x * x + y * y);

magnitude(v1);
magnitude(v2);

v1 and v2 have different types, but in the body of magnitude they're viewed as the same type. So a type isn't a complete description of a collection of values; it describes what's known about the values at a certain point in a program.

A union type describes a situation where what's known about a collection of values is that each value satisfies (at least) one of several types (called arms of the union), but we don't know which one. What can we do with this knowledge—what operations are supported on union types? Since we don't know which arm a given value satisfies, the operations supported on union types are those that are supported on every arm. For example, in the type

{ type: 'cartesian', x: number, y: number } |
  { type: 'polar', angle: number, magnitude: number }

we can look up the type property, because it's supported on every arm; but we can't look up the other properties. We can also use in to test for the presence of properties (this is supported on all objects), test for truthiness (this is supported on all values), and so on.

Performing an operation on a value might yield information about which arm of the union the value satisfies; from that information we can deduce that additional operations are supported. For example, if the type property is 'cartesian' we can deduce that the value satisfies the first arm of the union, so it's safe to look up the x property—we narrow the union type by eliminating some arms. (We'll return to narrowing in part 6.)

A union with one arm contains the same values as the arm—every value of this type must satisfy the single arm.

A union with no arms contains no values—a value of union type must satisfy one of the arms, but there are no arms to satisfy, so there can be no values. This type is called never; it's useful to describe functions that always throw an exception (like bug and err in our implementation), or empty collections (an empty array has type never[]).

Equivalent union types

Union types give us lots of ways to describe the same collection of values:

The order of the arms of a union doesn't matter; so for example

'red' | 'blue' | 'green'
'blue' | 'red' | 'green'
'green' | 'red' | 'blue'

all contain the same values.

If a union arm is a nested union, the arms of the inner union can be lifted up to the outer union (if a value satisfies the nested union, then it satisfies one of its arms); so for example

'red' | ('green' | 'blue')
('red' | 'green') | 'blue'
'red' | 'green' | 'blue'

all contain the same values.

If one arm of a union is a subtype of another, it can be removed (all values that satisfy it also satisfy the other arm); so for example

{ type: 'cartesian', x: number, y: number } | { x: number, y: number }
{ x: number, y: number } | never
{ x: number, y: number }

all contain the same values. (We need to be careful about equivalent arms: in a union like number | number we don't want to remove both arms because each is a subtype of the other—instead we keep just one equivalent arm.)

Finally, an object type with properties of union type supports the same operations as a union of object types; so for example

{ foo: 1 | 2, bar: 3 | 4 }
{ foo: 1 | 2, bar: 3 } | { foo: 1 | 2, bar: 4 }
{ foo: 1, bar: 3 | 4 } | { foo: 2, bar: 3 | 4 }

all contain the same values.

Representing union types

To represent union types, we add a Union arm to the Type union containing a list of arms, and a Never arm to represent never (see types.ts):

type Type = ... | Never | Union;

type Never = { type: 'Never'; }

type Union = { type: 'Union'; types: Type[]; }

We add a constructor function for Never (see constructors.ts, and below for the Union constructor):

const never: Never = { type: 'Never' };

and validator functions (see validators.ts):

function isNever(type: Type): type is Never {
  return type.type === 'Never';
}

function isUnion(type: Type): type is Union {
  return type.type === 'Union';
}

and cases for Union and Never in toString.ts and ofTSType.ts.

Normalizing union types

We saw above that there are many ways union types can be equivalent. To simplify the type checker, and to make its output more readable, we normalize union types in the constructor: flatten nested unions and remove redundant arms. Here's the code (see union.ts):

function collapseSubtypes(ts: Type[]): Type[] {
  return ts.filter((t1, i1) =>       // an arm is kept if
    ts.every((t2, i2) =>             // for every arm
      i1 === i2 ||                   // (except itself)
      !isSubtype(t1, t2) ||          // it's not a subtype of the other arm
      (isSubtype(t2, t1) && i1 < i2) // or it's equivalent to the other arm
                                     // and this is the first equivalent arm
    )
  );
}

function flatten(ts: Type[]): Type[] {
  return ([] as Type[]).concat(
    ...ts.map(t => isUnion(t) ? t.types : t)
  );
}

function union(...types: Type[]): Type {
  types = flatten(types);
  types = collapseSubtypes(types);

  if (types.length === 0) return never;
  if (types.length === 1) return types[0];
  return { type: 'Union', types }
}

We don't normalize the ordering of arms, or distribute object types over unions; so it's still possible to write equivalent types in different ways. There are tradeoffs here among type checker performance, compact and readable types, and keeping types close to how they're written in the code. (That's not to say that I've explored these tradeoffs in any depth; I just did something straightforward that seems pretty close to what actual TypeScript does.)

Actual TypeScript normalizes types in some situations but not others; in particular, if you write down a type explictly, redundant object types aren't removed. I don't know why not!

Synthesizing with union types

In our type checker so far, when synthesizing the type of an expression that operates on a subexpression—for example, a member expression like foo.bar—we synthesize the type of the subexpression, then check that it supports the operation and compute the result type, like:

function synthMember(env: Env, ast: AST.MemberExpression): Type {
  const prop = ast.property;
  ...
  const object = synth(env, ast.object);
  if (!Type.isObject(object)) err('. expects object', ast.object);
  const type = Type.propType(object, prop.name);
  if (!type) err(`no such property ${prop.name}`, prop);
  return type;
}

Now we need to handle the case where the subexpression has union type. We saw above that an operation is supported on a union type when it's supported on every arm of the union. We don't know which arm a value satisfies, and each arm might return a different result type for the operation. So the result type of the operation on the union must be the union of the result types of the operation on each arm. For example, for an expression foo of type

{ bar: boolean } | { bar: string }

the result type of foo.bar is

boolean | string

Every operation follows this same pattern, so we implement it using a helper function (see map.ts):

function map(t: Type, fn: (t: Type) => Type) {
  if (isUnion(t)) return union(...t.types.map(fn));
  else            return fn(t);
}

The idea is that computing the result type of an operation from its operand type is a Type => Type function. We apply this function to a union type by applying it to each arm then taking the union of the results; for other types we just apply it. (Here map depends on normalization—t.types contains no unions.)

Now we synthesize a type from the subexpression as before, then map over it to compute the result type of the operation, like:

function synthMember(env: Env, ast: AST.MemberExpression): Type {
  const prop = ast.property;
  ...
  const object = synth(env, ast.object);
  return Type.map(object, object => {
    if (!Type.isObject(object)) err('. expects object', ast.object);
    const type = Type.propType(object, prop.name);
    if (!type) err(`no such property ${prop.name}`, prop);
    return type;
  });
}

We change the rest of the operations the same way (see synthCall, synthBinary, synthLogical, synthUnary). But when a subexpression is used without operating on it—as a property value of an object expression, or passed as a function argument—then we don't need to map over it, we just use the type directly. For example, in:

const value: string | boolean = ...
const obj = { foo: value }

we use the type of value directly as the property type, so the type of obj is

{ foo: string | boolean }

We can now improve the result type of && (and ||), since we can express the situation where we don't know whether the left or right side is returned (see synth.ts):

case '&&':
  if (Type.isFalsy(left))
    return left;
  else if (Type.isTruthy(left))
    return right;
  else
    return Type.union(left, right);

This is still not completely precise: when the left side is returned, it must be falsy, so we can narrow its type to falsy values. We'll fix this in part 6.

Subtyping union types

A union type is a subtype of another type if each of its arms is a subtype of the other type—so never, having no arms, is a subtype of every type. A type is a subtype of a union type if it's a subtype of at least one of the arms. (To see why these rules are valid, the view from part 1 of subtyping as an adversarial game might be useful.) Here's the code:

function isSubtype(a: Type, b: Type): boolean {
  if (Type.isNever(a)) return true;

  if (Type.isUnion(a))
    return a.types.every(a => isSubtype(a, b));

  if (Type.isUnion(b))
    return b.types.some(b => isSubtype(a, b));
  ...

Unfortunately, these straightforward rules are not complete—there are pairs of types that contain the same values, but are not subtypes of one another according to this function. For example:

type a = { foo: 1 | 2 }
type b = { foo: 1 } | { foo: 2 }

a and b contain the same values, and b is a subtype of a, but a is not a subtype of b according to our isSubtype function. The problem is that we have to commit to a type of foo too early by choosing one of the arms of b. One approach might be to distribute object expressions over unions before (or maybe simultaneously with) checking subtyping; but let's move on for now. (Actual TypeScript does better here—I don't know what algorithm it uses.)

Checking against union types

Recall that we have two motivations for checking expressions against types:

to produce more localized error messages (see part 0)
to avoid needing type annotations on function arguments (see part 2)

For union types, we could implement a checking rule like

function checkUnion(env: Env, ast: AST.Expression, type: Type.Union) {
  let error: unknown = undefined;
  for (const t of type.types) {
    try {
      check(env, ast, t);
      return;
    }
    catch (e) {
      if (!error) error = e;
    }
  }
  throw error;
}

That is, try to check the expression against each arm of the union; if any arm succeeds, then the expression satisfies the union type. If no arm succeeds, throw the exception we got for the first arm.

This might let us avoid type annotations on function arguments when the expression satisfies the union type. But when the expression doesn't satisfy the union type, the error message is much worse. For example, in

type vector =
  { type: 'cartesian', x: number, y: number } |
    { type: 'polar', angle: number, magnitude: number }

const v: vector = { type: 'polar', angle: ..., mangitude: ... }

(where we're trying to make a 'polar' vector but misspell the magnitude property) we'd get an error like 'polar' is not a subtype of 'cartesian', because that's the exception we get when trying to check the expression against the first arm.

We could come up with a fancier scheme here—maybe score the arms by how closely they match the expression?—but for now we'll just fall back as usual to the default: synthesize a type from the expression, then check that it's a subtype of the union type. That way we get a more useful error like:

{ type: 'polar', angle: number, mangitude: number } is not a subtype of
  { type: 'cartesian', x: number, y: number } |
    { type: 'polar', angle: number, magnitude: number }

Try it!

You can try out the type checker below. Click on an example button or type an expression in the top box. In the bottom box you'll see a trace of the type checker execution, ending in a synthesized type (or an error). The trace is a tree of function calls; click on a function call to expand the tree under that call, or mouse over a call to highlight the matching return value.

Anonymous functions passed to Type.map are labelled to match the enclosing function; and since they usually close over the subexpression AST, they're annotated with the AST for context. So the anonymous function in synthBinary for an addition is labelled ...synthBinary[_ + _].

The only way to create union types in the current language fragment is with type ascription.

Notice the quadratic blowup in the + example—maybe this is why actual TypeScript doesn't synthesize singleton types for arithmetic operators?

The plan

For the full code of part 4 see https://github.com/jaked/reconstructing-typescript/tree/part4. To view the changes between part 3 and part 4 see https://github.com/jaked/reconstructing-typescript/compare/part3...part4.

Next time we'll add intersection types, on the way to implementing narrowing.

Please email me with comments, criticisms, or corrections.