Jake Donham > Technical Difficulties > Reconstructing TypeScript, part 5

Reconstructing TypeScript, part 5: intersection types

2021-10-28

This post is part of a series about implementing type checking for a TypeScript-like language. See part 0 for background, and parts 1, 2, 3, 4 for the implementation so far.

In this part we'll add intersection types like

{ x: number } & { y: number } & { z: number }

((n: number) => string) & ((s: string) => number)

to describe collections of values that satisfy all the parts (separated by &) of the intersection. We haven't talked about intersection types yet, but they'll be important when we cover narrowing in the next part. They can also be useful in ordinary programming (I'll give some examples below).

Intersection types are unusual, although some other languages support them in a limited form—for example, implementing multiple interfaces in Java or Scala is a kind of intersection, as is function overloading.

What's an intersection type?

In part 4 I explained a type as a description of what's known about a collection of values (at a certain point in a program). Some types describe concrete attributes (like "supports the bar property"); others describe logical operations on pieces of knowledge:

A union type A | B is a logical OR of its arms: for any value satisfying the type, we know that either the value satisfies A, or the value satisfies B, or the value satisfies both A and B; but we don't know which one.

An intersection type A & B is a logical AND of its parts: for any value satisfying A & B we know the value satisfies both A and B. For example, the type

{ x: number } & { y: number } & { z: number }

has properties x, y, and z, all of type number.

One use for intersections is to add fields to an existing type. For example: a hassle that comes up with ORM systems is that IDs are issued by the database; when you create an object it doesn't have an ID, but when you look it up it does; so you can't use the same type for both purposes. Usually you get around this by making the ID an optional property, but that's not quite right. In TypeScript you can define the core object type without the ID property:

type Obj = { ... }

then augment it with an id property using an intersection

type ObjWithID = Obj & { id: number }

to get a type that supports all the same operations as Obj and also the id property.

Another use is to give types to overloaded functions. For example, the browser getElementsByTagName function returns different types based on its argument, so we can give it an intersection type:

const getElementsByTagName:
  ((tag: 'div') => HTMLDivElement[]) &
  ((tag: 'p') => HTMLParagraphElement[]) &
  ...

(In actual TypeScript you can declare overloaded functions with overload signatures, and as we'll see its type checker doesn't handle function intersections very well.)

An intersection with only one part contains the same values as the single part. (There's no way to write this type in TypeScript's concrete syntax, but it comes up in the code.)

An intersection with no parts contains all values! This type is called unknown. It supports only operations supported on every value, like calling typeof or testing for truthiness.

(In the same way that logical AND is dual to logical OR, intersection is dual to union. So a lot of the code we'll need for intersections looks like the code for unions, but flipped somehow.)

Equivalent intersection types

Like union types, intersection types give us lots of ways to describe the same collection of values:

The order of the parts of an intersection doesn't matter; so for example

{ x: number } & { y: number } & { z: number }
{ y: number } & { x: number } & { z: number }
{ z: number } & { x: number } & { y: number }

all contain the same values.

If a part of an intersection is a nested intersection, the parts of the inner intersection can be lifted up to the outer intersection; so for example

{ x: number } & ({ y: number } & { z: number })
({ x: number } & { y: number }) & { z: number }
{ x: number } & { y: number } & { z: number }

all contain the same values.

If one part of an intersection is a supertype of another, it can be removed; so for example

{ type: 'cartesian', x: number, y: number } & { x: number, y: number }
{ type: 'cartesian', x: number, y: number } & unknown
{ type: 'cartesian', x: number, y: number }

all contain the same values. (Here's an example of duality between union and intersection: For unions we remove the subtypes—a value satisfying a union must satisfy one of the arms; if it satisfies an arm it also satisfies all subtypes of the arm; so subtypes are redundant. For intersections we remove the supertypes—a value satisfying an intersection must satisfy all the parts; if it satisfies any supertype of a part it also satisfies the part; so supertypes are redundant.)

Intersections distribute over object types, so for example

{ foo: { bar: number } } & { foo: { baz: string } }
{ foo: { bar: number } & { baz: string } }

contain the same values; and also

{ x: number } & { y: number } & { z: number }
{ x: number, y: number, z: number }

contain the same values.

Intersections also distribute over unions (just as multiplication distributes over addition) so

(1 | 2) & (2 | 3)
(1 & (2 | 3)) | (2 & (2 | 3))
(1 & 2) | (1 & 3) | (2 & 2) | (2 & 3)

all contain the same values.

Some types have empty intersection: for example, a value can't be both a number and a string, or satisfy both singleton types 7 and 9. An empty intersection contains no values, so it's equivalent to never. In the previous example, all but one of the intersections in the 3rd line are empty, so the type is equivalent to

Now that we have intersections, we can distribute functions over unions—for example

(x: number | string) => boolean
((x: number) => boolean) & ((x: string) => boolean

contain the same values.

Representing intersection types

To represent intersection types, we add an Intersection arm to the Type union containing a list of parts, and an Unknown arm to represent unknown (see types.ts):

type Type = ... | Unknown | Intersection;

type Unknown = { type: 'Unknown'; }

type Intersection = { type: 'Intersection'; types: Type[]; }

We add a constructor function for Unknown (see constructors.ts, and below for the Intersection constructor):

const unknown: Types.Unknown = { type: 'Unknown' };

and validator functions (see validators.ts):

function isUnknown(type: Type): type is Unknown {
  return type.type === 'Unknown';
}

function isIntersection(type: Type): type is Intersection {
  return type.type === 'Intersection';
}

and cases for Intersection and Unknown in toString.ts and ofTSType.ts.

Normalizing intersection types

As we do for unions (see part 4), we normalize intersection types in the constructor, to simplify the type checker and make its output more readable.

Normalizing intersections is more complicated than normalizing unions. We want to detect and eliminate empty intersections (this will be important for narrowing), but unions make it more difficult. For example, in

(1 | 2) & (2 | 3)

it's not obvious that we can eliminate most of the union cases. But if we distribute the intersection over the unions

(1 & 2) | (1 & 3) | (2 & 2) | (2 & 3)

then it's easy to check each intersection for emptiness

never | never | 2 | never

and eliminate the empty intersections

So that's what we do: in addition to flattening nested intersections and removing redundant parts (as we do for unions), we also distribute intersections over unions and eliminate empty intersections. As with unions, we don't distribute intersections over object or function types (or objects / functions over intersections).

Distributing intersections over unions

Here's a helper function for distributing over unions (see union.ts):

function distributeUnion(ts: Type[]): Type[][] {
  const accum: Type[][] = [];

  function dist(ts: Type[], i: number): void {
    if (i === ts.length) {
      accum.push(ts);
    } else {
      const ti = ts[i];
      if (isUnion(ti)) {
        for (const t of ti.types) {
          const ts2 = ts.slice(0, i).concat(t, ts.slice(i + 1));
          dist(ts2, i + 1);
        }
      } else {
        dist(ts, i + 1);
      }
    }
  }

  dist(ts, 0);
  return accum;
}

This function takes a list of types (which may contain unions) and returns the Cartesian product (roughly speaking) over the unions. So if you call

distributeUnion([ 1 | 2, 3, 4 | 5 ])

you get back

[
  [ 1, 3, 4 ],
  [ 1, 3, 5 ],
  [ 2, 3, 4 ],
  [ 2, 3, 5 ]
]

Detecting empty intersections

Here's a function to check if two types overlap—that is, their intersection is not empty (see intersection.ts):

function overlaps(x: Type, y: Type): boolean {
  if (isNever(x) || isNever(y)) return false;
  if (isUnknown(x) || isUnknown(y)) return true;

  if (isUnion(x))
    return x.types.some(x => overlaps(x, y));
  if (isUnion(y))
    return y.types.some(y => overlaps(x, y));

  if (isIntersection(x))
    return x.types.every(x => overlaps(x, y));
  if (isIntersection(y))
    return y.types.every(y => overlaps(x, y));

  if (isSingleton(x) && isSingleton(y)) return x.value === y.value;
  if (isSingleton(x)) return x.base.type === y.type;
  if (isSingleton(y)) return y.base.type === x.type;

  if (isObject(x) && isObject(y)) {
    return x.properties.every(({ name, type: xType }) => {
      const yType = propType(y, name);
      if (!yType) return true;
      else return overlaps(xType, yType);
    });
  }

  return x.type === y.type;
}

It's interesting to compare overlaps with isSubtype: isSubtype(a, b) is true when all of a is contained in b; overlaps(a, b) is true when part of a is contained in b (or, equivalently, part of b is contained in a).

Empty intersections of object types

overlaps recurses inside object types, so it detects that

{ foo: 1 } & { foo: 2 }

is empty. Object properties may contain unions and intersections (since we don't distribute objects over unions and intersections) so overlaps must handle those cases. So overlaps detects that

{ foo: 1 | 2 } & { foo: 3 | 4 }

is empty, but we don't normalize

{ foo: 1 | 2 } & { foo: 2 | 3 }

to the equivalent

{ foo: 2 }

Empty intersections of function types

I'm not sure what to do about functions here. Clearly there are non-overlapping functions—for example

((x: number) => string) & ((x: number) => boolean))

is empty; there is no function that returns both string and boolean for the same input. But when the argument types don't overlap it's OK to return different types, for example

((x: number) => string) & ((x: string) => boolean))

so maybe there is a way to compute this precisely by checking for argument type overlap. In the meantime we just say that all functions overlap.

Actual TypeScript doesn't seem to be precise here either: it detects empty intersections in objects

type x = { foo: 7 }
type y = { foo: 9 }
type z = x & y // z has type never

but not in functions

type f = (x: number) => string
type g = (x: number) => boolean
type h = f & g // h has type f & g

Normalizing intersections

Now we can put the pieces together. To normalize an intersection:

flatten nested intersections
distribute the intersection over unions to get a union of intersections
for each intersection, check for emptiness and remove redundant parts

Here's the code (see intersection.ts):

function collapseSupertypes(ts: Type[]): Type[] {
  return ts.filter((t1, i1) =>       // a part is kept if
    ts.every((t2, i2) =>             // for every part
      i1 === i2 ||                   // (except itself)
      !isSubtype(t2, t1) ||          // it's not a supertype of the other part
      (isSubtype(t1, t2) && i1 < i2) // or it's equivalent to the other part
                                     // and this is the first equivalent part
    )
  );
}

function flatten(ts: Type[]): Type[] {
  return ([] as Type[]).concat(
    ...ts.map(t => isIntersection(t) ? t.types : t)
  );
}

function intersectionNoUnion(ts: Type[]): Type {
  if (ts.some((t1, i1) => ts.some((t2, i2) =>
    i1 < i2 && emptyIntersection(t1, t2)
  )))
    return never;
  ts = collapseSupertypes(ts);

  if (ts.length === 0) return unknown;
  if (ts.length === 1) return ts[0];
  return { type: 'Intersection', types: ts }
}

function intersection(...ts: Type[]): Type {
  ts = flatten(ts);
  ts = distributeUnion(ts).map(intersectionNoUnion);
  return union(...ts);
}

Synthesizing with intersection types

Recall from part 4 that when we synthesize the type of an expression that operates on a subexpression (like a member expression foo.bar), we

synthesize the type of the subexpression (foo in the example); then
Type.map over the subexpression type with a callback that
- checks that a type supports the operation (accessing the bar property)
- computes the result type (the type of bar)

function synthMember(env: Env, ast: AST.MemberExpression): Type {
  const prop = ast.property;
  ...
  const object = synth(env, ast.object);
  return Type.map(object, object => {
    if (!Type.isObject(object)) err('. expects object', ast.object);
    const type = Type.propType(object, prop.name);
    if (!type) err(`no such property ${prop.name}`, prop);
    return type;
  });
}

For non-union types, Type.map calls the callback directly on the type; for union types it calls the callback on each arm of the union and returns the union of the results.

Now that we have intersection types we need to handle them in a similar way. For unions, we need every arm to support the operation (since the subexpression may satisfy any of the arms). For intersections, we need only one part to support the operation (since the subexpression satisfies all the parts); when more than one part supports it, we return the intersection of the results. Here's the new map (see map.ts):

function map(t: Type, fn: (t: Type) => Type) {
  if (isUnion(t)) {
    return union(...t.types.map(t => map(t, fn)));

  } else if (isIntersection(t)) {
    const ts: Type[] = [];
    let error: unknown = undefined;
    for (const tt of t.types) {
      try {
        ts.push(fn(tt));
      } catch (e) {
        if (!error) error = e;
      }
    }
    if (ts.length === 0) {
      throw error;
    } else {
      return intersection(...ts);

  } else {
    return fn(t);
  }
}

For example, in

// foo has type { baz: string } & { bar: 1 | 2 } & { bar: 2 | 3 }
foo.bar

calling the synthMember callback on the first part of the intersection throws an exception because bar is not supported; the other parts succeed with result types 1|2 and 2|3; so the overall result type is the intersection

(1 | 2) & (2 | 3)

which normalizes to

If the callback fails on all the parts, map throws an exception; the error message is the one from the first part. (Actual TypeScript gives a more helpful error here, referencing the whole intersection type, not just a part.)

Synthesizing function types with intersections

Recall from part 2 that to synthesize a type from a function, we bind the argument types in the environment and synthesize a result type from the function body:

function synthFunction(env: Env, ast: AST.ArrowFunctionExpression): Type {
  ...
  const bindings = ast.params.map(param => {
    ...
    return {
      name: param.name,
      type: Type.ofTSType(param.typeAnnotation.typeAnnotation),
    };
  });
  const args = bindings.map(({ type }) => type);
  const bodyEnv =
    bindings.reduce(
      (env, { name, type }) => env.set(name, type),
      env
    );
  const ret = synth(bodyEnv, ast.body);
  return Type.functionType(args, ret);
}

So for a function expression

(x: number | string) => x

we synthesize the type

(number | string) => (number | string)

This type is not very precise—it doesn't capture the fact that you always get back a value of the same type you put in.

Now that we have intersections, we can split the argument cases to get a more precise type:

distribute unions in the function arguments;
for each resulting argument list
- bind the argument types in the environment;
- synthesize a result type from the function body; and
intersect the resulting function types.

Here's the code (see synth.ts):

function synthFunction(env: Env, ast: AST.ArrowFunctionExpression): Type {
  ... // same as before up to `const bodyEnv = `
  const argsLists = Type.distributeUnion(args);
  const funcTypes = argsLists.map(args => {
    const bodyEnv =
      bindings.reduce(
        (env, { name }, i) => env.set(name, args[i]),
        env
      );
    const ret = synth(bodyEnv, ast.body);
    return Type.functionType(args, ret);
  })
  return Type.intersection(...funcTypes);
}

Now for the example above, we synthesize a type from the body in the case where x: number and in the case where x: string, then intersect the results to get

(number => number) & (string => string)

This type is more precise than the original—it captures the fact that you always get back a value of the same type that you put in.

Actual TypeScript does not split cases like this. My guess is that this is for performance reasons: for most functions the result type doesn't vary according to the arguments, so it's a waste of effort to synthesize the body for each arm of union args. Maybe it's better not to split cases in synthesis, but only in checking, so you can get it if you need it with a type ascription. (But actual TypeScript doesn't do this either, see below.)

Subtyping intersection types

Since a value of intersection type must satisfy all the parts of the intersection

an intersection type is a subtype of another type when at least one of its parts is a subtype of the other type;
a type is a subtype of an intersection type when it's a subtype of all the parts of the intersection; and
every type is a subtype of unknown (an intersection of no parts).

Here's the code (see isSubtype.ts):

function isSubtype(a: Type, b: Type): boolean {
  if (Type.isNever(a)) return true;
  if (Type.isUnknown(b)) return true;

  if (Type.isUnion(a))        return a.types.every(a => isSubtype(a, b));
  if (Type.isUnion(b))        return b.types.some(b => isSubtype(a, b));

  if (Type.isIntersection(a)) return a.types.some(a => isSubtype(a, b));
  if (Type.isIntersection(b)) return b.types.every(b => isSubtype(a, b));
  ...

(I've left in the cases for unions and never because they show the duality between union and intersection.)

We saw in part 4 that these straightforward subtyping rules for unions are incomplete. A similar issue comes up for intersections—for example

{ foo: { bar: 7 } & { baz: 9 } }
{ foo: { bar: 7 } } & { foo: { baz: 9 } }

contain the same values, and the first is a subtype of the second, but the second is not a subtype of the first according to these rules. Again, actual TypeScript does better here; it detects that each is a subtype of the other. I need to find out how it works!

(By the way, you can check subtyping in the TypeScript playground like so

type IsSubtype<T1, T2> = T1 extends T2 ? true : false

const t: IsSubtype<
  { foo: { bar: 7 } & { baz: 9 } },
  { foo: { bar: 7 } } & { foo: { baz: 9 } }
> = true

If T1 is not a subtype of T2, TypeScript will flag an error that type true is not assignable to type false.)

Checking against intersection types

Recall from part 4 that checking against union types isn't very useful because it doesn't produce good error messages. It works better for intersection types: since an expression of intersection type must satisfy all the parts, we can just check them all (see check.ts):

function check(env: Env, ast: AST.Expression, type: Type): void {
  if (Type.isIntersection(type))
    return type.types.forEach(type => check(env, ast, type));
  ...

Now we can check overloaded function types, like:

(x => x) as ((x: number) => number) & ((x: string) => string)
(x => x) as ((x: number) => number) & ((x: string) => boolean)

When checking fails we get an error mentioning the part that failed.

Actual TypeScript doesn't handle this kind of example; for

const f: ((x: 7) => 7) & ((x: 9) => 9) = (x => x)

it produces

Type '(x: 7 | 9) => 7 | 9' is not assignable to type '((x: 7) => 7) & ((x: 9) => 9)'.
  Type '(x: 7 | 9) => 7 | 9' is not assignable to type '(x: 7) => 7'.
    Type '7 | 9' is not assignable to type '7'.
      Type '9' is not assignable to type '7'

It seems like TypeScript doesn't check the function against the type in this case, but instead synthesizes a type for the function under the assumption that x has type 7|9 (it undoes the rewrite we do in synthesis), then checks subtyping (or assignability in TypeScript terms). I don't know why!

Try it!

You can try out the type checker below. Click on an example button or type an expression in the top box. In the bottom box you'll see a trace of the type checker execution, ending in a synthesized type (or an error). The trace is a tree of function calls; click on a function call to expand the tree under that call, or mouse over a call to highlight the matching return value.

Callbacks passed to Type.map are labelled to match the enclosing function; and since they usually close over the subexpression AST, they're annotated with the AST for context. So the callback in synthBinary for an addition is labelled ...synthBinary[_ + _].

The plan

For the full code of part 5 see https://github.com/jaked/reconstructing-typescript/tree/part5. To view the changes between part 4 and part 5 see https://github.com/jaked/reconstructing-typescript/compare/part4...part5.

Next time we'll finally implement narrowing!

Please email me with comments, criticisms, or corrections.