Structural typing and algebraic data types in TypeScript

October 02, 2020

This was a small talk given internally at work. It was a lot of fun to put together, and I learned a lot about the topics myself. It was recorded, but the video includes co-workers' voices, names, and faces, so, to preserve their privacy, I'm not posting the video here.

Instead, here is a lightly edited transcript, created with whisper.cpp, interspersed with the slides and example code.

All right. Thanks, everyone, for coming. This is my dev review on structural typing and algebraic data types. These are some cool type system features and we're going to be looking at them using TypeScript.

For anyone who I haven't met in person at least or only in a stand-up or something, I'm Grant. I'm a software engineer here.

I'm going to be using TypeScript. I don't want this to turn into a TypeScript versus JavaScript debate, though we can totally digress into that after the presentation, if anyone wants to.

All right. So just some signposting.

We're going to talk about why am I even presenting about this. Then we're going to do a refresher on static types, and then we'll go to structural typing, and then to algebraic data types, and then a final example combining them.

So on each topic, I have like two slides and then a bunch of examples, and I'll probably pause for questions after the examples.

So yeah, why even talk about this?

Because like a lot of us, or at least a lot of my peers, I learned Java as my first programming language in college. Well, I learned a couple before that, but Java formally, and that was my first introduction to static typing.

I find that a lot of people have an impression of static typing as this very strict system that prevents the programmer from doing a lot of things, and a lot of people talk about fighting with the type system or fighting with a compiler.

These features, which are not features in Java, for example, but these features help make static type systems more flexible and more fun to work with while still retaining a lot of the benefits of them.

If you haven't seen these before, you probably will in the future, so it might be good to at least have some background knowledge of them, and I kind of learned about them recently.

And I think they're cool, so I thought I'd share.

Why am I using TypeScript for the examples?

Really, just because it's a language that a lot of us have at least some familiarity with. We've used it in some projects here, and a lot of people have at least seen it, or have seen JavaScript, in which case the syntax will look very familiar.

Importantly, TypeScript has both of these features that I want to show off.

Here are some other languages that have these features. The most interesting thing here is the only mainstream language that I could find that has structural typing, but not algebraic data types is Go, so that's just a fun piece of trivia.

So before we get into the fun features, the structural typing and the algebraic data types, we're going to refresh on static types.

I should say I assume that the audience has familiarity with programming languages in general, and at least has some exposure to static types, so we'll try to move through this one quickly, but of course, feel free to stop me and ask questions.

So static typing is a feature of compiled languages. Really the whole point is that at the compile stage, the compiler can tell you if you're wrong. It can catch mistakes, and the way that it does that is that, while you're coding, you explicitly define the types of things: what properties are going to be available on this variable, what properties you expect to be available on this parameter, this function, etc.

And by programming that in, the compiler can see that and then can tell you before you make a mistake. It'll catch it before you actually run the code, it'll catch it ahead of time, and prevent you from ever running the code with a mistake in the first place.

That's the reason static type systems exist in a nutshell. So we're going to have a couple examples just on static types.

class Cat {
  name: string;
  constructor(name: string) {
    this.name = name;
  }
  meow(): string {
    return `${this.name} says: meow`;
  }
}

const bella = new Cat("bella");
const stella = new Cat("stella");

bella.meow();
// Significance: Compiler will prevent this human error
// stella.bark();

So here I'm defining a type called Cat, and an instance of a Cat. Every Cat has a name, and every Cat knows how to meow(). Pretty straightforward.

And here I'm creating two Cats, bella and stella.

And so what's happening here is that I can call bella.meow(), and the TypeScript compiler can guarantee that this is okay because TypeScript knows that bella is a Cat, and TypeScript knows that every Cat knows how to meow, so it can guarantee me that this is okay.

On the other hand, I'm a human, right, so I'm coding and maybe like three layers of abstraction deep, I forget that stella is a Cat, and I try to get stella to bark(), TypeScript will yell at me. TypeScript will say that's not possible, it will give me this red squiggly line, and it will tell me that's not okay.

(I'm going to be hovering over things a lot as proof that TypeScript knows about something throughout these examples.)

And the significance of that is that the compiler knows this so the compiler can prevent me from making a mistake along these lines. So the compiler will not even let this code run because the compiler knows better than me. The compiler knows that stella cannot bark().

function meowAreYou(cat: Cat) {
  console.log(`${cat.meow()} are you?`);
};

meowAreYou(bella);
meowAreYou(stella);

Now just getting ever so slightly more complicated, we have a function called meowAreYou(), and this function takes one parameter, and we're saying this parameter has to be a Cat. So we're telling TypeScript this has to be a Cat.

And so we can of course call meowAreYou() on both bella and stella, and TypeScript knows that this is okay. It knows that this function requires a Cat, and it knows that bella is a Cat. So it allows us to do this, there's no red squiggly line. And same with stella. Great.

class Dog {
  name: string;
  constructor(name: string) {
    this.name = name;
  }
  bark(): string {
    return `${this.name} says: bark`;
  }
}

const fergie = new Dog("fergie");

// meowAreYou(fergie);

Now we have a new class, a new type called Dog. A Dog also has a name, but a dog doesn't know how to meow(); a dog only knows how to bark().

And so we can create a dog, and then we can try to pass this to the meowAreYou function.

But we can see that if we uncomment this, TypeScript knows better than us. TypeScript can tell us fergie is not a Cat. fergie does not know how to meow(), so you cannot pass it to this.

So TypeScript will prevent us from ever making this mistake.

All right, so next up, we have structural typing. And so in order to talk about structural typing, I want to have something to compare it against. And what I would consider the status quo of typing is nominal typing.

Nominal typing might be what you're used to. It's what Java has, for example. And so we're just going to walk through a scenario here, and we're going to pretend that we're the compiler.

The human has written a function that requires a Vehicle as a parameter. And so we as the compiler, we need to decide what counts as a Vehicle.

What are we going to allow the human to pass into this function?

Well, if the human tries to pass a Vehicle into this function, that's obviously okay. A Vehicle is a Vehicle. Okay, that's obvious.

What if they try to pass a Boulder? Well, the name Boulder is not the same as the name Vehicle. And we're a nominal typing compiler, so we're just going to look at that name. Well, that's not the same. So we're not going to let the human pass in a Boulder as a Vehicle.

Okay, what about a Truck? What if the human is passing in a Truck as a Vehicle? Well, the human explicitly told us that a Truck extends Vehicle. The human is saying that it does count as a Vehicle, and the human is guaranteeing that to us. Well, in that case, the human said a Truck is a Vehicle and we know that a Vehicle is a Vehicle, so we'll let the human pass in a Truck as a Vehicle.

That's kind of like a super simplified version of what nominal typing is.
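TypeScript itself is structurally typed, so it can't show nominal typing directly. As a rough sketch of the scenario above, one workaround is to give a class a private member: TypeScript only treats a private member as compatible when it comes from the same declaration, which makes the class behave roughly like a nominal type. The `park` function and the `kind` field are made up for illustration.

```typescript
// Emulating nominal typing in TypeScript. Assumption: this is a workaround,
// not how Java or other nominally typed languages are implemented.
class Vehicle {
  // A private member makes TypeScript compare this class by declaration,
  // not just by shape.
  private readonly kind: string = "vehicle";
  color: string;
  constructor(color: string) {
    this.color = color;
  }
  describe(): string {
    return `${this.color} ${this.kind}`;
  }
}

// The human explicitly says a Truck counts as a Vehicle.
class Truck extends Vehicle {}

// Same shape as Vehicle, but declared separately.
class Boulder {
  private readonly kind: string = "vehicle";
  color: string;
  constructor(color: string) {
    this.color = color;
  }
  describe(): string {
    return `${this.color} ${this.kind}`;
  }
}

// Hypothetical function that requires a Vehicle.
function park(v: Vehicle): string {
  return `parked a ${v.describe()}`;
}

console.log(park(new Vehicle("purple")));
console.log(park(new Truck("red")));

const boulder = new Boulder("gray");
console.log(boulder.describe());
// Rejected by the compiler: Boulder's private `kind` is a separate declaration.
// park(boulder);
```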

So what is structural typing? Here's a Wikipedia quote. But structural typing kind of goes one layer deeper.

Instead of just looking at the name, it actually looks at the properties that are required to be a Vehicle, and asks: does what the human is passing in have all of those required properties?

So I might actually look at that Boulder. And be like, well, the Boulder has all of the same fields as a Vehicle and has all the same methods as a Vehicle. So even though the name Boulder is not the same as the name Vehicle, I've looked at it as the compiler and I can tell that it's compatible, so I'm going to allow a Boulder to act as a Vehicle.
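Here is a minimal sketch of that Boulder scenario. The `move` method and the `travel` function are hypothetical, added only so that the Vehicle has a method as well as a field.

```typescript
// What the compiler requires of anything used as a Vehicle.
interface Vehicle {
  color: string;
  move(distance: number): string;
}

// Boulder never mentions Vehicle, but it has the same field and method.
class Boulder {
  color: string;
  constructor(color: string) {
    this.color = color;
  }
  move(distance: number): string {
    return `a ${this.color} boulder rolled ${distance} units`;
  }
}

// Hypothetical function that requires a Vehicle.
function travel(v: Vehicle, distance: number): string {
  return v.move(distance);
}

// Accepted: the structure matches, even though the names differ.
console.log(travel(new Boulder("gray"), 3));
```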

// Classic OOP example using structural typing for implicit
// polymorphism
type Vehicle = {
  color: string;
};

function drive(v: Vehicle, distance: number) {
  console.log(`driving ${distance} units in a ${v.color} vehicle`);
};

So here we're saying the type Vehicle is an object with a color. That's it. And we're defining a function here called drive(). And the first parameter of drive() has to be a Vehicle. Cool.

// Straightforward instantiation and use
const vehicle: Vehicle = {
  color: "purple",
};

drive(vehicle, 12);

So here's just the obvious straightforward case. We can explicitly create a Vehicle, color purple, and pass it in to drive().

Of course, this works. There is no red squiggly line. We've explicitly said this is a Vehicle. Of course, we can use it as a Vehicle.

// Objects with correct "structure" are valid
const obj = {
  color: "blue",
};

drive(obj, 10);

Now here, it starts to get a little bit more interesting. Okay. So here's an object. We have not told the compiler that this is a Vehicle. We've just said it's an object. And we've assigned it a literal that has a color that is a string. It's blue.

And so the TypeScript compiler looks at this object and sees, well, okay, it's not explicitly defined as a Vehicle, but it has a color. And that's all that a Vehicle needs. So yeah, we can totally use this as a Vehicle.

And there's no red squiggly line here. We can totally drive() this blue object.

// Objects with correct "structure" _and extra properties_ are valid
const obj2 = {
  color: "green",
  hasLeatherSeats: true,
};

drive(obj2, 10);

All right. One more step forward. This object is not explicitly defined as a Vehicle, but it has a color. So we know that works.

But then it also has this other thing. It has leather seats. And so this is just to show this extra stuff doesn't get in the way. TypeScript sees that it has the necessary property called color. And it kind of ignores this extra stuff. And TypeScript knows that we can use this as a Vehicle.
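One wrinkle worth knowing, which the talk doesn't cover: the "extra stuff is ignored" rule applies to values that already exist, like a variable. When an object literal is written directly at the call site, TypeScript additionally runs an excess property check and reports the unknown property, on the theory that it's probably a typo. A small sketch, repeating the Vehicle and drive() definitions so it stands alone (here drive() returns a string instead of logging, and `greenCar` is a made-up name):

```typescript
type Vehicle = {
  color: string;
};

function drive(v: Vehicle, distance: number): string {
  return `driving ${distance} units in a ${v.color} vehicle`;
}

// Assigned to a variable first: the extra property is ignored.
const greenCar = {
  color: "green",
  hasLeatherSeats: true,
};
console.log(drive(greenCar, 10));

// Written inline as a literal: TypeScript reports that `hasLeatherSeats`
// does not exist in type Vehicle, so this line is left commented out.
// drive({ color: "green", hasLeatherSeats: true }, 10);
```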

// Now for another type
// Note we do not say "extends Vehicle" anywhere
type Truck = {
  color: string;
  carryingCapacity: number;
};

function carry(t: Truck, load: number) {
  if (load > t.carryingCapacity) {
    console.log(`failed to carry ${load} units in a ${t.color} truck`);
  } else {
    console.log(`carrying ${load} units in a ${t.color} truck`);
  }
};

Okay. Now we have another type, a new type called Truck. Truck has a color just like Vehicle. And it also has a carrying capacity. And now we're not saying anywhere that Truck counts as a Vehicle.

But the Truck type does happen to have the same property, color, that a Vehicle requires. And now here we define a function that requires specifically a Truck.

const truck: Truck = {
  color: "red",
  carryingCapacity: 8,
};

carry(truck, 7);

So here's just the super straightforward case. Again, nothing interesting going on here. We're explicitly defining a Truck, and we can carry with it. There's no red squiggly line. It's just kind of to prove that the function is working and TypeScript is working.

// Is a Truck a Vehicle?
// Uncomment to find out
// drive(truck, 15);

Okay. Now here's a question. Is a Truck a Vehicle? Uncomment to find out.

Yes, a Truck is a Vehicle.

Just like with those objects above, the compiler sees that Truck has the required color property. And we can totally use this as a Vehicle.

// Is a Vehicle a Truck?
// Uncomment to find out
// carry(vehicle, 10);

And now is a Vehicle a Truck?

Well, no.

That's actually giving us a red squiggly line. And so why not? So in a nominal type system, this error might stop here. It would say a Vehicle doesn't count as a Truck.

But TypeScript looks at the definition of the Vehicle and it looks at the definition of a Truck and says, well, a Truck needs a carrying capacity. And a Vehicle does not have a carrying capacity. So it goes, it goes one step further and it tells you why, why does a Vehicle not count as a Truck?

So that's, in my mind, one of the nice advantages of structural typing: it goes that extra step and it just has more helpful error messages. It tells you: why does this Vehicle not count? Oh, it doesn't have a carrying capacity.
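If a Vehicle might really be a Truck at runtime, one way to convince the compiler is a user-defined type guard that checks for the missing property. This is a sketch and not part of the talk; `isTruck`, `carryIfPossible`, `purpleCar`, and `redTruck` are hypothetical names.

```typescript
type Vehicle = {
  color: string;
};

type Truck = {
  color: string;
  carryingCapacity: number;
};

// Hypothetical type guard: checks at runtime for the property a Truck needs.
function isTruck(v: Vehicle): v is Truck {
  return typeof (v as Partial<Truck>).carryingCapacity === "number";
}

function carryIfPossible(v: Vehicle, load: number): string {
  if (isTruck(v)) {
    // Inside this block, TypeScript treats v as a Truck.
    return load > v.carryingCapacity
      ? `failed to carry ${load} units`
      : `carrying ${load} units`;
  }
  return `a ${v.color} vehicle cannot carry anything`;
}

const purpleCar: Vehicle = { color: "purple" };
const redTruck: Truck = { color: "red", carryingCapacity: 8 };

console.log(carryIfPossible(purpleCar, 10));
console.log(carryIfPossible(redTruck, 7));
```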

This is the end of the structural typing section. You might hear my cats racing around in the background causing noise. All right.

Algebraic data types. These are the most fun, I think.

So what are algebraic data types from a high level? They allow you to combine existing types to create new types. They give you operations on types.

So one thing you might do is you'll combine a Cat and a Dog to create a CatDog type.

Or maybe slightly more practically, you'll combine a TextInput and a NumberInput to create a new type called GenericInput.

And I think there are some other kinds of algebraic data types, but the main ones are products and sums. So that's what we're going to talk about.

So the product type, I want you to think "AND" on this one.

The product of a Cat and Dog, we'll call it CatDog, has all of the fields of Cat and all of the fields of Dog. It is simultaneously a complete Cat and a complete Dog.

So not really like the cartoon. This is a little bit different.

And in TypeScript, this is called an intersection and the operator is a single ampersand.

Sum types, I want you to think "OR".

And so we're creating variants of the type. So if we take the sum of TextInput and NumberInput, we'll end up with a type that could be either a TextInput or a NumberInput.

And at some point, when you're processing things, you might need to know, okay, we have this thing that is either a TextInput or a NumberInput. At some point, you usually need to know, okay, which one is it? Is it a TextInput or is it a NumberInput? Because that matters at some point.

You distinguish these using a tag. And it's pretty straightforward. You'll see the examples.

And this is just kind of for context: Some languages have these tags built in. And you kind of don't need to worry about it. You'll just automatically be able to distinguish using a match pattern, or pattern matching, for example.

In TypeScript, we don't have that luxury. So in TypeScript, this is called an untagged union, which means that we need to do the tagging ourselves. And the operator is a single pipe.

// Product Types

type Mammal = {
  name: string;
  lungCapacity: number;
};
type OceanDweller = {
  name: string;
  favoriteOcean: string;
};

type OceanDwellingMammal = Mammal & OceanDweller;

All right. So just some, I think these are mostly straightforward examples.

Here's our product type down here. It's an OceanDwellingMammal. And it is the product of a Mammal and an OceanDweller. And I've defined these up here: a Mammal has a name and a lung capacity. An OceanDweller has a name and a favorite ocean. TypeScript is smart enough to notice that the name is the same in both of these things. So these are compatible.

So OceanDwellingMammal requires a name, a lung capacity, and a favorite ocean.

const aardvark: Mammal = {
  name: "aardvark",
  lungCapacity: 1,
};
// Is an aardvark an OceanDwellingMammal?
// const oceanAardvark: OceanDwellingMammal = aardvark;

const tuna: OceanDweller = {
  name: "tuna",
  favoriteOcean: "pacific",
};
// Is a tuna an OceanDwellingMammal?
// const airLovingTuna: OceanDwellingMammal = tuna;

All right. We'll move through these. These are pretty straightforward.

If we just have a Mammal, say an aardvark, TypeScript will prevent us from ever using it as an OceanDwellingMammal because it doesn't have a favorite ocean.

And we're already starting to see how the structural typing comes into play here in TypeScript, too, because it can tell us specifically this aardvark does not have a favorite ocean.

And similarly, tuna is not an OceanDwellingMammal because it's only an OceanDweller and it's not a Mammal.

const dolphin = {
  name: "dolphin",
  lungCapacity: 4,
  favoriteOcean: "atlantic",
};

// Is a dolphin an OceanDwellingMammal?
// const o: OceanDwellingMammal = dolphin;

And of course, just have the happy path.

Our dolphin has all three and it is a valid OceanDwellingMammal. There, TypeScript is not giving us any red lines.
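To make the "complete Mammal and complete OceanDweller" point concrete, here is a small sketch in which the same dolphin is passed to two functions that each ask for only one of the two types. The `breathe` and `swim` functions are hypothetical.

```typescript
type Mammal = {
  name: string;
  lungCapacity: number;
};
type OceanDweller = {
  name: string;
  favoriteOcean: string;
};
type OceanDwellingMammal = Mammal & OceanDweller;

// Hypothetical functions, each asking for only one of the two types.
function breathe(m: Mammal): string {
  return `${m.name} breathes with a lung capacity of ${m.lungCapacity}`;
}
function swim(o: OceanDweller): string {
  return `${o.name} swims in the ${o.favoriteOcean}`;
}

const dolphin: OceanDwellingMammal = {
  name: "dolphin",
  lungCapacity: 4,
  favoriteOcean: "atlantic",
};

// A product type is a complete Mammal and a complete OceanDweller,
// so both calls type-check.
console.log(breathe(dolphin));
console.log(swim(dolphin));
```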

// Sum Types

type SuccessfulString = {
  status: "success";
  value: string;
};

type Failure = {
  status: "fail";
  err: Error;
};

type StringResult = SuccessfulString | Failure;

function generateDateStringOnlyIfEven(): StringResult {
  const now = new Date();
  if (now.getMilliseconds() % 2 === 0) {
    return {
      status: "success",
      value: now.toISOString(),
    };
  } else {
    return {
      status: "fail",
      err: new Error("not an even millisecond"),
    };
  }
}

Cool. So now I'm going to move on to sum types.

And one place where sum types become useful is in results that could succeed or fail. And so the result of a function could be two totally different things. But we don't know ahead of time.

So here I'm defining a sum type, which I'm calling StringResult, which could be either a SuccessfulString or a Failure. And so here I've defined these two: SuccessfulString and Failure.

And I'm using the status property here. This is the tag that I was talking about earlier. And this is what I mean by saying we have to manually tag these in TypeScript. I have to explicitly say that a SuccessfulString has a status of success and a Failure has a status of fail.

Notice that everything else about these two types is completely different. A SuccessfulString has a value that is a string and a Failure has an error.

Cool. So now we can use this when writing a function. We have an unnecessarily opinionated function here that refuses to give you a date if the number of milliseconds is odd, because why not?

And so we've told TypeScript up front that we're writing a function that's going to return a StringResult.

And then if the number of milliseconds is even, we can return one variant, a SuccessfulString. And TypeScript has looked at the structure of this object literal that we're returning. And it can tell that this is a valid SuccessfulString. And it knows that a SuccessfulString is one of the options for StringResult. So TypeScript allows us to do this.

Similarly, if the number of milliseconds is odd, we want to fail. So we return status fail with an error. And TypeScript can see the structure and knows that it counts as a Failure, and that Failure is one of the valid variants for StringResult, which is what we want to return. So TypeScript is totally okay with this entire function.
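The name SuccessfulString hints that the same pattern works for other value types. One possible generalization, not shown in the talk, is to make the success variant generic. The names `Success`, `Result`, and `parsePort` below are hypothetical.

```typescript
type Success<T> = {
  status: "success";
  value: T;
};

type Failure = {
  status: "fail";
  err: Error;
};

// Sum type over any value type T.
type Result<T> = Success<T> | Failure;

// Hypothetical example: parse a TCP port number from text.
function parsePort(text: string): Result<number> {
  const port = Number(text);
  if (Number.isInteger(port) && port >= 1 && port <= 65535) {
    return { status: "success", value: port };
  }
  return { status: "fail", err: new Error(`invalid port: ${text}`) };
}

const good = parsePort("8080");
const bad = parsePort("eighty");

console.log(good.status);
console.log(bad.status);
```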

const s = generateDateStringOnlyIfEven();

if (s.status === "success") {
  console.info(s.value);
} else {
  console.error(s.err);
}

But now, and now here's the key part of this sum type business is when we want to actually use it. We call this function and we end up with a StringResult.

When we get this, all we know is that s is a StringResult. We don't know anything else about it. So the only thing that we know we can access on it is status. And TypeScript can even tell you it's going to be either "success" or "fail". But notice it's not autocompleting value or err or anything else here, because all we know so far is that it's a StringResult.

Now, if we check the tag, if we say, well, if the status is equal to success, well, then TypeScript is smart enough to know which variant we have. So now TypeScript knows that inside, only inside of this if block, s is a SuccessfulString. And so it can guarantee to us that we can access the value property on the SuccessfulString.

And because there's only one other variant of StringResult, TypeScript can guarantee to us that in this else block, the s must be a Failure. So it can guarantee us that we can access the error property on this Failure.
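The same narrowing works with a switch statement. A common idiom (an addition here, not something from the talk) is to assign the value to `never` in the default branch, so that the compiler complains if a new variant is added later and not handled. The `describeResult` function is a hypothetical helper.

```typescript
type SuccessfulString = {
  status: "success";
  value: string;
};

type Failure = {
  status: "fail";
  err: Error;
};

type StringResult = SuccessfulString | Failure;

function describeResult(r: StringResult): string {
  switch (r.status) {
    case "success":
      // r is a SuccessfulString here.
      return `ok: ${r.value}`;
    case "fail":
      // r is a Failure here.
      return `error: ${r.err.message}`;
    default: {
      // Every variant is handled above, so r has type never here.
      // Adding a third variant to StringResult would turn this
      // assignment into a compile error.
      const unreachable: never = r;
      return unreachable;
    }
  }
}

console.log(describeResult({ status: "success", value: "2020-10-02" }));
console.log(describeResult({ status: "fail", err: new Error("odd") }));
```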

// Combining everything to do some form processing

type FieldCommon = {
  label: string;
  description: string;
};

type TextField = FieldCommon & {
  type: "text";
  defaultValue: string;
  multiline: boolean;
};
type NumberField = FieldCommon & {
  type: "number";
  defaultValue: number;
  min: number;
  max: number;
};
type CheckboxField = FieldCommon & {
  type: "checkbox";
  defaultValue: boolean;
};

type Field = TextField | NumberField | CheckboxField;

type Form = {
  title: string;
  fields: Field[];
};

All right. So here, I tried to combine these things and take it away from ultra contrived examples to just somewhat contrived examples.

And now we're going to look at a form and it's just going to combine a lot of what we talked about.

So a TextField right here. This is a product type. This has all of the properties of what I'm calling a FieldCommon. And all of these properties. So a TextField has a label, a description, a type which is a constant string text. It also has a default value, which is a string and a multiline boolean. TextField has all those things.

A NumberField similarly has all the fields of FieldCommon and a couple other things. And the same with a CheckboxField.

And now we can have a sum type just called Field, which might be either a TextField or a NumberField or a CheckboxField.

And then we can just put all those into something to wrap them and we'll call that a Form. So a Form has a title and a list of fields.

So that's already kind of cool. We've described a pretty complex data structure using product types and sum types.

function renderForm(form: Form): string {
  let html = "";
  html += `<h1>${form.title}</h1>\n\n`;

  for (const field of form.fields) {

    html += "<hr>\n";
    html += `<label>${field.label}</label><br>\n`;

    if (field.type === "checkbox") {
      html += `<input type="checkbox" ${field.defaultValue ? "checked" : ""}/><br>\n`;
    } else if (field.type === "number") {
      html += `<input type="number" min="${field.min}" max="${field.max}" value="${field.defaultValue}"/><br>\n`;
    } else if (field.type === "text") {
      if (field.multiline) {
        html += `<textarea>${field.defaultValue}</textarea><br>\n`;
      } else {
        html += `<input value="${field.defaultValue}" /><br>\n`;
      }
    }

    html += `<span>${field.description}</span><br>\n\n`;
  }

  html += "<hr>";

  return html;
}

And now I can show you some code that actually uses this.

So let's say we want to render this form--

And I should say upfront: I am creating HTML with string concatenation here. So don't try this at home. It's not fun.

But in this function, we can iterate over these fields. And inside of this for loop, out here, all TypeScript knows is that this field is a Field. So it doesn't know what kind of field it is. But it knows that all of the kinds of fields, all of those variants, they all have a label. So we can go ahead and use that here, even though we only know this is a Field.

Then we can check the tag, the type. And so inside of this if block, we know that a field is a CheckboxField. And this line isn't particularly interesting.

But the next line is more interesting. Inside of this block, we know that a field is a NumberField. So we can go ahead and access the min and the max of this field. And TypeScript can guarantee to us that those will be there.

And similarly for the TextField inside of this if block, TypeScript knows that a field is a TextField. So we can go ahead and access the multiline property.
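The tag is also handy outside of if blocks. As a sketch, a type guard passed to the array's filter() method lets TypeScript know that the resulting array holds only one variant. The field types are repeated from above so the example stands alone; `isNumberField` and the sample data are hypothetical.

```typescript
type FieldCommon = { label: string; description: string };
type TextField = FieldCommon & {
  type: "text";
  defaultValue: string;
  multiline: boolean;
};
type NumberField = FieldCommon & {
  type: "number";
  defaultValue: number;
  min: number;
  max: number;
};
type CheckboxField = FieldCommon & { type: "checkbox"; defaultValue: boolean };
type Field = TextField | NumberField | CheckboxField;

// Hypothetical type guard that checks the tag.
function isNumberField(field: Field): field is NumberField {
  return field.type === "number";
}

const fields: Field[] = [
  {
    label: "Height",
    description: "In feet.",
    type: "number",
    defaultValue: 23,
    min: 20,
    max: 50,
  },
  {
    label: "Agree?",
    description: "Yes or no.",
    type: "checkbox",
    defaultValue: true,
  },
];

// filter() with a type guard returns NumberField[], so min and max
// are available on every element without another check.
const numberFields: NumberField[] = fields.filter(isNumberField);
const ranges = numberFields.map((f) => `${f.label}: ${f.min} to ${f.max}`);
console.log(ranges);
```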

const htmlForm = renderForm({
  title: "Survey: Giraffes, too tall?",
  fields: [
    {
      label: "How many feet tall is the average giraffe?",
      description: "You don't have to be correct, just guess.",
      type: "number",
      defaultValue: 23,
      min: 20,
      max: 50,
    },
    {
      label:
        'Do you agree with the following statement: "Zebras\' greatness are overshadowed by Giraffes."',
      description: "No pun intended.",
      type: "checkbox",
      defaultValue: true,
    },
    {
      label: "Describe your general attitude towards giraffes.",
      description:
        "Just remember that they will always block your view at a concert.",
      type: "text",
      defaultValue: "They're pretty tall.",
      multiline: true,
    },
  ],
});

console.log(htmlForm);

And then, just to show the other side of it, we can create a kind of, and this is another place where the structural typing comes into play, I've just created here a huge literal object that is a totally statically typed Form object.

And it's just passed directly into renderForm. So it has a title and it has a list of fields. And each field is statically typed with a label, a description, a type, and then other values.

And so just to show you what this looks like, we'll add something and you can see what the type system can do for you while you're coding.

With nothing in it, obviously it's not a valid field. So we might add a label and a description. It can give us this autocomplete because it knows that no matter what, any field has a description.

And then maybe we'll do type. And now it knows that this object has to be one of these three different variants. And it knows that this type field has to be one of these three different things. So we'll choose number.

Now that we've defined that, well, it's a little hard to see, not the prettiest error message, but it can tell us that, as it currently exists, it's not assignable to type NumberField. Because it knows that at this point, we've defined the tag "number". So it knows that we're trying to create a NumberField. And so it can tell us what's wrong. And it's a little hard to read, but it says we're missing the following properties: defaultValue, min, and max.

So it won't let us finish yet and it makes us do the default value. And after I finish this, it stops complaining to us, because this is now a correct NumberField. And the static type system knows that. So it will allow this to run.

And that's about it! That's all I had prepared.

I'll send out the presentation and all the code that I showed. You can run all of it. It all runs. So you can verify that I'm not lying to you. But yeah, that's all I had. Thanks everyone for coming.