From-scratch compiler for embedded systems language

https://github.com/orndorffgrant/lang808c

Post backdated to the time of the project.

This was a course project for my Master's Degree in Computer Engineering at the University of Maryland.

In Fall 2023, I was the only one who signed up for the night class on compilers. As a result, the class became an independent study.

Anyone who has been in this situation before knows that it is a blessing and a curse. On the one hand, I had a lot of freedom to choose what I wanted to do. On the other hand, I had to learn everything on my own. For learning, I read the Dragon Book.

You can read below for an overview, or watch my video presentation of the project. Note that the video presentation was a course deliverable, so I make a point in it to over-emphasize the "from-scratch" nature and the novelty of the features.

Scope

I chose to write a compiler totally from scratch for a simple language that I made up. The language was designed to be simple enough to write a compiler for in a semester, but also have some unique features. Specifically, the programming language supports novel features that make writing firmware for ARM Cortex-M chips easier. For demonstration and feasibility purposes, the compiler is tailored for generating executable code for a specific chip, the SAMD21.

The course number was 808, so I called the language lang808 and the compiler lang808c. I would have preferred to write the compiler in any language that has decent string manipulation and data structures built-in, but I was required to write it in C. Since I was writing it in C, I had some fun with memory management. Each type of object is kept in a static contiguous stack-allocated arena. This didn't scale particularly well, but it was fun to play with and good enough for this project (and there wasn't a single bug from misuse of malloc or free).
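To make the per-type arena idea concrete, here is a minimal sketch in C. It is not taken from lang808c: the AstNode type, the capacity, and the function names are hypothetical, and it assumes the pool lives in static storage and hands out indices rather than pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object type; the real compiler's node layout is not shown here. */
typedef struct {
    int kind;
    int first_child; /* index into the same arena, -1 if none */
} AstNode;

#define AST_NODE_CAPACITY 1024

/* One fixed-size, contiguous pool per object type (assumed static storage). */
static AstNode ast_nodes[AST_NODE_CAPACITY];
static size_t ast_node_count = 0;

/* Hands out the next free slot, or returns -1 when the arena is full. */
static int ast_node_alloc(int kind) {
    if (ast_node_count >= AST_NODE_CAPACITY) {
        return -1;
    }
    int index = (int)ast_node_count++;
    ast_nodes[index].kind = kind;
    ast_nodes[index].first_child = -1;
    return index;
}

/* Looks up a node by index; only indices already handed out are valid. */
static AstNode *ast_node_get(int index) {
    assert(index >= 0 && (size_t)index < ast_node_count);
    return &ast_nodes[index];
}

/* Releases every node at once; there is no per-object free. */
static void ast_nodes_reset(void) {
    ast_node_count = 0;
}
```

Because nothing is ever freed individually, there is no malloc/free pairing to get wrong. The trade-off is the fixed capacity, which is one way such a scheme can fail to scale.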

The compiler is truly "from-scratch". Upon execution, lang808c will:

That hex file can then be flashed directly to a SAMD21 chip using a tool like OpenOCD.

Cool features

lang808 contains three special features for ARM firmware development in addition to basic imperative programming support.

The first special feature is a keyword and structure to define a hardware peripheral on an ARM chip that is mapped to a piece of memory. Here is an example with inline comments explaining the feature.

# This is a snippet of the real peripheral definition of the Real Time Clock
# (RTC) for the SAMD21 chip.
# The "MemoryMappedPeripheral" keyword tells the predictive parser that what
# follows is a hardware peripheral definition.
# "RTC" is the name of this peripheral.
# "@0x40001400" tells the compiler that this peripheral's registers are mapped
# to memory with a base address at 0x40001400.
# "!3" tells the compiler that there is an interrupt associated with the RTC
# and that its interrupt vector is the 3rd in the ARM Vector table.
MemoryMappedPeripheral RTC @0x40001400 !3 {
    # Inside the braces, the individual registers of the peripheral are defined.
    # This first register is named "ctrl" and it is given the type "BitField16"
    # "BitField16" means that it is 16 bits long and contains multiple fields
    # within it.
    ctrl: BitField16 {
        # Inside the braces after "BitField16", the individual fields of the
        # register are defined.
        # This first field is named "swrst" and it is 1 bit long.
        swrst: 1;
        enable: 1;
        # A field within a BitField can also have enumerated values.
        # The "mode" field here is defined as a "BitEnum2", which means it is
        # 2 bits long, and can have the following enumerated values.
        mode: BitEnum2 {
            # Inside the braces after "BitEnum2", the individual enumerated
            # values are defined.
            # This first value is named "count32" and is the value 0.
            # The values for "mode" can only be referenced by their name as
            # specified here.
            count32 = 0x0;
            count16 = 0x1;
            clock = 0x2;
        };
        # "$unused" is a special keyword when there are bits that are unassigned
        # or reserved and shouldn't be touched by the firmware.
        $unused: 3;
        matchclr: 1;
        prescaler: BitEnum4 {
            div1 = 0x0;
            div2 = 0x1;
            div4 = 0x2;
            div8 = 0x3;
            div16 = 0x4;
            div32 = 0x5;
            div64 = 0x6;
            div128 = 0x7;
            div256 = 0x8;
            div512 = 0x9;
            div1024 = 0xa;
        };
        $unused: 4;
    };
    intenset: BitField8 {
        comp0: 1;
        $unused: 5;
        syncrdy: 1;
        ovf: 1;
    };
    # The "$unused" keyword also works for registers
    $unused: u16;
    # This register in the RTC peripheral is simpler than the previous one.
    # It is named "count" and is just a 32 bit integer.
    count: u32;
    comp0: u32;
}
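A definition like this gives a compiler enough information to derive the bit position and mask of every field. The following C sketch shows one way to do that. It assumes fields are packed from bit 0 upward in declaration order, which matches the SAMD21 layout of the RTC control register but is an assumption about lang808c itself. FieldDecl, field_shift, and field_mask are hypothetical names, and the trailing "$unused" padding is left out because it does not affect any offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one field of a BitField register. */
typedef struct {
    const char *name; /* NULL stands in for "$unused" */
    unsigned width;   /* width in bits */
} FieldDecl;

/* The "ctrl" fields in declaration order, with widths from the listing above. */
static const FieldDecl ctrl_fields[] = {
    {"swrst", 1}, {"enable", 1}, {"mode", 2},
    {NULL, 3},    {"matchclr", 1}, {"prescaler", 4},
};

/* Bit offset of a field: the sum of the widths of all fields declared before it. */
static unsigned field_shift(const FieldDecl *fields, size_t index) {
    unsigned shift = 0;
    for (size_t i = 0; i < index; i++) {
        shift += fields[i].width;
    }
    return shift;
}

/* Mask that covers the field in place within the register. */
static uint32_t field_mask(const FieldDecl *fields, size_t index) {
    assert(fields[index].width > 0 && fields[index].width < 32);
    uint32_t ones = (1u << fields[index].width) - 1u;
    return ones << field_shift(fields, index);
}
```

Under that assumption, "mode" (index 2) lands at bits 3:2 with mask 0x000C, "matchclr" at bit 7, and "prescaler" at bits 11:8 with mask 0x0F00.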

After this project, I learned about devicetree, which already has a nice syntax for defining these kinds of things. If I were to do this again, I might use devicetree instead of rolling my own syntax.

The second special feature of lang808 is the peripheral initialization support. It allows a simple specification of how hardware peripherals should be initialized when the device is reset. Here is an example with inline comments explaining the feature.

# The "initialize" keyword tells the compiler that what follows is an
# initialization block for a particular hardware peripheral.
# "RTC" tells the compiler that the initialization block is for the RTC
# peripheral.
initialize RTC {
    # Inside the braces, we can have any number of assignments to registers
    # defined in the "RTC" peripheral.
    # This first assignment assigns a 0 to the "count" register.
    count = 0x0;
    comp0 = 0x4000;
    # "intenset" is defined as a BitField8 with fields, so we use a different
    # syntax with braces to assign a value to it.
    intenset = {
        # Inside these braces we can assign values to any of the fields within
        # the register. We don't have to include every field. Any field that
        # is not given a value explicitly gets the value 0.
        # Here we only give the "comp0" field the value of 1, and all other
        # fields get the value 0.
        comp0 = 1;
    };
    ctrl = {
        enable = 1;
        # Here is an example of using the name of an enumerated value of a
        # field within a register.
        # "count32" is defined as having the value 0 in the "BitEnum"
        # definition of "mode".
        mode = count32;
        matchclr = 1;
    };
}
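To illustrate what the braces syntax boils down to, the C sketch below packs the field assignments from the block above into single register values. It is only a sketch: the bit positions assume fields are laid out from bit 0 upward in declaration order, and pack_field and the two *_init_value functions are hypothetical helpers, not code from lang808c.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions, assuming fields are packed from bit 0 in declaration order. */
enum {
    CTRL_ENABLE_SHIFT = 1,
    CTRL_MODE_SHIFT = 2,
    CTRL_MATCHCLR_SHIFT = 7,
    INTENSET_COMP0_SHIFT = 0
};

/* Enumerated value of "mode" from the peripheral definition. */
enum { CTRL_MODE_COUNT32 = 0x0 };

/* Writes `value` into the `width`-bit field at `shift`, leaving other bits alone. */
static uint16_t pack_field(uint16_t reg, unsigned shift, unsigned width,
                           uint16_t value) {
    uint16_t ones = (uint16_t)((1u << width) - 1u);
    uint16_t mask = (uint16_t)(ones << shift);
    assert((value & ~(unsigned)ones) == 0); /* value must fit in the field */
    return (uint16_t)((reg & ~mask) | ((unsigned)value << shift));
}

/* ctrl = { enable = 1; mode = count32; matchclr = 1; } */
static uint16_t rtc_ctrl_init_value(void) {
    uint16_t reg = 0; /* fields that are not mentioned stay 0 */
    reg = pack_field(reg, CTRL_ENABLE_SHIFT, 1, 1);
    reg = pack_field(reg, CTRL_MODE_SHIFT, 2, CTRL_MODE_COUNT32);
    reg = pack_field(reg, CTRL_MATCHCLR_SHIFT, 1, 1);
    return reg;
}

/* intenset = { comp0 = 1; } */
static uint8_t rtc_intenset_init_value(void) {
    return (uint8_t)pack_field(0, INTENSET_COMP0_SHIFT, 1, 1);
}
```

With these assumptions, the "ctrl" assignment collapses to the constant 0x0082 and the "intenset" assignment to 0x01, so each register needs only one store at reset.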

The third and final special feature of lang808 is the ability to easily configure an interrupt vector with a custom function. Here is an example with inline comments explaining the feature.

# The "on_interrupt" keyword tells the compiler that what follows is a function
# that should be configured as the interrupt vector for the interrupt
# associated with a particular hardware peripheral.
# "RTC" tells the compiler that the interrupt is for the RTC peripheral. The
# compiler knows that the RTC's interrupt is 3rd in the ARM Vector table from
# the peripheral definition.
# The compiler will automatically add this function to the ARM Vector table in
# the correct position.
# The compiler will also automatically add initialization code to the reset
# function that enables this interrupt in the ARM Nested Vector Interrupt
# Controller (NVIC).
on_interrupt RTC {
    # The contents of the interrupt are a normal function, so you can do
    # anything you would normally want to do, including use local variables,
    # call functions, use if statements, etc.
    u32 curr_comp = 0;
    u32 next_comp = 0;

    blinkLED();

    curr_comp = RTCMode0.comp0;

    if (going_faster == 1) {
        next_comp = curr_comp - 0x200;
    } else {
        next_comp = curr_comp + 0x200;
    }

    if (next_comp == 0x4000) {
        going_faster = 1;
    }
    if (next_comp == 0x1000) {
        going_faster = 0;
    }

    RTCMode0.comp0 = next_comp;

    RTCMode0.intflag = {
        comp0 = 1;
    };
    
    # At the end of an on_interrupt function, the compiler inserts code
    # that automatically clears the interrupt in the NVIC.
}
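The bookkeeping behind "add this function to the vector table" and "enable this interrupt in the NVIC" is small. The C sketch below assumes that the "!3" in the peripheral definition is the external interrupt (IRQ) number. On Cortex-M parts, the first 16 vector table entries are the initial stack pointer and the core exceptions, so IRQ n sits at entry 16 + n. The helper names are hypothetical and this is not the code lang808c emits.

```c
#include <assert.h>
#include <stdint.h>

/* Cortex-M NVIC registers: interrupt set-enable and clear-pending. */
#define NVIC_ISER_ADDR 0xE000E100u
#define NVIC_ICPR_ADDR 0xE000E280u

/* Entries 0..15 hold the initial stack pointer and the core exceptions. */
#define CORE_EXCEPTION_SLOTS 16u

/* Byte offset of an IRQ's handler address within the vector table. */
static uint32_t irq_vector_offset(unsigned irq) {
    assert(irq < 32); /* one 32-bit NVIC register covers IRQs 0..31 */
    return (CORE_EXCEPTION_SLOTS + irq) * 4u;
}

/* Bit to write to NVIC_ISER to enable the IRQ, or to NVIC_ICPR to clear
 * its pending state. Both registers are write-1-to-act. */
static uint32_t irq_nvic_bit(unsigned irq) {
    assert(irq < 32);
    return 1u << irq;
}
```

Under that reading, the RTC handler address would go at byte offset 0x4C of the vector table, and the reset code would write 0x8 to the set-enable register.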

Some of the more mundane things that the compiler supports are:

Examples and source code

I include three example programs in the repository that demonstrate the language's features on the SAMD21 Xplained Pro board.

You can view all of the source code for lang808c on GitHub.