Introducing Bluefin - A statically-typed reflective programming language

February 24, 2022 · 8 min read

College Junior

Introduction#

After my experience working on the Hermes JavaScript interpreter during my time at the MLH fellowship, I was curious to see what it would be like to develop my own language spec and interpreter. Once fall semester ended, and I started to get bored during winter break, I started work on Bluefin, a simple interpreter in C++. Since December, I have been reading up on computer language concepts, and have patched together a basic interpreter when not too busy with school.

I hope to make this blog a place where I can provide information and tips about building interpreters, and also keep an update of how Bluefin is going. Obviously, I'm already a few months and many hours down the road, so I'll try to use this post to catch up on all thats happened over the past 64 commits.

Starting off#

How to get started#

When I just started I really wasn't sure where I could find information, help, and guidance. Computer languages are pretty complex, and it was hard to find tutorials on even basic aspects of them. Luckily I stumbled upon a few really helpful resources (listed below) that helped me in the right direction. The one resource that I can highly recommend is Ruslan Pivak's blog. Not only does Ruslan go into detail into building an interpreter from scratch, he also provides countless resources to dive further and learn more about implementation patterns.

From there, I got started coding. I learned and utilized CMake, set up a git repo, and got to coding. One thing Ruslan's blog does that I think is fantastic is starting by implementing a simple calculator. This way, you can develop the lexer, parser, and interpreter for your language while working on a smaller but super useful problem.

Helpful resources#

Ruslan's Blog: a great introduction to writing interpreters

https://ruslanspivak.com/lsbasi-part1/

Fundamental Concepts in Programming Languages: An introduction to a more mathmatical and theoretical understanding of computer languages.

https://github.com/awh/compsci-papers/blob/master/Fundamental%20Concepts%20in%20Programming%20Languages.pdf

Language Implementation Patterns: A useful reference with great explainations of general language topics that go beyond interpreters. Also has nice projects. Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages - Terence Parr

The Syntax of C in BNF: This details how to parse C-style languages and was super useful for me.

https://cs.wmich.edu/~gupta/teaching/cs4850/sumII06/The%20syntax%20of%20C%20in%20Backus-Naur%20form.htm

The DLang website: Walking through this website inspired me and helped answer some questions about what a computer language should aspire to be.

https://dlang.org/overview.html

Bluefin Objectives#

After looking at other language features, watching some conference presentations, and thinking about what I want out of a language, I brainstormed some ideas for how the development of Bluefin should be guided. Taking inspiration from DLang, I made an outline of the design goals for Bluefin.

Bluefin is developed with a simple set of development goals that outline the intentions of the project into the future. These main ideas are summarized below.

Design goals of Bluefin

Enable and mandate type-safe programming practices.
Make writing code fast and simple to understand.
Allow the code to be self-documenting through readable syntax and strict typing.
Encourage memory-safe code by abstracting memory allocation for the user.
Allow an easy transition from C, C++, or other similar languages by adopting C-like syntax.
Provide a segmentable workflow that prioritizes iterative practices
Support easy and intuitive object-oriented programming primitives.
Use a context-free grammar that is intuitive and performant.
Do not sacrifice performance.
Provide powerful abstraction through introspection.
Ensure Bluefin is open source and accessible.

Each of these goals was crafted to support the fundamental goal of Bluefin, to provide a performant, educational, and syntactically powerful language. The importance of performance to the language is represented in points (8,9,11). In not sacrificing on performance, Bluefin scales to large projects and complex computational tasks. However, usability is incredibly important to Bluefin’s position as an educational computer language. Ease of access and programming, summarized in points (1,2,3,4,5,6,7,8), is guaranteed through maintaining recognizable syntax while simplifying cumbersome aspects of traditional languages. Moreover, by a computer languages Occam’s Razor doing things the best way is always easier than doing them the wrong way. Abstractly, Bluefin can be used in unique ways to other similar languages through iterative code blocks and introspection (10). This transition in development workflow gives users more power over how they want to develop. Finally, Bluefin is organized to give the developer extreme power and control over their code. Through layered abstraction and runtime introspection, Bluefin looks to empower developers in their environments.

General tips#

Now it's time for some general advice for anyone looking to get into writing computer languages. Obviously I'm also a rookie so this isn't a definitive guide, but sometimes it helps to hear from someone who just went through the process.

(1) Maintain documentation for your grammar#

Building up a language from scratch requires tons of iteration. One way to help clarify your progress is by maintaining a living language spec document. Keeping this file, probably in EBNF or BNF, is super useful for debugging and understanding your parser. For this I relied a lot on both the C-language spec and Ruslan's blog.

Once you have this spec, it's easy to match up the rules of the grammar to the procedures in your parser. Additionally, if you ever want to use a parser-generator instead of writing a parser for yourself, this grammar will help.

A short example from Bluefin:

unary_expression = primary_expression                 | unary_operator primary_expression
primary_expression = indentifier                    | constant                    | "(" assignment_expression ")"
assignment_expression = additive_expression                      | unary_expression assignment_operator assignment_expression

(2) Start with a basic calculator#

I already brought this up in the introduction, but starting by writing a basic calculator is an awesome way to start. A super simple calculator doesn't require rigorously splitting your code into a lexer, parser, and interpreter, but starting from a super basic level and refactoring helps realize the importance of these design patterns.

It can also help with your leetcode skills!!

(3) Avoid premature abstraction#

While writing an intrepreter it is easy to get carried away with trying to support all the possible abstractions at each step of the process. For example, it's easy to want to support many primative data types after figuring out the basics of type-handeling and interpreting. However, this race towards abstraction seems to cause unnecessary code complexity for features that can feasible be easily pushed down the road. Currently, Bluefin only supports int and double data types as adding more would only make the code more messy without really improving its underlying functionality.

Avoiding abstraction helps keep your codebase clean and managable, and places importance on developing language features over adding support for all usecases too soon.

(4) Be careful about how data is shared#

In Bluefin, I have objects for the Lexer, Parser, and Interpreter. Dealing with all these abstractions it becomes quite tricky to pin down which instance owns which data. It's important to slowly comb through your code to avoid unnecessary copies and to pass data between objects efficiently.

Bluefin today#

Project setup#

Bluefin is written in C++, built using make and CMake, ran in Docker on Debian, tested with GTest using Git modules, formatted with clang-format, and deployed on Github with CI provided by GitHub Actions.

Features#

Currently, Bluefin can support some basic programs using C-style syntax. I've implemented expression parsing, declaration, and type checking. Here's an example of a syntactically valid Bluefin program to date.

{    int x = 10 % 4 + 3;     /* x = 5 */    double y = x * 2;       /* y = 10 */}

Bad memory management#

One glaring problem still in my pile of TODO's is fixing up the many memory leaks in the parsing and intepreting phases. While constructing the AST, I currently initialize nodes as raw pointers because I need to use these pointers to be able to up and downcast between ASTNodes and ASTNode instances (BinOpNode). I was honestly quite surprised that downcasting in C++ was this difficult and decided to leak memory in the short term to avoid adding too much complication. Soon, I hope to refactor the code with propper pointer and memory management to solve this problem.

Adding CI#

Most recently, I worked through adding CI for building, testing, and providing cover coverage for Bluefin. Testing is really important to maintaining projects as they grow, especially in an interpreter that is constantly getting refactored.

Using GitHub actions, I used the base CMake CI profile and then adapted it to my project. This involved:

Adding a submodule flag so the action imported Gtest
Adding flags to my CMake build so that it could selectively generate code coverage
Running tests using ctest
Installing and running lcov to generate coverage reports
Displaying that coverage statistics using github-actions-report-lcov@1

More info about how I set up the CI is available in my config doc.

Signing off#

The immediate future#

The next step for Bluefin is to add semantic analysis. I'm excited because I think this feature is a big step and gets Bluefin closer to evaluating control statements!