All good computer science departments offer a compilers course, but relatively few make it a required part of the undergraduate curriculum. This post answers the question: Why should you take this course, even if you never plan on writing a compiler?
One of the reasons I’m writing this post is that although I enjoyed the compilers class I took as an undergrad, I had a hard time seeing the practical use. Most of the material seemed either obvious or esoteric. (As a matter of fact there are still wide areas of the compilers literature that I find totally uninteresting.) Anyway, it took several years before I pieced together the actual reasons why this kind of course is broadly useful. Here they are.
Serious programmers have to understand parsers and interpreters because we end up writing little ones all the time. Every time you make a program extensible or deal with a new kind of input file, you’re doing these things. The extreme version of this claim is Greenspun’s Tenth Rule:
> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
Given that we spend so much time writing these things, we can either do each one in a one-off, hacky way, or we can bring 60 years of theoretical and practical knowledge to bear on the problem, and do it right. The important things to know are: When should you borrow existing code or use an existing tool? When does the theory have something to offer? What principles of language design can be brought to bear on our daily little languages?
A compiler is supposed to correctly translate every valid program in its input language. To meet this goal the compiler developers must understand the entire input language, including corner cases never seen by normal programmers. This understanding is an important step towards seeing programming languages as they really are, as opposed to seeing them as they are usually written. For example, my understanding of the C language changed entirely after I learned the details of sequence points, undefined behaviors, and the usual arithmetic conversions. These concepts are second nature to C compiler writers, but largely unknown to beginning and intermediate programmers. It’s not an exaggeration to say that you’ll think about a language quite differently, and a lot more accurately, once you see how the sausage is made. This applies to any programming language, but particularly to the more semantically unclean ones like C and C++.
By understanding a compiler, you’ll end up with a very clear idea about which optimizations are in-scope for a compiler, and also which ones it cannot do, no matter how plausible and simple they seem. You’ll learn what kinds of code constructs commonly block optimization, why this happens, and what to do about it. You’ll learn why some of the world’s most excellent optimizations, such as an FIR filter that uses half of the register file to cache filter coefficients and the other half to cache samples, are unlikely to be implemented by any general-purpose optimizer. You and your favorite compiler are a team working together to create fast code; you can cooperate with it in an effective way, or you can fight against it with premature optimization and other silly tricks.
Compiler backends are intimately connected to their target architectures, and of course modern architectures are not remotely intended to be friendly targets for human assembly language programmers. By understanding a compiler backend and why it generates the code that it does, you’ll arrive at a better operational understanding of computer architectures.
Compilers (ideally) have three parts:
- language-dependent frontend (parsing, type-checking)
- language- and target-independent middle end (optimization)
- target-dependent backend (code generation)
In this post I’ve tried to argue that understanding each of these parts has value — even if you’ll never implement or modify them.