LLVM: YES!

LLVM deserves the attention of every compiler learner because it has become the de facto standard for modern compiler construction. With LLVM, you only need to write the compiler frontend; LLVM's backend tools then generate target code for different platforms.

Chinese Version on Zhihu: https://zhuanlan.zhihu.com/p/366919983

Here are the key commands you need from this post:

# Compile and emit LLVM IR for your platform:
clang main.c -S -emit-llvm

# Compile and emit LLVM IR for MIPS:
clang main.c -S -emit-llvm --target=mipsel

# Translate LLVM IR to assembly (llc uses the target triple recorded in main.ll):
llc -filetype=asm main.ll -o main.s

# Assemble and link for MIPS (statically, so QEMU needs no MIPS shared libraries):
mipsel-linux-musl-gcc main.s -static -o main

# Assemble and link for x86-64:
gcc main.s -o main

# Run the MIPS binary in the QEMU emulator:
qemu-mipsel ./main

# Run the x86-64 binary directly:
./main

To use these tools, install LLVM and Clang, for example with apt or pacman. The MIPS steps additionally need the mipsel-linux-musl cross toolchain and QEMU user-mode emulation (qemu-mipsel).

Background

This semester, I am taking a compiler course, CSC4180, and our project is to implement a compiler frontend for a simplified version of the C language that generates MIPS code. However, compared with MIPS, LLVM IR is a better investment of effort for a systems programmer. Therefore, I decided to learn LLVM IR, generate it as the output of my compiler, and use LLVM's toolchain to produce MIPS and x86-64 target code.

Procedure of the LLVM Toolchain

[Figure: Comparison between One-Pass and LLVM Compilers]

Traditional compilers translate source code directly into the corresponding assembly code. Even though most of them use an IR internally, it is difficult to reuse their components to build your own compiler.

LLVM's main contribution is a productive, reusable toolchain that covers compiler construction from frontend to backend. Its components are available as C++ libraries (with header files) and as command-line binaries, so a C++ programmer should have no trouble using them.

To build a compiler for your language, you only need to develop the frontend and produce LLVM IR. Then you can use the backend tools to translate the LLVM IR into assembly for various targets such as MIPS and x86-64. LLVM's libraries also provide helpers, such as the C++ IRBuilder class, for producing LLVM IR efficiently.
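To make "produce LLVM IR" concrete, here is a minimal sketch in C of a frontend's final stage writing a complete textual LLVM IR module. The function name write_minimal_module and the instructions it emits are hypothetical and hand-written; a real frontend would build these lines from its AST.

```c
#include <stdio.h>

/* Hypothetical last stage of a frontend: write a complete LLVM IR module
   as text. The instructions are fixed here purely for illustration.
   No target triple is written, so llc falls back to its default (host)
   target; for MIPS, llc's -mtriple option (e.g. -mtriple=mipsel-linux-gnu)
   selects the target instead. Returns 0 on success, -1 on a write error. */
int write_minimal_module(FILE *out) {
    static const char *const lines[] = {
        "define i32 @add(i32 %a, i32 %b) {",
        "entry:",
        "  %sum = add i32 %a, %b",
        "  ret i32 %sum",
        "}",
        "",
        "define i32 @main() {",
        "entry:",
        "  %r = call i32 @add(i32 2, i32 3)",
        "  ret i32 %r",
        "}",
    };
    for (size_t i = 0; i < sizeof lines / sizeof lines[0]; i++) {
        /* Print each line as an argument, so '%' in the IR is not a format. */
        if (fprintf(out, "%s\n", lines[i]) < 0)
            return -1;
    }
    return 0;
}
```

Writing the module to a file, e.g. `FILE *f = fopen("main.ll", "w"); write_minimal_module(f); fclose(f);`, should give a main.ll that goes through llc and gcc exactly like the commands at the top of this post. Since main returns 5, the program's exit status should be 5.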

For my project, I chose to implement the LLVM IR generation functions myself as programming practice. One main difficulty is the order of code generation: unnamed intermediate values are numbered (%0, %1, ...), and I have to make sure each new ID is exactly one greater than the previous one. For example, for the grammar rule IF exp0 THEN codeblock0 ELSE codeblock1, I need to control the generation order manually so that the branch labels, which also take IDs, are inserted at the right places. With LLVM's library, this ID bookkeeping is handled for us.
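The ordering problem can be sketched in C. Below is a minimal, hypothetical emitter (IdCounter, new_id and emit_if_else are illustrative names, not the author's code or an LLVM API) for `if (x > 0) then x + 1 else x - 1`. It assumes, as described above, that unnamed values and unnamed block labels share one counter per function and must appear in increasing order. Because the THEN block's values come before the ELSE label in the text, the conditional branch can only be printed once every ID has been allocated.

```c
#include <stdio.h>

/* Hypothetical per-function ID counter: unnamed values (%0, %1, ...) and
   unnamed block labels (1:, 3:, ...) draw from this single sequence. */
typedef struct {
    int next_id;
} IdCounter;

static int new_id(IdCounter *c) {
    return c->next_id++;
}

/* Emit IR for: if (x > 0) result = x + 1; else result = x - 1; return result;
   The entry block and %x are named, so they do not consume numeric IDs.
   Writes into buf and returns the snprintf result. */
int emit_if_else(char *buf, size_t cap) {
    IdCounter c = {0};
    /* Allocate IDs in exactly the order they will appear in the text. */
    int cond = new_id(&c);      /* %0: the comparison */
    int then_lbl = new_id(&c);  /* 1: label of the THEN block */
    int then_val = new_id(&c);  /* %2: value computed in THEN */
    int else_lbl = new_id(&c);  /* 3: label of the ELSE block */
    int else_val = new_id(&c);  /* %4: value computed in ELSE */
    int merge_lbl = new_id(&c); /* 5: label where both paths join */
    int result = new_id(&c);    /* %6: phi that selects the result */

    /* Only now can the branch be printed: it names the ELSE label, whose
       ID depends on how many IDs the THEN block consumed. */
    return snprintf(buf, cap,
        "define i32 @f(i32 %%x) {\n"
        "entry:\n"
        "  %%%d = icmp sgt i32 %%x, 0\n"
        "  br i1 %%%d, label %%%d, label %%%d\n"
        "%d:\n"
        "  %%%d = add i32 %%x, 1\n"
        "  br label %%%d\n"
        "%d:\n"
        "  %%%d = sub i32 %%x, 1\n"
        "  br label %%%d\n"
        "%d:\n"
        "  %%%d = phi i32 [ %%%d, %%%d ], [ %%%d, %%%d ]\n"
        "  ret i32 %%%d\n"
        "}\n",
        cond,
        cond, then_lbl, else_lbl,
        then_lbl,
        then_val,
        merge_lbl,
        else_lbl,
        else_val,
        merge_lbl,
        merge_lbl,
        result, then_val, then_lbl, else_val, else_lbl,
        result);
}
```

For nested blocks, a generator would typically generate each block into its own buffer before printing the branch that jumps to it, or use named labels (for example then0, else0), which do not consume numeric IDs. LLVM's IRBuilder sidesteps the problem entirely, because numbers for unnamed values are only assigned when the module is printed.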

After generating LLVM IR, I use llc to compile it into MIPS and x86-64 assembly. This job is tedious and mostly a one-to-one translation, so I believe leaving it to an existing tool is wise.

An optional step is optimizing the LLVM IR with opt. Optimizing at the IR level is common, and much of compiler backend research focuses on such optimizations. Because the IR is independent of both the source language and the target platform, these optimizations are reusable.
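The reusability argument shows up even in a toy pass. Below is a minimal sketch of constant folding over a hypothetical three-address IR (Op, Inst and fold_constants are illustrative names, not LLVM APIs). The pass never looks at the source language or the target machine, so any frontend that emits this IR and any backend that consumes it would benefit. LLVM's real passes, run through opt, operate on LLVM IR and are far more thorough.

```c
#include <stddef.h>

/* A toy three-address IR for illustration only (hypothetical; not LLVM's
   data structures). Instructions refer to earlier instructions by index. */
typedef enum { OP_CONST, OP_ADD, OP_MUL } Op;

typedef struct {
    Op op;
    long value; /* the constant, valid when op == OP_CONST */
    size_t lhs; /* operand indices, valid for OP_ADD and OP_MUL */
    size_t rhs;
} Inst;

/* Constant folding: an ADD or MUL whose operands are both constants is
   rewritten into a CONST. Operands always precede their users, so a single
   forward pass also folds chains such as (2 + 3) * (2 + 3).
   Overflow is not checked in this sketch. Returns how many were folded. */
size_t fold_constants(Inst *code, size_t n) {
    size_t folded = 0;
    for (size_t i = 0; i < n; i++) {
        Inst *in = &code[i];
        if (in->op == OP_CONST)
            continue;
        const Inst *a = &code[in->lhs];
        const Inst *b = &code[in->rhs];
        if (a->op != OP_CONST || b->op != OP_CONST)
            continue;
        in->value = (in->op == OP_ADD) ? a->value + b->value
                                       : a->value * b->value;
        in->op = OP_CONST;
        folded++;
    }
    return folded;
}
```

For example, given the instructions `2`, `3`, `add [0] [1]`, `mul [2] [2]`, the pass rewrites both arithmetic instructions into the constants 5 and 25.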

The commands listed at the beginning of this post cover the whole workflow. I hope you have fun with LLVM!

Useful LLVM Resources

Personally, I find LLVM's learning resources messier than those for other languages and frameworks, such as Python and the many AI tools. One possible reason is that compilers are not as popular a topic as AI. If you would like to contribute to the open-source community, I think LLVM is a good choice.

While learning LLVM, I read the following websites and documents:

