In my last post, we explored how to build WASM modules, both with and without
WASI support, using Clang. In a comment on Reddit, it was mentioned that much of
the setup I walked through in that post could be avoided by just leveraging
Zig’s WASI support.
This is a great point, and I would recommend doing the same. The following
command is inarguably simpler than what I described.
$ zig cc --target=wasm32-wasi
However, there are two reasons why knowing how to use Clang for compilation is
useful. First, and most practical, is that I am working on a codebase that uses
Clang for its compiler toolchain, so leveraging Zig is not currently an option.
Second, working through the admittedly more involved Clang incantations
taught us a little more about what actually goes into a WASM module, and how
that changes when using WASI. To know exactly what is inside a WASM module,
though, we need to crack it open. That is what we are going to do today!
As a recap, one of the programs we compiled was a simple add() function, which
accepted two integers and returned their sum.
wasm32_args.c
int add(int a, int b) {
    return a + b;
}
We compiled it to a WASM module using the following command.
$ /usr/lib/llvm-17/bin/clang -target wasm32 -nostdlib -Wl,--no-entry -Wl,--export-all -o wasm32_args.wasm wasm32_args.c
This produced a binary file which can be recognized as a v1 WASM module.
$ file wasm32_args.wasm
wasm32_args.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)
We can view the hex contents of the file using xxd.
00000000: 0061 736d 0100 0000 010a 0260 0000 6002 .asm.......`..`.
00000010: 7f7f 017f 0303 0200 0105 0301 0002 063f ...............?
00000020: 0a7f 0141 8088 040b 7f00 4180 080b 7f00 ...A......A.....
00000030: 4180 080b 7f00 4180 080b 7f00 4180 8804 A.....A.....A...
00000040: 0b7f 0041 8008 0b7f 0041 8088 040b 7f00 ...A.....A......
00000050: 4180 8008 0b7f 0041 000b 7f00 4101 0b07 A......A....A...
00000060: a701 0c06 6d65 6d6f 7279 0200 115f 5f77 ....memory...__w
00000070: 6173 6d5f 6361 6c6c 5f63 746f 7273 0000 asm_call_ctors..
00000080: 0361 6464 0001 0c5f 5f64 736f 5f68 616e .add...__dso_han
00000090: 646c 6503 010a 5f5f 6461 7461 5f65 6e64 dle...__data_end
000000a0: 0302 0b5f 5f73 7461 636b 5f6c 6f77 0303 ...__stack_low..
000000b0: 0c5f 5f73 7461 636b 5f68 6967 6803 040d .__stack_high...
000000c0: 5f5f 676c 6f62 616c 5f62 6173 6503 050b __global_base...
000000d0: 5f5f 6865 6170 5f62 6173 6503 060a 5f5f __heap_base...__
000000e0: 6865 6170 5f65 6e64 0307 0d5f 5f6d 656d heap_end...__mem
000000f0: 6f72 795f 6261 7365 0308 0c5f 5f74 6162 ory_base...__tab
00000100: 6c65 5f62 6173 6503 090a 4202 0200 0b3d le_base...B....=
00000110: 0106 7f23 8080 8080 0021 0241 1021 0320 ...#.....!.A.!.
00000120: 0220 036b 2104 2004 2000 3602 0c20 0420 . .k!. . .6.. .
00000130: 0136 0208 2004 2802 0c21 0520 0428 0208 .6.. .(..!. .(..
00000140: 2106 2005 2006 6a21 0720 070f 0b00 3404 !. . .j!. ....4.
00000150: 6e61 6d65 0119 0200 115f 5f77 6173 6d5f name.....__wasm_
00000160: 6361 6c6c 5f63 746f 7273 0103 6164 6407 call_ctors..add.
00000170: 1201 000f 5f5f 7374 6163 6b5f 706f 696e ....__stack_poin
00000180: 7465 7200 6609 7072 6f64 7563 6572 7301 ter.f.producers.
00000190: 0c70 726f 6365 7373 6564 2d62 7901 0c55 .processed-by..U
000001a0: 6275 6e74 7520 636c 616e 673f 3137 2e30 buntu clang?17.0
000001b0: 2e36 2028 2b2b 3230 3233 3132 3039 3132 .6 (++2023120912
000001c0: 3432 3237 2b36 3030 3937 3038 6234 3336 4227+6009708b436
000001d0: 372d 317e 6578 7031 7e32 3032 3331 3230 7-1~exp1~2023120
000001e0: 3931 3234 3333 362e 3737 2900 2c0f 7461 9124336.77).,.ta
000001f0: 7267 6574 5f66 6561 7475 7265 7302 2b0f rget_features.+.
00000200: 6d75 7461 626c 652d 676c 6f62 616c 732b mutable-globals+
00000210: 0873 6967 6e2d 6578 74 .sign-ext
As described in the Binary Format portion of the WASM
specification,
each module is made up of sections. Each section begins with a 1-byte
identifier.
ID (Decimal) | ID (Hex) | Section |
---|---|---|
0 | 0x00 | Custom |
1 | 0x01 | Type |
2 | 0x02 | Import |
3 | 0x03 | Function |
4 | 0x04 | Table |
5 | 0x05 | Memory |
6 | 0x06 | Global |
7 | 0x07 | Export |
8 | 0x08 | Start |
9 | 0x09 | Element |
10 | 0x0a | Code |
11 | 0x0b | Data |
12 | 0x0c | Data Count |
Each non-custom section may appear at most once, and these sections must appear
in the order the specification prescribes. That order follows the IDs above,
except that the Data Count section comes before the Code section. Custom
sections are the exception: a module may contain any number of them, and they
may appear anywhere in the file. Every section begins with its 1-byte
identifier, followed by the size of its contents as an LEB128 variable-length
encoded u32, and then the contents themselves. In fact,
all integers in a WASM module are encoded using LEB128.
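To make the section layout concrete, here is a minimal sketch in C of reading one section header from a module held in memory. The names `wasm_section` and `wasm_next_section` are my own and purely illustrative. The sketch assumes the caller starts at offset 8, just past the preamble at the start of the hex dump, and it decodes the size inline using the LEB128 scheme covered in the next section. It does not validate section ordering or reject over-long size encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One section header: a 1-byte ID, then the LEB128-encoded size of the contents. */
struct wasm_section {
    uint8_t  id;      /* 0 = Custom, 1 = Type, ..., 12 = Data Count */
    uint32_t size;    /* length of the section contents in bytes */
    size_t   offset;  /* offset of the contents from the start of the module */
};

/*
 * Read the section header at mod[pos] (hypothetical helper, not a real API).
 * Returns the offset of the next section, or 0 if the input is truncated
 * or the size field is malformed.
 */
size_t wasm_next_section(const uint8_t *mod, size_t len, size_t pos,
                         struct wasm_section *sec)
{
    assert(mod != NULL && sec != NULL);

    if (pos >= len) {
        return 0;                         /* no bytes left for a header */
    }
    sec->id = mod[pos++];

    /* Decode the unsigned LEB128 size: 7 payload bits per byte. */
    uint32_t size = 0;
    unsigned shift = 0;
    for (;;) {
        if (pos >= len || shift > 28) {
            return 0;                     /* truncated, or more than 5 bytes */
        }
        uint8_t byte = mod[pos++];
        size |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            break;                        /* top bit clear: last byte of the size */
        }
        shift += 7;
    }

    if (size > len - pos) {
        return 0;                         /* contents would run past the buffer */
    }
    sec->size = size;
    sec->offset = pos;
    return pos + size;
}
```

Calling this in a loop, feeding each returned offset back in as `pos` until it reaches the end of the buffer, visits every section in turn. For the module above, the first call at offset 8 should report ID 1 (Type) with a size of 10, matching the bytes 01 0a in the dump.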
Decoding LEB128 Integers
LEB128 can be used to encode signed and unsigned integers of arbitrary length.
We will primarily be focused on u32 (unsigned 32-bit) integers today, so we’ll
skip detailing how to decode signed integers. You can find more details on the
previously linked Wikipedia page.
The algorithm for decoding unsigned integers is as follows:
1. Take the least significant (lower) 7 bits of the next byte.
2. Shift those 7 bits to the left by 7 multiplied by the byte number
   (initially 0) and bitwise OR them with the previously decoded bits.
3. If the most significant bit (i.e. the 8th bit) of the byte is a 0, stop
   decoding. Otherwise, go to step (1).
As an example, if we had the byte sequence a6 03, we would decode it using the
following steps.
Take the first byte and convert hex to binary.
0xa6 = 10100110
Take the least significant 7 bits.
0100110
Shift bits left by 0 (this is the “0th” byte, 7*0 = 0) and OR with
previously decoded bits (none decoded yet).
0100110
Observe that the 8th bit in 0xa6 is a 1, so continue to the next byte.
Take the second byte and convert hex to binary.
0x03 = 00000011
Take the least significant 7 bits.
0000011
Shift bits left by 7 (this is the “1st” byte, 7*1 = 7) and OR with
previously decoded bits.
0000011 -> 0000011 0000000
0000000 0100110 | 0000011 0000000 = 0000011 0100110
Observe that the 8th bit in 0x03 is 0. We are done. Convert the final result
to decimal.
0000011 0100110 = 422
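The decoding steps above can be sketched as a small C function. This is an illustration rather than a reference implementation: the name `uleb128_decode_u32` and its signature are my own, and I have assumed the caller wants a u32 result, so the function gives up after 5 bytes or if the 5th byte carries bits that would not fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one unsigned LEB128 value from buf, reading at most len bytes.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or the value does not fit in a u32. (Illustrative helper, not a real API.)
 */
size_t uleb128_decode_u32(const uint8_t *buf, size_t len, uint32_t *out)
{
    assert(buf != NULL && out != NULL);

    uint32_t result = 0;
    /* A u32 needs at most 5 bytes: 4 * 7 = 28 bits, plus 4 bits from the 5th. */
    for (size_t i = 0; i < len && i < 5; i++) {
        uint32_t low7 = buf[i] & 0x7f;       /* step 1: lower 7 bits */
        if (i == 4 && low7 > 0x0f) {
            return 0;                        /* would overflow 32 bits */
        }
        result |= low7 << (7 * i);           /* step 2: shift by 7 * byte number, OR in */
        if ((buf[i] & 0x80) == 0) {          /* step 3: 8th bit is 0, stop decoding */
            *out = result;
            return i + 1;
        }
    }
    return 0;  /* ran out of bytes before seeing a byte with the 8th bit clear */
}
```

Given the two bytes a6 03 from the example, this should store 422 in `out` and return 2, the number of bytes consumed. Returning the byte count is a design choice that lets a caller advance through a buffer containing several encoded integers back to back.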
Now that we know how to interpret integers, let’s start breaking down the
sections.
Preamble
00000000: 0061 736d 0100 0000 .... .... .... .... .asm.......`..`.
Before the first section is the “preamble”, which is how file was able to
recognize that our binary was a v1 WASM module. The first 4 bytes decode to