Friday, April 19, 2024
Google search engine
HomeUncategorizedUnderstanding every byte in a WASM module

Understanding every byte in a WASM module

In my last post, we
explored how to build WASM modules,
both with and without WASI support, using
Clang. In a comment on
Reddit
,
it was mentioned that much of the setup I walked through in that post could be
avoided by just leveraging Zig’s WASI
supprt
.
This is a great point, and I would recommend doing the same. The following
command is inarguably simpler than what I described.

$ zig cc --target=wasm32-wasi

However, there are two reasons why knowing how to use Clang for compilation is
useful. First, and most practical, is that I am working on a codebase that uses
Clang for its compiler toolchain, so leveraging Zig is not currently an option.
Second is that understanding the, admittedly more involved, Clang incantations
taught us a little more about what actually goes into a WASM module, and how
that changes when using WASI. In order to know exactly what is inside a WASM
module, we need to crack it open though. That is what we are going to do today!

As a recap, one of the programs we compiled was a simple add() function, which
accepted two integers and returned their sum.

wasm32_args.c

int add(int a, int b) {
  return a+b;
}

We compiled it to a WASM module using the following command.

$ /usr/lib/llvm-17/bin/clang -target wasm32 -nostdlib -Wl,--no-entry -Wl,--export-all  -o wasm32_args.wasm wasm32_args.c

This produced a binary file which can be recognized as a v1 WASM module.

$ file wasm32_args.wasm 
wasm32_args.wasm: WebAssembly (wasm) binary module version 0x1 (MVP)

We can view the hex contents of the file using xxd.

00000000: 0061 736d 0100 0000 010a 0260 0000 6002  .asm.......`..`.
00000010: 7f7f 017f 0303 0200 0105 0301 0002 063f  ...............?
00000020: 0a7f 0141 8088 040b 7f00 4180 080b 7f00  ...A......A.....
00000030: 4180 080b 7f00 4180 080b 7f00 4180 8804  A.....A.....A...
00000040: 0b7f 0041 8008 0b7f 0041 8088 040b 7f00  ...A.....A......
00000050: 4180 8008 0b7f 0041 000b 7f00 4101 0b07  A......A....A...
00000060: a701 0c06 6d65 6d6f 7279 0200 115f 5f77  ....memory...__w
00000070: 6173 6d5f 6361 6c6c 5f63 746f 7273 0000  asm_call_ctors..
00000080: 0361 6464 0001 0c5f 5f64 736f 5f68 616e  .add...__dso_han
00000090: 646c 6503 010a 5f5f 6461 7461 5f65 6e64  dle...__data_end
000000a0: 0302 0b5f 5f73 7461 636b 5f6c 6f77 0303  ...__stack_low..
000000b0: 0c5f 5f73 7461 636b 5f68 6967 6803 040d  .__stack_high...
000000c0: 5f5f 676c 6f62 616c 5f62 6173 6503 050b  __global_base...
000000d0: 5f5f 6865 6170 5f62 6173 6503 060a 5f5f  __heap_base...__
000000e0: 6865 6170 5f65 6e64 0307 0d5f 5f6d 656d  heap_end...__mem
000000f0: 6f72 795f 6261 7365 0308 0c5f 5f74 6162  ory_base...__tab
00000100: 6c65 5f62 6173 6503 090a 4202 0200 0b3d  le_base...B....=
00000110: 0106 7f23 8080 8080 0021 0241 1021 0320  ...#.....!.A.!. 
00000120: 0220 036b 2104 2004 2000 3602 0c20 0420  . .k!. . .6.. . 
00000130: 0136 0208 2004 2802 0c21 0520 0428 0208  .6.. .(..!. .(..
00000140: 2106 2005 2006 6a21 0720 070f 0b00 3404  !. . .j!. ....4.
00000150: 6e61 6d65 0119 0200 115f 5f77 6173 6d5f  name.....__wasm_
00000160: 6361 6c6c 5f63 746f 7273 0103 6164 6407  call_ctors..add.
00000170: 1201 000f 5f5f 7374 6163 6b5f 706f 696e  ....__stack_poin
00000180: 7465 7200 6609 7072 6f64 7563 6572 7301  ter.f.producers.
00000190: 0c70 726f 6365 7373 6564 2d62 7901 0c55  .processed-by..U
000001a0: 6275 6e74 7520 636c 616e 673f 3137 2e30  buntu clang?17.0
000001b0: 2e36 2028 2b2b 3230 3233 3132 3039 3132  .6 (++2023120912
000001c0: 3432 3237 2b36 3030 3937 3038 6234 3336  4227+6009708b436
000001d0: 372d 317e 6578 7031 7e32 3032 3331 3230  7-1~exp1~2023120
000001e0: 3931 3234 3333 362e 3737 2900 2c0f 7461  9124336.77).,.ta
000001f0: 7267 6574 5f66 6561 7475 7265 7302 2b0f  rget_features.+.
00000200: 6d75 7461 626c 652d 676c 6f62 616c 732b  mutable-globals+
00000210: 0873 6967 6e2d 6578 74                   .sign-ext

As described in the Binary Format portion of the WASM
specification
,
each module is made up of sections. Each section begins with a 1-byte
identifier.

ID (Decimal) ID (Hex) Section
0 0x00 Custom
1 0x01 Type
2 0x02 Import
3 0x03 Function
4 0x04 Table
5 0x05 Memory
6 0x06 Global
7 0x07 Export
8 0x08 Start
9 0x09 Element
10 0x0a Code
11 0x0b Data
12 0x0c Data Count

Each section must be present at most once, and they must be provided in-order,
with the exception being Custom sections, for which there may be an arbitrary
number and they may be present anywhere in the file. Every section begins with
its identifier, then an LEB128
variable-length encoded u32 size, followed by the contents of the section. In fact,
all integers in a WASM module are encoded using LEB128.


Decoding LEB128 Integers

LEB128 can be used to encode signed and unsigned integers of arbitrary length.
We will primarily be focused on u32 (unsigned 32-bit) integers today, so we’ll
skip detailing how to decode signed integers. You can find more details on the
previously linked Wikipedia page.

The algorithm for decoding unsigned integers is as follows:

  1. Take the least significant (lower) 7 bits of the next byte.
  2. Binary shift the 7 bits to the left by 7 multiplied by the byte number
    (initially 0) and bitwise OR with previously decoded bits.
  3. If the most significant bit (i.e. the 8th bit) is a 0, stop decoding.
    Otherwise, go to step (1).

As an example, if we had the byte sequence a6 03, we would decode it using the
following steps.

Take first byte and convert hex to binary.

Take least significant 7 bits.

Shift bits left by 0 (this is the “0th” byte, 7*0 = 0) and OR with
previously decoded bits (none decoded yet).

Observe that the 8th bit in 0xa6 is a 1, so continue to the next byte.

Take least significant 7 bits.

Shift bits left by 7 (this is the “1st” byte, 7*1 = 7) and OR with
previously decoded bits.

0000011 -> 0000011 0000000
0000000 0100110 | 0000011 0000000 = 0000011 0100110

Observe that the 8th bit in 0x03 is 0. We are done. Convert the final result
to decimal.


Now that we know how to interpret integers, let’s start breaking down the
sections.

Preamble


Link to heading

00000000: 0061 736d 0100 0000 .... .... .... ....  .asm.......`..`.

Before the first section is the “preamble”, which is how file was able to
recognize that our binary was a v1 WASM module. The first 4 bytes decode to
asm, with the next 4 bytes indicating the version. WASM is
little-endian, meaning that the
least significant byte is first. Therefore, the version number is 1.

0100 0000 -> 0000 0001 -> 1

The first section begins following the preamble.

Type Section


Link to heading

00000000: .... .... .... .... 010a 0260 0000 6002  .asm.......`..`.
00000010: 7f7f 017f .... .... .... .... .... ....  ...............?

We can identify the first section as the Type section, as indicated by the first
byte 01. Following the identifier is the size, to which we can apply our LEB128
decoding.

0x0a -> 00001010 -> 0001010 -> 10

This informs us that the contents of this section should be 10 bytes in size.
The WASM specification describes the Type section contents as a vec of
functype. Vectors are simply an LEB128 encoded u32 length followed by a
sequence of their specified element type. The first byte is 02, which by now
you can probably recognize as 2 without needing to actually perform LEB128
decoding. This means that we should see 2 functype elements next.

Function types are prefixed with 0x60, then two vec, one for parameter
types, and one for return types, follows. We have 00 00 for the first
functype. Remembering that vec are prefixed with length. This is essentially
saying that our first function takes 0 parameters and returns 0 values. We can
use the wasm2wat
tool
to verify.

$ wasm2wat --enable-annotations wasm32_args.wasm | head -2 | tail -1
  (type (;0;) (func))

The next functype does have parameter and result types. There are two
parameters (0x02), each encoded as 0x7f, which corresponds to a signed
32-bit integer (i32). There is one return type (0x01), which is also an
i32 (0x7f). Though we don’t have symbol information about this function yet,
given that it is the last one defined we can safely assume this is our add().

$ wasm2wat --enable-annotations wasm32_args.wasm | head -3 | tail -1
  (type (;1;) (func (param i32 i32) (result i32)))

Function Section


Link to heading

00000010: .... .... 0303 0200 01.. .... .... ....  ...............?

Next is the Function section (0x03). We do not have an Import section (0x02)
in this module because we did not import any symbols. Because sections must be
provided in-order, we can be certain that no Import section will be provided now
that we have seen the Function section.

The size of this section is 3 (0x03), and the contents are specified as a
vec of typeidx (type index). A type index is a u32, which will once again
be LEB128 encoded. Our vec begins with 0x02, so we should expect two type
indices. The following two bytes, 0x00 and 0x01, correspond to the entries
in our previously detailed Types section.

At this point, we have two functions, each with their own type signature. For
this specific module, the Type and Function sections may seem redundant.
However, consider if we had another function in our module.

int multiply (int a, int b) {
  return a*b
}

multiply() has the same function signature as add(), meaning that we would
have only one entry for (i32, i32) i32 in the Type section, then two entries
in the Function section that referenced the type signature by the corresponding
index. Compiling the new program with multiply() added results in an
indentical preamble and Type section, but we can see that the Function section
(0x03) now has length 4 (0x04), with a type index vec of length 3
(0x03), and three type index entries (0x00, 0x01, 0x01) with the latter
two referring to the same type signature.

00000000: 0061 736d 0100 0000 010a 0260 0000 6002
00000010: 7f7f 017f 0304 0300 0101 .... .... ....

Memory Section


Link to heading

00000010: .... .... .... .... ..05 0301 0002 ....  ...............?

We once again skip a section that is not present in our module (Table with ID
0x04), and move on to Memory (0x05). The Memory section contains a vec of
memories (mem), which are made up of limits. In the current WASM
specification, only one memory may be defined. Our Memory section has size of
3 bytes (0x03), and, as expected, the first byte (0x01) specifies that the
length of the vec is 1. A limit can include both a maximum and a minimum
size (both LEB128 encoded u32), as indicated by the first byte. In our case,
the first byte is 0x00, which means that only a minimum size will be defined,
and the maximum is free to grow to any size. If the first byte was 0x01, both
a minimum and a maximum would be defined.

The threads
proposal

extends limits to be allow specifying whether memory is shared or unshared.

In our module, the minimum memory size is 2 (0x02).

Global Section


Link to heading

00000010: .... .... .... .... .... .... .... 063f  ...............?
00000020: 0a7f 0141 8088 040b 7f00 4180 080b 7f00  ...A......A.....
00000030: 4180 080b 7f00 4180 080b 7f00 4180 8804  A.....A.....A...
00000040: 0b7f 0041 8008 0b7f 0041 8088 040b 7f00  ...A.....A......
00000050: 4180 8008 0b7f 0041 000b 7f00 4101 0b..  A......A....A...

The Global section (0x06) is next, with a size of 0x3f. With a larger size,
we can quickly check our LEB128 decoding to ensure that we don’t need to
consider subsequent bytes.

The 8th bit is 0, so we can simply convert to the decimal value of 63 for
our size. The section includes a vec of globals (global), where each
global consists of a type (globaltype) and expression (expr). A
globaltype is made up of a value type (valtype) and a 1-byte flag (mut)
indicating whether the value is mutable or not.

An expression is encoded by a sequence of instructions (instr) with a
terminating byte (0x0b) specifying the end of the sequence. The byte following
the section size, 0x0a, informs us that 10 globals will be defined in the
vec. We can easily
extract the first one by looking for the first instance of 0x0b.

The 0x7f should be familiar at this point as an i32, which is the
globaltype of this global. The following byte, 0x01, marks it as mutable
(mut).

This is followed by the initialization expression, which includes the first
instruction we have seen. 0x41 is the opcode for i32.const, which simply
returns a static i32 constant, which is specified by the following bytes.
We’ll need to use our LEB128 decoding to interpret it.

Take the first byte.

Take the least significant 7 bits and shift left 0 bits.

The 8th bit is a 1 so take the next byte.

Take the least significant 7 bits and shift left 7 bits.

10001000 -> 0001000 0000000

OR with previously decoded bits.

0000000 0000000 | 0001000 0000000 = 0001000 0000000

The 8th bit is a 1 so take the next byte.

Take the least significant 7 bits and shift left 14 bits.

00000100 -> 0000100 0000000 0000000

OR with previously decoded bits.

0000000 0001000 0000000 | 0000100 0000000 0000000 = 0000100 0001000 0000000 

The 8th bit is a 0 so we are done.

0000100 0001000 0000000 -> 66560

We can verify our decoding using wasm2wat again.

The $__stack_pointer symbol name will be found in a later section.

$ wasm2wat --enable-annotations wasm32_args.wasm | head -34 | tail -1
  (global $__stack_pointer (mut i32) (i32.const 66560))

The same process can be applied to all globals in the vec, as shown in the
textual representation.

$ wasm2wat --enable-annotations wasm32_args.wasm | head -43 | tail -10
  (global $__stack_pointer (mut i32) (i32.const 66560))
  (global (;1;) i32 (i32.const 1024))
  (global (;2;) i32 (i32.const 1024))
  (global (;3;) i32 (i32.const 1024))
  (global (;4;) i32 (i32.const 66560))
  (global (;5;) i32 (i32.const 1024))
  (global (;6;) i32 (i32.const 66560))
  (global (;7;) i32 (i32.const 131072))
  (global (;8;) i32 (i32.const 0))
  (global (;9;) i32 (i32.const 1))

Export Section


Link to heading

00000050: .... .... .... .... .... .... .... ..07  A......A....A...
00000060: a701 0c06 6d65 6d6f 7279 0200 115f 5f77  ....memory...__w
00000070: 6173 6d5f 6361 6c6c 5f63 746f 7273 0000  asm_call_ctors..
00000080: 0361 6464 0001 0c5f 5f64 736f 5f68 616e  .add...__dso_han
00000090: 646c 6503 010a 5f5f 6461 7461 5f65 6e64  dle...__data_end
000000a0: 0302 0b5f 5f73 7461 636b 5f6c 6f77 0303  ...__stack_low..
000000b0: 0c5f 5f73 7461 636b 5f68 6967 6803 040d  .__stack_high...
000000c0: 5f5f 676c 6f62 616c 5f62 6173 6503 050b  __global_base...
000000d0: 5f5f 6865 6170 5f62 6173 6503 060a 5f5f  __heap_base...__
000000e0: 6865 6170 5f65 6e64 0307 0d5f 5f6d 656d  heap_end...__mem
000000f0: 6f72 795f 6261 7365 0308 0c5f 5f74 6162  ory_base...__tab
00000100: 6c65 5f62 6173 6503 09.. .... .... ....  le_base...B....=

The Export section (0x07) consists of a vec of export, with each
containing a name and an export description (exportdesc). The name is a
vec of byte, while the exportdesc contains a 1-byte prefix indicating the
type of export, followed by an index to the appropriate section where the export
is defined.

I’ll leave it to the reader to LEB128 decode 0xa701 as the section size (167
bytes). The first byte following the section size, 0x0c, indicates that the
vec will contain 12 exports. The next byte, 0x06, is the first byte of the
first export, and thus is defining the length of its name as 6. The following
6 bytes can be converted to UTF-8
characters
.

6d 65 6d 6f 72 79 -> memory

Because we compiled with -Wl,--export-all, all symbols will be exported. In
this case, memory is referring to the first element in the vec in our Memory
section. The export type prefixes are defined as follows.

ID (Hex) Export
0x00 Function
0x01 Table
0x02 Memory
0x03 Global

As expected, the prefix following memory is 0x02. The next byte 0x00,
specifies that this export corresponds to the first memory in the vec. The
next two exports are our functions. The first is __wasm_call_ctors, which,
following the name definition, has a function prefix (0x00) and an index to
the first function in the Function section vec (0x00).

11 5f 5f 77 61 73 6d 5f 63 61 6c 6c 5f 63 74 6f 72 73 -> __wasm_call_ctors

The second correponds out our add() function, and refers to the second
(0x01) function (0x00) in the Function section.

The remaining exports are shown in their textual representation below.

$ wasm2wat --enable-annotations wasm32_args.wasm | head -55 | tail -12
  (export "memory" (memory 0))
  (export "__wasm_call_ctors" (func $__wasm_call_ctors))
  (export "add" (func $add))
  (export "__dso_handle" (global 1))
  (export "__data_end" (global 2))
  (export "__stack_low" (global 3))
  (export "__stack_high" (global 4))
  (export "__global_base" (global 5))
  (export "__heap_base" (global 6))
  (export "__heap_end" (global 7))
  (export "__memory_base" (global 8))
  (export "__table_base" (global 9))

Code Section


Link to heading

00000100: .... .... .... .... ..0a 4202 0200 0b3d  le_base...B....=
00000110: 0106 7f23 8080 8080 0021 0241 1021 0320  ...#.....!.A.!. 
00000120: 0220 036b 2104 2004 2000 3602 0c20 0420  . .k!. . .6.. . 
00000130: 0136 0208 2004 2802 0c21 0520 0428 0208  .6.. .(..!. .(..
00000140: 2106 2005 2006 6a21 0720 070f 0b.. ....  !. . .j!. ....4.

In this module, we don’t have a Start (0x08) or Element (0x09) section, so
next is the Code section (0x0a). The section size is 66 (LEB128 encoded
0x42), and it consists of the body and local variables of the each function as
a vec of code. Each code element consists of a size (u32), vec of
locals, and an expression (expr) made up of instructions.

Keep in mind that we didn’t specify any optimizations when compiling this
module (e.g. -O3), so our code section is going to be much longer than it
needs to be. However, the unoptimized function body gives us an opportunity to
explore more WASM instructions.

The vec has two elements (0x02), which correspond to the two entries in the
Function section. The first function has a size of 2 (0x02), which we can
immediately know means that it has no locals or instructions. The vec of
locals has a length of 0 (0x00) and the expr, which is its sequence of
instructions, is a single 0x0b (the expression terminating byte).

The next function, which is our add(), has a size of 61 (0x3d). Its vec
of locals has length 1 (0x01). Locals are encoded as a u32 count and a
value type (valtype). That is, there is a single element in the vec for each
value type for which at least one local exists. Because our vec has length
1, we know that all locals are the same value type. Specifically, there are
6 (0x06) locals of type i32 (0x7f).

We’ll save a deep dive into the WASM instruction set architecture (ISA) for a
future post, but the key difference from most ISAs you have likely interacted
with is that WASM operates as a stack machine. Values that are to be used as
operands for an instruction must first be pushed onto the stack, before
subsequently being popped off the stack and used to compute a result, which is
then pushed onto the stack.

The first instruction in our add() function is encoded as 0x23, which
corresponds to global.get. This instruction takes the index (u32) of a symbol in
the Global section, but you may notice something strange when LEB128 decoding
it.

80 80 80 80 00 -> 00000000 00000000 00000000 00000000

Why are we using the maximum length for LEB128 encoding an unsigned 32-bit
integer? The reason is related to Global section merging during
linking
.
Though we only need one byte to encode index 0, if the index of
$__stack_pointer changes, we can now be certain that the global.get
instruction can be updated without changing the position of any other bytes in
the module.

As previously mentioned, WASM is a stack machine, so global.get is going to
push the value of $__stack_pointer onto the stack.

Our next instruction is 0x21, which corresponds to local.set. It pops a
value off the top of the stack, then stores it in the local at the index
specified by the supplied immediate, which in this case is 2 (0x02).
Combining this instruction with the previous one results in storing the value of
$__stack_pointer in the local at index 2. Why use 2 and not 0? In
accordance with the WASM specification, 2 actually refers to the first
declared local, as parameters are referenced as the first locals.

The parameters of the function are referenced through 0-based local indices in
the function’s body; they are mutable.

You may already recognize these first few instructions as part of a function
prologue

in which we are “growing the stack”. The use of a stack when coming from a
language like C can be confusing given that WASM has its own implicit stack.
Also, in this particular example, it is unnecessary to manage a stack, and, as
you’ll see in a moment, compiling with optimization removes these instructions.
Nevertheless, a “shadow stack” is necessary in some real programs, and we’ll
explore some examples in a future post.

Decompiling the full function body results in the following.

$ wasm2wat --enable-annotations wasm32_args.wasm | head -32 | tail -28
  (func $add (type 1) (param i32 i32) (result i32)
    (local i32 i32 i32 i32 i32 i32)
    global.get $__stack_pointer
    local.set 2
    i32.const 16
    local.set 3
    local.get 2
    local.get 3
    i32.sub
    local.set 4
    local.get 4
    local.get 0
    i32.store offset=12
    local.get 4
    local.get 1
    i32.store offset=8
    local.get 4
    i32.load offset=12
    local.set 5
    local.get 4
    i32.load offset=8
    local.set 6
    local.get 5
    local.get 6
    i32.add
    local.set 7
    local.get 7
    return)

The only essential instructions are accesing the parameters (local.get 0 and
local.get 1), and the eventual adding of the values with i32.add. This can
be observed by recompiling with ooptimization, then Decompiling.

clang -target wasm32 -nostdlib -Wl,--no-entry -Wl,--export-all -O3-o wasm32_args_optimized.wasm wasm32_args.c
$ wasm2wat --enable-annotations wasm32_args_optimized.wasm | head -8 | tail -4
  (func $add (type 1) (param i32 i32) (result i32)
    local.get 1
    local.get 0
    i32.add)

Custom Sections


Link to heading

00000140: .... .... .... .... .... .... ..00 3404  !. . .j!. ....4.
00000150: 6e61 6d65 0119 0200 115f 5f77 6173 6d5f  name.....__wasm_
00000160: 6361 6c6c 5f63 746f 7273 0103 6164 6407  call_ctors..add.
00000170: 1201 000f 5f5f 7374 6163 6b5f 706f 696e  ....__stack_poin
00000180: 7465 7200 6609 7072 6f64 7563 6572 7301  ter.f.producers.
00000190: 0c70 726f 6365 7373 6564 2d62 7901 0c55  .processed-by..U
000001a0: 6275 6e74 7520 636c 616e 673f 3137 2e30  buntu clang?17.0
000001b0: 2e36 2028 2b2b 3230 3233 3132 3039 3132  .6 (++2023120912
000001c0: 3432 3237 2b36 3030 3937 3038 6234 3336  4227+6009708b436
000001d0: 372d 317e 6578 7031 7e32 3032 3331 3230  7-1~exp1~2023120
000001e0: 3931 3234 3333 362e 3737 2900 2c0f 7461  9124336.77).,.ta
000001f0: 7267 6574 5f66 6561 7475 7265 7302 2b0f  rget_features.+.
00000200: 6d75 7461 626c 652d 676c 6f62 616c 732b  mutable-globals+
00000210: 0873 6967 6e2d 6578 74                   .sign-ext

The remaining bytes make up three Custom sections (0x00), as there is no Data
(0x0b) section or Data Count (0x0c) section in our module. Custom sections
are mostly unstructured, but do begin with the same u32 size as other
sections, followed by a name. The 0x00 byte identifies our first custom
section, and its size is 52 bytes (0x34). The name encoding starts with
the number of bytes in the name, which in this case is 4 (0x04). The
following 4 bytes are UTF-8 characters making up the name.

The name of this custom section happens to literally be “name”. It also
happens to be the only Custom section that is defined in the WASM specification
(see Custom Sections in the Appendix). Like other sections, it should only be
included at most once, and it has the additional requirement of occurring after
the Data section. There are three subsections that may be included in the “name”
Custom section, identified by the following IDs.

ID (Hex) Subsection
0x00 Module name
0x01 Function names
0x02 Local names

The first subsection is Function names (0x01), and has size of 25 (0x19).
It consists of a vec of name / index pairs, otherwise known as a “name
map”, which assigns the provided name to the given index in the Function
section. The vec here is of length 2 (0x02), and first element corresponds
to the first function in the Function section (0x00). The name assigned to the
function is 17 bytes long (0x11).

5f 5f 77 61 73 6d 5f 63 61 6c 6c 5f 63 74 6f 72 73 -> __wasm_call_ctors

The following function name entry can be decoded in the same manner, predictably
assigning add to the second function in the Function section (0x01).

The next and final subsection of the “name” Custom section is not actually
standardized in the core WASM specification, but rather part of the Extended
Name Section
proposal
.
In the proposal, 7 (0x07) is the index for the Global names subsection. This
subsection also contains a “name map”, providing pairs of a Global section index
and name. The first entry in our Global section (0x00) is being assigned the
name __stack_pointer.

5f 5f 73 74 61 63 6b 5f 70 6f 69 6e 74 65 72 -> __stack_pointer

The next two Custom sections are defined in the WASM tool-conventions
repository
. The first is
named
producers
,
and is meant to denote the tools that were used to produce the WASM module. The
second is named
target_features

and must come after the producers Custom secion when included. It describes
what features are used, and whether linking should fail if a given feature is or
is not in the allowed set.

The full decompilation of all three Custom sections is as follows.

$ wasm2wat --enable-annotations wasm32_args.wasm | tail -3
  (@custom "name" "1192011__wasm_call_ctors13add71210f__stack_pointer")
  (@custom "producers" "1cprocessed-by1cUbuntu clang?17.0.6 (++20231209124227+6009708b4367-1~exp1~20231209124336.77)")
  (@custom "target_features" "2+fmutable-globals+8sign-ext"))

The --enable-annotations, which we have been using for all wasm2wat
invocations, is required to include Custom sections in the WAT output.

Final Thoughts


Link to heading

The WASM module in this post did not include every section that can occur, but
hopefully this breakdown was thorough enough that you feel confident in
dissecting those that were omitted on your own. There are a number of topics,
such as linking, optimization, and shadow stacks, that were mentioned in this
post but were not covered in depth. Check back for future posts where we will go
into greater detail.

As always, if you have feedback, questions, or just want to chat, feel free to
reach out to @hasheddan on any of the platforms listed on the home
page
.

Read More

RELATED ARTICLES

1 COMMENT

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -
Google search engine

Most Popular

Recent Comments