Monday, July 15, 2024
Google search engine
HomeUncategorizedThe Byte Order Fiasco (2021)

The Byte Order Fiasco (2021)

01 may 2021 @ justine’s web page

One of the most challenging topics in the C / C++ programming language
is how to handle endianness properly. There’s a surprising amount of
depth here. I’ve been programming in C for a while and I feel like I
keep learning about this subject, even when I thought I’d seen it all.

Let’s say we want to deserialize a 32-bit integer off the wire, using
code that’ll work on machines like IBM’s s390x mainframes (one of the
few big endian chips still in play). Here’s the naive approach:

#include 
#include 
#include 
#if (defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) || 
    defined(__s390x__) || defined(__ppc__) || defined(__PPC__) ||          
    defined(__powerpc__) || defined(__POWERPC__) ||                        
    defined(__BIG_ENDIAN__) || defined(__ARMEB__) ||                       
    defined(__THUMBEB__) || defined(__AARCH64EB__) ||                      
    defined(_MIPSEB) || defined(__MIPSEB) || defined(__MIPSEB__)
#define READ32BE(p) (*(uint32_t *)(p))
#else
#define READ32BE(p) bswap_32(*(uint32_t *)(p))
#endif
char b[4] = {0x80,0x02,0x03,0x04};
int main(int argc, char *argv[]) { printf("%08xn", READ32BE(b)); }

Aside from being ugly, the code above is wrong. Modern compiler policies
don’t even permit code that looks like like that anymore. Your compiler
might see that and emit assembly
that formats
your hard drive with btrfs
. That’s the thing about behaviors which
are undefined according to the C standard.

The compiler benchmark wars have been very competitive ever since the GNU
vs. Apple/Google schism these past ten years. Clang and GCC are reaching
for any optimization they can get. Undefined Behavior may be hostile and
dangerous, but it’s legal. So don’t let your code become a casualty. In
fact, compilers won’t even warn you if they rewrite your program in
unexpected ways because you’re not conforming to the C standard. Someone
will probably build an X-Ray one of these days and there’ll be
a Dumb Ways to Die video that
causes things to change. But until that day, we have an excellent tool
for avoiding these problems and it’s called UBSAN.

$ cc -g -Os -Wall 
     -fsanitize=undefined 
     -pedantic -fstrict-aliasing -Wstrict-aliasing=3 
     -o /tmp/o endian.c
$ /tmp/o
endian.c:30:3: runtime error: load of misaligned address 0x5575a79fc05f for type 'uint32_t',
               which requires 4 byte alignment
0x5575a79fc05f: note: pointer points here
 5f 74 27 00 80  02 03 04 00
             ^
80020304

So how do we deserialize an int properly? Rob Pike has a
good blog
post
talking about how easy it is. The solution he recommends is as
follows:

char b[4] = {0x80,0x02,0x03,0x04};
#define READ32BE(p) p[3] | p[2]<<8 | p[1]<<16 | p[0]<<24
int main(int argc, char *argv[]) { printf("%08xn", READ32BE(b)); }

Huge improvement! It still has undefined behaviors.
Assuming -fsigned-char is the default behavior, we get:

$ cc -fsanitize=undefined -g -Os -o /tmp/o endian.c && /tmp/o
endian.c:19:53: runtime error: left shift of negative value -128
80020304

Let’s assume we change the data so the undefined behavior goes
away. 0x80 is the same in this context as -128
a.k.a. CHAR_MIN.

char b[4] = {0x02,0x03,0x04,0x80};
#define READ32BE(p) p[3] | p[2]<<8 | p[1]<<16 | p[0]<<24
int main(int argc, char *argv[]) { printf("%08xn", READ32BE(b)); }

If we do this then something even worse happens. Rather than crashing or
a runtime warning we get an incorrect result, because of sign extension.

$ cc -fsanitize=undefined -g -Os -o /tmp/o endian.c && /tmp/o
ffffff80

So the solution is simple right? Let’s just use unsigned
char
instead. Sadly no. Because unsigned char in C
expressions gets type promoted to the signed type int. So
if we say 0x80<<24 it overwrites the sign bit, which is an
undefined behavior according to the C standard.

$ cc -fsanitize=undefined -g -Os -o /tmp/o endian.c && /tmp/o
endian.c:19:53: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
80020304

That part of the C standard is the legacy of companies like UNIVAC, who
didn't agree with John von Neumman's design doc from the second world war
for EDVAC, which very clearly laid out that the plan was to have 32-bit
little endian two's complement integers. Imagine that. Even Seymour Cray
himself admitted that when he chose ones' complement it was because he
hadn't yet understood the full meaning of John von Neumman's design at
the time.

So here's the secret to writing good generic C code:


Mask, and then shift.


Mask, and then shift.


Mask, and then shift.


Mask, and then shift.

Repeat it like a mantra. You mask first, to define away potential
concerns about signedness. Then you cast if needed. And finally, you can
shift. Now we can create the fool-proof version for machines with at
least 32-bit ints:

#define READ32BE(p) 
  (uint32_t)(255 & p[0]) << 24 | (255 & p[1]) << 16 | (255 & p[2]) << 8 | (255 & p[3])

Another cool trick is we can make the code look more elegant using octal
notation:

#define READ16LE(S) ((255 & (S)[1]) << 8 | (255 & (S)[0]))
#define READ16BE(S) ((255 & (S)[0]) << 8 | (255 & (S)[1]))
#define READ32LE(S)                                                    
  ((uint32_t)(255 & (S)[3]) << 030 | (uint32_t)(255 & (S)[2]) << 020 | 
   (uint32_t)(255 & (S)[1]) << 010 | (uint32_t)(255 & (S)[0]) << 000)
#define READ32BE(S)                                                    
  ((uint32_t)(255 & (S)[0]) << 030 | (uint32_t)(255 & (S)[1]) << 020 | 
   (uint32_t)(255 & (S)[2]) << 010 | (uint32_t)(255 & (S)[3]) << 000)
#define READ64LE(S)                                                    
  ((uint64_t)(255 & (S)[7]) << 070 | (uint64_t)(255 & (S)[6]) << 060 | 
   (uint64_t)(255 & (S)[5]) << 050 | (uint64_t)(255 & (S)[4]) << 040 | 
   (uint64_t)(255 & (S)[3]) << 030 | (uint64_t)(255 & (S)[2]) << 020 | 
   (uint64_t)(255 & (S)[1]) << 010 | (uint64_t)(255 & (S)[0]) << 000)
#define READ64BE(S)                                                    
  ((uint64_t)(255 & (S)[0]) << 070 | (uint64_t)(255 & (S)[1]) << 060 | 
   (uint64_t)(255 & (S)[2]) << 050 | (uint64_t)(255 & (S)[3]) << 040 | 
   (uint64_t)(255 & (S)[4]) << 030 | (uint64_t)(255 & (S)[5]) << 020 | 
   (uint64_t)(255 & (S)[6]) << 010 | (uint64_t)(255 & (S)[7]) << 000)

Now you might be looking at the code above and be thinking, surely
that's the slowest thing imaginable. Here's a jewel from the TensorFlow
codebase expressing that sentiment:

// Fall back on a non-optimized implementation on other big-endian targets.
// This code swaps one byte at a time and is probably an order of magnitude
// slower.
#define BYTE_SWAP_64(x)                                                      
  ((((x)&0x00000000000000ffUL) << 56) | (((x)&0x000000000000ff00UL) << 40) | 
   (((x)&0x0000000000ff0000UL) << 24) | (((x)&0x00000000ff000000UL) <<  8) | 
   (((x)&0x000000ff00000000UL) >>  8) | (((x)&0x0000ff0000000000UL) >> 24) | 
   (((x)&0x00ff000000000000UL) >> 40) | (((x)&0xff00000000000000UL) >> 56))

That might have been true with old compilers, but it's not true today.
The more verbosely well-defined your code is, then with a good modern
compiler, the smaller the generated code will be. Here's what we get if
we run it through
clang.godbolt.org:

read32le(char*):
	mov	(%rdi),%eax
	ret
byte_swap_64(unsigned long):
	mov	%rdi,%rax
	bswap	%rax
	ret
read32be(char*):
	mov	(%rdi),%eax
	bswap	%eax
	ret

So we can finally delete the ifdef soup, and I'd call that progress.

What if we want to serialize integers to the wire? Using our new
knowledge, that becomes easy too. Here's how it can be done with a
mempcpy-style API:

#define WRITE16LE(P, V)                        
  ((P)[0] = (0x00000000000000FF & (V)) >> 000, 
   (P)[1] = (0x000000000000FF00 & (V)) >> 010, (P) + 2)
#define WRITE16BE(P, V)                        
  ((P)[0] = (0x000000000000FF00 & (V)) >> 010, 
   (P)[1] = (0x00000000000000FF & (V)) >> 000, (P) + 2)
#define WRITE32LE(P, V)                        
  ((P)[0] = (0x00000000000000FF & (V)) >> 000, 
   (P)[1] = (0x000000000000FF00 & (V)) >> 010, 
   (P)[2] = (0x0000000000FF0000 & (V)) >> 020, 
   (P)[3] = (0x00000000FF000000 & (V)) >> 030, (P) + 4)
#define WRITE32BE(P, V)                        
  ((P)[0] = (0x00000000FF000000 & (V)) >> 030, 
   (P)[1] = (0x0000000000FF0000 & (V)) >> 020, 
   (P)[2] = (0x000000000000FF00 & (V)) >> 010, 
   (P)[3] = (0x00000000000000FF & (V)) >> 000, (P) + 4)
#define WRITE64LE(P, V)                        
  ((P)[0] = (0x00000000000000FF & (V)) >> 000, 
   (P)[1] = (0x000000000000FF00 & (V)) >> 010, 
   (P)[2] = (0x0000000000FF0000 & (V)) >> 020, 
   (P)[3] = (0x00000000FF000000 & (V)) >> 030, 
   (P)[4] = (0x000000FF00000000 & (V)) >> 040, 
   (P)[5] = (0x0000FF0000000000 & (V)) >> 050, 
   (P)[6] = (0x00FF000000000000 & (V)) >> 060, 
   (P)[7] = (0xFF00000000000000 & (V)) >> 070, (P) + 8)
#define WRITE64BE(P, V)                        
  ((P)[0] = (0xFF00000000000000 & (V)) >> 070, 
   (P)[1] = (0x00FF000000000000 & (V)) >> 060, 
   (P)[2] = (0x0000FF0000000000 & (V)) >> 050, 
   (P)[3] = (0x000000FF00000000 & (V)) >> 040, 
   (P)[4] = (0x00000000FF000000 & (V)) >> 030, 
   (P)[5] = (0x0000000000FF0000 & (V)) >> 020, 
   (P)[6] = (0x000000000000FF00 & (V)) >> 010, 
   (P)[7] = (0x00000000000000FF & (V)) >> 000, (P) + 8)

If you program in C long enough, stuff like this becomes second nature,
and it starts to almost feel inappropriate to even have macros like the
above, since it might be more appropriately inlined into the specific
code. Since there have simply been too many APIs introduced over the
years for solving this problem. To name a few for 32-bit byte swapping
alone: bswap_32, htobe32, htole32, be32toh, le32toh, ntohl, and htonl
which all have pretty much the same meaning.

Now you don't need to use those APIs because you know the secret. This
blog post covers most of the dark corners of C so if you've understood
what you've read so far, you're already practically a master at the
language, which is otherwise remarkably simple and beautiful.

see also

The byte order fallacy

Read More

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisment -
Google search engine

Most Popular

Recent Comments