
Writing Tiny Code - Hello World

Okay, this is slightly daunting being the first post on this site, but I'll give it a go.

Have you ever considered how small you can get an ELF binary to go?

Let's consider this pretty simple C snippet:

#include <unistd.h>

int main(void) {
	write(1, "hello world\n", 12); /* fd 1 is stdout; the string is 12 bytes long */
	return 0;
}

Hopefully, it's relatively obvious what this does. write here is the C library's thin wrapper around the write syscall, so this snippet prints "hello world", as expected.

We'll need to use a couple of compiler flags to squeeze as much out of this as we can. I'll assume you're using GCC for this; in my case, that's version 14.2.1. I am also on an x86-64 system, so this will be an ELF64 binary.

gcc -o hello -Os -s hello.c

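Here's a short overview of these flags:

-o hello: write the output to a file named hello.
-Os: optimise for size rather than speed.
-s: strip the symbol table and relocation information from the output.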

This leaves us with a binary of...

$ ls -lh
-rwxr-xr-x 1 natha natha  15K Nov 15 23:22 hello

Now, 15K is pretty good for an ELF binary on a modern x86-64 system, but we can do better. Let's translate this to basic assembly code and see how far we get.


I'll be using AT&T syntax assembly here, since it's considered to be standard on most *nix systems, which probably includes the one you're using. You can of course use Intel syntax if you prefer it.

I have to confess, I would typically use Intel syntax for this, but conventions have got the better of me.

One thing to note here is that this is written for x86-64 Linux; the syscall numbers and the registers used to pass arguments differ on other platforms and architectures.

.section .data
msg:
    .ascii "hello world\n"    # .ascii adds no trailing NUL, so msglen is exactly 12
msglen = . - msg

.section .text
.global _start

_start:
    mov $1, %rax
    mov $1, %rdi
    mov $msg, %rsi
    mov $msglen, %rdx
    syscall
    
    mov $60, %rax
    xor %rdi, %rdi
    syscall

Okay, I'll work down from the top. .section .data defines a new section in our program; sections like this are carried over into the generated ELF binary.

In the data section, we define a basic "hello world" string, as you would expect. msglen = . - msg may look a bit weird, but it simply computes the length of msg. The . stands for the assembler's current location, so subtracting the location of msg from it gives the number of bytes emitted since the msg label.

The next section is .text. This is the section where executable code is stored, which here is our hello world program. .global _start exports the _start symbol so that the linker, which runs after the assembler, can locate the entry point for our program.

The next bits can be broken down into logical paragraphs. Let's start with the first:

mov $1, %rax
mov $1, %rdi
mov $msg, %rsi
mov $msglen, %rdx
syscall

This is one part where AT&T syntax is counter-intuitive compared to Intel. In AT&T syntax, CPU registers are denoted with %, and $ marks an immediate value. With Intel syntax, this is generally implicit. The operand order is also unintuitive: the source comes first and the destination second. The first line moves the immediate value 1 into the rax register. With Intel syntax, this line would look more like mov rax, 1, which is clearer.

Here, the 1 in rax is the number of the write syscall, though that's rather unclear from the code alone. For this, a syscall reference is particularly handy. You can do a quick search for "x86-64 linux syscall table", or you can use my preferred one.

I've got a quick overview of the others:

rdi: the first argument, the file descriptor to write to. 1 is stdout.
rsi: the second argument, a pointer to the data to write, here msg.
rdx: the third argument, the number of bytes to write, here msglen.

In essence, we move the values we defined earlier into the correct registers for the write syscall. If you translate this to C, it is the equivalent of:

write(1, msg, msglen);

The second paragraph is a little simpler. Go and grab a syscall reference and see if you can work out what it does.

mov $60, %rax
xor %rdi, %rdi
syscall

Okay, with the explanation over, let's see how small this is.

$ as --64 -o hello.o hello.s
$ ld -o hello hello.o
$ ls -lh
-rw-r--r-- 1 natha natha  984 Nov 16 00:08 hello.o
-rwxr-xr-x 1 natha natha 8.8K Nov 16 00:09 hello

8.8K, while a little over half the size of the 15K C version, doesn't seem much better.

We can try to strip the symbols out of the binary...

$ strip hello
$ ls -lh
-rwxr-xr-x 1 natha natha 8.4K Nov 16 00:09 hello

...but there's not a great difference.

Notice that the object file that was actually generated by the assembler was only 984 bytes, which is significantly smaller. This is something we can exploit, but I won't reveal anything now, as this post is getting rather long.


Well, I hope that was interesting, for the first post on this site at least. In the next post, I'll show you a way to optimise this even further, getting us down to under 200 bytes!