Writing Tiny Code - Hello World
Okay, this is slightly daunting being the first post on this site, but I'll give it a go.
Have you ever considered how small you can get an ELF binary to go?
Let's consider this pretty simple C snippet:
#include <unistd.h>
int main(void) {
    write(1, "hello world\n", 12);
    return 0;
}
Hopefully, it's relatively obvious what this does. Here write is the C
library's thin wrapper around the write syscall, so this snippet prints
"hello world" as expected.
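As a small aside, write returns the number of bytes it actually wrote, or -1 on error, so the result can be checked. A minimal sketch, assuming a hypothetical helper called say_hello that is not part of the original snippet:

```c
#include <unistd.h>

/* Hypothetical helper (not from the original snippet): writes the
 * greeting to the given file descriptor and hands back whatever
 * write() returns, i.e. the byte count on success or -1 on error. */
ssize_t say_hello(int fd) {
    static const char msg[] = "hello world\n";
    /* sizeof(msg) counts the trailing NUL, so subtract 1 to get 12. */
    return write(fd, msg, sizeof(msg) - 1);
}
```

Calling say_hello(1) should report 12 bytes written to standard output, while an invalid descriptor such as -1 makes write fail and return -1.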
We'll need to use a couple of compiler flags to squeeze as much out of this as we can. I'll assume you're using GCC for this; in my case, that's version 14.2.1. I'm also on an x86-64 system, so this will be an ELF64 binary.
gcc -o hello -Os -s hello.c
Here's a short overview of these flags:
-Os: optimizes the binary for size
-s: strips the symbol table and relocation information from the binary
This leaves us with a binary of...
$ ls -lh
-rwxr-xr-x 1 natha natha 15K Nov 15 23:22 hello
Now 15K is pretty good for an ELF binary on a modern x86-64 system, but we can do better. Let's translate this to basic assembly code and see how far we get.
I'll be using AT&T syntax assembly here, since it's considered to be standard on most *nix systems, which probably includes the one you're using. You can of course use Intel syntax if you prefer it.
I have to confess, I would typically use Intel syntax for this, but conventions have got the better of me.
One thing to note here is this is designed for x86-64 Linux, and the syscall numbers may vary on other platforms or architectures.
.section .data
msg:
    .string "hello world\n"
    msglen = . - msg

.section .text
.global _start
_start:
    mov $1, %rax
    mov $1, %rdi
    mov $msg, %rsi
    mov $msglen, %rdx
    syscall

    mov $60, %rax
    xor %rdi, %rdi
    syscall
Okay, I'll work down from the top. .section .data defines a new section in
our program; these sections are mapped into the generated ELF binary.
In the data section, we define a basic "hello world" string, as you would
expect. msglen = . - msg may look a bit weird, but it simply computes the
length of the data at msg by subtracting the location of msg from the
current location (the dot). Note that .string appends a terminating NUL
byte, so msglen here is 13 rather than 12; use .ascii instead if you want
exactly the 12 visible bytes.
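For comparison, C's sizeof plays a similar role to the . - msg trick, since both are worked out at build time rather than at run time. Here is a rough sketch in C (the function names are illustrative and not taken from the assembly) that counts the bytes both with and without a trailing NUL:

```c
#include <stddef.h>
#include <string.h>

/* Same text as the assembly's msg label. */
static const char msg[] = "hello world\n";

/* sizeof counts every byte in the array, including the trailing NUL,
 * much like `. - msg` placed after a NUL-terminated string. */
size_t msg_size_with_nul(void) { return sizeof(msg); }

/* strlen stops at the NUL, so it reports only the visible text. */
size_t msg_text_len(void) { return strlen(msg); }
```

With this string, msg_size_with_nul() gives 13 and msg_text_len() gives 12.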
The next section is .text. This is the section where executable code is
stored, which is our hello world program. .global _start allows the
linker, which runs after assembly, to locate the entry point for our
program.
The next bits can be broken down into logical paragraphs. Let's start with the first:
    mov $1, %rax
    mov $1, %rdi
    mov $msg, %rsi
    mov $msglen, %rdx
    syscall
This is one part where AT&T syntax is counter-intuitive compared to Intel.
In AT&T syntax, CPU registers are prefixed with % and immediate values
with $, whereas Intel syntax leaves both implicit. The operand order is
also reversed: the source comes first and the destination second. The first
line moves the immediate value 1 into the rax register. With Intel syntax,
this line would look more like mov rax, 1, which is clearer.
Here 1 is the number of the write syscall, though that's not obvious from
the code. For this, a syscall reference is particularly handy. You can
do a quick search for "x86-64 linux syscall table", or you can use my preferred one.
I've got a quick overview of the others:
mov $1, %rdi: sets the destination file descriptor to STDOUT (which is normally 1)
mov $msg, %rsi: sets the buffer to write, which is the msg string
mov $msglen, %rdx: sets the number of bytes to write
syscall: triggers the syscall
In essence, we move the values we defined earlier into the correct registers
for the write syscall. If you translate this to C, it is the equivalent of:
write(1, msg, msglen);
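If you'd like to see the same register mapping from C, glibc's generic syscall() function takes the syscall number followed by its arguments, which roughly mirrors what the assembly loads into rax, rdi, rsi and rdx. A minimal sketch, assuming Linux with glibc; raw_hello is a hypothetical name used only for illustration:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: issue write through the generic syscall() entry
 * point. On x86-64 Linux, SYS_write is 1, matching `mov $1, %rax`. */
long raw_hello(void) {
    static const char msg[] = "hello world\n";
    return syscall(SYS_write,
                   1,                /* rdi: file descriptor (stdout) */
                   msg,              /* rsi: buffer to write */
                   sizeof(msg) - 1); /* rdx: 12 bytes, NUL excluded */
}
```

On success the call returns the number of bytes written, which is 12 here.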
The second paragraph is a little simpler. Go and grab a syscall reference and see if you can work out what it does.
    mov $60, %rax
    xor %rdi, %rdi
    syscall
Okay, with the explanation over, let's see how small this is.
$ as --64 -o hello.o hello.s
$ ld -o hello hello.o
$ ls -lh
-rw-r--r-- 1 natha natha 984 Nov 16 00:08 hello.o
-rwxr-xr-x 1 natha natha 8.8K Nov 16 00:09 hello
8.8K, while a little over half the size of the 15K C version, doesn't seem much better.
We can try to strip out some of the ELF stuff...
$ strip hello
$ ls -lh
-rwxr-xr-x 1 natha natha 8.4K Nov 16 00:09 hello
...but there's not a great difference.
Notice that the object file that was actually generated by the assembler was only 984 bytes, which is significantly smaller. This is something we can exploit, but I won't reveal anything now, as this post is getting rather long.
Well, I hope that was interesting, for the first post on this site at least. In the next post, I'll show you a way to optimise this even further, getting us down to under 200 bytes!