Noë Flatreaud

[EN] Writing a simple x86 BIOS bootloader

Alright, alright, let me explain myself!

In this article, we'll write a very simple x86 bootloader that could serve as a first step toward building your own OS. I'll assume you have some knowledge of assembly, but hopefully everything is understandable as is.

Boot Process

When an x86 computer turns on, it executes firmware located inside the motherboard's read-only memory (ROM).

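There are two main firmware standards: the legacy BIOS (Basic Input/Output System) and the more recent UEFI (Unified Extensible Firmware Interface).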

Because it is simpler and widely supported, this article will describe how to write a BIOS bootloader, not UEFI.

BIOS boot in a nutshell

When an i386 CPU boots, the BIOS firmware starts running. It performs various operations, such as RAM detection and other hardware initialization, before finally attempting its boot sequence.

The BIOS generally checks for bootable disks in a specific order, known as its boot disk hierarchy: typically floppy disks, then CD-ROMs, then HDDs.

The BIOS may handle each medium differently. For floppy disks, the first 512 bytes are read into memory at a specific location. Extra steps may be required for hard drives, which contain master boot record (MBR) information, and CD-ROMs can be loaded entirely into memory and used as a RAM disk.

As the BIOS iterates through the list, it looks for the first device whose first 512 bytes (called the boot sector) are readable and end with the magic number 0xaa55.

Once found, the BIOS copies that boot sector to memory address 0x7c00 and jumps to it. Regardless of the medium, the bootloader is always loaded at this address.
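
The signature check can be mimicked offline. The sketch below is only an illustration in Python, not part of the bootloader: the helper name is_bootable is made up for this article, and real firmware may apply additional checks. It treats a sector as bootable when it is exactly 512 bytes long and its last two bytes are 0x55 followed by 0xaa.

```python
SECTOR_SIZE = 512  # size of a boot sector in bytes


def is_bootable(sector: bytes) -> bool:
    """Return True if `sector` looks like a BIOS boot sector.

    Hypothetical helper: it only checks the length and the trailing
    signature bytes 0x55 0xaa (the word 0xaa55 stored in little endian).
    """
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


if __name__ == "__main__":
    blank = bytes(SECTOR_SIZE)          # all zeros, no signature
    signed = bytes(510) + b"\x55\xaa"   # zeros followed by the signature
    print(is_bootable(blank), is_bootable(signed))
```

To try it on a real image, read the file in binary mode and pass its first 512 bytes to the helper.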

Why the magic number 0xaa55?

The magic number 0xaa55 is a distinctive alternating bit pattern, easily identifiable in binary as 1010101001010101. On disk it is stored as the byte 0x55 followed by 0xaa, so it reads as 0xaa55 on a little-endian system and as 0x55aa on a big-endian one.
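
To make the byte order concrete, here is a small Python sketch (again, not part of the bootloader) that packs the 16-bit value 0xaa55 both ways with the standard struct module and prints its bit pattern.

```python
import struct

MAGIC = 0xAA55

little = struct.pack("<H", MAGIC)  # little endian: low byte first
big = struct.pack(">H", MAGIC)     # big endian: high byte first

print(little.hex(), big.hex())     # byte order as stored in memory or on disk
print(format(MAGIC, "016b"))       # the alternating bit pattern
```

The little-endian form, 55 aa, is the one that ends up in the last two bytes of an x86 boot sector.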

Real mode? Protected mode? Wtf?

Real mode is the legacy operating mode, the only one available before the 80286 CPU came to market.

When the BIOS hands control to your bootloader, the CPU is in 16-bit real mode for backward compatibility, and execution starts at physical address 0x7c00.
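
As background that this article does not spell out: in real mode, the CPU builds a physical address from a 16-bit segment and a 16-bit offset as segment * 16 + offset. The Python sketch below (the function name physical_address is purely illustrative) shows that different segment:offset pairs can point at the same 0x7c00.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode translation: physical = segment * 16 + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return (segment << 4) + offset


if __name__ == "__main__":
    # Two different segment:offset pairs, same physical address.
    for seg, off in [(0x0000, 0x7C00), (0x07C0, 0x0000)]:
        print(f"{seg:04x}:{off:04x} -> {physical_address(seg, off):#x}")
```

Both pairs land on the same byte, which is why the boot sector can be described by a single physical address.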

In real mode you have:

Here you can find references for the BIOS interrupt table:

On the other hand, protected mode was designed to prevent programs from writing directly to other programs' memory at runtime, and it also brings a bunch of fascinating features like:

In protected mode, unfortunately, the BIOS interrupt vectors are no longer available as is: you have to set everything up yourself in the Interrupt Descriptor Table (IDT) and the Global Descriptor Table (GDT). But that's another story...

Writing our boot sector

The environment

For this series, I'll use Fedora and toolbox as my main dev environment, but you can use whatever you like.

In this example, and in the next ones, I'll use tools like nasm, gcc and qemu, along with utilities such as hexdump and ndisasm.

$ sudo dnf install -y nasm qemu gcc gcc-c++ kernel-devel

The program

Create a new file called boot.asm:

; ----------------------------------------------------------------
; Here's an easy bootloader example for x86 systems 
; that displays a short message like 'Hello World' to the screen
; ----------------------------------------------------------------

bits 16     ; Let NASM know we're targeting 16-bit real mode
org 0x7c00  ; Tell the assembler where the code will be loaded (as explained above)

    mov si, msg    ; Point the si register to the msg label
    call print     ; Call the print procedure detailed below
    jmp $          ; Jump to itself (infinite loop)

; ----------------------------------------------------------------
; Functions 
; ----------------------------------------------------------------

print:
    push ax
    push bx
    mov bx, 0       ; bh = 0 selects display page 0
    mov ah, 0x0e    ; Use 0x0e (Write Character in TTY Mode)
.loop:
    lodsb           ; Load 1 byte from [si] into al, and increment si
    cmp al, 0       ; Compare al with zero, sets the zero flag if equal
    je .done        ; If the zero flag is set, jump to .done
    int 0x10        ; Call Video Services BIOS interrupt
    jmp .loop       ; Move on to the next character
.done:
    pop bx
    pop ax
    ret

; ----------------------------------------------------------------
; Variables 
; ----------------------------------------------------------------

msg: db "Hello World!", 10,13, 0

; ----------------------------------------------------------------
; Magic word + padding
; ----------------------------------------------------------------
times 510-($-$$) db 0 ; Fill with zeros until 510 bytes
dw 0xaa55 ; Here comes the magic number, in little endian
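
The padding arithmetic is easy to reproduce outside NASM. In times 510-($-$$) db 0, $ is the current position and $$ is the start of the section, so $-$$ is the number of bytes emitted so far. The Python sketch below applies the same rule to an arbitrary payload; build_sector is a hypothetical helper written for this illustration, not something NASM provides.

```python
def build_sector(payload: bytes) -> bytes:
    """Pad `payload` with zeros to 510 bytes, then append 0x55 0xaa."""
    if len(payload) > 510:
        raise ValueError("payload does not fit in a 512-byte boot sector")
    padding = 510 - len(payload)        # same idea as 510-($-$$) in NASM
    return payload + bytes(padding) + b"\x55\xaa"


if __name__ == "__main__":
    sector = build_sector(b"\xeb\xfe")  # the two bytes of `jmp $` as a tiny payload
    print(len(sector), sector[-2:].hex())
```

Whatever the payload size, the result is always exactly 512 bytes, which is the whole point of the times directive.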

You can then assemble it with:

$ nasm -f bin -o boot.bin boot.asm

Testing our bootloader

Let's check our binary under the hood.

$ hexdump -C boot.bin
00000000  be 1b 7c e8 02 00 eb fe  50 53 bb 00 00 b4 0e ac  |..|.....PS......|
00000010  3c 00 74 04 cd 10 eb f7  5b 58 c3 48 65 6c 6c 6f  |<.t.....[X.Hello|
00000020  20 57 6f 72 6c 64 21 0a  0d 00 00 00 00 00 00 00  | World!.........|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
$ ndisasm boot.bin
00000000  BE1B7C            mov si,0x7c1b
00000003  E80200            call 0x8
00000006  EBFE              jmp short 0x6
00000008  50                push ax
00000009  53                push bx
0000000A  BB0000            mov bx,0x0
0000000D  B40E              mov ah,0xe
0000000F  AC                lodsb
00000010  3C00              cmp al,0x0
00000012  7404              jz 0x18
00000014  CD10              int 0x10
00000016  EBF7              jmp short 0xf
00000018  5B                pop bx
00000019  58                pop ax
0000001A  C3                ret
0000001B  48                dec ax
0000001C  656C              gs insb
0000001E  6C                insb
0000001F  6F                outsw
00000020  20576F            and [bx+0x6f],dl
00000023  726C              jc 0x91
00000025  64210A            and [fs:bp+si],cx
00000028  0D0000            or ax,0x0
0000002B  0000              add [bx+si],al
000001FB  0000              add [bx+si],al
000001FD  0055AA            add [di-0x56],dl

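As you can see, the program we assembled boils down to just a few opcodes and data bytes. Note that ndisasm also tries to decode our message and the padding as instructions, which explains the nonsensical output from 0x1B onward.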

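Below, we can line up the disassembly with our own code, where the $, msg and print labels have been replaced by numeric addresses. msg becomes 0x7c1b because of org 0x7c00, while the call and jmp targets are shown as offsets within the file, since ndisasm assumes a load address of 0 by default: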

00000000  BE1B7C            mov si,0x7c1b
00000003  E80200            call 0x8
00000006  EBFE              jmp short 0x6
    mov si, msg
    call print
    jmp $
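
The targets shown by ndisasm can be recomputed by hand. call (opcode E8) and jmp short (opcode EB) store a signed displacement relative to the address of the next instruction, while mov si (opcode BE) stores an absolute 16-bit value. The Python sketch below decodes the three instructions from the dump above; the helper names are made up for illustration.

```python
import struct

code = bytes.fromhex("be1b7c" "e80200" "ebfe")  # first 8 bytes of boot.bin


def imm16(data: bytes, at: int) -> int:
    """Unsigned little-endian 16-bit immediate stored at offset `at`."""
    return struct.unpack_from("<H", data, at)[0]


def rel16_target(data: bytes, at: int) -> int:
    """Target of a 3-byte `call rel16` instruction located at offset `at`."""
    disp = struct.unpack_from("<h", data, at + 1)[0]
    return (at + 3 + disp) & 0xFFFF


def rel8_target(data: bytes, at: int) -> int:
    """Target of a 2-byte `jmp short rel8` instruction located at offset `at`."""
    disp = struct.unpack_from("<b", data, at + 1)[0]
    return (at + 2 + disp) & 0xFFFF


msg_address = imm16(code, 1)            # operand of `mov si, msg`
print(hex(msg_address), hex(msg_address - 0x7C00))
print(hex(rel16_target(code, 3)))       # target of `call print`
print(hex(rel8_target(code, 6)))        # target of `jmp $`
```

The displacement FE is -2 as a signed byte, which sends the two-byte jmp back onto itself.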

Another example, with the print procedure we just wrote:

00000008  50                push ax
00000009  53                push bx
0000000A  BB0000            mov bx,0x0
0000000D  B40E              mov ah,0xe
0000000F  AC                lodsb
00000010  3C00              cmp al,0x0
00000012  7404              jz 0x18
00000014  CD10              int 0x10
00000016  EBF7              jmp short 0xf
00000018  5B                pop bx
00000019  58                pop ax
0000001A  C3                ret
print:
    push ax
    push bx
    mov bx, 0
    mov ah, 0x0e    ; Use 0x0e (Write Character in TTY Mode)
.loop:
    lodsb           ; Load 1 byte from [si] into al, and increment si
    cmp al, 0       ; Compare al with zero, sets the zero flag if equal
    je .done        ; If the zero flag is set, jump to .done
    int 0x10        ; Call Video Services BIOS interrupt
    jmp .loop
.done:
    pop bx
    pop ax
    ret

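And here comes the magic number (little endian, so the bytes are reversed). ndisasm decodes the last padding byte 00 together with 55 AA as a single bogus instruction: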

000001FD  0055AA            add [di-0x56],dl
    dw 0xaa55

You can try out your program using qemu with:

$ qemu-system-i386 -hda boot.bin

Congrats! You have successfully created a simple bootloader for an operating system. With these solid foundations, you can start adding functionality by experimenting with BIOS interrupts yourself, all within the 510-byte window ;)

