Writing an OS in Haskell

March 25, 2026

[ computer-science haskell ]

The code is here. This was a solo class project. I wanted to see what happens when you take a language designed for functional programming and force it to talk to hardware registers.

HaskellOS architecture

What Is This

HaskellOS is a bare-metal operating system for the Raspberry Pi Zero. It boots on real hardware, gives you a shell with 18 commands, reads files off an SD card, blinks LEDs, talks over wireless radio, runs green threads, garbage collects, and has an embedded Lisp interpreter.

The codebase is about 4,900 lines of Haskell, 1,350 lines of C, and 310 lines of ARM assembly. The Haskell handles all the logic, the C does the byte-level grunt work, and the assembly boots the thing.

Why Haskell

The honest answer is that I wanted to see if it was possible. The slightly more considered answer is that operating systems are full of state machines, mode switches, and “if this flag is set and that register has this value and we’re in this mode then do X.” Pattern matching is genuinely great for this, since the compiler checks you handled every case. No forgotten edge cases, no fall-through bugs.

C vs Haskell data types

The type system also prevents real hardware mistakes at compile time. Output pins and input pins are different types. You literally cannot call a read function on an output pin. In C, you’d need to remember not to. In Haskell, the compiler remembers for you.

The Trick: MicroHS

You can’t use GHC for bare-metal development. GHC’s runtime depends on mmap, pthreads, signals, and a whole pile of Linux/POSIX infrastructure that doesn’t exist when you’re the only thing running on the CPU.

MicroHS motivation

Instead, I used MicroHS, a minimal Haskell compiler by Lennart Augustsson that compiles Haskell to SKI-style combinators. The runtime is tiny. The output is a graph reducer, not native code, so the C runtime just needs to set up memory and kick off the combinator evaluator. No OS required underneath.

The boot sequence looks like this:

GPU loads the kernel image from SD card
ARM assembly sets up supervisor mode
C runtime zeros BSS, allocates the heap
MicroHS combinator reducer initializes
Haskell takes over

The feature domino effect

Features

Shell

Everything runs over UART serial. The commands break into a few categories:

File ops: ls, cat, touch, write, rm, mv
Hardware: blink, gpio, timer, echo
System: info, vm, heartbeat, uptime, reboot
Radio: nrf init, nrf send, nrf recv, nrf stats, nrf status
Scripting: lisp (REPL) and lisp run <file>

Command dispatch is pattern matching on the command string, so adding a new command is just adding a new clause to dispatchCommand. Each handler parses its own arguments using the parser combinator library and calls into the relevant hardware module.

FAT32 Filesystem

The OS reads and writes files on the SD card’s FAT32 partition. The core types mirror the on-disk structures:

data BPB = BPB             -- BIOS Parameter Block
  { bpbBytesPerSec  :: Word16
  , bpbSecPerClus   :: Word8
  , bpbReservedSecs :: Word16
  , bpbNumFATs      :: Word8
  , bpbTotalSecs    :: Word32
  , bpbFATSize      :: Word32
  , bpbRootCluster  :: Word32
  }

data DirEntry = DirEntry
  { deName      :: String
  , deCluster   :: Word32
  , deSize      :: Word32
  , deIsDir     :: Bool
  }

Mounting works in stages. mountFS initializes the SD card, reads the MBR, parses the boot sector into a BPB, and calculates the LBA offsets for the FAT and cluster regions. File reads follow the FAT chain from a directory entry’s starting cluster. I also wrote a custom MaybeIO monad that threads Maybe through IO so that any step in the chain (bad MBR signature, missing partition, corrupt FAT entry) short-circuits cleanly without nested case statements.

The painful part was byte-level access. Haskell’s FFI only gives you 32-bit word operations. Reading a single byte means loading an aligned word and masking off the bits you want. One line in C becomes six in Haskell.

The memcpy situation

Green Threads and Processes

MicroHS provides forkIO, MVar, and threadDelay out of the box. The scheduler runs round-robin with ~10ms preemption via ARM timer interrupts at 100Hz. Sleeping threads enter a time queue and consume zero CPU.

On top of this I built a typed channel system for message passing between threads:

data Chan a = Chan
  { chanRead  :: MVar (MVar (ChItem a))
  , chanWrite :: MVar (MVar (ChItem a))
  }

newChan :: IO (Chan a)
send    :: Chan a -> a -> IO ()
recv    :: Chan a -> IO a        -- blocks until data available
tryRecv :: Chan a -> IO (Maybe a) -- non-blocking
select  :: [Chan a] -> IO (Int, a) -- wait on first available

The channels are unbounded linked lists backed by MVars. send never blocks, recv blocks until data arrives, and select polls multiple channels round-robin (useful when a thread needs to listen to several sources).

There’s also an Erlang-style supervisor that monitors child processes and restarts them according to a policy:

data RestartPolicy = Permanent | Temporary

data ChildSpec = ChildSpec
  { csName   :: String
  , csAction :: IO ()
  , csPolicy :: RestartPolicy
  }

supervisor :: [ChildSpec] -> IO ()

A Permanent child gets restarted whenever it exits. A Temporary child runs once. The supervisor checks every 100ms. Without try/catch in MicroHS, crash detection is limited (the thread just silently dies and stays in Running state forever), but clean exits work fine.

Virtual Memory

Page table entries are algebraic data types. The full PageTableEntry encodes an ARMv6 1MB section descriptor:

data PageTableEntry = PTE
  { ptePA      :: Word32       -- physical address (1MB-aligned)
  , pteAP      :: AccessPerm
  , pteDomain  :: Word8        -- domain 0-15
  , pteCache   :: CachePolicy
  , pteExecute :: Bool         -- Execute Never flag
  }

data AccessPerm  = APNoAccess | APPrivOnly | APUserRO | APFullAccess
data CachePolicy = Uncached | WriteThrough | WriteBack

Invalid memory configurations are literally unrepresentable. You can’t accidentally map a page with nonsense permissions because the type doesn’t have a constructor for that.

Memory regions are defined declaratively and mapped during boot:

Region	Address	Size	Permissions
Kernel code	`0x00000000`	1 MB	Privileged, cached
Kernel heap	`0x00100000`	128 MB	Privileged, cached
Device registers	`0x20000000`	16 MB	Privileged, uncached
MicroHS heap	`0x0A000000`	configurable	Privileged, cached

initMMU creates the page table, maps all regions, sets domain 0 to client mode, and flips the MMU on permanently.

Wireless Networking

Two NRF24L01+ radio modules connected over SPI give the Pi wireless communication at 2.4GHz. The driver is configured through a record type:

data NrfConfig = NrfConfig
  { nrfChannel     :: Word8    -- 0-125
  , nrfPayloadSize :: Word8    -- 1-32 bytes
  , nrfDataRate    :: DataRate -- 250Kbps / 1Mbps / 2Mbps
  , nrfPower       :: PowerLevel
  , nrfSpiCS       :: Word8
  , nrfCEPin       :: Word8
  }

Transmission uses hardware acknowledgment with exponential backoff retry. When the radio reports MAX_RT (max retransmissions exceeded), the driver waits $500 \cdot 2^{\text{attempt}} + \text{random_us}$ microseconds before trying again, up to 7 attempts. After that, the packet is counted as lost. The NrfHandle uses an MVar for state coordination and IORefs for statistics tracking (messages sent, received, retransmitted, lost).

The interesting part is the NetChan bridge. It serializes typed Haskell values through a Msg algebraic data type for wireless transmission, so on the receiving end you can pattern match on radio data just like any other Haskell value. This lets you build distributed systems across multiple Pis where the type system spans the radio link.

Lisp Interpreter

The parser combinator library implements four typeclass instances (Functor, Applicative, Monad, Alternative) which lets you compose parsers readably:

parseExpr = parseNumber <|> parseBool <|> parseString
        <|> parseQuote  <|> parseList  <|> parseAtom

The <|> operator tries each parser in order and backtracks on failure. The Lisp itself is a simple eval/apply interpreter over S-expressions, with the standard primitives (car, cdr, cons, define, lambda, if, quote, arithmetic).

The fun part is the hardware FFI. The Lisp environment binds OS functions so you can do things like blink an LED or read a timer directly from the REPL. Running Lisp inside Haskell inside a combinator reducer on bare-metal ARM hardware is a sentence I never expected to write.

The Hard Parts

Haskell is simple

GC corruption. MicroHS’s garbage collector didn’t check for NULL pointers. On Linux, address 0x0 is unmapped, so dereferencing it crashes immediately and you notice. On bare metal, address 0x0 is the ARM exception vector table, which contains perfectly valid-looking bytes. The GC would interpret vector table entries as heap node pointers, follow them into nonsense memory, and silently corrupt the heap. The fix was adding boundary checks to keep the GC inside the actual heap region.

SD card initialization. The GPU had already initialized the EMMC controller before my kernel ran, leaving it in an unknown state with stale interrupt flags. Getting reliable SD card access meant clearing old state, retrying initialization with delays, etc.

Off-by-one in memory maps. The EMMC controller lives at 0x20300000. My peripheral region was mapped to 0x20000000-0x202FFFFF, and debugging this was so ghhg.

Thoughts

Haskell is genuinely good for and fun to write the the logic layer of an OS. It is nice to move every possible runtime error into compile time.

But it fights you on anything byte-level. Packing and unpacking structures, copying memory regions, bit-twiddling registers are all things C was born to do and Haskell was born to abstract away. The FFI bridge works, but it was my first time using it, so it felt awkward.

Logic belongs in Haskell, raw bytes in C, with FFI as the bridge. The kernel compiles to about 300KB.

Hello world in Haskell