Building a Formal Verification Tool from Scratch: Part 1 (Die Hard Edition)

📚 Part 1 • Building a Formal Verification Tool from Scratch

+ 2 more articles in this series

→ Building a Formal Verification Tool from Scratch: Part 1 (Die Hard Edition)
○ Building a Formal Verification Tool from Scratch: Part 2 (The Virtual Machine)
○ Building a Formal Verification Tool from Scratch: Part 3 (Rethinking the Foundation)

View full series → 🔔 Get notified of new articles

I’ve decided to dive into the world of formal verification by building my own TLA+-inspired model checker from scratch using Odin.

Most programmers learn to think in recipes: step 1, step 2, step 3. But that’s not how you catch the bugs that unit tests miss. For those—the race conditions, deadlocks, and edge cases in concurrent and distributed systems—you need to think like a mathematician, not a chef. You define the rules of your system and let the computer discover what’s possible.

To truly understand how formal verification works, I’m building my own compiler and model checker from the ground up. Why? Because I want to grasp every detail of the process, design it specifically for systems programming, and demystify formal methods by making them practical and accessible. Plus, I want to practise systems programming whilst exploring languages like Odin and Zig—languages I’m currently learning—for building these kinds of tools.

This series documents that journey. We start today with the fundamental shift in mindset: moving from writing recipes to defining the rules.

The Journey Ahead

Over this series, I’ll be building every component needed for a working TLA-style model checker: from parsing and type systems to state space exploration and counterexample generation. We’ll start simple—like we are today—and gradually tackle the hard problems: handling infinite state spaces, optimising search strategies, and eventually verifying real distributed systems. Each post will document what I’ve learned and built, mistakes and all.

The Die Hard Problem

Do you remember the famous fountain scene from Die Hard with a Vengeance? Bruce Willis and Samuel L. Jackson are standing at a fountain with a bomb ticking. They have a 3-gallon jug and a 5-gallon jug. To stop the bomb, they need to place exactly 4 gallons of water on the scale.

Even if you haven’t seen the film, the puzzle works perfectly for understanding what we’re building here.

As programmers, our instinct is to jump straight to implementation: we write code that solves the puzzle we’ve already worked out on a whiteboard. We create a script—a recipe—that executes a specific sequence of moves:

Fill the 5-gallon jug.
Pour into the 3-gallon jug until it’s full.
Empty the 3-gallon jug.
…and so on.

This works if you already know the solution. But what if you’re designing a messaging protocol or a distributed cache? You don’t know the specific sequence of network delays and race conditions that will corrupt your data. If you knew the sequence, you wouldn’t have the bug in the first place. You can’t write a test for a scenario you haven’t imagined yet.

That is why we need a tool that explores every possible timeline and either confirms our solution is correct or shows us exactly where it breaks.

The Shift: Recipes vs Rules

To solve problems where we can be confident of system correctness, we need to change our approach. We need to “think above the code” as Leslie Lamport (creator of TLA+) puts it when he talks about the difference between programming and coding.

Instead of writing a specific recipe—a sequence of steps in your programming language—we define the rules of the system. What actions are possible? What constraints must hold? Then we let a computer explore every possible sequence of events that can arise from those rules.

This is the core philosophy of model checking (the engine behind formal verification). We don’t tell the computer what to do step-by-step. We tell it what is possible and let it figure out the steps.

In this first article, we’re going to build that engine. I’ll show you how to model the water jug problem, implement a state space explorer, and watch as the solution emerges from the rules we define. This isn’t the full TLA compiler yet—that’s coming in future instalments—but it’s the beating heart of how formal verification works. Once you understand this, hopefully everything else will make sense.

In the context of the water jugs, the rules are simple:

You can fill a jug.
You can empty a jug.
You can pour from one jug to another until the source is empty or the destination is full.

That’s it. We don’t try to solve the puzzle ourselves. We feed these rules into an engine, and the engine explores every possible timeline until it finds one where a jug contains exactly 4 gallons.

It’s less like writing a script and more like Doctor Strange viewing 14 million futures to find the one where we win.

Building a TLA-Inspired Model Checker in Odin

The core components of a TLA-style model checker are straightforward in concept:

State Vector: A data structure representing the entire system at a given moment.
Transition Relation: A function that generates all possible next states from the current state.
Solver: An engine that explores the state space to find paths to target states.
Trace Reconstruction: Once we find a solution, we backtrack to show the sequence of actions.
Type System & DSL: (Future work) A way to define states and transitions in a high-level, user-friendly manner.
Virtual Machine: (Future work) An execution environment to run the defined models efficiently.
Optimisation Techniques: (Future work) Methods to reduce the state space, such as state hashing and symmetry reduction.

Let’s start simple. For this water jug example, I’ve implemented the first four components: the state vector, transition relation, solver, and trace reconstruction. This gives us everything we need to understand the core mechanics. The type system, virtual machine, and optimisations will come in future articles as we tackle more complex problems.

You can find the complete code on GitHub, but I’ll focus on the key concepts here rather than every implementation detail.

I chose Odin because it’s a data-oriented systems language with manual memory management, making it perfect for building lightweight, fast tools where you want fine control over resources.

Defining the World: The State Vector

First, we define what the “world” looks like at any snapshot in time:

// distinct int creates a new type that can't be accidentally mixed with other ints
Gallons :: distinct int

State :: struct {
    small: Gallons, // The 3-gallon jug
    big:   Gallons, // The 5-gallon jug
}

In a more complex system, this struct would hold your database contents, server statuses, or network buffers. For now, it’s just two jugs.

Generating Possibilities: The Transition Relation

Here’s where the paradigm shift happens. This function doesn’t make a decision—it generates every possible next state from the current state:

next_states :: proc(s: State) -> [dynamic]State {
    states := make([dynamic]State)

    // Rule 1: We could fill the small jug
    append(&states, State{Gallons(SMALL_JUG_CAP), s.big})

    // Rule 2: We could fill the big jug
    append(&states, State{s.small, Gallons(BIG_JUG_CAP)})

    // Rule 3: Pour Big -> Small (calculate how much fits)
    {
        amount_to_pour := min(s.big, SMALL_JUG_CAP - s.small)
        new_small := s.small + amount_to_pour
        new_big := s.big - amount_to_pour
        append(&states, State{Gallons(new_small), Gallons(new_big)})
    }

    // (Rules 4-6 for emptying and reverse pouring omitted for brevity - see GitHub)

    return states
}

Notice there’s no if statement checking if this is the “right” move. We just append every valid legal action to the list. We’re not solving the puzzle—we’re defining the physics of the water jug universe.

Want to see the complete implementation with all six rules? Check the GitHub repo.

The Solver: Exploring Every Possible State

Now we need an engine to navigate this state space. Think of it as exploring every branch of a choose-your-own-adventure book simultaneously:

queue := make([dynamic]State)
append(&queue, initial_state)

visited := make(map[State]State)  // Maps each state to its parent
visited[initial_state] = initial_state

for len(queue) > 0 {
    current := queue[0]
    ordered_remove(&queue, 0)

    // Have we found a solution?
    if current.big == TARGET {
        final_state = current
        found = true
        break
    }

    // Explore all possible next states
    next_candidates := next_states(current)

    for next in next_candidates {
        if next in visited { continue }  // Already explored this branch

        visited[next] = current  // Record where we came from
        append(&queue, next)
    }
}

The visited map does double duty: it prevents infinite loops and records the entire path through the state space. Once we find a solution, we can walk backwards from the target to reconstruct the exact sequence of moves that worked.

Note: I’m using breadth-first search here because it finds the shortest solution, but the key insight isn’t the search algorithm—it’s the fact that we’re generating states from rules rather than hard-coding moves. You could use depth-first search, A*, or any traversal strategy. The model checking philosophy remains the same.

Reconstructing the Solution

The trace reconstruction walks backwards through the parent chain, then prints in forward order using recursion:

print_trace :: proc(end: State, parents: map[State]State) {
    if end == parents[end] {
        print_state(end)  // Reached the start
        return
    }
    parent := parents[end]
    print_trace(parent, parents)  // Recurse to print in order
    fmt.println("   |")
    fmt.println("   v")
    print_state(end)
}

This recursive approach naturally prints states in chronological order and gives us the complete solution path.

When the Solution Emerged

When I ran this program, something remarkable happened. The solver found a solution and output this trace (I’ve added annotations to explain each step):

[ small: 0, big: 0 ]
   |
   v
[ small: 0, big: 5 ]   <-- Filled Big
   |
   v
[ small: 3, big: 2 ]   <-- Poured Big into Small (leaves 2 in Big)
   |
   v
[ small: 0, big: 2 ]   <-- Emptied Small
   |
   v
[ small: 2, big: 0 ]   <-- Poured Big into Small
   |
   v
[ small: 2, big: 5 ]   <-- Filled Big
   |
   v
[ small: 3, big: 4 ]   <-- Poured Big into Small (leaves 4 in Big) -> SOLVED

Look at that third step: [0, 2]. The solver decided to empty the small jug.

Why? I didn’t tell it to. I didn’t program that strategy.

The solver found it because it was the only path that worked. It tried timelines where it filled the jug again—those led to dead ends. It tried timelines where it emptied the big jug—those went back to the start. The only timeline that progressed towards the goal involved emptying that small jug at precisely that moment.

This is when I truly understood what formal verification gives us. The solution emerged from the rules themselves. I defined the physical laws (the rules), and the solver discovered the solution.

Why This Matters for Real Systems

This is a trivial example, but it demonstrates something profound.

Imagine you’re building a distributed cache for a Node.js application. You’ve got multiple servers, network delays, and race conditions. In testing, everything works fine. But in production, under specific timing conditions you haven’t imagined, two servers might simultaneously update the same cache key with different values, corrupting your data.

That “Empty Small Jug” step might correspond to discovering that you must invalidate the cache entry during a network partition, even though it seems counterintuitive and hurts performance. You didn’t program that rule explicitly—the solver found it by exploring every possible timeline of network delays and concurrent requests. It’s the only path that prevents data corruption.

You can’t write a unit test for it because you haven’t thought of it. But if you define the rules of your caching protocol—network latency bounds, retry logic, consistency requirements—the solver explores the state space and shows you the exact sequence of events that leads to corruption. Or better yet, it confirms that no such sequence exists.

The beauty is that the solution emerges from the rules themselves. You’re not hard-coding behaviour; you’re discovering what behaviour is possible under your constraints. This is why model checking finds bugs that traditional testing misses. It doesn’t test the cases you thought of—it tests every case that can arise from your rules.

Of course, not every programming problem needs formal verification. If you’re building a simple REST API or a static website, traditional testing is perfectly adequate. Model checking shines when you’re dealing with concurrency, distributed systems, or protocols where the interactions between components create emergent behaviour you can’t easily predict. It’s a power tool—learn when to reach for it and when a regular unit or E2E test will do.

What Comes Next

We have a working engine, but it’s hardcoded for water jugs. To make this a real tool that can model distributed systems, consensus algorithms, or concurrent data structures, we need to handle any type of data, not just integers.

The next phase of this project involves building the type system and virtual machine to support a custom DSL. This will let us define complex states and transitions without writing Odin code for each new model. I’ll be documenting that journey as I learn and build.

This project is a learning journey as much as a technical one. I’m figuring this out as I go, documenting the dead ends alongside the breakthroughs. I’d love to have you along for the ride if that resonates with you—the kind of person who wants to understand systems from first principles.

Each article in this series will arrive in your inbox if you subscribe to my newsletter below. No fluff, just the technical details, the mistakes I made, and what I learned from them. Between articles, I share shorter updates and experiments on Twitter.