De Bruijn Indices: From Lambda Trees to Compiler Internals

This page explains the lambda‑calculus term λx.λy.zxy, its de Bruijn index form λλ521, how the counting works, and why this representation solves real problems for compilers and interpreters.

1. The Named‑Variable Term

The original term is:

λx. λy. z x y

This term contains:

Two nested lambda abstractions (λx then λy)
A body z x y, which is an application of three expressions
x and y are bound variables
z is a free variable

Tree Structure

The structure shows the two abstractions and the application of z to x and y.

2. The De Bruijn Index Term

The corresponding de Bruijn form is:

λ λ 5 2 1

In this representation, variable names are removed and replaced with numbers that indicate how many binders (λ’s) must be crossed to reach the variable’s definition.

Tree Structure

The structure is identical to the named‑variable tree; only the labels change.

3. Why the Numbers Are 5, 2, and 1

The variable ordering assumed is:

x, y, z, x, y, z, …

Using this ordering:

1 refers to the nearest binder → y
2 refers to the next binder → x
5 refers to the first z in the free‑variable list

Thus the body z x y becomes:

5 2 1

4. Bound vs Free: Why 3 and 4 Are Not Used Here

Bound variables use the “bucket” numbers created by λ’s; free variables use the later numbers. There are two binders (λx and λy), so the free‑variable list begins at index 3:


3 → x  
4 → y  
5 → z  
6 → x  
7 → y  
8 → z  
...

In the term λx. λy. z x y:

y is bound → climbs 1 λ → index 1
x is bound → climbs 2 λ’s → index 2
z is free → uses the next free‑variable index → 5

Indices 3 and 4 correspond to free x and y, which do not appear in this term, so they are not used here.

5. Baby‑Level “Bucket” Explanation

Every λ is a “bucket” on a stack. Inside the body, a variable does not say “I am x”, it says: “I live in bucket #1, #2, … above me.”

First λ (λx) → bucket #1
Second λ (λy) → bucket #2

Inside the body z x y:

y is in the nearest bucket → index 1
x is in the next bucket → index 2
z is in no bucket → it is free → index 5 (from the free‑variable list)

6. Final Mapping for the Example


z → 5   (free)
x → 2   (bound)
y → 1   (bound)

Therefore:

λx. λy. z x y
→
λ λ 5 2 1

7. What Problem Does This Solve for Compilers?

At first glance, de Bruijn indices look like “just replacing names with numbers”. In reality, they solve several concrete problems that compilers and interpreters face.

7.1 Variable Name Clashes

def outer(x):
    def inner(x):
        return x + 1
    return inner(x)

Two different variables are both called x. Humans can track which is which by looking at indentation and scope. A compiler must track this mechanically and precisely.

With de Bruijn indices, the inner variable is always “the nearest binder”, e.g. index 1. The outer one is index 2, and so on. Names no longer matter.

7.2 Renaming Should Not Change Meaning (α‑Equivalence)

λx. λy. x + y
λa. λb. a + b

These two functions are mathematically the same, just with different variable names. A compiler must recognise that they are equivalent.

With de Bruijn indices, both become:

λ λ 2 + 1

They are now literally the same sequence of symbols. No special α‑equivalence logic is needed: renaming bound variables does not change the index form.

7.3 Substitution and Variable Capture

(λx. (λy. x + y)) (y)

If substitution is done naively with names, the free y in the argument can be accidentally captured by the inner λy, changing the meaning of the program.

With de Bruijn indices, substitution is defined on numbers and binder depth. The rules are purely structural, which makes it much easier to avoid capture and reason about correctness.

8. A Python‑Flavoured Analogy

Python closures already use a similar idea internally: they refer to variables by position, not by the original source name.

def outer(a):
    b = 10
    def inner(c):
        return a + b + c
    return inner

Conceptually, the inner function can be thought of as using something like:

load freevar #0   # a
load freevar #1   # b
load local   #0   # c

The runtime does not need the original names a, b, c; it only needs to know “which slot” or “which position” to read from. This is the same spirit as de Bruijn indices: variables are identified by where they live, not by what they are called.

9. Big Picture

De Bruijn indices remove variable names from lambda terms.
They make α‑equivalence trivial: renaming bound variables does not change the term.
They make substitution more mechanical and less error‑prone.
They match how many runtimes and compilers already think about environments: as indexed slots.

Although the idea comes from early work in lambda calculus, it has become a practical tool for designing compilers and interpreters that can reason about code reliably and transform it safely.

De Bruijn Indices: From Lambda Trees to Compiler Internals

1. The Named‑Variable Term

Tree Structure

2. The De Bruijn Index Term

Tree Structure

3. Why the Numbers Are 5, 2, and 1

4. Bound vs Free: Why 3 and 4 Are Not Used Here

5. Baby‑Level “Bucket” Explanation

6. Final Mapping for the Example

7. What Problem Does This Solve for Compilers?

7.1 Variable Name Clashes

7.2 Renaming Should Not Change Meaning (α‑Equivalence)

7.3 Substitution and Variable Capture

8. A Python‑Flavoured Analogy

9. Big Picture

2. The De Bruijn Index Term