
Vishal Singh

Blockchain Cryptography Security Researcher


Breaking LeanMultisig zkVM Shamir (Part 1)

A formally-verified circuit is only as secure as the transcript feeding it randomness. I found three bugs in LeanMultisig's Fiat-Shamir layer that prove this — one that lets a prover recycle an old proof against a different public input, two that silently corrupt or bypass security checks on 32-bit and max-difficulty targets. All three share the same root cause: trusting the caller to supply well-formed inputs.

March 28, 2026

How I got here

I had been reading the leanEthereum codebase for a few days — not looking for anything in particular, just mapping how the proving pipeline works end-to-end. The high-level circuits are beautiful. The Lean 4 formalization is genuinely impressive. But the more I read, the more I kept coming back to one folder: crates/backend/fiat-shamir/.

Every time I audit a ZK system, I start with the transcript. Not the circuits. Not the constraint system. The transcript. A circuit can be perfectly specified and formally verified, and the whole system still falls apart if the randomness feeding it is not unique. The transcript is the one place where a subtle mistake becomes a catastrophic one.

A day reading challenger.rs and verifier.rs turned up three bugs — different in severity, all rooted in the same assumption: that callers supply well-formed inputs. In a proof system, that assumption should never hold implicitly.

What Fiat-Shamir actually is — with the math

A prover $\mathcal{P}$ wants to convince a verifier $\mathcal{V}$ of a public statement $x$ without revealing the witness $w$. The classic structure is a three-move protocol:

Interactive Sigma Protocol
$$\mathcal{P} \xrightarrow{\text{ commitment } a} \mathcal{V} \xrightarrow{\text{ challenge } c \xleftarrow{\$} \mathbb{F}} \mathcal{P} \xrightarrow{\text{ response } z} \mathcal{V}$$

Security rests on the verifier's challenge $c$ being unpredictable. But this requires the verifier to be online — useless for on-chain verification.

The Fiat-Shamir heuristic (1986) removes this requirement by replacing the live challenge with a hash of the transcript. The prover computes:

Fiat-Shamir Transformation
$$c = H(\text{ctx} \| a)$$

Here $H$ is a collision-resistant hash, $\text{ctx}$ is the public context (statement, protocol parameters, domain separators), and $a$ is the commitment. The challenge is now deterministic.

In a multi-round protocol like WHIR, the transcript $\tau$ grows with each round:

Multi-Round Transcript
$$\tau_0 = H(\text{ctx})$$ $$\tau_i = H(\tau_{i-1} \| a_i) \quad \text{for each round } i$$ $$c_i = \tau_i$$

The whole construction depends on one property: given a collision-resistant $H$, the encoding of absorbed values must be injective — two distinct sequences of values must never be absorbed as the same input.

Injectivity Requirement
$$H(\tau \| \mathbf{v}) = H(\tau \| \mathbf{v}') \implies \mathbf{v} = \mathbf{v}'$$

If two distinct sequences produce the same hash state, two different proofs generate identical challenges — the proof is no longer bound to a unique statement. This is transcript malleability, and it is exactly what H-4 introduces.
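The chaining above can be sketched in a few lines. This is a structural toy only — it uses Rust's non-cryptographic `DefaultHasher` as a stand-in for $H$, purely to show how each round's challenge is bound to the whole transcript prefix:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for H: NOT cryptographic, structure only.
fn h(state: u64, absorbed: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    state.hash(&mut hasher);
    absorbed.hash(&mut hasher);
    hasher.finish()
}

// tau_0 = H(ctx); tau_i = H(tau_{i-1} || a_i); c_i = tau_i
fn challenges(ctx: &[u64], rounds: &[Vec<u64>]) -> Vec<u64> {
    let mut tau = h(0, ctx);
    rounds
        .iter()
        .map(|a_i| {
            tau = h(tau, a_i);
            tau
        })
        .collect()
}

fn main() {
    // Changing any commitment changes that round's challenge
    // and every challenge after it.
    let c1 = challenges(&[1, 2], &[vec![10], vec![20]]);
    let c2 = challenges(&[1, 2], &[vec![11], vec![20]]);
    assert_ne!(c1, c2);
}
```

The point of the sketch: a challenge depends on everything absorbed before it, so any two transcripts that absorb identical sequences yield identical challenges — which is exactly why the encoding must be injective.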

How the sponge absorbs scalars

leanMultisig hashes with a Poseidon sponge. The sponge splits its internal state of $n$ field elements into two parts: the rate $r$ (the public absorption lane) and the capacity $c$ (the private security domain). Every time $r$ elements are absorbed, a permutation $\pi$ mixes them into the full state:

Figure 1: One absorption round of the Poseidon sponge. The input chunk is zero-padded to the rate width (r = 4), added into the rate lanes s₀…s₃, and mixed into the full state (rate plus capacity s₄…sₙ) by the permutation π; the first element of the new rate, τ₀, becomes the challenge.
Figure 2: Two different inputs reach the same state. Both pass through the same observe_scalars path. Input A, [x₁, x₂, x₃], has three elements; the fourth slot is zero-padded automatically by the chunking loop. Input B, [x₁, x₂, x₃, 0], provides four elements, the last explicitly zero. The sponge sees identical buffers, so τ = τ′. No attacker-supplied zero is needed — the collision happens naturally whenever the input length is not a multiple of RATE.

H-4 — The collision bug that broke statement binding

🔴 High · Issue #171 ✓ Fixed in ee7377f
Transcript Collisions from Zero-Padded Scalar Absorption
crates/backend/fiat-shamir/src/challenger.rs
This is the one I was most excited to find. It is not an edge-case crash. It is a violation of the core security property of the whole proof system.

The vulnerable code

pub fn observe_scalars(&mut self, scalars: &[F]) {
    for chunk in scalars.chunks(RATE) {
        let mut buffer = [F::ZERO; RATE];
        for (i, val) in chunk.iter().enumerate() {
            buffer[i] = *val;
        }
        self.observe(buffer);
    }
}

The code is idiomatic Rust. It reads naturally. It is wrong in a subtle but devastating way.

Why this is a security violation

When the input length is not a multiple of RATE, the last chunk is zero-padded. A proof for public input [x₁, x₂, x₃] absorbs [x₁, x₂, x₃, 0] into the sponge. A proof for [x₁, x₂, x₃, 0] — a different statement, four explicit elements — absorbs the same buffer. Both produce identical challenge sequences.

The consequence: a proof for statement A is computationally valid for statement B whenever B's input is a zero-extended version of A's. On a multisig contract, an attacker could replay a valid approval for one signer set as if it were an approval for a padded superset — a completely different authorization.
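The collision is mechanical enough to reproduce with a toy sponge. The permutation below is an arbitrary invertible mixing step, not Poseidon — only the chunk-and-zero-pad loop mirrors the vulnerable code:

```rust
const RATE: usize = 4;

// Toy "permutation": an invertible per-lane mix. Structure only, not Poseidon.
fn permute(state: &mut [u64; RATE], buffer: [u64; RATE]) {
    for i in 0..RATE {
        state[i] = state[i]
            .wrapping_mul(6364136223846793005)
            .wrapping_add(buffer[i])
            .rotate_left(17);
    }
}

// Mirrors the vulnerable observe_scalars: the last chunk is zero-padded.
fn observe_scalars(state: &mut [u64; RATE], scalars: &[u64]) {
    for chunk in scalars.chunks(RATE) {
        let mut buffer = [0u64; RATE];
        buffer[..chunk.len()].copy_from_slice(chunk);
        permute(state, buffer);
    }
}

fn main() {
    let (mut a, mut b) = ([0u64; RATE], [0u64; RATE]);
    observe_scalars(&mut a, &[1, 2, 3]);    // three elements, zero-padded
    observe_scalars(&mut b, &[1, 2, 3, 0]); // four elements, explicit zero
    assert_eq!(a, b); // identical sponge state => identical challenges
}
```

Two distinct statements, one sponge state — every challenge derived from here on is shared between them.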

The fix

// prove_execution.rs
prover_state.add_base_scalars(&[
    vec![
        whir_config.starting_log_inv_rate,
        log2_strict_usize(memory.len()),
+       public_input.len(),
    ],
    traces.values().map(|t| t.log_n_rows).collect(),
].concat());

Absorbing public_input.len() before the values makes this call-site injective — inputs of different lengths now hash to different states.

⚠️ Nuance: the fix is at the call-site in prove_execution.rs, not inside observe_scalars. The underlying API still zero-pads silently. Any future caller that omits the length prefix reintroduces the bug. A more defensive fix puts the length prefix inside observe_scalars so it cannot be forgotten.
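A defensive variant can be sketched on the same toy sponge (the permutation and RATE below are illustrative, not the real Poseidon API): absorb the length before the data inside the observe function itself, so injectivity never depends on the caller remembering a prefix.

```rust
const RATE: usize = 4;

// Toy permutation standing in for Poseidon; structure only.
fn permute(state: &mut [u64; RATE], buffer: [u64; RATE]) {
    for i in 0..RATE {
        state[i] = state[i]
            .wrapping_mul(6364136223846793005)
            .wrapping_add(buffer[i])
            .rotate_left(17);
    }
}

// Zero-padding chunk loop, as in the original observe_scalars.
fn observe_scalars(state: &mut [u64; RATE], scalars: &[u64]) {
    for chunk in scalars.chunks(RATE) {
        let mut buffer = [0u64; RATE];
        buffer[..chunk.len()].copy_from_slice(chunk);
        permute(state, buffer);
    }
}

// Defensive variant (hypothetical name): the length is absorbed first, so
// no two inputs of different lengths can produce the same absorbed sequence.
fn observe_scalars_len_prefixed(state: &mut [u64; RATE], scalars: &[u64]) {
    observe_scalars(state, &[scalars.len() as u64]);
    observe_scalars(state, scalars);
}

fn main() {
    let (mut a, mut b) = ([0u64; RATE], [0u64; RATE]);
    observe_scalars_len_prefixed(&mut a, &[1, 2, 3]);
    observe_scalars_len_prefixed(&mut b, &[1, 2, 3, 0]);
    assert_ne!(a, b); // the H-4 collision pair now diverges
}
```

With the prefix inside the API, the H-4 collision pair diverges at the very first absorption, and no future call-site can reintroduce the bug.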

M-1 — The bitmask that panics on 32-bit

🟡 Medium · Issue #172 ✓ Fixed in ee7377f
Bitmask Shift Can Panic on 32-Bit Targets
crates/backend/fiat-shamir/src/challenger.rs
This one is sneakier. It would never show up in your CI on a standard x86-64 machine.

The vulnerable code

pub fn sample_in_range(&mut self, bits: usize, n_samples: usize) -> Vec<usize> {
    assert!(bits < F::bits());
    let sampled_fe = self.sample_many(n_samples.div_ceil(RATE)).into_iter().flatten();
    let mut res = Vec::new();
    for fe in sampled_fe.take(n_samples) {
        let rand_usize = fe.as_canonical_u64() as usize;
        res.push(rand_usize & ((1 << bits) - 1));
    }
    res
}

Why this only breaks on 32-bit

The literal 1 in 1 << bits is inferred as usize. On a 64-bit host every shift up to 63 succeeds. On a 32-bit target — WASM runtimes, embedded provers, some zkVM environments — usize is 32 bits, so any bits >= 32 overflows:

  • In debug builds: Rust panics with "attempt to shift left with overflow".
  • In release builds: the shift wraps silently, corrupting the bitmask with no error signal.

The guard assert!(bits < F::bits()) checks against the field width — 64 for a 64-bit field. Values up to 63 pass. On 32-bit that is not enough.
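You can reproduce both failure modes on any host by substituting u32 for a 32-bit usize — the shift semantics are identical:

```rust
fn main() {
    // On a 32-bit target, `1usize << bits` behaves exactly like `1u32 << bits`.
    let bits = 40u32; // passes the `bits < F::bits()` guard for a 64-bit field

    // Debug-build behavior: the overflowing shift is detected.
    assert_eq!(1u32.checked_shl(bits), None);

    // Release-build behavior: the shift amount is reduced mod 32, so the
    // mask silently becomes 2^8 - 1 instead of the intended 2^40 - 1.
    let wrapped_mask = 1u32.wrapping_shl(bits).wrapping_sub(1);
    assert_eq!(wrapped_mask, (1u32 << 8) - 1);
}
```

`checked_shl` returning `None` is what surfaces as the debug-build panic; the `wrapping_shl` result is the silently corrupted release-build mask.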

The fix

Shift in u64 — always 64 bits wide — then cast:

let mask = (1u64 << bits) - 1;
let rand_usize = fe.as_canonical_u64() & mask;
res.push(rand_usize as usize);

The existing assert guarantees bits <= 63, so no additional guard is needed. One character — 1 becomes 1u64 — eliminates the entire class of 32-bit overflow risk.

M-2 — The overflow that turns PoW into a no-op

🟡 Medium · Issue #173 ✓ Fixed in ee7377f
Shift Overflow Lets PoW Grinding Check Be Bypassed
crates/backend/fiat-shamir/src/verifier.rs
M-1 crashes or silently corrupts. M-2 is quieter and more dangerous — the verifier accepts every proof without complaint, at exactly the security level where you need it most.

The vulnerable code

if self.challenger.state[0].as_canonical_u64() & ((1 << bits) - 1) != 0 {
    return Err(ProofError::InvalidGrindingWitness);
}

At first glance this looks correct: verify that the leading bits of the state are zero, reject otherwise. But an integer edge case turns the whole check into dead code.

How bits = 64 is reached

bits is read directly from the WHIR configuration. At maximum security the field is 64-bit, so the code sets bits = F::bits() — exactly 64. This is not a caller mistake; it is the intended value at the highest grinding difficulty.

⚠️ When bits = 64, release-mode Rust reduces the shift amount modulo the integer width: 1 << 64 evaluates as 1 << 0 = 1, so (1 << 64) - 1 wraps to 0. The bitmask is 0, so state[0] & 0 == 0 is always true, the inequality != 0 is always false, and the check never rejects. Any nonce passes. The grinding requirement — the main cost imposed on a WHIR prover to deter spam — is completely bypassed at the very difficulty level where it matters most.
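The wrap is easy to demonstrate with Rust's wrapping operations, which mirror release-mode semantics:

```rust
fn main() {
    let bits = 64u32;

    // Release-mode semantics of `1u64 << 64`: the shift amount is reduced
    // mod 64, so the expression evaluates to 1, and 1 - 1 = 0.
    let mask = 1u64.wrapping_shl(bits).wrapping_sub(1);
    assert_eq!(mask, 0);

    // state[0] & 0 == 0 for every possible state value: the grinding
    // check can never reject any nonce.
    let arbitrary_state = 0xDEAD_BEEF_CAFE_F00D_u64;
    assert_eq!(arbitrary_state & mask, 0);
}
```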

The fix

Guard the shift explicitly so that bits = 64 produces the correct all-ones mask instead of wrapping to zero:

let mask: u64 = if bits >= 64 {
    u64::MAX
} else {
    (1u64 << bits) - 1
};
if self.challenger.state[0].as_canonical_u64() & mask != 0 {
    return Err(ProofError::InvalidGrindingWitness);
}

What I learned — and what you should check in your own code

None of these bugs reflect carelessness. All three are invisible to standard testing: H-4 requires two crafted inputs, M-1 only fires on non-x86-64 targets, M-2 only strikes at an exact parameter boundary. The lesson is not to test harder — it is to build APIs that make the mistakes structurally impossible.

1. The Fiat-Shamir injectivity checklist

Are lengths encoded before data?
Are domain separators used?
Can two different inputs reach the same transcript state?

On domain separators: leanMultisig initialises the sponge with a fixed protocol tag, which blocks cross-protocol replay. But the tag is a compile-time constant — it is not parameterised by circuit shape or WHIR folding configuration. Two proofs with different circuit sizes but the same field width can share an identical opening prefix. The separator works at the protocol boundary; it does not work at the instance level. That is a separate, lower-severity issue, but worth auditing in any fork.
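One way to close that gap is to derive the initial tag from the instance parameters rather than a compile-time constant. The sketch below is entirely hypothetical — the function and parameter names are mine, not leanMultisig's API, and a real implementation would absorb each parameter into the sponge rather than mixing integers:

```rust
// Illustrative protocol tag; not the real constant.
const PROTOCOL_TAG: u64 = 0x6c65616e;

// Hypothetical instance-level separator: fold circuit-shape parameters
// into the tag so proofs for different shapes start from different states.
fn instance_tag(protocol_tag: u64, log_n_rows: u64, folding_factor: u64, field_bits: u64) -> u64 {
    let mut t = protocol_tag;
    for p in [log_n_rows, folding_factor, field_bits] {
        // rotate and odd-multiply are both invertible, so distinct
        // parameter sequences cannot cancel back to the same tag.
        t = t.rotate_left(21) ^ p.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    }
    t
}

fn main() {
    // Two circuit sizes, same field width: distinct transcript starting points.
    assert_ne!(
        instance_tag(PROTOCOL_TAG, 10, 4, 64),
        instance_tag(PROTOCOL_TAG, 12, 4, 64)
    );
}
```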

2. Encode widths explicitly — never trust type inference for security arithmetic

M-1 and M-2 are the same bug in different clothes: a shift on an integer whose width is platform-inferred. The fix is identical in both cases: write the type. 1u64, not 1. Any arithmetic touching security parameters — bit widths, difficulty targets, field sizes — should use fixed-width types. The compiler will not warn you when platform-dependent sizes collide with security boundaries.

3. Test at the exact documented limit, not just below it

M-2 was only reachable at bits = F::bits() — the maximum. A test at bits = 63 passes. A test at bits = 64 catches the bug immediately. Whenever a function accepts a parameter with a documented maximum, add a test at that exact value. This applies to any bit-count, difficulty level, or field-size-derived input.
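As a concrete illustration, here is the guarded mask from the M-2 fix with tests at and just below the documented maximum — only the test at 64 exercises the edge case:

```rust
// Mask computation with the documented maximum handled explicitly,
// matching the M-2 fix.
fn grinding_mask(bits: u32) -> u64 {
    if bits >= 64 {
        u64::MAX
    } else {
        (1u64 << bits) - 1
    }
}

fn main() {
    // Just below the limit: passes with or without the guard.
    assert_eq!(grinding_mask(63), u64::MAX >> 1);
    // AT the documented maximum: this is the input that catches the bug.
    assert_eq!(grinding_mask(64), u64::MAX);
    // Degenerate low end, for completeness.
    assert_eq!(grinding_mask(0), 0);
}
```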


Summary

ID   Finding                                        Severity   Status
H-4  Transcript collision via zero-pad absorption   High       Fixed
M-1  Bitmask shift panic on 32-bit                  Medium     Fixed
M-2  PoW grinding bypass via shift overflow         Medium     Fixed

Issues: #171, #172, #173
Fix: ee7377f · Author: @this-vishalsingh
