How I got here
I had been reading the leanEthereum codebase for a few days — not looking for anything in particular, just mapping how the proving pipeline works end-to-end. The high-level circuits are beautiful. The Lean 4 formalization is genuinely impressive. But the more I read, the more I kept coming back to one folder: crates/backend/fiat-shamir/.
A day reading challenger.rs and verifier.rs turned up three bugs — different in severity, all rooted in the same assumption: that callers supply well-formed inputs. In a proof system, that assumption should never hold implicitly.
What Fiat-Shamir actually is — with the math
A prover $\mathcal{P}$ wants to convince a verifier $\mathcal{V}$ of a public statement $x$ without revealing the witness $w$. The classic structure is a three-move protocol:

$$\mathcal{P} \xrightarrow{\;a\;} \mathcal{V}, \qquad \mathcal{V} \xrightarrow{\;c\;} \mathcal{P}, \qquad \mathcal{P} \xrightarrow{\;z\;} \mathcal{V}$$

The prover sends a commitment $a$, the verifier replies with a random challenge $c$, and the prover answers with a response $z$ that the verifier checks against $(x, a, c)$.
Security rests on the verifier's challenge $c$ being unpredictable. But this requires the verifier to be online — useless for on-chain verification.
The Fiat-Shamir heuristic (1986) removes this requirement by replacing the live challenge with a hash of the transcript. The prover computes:

$$c = H(\text{ctx} \,\|\, a)$$
Here $H$ is a collision-resistant hash, $\text{ctx}$ is the public context (statement, protocol parameters, domain separators), and $a$ is the commitment. The challenge is now deterministic.
In a multi-round protocol like WHIR, the transcript $\tau$ grows with each round, and the $i$-th challenge commits to everything that came before it:

$$c_i = H(\text{ctx} \,\|\, a_1 \,\|\, c_1 \,\|\, \cdots \,\|\, c_{i-1} \,\|\, a_i)$$
The whole construction depends on one property: the hash must be injective over the sequence of absorbed values.
If two distinct sequences produce the same hash state, two different proofs generate identical challenges — the proof is no longer bound to a unique statement. This is transcript malleability, and it is exactly what H-4 introduces.
How the sponge absorbs scalars
leanMultisig hashes with a Poseidon sponge. The sponge splits its internal state of $n$ field elements into two parts: the rate $r$ (the public absorption lane) and the capacity $c$ (the private security domain). Every time $r$ elements are absorbed, a permutation $\pi$ mixes them into the full state:

$$s \leftarrow \pi\big(s + (m_1, \dots, m_r \,\|\, 0^c)\big)$$

where $m_1, \dots, m_r$ are the absorbed elements, combined into the rate lane while the capacity is never touched by input directly.
Now consider two inputs on the observe_scalars path. Input A has three elements; the fourth slot is zero-padded automatically by the chunking loop. Input B provides four elements, the last explicitly zero. The sponge sees identical buffers. No attacker-supplied zero is needed — the collision happens naturally whenever the input length is not a multiple of RATE.
H-4 — The collision bug that broke statement binding
The vulnerable code
```rust
pub fn observe_scalars(&mut self, scalars: &[F]) {
    for chunk in scalars.chunks(RATE) {
        let mut buffer = [F::ZERO; RATE];
        for (i, val) in chunk.iter().enumerate() {
            buffer[i] = *val;
        }
        self.observe(buffer);
    }
}
```
The code is idiomatic Rust. It reads naturally. It is wrong in a subtle but devastating way.
Why this is a security violation
When the input length is not a multiple of RATE, the last chunk is zero-padded. A proof for public input [x₁, x₂, x₃] absorbs [x₁, x₂, x₃, 0] into the sponge. A proof for [x₁, x₂, x₃, 0] — a different statement, four explicit elements — absorbs the same buffer. Both produce identical challenge sequences.
The consequence: a proof for statement A is computationally valid for statement B whenever B's input is a zero-extended version of A's. On a multisig contract, an attacker could replay a valid approval for one signer set as if it were an approval for a padded superset — a completely different authorization.
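The collision is easy to reproduce without any of the real machinery. In this sketch, `u64` stands in for the field type and `absorbed_buffers` mimics the chunking loop in observe_scalars; the sponge state and permutation are irrelevant to the bug, so they are omitted.

```rust
// Toy reproduction of the H-4 padding collision. `u64` stands in for the
// field element type; RATE matches the four-slot buffers described above.
const RATE: usize = 4;

/// Collect the exact sequence of rate-sized buffers the sponge would absorb,
/// replicating the implicit zero-padding of the vulnerable chunking loop.
fn absorbed_buffers(scalars: &[u64]) -> Vec<[u64; RATE]> {
    let mut out = Vec::new();
    for chunk in scalars.chunks(RATE) {
        let mut buffer = [0u64; RATE]; // implicit zero padding
        buffer[..chunk.len()].copy_from_slice(chunk);
        out.push(buffer);
    }
    out
}

fn main() {
    let a = [1u64, 2, 3]; // three elements, padded to [1, 2, 3, 0]
    let b = [1u64, 2, 3, 0]; // four elements, last explicitly zero
    // Distinct statements, identical absorbed transcripts:
    assert_eq!(absorbed_buffers(&a), absorbed_buffers(&b));
    println!("collision: both absorb {:?}", absorbed_buffers(&a));
}
```

Since the challenges are a deterministic function of the absorbed buffers, equal buffers mean equal challenges, and the verifier cannot tell the two statements apart.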
The fix
```rust
// prove_execution.rs
prover_state.add_base_scalars(
    &[
        vec![
            whir_config.starting_log_inv_rate,
            log2_strict_usize(memory.len()),
            public_input.len(), // new: absorb the input length
        ],
        traces.values().map(|t| t.log_n_rows).collect(),
    ]
    .concat(),
);
```
Absorbing public_input.len() before the values makes this call-site injective — inputs of different lengths now hash to different states.
Note that the fix lives in prove_execution.rs, not inside observe_scalars. The underlying API still zero-pads silently. Any future caller that omits the length prefix reintroduces the bug. A more defensive fix puts the length prefix inside observe_scalars so it cannot be forgotten.
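A minimal sketch of that defensive variant, on a toy sponge that records absorbed buffers instead of permuting real state (`u64` again stands in for the field type; all names here are illustrative, not the actual leanMultisig API):

```rust
const RATE: usize = 4;

struct ToySponge {
    absorbed: Vec<[u64; RATE]>, // records buffers instead of hashing them
}

impl ToySponge {
    fn observe(&mut self, buffer: [u64; RATE]) {
        self.absorbed.push(buffer);
    }

    /// Length-prefixed absorption: the input length is absorbed before the
    /// values, so the map from `scalars` to buffers is injective and no
    /// call site can forget the prefix.
    fn observe_scalars(&mut self, scalars: &[u64]) {
        let mut prefixed = Vec::with_capacity(scalars.len() + 1);
        prefixed.push(scalars.len() as u64);
        prefixed.extend_from_slice(scalars);
        for chunk in prefixed.chunks(RATE) {
            let mut buffer = [0u64; RATE];
            buffer[..chunk.len()].copy_from_slice(chunk);
            self.observe(buffer);
        }
    }
}

fn main() {
    let mut s1 = ToySponge { absorbed: Vec::new() };
    let mut s2 = ToySponge { absorbed: Vec::new() };
    s1.observe_scalars(&[1, 2, 3]); // absorbs [3, 1, 2, 3]
    s2.observe_scalars(&[1, 2, 3, 0]); // absorbs [4, 1, 2, 3] then padding
    assert_ne!(s1.absorbed, s2.absorbed); // the H-4 collision is gone
}
```

With the prefix inside the API, the two transcripts from the H-4 example diverge at the very first absorbed element.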
M-1 — The bitmask that panics on 32-bit
The vulnerable code
```rust
pub fn sample_in_range(&mut self, bits: usize, n_samples: usize) -> Vec<usize> {
    assert!(bits < F::bits());
    let sampled_fe = self
        .sample_many(n_samples.div_ceil(RATE))
        .into_iter()
        .flatten();
    let mut res = Vec::new();
    for fe in sampled_fe.take(n_samples) {
        let rand_usize = fe.as_canonical_u64() as usize;
        res.push(rand_usize & ((1 << bits) - 1));
    }
    res
}
```
Why this only breaks on 32-bit
The literal 1 in 1 << bits is inferred as usize. On a 64-bit host every shift up to 63 succeeds. On a 32-bit target — WASM runtimes, embedded provers, some zkVM environments — usize is 32 bits, so any bits >= 32 overflows:
- In debug builds: Rust panics with "attempt to shift left with overflow".
- In release builds: the shift wraps silently, corrupting the bitmask with no error signal.
The guard assert!(bits < F::bits()) checks against the field width — 64 for a 64-bit field. Values up to 63 pass. On 32-bit that is not enough.
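You can observe both failure modes on any host by using u32 as a stand-in for a 32-bit usize; `checked_shl` and `wrapping_shl` expose exactly the debug and release behaviours described above:

```rust
// Emulating the 32-bit target with u32, which has the same width as a
// 32-bit usize. bits = 32 passes the field-width guard but overflows here.
fn main() {
    let bits = 32u32; // would satisfy `assert!(bits < F::bits())` for a 64-bit field

    // Debug-build behaviour: the shift overflows (a bare `1 << bits` panics).
    assert_eq!(1u32.checked_shl(bits), None);

    // Release-build behaviour: the shift count wraps mod 32, so the "mask"
    // silently becomes (1 << 0) - 1 = 0 and every sample is zeroed.
    let wrapped_mask = 1u32.wrapping_shl(bits).wrapping_sub(1);
    assert_eq!(wrapped_mask, 0);

    // The fix: shift in u64, where any bits <= 63 is always in range.
    let mask = (1u64 << bits) - 1;
    assert_eq!(mask, 0xFFFF_FFFF);
}
```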
The fix
Shift in u64 — always 64 bits wide — then cast:
```rust
let mask = (1u64 << bits) - 1;
let rand_usize = fe.as_canonical_u64() & mask;
res.push(rand_usize as usize);
```
The existing assert guarantees bits <= 63, so no additional guard is needed. A three-character suffix, turning 1 into 1u64, eliminates the entire class of 32-bit overflow risk.
M-2 — The overflow that turns PoW into a no-op
The vulnerable code
```rust
if self.challenger.state[0].as_canonical_u64() & ((1 << bits) - 1) != 0 {
    return Err(ProofError::InvalidGrindingWitness);
}
```
At first glance this looks correct: verify that the leading bits of the state are zero, reject otherwise. But an integer edge case turns the whole check into dead code.
How bits = 64 is reached
bits is read directly from the WHIR configuration. At maximum security the field is 64-bit, so the code sets bits = F::bits() — exactly 64. This is not a caller mistake; it is the intended value at the highest grinding difficulty.
At bits = 64, the shift count in (1 << 64) exceeds the width of u64. In release mode Rust wraps the shift count modulo 64, so the expression evaluates to (1 << 0) - 1 = 0. The bitmask is 0, so state[0] & 0 == 0 always holds, the inequality != 0 is always false, and the check never rejects. Any nonce passes. The grinding requirement, the main cost imposed on a WHIR prover to deter spam, is completely bypassed at the very difficulty level where it matters most.
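The wrap is reproducible in a few lines; `wrapping_shl` gives the release-mode semantics directly:

```rust
// The M-2 wrap, reproduced with explicit u64 shift semantics: at bits = 64
// the shift count wraps mod 64 in release mode and the mask collapses to 0.
fn main() {
    let bits = 64u32;

    assert_eq!(1u64.checked_shl(bits), None); // a bare `1 << 64` panics in debug builds

    // Release builds: shift count wraps to 0, mask becomes (1 << 0) - 1 = 0.
    let mask = 1u64.wrapping_shl(bits).wrapping_sub(1);
    assert_eq!(mask, 0);

    // With mask == 0 the rejection branch is unreachable for every state word,
    // even one with no leading zero bits at all:
    let worst_case_state = u64::MAX;
    assert_eq!(worst_case_state & mask, 0); // "passes" the PoW check
}
```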
The fix
Guard the shift explicitly so that bits = 64 produces the correct all-ones mask instead of wrapping to zero:
```rust
let mask: u64 = if bits >= 64 {
    u64::MAX
} else {
    (1u64 << bits) - 1
};
if self.challenger.state[0].as_canonical_u64() & mask != 0 {
    return Err(ProofError::InvalidGrindingWitness);
}
```
What I learned — and what you should check in your own code
None of these bugs reflect carelessness. All three are invisible to standard testing: H-4 requires two crafted inputs, M-1 only fires on non-x86-64 targets, M-2 only strikes at an exact parameter boundary. The lesson is not to test harder — it is to build APIs that make the mistakes structurally impossible.
1. The Fiat-Shamir injectivity checklist
✓ Are domain separators used?
✓ Can two different inputs reach the same transcript state?
On domain separators: leanMultisig initialises the sponge with a fixed protocol tag, which blocks cross-protocol replay. But the tag is a compile-time constant — it is not parameterised by circuit shape or WHIR folding configuration. Two proofs with different circuit sizes but the same field width can share an identical opening prefix. The separator works at the protocol boundary; it does not work at the instance level. That is a separate, lower-severity issue, but worth auditing in any fork.
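One way to audit for this is to check that the transcript seed is a function of every parameter that distinguishes one instance from another, not just a protocol constant. A purely illustrative sketch (none of these names are the actual leanMultisig API):

```rust
// Hypothetical instance-level separation: seed the transcript with the
// protocol tag AND the parameters that define the instance, so proofs for
// different circuit shapes can never share an opening prefix.
const PROTOCOL_TAG: u64 = 0x6c65_616e; // fixed compile-time tag, as in leanMultisig

/// Illustrative instance parameters; real code would take these from the
/// WHIR config and circuit layout.
struct InstanceParams {
    log_n_rows: u64,
    folding_factor: u64,
    starting_log_inv_rate: u64,
}

/// Seed material for the sponge: tag first, then every shape parameter.
fn transcript_seed(p: &InstanceParams) -> Vec<u64> {
    vec![
        PROTOCOL_TAG,
        p.log_n_rows,
        p.folding_factor,
        p.starting_log_inv_rate,
    ]
}

fn main() {
    let small = InstanceParams { log_n_rows: 10, folding_factor: 4, starting_log_inv_rate: 1 };
    let large = InstanceParams { log_n_rows: 20, folding_factor: 4, starting_log_inv_rate: 1 };
    // Same protocol, different circuit sizes: the seeds (and hence every
    // subsequent challenge) differ from the very first absorbed element.
    assert_ne!(transcript_seed(&small), transcript_seed(&large));
}
```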
2. Encode widths explicitly — never trust type inference for security arithmetic
M-1 and M-2 are the same bug in different clothes: a shift on an integer whose width is platform-inferred. The fix is identical in both cases: write the type. 1u64, not 1. Any arithmetic touching security parameters — bit widths, difficulty targets, field sizes — should use fixed-width types. The compiler will not warn you when platform-dependent sizes collide with security boundaries.
3. Test at the exact documented limit, not just below it
M-2 was only reachable at bits = F::bits() — the maximum. A test at bits = 63 passes. A test at bits = 64 catches the bug immediately. Whenever a function accepts a parameter with a documented maximum, add a test at that exact value. This applies to any bit-count, difficulty level, or field-size-derived input.
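As a concrete pattern, factor the guarded mask into its own function and assert its value at the documented maximum, not just below it (a sketch; `grinding_mask` is a name introduced here, not from the codebase):

```rust
// Boundary-testing the guarded mask from the M-2 fix: the interesting case
// is exactly bits = 64, the documented maximum.
fn grinding_mask(bits: u32) -> u64 {
    if bits >= 64 {
        u64::MAX
    } else {
        (1u64 << bits) - 1
    }
}

fn main() {
    assert_eq!(grinding_mask(0), 0);
    assert_eq!(grinding_mask(1), 1);
    assert_eq!(grinding_mask(63), u64::MAX >> 1);
    assert_eq!(grinding_mask(64), u64::MAX); // the exact case M-2 got wrong
}
```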
Summary
| ID | Finding | Severity | Status |
|---|---|---|---|
| H-4 | Transcript collision via zero-pad absorption | High | Fixed |
| M-1 | Bitmask shift panic on 32-bit | Medium | Fixed |
| M-2 | PoW grinding bypass via shift overflow | Medium | Fixed |
Issues: #171, #172, #173
Fix: ee7377f · Author: @this-vishalsingh