An incomplete guide to Folding: Nova, Sangria, SuperNova, HyperNova, Protostar

August 10th, 2023

Special thanks to arnaucube, Brecht Devos, Barak, Benedikt Bünz, Binyi Chen, and Ben Wan for review.

TL;DR

The goal, the whole zk industry is working on, – is to (i) decrease the proof generation cost as much as possible while (ii) preserving proof as small as possible and (iii) verification as efficient as possible. It’s possible through prover optimisation or, as an alternative, one can compress the data the prover proves.

Huge work was done on folding scheme techniques within the last several years by the zk scientific community. In this article, we explore the folding scheme technique that can be applied to achieve a more efficient approach to the IVC (Incrementally Verifiable Computation) when verifying a single proof verifies the entire computation chain. We also briefly explain what folding schemes can be used for and present the evolution up to the current state of the art.

Content

Recursive SNARKs: features and constraints
Folding – an alternative approach to recursion, and Nova – folding application for R1CS
Sangria – Nova for PLONKish circuits
SuperNova – generalized Nova
HyperNova – folding with sum-checks and customizable constraint systems (CCS)
ProtoStar – Nova (and Sangria) efficiency-preserving generalization

Let’s start with a naive circuit-building approach

In a naive approach, one builds a giant circuit that covers all opcodes and EVM changes.

Disclaimer: to learn what the opcodes and circuits are – check this article by PSE, to learn how different circuits are connected with each other in a ZK-EVM, check another article by PSE.

The problems with this approach

Prover memory requirement is proportional to n, the number of times the circuit was invoked;
Hard to parallelize/distribute proof generation;
Verifier time might also depend on n;
The prover time is proportional to n.

1. Recursion and aggregation: features and constraints

Recursive SNARK means verifying a SNARK proof inside of another SNARK proof. Recursion allows a ZK-SNARK prover to squeeze more knowledge into their proofs while maintaining the succinctness properties of a SNARK.

Recursive SNARKs applications

Example 1: Instead of producing proof of knowledge, one creates proof that they know a proof.

As a result, we have two circuits:

Inner Circuit (large): Prover proves that they know the witness. At this stage, proof generation is fast, but the proof is large (except for the case when the proof size is a constant);
Outer Circuit (small): Prover proves that they know the proof. At this stage, proof generation is slower (but it’s not crucial as in most cases the circuit is much smaller than the first), but the proof is small.

Thanks to recursive SNARKs, we get fast proof generation and short final proof. However, this approach works if the Verifier Circuit (outer circuit) is much smaller than the initial statement one wants to prove.

Example 2: use proof aggregation to prove many statements at once (ex., for ZK-Rollup, prove that all transactions in the block are valid).

To produce one monolithic proof for all statements, one needs to wait until they have all statements they need to prove (in a rollup example, a block should be filled with as many transactions as possible before the block is ready to be proven). Instead, thanks to recursive SNARKs, one can start generating proof once they get the first statement. And in the end, generate proof of proofs.

As in the ZK-Rollup example, it can work in the following way:

As a result, after all transactions are received, the prover generates only one final proof of proofs (aggregating other proofs), which is much faster than generating a monolithic proof for all transactions.

Example 3: IVC – Incrementally Verifiably Computation

IVC allows us to do proofs for long computations with relatively little memory.

A function F (ex., microprocessor) is iterated many times. At every step a new proof is generated: it proves that the computation to this point is correct.

The core idea of IVC

At step n:

the prover computes a new state s_n

and a proof π_n that proves prover has a witness (s_n-1, w_n, π_n-1) such that the current state (s_n) is correct relative to the previous state (s_n-1). That is, F(s_n-1, w_n) = s_n. And the proof of the previous state (π_n-1) is correct relative to the previous state (s_n-1). That is, V(vp, (n-1, s_0, s_n-1), π_n-1) = true.

The important thing is that in the end of every step, there is only one final output – state s_n. The knowledge of witnesses (w_0, …, w_n) is embedded into s_n. And at every step, there is a proof π that proves that the computation at each step up to the current step was done correctly. That is, we have π_1 at step 1, π_2 at step 2, π_3 at step 3, …, π_n at step n. The verifier can check that the computations at all previous steps up to the current step n were done correctly by verifying only one proof π_n.

Applications of IVC

Break a long computation into a series of small identical steps that significantly reduces the prover memory requirements;
Generate succinct proof that the current state of blockchain is correct. That is, to compress a validity proof of the entire blockchain into one succinct proof (ex., used by Mina to “inject” a proof into each block that all the previous transactions on the blockchain from genesis are valid);
F is one or more invocations of the Delay Function, together with IVC it constitutes the Verifiable Delay Function (ex., MinRoot);
ZK-VM: F is a step (any OPCODE supported by the machine) of the VM (ex., EVM, LLVM, WASM, etc.);
zkBridge: F validates state according to blockchain consensus rules, etc.

Limitations of recursion

However, the prover needs to build a proof for a circuit C that runs the entire verification algorithm V (vp, x, π). And this is expensive to verify evaluation proofs for polynomial commitments.

Okay, recursion allows us to do some tricks with generating proofs inside of other proofs. It is helpful to reduce proof generation time and proof size. But what if we could compress the statement we want to prove?

An evolution of ZK-SNARKs for semi-structured computations

Diagram source: “CCS & HyperNova with Srinath - Folding Schemes FTW” lecture by PSE.

2. Folding – an alternative to recursion, and Nova – folding instantiation for R1CS

Disclaimer 1: in 2020, the paper “Proof-Carrying Data without Succinct Arguments” was published by Benedikt Bünz, Alessandro Chiesa, William Lin, Pratyush Mishra, and Nicholas Spooner. It achieved the same properties that are described below as “cool things about folding”. However, the cost of it was 3 scalar multiplications instead of 2 (for folding).

Disclaimer 2: even though in Nova, folding scheme was instantiated for R1CS it can be generalized for other constraint systems. For example, as it was done for PLONKish circuits in Sangria (by Nicolas Mohnblatt). Check section 3, “Sangria – Nova for PLONKish circuits” for more details.

Idea: make pre-processing to compress the SNARK inputs.

Cool things about folding:

Doesn’t perform any polynomial divisions or multiplications (ex., via FFTs that require much memory and are computationally heavy);
Works with any type of elliptic curves (ex., secp, secq, Pasta, etc.);
F is specified with R1CS;
No trusted setup;
Compared to the recursion overhead, folding is expected to be 5-50x faster than Groth16, PLONK, Halo, etc. Prover time is dominated by two multi-exponentiations of size O(C), where C = |F|;
Verifier Circuit is const-size (two scalar multiplications);
Proof size is O(log C) group elements (few KBs).

Long story short: with folding one can “compress” two instances into one and add the proof that they were “compressed” correctly.

It meets both completeness and knowledge soundness requirements:

If the original witnesses, w1 and w2, are correct, then the new witness, w, is correct as well;
If the new witness w is correct for the folded instance x, then the original witnesses w1 and w2 for the initial instances x1 and x2 can be extracted.

High-level overview of “folding in a nutshell”

Originally, folding is an interactive protocol, as reflected on the diagram above where T is a cross-term, meaning that variables with indexes 1 and 2 are engaged in one term, and r is a random challenge.

The protocol can be converted into non-interactive using Fiat-Shamir. Then the prover generates randomness on its own, hashing x1, x2, and T, and also attaches the proof that the randomness was generated correctly.

Initially, Nova was designed for R1CS (remember, there are three common constraint systems: R1CS, PLONK, and AIR).

For R1CS program A, B, C, and public input x (where A, B, C – three matrices of m*m size), w is a valid witness if z = (x, w) such that (Az) ○ (Bz) = Cz, where ○ is a Hadamard Product.

To build a folding scheme for R1CS, we have

The R1CS instance: R1CS program A, B, C and public input x;
Public input x1 and witness z1 = (x1, w1);
Public input x2 and witness z2 = (x2, w2);
We expect that (Az) ○ (Bz) = Cz both for z1 and z2.

If we go straightforward (just fold):

Use linear combination with randomness r: x = x1 + r * x2;
Calculate z = z1 + r * z2 = (x1 + r * x2, w1 + r * w2);
Calculate (Az) ○ (Bz);
But instead of desirable (Az) ○ (Bz) = Cz, where Cz = Cz1 + r * Cz2, we will get (Az) ○ (Bz) = Cz1 + r^2 * Cz2 + E, where E includes all some cross-terms and E is the following:

To overcome this, one can use Relaxed R1CS:

To build a folding scheme for relaxed R1CS, we have

The R1CS instance: R1CS program A, B, C and public inputs (x, u, E), where a scalar c and a vector E are noise parameters;
Public inputs (x1, c1, E1) and witness z1 = (x1, w1);
Public inputs (x2, c2, E2) and witness z2 = (x2, w2);
We expect that (Az) ○ (Bz) = u(Cz) + E both for z1 and z2.

If we go straightforward (just fold):

Use linear combination with randomness r: (i) x = x1 + r * x2, (ii) u = u1 + r * u2, (iii) E = E1 + rT + r^2 * E2 (where T is a cross-term);
Calculate z = z1 + r * z2 = (x1 + r * x2, w1 + r * w2);
Calculate (Az) ○ (Bz);
And now (Az) ○ (Bz) = cu(Cz) + E is true, meaning that w is a valid witness for the relaxed R1CS instance (x, u, E);
However, the size of E is related to the size of the total computation: that is, E might be much larger than x. So, for the verifier, E is too large to process it.

To handle this issue, one can use the committed relaxed R1CS: that is, to use a small commitment to E, comm(E), instead of using large E. It solves the problem as while E depends on the size of the total computation, commitment to E does not.

To build a folding scheme for committed, relaxed R1CS, we have

The R1CS instance: R1CS program A, B, C and public inputs (x, u, comm(E)), where comm(E) is the commitment to E done using a homomorphic commitment scheme;
Public inputs (x1, u1, comm(E1)) and witness (z1, E1, rE1), where rE is the randomness used for comm(E);
Public inputs (x2, u2, comm(E2)) and witness (z2, E2, rE2);
We expect that (Az) ○ (Bz) = u(Cz) + E both for z1 and z2, and comm(E) is the commitment to E using r for E1, E2, rE1, and rE2.

If we go straightforward (just fold):

Use linear combination with randomness r: (i) x = x1 + r * x2, (ii) u = u1 + r * u2, (iii) comm(E) = comm(E1) + r * comm(T) + r^2 * comm(E2) (where instead of T we also use the homomorphic commitment to T with some randomness rT);
Calculate (i) z = z1 + r * z2 = (x1 + r * x2, w1 + r * w2), (ii) E = E1 + rT + r^2 * E2, (iii) rE = rE1 + r * rT + r^2 * rE2;
Calculate (Az) ○ (Bz);
(Az) ○ (Bz) = u(Cz) + E is true.

For detailed (Az) ○ (Bz) calculation, check the slides, and for a true deep dive into Nova (and proof of completeness and knowledge soundness), check the original Nova paper.

Folding was first implemented in Nova, that gave the Nova proving system substantial advantage.

Comparison of different modern proving systems

Folding scheme application

Use the committed relaxed R1CS to modify the IVC

We augment function F to F’, adding the check that the folding at the previous step was done correctly.

Roughly: Nova recursion overhead is 10x lower than in SNARK-based IVC with state-of-the-art (as by July, 2022) trusted setup SNARK and 100x smaller than with a SNARK without trusted setup.

However, this is an interactive protocol. To make it non-interactive, one can use Fiat-Shamir.

Prover generates a random challenge on his own. The random challenge is the hash of two folded instances and commitment to the cross-term;
Prover also generates a proof that the random challenge was generated correctly;
However, the verifier doesn’t have the inputs that the prover used for random challenge generation. Everything it has – the very last result of folding.
That’s why the prover should also prove (i) that all the foldings it made to get the inputs for random challenge generation (ex., message folding, noise parameters folding, cross-terms folding, etc.) were done correctly, (ii) that folded instances are linked together correctly, that is that the output of step i is the input to the step i + 1;
That is done by augmenting the initial R1CS program A, B, C into A’, B’, C’ to verify that (i) witnesses are correct for the instances being folded and (ii) the folding was done correctly. This program takes three instances: x_i, x_1→i, x_1→i+1, and checks that (i) x_1→i+1 is the correct folding of x_i and x_1→i and (ii) x_i is a valid instance;
In fact, it checks that function F was computed correctly and also does two multiplications in group G (one to check the product r * x and one r * com(T)). In a “normal” recursion, the prover needs (i) to evaluate the function F and (ii) run the whole SNARK verification circuit;
In Nova, running the whole verification circuit is substituted by two multiplications in group G + some simple hashing (hashing is cheap).

That is much (~10x) faster and cheaper than with a recursive SNARK. We will have only 20k constraints – that is cheap!

Furthermore, when one produces a proof with Nova, the size of the proof is proportional to one step of the recursive computation.

Some use cases for using IVC modified with folding scheme

VDF – Verifiable Delay Functions. For VDF, function F is some delay function that takes some non-trivial sequential computation. For example, computing a cube root inside the finite field.
Parallelization (ex., distributed proving between different entities or between different CPUs or between different cores in one CPU).
A back-end for Turing-complete SNARK languages like Lurk, where function F executes a step of the program.
Rollups, where function F is the state transition function – it takes as input the previous Merkle Root and some transactions, executes transactions, and outputs an updated Merkle Root.

Core Nova challenges

Proofs are of size O(|C|), which depends linearly on the size of the circuit.

Nova experimental implementation by Microsoft research team.

3. Sangria – Nova for PLONKish circuits

Okay, in Nova, the folding scheme was instantiated for R1CS, but what about other constraint systems? Sangria (suggested by Nicolas Mohnblatt from Geometry) shows that Nova’s folding scheme can be applied to any quadratic constraint system, particularly PLONKish circuits.

Applying folding for PLONKish circuits

The Verifier work is constant in the depth of the circuit;
Constants are worse than in Nova (this is the cost of more flexible arithmetization).

Short, high-level recap of what is PLONK:

Fixed-size grid;

Cells are filled with numbers;

There is a set of rules we need to follow. The set of rules includes (i) copy constraints: cells with the same color should have the same values;

(ii) gate equation: each row i should satisfy the following equation:

Selectors and the rules define the circuit. Once we fixed the selectors and the rules, the circuit is fixed.

Naive Random Linear Combination application:

Copy constraints are preserved (this can be algebraically derived from the perfect completeness, check the Sangria technical note for more details);
But the gate equation doesn’t hold because of the non-linearity.

To fix it, we use (as in the original Nova) relaxation for the gate equation

Witness W = (w_a, w_b, w_c) plus noise vector e;
Instance: public inputs X = (x_a, x_b, x_c), a scalar u, and commitments comm(w_a), comm(w_b), comm(w_c), comm(e).
Relaxed gate equation – we now have a homogeneous function:

Stage 1: Folding

trace’ + r * trace’’ = final_trace (exactly as in the diagram above);
u’ + r * u’’ = u.

Stage 2: put folding results into the relaxed gate equation

Gate(trace’) + r * t + r^2 * Gate(trace’’) = Gate(final_trace)

However, t is pretty massive (how you can see in the equality several rows above) hence a bit hard to manipulate, so we use a trick with defining e to make it easier (and just get rid of t):

e’ - r * t + r^2 * e’’ = e.

Protocol

Prover computes the cross-term t and commits to t;
Prover sends comm(t) to verifier;
Verifier sends some randomness;
Both the prover and the verifier do linear combinations: Prover and verifier output folded instance: X’ + r * X’’ = X u’ + r * u’’ = u com(w_a) = com(w_a’) + r * com(w_a’’) com(w_b) = com(w_b’) + r * com(w_b’’) com(w_c) = com(w_c’) ++ r * com(w_c’’) com(e’) - r * com(t) + r^2 * com(e’’) = com(e)

Prover outputs folded witness: W’ + r * W’’ = W e’ - r * t + r^2 * e’’ = e r_a’ + r * r_a’’ = = r_a r_b’ + r * r_b’’ = r_b r_c’ + r * r_c’’ = r_c r_e’ + r * r_t + r^2 * r_e’’ = r_e

For a verifier, to be able to work with commitments, it should use an additively homomorphic commitment scheme.

For a detailed description of the folding scheme for Relaxed PLONK, check this note by Nicolas Mohnblatt from Geometry or Sangria Technical Note.

Sangria Performance

Verifier makes four additions of the commitments (elliptic curves point additions);
In fact, we can work with wider circuits (meaning that each gate has more than two wires coming in and more than one wire coming out) by committing to extra columns. The verifier will need to perform one extra point addition per trace column;
We can also deal with higher-degree custom gates. Each extra degree requires the prover’s message to include one additional commitment while the verifier will perform one extra point addition per degree. It will also change how we handle the noise term. A gate of degree d will be relaxed with powers of u up to u^d. The new noise’s folding equation will be

12k constraints (out of 20k constraints in total) in Nova’s circuit are point additions that are the most expensive. So the question is: will the benefit of wider circuits and custom gates outweigh the cost of extra point additions (that the verifier will impose)?

4. SuperNova – generalized Nova

Okay, in Nova, one function, F, can be applied again and again. But what if there are different functions at every step? That is what SuperNova allows us to do!

Furthermore, while Nova proofs are of size O(|C|), that is, depending linearly on the size of the circuit, in SuperNova, the cost of proving a step of a program is proportional only to the circuit size representing the requested instruction.

SuperNova is a Nova generalization to semi-structured circuits.

Cool things about SuperNova

Each step applies on any of the possible functions {F_1, F_2, F_3, etc…};
Proof generation does not have to follow the sequential path (possible using binary tree optimization).

Under SuperNova, non-uniform IVC (a generalization of IVC) is introduced.

We have

A collection of k + 1 non-deterministic functions F_1, F_2, …, F_k, that are kind of circuits encoding different instructions, and a special function φ that helps pick one of the instructions to execute (φ is a selector circuit);
An initial input s_0.

The goal is

Prove that s_n is the output of applying circuits C_j n times, where at step i, j = φ(w_i-1, s_i-1).

To dive deeper into SuperNova, check the original paper.

Comparing SuperNova to recursive ZK-SNARKs with universal circuits and ZK-SNARKs with trimmed circuits:

In trimmed circuits, instead of using a universal circuit, one just picks a circuit that was executed and, as a result, pays only for what was executed (not for the whole circuit execution);
By overhead, we mean extra growth in witness size, public IO (Interactive Oracle) size, number of constraints, NP checker time (NP checker looks at witness and constraint system and checks that it is satisfiable), the prover time, etc.
In the table above, by recursion overhead, we mean that in the case of using recursion, some primitives are not very efficient. For example, memory where one has to use either Merkle tree or a multi-set based hash function. Both result in at least a few hundred constraints per memory operation. Whereas for the non-recursive SNARK, we can use, for example, permutations that have a much lower cost to access memory. However, for recursive SNARKs, this recursion overhead can be mitigated if each instruction is doing at least 10k gates.

SuperNova experimental implementation by jules.

5. HyperNova – folding with sum-checks and customizable constraint systems (CCS)

We will use the sum-check protocol in the explanations of the next chapter (HyperNova). To stay on the same page, let’s briefly describe what the sum-check is.

A brief explanation of the sum-check protocol

Informal intuition: we can think about the sum-check protocol as a tool that turns the polynomial evaluation of multiple points into the evaluation of one point.

Formal intuition:

For detailed explanation, check the book “Proofs, Arguments, and Zero-Knowledge” by Justin Thaler, section 4.1 “The Sum-Check Protocol”.

Verifier has oracle access to a k-variate polynomial g over field F;
Verifier’s goal is to compute the sum of polynomial g evaluations over k variables [b_1, …, b_k];
Naively, the verifier can just ask the oracle for the polynomial g evaluation at each value of b and just sum the results. It will take 2^k oracle evaluations and 2^k field additions;

With the sum-check protocol, the verifier delegates a substantial part of the work to the prover, and the verifier itself only checks k prover messages and then makes one g polynomial evaluation (oracle query) at the very end;
At stage 0, the prover sends the verifier claimed answer c_1. The sum-check protocol must check that

In each next round, the prover sends the verifier a univariate polynomial. For example, in the first round, the prover sends the verifier the polynomial s_1(x_1), that equals to

To specify the polynomial, the prover sends 3-4 coefficients (3-4 field elements);

The verifier wants to check that indeed (i) s_1(x_1) = H_1(x_1), (ii) it is consistent with the true answer being c_1 (was sent at the 0 stage). So, the verifier checks (i) c_1 = s_1(0) + s_1(1), (ii) s_1 = H_1 at a random point r_1. s_1(r_1) can be computed directly from the prover’s first message, but not the H_1(r_1);
The verifier needs to check

To calculate it, the prover and the verifier can recursively apply the sum-check protocol and proceed round by round through the protocol. In each round, another variable of g is bound to the randomly chosen element r_1. They keep doing it round by round until in the kth round the final variable of g is bound to a random element r_k;

In the kth round, the prover sends the univariate polynomial s_k(x_k) claimed to equal H_k = g(r_1, …, r_k);
The verifier checks that s_k-1(r_k-1) = s_k(0) + s_k(1);
And then, the final statement the verifier needs to check is s_k(r_k) = g(r_1, …, r_k). This check costs the verifier one oracle query (which it can perform on its own).

CCS

CCS is a constraint system that generalizes Plonkish, R1CS, and AIR without overhead.

SNARKs originally described for R1CS extend easily to CCS (and hence PLONKish):

Spartan (R1CS) → SuperSpartan (CCS)
Marlin (R1CS) → SuperMarlin (CCS)

However, SNARKs originally described for PLONKish (Plonk, HyperPlonk, etc.) cannot handle CCS (or even R1CS) without overhead. The high-level reason is that both CCS and R1CS have linear combinations, while the PLONKish proof systems are restricted by limited and specific types of linear constraints.

CCS description

To specify the circuit, CCS has t matrices M_1, M_2, …, M_t;
To allow customization of constraints, it has q multisets S_1, S_2, …, S_q;
Public input: x;
A witness W such that Z = (W, x, 1).

The circuit is satisfying if the sum of Hadamard products equals zero, that is

One can say that R1CS is a special case of CCS where

Circuit description: M_1 = A, M_2 = B, M_3 = -C;
S_1 = {1, 2}, S_2 = {3};
Public input: x;
A witness W such that Z = (W, x, 1);
Az ○ Bz = Cz.

Committed CCS (CCCS)

To fold two CCSs into one, we will use CCCS (Committed CCS) where C = comm(w) is also in the instance.

However, if we try to do Nova-style relaxation, the number of cross-terms would be proportional to the degree of the constraints. That is not very efficient.

So, instead of folding two CCCSs, we can linearize one of the CCCS to get LCCCS. And then fold CCCS x LCCCS → LCCCS (the result will be a linearized CCCS as well).

By linear CCCS, we mean that the constraint is linear, so the size of the multiset is one. That is sufficient to construct IVC but is not NP-complete.

Linearized Committed CCS (LCCCS)

CCS consists of

t matrixes M_1, M_2, …, M_t;
q multisets S_1, S_2, …, S_q;
Public input: x;
Commitment C = comm(W);
Scalar u;
Random challenge r;
Scalars (v_1, v_2, …, v_t) that compress all the rows of the matrix:

A witness vector W such that Z = (W, x, u).

High-level CCS folding diagram

Step 1: CCCS x LCCCS → LCCCS folding: use one invocation of the sum-check protocol to reduce CCCS to LCCCS over the same random challenge

This is an interactive protocol;
Two folded instances should have the same structure: the same matrixes M and multisets S;
Verifier starts with LCCCS (C’, u’, x’, r’, v_1’, …, v_t’) and CCCS (C’’, x’’), makes linear combinations, and outputs two LCCCSs, (C’, u’, x’, r, σ_1’, …, σ_t’) and (C’, u’, x’, r, Θ_1’, …, Θ_t’) with the same random challenge r;
Prover inputs and outputs W’, W’’.

Step 2: For LCCCS x LCCCS → LCCCS folding – all the checks are linear constraints

Verifier sends to the prover random challenge γ (the random challenge is the same for all i);
Prover takes W’, W’’, makes a linear combination W’ + γ * W’’ = W and outputs W;
Verifier takes (C’, u’, x’, r, v_1’, …, v_t’), (C’’, u’’, x’’, r, v_1’’, …, v_t’’) and makes linear combinations C’ + γ * C’’ = C u’ + γ * u’’ = u x’ + γ * x’’ = x v_i’ + γ * v_i’’ = v_i and outputs (C, u, x, r, v_1, …, v_t).

Correctness of [(C, u, x, r, v_1, …, v_t), W] holds by linearity:

HyperNova

HyperNova is a prover-efficient and recursive ZK-SNARK for proving incremental computations where each step is expressed with committed CCS.

HyperNova uses an early version of SuperSpartan to construct a new folding scheme for customizable constraints. The key idea is to devise a polynomial that encodes claims about high-degree constraints along with some prior claims. When the sum check is applied to that polynomial, the resulting claims are in a form suitable for folding without needing any cross-terms.

HyperNova can be constructed in two ways: (i) a direct approach: directly builds on the non-interactive multi-folding scheme for CCS, (ii) another approach uses the multi-folding scheme in conjunction with Nova as a black box.

For a detailed HyperNova protocol description, check the original paper.

HyperNova costs

Prover cost: 1 MSM to commit to witness + 1 sum-check;
Verifier cost (recursive circuit): 1 scalar multiplication + a logarithmic number of hashes.

HyperNova experimental implementation by arnaucube and asn.

nlookup: a lookup argument for Nova

Suppose there is a table T of size n. Consider m variables v_1, …, v_m in a CCS instance, and we wish to enforce that those values are contained in T.

lookup: to store T as a Merkle tree for which the circuit gets as public input a commitment. Then to prove that a certain value is in T, the prover could supply as non-deterministic advice to the circuit a Merkle proof of inclusion, and the circuit verifies the Merkle proof of inclusion. It requires O(m * log(n)) hash evaluations inside the circuit.
plookup: the number of constraints is O(max(m, n)) that works if n ≈ m, but doesn’t work for recursion where a particular recursive step may perform m << n lookup operations;
nlookup: for m lookups on a table of size n entries, nlookup requires O(m * log(n)) multiplications and O(log(n)) hash operations inside a circuit (with small constants), and the prover performs O(n) finite field operations. In particular, the prover does not commit to any additional polynomials. That is a perfect tool for expressing finite state machines efficiently with Nova and HyperNova. For a detailed nlookup description, check section 7 of the HyperNova paper.

6. ProtoStar – Nova (and Sangria) efficiency-preserving generalization

ProtoStar IVC combines a general recipe for efficient folding schemes and simple special-sound protocols* for expressive relations that allows to have a recursive circuit supporting (i) high-degree gates, (ii) arbitrarily large lookup tables, (iii) arbitrarily many opcodes in the VM with only 3 group operations.

Disclaimer: accumulation and folding refer to a similar technique under the hood. The definitions difference appeared due to the different papers by different authors exploring similar phenomena roughly at the same time.

The goal of ProtoStar is roughly speaking similar to the goal of HyperNova: as in Nova (and its generalisation – Sangria), the prover complexity and recursive circuit size are both proportional to the degree of the gate, ProtoStar and HyperNova aim to minimize it (ideally, to make a constant). However, in ProtoStar, unlike in HyperNova, (i) the overhead for supporting lookups is close to negligible, (ii) sum-check protocol is not required.

Diagram source: ProtoStar paper.

The compiler only requires vector commitments and thus does not require trusted setup, pairings, or FFTs.
The prover in each accumulation/IVC step is logarithmic in the number of supported circuits and independent of the table size in the lookup (comparing to d * log(n) for Nova).
The lookups are more efficient as instead of a product argument, one uses a log derivative argument. Hence, it is possible to put zero coefficients in many places. The recursive circuit is dominated by 3 group scalar multiplications and a hash of d field elements, where d is the degree of the highest gate;
High-degree gates are possible without additional MSM costs. Instead of having multiple gates, one can add them up with the power of the random element b (verifier challenge), and it ends up having only one gate of degree d+1. The gate of degree d+1 can be handled because it is only a single polynomial (check the 4th step of the “General ProtoStar Recipe for folding” (next section) for more details). However, one still needs d cross-terms, but instead of using commitments, one can use field elements as a trivial commitment scheme to the 1-dimensional vector. These cross-terms are much cheaper to process;
In the verifier circuit size, ProtoStar does a better job than HyperNova: if non-native field operations take place, the necessary non-native field operations simulation might be quite expensive without lookup support. Protostar has cheap lookup support (O(d) in ProtoStar vs O(d*log(N) in HyperNova) so that the non-native field op circuit can be more efficient. However, if one uses SNARK-friendly hash functions, the difference is insignificant.
The techniques in ProtoStar can be generalized for PCD (Proof-Carrying Data), that is, to prove not only a chain of computations but the whole tree/DAG (Directed Acyclic Graph) of computations without sacrificing efficiency.
It supports efficient circuit branching. That is, the circuit is always proportional to the entire set of opcodes. That is, creating circuits with multiple branches that are (i) turned on and off by selectors and (ii) organized in such a way that “turned off” branches do not cost anything in the proving process;
It can be used with an IVC compiler, recently proposed by Wilson, for further efficiency improvements.

General ProtoStar Recipe for folding

1. Build a multi-round special-sound protocol* Π for relation R.

*A side remark on the special-sound protocol

Compared to IVC, a special-sound protocol is easier to construct as it (i) doesn’t require succinctness, (ii) can be interactive, (iii) can represent complex relations (ex., lookup relation).

k-special-sound protocol can extract valid witness from k rounds of communication.

Consider an example of a 1-special-sound protocol:

Source: a workshop “Deep dive into Protostar paper & protocol” with Binyi Chen by PSE.

2. Transform the protocol Π into another interactive protocol CV[Π] with a compressed verifier for relation R.

3. Transform this protocol into a non-interactive argument NARK(CV[Π]) using Fiat-Shamir.

4. Build a folding scheme for the verifier check of NARK(CV[Π]).

We have two checks, a prover check and a folding check, and we would love to fold them into one check;
Instance x for prover includes commitments C, random challenges r, and u = 1. Instance x for folding includes commitments C, random challenges r, slack u, and noise vector E;
Witness w for prover as well as for folding includes prover messages m;
The accumulation check is a homogeneous equation calculated by the verifier, while the prover’s NARK check has elements of different degrees from 0 to d, where d is the max degree of each check.
To goal is to fold both checks into one equation. We can do it using polynomial interpolation:

However, under this approach, many group operations depends on the max degree of each check d. And we want them to be independent.

Compressing Verification Checks: compress a number of verifier checks L into one constraint using Random Linear Combination. However, the degree of this check ends up being much larger, d+L. Furthermore, the terms in the check are not homogeneous anymore. So, folding can’t be applied anymore.
To fix the homogeneity problem, instead of using some coefficients with different degrees, the verifier can substitute all of them with another variable so the check degree will be d+1 and the check will be homogeneous. The prover will send an array with initial coefficients with different powers at the end. However, under this technique, the verifier needs to check all these initial coefficients. So, O(l^1/2) extra group operations are added to the prover work and O(l^1/2) deg-2 checks for correctness are added to the verifier work, where l is the number of initial verifier checks.

Following the ProtoStar accumulation/folding scheme, the number of Group operations is independent of the degree of the verifier check.

For a detailed ProtoStar description, check the original paper.

Other exciting, relevant stuff using folding that wasn’t covered in this article for further exploration:

Origami – an explicit folding scheme for Halo2 lookup arguments by Yan Zhang and aardvark;
ParaNova – parallelizing Nova by oskarth;
MoonMoon – adding permutation argument to Nova and using Nova over integer coefficients by using Pedersen commitments in RSA group by Lev Soukhanov;
ProtoGalaxy – efficient ProtoStar-style folding of multiple instances: a paper by Liam Eagen and Ariel Gabizon and proof-of-concept implementation by arnaucube;
A Nova attack explanation article “The zero-knowledge attack of the year might just have happened, or how Nova got broken” by David Wong.

Conclusion and what is next?

So, HyperNova and ProtoStar are presumably not the end game. And folding implementations for existing zk protocols is still in its early stage and is a work in progress. To track how folding implementations are going, check the github repo by lurk lab that aggregates all the folding news, including freshly baked implementations. Folding and all these awesome Novas were introduced by Srinath Setty from Microsoft research, keep an eye on his twitter to learn what is next. Huge educative works on folding and its implementations are being done by zeroknowledge.fm (check its zkStudyClub) and PSE group from the EF (check their YouTube and keep an eye on upcoming research events on their Discord server).

Sources:

“Recursive zkSNARKs: Exploring New Territory” article, by 0xPARC
“ZKP MOOC Lecture 10: Recursive SNARKs” lecture and slides, by Dan Boneh
“Nova: Recursive Zero-Knowledge Arguments from Folding Schemes” paper, by Abhiram Kothapalli, Srinath Setty, Ioanna Tzialla
”Module Fourteen: Nova Crash Course” ZK Whiteboard Session with Justin Drake, by zeroknowledge.fm
A twitter thread summarizing key insights on IVC and folding schemes, by @0xevevm
“Nova: Recursive Zero-Knowledge Arguments from Folding Schemes” lecture with Srinath Setty, by Protocol Labs
“Supernova” lecture with Srinath Setty, by zeroknowledge.fm
“SuperNova: Proving universal machine executions without universal circuits” paper, by Abhiram Kothapalli, Srinath Setty
“CCS & HyperNova - Folding Schemes FTW” lecture with Srinath Setty, by PSE group
“HyperNova: Recursive arguments for customizable constraint systems” paper, by Abhiram Kothapalli, Srinath Setty
A twitter thread introducing ProtoStar, by Benedikt Bünz
“Protostar: Generic Efficient Accumulation/Folding for Special-sound Protocols” paper, by Benedikt Bünz, Binyi Chen
A twitter thread summarizing ProtoStar, by levochka.eth
zeroknowledge.fm podcast, episode 280: ProtoStar with Benedikt Bünz and Binyi Chen.
A workshop “Deep dive into Protostar paper & protocol” with Binyi Chen by PSE.