Special thanks to Ben Wan, Matthew Finestone, and Brecht Devos for feedback and review.
Introduction
ZK-SNARK Definition
The Properties of ZK-SNARKs
The PLONKish ZK-SNARK Proof Systems
3.1 The PLONKish ZK-SNARKs in a Nutshell and Their Role
3.2 Commitment Schemes
3.3 Interactive Oracle Proof
3.4 Fiat-Shamir Heuristic
3.5. How the PLONKish ZK-SNARK Proof Systems Work
3.6 A Bit About Advanced PLONKish ZK-SNARK Aspects
References
A ZK-SNARK is a cryptographic proving system. It allows some entity to prove that something is true, without revealing other information.
ZK-SNARKs are useful in several applications and domains like blockchains and verifiable computing. One prominent blockchain application is its use in ZK-Rollups.
ZK-Rollups are layer-2 blockchains built atop other blockchains (such as Ethereum) that utilize ZK-SNARKs (or other cryptographic proving systems) to prove the validity of state transitions. That is, with each new network update (when a new batch of transactions is added) the new network state is computed and the proof of this state’s validity is generated. Crucially, only one entity needs to generate this proof, after which anyone can be convinced of its validity.
The desirable properties that ZK-Rollups provide are typically (i) scalability and/or (ii) privacy preservation. When scalability is the sole desired property, ZK-Rollups are sometimes referred to as validity rollups.
To compute the proof in ZK-Rollups designed for scaling Ethereum’s EVM, one uses the ZK-EVM. By strict definition, ZK-EVM is the set of cryptographic programs (circuits) that is responsible for generating a Zero-Knowledge Proof (ZKP), though at times colloquially refers to the entirety of an EVM-capable general purpose ZK-Rollup.
In this article we will dive into what exactly a ZK-SNARK is and how it works.
SNARK is a succinct proof that the statement is true. Formally, it proves the knowledge of the execution trace of the algorithm, which produces the correct solution for a certain problem. To be more exact, it proves the knowledge of the non-public and non-constant values of the execution trace. These values as a whole are usually called a witness. The elements of the witness, which are the parts of the input of the algorithm, constitute independent witnesses, since they exist before the execution of the algorithm and are not determined by other elements of the execution trace. The public non-constant values of the execution trace, which specify the instance of the problem solved by the algorithm, we call a public statement.
ZK-SNARK stands for Zero-Knowledge Succinct Non-Interactive Argument of Knowledge.
S – Succinct
Succinct – the proof is “short” and the verification time is “fast”:
“Short” proof means that the size of the proof is sublinear to the size of the witness.
“Fast” verification time means that the running time of the Verifier should be (i) sublinear to the size of the witness and (ii) linear to the public statement.
But in fact we want it to be not just “succinct” but “polylogarithmic”.
Meaning that the proof size and verification time are almost independent of the size of the witness.
1. The size of the proof π should grow no faster than some constant times the square of the logarithm of the size of the witness w (for sufficiently large w):
The proof size depends on the particular ZK-SNARK protocol. For Plonky2 it is O(log^2(|w|), but this is the “worst” case. For example, for the classic PLONK and Groth16 the proof size is O(1), i.e. a constant.
2. The verification time t_v should grow no faster than some constant times the square of the logarithm of the size of the witness w and linearly depend on the size of the public statement x (for sufficiently large x and w when only one of them changes its size):
N – Non-Interactive – proof generation is performed without receiving any data from the verifier.
ARK – Argument of Knowledge – similar to a regular proof, but the soundness only holds against a polynomially bounded prover, while in a proof the soundness holds against a computationally unbounded prover. We will discuss the notion of soundness in the next section.
ZK – Zero-Knowledge – non-disclosure of the witness.
There are three main properties of ZK-SNARKs, which are considered below.
Completeness: If the the prover knows the correct execution trace of the algorithm, the verifier has to be convinced that the prover has this knowledge.
Knowledge soundness: The prover can convince the verifier of knowing the correct execution trace only if the prover really knows it, except with some very small probability.
Zero-Knowledge: The verifier does not know any knowledge of the execution trace other than it is correct.
To be applied to some algorithm, a ZK-SNARK proof system needs to know the system of equations over the finite field, which describe the relationships between the values in the execution trace table of this algorithm (the data structure for storing the execution trace) in a way sufficient to guarantee the integrity of computation. The language used to express this system of equations (also called a constraint system) is called an arithmetization.
The PLONKish ZK-SNARK proof systems use arithmetization with higher expressive power in comparison with the older proof systems. The first ones allow us to use the constraint system equations in the form of arbitrary polynomials on the constrained variables, while for the older proof systems (i.e. R1CS-based ones) these equations have the form of the linear and quadratic polynomials. This feature of the PLONKish ZK-SNARK proof systems has a beneficial impact on computation efficiency of the prover and makes it easier to apply ZK-SNARKs to various algorithms. As the result, a lot of PLONKish ZK-SNARK proof systems have emerged recently: classic PLONK, Halo2, Kimchi, Plonky2, HyperPlonk, etc.
In Taiko, we use the variant of Halo2 proof system based on the KZG polynomial commitment scheme.
The PLONKish ZK-SNARK proof system consists of three components:
Below we look at each component in detail.
A commitment scheme allows a committer to publish a value, called the commitment, which binds the committer to a message without revealing it. Later, the committer may open the commitment and reveal the committed message or some part of it to a verifier, who can check that the revealed information is consistent with the commitment.
Different authors describe different numbers of algorithms constituting commitment scheme. However, we should mention at least four of them:
Setup: takes some initial parameters as inputs and generates setup parameters. Setup specifies (i) what objects we commit to (e.g., for univariate polynomial commitment schemes we specify the field of the coefficients and the maximum degree for the polynomials we commit to) and (ii) security level in bits.
We can briefly define the security level as follows: a cryptosystem has n-bit security level if breaking it requires O(2^n) time.
Commit: produces the commitment to the message in accordance with the setup parameters.
Partial opening: produces a partial opening proof specific to a chosen part of the message (e.g. value of the committed univariate polynomial for some argument) and the setup parameters.
Verify: uses the partial opening proof and the setup parameters to check whether the information revealed by the prover is the specified part of the message, which corresponds to the specified commitment.
*For some commitment schemes the “setup” and “open” algorithm may be absent (e.g. in the case of using a hash function to commit to large numbers).
The commitment scheme should have the following properties:
Binding: once the prover has committed to the specific message it is bound to the message it has committed to;
Hiding: a commitment reveals nothing about message. Hiding might be perfect, when the partial opening proof reveals no information about the non-queried part of the message (e.g. the KZG commitments), or non-perfect, when partial opening proof reveals some portion of the aforesaid information (e.g. the FRI-based polynomial commitments).
The commitment schemes can be classified into several types in accordance with the target objects:
Single element commitments;
Set commitments;
Vector commitments;
Polynomial commitments.
The polynomial commitment schemes are the only type that is directly used in the PLONKish ZK-SNARK proof systems. The univariate polynomial commitment schemes are of great importance for the the aforementioned proof systems (e.g. classic PLONK, Halo2, Kimchi, Plonky2, etc.). However, some novel approaches to the PLONKish ZK-SNARKs, which are based on the multilinear polynomial commitments, are emerging now (e.g. HyperPlonk).
For a formal and detailed commitment scheme explanation you can read this paper by Aniket Kate, Gregory M. Zaverucha, and Ian Goldberg. Also it introduces the KZG polynomial commitment scheme used in both the classic PLONK and the variant of Halo2 used by Taiko.
For a formal and detailed polynomial commitment scheme explanation you can check out this paper by Alexander Vlasov and Konstantin Panarin.
Interactive Oracle Proof is an interactive proof in which the verifier has “oracle access” to the prover’s messages, may probabilistically query them and is not required to read the entire prover’s message.
For a formal and detailed Interactive Oracle Proof (IOP) explanation check out this paper by Eli Ben-Sasson, Alessandro Chiesa, and Nicholas Spooner.
The Fiat-Shamir heuristic provides a way to transform a (public-coin) interactive argument into a non-interactive argument. Intuitively, the idea is to replace a verifier's random challenges
with a hash of the prover's prior messages, but the exact details are usually unspecified.
*Public-coin protocol is a protocol where the verifier sends only a random elements.
In ZK-SNARK proof systems the knowledge of the witness is proven by means of using the modified Interactive Oracle Proof approach, which uses polynomial commitments to provide an oracle access to the prover’s message and is made non-interactive by means of Fiat-Shamir Heuristic. In this section we are explain the given construction in details.
Recall that applying a ZK-SNARK proof system to some algorithm requires building the corresponding constraint system. In order to be able to build it, we define the structure of the execution trace table for this algorithm and the constant values in it. We use the term “circuit” for the complex of the execution trace table filled only with constants and the corresponding constraint system. To generate a ZK-SNARK proof for an instance of execution of some algorithm, we need to synthesize the circuit instance for it, which is the algorithm’s circuit with the corresponding instance of the execution trace table (i.e. a table, in which the constant, witness and public statement values are specified).
Let us consider the structure of the trace tables used by the PLONKish ZK-SNARK proof systems. All columns in such a table belong to one of three types, which we name in accordance with the terminology used in Halo2 and describe as follows:
Fixed columns – their cells contain the constants of the execution trace table;
Instance columns – they are used to store the public statement values;
Advice columns – all witness data are stored in them (including the independent witness values and the secret results of computation).
Summing up, we note the following:
Execution trace table (a PLONKish one) = fixed columns + instance columns + advice columns;
Circuit = execution trace table filled only with constants + constraint system;
Circuit instance = circuit + witness in its table + public statement in its table;
Synthesis: (circuit, public statement, independent witness values) → circuit instance;
Applying a ZK-SNARK proof system to some algorithm = describing the circuit + defining the synthesis procedure.
Why is the term “circuit” widely used in this context? Originally, when only R1CS-based ZK-SNARK proof systems were available, it was convenient to represent the constraint system in the form of the arithmetic circuit over the finite field, the output of which must be zero. We argue that the term “circuit” in the context of the PLONKish SNARKs is used rather for historical reasons. Anyway, let us briefly consider the arithmetic circuits used for older ZK-SNARKs.
An arithmetic circuit used for R1CS-based SNARKs defines an n-variable polynomial over a finite field with an evaluation recipe. Initially it is represented as a Directed Acyclic Graph (DAG):
It includes
Public inputs x, which are used to specify the public statement values;
Secret inputs w, which are used to specify the independent witness values;
Arithmetic gates, which designate addition and multiplication over the finite field.
Let’s look at the example of an arithmetic circuit over the rational number field.
We start at the bottom of the graph with the lowest gates, that are (2 x 4 = 8) and (4 + 11 = 15) and save these values as intermediary values;
And then we go one row higher (to the second row) and calculate the sum of two intermediate values (we got at the previous stage), that is (8 + 15 = 23);
And as we have only three gates, and we passed through all of them, 23 is our final output.
After the PLONKish ZK-SNARK proof system synthetizes a circuit instance, each column of the instance of execution trace table in encoded as the polynomial over the finite field in the following way. Let C_j(x) - the polynomial describing the j-th column and ω is a primitive 2^k-th root of unity, where k is chosen to make 2^k slightly greater that the initial height of the instance of this table. The table instance is padded with random rows (with non-zero cells only in the advice columns) to consist of 2^k rows, and C_j(ω^i) is set equal to the value in the i-th row of the j-th column of the given table instance. The role of padding for the zero-knowledge property is explained later.
The statement “ω is a primitive 2^k-th root of unity” means that ω^(2^k) = 1 and for any positive integer t, which is less than 2^k, we have ω^t ≠ 1.
The constraint system is transformed into the polynomial equation form by means of replacing the variables in it with the corresponding polynomials obtained from the column polynomials by substituting “x times ω to a certain power” for x.
For example, let us consider the constraint system equation s(i) * (a(i) * b(i) - c(i+1)) = 0. It means that a(i) * b(i) must be equal to c(i+1) if s(i) is non-zero. Here a, b and c are advice columns, s is a fixed column and i is the current row number.
This constraint can be applied to all 2^k rows in the following way. Let S(x), A(x), B(x) and C(x) be the column polynomials for the columns a, b, c and s, respectively. Thus, we want
which means that all elements of this set must be the roots of S(x) * (A(x) * B(x) - C(x * ω)). Therefore, this polynomial must be divisible by
since ω is a primitive 2^k-th root of unity.
Z(x) = x^(2^k) - 1 is called the “vanishing polynomial”, because it is 0 for all x being elements of a cyclic multiplicative group generated by ω. Thus, S(x) * (A(x) * B(x) - C(x * ω)) mod Z(x) = 0 means that the aforesaid constraint is true for satisfied for all 2^k rows.
Assume that P_0(x), P_1(x), …, P_m(x) are constraints, which are applied to all 2^k rows and have the form similar to the polynomial S(x) * (A(x) * B(x) - C(x * ω)) considered above. If α is randomly chosen by the verifier from the finite field, then
means that with overwhelming probability all these constraint are satisfied for all 2^k rows. This equality implies that we can find a quotient polynomial T(x) such that
Therefore, to convince the verifier that the prover knows with overwhelming probability such content of the execution trace table that satisfies all m constraints, it is sufficient to show that for randomly chosen α the prover has the advice column polynomials (participating in P_0(x), P_1(x), … , P_m(x)) and the quotient polynomial T(x), which satisfy this polynomial equation. The verifier can convince itself that the prover is very likely to know these polynomials by asking the prover for the values of the given polynomials at a random point ζ, which is chosen by the verifier among non-roots of Z(x), and evaluating both sides of this equation at the point ζ. This approach can be justified by means of Schwartz-Zippel Lemma.
In order to make proof generation non-interactive, all random values, which would be generated by the verifier in the interactive version, should be obtained using Fiat-Shamir heuristic.
To bind the prover to it’s advice polynomials and quotient polynomia T(x), a polynomial commitment scheme is used. The prover commits to these polynomials, opens these commitments at ζ (and ζ * ω^q, if some constraint affects rows i and i + q) and creates the proof, which consist of (I) these commitments, (II) values of the polynomials at ζ (and ζ * ω^q, if it is required) and (III) the corresponding opening proofs. Actually, to make the proof shorter the multi-opening is used, i.e. we generate one small proof for values of all points of the polynomials. Thus, the proof is succinct.
To satisfy zero-knowledge property, the rows randomly chosen by the prover are appended to the instance of the execution trace table to make it 2^k rows high. These rows have non-zero cells only in the advice columns, since only the advice columns contain data we hide. This is done before constructing the advice column polynomials to make the number of “argument-value” pairs, which are revealed during the commitment openings, less then the number of the randomly added pairs for each of these polynomials. Thus, for each advice column polynomial the amount of information required to restore it is greater than the amount of witness information even after the commitments are opened. This circumstance results in zero-knowledge.
Some computations, which are the same for all instances of the circuit, are done at the pre-processing stage. They include performing setup for the polynomial commitment schemes and computing the fixed column polynomials as well as the commitments to them. The part of the data produced by these computations, which is used by the verifier is often called the verification key. In the similar way a notion of the proving key may be defined. The underlying polynomial commitment scheme determines whether the trusted setup is performed during the pre-processing stage.
Some more advanced aspects of the PLONKish SNARKs are not considered here, namely:
Copy constraints: i.e. making the i-th cell of the column a equal to the j-th cell of the column b;
Lookup gates: assuring that the values of certain cells of a row belong to some set.
Conceiving these topics requires understanding the material described above. More information can be found here and here.
ZK-EVM on Rollup-glossary.vercel.app
Aniket Kate, Gregory M. Zaverucha, and Ian Goldberg, Constant-Size Commitments to Polynomials and Their Applications
Dan Boneh, Building a SNARK
Alexander Vlasov, Konstantin Panarin, TRANSPARENT POLYNOMIAL COMMITMENT SCHEME WITH POLYLOGARITHMIC COMMUNICATION COMPLEXITY
Eli Ben-Sasson, Alessandro Chiesa, and Nicholas Spooner, Interactive Oracle Proofs
TokenInsight Team, How to transform interactive proofs into non-interactive proofs? Fiat-Shamir Heuristic!
Manish Kumar Giri, Deep dive into Zero-knowledge-proof & ZK-SNARK
Fiat-Shamir Problems on Merlin.cool
Zcash Team, Lookup Argument
Metastate Team, On PLONK and plookup
Zcash Team, Proof Systems
Paul Garrett, Roots of Unity
Explore open positions on our job board.
To stay updated on the latest from Taiko:
Website: https://taiko.xyz
Discord: https://discord.gg/taikoxyz
GitHub: https://github.com/taikoxyz
Twitter: https://twitter.com/taikoxyz
Contribute to Taiko and earn a GitPOAP! You will also be featured as a contributor on our README. Get started with the contributing guide.