On robust strong-non-interferent low-latency multiplications

The overarching goal of this work is to present new theoretical and practical tools to implement robust-$t$-probing security. In this work, a low-latency multiplication gadget that is secure against probing attacks that exploit logic glitches in the circuit is presented. The gadget is the first of its kind to present a 1-cycle input-to-output latency while belonging to the class of probing security by optimized composition gadgets [6]. In particular, the authors show that it is possible to construct robust-$t$-strong-non-interferent gadgets without compromising on latency with a moderate increase in area. The authors provide a theoretical proof for the robustness of the gadget and show that, for $t \le 4$, the amount of randomness required can even be reduced without compromising on robustness.


| INTRODUCTION
In this work, we address the problem of protecting hardware implementations against side-channel attacks. State-of-the-art countermeasures are typically based on masking [1], but creating a masked implementation is far from trivial, especially when we consider adversarial scenarios such as probing attacks [2] or, more recently, glitch-extended probing attacks [3,4]. The non-linear modules of cryptographic circuits are the most intricate to protect against such attacks. For this reason, the most studied case is the secure implementation of the logic AND, which has never ceased to stimulate research since its inception [2].
In the following, we say that a gadget is $t$-probing secure when, given $t$ probes, it is impossible to derive information about the secret values encoded in the masks/shares. One of the main problems addressed in $t$-probing security is composability, that is, determining, given two $t$-probing secure gadgets, whether their functional composition is still $t$-probing secure. It is commonly understood that this depends on the amount of refreshing, a procedure that aims to break higher-order dependencies and bring the secret's shares back into a uniformly random state after a series of operations that might have invalidated uniformity [5].
One school of thought, identified as probing security by optimized composition [6], exploits inner gadget properties to determine whether their composition is $t$-probing secure. One of them is strong non-interference ($t$-SNI, [7]), which requires that the number of input shares derivable from a certain set of probes depends only on the number of internal positions present in that set (whenever that set's size is less than or equal to $t$). Demonstrating that a gadget is $t$-SNI in the first place might require lengthy proofs or automatic tools [8,9], but once it has been done, composition can be studied with simpler, although not trivial, rules. This kind of scenario is called optimized because, in principle, it could lead to gadgets with an overall minimized refreshing effort. This is, however, easier said than done as, even recently, some gadgets that were thought to be $t$-probing secure have been shown to be vulnerable to higher-order attacks [10]. Trivial composability tries, instead, to identify inner gadget properties that make reasoning about composition even simpler, in the sense that it suffices for certain gadgets to ensure at least probe-isolating non-interference [6] to be able to compose them.
An additional problem is protecting the gadget from circuit glitches. One way to address this is threshold implementations (TI) [11], which ensure that all logic cones of a primitive depend only on a proper subset of the shares. Besides the overall correctness constraints, this means ensuring that (i) if a gadget's input is fed with shares (computed from the secret) whose distribution is uniform, its outputs must be uniform as well and (ii) each output share can be computed using only a proper subset of the input shares.
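As a hedged illustration of condition (ii) and of the correctness constraint, the sketch below checks both for the textbook three-share sharing of an AND gate. The share equations are the well-known first-order example from the TI literature and are not taken from this paper; the helper names are hypothetical, and uniformity (condition (i)) is deliberately not checked.

```python
from itertools import product

# Output share k is the XOR of the products a[i] & b[j] listed here.
# Assumption: classic three-share sharing, used only for illustration.
TI_AND = {
    0: [(1, 1), (1, 2), (2, 1)],
    1: [(2, 2), (2, 0), (0, 2)],
    2: [(0, 0), (0, 1), (1, 0)],
}

def ti_and(a, b):
    """Compute the three output shares from three shares of a and of b."""
    out = []
    for k in range(3):
        bit = 0
        for i, j in TI_AND[k]:
            bit ^= a[i] & b[j]
        out.append(bit)
    return out

def is_correct():
    """The XOR of the output shares must equal (XOR of a) AND (XOR of b)."""
    for a in product((0, 1), repeat=3):
        for b in product((0, 1), repeat=3):
            c = ti_and(a, b)
            if c[0] ^ c[1] ^ c[2] != (a[0] ^ a[1] ^ a[2]) & (b[0] ^ b[1] ^ b[2]):
                return False
    return True

def is_non_complete():
    """Each output share must use only a proper subset of the share indices."""
    for terms in TI_AND.values():
        used = {i for i, _ in terms} | {j for _, j in terms}
        if len(used) >= 3:
            return False
    return True
```

Here output share k never touches input share index k, which is what lets a glitching cone see at most two of the three shares.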
The current research trend tries to address $t$-probing security and glitches as a single challenge instead of different, seemingly orthogonal problems. The robust probing model is probably the most important conceptual evolution with respect to the original $t$-probing security model [3,4]. In this attack model, glitches are seen as extended probes that constitute additional observation points of the input values of a given cone of logic. With this model, one can prove that some gadgets are not only $t$-SNI in the conventional sense but can be made robust-$t$-SNI by adding a register layer at the outputs (see e.g., [3]), trading off latency for security. On the side of trivial composability, stricter conditions on a gadget, like the $t$-PINI condition [6,12], have been identified to ensure robustness in the presence of glitches.
It is currently understood that the minimal input-shares-to-output latency of circuits with optimized composability (e.g., the glitch-robust $t$-SNI multiplication presented in [3]) is two cycles. In this work, we reduce it to one cycle at the cost of some extra randomness by adopting a different temporal scheme for producing refresh random bits. In fact, we will provide a class of 1-cycle-latency multiplication gadgets that are robust-$t$-SNI. In particular, in Proposition 1, we prove our construction for CMS gadgets to be robust-$t$-SNI, showing, therefore, for arbitrary $t$, the existence of such gadgets with one cycle latency using (in total) $2 \cdot s^2$ random bits, where $s = t + 1$ is the number of shares. Besides, we show how to lower this bound to $s(s - 1)$ and $s^2$ for the practically most relevant cases $s = 2, 3$ and $s = 4, 5$, respectively, by simply removing some randomness and proving the result robust-$t$-SNI with MASKVERIF [13].¹
The paper is organized as follows: Section 2 summarizes the current state of the art for robust probing security, pointing out a few problems with the current approaches. Section 3 presents the main construction proposed in this work, highlighting a few optimized schemes. Section 4 suggests potential applications of the new gadget. Section 5 presents some final comments and indicates some future work.

| STATE OF THE ART
Recall that a function $f$ is $t$-non-interferent ($t$-NI) if, when given a total of $o$ output probes and $i$ internal probes, $o + i \le t$ implies a dependency on at most $i + o$ input shares. The function $f$ is strongly $t$-non-interferent ($t$-SNI) if it even implies a dependency on at most $i$ input shares [7]. When considering glitches, probes are extended to model information that might be captured with glitches. In particular, they allow the attacker to observe all the inputs of a gadget that connect to a probed output wire, because this is what has been observed in real-world scenarios [14]. When considering this kind of probe, we talk about robust-$t$-probing security instead of conventional $t$-probing security.
In this work, we address the problem of robust-$t$-probing security in the context of optimized composability. Chronologically, the original efforts considered a hybrid of the Ishai-Sahai-Wagner scheme [2] with TI, culminating in the Consolidated Masking Scheme (CMS) [15]. While the results were important in terms of decreasing the randomness needed (in CMS with $t + 1$ shares, one needs $(t + 1)^2$ refresh values), it was shown recently that this cannot be extended to $t > 2$ (without even considering robust $t$-probing security [10]). Later proposals for a $t$-probing secure multiplication addressed a reduction in terms of refresh values [16,17] (with a lower bound identified in [18]) but, after the considerations made in [10], it is not clear how far past $t > 1$ these can be made robust-$t$-probing secure. Besides, all the proposed gadgets suffer from an increased latency (two cycles) because they need an additional register after the compression stage to be guaranteed robust-$t$-SNI.
Recent efforts to improve CMS masking without increasing the latency have been reported [19]. Figure 1 shows a solution for the case $t = 3$, $s = 4$ as proposed by the authors in Ref. [19]. Note that the authors elaborate this scheme starting from the first CMS proposed in Ref. [15], changing the order of products $a_i b_j$ and introducing additional random bits $q_i$ to protect the shares; however, as we now show, this gadget is not robust-3-strong-non-interferent (SNI). In fact, consider the three probes marked in green, $P_1$, $P_2$ and $P_3$: probes $P_2$ and $P_3$ are the only internal probes, so all three probes should convey information about up to two shares. $P_1$ allows us to get $(a \ldots)$, whereas the two internal probes $P_2$ and $P_3$ allow us to get $(a_2 b_3, r_0, r_{15})$ and $(a_1 b_0, r_1, r_2, q_0)$, respectively. In principle, the information on the secrets derived from $P_1$ (e.g., $a_1 b_2$) is covered by at least two random bits (e.g., $a_1 b_2$ is covered with $r_0$ and $r_1$); however, it is possible to unmask $a_1 b_2$ from $P_1$ by adding $r_0$ and $r_1$, recovered from $P_2$ and $P_3$, respectively. Then, three shares of $b$ are exposed ($b_2$ from $P_1$, $b_3$ from $P_2$ and $b_0$ from $P_3$) with only two internal probes.
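The counterexample above can be replayed in a few lines. The following is only a sketch under stated assumptions: the probed term is modelled as $a_1 b_2 + r_0 + r_1$ over GF(2), as suggested by the remark that $a_1 b_2$ is covered with $r_0$ and $r_1$, and the function names are hypothetical. It does not model the full circuit of Figure 1.

```python
from itertools import product

def sni_bound_violated(internal_probes, output_probes, t, shares_exposed):
    """True if a probe set within the budget t exposes more input shares
    of one secret than the number of internal probes (the t-SNI bound)."""
    if internal_probes + output_probes > t:
        return False  # outside the adversary's budget; t-SNI says nothing
    return shares_exposed > internal_probes

def unmask(masked, r0, r1):
    """Remove two additive (XOR) masks that the attacker has learned."""
    return masked ^ r0 ^ r1

def unmasking_always_works():
    """Exhaustively check that knowing r0 and r1 reveals a1 & b2."""
    for a1, b2, r0, r1 in product((0, 1), repeat=4):
        masked = (a1 & b2) ^ r0 ^ r1      # value seen through P1 (assumed form)
        if unmask(masked, r0, r1) != (a1 & b2):
            return False
    return True
```

With one output probe, two internal probes and $t = 3$, exposing three shares of $b$ exceeds the bound of two, which is the violation described in the text.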

| A PROVABLY ROBUST-t-SNI, 1-CYCLE-LATENCY CMS-LIKE SCHEME
The problem with the scheme in Figure 1 is that internal extended probes give access to each random bit used in the refresh layer (yellow section). To overcome this leak, one can sum these pairs of random bits and save the result into a register, so that no single probe (such as $P_3$) has access to both intermediate products and individual refresh random bits. Note that, from the point of view of the input-to-output latency, the gadget is still one cycle, as this sum could be pre-computed before receiving the shares $a$ and $b$. For the above gadget, we would have the expressions in Equation (1), where square brackets indicate registered values (see Table 1), with additional red colour when they refer to the registered sum of refresh random bits; one can verify with MASKVERIF [13] that the above gadget is in fact robust-3-SNI. Note that Equation (1) describes the scheme in Figure 1 with some added registers (red brackets). This strategy is not entirely new as it has been used, to the best of our knowledge, only recently [6] in the field of trivial composability. However, we will show that optimized composability might also benefit from such a strategy, as it is possible to generalize this idea to derive a sufficient condition for a gadget being 1-cycle robust-$t$-SNI, whose general cone structure is shown in Figure 2.
¹ See https://github.com/vzaccaria/maskverif.docker for the MASKVERIF scripts for reproducing the results.
MOLTENI ET AL.
Note that the shares $a_i$ and $b_j$ are organized as in the original CMS scheme [15], and the random bits are summed up and registered before using them in the refresh layer.
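A minimal functional sketch of such a gadget may help fix ideas. It uses the quantities $o_{i,j} = a_i \cdot b_j + s_{i,j}$ and $s_{i,j} = r_{i,j} + r_{i,j+1} + q_{i,j} + q_{i+1,j}$ as they are defined in the proof, and it makes two assumptions that are not confirmed by the text available here: indices are taken modulo $s = t + 1$, and output cone $c_i$ is the sum of the $o_{i,j}$ over $j$. Registers and glitches are not modelled, so the sketch only checks that the refresh sums cancel and the product is computed correctly; all function names are hypothetical.

```python
import random

def refresh_sums(r, q, s):
    """s_ij = r[i][j] + r[i][j+1] + q[i][j] + q[i+1][j] over GF(2).
    Assumption: indices wrap around modulo s."""
    return [[r[i][j] ^ r[i][(j + 1) % s] ^ q[i][j] ^ q[(i + 1) % s][j]
             for j in range(s)] for i in range(s)]

def multiply_shares(a, b, r, q):
    """Return output shares c_i = XOR_j (a_i & b_j) ^ s_ij (assumed cone layout)."""
    s = len(a)
    sums = refresh_sums(r, q, s)
    c = []
    for i in range(s):
        acc = 0
        for j in range(s):
            acc ^= (a[i] & b[j]) ^ sums[i][j]   # this term is o_ij
        c.append(acc)
    return c

def xor_all(bits):
    acc = 0
    for bit in bits:
        acc ^= bit
    return acc

def check_correctness(t, trials=200, seed=0):
    """Randomized check that the shared output encodes a AND b."""
    rng = random.Random(seed)
    s = t + 1
    for _ in range(trials):
        a = [rng.getrandbits(1) for _ in range(s)]
        b = [rng.getrandbits(1) for _ in range(s)]
        r = [[rng.getrandbits(1) for _ in range(s)] for _ in range(s)]
        q = [[rng.getrandbits(1) for _ in range(s)] for _ in range(s)]
        if xor_all(multiply_shares(a, b, r, q)) != xor_all(a) & xor_all(b):
            return False
    return True
```

Each $r_{i,j}$ and each $q_{i,j}$ enters exactly two refresh sums, so they vanish from the total; the sketch uses $2 \cdot s^2$ random bits, matching the count stated for the general construction.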
Proof. For the meaning of mathematical symbols, see Table 1. Setting $o_{i,j} := a_i \cdot b_j + s_{i,j}$ with $s_{i,j} := r_{i,j} + r_{i,j+1} + q_{i,j} + q_{i+1,j}$ for $0 \le i, j \le t$, the extended output probes are $\gamma_i := \{o_{i,j} \mid 0 \le j \le t\}$ for $0 \le i \le t$, and the maximal extended inner probes are $\alpha_{i,j} := \{r_{i,j}, r_{i,j+1}, q_{i,j}, q_{i+1,j}\}$ and $\beta_{i,j} := \{a_i \cdot b_j, s_{i,j}\}$ for $0 \le i, j \le t$.
FIGURE 1 1-cycle latency Consolidated Masking Scheme derived gadget proposed in [19]. Green discs represent the three extended probes that make it not robust-3-strong-non-interferent. The black thick line indicates the register layer. The expressions to compute the outputs are those in Equation (1) except that the values in red brackets are not sampled in an additional register, that is, only those values in the black brackets are sampled.
$x = y \bmod V$: $x$ equals $y$ modulo the subspace $V$.
An attacker gets to pick at most $t$ extended probes, let us say a set $\Gamma$ of output probes of type $\gamma_j$, a set $A$ of inner probes of type $\alpha_{i,j}$ and a set $B$ of inner probes of type $\beta_{i,j}$. Setting $I := \{i \mid \alpha_{i,j} \in A \text{ or } \beta_{i,j} \in B\}$ and $J := \{j \mid \alpha_{i,j} \in A \text{ or } \beta_{i,j} \in B\}$, we claim that the attacker can simulate all those probes knowing just the inputs $a_i$ for $i \in I$ and $b_j$ for $j \in J$, where clearly $|I| \le |A| + |B|$ and $|J| \le |A| + |B|$ ($|A| + |B|$ is the number of the chosen inner probes). All the information derivable from the extended probes $\Gamma$, $A$ and $B$ can be expressed using elements of $\langle \Gamma, A, B \rangle$, which can be seen as sums of standard probes derived from the extended ones. As the image of the uniform distribution under a linear map is the uniform distribution on its image, an element of $\langle \Gamma, A, B \rangle$ has a uniform distribution and is independent of all inputs $a_i$ and $b_j$ unless it is already contained in $\langle a_i \cdot b_j \mid 0 \le i, j \le t \rangle$. Hence, the above claim can be expressed as follows:
All standard probes are linear combinations of the linearly independent values $a_i \cdot b_j$, $r_{i,j}$ and $q_{i,j}$ for $0 \le i, j \le t$, that is, elements of the vector space $\langle a_i \cdot b_j, r_{i,j}, q_{i,j} \mid 0 \le i, j \le t \rangle$. Applying to the probes the modulo operation w.r.t. the vector subspace $\langle a_i \cdot b_j, r_{i,j} \mid 0 \le i, j \le t \rangle$, the probes have values $q_{i,j}$, respectively $q_{i,j} + q_{i+1,j}$; for each $j$, the values $q_{i,j} + q_{i+1,j}$ span a $t$-dimensional subspace of the $(t + 1)$-dimensional space generated by the $q_{i,j}$ with $0 \le i \le t$, so $\sum_{0 \le i \le t} (q_{i,j} + q_{i+1,j}) = 0$ is the only non-trivial linear dependency of the values $q_{i,j} + q_{i+1,j}$ for fixed $j$. Then, for any $j$, the implication stated in Equation (3) follows. Analogously, applying to the probes the modulo operation w.r.t. the vector space $Q := \langle a_i \cdot b_j, q_{i,j} \mid 0 \le i, j \le t \rangle$, for fixed $i$, the only non-trivial linear dependency of the values $r_{i,j} + r_{i,j+1}$ is $\sum_{0 \le j \le t} (r_{i,j} + r_{i,j+1}) = 0$. Then, for any $i$, the implication stated in Equation (4) follows.
If $\sigma$ involves a summand containing the term $a_i \cdot b_j$, this term stems either from the inner probe $\beta_{i,j} \in B$ (implying $i \in I$ and $j \in J$, confirming our claim) or from the summand $o_{i,j} \in \gamma_i \in \Gamma$. As $o_{i,j} = s_{i,j} \bmod \langle a_i \cdot b_j \mid 0 \le i, j \le t \rangle$, assuming $\beta_{i,j} \notin B$ implies, using Equation (3), that $\sigma$ involves either (a) $t + 1$ terms $s_{i',j}$ (with $0 \le i' \le t$) obtainable from $t + 1$ standard probes or (b) a summand $q_{i',j}$ for some $0 \le i' \le t$. The latter case (b) implies that $\alpha_{i',j}$ or $\alpha_{i'-1,j}$ is probed, and hence $j \in J$ (confirming our claim). The former case (a) requires at least $t$ more probes (as no extended probe involves terms $s_{i,j}$ for more than one $i$), contradicting the original assumption that $|\Gamma| + |A| + |B| \le t$.
Analogously, given the implication of Equation (4), $\sigma$ involves either (a) $t + 1$ terms $s_{i,j'}$ (with $0 \le j' \le t$) obtainable from $t + 1$ standard probes or (b) a summand $r_{i,j'}$ for some $0 \le j' \le t$. The latter case (b) implies that $\alpha_{i,j'}$ or $\alpha_{i,j'-1}$ is probed, and hence $i \in I$ (confirming our claim). For the former case (a), by just probing $\gamma_i$, an attacker can get all the terms $s_{i,j'}$. However, we previously showed that for each term $s_{i,j'}$ contained in a summand of $\sigma$, necessarily $j' \in J$, implying $J = \{0, \ldots, t\}$. This contradicts the fact that the attacker can choose at most $t$ probes, because each inner probe adds at most one element to $J$. □
The placement of the products $a_i \cdot b_j$ in the output cones $c_i$ as well as the presence of randomness in Equation (2) is essential to guarantee that the proposed construction is robust-$t$-SNI. Indeed, a different placement can break (robust) strong non-interference for $s$ big enough. In fact, assume that an attacker chooses $n$ extended output probes $\gamma_1, \ldots, \gamma_n$ placed on adjacent cones, and $4(n - 1)$ inner probes $\alpha_{1,i}, \alpha_{i,1}, \alpha_{n,i}, \alpha_{i,n}$ for $1 \le i \le n$. The probes $\gamma_1, \ldots, \gamma_n$ give access to all values $o_{i,j}$ for $1 \le i, j \le n$, whose sum is $\sum_{1 \le i, j \le n} a_i \cdot b_j + \sum_{1 \le i \le n} (r_{i,1} + r_{i,n+1} + q_{1,i} + q_{n+1,i})$. The inner probes allow us to derive $r_{i,1} \in \alpha_{i,1}$, $r_{i,n+1} \in \alpha_{i,n}$, $q_{1,i} \in \alpha_{1,i}$ and $q_{n+1,i} \in \alpha_{n,i}$, effectively exposing the first summand $\sum_{1 \le i, j \le n} a_i \cdot b_j$ of the equation above; thus, $4(n - 1) + n$ probes allow us to derive $n^2$ different products $a_i \cdot b_j$. The arrangement of the $a_i \cdot b_j$ in Equation (2) is such that even knowing these $n^2$ products does not break strong non-interference, as the attacker only obtains $n$ different shares $a_i$ and $b_j$ ($1 \le i, j \le n$). But already for $s = 12$ and $n = 3$, a different placement of the products $a_i \cdot b_j$ can expose more than $4(n - 1)$ shares of either secret, making the gadget not robust strong-non-interferent.
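The sum of the values $o_{i,j}$ exposed by $\gamma_1, \ldots, \gamma_n$ can be worked out directly from the definitions of $o_{i,j}$ and $s_{i,j}$ given in the proof. The following is a sketch of that computation over GF(2), reconstructed from those definitions rather than copied from the original equation:

```latex
\begin{align*}
\sum_{1 \le i,j \le n} o_{i,j}
  &= \sum_{1 \le i,j \le n} a_i \cdot b_j \;+\; \sum_{1 \le i,j \le n} s_{i,j}, \\
\sum_{1 \le i,j \le n} s_{i,j}
  &= \sum_{1 \le i \le n} \sum_{1 \le j \le n} \left( r_{i,j} + r_{i,j+1} \right)
   + \sum_{1 \le j \le n} \sum_{1 \le i \le n} \left( q_{i,j} + q_{i+1,j} \right) \\
  &= \sum_{1 \le i \le n} \left( r_{i,1} + r_{i,n+1} \right)
   + \sum_{1 \le j \le n} \left( q_{1,j} + q_{n+1,j} \right).
\end{align*}
```

In characteristic 2 the interior terms of each inner sum cancel pairwise, so only the boundary random bits survive, and these are the ones contained in the inner probes $\alpha_{i,1}$, $\alpha_{i,n}$, $\alpha_{1,i}$ and $\alpha_{n,i}$.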

| Saving randomness for t ≤ 4
For $t \le 4$, the scheme presented in Proposition 1 can be simplified by removing the random bits $r_{i,j}$ without compromising security. This decreases the number of involved random bits from $2 \cdot s^2$ to $s^2$ (see Figure 3 for this construction). In particular, as one can verify with MASKVERIF, robust-$t$-probing security can be ensured with just the $q_{i,j}$, for $0 \le i, j \le t$ and $t \le 4$. However, for $t \ge 5$, this particular scheme breaks because, with a specific choice of three external probes on adjacent $c_i$ and two internal probes, an attacker is able to recover three shares of $a$. For example, if the attacker places five probes (see the green dots in Figure 3) on $\gamma_i$, $\gamma_{i+1}$, $\gamma_{i+2}$, $\alpha_{i,0}$ and $\alpha_{i+2,0}$, then they are able to derive three shares of $a$ with only two internal probes. This attack is possible for any $t \ge 5$. For $t \le 2$, one can additionally remove the random bits $q_{i,i}$, deriving for $t = 1$ a scheme with only two random bits instead of $s^2 = 4$. Similarly, for $t = 2$, one obtains a construction with only six random bits instead of $s^2 = 9$. Both schemes are robust-$t$-SNI (for $t = 1$ and $t = 2$, respectively), as one can verify with MASKVERIF.
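The randomness budget at each security order can be tabulated with a short sketch. The counts for the proposed gadget are the ones stated in the text ($2 \cdot s^2$ in general, $s^2$ for $t \le 4$, $s(s-1)$ for $t \le 2$); the reference count $s(s-1)/2$ is the figure quoted for the DOM and HPC2 gadgets in the Applications section. The function names are hypothetical.

```python
from fractions import Fraction

def random_bits_proposed(s):
    """Random bits of the proposed gadget for s = t + 1 shares,
    using the smallest count stated in the text for each range of s."""
    if s < 2:
        raise ValueError("at least two shares are required")
    if s <= 3:          # t <= 2: the q_ii are removed as well
        return s * (s - 1)
    if s <= 5:          # t <= 4: only the q_ij remain
        return s * s
    return 2 * s * s    # general construction of Proposition 1

def random_bits_reference(s):
    """Random bits quoted in the text for the DOM and HPC2 gadgets."""
    return s * (s - 1) // 2

def overhead(s):
    """Exact ratio between the two randomness counts."""
    return Fraction(random_bits_proposed(s), random_bits_reference(s))

if __name__ == "__main__":
    for s in range(2, 8):
        print(s, random_bits_proposed(s), random_bits_reference(s), float(overhead(s)))
```

For $s = 2, \ldots, 6$ this gives 2, 6, 16, 25 and 72 random bits, that is, overheads of 2, 2, 8/3, 5/2 and 24/5 relative to the reference count.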

| APPLICATIONS
Our proposed structure allows us to obtain an input-share-to-output-share latency of one cycle while still being robust-$t$-SNI, at the expense of increased randomness. A $t$-SNI gadget could be made robust-$t$-SNI with reasonable latency by replacing all $t$-SNI ANDs with our proposed gadget, all $t$-NI ANDs with DOM ANDs, and all $t$-SNI refresh gadgets with the robust-$t$-SNI refresh gadgets from Ref. [6]. Indeed, compared to the DOM [20] and the HPC2 [6] gadgets, which both need $s(s - 1)/2$ random bits, our gadgets require $2\times$ the randomness for $s = 2, 3$, about $2.5\times$ for $s = 4, 5$ and more than $4\times$ for $s > 5$. However, our solution requires only 1-cycle latency instead of the at least two cycles of latency that characterize the current DOM and HPC2; it is thus clearly a matter of trade-off between latency and randomness. Another application could be to lower the latency of an HPC2-based construction by 'kickstarting' the S-boxes: after 1, 2, 3 and 4 cycles, one can obtain with HPC2 gadgets values of algebraic degrees 1, 2, 3 and 5, respectively, in the input bits, due to their asymmetric latency of 1 and 2 cycles, respectively, in their inputs. Replacing just all HPC2 gadgets in the first layer with our gadget can save one cycle of latency, as the achievable algebraic degrees are then 2, 3, 5 and 8, respectively. This can be done, for example, for the optimized PRESENT S-box of Fig. 6b of [6] to regain the better latency of the DOM-based construction. If additionally all S-box inputs that are added to the PRESENT S-box outputs are first refreshed with a robust mask refresh, the resulting circuit becomes robust probing secure for, we believe, a moderate increase in area.
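The degree sequences quoted above follow a simple recurrence, which the sketch below reproduces. It is a toy model under stated assumptions: a multiplication gadget with input latencies of 1 and 2 cycles can combine a value available one cycle earlier with a value available two cycles earlier, the circuit inputs have algebraic degree 1, and only the first layer may use a gadget with 1-cycle latency on both inputs. The function name and its parameters are hypothetical.

```python
def achievable_degrees(cycles, one_cycle_first_layer):
    """Maximum algebraic degree available after 1..cycles clock cycles.

    deg[c] is the best degree obtainable after c cycles; deg[0] = 1
    models the unprocessed input bits.
    """
    deg = [1]
    for c in range(1, cycles + 1):
        if c == 1:
            # With asymmetric latency, no product is finished after one
            # cycle; a 1-cycle gadget in the first layer yields degree 2.
            deg.append(2 if one_cycle_first_layer else 1)
        else:
            # Operand with latency 1 comes from cycle c-1, the operand
            # with latency 2 from cycle c-2; degrees add under a product.
            deg.append(deg[c - 1] + deg[c - 2])
    return deg[1:]

if __name__ == "__main__":
    print(achievable_degrees(4, one_cycle_first_layer=False))
    print(achievable_degrees(4, one_cycle_first_layer=True))
```

Under these assumptions the first call yields the sequence 1, 2, 3, 5 and the second 2, 3, 5, 8, so the same degree is reached one cycle earlier.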

| CONCLUSIONS
In this work, we have derived a new robust-$t$-SNI construction for multiplying two secrets. The novel construction has 1-cycle input-to-output latency and, for low security degrees $t$, its randomness complexity is comparable with that of conventional, 2-cycle-latency approaches. As future work, we plan to study the use of the proposed gadget in the S-boxes of known cryptographic algorithms as well as the randomness requirements for higher $t$. In particular, preliminary work shows that a scheme that involves 42 random bits for $t = 5$ is possible, but we believe this not to be the lowest bound achievable.

FIGURE 3 The optimized construction that is valid for any $t < 5$ but fails for $t \ge 5$. Green discs represent the probes used to mount the attack.

TABLE 1 Meaning of some mathematical symbols employed in the text