
Efficient Multi-Exponentiation and Its Application


  • Chae Hoon Lim (Dept. of Internet Engineering, Sejong University)
  • Published: 2002.08.01

Abstract

This paper deals with efficient algorithms for computing a product of n distinct powers in a group (called multi-exponentiation). Four different algorithms are presented and analyzed, each of which has its own range of n for best performance. Using the best performing algorithm for n ranging from 2 to several thousands, one can achieve a 2 to 4 times speed-up compared to the baseline binary algorithm and a 2 to 10 times speed-up compared to computing each exponentiation individually.

Efficient multi-exponentiation algorithms, which compute a product of multiple powers, can be usefully applied to improve the performance of various cryptographic protocols. This paper describes four different multi-exponentiation algorithms and compares and analyzes their efficiency. Each algorithm has a range of the number of powers being multiplied over which it is the most efficient; using the best algorithm in each range, one can obtain a speed-up of about 2 to 4 times over the basic binary algorithm, and a speed-up of about 2 to 10 times over computing each exponentiation independently.


Ⅰ. Introduction

Evaluating exponentiation in a group is the most time-consuming operation in most public key cryptosystems, e.g., modular exponentiation in the RSA and Diffie-Hellman/ElGamal systems and scalar multiplication in elliptic curve cryptosystems. Fast exponentiation algorithms have thus been studied extensively in past years and a number of such algorithms have been proposed. For example, the sliding window algorithm is the most popular algorithm for general exponentiation in any group [1,2], and there exist much faster algorithms for exponentiation to a fixed base [3,4].

On the other hand, it is often required in many cryptographic protocols to evaluate a product of multiple powers in a group (called multi-exponentiation). Verification of discrete log-based signatures is the most common example: individual verification requires two-term exponentiation, and batch verification [5,6] requires n-term exponentiation for quite large n. There exist several efficient algorithms for two-term exponentiation [7,8], but not much research has been done on efficient algorithms for general multi-exponentiation with large n.

In this paper, we present and analyze several efficient algorithms for n-term exponentiation for arbitrarily large n. The presented algorithms include the sliding window algorithms with/without signed encoding and algorithms using the precomputation techniques of [3] and [4]. Each of the algorithms is shown to have its own range of n with relative strength. We also show that with signed encoding and Montgomery's simultaneous inversion technique [9, Algorithm 10.3.4], these algorithms can be significantly improved in elliptic curve groups. Finally, we demonstrate the power of multi-exponentiation with an application to batch verification of (modified) DSA signatures.

Ⅱ. Batch Verification of Exponentiation

Let $G$ be a (cyclic) group of order $q = |G|$ and let $g_i$ ($0 \le i < n$) be elements of order $q$ in $G$. Given a collection of triplets $(x_i, g_i, y_i)$ ($0 \le i < n$) such that $x_i \in \mathbb{Z}_q$ and $g_i, y_i \in G$, we want to verify that each triplet $(x_i, g_i, y_i)$ satisfies the exponentiation relation $y_i = g_i^{x_i}$ in $G$.

A naive approach to doing such verification is to evaluate each exponentiation $g_i^{x_i}$ and compare the result with $y_i$, which requires n exponentiations. However, there is a much better approach: the probabilistic batch verification [5] (called the small exponent test in [6]). In this probabilistic batch verification, we randomly pick integers $c_i$ ($0 \le i < n$) over the interval $[0, 2^t)$ and test the following equation for equality:

$$\prod_{i=0}^{n-1} y_i^{c_i} = \prod_{i=0}^{n-1} g_i^{z_i} \qquad (1)$$

where $z_i = c_i x_i \bmod q$. The probability of error in this probabilistic batch verification is shown to be at most $2^{-t}$ [6], so we can obtain a desired level of confidence by choosing an appropriate value of t. In most applications, $t = 30 \sim 60$ would suffice.
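To make the test concrete, the following Python sketch performs the small exponent test in a prime-order subgroup of $\mathbb{Z}_p^*$; the function name and parameters are our own choices, and the two products are accumulated with separate exponentiations here, whereas a real implementation would use one of the multi-exponentiation algorithms developed below.

```python
import secrets

def batch_verify_exponentiation(triplets, p, q, t=40):
    """Small exponent test (sketch): probabilistically verify y_i = g_i^{x_i} mod p
    for a list of triplets (x_i, g_i, y_i), with error probability at most 2^-t."""
    lhs, rhs = 1, 1
    for x, g, y in triplets:
        c = secrets.randbits(t)            # random c_i in [0, 2^t)
        z = (c * x) % q                    # z_i = c_i * x_i mod q
        lhs = (lhs * pow(y, c, p)) % p     # accumulates prod y_i^{c_i}
        rhs = (rhs * pow(g, z, p)) % p     # accumulates prod g_i^{z_i}
    return lhs == rhs
```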

If the exponentiation relation has the same base $g$ (i.e., $g_i = g$ for all $i$), the right-hand side of Equation (1) simplifies to a single power $g^z$, where $z = \sum_{i=0}^{n-1} z_i \bmod q$. In this case, the Bucket Test in [6] can further speed up the above small exponent test, but it cannot be used for batch verification of general exponentiation with distinct bases.

It can thus be seen that in either case efficient multi-exponentiation is the key to the performance of batch verification of modular exponentiation. The goal of this paper is to develop such algorithms. From now on, we will focus on efficient evaluation of the following form of general multi-exponentiation:

$$Y = \prod_{i=0}^{n-1} y_i^{c_i} \qquad (2)$$

Ⅲ. Algorithms for Multi-Exponentiation

In this section, we will present six algorithms for multi-exponentiation. We first describe the basic binary algorithms, which serve as a baseline for performance comparison. Then four algorithms are presented to speed up the naive binary algorithms: sliding window algorithms using unsigned/signed encoding, and algorithms using the precomputation techniques of Brickell et al. [3] and Lim-Lee [4].

Throughout this paper, we will use the letters M, S and I to denote the computational cost of multiplication, squaring and inversion, respectively, and the letters $\gamma$ and $\mu$ to denote the cost ratio of S to M and of I to M, respectively, i.e., $\gamma = S/M$ and $\mu = I/M$.

3.1 Binary Algorithms

Let $c_i = \sum_{j=0}^{t-1} c_{ij} 2^j$ be the binary representation of $c_i$, where $c_{ij} \in \{0,1\}$. The binary algorithm to compute Y in Equation (2) carries out the following iteration t times, starting with $Y = 1$ and running $j = t-1$ down to 0: multiply Y by all $y_i$'s such that $c_{ij} = 1$ and then square the result Y (the squaring is skipped in the last iteration). Obviously, this algorithm requires $0.5nt - 1$ multiplications and $t - 1$ squarings on average. With optimal signed encoding for each exponent (e.g., see [10]), the expected number of multiplications can be reduced to $nt/3 - 1$ at the cost of t more squarings. This performance can be obtained by computing Y as

$$Y = \left(\prod_{i=0}^{n-1} y_i^{c_i^+}\right)\left(\prod_{i=0}^{n-1} y_i^{c_i^-}\right)^{-1}$$

where $c_i^+$ and $c_i^-$ respectively denote the positive and negative parts of $c_i$ after optimal signed encoding (so $c_i = c_i^+ - c_i^-$). Note that optimal signed encoding reduces the probability of a digit being nonzero to 1/3 on average with at most 1-bit expansion.
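As a concrete illustration, here is a minimal Python sketch of the unsigned binary algorithm (Algorithm BU) for $\mathbb{Z}_p^*$, written in the equivalent square-then-multiply order; the function name and interface are our own.

```python
def multi_exp_binary(bases, exps, p):
    """Algorithm BU (sketch): compute prod bases[i]^exps[i] mod p with a single
    shared left-to-right binary pass, so the squarings are shared by all n exponents."""
    t = max(e.bit_length() for e in exps)
    Y = 1
    for j in range(t - 1, -1, -1):
        Y = (Y * Y) % p                    # one squaring per bit position
        for y, c in zip(bases, exps):
            if (c >> j) & 1:               # multiply by each base whose j-th bit is 1
                Y = (Y * y) % p
    return Y
```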

The following theorem summarizes the performance of the unsigned binary algorithm (denoted by Algorithm BU) and the signed binary algorithm (denoted by Algorithm BS) for computing n-term exponentiation.

Theorem 1 (Algorithms BU/BS)

a) The expected number of multiplications required by Algorithm BU is given by

$$C_{BU}(n, t) = \left(\frac{nt}{2} - 1\right) + \gamma\,(t - 1)$$

b) The expected number of multiplications required by Algorithm BS is given by

$$C_{BS}(n, t) = \left(\frac{nt}{3} - 1\right) + \gamma\,(2t - 1) + \mu$$

3.2 Sliding Window Algorithm

The window-family algorithms are a better way to evaluate n-term exponentiation with small n (n = 2 to 4). Among many variants, we focus on the window algorithm using independently sliding windows [8], since it is the most efficient for application to multi-exponentiation. This window algorithm uses a distinct sliding window for each exponent instead of an ordinary simultaneous window.

Let w be the window size in bits and $k_i$ be the number of windowed values for $c_i$. Then we can express each exponent $c_i$ as

$$c_i = \sum_{j=0}^{k_i - 1} c_{ij}\, 2^{e_{ij}}$$

where the $c_{ij}$'s are odd and $0 < c_{ij} < 2^w$. Let $Y_{i,u}$ be the precomputed values for $y_i$ given by

$$Y_{i,u} = y_i^{\,u} \quad (u = 1, 3, 5, \ldots, 2^w - 1).$$

Then, Equation (2) can be rewritten as

$$Y = \prod_{i=0}^{n-1} \prod_{j=0}^{k_i - 1} \left(Y_{i,\,c_{ij}}\right)^{2^{e_{ij}}} \qquad (3)$$

Now the right-hand side of Equation (3) can be computed using the ordinary square-and-multiply algorithm, as shown in Algorithm WU.

Algorithm WU: Sliding window Alg.

#

The precomputation stage (steps 1 to 3) requires $(2^{w-1} - 1)$ multiplications and 1 squaring for each exponent, and the main computation part (steps 4 to 9) requires $(t - 1)$ squarings in total and about $t/(w+1)$ multiplications for each exponent. Theorem 2 below summarizes the performance of Algorithm WU:
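For concreteness, the following Python sketch (our own rendering, not the paper's numbered listing) implements this sliding-window multi-exponentiation in $\mathbb{Z}_p^*$: each exponent is decomposed into odd windowed values, odd powers of each base are tabulated, and a single shared left-to-right pass combines everything.

```python
def window_decompose(c, w):
    """Right-to-left sliding-window decomposition: returns (value, position) pairs
    with c = sum(v * 2**e for v, e in pairs), every v odd and v < 2**w."""
    pairs, e = [], 0
    while c > 0:
        if c & 1:
            v = c & ((1 << w) - 1)         # grab w bits starting at the current position
            pairs.append((v, e))
            c >>= w
            e += w
        else:
            c >>= 1                        # skip a zero bit
            e += 1
    return pairs

def multi_exp_sliding_window(bases, exps, p, w=4):
    """Algorithm WU (sketch): per-base tables of odd powers, then one shared
    left-to-right square-and-multiply over all windowed values."""
    # Precompute tables[i][v] = bases[i]^v for odd v < 2^w (2^(w-1) values per base).
    tables = []
    for y in bases:
        y2 = (y * y) % p
        tab = {1: y}
        for v in range(3, 1 << w, 2):
            tab[v] = (tab[v - 2] * y2) % p
        tables.append(tab)
    # Group the windowed values of all exponents by their bit position e.
    slots = {}
    for i, c in enumerate(exps):
        for v, e in window_decompose(c, w):
            slots.setdefault(e, []).append((i, v))
    if not slots:
        return 1
    Y = 1
    for e in range(max(slots), -1, -1):
        Y = (Y * Y) % p
        for i, v in slots.get(e, []):
            Y = (Y * tables[i][v]) % p
    return Y
```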

Theorem 2 (Algorithm WU)

a) The expected number of multiplications required by Algorithm WU with window size w is given by

#

The algorithm also requires temporary storage for $2^{w-1} n$ precomputed values.

b) The optimal window size $w_{opt}$ depends only on t. The range of t for which $w_{opt}$ is optimal can be determined from the inequality $C_{WU}(n, t, w_{opt}) \le C_{WU}(n, t, w_{opt}+1)$:

#

This inequality gives the following table for optimal window sizes depending on the range of t, where each pair $(w_{opt}, t_{max})$ means that $w_{opt}$ is optimal up to $t_{max}$ (from the previous value):

We next consider how to use signed encoding in the above sliding window algorithm. Since signed encoding requires expensive multiplicative inversion, we want to minimize the number of inversions required. For this, we split each sign-encoded exponent into positive and negative parts as in Algorithm BS. Suppose that each exponent $c_i$ is sign-encoded with window size w. We can then express each $c_i$ as

$$c_i = \sum_{j \in S_i^+} c_{ij}\, 2^{e_{ij}} - \sum_{j \in S_i^-} |c_{ij}|\, 2^{e_{ij}}$$

where the $c_{ij}$'s are odd and #, and $S_i^+$ and $S_i^-$ respectively denote the sets of indices for positive and negative windowed values. Equation (2) can then be expressed as

$$Y = \left(\prod_{i=0}^{n-1} \prod_{j \in S_i^+} \left(Y_{i,\,c_{ij}}\right)^{2^{e_{ij}}}\right)\left(\prod_{i=0}^{n-1} \prod_{j \in S_i^-} \left(Y_{i,\,|c_{ij}|}\right)^{2^{e_{ij}}}\right)^{-1} \qquad (4)$$

This equation enables us to use signed encoding in the sliding window algorithm at the cost of one inversion and at most $(t - w)$ additional squarings. Let WS be the algorithm to compute the right-hand side of Equation (4). Note that the number of nonzero windows in each sign-encoded exponent is about # on average. Therefore, Algorithm WS will perform better than Algorithm WU only if the cost reduction (i.e., the number of multiplications saved) due to the signed encoding is greater than the increased cost of one inversion and $(t - w)$ squarings. This is the case for large n. The following theorem summarizes the performance of Algorithm WS:
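The paper does not spell out the signed encoding beyond citing [10]; as one concrete possibility, the width-w NAF recoding sketched below in Python produces odd signed digits with at most one digit of expansion, and the index sets $S_i^+$ and $S_i^-$ of Equation (4) are then just the positions of its positive and negative digits.

```python
def wnaf(c, w):
    """Width-w NAF recoding (one standard signed encoding): returns the digit list
    d_0, d_1, ..., least significant first, with c = sum d_j * 2^j, every nonzero
    digit odd and |d_j| < 2^(w-1), and at most one extra digit of expansion."""
    digits = []
    while c > 0:
        if c & 1:
            d = c % (1 << w)
            if d >= (1 << (w - 1)):
                d -= (1 << w)        # pick the signed residue of smallest magnitude
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits

# Positive/negative index sets as used in Equation (4):
# S_plus  = [j for j, d in enumerate(wnaf(c, w)) if d > 0]
# S_minus = [j for j, d in enumerate(wnaf(c, w)) if d < 0]
```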

Theorem 3 (Algorithm WS)

a) The expected number of multiplications required by Algorithm WS with window size w is given by

#

The algorithm also requires temporary storage for # precomputed values.

b) The optimal window size $w_{opt}$ mainly depends on t. Given n, the range of t for which $w_{opt}$ is optimal can be determined by solving $C_{WS}(n, t, w_{opt}) \le C_{WS}(n, t, w_{opt}+1)$:

#

where the latter approximation holds for $n \gg w_{opt}^2$ (note that we always have $\gamma < 1$). This inequality gives the following table of optimal window sizes for an interesting range of t:

3.3 Algorithm Using Lim-Lee's Precomputation Technique

It is quite natural to apply Lim-Lee's precomputation technique [4] to efficient n-term exponentiation (note that this technique is a natural extension of, and thus includes, the simultaneous window algorithm, often called Shamir's trick for n = 2). The basic idea of Lim-Lee's technique with parameters (h, w) is to partition the set of n powers into h blocks of size w, build a distinct precomputation table (for the simultaneous method with window size 1) for each block, and then apply the binary algorithm to the h blocks of powers.

More formally, let $c_i = \sum_{j=0}^{t-1} c_{ij} 2^j$ be the binary representation of $c_i$ and express Equation (2) as

$$Y = \prod_{k=0}^{h-1} \prod_{i=kw}^{(k+1)w-1} y_i^{c_i} \qquad (5)$$

where $h = \lceil n/w \rceil$ (taking $y_i = 1$ for $i \ge n$). Next, precompute and store the products of all possible combinations of the $y_i$'s in each k-th block of size w as

$$G_k[u] = \prod_{i=0}^{w-1} y_{kw+i}^{\,u_i}$$

where $u = (u_{w-1} \cdots u_1 u_0)_2$ and $0 \le u < 2^w$. Then the right-hand side of Equation (5) can be computed as

$$Y = \prod_{j=0}^{t-1} \left(\prod_{k=0}^{h-1} G_k[e_k(j)]\right)^{2^j}$$

where $e_k(j) = (c_{(k+1)w-1,\,j} \cdots c_{kw+1,\,j}\, c_{kw,\,j})_2$. The detailed algorithm is depicted below as Algorithm LL.

Algorithm LL: Lim-Lee's algorithm

#

It is easy to see that the precomputation stage (steps 1 to 3) requires $(2^w - w - 1)h$ multiplications, and the main computation (steps 4 to 12) can be done in $(t-1)$ squarings and at most $(th - 1)$ multiplications. For the average performance, we need to consider the expected number of all-zero $e_k(j)$'s (see [4] for details). The following theorem summarizes the performance of Algorithm LL.
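A minimal Python sketch of this structure (our own rendering in $\mathbb{Z}_p^*$, not the paper's listing) is given below: one subset-product table of size $2^w$ per block, followed by a single binary pass over the bit positions.

```python
def multi_exp_lim_lee(bases, exps, p, w=3):
    """Algorithm LL (sketch): partition the n bases into blocks of size w, precompute
    all 2^w subset products per block, then run one binary pass over the bit positions."""
    n = len(bases)
    blocks = [list(range(k, min(k + w, n))) for k in range(0, n, w)]
    # tables[k][u] = product over block k of bases[i]^{u_i} for every bit pattern u.
    tables = []
    for blk in blocks:
        tab = [1] * (1 << len(blk))
        for u in range(1, 1 << len(blk)):
            low = u & -u                                  # lowest set bit of u
            tab[u] = (tab[u ^ low] * bases[blk[low.bit_length() - 1]]) % p
        tables.append(tab)
    t = max(e.bit_length() for e in exps)
    Y = 1
    for j in range(t - 1, -1, -1):
        Y = (Y * Y) % p
        for k, blk in enumerate(blocks):
            u = 0
            for pos, i in enumerate(blk):                 # j-th bits of the block's exponents
                u |= ((exps[i] >> j) & 1) << pos
            if u:
                Y = (Y * tables[k][u]) % p
    return Y
```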

Theorem 4 (Algorithm LL)

a) The expected number of multiplications required by Algorithm LL with blocking factor w is given by

#

where $h = \lceil n/w \rceil$ and $r = n \bmod w$ if $n > w$, and $w = n$ (so $h = 1$, $r = 0$) if $n \le w$. The algorithm also requires temporary storage for $2^w \lceil n/w \rceil$ precomputed values.

b) The optimal value of the blocking factor w, $w_{opt}$, mainly depends on t. Given n, the range of t for which $w_{opt}$ is optimal can be determined from the inequality $C_{LL}(n, t, w_{opt}) \le C_{LL}(n, t, w_{opt}+1)$. In particular, when n is a multiple of both $w_{opt}$ and $w_{opt}+1$, one can obtain the following simple formula:

#

This inequality gives the following table of optimal window sizes for an interesting range of t:

Note that the average performance of Algorithm LL is slightly worse when n is not a multiple of w. So, it is always preferable to choose the batch size n as a multiple of $w_{opt}$ whenever possible.

3.4 Algorithm Using Brickell et al.'s Precomputation Technique

Another way to speed up n-term exponentiation (in particular, for large n) is to use the basic scheme of Brickell et al.'s precomputation method [3]. Suppose that for a fixed window size w each exponent $c_i$ is represented in base $2^w$ as $c_i = \sum_{j=0}^{h-1} c_{ij}\, 2^{wj}$, where $h = \lceil t/w \rceil$ and $0 \le c_{ij} < 2^w$. Then we can express Equation (2) as

$$Y = \prod_{j=0}^{h-1} Y_j^{\,2^{wj}} \qquad (6)$$

where $Y_j = \prod_{i=0}^{n-1} y_i^{c_{ij}}$. Now we can compute each $Y_j$ using the basic scheme in [3] and the right-hand side of Equation (6) using the repeated square-and-multiply algorithm. See Algorithm BG for details.

Algorithm BG: Brickell et al.'s algorithm

#

Note that each $c_i$ is grouped into w-bit digits from the LSB for simplicity; it would be better to do the grouping starting from the MSB, since the number of squarings can then be slightly reduced when $t \not\equiv 0 \pmod{w}$. The following theorem summarizes the performance of Algorithm BG.
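The following Python sketch (our own, for $\mathbb{Z}_p^*$, with digits grouped from the LSB as described above) illustrates the structure: each column product $Y_j$ is formed with the bucket and running-product identity $\prod_d B_d^{\,d} = \prod_d \bigl(\prod_{e \ge d} B_e\bigr)$ that underlies the basic scheme of [3], and the columns are then combined with w squarings per position.

```python
def multi_exp_bgmw(bases, exps, p, w=4):
    """Algorithm BG (sketch): write each exponent in base 2^w, build each digit-column
    product Y_j with the bucket trick, then combine the Y_j by square-and-multiply."""
    t = max(e.bit_length() for e in exps)
    h = -(-t // w) if t else 1                     # number of base-2^w digits
    mask = (1 << w) - 1
    Y = 1
    for j in range(h - 1, -1, -1):                 # most significant digit column first
        for _ in range(w):
            Y = (Y * Y) % p
        # bucket[d] = product of all bases whose j-th digit equals d.
        bucket = [1] * (1 << w)
        for y, c in zip(bases, exps):
            d = (c >> (w * j)) & mask
            if d:
                bucket[d] = (bucket[d] * y) % p
        running, Yj = 1, 1
        for d in range(mask, 0, -1):
            running = (running * bucket[d]) % p    # running = prod_{e >= d} bucket[e]
            Yj = (Yj * running) % p                # accumulates prod_d bucket[d]^d
        Y = (Y * Yj) % p
    return Y
```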

Theorem 5 (Algorithm BG)

a) The expected number of multiplications required by Algorithm BG with base $2^w$ is given by

#

where $h = \lceil t/w \rceil$, $r = t \bmod w$, and $\delta(r) = 1$ if $r \neq 0$ and $\delta(r) = 0$ otherwise.

b) The optimal window size $w_{opt}$ mainly depends on n. Given t, the range of n for which $w_{opt}$ is optimal can be determined by $C_{BG}(n, t, w_{opt}) \le C_{BG}(n, t, w_{opt}+1)$. In particular, when t is a multiple of both $w_{opt}$ and $w_{opt}+1$, one can obtain the following simple inequality:

#

where the latter approximation holds for #.

The average performance of Algorithm BG is slightly worse when t is not a multiple of w. Note however that the bit-length t of the exponents is a security parameter of the system and thus cannot be chosen arbitrarily. Thus the optimal value of w should be determined for a given t. For example, one can obtain the following table for t = 160.

We can obtain a more general version of Algorithm BG using some on-line precomputation. For example, suppose that the n values # (for #) are precomputed on-line. Then Equation (6) can be rewritten as

#(7)

where # and #. Note that using Equation (7) one can reduce the number of multiplications required in step 6 almost by half at the cost of wn squarings for the on-line precomputation. Therefore, using this equation may result in better performance when t is large and n is relatively small.

In general, Algorithm BG using v on-line precomputed values per base has the following performance formula:

#

where $h = \lceil t/w \rceil$, $r = t \bmod w$, $h_v = \lceil h/(v+1) \rceil$, and $w' = r$ if $h \equiv 0 \pmod{v+1}$ and $w' = w$ otherwise. Note that $C_{BG}(n; t, w, 0) = C_{BG}(n; t, w)$.

To achieve the best possible performance for given n and t, we have to choose optimal values for w and v. The value of v, which determines the amount of on-line precomputation, may be quite large for small n and large t. [Table 1] shows the range of n according to the optimal values of v for some interesting values of t. For example, when t = 160, using a nonzero v is always advantageous if $n \le 168$. Obviously, the performance advantage of on-line precomputation becomes larger as t increases.

[Table 1] Range of n giving the best performance in Algorithm BG for given v and t (a~b denotes $a \le n \le b$)

Ⅳ. Performance Comparison

Let us compare the performances of Algorithms WU, WS, LL and BG. First note that the optimal window size $w_{opt}$ in Algorithms WU and WS depends only on the bit-length t of the exponents, and the optimal blocking factor $w_{opt}$ in Algorithm LL depends only on t, assuming that $n \equiv 0 \bmod w_{opt}$. Thus, in these three algorithms, for given t we can express the cost function in terms of n alone. For example, for t = 160, we have the following simple formulas:

#

Here we took $\gamma = S/M = 0.8$. Note that if our objective is to verify the equality $Y = A \cdot B^{-1}$, then we can check it as $Y \cdot B = A$ without computing the multiplicative inverse. So, for simplicity, let us take $\mu = I/M = 0$ in Algorithm WS and denote the resulting algorithm by Algorithm WS*.

On the other hand, the optimal window size in Algorithm BG depends on both n and t. For example, for t= 160, we can obtain the following optimal performance formulas according to the range of n:

#

Using the above formulas derived for t = 160, we can determine the best algorithm according to the batch size n. We can derive similar equations for other values of t. [Table 2] shows the best performing algorithms and their ranges of n for some selected values of t. Note that we almost always have the same order of preferred algorithms as n increases: WU, LL, WS*, BG (in fact, there are some fluctuations in the order of Algorithms LL and WS*, though these are neglected in the table).

[Table 2] Best performing algorithms for given n and t

For a more exact quantitative comparison, we provide performance figures of the four algorithms in [Table 3] for some selected values of n, where the performance of the binary algorithm is also tabulated as it can serve as a baseline for the comparison. The table shows that with the best performing algorithm for given n we can achieve about a 2 to 4 times speed-up (for the range of n from several tens to several thousands) compared to the binary algorithm.

[Table 3] No. of multiplications (unit: 1000) required for multi-exponentiation for t = 160

Ⅴ. Further Speedup in Elliptic Curve Groups

The basic operations in elliptic curve arithmetic are addition and subtraction of elliptic points, and computation of an integer multiple of a given point in this additive group, called scalar multiplication, corresponds to modular exponentiation in multiplicative groups. So, the multi-exponentiation of Equation (2) can be written in additive notation as

$$Y = \sum_{i=0}^{n-1} c_i P_i$$

where the $P_i$'s are base points and the $c_i$'s are scalar values (integers). A distinct feature of elliptic curve arithmetic, compared to modular arithmetic, is that we can freely use addition/subtraction chains, since the cost of elliptic subtraction is almost the same as that of elliptic addition. This enables us to improve the algorithms presented in Sect. Ⅲ when used for elliptic curve arithmetic. Furthermore, we can take advantage of the high degree of parallelism in Algorithms LL and BG to further improve these algorithms.

In this section, we will use $\gamma$ to denote the performance ratio of elliptic doubling to addition, i.e., $\gamma = D/A$.

5.1 Improvement Using Signed Encoding

A basic algorithm for scalar multiplication in elliptic curve groups is the signed binary algorithm, denoted by Algorithm BS-EC. From Sect. 3.1, we can easily see that the expected number of elliptic additions required by Algorithm BS-EC is given by

#

Similarly, we have the following performance formula for Algorithm WS-EC:

#

The storage requirement and optimal window sizes for Algorithm WS-EC are the same as those given in Theorem 3.

We can also improve Algorithm BG using signed encoding. Suppose that each multiplier $c_i$ is represented in base $2^w$ as in Sect. 3.4. Since a w-bit digit c can always be encoded into an integer $\tilde{c}$ whose absolute value is less than or equal to $2^{w-1}$, we can encode the entire $c_i$ as

$$c_i = \sum_{j=0}^{h-1} \tilde{c}_{ij}\, 2^{wj}$$

where $h = \lceil (t+1)/w \rceil$ and $0 \le |\tilde{c}_{ij}| \le 2^{w-1}$. We can easily derive the expected number of elliptic additions required by Algorithm BG-EC from the corresponding formula in Sect. 3.4:

#

where $h = \lceil (t+1)/w \rceil$, $r = (t+1) \bmod w$, $h_v = \lceil h/(v+1) \rceil$, and $w' = r$ if $h \equiv 0 \pmod{v+1}$ and $w' = w - 1$ otherwise. Given t and v, we can find optimal window sizes depending on n as before.

5.2 Further Improvement Using Simultaneous Inversion

Another speedup technique in elliptic curve arithmetic is to take advantage of parallel computation. Suppose that it is allowed to add or double m elliptic points in parallel and suppose that we do the arithmetic in affine coordinates. Then we can use the simultaneous inversion trick due to Montgomery [9, Algorithm 10.3.4] to reduce the number of inversions required for m elliptic additions/doublings to only one, at the cost of 3 field multiplications per elliptic addition/doubling. Using this technique, we can achieve about a 20 to 30% speedup in Algorithms LL-EC and BG-EC, thanks to the very large degree of parallelism in these algorithms. For example, the on-line precomputation ($0 \le i < n$) in Algorithm BG-EC can be performed in affine coordinates using only one inversion, $(5n - 3)$ multiplications and 2n squarings. Thus, if the degree of parallelism (n in the above example) is quite large, we can perform an elliptic addition using only about 5 multiplications and 2 squarings. This is the fastest known method for elliptic addition.
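The field-inversion core of the trick is shown in the Python sketch below (our own, for a prime field): m nonzero elements are inverted with a single modular inversion and roughly 3(m-1) multiplications, and it is this shared inversion that the m parallel affine additions/doublings exploit.

```python
def batch_inverse(values, p):
    """Montgomery's simultaneous inversion (sketch): invert m nonzero field elements
    modulo a prime p with one modular inversion plus about 3(m-1) multiplications."""
    m = len(values)
    prefix = [1] * (m + 1)
    for i, v in enumerate(values):                 # prefix[i+1] = v_0 * v_1 * ... * v_i
        prefix[i + 1] = (prefix[i] * v) % p
    inv = pow(prefix[m], -1, p)                    # the single inversion
    out = [0] * m
    for i in range(m - 1, -1, -1):
        out[i] = (prefix[i] * inv) % p             # = values[i]^-1
        inv = (inv * values[i]) % p                # drop values[i] from the running inverse
    return out
```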

Note that elliptic curve arithmetic in projective coordinates usually yields better performance than elliptic curve arithmetic in affine coordinates. However, if it is possible to execute a number of elliptic additions/doublings in parallel as discussed above, it is almost always more efficient to do elliptic curve arithmetic in affine coordinates.

Ⅵ. Batch Verification of Digital Signatures

There are a number of cryptographic applications requiring multi-exponentiation: e.g., (batch) verification of ElGamal-type signatures, elliptic scalar multiplication using Frobenius expansion, exponentiation in $GF(p^m)$, etc. Batch verification of ElGamal-type signatures is particularly interesting, since we can substantially improve the performance of a variety of applications using digital signatures. Efficient batch verification algorithms for a variant of DSA signatures have been proposed and analyzed [5,6], but no algorithm has been presented for efficient evaluation of the multi-exponentiation required by the batch test.

A DSA signature on a message m, generated by a secret/public key pair $(x, y = g^x \bmod p)$, consists of $(r, s)$ computed by $\lambda = g^k \bmod p$, $r = \lambda \bmod q$ and $s = k^{-1}(m + rx) \bmod q$ (p, q primes such that $q \mid p-1$ and $|q| = 160$, g an element of order q). Verification of the signature can be done by checking the equality $r = (g^a y^b \bmod p) \bmod q$, where $a = m s^{-1} \bmod q$ and $b = r s^{-1} \bmod q$. Since a batch verification technique cannot be applied to DSA signatures in their original form, Naccache et al. [5] considered a slight modification: send $(\lambda, s)$, instead of $(r, s)$, as the signature and convert the signature back to its original form after successful verification.

Now, let us consider batch verification of the above modified DSA signatures. First, consider n signatures $\{(\lambda_i, s_i)\}$ ($0 \le i < n$), generated by the same signer with a signing key pair $(x, y)$. For convenience, we will denote the batch instance by $\{(\lambda_i, a_i, b_i)\}$ ($0 \le i < n$), where $a_i = m_i s_i^{-1} \bmod q$ and $b_i = r_i s_i^{-1} \bmod q$. Then the batch verification equation is given by

$$\prod_{i=0}^{n-1} \lambda_i^{c_i} = g^{a} y^{b} \bmod p \qquad (8)$$

where $a = \sum_{i=0}^{n-1} c_i a_i \bmod q$, $b = \sum_{i=0}^{n-1} c_i b_i \bmod q$, and the $c_i$'s are integers randomly chosen over the interval $[0, 2^t)$. It is proved in [5,6] that the error probability of this test is less than $2^{-t}$. So, $t = 30 \sim 60$ would be sufficient in most applications. The left-hand side of Equation (8) can now be efficiently computed using one of the algorithms presented in Sect. Ⅲ according to the batch size n. Note that the bucket test in [6] can significantly improve the above small exponent test, in particular for large n. But our multi-exponentiation algorithms can further speed up the computation required by the bucket test.
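For illustration, a Python sketch of this same-signer batch test follows; the interface and names are ours, message hashing and parameter checks are omitted, and the left-hand product is written with separate exponentiations where a real implementation would plug in one of the multi-exponentiation algorithms of Sect. Ⅲ.

```python
import secrets

def batch_verify_same_signer(sigs, msgs, g, y, p, q, t=40):
    """Small exponent test for modified DSA signatures (lam_i, s_i) of one signer,
    as in Equation (8): check prod lam_i^{c_i} == g^a * y^b mod p  (sketch)."""
    a = b = 0
    lhs = 1
    for (lam, s), m in zip(sigs, msgs):
        s_inv = pow(s, -1, q)
        a_i = (m * s_inv) % q                      # a_i = m_i / s_i mod q
        b_i = ((lam % q) * s_inv) % q              # b_i = r_i / s_i mod q, with r_i = lam_i mod q
        c = secrets.randbits(t)                    # random small exponent c_i
        a = (a + c * a_i) % q
        b = (b + c * b_i) % q
        lhs = (lhs * pow(lam, c, p)) % p           # a real implementation would use
    rhs = (pow(g, a, p) * pow(y, b, p)) % p        # multi-exponentiation here
    return lhs == rhs
```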

On the other hand, it is much more likely in most applications of digital signatures that a batch instance consists of signatures from different signers. So, we next consider a batch instance consisting of n signatures $\{(\lambda_i, a_i, b_i)\}$ ($0 \le i < n$), generated by n distinct signers, where each signer i possesses a signing key pair $(x_i, y_i)$. In this case, the batch verification equation becomes

$$\prod_{i=0}^{n-1} \lambda_i^{c_i} = g^{a} \prod_{i=0}^{n-1} y_i^{\,b_i'} \bmod p \qquad (9)$$

where $a = \sum_{i=0}^{n-1} c_i a_i \bmod q$ and $b_i' = c_i b_i \bmod q$. Now, since each $b_i'$ is $|q| = 160$ bits long, the main computational load is evaluating the multi-exponentiation on the right-hand side of Equation (9), and the bucket test in [6] does not result in any improvement in this case. Thus, a fast multi-exponentiation algorithm is crucial to the practicality of this general batch verification. Equation (9) can be rewritten for more efficient computation as

$$\prod_{i=0}^{n-1} \lambda_i^{c_i}\, y_i^{\,b_i''} = g^{a} \bmod p \qquad (10)$$

where $b_i'' = -c_i b_i \bmod q$.

The left-hand side of Equation (10) can be computed using one of the presented algorithms according to the batch size n. Since the size of the $c_i$ is much smaller than that of the $b_i''$, we may choose different window sizes in Algorithms WU, WS and LL for better performance. However, Algorithm BG should use the same window size to share some common computations. From the analysis of Sect. Ⅳ, we can see that the presented algorithms enable us to verify a batch instance of n signatures about 2 to 4 times faster than the naive binary algorithm, and about 2 to 10 times faster than individual verification, depending on the batch size n.

Ⅶ. Conclusion

There are a number of cryptographic applications requiring efficient multi-exponentiation (i.e., computation of a product of powers), in particular in a variety of applications using digital signatures. In this paper, we presented several algorithms for efficient multi-exponentiation and analyzed their performance. The presented algorithms can perform n-term exponentiation (n ranging from a few to several thousands) 2 to 4 times faster than the basic binary multi-exponentiation algorithm. We can choose the best performing algorithm according to the number (n) of powers and the bit-length of the exponents. In general, Algorithm WU is the best for very small n, while for moderate values of n Algorithms LL and WS perform better. For very large n, Algorithm BG is the fastest.

References

  1. J. Bos and M. Coster, "Addition chain heuristics," LNCS 435.
  2. H. Cohen, A. Miyaji and T. Ono, "Efficient elliptic curve exponentiation," LNCS 1334.
  3. E. F. Brickell, D. M. Gordon, K. S. McCurley and D. Wilson, "Fast exponentiation with precomputation," LNCS 658.
  4. C. H. Lim and P. J. Lee, "More flexible exponentiation with precomputation," LNCS 839.
  5. D. Naccache, D. M'Raihi, S. Vaudenay and D. Raphaeli, "Can D.S.A. be improved? Complexity trade-offs with the digital signature standard," LNCS 950.
  6. M. Bellare, J. A. Garay and T. Rabin, "Fast batch verification of modular exponentiation and digital signatures," LNCS 1403.
  7. S. M. Yen, C. S. Laih and A. K. Lenstra, "Multi-exponentiation," IEE Proc. of Computers and Digital Techniques, v.141. https://doi.org/10.1049/ip-cdt:19941271
  8. S. G. Sim and P. J. Lee, "An efficient implementation of two-term exponentiation in elliptic curves," Proc. of Japan-Korea Joint Workshop on Information Security and Cryptology (JW-ISC 2000).
  9. H. Cohen, A Course in Computational Algebraic Number Theory, GTM 138 (3rd corrected printing).
  10. J. A. Solinas, "An improved algorithm for arithmetic on a family of elliptic curves," LNCS 1294.