Hardware accelerators are often used in cryptographic applications for speeding up the highly arithmetic-intensive public-key primitives, e.g. in high-end smart cards. One of these emerging and very promising public-key scheme is based on HyperElliptic Curve Cryptosystems (HECC). In the open literature only a few considerations deal with hardware implementation issues of HECC. Our contribution appears to be the first one to propose architectures for the latest findings in efficient group arithmetic on HEC. The group operation of HECC allows parallelization at different levels: bit-level parallelization (via different digit-sizes in multipliers) and arithmetic operation-level parallelization (via replicated multipliers). We investigate the trade-offs between both parallelization options and identify speed and time-area optimized configurations. We found that a coprocessor using a single multiplier (D = 8) instead of two or more is best suited. This coprocessor is able to compute group addition and doubling in 479 and 334 clock cycles, respectively. Providing more resources it is possible to achieve 288 and 248 clock cycles, respectively.
Finding optimum parallel coprocessor design for genus 2 hyperelliptic curve cryptosystems
BERTONI, GUIDO MARCO;BREVEGLIERI, LUCA ODDONE;
2004-01-01
Abstract
Hardware accelerators are often used in cryptographic applications for speeding up the highly arithmetic-intensive public-key primitives, e.g. in high-end smart cards. One of these emerging and very promising public-key scheme is based on HyperElliptic Curve Cryptosystems (HECC). In the open literature only a few considerations deal with hardware implementation issues of HECC. Our contribution appears to be the first one to propose architectures for the latest findings in efficient group arithmetic on HEC. The group operation of HECC allows parallelization at different levels: bit-level parallelization (via different digit-sizes in multipliers) and arithmetic operation-level parallelization (via replicated multipliers). We investigate the trade-offs between both parallelization options and identify speed and time-area optimized configurations. We found that a coprocessor using a single multiplier (D = 8) instead of two or more is best suited. This coprocessor is able to compute group addition and doubling in 479 and 334 clock cycles, respectively. Providing more resources it is possible to achieve 288 and 248 clock cycles, respectively.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.