VI. CONCLUSIONS
This paper presented high-speed implementations of ECC: we implemented, evaluated, and optimized NUMS256, Ted37919, and NUMS384 on 8-bit AVR and 32-bit ARM11 processors. In particular, we introduced efficient multi-precision multiplication and squaring routines for finite field arithmetic on 8-bit AVR and 32-bit ARM micro-controllers. For 256-bit operands, finite field multiplication and squaring take 6,301 and 4,489 clock cycles on the 8-bit AVR, and 543 and 428 clock cycles on the 32-bit ARM, respectively. These results set new speed records on both 8-bit AVR and 32-bit ARM processors. Building on these routines, we implemented the NUMS curves NUMS256, Ted37919, and NUMS384, combining the individual computational advantages of the twisted Edwards and Montgomery curve forms. Finally, we achieved record-setting execution times for scalar multiplication over 256-, 379-, and 384-bit prime fields. For example, scalar multiplication on the NUMS256 curve requires only 1.357 M cycles on a 32-bit ARM11 processor, which is more than 1.6x faster than the widely used Curve25519.