Lines Matching full:cycles

19 # are for streamed GHASH subroutine and are expressed in cycles per
43 # 11-13 cycles on contemporary x86 cores. As for choice of MMX in
48 # Add PCLMULQDQ version performing at 2.10 cycles per processed byte.
50 # instruction latency appears to be 14 cycles and there can't be more
52 # Karatsuba multiplication would take 28 cycles *plus* few cycles for
64 # cycles and Naggr chosen by Intel is 4, resulting in 2.05 cycles per
67 # which for a single multiplication is ~5 cycles. Unfortunately Intel
70 # alone resulted in 2.46 cycles per byte out of 16KB buffer. Note that
75 # ~13 cycles and Naggr is 2, giving asymptotic performance of ...
87 # is more realistic estimate. In this case it gives ... 1.91 cycles.
91 # one byte out of 8KB buffer in 2.10 cycles, while x86_64 counterpart
110 # one byte in 2.07 cycles on Sandy Bridge, and in 2.12 - on Westmere.
114 # where original 64-bit code processes one byte in 1.95 cycles.
1339 # more data - larger table. Best reported result for Core2 is ~4 cycles
1343 # results compare? Minimalistic "256B" MMX version delivers ~11 cycles
1346 # "256B" one, in other words not worse than ~6 cycles per byte. It