
Lines Matching refs:loop

18 // "wider" than Itanium? Can you experience loop scalability as
37 // Wrong! Note that getf latency increased. This means that if a loop is
171 .skip 32 // makes the loop body aligned at 64-byte boundary
183 brp.loop.imp .L_bn_add_words_ctop,.L_bn_add_words_cend-16
224 .skip 32 // makes the loop body aligned at 64-byte boundary
236 brp.loop.imp .L_bn_sub_words_ctop,.L_bn_sub_words_cend-16
283 .skip 32 // makes the loop body aligned at 64-byte boundary
306 brp.loop.imp .L_bn_mul_words_ctop,.L_bn_mul_words_cend-16
317 // This loop spins in 2*(n+12) ticks. It's scheduled for data in Itanium
320 // ldf8. The loop is not scalable and shall run in 2*(n+12) even on
321 // "wider" IA-64 implementations. It's a trade-off here. n+24 loop
326 // this very instruction sequence in bn_mul_add_words loop which in
359 // of Intel the following loop is commented out? Indeed, it looks so
363 // The loop therefore spins at the latency of xma minus 1, or in other
364 // words at 6*(n+4) ticks:-( Compare to the "production" loop above
397 .skip 48 // makes the loop body aligned at 64-byte boundary
412 brp.loop.imp .L_bn_mul_add_words_ctop,.L_bn_mul_add_words_cend-16
423 // This loop spins in 3*(n+10) ticks on Itanium and in 2*(n+10) on
465 .skip 32 // makes the loop body aligned at 64-byte boundary
486 brp.loop.imp .L_bn_sqr_words_ctop,.L_bn_sqr_words_cend-16
497 // will appear larger than loss on "wider" IA-64, then the loop should
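The comments above give loop timings as closed-form tick counts in n, the number of words processed per call. The sketch below tabulates those formulas so they can be compared at a given operand size. It assumes each formula is exactly as quoted. Several source comments are cut off in this listing: line 321's "n+24 loop" trade-off is not explained, and line 423 does not show which implementation the 2*(n+10) figure applies to. The dictionary labels and helper names (`FORMULAS`, `tick_table`, `per_word`) are hypothetical and do not appear in the original file.

```python
# Tick-count formulas quoted in the search hits above; n = number of words.
# Labels are hypothetical, keyed by the source line each figure comes from.
FORMULAS = {
    "line 317: 2*(n+12)": lambda n: 2 * (n + 12),
    "line 321: n+24": lambda n: n + 24,
    "line 364: 6*(n+4)": lambda n: 6 * (n + 4),  # loop bound by xma latency
    "line 423: 3*(n+10), Itanium": lambda n: 3 * (n + 10),
    "line 423: 2*(n+10), other implementation": lambda n: 2 * (n + 10),
}


def tick_table(sizes):
    """Return {n: {label: ticks}} for each operand size n (in words)."""
    return {n: {label: f(n) for label, f in FORMULAS.items()} for n in sizes}


def per_word(label, n):
    """Amortized ticks per word for one formula at size n."""
    return FORMULAS[label](n) / n


if __name__ == "__main__":
    # A 1024-bit operand held in 64-bit words is 16 words long.
    for n, row in tick_table([4, 16, 32]).items():
        print(n, row)
```

For n = 16, the line 317 formula gives 56 ticks and the line 364 formula gives 120 ticks. The constant terms matter for short operands. At n = 4, the 2*(n+12) loop costs 8 ticks per word, four times its asymptotic rate of 2 ticks per word.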