Example of Compiler Optimization
Here is the optimized x86-64 machine code that gcc generates for that loop:
.L3:
        movsd   (%rdi,%rax), %xmm1
        mulsd   (%rsi,%rax), %xmm1
        addq    $8, %rax
        cmpq    %rdx, %rax
        addsd   %xmm1, %xmm0
        jne     .L3
Here, for comparison, is the optimized machine code for the hypothetical RISC machine:
L1:     fld   r4(r1),f1     // fetch a[i] into f1
        fld   r4(r2),f2     // fetch b[i] into f2
        fmul  f1,f2,f1      // floating point multiplication
        fadd  f0,f1,f0      // floating point addition
        addi  r4,8,r4       // i8 = i8 + 8;
        cmp   r4,r3,r5      // is i8 less than n8?
        blt   r5,L1         // if so, branch to loop body
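The Intel code is shorter by one instruction because the mulsd instruction allows one operand to be in memory, so the load of b[i] is folded into the multiply. I'm assuming the RISC machine requires all operands to be in registers, which is why it needs a separate fld for each array element.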
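For reference, here is a C sketch of a loop that both listings appear to implement. The source loop is not shown in this section, so this is a reconstruction from the assembly: the function names dot and dot_lowered and the parameter names are assumptions, not the original code. The second function spells out, in C, roughly what the optimizer has done: the index is kept as a byte offset (the i8 and n8 of the RISC comments), so no multiply by 8 is needed inside the loop.

```c
#include <stddef.h>

/* Assumed source form: dot product of two arrays of n doubles. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-lowered sketch of the optimized loop (hypothetical helper).
 * i8 is the byte offset 8*i and n8 is 8*n, assuming 8-byte doubles. */
double dot_lowered(const double *a, const double *b, size_t n)
{
    const char *pa = (const char *)a;
    const char *pb = (const char *)b;
    size_t n8 = n * sizeof(double);
    size_t i8 = 0;
    double sum = 0.0;

    if (n8 == 0)            /* the loop body below assumes n > 0 */
        return sum;

    do {
        double t = *(const double *)(pa + i8);  /* movsd (%rdi,%rax), %xmm1 */
        t *= *(const double *)(pb + i8);        /* mulsd (%rsi,%rax), %xmm1 */
        i8 += sizeof(double);                   /* addq  $8, %rax           */
        sum += t;                               /* addsd %xmm1, %xmm0       */
    } while (i8 != n8);                         /* cmpq %rdx, %rax; jne .L3 */

    return sum;
}
```

Both functions should return the same value for any input. The second exists only to make the correspondence with the individual instructions visible.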