Example of Compiler Optimization

Here is the optimized x86-64 assembly code that gcc generates for that loop:

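      .L3:
          movsd   (%rdi,%rax), %xmm1      # load an element of one array (rax = byte offset)
          mulsd   (%rsi,%rax), %xmm1      # multiply by the other array's element, read from memory
          addq    $8, %rax                # advance the offset by 8 bytes (one double)
          cmpq    %rdx, %rax              # compare the offset with the end offset in rdx
          addsd   %xmm1, %xmm0            # accumulate the product into xmm0 (flags unchanged)
          jne     .L3                     # loop again if offset != end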

Here, for comparison, is the optimized assembly code for the hypothetical RISC machine:

      L1:
          fld     r4(r1),f1           // fetch a[i] into f1
          fld     r4(r2),f2           // fetch b[i] into f2
          fmul    f1,f2,f1            // floating point multiplication
          fadd    f0,f1,f0            // floating point addition
          addi    r4,8,r4             // i8 = i8 + 8;
          cmp     r4,r3,r5            // is i8 less than n8?
          blt     r5,L1               // if so, branch to loop body

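The x86-64 code is one instruction shorter (six versus seven) because the mulsd instruction accepts a memory operand, so the second load is folded into the multiply. I'm assuming the RISC machine is a load/store architecture in which arithmetic instructions take only register operands.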
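
Both listings step a byte offset by 8 and compare it against a precomputed end offset, instead of scaling an element index on every iteration. A rough C sketch of that transformation follows. It assumes the loop is the dot product implied by the a[i] and b[i] comments above. The function names dot and dot_reduced are hypothetical, and i8 and n8 are borrowed from the RISC comments; none of this is taken from the original source code.

```c
#include <stddef.h>

/* Assumed source-level form of the loop: a plain dot product.
   The name and signature are illustrative, not from the original. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-written approximation of what the optimized listings do:
   the index i is replaced by a byte offset i8 that advances by
   sizeof(double), and the bound n by the byte count n8. */
double dot_reduced(const double *a, const double *b, size_t n)
{
    const char *pa = (const char *)a;
    const char *pb = (const char *)b;
    size_t n8 = n * sizeof(double);      /* end offset (rdx / r3) */
    double sum = 0.0;                    /* accumulator (xmm0 / f0) */

    for (size_t i8 = 0; i8 != n8; i8 += sizeof(double)) {
        double t = *(const double *)(pa + i8);   /* load from a */
        t *= *(const double *)(pb + i8);         /* multiply by element of b */
        sum += t;                                /* accumulate */
    }
    return sum;
}
```

On x86-64 the multiply in the loop body can read its second operand straight from memory, which is where the one-instruction saving comes from; on a load/store machine that line needs a separate load first.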