Homework 2
Due Thurs., Feb. 6

You must read the following articles:

  POWER4 system microarchitecture
  http://www.research.ibm.com/journal/rd/461/tendler.html

  The Microarchitecture of the Pentium® 4 Processor
  http://developer.intel.com/technology/itj/q12001/articles/art_2.htm

Both articles are linked from the course web page. After reading them, answer the questions below. When looking at the IBM CPU, consider only _one_ CPU (even though the IBM chip has two CPUs per chip).

As before, please use the "submit" script in the course directory to submit a directory (tarred and gzipped) containing the file REPORT.txt. The quality of the technical English in REPORT.txt is also important. Please also bring a hardcopy to class to hand in.

1. Both CPUs support hardware data prefetch. State how they differ. Which do you believe is better? Give an example code fragment in which the better system has an advantage.

2. For each CPU, state:
   a. How many instructions can be "in flight" (partially processed at one instant)?
   b. How many internal integer and floating-point registers are there?
      [ At this time, I don't know if the IBM article states this. ]
   c. How many loads and stores can simultaneously exist in the internal pipeline?

3. What is the branch mispredict penalty (number of cycles) for each CPU?

4. How many instructions can each CPU issue per cycle?

5. For each processor, does it support out-of-order execution and out-of-order instruction retirement?

6. The article on the Pentium 4 indirectly states that the Pentium 4 cache is only 8 KB due to the need to support the old 8086 instruction set. Explain the steps in that reasoning, based on the article.

7. What architectural features did Intel add to the Pentium 4 to better support multimedia?

8. Compare the branch prediction methods of the two CPUs. Where are they the same? Where are they different? State your opinion on which is better, and give an example code fragment showing where the better branch predictor has an advantage.

9. The latest widely available Pentium 4 has a clock rate of 3.06 GHz, while the latest widely available POWER4 has a clock rate of 1.3 GHz. Nevertheless, the POWER4 can be as fast as or faster than the Pentium 4 on certain programs. Give three different kinds of programs on which you would expect the POWER4 to be the same speed or faster _in spite_ of the Pentium 4's higher clock rate. Explain the technical reasons.