Homework 6

DEADLINE:  Due Friday, Oct. 24

SUBMISSION:
  As always, you will submit a tarball (hw6.tar.gz) and a README file.
  You _must_ verify 'make check', or explain in README what isn't working.

  In the tarball, you will include:
  * README
  * AI.txt  (for cache-simulator; if you didn't use AI for that, just say so.)
  * AI-implementation.txt  (for cache-simulator; required only if you used
                            AI for this)
  * LLVM.txt  (a short description of how the LLVM code works, with
               reference to explicit functions; the LLVM code can be
               written by the AI.)
  * Makefile  (see below for required targets)
  * cache-simulator.c  (and other required source files)
  * mytest.c  (I will provide this as part of this homework.)
  * *.ll  (These are LLVM files for intermediate representation.
           When using clang in your Makefile, I recommend the -S flag
           (together with -emit-llvm), which will produce human-readable
           .ll files for the IR code.  Otherwise, you will get .bc files,
           which are "bit codes" in a binary representation.)

*** IN THIS ASSIGNMENT, _ONLY_, FOR THE CODE RELATED TO LLVM,
***   YOU MAY USE AS MUCH AI AS YOU WISH.
***   YOU DO NOT NEED TO INCLUDE AN AI.txt FILE FOR THE LLVM CODE.
*** HOWEVER, YOU MUST STILL SUBMIT AI.txt FOR cache-simulator.
***   AND YOU MUST SUBMIT AN AI-implementation.txt
***   FILE (IF YOU DID USE AI).  SEE 000-AI-POLICY.txt IN THIS SUBDIRECTORY.

[ WARNING:  THIS IS STILL BEING REVISED. ]

In this assignment, we will simulate a fully associative cache (the
first of the cache types discussed in class).  The cache simulator will
apply an LRU eviction policy (Least Recently Used).  The simulator must
include an 'M' field (modified) and a 'V' field (valid) for each cache line.

I have asked Google Gemini to create a review of my classroom lecture
on fully associative caches.  You will find it in the subdirectory:
  1-REVIEW-fully-associative-cache

+----------------------------------------------------------------------
| In case you are looking for a classical (human-generated) review
| of fully associative caches, here are two possibilities, below.
| But frankly, I prefer the review by Google Gemini. :-)
| A. See Chapter 22 of ostep.org:
|      https://pages.cs.wisc.edu/~remzi/OSTEP/vm-beyondphys-policy.pdf
| B. A nice youtube video is here (but without mentioning the modified
|    and valid bits):
|      https://www.youtube.com/watch?v=A0vR-ks3hsQ
|      "Ep 074: Fully Associative Caches and Replacement Algorithms"
+----------------------------------------------------------------------

Furthermore, we will instrument an arbitrary C program to detect when
a global variable is read or written.  When this happens, we will
write to stderr (not to stdout) an indication of:
  * whether it was a read or a write;
  * what the address was;
  * we will assume that every read or write is a word (4 bytes);
  * and we will round _down_ to a word-aligned address when we
    report the address.  For example, 0x1ab6 will be rounded down to 0x1ab4.

We will instrument this binary using LLVM.  If you use Ubuntu, you
should install the packages:
  sudo apt install llvm llvm-dev clang lldb
(In a Red Hat-derived Linux distro, use llvm-devel instead of llvm-dev.
 The clang command is the analog of gcc, clang++ is the analog of g++,
 and lldb is the analog of gdb.  But gdb still works, and lldb has a
 different syntax.)

To get you started, try asking an AI:
  1. Does Copilot understand IR in LLVM?
  2. Please give an example of how to write an LLVM instrumentation pass.
  3. Please give me an example of building and invoking the
     instrumentation pass.
  4. Please write the emitWrite function to print "global variable: %p\n"
     with the address being the pointer, and a second argument with
     0 for 'read' and 1 for 'write'.
  5. And if the AI gives you code using cmake (e.g., CMakeLists.txt),
     then ask it:
       Please convert this to using ordinary 'make' instead of 'cmake'.
  6. Test the code (including on Khoury), and tell the AI if there are
     bugs in the code.

You will then pipe the stderr from the arbitrary C program to a cache
simulator program that you write.
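The reporting step described above (round down to a word boundary, then
write to stderr) might look roughly like the sketch below.  It borrows the
addrEmit(void *addr, int readWrite) signature from the AI prompt later in
this handout.  The helper name word_align and the "R"/"W" output format are
assumptions made only for illustration; use whatever format your own
cache simulator parses.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: round an address down to a 4-byte (word) boundary
 * by clearing the two low-order bits.  Example: 0x1ab6 becomes 0x1ab4. */
static uintptr_t word_align(uintptr_t addr) {
    return addr & ~(uintptr_t)3;
}

/* Called by the instrumented program on each global read or write.
 * readWrite is 0 for a read and 1 for a write.
 * The line format written here is an assumption, not a required format. */
void addrEmit(void *addr, int readWrite) {
    fprintf(stderr, "%c 0x%" PRIxPTR "\n",
            readWrite ? 'W' : 'R',
            word_align((uintptr_t)addr));
}
```

Writing to stderr (and not stdout) keeps these reports separate from the
normal output of the instrumented program.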
Your cache simulator will have two modes:
  verbose (for each address, declare if it is a cache miss or a cache hit); or
  summary (show the mapping of each cache line to the virtual address in
           the target program).

Your cache simulator will take an argument to specify the number of
bytes in a data block of the cache (in a cache line).  You may assume
that the number of bytes is a power of two.  You will also specify the
size of the cache in bytes (also a power of two).  From this, your
program can derive the number of cache lines (cache size divided by
bytes in a cache line).

The API of your program must be:
  cache-simulator --data-block-size [NUMBER] --cache-size [NUMBER]

As described before, cache-simulator will receive information about
global variable reads or writes on its stdin, and you will invoke it
through a pipe.

For each address presented to the cache, the cache simulator must simulate:
  * cache hit and miss for a given cache line
  * whether it is a data read or data write (but you do not need
    to include the actual data read or written)
  * eviction of a cache line, and whether it had to be written back,
    due to a modified bit
In addition, you must include a summary of the final state of the cache.

[ For an example, see:  1-REVIEW-fully-associative-cache
  You can use a shorter, more compact format in your simulation. ]

[ TODO:  I WILL PROVIDE A TEST PROGRAM THAT THE GRADER MAY OR MAY NOT
  CHOOSE TO USE.  THIS WILL BE CALLED mytest.c  (SEE 'make check' below.)
  YOU MUST INCLUDE mytest.c IN YOUR TARBALL. ]

You will include a Makefile that has at least these targets:

BLOCK_SIZE=128
CACHE_SIZE=4096

cache-simulator: cache-simulator.c
	gcc -o $@ $<

%-llvm: %.c
	[YOU WRITE HERE COMMANDS TO CREATE A BINARY test1-llvm FROM test1.c,
	 test2-llvm FROM test2.c, ETC.  NOTE THAT '%' IS A WILD CARD
	 (AS IN SQL), AND WILL MATCH ANY STRING.
	 '$@' REPRESENTS THE TARGET (%-llvm) AND '$<' REPRESENTS THE DEPENDENCY.
	 FOR EXAMPLE:  gcc -o $@ $< ]

clean:
	rm -f [REMOVE ALL BUT THE ORIGINAL SOURCE FILES
	       REMOVE *.ll *.o AND EXECUTABLE FILES]

dist: clean
	dir=`basename $$PWD`&& cd .. && tar czvf $$dir.tgz ./$$dir
	dir=`basename $$PWD`&& ls -l ../$$dir.tgz

check: mytest-llvm cache-simulator
	./mytest-llvm 2>&1 1>/dev/null | ./cache-simulator --data-block-size ${BLOCK_SIZE} --cache-size ${CACHE_SIZE}

Recall that commands in a Makefile must start with a <TAB>.  In vim,
it will remind you of that.

For those who are curious how this works:
  ./mytest-llvm 2>&1 1>/dev/null | ./cache-simulator ...
note that the shell first sets up the pipe, so fd 1 (stdout) points to
the pipe.  Then "2>&1" makes fd 2 a duplicate of fd 1, so fd 2 (stderr)
is also pointing to the pipe.  After that, "1>/dev/null" means that fd 1
will now point to /dev/null (nothing).  But this doesn't affect fd 2,
which continues to point to the pipe.  So only stderr reaches
cache-simulator.

So, the workflow could just be:
  make check
which will do:
  make cache-simulator
  make mytest-llvm
  ./mytest-llvm 2>&1 1>/dev/null | ./cache-simulator --data-block-size ${BLOCK_SIZE} --cache-size ${CACHE_SIZE}

====
I will soon update these instructions with a mytest.c file in this
homework subdirectory.

I will also update these instructions with hints about how to use AI
to make LLVM create a special binary (e.g., mytest-llvm), which will
write to stderr every address for a global read and write.
In using AI, you will start with questions like:
  > Does this AI understand LLVM IR?
  > I want to have LLVM call a function "addrEmit(void *addr, int readWrite)"
    every time that it reads or writes to a global variable.
    The argument "addr" should be the address of the global variable.
    The argument readWrite should have value '0' for a read, and '1' for
    a write.

You must also include a file LLVM.txt that explains the code written
by the AI for LLVM.  You may ask the AI any questions if you don't
understand the code that it wrote.
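
For the cache-simulator side, the behavior required earlier (hit or miss,
LRU eviction, write-back when the 'M' bit is set) can be sketched as
below.  This is only one possible design under stated assumptions, not a
reference solution.  All names here (struct cache_line, cache_init,
cache_access, the access_result values) are hypothetical, and LRU is
tracked with a simple access counter instead of a linked list.

```c
#include <stdint.h>
#include <stdlib.h>

/* One line of a fully associative cache (hypothetical layout). */
struct cache_line {
    int valid;               /* 'V' bit: this line holds a block */
    int modified;            /* 'M' bit: block written since it was loaded */
    uintptr_t tag;           /* block number: address / block size */
    unsigned long last_used; /* time of most recent access, for LRU */
};

struct cache {
    struct cache_line *lines;
    size_t num_lines;        /* cache size / block size */
    size_t block_size;
    unsigned long clock;     /* incremented on every access */
};

enum access_result { HIT, MISS, MISS_EVICT_CLEAN, MISS_EVICT_WRITEBACK };

/* Returns 0 on success, -1 if memory could not be allocated. */
int cache_init(struct cache *c, size_t cache_size, size_t block_size) {
    c->num_lines = cache_size / block_size;
    c->block_size = block_size;
    c->clock = 0;
    c->lines = calloc(c->num_lines, sizeof *c->lines); /* all V bits = 0 */
    return c->lines ? 0 : -1;
}

enum access_result cache_access(struct cache *c, uintptr_t addr, int is_write) {
    uintptr_t tag = addr / c->block_size;
    size_t victim = 0;
    int found_invalid = 0;

    c->clock++;
    for (size_t i = 0; i < c->num_lines; i++) {
        struct cache_line *l = &c->lines[i];
        if (l->valid && l->tag == tag) {          /* cache hit */
            l->last_used = c->clock;
            if (is_write)
                l->modified = 1;
            return HIT;
        }
        if (!l->valid) {                          /* prefer an empty line */
            if (!found_invalid) {
                victim = i;
                found_invalid = 1;
            }
        } else if (!found_invalid &&
                   l->last_used < c->lines[victim].last_used) {
            victim = i;                           /* least recently used */
        }
    }

    /* Cache miss: fill the empty line, or evict the LRU line. */
    struct cache_line *v = &c->lines[victim];
    enum access_result result = MISS;
    if (v->valid)
        result = v->modified ? MISS_EVICT_WRITEBACK : MISS_EVICT_CLEAN;
    v->valid = 1;
    v->modified = is_write ? 1 : 0;
    v->tag = tag;
    v->last_used = c->clock;
    return result;
}
```

A main() for cache-simulator would parse --data-block-size and
--cache-size, call cache_init once, and then call cache_access for each
line read from stdin.  In verbose mode it would print the result of each
access, and at the end it would walk c->lines to print the summary.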