EXPLORING xv6

These notes will help you to better understand xv6.
In particular, there is an emphasis on two portions of the code:
  There is the question of how a system call makes its way:
    from user space;
    through usys.S (analogous to libc.so);
    into the kernel through a trap call (asynchronous interrupt);
    into a trap handler for T_SYSCALL (for all system calls) in trap.c;
    and finally through an array of system call addresses where
      the index represents the system call number (in syscall.c);
    and then on to the actual function in the kernel.
  And there is the question of how file access goes from:
    a fd in the process table (created by the 'open' syscall);
    to a 'struct file' in a system-wide array of all struct files;
    to an 'inode' in the superblock of a filesystem;
    and on to the actual data block.
    [ You should be able to find the files that support each "feature". ]
    [ Note also that there are 'ref' fields (reference counts) associated
      with a 'struct file' and with an 'inode'.  Be sure that you understand
      their use, and why we have a separate struct file and inode. ]
  Along the way, you'll find three types of permissions:
    the permissions in a struct file,
    the permissions in an inode;
    and the permissions of a memory segment (from 'mmap', not covered here).
===

The xv6 software is based on the original UNIX, Sixth Edition (also
called Version 6 UNIX).
This was the first version of UNIX to be widely released outside
Bell Labs.  It was released in May, 1975, for the DEC PDP-11 minicommputer.
Xv6 is made available by MIT:
    http://pdos.csail.mit.edu/6.828/2014/xv6.html
The contents there, xv6-rev6.pdf, nicely organizes the files according
to purpose.

Many of these pointers are reproduced on our local, hyper-referenced copy:
    http://www.ccs.neu.edu/course/cs3650/unix-xv6/
Be sure to also read the
    "HELP on using the doxygen interface to browse the code"
on that page, for tips to efficiently use the search, find the source code,
see who calls a function, etc.

One other pointer that will help you is that it often pays to
read the (relatively short) .h file before reading the corresponding .c file.
The .h file provides the data structures and the names of functions
that manipulate them.

The questions below will provide a guide to better understanding xv6.

[ *** THIS IS STILL A WORK IN PROGRESS.  IT WILL BE POLISHED FURTHER LATER. ***]

PROCESSES:

1.  Where is the data structure for the process table?

2.  When there is a context switch from one process to another, where are
    the values of the registers of the old process saved?
    (This requires a reading knowledge of assembly.  We will review
    in class the key points of assembly language.)

3.  What are the possible states of a process?  Also, give a brief phrase
    describing the purpose of each state.

4.  What is the function that does a context switch between two
    processes?
    (This requires a reading knowledge of assembly.  We will review
    in class the key points of assembly language.)

5.  Explain how the context switch function works.

6.  What function calls the context switch function, and explain in detail
    what the calling function does.  (The doxygen hyper-linking
    is not perfect here.  You may have to use 'grep' on
       /course/cs3650/unix-xv6/* )

PROCESS STARTUP:

1.  Suppose a new program is going to start.  This requires a call to
    the system call, exec().  On what lines does the operating system
    create the first call frame, to be used by the user process's main()?

2.  The first call frame must have the values of the
    local variables argc and argv.  Where is the value of argv found
    in the exec() call?

3.  On what lines does the function modify the process table entry
    with the new program for the current process?

SYSTEM CALLS:

In class, we discussed how a system call (e.g., open()) is really
a function in the C runtime library, libc.so, and that function
then calls 'syscall()' with the integer for the 'open' system call.
This is similar to when you use 'syscall'' in the MARS assembler,
and you put the system call number in register $v0 before the call.

In these questions, we will follow the chain of control from
a user program that calls 'open()' to the code in 'libc' to the
syscall in the kernel, and finally to the function in the kernel
that actually does the work of the 'open()' system call.

1.  The file grep.c makes a call to 'open()'.  The definition of 'open()'
    is inside 'usys.S'.  It makes use of the macro 'SYSCALL'.
    Note that a macro, '$SYS_ ## name', will expand to the concatenation
    of 'SYS_' and the value of the macro parameter, "name".
    The assembly instruction 'int' is the interrupt instruction in
    x86 assembly.  The 'int' assembly instruction takes as an argument
    an integer, 'T_SYSCALL'.
        The code in usys.S is complex because it uses the C preprocessor.
    But, roughly, SYSCALL(open) will expand to the assembly code
    in lines 4 though 9 of usys.S, where the (x86 assembly) instruction:
      "movl $SYS_ ## name, %eax"
    expands to:
      "movl $SYS_open, %eax"
    The value of SYS_open can be found in the include file, "syscall.h".

    The instruction:
      "int $T_SYSCALL"
    uses information from "traps.h".  The "int" instruction is an
    "interrupt" instruction.  It interrupts the kernel at the address
    for interrupt number 64 (found in traps.).
        If you do "grep SYS_open /course/cs3650/unix-xv6/*
    it will lead you to:
        /course/cs3650/unix-xv6/syscall.c
    That will define the "syscalls" array, which is used by the
    function "syscall".

    Finally, here is the question:
        Give the full details of how a call to 'open()' in grep.c will
    call the function 'sys_open()' in sysfile.c, inside the operating
    system kernel.

FILES AND FILE DESCRIPTORS:

In class, we've talked about files and file descriptors.  We have
not yet discussed i-nodes.  For these questions, you can think of
an i-node as a location on disk that has the "table of contents"
for all information about a file.

In these questions, we will follow the chain of control from
open() to a file descriptor, to a "struct file" (including the
offset into the file), to the i-node.

1.  The function 'sys_open()' returns a file descriptor 'fd'.
    To do this, it opens a new 'struct file', 'filealloc()',
    and it allocates a new file descriptor with 'fdalloc()'.
    Where is the file descriptor allocated?  Also, you will see that
    the file descriptor is one entry in an array.  What is the algorithm
    used to choose which entry in the array to use for the new file descriptor?
    [ Comment:  The name 'NOFILE' means "file number".  "No." is sometimes
      used as an abbreviation for the word "number". ]

2.  As you saw above, the file descriptor turned out to be an index
    in an array.  What is the name of the array for which the file
    descriptor is an index?  Also, what is the type of one entry in
    that array.

3.  What is the name of the field that holds the offset into the file?
    We saw it in the function 'sys_open()'.

4.  Remember when we mentioned a call to 'filealloc()' above?
    Since the return value of 'filealloc()' is only a 'struct file',
    we need to initialize it.  Presumably, we will initialize it with
    a file offset of 0.  What is the line number in 'sys_open()' where
    we initialize the file offset to 0?

5.  The type of the 'struct file' was initialized to 'FD_INODE'.  What are the
    other types that it could have been initialized to?

6.  By examining the function 'sys_dup()', you can discover how a
    system call to 'dup()' will manipulate both a file descriptor
    and a "struct file".  Describe what it does in each of the
    two cases.

=====
For the next few questions, please read Chapter 39 of ostep.org:
"Interlude:  Files and Directories" from ostep.org.  (It's very
easy to read, as you'll see.)  Then continue with the remaining
questions..

7.  We have seen that an 'fd' acts as an index that leads to a pointer
    to the 'struct file'.  Separately, a 'struct file' of type
    'FD_INODE' leads to a 'struct inode'.  The 'struct inode' is stored
    on disk, in the filesystem, but the operating system will make
    a temporary copy in RAM.  Reach Chapter 40.3 in ostep.org:
      File Organization:  The Inode
    (You can skip "The Multi-Level Index".  But please do study
     Figure 40.1: "Simplified Ext2 Inode").
    Find the 'struct inode' in xv6, and then describe the xv6 fields:
      inum, ref, nlink, size, and addrs[NDIRECT+1].

8.  Describe in detail _how_ 'dirlink()' (in fs.c)
    creates a new directory entry.  Here, you will benefit from
    reaching Chapter 40.4 ("Directory Organization") in ostep.org.

9.  Suppose a 'struct file' had been initialized to FD_PIPE.  Find the
    'struct' that holds the information about a pipe.  For each field
    in that struct, Explain briefly (just a phrase) the purpose of that
    field.