Lab 11: An Introduction to Topological sort
Problem 1: Depth-First Search and Topological Sort
11.1 Dependencies as Directed Acyclic Graphs
Goals: To practice graph traversal algorithms and learn about topological sort.
The focus of this lab is on directed, acyclic graphs (called DAGs. These kinds of graphs are often used for showing dependencies between different objects. For example, here’s a subset of the computer science courses offered at Northeastern and their prerequisites.
"Algorithms and Data" has "Fundamentals 2" as a prerequisite.
"Compilers" has "Programming Languages" as a prerequisite.
"Computer Systems" has "Fundamentals 2" as a prerequisite.
"Database Design" has "Fundamentals 1" as a prerequisite.
"Fundamentals 1" has no prerequisites.
"Fundamentals 2" has "Fundamentals 1" as a prerequisite.
"Large-Scale Parallel Data Processing" has "Algorithms and Data" and "Computer Systems" as prerequisites.
"Object-Oriented Design" has "Fundamentals 2" as a prerequisite.
"Programming Languages" has "Theory of Computation" and "Object-Oriented Design" as prerequisites.
"Theory of Computation" has "Fundamentals 2" as a prerequisite.
It’s much easier to view this information as a DAG, where there’s an edge from course A to course B if course A has course B as a prerequisite. Draw this graph on paper for yourself as an exercise; you will use it below.
A particular student has decided they want to take all of these courses, but they can only take one course per semester (this student is particularly ambitious, so they’re also majoring in biology, and have a full-time job, and lead the debate club, and ...). With all of these prerequisites, though, it’s a little difficult for them to figure out a valid order they can take these courses in so they can plan their schedule for the next few years, and they’ve enlisted your help.
Let’s try doing this by hand first. By looking at the graph you drew, figure out and write down some possible order in which the student can take those classes so that they always satisfy the prerequisites for the next course. (Note: there are multiple possible orders, but we only need to find one of them.) What mental "algorithm" did you use to figure it out?
11.2 Implementing Topological Sort
This is not quite the same kind of sorting as the other algorithms we’ve seen: the nodes of the graph are not comparable in all cases. We don’t know, based on the example above, whether "Object-Oriented Design" should come before or after "Computer Systems", but we do know that both of them must come after "Fundamentals 2".
For this lab, you’ll use the following data definitions. They are simplified from the ones used in lecture (we’re skipping a separate class for edges, since there is no information we need to store on those edges for this problem):
class Curriculum { ArrayList<Course> courses; Curriculum() { this.courses = new ArrayList<Course>(); } // EFFECT: adds another course to the set of known courses void addCourse(Course c) { this.courses.add(c); } // add methods here... } class Course { String name; ArrayList<Course> prereqs; Course(String name) { this.name = name; this.prereqs = new ArrayList<Course>(); } // EFFECT: adds a course as a prereq to this one void addPrereq(Course c) { this.prereqs.add(c); } // add methods here }
Fortunately, the algorithm we’ll work through here is based on something you’ve already seen: depth-first search. Recall that in depth-first search, processing a node involves the following steps:
If this is the vertex you’re looking for, return it.
Else, if the vertex has already been processed, don’t do anything.
Otherwise, the vertex has not been processed yet, so mark it as processed then process all of its neighbors.
In DFS, the point of the algorithm was to determine whether some vertex was reachable from another one. Topological sort, however, isn’t looking for any particular vertex. Instead, the point is to traverse the graph in depth-first order and add a vertex to a sorted result list only after all of its "prerequisite" vertices have been added to the result list. As a result, the process method for topological sort consists of the following steps.
If the vertex is in the sorted result list, don’t do anything.
If the vertex is not yet in the sorted result list, process all of its neighbors, then add the vertex to the result list.
Note that here we add the processed vertex to the result list after processing its children, rather than before as we did in the DFS algorithm. This is important for two reasons. First, because we’re assuming that topological sort will be given only acyclic graphs, there is no risk of the algorithm running into an infinite loop. Second, by adding the vertex to the list only after its prerequisite vertices have been processed, we ensure that that vertex comes after its prerequisites (as well as their prerequisites, and their prerequisites’ prerequisites, and so on).
Just doing a single depth-first search is not enough to cover all of the courses, though. For example, doing a depth-first search starting from "Compilers" would never process "Algorithms and Data", because neither one (transitively) depends on the other. So topological sort does one depth-first traversal starting from every vertex in the graph. This ensures that every vertex is processed at least once, but subsequent re-processing of a vertex won’t cause any harm because we don’t do anything for vertices that have already been seen.
Follow these steps to implement topological sort.
Create an example Curriculum of Courses representing the example courses above. Create an example ArrayList<Course> of the class schedule you worked out, with prerequisites first and advanced courses later in the list.
Create some testing helpers. First, design a method on Curriculum, boolean comesAfterPrereqs(ArrayList<Course> schedule, Course c), that checks whether the provided Course appears in the schedule after all of its prerequisites do (and, clearly, that all the prerequisites appear in the schedule!).
Second, design a method boolean validSchedule(ArrayList<Course> schedule) that checks whether the schedule is valid for all the courses in it.
Your example ArrayList<Course> from above should satisfy this helper.
Design a void method process on Course that takes in a ArrayList<Course> called processed, representing the courses that have been processed and added to the eventual result so far. If this course is in processed, then do nothing; otherwise, process this course’s prereqs, then add it to processed. Hint: implement this method recursively, rather than iteratively with loops and a stack as you did for DFS in lecture.
Design a method topSort on Curriculum that takes in an ArrayList<Course>. The method should create a new ArrayList<Course> to store the eventual result, then process every course in the given list with that result list. Once every course has been processed, return the result list.
11.3 Discussion
Topological sort is used in many, many places, wherever scheduling constraints might be interesting. Consider baking a fancy pastry, which requires filling, cake layers, and frosting, each of which requires several preparation stages in advance, each of those of which require prepping ingredients, etc... Or more progammatically, we might worry about which order to load various support libraries in order to boot up the computer and eventually load and run our own programs. When there are additional kinds of constraints (like finishing the overall processing in as short a time as possible), then we need fancier algorithms yet than this, but topological sorting gives us a baseline to compare those other algorithms against.