CS 361: Algorithms and Data Structures
Homework 7
Due date: 12/7/15 by beginning of class



This homework is entirely a programming assignment, and you are required to work with a partner. It is a substantial assignment, so please do not wait until the last minute to start.

The data we will be working with is Netflix data. Click here for a zipped directory containing all the necessary files for this assignment.

In this assignment, you will implement the following Java classes:

  • NetflixAnalyzer - A Java class with a main method that allows the user to explore the Netflix data.
  • Graph - A Java class that represents a directed graph using an adjacency list representation.
  • GraphAlgorithms - A Java class that implements two graph algorithms: Dijkstra's algorithm and the Floyd-Warshall algorithm.
This assignment requires heavy use of the Java Collections classes: List, ArrayList, Map, and HashMap. For an example of how to iterate over the elements in a HashMap, see the toString() method in the Reviewer class.
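
The general pattern for iterating over a HashMap, independent of the Reviewer class, looks roughly like the sketch below. The class name HashMapIteration and the sample ratings are made up for illustration and are not part of the provided files.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapIteration {
    // Visits every (movie id, rating) entry once and adds up the ratings.
    static int sumRatings(Map<Integer, Integer> ratings) {
        int total = 0;
        for (Map.Entry<Integer, Integer> entry : ratings.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical data: movie id -> rating.
        Map<Integer, Integer> ratings = new HashMap<>();
        ratings.put(12, 4);
        ratings.put(7, 5);
        for (Map.Entry<Integer, Integer> entry : ratings.entrySet()) {
            System.out.println("movie " + entry.getKey() + " -> rating " + entry.getValue());
        }
        System.out.println("sum of ratings: " + sumRatings(ratings));
    }
}
```

Keep in mind that a HashMap does not guarantee any particular iteration order.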

Netflix Analyzer

NetflixAnalyzer.java should contain a main method that allows the user to explore the Netflix data. In my program, the user specifies the filenames on the command line. When the main method is run, here is what the user should see:



This message asks the user how they want to build a graph from the Netflix data. The nodes of the graph should always be movies but there are lots of ways to define the edges of the graph. You should come up with at least two (2) different options for defining what it means for movies u and v to be adjacent in the graph. You are free to use the options I came up with above or to experiment with others. Note: I recommend coming up with options that produce an unweighted graph.

Here is what should be printed after choosing an option:



If the user chooses option 1, you should print out the following information about the graph:

  • The number of nodes
  • The number of edges
  • The density of the graph, defined as D = E / (V*(V-1)) for a directed graph and D = 2E / (V*(V-1)) for an undirected graph, where V is the number of nodes and E is the number of edges.
  • The maximum degree (i.e. the largest number of outgoing edges of any node)
  • The diameter of the graph (i.e. the longest shortest path)
  • The average length of the shortest paths in the graph
These last two (diameter and average) require you to compute the shortest path between all pairs of nodes in the graph using the Floyd-Warshall algorithm.
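
As a rough sketch of how the density, diameter, and average could be computed, the helper below works on a distance matrix like the one Floyd-Warshall produces. The class name GraphStats, the NO_PATH marker, and the decision to skip unreachable pairs (and the pairs where i equals j) are assumptions of this sketch; the assignment does not prescribe them.

```java
class GraphStats {
    // Marker for "no path from i to j"; this convention is an assumption of the sketch.
    static final int NO_PATH = Integer.MAX_VALUE;

    // Density of a directed graph: D = E / (V * (V - 1)).
    static double directedDensity(int numNodes, int numEdges) {
        if (numNodes < 2) {
            return 0.0; // avoid dividing by zero on tiny graphs
        }
        return numEdges / ((double) numNodes * (numNodes - 1));
    }

    // Longest finite shortest path between two distinct nodes.
    static int diameter(int[][] dist) {
        int longest = 0;
        for (int i = 0; i < dist.length; i++) {
            for (int j = 0; j < dist.length; j++) {
                if (i != j && dist[i][j] != NO_PATH) {
                    longest = Math.max(longest, dist[i][j]);
                }
            }
        }
        return longest;
    }

    // Mean shortest path length over ordered pairs (i, j), i != j, that are connected.
    static double averagePathLength(int[][] dist) {
        long total = 0;
        long pairs = 0;
        for (int i = 0; i < dist.length; i++) {
            for (int j = 0; j < dist.length; j++) {
                if (i != j && dist[i][j] != NO_PATH) {
                    total += dist[i][j];
                    pairs++;
                }
            }
        }
        return pairs == 0 ? 0.0 : (double) total / pairs;
    }
}
```

Skipping unreachable pairs keeps a single disconnected node from making the diameter infinite, but other conventions are possible, so state whichever one you choose in your output or comments.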

If the user chooses option 2, you should ask the user for a starting node and an ending node. Then use Dijkstra's algorithm to find the shortest path between the two nodes. You should print out the shortest path for the user.
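
Printing the path means walking the parent array backwards from the ending node to the starting node. Here is a minimal sketch of that step. It assumes nodes are array indices and that -1 marks "no parent"; both conventions, along with the names PathBuilder and buildPath, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PathBuilder {
    // Marker for "no parent"; this convention is an assumption of the sketch.
    static final int NO_PARENT = -1;

    // Follows parent links from target back to source, then reverses the result.
    // Returns an empty list when target cannot be reached from source.
    static List<Integer> buildPath(int[] prev, int source, int target) {
        List<Integer> path = new ArrayList<>();
        for (int current = target; current != NO_PARENT; current = prev[current]) {
            path.add(current);
            if (current == source) {
                Collections.reverse(path); // collected target-to-source, so flip it
                return path;
            }
        }
        return new ArrayList<>(); // ran out of parents before reaching the source
    }
}
```

Each node id on the returned path can then be mapped to a movie title before printing.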

Continue printing out the menu and letting the user choose options until they choose option 3 which should cause your program to quit. Click here to see a full session.
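
The menu loop itself can be quite small. The sketch below only shows the control flow: the menu wording, the class name MenuLoopSketch, and the placeholder messages are all made up, and the real work for options 1 and 2 is left out.

```java
import java.util.Scanner;

class MenuLoopSketch {
    // Repeats the menu until the user picks option 3 or the input ends.
    // Returns how many times option 1 or 2 was handled.
    static int run(Scanner in) {
        int handled = 0;
        while (true) {
            System.out.println("[1] Print graph statistics  [2] Shortest path  [3] Quit");
            if (!in.hasNextInt()) {
                break; // end of input or non-numeric text
            }
            int choice = in.nextInt();
            if (choice == 3) {
                break;
            } else if (choice == 1) {
                System.out.println("(graph statistics would be printed here)");
                handled++;
            } else if (choice == 2) {
                System.out.println("(shortest path would be printed here)");
                handled++;
            } else {
                System.out.println("Unknown option: " + choice);
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        run(new Scanner(System.in));
    }
}
```

Passing the Scanner in as a parameter makes the loop easy to test with a canned input string.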

You should add other (probably static) methods to your NetflixAnalyzer class in addition to the main method. For example, you might write a method that uses a NetflixFileProcessor to read in the data, a method that constructs the graph according to the user's choice, and a method that prints different messages to the console. In general, it's better to break your code up into small methods instead of having a single giant main method with dozens of lines of code.

The Graph Class

The Graph class should use an adjacency list representation for storing a directed graph. Assume that the nodes are represented using integers. As such, you can use a List<List<Integer>> or a Map<Integer, List<Integer>> for the adjacency list. At the very least, your Graph class should have the following methods:

  • addNode(u) - Add a node to the graph
  • addEdge(u, v) - Add a directed edge from u to v to the graph
  • getAdjacency(u) - Return the adjacency list for the node u
The inputs to these methods (u and v) can be integers since each Netflix movie is represented using an integer id. You'll probably find it useful to add other methods to your Graph class as needed.
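
To make the three methods concrete, here is one possible skeleton built on a Map<Integer, List<Integer>>. Treat it as a sketch rather than the required design: the numNodes method and the choices to add missing endpoints automatically and to allow duplicate edges are assumptions, not requirements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Graph {
    // Maps each node id to the list of nodes it has an outgoing edge to.
    private final Map<Integer, List<Integer>> adjacency = new HashMap<>();

    // Adds a node with no outgoing edges; does nothing if the node already exists.
    public void addNode(int u) {
        adjacency.putIfAbsent(u, new ArrayList<>());
    }

    // Adds a directed edge from u to v, creating either endpoint if needed.
    // Duplicate edges are not filtered out in this sketch.
    public void addEdge(int u, int v) {
        addNode(u);
        addNode(v);
        adjacency.get(u).add(v);
    }

    // Returns the adjacency list of u, or an empty list for an unknown node.
    public List<Integer> getAdjacency(int u) {
        return adjacency.getOrDefault(u, Collections.<Integer>emptyList());
    }

    // Extra helper (an assumption): number of nodes currently in the graph.
    public int numNodes() {
        return adjacency.size();
    }
}
```

A map handles movie ids that are not consecutive, while a List<List<Integer>> is simpler if you first renumber the movies as 0, 1, 2, and so on.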

Graph Algorithms

Finally, you should have a class called GraphAlgorithms that has methods for both Dijkstra's algorithm and the Floyd-Warshall algorithm. Even though the graph is unweighted, I still want you to implement Dijkstra's algorithm. Just assume each edge has a weight of 1.

Dijkstra's algorithm should return the set of parent nodes because you'll need to print out the actual path for the user. The Floyd-Warshall algorithm, however, can simply return the path costs. Here is my recommendation for the definition of both methods:

  • public static int[][] floydWarshall(Graph graph)
  • public static int[] dijkstrasAlgorithm(Graph graph, int source)

The floydWarshall method takes in a graph and returns a two-dimensional array of integers. The entry in spot [i][j] should be the length of the shortest path from node i to node j. Dijkstra's algorithm takes in a graph and a source and returns an integer array. This array is the "prev" data structure we talked about in class. The i-th element in the array is the parent of node i on the shortest path from the source.
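
To show the shape of the two return values, here is a compact sketch of both algorithms with every edge given weight 1. It deliberately departs from the recommended signatures so that it stands alone: it takes a plain List<List<Integer>> instead of a Graph, assumes nodes are numbered 0 to n-1, uses java.util.PriorityQueue rather than the provided PriorityQueue.java, and uses Integer.MAX_VALUE and -1 as "no path" and "no parent" markers. All of those choices are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class ShortestPathSketch {
    static final int INF = Integer.MAX_VALUE; // "no path" marker (assumption)
    static final int NO_PARENT = -1;          // "no parent" marker (assumption)

    // All-pairs shortest path lengths; dist[i][j] stays INF if j is unreachable from i.
    static int[][] floydWarshall(List<List<Integer>> adj) {
        int n = adj.size();
        int[][] dist = new int[n][n];
        for (int i = 0; i < n; i++) {
            Arrays.fill(dist[i], INF);
            dist[i][i] = 0;
            for (int j : adj.get(i)) {
                if (j != i) dist[i][j] = 1; // unit weight per edge
            }
        }
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                if (dist[i][k] == INF) continue; // skip to avoid integer overflow
                for (int j = 0; j < n; j++) {
                    if (dist[k][j] == INF) continue;
                    int viaK = dist[i][k] + dist[k][j];
                    if (viaK < dist[i][j]) dist[i][j] = viaK;
                }
            }
        }
        return dist;
    }

    // Single-source shortest paths; returns prev, where prev[v] is v's parent.
    static int[] dijkstra(List<List<Integer>> adj, int source) {
        int n = adj.size();
        int[] dist = new int[n];
        int[] prev = new int[n];
        Arrays.fill(dist, INF);
        Arrays.fill(prev, NO_PARENT);
        dist[source] = 0;
        // Queue entries are {node, distance}; outdated entries are skipped when polled.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        queue.add(new int[] {source, 0});
        while (!queue.isEmpty()) {
            int[] entry = queue.poll();
            int u = entry[0];
            if (entry[1] > dist[u]) continue; // a shorter route to u was already processed
            for (int v : adj.get(u)) {
                int candidate = dist[u] + 1;
                if (candidate < dist[v]) {
                    dist[v] = candidate;
                    prev[v] = u;
                    queue.add(new int[] {v, candidate});
                }
            }
        }
        return prev;
    }
}
```

Because java.util.PriorityQueue has no decrease-key operation, this version re-inserts a node whenever its distance improves and ignores the stale entries later. A priority queue that supports changing priorities would let you update the entry in place instead.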

Final Tips

The zipped directory for this assignment should contain the following files:
  • movie_reviews.txt - A plain text file that contains reviews of movies by 1495 Netflix reviewers.
  • movie_titles.txt - A plain text file that contains the year and title for 776 movies.
  • movie_reviews_short.txt - A plain text file that contains fake reviews of movies by only 6 users. Use this to help you debug your code.
  • movie_titles_short.txt - A plain text file that contains fake movies. Use this to help you debug your code.
  • Reviewer.java - A Java class that represents a single Netflix reviewer
  • Movie.java - A Java class that represents a single Netflix movie
  • NetflixFileProcessor.java - A Java class that has methods for reading and parsing the movie_reviews.txt and movie_titles.txt files
  • PriorityQueue.java - My implementation of a minimum priority queue. You can use your own priority queue (although you'll have to modify it to be a minimum priority queue instead of a maximum priority queue) or you're welcome to use mine.

It should only take a few seconds to build the graph on the full dataset! My code takes less than 10 seconds to build the graph on the full dataset. If your code is taking longer, then chances are you're inefficiently determining which node should be connected to which node in the graph.

You should test your code as you write it! A good idea is to test each class using a main method. You can use the "short.txt" files as a small graph to test your code. Or, for your GraphAlgorithms class, draw a small graph by hand on paper and check that your Dijkstra's and Floyd-Warshall implementations return the correct answers.

To enable assertions (which are extremely useful for catching bugs), see the Priority Queue assignment.


Extra Credit Ideas:
  • The graph is quite large and, without any a priori knowledge, it's rare to guess two nodes that actually have a path between them. One nice idea would be to add a 4th option to the menu labeled "Print random path" that would print out a randomly chosen path from the graph. You could use the matrix returned by the Floyd-Warshall algorithm to choose two nodes that are connected by a path.

    A variation of this idea is to have a 4th option labeled "Print interesting path" that tries to choose a path that might be interesting to the user rather than a random path. For example, you might choose a path that involves the node with the highest degree (e.g. the node/movie with the highest degree was watched the most often and thus should be interesting to a general audience). Or you could try to find paths that connect two nodes with low degree. Or you could try to find paths that do not go through the node with the highest degree.

  • An interesting use of this Netflix data would be a recommender system where the system would recommend movies to a user based upon movies the user has already watched and liked.

    One idea would be to include a 4th option in your menu labeled "Recommend movie". If the user chooses this option, you could ask them to enter the integer ids of movies they have watched and liked. Given this information and the graph, there are a few different ways you could recommend a new movie:

    • For each liked movie, add its neighbors (in the graph) to a max priority queue. You can increment a movie's priority for each liked movie it is adjacent to.
    • For each pair of liked movies, find the shortest path between them. For each movie along this shortest path (i.e. each intermediate node), add it to a max priority queue. You can increment a movie's priority for each shortest path it appears on.
    • You could combine both of these ideas
    Feel free to experiment with different ways of using the user's preferences and the graph structure to generate movie recommendations.

  • Modify your code so that it produces and works on a weighted graph (instead of an unweighted graph). For example, two movies are connected by an edge whose weight is the number of users that watched both movies. This would require you to modify your Graph.java file along with your Dijkstra's algorithm and Floyd-Warshall algorithm.
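
For the weighted-graph idea, the edge weights from the example (the number of users that watched both movies) can be counted in one pass over the reviewers. The sketch below assumes each reviewer's movies are already available as a list of distinct integer ids. That input shape and the name CoWatchWeights are hypothetical, and the sketch does not use the actual NetflixFileProcessor or Reviewer methods.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CoWatchWeights {
    // Returns weights such that weights.get(u).get(v) is the number of reviewers
    // whose movie list contains both u and v (for u != v).
    static Map<Integer, Map<Integer, Integer>> count(List<List<Integer>> moviesByReviewer) {
        Map<Integer, Map<Integer, Integer>> weights = new HashMap<>();
        for (List<Integer> movies : moviesByReviewer) {
            // Every ordered pair of distinct movies from this reviewer gets one more co-watcher.
            for (int u : movies) {
                for (int v : movies) {
                    if (u == v) continue;
                    weights.computeIfAbsent(u, key -> new HashMap<>()).merge(v, 1, Integer::sum);
                }
            }
        }
        return weights;
    }
}
```

Looping over each reviewer's own movies touches only pairs that share at least one viewer, so no time is spent comparing pairs of movies that have no viewer in common.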



Submission Instructions

Your Java code should be submitted in a zipped directory. The directory should contain all necessary Java files. You and your partner only need to submit one directory. Please make sure that you put both of your names in the class comments so I know who you worked with. Your code should compile with no errors.

Your assignment will be graded based on functionality. That is, I will run your code on some small test graphs that I have and check that your code returns the correct answers for both option 1 (printing out information about the graph) and option 2 (shortest paths).

You should submit your zipped directory to Moodle before the beginning of the last class.




Last modified: Fri Jan 24 10:58:47 PST 2014