CS 361: Algorithms and Data Structures
Congratulations! You've made it to the final homework assignment for CS361. This assignment brings together many of the algorithms and ideas that we've studied throughout the semester. In this assignment, we will be analyzing the MovieLens data set (ml-latest-small), which contains 100k movie ratings from 671 users collected over a 20-year period (from 1995 to 2016).

The goal of this assignment is to create a graph using the MovieLens data (where the nodes are movies) and a text-based interface that allows the user to explore the graph. Ideally, this graph could be used as a recommender system (i.e. to recommend movies), and you will find some extra credit options at the end if you wish to take the assignment a step further and provide movie recommendations.

This assignment is entirely a programming assignment, and it is recommended (but not required) that you work with a partner. This is a substantial assignment, so please do not wait until the last minute to start.

Starter Code

Click here for a zipped directory containing all the necessary files for this assignment. The files are divided into the following packages:
MovieLens Analyzer

Inside the MovieLensAnalyzer.java class should be a main method that allows the user to explore the MovieLens data. The user should specify the filenames for loading the data at the command line. When the main method is run, here is what the user should see:

This welcome message asks the user how they want to build a graph from the MovieLens data. The nodes of the graph should always be movies, but there are lots of ways to define the edges of the graph. You should come up with at least two (2) different options for defining what it means for movies u and v to be adjacent in the graph. You are free to use the options I came up with above or to experiment with others.

Note: The first two options produce an undirected graph, which I implement using a directed graph structure (by adding an edge from u to v and from v to u).

Here is what should be printed after choosing an option:

If the user chooses option 1, you should print out the following information about the graph:
If the user chooses option 2, you should print out information about the specified node. You can use the toString() method in the Movie.java class, but you should also add code to print out all of the neighbors of the specified node.

If the user chooses option 3, you should ask the user for a starting node and an ending node, then use Dijkstra's algorithm to find the shortest path between the two nodes and print that path for the user.

Continue printing the menu and letting the user choose options until they choose option 4, which should cause your program to quit. Click here to see a full session.

You should add other (probably static) methods to your MovieLensAnalyzer class in addition to the main method: for example, a method that uses DataLoader to read in the data, a method that constructs the graph according to the user's choice, a method to print out different messages to the console, etc. In particular, it's better to break your code up into small methods instead of having a single giant main method with dozens of lines of code.

Graph Algorithms

In the class GraphAlgorithms.java you should implement both Dijkstra's algorithm and the Floyd-Warshall algorithm. Even though the graph is unweighted, I still want you to implement Dijkstra's algorithm; just assume each edge has a weight of 1. Dijkstra's algorithm should return the set of parent nodes, because you'll need to print out the actual path for the user. The Floyd-Warshall algorithm, however, can simply return the path costs. Here is my recommendation for the definition of both methods:
The floydWarshall method takes in a graph and returns a two-dimensional array of integers. The entry in spot [i][j] should be the length of the shortest path from node i to node j.

Dijkstra's algorithm takes in a graph and a source and returns an integer array. This array is the "prev" data structure we talked about in class. The i-th element in the array is the parent of node i on the shortest path from the source.

Final Tips

The starter code for this assignment contains the directory ml-latest-small with the actual data files:
It should take less than a minute to build the graph! My code takes around 25 seconds to build the graph when choosing option 1. If your code is taking longer, then chances are you're determining inefficiently which nodes should be connected in the graph.

You should test your code as you write it! A good idea is to test each class using a main method. You can create a smaller movies file by just using the top 10 or top 100 lines of movies.csv. Or you could draw a small graph by hand on paper and check that your Dijkstra's and Floyd-Warshall implementations return the correct answers.
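The hand-drawn-graph check suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the starter code's API: the class name SmallGraphCheck, the adjacency-list representation (a List of neighbor lists), the 0-based node numbering, the use of -1 for "no parent", and the helper names buildGraph and pathTo are all hypothetical. The graph is a triangle 0-1-2 with a tail 2-3 and an isolated node 4, so the answers are easy to verify on paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class SmallGraphCheck {
    // Stands for "no path"; halved so that INF + INF cannot overflow an int.
    public static final int INF = Integer.MAX_VALUE / 2;

    // Hypothetical stand-in for the starter code's graph class:
    // adj.get(u) lists the nodes reachable from u by one edge.
    public static List<List<Integer>> buildGraph(int n, int[][] undirectedEdges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : undirectedEdges) {
            adj.get(e[0]).add(e[1]); // an undirected edge is stored as
            adj.get(e[1]).add(e[0]); // two directed edges, one each way
        }
        return adj;
    }

    // Dijkstra with every edge weight fixed at 1.
    // Returns the "prev" array; -1 marks the source and unreachable nodes.
    public static int[] dijkstra(List<List<Integer>> adj, int source) {
        int n = adj.size();
        int[] dist = new int[n];
        int[] prev = new int[n];
        Arrays.fill(dist, INF);
        Arrays.fill(prev, -1);
        dist[source] = 0;
        // Queue entries are {node, distance}, ordered by distance.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.add(new int[] {source, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (top[1] > dist[u]) continue; // stale entry, a shorter route was already found
            for (int v : adj.get(u)) {
                if (dist[u] + 1 < dist[v]) {
                    dist[v] = dist[u] + 1;
                    prev[v] = u;
                    pq.add(new int[] {v, dist[v]});
                }
            }
        }
        return prev;
    }

    // Floyd-Warshall: d[i][j] is the shortest path length from i to j, or INF if none.
    public static int[][] floydWarshall(List<List<Integer>> adj) {
        int n = adj.size();
        int[][] d = new int[n][n];
        for (int i = 0; i < n; i++) {
            Arrays.fill(d[i], INF);
            for (int v : adj.get(i)) d[i][v] = 1;
            d[i][i] = 0;
        }
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (d[i][k] + d[k][j] < d[i][j]) d[i][j] = d[i][k] + d[k][j];
        return d;
    }

    // Walks prev backwards from target; returns an empty list if target is unreachable.
    public static List<Integer> pathTo(int[] prev, int source, int target) {
        List<Integer> path = new ArrayList<>();
        for (int v = target; v != -1; v = prev[v]) path.add(v);
        Collections.reverse(path);
        if (path.get(0) != source) return new ArrayList<>();
        return path;
    }

    public static void main(String[] args) {
        // Hand-drawn graph: triangle 0-1-2, a tail 2-3, and isolated node 4.
        int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {2, 3}};
        List<List<Integer>> g = buildGraph(5, edges);
        int[] prev = dijkstra(g, 0);
        int[][] d = floydWarshall(g);
        System.out.println("Path 0 -> 3: " + pathTo(prev, 0, 3));          // [0, 2, 3]
        System.out.println("Floyd-Warshall d[0][3] = " + d[0][3]);         // 2
        System.out.println("Node 4 reachable from 0? " + (d[0][4] < INF)); // false
    }
}
```

A useful cross-check on any small graph is that the number of edges on the path recovered from Dijkstra's prev array matches the corresponding Floyd-Warshall entry. Your own methods should use whatever graph class and node numbering the starter code provides.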
Extra Credit Ideas:
Submission Instructions

Your Java code should be submitted in a zipped directory and uploaded to Moodle. The directory should contain all necessary Java files. You and your partner only need to submit one directory. Please make sure that you put both of your names in the Javadoc comments at the top of your MovieLensAnalyzer.java file so I know who you worked with. Your code will be graded on the following: