Artificial Intelligence Homework 5
Due date: 4/12 in class


Decision Trees

For this assignment, you will implement a decision tree that classifies a member of the House of Representatives as a Democrat or a Republican.

Starter Files:
You will need two files for this assignment:

  • The data file is voting-data.tsv. This file contains the voting record on 10 different issues for 430 members of the House of Representatives. Each row corresponds to one member and has the format:
    Rep-6    D    -++---+++.

    where the first token is a label, the second token is the member's party affiliation (D for Democrat and R for Republican), and the last token is the voting record. A plus (+) means a vote of "yea", a minus (-) means a vote of "nay", and a period (.) means the member did not vote. Tokens are separated by tabs.

  • The DecisionTree.java file, which contains a single private method for reading in the data from the data file.
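
To make the row format concrete, here is a minimal sketch of splitting one row into its three tokens. It is independent of the reading method already provided in DecisionTree.java; the class name VotingRecordLineDemo and the method parseRow are hypothetical names chosen for illustration, and treating the first character of the voting record as issue 0 is an assumption.

```java
public class VotingRecordLineDemo {
    /** Splits one tab-separated row into {label, party, votes}. */
    static String[] parseRow(String line) {
        String[] tokens = line.split("\t");
        if (tokens.length != 3) {
            throw new IllegalArgumentException("Expected 3 tab-separated tokens: " + line);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The sample row from the format description above.
        String[] row = parseRow("Rep-6\tD\t-++---+++.");
        String votes = row[2];
        // Assumption: issues are indexed from 0, left to right.
        System.out.println(row[0] + " party=" + row[1]
                + " vote on issue 0: " + votes.charAt(0));
    }
}
```

Keeping the voting record as a String means a vote can be looked up with charAt without any further parsing.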

Java Classes:
For this assignment, you should add a main method to the DecisionTree class in DecisionTree.java that, when executed, performs the following two tasks:

  1. Builds and prints the decision tree that results when there is no test set, that is, when all 430 examples are used for training. Build the tree using information gain to choose the attribute to split on. After the tree is built, print it to the screen using a depth-first traversal in which each level is indented (i.e., use tabs to indent). For example, a small decision tree might look like:

    where the first node checks issue 5. If the member voted "yea", then check issue 8. If the member voted "nay", then check issue 4. If the member did not vote, then classify them as a Democrat.

  2. Next, you'll evaluate the accuracy of this approach. Iterate over the data set, pulling out 43 examples at a time to be a test set. Use the remaining examples to build the tree. Then evaluate the performance of your decision tree on the test set.

    Since the test set contains 43 examples (and there are 430 examples altogether), you should do this 10 times. In the first iteration, your test set should consist of examples 0 through 42. In the second iteration, your test set should consist of examples 43 through 85, and so on.

    Average your accuracy over all 10 test sets and print this as a percentage to the screen. Please do not print the decision trees themselves!
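
For the attribute selection in task 1, information gain is the entropy of the party labels before a split minus the weighted average entropy of the branches after it. The sketch below shows one possible way to compute it. The class name InfoGainSketch, the method names, and the parallel-array layout (parties[i] with votes[i]) are assumptions made for illustration, not part of the starter code.

```java
public class InfoGainSketch {
    /** Entropy in bits of a two-class sample with the given counts. */
    static double entropy(int democrats, int republicans) {
        int total = democrats + republicans;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int count : new int[] {democrats, republicans}) {
            if (count > 0) {
                double p = (double) count / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /**
     * Information gain from splitting on one issue.
     * parties[i] is 'D' or 'R'; votes[i] is that member's voting record,
     * with each character assumed to be '+', '-', or '.'.
     */
    static double informationGain(char[] parties, String[] votes, int issue) {
        String values = "+-.";
        int[][] counts = new int[3][2]; // [vote value][0 = D, 1 = R]
        int totalD = 0, totalR = 0;
        for (int i = 0; i < parties.length; i++) {
            int v = values.indexOf(votes[i].charAt(issue));
            int c = parties[i] == 'D' ? 0 : 1;
            counts[v][c]++;
            if (c == 0) totalD++; else totalR++;
        }
        // Weighted average entropy of the three branches.
        double remainder = 0.0;
        for (int[] branch : counts) {
            int n = branch[0] + branch[1];
            remainder += (double) n / parties.length * entropy(branch[0], branch[1]);
        }
        return entropy(totalD, totalR) - remainder;
    }

    public static void main(String[] args) {
        // Tiny made-up sample with two issues, for illustration only.
        char[] parties = {'D', 'D', 'R', 'R'};
        String[] votes = {"++", "+-", "-+", "--"};
        System.out.println("gain on issue 0: " + informationGain(parties, votes, 0));
        System.out.println("gain on issue 1: " + informationGain(parties, votes, 1));
    }
}
```

In the made-up sample, issue 0 separates the two parties perfectly while issue 1 tells you nothing, so a tree builder would split on issue 0 first.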

Feel free to create other Java classes as necessary. Think carefully about the time/space complexity of your code. In particular, make sure you're not copying the actual data set again and again and again.
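
One way to avoid copying the data set is to leave the examples in a single array and pass around only indices. The sketch below illustrates that idea for the ten test sets described above. The class name FoldSketch and its methods are hypothetical, and other designs (for example, passing a start and end index to the tree builder) would work as well.

```java
public class FoldSketch {
    /** True if example index i belongs to the test set of the given fold. */
    static boolean inTestFold(int i, int fold, int foldSize) {
        return i >= fold * foldSize && i < (fold + 1) * foldSize;
    }

    /**
     * Indices of the training examples for one fold. Only int indices are
     * allocated; the voting records themselves are never copied.
     */
    static int[] trainingIndices(int numExamples, int fold, int foldSize) {
        int[] indices = new int[numExamples - foldSize];
        int next = 0;
        for (int i = 0; i < numExamples; i++) {
            if (!inTestFold(i, fold, foldSize)) {
                indices[next++] = i;
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        int numExamples = 430, foldSize = 43;
        for (int fold = 0; fold < numExamples / foldSize; fold++) {
            int[] train = trainingIndices(numExamples, fold, foldSize);
            int first = fold * foldSize;
            int last = (fold + 1) * foldSize - 1;
            System.out.println("fold " + fold + ": test " + first + ".." + last
                    + ", training size " + train.length);
        }
    }
}
```

Each fold then costs one small int array rather than a fresh copy of the training examples.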


Submission Instructions

Your decision tree code should be submitted via Moodle. To submit, please rename your directory hw5_FirstName_LastName before you zip the directory.