Exercise #5: Using the PRISM algorithm to learn decision rules
You will need the Iris data set for this assignment
(you may find it in the data folder of the WEKA installation).
The task of this assignment is to model the Iris data by using the PRISM decision rules algorithm
(the algorithm is not included in the default WEKA installation -- you will have to install it using the "Package manager", it's the "simpleEducationalLearningSchemes" package).
You shall then compare the results produced by the PRISM algorithm with those produced by two other well-known machine learning algorithms -- J48 and OneR.
In this assignment you shall also learn how to use the FilteredClassifier in WEKA.
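For orientation, the covering strategy behind PRISM can be sketched in a few lines of Python. This is only a minimal illustration, not WEKA's implementation: the function names (prism, matches), the tie-breaking and the tiny data set below are assumptions made up for this sketch. For each class, the sketch grows a rule by repeatedly adding the attribute=value test with the highest accuracy p/t, until the rule covers only instances of that class; it then removes the covered instances and repeats.

```python
def matches(conditions, row):
    """True if the row satisfies every attribute=value test of a rule."""
    return all(row[a] == v for a, v in conditions.items())

def prism(rows):
    """Learn PRISM-style rules; the class label is the last field of each row."""
    n_attrs = len(rows[0]) - 1
    rules = []
    for cls in sorted({r[-1] for r in rows}):
        remaining = list(rows)  # every class starts from the full training set
        while any(r[-1] == cls for r in remaining):
            conditions, covered = {}, remaining
            # Add tests until the rule is perfect or no attributes are left.
            while any(r[-1] != cls for r in covered) and len(conditions) < n_attrs:
                best = None  # (accuracy p/t, positives p, attribute, value)
                for a in range(n_attrs):
                    if a in conditions:
                        continue
                    for v in sorted({r[a] for r in covered}):
                        subset = [r for r in covered if r[a] == v]
                        p = sum(1 for r in subset if r[-1] == cls)
                        cand = (p / len(subset), p, a, v)
                        # Prefer higher accuracy, then higher coverage.
                        if best is None or cand[:2] > best[:2]:
                            best = cand
                _, _, a, v = best
                conditions[a] = v
                covered = [r for r in covered if r[a] == v]
            rules.append((conditions, cls))
            # Drop the instances covered by the new rule.
            remaining = [r for r in remaining if not matches(conditions, r)]
    return rules

# Made-up nominal data (attribute 0, attribute 1, class); not the real Iris data.
rows = [
    ("short", "narrow", "setosa"),
    ("short", "wide", "setosa"),
    ("medium", "narrow", "versicolor"),
    ("medium", "wide", "versicolor"),
    ("long", "wide", "virginica"),
    ("long", "narrow", "versicolor"),
]
rules = prism(rows)
for conditions, cls in rules:
    print(conditions, "->", cls)
```

Note that the sketch accepts only nominal values, which is the same restriction you will run into in Step #1.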
Step #1:
Open the iris.arff file in WEKA (no preprocessing is needed) and proceed to the "Classify" tab.
Select the PRISM classifier (Choose → rules → Prism).
Leave the "Test options" on the default value (10-fold cross-validation).
Why is Prism "grayed out", and why can we not click the "Start" button?
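The 10-fold cross-validation selected under "Test options" splits the 150 Iris instances into 10 test folds; each fold is used once for testing while the other nine are used for training. A minimal sketch of such a split follows. The function name k_fold_indices and the seed are assumptions of this sketch, and unlike WEKA it does not stratify the folds by class.

```python
import random

def k_fold_indices(n_instances, k=10, seed=1):
    """Split the instance indices 0..n-1 into k disjoint test folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

folds = k_fold_indices(150, k=10)
for fold_no, test_idx in enumerate(folds, start=1):
    # All instances outside the current test fold form the training set.
    train_idx = [i for f in folds if f is not test_idx for i in f]
    print(fold_no, len(train_idx), len(test_idx))
```

Because the folds depend on how the data are shuffled, the reported accuracy is an estimate that can change slightly with a different seed.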
Step #2:
The problem is that PRISM works only with nominal attributes and the Iris data contain 4 numeric attributes.
We will need to discretize the data, but this time we shall use the FilteredClassifier to do this.
So, choose the FilteredClassifier in WEKA (Choose → meta → FilteredClassifier).
Open the "Parameters window" and change the "classifier" to Prism (and leave the "filter" on Discretize).
Start the algorithm. WEKA should generate a classifier with 16 rules and a classification accuracy of 86%.
Save this "textual" output of WEKA in the file named "PRISM.txt"
(you can do this using classic copy-paste or by right-clicking on the result in the result list and "Save result buffer").
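The idea behind the FilteredClassifier in Step #2 is that the filter is built from the training data only and is then applied unchanged to the test data. The sketch below illustrates this with simple equal-width binning. This is an assumption chosen for brevity: WEKA's Discretize filter may pick its cut points differently, and the names fit_equal_width and to_bin as well as the numbers are hypothetical.

```python
from bisect import bisect_right

def fit_equal_width(train_values, n_bins=3):
    """Learn cut points from the training values only."""
    lo, hi = min(train_values), max(train_values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def to_bin(value, cut_points):
    """Map a numeric value to a nominal bin label using fixed cut points."""
    return f"bin{bisect_right(cut_points, value) + 1}"

train = [1.0, 1.5, 3.0, 4.0, 5.5, 7.0]  # made-up values, not the real Iris data
cuts = fit_equal_width(train)
print(cuts)                              # [3.0, 5.0]
print([to_bin(v, cuts) for v in train])  # ['bin1', 'bin1', 'bin2', 'bin2', 'bin3', 'bin3']
print(to_bin(8.2, cuts))                 # unseen test value -> 'bin3'
```

The test value 8.2 lies outside the training range, yet it is still mapped with the cut points learned from the training data; the test data never influence the filter.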
Step #3:
Repeat Step #2, but this time set the "classifier" parameter in the FilteredClassifier back to the default J48.
Start the algorithm. WEKA should generate a classifier with just 3 rules and a classification accuracy of 93.33%.
Save the "textual" output of WEKA in the file named "J48filtered.txt".
Step #4:
The result that WEKA generates in Step #3 is a one-level decision tree, much like a model that the OneR algorithm could have generated.
So, let's run the OneR algorithm on this data. WEKA should generate a classifier with 3 rules and a classification accuracy of 92%.
The model is identical to the one produced in Step #3 -- the difference in classification accuracy is due to the cross-validation process.
Save the "textual" output of WEKA in the file named "OneR.txt".
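To see why OneR produces a one-level model, here is a minimal sketch of the idea for nominal attributes: for every attribute, predict the majority class of each attribute value, count the training errors, and keep the attribute with the fewest errors. The function name one_r and the tiny data set are assumptions of this sketch; WEKA's OneR additionally handles numeric attributes, which is omitted here.

```python
from collections import Counter, defaultdict

def one_r(rows):
    """Return (attribute index, value -> class rule, training errors)."""
    n_attrs = len(rows[0]) - 1
    best = None
    for a in range(n_attrs):
        # Count the class labels seen for each value of attribute a.
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[a]][r[-1]] += 1
        # Each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(1 for r in rows if rule[r[a]] != r[-1])
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

# Made-up nominal data (attribute 0, attribute 1, class); not the real Iris data.
rows = [
    ("short", "narrow", "setosa"),
    ("short", "wide", "setosa"),
    ("medium", "narrow", "versicolor"),
    ("medium", "wide", "versicolor"),
    ("long", "wide", "virginica"),
    ("long", "narrow", "versicolor"),
]
attr, rule, errors = one_r(rows)
print(attr, rule, errors)
```

On this toy data the sketch selects attribute 0 with one training error, that is, a single attribute with one rule per value.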
Step #5:
What if we run the J48 algorithm directly, without the FilteredClassifier?
So, let's run the J48 algorithm on the original (non-discretized) data. WEKA should generate a classifier with 5 rules and a classification accuracy of 96%.
Save the "textual" output of WEKA in the file named "J48.txt".
Pack all 4 TXT files into a single ZIP file named "PRISM-<SurnameName>.zip"
(example: PRISM-KavsekBranko.zip) and submit it here!