Exercise #7: Linear regression, the kNN algorithm
You will need the data sets abalone.arff, breast-cancer.arff and cpu.arff for this assignment.
In this assignment you will try out the WEKA implementations of two algorithms: linear regression and k-nearest neighbors (kNN).
Step #1 (linear regression):
Open the abalone.arff file in WEKA (no preprocessing is needed) and proceed to the "Classify" tab.
Choose the LinearRegression classifier under "functions" and set the "attributeSelectionMethod" parameter to "No attribute selection".
Run the algorithm with the default test option (10-fold cross-validation).
Observe the regression model, which is printed as a linear function of the attributes. Also note the correlation coefficient.
Save the "Classifier output" in the form of a text file (name it "LinearRegression-abalone.txt").
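To see what the linear function and the correlation coefficient in Step #1 mean, here is a minimal Python sketch, not WEKA's implementation. It assumes a single input attribute and made-up toy data (not taken from abalone.arff); the function names fit_line and correlation are hypothetical. Unlike WEKA's cross-validated figure, the correlation here is computed on the training data itself.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Made-up toy data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
predicted = [slope * x + intercept for x in xs]
print(f"y = {slope:.3f} * x + {intercept:.3f}")
print(f"correlation coefficient = {correlation(ys, predicted):.4f}")
```

A correlation coefficient close to 1 means the predicted values follow the actual values closely; a value near 0 means the model explains little.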
Step #2 (k-NN nominal):
Now, open the breast-cancer.arff file in WEKA (no preprocessing is needed) and proceed to the "Classify" tab.
Choose the IBk classifier under "lazy" and run the algorithm with the default test option (10-fold cross-validation) on the loaded data.
Make note of the classification accuracy.
Now change the "KNN" parameter of the algorithm from its default value of 1 to 3.
Re-run the algorithm on the data and make note of the classification accuracy again.
Once again, repeat the procedure with KNN = 5 and make note of the classification accuracy.
Save the "Classifier output" of the best run (the one with the highest classification accuracy) in the form of a text file (name it "kNN-bc.txt").
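The effect of the KNN parameter in Step #2 can be illustrated with a small Python sketch of kNN on nominal attributes. It is a simplified illustration under stated assumptions, not IBk itself: the distance is a plain count of mismatching attribute values, ties are broken by training order, the data are made up (not from breast-cancer.arff), and the accuracy is measured on a tiny fixed test set rather than by cross-validation. All function names are hypothetical.

```python
from collections import Counter

def overlap_distance(a, b):
    """Number of attribute positions where two instances differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def knn_classify(train, query, k):
    """Predict the majority class among the k training instances nearest to query.

    train is a list of (attributes, label) pairs. sorted() is stable, so
    instances at equal distance keep their training order.
    """
    neighbours = sorted(train, key=lambda inst: overlap_distance(inst[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test instances whose predicted label equals the true label."""
    correct = sum(1 for attrs, label in test if knn_classify(train, attrs, k) == label)
    return correct / len(test)

# Made-up toy data, for illustration only
train = [
    (("red", "small", "round"), "pos"),
    (("red", "small", "square"), "pos"),
    (("red", "large", "round"), "pos"),
    (("blue", "large", "square"), "neg"),
    (("blue", "large", "round"), "neg"),
    (("blue", "small", "square"), "neg"),
]
test = [
    (("red", "large", "square"), "pos"),
    (("blue", "small", "round"), "neg"),
]

for k in (1, 3, 5):
    print(f"KNN = {k}: accuracy = {accuracy(train, test, k):.2f}")
```

Even on this tiny example the accuracy changes with k, which is why the exercise asks you to compare several values.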
Step #3 (k-NN numeric):
Now, open the cpu.arff file in WEKA (no preprocessing is needed) and proceed to the "Classify" tab.
Repeat the same procedure as in Step #2 (using the IBk classifier with KNN = 1, 3 and 5).
Save the "Classifier output" of the "best run" in the form of a text file (name it "kNN-cpu.txt").
Note that when the class attribute is numeric, as in cpu.arff, WEKA does not report a classification accuracy.
Use the correlation coefficient instead to compare the runs: the run with the highest correlation coefficient is the "best run".
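For a numeric class, kNN predicts a number instead of a label. The following Python sketch shows one common way to do this, assuming the prediction is the unweighted mean of the k nearest neighbours' target values and that runs are compared by the correlation between actual and predicted values. It is an illustration only: the data are made up (not from cpu.arff), attributes are not normalized, evaluation is leave-one-out rather than 10-fold cross-validation, and the function names are hypothetical.

```python
import math

def knn_predict(train, query, k):
    """Average the target values of the k training instances nearest to query."""
    neighbours = sorted(train, key=lambda inst: math.dist(inst[0], query))[:k]
    return sum(target for _, target in neighbours) / k

def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Made-up toy data: (numeric attributes, numeric target)
data = [
    ((1.0, 1.0), 10.0), ((2.0, 1.5), 14.0), ((3.0, 2.0), 19.0), ((4.0, 3.5), 26.0),
    ((5.0, 4.0), 30.0), ((6.0, 5.5), 37.0), ((7.0, 6.0), 41.0), ((8.0, 7.5), 48.0),
]

for k in (1, 3, 5):
    actual, predicted = [], []
    for i, (attrs, target) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # leave the current instance out
        actual.append(target)
        predicted.append(knn_predict(rest, attrs, k))
    print(f"KNN = {k}: correlation coefficient = {correlation(actual, predicted):.4f}")
```

The value of k with the highest correlation coefficient would be the "best run" in this toy setting.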
Pack all three saved files into a single ZIP file, name it "LinearRegression+kNN-<SurnameName>.zip"
(example: LinearRegression+kNN-KavsekBranko.zip) and submit it here.
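If you prefer to build the archive with a script rather than by hand, a minimal Python sketch could look like this. It assumes the three output files sit in one folder under exactly the names given in the steps above; the function name pack_outputs and its parameters are hypothetical.

```python
import zipfile
from pathlib import Path

# File names as specified in Steps #1 to #3
OUTPUT_FILES = ["LinearRegression-abalone.txt", "kNN-bc.txt", "kNN-cpu.txt"]

def pack_outputs(folder, surname_name):
    """Pack the three saved output files from folder into the submission ZIP."""
    folder = Path(folder)
    archive = folder / f"LinearRegression+kNN-{surname_name}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in OUTPUT_FILES:
            # arcname keeps the files at the top level of the archive
            zf.write(folder / name, arcname=name)
    return archive
```

For example, pack_outputs(".", "KavsekBranko") would create LinearRegression+kNN-KavsekBranko.zip in the current folder, provided the three text files are already there.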
- 31 May 2022, 20:24