Meta-Knowledge as an Engine in Classifier Combination

Fredrick Edward Kitoogo, Department of Computer Science, Faculty of Computing and IT, Makerere University
Venansius Baryamureeba, Department of Computer Science, Faculty of Computing and IT, Makerere University

The use of classifier combination has taken center stage in machine learning research, where the outputs of different classifiers are combined in various ways to achieve better performance than that of any of the base classifiers involved in the combination. Much research, however, has not empirically justified the choice of the participating classifiers in a combination. This work looks at the use of meta-knowledge that expresses the performance of each learning method on diverse domains to choose the learning algorithms most suited for a combination in particular domains. The meta-knowledge considered in this work is limited to the characteristics of the data involved. The approach works by having a learning algorithm learn and describe how the data characteristics and the combined learning algorithms relate. Given a new learning domain, the domain characteristics are measured and, from the induced relationship, a selection of the most suitable base algorithms for combination is made. The results of this work provide a fundamental step toward a cohesive framework for classifier combination.

Categories and Subject Descriptors: I.5.2 [Pattern Recognition]: Design Methodology, classifier design and evaluation; I.5.3 [Pattern Recognition]: Clustering, algorithms; I.2.7 [Artificial Intelligence]: Natural Language Processing, language parsing and understanding

General Terms: Computer Science, Language Processing

Additional Key Words and Phrases: classifier combination, clustering, machine learning, meta-knowledge

IJCIR Reference Format: Fredrick Edward Kitoogo and Venansius Baryamureeba. Meta-Knowledge as an engine in Classifier Combination.
International Journal of Computing and ICT Research, Vol. 1, No. 2, pp. 74-86. http://www.ijcir.org/volume1-number2/article8.pdf.

1. INTRODUCTION

Classification is the process of grouping information into predetermined classes and categories of the same type. A classifier is the computer-based agent that performs the classification. Classifiers fall into two broad categories: rule-based classifiers and computational-intelligence-based classifiers. Whereas rule-based classifiers are in general constructed by the designer, who defines the rules for the interpretation of detected inputs,

Authors' addresses: Fredrick Edward Kitoogo, Department of Computer Science, Faculty of Computing and IT, Makerere University, P.O. Box 7062, Kampala, Uganda, fkitoogo@cit.mak.ac.ug, www.cit.ac.ug; Venansius Baryamureeba, Department of Computer Science, Faculty of Computing and IT, Makerere University, P.O. Box 7062, Kampala, Uganda, barya@cit.mak.ac.ug, www.cit.ac.ug.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than IJCIR must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. (c) International Journal of Computing and ICT Research 2007. International Journal of Computing and ICT Research, ISSN 1818-1139 (Print), ISSN 1996-1065 (Online), Vol. 1, No. 2, pp. 74-86, December 2007.
in computational-intelligence-based classifiers the designers only define a basic framework for the interpretation of data, and the learning algorithms within such systems are responsible for the generation of rules [Ranawana and Palade 2006]. Many computing arenas, such as text categorization, named entity recognition, data mining, and word sense disambiguation, employ computational-intelligence classification in one way or another. There are various machine learning algorithms that can be used for classification; a list of the most commonly used algorithms with brief descriptions can be obtained from the Wikipedia website [2007]. The trend that has emerged in classification is classifier combination, where different classifiers are integrated in various ways to create a system that outperforms the best individual classifiers [Kozareva et al. 2005]. The motivation for combining classifiers is that combinations are often much more accurate than the individual classifiers that make them up [Dzeroski and Zenko 2004; Bennett et al. 2005]. Much of the research on classifier combination has achieved good performance; however, little has been explained about the choice of the participants in a combination for a particular dataset or domain. Some research on cataloging metadata for machine learning research and applications has been attempted by Cunningham [1997] but is limited to single algorithms; there is a desire to have such research extended to classifier combination.
This work, which tackles the case of discovering meta-knowledge for classifier combination, is motivated by the no free lunch theorem [Wolpert and Macready 1997], which implies that if one algorithm A is better than another algorithm B over a certain problem space, then algorithm B can likewise be better than algorithm A over another problem space; this theorem backs up the intuition that there is some meta-knowledge about specific problem spaces that could assist in improving performance when classifiers are combined. The work of Kitoogo and Barya [2007], in which the question of determining the choice of base classifiers to participate in a combination was left open, is another motivator for this work. It is desirable to know beforehand which machine learning algorithms (algorithm types) combine best for a particular task. Besides the fact that different types of tasks ask for different types of solutions, it is difficult to determine beforehand which type of combination solution best suits a particular task. These observations point to the need for an empirical approach. In this work we take an empirical approach and perform a range of experiments to investigate this question. The work handles the task of associating algorithm type combinations with data set characteristics by first categorizing the learning algorithm combinations and building a list of datasets, their characteristics, and the combination performance of the different sets, then (1) applying the different learning algorithm combinations to the different categorized data sets to generate a meta data set from the outputs, and (2) applying unsupervised clustering analysis to the meta data set to analyze the generated clusters and determine whether any patterns exist. The remainder of the paper is structured as follows: in Section 2, related work is briefly reviewed; in Section 3, we outline the proposed method.
We describe the supervised learning and unsupervised learning procedures of the method; in Section 4 the results of the conducted experiments are presented. Finally, Section 5 closes with a conclusion and an outlook on future work.

2. RELATED WORK

Various researchers have investigated the use of multiple classifiers together with different combination methods for classification problems. Some research on classifier combination is done by generating a classifier combination using a single learning algorithm, often referred to as homogeneous combination [Dzeroski and Zenko 2004]. This is normally done by manipulating the training set (as done in boosting or bagging), manipulating the input features, manipulating the output targets, or injecting randomness into the learning algorithms. The generated classifiers are then typically combined by majority or weighted voting. Other approaches, like the heterogeneous one, which apply different learning algorithms to a single dataset, have been proposed by Merz [1999]. In his work on techniques for combining multiple classifiers, Alpaydin [1998] employed the strategy of varying the input representation for the different base classifiers so as to build complementarity between them; experiments conducted on some real-world data sets indicated that improved accuracy can be achieved without necessarily increasing overall computer resource usage. In their work on data-driven part-of-speech tagging, De Pauw et al. [2006] experimented with improving the performance of individual taggers on the Kiswahili language by combining them into a committee of taggers using, among others, different voting schemes: plurality voting, majority voting, and weighted voting. Their experiments revealed that plurality voting outperformed the more elaborate scheme of weighted voting.
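The three voting schemes just mentioned differ only in how the base classifiers' votes are tallied. The following is a minimal sketch in Python; the helper names are illustrative and not taken from any of the cited works:

```python
from collections import Counter

def plurality_vote(labels):
    """Plurality voting: the label with the most votes wins (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def majority_vote(labels):
    """Majority voting: a label wins only with strictly more than half the votes."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

def weighted_vote(labels, weights):
    """Weighted voting: each classifier's vote counts in proportion to its weight."""
    scores = {}
    for label, w in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)
```

For three base classifiers predicting ["a", "b", "b"], plurality and majority voting both return "b", but a weighting of [0.9, 0.3, 0.3] would let the single more reliable classifier override the pair, which is the kind of behavior De Pauw et al. found did not pay off in practice.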
Seldom have researchers examined the use of meta-knowledge to determine which classifier combination sets perform best for particular domains or problem settings. Cornelson et al. [2002] conducted some experiments on combining families of information retrieval algorithms using meta-learning; the different algorithms were obtained by varying the normalization and similarity functions. Bennett et al. [2005] introduce a probabilistic method for combining classifiers that considers the context-sensitive reliabilities of the contributing classifiers. Their method harnesses variables that provide signal about the performance of classifiers in different situations, which they refer to as reliability indicators. Some work on finding associations between classification algorithms and characteristics of data sets was done by Ibrahim [1999]; that work was, however, limited to single individual classification algorithms. Okamoto and Yugami [2003] also investigated the effect of domain characteristics on instance-based learning algorithms; their work specifically targeted k-nearest neighbor classification accuracy as a function of the domain characteristics, and again considered only a certain class of algorithms. Ranawana and Palade [2006] conducted an extensive review of different methods and approaches of multi-classifier systems and summarized by pointing out that the success of a multi-classifier system depends extensively on three key features: proper selection of the participating classifiers, topology, and combinatorial methodology. The proper selection of the participating classifiers strengthens the desire for investigation into meta-knowledge.

Fig. 1. The model

3.
THE MODEL APPROACH

The model, as shown in Figure 1, aims at finding associations between combined classification algorithms and the characteristics of different data sets in a three-step process:

(1) Build a file of data set names and their characteristics, and a file of names of algorithms indicating the algorithm types, together with the different possible algorithm combination types.

(2) Subject the different data sets to the range of classifier combination sets and measure the performance on each of the data sets, ultimately building a meta data set out of the different algorithm combination types, the different data characteristics, and the related performance rates.
(3) Apply an unsupervised clustering algorithm to the file built in step 2 and perform clustering analysis to determine any significant patterns.

3.1 Data Sets and Characteristics

We use a set of benchmark data sets (as shown in Table I) from the UCI Machine Learning Repository [2006] that are used frequently in the field of machine learning research.

3.2 Supervised Learning Algorithms

Supervised techniques involve the participation of an expert, who presents the algorithm with a set of inputs and informs the algorithm of the category or class to which each should be assigned; in this work, the data sets are already in the desired format for training. Supervised learning algorithms can ideally be grouped into two broad categories: eager learners and lazy learners. Eager learners put most of their time and effort into the learning phase, while lazy learners divert their effort to the classification phase [Aha 1997]. Table I above shows the different base algorithms that were used in the classifier combination strategy.

3.3 Classifier Combination

Table I above also shows the different classifier combination sets of three that were generated out of the five base learning algorithms. Each of the different combinations is used to generate a classifier over the different data sets using plurality voting, a choice mainly motivated by the work of De Pauw et al. [2006] and that of Kitoogo and Barya [2007], in
which it is shown that though plurality voting is a simple combination methodology, it has considerably high accuracy performance.

3.4 Clustering Data Generation

Once the data sets, learning algorithms, and combination methodology to use for the experiments have been decided, the clustering data generation is conducted. This is done by running the different algorithms together with the combination algorithm on all the data sets. Default settings are used for all the algorithms. 10-fold cross-validation is used for testing the different combination sets.

3.5 Clustering Analysis

After the data for clustering has been generated, the data is grouped into three groups: (i) high accuracy, (ii) moderate accuracy, and (iii) low accuracy. For the high accuracy group an unsupervised learning algorithm (k-means clustering) is used to group the data into clusters of records (depending on the useful patterns).

4. EXPERIMENTS AND RESULTS

Running the various plurality voting classifiers of three base learning algorithms, generated from combinations of five algorithms, on the data sets resulted in a data set that was split into three groups (bad accuracy performance, medium accuracy performance, and good accuracy performance), shown in Tables IV to VIII; this is done by dividing the range between the lowest accuracy figure and the highest accuracy figure into three equal intervals. Since the main goal is to attach good accuracy performance to the characteristics of specific data set types, the good accuracy data is the one on which cluster analysis is performed in the experiments. The results from the bad accuracy data indicate that all algorithm combinations performed badly with a high number of classes; this is further exposed even in the medium and good accuracy data sets. Generally, as the number of classes decreased, the accuracy performance improved.

4.0.1 Results from the k-means clustering.
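The paper does not state which k-means implementation was used, so the following is only an illustrative sketch of the clustering step: a minimal k-means over meta-data records of the form (accuracy, #instances, #attributes, #classes). Note that on raw values the #instances column (range 106 to 2201) dominates the Euclidean distance, which mirrors the first-run finding below that the number of instances was the only prominent variable; for the second run that column is simply dropped.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means: returns a cluster label for each point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
        # update step: move each center to the mean of its members
        for j in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Illustrative meta-data records: (accuracy, #instances, #attributes, #classes)
records = [(0.93, 435, 16, 2), (0.94, 569, 20, 2), (0.79, 2201, 3, 2), (0.86, 1728, 6, 4)]
run1 = kmeans(records, 2)                     # raw values: #instances dominates
reduced = [(r[0], r[2], r[3]) for r in records]
run2 = kmeans(reduced, 2)                     # second run: dominant column dropped
```

Dropping the dominant column, as done in the paper, is one way to let the remaining variables (here #attributes and #classes) shape the clusters; standardizing all columns would be an alternative design choice.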
Preliminary experimental runs of the k-means clustering algorithm suggested between 3 and 6 clusters, so the number of clusters was set to 4. The first run of clustering clearly indicates that the number of instances was the only prominent variable used in determining the clusters, as shown in the round 1 clustering.

Table II. Summary statistics of the clustering data

First Run
Variable Name  | Obs | Mean   | Min   | Max   | Std Dev | Signf
Accuracy       | 112 | 0.860  | 0.748 | 0.961 | 0.072   | 0
Instances      | 112 | 799    | 106   | 2201  | 616.155 | 1
Attributes     | 112 | 17.911 | 3     | 57    | 14.366  | 0
Classes        | 112 | 2.321  | 2     | 4     | 0.738   | 0

Second Run
Variable Name  | Obs | Mean   | Min   | Max   | Std Dev | Signf
Accuracy       | 112 | 0.860  | 0.748 | 0.961 | 0.072   | 0
Attributes     | 112 | 17.911 | 3     | 57    | 14.366  | 1
Classes        | 112 | 2.321  | 2     | 4     | 0.738   | 0
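The equal-width, three-way split of the accuracy range used at the start of Section 4 can be sketched as follows. The exact boundary values are not reported in the paper, so the thresholds here simply follow the stated rule of dividing the range between the lowest and highest observed accuracies into three equal intervals:

```python
def accuracy_group(acc, lo, hi):
    """Assign an accuracy to bad / medium / good by equal-width thirds of [lo, hi]."""
    width = (hi - lo) / 3.0
    if acc < lo + width:
        return "bad"
    if acc < lo + 2 * width:
        return "medium"
    return "good"

# Observed extremes from the generated data (Tables IV and V)
lo, hi = 0.351, 0.961
```

With these extremes the cut points fall near 0.554 and 0.758, so, for example, the 0.472 run from Table IV lands in the bad group and the 0.663 run from Table VI in the medium group.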
Round 1 clustering (central values per cluster):

Cluster (Combination)    | Central Accuracy | Central Attribs | Central Classes
1 (Bayes C45Tree KNN)    | 0.928            | 16              | 2
2 (Bayes C45Tree Rule)   | 0.858            | 6               | 4
3 (Bayes KNN Rule)       | 0.762            | 57              | 2
4 (Bayes C45Tree KNN)    | 0.923            | 32              | 2

Summary statistics are shown in Table II. The significance of the other variables can be exposed by excluding the prominent variable (number of instances) from the subsequent clustering experimental runs. The next-level runs expose the significance of the other variables, which were overshadowed by the number of instances in the first run. The second run of clustering reveals that the number of attributes is the greatest determinant of the clusters, as shown in Tables IX, X, XI and XII. Analysis shows that Bayes combines well with KNN for a moderately low number of classes; combination with the rule-based algorithm brings down the accuracy performance whether the number of attributes is low or high, and combination with the decision trees generally has no big influence on a combination's accuracy performance. As seen in Tables IX, X, XI and XII, most of the observations fall into clusters 1 and 2, dominated by the Bayes/KNN combinations; the highest accuracies and central accuracies also reside in these clusters. It can also be observed that the Bayes/KNN combination is brought down when the number of attributes increases and also when combined with the rule-based algorithm. It can further be clearly noted that a combination that includes C45Tree/Tree does not bring any significant change in the accuracy performance.

5. CONCLUSION AND FUTURE WORK

The first discovery from the division of the generated clustering data into groups was that, irrespective of the combination set, the worst accuracy performance was exhibited on data sets with a large number of classes for prediction. The implication is that the higher the number of classes for prediction in a data set, the more difficult it is to learn from that data set.
Because the number of instances was the only prominent variable before its exclusion from the clustering experiments, where the other variables emerged as significant, the number of instances was not useful in determining the eventual clusters, subsequently rendering it not very relevant meta-knowledge for classifier combination. The main finding is that the number of attributes in a data set emerges as the most significant determinant of performance for the various classifier combinations. The results suggest that the combination of the Bayesian and k-nearest neighbor algorithms yields substantially good performance; however, the performance drops as the number of classes grows. The results further expose that combinations that included the rule-based algorithm performed comparatively badly. Another finding was that combinations with the two decision tree methods did not significantly change the resultant performance of a combination set. For future work, evaluation of the findings from the clustering analysis will be conducted and compared with other combination sets without a priori clustering information. In search of more pragmatic results, experiments will in future be conducted using domain-specific real-world data sets, and the range of data characteristics will be extended in order to discover more meaningful patterns.
References

AHA, D. 1997. Special issue on lazy learning. Artificial Intelligence Review 11, 1-5.
ALPAYDIN, E. 1998. Techniques for combining multiple learners. In Proceedings of Engineering of Intelligent Systems 2, 6-12.
BENNETT, P., DUMAIS, S., AND HORVITZ, E. 2005. The combination of text classifiers using reliability indicators. Information Retrieval 8, 67-100.
CORNELSON, M., GROSSMAN, R. L., KARIDI, G. R., AND SHNIDMAN, D. 2002. Combining families of information retrieval algorithms using meta-learning. In Survey of Text Mining: Clustering, Classification, and Retrieval, 159-169.
CUNNINGHAM, S. J. 1997. Dataset cataloging metadata for machine learning applications and research. In Proceedings of the Sixth International Workshop on AI and Statistics.
DE PAUW, G., DE SCHRYVER, G. M., AND WAGACHA, P. W. 2006. Data-driven part-of-speech tagging of Kiswahili. In Text, Speech and Dialogue, Lecture Notes in Computer Science 4188, Springer Berlin/Heidelberg, 197-204.
DZEROSKI, S. AND ZENKO, B. 2004. Is combining classifiers with stacking better than selecting the best one? Machine Learning 54, 255-273.
IBRAHIM, R. S. 1999. Data mining of machine learning performance data. Master's thesis, RMIT University.
KITOOGO, F. E. AND BARYAMUREEBA, V. 2007. On classifier combination for better named entity recognition. In Proceedings of the First International Computer Science and ICT Conference (COSCIT 2007), Nairobi, Kenya, Feb 2007.
KOZAREVA, Z., FERRANDEZ, O., MONTOYO, A., MUNOZ, R., AND SUAREZ, A. 2005. Combining data-driven systems for improving named entity recognition. In Proceedings of NLDB 3513, 80-90.
MERZ, C. J. 1999. Using correspondence analysis to combine classifiers. Machine Learning 36, 33-58.
OKAMOTO, S. AND YUGAMI, N. 2003. Effects of domain characteristics on instance-based learning algorithms. Theoretical Computer Science 298(1), 207-233.
RANAWANA, R. AND PALADE, V. 2006. Multi-classifier systems: Review and a roadmap for developers.
International Journal of Hybrid Intelligent Systems 3(1), 35-61.
UCI MACHINE LEARNING REPOSITORY. 2006. http://www.cs.uci.edu/mlearn/MLRepository.html, accessed 04/05/2006.
WIKIPEDIA. 2007. The Free Encyclopedia. http://en.wikipedia.org/wiki/category:Classification_algorithms, accessed 29/04/2007.
WOLPERT, D. H. AND MACREADY, W. G. 1997. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1, 67.

APPENDIX I

Table IV. Split generated data for clustering: bad accuracy performance

Combination         | Acc   | # Inst | # Attribs | # Classes
Bayes C45Tree KNN   | 0.472 | 339    | 17        | 21
Bayes C45Tree Rule  | 0.354 | 339    | 17        | 21
Bayes C45Tree Tree  | 0.484 | 339    | 17        | 21
Bayes KNN Rule      | 0.354 | 339    | 17        | 21
Bayes KNN Tree      | 0.448 | 339    | 17        | 21
Bayes Rule Tree     | 0.354 | 339    | 17        | 21
C45Tree KNN Rule    | 0.351 | 339    | 17        | 21
C45Tree KNN Tree    | 0.436 | 339    | 17        | 21
C45Tree KNN Tree    | 0.551 | 368    | 20        | 3
C45Tree Rule Tree   | 0.351 | 339    | 17        | 21
KNN Rule Tree       | 0.351 | 339    | 17        | 21

Table V. Split generated data for clustering: good accuracy performance

Combination         | Acc   | # Inst | # Attribs | # Classes
Bayes C45Tree KNN   | 0.771 | 148    | 18        | 4
Bayes C45Tree KNN   | 0.781 | 296    | 13        | 2
Bayes C45Tree KNN   | 0.825 | 2201   | 3         | 2
Bayes C45Tree KNN   | 0.834 | 977    | 14        | 2
Bayes C45Tree KNN   | 0.862 | 690    | 15        | 2
Bayes C45Tree KNN   | 0.876 | 106    | 57        | 2
Bayes C45Tree KNN   | 0.891 | 958    | 9         | 2
Bayes C45Tree KNN   | 0.923 | 351    | 32        | 2
Bayes C45Tree KNN   | 0.928 | 1728   | 6         | 4
Bayes C45Tree KNN   | 0.928 | 435    | 16        | 2
Bayes C45Tree KNN   | 0.944 | 569    | 20        | 2
Bayes C45Tree KNN   | 0.961 | 699    | 9         | 2
Bayes C45Tree Rule  | 0.748 | 148    | 18        | 4
Bayes C45Tree Rule  | 0.752 | 106    | 57        | 2
Bayes C45Tree Rule  | 0.761 | 690    | 15        | 2
Bayes C45Tree Rule  | 0.784 | 977    | 14        | 2
Bayes C45Tree Rule  | 0.789 | 2201   | 3         | 2
Bayes C45Tree Rule  | 0.858 | 1728   | 6         | 4
Bayes C45Tree Rule  | 0.885 | 958    | 9         | 2
Bayes C45Tree Rule  | 0.912 | 351    | 32        | 2
Bayes C45Tree Rule  | 0.936 | 435    | 16        | 2
Bayes C45Tree Rule  | 0.937 | 569    | 20        | 2
Bayes C45Tree Rule  | 0.941 | 699    | 9         | 2
Bayes C45Tree Tree  | 0.777 | 296    | 13        | 2
Bayes C45Tree Tree  | 0.783 | 2201   | 3         | 2
Bayes C45Tree Tree  | 0.827 | 977    | 14        | 2
Bayes C45Tree Tree  | 0.858 | 106    | 57        | 2
Bayes C45Tree Tree  | 0.862 | 690    | 15        | 2
Bayes C45Tree Tree  | 0.866 | 958    | 9         | 2
Bayes C45Tree Tree  | 0.926 | 351    | 32        | 2
Bayes C45Tree Tree  | 0.933 | 1728   | 6         | 4
Bayes C45Tree Tree  | 0.94  | 435    | 16        | 2
Bayes C45Tree Tree  | 0.944 | 569    | 20        | 2
Bayes C45Tree Tree  | 0.954 | 699    | 9         | 2

Table VI. Split generated data for clustering: medium accuracy performance

Combination         | Acc   | # Inst | # Attribs | # Classes
Bayes C45Tree KNN   | 0.663 | 368    | 20        | 3
Bayes C45Tree KNN   | 0.687 | 345    | 6         | 2
Bayes C45Tree KNN   | 0.724 | 286    | 9         | 2
Bayes C45Tree KNN   | 0.737 | 214    | 9         | 6
Bayes C45Tree Rule  | 0.566 | 214    | 9         | 6
Bayes C45Tree Rule  | 0.59  | 368    | 20        | 3
Bayes C45Tree Rule  | 0.623 | 345    | 6         | 2
Bayes C45Tree Rule  | 0.644 | 286    | 9         | 2
Bayes C45Tree Rule  | 0.659 | 296    | 13        | 2
Bayes C45Tree Tree  | 0.668 | 286    | 9         | 2
Bayes C45Tree Tree  | 0.674 | 368    | 20        | 3
Bayes C45Tree Tree  | 0.687 | 345    | 6         | 2
Bayes C45Tree Tree  | 0.714 | 214    | 9         | 6
Bayes C45Tree Tree  | 0.723 | 148    | 18        | 4
Bayes KNN Rule      | 0.575 | 214    | 9         | 6
Bayes KNN Rule      | 0.59  | 368    | 20        | 3
Bayes KNN Rule      | 0.623 | 345    | 6         | 2
Bayes KNN Rule      | 0.644 | 286    | 9         | 2
Bayes KNN Rule      | 0.659 | 296    | 13        | 2
Bayes KNN Tree      | 0.655 | 368    | 20        | 3
Bayes KNN Tree      | 0.675 | 286    | 9         | 2
Bayes KNN Tree      | 0.676 | 345    | 6         | 2
Bayes KNN Tree      | 0.706 | 214    | 9         | 6
Bayes Rule Tree     | 0.575 | 214    | 9         | 6
Bayes Rule Tree     | 0.59  | 368    | 20        | 3
Bayes Rule Tree     | 0.623 | 345    | 6         | 2
Bayes Rule Tree     | 0.644 | 286    | 9         | 2
Bayes Rule Tree     | 0.645 | 296    | 13        | 2
C45Tree KNN Rule    | 0.571 | 214    | 9         | 6
C45Tree KNN Rule    | 0.59  | 368    | 20        | 3
C45Tree KNN Rule    | 0.623 | 345    | 6         | 2
C45Tree KNN Rule    | 0.644 | 286    | 9         | 2
C45Tree KNN Rule    | 0.669 | 296    | 13        | 2
C45Tree KNN Tree    | 0.678 | 286    | 9         | 2
C45Tree KNN Tree    | 0.693 | 345    | 6         | 2
C45Tree KNN Tree    | 0.729 | 148    | 18        | 4
C45Tree KNN Tree    | 0.733 | 214    | 9         | 6
C45Tree Rule Tree   | 0.571 | 214    | 9         | 6
C45Tree Rule Tree   | 0.59  | 368    | 20        | 3
C45Tree Rule Tree   | 0.623 | 345    | 6         | 2
C45Tree Rule Tree   | 0.644 | 286    | 9         | 2
C45Tree Rule Tree   | 0.659 | 296    | 13        | 2
KNN Rule Tree       | 0.58  | 214    | 9         | 6
KNN Rule Tree       | 0.59  | 368    | 20        | 3
KNN Rule Tree       | 0.623 | 345    | 6         | 2
KNN Rule Tree       | 0.644 | 286    | 9         | 2
KNN Rule Tree       | 0.659 | 296    | 13        | 2

Table VII. Split generated data for clustering: good accuracy (cont.)

Combination         | Acc   | # Inst | # Attribs | # Classes
Bayes KNN Rule      | 0.748 | 148    | 18        | 4
Bayes KNN Rule      | 0.762 | 106    | 57        | 2
Bayes KNN Rule      | 0.768 | 690    | 15        | 2
Bayes KNN Rule      | 0.783 | 977    | 14        | 2
Bayes KNN Rule      | 0.789 | 2201   | 3         | 2
Bayes KNN Rule      | 0.858 | 1728   | 6         | 4
Bayes KNN Rule      | 0.875 | 958    | 9         | 2
Bayes KNN Rule      | 0.912 | 351    | 32        | 2
Bayes KNN Rule      | 0.938 | 435    | 16        | 2
Bayes KNN Rule      | 0.938 | 569    | 20        | 2
Bayes KNN Rule      | 0.946 | 699    | 9         | 2
Bayes KNN Tree      | 0.756 | 148    | 18        | 4
Bayes KNN Tree      | 0.804 | 977    | 14        | 2
Bayes KNN Tree      | 0.818 | 296    | 13        | 2
Bayes KNN Tree      | 0.825 | 2201   | 3         | 2
Bayes KNN Tree      | 0.858 | 106    | 57        | 2
Bayes KNN Tree      | 0.867 | 690    | 15        | 2
Bayes KNN Tree      | 0.887 | 958    | 9         | 2
Bayes KNN Tree      | 0.933 | 435    | 16        | 2
Bayes KNN Tree      | 0.936 | 1728   | 6         | 4
Bayes KNN Tree      | 0.943 | 351    | 32        | 2
Bayes KNN Tree      | 0.947 | 569    | 20        | 2
Bayes KNN Tree      | 0.96  | 699    | 9         | 2
Bayes Rule Tree     | 0.748 | 148    | 18        | 4
Bayes Rule Tree     | 0.761 | 690    | 15        | 2
Bayes Rule Tree     | 0.762 | 106    | 57        | 2
Bayes Rule Tree     | 0.781 | 977    | 14        | 2
Bayes Rule Tree     | 0.789 | 2201   | 3         | 2
Bayes Rule Tree     | 0.858 | 1728   | 6         | 4
Bayes Rule Tree     | 0.871 | 958    | 9         | 2
Bayes Rule Tree     | 0.914 | 351    | 32        | 2
Bayes Rule Tree     | 0.935 | 569    | 20        | 2
Bayes Rule Tree     | 0.936 | 435    | 16        | 2
Bayes Rule Tree     | 0.941 | 699    | 9         | 2

Table VIII. Split generated data for clustering: good accuracy (cont.)

Combination         | Acc   | # Inst | # Attribs | # Classes
C45Tree KNN Rule    | 0.752 | 106    | 57        | 2
C45Tree KNN Rule    | 0.767 | 690    | 15        | 2
C45Tree KNN Rule    | 0.768 | 148    | 18        | 4
C45Tree KNN Rule    | 0.786 | 977    | 14        | 2
C45Tree KNN Rule    | 0.789 | 2201   | 3         | 2
C45Tree KNN Rule    | 0.858 | 1728   | 6         | 4
C45Tree KNN Rule    | 0.885 | 958    | 9         | 2
C45Tree KNN Rule    | 0.909 | 351    | 32        | 2
C45Tree KNN Rule    | 0.938 | 435    | 16        | 2
C45Tree KNN Rule    | 0.938 | 569    | 20        | 2
C45Tree KNN Rule    | 0.943 | 699    | 9         | 2
C45Tree KNN Tree    | 0.774 | 296    | 13        | 2
C45Tree KNN Tree    | 0.797 | 2201   | 3         | 2
C45Tree KNN Tree    | 0.83  | 977    | 14        | 2
C45Tree KNN Tree    | 0.849 | 106    | 57        | 2
C45Tree KNN Tree    | 0.87  | 690    | 15        | 2
C45Tree KNN Tree    | 0.892 | 958    | 9         | 2
C45Tree KNN Tree    | 0.926 | 351    | 32        | 2
C45Tree KNN Tree    | 0.933 | 1728   | 6         | 4
C45Tree KNN Tree    | 0.944 | 569    | 20        | 2
C45Tree KNN Tree    | 0.954 | 699    | 9         | 2
C45Tree KNN Tree    | 0.961 | 435    | 16        | 2
C45Tree Rule Tree   | 0.752 | 106    | 57        | 2
C45Tree Rule Tree   | 0.754 | 148    | 18        | 4
C45Tree Rule Tree   | 0.761 | 690    | 15        | 2
C45Tree Rule Tree   | 0.782 | 977    | 14        | 2
C45Tree Rule Tree   | 0.789 | 2201   | 3         | 2
C45Tree Rule Tree   | 0.858 | 1728   | 6         | 4
C45Tree Rule Tree   | 0.885 | 958    | 9         | 2
C45Tree Rule Tree   | 0.912 | 351    | 32        | 2
C45Tree Rule Tree   | 0.933 | 435    | 16        | 2
C45Tree Rule Tree   | 0.933 | 569    | 20        | 2
C45Tree Rule Tree   | 0.934 | 699    | 9         | 2
KNN Rule Tree       | 0.761 | 148    | 18        | 4
KNN Rule Tree       | 0.762 | 106    | 57        | 2
KNN Rule Tree       | 0.767 | 690    | 15        | 2

Table IX. Model clusters, round 2: Cluster 1

Cluster | Combination        | Acc   | # Attribs | # Classes
1       | Bayes C45Tree KNN  | 0.771 | 18        | 4
1       | Bayes C45Tree KNN  | 0.781 | 13        | 2
1       | Bayes C45Tree KNN  | 0.834 | 14        | 2
1       | Bayes C45Tree KNN  | 0.862 | 15        | 2
1       | Bayes C45Tree KNN  | 0.928 | 16        | 2
1       | Bayes C45Tree KNN  | 0.944 | 20        | 2
1       | Bayes C45Tree Rule | 0.748 | 18        | 4
1       | Bayes C45Tree Rule | 0.761 | 15        | 2
1       | Bayes C45Tree Rule | 0.784 | 14        | 2
1       | Bayes C45Tree Rule | 0.936 | 16        | 2
1       | Bayes C45Tree Rule | 0.937 | 20        | 2
1       | Bayes C45Tree Tree | 0.777 | 13        | 2
1       | Bayes C45Tree Tree | 0.827 | 14        | 2
1       | Bayes C45Tree Tree | 0.862 | 15        | 2
1       | Bayes C45Tree Tree | 0.94  | 16        | 2
1       | Bayes C45Tree Tree | 0.944 | 20        | 2
1       | Bayes KNN Rule     | 0.748 | 18        | 4
1       | Bayes KNN Rule     | 0.768 | 15        | 2
1       | Bayes KNN Rule     | 0.783 | 14        | 2
1       | Bayes KNN Rule     | 0.938 | 16        | 2
1       | Bayes KNN Rule     | 0.938 | 20        | 2
1       | Bayes KNN Tree     | 0.756 | 18        | 4
1       | Bayes KNN Tree     | 0.804 | 14        | 2
1       | Bayes KNN Tree     | 0.818 | 13        | 2
1       | Bayes KNN Tree     | 0.867 | 15        | 2
1       | Bayes KNN Tree     | 0.933 | 16        | 2
1       | Bayes KNN Tree     | 0.947 | 20        | 2
1       | Bayes Rule Tree    | 0.748 | 18        | 4
1       | Bayes Rule Tree    | 0.761 | 15        | 2
1       | Bayes Rule Tree    | 0.781 | 14        | 2
1       | Bayes Rule Tree    | 0.935 | 20        | 2
1       | Bayes Rule Tree    | 0.936 | 16        | 2
1       | C45Tree KNN Rule   | 0.767 | 15        | 2
1       | C45Tree KNN Rule   | 0.768 | 18        | 4
1       | C45Tree KNN Rule   | 0.786 | 14        | 2
1       | C45Tree KNN Rule   | 0.938 | 16        | 2
1       | C45Tree KNN Rule   | 0.938 | 20        | 2
1       | C45Tree KNN Tree   | 0.774 | 13        | 2
1       | C45Tree KNN Tree   | 0.83  | 14        | 2
1       | C45Tree KNN Tree   | 0.87  | 15        | 2
1       | C45Tree KNN Tree   | 0.944 | 20        | 2
1       | C45Tree KNN Tree   | 0.961 | 16        | 2
1       | C45Tree Rule Tree  | 0.754 | 18        | 4
1       | C45Tree Rule Tree  | 0.761 | 15        | 2
1       | C45Tree Rule Tree  | 0.782 | 14        | 2
1       | C45Tree Rule Tree  | 0.933 | 16        | 2
1       | C45Tree Rule Tree  | 0.933 | 20        | 2
1       | KNN Rule Tree      | 0.761 | 18        | 4
1       | KNN Rule Tree      | 0.767 | 15        | 2
1       | KNN Rule Tree      | 0.783 | 14        | 2
1       | KNN Rule Tree      | 0.938 | 16        | 2
1       | KNN Rule Tree      | 0.938 | 20        | 2

Table X. Model clusters, round 2: Cluster 2

Cluster | Combination        | Acc   | # Attribs | # Classes
2       | Bayes C45Tree KNN  | 0.825 | 3         | 2
2       | Bayes C45Tree KNN  | 0.891 | 9         | 2
2       | Bayes C45Tree KNN  | 0.928 | 6         | 4
2       | Bayes C45Tree KNN  | 0.961 | 9         | 2
2       | Bayes C45Tree Rule | 0.789 | 3         | 2
2       | Bayes C45Tree Rule | 0.858 | 6         | 4
2       | Bayes C45Tree Rule | 0.885 | 9         | 2
2       | Bayes C45Tree Rule | 0.941 | 9         | 2
2       | Bayes C45Tree Tree | 0.783 | 3         | 2
2       | Bayes C45Tree Tree | 0.866 | 9         | 2
2       | Bayes C45Tree Tree | 0.933 | 6         | 4
2       | Bayes C45Tree Tree | 0.954 | 9         | 2
2       | Bayes KNN Rule     | 0.789 | 3         | 2
2       | Bayes KNN Rule     | 0.858 | 6         | 4
2       | Bayes KNN Rule     | 0.875 | 9         | 2
2       | Bayes KNN Rule     | 0.946 | 9         | 2
2       | Bayes KNN Tree     | 0.825 | 3         | 2
2       | Bayes KNN Tree     | 0.887 | 9         | 2
2       | Bayes KNN Tree     | 0.936 | 6         | 4
2       | Bayes KNN Tree     | 0.96  | 9         | 2
2       | Bayes Rule Tree    | 0.789 | 3         | 2
2       | Bayes Rule Tree    | 0.858 | 6         | 4
2       | Bayes Rule Tree    | 0.871 | 9         | 2
2       | Bayes Rule Tree    | 0.941 | 9         | 2
2       | C45Tree KNN Rule   | 0.789 | 3         | 2
2       | C45Tree KNN Rule   | 0.858 | 6         | 4
2       | C45Tree KNN Rule   | 0.885 | 9         | 2
2       | C45Tree KNN Rule   | 0.943 | 9         | 2
2       | C45Tree KNN Tree   | 0.797 | 3         | 2
2       | C45Tree KNN Tree   | 0.892 | 9         | 2
2       | C45Tree KNN Tree   | 0.933 | 6         | 4
2       | C45Tree KNN Tree   | 0.954 | 9         | 2
2       | C45Tree Rule Tree  | 0.789 | 3         | 2
2       | C45Tree Rule Tree  | 0.858 | 6         | 4
2       | C45Tree Rule Tree  | 0.885 | 9         | 2
2       | C45Tree Rule Tree  | 0.934 | 9         | 2
2       | KNN Rule Tree      | 0.789 | 3         | 2
2       | KNN Rule Tree      | 0.858 | 6         | 4
2       | KNN Rule Tree      | 0.875 | 9         | 2
2       | KNN Rule Tree      | 0.944 | 9         | 2

Table XI. Model clusters, round 2: Cluster 3

Cluster | Combination        | Accuracy | # Attributes | # Classes
3       | Bayes C45Tree KNN  | 0.876    | 57           | 2
3       | Bayes C45Tree Rule | 0.752    | 57           | 2
3       | Bayes C45Tree Tree | 0.858    | 57           | 2
3       | Bayes KNN Rule     | 0.762    | 57           | 2
3       | Bayes KNN Tree     | 0.858    | 57           | 2
3       | Bayes Rule Tree    | 0.762    | 57           | 2
3       | C45Tree KNN Rule   | 0.752    | 57           | 2
3       | C45Tree KNN Tree   | 0.849    | 57           | 2
3       | C45Tree Rule Tree  | 0.752    | 57           | 2

Table XII. Model clusters, round 2: Cluster 4

Cluster | Combination        | Accuracy | # Attributes | # Classes
4       | Bayes C45Tree KNN  | 0.923    | 32           | 2
4       | Bayes C45Tree Rule | 0.912    | 32           | 2
4       | Bayes C45Tree Tree | 0.926    | 32           | 2
4       | Bayes KNN Rule     | 0.912    | 32           | 2
4       | Bayes KNN Tree     | 0.943    | 32           | 2
4       | Bayes Rule Tree    | 0.914    | 32           | 2
4       | C45Tree KNN Rule   | 0.909    | 32           | 2
4       | C45Tree KNN Tree   | 0.926    | 32           | 2
4       | C45Tree Rule Tree  | 0.912    | 32           | 2
4       | KNN Rule Tree      | 0.912    | 32           | 2