Step 1: Please enter the data set of user-defined features in two separate files (C4.5 format)
- The first file (definition file) contains the class labels and the feature names with their possible feature values. Each line must end with a dot (.).
The first line always contains the class labels. For example, if you have two classes A and B, the first line is:
A,B.
The following lines specify the features with their possible values. For example, if you have the feature quality with two values, good or bad, then the second line should be:
quality: good, bad.
You can also specify continuous variables; they will be converted to discrete intervals by discretization. An example for a continuous variable is:
score: continuous.
- The second file (data file) contains the samples (i.e. feature vectors). Each line contains the feature values for all features for a single sample.
The feature values are comma-separated and have to be in the same order as defined in the definition file.
An exception is the class label, which is always the last value. Each line ends with a full stop (.).
An example line is:
good, 2.3, A.
which means that for this sample the first feature, quality, has the value good, the second feature, score, has the value 2.3, and this sample belongs to class A.
An example for the definition and data file can be found here: definition.txt and data.txt.
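To check your files before uploading, the format rules above can be sketched as a small Python validator. This is only an illustration under stated assumptions, not the code used by this server: the function names parse_definition and parse_data_line are hypothetical, and the sketch assumes that feature values contain no commas and that continuous values are plain decimal numbers.

```python
def parse_definition(text):
    """Parse definition-file text into (class_labels, features).

    features is a list of (name, allowed_values) pairs; allowed_values is
    None for a feature declared as 'continuous'.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if not line.endswith("."):
            raise ValueError(f"definition line must end with a dot: {line!r}")
        entries.append(line[:-1])  # drop only the terminating dot
    if not entries:
        raise ValueError("definition file is empty")

    # The first line holds the class labels.
    classes = [c.strip() for c in entries[0].split(",")]

    # Every following line is 'name: value, value, ...' or 'name: continuous'.
    features = []
    for entry in entries[1:]:
        name, sep, values = entry.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in feature line: {entry!r}")
        values = values.strip()
        if values == "continuous":
            features.append((name.strip(), None))
        else:
            features.append((name.strip(), [v.strip() for v in values.split(",")]))
    return classes, features


def parse_data_line(line, classes, features):
    """Parse one data-file line into (sample_dict, class_label)."""
    line = line.strip()
    if not line.endswith("."):
        raise ValueError(f"data line must end with a dot: {line!r}")
    fields = [f.strip() for f in line[:-1].split(",")]
    if len(fields) != len(features) + 1:
        raise ValueError(f"expected {len(features) + 1} values, got {len(fields)}")

    # Feature values come first, in definition order; the class label is last.
    *raw_values, label = fields
    if label not in classes:
        raise ValueError(f"unknown class label: {label!r}")

    sample = {}
    for (name, allowed), raw in zip(features, raw_values):
        if allowed is None:
            sample[name] = float(raw)  # continuous feature
        elif raw in allowed:
            sample[name] = raw
        else:
            raise ValueError(f"value {raw!r} is not allowed for feature {name!r}")
    return sample, label


classes, features = parse_definition("A,B.\nquality: good, bad.\nscore: continuous.\n")
sample, label = parse_data_line("good, 2.3, A.", classes, features)
```

With the example definition and data line from above, sample holds the value good for quality and 2.3 for score, and label is A. A line with a missing dot, a wrong number of values, or an undeclared value raises a ValueError.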