Usage

RECIPE

REsilient ClassifIcation Pipeline Evolution

Running

After installation is complete, you can run the algorithm to generate the best pipeline for the input dataset. To run it, execute the following command from the root folder of the source code:

python2 exec.py -dTr DATATRAIN -dTe DATATEST

The project you downloaded contains a folder named datasets. For example, to run the algorithm on the iris dataset:

python2 exec.py -dTr ./datasets/iris/iris-Training0.csv -dTe ./datasets/iris/iris-Test0.csv

Note

The input data must be in CSV format, regardless of the file extension.

RECIPE offers other arguments that the user can set.

Argument	Parameter	Valid Values	Effect
-s or --seed	SEED	Positive Integer	Set the seed of the algorithm for reproducibility. Default: 1
-c or --config	CONFIG	String	A string referring to a configuration file that defines the parameters of the GP. Default: 'config/gecco2015-cfggp.ini'
-dTr	DATATRAIN	String	A string referring to a file containing the data used to train the pipeline methods
-dTe	DATATEST	String	A string referring to a file containing the data used to test the pipeline methods
-nc	NUMBER OF CORES	Positive Integer	Number of cores to be used on the algorithm execution. Default: 1
-ft	FULL TIMEOUT	Positive Integer	Full time (budget) to execute the whole evolutionary process (in seconds). Default: 0 (i.e., it will look only at the generation count)
-t	TIMEOUT	Positive Integer	Time (budget) to evaluate each individual (i.e., a pipeline) of the GP (in seconds). Default: 300
-mr	MUTATION RATE	Positive Float	It defines the mutation rate for the evolutionary algorithm (max=1.0). Default: 0.1
-cr	CROSSOVER RATE	Positive Float	It defines the crossover rate for the evolutionary algorithm (max=1.0). Default: 0.9
-ps	POPULATION SIZE	Positive Integer	It defines the size for the initial population for the evolutionary algorithm. Default: 30
-gc	GENERATION COUNT	Positive Integer	It defines the maximum number of generations for the evolutionary algorithm. Default: 100
-gr	GRAMMAR	String	It defines the grammar to be used by RECIPE during its evolutionary process. Default: 'bnf/new_ml.bnf'
-en	EXPORT_NAME	String	A string with a file name to export pipeline. Default: 'pipeline.py'
-v	VERBOSITY	Positive Integer	Verbosity level of the output (3-Full, 2-Intermediate, 1-Basic)
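
The flags in the table can be combined in a single call. The following is a minimal sketch of a helper that assembles such a command line; the function name build_command and the keyword names (seed, population_size, and so on) are hypothetical and not part of RECIPE. Only the flags themselves come from the table above.

```python
# Hypothetical keyword names mapped to the flags listed in the argument table.
FLAGS = {
    "seed": "-s",
    "config": "-c",
    "cores": "-nc",
    "full_timeout": "-ft",
    "timeout": "-t",
    "mutation_rate": "-mr",
    "crossover_rate": "-cr",
    "population_size": "-ps",
    "generation_count": "-gc",
    "grammar": "-gr",
    "export_name": "-en",
    "verbosity": "-v",
}


def build_command(train_path, test_path, **options):
    """Return the argument list for one RECIPE run (illustrative helper)."""
    command = ["python2", "exec.py", "-dTr", train_path, "-dTe", test_path]
    for name, value in options.items():
        if name not in FLAGS:
            raise ValueError("unknown option: %s" % name)
        command += [FLAGS[name], str(value)]
    return command


command = build_command(
    "./datasets/iris/iris-Training0.csv",
    "./datasets/iris/iris-Test0.csv",
    seed=42,
    population_size=50,
    generation_count=20,
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.call(command) from the root folder of the source code. Any option left out keeps the default listed in the table.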

Configuring GP

The program comes with a configuration file (in the config folder) that can be used to set the parameters of the GP. This file defines the mutation and crossover rates, the population size, the number of generations, and elitism.

Results

The program generates 3 files:

Evolution-Training: Data on the evolution of individuals over the training data, found in the directory 'evolution'. It contains the following measures separated by commas (i.e., ","): the current generation, the fitness of the worst individual in the population, the average fitness of the population, and the fitness of the best individual in the population. Fitness is the weighted F1 score.
Tracking all individuals: A map file, found in the directory 'fit_map', containing each evaluated individual and its fitness (i.e., the evaluated measure). A pipe ("|") separates the individual from its fitness.
Results: The final file, found in the directory 'results', containing the best individual found and the values of its metrics. It contains the following measures separated by commas (i.e., ","): accuracy, precision, recall, and f1 on the learning set, then the same four metrics on the validation set, on the training set, and on the test set (in that order), followed by the seed used and the string representing the best pipeline. The learning and validation sets are drawn from the training set, and all metrics are weighted.
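
The three formats described above can be read with a few lines of Python. The sketch below is an assumption-laden illustration, not code shipped with RECIPE: the function and field names are hypothetical, it assumes one record per line with no header row, and the sample lines are made up for demonstration. Because the pipeline string may itself contain commas, the result line is split a limited number of times so that the pipeline stays in one piece.

```python
SETS = ["learning", "validation", "training", "test"]
METRICS = ["accuracy", "precision", "recall", "f1"]
# Hypothetical field names, in the order the result file lists its measures.
RESULT_NAMES = ["%s_%s" % (metric, data) for data in SETS for metric in METRICS]


def parse_evolution_line(line):
    """Split one evolution record: generation, worst, average, best fitness."""
    generation, worst, average, best = line.strip().split(",")
    return {
        "generation": int(generation),
        "worst": float(worst),
        "average": float(average),
        "best": float(best),
    }


def parse_tracking_line(line):
    """Split one tracking record at the last pipe into (individual, fitness)."""
    individual, fitness = line.strip().rsplit("|", 1)
    return individual, float(fitness)


def parse_result_line(line):
    """Split one result record into 16 metrics, the seed, and the pipeline."""
    # Split at most 17 times so commas inside the pipeline string are kept.
    parts = line.strip().split(",", len(RESULT_NAMES) + 1)
    if len(parts) != len(RESULT_NAMES) + 2:
        raise ValueError("expected 18 comma-separated fields")
    record = {name: float(value) for name, value in zip(RESULT_NAMES, parts)}
    record["seed"] = int(parts[len(RESULT_NAMES)])
    record["pipeline"] = parts[len(RESULT_NAMES) + 1]
    return record


# Made-up sample lines, used only to exercise the parsers.
evolution = parse_evolution_line("3,0.61,0.82,0.95\n")
tracking = parse_tracking_line("step_a step_b|0.93\n")
result = parse_result_line(",".join(["0.9"] * 16 + ["1", "step_a(x=1,y=2) step_b"]))
print(evolution["best"], tracking[1], result["seed"], result["pipeline"])
```

Splitting the tracking record at the last pipe, rather than the first, keeps the parser working even if an individual's string representation happens to contain a pipe.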