Learning From Data - Homework 6
Using LIONoso orchestration
Here we provide a complete workflow setup for exercises 2, 3, 4, and 5 by using the LIONoso polynomial fit factory.
Before proceeding, be sure to have Python installed on your computer, together with a recent version of LIONoso
(you'll need at least version 2.1.45).
We want to analyze the performance of a linear classifier applied to a non-linear transform of two-dimensional input samples
Φ(x1, x2) = (1, x1, x2, x1², x2², x1x2, |x1 − x2|, |x1 + x2|).
We are given two data files. The first, in.dta, contains 35 training triplets
(two inputs and the +/-1 output class). The second, out.dta, contains 250 test triplets.
The files are not encoded in CSV format: each line contains a triplet with blanks to separate and justify numbers,
as in the following excerpt:
-7.7947021e-01 8.3822138e-01 1.0000000e+00
1.5563491e-01 8.9537743e-01 1.0000000e+00
-5.9907703e-02 -7.1777995e-01 1.0000000e+00
2.0759636e-01 7.5893338e-01 1.0000000e+00
Our goal is to assess the in-sample and out-of-sample success rates of the classifier with different regularization weights
λ = 0, 10⁻³, 10⁻², 10⁻¹, 1, 10, 10², 10³.
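For reference, regularization with weight decay has a standard one-step solution (assuming, as in the course notation, that Z denotes the matrix of Φ-transformed inputs and y the vector of ±1 labels):

```latex
w_{\mathrm{reg}} = \left( Z^{\mathsf{T}} Z + \lambda I \right)^{-1} Z^{\mathsf{T}} y
```

With λ = 0 this reduces to ordinary least squares; larger λ shrinks the weights.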
Importing the data files in LIONoso
While LIONoso cannot immediately recognize the data format of the sample files, we can load them in the application by using the
Big data or unparsed file tool:
In order for LIONoso to work on these datasets, we need to make them readable by converting them to CSV format. We can do this with
the Python script dta2csv.py, which simply reads all the lines in the data file and rewrites them in CSV format.
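The script shipped with the homework may differ, but a minimal sketch of such a converter could look like the following (the header names x1, x2, x3 match the columns LIONoso shows after import; the command-line interface is an assumption):

```python
# Hypothetical sketch of dta2csv.py: read whitespace-separated triplets
# and rewrite them as comma-separated values with a header row.
import sys

def dta2csv(in_path, out_path):
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write("x1,x2,x3\n")           # header row for the CSV table
        for line in src:
            fields = line.split()          # splits on runs of blanks
            if fields:                     # skip empty lines
                dst.write(",".join(fields) + "\n")

if __name__ == "__main__" and len(sys.argv) == 3:
    dta2csv(sys.argv[1], sys.argv[2])
```

Usage would then be, for example, `python dta2csv.py in.dta in.csv`.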
Import the script into LIONoso twice (one instance for each file to be converted) using the
"Orchestration/Table manipulation/Shell executable" tool, connect each instance to its input file, and press "Run". The resulting CSV
tables will be loaded automatically. Remember that you can rename them if the names chosen by LIONoso look too complex.
The tables contain three numeric columns each, called x1, x2, and x3. You can check them
by double-clicking their symbol.
Computing the non-linear function Φ
phi1 = x1 * x1;
phi2 = x2 * x2;
phi3 = x1 * x2;
phi4 = Math.abs(x1 - x2);
phi5 = Math.abs(x1 + x2);
y = x3;
Note that you don't need to restate the linear components, because you will obtain them in the output table anyway.
Also, the constant component “1” does not appear because it will be automatically added by the polynomial fit factory.
The last line is used to preserve the output column, which would otherwise be removed from the output table because
it would be unused.
Press "Check code and proceed", then reorder the input and output variables if you wish. Press "Complete function" when you are done:
the Phi transform is ready to be applied to your input tables. You can also rename the node “Phi”.
Connect both CSV tables to the “Phi” node to obtain new tables containing the
outputs phi1,...,phi5,y of the function:
You can double click the new tables to inspect their content.
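For reference, the same transform can be written as a small stand-alone Python function (a hypothetical helper, not part of the LIONoso workflow; here the constant component 1 is listed explicitly, whereas LIONoso adds it automatically):

```python
# Sketch of the nonlinear transform Phi applied to one sample (x1, x2).
def phi(x1, x2):
    return (1.0, x1, x2,
            x1 * x1,       # phi1
            x2 * x2,       # phi2
            x1 * x2,       # phi3
            abs(x1 - x2),  # phi4
            abs(x1 + x2))  # phi5
```

Applied row by row to the CSV tables, this reproduces the columns of the transformed tables.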
Applying the linear classifier
Drag the Models/Polynomials/Polynomial fit factory onto the workbench and connect it to the transformed
in-Phi table. Select the following input columns: x1, x2, phi1, phi2, phi3,
phi4, phi5. Select either y or x3 as the output column. Finally, uncheck
both normalization checkboxes (if you leave them checked, results will be slightly different) and press "Start training".
To apply the trained classifier to the training data in order to obtain the in-sample success rate,
connect the “in-Phi” table to the classifier. You'll obtain a new table; by pressing
the “Perform error analysis” button in the property pane, you'll obtain
various data about the classifier: we are interested in the “Success rate wrt target center line” entry:
Likewise, obtain the out-of-sample success rate by connecting the transformed “out-Phi” table to the classifier
and performing the error analysis:
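If you want to double-check the figures outside LIONoso, the success rate can be sketched as the fraction of samples whose predicted sign agrees with the ±1 label (a rough stand-in for the “Success rate wrt target center line” figure; the exact definition LIONoso uses is an assumption here):

```python
def success_rate(y_pred, y_true):
    """Fraction of samples whose predicted sign matches the +/-1 label."""
    hits = sum(1 for p, t in zip(y_pred, y_true) if (p >= 0) == (t >= 0))
    return hits / len(y_true)
```

Feeding it the classifier outputs on the training rows gives the in-sample figure; on the test rows, the out-of-sample one.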
Retraining the classifier with a nonzero regularization factor
We are interested in checking the performance of a linear classifier with different values of
a regularization factor λ (also known as weight decay).
To retrain the classifier, just click the “PolyFit1” icon, modify the “Regularization factor”
value, and press “Start training”.
After a short training period, you can click on the in-sample and out-of-sample output tables and look at
the error analysis area.
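To sanity-check the retraining outside LIONoso, regularized linear regression has a one-step solution that can be sketched with NumPy (assuming Z is the matrix of Φ-transformed inputs, constant column included; whether LIONoso regularizes the constant term the same way is an assumption):

```python
import numpy as np

def train_weight_decay(Z, y, lam):
    """One-step linear regression with weight decay:
    w = (Z^T Z + lam * I)^(-1) Z^T y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

# Retraining loop over the regularization factors of interest:
# for lam in [0, 1e-3, 1e-2, 1e-1, 1, 10, 1e2, 1e3]:
#     w = train_weight_decay(Z_train, y_train, lam)  # then evaluate signs
```

Increasing λ shrinks the weight vector, trading in-sample fit for out-of-sample stability.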
If you check the in-sample and out-of-sample error analysis figures after retraining for different values of
the regularization factor, you should obtain the following results:
| Regularization factor | In-sample success rate | Out-of-sample success rate |