# Learning From Data - Homework 6

## Using LIONoso orchestration

Here we provide a complete **workflow setup** for exercises 2, 3, 4, and 5 using the LIONoso polynomial fit factory.

Before proceeding, make sure that Python and a recent version of LIONoso (you'll need at least version 2.1.45) are installed on your computer.

We want to analyze the performance of a linear classifier applied to a non-linear transform of two-dimensional input samples
(*x*_{1}, *x*_{2}):

Φ(*x*_{1}, *x*_{2}) = (1, *x*_{1}, *x*_{2},
*x*_{1}^{2}, *x*_{2}^{2},
*x*_{1}*x*_{2}, |*x*_{1}–*x*_{2}|,
|*x*_{1}+*x*_{2}|).
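To fix the definition, the same transform written as a plain Python function (this is just an illustrative sketch, not part of the LIONoso workflow):

```python
def phi(x1, x2):
    """Non-linear feature transform Phi(x1, x2): constant, linear,
    quadratic, cross, and absolute-value components."""
    return (1.0, x1, x2,
            x1 * x1, x2 * x2,
            x1 * x2,
            abs(x1 - x2), abs(x1 + x2))
```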

We are given two data files. The first, in.dta, contains 35 training triplets
(two inputs and the +/-1 output class). The second, out.dta, contains 250 test triplets.

The files are not encoded in CSV format: each line contains a triplet with blanks to separate and justify the numbers,
as in the following excerpt:

```
-7.7947021e-01   8.3822138e-01   1.0000000e+00
 1.5563491e-01   8.9537743e-01   1.0000000e+00
-5.9907703e-02  -7.1777995e-01   1.0000000e+00
 2.0759636e-01   7.5893338e-01   1.0000000e+00
```

Our goal is to assess the in-sample and out-of-sample success rates of the classifier with different regularization weights
λ = 0, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^{2}, 10^{3}.
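Assuming the polynomial fit performs standard regularized least squares, as in the homework, the trained weights have the closed form

*w* = (*Z*^{T}*Z* + λ*I*)^{−1}*Z*^{T}*y*,

where *Z* is the matrix of transformed samples and *y* the vector of ±1 labels. This reduces to ordinary least squares for λ = 0 and shrinks the weights as λ grows.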

### Importing the data files in LIONoso

LIONoso cannot immediately recognize the data format of the sample files, but we can still load them into the application with the
**Big data or unparsed file** tool.

In order for LIONoso to work on these datasets, we need to make them readable by converting them to CSV format. We can do this with
the Python script dta2csv.py, which reads every line of a data file and rewrites it in CSV format.
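The script itself is not reproduced here; a minimal sketch of such a converter (the function names are our own, and the `x1,x2,x3` header row matches the column names LIONoso will show later) could be:

```python
def dta_to_csv_lines(lines, header="x1,x2,x3"):
    """Turn blank-separated triplet lines into CSV rows with a header."""
    rows = [header]
    for line in lines:
        fields = line.split()   # split on any run of blanks
        if fields:              # skip empty lines
            rows.append(",".join(fields))
    return rows

def convert(src_path, dst_path):
    """Rewrite a .dta file as a .csv file, e.g. convert('in.dta', 'in.csv')."""
    with open(src_path) as src:
        rows = dta_to_csv_lines(src)
    with open(dst_path, "w") as dst:
        dst.write("\n".join(rows) + "\n")
```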

Import the script into LIONoso twice (one instance for each file to be converted) using the
**Orchestration/Table manipulation/Shell executable** tool, connect the two instances to the input files, and press "Run". The resulting CSV
tables will be loaded automatically. Remember that you can rename them if the names chosen by LIONoso look too complex.

The tables contain three numeric columns each, called `x1`, `x2`, and `x3`. You can check them
by double-clicking their symbol.

### Computing the non-linear function Φ

In order to apply the required non-linear transformation, we introduce a Javascript function node. Drag the
**Orchestration/Model/Javascript** tool onto the workbench and fill in the following code:

```javascript
phi1 = x1 * x1;
phi2 = x2 * x2;
phi3 = x1 * x2;
phi4 = Math.abs(x1 - x2);
phi5 = Math.abs(x1 + x2);
y = x3;
```

Note that you don't need to restate the linear components, because you will obtain them in the output table anyway.
Also, the constant component “1” does not appear because it will be automatically added by the linear
classifier.

The last line is used to preserve the output column, which would otherwise be removed from the output table because
it would be unused.

Press "Check code and proceed", then reorder the input and output variables if you wish. Press "Complete function" when you are done:
the Phi transform is ready to be applied to your input tables. You can also rename the node “Phi”
for clarity.

Connect the two CSV tables to the Javascript node and two new tables will appear, each containing the original values plus the
outputs `phi1`, ..., `phi5`, `y` of the function.

You can double click the new tables to inspect their content.

### Applying the linear classifier

Drag the **Models/Polynomials/Polynomial fit factory** onto the workbench and connect it to the transformed
`in-Phi` table. Select the following input columns: `x1`, `x2`, `phi1`, `phi2`, `phi3`,
`phi4`, `phi5`. Select either `y` or `x3` as the output column. Finally, uncheck
both normalization checkboxes (if you leave them checked, the results will be slightly different) and press
“Start training”.

To apply the trained classifier to the training data and obtain the in-sample success rate,
connect the “in-Phi” table to the classifier. You'll obtain a new table; by pressing
the “Perform error analysis” button in the property pane, you'll obtain
various statistics about the classifier. We are interested in the **Success rate wrt target center** line.

Likewise, obtain the out-of-sample success rate by connecting the transformed “out-Phi” table (the test data) to the classifier and performing the error analysis.
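If you want to sanity-check these numbers outside LIONoso, the regularized fit and the success-rate computation can be sketched in Python with NumPy. This is an illustrative re-implementation under the usual Learning From Data assumptions, not LIONoso's own code:

```python
import numpy as np

def fit_regularized(Z, y, lam=0.0):
    """Closed-form regularized least squares: solve (Z'Z + lam*I) w = Z'y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

def success_rate(Z, y, w):
    """Fraction of samples whose sign(Z @ w) matches the +/-1 label."""
    return float(np.mean(np.sign(Z @ w) == y))
```

Build `Z` by applying the Φ transform (including the leading constant 1) to each row of the converted CSV files, fit on the training set, and evaluate `success_rate` on both sets.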

### Retraining the classifier with a nonzero regularization factor

We are interested in checking the performance of the linear classifier for different values of
the regularization factor λ (also known as weight decay).
To retrain the classifier, just click the “PolyFit1” icon, modify the “Regularization factor”
value, and press “Start training”.

After a short training period, you can click on the in-sample and out-of-sample output tables and look at
the error analysis area.

If you check the in-sample and out-of-sample error analysis figures after retraining for different values of the regularization factor, you should obtain the following results:

| Regularization factor λ | In-sample success rate | Out-of-sample success rate |
|---|---|---|
| 0 | 0.971429 | 0.916 |
| 10^{-3} | 0.971429 | 0.92 |
| 10^{-2} | 0.971429 | 0.916 |
| 10^{-1} | 0.942857 | 0.94 |
| 1 | 1 | 0.908 |
| 10 | 0.942857 | 0.876 |
| 10^{2} | 0.771429 | 0.772 |
| 10^{3} | 0.628571 | 0.564 |