Learning From Data - Homework 6 - A solution in LIONoso

Courtesy of Giovanni Pellegrini

We provide our solutions to exercises 2, 3, 4 and 5, all of which concern overfitting. Exercise 2 studies the in-sample and out-of-sample error of linear regression; the other exercises show how weight decay affects the error for different values of λ.
Before proceeding, make sure that Python and numpy are installed on your computer.
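For readers who want to see the idea before opening LIONoso, here is a minimal sketch of linear regression with weight decay, used as a classifier. It is not the downloadable script: the function names, the toy data and the target are assumptions made only for illustration, and the homework's own data files and feature transform are not reproduced here.

```python
import numpy as np

def fit_weight_decay(Z, y, lam=0.0):
    """Closed-form minimizer of ||Zw - y||^2 + lam * ||w||^2.

    With lam = 0 this is ordinary linear regression.
    """
    A = np.dot(Z.T, Z) + lam * np.eye(Z.shape[1])
    return np.linalg.solve(A, np.dot(Z.T, y))

def classification_error(Z, y, w):
    """Fraction of points whose predicted sign differs from the label."""
    return float(np.mean(np.sign(np.dot(Z, w)) != y))

def make_toy_data(rng, n):
    """Hypothetical stand-in for the homework files: labels are +1/-1."""
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
    Z = np.column_stack([np.ones(n), X])  # add the constant feature
    return Z, y

rng = np.random.RandomState(0)
Z_train, y_train = make_toy_data(rng, 50)
Z_test, y_test = make_toy_data(rng, 500)

for lam in (0.0, 1.0, 1000.0):
    w = fit_weight_decay(Z_train, y_train, lam)
    e_in = classification_error(Z_train, y_train, w)
    e_out = classification_error(Z_test, y_test, w)
    print("lambda=%g  E_in=%.3f  E_out=%.3f" % (lam, e_in, e_out))
```

Larger values of λ shrink the weight vector, which is the mechanism the exercises explore.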

Connecting the "Overfitting" Python script to LIONoso

You can download the "overfitting algorithm" script, containing our solution.
You can also download training and testing files here:
in.dta
out.dta
If you use Windows, please see the notes for Windows users below.

You can load the script by dragging a Parametric table into the workbench and specifying the filename of your script.

In the above figure we just loaded the script (Exercise2-Overfitting.py).
In the left panel you can specify the training and testing files, and the value of λ.
You must specify the complete file path; otherwise the script will not work.
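If you want to check a path before typing it into the panel, the following sketch resolves it to a complete path and tries to read the file. The helper name load_dta is hypothetical, and the layout it assumes (whitespace-separated numeric columns with the label in the last column) is an assumption about the data files, not something the script guarantees.

```python
import os
import numpy as np

def load_dta(path):
    """Resolve `path` to a complete path and load it as a numeric table.

    Assumes whitespace-separated columns with the label in the last one.
    """
    full_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full_path):
        raise IOError("data file not found: " + full_path)
    data = np.atleast_2d(np.loadtxt(full_path))
    features, labels = data[:, :-1], data[:, -1]
    return full_path, features, labels
```

Calling load_dta("in.dta") from the folder that contains the file returns the complete path as its first element; that is the string to enter in the left panel.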

By clicking the "Compute" button, the script is launched and a table containing the results of each experiment is produced.

To solve exercise 5, run the script once for each requested value of k (the integers from -2 to 2, where λ = 10^k) and identify the value that gives the lowest error.
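The same search can be written as a short loop. The sketch below is generic: sweep_k and placeholder_error are hypothetical names, and placeholder_error is a made-up curve that stands in for one run of the script. It is not homework data and says nothing about the actual answer.

```python
def sweep_k(error_for_k, ks=(-2, -1, 0, 1, 2)):
    """Evaluate error_for_k(k) for every k and return (best_k, errors).

    error_for_k is expected to train with lambda = 10 ** k and return
    the error of interest; ties go to the first k in `ks`.
    """
    errors = {k: error_for_k(k) for k in ks}
    best_k = min(ks, key=lambda k: errors[k])
    return best_k, errors

def placeholder_error(k):
    """Made-up error curve, for illustration only."""
    return (k - 0.3) ** 2

best_k, errors = sweep_k(placeholder_error)
for k in sorted(errors):
    print("k=%d  lambda=%g  error=%.3f" % (k, 10.0 ** k, errors[k]))
print("best k:", best_k)
```

In practice, error_for_k would wrap whatever produces one row of the results table for a given λ.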

Results

Solutions for each exercise are provided below:

Exercise 2:
in-sample error : 0.028
out-of-sample error : 0.084

Exercise 3 (k = -3):
in-sample error with weight decay: 0.028
out-of-sample error with weight decay: 0.08

Exercise 4 (k = 3):
in-sample error with weight decay: 0.37
out-of-sample error with weight decay: 0.43

Exercise 5:
k: -1

Notes for Windows users

While on most UNIX-based systems (such as Linux and Mac OS X) it is possible to declare the script interpreter in the top line of the script, Windows bases the choice of the interpreter on the filename extension. There can be two types of problems:

  1. The interpreter is installed, but it did not register the file extension (as happens, e.g., with R).
  2. A specialized application “stole” the file extension and is executed in place of the interpreter (as happens, e.g., with Canopy, which appropriates the .py extension of Python).
In these cases, it is possible to execute the script from within LIONoso by providing a “wrapper shell script”. In the Python case, use a text editor (e.g., Notepad) to create the file Exercise2-Overfitting-b.bat containing the following text:
        @echo off
        C:\Python27\python.exe Exercise2-Overfitting.py %*
where C:\Python27\python.exe must be replaced by the path of the python.exe executable on your system. Next, import this file into the Parametric table.
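If you are unsure where python.exe lives, Python can report it through sys.executable. The sketch below builds the wrapper text from that value; the helper name make_wrapper_text is hypothetical, and the quotes around the two paths are an addition (not in the wrapper above) meant to tolerate spaces in folder names.

```python
import sys

def make_wrapper_text(script_name, interpreter=None):
    """Return the contents of a .bat wrapper that runs `script_name`.

    If no interpreter is given, use the one running this code.
    """
    if interpreter is None:
        interpreter = sys.executable
    # %* forwards all command-line arguments to the Python script.
    return '@echo off\r\n"{0}" "{1}" %*\r\n'.format(interpreter, script_name)

print(make_wrapper_text("Exercise2-Overfitting.py"))
```

Run it from an interactive Python session and paste the printed text into Exercise2-Overfitting-b.bat.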