Preprocess

Preprocessing analyses and transforms the statistical properties of a dataset (e.g. centering and scaling features) so that it is easier to train neural networks on.

  • Preprocess
    • Standardize : subtract the mean and divide by the standard deviation;
    • GCN : Global Contrast Normalization;
    • ZCA : Zero Component Analysis whitening;
    • LeCunLCN : Yann LeCun's Local Contrast Normalization;


Abstract class.

An object that can preprocess a View. Preprocessing a View implies changing the data that it actually stores. This can be useful to save memory. If you know you are always going to access only the same processed version of the dataset, it is better to process it once and discard the original.

Preprocesses are capable of modifying many aspects of a View. For example, they can change the way that it converts between different formats of data. They can change the number of examples that a View stores. In other words, preprocesses can do a lot more than just example-wise transformations of the examples stored in a View.

apply(view, can_fit)

Abstract method.

  • view is the View to act upon.
  • can_fit is a boolean. When true, the Preprocess can adapt internal parameters (e.g. statistics) based on the contents of the view. This is usually true for input Views taken from the training DataSet.

For example, let us preprocess the Mnist inputs. First, we load the datasource and create a Standardize preprocess.

ds = dp.Mnist()
st = dp.Standardize()

Get the train, valid and test set inputs.

train = ds:trainSet():inputs()
valid = ds:validSet():inputs()
test = ds:testSet():inputs()

Fit and apply the preprocess to the train View.

st:apply(train, true)

At this point the st Preprocess has measured and stored some statistics gathered from the train View. Furthermore, the train View has been preprocessed. We can apply the same preprocessing (with the same statistics) on the valid and test Views.

st:apply(valid, false)
st:apply(test, false)

Since this is a common pattern in machine learning, we have simplified all of the above to one line of code.

ds = dp.Mnist{input_preprocess=dp.Standardize()}


A Preprocess that subtracts the mean and divides by the standard deviation. Uses the bf (batch x feature) view.


Constructs a Standardize Preprocess. Arguments should be specified as key-value pairs.

  • global_mean is a boolean with a default value of false. When true, subtracts the (scalar) mean over every element in the dataset. Otherwise, subtracts the mean of each column (feature) separately.
  • global_std is a boolean with a default value of false. When true, after centering, divides by the (scalar) standard deviation of every element in the design matrix. Otherwise, divides by the column-wise (per-feature) standard deviation.
  • std_eps is a number with a default value of 1e-4. It is a stabilization factor added to the standard deviations before dividing, which prevents standard deviations very close to zero from causing feature values to blow up.
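The computation Standardize performs can be sketched in NumPy as follows; the function and variable names are illustrative, not part of the dp API:

```python
import numpy as np

def standardize(X, global_mean=False, global_std=False, std_eps=1e-4,
                mean=None, std=None):
    """Fit on the training matrix (mean=std=None), then reuse the
    returned statistics on the validation/test matrices."""
    if mean is None:
        mean = X.mean() if global_mean else X.mean(axis=0)  # fit step
    X = X - mean
    if std is None:
        std = X.std() if global_std else X.std(axis=0)      # fit step
    X = X / (std + std_eps)  # std_eps keeps near-zero stds from blowing up
    return X, mean, std

X_train = np.array([[1.0, 10.0], [3.0, 30.0]])
X_train_s, mu, sigma = standardize(X_train)           # like can_fit = true
X_test_s, _, _ = standardize(np.array([[2.0, 20.0]]),
                             mean=mu, std=sigma)      # like can_fit = false
```

Fitting on the training matrix and then reusing mu and sigma mirrors the can_fit = true / can_fit = false pattern shown earlier.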


Performs Global Contrast Normalization by (optionally) subtracting the mean across features and normalizing by either the vector norm or the standard deviation (across features, for each example).


Constructs a GCN Preprocess. Arguments should be specified as key-value pairs:

  • substract_mean is a boolean with a default value of true. Remove the mean across features/pixels before normalizing. Note that this is the per-example mean across pixels, not the per-pixel mean across examples.
  • scale is a number with a default value of 1.0. Multiply features by this constant.
  • sqrt_bias is a number with a default value of 0. A fudge factor added inside the square root when computing the standard deviation or the norm.
  • use_std is a boolean with a default value of false. When true, divides by the standard deviation instead of the norm.
  • min_divisor is a number with a default value of 1e-8. If the divisor for an example is less than this value, do not apply it.
  • batch_size is a number with a default value of 0. The number of examples processed at a time internally.

Note that sqrt_bias = 10, use_std = true and defaults for all other parameters correspond to the preprocessing used in: A. Coates, H. Lee and A. Ng. An Analysis of Single-Layer Networks in Unsupervised Feature Learning. AISTATS 14, 2011.
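Disregarding batching, the per-example computation can be sketched in NumPy; the function below is a hypothetical stand-in, not the dp implementation:

```python
import numpy as np

def gcn(X, subtract_mean=True, scale=1.0, sqrt_bias=0.0, use_std=False,
        min_divisor=1e-8):
    """Global Contrast Normalization over a (examples x features) matrix."""
    X = X.astype(float)
    if subtract_mean:
        # per-example mean across features, not per-feature across examples
        X = X - X.mean(axis=1, keepdims=True)
    if use_std:
        divisor = np.sqrt(sqrt_bias + X.var(axis=1, keepdims=True))
    else:
        divisor = np.sqrt(sqrt_bias + (X ** 2).sum(axis=1, keepdims=True))
    divisor[divisor < min_divisor] = 1.0  # leave near-constant examples alone
    return scale * X / divisor

X = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])   # second row exercises min_divisor
Xn = gcn(X)
```

Each row (example) is centered and scaled independently, which is why the first row ends up with unit norm while the all-zero row passes through unchanged.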


Performs Zero Component Analysis Whitening. Commonly used for images, yet uses the bf view. For a comprehensive explanation of ZCA whitening, please consult the Stanford Whitening article.


Constructs a ZCA Preprocess. Arguments should be specified as key-value pairs.

  • n_component is the number of most important eigen components to use for ZCA. The default is to use all components.
  • n_drop_component is the number of least important eigen components to drop. The default value is 0.
  • filter_bias is a number with a default value of 0.1. Filters are scaled by 1/sqrt(filter_bias + variance).
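A hypothetical NumPy sketch of the underlying whitening transform (not the dp implementation); with filter_bias set to 0, the whitened training data has approximately identity covariance:

```python
import numpy as np

def zca_fit(X, filter_bias=0.1):
    """Fit ZCA on a (examples x features) design matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    # W = U diag(1/sqrt(eig + bias)) U^T: rotate into the eigenbasis,
    # rescale, rotate back so whitened features stay aligned with originals
    W = eigvec @ np.diag(1.0 / np.sqrt(eigval + filter_bias)) @ eigvec.T
    return mean, W

def zca_apply(X, mean, W):
    return (X - mean) @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # correlated features
mean, W = zca_fit(X, filter_bias=0.0)
Xw = zca_apply(X, mean, W)
cov_w = Xw.T @ Xw / Xw.shape[0]   # ~ identity when filter_bias = 0
```

The filter_bias term plays the role described above: each filter is scaled by 1/sqrt(filter_bias + variance), which keeps directions with near-zero variance from being amplified.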


Implements the Local Contrast Normalization layer described on page 3 of What is the Best Multi-Stage Architecture for Object Recognition?.

Performs local subtractive and divisive normalization, enforcing a sort of local competition between adjacent features in a feature map, and between features at the same spatial location in different feature maps. The subtractive normalization operation for a given site x[i][j][k] computes:

v[i][j][k] = x[i][j][k] − sum[ipq] w[p][q] * x[i][j+p][k+q]

where w[p][q] is a Gaussian weighting window (of default size 9 x 9) normalized so that sum[pq] w[p][q] = 1.

The divisive normalization computes:

y[i][j][k] = v[i][j][k] / max(c, σ[j][k])

where σ[j][k] = pow(sum[ipq] w[p][q] * pow(v[i][j+p][k+q], 2), 1/2) is the weighted standard deviation of all features over a spatial neighborhood. For each sample, the constant c is set to mean(σ[j][k]) in the experiments.
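A minimal single-channel NumPy sketch of these two steps (the actual preprocess additionally handles color channels and batching; all names here are illustrative):

```python
import numpy as np

def gaussian_kernel(size=9, sigma=2.0):
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return k / k.sum()                 # normalized so the weights sum to 1

def conv2d_same(x, k):
    """Naive 'same' 2-D correlation with zero padding."""
    p = k.shape[0] // 2
    xp = np.pad(x, p)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + k.shape[0], j:j + k.shape[1]] * k).sum()
    return out

def lecun_lcn(x, kernel_size=9, threshold=1e-4):
    w = gaussian_kernel(kernel_size)
    v = x - conv2d_same(x, w)            # subtractive normalization
    sigma = np.sqrt(conv2d_same(v ** 2, w))  # local weighted std
    c = np.maximum(sigma.mean(), threshold)  # per-image constant c
    return v / np.maximum(c, sigma)      # divisive normalization

img = np.random.default_rng(0).random((16, 16))
out = lecun_lcn(img)
```

The threshold here corresponds to the constructor argument of the same name below: it bounds the denominator away from zero.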

As an example, here is Lenna before:

[image: original Lenna]

and after LeCunLCN preprocessing:

[image: LeCunLCN preprocessed Lenna]


Constructs a LeCunLCN Preprocess. Arguments should be specified as key-value pairs.

  • kernel_size is the local contrast kernel size. Default is 9.
  • threshold is the minimum threshold for values used as denominators. Default is 0.0001.
  • batch_size is the number of examples processed at a time during preprocessing. Default is 256.
  • channels is a list (table) of channels (colors) to normalize. Defaults to {1,2,3}.
  • progress is a boolean specifying whether a progress bar should be displayed.