Detailed description
Check:
Lasagne MNIST example[3]
Import Theano and numpy
Since Lasagne is built on top of Theano, it is meant as a supplement helping with some tasks, not as a replacement. You will always mix Lasagne with some Theano code.
import sys
import os
import time

import numpy as np
import theano
import theano.tensor as T

import lasagne
Loading the MNIST dataset
def load_dataset():
    # We first define a download function, supporting both Python 2 and 3.
    if sys.version_info[0] == 2:
        from urllib import urlretrieve
    else:
        from urllib.request import urlretrieve
urlretrieve(url, filename=None, reporthook=None, data=None)
Copy a network object denoted by a URL to a local file if necessary.
[Return]: (filename, headers)
- filename: the local file name under which the object can be found.
- headers: whatever the info() method of the object returned by urlopen() returned.
    # Download MNIST files from 'http://yann.lecun.com/exdb/mnist/'.
    def download(filename, source='http://yann.lecun.com/exdb/mnist/'):
        print("Downloading %s" % filename)
        urlretrieve(source + filename, filename)

    # We then define functions for loading MNIST images and labels.
    # For convenience, they also download the requested files if needed.
    import gzip

    def load_mnist_images(filename):
        if not os.path.exists(filename):
            download(filename)
        # Read the inputs in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=16)
        # The inputs are vectors now, we reshape them to monochrome 2D images,
        # following the shape convention:
        # reshape(examples, channels, rows, columns)
        data = data.reshape(-1, 1, 28, 28)
        #### Normalization ####
        # The inputs come as bytes, we convert them to float32 in range [0,1].
        # (Actually to range [0, 255/256], for compatibility to the version
        # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)
        return data / np.float32(256)

    def load_mnist_labels(filename):
        if not os.path.exists(filename):
            download(filename)
        # Read the labels in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=8)
        # The labels are vectors of integers now, that's exactly what we want.
        return data

    # We can now download and read the training and test set images and labels.
    X_train = load_mnist_images('train-images-idx3-ubyte.gz')
    y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')
    X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')
    y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')

    # We reserve the last 10000 training examples for validation.
    X_train, X_val = X_train[:-10000], X_train[-10000:]
    y_train, y_val = y_train[:-10000], y_train[-10000:]

    # We just return all the arrays in order, as expected in main().
    # (It doesn't matter how we do this as long as we can read them again.)
    return X_train, y_train, X_val, y_val, X_test, y_test
X_train.shape is (50000, 1, 28, 28), to be interpreted as: 50,000 images of 1 channel, 28 rows and 28 columns each. (The number of channels is 1 because the input is monochrome.)
y_train.shape is simply (50000,). That is, it is a vector of the same length as X_train giving an integer class label (target) for each image: the digit between '0' and '9' depicted in the image (according to the human annotator who drew that digit).
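The offsets 16 and 8 passed to np.frombuffer skip the file headers of the IDX format: four 4-byte fields (magic number, image count, rows, columns) for image files and two (magic number, label count) for label files. The following is a minimal sketch of the parsing and normalization steps that runs without downloading anything. The helper names parse_idx_images and parse_idx_labels and the tiny in-memory "files" are made up for illustration and are not part of the tutorial script.

```python
import struct
import numpy as np

def parse_idx_images(raw, rows=28, cols=28):
    """Parse the bytes of an already decompressed IDX image file."""
    # Skip the 16-byte header: magic number, image count, rows, columns.
    data = np.frombuffer(raw, np.uint8, offset=16)
    # Shape convention: (examples, channels, rows, columns).
    data = data.reshape(-1, 1, rows, cols)
    # Bytes 0..255 are mapped to floats in [0, 255/256].
    return data / np.float32(256)

def parse_idx_labels(raw):
    """Parse the bytes of an already decompressed IDX label file."""
    # Skip the 8-byte header: magic number, label count.
    return np.frombuffer(raw, np.uint8, offset=8)

# Hypothetical toy data: 3 all-white "images" and 3 labels.
pixels = np.full((3, 28, 28), 255, dtype=np.uint8)
fake_images = struct.pack('>IIII', 2051, 3, 28, 28) + pixels.tobytes()
fake_labels = struct.pack('>II', 2049, 3) + bytes([7, 2, 1])

X = parse_idx_images(fake_images)
y = parse_idx_labels(fake_labels)
print(X.shape, X.max())
print(y.shape, y)
```

Note that the label array comes out one-dimensional, which is what T.ivector expects later on.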
Building the neural network model
This script supports three types of models. For each one, a function is defined that takes a Theano variable representing the input and returns the output layer of a neural network model built in Lasagne.
1. Build multi-layer perceptron [4] (a fixed architecture)
- 1 input layer: 1 * 28 * 28 = 784 dimensions
- input dropout: 20% on the input layer
- 2 hidden layers: 800 units per layer
- hidden dropout: 50% on the output of each hidden layer
- 1 softmax output layer: 10 units (labels: 0 ~ 9)
def build_mlp(input_var=None):
    # This creates an MLP (multi-layer perceptron) of two hidden layers of 800
    # units each, followed by a softmax output layer of 10 units. It applies
    # 20% dropout to the input data and 50% dropout to the hidden layers.

    # Input layer, specifying the expected input shape of the network
    # (unspecified batchsize, 1 channel, 28 rows and 28 columns) and
    # linking it to the given Theano variable `input_var`, if any:
    l_in = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                     input_var=input_var)

    # Apply 20% dropout to the input data:
    l_in_drop = lasagne.layers.DropoutLayer(l_in, p=0.2)

    # Add a fully-connected layer of 800 units, using the linear rectifier, and
    # initializing weights with Glorot's scheme (which is the default anyway):
    l_hid1 = lasagne.layers.DenseLayer(
            l_in_drop, num_units=800,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    # We'll now add dropout of 50%:
    l_hid1_drop = lasagne.layers.DropoutLayer(l_hid1, p=0.5)

    # Another 800-unit layer:
    l_hid2 = lasagne.layers.DenseLayer(
            l_hid1_drop, num_units=800,
            nonlinearity=lasagne.nonlinearities.rectify)

    # 50% dropout again:
    l_hid2_drop = lasagne.layers.DropoutLayer(l_hid2, p=0.5)

    # Finally, we'll add the fully-connected output layer, of 10 softmax units:
    l_out = lasagne.layers.DenseLayer(
            l_hid2_drop, num_units=10,
            nonlinearity=lasagne.nonlinearities.softmax)

    # Each layer is linked to its incoming layer(s), so we only need to pass
    # the output layer to give access to a network in Lasagne:
    return l_out
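To get a feeling for the size of this fixed architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes that every dense layer has one weight per input-output pair plus one bias per unit, and that dropout layers add no parameters. The helper dense_param_count is hypothetical and not part of Lasagne.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a stack of fully-connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# Flattened input, two hidden layers, output layer.
mlp_sizes = [1 * 28 * 28, 800, 800, 10]
print(dense_param_count(mlp_sizes))  # 1276810
```

Most of the parameters sit in the first two weight matrices (784 x 800 and 800 x 800).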
2. Build custom multi-layer perceptron [5] (a custom architecture)
- Same as the MLP above, but with a configurable number (depth) and size (width) of hidden layers
def build_custom_mlp(input_var=None, depth=2, width=800, drop_input=.2,
                     drop_hidden=.5):
    # By default, this creates the same network as `build_mlp`, but it can be
    # customized with respect to the number and size of hidden layers. This
    # mostly showcases how creating a network in Python code can be a lot more
    # flexible than a configuration file. Note that to make the code easier,
    # all the layers are just called `network` -- there is no need to give them
    # different names if all we return is the last one we created anyway; we
    # just used different names above for clarity.

    # Input layer and dropout (with shortcut `dropout` for `DropoutLayer`):
    network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                        input_var=input_var)
    if drop_input:
        network = lasagne.layers.dropout(network, p=drop_input)
    # Hidden layers and dropout:
    nonlin = lasagne.nonlinearities.rectify
    for _ in range(depth):
        network = lasagne.layers.DenseLayer(
                network, width, nonlinearity=nonlin)
        if drop_hidden:
            network = lasagne.layers.dropout(network, p=drop_hidden)
    # Output layer:
    softmax = lasagne.nonlinearities.softmax
    network = lasagne.layers.DenseLayer(network, 10, nonlinearity=softmax)
    return network
3. Build convolutional neural network [6]
Input layer: [Default]
- input: 1x28x28 images
network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28), input_var=input_var)
Convolutional layer: [Default]
- 32 5x5 filters (= kernels = weights)
- non-linearity: rectify (ReLU)
- weight initializer: Glorot uniform
network = lasagne.layers.Conv2DLayer(network, num_filters=32, filter_size=(5, 5), nonlinearity=lasagne.nonlinearities.rectify)
Pooling layer: [Default]
- pool size: 2 * 2
network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
Fully-connected layer: [Default]
- 256 units
- 50% dropout on its inputs
- non-linearity: rectify (ReLU)
network = lasagne.layers.DenseLayer(lasagne.layers.dropout(network, p=.5), num_units=256, nonlinearity=lasagne.nonlinearities.rectify)
Output layer: [Default]
- 10 softmax units (one per label)
- 50% dropout on its inputs
network = lasagne.layers.DenseLayer(lasagne.layers.dropout(network, p=.5), num_units=10, nonlinearity=lasagne.nonlinearities.softmax)
return network
[Architecture]
- input layer: 1 * 28 * 28 = 784 dimensions
- conv1 layer
- pool1 layer
- conv2 layer
- pool2 layer
- fc layer
- softmax output layer: 10 units (label: 0 ~ 9) with 50% dropout
def build_cnn(input_var=None):
    # As a third model, we'll create a CNN of two convolution + pooling stages
    # and a fully-connected hidden layer in front of the output layer.

    # Input layer, as usual:
    network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                        input_var=input_var)
    # This time we do not apply input dropout, as it tends to work less well
    # for convolutional layers.

    # Convolutional layer with 32 kernels of size 5x5. Strided and padded
    # convolutions are supported as well; see the docstring.
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())
    # Expert note: Lasagne provides alternative convolutional layers that
    # override Theano's choice of which implementation to use; for details
    # please see http://lasagne.readthedocs.org/en/latest/user/tutorial.html.

    # Max-pooling layer of factor 2 in both dimensions:
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # Another convolution with 32 5x5 kernels, and another 2x2 pooling:
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify)
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # A fully-connected layer of 256 units with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=.5),
            num_units=256,
            nonlinearity=lasagne.nonlinearities.rectify)

    # And, finally, the 10-unit output layer with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=.5),
            num_units=10,
            nonlinearity=lasagne.nonlinearities.softmax)

    return network
Conv2DLayer will create a convolutional layer using T.nnet.conv2d, Theano's default convolution.
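It can help to trace how the spatial size of the feature maps shrinks through the two convolution + pooling stages. The sketch below is plain arithmetic, assuming unpadded convolutions with stride 1 and non-overlapping 2x2 pooling that drops incomplete borders (an assumption about the layer defaults, to be checked against the Lasagne docstrings). The helpers conv_out and pool_out are hypothetical names.

```python
def conv_out(size, filter_size, stride=1, pad=0):
    """Output width/height of a convolution along one dimension."""
    return (size + 2 * pad - filter_size) // stride + 1

def pool_out(size, pool_size):
    """Output width/height of non-overlapping pooling (borders dropped)."""
    return size // pool_size

size = 28
size = conv_out(size, 5)   # conv1: 28 -> 24
size = pool_out(size, 2)   # pool1: 24 -> 12
size = conv_out(size, 5)   # conv2: 12 -> 8
size = pool_out(size, 2)   # pool2: 8 -> 4
flat = 32 * size * size    # 32 feature maps flattened for the dense layer
print(size, flat)          # 4 512
```

Under these assumptions, the 256-unit fully-connected layer receives 512 inputs per example.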
Dataset iteration[7]
The script first defines a short helper function for synchronously iterating over two numpy arrays of input data and targets, respectively, in mini-batches of a given number of items.
def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert len(inputs) == len(targets), "Error: inputs and targets must be the same length"
    if shuffle:
        indices = np.arange(len(inputs))
        np.random.shuffle(indices)
    # range(START, STOP[, STEP]); a trailing incomplete batch is skipped.
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
        # Indexing with a slice object, `inputs[slice(a, b)]`, is the same as `inputs[a:b]`.
        yield inputs[excerpt], targets[excerpt]
numpy.random.shuffle: Modify a sequence in-place by shuffling its contents.
iterate_minibatches is a generator function that serves one batch of inputs and targets at a time until the given dataset (in inputs and targets) is exhausted, either in sequence or in random order. Below we will plug this function into our training loop, validation loop and test loop.
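Two properties of this helper are easy to miss: a trailing batch smaller than batchsize is never served, and shuffling permutes inputs and targets with the same indices so pairs stay aligned. The sketch below repeats the function so it runs on its own and checks both properties on made-up toy arrays (toy_inputs and toy_targets are illustrative only).

```python
import numpy as np

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert len(inputs) == len(targets)
    if shuffle:
        indices = np.arange(len(inputs))
        np.random.shuffle(indices)
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
        yield inputs[excerpt], targets[excerpt]

# Toy data: 10 items, where input row i holds the value 10 * i
# and its target is i.
toy_inputs = np.arange(10).reshape(10, 1) * 10
toy_targets = np.arange(10)

batches = list(iterate_minibatches(toy_inputs, toy_targets, 4))
print(len(batches))  # 2: items 8 and 9 do not fill a batch and are skipped

shuffled = list(iterate_minibatches(toy_inputs, toy_targets, 4, shuffle=True))
for x, t in shuffled:
    # Each input is still paired with its own target after shuffling.
    print((x[:, 0] == t * 10).all())
```

With the MNIST splits and a batch size of 500, nothing is skipped because 50000 and 10000 are both multiples of 500.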
Main function[8]
Preparation
First, load the inputs X and targets (labels) y of the MNIST dataset as numpy arrays, split into training, validation and test data.
# Load the dataset
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
For the MNIST training split, the shape of X_train is (50000, 1, 28, 28) and the shape of y_train is (50000,).
Then define symbolic Theano variables that will represent a mini-batch of inputs and targets in all the Theano expressions which we will generate for network training and inference. They are not tied to any data yet, but their dimensionality and data type are already fixed and match the actual inputs and targets we will process later.
# Prepare Theano variables for inputs and targets
input_var = T.tensor4('inputs') # type: theano.tensor.var.TensorVariab
target_var = T.ivector('targets') # type: theano.tensor.var.TensorVariab
Finally, call one of the three functions (build_mlp(), build_custom_mlp(), build_cnn()) to build the Lasagne network. Note that we hand the symbolic input variable to the network-building function so it will be linked to the network's input layer.
# Create neural network model
if model == 'mlp':
    network = build_mlp(input_var)
elif model.startswith('custom_mlp:'):
    depth, width, drop_in, drop_hid = model.split(':', 1)[1].split(',')
    network = build_custom_mlp(input_var, int(depth), int(width),
                               float(drop_in), float(drop_hid))
elif model == 'cnn':
    network = build_cnn(input_var)
else:
    print("Unrecognized model type %r." % model)
    return
If you want to create a custom MLP, pass model='custom_mlp:DEPTH,WIDTH,DROP_INPUT,DROP_HIDDEN' as the argument of main().
For example:
main(model='custom_mlp:2,800,0.3,0.5')
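The model string is taken apart with two split calls: the first separates the 'custom_mlp' prefix from the settings, the second separates the four comma-delimited values, which are then converted to numbers. The sketch below isolates that step in a hypothetical helper, parse_custom_mlp, which is not part of the script.

```python
def parse_custom_mlp(model):
    """Split 'custom_mlp:DEPTH,WIDTH,DROP_INPUT,DROP_HIDDEN' into typed values."""
    # Everything after the first ':' holds the four settings.
    depth, width, drop_in, drop_hid = model.split(':', 1)[1].split(',')
    return int(depth), int(width), float(drop_in), float(drop_hid)

print(parse_custom_mlp('custom_mlp:2,800,0.3,0.5'))  # (2, 800, 0.3, 0.5)
```

A string with too few or too many values fails at the tuple unpacking with a ValueError, so all four settings must be given.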
Loss and update expressions[9]
Create a loss expression to be minimized in training:
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
loss = loss.mean()
- We could add some weight decay as well here, see lasagne.regularization.
- Depending on the problem you are solving, you will need different loss functions, see lasagne.objectives for more.
- lasagne.objectives.categorical_crossentropy: Computes the categorical cross-entropy between predictions and targets.
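With integer class targets, the categorical cross-entropy of one example reduces to the negative log of the probability that the network assigns to the correct class. The following numpy sketch shows that computation on made-up predictions. It illustrates the formula only and is not Lasagne's implementation, which builds a symbolic Theano expression instead.

```python
import numpy as np

def categorical_crossentropy(predictions, targets):
    """Per-example loss -log(p[target]) for integer class targets."""
    rows = np.arange(len(targets))
    return -np.log(predictions[rows, targets])

# Two examples, three classes; each row sums to 1 like a softmax output.
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.1, 0.8]])
targets = np.array([0, 2])

per_example = categorical_crossentropy(predictions, targets)
loss = per_example.mean()  # scalar loss, as in `loss = loss.mean()`
print(per_example, loss)
```

Taking the mean turns the per-example vector into the single scalar that the update rules need.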
Then create update expressions for training the network (i.e., how to modify the parameters at each training step). We will use Stochastic Gradient Descent (SGD) with Nesterov momentum here, but the lasagne.updates module offers several others you can plug in instead:
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(
        loss, params, learning_rate=0.01, momentum=0.9)
The first step collects all Theano SharedVariable instances making up the trainable parameters of the network, and the second step generates an update expression for each parameter.
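To make "an update expression for each parameter" more concrete, here is a numpy sketch of one common formulation of SGD with Nesterov momentum, applied to a toy quadratic objective. The formulation (update the velocity first, then step along momentum * velocity minus the scaled gradient) is an assumption based on the rule described in the Lasagne documentation, and the function nesterov_momentum_step is hypothetical. Lasagne itself produces symbolic updates rather than applying them eagerly like this.

```python
import numpy as np

def nesterov_momentum_step(param, velocity, grad,
                           learning_rate=0.01, momentum=0.9):
    """One eager update of a parameter and its velocity."""
    velocity = momentum * velocity - learning_rate * grad
    param = param + momentum * velocity - learning_rate * grad
    return param, velocity

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(500):
    w, v = nesterov_momentum_step(w, v, grad=w)

print(np.abs(w).max())  # very close to the minimum at 0
```

Each parameter carries its own velocity, which is why the update dictionary contains entries for more variables than just the trainable parameters.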
After each epoch, we evaluate the network on the validation set. For that we need a slightly different loss expression. The crucial difference is that we pass deterministic=True to the get_output call. This causes all nondeterministic layers to switch to a deterministic implementation, so in our case, it disables the dropout layers.
test_prediction = lasagne.layers.get_output(network, deterministic=True)
test_loss = lasagne.objectives.categorical_crossentropy(test_prediction,
                                                        target_var)
test_loss = test_loss.mean()
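What "disabling dropout" means can be sketched in numpy. The version below is the common inverted-dropout scheme, in which surviving activations are scaled by 1 / (1 - p) during training so that the deterministic pass can return its input unchanged. Treat the rescaling as an assumption of this sketch rather than a statement about DropoutLayer internals. The function name and signature are made up.

```python
import numpy as np

def dropout(x, p=0.5, deterministic=False, rng=None):
    """Zero each element with probability p; pass through if deterministic."""
    if deterministic or p == 0:
        return x
    rng = np.random.RandomState(0) if rng is None else rng
    keep = 1.0 - p
    mask = rng.binomial(1, keep, size=x.shape)
    # Rescale the survivors so the expected activation stays the same.
    return x * mask / keep

x = np.ones((2, 5))
dropped = dropout(x, p=0.5)                    # entries are 0.0 or 2.0
kept = dropout(x, p=0.5, deterministic=True)   # identical to x
print(dropped)
print(kept)
```

Evaluating with the deterministic path gives the same output for the same input on every call, which is what a validation or test metric needs.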
As an additional monitoring quantity, we create an expression for the classification accuracy. It also builds on the deterministic test_prediction expression.
test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),
                  dtype=theano.config.floatX)
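The accuracy expression reads inside out: argmax picks the most probable class per row, eq compares it with the target, and mean turns the resulting 0/1 vector into a fraction. A numpy sketch of the same computation on made-up predictions:

```python
import numpy as np

def accuracy(predictions, targets):
    """Fraction of rows whose most probable class equals the target."""
    return np.mean(np.argmax(predictions, axis=1) == targets)

predictions = np.array([[0.7, 0.2, 0.1],   # predicted class 0
                        [0.1, 0.1, 0.8],   # predicted class 2
                        [0.3, 0.6, 0.1]])  # predicted class 1
targets = np.array([0, 2, 0])              # the third example is misclassified

acc = accuracy(predictions, targets)
print(acc)  # two of three predictions are correct
```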
Compilation[10]
Compile a function performing a training step; it returns the corresponding training loss. Additionally, each time it is invoked, it applies all parameter updates in the updates dictionary, thus performing a gradient descent step with Nesterov momentum.
train_fn = theano.function([input_var, target_var], loss, updates=updates)
Then compile a second function computing the validation loss and accuracy. It will return the (deterministic) loss and classification accuracy, not performing any updates:
val_fn = theano.function([input_var, target_var], [test_loss, test_acc])
Training loop[11]
Finally, launch the training loop:
for epoch in range(num_epochs):  # default: num_epochs=500
    # In each epoch, we do a full pass over the training data:
    train_err = 0
    train_batches = 0
    start_time = time.time()
    for batch in iterate_minibatches(X_train, y_train, 500, shuffle=True):
        inputs, targets = batch  # batch is a tuple: (inputs[excerpt], targets[excerpt])
        train_err += train_fn(inputs, targets)
        train_batches += 1
This uses our dataset iteration helper function to iterate over the training data in random order, in mini-batches of 500 items each, for num_epochs epochs, and calls the training function train_fn we compiled to perform an update step of the network parameters.
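The number of update steps per epoch follows directly from the range used in iterate_minibatches. The sketch below counts the batches for the split sizes used here. The helper batches_per_epoch is hypothetical and only mirrors that range expression.

```python
def batches_per_epoch(n_examples, batchsize):
    """Number of full mini-batches served for one pass over the data."""
    return len(range(0, n_examples - batchsize + 1, batchsize))

print(batches_per_epoch(50000, 500))  # 100 training updates per epoch
print(batches_per_epoch(10000, 500))  # 20 validation or test batches
```

Because every batch has the same size, dividing the summed per-batch losses and accuracies by the batch count gives the same value as averaging over all examples that were served.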
After accumulating the training loss, we compute the validation loss,
    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0
    for batch in iterate_minibatches(X_val, y_val, 500, shuffle=False):
        inputs, targets = batch
        err, acc = val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1
and print some information to the console every time an epoch finishes:
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss:\t\t{:.6f}".format(train_err / train_batches))
    print("  validation loss:\t\t{:.6f}".format(val_err / val_batches))
    print("  validation accuracy:\t\t{:.2f} %".format(
        val_acc / val_batches * 100))
At the end, we re-use the val_fn() function to compute the loss and accuracy on the test set, finishing the script:
# After training, we compute and print the test error:
test_err = 0
test_acc = 0
test_batches = 0
for batch in iterate_minibatches(X_test, y_test, 500, shuffle=False):
    inputs, targets = batch
    err, acc = val_fn(inputs, targets)
    test_err += err
    test_acc += acc
    test_batches += 1
print("Final results:")
print("  test loss:\t\t\t{:.6f}".format(test_err / test_batches))
print("  test accuracy:\t\t{:.2f} %".format(
    test_acc / test_batches * 100))
Nolearn MNIST example[12]
[3] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#understand-the-mnist-example
[4] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#multi-layer-perceptron-mlp
[5] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#custom-mlp
[6] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#convolutional-neural-network-cnn
[7] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#dataset-iteration
[8] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#preparation
[9] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#loss-and-update-expressions
[10] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#compilation
[11] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#training-loop
[12] http://nbviewer.jupyter.org/github/dnouri/nolearn/blob/master/docs/notebooks/CNN_tutorial.ipynb