Detailed description

Check:


Lasagne MNIST example[3]

Import Theano and numpy

Since Lasagne is built on top of Theano, it is meant as a supplement helping with some tasks, not as a replacement. You will always mix Lasagne with some Theano code.

import sys
import os
import time

import numpy as np
import theano
import theano.tensor as T

import lasagne

Loading the MNIST dataset

def load_dataset():
    # We first define a download function, supporting both Python 2 and 3.
    if sys.version_info[0] == 2:
        from urllib import urlretrieve
    else:
        from urllib.request import urlretrieve
    # urlretrieve(url, filename=None, reporthook=None, data=None)
    # copies a network object denoted by a URL to a local file if necessary.
    # It returns a tuple (filename, headers):
    #   1. filename: the local file name under which the object can be found.
    #   2. headers: whatever the info() method of the object returned by
    #      urlopen() returned.
    # download mnist files from 'http://yann.lecun.com/exdb/mnist/'
    def download(filename, source='http://yann.lecun.com/exdb/mnist/'):
        print("Downloading %s" % filename)
        urlretrieve(source + filename, filename)

    # We then define functions for loading MNIST images and labels.
    # For convenience, they also download the requested files if needed.
    import gzip

    def load_mnist_images(filename):
        if not os.path.exists(filename):
            download(filename)
        # Read the inputs in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=16)
        # The inputs are vectors now, we reshape them to monochrome 2D images,
        # following the shape convention: 
        #   reshape(examples, channels, rows, columns)
        data = data.reshape(-1, 1, 28, 28)

        #### Normalization ####
        # The inputs come as bytes, we convert them to float32 in range [0,1].
        # (Actually to range [0, 255/256], for compatibility to the version
        # provided at http://deeplearning.net/data/mnist/mnist.pkl.gz.)
        return data / np.float32(256)

    def load_mnist_labels(filename):
        if not os.path.exists(filename):
            download(filename)
        # Read the labels in Yann LeCun's binary format.
        with gzip.open(filename, 'rb') as f:
            data = np.frombuffer(f.read(), np.uint8, offset=8)
        # The labels are vectors of integers now, that's exactly what we want.
        return data

    # We can now download and read the training and test set images and labels.
    X_train = load_mnist_images('train-images-idx3-ubyte.gz')
    y_train = load_mnist_labels('train-labels-idx1-ubyte.gz')
    X_test = load_mnist_images('t10k-images-idx3-ubyte.gz')
    y_test = load_mnist_labels('t10k-labels-idx1-ubyte.gz')

    # We reserve the last 10000 training examples for validation.
    X_train, X_val = X_train[:-10000], X_train[-10000:]
    y_train, y_val = y_train[:-10000], y_train[-10000:]

    # We just return all the arrays in order, as expected in main().
    # (It doesn't matter how we do this as long as we can read them again.)
    return X_train, y_train, X_val, y_val, X_test, y_test
  • X_train.shape is (50000, 1, 28, 28), to be interpreted as:
    50,000 images of 1 channel, 28 rows and 28 columns each. (The number of channels is 1 because the input is monochrome.)
  • y_train.shape is simply (50000,).
    That is, it is a vector of the same length as X_train giving an integer class label (target) for each image: the digit between 0 and 9 depicted in the image (according to the human annotator who drew that digit).
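
The offset=16 and offset=8 arguments in the loaders skip the header of each file. As a minimal sketch, assuming the IDX layout described on the MNIST page (big-endian 32-bit integers: a magic number, the item count and, for images, the row and column counts), the following parses a tiny synthetic image buffer with the standard library only. The helper name parse_idx_images and the 2x2 test data are illustrative and not part of the original script.

```python
import struct

def parse_idx_images(raw):
    """Parse an (uncompressed) IDX image buffer into nested lists of floats.

    Assumed layout: 4 big-endian uint32 values (magic, count, rows, cols),
    i.e. a 16-byte header, followed by one unsigned byte per pixel.
    """
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:  # 0x00000803: unsigned bytes, 3 dimensions
        raise ValueError("unexpected magic number: %d" % magic)
    pixels = raw[16:]
    if len(pixels) != count * rows * cols:
        raise ValueError("pixel data does not match the header")
    images = []
    for i in range(count):
        image = []
        for r in range(rows):
            start = (i * rows + r) * cols
            # Same scaling as in load_mnist_images: bytes -> [0, 255/256].
            image.append([p / 256.0 for p in pixels[start:start + cols]])
        images.append(image)
    return images

# Synthetic stand-in for a real file: 2 images of 2x2 pixels.
header = struct.pack(">IIII", 2051, 2, 2, 2)
body = bytes([0, 64, 128, 255, 255, 128, 64, 0])
images = parse_idx_images(header + body)
print(images[0])
```

The label files follow the same idea with a shorter header (magic number and item count only), which is why load_mnist_labels skips 8 bytes instead of 16.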

Building the neural network model

This script supports three types of models. For each one, it defines a function that takes a Theano variable representing the input and returns the output layer of a neural network model built in Lasagne.

1. Build multi-layer perceptron[4] (a fixed architecture)

  • 1 input layer: 1 * 28 * 28 = 784 dimensions
  • input dropout: 20% to input layer
  • 2 hidden layers: 800 units per layer
  • output dropout: 50% to each hidden layer
  • 1 softmax output layer: 10 units (label: 0 ~ 9)
def build_mlp(input_var=None):
    # This creates an MLP(multi-layer perceptron) of two hidden layers of 800 units each, followed by
    # a softmax output layer of 10 units. It applies 20% dropout to the input
    # data and 50% dropout to the hidden layers.

    # Input layer, specifying the expected input shape of the network
    # (unspecified batchsize, 1 channel, 28 rows and 28 columns) and
    # linking it to the given Theano variable `input_var`, if any:
    l_in = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                     input_var=input_var)

    # Apply 20% dropout to the input data:
    l_in_drop = lasagne.layers.DropoutLayer(l_in, p=0.2)

    # Add a fully-connected layer of 800 units, using the linear rectifier, and
    # initializing weights with Glorot's scheme (which is the default anyway):
    l_hid1 = lasagne.layers.DenseLayer(
            l_in_drop, num_units=800,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    # We'll now add dropout of 50%:
    l_hid1_drop = lasagne.layers.DropoutLayer(l_hid1, p=0.5)

    # Another 800-unit layer:
    l_hid2 = lasagne.layers.DenseLayer(
            l_hid1_drop, num_units=800,
            nonlinearity=lasagne.nonlinearities.rectify)

    # 50% dropout again:
    l_hid2_drop = lasagne.layers.DropoutLayer(l_hid2, p=0.5)

    # Finally, we'll add the fully-connected output layer, of 10 softmax units:
    l_out = lasagne.layers.DenseLayer(
            l_hid2_drop, num_units=10,
            nonlinearity=lasagne.nonlinearities.softmax)

    # Each layer is linked to its incoming layer(s), so we only need to pass
    # the output layer to give access to a network in Lasagne:
    return l_out
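
To get a feeling for the size of this network, the trainable parameters can be counted by hand. This is a rough sketch that assumes each dense layer holds one weight matrix plus one bias vector, that the 1x28x28 input is flattened to 784 values, and that dropout layers add no parameters. The function names are made up for this illustration.

```python
def dense_param_count(num_inputs, num_units):
    """Weights plus biases of one fully-connected layer."""
    return num_inputs * num_units + num_units

def mlp_param_count(input_dim=1 * 28 * 28, depth=2, width=800, num_classes=10):
    """Count trainable parameters of the MLP; dropout layers add none."""
    total = 0
    fan_in = input_dim
    for _ in range(depth):
        total += dense_param_count(fan_in, width)
        fan_in = width
    # Final softmax layer.
    return total + dense_param_count(fan_in, num_classes)

mlp_params = mlp_param_count()
print(mlp_params)
```

Under these assumptions the default architecture has about 1.28 million parameters, almost all of them in the two 800-unit hidden layers.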

2. Build custom multi-layer perceptron[5] (a custom architecture)

  • Same as the MLP, but with a configurable number (depth) and size (width) of hidden layers
def build_custom_mlp(input_var=None, depth=2, width=800, drop_input=.2,
                     drop_hidden=.5):
    # By default, this creates the same network as `build_mlp`, but it can be
    # customized with respect to the number and size of hidden layers. This
    # mostly showcases how creating a network in Python code can be a lot more
    # flexible than a configuration file. Note that to make the code easier,
    # all the layers are just called `network` -- there is no need to give them
    # different names if all we return is the last one we created anyway; we
    # just used different names above for clarity.

    # Input layer and dropout (with shortcut `dropout` for `DropoutLayer`):
    network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                        input_var=input_var)
    if drop_input:
        network = lasagne.layers.dropout(network, p=drop_input)
    # Hidden layers and dropout:
    nonlin = lasagne.nonlinearities.rectify
    for _ in range(depth):
        network = lasagne.layers.DenseLayer(
                network, width, nonlinearity=nonlin)
        if drop_hidden:
            network = lasagne.layers.dropout(network, p=drop_hidden)
    # Output layer:
    softmax = lasagne.nonlinearities.softmax
    network = lasagne.layers.DenseLayer(network, 10, nonlinearity=softmax)
    return network

3. Build convolutional neural network[6]

  • input layer: [Default]

    • input 1x28x28 images
    network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
          input_var=input_var)
    
  • convolutional layer: [Default]

    • 32 5x5 filters(=kernels=weights)
    • non-linearity: rectify(ReLU)
    • weight initializer: Glorot Uniform
    network = lasagne.layers.Conv2DLayer(
          network, num_filters=32, filter_size=(5, 5),
          nonlinearity=lasagne.nonlinearities.rectify)
    
  • pooling layer: [Default]

    • pool size: 2 * 2
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
    
  • fully-connected layer: [Default]

    • 256 units
    • 50% dropout on each unit input
    • non-linearity: rectify(ReLU)
    network = lasagne.layers.DenseLayer(
          lasagne.layers.dropout(network, p=.5),
          num_units=256,
          nonlinearity=lasagne.nonlinearities.rectify)
    
  • output layer: [Default]

    • output 10 labels
    • 50% dropout
    network = lasagne.layers.DenseLayer(
          lasagne.layers.dropout(network, p=.5),
          num_units=10,
          nonlinearity=lasagne.nonlinearities.softmax)
    return network
    
    [Architecture]
    • input layer: 1 * 28 * 28 = 784 dimensions
    • conv1 layer
    • pool1 layer
    • conv2 layer
    • pool2 layer
    • fc layer
    • softmax output layer: 10 units (label: 0 ~ 9) with 50% dropout
def build_cnn(input_var=None):
    # As a third model, we'll create a CNN of two convolution + pooling stages
    # and a fully-connected hidden layer in front of the output layer.

    # Input layer, as usual:
    network = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                        input_var=input_var)
    # This time we do not apply input dropout, as it tends to work less well
    # for convolutional layers.

    # Convolutional layer with 32 kernels of size 5x5. Strided and padded
    # convolutions are supported as well; see the docstring.
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())
    # Expert note: Lasagne provides alternative convolutional layers that
    # override Theano's choice of which implementation to use; for details
    # please see http://lasagne.readthedocs.org/en/latest/user/tutorial.html.

    # Max-pooling layer of factor 2 in both dimensions:
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # Another convolution with 32 5x5 kernels, and another 2x2 pooling:
    network = lasagne.layers.Conv2DLayer(
            network, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify)
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))

    # A fully-connected layer of 256 units with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=.5),
            num_units=256,
            nonlinearity=lasagne.nonlinearities.rectify)

    # And, finally, the 10-unit output layer with 50% dropout on its inputs:
    network = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(network, p=.5),
            num_units=10,
            nonlinearity=lasagne.nonlinearities.softmax)

    return network
  • Conv2DLayer creates a convolutional layer using T.nnet.conv2d, Theano’s default convolution.
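
The feature-map sizes in build_cnn can be traced with a little arithmetic. This sketch assumes unpadded convolutions with stride 1 and pooling whose stride equals the pool size and which drops incomplete windows; these are believed to match the layer settings used above, but treat them as assumptions. The helper names are illustrative.

```python
def conv_output_size(size, filter_size, stride=1, pad=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * pad - filter_size) // stride + 1

def pool_output_size(size, pool_size):
    """Output length of non-overlapping pooling, dropping incomplete windows."""
    return size // pool_size

size = 28
shapes = [(1, size, size)]          # (channels, rows, columns) of the input
for num_filters in (32, 32):        # two convolution + pooling stages
    size = conv_output_size(size, 5)
    shapes.append((num_filters, size, size))
    size = pool_output_size(size, 2)
    shapes.append((num_filters, size, size))

flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
for shape in shapes:
    print(shape)
print("inputs to the 256-unit dense layer:", flattened)
```

Under these assumptions the spatial size shrinks from 28 to 24, 12, 8 and finally 4, so the fully-connected layer sees 32 * 4 * 4 = 512 values per image.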

Dataset iteration[7]

The script first defines a short helper function for synchronously iterating over two numpy arrays, the input data and the targets, in mini-batches of a given number of items.

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert len(inputs) == len(targets), "Error: inputs and targets must be the same length"
    if shuffle:
        indices = np.arange(len(inputs))
        np.random.shuffle(indices)
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize): #range(START,STOP[,STEP])
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
            # Indexing with a slice object, `inputs[slice(a, b)]`, is the
            # same as `inputs[a:b]`.
        yield inputs[excerpt], targets[excerpt]

numpy.random.shuffle: Modify a sequence in-place by shuffling its contents.

It is a generator function that serves one batch of inputs and targets at a time until the given dataset (in inputs and targets) is exhausted, either in sequence or in random order. Below we will plug this function into our training loop, validation loop and test loop.
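
One detail of the range() call in iterate_minibatches is easy to miss: only full batches are produced, so when the dataset size is not a multiple of batchsize, the remaining items are skipped. The sketch below reproduces just the index arithmetic in plain Python; minibatch_starts is a hypothetical helper, not part of the script.

```python
def minibatch_starts(num_items, batchsize):
    """Start indices produced by the range() call in iterate_minibatches."""
    return list(range(0, num_items - batchsize + 1, batchsize))

# Small example: 10 items in batches of 4.
starts = minibatch_starts(10, 4)
covered = len(starts) * 4
left_out = 10 - covered
print(starts)
print("items left out:", left_out)

# The training set used here: 50,000 items in batches of 500.
mnist_batches = len(minibatch_starts(50000, 500))
print("batches per epoch:", mnist_batches)
```

With 50,000 training examples and a batch size of 500 nothing is lost. With shuffle=True and a size that does not divide evenly, a different set of items would be skipped in each epoch.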

Main function[8]

Preparation

First, load the inputs (X) and targets (labels, y) of the MNIST dataset as numpy arrays, split into training, validation and test data.

# Load the dataset
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()

For the MNIST training split, the shape of X_train is (50000, 1, 28, 28) and the shape of y_train is (50000,).

Then define symbolic Theano variables that will represent a mini-batch of inputs and targets in all the Theano expressions which we will generate for network training and inference. They are not tied to any data yet, but their dimensionality and data type are fixed already and match the actual inputs and targets we will process later.

# Prepare Theano variables for inputs and targets
input_var = T.tensor4('inputs')   # type: theano.tensor.var.TensorVariab
target_var = T.ivector('targets') # type: theano.tensor.var.TensorVariab

See theano.tensor.var

Finally, call one of the three functions (build_mlp(), build_custom_mlp(), build_cnn()) to build the Lasagne network. Note that we hand the symbolic input variable to the network-building function so that it is linked to the network’s input layer.

# Create neural network model
if model == 'mlp':
    network = build_mlp(input_var)
elif model.startswith('custom_mlp:'):
    depth, width, drop_in, drop_hid = model.split(':', 1)[1].split(',')
    network = build_custom_mlp(input_var, int(depth), int(width),
                               float(drop_in), float(drop_hid))
elif model == 'cnn':
    network = build_cnn(input_var)
else:
    print("Unrecognized model type %r." % model)
    return

To create a custom MLP, pass model='custom_mlp:DEPTH,WIDTH,DROP_INPUT,DROP_HIDDEN' as the argument of main().
For example:

main(model='custom_mlp:2,800,0.3,0.5')
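
The parsing of the model string can be tried out on its own. This is a small sketch that mirrors the split logic shown above and adds error checks that the original script does not have; parse_custom_mlp is a hypothetical helper name.

```python
def parse_custom_mlp(model):
    """Split 'custom_mlp:DEPTH,WIDTH,DROP_INPUT,DROP_HIDDEN' into typed values."""
    prefix = "custom_mlp:"
    if not model.startswith(prefix):
        raise ValueError("not a custom_mlp specification: %r" % model)
    fields = model.split(":", 1)[1].split(",")
    if len(fields) != 4:
        raise ValueError("expected 4 comma-separated values, got %d" % len(fields))
    depth, width, drop_in, drop_hid = fields
    return int(depth), int(width), float(drop_in), float(drop_hid)

spec = parse_custom_mlp("custom_mlp:2,800,0.3,0.5")
print(spec)
```

The four values map directly onto the depth, width, drop_input and drop_hidden arguments of build_custom_mlp().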

Loss and update expressions[9]

Create a loss expression to be minimized in training:

prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
loss = loss.mean()
  1. We could add some weight decay as well here, see lasagne.regularization.
  2. Depending on the problem you are solving, you will need different loss functions, see lasagne.objectives for more.
  3. lasagne.objectives.categorical_crossentropy: Computes the categorical cross-entropy between predictions and targets.
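
For integer class targets, the categorical cross-entropy of one example is the negative log of the probability the network assigns to the correct class. The following plain-Python sketch shows that calculation and the subsequent mean on made-up numbers. It illustrates the formula only and is not how Lasagne computes it symbolically.

```python
import math

def categorical_crossentropy(predictions, targets):
    """Per-example loss -log(p[target]) for integer class targets."""
    return [-math.log(row[t]) for row, t in zip(predictions, targets)]

# Made-up softmax outputs for two examples and three classes.
predictions = [[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1]]
targets = [0, 1]

losses = categorical_crossentropy(predictions, targets)
mean_loss = sum(losses) / len(losses)   # corresponds to loss.mean()
print(losses, mean_loss)
```

A confident correct prediction gives a loss close to zero, while a probability near zero for the correct class makes the loss grow without bound.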

Then create update expressions for training the network (i.e., how to modify the parameters at each training step). We will use Stochastic Gradient Descent (SGD) with Nesterov momentum here, but the lasagne.updates module offers several others you can plug in instead:

params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(
        loss, params, learning_rate=0.01, momentum=0.9)

The first step collects all Theano SharedVariable instances making up the trainable parameters of the network, and the second step generates an update expression for each parameter.
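
To make the update expressions less abstract, here is a scalar sketch of one common formulation of SGD with Nesterov momentum, using the same learning rate and momentum as above. It assumes the update rule velocity := momentum * velocity - learning_rate * gradient followed by param := param + momentum * velocity - learning_rate * gradient, which is believed to correspond to what lasagne.updates.nesterov_momentum builds symbolically. The toy objective f(x) = x**2 and the function name are purely illustrative.

```python
def nesterov_step(param, velocity, grad, learning_rate=0.01, momentum=0.9):
    """One update of a scalar parameter; returns (new_param, new_velocity)."""
    velocity = momentum * velocity - learning_rate * grad
    param = param + momentum * velocity - learning_rate * grad
    return param, velocity

# Toy problem (an assumption for illustration): minimize f(x) = x**2.
x, v = 5.0, 0.0
for _ in range(500):
    grad = 2.0 * x          # derivative of x**2
    x, v = nesterov_step(x, v, grad)
print(x)
```

In the real script there is one such velocity per parameter tensor, stored as an additional shared variable, and the gradients come from Theano's symbolic differentiation of the loss.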

After each epoch, we evaluate the network on the validation set. For that, we need a slightly different loss expression, shown below. The crucial difference is that we pass deterministic=True to the get_output call. This causes all nondeterministic layers to switch to a deterministic implementation, so in our case, it disables the dropout layers.

test_prediction = lasagne.layers.get_output(network, deterministic=True)
test_loss = lasagne.objectives.categorical_crossentropy(test_prediction,
                                                        target_var)
test_loss = test_loss.mean()
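
The effect of the deterministic flag on a dropout layer can be mimicked in a few lines of plain Python. This sketch assumes "inverted" dropout, where surviving values are scaled by 1 / (1 - p) during training so that nothing needs to change at test time; the function below is an illustration and not Lasagne's implementation.

```python
import random

def dropout(values, p, deterministic=False, rng=random):
    """Drop each value with probability p; pass through if deterministic."""
    if deterministic or p == 0:
        return list(values)
    keep = 1.0 - p
    # Survivors are scaled by 1/keep so the expected output equals the input.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

activations = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(0)  # fixed seed so the example is repeatable

train_out = dropout(activations, p=0.5, rng=rng)
test_out = dropout(activations, p=0.5, deterministic=True)
print(train_out)   # a random mix of zeros and doubled values
print(test_out)    # identical to the input
```

This is why the validation and test expressions are built with deterministic=True: the evaluation should not depend on which units happened to be dropped.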

As an additional monitoring quantity, we create an expression for the classification accuracy. It also builds on the deterministic test_prediction expression.

test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),
                  dtype=theano.config.floatX)
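
The accuracy expression compares the index of the largest predicted probability in each row with the target label and averages the matches. A plain-Python equivalent on made-up predictions looks like this (the function name and data are illustrative):

```python
def accuracy(predictions, targets):
    """Fraction of rows whose largest probability is at the target index."""
    correct = 0
    for row, target in zip(predictions, targets):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += int(predicted == target)
    return correct / float(len(targets))

# Four made-up examples with two classes each.
predictions = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
targets = [1, 0, 0, 0]

acc = accuracy(predictions, targets)
print(acc)
```

Three of the four predicted classes match their targets here, so the result is 0.75.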

Compilation[10]

Compile a function performing a training step; it returns the corresponding training loss. Additionally, each time it is invoked, it applies all parameter updates in the updates dictionary, thus performing a gradient descent step with Nesterov momentum.

train_fn = theano.function([input_var, target_var], loss, updates=updates)

Then compile a second function computing the validation loss and accuracy. It will return the (deterministic) loss and classification accuracy, not performing any updates:

val_fn = theano.function([input_var, target_var], [test_loss, test_acc])

Training loop[11]

Finally, launch the training loop:

for epoch in range(num_epochs):   # default: num_epochs=500
    # In each epoch, we do a full pass over the training data:
    train_err = 0
    train_batches = 0
    start_time = time.time()
    for batch in iterate_minibatches(X_train, y_train, 500, shuffle=True):
        inputs, targets = batch    # batch = [ inputs[i], targets[i] ]
        train_err += train_fn(inputs, targets)
        train_batches += 1

This uses our dataset iteration helper function to iterate over the training data in random order, in mini-batches of 500 items each, for num_epochs epochs, and calls the training function train_fn we compiled to perform an update step of the network parameters.

After accumulating the training loss, we compute the validation loss,

    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0
    for batch in iterate_minibatches(X_val, y_val, 500, shuffle=False):
        inputs, targets = batch
        err, acc = val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1

and print some information to the console every time an epoch finishes

    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss:\t\t{:.6f}".format(train_err / train_batches))
    print("  validation loss:\t\t{:.6f}".format(val_err / val_batches))
    print("  validation accuracy:\t\t{:.2f} %".format(
        val_acc / val_batches * 100))

At the end, we re-use the val_fn() function to compute the loss and accuracy on the test set, finishing the script:

# After training, we compute and print the test error:
test_err = 0
test_acc = 0
test_batches = 0
for batch in iterate_minibatches(X_test, y_test, 500, shuffle=False):
    inputs, targets = batch
    err, acc = val_fn(inputs, targets)
    test_err += err
    test_acc += acc
    test_batches += 1
print("Final results:")
print("  test loss:\t\t\t{:.6f}".format(test_err / test_batches))
print("  test accuracy:\t\t{:.2f} %".format(
    test_acc / test_batches * 100))

Nolearn MNIST example[12]


[3] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#understand-the-mnist-example

[4] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#multi-layer-perceptron-mlp

[5] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#custom-mlp

[6] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#convolutional-neural-network-cnn

[7] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#dataset-iteration

[8] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#preparation

[9] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#loss-and-update-expressions

[10] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#compilation

[11] https://lasagne.readthedocs.io/en/latest/user/tutorial.html#training-loop

[12] http://nbviewer.jupyter.org/github/dnouri/nolearn/blob/master/docs/notebooks/CNN_tutorial.ipynb
