Detailed description
Fetch dataset
- Enter into the root directory of Caffe.
- Run get_cifar10.sh to download the data from the CIFAR-10 website, and create_cifar10.sh to convert the data to LMDB format:
./data/cifar10/get_cifar10.sh
./examples/cifar10/create_cifar10.sh
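If the two scripts finish without errors, the converted data end up under examples/cifar10/ (these are the paths that the network definition below expects):
examples/cifar10/cifar10_train_lmdb    # training images and labels
examples/cifar10/cifar10_test_lmdb     # testing images and labels
examples/cifar10/mean.binaryproto      # per-pixel mean image used for mean subtraction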
Define the network
The network has already been defined in $CaffeRoot/examples/cifar10/cifar10_quick_train_test.prototxt, so you don't have to write one yourself.
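If you want a picture of the architecture before diving into the prototxt, Caffe ships a small helper script that renders a network definition as an image. This step is optional and assumes pydot and graphviz are installed; the exact options may vary between Caffe versions:
python python/draw_net.py examples/cifar10/cifar10_quick_train_test.prototxt cifar10_quick.png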
Define the data layer
layer { name: "cifar" type: "Data" top: "data" top: "label" include { phase: TRAIN } transform_param { mean_file: "examples/cifar10/mean.binaryproto" } data_param { source: "examples/cifar10/cifar10_train_lmdb" batch_size: 100 backend: LMDB } }
The "cifar" layer has type "Data", since it reads data from lmdb file. And since it is the first layer in the network, it doesn't have a bottom field (which specifies where the data come from). It has 2 top fields which specify where the image contents and the label go to.
In include, we set the phase to TRAIN, which means these data are used to train the network. transform_param specifies the mean_file used for mean subtraction. data_param specifies where the LMDB data are located and the batch_size for each iteration of gradient descent.
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/cifar10_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
This layer is similar to the first one, except that the phase is set to TEST, so it imports the testing data.
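The mean.binaryproto file referenced by both data layers is normally generated by create_cifar10.sh. If you ever need to regenerate it yourself, Caffe's compute_image_mean tool can build it from the training LMDB; the following invocation is a sketch, so check the flags accepted by your Caffe build:
./build/tools/compute_image_mean -backend=lmdb examples/cifar10/cifar10_train_lmdb examples/cifar10/mean.binaryproto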
layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 } param { lr_mult: 2 } convolution_param { num_output: 32 pad: 2 kernel_size: 5 stride: 1 weight_filler { type: "gaussian" std: 0.0001 } bias_filler { type: "constant" } } }
This layer receives data from the "data" layer and outputs to "conv1". The two param blocks set the learning rate multipliers for the weights and the biases. In convolution_param, we set the number of filters to 32, pad the image with 2 pixels, set the kernel size to 5x5, and move the filter 1 pixel at each step. weight_filler and bias_filler specify how the weights and biases are initialized.
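As a quick sanity check on these numbers (assuming the 32x32 CIFAR-10 input images), the output size of a convolution is (input + 2*pad - kernel_size) / stride + 1 = (32 + 2*2 - 5) / 1 + 1 = 32, so conv1 preserves the 32x32 spatial size while producing 32 feature maps.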
layer { name: "pool1" type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "relu1" type: "ReLU" bottom: "pool1" top: "pool1" } layer { name: "ip1" type: "InnerProduct" bottom: "pool3" top: "ip1" param { lr_mult: 1 } param { lr_mult: 2 } inner_product_param { num_output: 64 weight_filler { type: "gaussian" std: 0.1 } bias_filler { type: "constant" } } }
These layers introduce no new syntax. Note that ip1 takes pool3, not pool1, as its bottom: the conv2/pool2 and conv3/pool3 layers that appear in the full prototxt are omitted here for brevity.
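The pooling output size follows similar arithmetic to the convolution above, except that Caffe rounds up: pool1 turns the 32x32 conv1 output into ceil((32 - 3) / 2) + 1 = 16, i.e. a 16x16 map per channel.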
Define the accuracy and loss layers
layer { name: "accuracy" type: "Accuracy" bottom: "ip2" bottom: "label" top: "accuracy" include { phase: TEST } } layer { name: "loss" type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" top: "loss" }
The accuracy layer runs only in the TEST phase and reports the accuracy every 500 training iterations. The interval at which the accuracy is reported is defined by test_interval in the solver file below.
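For reference, SoftmaxWithLoss combines a softmax over the class scores produced by ip2 (one score per CIFAR-10 class) with the multinomial logistic loss on the result: with z the ip2 output and k the true label, p_k = exp(z_k) / sum_j exp(z_j) and loss = -log(p_k). Using the combined layer is numerically more stable than stacking a separate Softmax layer followed by a log loss.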
Define the solver
In the previous chapter, Quick cheatsheet, we simply executed the ./examples/cifar10/train_quick.sh script. Here is its content.
#!/usr/bin/env sh
TOOLS=./build/tools
$TOOLS/caffe train \
--solver=examples/cifar10/cifar10_quick_solver.prototxt
# reduce learning rate by factor of 10 after 8 epochs
$TOOLS/caffe train \
--solver=examples/cifar10/cifar10_quick_solver_lr1.prototxt \
--snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
We first train the network using cifar10_quick_solver.prototxt and then continue training using cifar10_quick_solver_lr1.prototxt. The only difference between the two files is that the learning rate is cut to 1/10 of its value for the last 1000 iterations.
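As a sketch (based on the stock Caffe example; check examples/cifar10/cifar10_quick_solver_lr1.prototxt in your copy), the second solver differs only in a couple of lines:
base_lr: 0.0001  # 1/10 of the first stage's 0.001
max_iter: 5000   # 1000 more iterations on top of the 4000 already trained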
Here is the content of the solver file examples/cifar10/cifar10_quick_solver.prototxt.
# reduce the learning rate after 8 epochs (4000 iters) by a factor of 10
# The train/test net protocol buffer definition
net: "examples/cifar10/cifar10_quick_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of CIFAR-10, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 4000
# snapshot intermediate results
snapshot: 4000
snapshot_format: HDF5
snapshot_prefix: "examples/cifar10/cifar10_quick"
# solver mode: CPU or GPU
solver_mode: GPU
We can set the network definition, the hyperparameters, and CPU or GPU mode in this file. Note the test_iter and test_interval settings: Caffe runs 100 test iterations with the batch size of 100 defined in the network definition file, so all 10,000 testing images are used each time a test takes place, and test_interval says such a test is run every 500 training iterations.
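The epoch bookkeeping in the solver comment works out the same way: CIFAR-10 has 50,000 training images and the training data layer uses batch_size 100, so one epoch is 500 iterations and 8 epochs correspond to the 4000 iterations after which train_quick.sh lowers the learning rate.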