Intro to Digit Classification with LibTorch
Recently I’ve been considering deploying a trained digit classification model for inference in a project developed in C++. Here is a brief post on training the model and integrating the LibTorch C++ API, using a CNN-based neural network and the MNIST dataset as an example.
Training the Model
For simplicity and convenience, I still use PyTorch in Python to train the model here. However, you can also find the official example that trains the model with the C++ frontend at MNIST Example with the PyTorch C++ Frontend.
Prepare the Dataset
The MNIST dataset (Modified National Institute of Standards and Technology database) is a large database of handwritten digits. It consists of 60,000 training and 10,000 testing grayscale images of $28 \times 28$ pixels in 10 classes, and is commonly used for training various image processing systems.
If you want to build a custom MNIST-style dataset, I recommend taking a look at MNIST Database; I have also written a data_2_mnist Python script that may help you do the conversion.
First, import the necessary libraries and download the raw MNIST dataset to the ./data folder using the integrated PyTorch utilities, then set the default batch_size and define the transform function.
```python
import torch
```
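Expanding the snippet above into a runnable sketch, assuming the standard torchvision MNIST utilities together with the batch sizes and normalization constants discussed below:

```python
import torch
from torchvision import datasets, transforms

batch_size = 64         # training batch size
test_batch_size = 1000  # testing batch size

# ToTensor scales pixels to [0, 1]; Normalize then standardizes with the MNIST mean/std
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=True, download=True, transform=transform),
    batch_size=batch_size, shuffle=True)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./data', train=False, transform=transform),
    batch_size=test_batch_size, shuffle=False)
```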
Note that the batch_size affects the time required to complete each epoch and the smoothness of the gradient between iterations during training. The example above uses 64 as the batch_size for training and 1,000 for testing.
A larger batch size yields a more stable estimate of the gradient descent direction, but globally it is not necessarily the correct one: even though the gradient estimate is more accurate, how far to move in that direction still depends on the learning rate, so you can try raising the learning rate and increasing the number of iterations. In practice, however, using a smaller batch with a smaller learning rate and appropriately more epochs often achieves better generalization, because the gradient estimates of small batches have higher variance, which acts as a form of regularization.
The Normalize transform standardizes the data toward a standard normal (Gaussian) distribution (ToTensor has already scaled the pixel values to the range [0, 1] beforehand), where 0.1307 and 0.3081 are the mean and standard deviation of the MNIST dataset; both values are determined by the dataset itself.
$$
\text{Normalize}: \quad output[channel] = \frac{input[channel] - mean[channel]}{std[channel]}
$$
Next, visualize the loaded MNIST data to make sure it is in the correct format (here only the training dataset is used).
```python
def img_cvt(tensor):
    return tensor.numpy().squeeze()  # assumed body: drop the channel dim for plotting
```
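A sketch of the visualization itself, assuming matplotlib and the train_loader defined earlier (the img_cvt body above is a plausible reconstruction, not the original one):

```python
import matplotlib.pyplot as plt

# grab one batch from the training loader and show the first few digits
images, labels = next(iter(train_loader))
fig, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, img, label in zip(axes, images, labels):
    ax.imshow(img_cvt(img), cmap='gray')
    ax.set_title(int(label))
    ax.axis('off')
plt.show()
```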
Build the Neural Network
Here I simply use the small CNN model from the PyTorch MNIST Example.
```python
import torch.nn as nn
```
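The model in the referenced example is defined roughly as follows (reproduced from the PyTorch MNIST Example, so the layer sizes below come from that example):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)   # in channels, out channels, kernel, stride
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)       # 64 channels * 12 * 12 after pooling
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.dropout2(x)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
```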
Three basic building blocks appear in the code above; here is a brief description of what each does:
Conv2d implements the 2D convolution operation. The four parameters used in the code above are, from left to right: the number of input channels, the number of output channels, the size of the convolution kernel (an int when the kernel is square, or a tuple of height and width), and the stride of the convolution window.
Dropout randomly discards a portion of the neurons during each training pass. With a certain probability $p$, a neuron's activation is zeroed, so it stops working, does not have its weights updated during that pass, and does not participate in the computation of the network. Its weights are kept (just not updated for a while), and it may work again in the next training pass.
Linear sets up a fully connected layer in the network. Note that the input and output of the fully connected layer are two-dimensional tensors (batch size by features); a quick shape sanity check is sketched below.
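As a quick sanity check on these shapes (a sketch, assuming the Net class above):

```python
net = Net()
dummy = torch.zeros(1, 1, 28, 28)  # a batch of one 28x28 grayscale image
print(net(dummy).shape)            # torch.Size([1, 10]): one log-probability per class
```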
Train and Export the Model
First, define the train and test functions as follows:
```python
def train(model, device, train_loader, optimizer, epoch):
```
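A sketch of both functions, following the official PyTorch MNIST example (the logging interval is an assumption):

```python
import torch
import torch.nn.functional as F

def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)  # NLL loss pairs with log_softmax output
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print(f'Train Epoch: {epoch} '
                  f'[{batch_idx * len(data)}/{len(train_loader.dataset)}]\t'
                  f'Loss: {loss.item():.6f}')

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print(f'Test set: Average loss: {test_loss:.4f}, '
          f'Accuracy: {correct}/{len(test_loader.dataset)}')
```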
Then transfer the built model to the correct device and choose an optimizer. I used the Adadelta optimizer here; its paper is Adadelta: An Adaptive Learning Rate Method.
I want the learning rate to be multiplied by a factor gamma every certain number of steps (or epochs), so the StepLR scheduler is used here.
```python
import torch.optim as optim
```
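A sketch of this step; the lr of 1.0 and gamma of 0.7 are taken from the official example:

```python
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)

optimizer = optim.Adadelta(model.parameters(), lr=1.0)
# multiply the learning rate by gamma after every step_size epochs
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
```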
Then we can start training the model.
```python
for epoch in range(1, epochs + 1):
```
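The loop body presumably called the functions defined earlier, along these lines (the epoch count of 14 is taken from the official example):

```python
epochs = 14

for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)
    scheduler.step()  # decay the learning rate once per epoch
```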
After training, we can evaluate the model and export it to a file.
```python
data_iter = iter(test_loader)
```
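A sketch of evaluating one test batch, assuming the loaders defined earlier:

```python
data_iter = iter(test_loader)
images, labels = next(data_iter)

model.eval()
with torch.no_grad():
    preds = model(images.to(device)).argmax(dim=1).cpu()

print('predicted:   ', preds[:8].tolist())
print('ground truth:', labels[:8].tolist())
```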
Note that if we want to use the trained model in a non-Python environment, we should export it as a TorchScript Module. A Module is the basic unit of composition in PyTorch; it contains (from Basics of PyTorch Model Authoring):
- A constructor, which prepares the module for invocation
- A set of Parameters and sub-Modules. These are initialized by the constructor and can be used by the module during invocation.
- A forward function. This is the code that is run when the module is invoked.
The module can also be optimized for mobile devices; see Optimize a TorchScript Model.
Finally, we export the traced module to a file called mnist_traced.pt, which can then be used with LibTorch.
```python
blob_input = torch.zeros(1, 1, 28, 28).to(device)
```
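Tracing records the operations executed on this dummy input; a sketch of the trace-and-save step:

```python
blob_input = torch.zeros(1, 1, 28, 28).to(device)

# run the model once with the dummy input to record the computation graph
traced_module = torch.jit.trace(model, blob_input)
traced_module.save('mnist_traced.pt')
```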
Inference with the LibTorch C++ Frontend API
For inference, I prefer to wrap the LibTorch frontend API in a single header file.
Header Module Helper
As the code below shows, the class dnn::ts_mnist is initialized with the model file path and a label vector, and an _input vector is used to store the input data in later steps.
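A minimal sketch of what such a header might look like, assuming the members described above (the file name, the torch::jit::script::Module member, and everything besides _input, the constructor arguments, and the infer method are my assumptions; infer is defined in the next listing):

```cpp
// ts_mnist.hpp (placeholder name) - a minimal LibTorch wrapper for the traced MNIST module
#pragma once

#include <string>
#include <vector>

#include <opencv2/core.hpp>
#include <torch/script.h>

namespace dnn {

class ts_mnist {
public:
    ts_mnist(const std::string &model_path, const std::vector<std::string> &labels)
        : _labels(labels) {
        // deserialize the TorchScript module exported from Python
        _module = torch::jit::load(model_path);
        _module.eval();
    }

    // run the module on a color image and return the predicted label
    std::string infer(const cv::Mat &image);

private:
    torch::jit::script::Module _module;
    std::vector<std::string> _labels;
    std::vector<torch::jit::IValue> _input;  // holds the input tensor for forward()
};

} // namespace dnn
```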
The constructor of ts_mnist loads the module by calling torch::jit::load. Next, write the function for inference.
```cpp
// the minimal OpenCV header used for image processing
```
Assuming the inference function accepts a color image as its parameter, a color-to-grayscale conversion should be applied first, because the module (or the model inside it) expects a single-channel input; then force-resize the grayscale image to $28 \times 28$ to match the input shape.
Before calling the module's forward function, the image should be converted to a tensor and normalized. Referring to the input shape used at the export step, the normalized tensor is two-dimensional ($28 \times 28$), so we should add two dimensions to match the input shape ($1 \times 1 \times 28 \times 28$), simply by calling unsqueeze_(0) twice.
The output can then be obtained by calling the forward function; finally, applying an argmax to the output tensor gives the predicted class. Putting these steps together, a sketch of the inference function follows.
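This sketch follows the steps just described, under a few assumptions: a BGR input image, the 0.1307/0.3081 normalization constants from training, and an inline member definition so it can live in the same header as the class above:

```cpp
// the minimal OpenCV headers used for image processing
#include <opencv2/imgproc.hpp>

namespace dnn {

inline std::string ts_mnist::infer(const cv::Mat &image) {
    // the network expects a single-channel image: convert color to grayscale
    cv::Mat gray;
    cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);

    // force-resize to 28x28 to match the input shape
    cv::resize(gray, gray, cv::Size(28, 28));

    // scale pixels to [0, 1], then standardize with the MNIST mean/std
    gray.convertTo(gray, CV_32F, 1.0 / 255.0);
    torch::Tensor tensor =
        torch::from_blob(gray.data, {28, 28}, torch::kFloat32).clone();
    tensor = tensor.sub_(0.1307).div_(0.3081);

    // add batch and channel dimensions: 28x28 -> 1x1x28x28
    tensor.unsqueeze_(0);
    tensor.unsqueeze_(0);

    _input.clear();
    _input.emplace_back(tensor);

    // forward() returns log-probabilities; argmax picks the most likely class
    torch::Tensor output = _module.forward(_input).toTensor();
    int64_t idx = output.argmax(1).item<int64_t>();
    return _labels[idx];
}

} // namespace dnn
```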
Compile using CMake and Test
Integrating LibTorch into an existing C++ project is easy. For macOS users, just install libtorch, torchvision and opencv using Homebrew, then add the following lines to the CMakeLists.txt file.
```cmake
find_package(Torch REQUIRED)
```
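A fuller sketch of those lines, assuming a target named mnist_infer (a placeholder) and an OpenCV dependency for the image handling above:

```cmake
find_package(Torch REQUIRED)
find_package(OpenCV REQUIRED)

add_executable(mnist_infer main.cpp)
target_link_libraries(mnist_infer "${TORCH_LIBRARIES}" ${OpenCV_LIBS})
set_property(TARGET mnist_infer PROPERTY CXX_STANDARD 17)
```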
For Linux users, apt or alternative package managers may not handle this well, and CMake's find_package would keep reporting that the packages were not found. Things get a little more complicated if you use the official pre-compiled version of LibTorch: you may encounter linker errors caused by ABI mismatch (the pre-cxx11 ABI builds), and compiling LibTorch manually may fix the problem (manually compiling OpenCV is also recommended, if truly needed). Alternatively, you can specify a path to the pre-built libraries like this so that CMake can find them.
```cmake
set(CMAKE_FIND_ROOT_PATH <libraries path>)
```
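Passing the path at configure time also works, which is the approach the LibTorch docs use (the path below is still a placeholder):

```cmake
# equivalent: point CMake at the unpacked libtorch distribution at configure time
#   cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch ..
list(APPEND CMAKE_PREFIX_PATH "<libraries path>")
```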
For MSVC users, please take a look at the official LibTorch minimal example for more details.
Then run a simple test. The following code takes the module file path as its first argument, the path of the image to run inference on as the second, and the label index as the third.
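A sketch of such a test driver, assuming the ts_mnist helper above and interpreting the third argument as the expected label index:

```cpp
// main.cpp - a hypothetical test driver; argument handling follows the description above
#include <iostream>
#include <string>
#include <vector>

#include <opencv2/imgcodecs.hpp>

#include "ts_mnist.hpp"  // placeholder name for the header sketched earlier

int main(int argc, char *argv[]) {
    if (argc < 4) {
        std::cerr << "usage: " << argv[0] << " <module path> <image path> <label index>\n";
        return 1;
    }

    // digit labels 0-9 for MNIST
    std::vector<std::string> labels;
    for (int i = 0; i < 10; ++i) labels.push_back(std::to_string(i));

    dnn::ts_mnist mnist(argv[1], labels);

    cv::Mat image = cv::imread(argv[2], cv::IMREAD_COLOR);
    if (image.empty()) {
        std::cerr << "failed to read image: " << argv[2] << "\n";
        return 1;
    }

    std::cout << "predicted: " << mnist.infer(image)
              << ", expected: " << labels[std::stoi(argv[3])] << std::endl;
    return 0;
}
```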
As an intro to digit classification with LibTorch, I haven't tested the LibTorch C++ frontend API on CUDA devices yet; maybe I could do that later :D