A lightweight, cross-platform, header-only library written in standard C++ for tensor arithmetic with automatic differentiation, designed to closely mimic the famous PyTorch library in usage and appearance, but with improved compile-time safety checks.
Here's an example that trains a small neural network with two hidden layers to learn the identity function. This code can be run from demo.cpp. Most of it should look and feel immediately familiar to a PyTorch user.
// layer sizes
constexpr std::size_t InputDim = 4;
constexpr std::size_t HiddenDim = 16;
constexpr std::size_t OutputDim = 4;
// learnable network parameters
auto W0 = scorch::rand<float, HiddenDim, InputDim>();
auto b0 = scorch::rand<float, HiddenDim>();
auto W1 = scorch::rand<float, HiddenDim, HiddenDim>();
auto b1 = scorch::rand<float, HiddenDim>();
auto W2 = scorch::rand<float, OutputDim, HiddenDim>();
auto b2 = scorch::rand<float, OutputDim>();
// optimizer
// learning rate, momentum ratio, parameters...
auto opt = scorch::optim::SGD(0.1f, 0.8f, W0, b0, W1, b1, W2, b2);
// batch size
constexpr std::size_t BatchDim = 16;
for (auto i = 0; i < 100; ++i) {
    // random input
    auto x = scorch::rand<float, BatchDim, InputDim>();
    // identity function: output is equal to input
    auto y = copy(x);
    // compute the network output
    // Yes, it's actually this simple
    auto y_hat = sigmoid(sigmoid(x % W0 + b0) % W1 + b1) % W2 + b2;
    // compute the loss
    auto l = mean((y_hat - y) ^ 2.0f);
    // don't forget to zero the gradients before back-propagation
    opt.zero_grad();
    // compute the gradients of all parameters w.r.t. the loss
    l.backward();
    // take a training step
    opt.step();
}

Notable features:
- Support for vector, matrix, and tensor variables with arbitrarily many dimensions
- Support for scalar variables
- Element-wise functions, broadcasting semantics*, matrix-vector multiplication, and more.
- The usual overloaded operators, plus % for matrix-vector multiplication and ^ for exponentiation.
- Extremely ergonomic syntax for writing expressions (see the example)
- Compile-time checking of tensor shape compatibility (!!!), illustrated in the sketch after this list
- Automatic differentiation using reverse-mode gradient computation
- Dynamic computational graphs
- Optimizers (only SGD for now)
- Tested with MSVC, GCC, and Clang
* Broadcasting semantics are only supported for pairs of tensors whose shapes are identical except that one may have additional higher dimensions. For example, a size 3x5x7 tensor is broadcastable with a size 5x7 tensor and a size 7 tensor, but a size 3x5x7 tensor is not broadcastable with a size 1x1x7 tensor, or a size 1x1x1 tensor.
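To make these rules concrete, here is a minimal sketch of how broadcasting and the compile-time shape checks behave, assuming element-wise operators follow the rule described above (the variable names are purely illustrative):

// shape compatibility is verified at compile time, so mismatches are build errors rather than runtime crashes
auto a = scorch::rand<float, 3, 5, 7>();
auto b = scorch::rand<float, 5, 7>();
auto c = scorch::rand<float, 7>();
auto s1 = a + b;          // OK: b is broadcast across the leading dimension of a
auto s2 = (a + c) ^ 2.0f; // OK: c is broadcast, then the sum is squared element-wise
// auto bad1 = a + scorch::rand<float, 1, 1, 7>(); // does not compile: 1x1x7 is not broadcastable with 3x5x7
// auto bad2 = scorch::rand<float, 4>() + scorch::rand<float, 5>(); // does not compile: incompatible shapes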
Features that are not supported but are probably coming soon:
- Tensor views, clever indexing, and differentiation through tensor scalar element access
- Convolutions
- Matrix-matrix multiplication
- Some remaining basic mathematical functions (e.g. cbrt, atan, etc.)
- Smarter optimizers (e.g. Adam, RMSProp, if I can understand them)
- Higher-order derivatives (maybe)
Features that are not supported and probably never will be:
- GPU acceleration
- Dynamically-sized tensors
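The fixed-size restriction follows from the design: tensor dimensions are template parameters, as in the demo above, so every size must be a compile-time constant. A rough illustration (the helper name is hypothetical):

constexpr std::size_t N = 8;           // a compile-time constant
auto t = scorch::rand<float, N, N>();  // OK: N is known at compile time
// std::size_t n = get_runtime_size(); // a value known only at run time (hypothetical helper)...
// auto u = scorch::rand<float, n>();  // ...cannot be used as a dimension and would not compile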
This code was written by Tim Straubinger and is made available for free use under the MIT license.