tinygrad

The goal of tinygrad is to make the simplest machine learning framework there is.

Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. If XLA is CISC, tinygrad is RISC.

Features

LLaMA and Stable Diffusion

tinygrad can run LLaMA and Stable Diffusion!

Laziness

Try a matmul. See how, despite being written as a broadcasted multiply followed by a sum, it is fused into a single kernel with the power of laziness.

DEBUG=3 python3 -c "from tinygrad.tensor import Tensor;
N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);
c = (a.reshape(N, 1, N) * b.permute(1,0).reshape(1, N, N)).sum(axis=2);
print((c.numpy() - (a.numpy() @ b.numpy())).mean())"

Change DEBUG to 4 to see the generated code.
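
For a closer look at what laziness means in practice, here is a minimal sketch (it uses only Tensor.rand and .numpy(), the same APIs as above): operations merely build a computation graph, and the kernels are scheduled, fused, and run when a concrete result is requested.

from tinygrad.tensor import Tensor

a = Tensor.rand(64, 64)
b = Tensor.rand(64, 64)
c = (a + b).relu().sum()   # nothing has run yet; this only builds the graph
print(c.numpy())           # .numpy() forces realization, fusing the ops into kernels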

Neural networks

As it turns out, 90% of what you need for neural networks is a decent autograd/tensor library. Throw in an optimizer, a data loader, and some compute, and you have all you need.

Neural network example (from test/models/test_mnist.py)

from tinygrad.tensor import Tensor
import tinygrad.nn.optim as optim
 
class TinyBobNet:
  def __init__(self):
    self.l1 = Tensor.uniform(784, 128)
    self.l2 = Tensor.uniform(128, 10)
 
  def forward(self, x):
    return x.dot(self.l1).relu().dot(self.l2).log_softmax()
 
model = TinyBobNet()
optim = optim.SGD([model.l1, model.l2], lr=0.001)
 
# ... complete data loader here
 
out = model.forward(x)      # forward pass
loss = out.mul(y).mean()    # simple loss: y encodes the target labels
optim.zero_grad()           # clear gradients from the previous step
loss.backward()             # backpropagate
optim.step()                # update the weights
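
After training, inference is just another forward pass. A hypothetical sketch, where x_test is assumed to be a numpy array of flattened 28x28 images with shape (batch, 784):

# x_test: hypothetical numpy array of shape (batch, 784)
preds = model.forward(Tensor(x_test)).numpy().argmax(axis=1)   # predicted digit per sample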

Accelerators

tinygrad already supports numerous accelerators, including:

  • CPU
  • GPU (OpenCL)
  • C Code (Clang)
  • LLVM
  • METAL
  • CUDA
  • Triton
  • PyTorch
  • HIP
  • WebGPU

And it is easy to add more! Your accelerator of choice only needs to support a total of 26 (optionally 27) low level ops. More information can be found in the documentation for adding new accelerators.
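
In day-to-day use, picking a backend is just a matter of naming the device. A small sketch, assuming the CPU backend is available in your build and that the device argument takes the backend names listed above:

from tinygrad.tensor import Tensor

# assumption: "CPU" names the backend from the list above
a = Tensor.rand(4, 4, device="CPU")
b = Tensor.rand(4, 4, device="CPU")
print(a.matmul(b).numpy())   # the matmul is compiled and run by the chosen backend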
