tinygrad
The goal of tinygrad is to make the simplest machine learning framework there is.
Due to its extreme simplicity, it aims to be the easiest framework to add new accelerators to, with support for both inference and training. If XLA is CISC, tinygrad is RISC.
Features
LLaMA and Stable Diffusion
tinygrad can run LLaMA and Stable Diffusion!
Laziness
Try a matmul. See how, despite being written as a broadcasted multiply followed by a sum, it is fused into one kernel with the power of laziness.
DEBUG=3 python3 -c "from tinygrad.tensor import Tensor;
N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);
c = (a.reshape(N, 1, N) * b.permute(1,0).reshape(1, N, N)).sum(axis=2);
print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
Change DEBUG to 4 to see the generated code.
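For a smaller illustration of the same idea, here is a minimal sketch, assuming only the Tensor.rand, realize, and numpy methods used elsewhere in this page: operations just build a graph, and nothing runs until a result is actually needed.

from tinygrad.tensor import Tensor

a, b = Tensor.rand(64, 64), Tensor.rand(64, 64)
c = (a * b).sum()   # nothing has been computed yet; this only builds the graph
c.realize()         # now the kernel is scheduled and run
print(c.numpy())    # calling .numpy() would also have forced realization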
Neural networks
As it turns out, 90% of what you need for neural networks is a decent autograd/tensor library. Throw in an optimizer, a data loader, and some compute, and you have all you need.
Neural network example (from test/models/test_mnist.py)
from tinygrad.tensor import Tensor
import tinygrad.nn.optim as optim

class TinyBobNet:
  def __init__(self):
    self.l1 = Tensor.uniform(784, 128)
    self.l2 = Tensor.uniform(128, 10)

  def forward(self, x):
    return x.dot(self.l1).relu().dot(self.l2).log_softmax()

model = TinyBobNet()
optim = optim.SGD([model.l1, model.l2], lr=0.001)

# ... complete data loader here

out = model.forward(x)
loss = out.mul(y).mean()
optim.zero_grad()
loss.backward()
optim.step()
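To run the snippet above end to end without a real MNIST loader, here is a minimal sketch that continues from the model and optimizer defined above; the random inputs and the negative one-hot targets are assumptions, chosen only so that out.mul(y).mean() behaves like an NLL-style loss on the log-softmax outputs.

import numpy as np

# stand-in batch instead of a real data loader (assumption, not the MNIST loader)
X = np.random.rand(32, 784).astype(np.float32)
labels = np.random.randint(0, 10, size=32)
Y = np.zeros((32, 10), dtype=np.float32)
Y[np.arange(32), labels] = -1.0  # negative one-hot, so mul().mean() acts as an NLL-style loss

x, y = Tensor(X), Tensor(Y)
out = model.forward(x)           # log-probabilities from the network above
loss = out.mul(y).mean()
optim.zero_grad()
loss.backward()
optim.step()
print(loss.numpy())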
Accelerators
tinygrad already supports numerous accelerators, including:
- CPU
- GPU (OpenCL)
- C Code (Clang)
- LLVM
- METAL
- CUDA
- Triton
- PyTorch
- HIP
- WebGPU
And it is easy to add more! Your accelerator of choice only needs to support a total of 26 (optionally 27) low-level ops. More information can be found in the documentation for adding new accelerators.
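As a purely illustrative, hedged sketch of the idea (the op names and the dictionary below are hypothetical, not tinygrad's actual interface; the real op set is described in the documentation linked above): a backend is essentially a table mapping each low-level op to a device implementation, and once every op in the set is covered, the Tensor frontend, autograd, and laziness work on the new device unchanged.

import numpy as np

# hypothetical numpy "backend" for illustration only; the names below are not tinygrad's real op set
NUMPY_BACKEND = {
  "ADD":     lambda a, b: a + b,                               # binary op
  "MUL":     lambda a, b: a * b,                               # binary op
  "RELU":    lambda a: np.maximum(a, 0),                       # unary op
  "SUM":     lambda a, axis: a.sum(axis=axis, keepdims=True),  # reduce op
  "RESHAPE": lambda a, shape: a.reshape(shape),                # movement op
  # ...the remaining ops in the set would be filled in the same way
}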