
Affine

Incentivized MLOps on Bittensor

Installation

1. Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Create .venv

uv venv
source .venv/bin/activate

3. Install Package

uv pip install -e .

4. Set Up Environment Variables

touch .env

Open the file and add your Bittensor wallet names and R2 credentials.

BT_HOTKEY = "your_bittensor_wallet_hotkey_name"
BT_COLDKEY = "your_bittensor_wallet_coldkey_name"
R2_BUCKET_ID = "your_bucket_id"
R2_ACCOUNT_ID = "your_account_id" 
R2_GRADIENTS_BUCKET_NAME = "your_bucket_name"
R2_WRITE_ACCESS_KEY_ID = "your_access_key_id"
R2_WRITE_SECRET_ACCESS_KEY = "your_secret_access_key"
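
Before starting the worker, it can help to confirm that every variable is actually set. The snippet below is a minimal sanity-check sketch, not part of affine; it assumes python-dotenv (pip install python-dotenv) to load the file explicitly.

# check_env.py -- hypothetical helper script, not part of the affine package.
import os
from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # read key=value pairs from ./.env into os.environ

REQUIRED = [
    "BT_HOTKEY", "BT_COLDKEY",
    "R2_BUCKET_ID", "R2_ACCOUNT_ID", "R2_GRADIENTS_BUCKET_NAME",
    "R2_WRITE_ACCESS_KEY_ID", "R2_WRITE_SECRET_ACCESS_KEY",
]
missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {missing}")
print("All wallet / R2 variables are set.")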

Run the worker

# Run the worker attached to netuid 10 on Bittensor mainnet.
affine-worker 10
>> INFO:     Started server process [86575]
>> ...
>> INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

API Reference

Import affine as a Python module

import affine as af

put(key: Union[str, af.AFKEY], data: af.Serializable) -> None

Upload objects to your R2 bucket.

# Upload Data.
await af.put('my_data', {'key':'value'}) # Single JSON-serializable dict

# Upload Tensors.
import torch
await af.put('my_tensor', torch.zeros(10)) # Single tensor
await af.put('my_tensors_list', [torch.zeros(1), torch.zeros(2)]) # Tensor lists
await af.put('my_tensors_dict', {'a': torch.zeros(1), 'b': torch.zeros(2)}) # Tensor dicts

# Upload Modules.
module = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10)
)
await af.put('my_module', module) # torch.jit.script serializable

get(key: str, bucket: str = default_bucket) -> Any

Download objects from your R2 bucket.

# Download Data.
await af.get('my_data') # {'key':'value'}

# Download Tensors.
import torch
my_tensor = await af.get('my_tensor') # torch.zeros(10)
await af.get('my_tensors_list') # [torch.zeros(1), torch.zeros(2)]
await af.get('my_tensors_dict') # {'a': torch.zeros(1), 'b': torch.zeros(2)}

# Download and run the module.
mod = await af.get('my_module') # returns the torch.jit.script'ed module
out = mod(my_tensor)

exists(key: str, bucket: str = default_bucket) -> bool

Checks if a key exists in the specified bucket.

assert await af.exists('my_tensor')

list(prefix: str = "", bucket: str = default_bucket) -> List[str]

Lists all keys with the given prefix in the specified bucket.

assert await af.list('my_')
# ['my_data', 'my_tensor', ..., 'my_module']

delete(key: str, bucket: str = default_bucket) -> None

Deletes the data associated with the key from the specified bucket.

await af.delete('my_module')
assert not await af.exists( 'my_module' )

RPC Operations

ping(endpoint: str = default_endpoint) -> bool

Checks connectivity with the specified endpoint.

await af.ping(endpoint='0.0.0.0:8000') # Returns True if a worker is running at this endpoint

load(key: str, endpoint: str = default_endpoint) -> Any

Asks the remote endpoint to load the given key.

await af.load('my_module', endpoint='0.0.0.0:8000') # Ask the worker at this endpoint to load my_module.

forward(f: str, x: str, y: str, endpoint: str = default_endpoint) -> Any

Performs a forward pass on the remote endpoint: runs the module stored at f on the input stored at x and stores the result under y.

# Runs a forward pass through my_module on my_tensor, storing the result under my_output.
await af.forward(f='my_module', x='my_tensor', y='my_output', endpoint=URL)

backward(y: str, dy: str, dx: str, endpoint: str = default_endpoint) -> Any

Performs a backward pass using output y and output gradient dy, storing the resulting input gradient under dx.

# Compute the loss locally from the remote output; `labels` is your target batch.
criterion = torch.nn.CrossEntropyLoss() # any differentiable loss works here
logits = (await af.get('my_output'))[0]
logits.requires_grad_(True)
loss = criterion(logits, labels)
loss.backward()
await af.put('grad_y', logits.grad.detach())
# Runs a backward pass through my_module, storing the input gradient under grad_out.
await af.backward(y='my_module', dy='grad_y', dx='grad_out', endpoint=URL)

apply(key: str, opt: str, endpoint: str = default_endpoint) -> Any

Applies the optimization step specified by opt to the data stored at key.

# Applies the optimizer stored under my_opt to my_module.
await af.apply(key='my_module', opt='my_opt', endpoint=URL)

purge(keys: List[str], endpoint: str = default_endpoint) -> None

Removes the specified keys from the remote endpoint.

# Removes these keys from the remote endpoint.
await af.purge(keys=['my_module', 'my_opt', 'my_tensor'])

Quickstart: run a worker and train a model

1. Start a worker node

Spin up an Affine worker that will serve your tensors and run the forward/backward/optimisation RPCs. All that is required is the netuid of the Bittensor subnet you want to join; here we use the public training subnet with id 10.

# Inside the project root and with your `.venv` activated
affine-worker 10
# │           └─ netuid (subnet) to serve on
# │
# └── this starts an Uvicorn HTTP server on http://0.0.0.0:8000
#     and automatically announces the endpoint + bucket to the chain.

You should see log lines similar to

INFO     Bucket=my‑awesome‑bucket
INFO     Wallet=(coldkey=mywallet, hotkey=mywallet:0)
INFO     Subnet=10.
INFO     Serving ...
INFO     Served state to chain.
INFO     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

Leave this process running – it is your "miner" for the remainder of the example.

2. Train a model that uses the worker

Open a second terminal, activate the same virtual‑env and simply run

python train.py

train.py will

  1. download the MNIST dataset,
  2. pick up the subnet that you just served to (netuid 10),
  3. locate your worker's endpoint from the on‑chain metagraph,
  4. upload the model weights and data to your R2 bucket,
  5. execute the forward / backward / optimiser steps remotely on the worker (a condensed sketch follows this list), and
  6. finally pull back the trained parameters into mnist_trained.pt.
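
For orientation, here is a condensed sketch of steps 4-6 written against the API documented above. The key names ('batch', 'out', 'grad_x', 'my_opt'), the loss function, and the single-batch structure are illustrative assumptions; train.py's actual internals may differ.

import torch
import affine as af

URL = '0.0.0.0:8000'  # placeholder: your worker's endpoint
criterion = torch.nn.CrossEntropyLoss()  # illustrative loss choice

async def remote_step(model_key: str, batch: torch.Tensor, labels: torch.Tensor) -> float:
    # Step 4: upload the batch, then run the forward pass on the worker.
    await af.put('batch', batch.flatten(1))
    await af.forward(f=model_key, x='batch', y='out', endpoint=URL)

    # Compute the loss and the output gradient locally.
    logits = (await af.get('out'))[0]
    logits.requires_grad_(True)
    loss = criterion(logits, labels)
    loss.backward()

    # Step 5: push the output gradient back, then run backward + optimiser remotely.
    await af.put('grad_y', logits.grad.detach())
    await af.backward(y=model_key, dy='grad_y', dx='grad_x', endpoint=URL)
    await af.apply(key=model_key, opt='my_opt', endpoint=URL)  # assumes 'my_opt' was uploaded earlier
    return loss.item()

# Step 6: after the last epoch, pull the trained module back down, e.g.
# trained = await af.get(model_key)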

At the end you should see output like

Epoch 3/3 – mean loss 0.0123
✔ Finished – model saved to mnist_trained.pt

That's it – you have successfully run a fully remote training loop backed by an Affine worker node!

If you want to point the training script at a different miner, simply change UID inside train.py to the UID of the desired hotkey (visible in the worker logs or on the Bittensor explorer).
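
For instance, assuming train.py defines UID as a module-level constant (the value below is a placeholder):

UID = 7  # replace 7 with the UID of the miner hotkey you want to train against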
