Tags: born-ml/born
v0.7.1: Code quality refactoring (Issue #14)

- Apply Burn framework patterns for improved code quality
- Pre-slice bounds elimination in Conv2D/MaxPool2D
- Stride specialization for compiler auto-vectorization
- Flash Attention CPU refactor (complexity 111 → <30)
- Autodiff delegation pattern (separate orchestration from computation)
- New internal/parallel package for parallel execution utilities
- Extended Backend interface with backward operation methods
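"Stride specialization" here means branching once on the common contiguous case so the compiler sees a simple unit-stride loop it can unroll and auto-vectorize, while the general strided path stays correct. A minimal sketch of the idea, combined with the pre-slicing trick for bounds-check elimination (function names are illustrative, not Born's actual API):

```go
package main

import "fmt"

// dotStrided computes a strided dot product. The stride == 1 case is
// specialized: pre-slicing a and b to length n lets the compiler hoist
// bounds checks out of the loop, and the contiguous access pattern is
// amenable to auto-vectorization. The general path handles any stride.
func dotStrided(a, b []float32, n, strideA, strideB int) float32 {
	var sum float32
	if strideA == 1 && strideB == 1 {
		// Fast path: contiguous data, bounds checks eliminated by pre-slicing.
		a, b = a[:n], b[:n]
		for i := 0; i < n; i++ {
			sum += a[i] * b[i]
		}
		return sum
	}
	// General path: arbitrary strides.
	for i := 0; i < n; i++ {
		sum += a[i*strideA] * b[i*strideB]
	}
	return sum
}

func main() {
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	fmt.Println(dotStrided(a, b, 4, 1, 1)) // contiguous: 1*5+2*6+3*7+4*8 = 70
	fmt.Println(dotStrided(a, b, 2, 2, 2)) // stride 2: 1*5+3*7 = 26
}
```

The same shape of specialization applies to the Conv2D/MaxPool2D inner loops mentioned above: one branch outside the hot loop buys a much simpler loop body inside it.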
feat: v0.7.0 inference optimization - Flash Attention, Speculative Decoding, GGUF (#13)

TASK-062: GGUF Import Complete
- Parser: types, metadata, tensor info extraction
- Loader: tensor data loading with memory mapping
- Dequantization: K-quants (Q4_K, Q5_K, Q6_K, Q8_0)
- Converter: GGUF to Born tensor format

TASK-060: Flash Attention 2
- Week 1: CPU reference with online softmax, O(N) memory
- Week 2: WebGPU WGSL shader with tiled computation
- Supports causal masking, head dims 64-256, block sizes 64/128
- GPU vs CPU validation: error < 1e-4

TASK-061: Speculative Decoding
- Draft model generates K tokens speculatively
- Target model verifies them in a parallel batch
- Modified rejection sampling for token acceptance
- 2-4x potential speedup for autoregressive generation

Also: fixed 226 gosec G115 lint issues across the codebase
feat: v0.6.0 - ONNX Import + Lazy GPU Mode (#11)

* feat(tensor): add lazy GPU evaluation and raw tensor operations
  - Add LazyGPUData struct for GPU-resident tensor data with lazy realization
  - Add runtime.SetFinalizer for automatic GPU buffer cleanup on GC
  - Extend RawTensor with GPUData() accessor and lazy data realization in Data()
  - Add comprehensive raw_ops.go with 50+ tensor operations (argmax, topk, etc.)
  - Support type conversion, broadcasting, and advanced indexing operations
* feat(webgpu): add GPU autodiff backward ops and batch operations
  - Add GPU-accelerated backward operations (MatMulBackward, AddBackward, etc.)
  - Add GPUTensor wrapper for seamless GPU/CPU tensor operations
  - Add batch processing support for efficient multi-tensor operations
  - Include comprehensive test coverage for all new operations
* feat(webgpu): complete Phase 3 lazy mode with command batching
  - Add lazy compute operations that keep data GPU-resident until needed
  - Implement GPU-to-GPU buffer copy to avoid CPU round-trips
  - Add command batching to reduce GPU sync overhead (200 submits → 1-2)
  - Extend lazy mode to all critical operations (Add, Mul, MatMul, Softmax, etc.)
  - Add FlushCommands() for explicit synchronization when needed
* feat(onnx): add ONNX model import and inference API
  - Add ONNX protobuf parser for .onnx model files
  - Implement model loader with weight extraction and graph construction
  - Add operator registry with 30+ standard ONNX operators
  - Support activations (ReLU, Sigmoid, Tanh, Softmax, GELU)
  - Support math ops (MatMul, Add, Mul, Div, Sqrt, Pow)
  - Support shape ops (Reshape, Transpose, Squeeze, Unsqueeze, Concat)
  - Support utility ops (Gather, Slice, Cast, Constant, Identity)
  - Include comprehensive test coverage for parser and loader
* docs: add organization logo assets
* docs: update documentation for v0.6.0 release
  - Add CHANGELOG entry for v0.6.0 (ONNX import, lazy GPU, command batching)
  - Update README with new features and version number
  - Update ROADMAP to show v0.6.0 as current release
  - Mark Phase 6 as complete (ONNX + Lazy GPU)
  - Add performance metrics for lazy GPU mode
* fix(lint): update for golangci-lint v2.7.0
  - Remove unused //nolint:gosec directives for G115 (fixed in gosec v2.21.2+)
  - Add nolint only for actual warnings: G103 (unsafe), G304 (file path), G404 (weak random)
  - Fix whitespace errors (unnecessary leading newlines)
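The core of lazy mode is that operations record a deferred computation instead of executing immediately, and only a call to Data() forces realization (on GPU, the readback). A toy CPU-only sketch of that pattern, with hypothetical names (the real LazyGPUData keeps buffers device-resident and batches command submissions, which this sketch does not model):

```go
package main

import "fmt"

// lazyTensor defers its computation until Data() is called. Ops build a
// chain of thunks instead of executing eagerly, so intermediate results
// can stay "device resident" (here: simply uncomputed) until the host
// actually needs the bytes.
type lazyTensor struct {
	realize func() []float32 // deferred computation
	cached  []float32        // filled on first Data() call
}

func fromSlice(v []float32) *lazyTensor {
	return &lazyTensor{realize: func() []float32 { return v }}
}

// Add records the addition; nothing runs yet.
func (t *lazyTensor) Add(u *lazyTensor) *lazyTensor {
	return &lazyTensor{realize: func() []float32 {
		a, b := t.Data(), u.Data()
		out := make([]float32, len(a))
		for i := range a {
			out[i] = a[i] + b[i]
		}
		return out
	}}
}

// Data forces realization, analogous to a GPU readback point; the
// result is cached so the chain is evaluated at most once.
func (t *lazyTensor) Data() []float32 {
	if t.cached == nil {
		t.cached = t.realize()
	}
	return t.cached
}

func main() {
	a := fromSlice([]float32{1, 2})
	b := fromSlice([]float32{3, 4})
	c := a.Add(b).Add(a) // still unevaluated here
	fmt.Println(c.Data())
}
```

The "200 submits → 1-2" improvement falls out of this shape: since nothing executes until a realization point, all recorded ops can be encoded into one command buffer and submitted together.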
feat(webgpu): add GPU multi-dim Transpose and Expand operations

- Add transposeNDShader for N-dimensional transpose (up to 6D)
- Add expandShader for NumPy-style GPU broadcasting
- Support both float32 and int32 data types
- Remove CPU fallback for Transpose and Expand
- Add 9 new tests for 3D/4D/5D/6D operations

Fixes ~60s/batch slowdown in transformer training. Closes TASK-052.
v0.5.4: Model Serialization

Features:
- Born Native Format v2 (.born) with SHA-256 checksum
- Security validation (offset overlap, path traversal, bounds check)
- Memory-mapped reader for large models (70GB+)
- Checkpoint API for training resume
- SafeTensors export for HuggingFace compatibility

New package:
- internal/serialization - format writer/reader, validation, mmap

API:
- nn.Save(model, path, modelType, metadata)
- nn.Load(path, backend, model)
- nn.SaveCheckpoint() / nn.LoadCheckpoint()
- serialization.WriteSafeTensors()
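The checksum side of the format can be illustrated with Go's standard library: hash the payload on write, store the digest alongside it, and recompute on read before trusting any offsets. A minimal sketch under simplified assumptions (the real .born v2 layout is more involved; writePayload/readPayload are hypothetical names, and here the digest is simply appended after the payload):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// writePayload appends a SHA-256 digest after the payload so a reader
// can detect corruption before interpreting any of the file's contents.
func writePayload(payload []byte) []byte {
	sum := sha256.Sum256(payload)
	return append(append([]byte{}, payload...), sum[:]...)
}

// readPayload verifies the trailing digest and returns the payload.
func readPayload(blob []byte) ([]byte, error) {
	if len(blob) < sha256.Size {
		return nil, fmt.Errorf("blob too short for checksum")
	}
	payload, want := blob[:len(blob)-sha256.Size], blob[len(blob)-sha256.Size:]
	sum := sha256.Sum256(payload)
	if !bytes.Equal(sum[:], want) {
		return nil, fmt.Errorf("checksum mismatch")
	}
	return payload, nil
}

func main() {
	blob := writePayload([]byte("tensor bytes"))
	p, err := readPayload(blob)
	fmt.Println(string(p), err)

	blob[0] ^= 1 // flip one bit to simulate corruption
	_, err = readPayload(blob)
	fmt.Println(err)
}
```

Verifying integrity first is what makes the listed security checks (offset overlap, bounds) meaningful: they only defend against malformed input if the bytes being checked are the bytes that were written.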
v0.5.3 - WebGPU Backend Fixes (HRM Compatibility)

Bug fixes:
- Comparison ops: always return float32 (0.0/1.0), even for int32 inputs
- Sum int32: added WGSL shader for int32 sum reduction
- Sum scalar shape: fixed return shape from [1] to [] for proper scalar handling
- Where int32 condition: added support for int32 condition tensors
- Where broadcasting: added NumPy-style broadcasting (like Burn)
- Gather backward: support for int32, int64, float32 index tensors

New functions:
- runComparisonOp - dedicated function for comparison operations
- int32ToFloat32 - helper for int32-to-float32 conversion

Tests:
- 3 new Gather backward tests
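The comparison-op fix establishes a simple contract: comparisons always produce a float32 mask of 0.0/1.0 regardless of input dtype, so downstream consumers like Where or mask multiplications see a uniform type. A tiny CPU sketch of that contract (runComparisonOp in the release is the WebGPU-side equivalent; greaterMask here is an illustrative name):

```go
package main

import "fmt"

// greaterMask compares int32 inputs element-wise but always emits a
// float32 mask, matching the fixed contract: comparison results are
// 0.0/1.0 float32 regardless of the input dtype.
func greaterMask(a, b []int32) []float32 {
	out := make([]float32, len(a))
	for i := range a {
		if a[i] > b[i] {
			out[i] = 1.0
		}
	}
	return out
}

func main() {
	fmt.Println(greaterMask([]int32{3, 1, 2}, []int32{2, 2, 2}))
}
```

Keeping the mask dtype fixed avoids a combinatorial explosion of shader variants: only the comparison kernel needs per-dtype handling, not everything that consumes its output.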