Tags: born-ml/born

v0.7.3

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
chore: update go-webgpu to v0.1.2 and goffi to v0.3.5 (#21)

v0.7.2

Verified

chore: update go-webgpu to v0.1.1 and goffi to v0.3.3 (#18)

- Updated go-webgpu/webgpu v0.1.0 → v0.1.1
- Updated go-webgpu/goffi v0.3.1 → v0.3.3 (indirect)
- Updated documentation for v0.7.2 release

v0.7.1

v0.7.1: Code quality refactoring (Issue #14)

- Apply Burn framework patterns for improved code quality
- Pre-slice bounds elimination in Conv2D/MaxPool2D
- Stride specialization for compiler auto-vectorization
- Flash Attention CPU refactor (complexity 111 → <30)
- Autodiff delegation pattern (separate orchestration from computation)
- New internal/parallel package for parallel execution utilities
- Extended Backend interface with backward operation methods
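
The "pre-slice bounds elimination" item refers to a common Go idiom. The sketch below illustrates the general pattern, not Born's actual Conv2D/MaxPool2D code; the function name `dot` is hypothetical. Re-slicing one operand to the length of the other before the loop lets the compiler prove that every index is in range, so the per-iteration bounds checks can be dropped.

```go
package main

import "fmt"

// dot computes the dot product of a and the first len(a) elements of b.
// Hypothetical helper for illustration only.
func dot(a, b []float32) float32 {
	// One bounds check here (panics if b is shorter than a) replaces
	// a check on b[i] in every loop iteration.
	b = b[:len(a)]
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	fmt.Println(dot([]float32{1, 2, 3}, []float32{4, 5, 6}))
}
```

Whether the checks are actually removed can be inspected with `go build -gcflags=-d=ssa/check_bce`.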

v0.7.0

Verified

feat: v0.7.0 inference optimization - Flash Attention, Speculative Decoding, GGUF (#13)

TASK-062: GGUF Import Complete
- Parser: types, metadata, tensor info extraction
- Loader: tensor data loading with memory mapping
- Dequantization: K-quants (Q4_K, Q5_K, Q6_K) and Q8_0
- Converter: GGUF to Born tensor format

TASK-060: Flash Attention 2
- Week 1: CPU reference using online softmax with O(N) memory
- Week 2: WebGPU WGSL shader with tiled computation
- Supports causal masking, head dims 64-256, block sizes 64/128
- GPU vs CPU validation < 1e-4 error

TASK-061: Speculative Decoding
- Draft model generates K tokens speculatively
- Target model verifies in parallel batch
- Modified rejection sampling for token acceptance
- 2-4x speedup potential for autoregressive generation
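
The acceptance rule can be sketched as below. This follows the standard speculative sampling scheme as commonly described (accept a drafted token with probability min(1, p/q), otherwise resample from the normalized positive part of p - q); whether Born implements exactly this variant is an assumption, and the function names are hypothetical.

```go
package main

import "fmt"

// acceptProb returns the probability of keeping a drafted token, where
// p is the target model's probability and q the draft model's.
func acceptProb(p, q float64) float64 {
	if p >= q {
		return 1
	}
	return p / q // p < q implies q > 0
}

// residual returns the distribution to resample from after a rejection:
// max(0, p[i]-q[i]) normalized to sum to 1.
func residual(p, q []float64) []float64 {
	r := make([]float64, len(p))
	sum := 0.0
	for i := range p {
		if d := p[i] - q[i]; d > 0 {
			r[i] = d
			sum += d
		}
	}
	if sum == 0 { // p and q agree everywhere; fall back to p
		copy(r, p)
		return r
	}
	for i := range r {
		r[i] /= sum
	}
	return r
}

func main() {
	p := []float64{0.5, 0.5} // target distribution
	q := []float64{0.9, 0.1} // draft distribution
	fmt.Println(acceptProb(p[0], q[0]), residual(p, q))
}
```

Under this rule the accepted tokens are distributed as if sampled from the target model directly, which is why the speedup does not change output quality.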

Also: Fixed 226 gosec G115 lint issues across codebase

v0.6.0

Verified

feat: v0.6.0 - ONNX Import + Lazy GPU Mode (#11)

* feat(tensor): add lazy GPU evaluation and raw tensor operations

- Add LazyGPUData struct for GPU-resident tensor data with lazy realization
- Add runtime.SetFinalizer for automatic GPU buffer cleanup on GC
- Extend RawTensor with GPUData() accessor and lazy data realization in Data()
- Add comprehensive raw_ops.go with 50+ tensor operations (argmax, topk, etc.)
- Support type conversion, broadcasting, and advanced indexing operations
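
A rough sketch of the lazy-realization pattern described above, using only the standard library. The types `fakeBuffer` and `lazyData` are hypothetical stand-ins (the real `LazyGPUData` and its fields are not shown in this log): data stays on the "device" until `Data()` is first called, and a finalizer releases the buffer once the wrapper is garbage collected.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// fakeBuffer stands in for a GPU buffer handle (hypothetical).
type fakeBuffer struct {
	data     []float32
	released bool
}

func (b *fakeBuffer) readBack() []float32 { return append([]float32(nil), b.data...) }
func (b *fakeBuffer) release()            { b.released = true }

// lazyData keeps data device-resident and copies it to the host on first use.
type lazyData struct {
	buf   *fakeBuffer
	once  sync.Once
	host  []float32
	reads int // number of device-to-host copies performed
}

func newLazyData(buf *fakeBuffer) *lazyData {
	d := &lazyData{buf: buf}
	// Release the device buffer when the wrapper becomes unreachable.
	// Finalizers run at an unspecified time, so this is a safety net only.
	runtime.SetFinalizer(d, func(d *lazyData) { d.buf.release() })
	return d
}

// Data realizes the host copy at most once, even with concurrent callers.
func (d *lazyData) Data() []float32 {
	d.once.Do(func() {
		d.host = d.buf.readBack()
		d.reads++
	})
	return d.host
}

func main() {
	d := newLazyData(&fakeBuffer{data: []float32{1, 2, 3}})
	fmt.Println(d.Data(), d.Data(), d.reads)
}
```

Because finalizer timing is not guaranteed, long-running code would normally pair this with an explicit release method.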

* feat(webgpu): add GPU autodiff backward ops and batch operations

- Add GPU-accelerated backward operations (MatMulBackward, AddBackward, etc.)
- Add GPUTensor wrapper for seamless GPU/CPU tensor operations
- Add batch processing support for efficient multi-tensor operations
- Include comprehensive test coverage for all new operations

* feat(webgpu): complete Phase 3 lazy mode with command batching

- Add lazy compute operations that keep data GPU-resident until needed
- Implement GPU-to-GPU buffer copy to avoid CPU round-trips
- Add command batching to reduce GPU sync overhead (200 submits → 1-2)
- Extend lazy mode to all critical operations (Add, Mul, MatMul, Softmax, etc.)
- Add FlushCommands() for explicit synchronization when needed
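
The command-batching idea can be illustrated without any GPU API. In this sketch, `batcher`, `Enqueue`, and `Flush` are hypothetical names (only `FlushCommands()` is named in the log, and its signature is not shown): operations are recorded and then submitted together, so many recorded commands cost a single submit.

```go
package main

import "fmt"

// batcher records commands and submits them in one go.
type batcher struct {
	pending []func()
	submits int // number of submissions, the cost being minimized
}

// Enqueue records a command without executing it.
func (b *batcher) Enqueue(cmd func()) {
	b.pending = append(b.pending, cmd)
}

// Flush runs all recorded commands as one submission.
func (b *batcher) Flush() {
	if len(b.pending) == 0 {
		return
	}
	for _, cmd := range b.pending {
		cmd()
	}
	b.pending = b.pending[:0]
	b.submits++
}

func main() {
	var b batcher
	executed := 0
	for i := 0; i < 200; i++ {
		b.Enqueue(func() { executed++ })
	}
	b.Flush()
	fmt.Println(executed, b.submits)
}
```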

* feat(onnx): add ONNX model import and inference API

- Add ONNX protobuf parser for .onnx model files
- Implement model loader with weight extraction and graph construction
- Add operator registry with 30+ standard ONNX operators
- Support activations (ReLU, Sigmoid, Tanh, Softmax, GELU)
- Support math ops (MatMul, Add, Mul, Div, Sqrt, Pow)
- Support shape ops (Reshape, Transpose, Squeeze, Unsqueeze, Concat)
- Support utility ops (Gather, Slice, Cast, Constant, Identity)
- Include comprehensive test coverage for parser and loader
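
An operator registry of the kind listed above is typically a map from ONNX op type to an implementation. The following is a minimal sketch with hypothetical types (`opFunc`, `registry`, `run`) operating on flat float32 slices; the real registry presumably works on Born tensors and handles attributes.

```go
package main

import "fmt"

// opFunc implements one operator over flat float32 inputs (simplified).
type opFunc func(inputs [][]float32) []float32

// registry maps ONNX op_type strings to implementations.
var registry = map[string]opFunc{
	"Relu": func(in [][]float32) []float32 {
		out := make([]float32, len(in[0]))
		for i, v := range in[0] {
			if v > 0 {
				out[i] = v
			}
		}
		return out
	},
	"Add": func(in [][]float32) []float32 {
		out := make([]float32, len(in[0]))
		for i := range out {
			out[i] = in[0][i] + in[1][i]
		}
		return out
	},
}

// run dispatches a graph node by op type and reports unsupported operators.
func run(opType string, inputs ...[]float32) ([]float32, error) {
	f, ok := registry[opType]
	if !ok {
		return nil, fmt.Errorf("unsupported ONNX operator %q", opType)
	}
	return f(inputs), nil
}

func main() {
	out, err := run("Relu", []float32{-1, 2})
	fmt.Println(out, err)
	_, err = run("Einsum", []float32{1})
	fmt.Println(err)
}
```

Returning an error for unknown op types lets a loader report every unsupported node in a model up front instead of failing mid-inference.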

* docs: add organization logo assets

* docs: update documentation for v0.6.0 release

- Add CHANGELOG entry for v0.6.0 (ONNX import, lazy GPU, command batching)
- Update README with new features and version number
- Update ROADMAP to show v0.6.0 as current release
- Mark Phase 6 as complete (ONNX + Lazy GPU)
- Add performance metrics for lazy GPU mode

* fix(lint): update for golangci-lint v2.7.0

- Remove unused //nolint:gosec directives for G115 (fixed in gosec v2.21.2+)
- Add nolint only for actual warnings: G103 (unsafe), G304 (file path), G404 (weak random)
- Fix whitespace errors (unnecessary leading newlines)

v0.5.5

feat(webgpu): add GPU multi-dim Transpose and Expand operations

- Add transposeNDShader for N-dimensional transpose (up to 6D)
- Add expandShader for NumPy-style GPU broadcasting
- Support both float32 and int32 data types
- Remove CPU fallback for Transpose and Expand
- Add 9 new tests for 3D/4D/5D/6D operations

Fixes ~60s/batch slowdown in transformer training.
Closes TASK-052.
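
For reference, the NumPy-style broadcasting rule that Expand follows can be computed on shapes alone. This is a sketch of the rule itself, not of the WGSL shader; `broadcastShape` is a hypothetical helper. Shapes are aligned from the trailing dimension, and two sizes are compatible when they are equal or one of them is 1.

```go
package main

import "fmt"

// broadcastShape returns the broadcast result of shapes a and b,
// or an error if they are incompatible.
func broadcastShape(a, b []int) ([]int, error) {
	n := len(a)
	if len(b) > n {
		n = len(b)
	}
	out := make([]int, n)
	for i := 1; i <= n; i++ { // walk dimensions from the right
		da, db := 1, 1 // missing leading dimensions count as 1
		if i <= len(a) {
			da = a[len(a)-i]
		}
		if i <= len(b) {
			db = b[len(b)-i]
		}
		switch {
		case da == db, db == 1:
			out[n-i] = da
		case da == 1:
			out[n-i] = db
		default:
			return nil, fmt.Errorf("cannot broadcast %v with %v", a, b)
		}
	}
	return out, nil
}

func main() {
	fmt.Println(broadcastShape([]int{3, 1}, []int{1, 4}))
	fmt.Println(broadcastShape([]int{2, 3}, []int{4}))
}
```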

v0.5.4

v0.5.4: Model Serialization

Features:
- Born Native Format v2 (.born) with SHA-256 checksum
- Security validation (offset overlap, path traversal, bounds check)
- Memory-mapped reader for large models (70GB+)
- Checkpoint API for training resume
- SafeTensors export for HuggingFace compatibility

New Package:
- internal/serialization - Format writer/reader, validation, mmap

API:
- nn.Save(model, path, modelType, metadata)
- nn.Load(path, backend, model)
- nn.SaveCheckpoint() / nn.LoadCheckpoint()
- serialization.WriteSafeTensors()
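
The checksum feature listed above can be illustrated with the standard library alone. The layout here (payload followed by a 32-byte SHA-256 trailer) is an assumption made for the sketch and is not the actual `.born` v2 format; `seal` and `open` are hypothetical names.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// seal appends the SHA-256 digest of payload as a trailer (assumed layout).
func seal(payload []byte) []byte {
	sum := sha256.Sum256(payload)
	out := append([]byte{}, payload...)
	return append(out, sum[:]...)
}

// open verifies the trailer and returns the payload.
func open(blob []byte) ([]byte, error) {
	if len(blob) < sha256.Size {
		return nil, errors.New("blob too short to contain a checksum")
	}
	n := len(blob) - sha256.Size
	payload, want := blob[:n], blob[n:]
	got := sha256.Sum256(payload)
	if !bytes.Equal(got[:], want) {
		return nil, errors.New("checksum mismatch: file is corrupt or was modified")
	}
	return payload, nil
}

func main() {
	blob := seal([]byte("weights"))
	payload, err := open(blob)
	fmt.Println(string(payload), err)

	blob[0] ^= 0xFF // simulate corruption
	_, err = open(blob)
	fmt.Println(err)
}
```

A checksum of this kind detects accidental corruption; it does not authenticate the file's origin.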

v0.5.3

v0.5.3 - WebGPU Backend Fixes (HRM Compatibility)

Bug Fixes:
- Comparison ops: always return float32 (0.0/1.0) even for int32 inputs
- Sum int32: added WGSL shader for int32 sum reduction
- Sum scalar shape: fixed return shape from [1] to [] for proper scalar handling
- Where int32 condition: added support for int32 condition tensors
- Where broadcasting: added NumPy-style broadcasting (like Burn)
- Gather backward: support for int32, int64, float32 index tensors
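
The semantics of the first and fourth fixes can be shown on the CPU. This is a sketch of the described behavior only (comparison results are float32 0.0/1.0 regardless of input type, and Where accepts an int32 or float32 condition); `greater` and `where` are hypothetical names, not the WebGPU backend functions.

```go
package main

import "fmt"

// greater compares int32 inputs elementwise and returns float32 0.0/1.0.
func greater(a, b []int32) []float32 {
	out := make([]float32, len(a))
	for i := range a {
		if a[i] > b[i] {
			out[i] = 1
		}
	}
	return out
}

// where picks x[i] when cond[i] is non-zero, otherwise y[i].
// The condition may be int32 or float32.
func where[C int32 | float32](cond []C, x, y []float32) []float32 {
	out := make([]float32, len(cond))
	for i, c := range cond {
		if c != 0 {
			out[i] = x[i]
		} else {
			out[i] = y[i]
		}
	}
	return out
}

func main() {
	mask := greater([]int32{1, 5}, []int32{2, 3})
	fmt.Println(mask)
	fmt.Println(where(mask, []float32{10, 20}, []float32{30, 40}))
	fmt.Println(where([]int32{1, 0}, []float32{10, 20}, []float32{30, 40}))
}
```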

New Functions:
- runComparisonOp - dedicated function for comparison operations
- int32ToFloat32 - helper for int32 to float32 conversion

Tests:
- 3 new Gather backward tests

v0.5.2

v0.5.2 - Critical Autodiff Fixes & Public WebGPU API

v0.5.1

v0.5.1 Hotfix - Embedding Autodiff

Critical fix:
- nn.Embedding now records on autodiff tape
- Gradient flow restored for all embedding parameters
- All models with embeddings can now train properly
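
To show why recording on the tape matters, here is a toy sketch with hypothetical types (`tape`, `embedding`); it does not reflect Born's autodiff API. The forward lookup registers a backward closure that scatter-adds the upstream gradient into the rows that were looked up. If the forward pass skipped that registration, the weight gradient would stay zero and the embedding would never train.

```go
package main

import "fmt"

// tape stores backward closures in the order operations were executed.
type tape struct {
	ops []func(gradOut [][]float64)
}

// embedding is a lookup table with a gradient buffer of the same shape.
type embedding struct {
	weight [][]float64
	grad   [][]float64
}

// forward gathers rows by id and records the matching backward step.
func (e *embedding) forward(t *tape, ids []int) [][]float64 {
	out := make([][]float64, len(ids))
	for i, id := range ids {
		out[i] = append([]float64(nil), e.weight[id]...)
	}
	// Recording this closure is what makes the lookup differentiable.
	t.ops = append(t.ops, func(gradOut [][]float64) {
		for i, id := range ids {
			for k, g := range gradOut[i] {
				e.grad[id][k] += g // repeated ids accumulate
			}
		}
	})
	return out
}

func main() {
	e := &embedding{
		weight: [][]float64{{0.1, 0.2}, {0.3, 0.4}},
		grad:   [][]float64{{0, 0}, {0, 0}},
	}
	var t tape
	out := e.forward(&t, []int{1, 1, 0})

	// Pretend the loss gradient with respect to every output element is 1.
	gradOut := [][]float64{{1, 1}, {1, 1}, {1, 1}}
	for i := len(t.ops) - 1; i >= 0; i-- {
		t.ops[i](gradOut)
	}
	fmt.Println(out, e.grad)
}
```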