Tags: born-ml/born
v0.7.1: Code quality refactoring (Issue #14)

- Apply Burn framework patterns for improved code quality
- Pre-slice bounds elimination in Conv2D/MaxPool2D
- Stride specialization for compiler auto-vectorization
- Flash Attention CPU refactor (complexity 111 → <30)
- Autodiff delegation pattern (separate orchestration from computation)
- New internal/parallel package for parallel execution utilities
- Extended Backend interface with backward operation methods
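"Stride specialization" here means branching once on the common contiguous case so the compiler sees a simple unit-stride loop it can unroll and auto-vectorize, while the general strided path stays correct. A minimal sketch of the idea, combined with the pre-slicing trick for bounds-check elimination (function names are illustrative, not Born's actual API):

```go
package main

import "fmt"

// dotStrided computes a strided dot product. The stride == 1 case is
// specialized: pre-slicing a and b to length n lets the compiler hoist
// bounds checks out of the loop, and the contiguous access pattern is
// amenable to auto-vectorization. The general path handles any stride.
func dotStrided(a, b []float32, n, strideA, strideB int) float32 {
	var sum float32
	if strideA == 1 && strideB == 1 {
		// Fast path: contiguous data, bounds checks eliminated by pre-slicing.
		a, b = a[:n], b[:n]
		for i := 0; i < n; i++ {
			sum += a[i] * b[i]
		}
		return sum
	}
	// General path: arbitrary strides.
	for i := 0; i < n; i++ {
		sum += a[i*strideA] * b[i*strideB]
	}
	return sum
}

func main() {
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	fmt.Println(dotStrided(a, b, 4, 1, 1)) // contiguous: 1*5+2*6+3*7+4*8 = 70
	fmt.Println(dotStrided(a, b, 2, 2, 2)) // stride 2: 1*5+3*7 = 26
}
```

The same shape of specialization applies to the Conv2D/MaxPool2D inner loops mentioned above: one branch outside the hot loop buys a much simpler loop body inside it.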
feat: v0.7.0 inference optimization - Flash Attention, Speculative Decoding, GGUF (#13)

TASK-062: GGUF Import Complete
- Parser: types, metadata, tensor info extraction
- Loader: tensor data loading with memory mapping
- Dequantization: K-quants (Q4_K, Q5_K, Q6_K, Q8_0)
- Converter: GGUF to Born tensor format

TASK-060: Flash Attention 2
- Week 1: CPU reference with online softmax, O(N) memory
- Week 2: WebGPU WGSL shader with tiled computation
- Supports causal masking, head dims 64-256, block sizes 64/128
- GPU vs CPU validation: error < 1e-4

TASK-061: Speculative Decoding
- Draft model generates K tokens speculatively
- Target model verifies them in a parallel batch
- Modified rejection sampling for token acceptance
- 2-4x potential speedup for autoregressive generation

Also: fixed 226 gosec G115 lint issues across the codebase
feat: v0.6.0 - ONNX Import + Lazy GPU Mode (#11)

* feat(tensor): add lazy GPU evaluation and raw tensor operations
  - Add LazyGPUData struct for GPU-resident tensor data with lazy realization
  - Add runtime.SetFinalizer for automatic GPU buffer cleanup on GC
  - Extend RawTensor with GPUData() accessor and lazy data realization in Data()
  - Add comprehensive raw_ops.go with 50+ tensor operations (argmax, topk, etc.)
  - Support type conversion, broadcasting, and advanced indexing operations
* feat(webgpu): add GPU autodiff backward ops and batch operations
  - Add GPU-accelerated backward operations (MatMulBackward, AddBackward, etc.)
  - Add GPUTensor wrapper for seamless GPU/CPU tensor operations
  - Add batch processing support for efficient multi-tensor operations
  - Include comprehensive test coverage for all new operations
* feat(webgpu): complete Phase 3 lazy mode with command batching
  - Add lazy compute operations that keep data GPU-resident until needed
  - Implement GPU-to-GPU buffer copy to avoid CPU round-trips
  - Add command batching to reduce GPU sync overhead (200 submits → 1-2)
  - Extend lazy mode to all critical operations (Add, Mul, MatMul, Softmax, etc.)
  - Add FlushCommands() for explicit synchronization when needed
* feat(onnx): add ONNX model import and inference API
  - Add ONNX protobuf parser for .onnx model files
  - Implement model loader with weight extraction and graph construction
  - Add operator registry with 30+ standard ONNX operators
  - Support activations (ReLU, Sigmoid, Tanh, Softmax, GELU)
  - Support math ops (MatMul, Add, Mul, Div, Sqrt, Pow)
  - Support shape ops (Reshape, Transpose, Squeeze, Unsqueeze, Concat)
  - Support utility ops (Gather, Slice, Cast, Constant, Identity)
  - Include comprehensive test coverage for parser and loader
* docs: add organization logo assets
* docs: update documentation for v0.6.0 release
  - Add CHANGELOG entry for v0.6.0 (ONNX import, lazy GPU, command batching)
  - Update README with new features and version number
  - Update ROADMAP to show v0.6.0 as current release
  - Mark Phase 6 as complete (ONNX + Lazy GPU)
  - Add performance metrics for lazy GPU mode
* fix(lint): update for golangci-lint v2.7.0
  - Remove unused //nolint:gosec directives for G115 (fixed in gosec v2.21.2+)
  - Add nolint only for actual warnings: G103 (unsafe), G304 (file path), G404 (weak random)
  - Fix whitespace errors (unnecessary leading newlines)
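The core of lazy mode is that operations record a deferred computation instead of executing immediately, and only a call to Data() forces realization (on GPU, the readback). A toy CPU-only sketch of that pattern, with hypothetical names (the real LazyGPUData keeps buffers device-resident and batches command submissions, which this sketch does not model):

```go
package main

import "fmt"

// lazyTensor defers its computation until Data() is called. Ops build a
// chain of thunks instead of executing eagerly, so intermediate results
// can stay "device resident" (here: simply uncomputed) until the host
// actually needs the bytes.
type lazyTensor struct {
	realize func() []float32 // deferred computation
	cached  []float32        // filled on first Data() call
}

func fromSlice(v []float32) *lazyTensor {
	return &lazyTensor{realize: func() []float32 { return v }}
}

// Add records the addition; nothing runs yet.
func (t *lazyTensor) Add(u *lazyTensor) *lazyTensor {
	return &lazyTensor{realize: func() []float32 {
		a, b := t.Data(), u.Data()
		out := make([]float32, len(a))
		for i := range a {
			out[i] = a[i] + b[i]
		}
		return out
	}}
}

// Data forces realization, analogous to a GPU readback point; the
// result is cached so the chain is evaluated at most once.
func (t *lazyTensor) Data() []float32 {
	if t.cached == nil {
		t.cached = t.realize()
	}
	return t.cached
}

func main() {
	a := fromSlice([]float32{1, 2})
	b := fromSlice([]float32{3, 4})
	c := a.Add(b).Add(a) // still unevaluated here
	fmt.Println(c.Data())
}
```

The "200 submits → 1-2" improvement falls out of this shape: since nothing executes until a realization point, all recorded ops can be encoded into one command buffer and submitted together.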
feat(webgpu): add GPU multi-dim Transpose and Expand operations

- Add transposeNDShader for N-dimensional transpose (up to 6D)
- Add expandShader for NumPy-style GPU broadcasting
- Support both float32 and int32 data types
- Remove CPU fallback for Transpose and Expand
- Add 9 new tests for 3D/4D/5D/6D operations

Fixes ~60s/batch slowdown in transformer training. Closes TASK-052.
v0.5.4: Model Serialization

Features:
- Born Native Format v2 (.born) with SHA-256 checksum
- Security validation (offset overlap, path traversal, bounds check)
- Memory-mapped reader for large models (70GB+)
- Checkpoint API for training resume
- SafeTensors export for HuggingFace compatibility

New package:
- internal/serialization - format writer/reader, validation, mmap

API:
- nn.Save(model, path, modelType, metadata)
- nn.Load(path, backend, model)
- nn.SaveCheckpoint() / nn.LoadCheckpoint()
- serialization.WriteSafeTensors()
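The checksum side of the format can be illustrated with Go's standard library: hash the payload on write, store the digest alongside it, and recompute on read before trusting any offsets. A minimal sketch under simplified assumptions (the real .born v2 layout is more involved; writePayload/readPayload are hypothetical names, and here the digest is simply appended after the payload):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// writePayload appends a SHA-256 digest after the payload so a reader
// can detect corruption before interpreting any of the file's contents.
func writePayload(payload []byte) []byte {
	sum := sha256.Sum256(payload)
	return append(append([]byte{}, payload...), sum[:]...)
}

// readPayload verifies the trailing digest and returns the payload.
func readPayload(blob []byte) ([]byte, error) {
	if len(blob) < sha256.Size {
		return nil, fmt.Errorf("blob too short for checksum")
	}
	payload, want := blob[:len(blob)-sha256.Size], blob[len(blob)-sha256.Size:]
	sum := sha256.Sum256(payload)
	if !bytes.Equal(sum[:], want) {
		return nil, fmt.Errorf("checksum mismatch")
	}
	return payload, nil
}

func main() {
	blob := writePayload([]byte("tensor bytes"))
	p, err := readPayload(blob)
	fmt.Println(string(p), err)

	blob[0] ^= 1 // flip one bit to simulate corruption
	_, err = readPayload(blob)
	fmt.Println(err)
}
```

Verifying integrity first is what makes the listed security checks (offset overlap, bounds) meaningful: they only defend against malformed input if the bytes being checked are the bytes that were written.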
v0.5.3 - WebGPU Backend Fixes (HRM Compatibility)

Bug fixes:
- Comparison ops: always return float32 (0.0/1.0), even for int32 inputs
- Sum int32: added WGSL shader for int32 sum reduction
- Sum scalar shape: fixed return shape from [1] to [] for proper scalar handling
- Where int32 condition: added support for int32 condition tensors
- Where broadcasting: added NumPy-style broadcasting (like Burn)
- Gather backward: support for int32, int64, float32 index tensors

New functions:
- runComparisonOp - dedicated function for comparison operations
- int32ToFloat32 - helper for int32-to-float32 conversion

Tests:
- 3 new Gather backward tests
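The comparison-op fix establishes a simple contract: comparisons always produce a float32 mask of 0.0/1.0 regardless of input dtype, so downstream consumers like Where or mask multiplications see a uniform type. A tiny CPU sketch of that contract (runComparisonOp in the release is the WebGPU-side equivalent; greaterMask here is an illustrative name):

```go
package main

import "fmt"

// greaterMask compares int32 inputs element-wise but always emits a
// float32 mask, matching the fixed contract: comparison results are
// 0.0/1.0 float32 regardless of the input dtype.
func greaterMask(a, b []int32) []float32 {
	out := make([]float32, len(a))
	for i := range a {
		if a[i] > b[i] {
			out[i] = 1.0
		}
	}
	return out
}

func main() {
	fmt.Println(greaterMask([]int32{3, 1, 2}, []int32{2, 2, 2}))
}
```

Keeping the mask dtype fixed avoids a combinatorial explosion of shader variants: only the comparison kernel needs per-dtype handling, not everything that consumes its output.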