Dynamo v0.7.1 - Release Notes
Summary
Dynamo 0.7.1 is a patch release focused on tool calling support, NIXL performance improvements, and preprocessing fixes. It adds tool parsers for DeepSeek V3/R1 models and the XML Coder format, improves NIXL concurrency and byte handling for distributed inference, and fixes a critical preprocessor issue with stop token handling.
Base Branch: release/0.7.0.post1
Full Changelog
Performance and Framework Support
- NIXL Byte Handling: Refactored how bytes are passed to NIXL in the nixl_connect module (#4860) to improve memory handling efficiency and compatibility with NIXL's native byte processing requirements for distributed KV cache transfers.
- NIXL Concurrency Improvements: Enhanced concurrency support in the nixl_connect module (#4862) to enable better parallel processing of NIXL operations, improving throughput for disaggregated inference workloads with multiple concurrent requests.
Tool Calling Support
- DeepSeek V3/R1 Tool Parser: Added tool call parser support for DeepSeek V3 and DeepSeek R1 models (#4861), enabling function calling with these open-weight reasoning models for agentic workflows and structured output generation.
- XML Coder Tool Parser: Implemented the XML Coder tool parser format (#4859), providing an additional function calling format for models that use XML-based tool definitions and responses.
- Tool Call Configuration Types: Refactored tool call configuration with new config types (#4857), improving type safety, validation, and extensibility of tool calling configuration across supported models and parsers.
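To make the role of an XML-style tool parser concrete, here is a minimal sketch of how tagged model output could be turned into OpenAI-style tool call objects. The tag layout (`<tool_call>`, `<function=...>`, `<parameter=...>`) and the function name `parse_xml_tool_calls` are assumptions chosen for illustration; they are not taken from the Dynamo source and may not match the format the XML Coder parser actually accepts.

```python
import json
import re

# Assumed layout, for illustration only:
# <tool_call><function=NAME><parameter=KEY>VALUE</parameter>...</function></tool_call>
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>]+)>(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
PARAM_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)


def parse_xml_tool_calls(text):
    """Split model output into plain content and a list of tool calls.

    Hypothetical helper: parameter values are kept as strings, and the
    arguments are serialized to JSON as in OpenAI-style responses.
    """
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        args = {key.strip(): value.strip() for key, value in PARAM_RE.findall(body)}
        calls.append(
            {
                "type": "function",
                "function": {"name": name.strip(), "arguments": json.dumps(args)},
            }
        )
    # Whatever remains after removing the tool call blocks is ordinary content.
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, calls
```

A real parser would also need to handle streaming output, malformed tags, and typed parameter values, which this sketch leaves out.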
Bug Fixes
- Preprocessor Stop Field: Fixed the preprocessor to populate the "stop" field in request handling (#4858), so that stop sequences propagate through the inference pipeline and generation terminates at the specified stop sequences.
- min_tokens with ignore_eos: Fixed an issue where setting ignore_eos=true would automatically override min_tokens to equal max_tokens (#4908), ensuring users can continue generation past the EOS token without being forced to generate the maximum number of tokens.
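The interaction between these two fixes can be sketched as follows. This is one plausible reading of the behavior described above, not Dynamo's implementation: `SamplingOptions`, `resolve_min_tokens`, and `may_finish` are hypothetical names, and the `legacy` flag only emulates the pre-fix override for comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingOptions:
    # Hypothetical container; field names mirror the options named above.
    max_tokens: int
    min_tokens: Optional[int] = None
    ignore_eos: bool = False


def resolve_min_tokens(opts: SamplingOptions, legacy: bool = False) -> int:
    """Return the minimum number of tokens to generate before stopping."""
    if legacy and opts.ignore_eos:
        # Emulates the reported bug: ignore_eos forced min_tokens to max_tokens.
        return opts.max_tokens
    return opts.min_tokens or 0


def may_finish(opts, num_generated, hit_stop, hit_eos, legacy=False):
    """Decide whether generation may end after num_generated tokens."""
    if num_generated >= opts.max_tokens:
        return True  # the hard cap always applies
    if num_generated < resolve_min_tokens(opts, legacy=legacy):
        return False  # still below the minimum length
    if hit_stop:
        return True  # a stop sequence ends generation
    return hit_eos and not opts.ignore_eos  # EOS counts only if not ignored
```

Under these assumptions, a request with max_tokens=256, min_tokens=8, and ignore_eos=true can end at a stop sequence after 20 tokens, whereas the emulated pre-fix behavior would keep generating until all 256 tokens were produced.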