Dynamo v0.7.1

@dagil-nvidia released this 15 Dec 19:34
15f1a73

Dynamo v0.7.1 - Release Notes

Summary

Dynamo 0.7.1 is a patch release focused on tool calling support, NIXL performance improvements, and preprocessing fixes. It expands function calling with new tool parsers for the DeepSeek V3/R1 models and the XML Coder format, improves NIXL concurrency and byte handling for better distributed-inference performance, and fixes a critical preprocessor issue with stop-token handling.

Base Branch: release/0.7.0.post1

Full Changelog

Performance and Framework Support

  • NIXL Byte Handling: Refactored how bytes are passed to NIXL in the nixl_connect module (#4860), improving memory-handling efficiency and compatibility with NIXL's native byte-processing requirements for distributed KV cache transfers.
  • NIXL Concurrency Improvements: Enhanced concurrency support in the nixl_connect module (#4862), enabling better parallel processing of NIXL operations and improving throughput for disaggregated inference workloads with many concurrent requests.
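The concurrency pattern behind these two items can be sketched as follows. This is a minimal illustration, not the nixl_connect API: `transfer` is a hypothetical stand-in for a NIXL write, and the zero-copy `memoryview` slices stand in for handing raw bytes to NIXL without intermediate copies.

```python
import asyncio

# Hypothetical stand-in for a single NIXL transfer; the real nixl_connect
# API differs. It receives a zero-copy view of the payload, mirroring how
# raw bytes are handed to NIXL after the #4860 refactor.
async def transfer(buf: memoryview) -> int:
    await asyncio.sleep(0)  # yield control, as a real RDMA write would
    return len(buf)

async def transfer_all(payload: bytes, chunk: int) -> int:
    view = memoryview(payload)  # no copy of the payload is made
    tasks = [transfer(view[i:i + chunk]) for i in range(0, len(payload), chunk)]
    # Chunks are issued concurrently rather than serially (#4862).
    sent = await asyncio.gather(*tasks)
    return sum(sent)

total = asyncio.run(transfer_all(b"\x00" * 4096, 1024))
```

With concurrent issue, per-transfer latency overlaps instead of accumulating, which is where the throughput gain for multi-request disaggregated serving comes from.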

Tool Calling Support

  • DeepSeek V3/R1 Tool Parser: Added tool-call parser support for the DeepSeek V3 and DeepSeek R1 models (#4861), enabling function calling with these popular open-weight reasoning models for agentic workflows and structured output generation.
  • XML Coder Tool Parser: Implemented the XML Coder tool parser format (#4859), providing an additional function-calling option for models that use XML-based tool definitions and responses.
  • Tool Call Configuration Types: Refactored tool call configuration with new config types (#4857), improving the type safety, validation, and extensibility of tool-calling configuration across supported models and parsers.
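The job of a tool-call parser is to split raw model output into plain text and structured calls. The sketch below assumes a simple `<tool_call>{...}</tool_call>` wrapping for illustration; the exact tags and JSON shape emitted by the DeepSeek and XML Coder formats are defined by the Dynamo parsers (#4859, #4861), not by this example.

```python
import json
import re

# Assumed wrapper tag for illustration only; real formats vary per parser.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def parse_tool_calls(text: str):
    """Split model output into plain content and a list of structured calls."""
    calls = [json.loads(m.group(1)) for m in TOOL_CALL_RE.finditer(text)]
    content = TOOL_CALL_RE.sub("", text).strip()  # remove call markup from text
    return content, calls

out = ('Checking the weather. '
       '<tool_call>{"name": "get_weather", "arguments": {"city": "Oslo"}}</tool_call>')
content, calls = parse_tool_calls(out)
```

The new config types (#4857) make it possible to select and validate such per-model formats in a type-safe way rather than via ad-hoc string options.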

Bug Fixes

  • Preprocessor Stop Field: Fixed the preprocessor to properly populate the "stop" field in request handling (#4858), ensuring stop sequences are correctly propagated through the inference pipeline and that models terminate generation at the specified stop tokens.
  • min_tokens with ignore_eos: Fixed an issue where setting ignore_eos=true automatically overrode min_tokens to equal max_tokens (#4908), so users can continue generation past the EOS token without being forced to generate the maximum number of tokens.
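The min_tokens fix can be illustrated with a small sketch. The function and field names here are illustrative, not Dynamo's actual sampling-parameter structs; the `buggy` flag models the pre-0.7.1 behavior for contrast.

```python
def resolve_min_tokens(min_tokens, max_tokens, ignore_eos, buggy=False):
    """Resolve the effective minimum generation length.

    ignore_eos should only disable EOS-based stopping; it should not change
    the length bounds the user requested.
    """
    if buggy and ignore_eos:
        # Pre-fix behavior (#4908): generation was forced to run to max_tokens.
        return max_tokens
    # Fixed behavior: the user's min_tokens is preserved (default 0 when unset).
    return min_tokens if min_tokens is not None else 0

old = resolve_min_tokens(16, 1024, ignore_eos=True, buggy=True)
new = resolve_min_tokens(16, 1024, ignore_eos=True)
```

Before the fix, a request with ignore_eos=true and min_tokens=16 would generate all 1024 tokens; after the fix it may stop any time after the 16th token.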