Conversation

@SamuelBarryCS (Contributor) commented Sep 1, 2025:

What:

  • Implement a fast image processor for PromptDepthAnything, following the request in [Contributions Welcome] Add Fast Image Processors #36978 (see the usage sketch after this list)
  • Add one additional test to tests/models/prompt_depth_anything to check numerical values
  • Add a temporary file (to be deleted before merging) to run additional tests and get speed benchmarks of the classic vs. fast implementation
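
For context, a minimal usage sketch (the checkpoint id is an assumption, and use_fast=True follows the usual fast-image-processor convention rather than anything specific to this diff):

```python
from transformers import AutoImageProcessor

# use_fast=True opts into the torch-backed fast processor when one is available;
# the checkpoint id is illustrative (assumed: depth-anything/prompt-depth-anything-vits-hf)
processor = AutoImageProcessor.from_pretrained(
    "depth-anything/prompt-depth-anything-vits-hf", use_fast=True
)
```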

Tests performed:

  • Ran RUN_SLOW=1 python -m pytest tests/models/prompt_depth_anything/test_image_processing_prompt_depth_anything.py -v, all passing:
(screenshot: passing test run)
  • Ran the temporary file to gain additional confidence in the fast processor output and to collect speed benchmarks:
📱 Testing on device: cpu
------------------------------

🔧 Config: batch_size=1, image_size=(384, 384)
⏳ Benchmarking slow processor...
⚡ Benchmarking fast processor...
📊 Results:
   Slow: 0.0032s ± 0.0004s
   Fast: 0.0006s ± 0.0000s
   Speedup: 5.83x
✅ Output verification: PASSED
   (Shape checked ✓, pixel value equality checked ✓, depth value equality checked ✓)

🔧 Config: batch_size=1, image_size=(512, 512)
⏳ Benchmarking slow processor...
⚡ Benchmarking fast processor...
📊 Results:
   Slow: 0.0049s ± 0.0003s
   Fast: 0.0014s ± 0.0001s
   Speedup: 3.42x
✅ Output verification: PASSED
   (Shape checked ✓, pixel value equality checked ✓, depth value equality checked ✓)
...
...
(all 12 cases passing)
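
For reference, a minimal sketch of the kind of timing loop the temporary benchmark file runs (hypothetical helper; the images/prompt_depth call signature mirrors the processor outputs logged later in this thread and is an assumption here):

```python
import time

def bench(processor, images, prompt_depth, n=50):
    # One warm-up call, then the mean over n preprocessing calls
    processor(images=images, prompt_depth=prompt_depth, return_tensors="pt")
    start = time.perf_counter()
    for _ in range(n):
        processor(images=images, prompt_depth=prompt_depth, return_tensors="pt")
    return (time.perf_counter() - start) / n
```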

Impact metrics:

  • 10x speedup on a single-H100 setup

Speed benchmark (slow vs. fast) on CPU:
(screenshot: benchmark plot)

Speed benchmark (slow vs. fast) on a single H100:
(screenshot: benchmark plot)

How to review:

  • Read the diff
  • Run tests with RUN_SLOW=1 python -m pytest tests/models/prompt_depth_anything/test_image_processing_prompt_depth_anything.py -v

TODO / Next:

  • N/A

@SamuelBarryCS marked this pull request as ready for review on September 2, 2025 06:52
@SamuelBarryCS changed the title from "[WIP] Add Fast PromptDepthAnything Processor" to "Add Fast PromptDepthAnything Processor" on Sep 2, 2025
@SamuelBarryCS (Contributor, Author) commented Sep 2, 2025:

cc @yonigozlan, ready for review 🤗!

@Rocketknight1 (Member) commented:

cc @yonigozlan

@SamuelBarryCS changed the title from "Add Fast PromptDepthAnything Processor" to "[WIP] Add Fast PromptDepthAnything Processor" on Sep 8, 2025
@SamuelBarryCS marked this pull request as draft on September 8, 2025 17:35
@SamuelBarryCS (Contributor, Author) commented Sep 8, 2025:

@yonigozlan I actually have to fix a few things, so I turned the PR back into draft/WIP.
Sorry about that; let me ping you once it's reviewable.

@yonigozlan (Member) left a review comment:

Hey @SamuelBarryCS, thanks a lot for contributing this! I pointed out a few changes to make before merging 🤗

@SamuelBarryCS changed the title from "[WIP] Add Fast PromptDepthAnything Processor" to "Add Fast PromptDepthAnything Processor" on Sep 11, 2025
@SamuelBarryCS marked this pull request as ready for review on September 11, 2025 05:02
Inline review thread on this diff hunk:

    processed_images = reorder_images(processed_images_grouped, grouped_images_index)

    # Only stack tensors if they all have the same shape and return_tensors is specified
    if return_tensors == "pt" and processed_images:
@SamuelBarryCS (Contributor, Author) commented Sep 12, 2025:
FYI Yoni: we can't stack tensors of different shapes, and trying to stack without this check makes the tests fail.

@yonigozlan (Member) replied:

Yes, that's expected indeed, but it's better to get an error when attempting to stack than to silently not stack, as the user will be expecting a tensor in the output. For the batch tests, we can just set keep_aspect_ratio to False to have them pass.
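
For context, a minimal sketch of the failure being discussed (shapes are illustrative): with keep_aspect_ratio=True, two images in a batch can resize to different sizes, and torch.stack requires equal sizes:

```python
import torch

a = torch.zeros(3, 192, 256)  # (C, H, W) of one processed image
b = torch.zeros(3, 182, 266)  # a second image with a different aspect ratio

torch.stack([a, b])
# RuntimeError: stack expects each tensor to be equal size,
# but got [3, 192, 256] at entry 0 and [3, 182, 266] at entry 1
```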

@SamuelBarryCS (Contributor, Author) commented Sep 12, 2025:

@yonigozlan it looks like I found a bug in the behavior of the slow processor.
See below, where I'm logging encoding_slow.prompt_depth.shape and encoding_fast.prompt_depth.shape for torchify = True/False in images = self.image_processor_tester.prepare_image_inputs(equal_resolution=True, torchify=True) in test_slow_fast_equivalence_batched:

=== TORCHIFY=FALSE ===
Slow processor: torch.Size([7, 1, 192, 256])
Fast processor: torch.Size([7, 1, 192, 256])

=== TORCHIFY=TRUE ===
Slow processor: torch.Size([7, 192, 256, 1])
Fast processor: torch.Size([7, 1, 192, 256])

The issue comes from applying to_channel_dimension_format with the wrong input channel dim. I fixed it in 6aec2cf. After the fix, the logging looks as follows:

=== TORCHIFY=FALSE ===
Slow processor: torch.Size([7, 1, 192, 256])
Fast processor: torch.Size([7, 1, 192, 256])

=== TORCHIFY=TRUE ===
Slow processor: torch.Size([7, 1, 192, 256])
Fast processor: torch.Size([7, 1, 192, 256])

which is way better :) !
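
For intuition, a minimal sketch of the failure mode using transformers' to_channel_dimension_format (shapes taken from the log above; a single-channel map makes layout inference ambiguous, so a wrongly declared input layout turns the conversion into a no-op):

```python
import numpy as np
from transformers.image_transforms import to_channel_dimension_format
from transformers.image_utils import ChannelDimension

depth = np.zeros((192, 256, 1))  # one channels-last depth map, (H, W, C)

# Input layout declared correctly: converted to channels-first, (1, 192, 256)
ok = to_channel_dimension_format(
    depth, ChannelDimension.FIRST, input_channel_dim=ChannelDimension.LAST
)

# Input layout declared wrongly: treated as already channels-first, stays (192, 256, 1)
bad = to_channel_dimension_format(
    depth, ChannelDimension.FIRST, input_channel_dim=ChannelDimension.FIRST
)

print(ok.shape, bad.shape)  # (1, 192, 256) (192, 256, 1)
```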

@SamuelBarryCS (Contributor, Author) commented:

Should be ready for another (hopefully final) round of review @yonigozlan 🤗

@yonigozlan (Member) commented, quoting @SamuelBarryCS:

> Actually @yonigozlan, isn't there a bug in the slow processor?
>
> pad_size_left, pad_size_right = _get_pad(height, size_divisor)
>
> Why are we getting left/right pad using height and not width? ... I am doing pad_size_left, pad_size_right = _get_pad(width, size_divisor) in the fast processor, which makes much more sense in my opinion

Indeed, thanks for catching that! It looks like it was functionally correct, because it's also reversed in the call to pad, but it's definitely misleading.
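
A sketch of the logic in question (the helper body below is paraphrased from the DPT-style slow processor and may differ from the exact code): the "left/right" pair is computed from the height, but the subsequent pad() call swaps the pairs back, so the height-derived values do pad the height axis and the output was correct, just confusingly labeled:

```python
import math

def _get_pad(size, size_divisor):
    # Symmetric padding that rounds `size` up to the next multiple of `size_divisor`
    new_size = math.ceil(size / size_divisor) * size_divisor
    pad_total = new_size - size
    return pad_total // 2, pad_total - pad_total // 2

height, width, size_divisor = 190, 250, 32

# Slow processor (misleading names): "left/right" from height, "top/bottom" from width...
pad_size_left, pad_size_right = _get_pad(height, size_divisor)
pad_size_top, pad_size_bottom = _get_pad(width, size_divisor)

# ...but the pad() call reverses them again, applying the height-derived pair
# to the height axis:
#   pad(image, ((pad_size_left, pad_size_right), (pad_size_top, pad_size_bottom)))
print((pad_size_left, pad_size_right), (pad_size_top, pad_size_bottom))  # (1, 1) (3, 3)
```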

@yonigozlan (Member) commented, quoting @SamuelBarryCS's bug report above:

Niice, thanks for fixing!

@yonigozlan (Member) left a review comment:

Thanks for iterating again, just pushed some last very minor changes, but everything looks good now :) Waiting for the CI to pass, then I'll merge!

@yonigozlan enabled auto-merge (squash) on September 15, 2025 14:48

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, prompt_depth_anything

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yonigozlan merged commit ff26fe8 into huggingface:main on Sep 15, 2025
23 checks passed
@SamuelBarryCS (Contributor, Author) commented:

Perfect, thanks a lot for the last round of edits & merging! 🤗

ErfanBaghaei pushed a commit to ErfanBaghaei/transformers that referenced this pull request Sep 25, 2025
* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <[email protected]>

* Remove benchmrk script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <[email protected]>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <[email protected]>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refacto

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processoer

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request Oct 2, 2025 (same squashed commit as above)
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request Oct 4, 2025 (same squashed commit as above)
gante added a commit to gante/transformers that referenced this pull request Oct 8, 2025
… text generation (huggingface#40837)

* init

* added TopH

* Update TopH logits_process.py

* Update logits_process.py

* Update test_logits_process.py

* Update test_logits_process.py

* added test No. 4

* Resolving __init__.py issues

* Resolving configuration_utils.py Issues

* Resolving logits_process.py Issues

* Resolving utils.py Issues

* Resolving test_logits_process.py Issues

* Resolving __init__.py issues

* Resolving logits_process.py Issues

* Resolving __init__.py issues

* Updated Docs

* Updated Docstring

* style: autoformat with make fixup

* Fixing Docstring

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Using torch.distributions.Categorical

* Improve torch_dtype checks (#40808)

* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <[email protected]>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>

* Add VideoProcessors to auto-backend requirements (#40843)

* add it

* fix existing ones

* add perception to auto_mapping...

* Adds Causal Conv 1D kernel for mamba models (#40765)

* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit

* Update no split modules in T5Gemma model (#40810)

* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <[email protected]>

* Replace image classification loss functions to `self.loss_function` (#40764)

* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>

* Fixes for continuous batching (#40828)

* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictonnary is only removed during kwargs

* Test for supported sample

* Fix a unvoluntary slice

* Fixes for non-sliced inputs and small example improvments

* Slice inputs is more understandabe

* Style

* [tests] re-enable aria fast tests (#40846)

* rise from the dead

* test

* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)

* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular

* [Sam2Video] Fix video inference with batched boxes and add test (#40797)

fix video inference with batched boxes and add test

* add: differential privacy research model (#40851)

* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <[email protected]>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <[email protected]>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>

* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)

* ouput_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve

* [tests] move generative tests away from `test_modeling_common.py` (#40854)

move tests

* [generate] Always use decoder config to init cache (#40772)

* mega derp

* fix

* always use the decoder

* Use checkpoint in auto_class_docstring (#40844)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)

Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <[email protected]>

* Redirect MI355 CI results to dummy dataset (#40862)

* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)

Signed-off-by: greg-kwasniewski1 <[email protected]>

* [docstrings / type hints] Update outdated annotations for `past_key_values`  (#40803)

* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes

* fix florence kwargs  (#40826)

* fix: XIELU act parameters not being casted to correct dtype (#40812)

* Update model tags and integration references in bug report (#40881)

* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)

use numerically stable inverse

* Adding Support for Qwen3-VL Series (#40795)

* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unecesary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnesesary imports

---------

Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>

* [`VaultGemma`] Update expectations in integration tests (#40855)

* fix tests

* style

* Fix modular consistency (#40883)

* reapply modular

* add missing one

* 🔴 Move variable output controls to `_prepare_generation_config ` (#40715)

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches

* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)

* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <[email protected]>

* Improve typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add comment

Signed-off-by: Yuanyuan Chen <[email protected]>

* Improve comments

Signed-off-by: Yuanyuan Chen <[email protected]>

* Revert typing

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use torch.expm1 and torch.log1p for better numerical results (#40860)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add Fast PromptDepthAnything Processor (#40602)

* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <[email protected]>

* Remove benchmrk script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <[email protected]>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <[email protected]>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refacto

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processoer

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>

* Fix deta loading & dataclass (#40878)

* fix

* fix 2

* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)

Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <[email protected]>

* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)

* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: Steven Liu <[email protected]>

* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)

* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>
Co-authored-by: Steven Liu <[email protected]>

* [generate] remove docs of a feature that no longer exists (#40895)

* Make debugging failing tests (check and update expect output values) easier 🔥  (#40727)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fixing the call to kernelize (#40628)

* fix

* style

* overload train and eval

* add getter and setter

* Fix getter  regression (#40824)

* test things

* style

* move tests to a sane place

* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* [cache] Merge static sliding and static chunked layer (#40893)

* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle

* Harmonize CacheLayer names (#40892)

* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert

* [cache] Only use scalars in `get_mask_sizes` (#40907)

* remove tensor ops

* style

* style

* Set seed for `Glm4vIntegrationTest` (#40905)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Add Olmo3 model (#40778)

* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test

* remove dummy EncodingFast (#40864)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Improve module name handling for local custom code (#40809)

* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <[email protected]>

* Remove `runner_map` (#40880)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* disable `test_fast_is_faster_than_slow` (#40909)

fix

Co-authored-by: ydshieh <[email protected]>

* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)

* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase

* [generate] misc fixes (#40906)

misc fixes

* 🔴Make `center_crop` fast equivalent to slow (#40856)

make center_crop fast equivalent to slow

* Fix dtype in Paligemma (#40912)

* fix dtypes

* fix copies

* delete unused attr

* [Docs] Adding documentation of MXFP4 Quantization (#40885)

* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <[email protected]>
Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: vb <[email protected]>
Co-authored-by: Steven Liu <[email protected]>

* Processor load with multi-processing (#40786)

push

* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)

* Remove unused arg

* deprecate

* revrt one change

* get set go

* version correction

* fix

* make style

* comment

* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)

* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <[email protected]>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <[email protected]>

---------

Co-authored-by: Mohamed Mekkouri <[email protected]>

* [torchao safetensors] renaming get_state_dict function (#40774)

renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <[email protected]>

* Adding activation kernels (#40890)

* first commit

* add mode

* revert modeling

* add compile

* rm print

* Minor fix for #40727 (#40929)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Add support for Florence-2 training (#40914)

* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add LongCat-Flash (#40730)

* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism

* [DOC] Add missing dates in model cards (#40922)

add missing dates

* [models] remove unused `import torch.utils.checkpoint`  (#40934)

* Intel CPU dockerfile (#40806)

* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <[email protected]>

* update cpu dockerfile

Signed-off-by: jiqing-feng <[email protected]>

* update label name

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>

* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)

* Fix trainer tests (#40823)

* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <[email protected]>

* Fix `Glm4vMoeIntegrationTest` (#40930)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Raise error instead of warning when using meta device in from_pretrained (#40942)

* raise instead of warning

* add timm

* remove

* Consistent naming for images kwargs (#40834)

* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fox copies

* another fix

* fix some tests

* fix more tests

* fix lasts tests

* fix copies

* better docstring

* delete print

* Remove nested import logic for torchvision (#40940)

* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessarry protected import in modular (and modeling)

* fix wrongly remove protected imports

* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Update expected values for some `test_speculative_generation` (#40949)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Standardize audio embedding function name for audio multimodal models (#40919)

* Standardize audio embedding function name for audio multimodal models

* PR review

* Add FlexOlmo model (#40921)

* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`

* Don't list dropout in eager_paged_attention_forward (#40924)

Remove dropout argument

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <[email protected]>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <[email protected]>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <[email protected]>
Co-authored-by: Anna Banaszak <[email protected]>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <[email protected]>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <[email protected]>

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it make me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Using torch.distributions.Categorical

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)

* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, its the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style

* Remove [[autodoc]] refs to TF/Flax objects (#40996)

* remove refs

* more

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <[email protected]>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoing with form_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* seperate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamTextMLP

* clean up some args'

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, seperated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_casual check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attnetion_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>

* [docs] rm stray tf/flax autodocs references (#40999)

rm tf references

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix typoes in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <[email protected]>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)

remove old type aliases

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <[email protected]>

* 🚨 [v5] remove deprecated entry point (#40997)

* remove old entry point

* update references to transformers-cli

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <[email protected]>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <[email protected]>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed ?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Arthur <[email protected]>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <[email protected]>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <[email protected]>

* Format code

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE functuon docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <[email protected]>

---------

Co-authored-by: Joao Gante <[email protected]>

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <[email protected]>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove doc of tf and flax (#41029)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <[email protected]>

* fix

---------

Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix conflict

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Remove <frameworkcontent> and <pt> tags from documentation (#41055)

* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <[email protected]>

* Revert changes

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <[email protected]>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <[email protected]>

* fix crash when using chat to send 2+ request to gptoss (#40536)

Signed-off-by: Wang, Yi <[email protected]>

* Minor addition, no split modules for VideoMAEE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <[email protected]>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <[email protected]>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)

Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Format code

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <[email protected]>

* Improve typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove tf and flax from Chinese documentation (#41057)

Signed-off-by: Yuanyuan Chen <[email protected]>

* fix wrong height and width when read video use torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)

* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <[email protected]>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>

* 🚨 [V5] Remove deprecated training arguments  (#41017)

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix comments

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix code

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <[email protected]>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <[email protected]>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <[email protected]>

* Fix the error where a keyword argument appearing before *args (#41099)

Signed-off-by: Yuanyuan Chen <[email protected]>
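
Likely the call pattern in question (the helper below is made up for illustration; linters flag this pattern as B026):

```python
def log(*values, sep=" "):
    print(sep.join(map(str, values)))

parts = ["a", "b"]
log(sep="-", *parts)  # discouraged: keyword argument before *-unpacking in the call
log(*parts, sep="-")  # clearer equivalent; both print "a-b"
```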

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <[email protected]>

* Clear pass

Signed-off-by: Yuanyuan Chen <[email protected]>

* Clear pass

Signed-off-by: Yuanyuan Chen <[email protected]>

* Clear pass

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Matt <[email protected]>

* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)

* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

* docs(text2text_generation): Update parameter docstrings to reflect modern generation practice

Updated the max_length parameter documentation to max_new_tokens, in line with the standard modern practice of specifying the number of new tokens to generate

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment
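
A minimal sketch of the recommended call pattern (checkpoint chosen only for illustration):

```python
from transformers import pipeline

pipe = pipeline("text2text-generation", model="google/flan-t5-small")

# Prefer max_new_tokens (counts only generated tokens) over max_length
# (prompt + generated tokens), which is what triggered the warning.
out = pipe("Translate to German: Hello, how are you?", max_new_tokens=32)
print(out[0]["generated_text"])
```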

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <[email protected]>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <[email protected]>

* Using torch.distributions.Categorical (see the sketch after this list)

* Resolving logits_process.py Issues

* style: autoformat with make fixup

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Resolving format error

* Correction of the loop variables in logit processor

* Vectorized the loop in logits_process

* formatted  logits_process

* paper reference and stopping rule comment logits_process

* Trigger CI rerun

* Update logits_process.py

* added test_TopH_example_integration

* added test_TopH_example_integration

* Update README.md

* Restore CI config to match main (remove accidental changes)

* Restore CI config to match upstream main (no diffs)
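
A minimal sketch of the sampling step named above, assuming made-up logits (the processor's entropy-based stopping rule would filter the logits before this point):

```python
import torch

# Hypothetical next-token logits for a batch of one sequence.
logits = torch.tensor([[2.0, 0.5, -1.0, 0.1]])

# Categorical normalizes logits internally, so no explicit softmax is needed.
dist = torch.distributions.Categorical(logits=logits)
next_token = dist.sample()            # one sampled token id per batch row
log_prob = dist.log_prob(next_token)  # log-probability of that draw
```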

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Signed-off-by: greg-kwasniewski1 <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Wang, Yi <[email protected]>
Co-authored-by: ArminAzizi98 <[email protected]>
Co-authored-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Yuchao Zhang <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>
Co-authored-by: Pavel Iakubovskii <[email protected]>
Co-authored-by: Bo Zheng <[email protected]>
Co-authored-by: bozheng-hit <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Rémi Ouazan <[email protected]>
Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Amer <[email protected]>
Co-authored-by: eustlb <[email protected]>
Co-authored-by: Albert Villanova del Moral <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Ákos Hadnagy <[email protected]>
Co-authored-by: Grzegorz Kwasniewski <[email protected]>
Co-authored-by: NanoCode012 <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: 艾力可 <[email protected]>
Co-authored-by: JJJYmmm <[email protected]>
Co-authored-by: Manuel de Prada Corral <[email protected]>
Co-authored-by: Samuel Barry <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
Co-authored-by: HyunZ118 <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Pablo Montalvo <[email protected]>
Co-authored-by: Shane A <[email protected]>
Co-authored-by: Xuehai Pan <[email protected]>
Co-authored-by: Matt <[email protected]>
Co-authored-by: Raushan Turganbay <[email protected]>
Co-authored-by: Aritra Roy Gosthipaty <[email protected]>
Co-authored-by: vb <[email protected]>
Co-authored-by: Yaswanth Gali <[email protected]>
Co-authored-by: Akshay Babbar <[email protected]>
Co-authored-by: liangel-02 <[email protected]>
Co-authored-by: Duc-Viet Hoang <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: lilin-1 <[email protected]>
Co-authored-by: Matej Sirovatka <[email protected]>
Co-authored-by: Jack <[email protected]>
Co-authored-by: Rangehow <[email protected]>
Co-authored-by: rangehow <[email protected]>
Co-authored-by: Anna <[email protected]>
Co-authored-by: Anna Banaszak <[email protected]>
Co-authored-by: Hamish Scott <[email protected]>
Co-authored-by: Harshal Janjani <[email protected]>
Co-authored-by: Branden <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Benjamin Bossan <[email protected]>
Co-authored-by: Ita Zaporozhets <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: StevenBucaille <[email protected]>
Co-authored-by: BakerBunker <[email protected]>
Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Ayush <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Yannick Schnider <[email protected]>
Co-authored-by: Ralph Gleaton <[email protected]>
Co-authored-by: Saidur Rahman Pulok <[email protected]>
Co-authored-by: Nick Doiron <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Duygu Altinok <[email protected]>
Co-authored-by: Jinde.Song <[email protected]>
Co-authored-by: hbenoit <[email protected]>
Co-authored-by: nnul <[email protected]>
Co-authored-by: YangKai0616 <[email protected]>
Co-authored-by: Karol Szustakowski <[email protected]>
Co-authored-by: souvikku <[email protected]>
omsherikar pushed a commit to omsherikar/transformers that referenced this pull request Oct 8, 2025
… text generation (huggingface#40837)

Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: StevenBucaille <[email protected]>
Co-authored-by: BakerBunker <[email protected]>
Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Ayush <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Yannick Schnider <[email protected]>
Co-authored-by: Ralph Gleaton <[email protected]>
Co-authored-by: Saidur Rahman Pulok <[email protected]>
Co-authored-by: Nick Doiron <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Duygu Altinok <[email protected]>
Co-authored-by: Jinde.Song <[email protected]>
Co-authored-by: hbenoit <[email protected]>
Co-authored-by: nnul <[email protected]>
Co-authored-by: YangKai0616 <[email protected]>
Co-authored-by: Karol Szustakowski <[email protected]>
Co-authored-by: souvikku <[email protected]>
AhnJoonSung pushed a commit to AhnJoonSung/transformers that referenced this pull request Oct 12, 2025
… text generation (huggingface#40837)

* init

* added TopH

* Update TopH logits_process.py

* Update logits_process.py

* Update test_logits_process.py

* Update test_logits_process.py

* added test No. 4

* Resolving __init__.py issues

* Resolving configuration_utils.py Issues

* Resolving logits_process.py Issues

* Resolving utils.py Issues

* Resolving test_logits_process.py Issues

* Resolving __init__.py issues

* Resolving logits_process.py Issues

* Resolving __init__.py issues

* Updated Docs

* Updated Docstring

* style: autoformat with make fixup

* Fixing Docstring

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Using torch.distributions.Categorical
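As a rough sketch of what this swap buys: `torch.distributions.Categorical` normalizes raw logits internally, so the sampler needs no manual probability computation (the values below are illustrative):

```python
import torch
from torch.distributions import Categorical

logits = torch.tensor([2.0, 0.5, -1.0])

# Categorical applies the softmax internally, so sampling from raw logits
# requires no explicit normalization step.
dist = Categorical(logits=logits)
print(dist.sample())  # index drawn in proportion to softmax(logits)
print(dist.probs)     # the normalized probabilities it sampled from
```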

* Improve torch_dtype checks (#40808)

* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <[email protected]>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>

* Add VideoProcessors to auto-backend requirements (#40843)

* add it

* fix existing ones

* add perception to auto_mapping...

* Adds Causal Conv 1D kernel for mamba models (#40765)

* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit

* Update no split modules in T5Gemma model (#40810)

* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <[email protected]>

* Replace image classification loss functions with `self.loss_function` (#40764)

* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>

* Fixes for continuous batching (#40828)

* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictionary is only removed during kwargs

* Test for supported sample

* Fix an involuntary slice

* Fixes for non-sliced inputs and small example improvements

* Slicing inputs is more understandable

* Style

* [tests] re-enable aria fast tests (#40846)

* rise from the dead

* test

* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)

* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular

* [Sam2Video] Fix video inference with batched boxes and add test (#40797)

fix video inference with batched boxes and add test

* add: differential privacy research model (#40851)

* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <[email protected]>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <[email protected]>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>

* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)

* output_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve

* [tests] move generative tests away from `test_modeling_common.py` (#40854)

move tests

* [generate] Always use decoder config to init cache (#40772)

* mega derp

* fix

* always use the decoder

* Use checkpoint in auto_class_docstring (#40844)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)

Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <[email protected]>

* Redirect MI355 CI results to dummy dataset (#40862)

* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)

Signed-off-by: greg-kwasniewski1 <[email protected]>

* [docstrings / type hints] Update outdated annotations for `past_key_values`  (#40803)

* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes

* fix florence kwargs  (#40826)

* fix: XIELU act parameters not being casted to correct dtype (#40812)

* Update model tags and integration references in bug report (#40881)

* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)

use numerically stable inverse
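A small illustration of the idea (not the model code itself): `torch.rsqrt` computes the inverse square root directly instead of chaining a square root and a division:

```python
import torch

x = torch.tensor([1e-12, 1.0, 4.0])

# Single fused op: avoids the intermediate sqrt result that the naive
# 1 / torch.sqrt(x) form has to divide through.
print(torch.rsqrt(x))
print(1.0 / torch.sqrt(x))  # naive two-step equivalent
```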

* Adding Support for Qwen3-VL Series (#40795)

* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unnecessary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnecessary imports

---------

Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>

* [`VaultGemma`] Update expectations in integration tests (#40855)

* fix tests

* style

* Fix modular consistency (#40883)

* reapply modular

* add missing one

* 🔴 Move variable output controls to `_prepare_generation_config` (#40715)

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches

* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)

* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <[email protected]>

* Improve typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add comment

Signed-off-by: Yuanyuan Chen <[email protected]>

* Improve comments

Signed-off-by: Yuanyuan Chen <[email protected]>

* Revert typing

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
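For reference, a generic SDPA sketch (not the paged helper itself) showing what passing `is_causal` does:

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 4, 8, 16)  # (batch, heads, seq_len, head_dim)

# With is_causal=True, SDPA builds the lower-triangular mask internally,
# so no attn_mask tensor has to be materialized and passed alongside it.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 4, 8, 16])
```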

* Use torch.expm1 and torch.log1p for better numerical results (#40860)

Signed-off-by: Yuanyuan Chen <[email protected]>
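A quick numerical illustration of why these helpers matter for small arguments:

```python
import torch

x = torch.tensor([1e-10], dtype=torch.float32)

# exp(x) rounds to exactly 1.0 in float32, so exp(x) - 1 loses everything;
# expm1 keeps the leading digits, and log1p is the analogous fix for log(1 + x).
print(torch.exp(x) - 1)  # tensor([0.])
print(torch.expm1(x))    # tensor([1.0000e-10])
print(torch.log1p(x))    # tensor([1.0000e-10])
```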

* Add Fast PromptDepthAnything Processor (#40602)

* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <[email protected]>

* Remove benchmark script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <[email protected]>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <[email protected]>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refactor

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processor

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: yonigozlan <[email protected]>

* Fix deta loading & dataclass (#40878)

* fix

* fix 2

* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)

Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <[email protected]>

* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)

* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: Steven Liu <[email protected]>

* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)

* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>
Co-authored-by: Steven Liu <[email protected]>

* [generate] remove docs of a feature that no longer exists (#40895)

* Make debugging failing tests (check and update expect output values) easier 🔥  (#40727)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fixing the call to kernelize (#40628)

* fix

* style

* overload train and eval

* add getter and setter

* Fix getter  regression (#40824)

* test things

* style

* move tests to a sane place

* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* [cache] Merge static sliding and static chunked layer (#40893)

* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle

* Harmonize CacheLayer names (#40892)

* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert

* [cache] Only use scalars in `get_mask_sizes` (#40907)

* remove tensor ops

* style

* style

* Set seed for `Glm4vIntegrationTest` (#40905)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Add Olmo3 model (#40778)

* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test

* remove dummy EncodingFast (#40864)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Improve module name handling for local custom code (#40809)

* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <[email protected]>

* Remove `runner_map` (#40880)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* disable `test_fast_is_faster_than_slow` (#40909)

fix

Co-authored-by: ydshieh <[email protected]>

* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)

* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase

* [generate] misc fixes (#40906)

misc fixes

* 🔴Make `center_crop` fast equivalent to slow (#40856)

make center_crop fast equivalent to slow

* Fix dtype in Paligemma (#40912)

* fix dtypes

* fix copies

* delete unused attr

* [Docs] Adding documentation of MXFP4 Quantization (#40885)

* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <[email protected]>
Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: vb <[email protected]>
Co-authored-by: Steven Liu <[email protected]>

* Processor load with multi-processing (#40786)

push

* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)

* Remove unused arg

* deprecate

* revrt one change

* get set go

* version correction

* fix

* make style

* comment

* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)

* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <[email protected]>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <[email protected]>

---------

Co-authored-by: Mohamed Mekkouri <[email protected]>

* [torchao safetensors] renaming get_state_dict function (#40774)

renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <[email protected]>

* Adding activation kernels (#40890)

* first commit

* add mode

* revert modeling

* add compile

* rm print

* Minor fix for #40727 (#40929)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Add support for Florence-2 training (#40914)

* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add LongCat-Flash (#40730)

* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism

* [DOC] Add missing dates in model cards (#40922)

add missing dates

* [models] remove unused `import torch.utils.checkpoint`  (#40934)

* Intel CPU dockerfile (#40806)

* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <[email protected]>

* update cpu dockerfile

Signed-off-by: jiqing-feng <[email protected]>

* update label name

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>

* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)

* Fix trainer tests (#40823)

* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <[email protected]>

* Fix `Glm4vMoeIntegrationTest` (#40930)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Raise error instead of warning when using meta device in from_pretrained (#40942)

* raise instead of warning

* add timm

* remove

* Consistent naming for images kwargs (#40834)

* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fix copies

* another fix

* fix some tests

* fix more tests

* fix lasts tests

* fix copies

* better docstring

* delete print

* Remove nested import logic for torchvision (#40940)

* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessary protected import in modular (and modeling)

* fix wrongly removed protected imports

* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Update expected values for some `test_speculative_generation` (#40949)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Standardize audio embedding function name for audio multimodal models (#40919)

* Standardize audio embedding function name for audio multimodal models

* PR review

* Add FlexOlmo model (#40921)

* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`

* Don't list dropout in eager_paged_attention_forward (#40924)

Remove dropout argument

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <[email protected]>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <[email protected]>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* finally fix the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unshuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <[email protected]>
Co-authored-by: Anna Banaszak <[email protected]>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <[email protected]>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <[email protected]>
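A minimal sketch of the fixed behavior; the repo id and token below are placeholders:

```python
from transformers import AutoTokenizer

# The token passed here is now forwarded to the follow-up requests the
# tokenizer makes for its config and vocabulary files as well.
tok = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # a gated repo, used purely as an illustration
    token="hf_xxx",              # placeholder, not a real token
)
```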

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it makes me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Using torch.distributions.Categorical

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)

* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didn't load with an earlier version, some fixup (needs another look bc cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially, but the cause was found and fixed too; it just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, it's the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style

* Remove [[autodoc]] refs to TF/Flax objects (#40996)

* remove refs

* more

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).
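A minimal sketch of the guarded import described above:

```python
# Importing readline is enough: it hooks input() as a side effect.
# The guard covers platforms where the module is not available.
try:
    import readline  # noqa: F401  (imported only for its side effect)
except ImportError:
    pass

user_input = input(">>> ")  # now supports ctrl+a/e, history search, undo, ...
```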

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <[email protected]>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoint with from_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* separate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamaTextMLP

* clean up some args'

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, separated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_causal check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attention_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>

* [docs] rm stray tf/flax autodocs references (#40999)

rm tf references

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix typos in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <[email protected]>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)

remove old type aliases

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <[email protected]>

* 🚨 [v5] remove deprecated entry point (#40997)

* remove old entry point

* update references to transformers-cli

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <[email protected]>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <[email protected]>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality as best I can

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Arthur <[email protected]>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <[email protected]>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <[email protected]>

* Format code

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
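For context, a minimal sketch of the target API (assuming a CUDA device is available):

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")

# torch.autocast is the device-agnostic successor to the older
# torch.cuda.amp.autocast context manager.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)
print(y.dtype)  # torch.float16
```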

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE function docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <[email protected]>

---------

Co-authored-by: Joao Gante <[email protected]>

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <[email protected]>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove doc of tf and flax (#41029)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking
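A sketch of how such a collator might be configured; the `whole_word_mask` flag name is an assumption on my part, not confirmed from this log:

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

# whole_word_mask=True (assumed flag name) masks all subword pieces of a
# selected word together instead of masking pieces independently.
collator = DataCollatorForLanguageModeling(
    tokenizer=tok, mlm=True, mlm_probability=0.15, whole_word_mask=True
)
```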

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <[email protected]>

* fix

---------

Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix conflict

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Remove <frameworkcontent> and <pt> tags from documentation (#41055)

* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <[email protected]>

* Revert changes

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <[email protected]>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <[email protected]>

* fix crash when using chat to send 2+ requests to gptoss (#40536)

Signed-off-by: Wang, Yi <[email protected]>

* Minor addition, no split modules for VideoMAE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <[email protected]>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <[email protected]>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)

Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Format code

Signed-off-by: Yuanyuan Chen <[email protected]>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <[email protected]>

* Improve typing

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <[email protected]>

* More fixes

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove tf and flax from Chinese documentation (#41057)

Signed-off-by: Yuanyuan Chen <[email protected]>

* fix wrong height and width when reading videos with torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)

* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <[email protected]>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <[email protected]>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1, target it at Python 3.10, and apply its fixes

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>

* 🚨 [V5] Remove deprecated training arguments  (#41017)

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix comments

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix code

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <[email protected]>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <[email protected]>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <[email protected]>

* Fix the error where a keyword argument appearing before *args (#41099)

Signed-off-by: Yuanyuan Chen <[email protected]>

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <[email protected]>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <[email protected]>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <[email protected]>

* Clear pass

Signed-off-by: Yuanyuan Chen <[email protected]>

* Clear pass

Signed-off-by: Yuanyuan Chen <[email protected]>

* Clear pass

Signed-off-by: Yuanyuan Chen <[email protected]>

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Matt <[email protected]>

* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)

* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

* docs(text2text_generation): update the parameter comment to reflect modern generation practice

Updated the max_length parameter comment to max_new_tokens, in line with the modern practice of specifying the number of newly generated tokens

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <[email protected]>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <[email protected]>

* Using torch.distributions.Categorical

* Resolving logits_process.py Issues

* style: autoformat with make fixup

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Resolving format error

* Correction of the loop variables in logit processor

* Vectorized the loop in logits_process

* formatted  logits_process

* paper reference and stopping rule comment logits_process

* Trigger CI rerun

* Update logits_process.py

* added test_TopH_example_integration

* added test_TopH_example_integration

* Update README.md

* Restore CI config to match main (remove accidental changes)

* Restore CI config to match upstream main (no diffs)

---------

Signed-off-by: Yuanyuan Chen <[email protected]>
Signed-off-by: greg-kwasniewski1 <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Wang, Yi <[email protected]>
Co-authored-by: ArminAzizi98 <[email protected]>
Co-authored-by: Yuanyuan Chen <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Yuchao Zhang <[email protected]>
Co-authored-by: Anton Vlasjuk <[email protected]>
Co-authored-by: Pavel Iakubovskii <[email protected]>
Co-authored-by: Bo Zheng <[email protected]>
Co-authored-by: bozheng-hit <[email protected]>
Co-authored-by: Cyril Vallez <[email protected]>
Co-authored-by: Rémi Ouazan <[email protected]>
Co-authored-by: Yoni Gozlan <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Amer <[email protected]>
Co-authored-by: eustlb <[email protected]>
Co-authored-by: Albert Villanova del Moral <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Ákos Hadnagy <[email protected]>
Co-authored-by: Grzegorz Kwasniewski <[email protected]>
Co-authored-by: NanoCode012 <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: 艾力可 <[email protected]>
Co-authored-by: JJJYmmm <[email protected]>
Co-authored-by: Manuel de Prada Corral <[email protected]>
Co-authored-by: Samuel Barry <[email protected]>
Co-authored-by: yonigozlan <[email protected]>
Co-authored-by: HyunZ118 <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
Co-authored-by: YONGSANG <[email protected]>
Co-authored-by: Yijun Lee <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>
Co-authored-by: ydshieh <[email protected]>
Co-authored-by: Pablo Montalvo <[email protected]>
Co-authored-by: Shane A <[email protected]>
Co-authored-by: Xuehai Pan <[email protected]>
Co-authored-by: Matt <[email protected]>
Co-authored-by: Raushan Turganbay <[email protected]>
Co-authored-by: Aritra Roy Gosthipaty <[email protected]>
Co-authored-by: vb <[email protected]>
Co-authored-by: Yaswanth Gali <[email protected]>
Co-authored-by: Akshay Babbar <[email protected]>
Co-authored-by: liangel-02 <[email protected]>
Co-authored-by: Duc-Viet Hoang <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <[email protected]>
Co-authored-by: lilin-1 <[email protected]>
Co-authored-by: Matej Sirovatka <[email protected]>
Co-authored-by: Jack <[email protected]>
Co-authored-by: Rangehow <[email protected]>
Co-authored-by: rangehow <[email protected]>
Co-authored-by: Anna <[email protected]>
Co-authored-by: Anna Banaszak <[email protected]>
Co-authored-by: Hamish Scott <[email protected]>
Co-authored-by: Harshal Janjani <[email protected]>
Co-authored-by: Branden <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Benjamin Bossan <[email protected]>
Co-authored-by: Ita Zaporozhets <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Lysandre <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: StevenBucaille <[email protected]>
Co-authored-by: BakerBunker <[email protected]>
Co-authored-by: lvyuanjun.lyj <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Ayush <[email protected]>
Co-authored-by: Ryan Mullins <[email protected]>
Co-authored-by: Yannick Schnider <[email protected]>
Co-authored-by: Ralph Gleaton <[email protected]>
Co-authored-by: Saidur Rahman Pulok <[email protected]>
Co-authored-by: Nick Doiron <[email protected]>
Co-authored-by: Wang, Yi <[email protected]>
Co-authored-by: Duygu Altinok <[email protected]>
Co-authored-by: Jinde.Song <[email protected]>
Co-authored-by: hbenoit <[email protected]>
Co-authored-by: nnul <[email protected]>
Co-authored-by: YangKai0616 <[email protected]>
Co-authored-by: Karol Szustakowski <[email protected]>
Co-authored-by: souvikku <[email protected]>