IREE Release v3.8.0
1. Compiler
1.1 Data Tiling & Scaled Matmul
- Introduced DataTiledScaledMMAAttr and implemented scaled matmul data tiling materialization using new scaled intrinsic attributes for improved codegen flexibility. (#22176, #22189)
- Added ping-pong ukernel support for FP8 and FP16 data tiling, tuned for LLaMA workloads, delivering up to 30–40% latency reduction vs. non–data-tiled paths. (#21919)
- Added ROCm encoding specialization via UKernelProviderInterface for data-tiled ukernels. (#21914)
- Introduced intentionally padded configurations for (I)GEMM, improving convolution performance by ~8% with no degradation in backward paths. (#21931)
- Disabled data-tiling by default for CPU backends due to memory issues and backend inconsistencies; it is now opt-in via --iree-opt-data-tiling (a minimal opt-in sketch follows this list), with updated CPU docs and tests reflecting the change. (#21935)
- Published a detailed blog post on data tiling that introduces how operand layouts are transformed into hardware-preferred formats for better locality and cache efficiency; a conceptual packing sketch follows below. (https://iree.dev/community/blog/2025-08-25-data-tiling-walkthrough/)
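The blog's core idea, roughly: rewrite operands from their logical row-major layout into a blocked layout so that the elements each compute tile consumes sit contiguously in memory. The snippet below is a plain NumPy illustration of that packing step, not IREE code; the 8x4 tile shape is an arbitrary example rather than the intrinsic-derived sizes IREE actually selects.

```python
import numpy as np

def pack_tiles(a: np.ndarray, tile_m: int, tile_n: int) -> np.ndarray:
    """Repack a row-major matrix into [M/tm, N/tn, tm, tn] so each tile is contiguous."""
    m, n = a.shape
    assert m % tile_m == 0 and n % tile_n == 0, "illustration assumes evenly divisible shapes"
    return a.reshape(m // tile_m, tile_m, n // tile_n, tile_n).transpose(0, 2, 1, 3).copy()

lhs = np.arange(64 * 32, dtype=np.float32).reshape(64, 32)
packed = pack_tiles(lhs, 8, 4)  # shape (8, 8, 8, 4); each innermost 8x4 tile is contiguous
```

And a minimal sketch of opting back in to CPU data tiling via the flag named above, assuming the IREE compiler Python bindings are installed; the MLIR module, output filename, and the exact flag spelling ("=true") are illustrative assumptions, not a verified recipe.

```python
from iree.compiler import compile_str  # assumes the IREE compiler Python package is installed

MATMUL_MLIR = """
func.func @matmul(%lhs: tensor<128x256xf32>, %rhs: tensor<256x64xf32>) -> tensor<128x64xf32> {
  %cst = arith.constant 0.0 : f32
  %init = tensor.empty() : tensor<128x64xf32>
  %fill = linalg.fill ins(%cst : f32) outs(%init : tensor<128x64xf32>) -> tensor<128x64xf32>
  %0 = linalg.matmul ins(%lhs, %rhs : tensor<128x256xf32>, tensor<256x64xf32>)
                     outs(%fill : tensor<128x64xf32>) -> tensor<128x64xf32>
  return %0 : tensor<128x64xf32>
}
"""

vmfb = compile_str(
    MATMUL_MLIR,
    target_backends=["llvm-cpu"],
    extra_args=["--iree-opt-data-tiling=true"],  # opt back in; off by default on CPU as of this release
)
with open("matmul_cpu_dt.vmfb", "wb") as f:
    f.write(vmfb)
```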
1.2 Convolution
- Transposed the filter layout for input-backward convolutions from CHWF to FHWC, aligning with matmul_transpose_b and improving performance. (#22100)
- Reordered iterator dimensions for input-backward convolutions to match the forward NHWC-FHWC convolution layout, simplifying autotuning and shape handling. (#22208)
- Enabled extract slice propagation during convolution padding to improve fusion opportunities. (#21948)
1.3 Matmul & Vector Distribute
- Removed virtual MMAs from vector distribute matmul/conv pipelines to fix regressions and restore original performance on Punet configurations. (#22202)
- Added support for distributing subgroups across multiple M dimensions in vector distribute pipelines, improving parallel utilization. (#22000)
1.4 Others
- Added encoding propagation and fusion passes in the default dispatch creation path, improving layout-based fusion. (#22063)
- Introduced optional split-reduction size inference for batch normalization. (#21731)
- Fused broadcasts with attention consumers instead of producers, improving dimension inference and downstream fusion. (#22008)
- Updated ConvertAccGEMMToGEMM to support scaled GEMMs (a rough NumPy analogue of the unscaled rewrite follows this list). (#22093)
- Reordered memref reshapes above empty tensor elimination to ensure correct dominance in bufferization. (#22045)
- Fixes and Refinements (#22222, #22106, #22179, #22041, #22143, #22095, #22233, #22197, #22195, #22033, #22031, #21997, #21910, #21970, #21952, #21900, #21890, #21665, #22100, #22208, #22045, #22202)
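For intuition on the ConvertAccGEMMToGEMM item above: roughly speaking, the pass rewrites a GEMM that accumulates into a live tensor into a GEMM over a zero-initialized output followed by an elementwise add of the original accumulator, and this release extends that handling to scaled GEMMs. A minimal NumPy analogue of the unscaled case, purely illustrative:

```python
import numpy as np

A = np.random.rand(4, 8).astype(np.float32)
B = np.random.rand(8, 3).astype(np.float32)
acc = np.random.rand(4, 3).astype(np.float32)

# Accumulating form: the GEMM result is folded into a live accumulator in one op.
out_acc = acc + A @ B

# Rewritten form: GEMM into a zero-initialized output, then add the accumulator separately.
tmp = np.zeros((4, 3), dtype=np.float32)
tmp += A @ B
out_split = tmp + acc

assert np.allclose(out_acc, out_split)
```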
2. Runtime
- Split hoisted async constant lifetimes to drastically reduce retained memory (e.g., 9 GB → 500 KB in large tiled workloads). (#21995)
- Added per–entry-point flags and workgroup size emission, preparing for new HAL APIs and better runtime introspection.
- Updated GPU executable headers for versioning and added a new infer-format call to safely infer executable data format and size.
⚠️ Breaking change: requires GPU executable recompilation. (#21763)
- Switched the CPU matmul configuration to the linalg::LinalgOp interface for better op fusion and flexibility. (#21954)
- General Enhancements and Fixes (#22101, #22110, #22102, #22048, #21921, #22075)
Change Log
Git History
What's Changed
- [DT] Fuse encoding ops more aggressively for multi-use, gather, and slices ops. by @hanhanW in #21830
- [Codegen][Tuner]: improve python binding to query target info by @bangtianliu in #21812
- [Codegen][Tuner] retire the C/Python binding for querying mma intrinsic. NFC. by @bangtianliu in #21816
- [Integrate] Drop llvm/llvm-project@b4c31dc revert. by @hanhanW in #21851
- [Encoding] Support SetEncoding on scaled contraction ops by @Max191 in #21825
- [Test] Add onnx_ops test suites with O2/O3 optimization level. by @hanhanW in #21838
- [CodeGen] Do not fuse parallel ops if they directly write to destination. by @hanhanW in #21837
- [GPU] Add pattern to fold fill into pad ops by @nirvedhmeshram in #21864
- [Codegen][IGEMM] Do not pre-pad convs with CHW layout or small input channel size by @yzhang93 in #21839
- [GPU] Remove reshape by expansion in workgroup scope of combine layout pass by @nirvedhmeshram in #21869
- [CPU] Remove passing tests from expected_compile_failures list. by @hanhanW in #21871
- [GPU] Use Affine map for size calculations of alloca's in fission pass by @nirvedhmeshram in #21870
- [Codegen][AMDGPU] Fix matmul miscompile on RDNA4 by @kuhar in #21873
- [NFC] Code Quality changes by @Muzammiluddin-Syed-ECE in #21876
- Avoid needless isa checks. NFC. by @kuhar in #21885
- [VectorDistribute] Refactor layout configuration to a simpler logic by @Groverkss in #21883
- [StableHLO][CHLO] Refactor CHLO decompositions to follow upstream StableHLO by @LekkalaSravya3 in #21682
- Revert "[VectorDistribute] Refactor layout configuration to a simpler logic" by @Groverkss in #21887
- [docs] Clarify compiler coding standards by @kuhar in #21886
- Upgrade Preprocessing and Modules to free create functions. NFC. by @kuhar in #21877
- [Codegen] Upgrade Common, SPIRV, VMVX to free create functions. NFC. by @kuhar in #21879
- [Codegen] Upgrade LLVMCPU and LLVMGPU to free create functions. NFC. by @kuhar in #21880
- [Codegen] Upgrade Dialect and Interfaces to free create functions. NFC. by @kuhar in #21881
- Add gfx950 ukernel patterns by @sebvince in #21856
- Bump version to 3.8.0 after 3.7.0 release. by @sa-faizal in #21852
- [docs] Update the file config file for running ONNX operator tests on CPU. by @hanhanW in #21892
- Upgrade GlobalOpt, InputConversion, ExternalInterfaces to free create functions. NFC. by @kuhar in #21878
- [Codegen] Upgrade Transforms and Utils to free create functions. NFC. by @kuhar in #21882
- [ROCM] Update Ukernel infra to handle InnerTiledOp/Multi_MMA_MFMA by @Abhishek-Varma in #21759
- Reland "[VectorDistribute] Refactor layout configuration to a simpler logic" by @Groverkss in #21895
- Upgrade IREE plugins to free create functions. NFC. by @kuhar in #21896
- [GPU] Remove MMAScheduleAttr by @Groverkss in #21884
- [LLVMCPU] Respect dominance when doing replacement of tile and fused values by @MaheshRavishankar in #21901
- [Codegen] Upgrade iree dialects to free create functions. NFC. by @kuhar in #21898
- Integrate LLVM at llvm-project/llvm@daf8f9fc1ccc6c5679bc89058fd66d8ea4da9d59 by @rkayaith in #21893
- Upgrade all remaining code to free create functions. NFC. by @kuhar in #21902
- [LLVMGPU] Move LLVMGPUVectorLowering after OptimizeIntArithmetic by @Max191 in #21597
- [Codegen] Promote scales to LDS by @Muzammiluddin-Syed-ECE in #21767
- Bump the github-actions group with 2 updates by @dependabot[bot] in #21897
- Integrate llvm/llvm-project@31bee3421ba4 by @rkayaith in #21905
- [CPU] Tile all the ops to target vector sizes before vectorization. by @hanhanW in #21900
- [LinalgExt] Fold subview ops into map_scatter output before decomposing by @Max191 in #21891
- [GPU] Do not do c promotion for unaligned (I)GEMMs by @nirvedhmeshram in #21823
- [Codegen][ROCm] Add repro instructions for .rocmasm files by @kuhar in #21874
- [LinalgExt] Fix FoldWithProducerReshapeByExpansion for >1 dyn dim by @IanWood1 in #21894
- At the beginning of emulate narrow type, flatten incoming memrefs by @lialan in #21910
- Revert "Disable failing ARM-SME tests. (#21715)" by @banach-space in #21860
- [Codegen][AMDGPU] Drop backend reverts, emergency RDNA4 lowering fix by @krzysz00 in #21906
- [codegen] more consumer fusion by @jtuyls in #21848
- [CPU][DT] Add codegen support for broadcast/dequant -> matmul dispatch. by @hanhanW in #21911
- [Codegen][IGEMM] Set convolution pre-padding as default by @yzhang93 in #21899
- [Codegen][GenericVectorization] Fix incorrect usage of std::accumulate that led to overflow by @mshockwave in #21920
- Integrate llvm/llvm-project@e92cbfbe3087 by @rkayaith in #21917
- [Codegen][Cleanup] Always enable vectorization for padding and gather. by @hanhanW in #21924
- [Test] Disable AMDGPU onnx_ops test suite (O0) job. by @hanhanW in #21929
- Integrate llvm/torch-mlir@7000187b by @rkayaith in #21918
- Bump nanobind version by @Hardcode84 in #21926
- [iree][codegen] Add #iree_codegen.denormal_fp_math to set denormals behavior by @fabianmcg in #21840
- [ROCM] Add back specialization pattern tests by @jtuyls in #21939
- Fix --iree-hip-target validation by @bjacob in #21909
- Integrate LLVM at llvm/llvm-project@b22f94dcc58e by @rkayaith in #21943
- [GPU] Propagate extract slice when doing convolution padding by @nirvedhmeshram in #21948
- [CPU] Adjust tile sizes for mmt4d dispatches that have relayout ops. by @hanhanW in #21934
- Fixing CSE of hoisted encoding ops. by @benvanik in #21921
- Adding util.list.construct pseudo-op. by @benvanik in #21950
- [Dispatch Creation] Don't fuse no input producer with reduction by @IanWood1 in #21930
- Revert "[LinalgExt] Fix
FoldWithProducerReshapeByExpansion
for >1 dyn dim" by @IanWood1 in #21947 - [DispatchCreation]: Add FormSplitReductionDispatchesPass support for ArgCompare op by @bangtianliu in #21903
- [CPU] Add an experimental flag to disable linalg.conv generalization. by @hanhanW in #21953
- [GPU][DT] Add pingpong ukernels for data tiling (f8 and f16) by @Yu-Zhewen in #21919
- [ROCM][DT] Add encoding specialization infra for data-tiled ukernels by @jtuyls in #21914
- [GPU] Use UkernelDescriptor and deprecate UkernelConfigAttr and GPULowerToUkernelsPass by @Abhishek-Varma in #21766
- [docs] Update docs on sdxl golden output by @efric in #21936
- Fix Dispatch Creation TransformOptions by @IanWood1 in #21964
- Integrate LLVM at llvm/llvm-project@ed1f1b8 by @rkayaith in #21963
- [docs] Add a blog post for data-tiling introduction. by @hanhanW in #21774
- Avoid needless isa checks. NFC. by @bangtianliu in #21968
- using Base::Base in tablegen passes. by @benvanik in #21969
- [iree][codegen] Set #iree_codegen.denormal_fp_math in attention dispatches by @fabianmcg in #21940
- [compiler][NFC] Update remaining code to free create functions. by @hanhanW in #21972
- [plugins][NFC] Upgrade plugins/ to free create functions. by @hanhanW in #21973
- [GPU][DT] Update data layout strategy for pingpong ukernels by @Yu-Zhewen in #21957
- Using explicit operation types in passes. by @benvanik in #21971
- Converting compiler/Bindings/ to tablegen Passes.td. by @benvanik in #21974
- [Codegen] Unroll instead of linearize vector.to_elements. by @amd-eochoalo in #21959
- [Codegen] Added erf ; FastMath rewrite for vector types. by @keshavvinayak01 in #21849
- Adding hal.executable lazy flag. by @benvanik in #21966
- Don't inline immutable globals with non-util dialect attrs. by @benvanik in #21986
- [Codegen][RISCV] Do not lower vector.gather to branches in the presence of RVV by @mshockwave in #21927
- [GPU] Only combine complex relayout chains in GPUCombineLayoutTransformation by @Max191 in #21985
- [LLVMGPU] Move masked load optimizations after vector lowering by @Max191 in #21962
- [iree-test-suites] Update golden benchmark numbers by @Max191 in #21980
- [Encoding] Deprecate MatmulKAttr encoding attribute. by @hanhanW in #21976
- [Codegen] Make collapse_shape hoisting pattern work with store_to_buffer by @Max191 in #21999
- [LinalgExt] Add canonicalization to convert identity map_scatter to copy by @Max191 in #21998
- [GPU][DT] Add benchmark files for llama_8b_f16 with data-tiling. by @hanhanW in #21975
- [Encoding] set default option for scaled matmul encodings to false by @Muzammiluddin-Syed-ECE in #21994
- Marking stream.async.dispatch as pure. by @benvanik in #21989
- [Dispatch Creation] Allow fusing pad with split reduction dispatch by @IanWood1 in #21987
- Fix data race in GPU C ukernels caching of shared memory size by @bjacob in #22004
- [LinalgExt][NFC] Remove unused code in TransposeFusion by @IanWood1 in #22006
- [Dispatch Creation] Rework dispatch formation logic by @IanWood1 in #21854
- [TensorExt] Fix dynamic dim canonicalization in bitcast folder by @jtuyls in #21997
- [CPU] Switch matmul config to use linalg::LinalgOp interface. by @hanhanW in #21954
- Bump the github-actions group with 2 updates by @dependabot[bot] in #21992
- Integrate LLVM at llvm/llvm-project@0648c5183f32 by @qedawkins in #22003
- [VectorDistribute] Use subgroup_basis instead of subgroup_m/n_count by @Groverkss in #21912
- Splitting hoisted async constant lifetime. by @benvanik in #21995
- Adding iree_hal_executable_export_info_t and queries. by @benvanik in #21754
- Respect user FILECHECK_OPTS/LIT_OPTS environment variables when running through ctest by @rkayaith in #22019
- [PassUtils] Allow passing overload constructors to addPredicatedPass by @rkayaith in #22021
- [NFC] remove unused header files by @bangtianliu in #21977
- [Codegen][GPU] Enable TileAndFuse for matmul by default by @jerryyin in #21834
- [CPU] Populate to_elements unrolling patterns in LLVM conversion. by @hanhanW in #22010
- Fix mi308 Pkgci failures by @IanWood1 in #22028
- [Dispatch Creation] Fuse bcast with attention instead of producer by @IanWood1 in #22008
- [CPU] Add precondition to kernel dispatch method selection for gemm. by @hanhanW in #22031
- [Codegen][Tuner] update lowering config binding for subgroup basis by @bangtianliu in #22027
- Fix indices in scaled matmul rank assert by @jtuyls in #22016
- [GPU] Introduce Intentional Padded Configurations for (I)GEMM by @nirvedhmeshram in #21931
- [CI] Disabling WebGPU build due to CI failures. by @MaheshRavishankar in #22030
- [DispatchCreation] Add option to infer split-reduction sizes for batchnorm by @rkayaith in #21731
- Implement iree_gpu.coalesced_gather_dma op by @lialan in #21846
- [LinalgExt] Support map_scatter decomposition with strided memrefs by @Max191 in #21952
- [Codegen] Tile map_scatter op for large vector sizes by @Max191 in #22035
- [DispatchCreation] Fix iree-compile split-reduction flag name by @rkayaith in #22038
- Integrate LLVM at llvm/llvm-project@dffd7f3d9a3 by @qedawkins in #22023
- [NFC][ROCM] Refactor bitcode ukernel to a separate file by @Abhishek-Varma in #21983
- [VectorDistribute] Allow distributing subgroups on multiple m dimensions by @Groverkss in #22000
- [LLVMGPU] Vectorize map_scatter in LLVMGPUTileAndFuse pipeline by @Max191 in #21890
- [Codegen] Push up memref reshapes before empty tensor elimination by @Max191 in #22045
- [LLVMGPU] Add support for direct convolution in tile and fuse pipeline by @yzhang93 in #22033
- LLVM-Integrate: Drop revert for f645d209d by @qedawkins in #22044
- Integrate llvm/llvm-project@50ef746a12 by @qedawkins in #22046
- [Encoding] Propagate layout encodings through tensor.cast ops by @Max191 in #21970
- [python] Expose python bindings for nvvm in iree.compiler.dialects by @saladpalad in #21993
- Revert "[Dispatch Creation] Rework dispatch formation logic (#21854)" by @IanWood1 in #22058
- [Codegen] Add transform ops for matching contraction ops by @bangtianliu in #21981
- Integrate llvm/llvm-project@1ee18959bcdf by @efric in #22062
- Disable data-tiling flag by default and refresh the CPU docs. by @hanhanW in #21935
- [DispatchCreation] Propagate and fuse encodings in default path by @Max191 in #22063
- [LinalgExt][NFC] Remove unused VectorOps include by @hanhanW in #22066
- Integrate llvm/llvm-project@9d48df7a92e7 by @efric in #22064
- [LLVMGPU] Enable iree-llvmgpu-test-combine-layout-transformation by default by @Max191 in #21979
- Adding interface support for stream.async.transfer result placement. by @benvanik in #22048
- Marking stream.tensor.dispatch pure. by @benvanik in #22075
- [GPU][DT] Add data-tiling resolver by default. by @hanhanW in #22074
- [Util] Allow varying types in optimization barrier by @qedawkins in #22076
- [ROCm] Add an experimental target for gfx1250 by @kuhar in #22077
- Remove e2e matmul tests with explicit compilation-info by @bjacob in #22085
- [NFC] Improving consistency of Util/Transforms/Passes.h. by @benvanik in #22078
- e2e matmul tests covering vector-distribution by @bjacob in #22086
- Reapply "[LinalgExt] Fix
FoldWithProducerReshapeByExpansion
for >1 … by @IanWood1 in #22088 - [Test] Trim data-tiling compile flags from tests. by @hanhanW in #22092
- [Codegen] Support scaled matmul in ConvertAccGEMMToGEMM by @Max191 in #22093
- [GPU] Fix bug in shared memory computation for scaled intrinsics by @Max191 in #22095
- Integrate llvm/llvm-project@876296e9b7f0 by @efric in #22097
- [Codegen] Add transform op for matching dimension sizes. by @bangtianliu in #22040
- Revert e2e matmul tests changes by @bjacob in #22111
- [mlir][amdgpu] Replaced nullopt with target arch chipset in populateGpuPromoteShuffleToAMDGPUPatterns pass by @xintin in #21799
- Display a warning when we spill SGPRs or VGPRs by @sebvince in #21863
- [Codegen][GPU] Fix MMA Intrinsics Sorting by @bangtianliu in #22090
- [Codegen][GPU][NFC] Fix mma sort follow up by @bangtianliu in #22122
- [DT] Add support for materializing func.func args with encodings. by @hanhanW in #22115
- Break generate_e2e_matmul_test.py into multiple files by @bjacob in #22120
- NFC: Simplify generation of e2e matmul test functions. by @bjacob in #22123
- [ROCm] Fix up gfx1250 definitions by @kuhar in #22131
- Clean up KnownTargets.cpp. NFC. by @kuhar in #22133
- Fix the Windows build: portably set environment variable PYTHONPATH. by @bjacob in #22136
- [Codegen][ROCm] Attempt to fix MMA sorting CI failures by @kuhar in #22141
- [NFC][GlobalOpt] Update function names in LIT by @AGindinson in #22083
- [VectorDistribute] Flush denormals for attention reduction config by @Groverkss in #22041
- Simplify op conversion pattern inheriting constructor definitions. NFC. by @kuhar in #22143
- Simplify op rewrite pattern inheriting constructor definitions. NFC. by @kuhar in #22142
- [LLVMGPU] Don't use DMA for scaled matmul by @Max191 in #22094
- [Codegen] Add bufferization support for new iree_gpu.coalesced_gather_dma op by @lialan in #22049
- [GPU] Support iree_codegen.load_from_buffer in GPUBubbleResourceCasts by @Max191 in #22140
- [Preprocessing] Add pass to sink transpose through pad by @IanWood1 in #22106
- [LLVMGPU] Unroll elementwise operations by @Groverkss in #21665
- Bump actions/cache from 4.2.4 to 4.3.0 in the github-actions group by @dependabot[bot] in #22152
- Increase golden time. by @amd-eochoalo in #22159
- [Codegen][AMDGPU] Fix incorrect canonical map for MXFP RHS scales by @krzysz00 in #22162
- [Preprocessing] Transpose conv filter layout from CHWF to FHWC by @yzhang93 in #22100
- Integrate llvm/llvm-project@7af31bf by @amd-eochoalo in #22148
- [LLVMGPU][Codegen] Increase parallel rows read for matvec by @efric in #22163
- [Codegen] support matching any values for dims_equal transform op by @bangtianliu in #22149
- Integrate llvm/llvm-project@a33544b by @amd-eochoalo in #22167
- E2E MXFP4 matmul tests by @bjacob in #22170
- [Test][NFC] Drop input_type from e2e tests because IREE can infer the input type. by @hanhanW in #22014
- [DispatchCreation] infer split-reduction sizes for ArgCompare by @bangtianliu in #22154
- [Codegen][AMDGPU] Tile and convert gather to coalesced DMA by @lialan in #22157
- Revert "[LLVMGPU] Unroll elementwise operations (#21665)" by @MaheshRavishankar in #22186
- Integrate llvm/llvm-project@0cb9d40 by @amd-eochoalo in #22182
- Port e2e matmul tests from gfx942 to gfx950 by @bjacob in #22191
- [E2E-Matmul] Remove redundant flag from scaled matmul e2e test by @Max191 in #22190
- [Codegen][AMDGPU] Enable gpu.printf patterns by @krzysz00 in #22192
- [GPU] Add thread tile size inference for map_scatter op by @Abhishek-Varma in #22179
- [DT][ROCM] Fix inner_tiled bitcode ukernel lowering with intrinsicsM(N) = 1 by @Yu-Zhewen in #22184
- [Codegen] Add ResolveShapedTypeResultDimsPass pass to GPU vector distribute by @fabianmcg in #22196
- [LinalgExt] Introduce linalg_ext.exp_reduction by @hhkit in #21761
- Integrate llvm/llvm-project@4845b3e by @amd-eochoalo in #22200
- [GPU] Allow multi result and indexing compute generic ops in TileAndFuse pipeline by @nirvedhmeshram in #22195
- [Codegen][GPU] Fix IGEMM pre-padding and fusion patterns by @yzhang93 in #22197
- [DataTiling] Introduce DataTiledMMAInterfaceAttr by @Max191 in #22098
- [Codegen] Follow-up Fix for MatchContractionOp by @bangtianliu in #22201
- Removing virtual MMAs from vector distribute matmul/conv pipeline by @jerryyin in #22202
- [Codegen] Add transform op for matching convolution ops by @bangtianliu in #22194
- Revert "[GPU] Allow multi result and indexing compute generic ops in TilleAndFuse pipeline" by @IanWood1 in #22205
- [Codegen] Fix premature return in iree_codegen.inner_tiled verifier by @Max191 in #22183
- [Codegen][LLVMGPU] Later scf-to-cf to support math.erf by @newling in #21817
- [CI] Add dummy torch pkgci by @Groverkss in #22203
- [DataTiling][GPU] Introduce DataTiledScaledMMAAttr by @Max191 in #22176
- [Preprocessing] Reorder the iterator dims to match forward NHWC-FHWC convs by @yzhang93 in #22208
- Decrease llama 8b_f16_decode golden time by @efric in #22220
- [DataTiling][GPU] Implement scaled matmul data tiling materialization by @Max191 in #22189
- Integrate llvm/llvm-project@95e0ae9f by @newling in #22214
- Control const expr hoisting in Dispatch Creation by @IanWood1 in #22164
- [codegen] Fix test after PR 22196 by @fabianmcg in #22218
- [codegen][gpu] GPUApplyPaddingLevel: fold case where no padding by @newling in #22193
- [GlobalOpt] Use Option<> for TransformOptions by @IanWood1 in #22222
- [ROCm] Enable e2e stablehlo tests by @kuhar in #22224
- Adding a LiftCFGToSCFPass. by @benvanik in #22101
- Improving support for unreachable control flow in both CFG and SCF. by @benvanik in #22102
- Adding VerifyStructuredControlFlowPass. by @benvanik in #22110
- Fix g++ warning -Werror=parentheses by @IanWood1 in #22225
- Update CODEOWNERS to include new tests and dialect owners by @Groverkss in #22213
- [CI] Add clip and llama torch_models tests by @Groverkss in #22212
- [samples] Update PyTorch JIT notebook for Python 3.12 by @HeatCrab in #22209
- [Codegen] add transform op for matching attention op by @bangtianliu in #22199
- Fix linking MSVC error from forward declaration used as templated type by @Max191 in #22233
- [PkgCI] Use urllib instead of github cli in pkgci artifact_run by @Groverkss in #22211
- Transposed Workgroup Reordering for large rectangular matmuls by @sebvince in #22165
- Fix typo ConditionalTranspose attribute description by @sebvince in #22238
- [docs] Add LLVM debugging and some AMDGPU-specific tips by @krzysz00 in #22146
- Integrate llvm/llvm-project@7546bd3 by @newling in #22234
- [CI][iree-test-suites] Add random weight 8b_fp8 and 8b_fp16 benchmarks by @Groverkss in #22239
- Integrate llvm/llvm-project@327a89c by @newling in #22255
- [CI][iree-test-suites] Upload json summary for torch_models CI by @Groverkss in #22253
- [CI][iree-test-suites] Update ref for iree-test-suites by @Groverkss in #22263
- [build flags] prepare to enable more warnings in compile flags (#21996) by @schuermans-roofline in #22252
- [Codegen] Update the assembly formats and corresponding tests for matcher ops by @bangtianliu in #22270
New Contributors
- @LekkalaSravya3 made their first contribution in #21682
- @saladpalad made their first contribution in #21993
- @xintin made their first contribution in #21799
- @hhkit made their first contribution in #21761
- @HeatCrab made their first contribution in #22209
- @schuermans-roofline made their first contribution in #22252
Full Changelog: v3.7.0...v3.8.0