Releases: cupy/cupy
v14.0.0rc1
CuPy v14.0.0rc1 Release Note
This is the release candidate of CuPy v14, which is planned to be shipped in January 2026. We encourage you to start testing your workloads with v14.0.0rc1 and report any feedback on our issue tracker. Refer to the Upgrade Guide for the list of changes you need to be aware of when migrating from CuPy v13 or earlier. Pre-built binary packages are available for testing:
# For CUDA 13.x
pip install cupy-cuda13x --pre -U -f https://pip.cupy.dev/pre
# For CUDA 12.x
pip install cupy-cuda12x --pre -U -f https://pip.cupy.dev/pre
# For ROCm 7.0
pip install cupy-rocm-7-0 --pre -U -f https://pip.cupy.dev/pre
CuPy v14 introduces support for the CUDA Toolkit package distributed on PyPI. If the CUDA Toolkit is not present in your environment, you can install CuPy alongside the necessary toolkit components by using the [ctk] extras, as follows:
pip install 'cupy-cuda13x[ctk]' --pre -U -f https://pip.cupy.dev/pre
💬 Join the Matrix chat to talk with developers and users and ask quick questions!
🙌 Help us sustain the project by sponsoring CuPy!
✨ Highlights
CUDA Python Packages
CuPy now supports NVIDIA CUDA packages distributed on PyPI! This enhancement allows users to leverage CuPy without a system-wide CUDA Toolkit installation, and also provides better interoperability with other Python packages that utilize CUDA, such as PyTorch.
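To see which CUDA component wheels pip has installed in the current environment, a small standard-library sketch like the following may help. It assumes that the relevant wheels use distribution names beginning with nvidia- (as nvidia-nccl-cu12 does) or cuda-; the helper name list_cuda_wheels is hypothetical and not part of CuPy.

```python
from importlib import metadata


def list_cuda_wheels(prefixes=("nvidia-", "cuda-")):
    """Return sorted (name, version) pairs of installed CUDA-related wheels.

    The default prefixes are an assumption about how the wheels are named.
    """
    found = set()
    for dist in metadata.distributions():
        # Normalize the name so "nvidia_nccl_cu12" and "nvidia-nccl-cu12" match.
        name = (dist.metadata["Name"] or "").lower().replace("_", "-")
        if name.startswith(tuple(prefixes)):
            found.add((name, str(dist.version)))
    return sorted(found)


if __name__ == "__main__":
    for name, version in list_cuda_wheels():
        print(f"{name}=={version}")
```

An empty result only means that no matching wheels were found by pip metadata; a system-wide CUDA Toolkit is not detected by this check.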
AMD ROCm 7 Support
Support for the AMD ROCm 7 platform is now included in CuPy, along with a cupy-rocm-7-0 binary package specifically built for ROCm 7.0.
Enhanced NumPy/SciPy API Coverage
CuPy now offers a greater number of APIs compatible with NumPy and SciPy, including cupy.linalg.eig and cupy.linalg.eigvals.
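As a minimal sketch of the new eigenvalue routines, the snippet below calls eigvals and eig through a module handle xp. It assumes the CuPy functions mirror the call signatures of numpy.linalg.eigvals and numpy.linalg.eig, and it falls back to NumPy when CuPy, a usable GPU, or cupy.linalg.eig is not available so that the example still runs. The xp alias and the to_host helper are illustrative names only.

```python
import numpy as np

try:
    import cupy as cp
    # Use the GPU only if a device is usable and this CuPy provides eig.
    use_gpu = cp.cuda.is_available() and hasattr(cp.linalg, "eig")
    xp = cp if use_gpu else np
except ImportError:
    xp = np


def to_host(a):
    """Copy a CuPy array to host memory; NumPy arrays pass through."""
    return a.get() if hasattr(a, "get") else np.asarray(a)


a = xp.asarray([[2.0, 0.0], [0.0, 3.0]])

w = xp.linalg.eigvals(a)   # eigenvalues only
w2, v = xp.linalg.eig(a)   # eigenvalues and right eigenvectors

# Sort because the order of the eigenvalues is not guaranteed.
eigenvalues = np.sort(to_host(w).real)
print(eigenvalues)

# Each column v[:, i] should satisfy a @ v[:, i] == w2[i] * v[:, i].
residual = to_host(a @ v - v * w2)
print(np.abs(residual).max())
```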
🛠️ Breaking Changes
- CuPy v14 follows NumPy 2 in most of its behavior.
- Support for CUDA 11 and Python 3.9 has been dropped.
- All cuDNN-related functionality has been completely removed from CuPy. We recommend users who need to access cuDNN functionality from Python to consider using cuDNN Frontend instead.
Please refer to the Upgrade Guide for details.
📝 Changes
See here for the complete list of merged PRs.
New Features
- Implement cupy.linalg.eig / cupy.linalg.eigvals (#8854, #8980)
- Implement cupy.linalg.cond (#9140)
- Support nD reductions for sparse arrays (#9209)
- Add cupy.bool, cupy.long and cupy.ulong (#9253)
- Add function bitwise_count (#9390)
- ENH: Minimal support for structured dtypes (#9440)
- Support CUDA wheels (#9444)
Enhancements
- Releasing the GIL during Thrust sorts (#7760)
- Support new cuFFT callbacks (#8242)
- Make cupyx.scipy submodule imports lazy (#8706)
- Patch for supporting cusparseLt 0.7.1 (#9004)
- Migrate to pyproject.toml (#9079)
- Allow build on ROCm 6.4 (#9099)
- Implement LAPACK potrs (#9116)
- Support NCCL for aarch64 (#9137)
- Add xsf as submodule for special function scalar kernels (#9159)
- Support CUDA 12.9 and NCCL 2.26 (#9200)
- Support loading NCCL from Pip packages (#9201)
- Update cupyx.scipy.special functions for SciPy 1.16 (#9207)
- Pin xsf to version 0.1.2 (#9250)
- Add CUDA 13 & NCCL 2.27 support (#9289)
- Remove bundled header files (#9295)
- Support building NVTX on Windows without Nsight Systems (#9301)
- Change default c++ RTC standard from 11 to 14 (#9334)
- Support CUDA Array Interface in ROCm build (#9340)
- Build against cuda-bindings 12.x (#9377)
- feat(kernels) fix HIPRTC build error ROCm 6/7 (#9382)
- feat(JIT) Remove CCCL includes dir (#9383)
- feat(hipCUB) Use c++17 for hipCUB (#9384)
- Enable rocm7 (#9398)
- Drop support of CUDA 11.x and NumPy v1.x (#9406)
- feat(ptds) Add PTDS support and launch_host_func (#9407)
- [nccl] add nccl comm split (#9411)
- Fix message of shims removed in NumPy v2 (#9412)
- Replace fastrlock with C++ recursive_mutex (#9414)
- ENH,MAINT: Allow cupy-like protocols in setitem and consolidate (#9418)
- ENH: Accept nested CAI arrays in cupy.asarray and indexing (#9419)
- Make cuda.is_available() guard against (almost) all errors (#9420)
- Deprecate cupyx.tools.install_library tool (#9432)
- Drop support for ROCm 6 or earlier (#9433)
- ENH,BUG: Allow strides without mem, fix empty byte-bounds (#9453)
- DOC: Remove experimental on async allocator (#9455)
- Remove deprecated nvrtc.getNVVM API (#9457)
- Adopt cuda.bindings new module layout (#9458)
- Revert typo fix in cupy/_core/include/cupy/complex/complex.h (#9460)
- patch for supporting cusparseLt 0.8.1 (#9509)
- Fix csr_matrix.minimum/maximum dtype promotion rule (#8844)
- ENH: cupyx/signal: add freqz_sos, a preferred alias for sosfreqz (#9114)
- Fix cp.empty(None) to raise TypeError (#9160)
- feat: make cupy.nan_to_num broadcast nan, posinf, and neginf kwargs (#9240)
- Fix resample error message for SciPy 1.16 update (#9241)
- Fix freqz for complex w (#9243)
- Fix boxcox_llf for SciPy 1.16 (#9263)
- Fix return dtype of csr_matrix minmax with scalar (#9409)
- signal.cspline1d_eval, qspline1d_eval throw exception for empty cj array (#9484)
- ENH: allow python scalars in the 2nd argument of searchsorted (#9512)
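One way application code might rely on cuda.is_available() is to decide at startup whether a GPU code path can be used. The sketch below is an assumption about usage, not CuPy's own code: gpu_available is a hypothetical helper, and it still wraps the import in try/except because the import itself fails when CuPy is not installed.

```python
def gpu_available():
    """Return True only if CuPy imports and reports a usable device."""
    try:
        import cupy
    except ImportError:
        return False  # CuPy is not installed or its libraries failed to load.
    return bool(cupy.cuda.is_available())


print("GPU backend available:", gpu_available())
```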
Performance Improvements
- Add short cut for subsetting along the minor axis (#8468)
- Implement lazy load for cuquantum (#9102)
- Accelerate duplicate installation check (#9325)
- Lazy load the testing module (#9336)
- Delay all imports of cupyx inside cupy (#9338)
- Fix cuTENSOR workspace size query (#9399)
- Invoke thrust with par_nosync (#9497)
Bug Fixes
- Fix illegal memory access in LinearNDInterpolator (#8983)
- BUG: cupyx.scipy.signal: make gammatone return arrays (#9117)
- Support Cython 3.1 (#9131)
- BUG: fix cupyx.scipy.linalg.expm (#9142)
- Fix cuSOLVER feature/version detection for eig and eigvals (#9147)
- Fix overflow in CUB reduction (#9248)
- Fix lsmr type promotion rule for complex dtype (#9273)
- Allow host function call during CUDA graph capture (#9279)
- Fix UnboundLocalError when blocking=True (#9280)
- CUDA 11.1 or earlier is no longer supported (#9281)
- [BugFix] Fix upfirdn kernel launch bug for 2D arrays (#9352)
- Fix tf2sos failing for constant transfer function. (#9395)
- Fix repeated variable in hilbert2 (#9396)
- Fix Python version requirements in pyproject.toml (#9421)
- Change nccl get_unique_id to return a bytes string (#9438)
- [bug] Include type_traits in filters (#9479)
- File cache: use os.replace (clarity) and accept PermissionError (#9483)
- BUG: Fix incorrect initialization in bspline kernel (#9486)
- Do not pass filter=data for ZIP (cuTENSOR on Windows) (#9492)
- BUG: Fix typo in advise and prefetch affecting cuda 13 (#9493)
- Fix Windows directory path for cuTENSOR 2.3+ (#9519)
- Fix cuTENSOR import libs missing in Windows by cupyx.tools.install_library installation (#9527)
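The file-cache entry in this list (#9483) refers to a common pattern for writing cache files safely. The following is a generic sketch of that pattern, not CuPy's actual implementation: write to a temporary file in the same directory, move it into place with os.replace, and treat a PermissionError (which can occur, for example on Windows, when another process has the target open) as a sign that the existing copy should be kept. The function name and its arguments are hypothetical.

```python
import os
import tempfile


def save_to_cache(cache_dir, name, data):
    """Atomically store ``data`` (bytes) as ``cache_dir/name``.

    Returns True if this call placed the file, False if the final
    rename was refused with PermissionError.
    """
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, name)
    # Create the temporary file in the same directory so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        try:
            os.replace(tmp_path, target)  # overwrites an existing target
            return True
        except PermissionError:
            # Another process may be holding the target open; keep its copy.
            return False
    finally:
        # Remove the temporary file if it was not moved into place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Because readers only ever see the old file or the fully written new one, a crash in the middle of a write cannot leave a truncated cache entry behind.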
Code Fixes
- Enable ruff rule UP (#8849)
- Add ruff rules for static typing (#9154)
- Remove deprecated modules (#9337)
- Fix for Ruff UP041 (#9423)
- Fix for Ruff UP007, UP035, UP045 (#9424)
- MAINT: Specify texture address precisely (#9471)
- (small fix) amend generate.py around cuSPARSELt (#9524)
Documentation
- Add an AI policy to prohibit misuse of the issue tracker (#9062)
- Update ROCm docs (#9105)
- Fix missing items in API reference (#9130)
- Docs: Update build-time requirement of Cython (#9134)
- Improve API reference list (#9165)
- Fix WARNING: Inline emphasis start-string without end-string (#9167)
- Bump supported NumPy version to v2.3 (#9198)
- Improve RawKernel documentation regarding views (closes #9233) (#9275)
- Docs only: s/"recoreded"/recorded (#9287)
- CUDA 13 Update docs (#9294)
- CI: update docs (#9375)
- Improve CI docs (#9415)
- feat(docs) Updating AMD docs (#9451)
- Fixed some typos in the documentation (#9454)
- Fix typos in kernels.rst (#9467)
- [docs] Update README.md (#9478)
- Prepare for upgrade guide for CuPy v14 (#9485)
- DOC: Add Nsight Compute profiling tutorial for CuPy kernels (#9514)
- Remove outdated compiler requirement info in install docs (#9520)
- Fill in compatibility matrix upper bound for CuPy v13 (#9521)
- DOC: add support for Python 3.14 (#9530)
Installation
- Limit Cython version to 3.0 or 3.1 (#9133)
- Make rebuild faster for development (#9136)
- Bump supported NumPy version for CuPy v14 (#9164)
- Fix long_description missing after pyproject.toml migration (#9227)
- Do not include files listed in MANIFEST.in to wheels (#9230)
- Drop cuDNN entirely (#9326)
- Bump CUDA/Ubuntu version in Docker image (#9342)
- Update conda-build support for conda CUDA 13 packages (#9378)
- Update install_library.py to support cuTENSOR 2.3 and drop CUDA 11.x (#9439)
- Fix unnecessary assertion handling in setup.py (#9499)
- MAINT: Remove -fno-gnu-unique again (#9517)
Tests
- Fix handling of ROCm self-hosted CIs (#8860)
- Do gc.collect() in MemoryHook test code to avoid free hook to happen (#9092)
- np.unique_values may return unsorted data from NumPy 2.3 (#9161)
- np.sum has numerical change in NumPy 2.3 (#9162)
- Add test cases for batchwise solve_triangular (as xfail) (#9173)
- CI: NumPy 2.3 (#9178)
- Add NumPy 2.3 + windows CI (#9195)
- Update pre-commit settings (#9199)
- Add cupy.win.cuda129 CI (#9213)
- Fix test trigger phrase for cupy.win.cuda129 CI (#9215)
- CI: Introduce per-PR kernel cache (#9234)
- Skip some signal...
v13.6.0
This is the release note of v13.6.0. See here for the complete list of solved issues and merged PRs.
🌏 We just launched our LinkedIn page. Follow us for the latest news and updates!
✨ Highlights
This release adds support for CUDA 13.x. Binary packages are available on PyPI: pip install cupy-cuda13x.
📝 Changes
Enhancements
- Update cupyx.scipy.special functions for SciPy 1.16 (#9246)
- Add CUDA 13 support (#9300)
- Support building NVTX on Windows without Nsight Systems (#9304)
- Remove bundled header files (#9305)
- Fix freqz for complex w (#9259)
- Fix resample error message for SciPy 1.16 update (#9262)
Bug Fixes
- Fix overflow in CUB reduction (#9254)
- Fix lsmr type promotion rule for complex dtype (#9277)
- Fix UnboundLocalError when blocking=True (#9282)
- Allow host function call during CUDA graph capture (#9283)
- CUDA 11.1 or earlier is no longer supported (#9285)
Documentation
Installation
- [v13] Bump version to v13.6.0 (#9314)
Tests
- CI: Introduce per-PR kernel cache (#9235)
- Add test cases for batchwise solve_triangular (as xfail) (#9245)
- Relax tolerance of test_hilbert (#9255)
- Skip some signal q dtype tests (#9256)
- Increase CPU memory limit of linux.cuda{128,129} CIs (#9261)
- Support nD reductions for sparse arrays (#9268)
- [v13] Missing backport of special function tests (#9269)
- [v13] Wrong test skip condition of test_zscore_empty (#9270)
- Support SciPy 1.16 on Windows (#9276)
- Support SciPy 1.16 on Linux (#9284)
- CI: NVTX1 removed from Windows machine image (#9303)
- Fix CI failure in CUDA 12.4 (#9311)
- [v13] Fix scipy version condition of COO matrix test (#9312)
Others
👥 Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @brycelelbach @Ellecee @emcastillo @kmaehashi @robertmaynard
v13.5.1
This is the release note of v13.5.1. This is a hot-fix release to address an issue related to the buffer protocol support for UMP added in v13.5.0 (#9223). See here for the complete list of solved issues and merged PRs.
📝 Changes
Bug Fixes
- Fix buffer protocol to raise TypeError when it is not meant to be supported (#9222)
Installation
👥 Contributors
The CuPy Team would like to thank all those who contributed to this release!
v13.5.0
Note
2025-07-11: We have marked this release as "yanked" on PyPI to prevent new installations due to unexpected regressions. The hot-fix release v13.5.1 is available.
This is the release note of v13.5.0. See here for the complete list of solved issues and merged PRs.
✨ Highlights
- CuPy now supports NVIDIA CUDA 12.9 and AMD ROCm 6.4 platforms, and NumPy 2.3.
- Unified Memory Programming support for HMM/ATS-enabled systems (such as NVIDIA Grace Hopper Superchip) has been added. Refer to the documentation for the usage.
- Binary packages on PyPI (wheels) can now load NCCL packages installed via Pip (e.g., nvidia-nccl-cu12). In addition, Arm (aarch64) wheels are now built with NCCL support enabled.
Request for Comments
We are going to finalize the following RFC issues.
- Drop support for cuDNN in CuPy v14 (#8215)
- Update set of supported ROCm versions in CuPy v13/v14 (#8607)
- Remove cupyx.tools.install_library in CuPy v14 (#9204)
📝 Changes
New Features
- Support system allocated memory (#9033)
Enhancements
- Fix rocThrust build for ROCm 6.3 (#9023)
- Allow discovering cuTENSOR using major version (#9037)
- Support FIPS enabled machines with MD5 hashing (#9055)
- Update cutensornet accelerator based on cuquantum-python 25.03 deprecation (#9058)
- Refactor hashing (#9059)
- Raise user warning in both {to,from}Dlpack & Update the Interoperability page (#9061)
- Allow build on ROCm 6.4 (#9100)
- Migrate to pyproject.toml (#9135)
- Support NCCL for aarch64 (#9141)
- Support loading NCCL from Pip packages (#9208)
- Support CUDA 12.9 and NCCL 2.26 (#9211)
- Fix cupyx.scipy.stats.zscore for SciPy 1.15 (#9024)
Performance Improvements
- Implement lazy load for cuquantum (#9104)
Bug Fixes
- JIT: Support empty return (#9001)
- API: Revert toDlpack() default to the old unversioned one (#9007)
- BUG: Hot fix for numpy 2 support in some fusion paths (#9012)
- Fix compilation error of cupy.inf in fusion2 (#9043)
- Support Cython 3.1 (#9132)
- Fix cupyx.scipy.linalg.expm (#9144)
Code Fixes
- Fix get_typename to emit thrust::complex (#9054)
Documentation
- Add an AI policy to prohibit misuse of the issue tracker (#9095)
- Update ROCm docs (#9108)
- Docs: Update build-time requirement of Cython (#9145)
- Fix WARNING: Inline emphasis start-string without end-string (#9168)
- Improve API reference list (#9189)
- Bump supported NumPy version to v2.3 (#9203)
Installation
- Limit Cython version to 3.0 or 3.1 (#9146)
- Bump NumPy version restriction (#9166)
- Make rebuild faster for development (#9196)
- Bump version to v13.5.0 (#9212)
Tests
- CI: Do not run full CI on CUDA 12.0/12.1/12.2 + Windows (#9000)
- CI: Pin setuptools version on Windows (#9039)
- Revert "CI: Pin setuptools version on Windows" (#9056)
- Mark xfails in some spline tests for SciPy 1.15 (#9060)
- Support SciPy 1.15 (#9063)
- Skip some dtype checks with NumPy 2.x (#9064)
- Skip tests for different behavior of integer overflow from NumPy 2 (#9072)
- Skip some cupyx.scipy.special tests for SciPy 1.15 (#9073)
- Skip some tests for numerical error from NumPy 2 (#9075)
- Do gc.collect() in MemoryHook test code to avoid free hook to happen (#9093)
- np.sum has numerical change in NumPy 2.3 (#9169)
- Fix cp.empty(None) to raise TypeError (#9174)
- CI: NumPy 2.3 (#9194)
- Add NumPy 2.3 + windows CI (#9197)
- Update pre-commit settings (#9202)
- Add cupy.win.cuda129 CI (#9214)
- Fix test trigger phrase for cupy.win.cuda129 CI (#9217)
Others
- Allow specifying no libraries when generating wheel metadata (#9080)
- Upgrade pre-commit hooks (#9156)
👥 Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @Azusachan @EarlMilktea @ev-br @jakirkham @kmaehashi @leofang @MattTheCuber @rongou @seberg @yangcal
v14.0.0a1
This is the release note of v14.0.0a1. See here for the complete list of solved issues and merged PRs.
✨ Highlights
This is the first alpha release of the CuPy v14 series, containing:
- New type promotion rules and behaviors aligned with the NumPy 2 specification.
- 42 new NumPy/SciPy-compatible APIs, including cupy.concat, cupyx.scipy.interpolate.CubicSpline, cupyx.scipy.spatial.Delaunay, cupyx.scipy.ndimage.find_objects, and cupyx.scipy.special.lambertw. See the Comparison Table for the detailed coverage.
Binary packages are available for testing. Try installing now by:
$ pip install cupy-cuda12x --pre -U -f https://pip.cupy.dev/pre
🛠️ Changes without compatibility
- CuPy v14’s behavior will be aligned with NumPy v2.
  - Type promotion rules are now NEP50 compatible. See Changes to NumPy data type promotion.
  - int is now 64-bit (int64) on Windows. See Windows default integer.
  - APIs removed in NumPy v2 (see Changes to namespaces) were marked deprecated in CuPy v14. Although they are kept available in v14 for smooth migration, they are planned to be removed in the next major release (CuPy v15).
  - The behavior of the copy argument has been changed (#8545). See Adapting to changes in the copy keyword.
- Support for Python 3.9, NumPy 1.22 and 1.23, SciPy 1.7, 1.8, and 1.9 has been dropped. (#8491)
- cupy.random.choice may return different results from CuPy v13. (#8483)
- Building CuPy from source code now requires Cython 3.0. (#8457)
- cupyx.scipy.linalg.{tri,tril,triu} APIs were removed from CuPy to follow the latest SciPy’s specification. Use cupy.{tri,tril,triu} instead. (#8499)
- NumPy fallback mode (cupyx.fallback_mode) has been removed as discussed in #8497. (#8816)
- Legacy DLPack APIs (cupy.toDlpack and cupy.fromDlpack) are now marked deprecated. Use cupy.from_dlpack instead. See the documentation for the usage. (#8831)
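A minimal migration sketch for two of the items in this list: the triangular helpers now come from the top-level namespace, and arrays are exchanged with from_dlpack rather than the legacy toDlpack/fromDlpack pair. The example assumes that cupy.tril, cupy.triu, and cupy.from_dlpack accept the same arguments as their NumPy counterparts, and it falls back to NumPy when CuPy or a GPU is unavailable so that it runs anywhere. The xp alias is illustrative only.

```python
import numpy as np

try:
    import cupy as cp
    # Use the GPU only if CuPy reports a usable device.
    xp = cp if cp.cuda.is_available() else np
except ImportError:
    xp = np

a = xp.arange(9).reshape(3, 3)

# Formerly cupyx.scipy.linalg.{tril,triu}; now use the top-level functions.
lower = xp.tril(a)
upper = xp.triu(a)

# Formerly cupy.fromDlpack(x.toDlpack()); from_dlpack takes the array
# (any object implementing __dlpack__) directly.
b = xp.from_dlpack(a)

print(lower)
print(upper)
print(b.shape)
```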
📝 Changes
New Features
- Add KDTree to cupyx.scipy.spatial (#7671)
- Add neighbors option to RbfInterpolator (#7864)
- ENH: cupyx/signal: add sweep_poly (#7873)
- Add 2D Delaunay triangulation (#7985)
- Add cupyx.signal.pulse_compression from cuSignal's non SciPy-compat API (#8022)
- Add LinearNDInterpolator to cupyx.scipy.interpolate (#8035)
- Add cupyx.signal.convolve1d3o from cuSignal's non SciPy-compat API (#8037)
- Add cupyx.signal.{firfilter,firfilter_zi,firfilter2} (#8052)
- Add cupyx.signal.{pulse_doppler, cfar_alpha} (#8057)
- Add cupyx.signal.{complex_cepstrum,real_cepstrum,inverse_complex_cepstrum,minimum_phase} (#8062)
- Add cupyx.signal.mvdr (#8077)
- ENH: signal: add lanczos and kaiser_bessel_derived windows (#8081)
- Add cupyx.signal.ca_cfar (#8087)
- Add cupyx.signal.convolve1d2o (#8101)
- Add cupyx.signal.freq_shift (#8128)
- Add lambertw function (#8140)
- Add cupyx.signal.channelize_poly (#8141)
- Add cupyx.scipy.interpolate.CubicSpline (#8175)
- Add apply_over_axes API (#8177)
- Add cupy.put_along_axis API (#8199)
- Add CloughTocher2DInterpolator to cupyx.scipy.interpolate (#8208)
- Add NearestNDInterpolator to cupyx.scipy.interpolate (#8220)
- Add NdBSpline to cupyx.scipy.interpolate (#8223)
- ENH: cupyx/scipy/interpolate: add *UnivariateSpline for 1D smoothing splines (#8267)
- Add NdBSpline based interpolation methods to RGI (#8276)
- ENH: cupyx/interpolate: port interp1d from scipy (#8289)
- Add batched solve_triangular (#8329)
- Add Incomplete Elliptic Integrals to special (#8425)
- Support system allocated memory (#8442)
- Add CUDA graph debug function (#8502)
- Add sici and shichi to special for sine and cosine integrals (#8620)
- Update unique_xxx (nep52) (#8665)
- Add cupyx.scipy.ndimage.find_objects (#8916)
Enhancements
- Support for break and continue keywords in CuPy JIT (#8010)
- Make cupyx.signal.radartools private (#8047)
- Remove usages of numpy.float_ and numpy.complex_ (#8050)
- Support cusparseLt 0.6.1 (#8074)
- Add incontiguous support for cutensor functions (#8149)
- Add complex support for the digamma function (#8163)
- Fix expm(complex matrix) (#8206)
- Add CutensorMg support (#8212)
- Add cudaStreamCreateWithPriority (#8219)
- Add the nearest method for percentile/quantile estimation (#8224)
- Various Jitify improvements (#8235)
- Support fallback algorithm for spgemm (#8252)
- Bump to cuTENSOR 2.0.1 (#8282)
- Preload cuTENSORMg (#8283)
- Use weakref.finalize instead of __del__ for RandomState._generator destruction (#8315)
- Support ROCm 6 (#8319)
- cupyx: cleanup use of deprecated NumPy functionality (NumPy 2.0 compatibility) (#8320)
- Add wright_bessel function to special (#8324)
- MAINT: fft, linalg: add __all__ lists (#8333)
- Cuda 12.5 Tests (#8337)
- Add axes support in ndimage filters module (#8339)
- MAINT: interpolate: update RBF to scipy 1.13 (#8343)
- Make CuPy import under NumPy 2.0 (#8346)
- Lazy-preload NCCL (#8360)
- Fix map_coordinates recompilation condition (#8378)
- Disable jitify for cub & Bump CCCL (#8412)
- Use custom less instead of specializing thrust (#8446)
- Port to Cython 3.0 (#8457)
- Avoid using Jitify everywhere inside CuPy (#8467)
- Get rid of pkg_resources (#8480)
- Drop support for Python 3.9, NumPy 1.22 and 1.23, SciPy 1.7, 1.8 and 1.9 (#8491)
- Remove deprecated cupyx.scipy.linalg.{tri,tril,triu} (#8499)
- Use .toarray() instead of .A attribute (#8508)
- Support half option in scipy.signal.minimum_phase (#8510)
- Increase MAX_NDIM to 64 (#8511)
- Support CUDA 12.6 (#8513)
- Fallback to system headers for future CUDA 12.x versions (#8518)
- Extend runtime header search logic to conda (#8519)
- Support copy=None in cp.array/cp.asarray/cp.asanyarray (#8545)
- Fix dtype rule of cupy.scipy.stats.entropy for SciPy 1.14 (#8547)
- Support setuptools 74.0.0 or later (#8583)
- Add NCCL_ERROR_REMOTE_ERROR to the set of errors from NCCL (#8662)
- Replace numpy.ComplexWarning with cupy.exceptions.ComplexWarning (#8676)
- ENH: Implement dlpack v1 (#8683)
- Fix some NumPy 2.x CI failures (cont.) (#8695)
- Bump CUDA version in cuda11x-cuda-python CI (#8737)
- [ROCm 6.2.2] Conditionally define CUDA_SUCCESS only if it's not (#8793)
- Remove fallback mode (#8816)
- Raise user warning in both {to,from}Dlpack & Update the Interoperability page (#8831)
- Use a custom Min/Max instead of specializing CUB (#8846)
- Updating pylibraft pairwise_distance to cuvs (#8847)
- add axes support for additional functions in cupyx.scipy.ndimage (from SciPy 1.15.0) (#8858)
- Raise VisibleDeprecationWarning for wavelet functions (#8865)
- Support CUDA 12.8 + Blackwell GPUs (sm_100, sm_120) (#8899)
- Bump library installers for CUDA 12.8 (#8914)
- Use CCCL 2.8.x branch + Use CUPY_CACHE_KEY in hash keys (#8919)
- Use NVIDIA CCCL 2.8 latest w/CUDA 12.3 fix (#8924)
- Use C++17 in JIT compile (#8940)
- Restore CUB histogram and bincount (#8950)
- Broaden usage of C++17 (#8952)
- cupyx.scipy.distance: initialize output array with empty instead of zeros (#8971)
- cupyx.scipy.spatial.distance.cdist remove explicit zeroing of user-provided output array (#8988)
- Fix rocThrust build for ROCm 6.3 (#9022)
- Allow discovering cuTENSOR using major version (#9030)
- Update cutensornet accelerator based on cuquantum-python 25.03 deprecation (#9045)
- Support FIPS enabled machines with MD5 hashing (#9053)
- Refactor hashing (#9057)
Enhancements for NumPy & SciPy compatibility:
- Fix scp.signal.{medfilt,medfilt2d} to raise ValueError for complex64 inputs (#8059)
- Deprecate cupyx.scipy wavelet functions (#8061)
- Fix csrmatrix.__pow__ to raise ValueError for non-int other (#8063)
- Fix cupyx.scipy.special.betainc for invalid inputs (#8065)
- scipy.special.{btdtr,btdtri} are deprecated since SciPy 1.12 (#8066)
- Fix boxcox_llf for SciPy 1.12 changes (#8095)
- NEP50 (#8323)
- Resolve Ruff NPY errors - fix exception imports and asfarray usage in test code (#8455)
- Fix sparse.linalg function signatures following SciPy 1.14 (#8526)
- NumPy 2.0 compatibility: (partially) sync with NEP52 (#8531)
- Fix dtype rule of special functions for SciPy 1.14 (#8532)
- Fix cupy.histogram arg order to match NumPy (v1.24+) (#8559)
- Make cupy.linalg.solve compatible with numpy v2 (#8629)
- Silence FutureWarning emitted when rcond is missing (#8638)
- Fix some NumPy 2.x CI failures (#8690)
- Support kind arg. in sorting methods (#8708)
- Fix cupy.percentile for NumPy 2.x (#8726)
- Fix some NumPy 2.x CI failures (cupyx) (#8727)
- Skip some tests incompatible with NumPy 2.2 (#8817)
- Fix scipy.spmatrix.sign for complex dtype inputs (#8822)
- Fix return type of cupy.where for scalar arguments for NumPy 2.0 (#8835)
- Fix cupyx.scipy.special.logsumexp for NumPy 2.0 (#8836)
- Fix cupy.cov (#8839)
- Fix cupy.histogramdd for NumPy 2.x (#8873)
- Raise ValueError upon attempts to create 3-dim sparse array (#8877)
- Disable contiguous_check for COO/dense matmul test (#8878)...
v13.4.1
This is the release note of v13.4.1. This is a hot-fix release addressing several issues including DLPack compatibility with existing user code. See here for the complete list of solved issues and merged PRs.
📝 Changes
Bug Fixes
- Revert toDlpack() default to the old unversioned one (#9011)
- Hot fix for numpy 2 support in some fusion paths (#9016)
- Fix compilation error of cupy.inf in fusion2 (#9044)
Tests
- CI: Pin setuptools version on Windows (#9047)
Others
- Bump version to v13.4.1 (#9051)
👥 Contributors
The CuPy Team would like to thank all those who contributed to this release!
v13.4.0
This is the release note of v13.4.0. See here for the complete list of solved issues and merged PRs.
✨ Highlights
NVIDIA CUDA 12.8 Support
CuPy now supports CUDA 12.8 and the latest NVIDIA Blackwell architecture.
AMD ROCm 6.x Support
CuPy can now be built with AMD ROCm 6.x.
Python 3.13 Support
Binary packages for Python 3.13 are now available.
🛠️ Changes without compatibility
Cython 3.0 as build requirement (#8959)
To provide support for Python 3.13, the CuPy codebase has been updated for Cython 3. To build CuPy from source, Cython 3.0 or later is now required instead of Cython 0.29.x.
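Before building from source, it can be handy to confirm that the installed Cython meets this requirement. The sketch below uses only the standard library; cython_meets_requirement is a hypothetical helper, and it checks only the lower bound stated here (3.0 or later).

```python
import re
from importlib import metadata


def cython_meets_requirement(version, minimum=(3, 0)):
    """Return True if a version string such as "3.0.11" is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


try:
    installed = metadata.version("Cython")
    status = "OK" if cython_meets_requirement(installed) else "too old"
    print("Cython", installed, status)
except metadata.PackageNotFoundError:
    print("Cython is not installed")
```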
📝 Changes
New Features
- Add cupyx.signal.mvdr (#8872)
Enhancements
- Support ROCm 6 (#8608)
- Support setuptools 74.0.0 or later (#8649)
- Use custom less instead of specializing thrust (#8653)
- Add NCCL_ERROR_REMOTE_ERROR to the set of errors from NCCL (#8667)
- Replace numpy.ComplexWarning with cupy.exceptions.ComplexWarning (#8678)
- Use weakref.finalize instead of __del__ for RandomState._generator destruction (#8680)
- Implement dlpack v1 (#8722)
- Fix some NumPy 2.x CI failures (cont.) (#8725)
- Bump CUDA version in cuda11x-cuda-python CI (#8743)
- ROCm 6.2.2: Conditionally define CUDA_SUCCESS only if it's not (#8799)
- Raise VisibleDeprecationWarning for wavelet functions (#8868)
- Use a custom Min/Max instead of specializing CUB (#8875)
- Updating pylibraft pairwise_distance to cuvs (#8897)
- Support CUDA 12.8 + Blackwell GPUs (sm_100, sm_120) (#8915)
- Interpolate: update RBF to scipy 1.13 (#8939)
- Use C++17 in JIT compile (#8941)
- Bump library installers for CUDA 12.8 (#8943)
- Use CCCL 2.8.x branch + Use CUPY_CACHE_KEY in hash keys (#8946)
- Use NVIDIA CCCL 2.8 latest w/CUDA 12.3 fix (#8948)
- Broaden usage of C++17 (#8958)
- Port to Cython 3.0 (#8959)
- cupyx.scipy.distance: initialize output array with empty instead of zeros (#8981)
- cupyx.scipy.spatial.distance.cdist remove explicit zeroing of user-provided output array (#8990)
- Skip sparse.linalg.{cg, cgs, gmres} tests for scipy>=1.14 (#8551)
- cupyx.scipy.sparse tests for SciPy 1.14 (#8552)
- Fix some NumPy 2.x CI failures (cupyx) (#8738)
- Fix cupy.percentile for NumPy 2.x (#8752)
- Skip some tests incompatible with NumPy 2.2 (#8830)
- Disable contiguous_check for COO/dense matmul test (#8888)
- Raise ValueError upon attempts to create 3-dim sparse array (#8889)
- Skip a test for invalid scipy return value of invalid COO matmul (#8890)
- Fix fft.fht following bug fix in SciPy 1.15 (#8891)
- Support empty tuple indexing for sparse matrix (#8892)
- Deprecate cupyx.scipy.linalg.kron (#8902)
- Fix test for special.sph_harm to ignore DeprecationWarning (#8906)
Bug Fixes
- Add nccl.broadcast 64-bit support (#8566)
- Support building CuPy with setuptools 74 (#8577)
- Fix order 'K' with shape given for *_like array creation (#8605)
- hipPointerGetAttributes returns error when pointer is unregistered in ROCm 5.7 (#8609)
- Guard for ROCm 6.x (#8611)
- Fix HIP_VERSION unit (#8619)
- Switch to using platform.machine() instead of platform.processor() (#8656)
- Properly allocate in RNG when specified dtype is neither float32/float64 (#8658)
- Use platform.machine() instead of platform.processor() (#8673)
- Fix sosfilt state output shape when ndim < 2 (#8679)
- Fix undefined inf/nan constant in CuPy JIT (#8712)
- Fix bspline kernel to avoid out of bounds error (#8763)
- Fix race during SoftLink initialization (#8787)
- fix nanargmin and nanargmax's parameter order and pass optional parameters (#8791)
- Fix crashes of quantile and percentile (#8811)
- Fix handling of pinned memory (#8852)
- Use /bigobj on Windows build (#8967)
- Fix cupyx.scipy.spatial.distance's cdist for RAPIDS 24.12 compatibility (#8975)
Code Fixes
- Upgrade pre-commit hooks to silence warnings (#8666)
- Resolve import loop (#8714)
- Resolve uncaught type warning (#8798)
- Switch from .A attribute to .toarray() method (#8814)
- Fix typo in _cretate_frame_tree (#8944)
- Drop unneeded bytes copy of CUPY_CACHE_KEY (#8947)
Documentation
- Add docs about CUDA headers (#8595)
- Update fft.rst (#8617)
- Update documentation to use pre-commit (#8650)
- Add tips on Windows development in Contribution Guide (#8704)
- Add notice about cupy.array_api removal (#8751)
- Add CUDA 12.8 to docs (#8968)
- Update list of supported versions (#8991)
Installation
- Update conda-build CUDA detection logic for Setuptools 72.2.0 (#8652)
- Use relative path of header files to generate cache key (#8930)
- Fix minimum CUDA version check and update comments (#8938)
- Bump version to v13.4.0 (#8993)
Tests
- Relax test_firls atol (#8522)
- Skip test_homomorphic in scipy>=1.14 (#8523)
- Skip betaincinv test with SciPy 1.14.1 (#8553)
- Skip special tests for SciPy 1.14 dtype rule changes (#8554)
- Skip special.logsumexp test for empty input (#8555)
- Skip cupy.scipy.stats.entropy tests for SciPy 1.14 dtype rule change (#8556)
- Use setuptools==73.0.1 (#8569)
- Revert CI timeout bump (#8571)
- Support SciPy 1.13 and 1.14 (#8572)
- Missing backport for sparse_array.A removal (#8573)
- Skip test_log_expit SciPy 1.7 (#8576)
- Catch ValueError (#8625)
- Use testing.with_requires to skip broken tests (#8627)
- CI: Update micro versions of Python (#8635)
- Skip tests if scipy is not installed (#8637)
- Accept OverflowError in TestCopytoFromScalar for NumPy v2 (#8643)
- Skip more tests if scipy is not installed (#8645)
- Update precommit (#8663)
- Backport the changes introduced in #8690 (#8694)
- CI: Fix apt repository URL for Ubuntu 22.04 (#8715)
- Remove ndarray.ptp from fallback tests (#8744)
- Temporary skip for NumPy 2.0 tests (#8745)
- Relax tolerance of test_hilbert for NumPy 2.0 (#8746)
- Bump SciPy version to 1.14 in Windows CI (#8764)
- Add NumPy 2.x CI for Linux (#8768)
- CI: support "skip-ci" label (#8841)
- CI: Fix FlexCI compatibility (#8842)
- Add NumPy 2.2 to CI (#8855)
- Replace flake8 with ruff (#8859)
- Support Optuna 4 (#8863)
- Add testing.shaped_linspace (#8900)
- Disable contiguous_check for some signal.cont2discrete tests (#8901)
- Fix splines tests to remove unexpected skips (#8921)
- Minor updates for sm120 (#8922)
- Add CI for CUDA 12.8 (#8951)
- Increase host memory in Windows CI, free GPU memory in example code (#8969)
- Skip some signal tests for TypeError for inputs of np.longlong dtype (#8972)
- Add CI for Python 3.13 and mpi4py v4 (#8974)
- Pass locals dict to exec (#8985)
Others
- Add backport reminder (#8684)
- Fix script name of backport reminder (#8686)
- Update pre-commit hooks (#8910)
- Fix pull request project board workflows (#8929)
- Regenerate coverage matrix (#8960)
👥 Contributors
The CuPy Team would like to thank all those who contributed to this release!
@99991 @andfoy @asi1024 @Azusachan @bernhardmgruber @Berrysoft @chainer-ci @cjnolet @dagardner-nv @EarlMilktea @eltociear @ev-br @grlee77 @HollowMan6 @jakirkham @jemiryguo @kmaehashi @leofang @littlewu2508 @mohitreddy1996 @mroeschke @seberg @takagi
v13.3.0
This is the release note of v13.3.0. See here for the complete list of solved issues and merged PRs.
💬 Join the Matrix chat to talk with developers and users and ask quick questions!
🙌 Help us sustain the project by sponsoring CuPy!
✨ Highlights
Updated NVIDIA CCCL
The CCCL library bundled with CuPy has been updated to eliminate the Jitify preprocess phase. Users will no longer see the one-time performance warning ("Jitify is performing a one-time only warm-up to populate the persistent cache, this may take a few seconds and will be improved in a future release...") unless they explicitly request Jitify (e.g., cupy.RawModule(..., jitify=True)).
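As a rough sketch of the opt-in mentioned above, the snippet below builds a cupy.RawModule with or without Jitify. The kernel source and the helper name build_module are made up for illustration, and running the helper assumes a working CuPy installation with a CUDA device.

```python
# Minimal CUDA source, for illustration only.
KERNEL_SOURCE = r"""
extern "C" __global__ void scale(float* x, float a, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) { x[i] *= a; }
}
"""


def build_module(use_jitify=False):
    """Create a RawModule; Jitify is used only when explicitly requested.

    Hypothetical helper; requires CuPy when called.
    """
    import cupy as cp  # imported here so the file loads without CuPy
    return cp.RawModule(code=KERNEL_SOURCE, jitify=use_jitify)
```

Calling build_module(use_jitify=True) is the case in which the warm-up warning may still appear; the default leaves Jitify off.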
Enhanced NumPy 2.0 Compatibility
This release provides better interoperability with NumPy 2.0.
Support for CUDA 12.5 & 12.6
CuPy is now tested with CUDA 12.5 and 12.6.
RFC: Removing NumPy Fallback Mode in CuPy v14
The CuPy team is discussing the possibility of removing the NumPy fallback feature in CuPy v14. Feel free to join the discussion in #8497 if you have any comments or use cases for this feature.
📝 Changes
Enhancements
- Support CUDA 12.5 (#8423)
- Avoid using Jitify everywhere inside CuPy (#8473)
- Disable jitify for cub & Bump CCCL (#8487)
- Get rid of pkg_resources (#8496)
- Unregister cupyx.scipy.linalg.{tri,tril,triu} from uarray (reverted in #8516) (#8506)
- Use .toarray() instead of .A attribute (#8517)
- Extend runtime header search logic to conda (#8520)
- Support CUDA 12.6 (#8524)
- Fallback to system headers for future CUDA 12.x versions (#8529)
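One of the entries above drops pkg_resources. The release note does not say what CuPy uses instead, so the following only sketches the common standard-library alternative, importlib.metadata. The helper name installed_version and the choice of distribution names to probe are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed.

    Hypothetical helper for illustration; not CuPy code.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Probe a few CuPy distribution names without importing pkg_resources.
for name in ("cupy-cuda12x", "cupy-cuda13x", "cupy"):
    print(name, installed_version(name))
```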
Bug Fixes
- Fix spline temp container size in make_interp_spline (#8390)
- MAINT: Avoid using np.compat.integer_types (#8413)
- Fix type dispatcher for arm64 (#8414)
- Fix ndarray.get() not honoring current stream when layout is not contiguous (#8418)
- Fix copyto for NumPy 2 compatibility (#8435)
- Update compiler.py to avoid the popup of the nvcc.exe console (#8438)
- Fix RandomState.seed() for NumPy 2 compatibility (#8439)
- Fix the size of temporary CUB output space to consider its alignment (#8447)
- Address KeyErrors from importlib_metadata (#8465)
- upfirdn: mode=None -> mode="constant" (#8495)
- Search header files from CTK wheel (#8504)
- Fix CUDA version condition to use headers from wheel (#8507)
- Do not unregister cupyx.scipy.linalg.{tri,tril,triu} from uarray (#8516)
- Fix ROCm 4.3 binary package build broken (#8534)
- Fix cudart header detection for conda (#8535)
- Fix cudart header detection for conda (#8535)
Documentation
- eigsh doc correction _eigen.py (#8383)
- typo: coping -> copying (#8427)
- Add CUDA 12.5 to list of supported platform (#8428)
- Add comparison table for (cupyx.)scipy.sparse.*_matrix classes class methods (#8458)
Installation
- Patch the build system to better support conda-build (#8464)
Tests
- Bump NumPy/SciPy versions in cuda-example CI (#8420)
- Support SciPy 1.12 (#8422)
- Fix CUDA 11.2 CI failure on Linux (#8437)
- Decrease number of threads to avoid "system error: excessive memory usage is detected" (#8462)
- CI: skip CUDA 12.1/12.2/12.3/12.4 CI on "mini" trigger (#8469)
- Resolve Ruff NPY errors - fix exception imports and asfarray usage in test code (#8471)
- Skip some tests in aarch64 CI (#8490)
👥 Contributors
The CuPy Team would like to thank all those who contributed to this release!
@andfoy @arkdong @asi1024 @bmerry @EarlMilktea @emcastillo @hmaarrfk @jakirkham @johnnynunez @kmaehashi @leofang @monzelr @seberg @swelborn @takagi @YanivDorGalron
v13.2.0
This is the release note of v13.2.0. See here for the complete list of solved issues and merged PRs.
💬 Join the Matrix chat to talk with developers and users and ask quick questions!
🙌 Help us sustain the project by sponsoring CuPy!
✨ Highlights
Support for NumPy 2.0 (#8357)
CuPy can now be imported under NumPy 2.0.
Lazily preloading NCCL (#8367)
CuPy now loads the NCCL shared library when cupy.cuda.nccl is imported, rather than at import cupy. This improves NCCL compatibility in environments where multiple libraries bundle or load NCCL.
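From user code, this means NCCL is only touched once the submodule is imported. The sketch below imports cupy.cuda.nccl on demand and degrades gracefully when it is missing. The helper name load_nccl is hypothetical, and catching ImportError is an assumption about how an unavailable NCCL shows up in a given environment.

```python
import importlib


def load_nccl():
    """Import cupy.cuda.nccl on demand; return None when it is unavailable.

    Hypothetical helper for illustration; not part of CuPy.
    """
    try:
        return importlib.import_module("cupy.cuda.nccl")
    except ImportError:
        return None


nccl = load_nccl()
if nccl is not None:
    print("NCCL version:", nccl.get_version())
else:
    print("NCCL is not available in this environment")
```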
📝 Changes
Enhancements
- cupyx: cleanup use of deprecated NumPy functionality (NumPy 2.0 compatibility) (#8325)
- make CuPy import under NumPy 2.0 (#8357)
- Lazy-preload NCCL (#8367)
Bug Fixes
- Fix overflow indexing ndarray generated with as_strided (#8349)
- Fix CUB build error on win-64 (#8358)
- Re-enable NVTX range coloring for NVTX3. (#8361)
Documentation
Tests
- [v13] Use the latest NumPy v1 for head CI (#8355)
Others
👥 Contributors
The CuPy Team would like to thank all those who contributed to this release!
@asi1024 @cclauss @ev-br @grlee77 @kmaehashi @leofang @macrocosme @romerojosh @takagi
v13.1.0
This is the release note of v13.1.0. See here for the complete list of solved issues and merged PRs.
💬 Join the Matrix chat to talk with developers and users and ask quick questions!
🙌 Help us sustain the project by sponsoring CuPy!
✨ Highlights
Support for CUDA 12.3 & 12.4 (#8286)
CuPy now supports CUDA 12.3 and 12.4. Binary packages are available for Linux (x86_64/aarch64) and Windows as cupy-cuda12x.
Fixed Regression on pre-Volta platforms (#8216)
This release fixes a regression in CuPy v13.0.0 in which some CuPy functions did not work on pre-Volta platforms (compute capability < 7.0), such as NVIDIA Tesla P100 or GeForce GTX 1080.
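To check whether a device falls in the affected range, one can compare its compute capability against 7.0. The sketch below assumes the string form that cupy.cuda.Device().compute_capability returns (for example "60" or "75"). The helper name is_pre_volta is hypothetical.

```python
def is_pre_volta(compute_capability):
    """Return True when the compute capability is below 7.0.

    Expects the digit-string form, e.g. "60" for 6.0 or "75" for 7.5.
    Hypothetical helper for illustration; not part of CuPy.
    """
    return int(compute_capability) < 70


try:
    import cupy as cp

    cc = cp.cuda.Device().compute_capability
    print(cc, "pre-Volta" if is_pre_volta(cc) else "Volta or newer")
except Exception as exc:  # CuPy missing or no usable CUDA device
    print("Could not query a CUDA device:", exc)
```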
📝 Changes
New Features
- Add cupyx.signal.{complex_cepstrum,real_cepstrum,inverse_complex_cepstrum,minimum_phase} (#8096)
- Add cupyx.signal.{firfilter,firfilter_zi,firfilter2} (#8107)
- Add cupyx.signal.freq_shift (#8131)
- Add cupyx.signal.channelize_poly (#8148)
- Add cupyx.signal.ca_cfar (#8167)
Enhancements
- Add incontiguous support for cutensor functions (#8168)
- Remove usages of numpy.float_ and numpy.complex_ (#8181)
- Fix expm(complex matrix) (#8214)
- Various Jitify improvements (#8237)
- Bump to cuTENSOR 2.0.1 (#8291)
NumPy-compatibility Improvements
- Fix scp.signal.{medfilt,medfilt2d} to raise ValueError for complex64 inputs (#8084)
- Fix boxcox_llf for SciPy 1.12 changes (#8132)
- Deprecate cupyx.scipy wavelet functions (#8139)
Bug Fixes
- Fix #7981, Update _nccl_comm.py (#8112)
- Fix Flags not to allow setters (#8138)
- Prevent angular brackets from appearing in Jitify's cache filename (#8160)
- Set -arch in the compiler options unconditionally (#8161)
- Allow cupy.show_config() without CUDA (#8192)
- Fix jitify warmup kernel (#8216)
- Fix: remove unnecessary include that causes deployment issue (#8217)
- Fix build system for Thrust detection (#8230)
- Fix: always switch to the submodule dir before checking git tag/commit (#8240)
- Fix overflow of index calculation in random generator API (#8246)
- Fix Generator API parallelism (#8247)
- Fix CUB min/max initial values (#8266)
- Fix jitify warmup kernel - Cont'd (#8270)
Documentation
- Update conda installation guide (#8135)
- Fix pdist docstring in order to specify that the returned matrix is condensed (#8187)
- Replace license notice in cupyx.scipy.signal._spectral (#8271)
- Update document for CUDA 12.3 and 12.4 (#8284)
Installation
- Do not search for static libs (#8143)
Tests
- Fix cupyx.scipy.special.betainc for invalid inputs (#8098)
- Revert CI timeout changes (#8137)
- Fix invalid vectorstrength tests (#8145)
- Fix actions versions used in workflows to avoid node 16 deprecation warning (#8194)
- Add CI to test cupy.show_config() pass without CUDA installed (#8195)
- Add import test without CUDA Toolkit (#8231)
- BUG: cupyx/scipy/signal: fix mpmath test (#8262)
- Tentatively pin SciPy to v1.12 in CI (#8275)
- Add support for CUDA 12.3 & 12.4 (#8286)
👥 Contributors
The CuPy Team would like to thank all those who contributed to this release!
@andfoy @asi1024 @emcastillo @ev-br @jemiryguo @kmaehashi @leofang @takagi