Releases: zlib-ng/zlib-ng
2.3.2 - Hotfix
2.3.1
Warning: this release has a bug in the Chorba crc32 code.
We recommend skipping this release and using the 2.3.2 Hotfix release instead.
First stable release of the 2.3.x branch.
This is a feature release that introduces several optimizations and improvements.
For some benchmarks and graphs, please see #2022.
For recommended os/cmake/compiler versions, please see Wiki.
The biggest addition is the Chorba CRC32 code, a major improvement to crc32 calculation speed
for pre-PCLMUL (or equivalent) CPUs. For now, we have 3 variants of Chorba: Generic, SSE2 and SSE4.1.
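For context on what is being sped up, here is a minimal sketch of a plain bit-at-a-time CRC-32 using the same reflected polynomial as zlib's crc32(). It is not the Chorba algorithm and not zlib-ng's code; the function name `crc32_bitwise` is hypothetical and only serves as a slow reference to compare faster variants against.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected form of the CRC-32 polynomial used by zlib and gzip. */
#define CRC32_POLY 0xEDB88320u

/* Hypothetical reference helper: processes one bit per inner iteration.
 * Pass 0 as the initial crc, or a previous result to continue a stream. */
static uint32_t crc32_bitwise(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;                              /* pre-invert the running value */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];                       /* fold the next byte into the low bits */
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the bit shifted out was set. */
            uint32_t poly = (crc & 1u) ? CRC32_POLY : 0u;
            crc = (crc >> 1) ^ poly;
        }
    }
    return ~crc;                             /* post-invert */
}
```

Calling `crc32_bitwise(0, (const unsigned char *)"123456789", 9)` yields the standard CRC-32 check value 0xCBF43926, which makes it a convenient cross-check when benchmarking optimized implementations.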
We have also removed our detection and usage of the various aligned alloc functions. Because we have to
support an application-provided alloc function, we have to check and fix buffer alignments anyway,
so now we just use malloc() if none is provided.
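The idea of fixing up alignment on top of a plain allocator can be sketched as follows. This is an illustration under stated assumptions, not zlib-ng's actual implementation: the names `aligned_alloc_sketch` and `aligned_free_sketch` are hypothetical, and the layout (original pointer stored just before the aligned block) is one common way to do it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return `size` bytes aligned to `align` (a power of
 * two) using only malloc(). The pointer malloc() returned is saved in the
 * bytes just before the aligned block so it can be freed later. */
static void *aligned_alloc_sketch(size_t size, size_t align)
{
    /* Reject zero and non-power-of-two alignments. */
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;
    /* Guard against overflow in the size computation below. */
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;

    /* Payload + worst-case padding + room for the saved pointer. */
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Skip the saved-pointer slot, then round up to a multiple of `align`. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    /* memcpy avoids assuming the slot itself is suitably aligned. */
    memcpy((unsigned char *)aligned - sizeof(void *), &raw, sizeof(raw));
    return (void *)aligned;
}

/* Hypothetical counterpart: recover the saved pointer and free it. */
static void aligned_free_sketch(void *ptr)
{
    if (ptr == NULL)
        return;
    void *raw;
    memcpy(&raw, (unsigned char *)ptr - sizeof(void *), sizeof(raw));
    free(raw);
}
```

The same fix-up works regardless of which allocator produced the raw pointer, which is why it also covers an application-provided alloc function.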
The gzopen-related init code has been rewritten to clean up and unify the gzread and gzwrite behavior.
Several malloc calls have been removed: the number of places in the gz* code that call malloc is down from 7 to 4
(using gzopen will now only result in 2-3 calls to malloc total).
The reason for releasing 2.3.x instead of another 2.2 release is the introduction of Chorba CRC32, rewritten
gzopen init code, the increased CMake version requirement, and the removal of NMake project files.
There should not be any API/ABI changes (other than on the previously failing platforms fixed by #1980).
Tip for distro maintainers:
Distros that require x86-64-v3 or higher can optionally disable compiling in the large generic-C
Chorba code by setting the CMake option "WITH_CRC32_CHORBA=OFF", since the pclmul instructions will
always be available on x86-64-v3, and the pclmul optimized crc32 does not depend on the generic Chorba code
(the SSE2/SSE4.1 implementations do, however).
PS: If your company benefits from zlib-ng, please consider donating so we can expand our fleet of test systems.
Changes since 2.3.0-rc2:
Arch-Specific improvements/optimizations
- x86
Buildsystem
- Configure: Determine system architecture properly on *BSD systems #2012
- Configure: Fix SFLAGS and improve sed portability #2016
Tests/Benchmarks
- Update Google Benchmark to v1.9.4 #2013
CI
- CI: Downgrade "Windows GCC Native Instructions (AVX)" workflow #2014
Full changelog since 2.2.5:
Important fixes/changes
- Remove NMake build projects #1899
- Increase minimum supported CMake version from 3.5.1 to 3.12 #1973
- Don't build C-fallback functions that never get used on x86_64 #1984
- Fix type mismatch on platforms where int32_t and uint32_t use long instead of int #1980
Optimizations
- Implement Chorba CRC32 #1837
- Fix a big endian bug on the 32k and larger specializations of chorba #1891
- Explicit SSE2 vectorization of Chorba CRC method #1872
- SSE4.1 optimized chorba #1893
- Fix function declaration for chorba_small_nondestructive_sse2 #1907
- Fix 32bit large chorba #1912
- [WebAssembly] Fix stack overflow in crc32_chorba_118960_nondestructive. #1915
- Reorganize Chorba activation #2004
- Rewrite chorba dispatch functions #2006
- Use aligned loads in the chorba portions of the clmul crc routines #2019
- Conditionally shortcut via the chorba polynomial based on compile flags #2020
- Minor optimization of insert_string #1951
- Optimize compress_block() and build_tree() #1954
- Inline bi_reverse #1955
- Inline read_buf and flush_pending #1952
- Inline the CHUNKSIZE function #1974
Arch-Specific improvements/optimizations
- x86
- RISC-V
- PowerPC
- Use elf_aux_info() on OpenBSD PowerPC #2007
- ARM
- Provide an inline asm fallback for the ARMv8 intrinsics #1697
- ARM Neon: Fold a copy into the adler32 function for UPDATEWINDOW #1870
- Remove volatile keyword from ARM inline assembler #1908
- Disable NEON workaround on Clang 20 and above, and enable it for non-mobile platforms #1942
- Synchronise ARMv8 and Loongarch CRC32 implementations #1969
- Loongarch64
- Port SSE/AVX optimization to Loongarch64 LSX/LASX Vector Intrinsics #1925
Buildsystem
- Fix -Wunused-command-line-argument warnings on Mac OS X #1967
- Fix -Wstrict-prototypes warnings #1968
- Initial support for nVidia toolchain #1993
- CMake: Make test options dependent on ZLIB_ENABLE_TESTS #1933
- CMake: Allow C17 for newer CMake versions #1958
- CMake: Rename targets to avoid clashes when used as a subproject #1970
- CMake: Rename target files to avoid overwrite of PACKAGE_VERSION #1988
- Configure: Added --installnamedir #1867
- Configure: Add support for RISC-V ZBC extension #1917
- Configure: Determine system architecture properly on *BSD systems #2012
- Configure: Fix SFLAGS and improve sed portability #2016
Tests/Benchmarks
- Bench: Add benchmark for insert_string. #1956
- Tests: Fix type mismatch with Windows GCC. #1965
- Tests: Fix cast and truncation warnings. #1978 #1979
- Use CTest to simplify test configuration #2001
- Add tests for crc32_fold_copy functions #2005
- Add benchmark for crc32_fold_copy functions #2008
- Disable benchmark for slide_hash_c with Visual C++ #2009
- Update Google Benchmark to v1.9.4 #2013
CI
- CI: Minor fix for s390x CI runner version selection #1886
- CI: Fix broken actions-runner #1929
- CI: Update MacOS toolchain. #1962
- CI: Install Windows 11 SDK 10.0.22621 for 32-bit ARM. #1964
- CI: Use MacOS 14 for GCC UBSAN. #1963
- CI: Update macOS CI images #1971
- CI: Update s390x actions runner #1981
- CI: Downgrade "Windows GCC Native Instructions (AVX)" workflow #2014
Misc
- Clean up crc32_braid. #1873
- fix the url of the s390x actions worker patch #1882
- port: Use memalign or _aligned_malloc only, when available, fallback to malloc. #1863
- port: Use __cpuid only, when available. #1887
- Use 'block-list' and 'allow-list' terms #1976
- Verify pointers during functable init #1983
- Update README.md and add missing CMake/Configure option descriptions #2000
2.3.0-rc2
Release candidate 2 - Please test
After some recent improvements, another release candidate is appropriate.
The biggest change is the switch to CTest for enabling test builds in CMake;
we also found some room for improvement around the Chorba-specific code.
Tip for distro maintainers:
Distros requiring x86-64-v3 or higher can now choose to disable compiling in the large generic-C Chorba code
by setting the CMake option "WITH_CRC32_CHORBA=OFF", since the pclmul instructions will always be available
on x86-64-v3, and the pclmul optimized crc32 does not depend on the generic Chorba code (unlike the SSE2/SSE4.1 implementations).
Optimizations
Arch-Specific improvements/optimizations
Tests/Benchmarks
- Use CTest to simplify test configuration #2001
- Add tests for crc32_fold_copy functions #2005
- Add benchmark for crc32_fold_copy functions #2008
- Disable benchmark for slide_hash_c with Visual C++ #2009
Misc
- Update README.md and add missing CMake/Configure option descriptions #2000
2.3.0-rc1
Release candidate 1
This is a feature release that introduces several optimizations and improvements.
The biggest addition is the Chorba CRC32 code, a major improvement to crc32 calculation speed for pre-CLMUL CPUs. For now, we have 3 variants of Chorba: Generic, SSE2 and SSE4.1.
We have also removed our detection and usage of the various aligned alloc functions. Because we need to support an application-provided alloc function, we have to check and fix buffer alignments anyway, so now we just use malloc() if none is provided.
The gzopen-related init code has been rewritten to clean up and unify the gzread and gzwrite behavior. Several malloc calls have been removed: the number of places in the gz* code that call malloc is down from 7 to 4 (using gzopen will now only result in 2-3 calls to malloc total).
The reason for releasing 2.3.x instead of another 2.2 release is the introduction of Chorba CRC32, rewritten gzopen init code, the increased CMake version requirement, and the removal of NMake project files. There should not be any API/ABI changes (other than on the previously failing platforms fixed by #1980).
Important fixes/changes
- Remove NMake build projects #1899
- Increase minimum supported CMake version from 3.5.1 to 3.12 #1973
- Don't build C-fallback functions that never get used on x86_64 #1984
- Fix type mismatch on platforms where int32_t and uint32_t use long instead of int #1980
Optimizations
- Implement Chorba CRC32 #1837
- Fix a big endian bug on the 32k and larger specializations of chorba #1891
- Explicit SSE2 vectorization of Chorba CRC method #1872
- SSE4.1 optimized chorba #1893
- Fix function declaration for chorba_small_nondestructive_sse2 #1907
- Fix 32bit large chorba #1912
- [WebAssembly] Fix stack overflow in crc32_chorba_118960_nondestructive. #1915
- Minor optimization of insert_string #1951
- Optimize compress_block() and build_tree() #1954
- Inline bi_reverse #1955
- Inline read_buf and flush_pending #1952
- Inline the CHUNKSIZE function #1974
Arch-Specific improvements/optimizations
- x86
- RISC-V
- ARM
- Provide an inline asm fallback for the ARMv8 intrinsics #1697
- ARM Neon: Fold a copy into the adler32 function for UPDATEWINDOW #1870
- Remove volatile keyword from ARM inline assembler #1908
- Disable NEON workaround on Clang 20 and above, and enable it for non-mobile platforms #1942
- Synchronise ARMv8 and Loongarch CRC32 implementations #1969
- Loongarch64
- Port SSE/AVX optimization to Loongarch64 LSX/LASX Vector Intrinsics #1925
Buildsystem
- Fix -Wunused-command-line-argument warnings on Mac OS X #1967
- Fix -Wstrict-prototypes warnings #1968
- Initial support for nVidia toolchain #1993
- CMake: Make test options dependent on ZLIB_ENABLE_TESTS #1933
- CMake: Allow C17 for newer CMake versions #1958
- CMake: Rename targets to avoid clashes when used as a subproject #1970
- CMake: Rename target files to avoid overwrite of PACKAGE_VERSION #1988
- Configure: Added --installnamedir #1867
- Configure: Add support for RISC-V ZBC extension #1917
Tests/Benchmarks
- Bench: Add benchmark for insert_string. #1956
- Tests: Fix type mismatch with Windows GCC. #1965
- Tests: Fix cast and truncation warnings. #1978 #1979
CI
- CI: Minor fix for s390x CI runner version selection #1886
- CI: Fix broken actions-runner #1929
- CI: Update MacOS toolchain. #1962
- CI: Install Windows 11 SDK 10.0.22621 for 32-bit ARM. #1964
- CI: Use MacOS 14 for GCC UBSAN. #1963
- CI: Update macOS CI images #1971
- CI: Update s390x actions runner #1981
Misc
- Clean up crc32_braid. #1873
- fix the url of the s390x actions worker patch #1882
- port: Use memalign or _aligned_malloc only, when available, fallback to malloc. #1863
- port: Use __cpuid only, when available. #1887
- Use 'block-list' and 'allow-list' terms #1976
- Verify pointers during functable init #1983
2.2.5
This is a bugfix release, backporting a few bugfixes and updating the CI.
Changes
Important fixes
- RiscV: chunkset_rvv: fix SIGSEGV in CHUNKCOPY #1889
- MSVC: Disable optimizations for AVX512 GET_CHUNK_MAG causing inflate failure #1884
- Fix building with runtime CPU detection disabled (native) #1931
- Also check for ZMM support when detecting VPCLMULQDQ support #1932
- Revert "Clean up insert_match() in deflate_medium" due to performance regression #1938
Buildsystem
- Pass POSIX_C_SOURCE for std::aligned_alloc try_compile checks #1896
- X86_AVX512VNNI: check for _mm256_dpbusd_epi32 too #1944
- CMake: Fix incorrect declaration of FORCE_SSE2 #1880
- CMake: Fix CXXFLAGS when coverage enabled #1902
- CMake: Remove late enable_language calls #1903
- CMake: [FreeBSD] Define _XOPEN_SOURCE for gtest_zlib #1900
- CMake: Add bindir into zlib.pc.in for compatibility with Cygwin and Msys2 #1920
- Configure: riscv: add bash configure script support for riscv #1904
Tests/Benchmarks
- Test: Fix pointer type mismatch #1897
- Test: Add large 1mb buffer test for crc32 hashing #1913
- Changes to running benchmark during tests #1892
CI
- CI: Restore support macOS prior 10.15 #1878
- CI: fixes for RISC-V #1890
- CI: Preinstall packages needed for testing and benchmark #1894
- CI: Remove deprecated ubuntu-20.04 image from CI #1898
- CI: Replace deprecated windows-2019 with windows-2022 #1923
Misc
- Add .gitignore to allow run tests with zlib-ng/corpora and local dataset from working copy #1930
2.2.4
This is primarily a bugfix release, fixing a few new and old bugs.
Changes
Important fixes
- Fix potential shift overflow problems reported by static checkers #1859
- VS2015: Fix an unfortunate bug #1862
- RVV: Workaround error G6E97C40B #1853
- s390x: Disable CRC32-VX Extensions for some broken Clang versions #1852
Buildsystem
Tests/Benchmarks
- Add uncompress benchmark #1860
CI
2.2.3
New Years release, zlib-ng's 10 years celebration
My first commit in this repo was on October 8th, 2014, although I do remember that I started during summer vacation and had made the zlib cleanup more of a mess than I wanted, so I restarted from scratch in October when I had gotten a better overview of the code and what I wanted to do to clean it up. At that point zlib-ng was not very likely to go anywhere, but despite the odds, over time several people found it and opened PRs with their own improvements, a few of those became long-time contributors, and a few years ago zlib-ng finally became more than an experimental fork. Zlib-ng has since gained traction and several distros have started replacing stock zlib with zlib-ng in compat mode.
Over the past year we have been lucky enough to receive donations that let us invest in a couple of Rpi5 systems for testing, and we hope to acquire more architectures for development and testing. RISC-V would be interesting, for example, and we still lack a dedicated testing machine capable of AVX512.
Release 2.2.3
This time we have two code fixes for potentially unsafe access, although we have not had any bug-reports about these.
This release also contains several optimizations. Of particular note, inflate has been optimized on various instruction sets, the generic C code has seen improvements, and there are improvements for arches that lack instructions to handle unaligned access, as well as for big endian.
Example benchmarks:
x86-64 AVX2: Inflate ~17.8% faster, Deflate unchanged. -4.6KB library size.
Aarch64: Inflate ~2.3% faster, Deflate unchanged. -5.5KB library size.
We also took some time to do a comprehensive cleanup of the now misleading UNALIGNED_OK option and of all the "unaligned" functions. We have noticed that some distros have been disabling these, fearing they are using potentially unsafe unaligned pointers, but we already fixed that in 2.1.0-beta1. Since then, these "unaligned" settings/functions have referred to using unaligned accesses in safe ways, such as utilizing unaligned intrinsics or memcpy to fix alignment, and selecting the safe method that is optimal for the arch. Disabling them therefore disabled several safe optimizations.
Because this was obviously misleading certain distros into disabling these optimizations, we have cleaned it up, removed a lot of unnecessary preprocessor checks, and made detection of optimal methods happen during compile instead of configure. As a bonus, this cleaned up a lot of code and also let us not compile in many extra variants of compare256/longest_match, saving about 8-10KB of library size.
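As an illustration of what a safe unaligned access looks like, here is a minimal sketch of the memcpy idiom mentioned above. The helper names `load_u32_sketch` and `store_u32_sketch` are hypothetical and this is not zlib-ng's code; the point is only that the value is copied through memcpy rather than read through a possibly misaligned `uint32_t` pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes from any address, aligned or not.
 * Compilers commonly lower this memcpy to a single load instruction on
 * targets that support unaligned loads, and to byte loads elsewhere. */
static uint32_t load_u32_sketch(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical helper: write 4 bytes to any address, aligned or not. */
static void store_u32_sketch(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof(v));
}
```

Both helpers use the host byte order, so a value written with `store_u32_sketch` at an odd offset reads back unchanged with `load_u32_sketch` on either little or big endian systems.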
- PS: s390x is currently potentially unsafe. CI reports a failure on the MSAN test, which is pending investigation by IBM. See #1845.
Update: This is caused by a bug in Clang versions 18 -> 19.1.2, ref: llvm/llvm-project#109113
Any Zlib-ng version on s390x built with VX-extensions enabled and compiled using a buggy Clang version will be unsafe.
- PPS: 32bit ARM windows release dlls failed to automatically compile due to Github Actions upgrading their build images, so unfortunately there are no binaries for that currently. This does not affect self-built binaries. See #1839.
Changes
Fixes for potentially unsafe access
Optimizations / Cleanups
- Allow the compiler to inline chunkcopy_safe more readily #1781
- Misc inflate cleanup #1797
- Reorder variables in inflate functions to reduce padding holes #1803
- Improve chunkset_avx2 performance #1778
- Simplify inflate fast by dispatching to chunkmemset for all chunkcopy cases #1802
- Make an AVX512 inflate fast with low cost masked writes #1805
- Enable AVX2 functions to be built with BMI2 instructions #1816
- Improve pipelining for AVX512 chunking #1821
- Risc-V: adler32_rvv: Fix two overflow problems #1826
- Remove UNALIGNED_OK checks #1828 #1834 #1835 #1830
- Use GCC's may_alias attribute for unaligned memory access #1548
Big Endian
- Make big endians first class citizens again #1831
- Fix "RLE" compression with big endian architectures #1832
Buildsys fixes / minor fixes
- Fix build on aarch64 android. #1783
- Allow overriding CMAKE_CXX_* variables and fix overriding CMAKE_C_* #1787
- Use target include instead of raw include #1784
- Replace non-ascii characters to fix MSVC warning #1791
- Force Visual C++ to treat source files as UTF-8. #1789
- Explicitly set CMake policy 0169 to silence warning #1792
- configure: Fix linker flags for Haiku. #1799
- configure: add --mandir to override $mandir on command line. #1800
- Force use of latest Windows SDK with 32-bit ARM support #1811
- Fix casting warning/error in test_compress_bound.cc #1814
- Remove unused HAVE_CHUNKMEMSET_1 define #1815
- Fix native detection of ARM CRC instruction #1818
- Address deprecated cmake version warning. #1812
- Add a fallback to ALIGNED_ macro for other compilers #1820
- added in-tree build artifacts to .gitignore #1823
- Fix typos #1825
CI
2.2.2
This release fixes a corruption bug in the inflateBack implementation. It was
detected by Docker using pigz w/zlib-ng to decompress a 25GB image file and failing CRC.
Since this is so far the only known way to trigger the bug, it seems to be hard to hit.
Most of the rest are minor changes to avoid triggering warnings in MSVC or optional
warnings in other compilers, as well as a few minor fixes to the buildsystem and CI.
Changes
Important Fixes
- Don't use chunkunroll for inflateBack #1773
Buildsystem
- Enable warning C4242 and treat warnings as errors for Visual C++. #1768
- Fixed false positive HAVE_ARMV6_INTRIN value on old ARM platforms #1774
CI/Test
- Upgrade MacOS ARM64 UBSAN to use gcc-13. #1763
Misc
- Fix compiler warnings #1762 #1764 #1765
- Fix new Windows SDK build break #1771
- Prepare to make use of unaligned loads on big endian in insert_string #1695
S390x
- IBM zSystems: Hardcode HWCAP_S390_VXRS #1766
RISC-V
- Better run-time detection of RVV vector instruction support #1770
2.1.8
This backport release fixes a corruption bug in the inflateBack implementation. It was
detected by Docker using pigz w/zlib-ng to decompress a 25GB image file and failing CRC.
Since this is so far the only known way to trigger the bug, it seems to be hard to hit.
Most of the rest are minor changes to avoid triggering warnings in MSVC or optional
warnings in other compilers, as well as a few minor fixes to the buildsystem and CI.
Other changes
Important Fixes
- Don't use chunkunroll for inflateBack #1773
Buildsystem
- Enable warning C4242 and treat warnings as errors for Visual C++. #1768
- Fixed false positive HAVE_ARMV6_INTRIN value on old ARM platforms #1774
- Don't use zlib-ng's -Wl,--version-script in tests #1750
CI/Test
- Upgrade MacOS ARM64 UBSAN to use gcc-13. #1763
Misc
S390x
- IBM zSystems: Hardcode HWCAP_S390_VXRS #1766
RISC-V
- Better run-time detection of RVV vector instruction support #1770
2.2.1
This is the first stable release of the 2.2.x branch.
Please read the changelog for the 2.2.0 Release Candidate if you didn't already, especially if your software gives zlib-ng a custom allocator.
No bugreports came in during 2.2.0 RC testing, so the only change in 2.2.1 is a small fix for Configure that was already in the pipe:
- Configure: Don't use zlib-ng's -Wl,--version-script in tests #1750