Description
What happened?
I was recently made aware of this configuration line (contributed Oct 2016), at line 20 of `cri-o/contrib/systemd/crio.service` (commit 91816d7):

LimitNOFILE=1048576
Quite a bit has changed since then, notably with the systemd v240 release in 2018Q4. Both the Docker and containerd projects have recently removed the line from their configs to rely on the 1024:524288 default that systemd v240 provides (unless the system has been configured explicitly to some other value, which a system administrator may do when they know they need higher limits).
You can find insights in those PRs, along with a third link to the Envoy project (as an example of popular software that presently does not raise its soft limit or document that requirement, but has depended upon this implicit config in the environment), where the linked comment details why the soft limit should be 1024 to avoid software incompatibility:
- fix: Normalize `RLIMIT_NOFILE` (`LimitNOFILE`) to sensible defaults (moby/moby#45534)
- Remove `LimitNOFILE` from `containerd.service` (containerd/containerd#8924)
- Support raising the soft limit (envoyproxy/envoy#31502 (comment))
- Set containerd LimitNOFILE to recommended value (awslabs/amazon-eks-ami#1535 (comment))
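The 1024:524288 default mentioned above can be verified on a given host; a quick sketch, assuming systemd v240+ and that your systemd build exposes the soft-limit properties under these names:

# Manager-wide defaults inherited by services that do not set LimitNOFILE themselves:
$ systemctl show --property DefaultLimitNOFILE --property DefaultLimitNOFILESoft
# What the crio unit is currently configured with:
$ systemctl show crio --property LimitNOFILE --property LimitNOFILESoft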
This issue is raised to suggest applying the same change here.
Either:
- Remove the line, as Docker and containerd have done.
- Include `LimitNOFILE=1024:524288` with a contextual comment, although it may be better to remove the line and rely implicitly on the system default.
- Admins / users who need higher limits could also use a drop-in service override unit (see the sketch after this list).
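As a sketch of the drop-in approach (the file name `10-limitnofile.conf` and the 1024:524288 value here are illustrative, not a recommendation):

$ sudo mkdir -p /etc/systemd/system/crio.service.d
$ printf '[Service]\nLimitNOFILE=1024:524288\n' | sudo tee /etc/systemd/system/crio.service.d/10-limitnofile.conf
$ sudo systemctl daemon-reload && sudo systemctl restart crio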
What did you expect to happen?
For LimitNOFILE to have a soft limit of 1024, so that software running in a container operates with the same environment defaults as the host system.
Raising the default soft limit should be done explicitly by the admin, or implicitly by the process that needs it (see the Python reproduction below for an example of this).
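A rough illustration of the latter: a hypothetical program that knows it needs more file descriptors raises its own soft limit at startup rather than relying on an inflated service-manager default (the `wanted` value is an arbitrary example):

import resource

# Query the limits granted by the environment (1024:524288 under the systemd v240+ default),
# then raise only the soft limit to what this particular program needs.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
wanted = 65536  # hypothetical requirement of this example program
if hard == resource.RLIM_INFINITY or hard >= wanted:
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
else:
    # Hard limit is lower than the requirement, take what we can get
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))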
How can we reproduce it (as minimally and precisely as possible)?
Commands
I am not familiar with CRI-O, but the equivalent Docker commands demonstrate the difference (which for LimitNOFILE=1048576 can be more subtle; for example, postsrsd would take <500ms vs 8 minutes):
# Demonstrating the impact on a python process with `LimitNOFILE=1048576`:
$ docker run --rm -it --ulimit "nofile=1048576" --volume './python_close_individual.py:/tmp/test.py' python:3.12-alpine3.19 ash -c 'time python3 /tmp/test.py 1048570 100'
115.63215670500358
real 1m 56.75s
user 0m 4.08s
sys 1m 51.62s
# For this test example, the first parameter (number of FDs to open/close per process) is sufficient to alter the iteration behaviour.
# If you lower `--ulimit` instead, the program will fail to open FDs outside the hard limit.
# Much faster than 2 minutes, only 145ms.
$ docker run --rm -it --ulimit "nofile=1048576" --volume './python_close_individual.py:/tmp/test.py' python:3.12-alpine3.19 ash -c 'time python3 /tmp/test.py 1024 100'
0.1456817659927765
real 0m 0.24s
user 0m 0.18s
sys 0m 0.06s
# Fedora 35 (for comparison to next snippet results)
$ docker run --rm -it --ulimit "nofile=1048576" --volume './python_close_individual.py:/tmp/test.py' fedora:35 bash -c 'dnf install -y python3 && time python3 /tmp/test.py 1048570 100'
114.43512667200412
real 1m55.261s
user 0m3.282s
sys 1m50.847s

# fedora:34 uses a version of Python (3.9) with a less optimized `closerange()` call
# fedora:35 uses Python 3.10, which can use a faster syscall when available (requires glibc 2.34+ in the container and host kernel 5.9+)
$ docker run --rm -it --ulimit "nofile=1048576" --volume './python_close_range.py:/tmp/test.py' fedora:35 bash -c 'dnf install -y python3 && time python3 /tmp/test.py'
real 0m0.015s
user 0m0.000s
sys 0m0.014s
$ docker run --rm -it --ulimit "nofile=1048576" --volume './python_close_range.py:/tmp/test.py' fedora:34 bash -c 'dnf install -y python3 && time python3 /tmp/test.py'
real 0m6.268s
user 0m1.950s
sys 0m4.319s
# Alpine as of Jan 2024 does not have compatibility for the better closerange syscall like fedora:35+ does:
$ docker run --rm -it --ulimit "nofile=1048576" --volume './python_close_range.py:/tmp/test.py' python:alpine ash -c 'time python3 /tmp/test.py'
real 0m 6.81s
user 0m 2.55s
sys 0m 4.26s

Sources
python_close_individual.py:
import os, subprocess, sys, timeit
from resource import *

# Get the soft and hard limits from the environment and raise the soft limit to the hard limit
soft, hard = getrlimit(RLIMIT_NOFILE)
setrlimit(RLIMIT_NOFILE, (hard, hard))

# CLI args: number of FDs to open and how many times to run the bench method
num_fds, num_iter = map(int, sys.argv[1:3])
for i in range(num_fds):
    os.open('/dev/null', os.O_RDONLY)

# Spawn a subprocess that inherits the FDs opened (which will close them internally).
# Do this N times to demonstrate the impact:
# https://docs.python.org/3/library/timeit.html
# `subprocess.run()` calls Popen, which by default (close_fds=True) closes each FD from 3 upwards individually:
# https://docs.python.org/3/library/subprocess.html#popen-constructor
# > If `close_fds` is `true`, all file descriptors except 0, 1 and 2 will be closed before the child process is executed.
print(timeit.timeit(lambda: subprocess.run('/bin/true'), number=num_iter))

python_close_range.py:
import os

# Raise repetition to emulate a more intensive task
num_iter = 100

# Close all FDs after the third up to the max, a common initialization practice for daemons.
# The faster call with the fedora:35 image is constant, avoiding iteration over a potentially
# large range of FDs, each with an individual `close()` call.
for i in range(num_iter):
    os.closerange(3, os.sysconf("SC_OPEN_MAX"))

Reproduction references:
- [BUG] `ENABLE_SRS=1` causing high CPU usage with `postsrsd` (docker-mailserver/docker-mailserver#2722 (comment))
- os.closerange optimization (python/cpython#57997)
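To sanity-check what the proposed default would look like from inside a container (reusing the Python image from the commands above; the printed tuple follows directly from the requested ulimit):

$ docker run --rm --ulimit "nofile=1024:524288" python:3.12-alpine3.19 python3 -c 'import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))'
(1024, 524288)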
Anything else we need to know?
While containerd has yet to publish a release with this change AFAIK (it should be scheduled for v2.0), AWS eagerly adopted the change and promptly reverted it due to customer feedback, as some software failed to request a higher soft limit for itself (some AWS-specific software and Envoy are known examples).
AWS can provide a higher LimitNOFILE configuration if that better suits their users (despite the referenced 1024 soft limit concerns, or the difficult-to-troubleshoot issues with LimitNOFILE=infinity), but that should be a vendor decision while projects like CRI-O actually fix the bug.
LimitNOFILE=1048576 is not as bad as LimitNOFILE=infinity, however:
- This concern still applies: Support raising the soft limit (envoyproxy/envoy#31502 (comment))
- Software such as MySQL has been known to allocate excessive memory based on this limit; this would be 1,000x less than with `infinity`, but affected deployments would still be allocating 1,000x more than they may need. The Java runtime was also identified as another culprit.
- Software such as PostSRSd, Fail2Ban, and Rsyslog is similarly affected (see the postsrsd reproduction reference above).
- RPM package managers:
  - `yum` (NOTE: PowerDNS had to work around 6-hour image build times, however that was due to a `2^30` limit, not `2^20`)
  - `zypper` (NOTE: with `LimitNOFILE=1048576` operations take 30-60 minutes when they could be much faster)
  - `dnf`
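For anyone triaging this on a running node, the limits the crio process actually received can be read from procfs (a sketch, assuming `pidof` is available and the service is running):

$ grep 'Max open files' /proc/$(pidof crio)/limits
# With the current unit file this should report 1048576 for both the soft and hard limit.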
CRI-O and Kubernetes version
N/A
OS version
N/A
The test reproduction environment was WSL2 (Ubuntu); previously Arch Linux and Fedora were used.