
Conversation

@Hooverdan96
Member

Not quite fixing #2650
This is a draft attempt to move the remaining supervisord services over to systemd.

  • I split the remaining services into individual services.
  • The gunicorn service is a pre-req to the remaining former supervisord services (since they were set to priority 200 vs. gunicorn's priority 100) - see the ordering sketch after this list.
  • Because of that I removed the rockstor.service entirely - that could/should probably look different in the end, so as not to impact all documentation, etc.
  • Updated the data_collector file to reflect the changed log file paths (removed the supervisord prefix in the new services).
  • A couple of files would still have to be deleted (the supervisord.conf file for one).
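
For illustration, the old supervisord priorities map roughly onto systemd ordering/dependency directives on the dependent units. A minimal sketch only - the unit names, paths, and directives below are placeholders, not the actual files in this PR:

[Unit]
Description=Rockstor Data Collector Service
# Formerly priority 200 under supervisord: only start once the gunicorn unit is up.
After=rockstor-gunicorn.service
Requires=rockstor-gunicorn.service

[Service]
Type=simple
ExecStart=/path/to/data-collector-command
Restart=on-failure

[Install]
WantedBy=multi-user.target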

The installation on an existing VM seemed to work; I am getting the initial setup screen and can set up the instance.
However, and I am not sure whether that's related, I'm encountering this error when trying to select an update channel (based on the popup):

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/system/osi.py", line 262, in run_command
    p = subprocess.Popen(
        ^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib64/python3.11/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/yum'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/command.py", line 262, in post
    return Response(rockstor_pkg_update_check(subscription=subo))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/pkg_mgmt.py", line 327, in rockstor_pkg_update_check
    version, date = rpm_build_info(pkg)
                    ^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/pkg_mgmt.py", line 109, in rpm_build_info
    o, e, rc = run_command([YUM, "info", "installed", "-v", pkg])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/osi.py", line 279, in run_command
    raise Exception("Exception while running command({}): {}".format(cmd, e))
Exception: Exception while running command(['/usr/bin/yum', 'info', 'installed', '-v', 'rockstor*x86_64']): [Errno 2] No such file or directory: '/usr/bin/yum'

I ran the tests:

...
~ #: poetry run django-admin test -v 2
Ran 295 tests in 67.721s

OK
Destroying test database for alias 'default' ('test_storageadmin')...
Destroying test database for alias 'smart_manager' ('test_smartdb')...

I am making this a draft PR while gathering feedback on whether this is the right approach, whether @phillxnet already has it completely in the works, or ...

Trying to do this with training wheels 😄 so any pointers would be appreciated.

@phillxnet
Member

@Hooverdan96 Nice to see progress on this front. I had assigned myself to the linked issue but no worries. Plus I was going for more of an ExecStart-type approach, i.e. from the issue text:

It is proposed that these be incorporated into rockstor.service via multiple ExecStart entries.

But maybe we can handle more services, after all there isn't a limit on them. And maybe it's better that we do separate out all our concerns. I can't take a look, or have more of a think, just yet however, as I do want to get our rockon-validator into good-enough shape for regular use. That way we will have more time for core etc., and the Rock-ons, now quite numerous and likely growing in that regard, can be more regimented/predictable - and hopefully easier to review.

Thanks again for stepping up to this one.

@FroggyFlox how do you feel about stacking up yet-more systemd services: I'm not against it in principle myself, I was just a bit concerned about over-doing it. But it does isolate our processes more.

@phillxnet
Member

@Hooverdan96 Re:

Exception: Exception while running command(['/usr/bin/yum', 'info', 'installed', '-v', 'rockstor*x86_64']): [Errno 2] No such file or directory: '/usr/bin/yum'

That looks like an old code-base: we don't use yum anymore!! Or at least I had thought I removed all references to it.

See: Remove dnf-yum use #2979 #2985

@phillxnet
Member

Also the name "Rockstor Gunicorn Service" will age badly re:

Move to ASGI using Daphne #2941

@Hooverdan96
Member Author

Hooverdan96 commented Sep 23, 2025

Also the name "Rockstor Gunicorn Service" will age badly re:

Ah yes, maybe then this one could be renamed to rockstor.service, and its contents would change as Daphne becomes relevant; that way forum troubleshooting and other documentation is not as much affected (of course, combining all of them into the rockstor.service unit could also be the solution).

Re: the multi-exec vs. individual services, I was thinking that since quite a few things are modular it would make sense to separate these out as well, or group them more logically? E.g. the gunicorn and the ztask/huey together, and the replication and data-collector separately, as those two are slightly "less" core; but in terms of making other adjustments to parameters for the individual services, I figured it would be easier looking forward. But, either way...

@FroggyFlox
Member

@FroggyFlox how do you feel about stacking up yet-more systemd services: I'm not against it in principle myself, I was just a bit concerned about over-doing it. But it does isolate our processes more.

I was wondering the same thing:
Pros
More isolation as you indicated, but is it actually an advantage? Maybe it would allow us to keep a replication going even when our wsgi/asgi server crashed, but isn't that already the case?

Cons
I'm worried it would break a few areas in our codebase: I do believe we expect the service to be named rockstor.service in our "Services" areas, for instance. Now, we can still split services but keep the name rockstor.service; but if we trigger a toggle from the webUI, we may not get what we expect...

Still thinking about it...

@Hooverdan96
Member Author

Hooverdan96 commented Sep 23, 2025

@FroggyFlox

I do believe we expect the service to be named rockstor.service

I was just adjusting my comment above to suggest naming at least one of the services (i.e. gunicorn) rockstor.service again.

@Hooverdan96
Member Author

Maybe it would allow us to keep a replication going even when our wsgi/asgi server crashed

From an isolation-of-concerns standpoint, wouldn't we want to minimize the processes that need to be restarted if one of them fails, like in your example above? My thinking is that at least the wsgi/asgi, as a core part of the whole thing, should be isolated/addressable in a targeted manner. If new services were to be added in the future, should they then also be added to a consolidated service unit for consistency, making that piece grow over time? I am not opposed to any approach (nor am I the most knowledgeable here), just wanted to clarify my thinking on this.

@phillxnet
Member

We do also have the lesser-known https://github.com/rockstor/rockstor-core/blob/master/conf/rockstor-hdparm.service service - it's not central and does not feature in the dependency cascade we have regarding the others. But it is a separation of concerns, and a less referenced service where we have split-out stuff.

As per discussion to-date, we should keep the key rockstor.service. We don't really want to invalidate all forum and doc entries. And the top level bootstrap should also remain top-level (mentioned for context here only).

Let's see what it looks like with these changes made. We are, after all, only adding the following here (assuming no rockstor.service renaming):

  • rockstor-data-collector.service
  • rockstor-ztask-daemon.service

The latter should be renamed also, as it has relevance only to the internal service name (maintained for simplicity), and the Huey we currently use for what ztask used to do will in time be replaced by a Django-native scheduler. So we likely want this to reference scheduling rather than the technology used to do it. I.e. we were thinking of moving our scheduling manager to Django Q2, which I have some familiarity with. But @FroggyFlox has noticed a post-Django-5.2 move to bring such things into the fold of native Django. But that may take longer than we have for such a move, so we may end up using Django Q2 first - likely much closer to what will later become a native/built-in Django capability.

Where an ExecStart entry, were we to be using Django-Q2, would look something like:

ExecStart=/path/to/poetry run python manage.py qcluster
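
A minimal sketch of what such a scheduling unit might look like, were we to go the Django-Q2 route - the unit name, paths, and working directory below are assumptions for illustration only:

[Unit]
Description=Rockstor Scheduling Service
After=rockstor.service
Requires=rockstor.service

[Service]
Type=simple
# Paths are placeholders; the project directory and poetry location would come from our install layout.
WorkingDirectory=/path/to/rockstor
ExecStart=/path/to/poetry run python manage.py qcluster
Restart=on-failure

[Install]
WantedBy=multi-user.target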

@FroggyFlox
Member

From an isolation-of-concerns standpoint, wouldn't we want to minimize the processes that need to be restarted if one of them fails, like in your example above? My thinking is that at least the wsgi/asgi, as a core part of the whole thing, should be isolated/addressable in a targeted manner.

I do think so, yes. We just need to make sure we keep together/linked what needs to be linked. Our WSGI server and Nginx may benefit from being linked as they're interdependent, although not necessarily both ways (if Nginx crashes, we can't access the UI served by WSGI anyway, I think, but if WSGI crashes, then Nginx can still serve all other connections it currently handles), but replication itself might not, so it could be left alone if WSGI/Nginx needs restarting. On the other hand, we do our replication using a websocket or a constant communication channel that is probably routed by Nginx (I can't remember for sure), so replication may be interrupted as well if one needs to restart Nginx. But again, the actual btrfs process(es) involved may still run, but would we be able to "catch" them again if the replication service is restarted?
I really do need to refresh myself on how all our services (to-be) are connected 🫤.

If new services were to be added in the future, should they then also be added to a consolidated service unit for consistency, making that piece grow over time?

Good point. I guess we need to clearly make sure of what is considered "core" and what isn't.

The latter should be renamed also as it has relevance only to the internal service name (maintained for simplicity), and the Huey we currently use for what ztask used to do, will in time be replaced by a Django native scheduler. So we likely want this to reference scheduling rather than the technology used to do it.

Good point!

@Hooverdan96
Member Author

Alright, as you can see from the recent pushes, I:

  • reinstated rockstor.service, only containing the Server Gateway Interface (currently gunicorn) - also genericized the log file names (using sgi instead of the technology used)
  • renamed ztask to a generic scheduling service to facilitate changes to the underlying technology (e.g. eventually moving from huey to Django's native scheduler)

Re: the dependency of nginx and (W/A)SGI, should that be reflected in the services? I believe today it is not, but that is probably due to the split usage of supervisord & systemd. Should the SGI (aka rockstor.service) have something like this?

[Unit]
Description=Rockstor SGI Service
After=network.target nginx.service
Requires=nginx.service

[Service]
Type=simple
...
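
For comparison, a softer, one-way dependency (along the lines of the earlier point that the two do not necessarily depend on each other both ways) could be expressed with Wants= instead of Requires= - a sketch only, nothing settled here:

[Unit]
Description=Rockstor SGI Service
After=network.target nginx.service
# Wants= pulls nginx in at start-up but does not stop or fail this unit if nginx goes away.
Wants=nginx.service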

@Hooverdan96
Member Author

I'm encountering this error when trying to select an update channel

I think you are correct that this should not have occurred at all; it's probably an issue with me using rsync to move stuff from a git-managed repository locally, where it possibly picked up artefacts it wasn't supposed to. I will look at that process more closely before I run my next round of tests. Interestingly, it did use the changes from supervisord to systemd, so it's very strange ...

@Hooverdan96
Member Author

When looking at adding the socket file as part of the move to systemd, I have noticed that supervisord essentially uses the example 'dummy' user (used in various examples on the supervisord repo) for both the supervisor server as well as the socket. Looking at the related processes running in a current installation, it seems all of them are owned by root, so I assume that for the .socket file I should also use root as the user.

@phillxnet
Member

@Hooverdan96 Can you expand on this:

When looking at adding the socket file as part of the move to systemd

It would be nice to have more sophistication on this front but I was just unclear on your particular aim here.

@Hooverdan96
Member Author

Let's see whether I can make it clearer 😄. The supervisord.conf has these two sections:

[unix_http_server]
file=/var/run/supervisor.sock
; https://github.com/Supervisor/supervisor/issues/717
chmod = 0700
username = dummy
password = dummy

and

[supervisorctl]
serverurl=unix:///var/run/supervisor.sock
username = dummy
password = dummy

show the usage of dummy for the user/password combo. From everything I've read, these should be real Linux users, but that user does not exist in any of the openSUSE images.

The above portion seems to be taken directly from the example referenced in the above conf file:

Supervisor/supervisor#717 (comment)

and there it is discussed why one would use a dummy user. In other setup procedures I've read about, the user/pwd combos are set using real Linux user IDs, with the justification of integrating with nginx. However, in this instruction set (which is also referenced in the supervisord.conf):

; Gunicorn
; TODO Move to using systemd socket invocation of a systemd service.
; See: "Step 7 — Creating systemd Socket and Service Files for Gunicorn"
; https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu-18-04#step-7-creating-systemd-socket-and-service-files-for-gunicorn
[program:gunicorn]
command=/opt/rockstor/.venv/bin/gunicorn --config ./conf/gunicorn.conf.py wsgi:application

https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu#step-7-creating-systemd-socket-and-service-files-for-gunicorn

no users are specified for the systemd service/socket.

In the gunicorn documentation the user is included again to enable the nginx integration:

https://docs.gunicorn.org/en/latest/deploy.html

In Rockstor's nginx conf I don't see a mention of a designated user.

Hence my conclusion that all is running/connecting via root. This could be wrong of course, since I am not the expert in this area. So my aim was to get confirmation that it should be ok to have the corresponding systemd services run as the root user, or to get additional thoughts from the development braintrust.
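
For illustration, the [program:gunicorn] entry quoted above would translate into a systemd unit along the following lines, with no User=/Group= set so it keeps running as root like today. This is only a sketch under those assumptions (the working directory and unit name are guesses), not the exact unit proposed in this PR:

[Unit]
Description=Rockstor SGI (gunicorn) Service
After=network.target

[Service]
Type=simple
# Assumed working directory, so the relative --config path from the supervisord entry still resolves.
WorkingDirectory=/opt/rockstor
# Command taken from the existing supervisord.conf [program:gunicorn] entry.
ExecStart=/opt/rockstor/.venv/bin/gunicorn --config ./conf/gunicorn.conf.py wsgi:application
Restart=on-failure

[Install]
WantedBy=multi-user.target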

@phillxnet
Member

phillxnet commented Sep 26, 2025

@Hooverdan96 I think I remember something about supervisord having a Web-UI; which we have never used. I think the user/pass reference in that file relates to this UI component of supervisord.

As for a user in systemd, yes that would be nice - but it is way too much to take on in this PR, as it would mean a complete system-wide change to all that we do - we, after all, run as root to manage a system at that level. Ideally, and we have plans, we want to run what we can under, say, an un-privileged rockstor user. But that is a way off as yet - and we would need a lot of changes to handle that element of things.

Take a look at the supervisor config re user/pass to check on the above. I think I remember having to add that a few years ago as it became required; whereas previously it was not, and there were some concerns at that time re default Web-UI access to supervisord via the default user/pass.

[EDIT] This was the change I was referencing re dummy dummy:

  • update supervisord.conf re upstream advice re user and dummy pass.

See: Replace Buildout with Poetry #2432 #2434

And these are the changes I made back-then: https://github.com/rockstor/rockstor-core/pull/2434/files#diff-2a61879e1360255f36ef99617d00c5f3084b48e17af87e8595c75bae6bc51e28

@phillxnet
Member

Socket invocation of gunicorn is not likely of interest to us currently. Especially given we are, here, moving from supervisord to systemd. So given that could have enough ramifications on its own (threading etc.), we likely want to keep things as contained as we can and stick to straight service invocation, not adding in on-demand start-up via socket activation. I'm not averse to that later maybe, and have done this in another project via a systemd socket listener, but against this issue we want, ideally, a like-for-like I think.

@Hooverdan96
Member Author

Hooverdan96 commented Sep 26, 2025

but it is way too much to take on in this PR

Yes, that was not my intent, for sure. So I will stick to the "no user" approach, i.e. it will run as root for now.

Socket invocation of gunicorn is not likely of interest to us currently.

Pardon my ignorance here, but are you saying I should not consider creating a socket service?

@FroggyFlox
Member

I should not consider creating a socket service?

That is my thinking as well, yes. We currently directly bind/listen to localhost on port 8000 instead of using a socket, so we might as well keep it that way for this PR. See the relevant bit of config:
https://github.com/rockstor/rockstor-core/blob/testing/conf%2Fgunicorn.conf.py#L4

It seems the socket we currently have set in supervisord is only used for supervisorctl for now, isn't it?
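
For reference only, and not what is being proposed here: systemd socket activation of the kind the DigitalOcean tutorial describes would involve a paired .socket unit roughly like the sketch below, with systemd listening on the socket and starting the matching service on the first connection. All names and paths are hypothetical:

[Unit]
Description=Rockstor SGI socket (hypothetical)

[Socket]
# Placeholder path; systemd would listen here and hand connections to the paired service.
ListenStream=/run/rockstor-sgi.sock

[Install]
WantedBy=sockets.target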

add service files for gunicorn, replication, data-collector and huey.
update data_collector.py log paths for gunicorn
deprecate supervisord = rockstor.service
add individual services formerly managed by supervisord
adjust service units and scripts to refer back to rockstor.service
update log file references to be more generic in prep for moving from wsgi to another asgi
rename file to focus on task rather than technology. Align initrock.py with changed service name
Comment on lines 71 to 73
"rockstor-replication",
"rockstor-scheduling",
"rockstor-data-collector",
Member

phillxnet commented Sep 27, 2025


@Hooverdan96 We don't have these integrated into our DB model etc. Likely we can leave them out for now. We have the others known to our backend, so adding these in here will likely throw test errors - or be inconsistent. Unless you were planning to surface these also within the Web-UI? I don't think that is a requirement in dropping supervisord however. I.e. we didn't surface its services.

But on the other hand, we may still have some supervisord commands that will need replacing with the new equivalents you've introduced. In which case these test entries, and potential accompanying tests, will then cover the in-coming supervisord command replacements.

Apologies, just nipping in with a quick look and saw these. I know this is all in-draft.

Member


OK, just noticed you have begun work on the supervisord to systemd replacement in-code :).

elif service_name == "sftp":
# Delegate sshd's sftp subsystem status check to system.ssh.py call.
return is_sftp_running(return_boolean=False)
elif service_name in ("replication", "data-collector", "ztask-daemon"):
Member

phillxnet commented Sep 27, 2025


Take care with the ztask-daemon service rename: I seem to remember there were DB elements still using this old name, which I left as-was (with the ztask to huey move) due to the depth of the changes required. But maybe it's better we get this name-change over and done with. Especially given your move to a generic name now. My concern revolves around DB names within our services model. Again apologies for just nipping in on a draft here.

@phillxnet
Member

@Hooverdan96 & @FroggyFlox Nice to see this task finally under-way now. It will be a relief to cast off the now rather ancient approach of supervisord. But I do worry about this move uncovering some dragons re threads/workers etc. But all good really. Plus this is the perfect time given we are embarking on a new testing phase.

@Hooverdan96
Member Author

@phillxnet no apologies needed, if I didn’t want help I would not have pushed a draft here. I am welcoming the hints and highlights of pitfalls so I don’t get too stuck in the wrong direction …
So please have at it.

@Hooverdan96
Member Author

Not for this PR, but after reading more on systemd: would it make sense to look at the cgroup concept once the transition to systemd is complete for everything? You might have mentioned that in another issue, I just don't remember.

@phillxnet
Member

@Hooverdan96 Re cgroups, I've no idea currently. But if it means all our stuff can be grouped re compute, then yes; as it could well help us define neater SELinux rules. And agreed, this would make for a nice issue-PR set - once we are done with a complete transition to systemd. Which it looks like we are heading for in this next testing phase.

@Hooverdan96
Member Author

Closing this draft one in favor of a cleaned up (and properly based) PR #3034

@Hooverdan96 Hooverdan96 closed this Oct 5, 2025
@Hooverdan96 Hooverdan96 deleted the 2650_move_to_systemd branch October 5, 2025 22:59
Hooverdan96 added a commit to Hooverdan96/rockstor-core that referenced this pull request Oct 17, 2025
- remove supervisor references and config
- genericize scheduling service (currently huey, formerly ztaskd)
- add new corresponding services, update references in various files
- reflect change to systemd in relevant test cases
- harmonize rockstor-specific service unit nomenclature
- genericize log files
- update poetry project toml and lock file (commenting out supervisor for future removal)
- changes based on prior PRs rockstor#3031 & rockstor#3034