
Conversation

@Hooverdan96
Member

Not quite fixing #2650
This is a draft attempt to move the remaining supervisord services over to systemd.

  • I split the remaining services into individual services.
  • The gunicorn service is a pre-req to the remaining former supervisord services (since they were set to priority 200 vs. gunicorn's priority 100) - see the ordering sketch after this list.
  • Because of that I removed the rockstor.service entirely - that could/should probably look different in the end, so as not to impact all documentation, etc.
  • Updated the data_collector file to reflect the changed log file paths (removed the supervisord prefix in the new services).
  • A couple of files would still have to be deleted (the supervisord.conf file for one).
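
For illustration, the old supervisord priorities map roughly onto systemd ordering/dependency directives on the dependent units. A minimal sketch only - the unit names, paths, and directives below are placeholders, not the actual files in this PR:

[Unit]
Description=Rockstor Data Collector Service
# Formerly priority 200 under supervisord: only start once the gunicorn unit is up.
After=rockstor-gunicorn.service
Requires=rockstor-gunicorn.service

[Service]
Type=simple
ExecStart=/path/to/data-collector-command
Restart=on-failure

[Install]
WantedBy=multi-user.target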

The installation on an existing VM seemed to work; I am getting the initial setup screen and can set up the instance.
However, and I am not sure whether that's related, I'm encountering this error when trying to select an update channel (based on the popup):

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/system/osi.py", line 262, in run_command
    p = subprocess.Popen(
        ^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib64/python3.11/subprocess.py", line 1955, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/yum'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/command.py", line 262, in post
    return Response(rockstor_pkg_update_check(subscription=subo))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/pkg_mgmt.py", line 327, in rockstor_pkg_update_check
    version, date = rpm_build_info(pkg)
                    ^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/pkg_mgmt.py", line 109, in rpm_build_info
    o, e, rc = run_command([YUM, "info", "installed", "-v", pkg])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/osi.py", line 279, in run_command
    raise Exception("Exception while running command({}): {}".format(cmd, e))
Exception: Exception while running command(['/usr/bin/yum', 'info', 'installed', '-v', 'rockstor*x86_64']): [Errno 2] No such file or directory: '/usr/bin/yum'

I ran the tests:

...
~ #: poetry run django-admin test -v 2
Ran 295 tests in 67.721s

OK
Destroying test database for alias 'default' ('test_storageadmin')...
Destroying test database for alias 'smart_manager' ('test_smartdb')...

I am making this a draft PR while gathering feedback on whether this is the right approach, whether @phillxnet already has it completely in the works, or ...

Trying to do this with training wheels 😄 so any pointers would be appreciated.

@phillxnet
Member

@Hooverdan96 Nice to see progress on this front. I had assigned myself to the linked issue but no worries. Plus I was going for more of an ExecStart-type approach, i.e. from the issue text:

It is proposed that these be incorporated into rockstor.service via multiple ExecStart entries.

But maybe we can handle more services, after all there isn't a limit on them. And maybe it's better that we do separate out all our concerns. I can't take a look, or have more of a think, just yet however, as I do want to get our rockon-validator into good-enough shape for regular use. That way we will have more time for core etc., and the Rock-ons, now quite numerous and likely growing in that regard, can be more regimented/predictable - and hopefully easier to review.

Thanks again for stepping up to this one.

@FroggyFlox how do you feel about stacking up yet-more systemd services: I'm not against it in principle myself, I was just a bit concerned about over-doing it. But it does isolate our processes more.

@phillxnet
Member

@Hooverdan96 Re:

Exception: Exception while running command(['/usr/bin/yum', 'info', 'installed', '-v', 'rockstor*x86_64']): [Errno 2] No such file or directory: '/usr/bin/yum'

That looks like an old code-base: we don't use yum anymore!! Or at least I had thought I removed all references to it.

See: Remove dnf-yum use #2979 #2985

@phillxnet
Member

Also the name "Rockstor Gunicorn Service" will age badly re:

Move to ASGI using Daphne #2941

@Hooverdan96
Member Author

Hooverdan96 commented Sep 23, 2025

Also the name "Rockstor Gunicorn Service" will age badly re:

Ah yes, maybe then this one could be renamed to rockstor.service, and its contents would change as Daphne becomes relevant; that way forum troubleshooting and other documentation is not as much affected (of course, combining all of them into the rockstor.service unit could also be the solution).

Re: the multi-exec vs. individual services, I was thinking that since quite a few things are modular it would make sense to separate these out as well, or group them more logically? E.g. the gunicorn and the ztask/huey together, and the replication and data-collector separately, as those two are slightly "less" core; but in terms of making other adjustments to parameters for the individual services, I figured it would be easier looking forward. But, either way...

@FroggyFlox
Member

@FroggyFlox how do you feel about stacking up yet-more systemd services: I'm not against it in principle myself, I was just a bit concerned about over-doing it. But it does isolate our processes more.

I was wondering the same thing:
Pros
More isolation as you indicated, but is it actually an advantage? Maybe it would allow us to keep a replication going even when our wsgi/asgi server crashed, but isn't that already the case?

Cons
I'm worried it would break a few areas in our codebase: I do believe we expect the service to be named rockstor.service in our "Services" areas, for instance. Now, we can still split services but keep the name rockstor.service; but if we trigger a toggle from the webUI, we may not get what we expect...

Still thinking about it...

@Hooverdan96
Member Author

Hooverdan96 commented Sep 23, 2025

@FroggyFlox

I do believe we expect the service to be named rockstor.service

I was just adjusting my comment above to suggest naming at least one of the services (i.e. gunicorn) rockstor.service again.

@Hooverdan96
Member Author

Maybe it would allow us to keep a replication going even when our wsgi/asgi server crashed

From an isolation-of-concerns standpoint, wouldn't we want to minimize the processes that need to be restarted if one of them fails, like in your example above? My thinking is that at least the wsgi/asgi, as a core part of the whole thing, should be isolated/addressable in a targeted manner. If new services were to be added in the future, should they then also be added to a consolidated service unit for consistency, making that piece grow over time? I am not opposed to any approach (nor am I the most knowledgeable here), just wanted to clarify my thinking on this.

@phillxnet
Member

We do also have the lesser-known https://github.com/rockstor/rockstor-core/blob/master/conf/rockstor-hdparm.service service - it's not central and does not feature in the dependency cascade we have regarding the others. But it is a separation of concerns, and a less referenced service where we have split-out stuff.

As per discussion to-date, we should keep the key rockstor.service. We don't really want to invalidate all forum and doc entries. And the top level bootstrap should also remain top-level (mentioned for context here only).

Let's see what it looks like with these changes made. We are, after all, only adding the following here (assuming no rockstor.service renaming):

  • rockstor-data-collector.service
  • rockstor-ztask-daemon.service

The latter should be renamed also, as it has relevance only to the internal service name (maintained for simplicity), and the Huey we currently use for what ztask used to do will in time be replaced by a Django-native scheduler. So we likely want this to reference scheduling rather than the technology used to do it. I.e. we were thinking of moving our scheduling manager to Django Q2, which I have some familiarity with. But @FroggyFlox has noticed a post-Django-5.2 move to bring such things into the fold of native Django. But that may take longer than we have for such a move, so we may end up using Django Q2 first - likely much closer to what will later become a native/built-in Django capability.

Where an ExecStart entry, were we to be using Django-Q2, would look something like:

ExecStart=/path/to/poetry run python manage.py qcluster
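
A minimal sketch of what such a scheduling unit might look like, were we to go the Django-Q2 route - the unit name, paths, and working directory below are assumptions for illustration only:

[Unit]
Description=Rockstor Scheduling Service
After=rockstor.service
Requires=rockstor.service

[Service]
Type=simple
# Paths are placeholders; the project directory and poetry location would come from our install layout.
WorkingDirectory=/path/to/rockstor
ExecStart=/path/to/poetry run python manage.py qcluster
Restart=on-failure

[Install]
WantedBy=multi-user.target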

@FroggyFlox
Member

From an isolation-of-concerns standpoint, wouldn't we want to minimize the processes that need to be restarted if one of them fails, like in your example above? My thinking is that at least the wsgi/asgi, as a core part of the whole thing, should be isolated/addressable in a targeted manner.

I do think so, yes. We just need to make sure we keep together/linked what needs to be linked. Our WSGI server and Nginx may benefit from being linked as they're interdependent, although not necessarily both ways (if Nginx crashes, we can't access the UI served by WSGI anyway, I think, but if WSGI crashes, then Nginx can still serve all other connections it currently handles), but replication itself might not, so it could be left alone if WSGI/Nginx needs restarting. On the other hand, we do our replication using a websocket or a constant communication channel that is probably routed by Nginx (I can't remember for sure), so replication may be interrupted as well if one needs to restart Nginx. But again, the actual btrfs process(es) involved may still run, but would we be able to "catch" them again if the replication service is restarted?
I really do need to refresh myself on how all our services (to-be) are connected 🫤.

If new services were to be added in the future, should they then also be added to a consolidated service unit for consistency, making that piece grow over time?

Good point. I guess we need to clearly make sure of what is considered "core" and what isn't.

The latter should be renamed also as it has relevance only to the internal service name (maintained for simplicity), and the Huey we currently use for what ztask used to do, will in time be replaced by a Django native scheduler. So we likely want this to reference scheduling rather than the technology used to do it.

Good point!

@Hooverdan96
Member Author

Alright, as you can see from the recent pushes, I:

  • reinstated rockstor.service, only containing the Server Gateway Interface (currently gunicorn) - also genericized the log file names (using sgi instead of the technology used)
  • renamed ztask to a generic scheduling service to facilitate changes to the underlying technology (e.g. eventually moving from huey to Django's native scheduler)

Re: the dependency of nginx and (W/A)SGI, should that be reflected in the services? I believe today it is not, but that is probably due to the split usage of supervisord & systemd. Should the SGI (aka rockstor.service) have something like this?

[Unit]
Description=Rockstor SGI Service
After=network.target nginx.service
Requires=nginx.service

[Service]
Type=simple
...
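
For comparison, a softer, one-way dependency (along the lines of the earlier point that the two do not necessarily depend on each other both ways) could be expressed with Wants= instead of Requires= - a sketch only, nothing settled here:

[Unit]
Description=Rockstor SGI Service
After=network.target nginx.service
# Wants= pulls nginx in at start-up but does not stop or fail this unit if nginx goes away.
Wants=nginx.service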

@Hooverdan96
Member Author

I'm encountering this error when trying to select an update channel

I think you are correct that this should not have occurred at all; it's probably an issue with me using rsync to move stuff from a git-managed repository locally, where it possibly picked up artefacts it wasn't supposed to. I will look at that process more closely before I run my next round of tests. Interestingly, it did use the changes from supervisord to systemd, so it's very strange ...

@Hooverdan96
Member Author

When looking at adding the socket file as part of the move to systemd, I have noticed that supervisord essentially uses the example 'dummy' user (used in various examples on the supervisord repo) for both the supervisor server as well as the socket. Looking at the related processes running in a current installation, it seems all of them are owned by root, so I assume that for the .socket file I should also use root as the user.

@phillxnet
Member

@Hooverdan96 Can you expand on this:

When looking at adding the socket file as part of the move to systemd

It would be nice to have more sophistication on this front but I was just unclear on your particular aim here.

@Hooverdan96
Member Author

Let's see whether I can make it clearer 😄. The supervisord.conf has these two sections:

[unix_http_server]
file=/var/run/supervisor.sock
; https://github.com/Supervisor/supervisor/issues/717
chmod = 0700
username = dummy
password = dummy

and

[supervisorctl]
serverurl=unix:///var/run/supervisor.sock
username = dummy
password = dummy

show the usage of dummy for the user/password combo. From everything I've read, these should be real Linux users, but that user does not exist in any of the openSUSE images.

The above portion seems to be taken directly from the example referenced in the above conf file:

Supervisor/supervisor#717 (comment)

and there it is discussed why one would use a dummy user. In other setup procedures I've read about, the user/pwd combos are set using real Linux user IDs, with the justification of integrating with nginx. However, in this instruction set (which is also referenced in the supervisord.conf):

; Gunicorn
; TODO Move to using systemd socket invocation of a systemd service.
; See: "Step 7 — Creating systemd Socket and Service Files for Gunicorn"
; https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu-18-04#step-7-creating-systemd-socket-and-service-files-for-gunicorn
[program:gunicorn]
command=/opt/rockstor/.venv/bin/gunicorn --config ./conf/gunicorn.conf.py wsgi:application

https://www.digitalocean.com/community/tutorials/how-to-set-up-django-with-postgres-nginx-and-gunicorn-on-ubuntu#step-7-creating-systemd-socket-and-service-files-for-gunicorn

no users are specified for the systemd service/socket.

In the gunicorn documentation the user is included again to enable the nginx integration:

https://docs.gunicorn.org/en/latest/deploy.html

In Rockstor's nginx conf I don't see a mention of a designated user.

Hence my conclusion that all is running/connecting via root. This could be wrong of course, since I am not the expert in this area. So my aim was to get confirmation that it should be ok to have the corresponding systemd services run as the root user, or to get additional thoughts from the development braintrust.
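
For illustration, the [program:gunicorn] entry quoted above would translate into a systemd unit along the following lines, with no User=/Group= set so it keeps running as root like today. This is only a sketch under those assumptions (the working directory and unit name are guesses), not the exact unit proposed in this PR:

[Unit]
Description=Rockstor SGI (gunicorn) Service
After=network.target

[Service]
Type=simple
# Assumed working directory, so the relative --config path from the supervisord entry still resolves.
WorkingDirectory=/opt/rockstor
# Command taken from the existing supervisord.conf [program:gunicorn] entry.
ExecStart=/opt/rockstor/.venv/bin/gunicorn --config ./conf/gunicorn.conf.py wsgi:application
Restart=on-failure

[Install]
WantedBy=multi-user.target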

@phillxnet
Member

phillxnet commented Sep 26, 2025

@Hooverdan96 I think I remember something about supervisord having a Web-UI; which we have never used. I think the user/pass reference in that file relates to this UI component of supervisord.

As for a user in systemd, yes that would be nice - but it is way too much to take on in this PR, as it would mean a complete system-wide change to all that we do - we, after all, run as root to manage a system at that level. Ideally, and we have plans, we want to run what we can under, say, an un-privileged rockstor user. But that is a way off as yet - and we would need a lot of changes to handle that element of things.

Take a look at the supervisor config re user/pass to check on the above. I think I remember having to add that a few years ago as it became required; whereas previously it was not, and there were some concerns at that time re default Web-UI access to supervisord via the default user/pass.

[EDIT] This was the change I was referencing re dummy dummy:

  • update supervisord.conf re upstream advice re user and dummy pass.

See: Replace Buildout with Poetry #2432 #2434

And these are the changes I made back-then: https://github.com/rockstor/rockstor-core/pull/2434/files#diff-2a61879e1360255f36ef99617d00c5f3084b48e17af87e8595c75bae6bc51e28

@phillxnet
Member

Socket invocation of gunicorn is not likely of interest to us currently. Especially given we are, here, moving from supervisord to systemd. So given that could have enough ramifications on its own (threading etc.), we likely want to keep things as contained as we can and stick to straight service invocation, not adding in on-demand start-up via socket activation. I'm not averse to that later maybe, and have done this in another project via a systemd socket listener, but against this issue we want, ideally, a like-for-like I think.

@Hooverdan96
Member Author

Hooverdan96 commented Sep 26, 2025

but it is way too much to take on in this PR

Yes, that was not my intent, for sure. So I will stick to the "no user" approach, i.e. it will run as root for now.

Socket invocation of gunicorn is not likely of interest to us currently.

Pardon my ignorance here, but are you saying I should not consider creating a socket service?

@FroggyFlox
Member

I should not consider creating a socket service?

That is my thinking as well, yes. We currently directly bind/listen to localhost on port 8000 instead of using a socket, so we might as well keep it that way for this PR. See the relevant bit of config:
https://github.com/rockstor/rockstor-core/blob/testing/conf%2Fgunicorn.conf.py#L4

It seems the socket we currently have set in supervisord is only used for supervisorctl for now, isn't it?
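
For reference only, and not what is being proposed here: systemd socket activation of the kind the DigitalOcean tutorial describes would involve a paired .socket unit roughly like the sketch below, with systemd listening on the socket and starting the matching service on the first connection. All names and paths are hypothetical:

[Unit]
Description=Rockstor SGI socket (hypothetical)

[Socket]
# Placeholder path; systemd would listen here and hand connections to the paired service.
ListenStream=/run/rockstor-sgi.sock

[Install]
WantedBy=sockets.target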

add service files for gunicorn, replication, data-collector and huey.
update data_collector.py log paths for gunicorn
deprecate supervisord = rockstor.service
add individual services formerly managed by supervisord
adjust service units and scripts to refer back to rockstor.service
update log file references to be more generic in prep for moving from wsgi to another asgi
rename file to focus on task rather than technology. Align initrock.py with changed service name
Comment on lines 71 to 73
"rockstor-replication",
"rockstor-scheduling",
"rockstor-data-collector",
Member

phillxnet commented Sep 27, 2025


@Hooverdan96 We don't have these integrated into our DB model etc. Likely we can leave them out for now. We have the others known to our backend, so adding these in here will likely throw test errors - or be inconsistent. Unless you were planning to surface these also within the Web-UI? I don't think that is a requirement in dropping supervisord however. I.e. we didn't surface its services.

But on the other hand, we may still have some supervisord commands that will need replacing with the new equivalents you've introduced. In which case these test entries, and potential accompanying tests, will then cover the in-coming supervisord command replacements.

Apologies, just nipping in with a quick look and saw these. I know this is all in-draft.

Member


OK, just noticed you have begun work on the supervisord to systemd replacement in-code :).

elif service_name == "sftp":
# Delegate sshd's sftp subsystem status check to system.ssh.py call.
return is_sftp_running(return_boolean=False)
elif service_name in ("replication", "data-collector", "ztask-daemon"):
Member

phillxnet commented Sep 27, 2025


Take care with the ztask-daemon service rename: I seem to remember there were DB elements still using this old name, which I left as-was (with the ztask to huey move) due to the depth of the changes required. But maybe it's better we get this name-change over and done with. Especially given your move to a generic name now. My concern revolves around DB names within our services model. Again apologies for just nipping in on a draft here.

@phillxnet
Member

@Hooverdan96 & @FroggyFlox Nice to see this task finally under-way now. It will be a relief to cast off the now rather ancient approach of supervisord. But I do worry about this move uncovering some dragons re threads/workers etc. But all good really. Plus this is the perfect time given we are embarking on a new testing phase.

@Hooverdan96
Member Author

@phillxnet no apologies needed, if I didn’t want help I would not have pushed a draft here. I am welcoming the hints and highlights of pitfalls so I don’t get too stuck in the wrong direction …
So please have at it.

@Hooverdan96
Member Author

Not for this PR, but after reading more on systemd: would it make sense to look at the cgroup concept once the transition to systemd is complete for everything? You might have mentioned that in another issue, I just don't remember.

@phillxnet
Member

@Hooverdan96 Re cgroups, I've no idea currently. But if it means all our stuff can be grouped re compute, then yes; as it could well help us define neater SELinux rules. And agreed, this would make for a nice issue-PR set - once we are done with a complete transition to systemd. Which it looks like we are heading for in this next testing phase.

@Hooverdan96
Member Author

Closing this draft one in favor of a cleaned up (and properly based) PR #3034

@Hooverdan96 Hooverdan96 closed this Oct 5, 2025
@Hooverdan96 Hooverdan96 deleted the 2650_move_to_systemd branch October 5, 2025 22:59
Hooverdan96 added a commit to Hooverdan96/rockstor-core that referenced this pull request Oct 17, 2025
- remove supervisor references and config
- genericize scheduling service (currently huey, formerly ztaskd)
- add new corresponding services, update references in various files
- reflect change to systemd in relevant test cases
- harmonize rockstor-specific service unit nomenclature
- genericize log files
- update poetry project toml and lock file (commenting out supervisor for future removal)
- changes based on prior PRs rockstor#3031 & rockstor#3034