Prototype to move to systemd from supervisord (#2650) #3031
Conversation
@Hooverdan96 Nice to see progress on this front. I had assigned myself to the linked issue, but no worries. Plus I was going for more use of an ExecStart-type approach, i.e. from the issue text:

But maybe we can handle more services; after all, there isn't a limit on them. And maybe it is better that we do separate out all our concerns. Can't take a look, or have more of a think, just yet however, as I do want to get our rockon-validator into shape enough for regular use. That way we will have more time for core etc., and the Rock-ons, now quite numerous and likely growing in that regard, can be more regimented/predictable, and hopefully easier to review. Thanks again for stepping up to this one. @FroggyFlox how do you feel about stacking up yet more systemd services? I'm not against it in principle myself, I was just a bit concerned about overdoing it. But it does isolate our processes more.
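For context, the ExecStart-type approach referenced from the issue amounts to one plain unit per process in place of a supervisord `[program:x]` block; a minimal sketch, with hypothetical paths and module names:

```ini
# Sketch only: a direct ExecStart unit replacing a supervisord [program:x] block.
# The install path and module name below are assumptions, not the PR's actual files.
[Unit]
Description=Rockstor Gunicorn Service
After=network.target

[Service]
ExecStart=/opt/rockstor/.venv/bin/gunicorn --bind 127.0.0.1:8000 wsgi:application
Restart=on-failure

[Install]
WantedBy=multi-user.target
```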
    
@Hooverdan96 Re:

That looks like an old code-base: we don't use yum anymore!! Or at least I had thought I had removed all references to it.
    
Also, the name "Rockstor Gunicorn Service" will age badly re: Move to ASGI using Daphne #2941
    
          
Ah yes, maybe then this one could be renamed. Re: the multi-exec vs. individual services, I was thinking that since quite a few things are modular, it would make sense to separate these out as well, or group them more logically, e.g. gunicorn and ztask/huey together, and replication and data-collector separately, as those two are slightly "less" core. But in terms of making other adjustments to parameters for the individual services, I figured separation would be easier looking forward. But, either way...
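For illustration, one shape such grouping could take is a shared target unit; a sketch with hypothetical names, not something this PR implements:

```ini
# rockstor.target (hypothetical): a single handle for the "more core" group
[Unit]
Description=Rockstor Core Services
Wants=rockstor-gunicorn.service rockstor-scheduling.service

[Install]
WantedBy=multi-user.target
```

Each grouped service would then declare `PartOf=rockstor.target`, so restarting the target cycles the group while the "less core" replication and data-collector units are left alone.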
    
          
I was wondering the same thing. Cons:

Still thinking about it...
    
          
I was just adjusting my comment above to suggest naming at least one of the services (i.e. gunicorn) back to rockstor.service.
    
          
From an isolation-of-concerns perspective, wouldn't we want to minimize the processes that need to be restarted if one of them fails, as in your example above? My thinking is that the WSGI/ASGI piece, as a core part of the whole thing, should at least be isolated so it can be addressed in a targeted manner. If new services were to be added in the future, should they then also be added to a consolidated service unit for consistency, making that piece grow over time? I am not opposed to any approach (nor am I the most knowledgeable here), I just wanted to clarify my thinking on this.
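As a sketch of the targeted handling described above, using unit names from this draft's own changes (see the diff excerpt further down):

```bash
# A failure in one piece can be addressed in isolation;
# the other services keep running untouched.
systemctl restart rockstor-data-collector.service
systemctl status rockstor-replication.service
```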
    
We do also have the lesser-known https://github.com/rockstor/rockstor-core/blob/master/conf/rockstor-hdparm.service service. It's not central and does not feature in the dependency cascade we have regarding the others, but it is a separation of concerns, and a less-referenced service where we have split out stuff. As per the discussion to date, we should keep the key rockstor.service: we don't really want to invalidate all forum and doc entries. And the top-level bootstrap should also remain top-level (mentioned for context here only). Let's see what it looks like with these changes made. We are, after all, only adding the following here (assuming no rockstor.service renaming):

The latter should be renamed also, as it has relevance only to the internal service name (maintained for simplicity), and the Huey we currently use for what ztask used to do will in time be replaced by a Django-native scheduler. So we likely want this to reference scheduling rather than the technology used to do it. I.e. we were thinking of moving our scheduling manager to Django Q2, which I have some familiarity with. But @FroggyFlox has noticed a post-Django-5.2 move to bring such things into the fold of native Django. That may take longer than we have for such a move, so we may end up using Django Q2 first; it is likely much closer to what will later become a native/built-in Django capability. An ExecStart entry, were we to be using Django-Q2, would look something like:
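(The snippet is elided above; a minimal sketch, assuming Django Q2's qcluster management command and a hypothetical install layout:)

```ini
# Django Q2 runs its workers via the qcluster management command.
# Paths below are illustrative only.
[Service]
WorkingDirectory=/opt/rockstor/src/rockstor
ExecStart=/opt/rockstor/.venv/bin/python manage.py qcluster
```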
    
          
I do think so, yes. We just need to make sure we keep together/linked what needs to be linked. Our WSGI server and Nginx may benefit from being linked as they're interdependent, although not necessarily both ways: if Nginx crashes, we can't access the UI served by WSGI anyway (I think), but if WSGI crashes, then Nginx can still serve all other connections it currently handles. Replication itself might not need that link, so it could be left alone if WSGI/Nginx needs restarting. On the other hand, we do our replication using a websocket, or a constant communication channel that is probably routed by Nginx (I can't remember for sure), so replication may be interrupted as well if one needs to restart Nginx. But again, the actual btrfs process(es) involved may still run; would we be able to "catch" them again if the replication service is restarted?

Good point. I guess we need to make sure it is clear what is considered "core" and what isn't.

Good point!
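systemd can express exactly this one-way coupling; e.g. a PartOf= declaration propagates restarts in a single direction (unit name hypothetical):

```ini
# In a hypothetical rockstor-gunicorn.service: restarting nginx also restarts
# this unit, but restarting this unit leaves nginx (and the other connections
# it serves) alone.
[Unit]
PartOf=nginx.service
After=nginx.service
```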
    
Alright, as you can see from the recent pushes, I:

Re: the dependency between nginx and the (W/A)SGI service, should that be reflected in the services? I believe today it is not, but that is probably due to the split usage of supervisord & systemd. Should the SGI (aka
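(For reference, the units' current declarations can be checked with standard systemctl queries, e.g.:)

```bash
# Inspect what the existing units actually declare today.
systemctl show -p After -p Wants -p PartOf rockstor.service
systemctl list-dependencies nginx.service
```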
    
          
I think you are correct that this should not have occurred at all. It's probably an issue with me using rsync to move stuff from a locally git-managed repository, where it possibly picked up artefacts it wasn't supposed to. I will look at that process more closely before I run my next round of tests. Interestingly, it did use the changes from supervisord to systemd, so it's very strange...
    
When looking at adding the socket file as part of the move to systemd, I noticed that supervisord essentially uses the example 'dummy' user (used in various examples on the supervisord repo) for both the supervisor server and the socket. Looking at the related processes running in a current installation, it seems all of them are owned by root, so I assume that for the
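For reference, the upstream supervisord example pattern being described looks roughly like this (credentials as in the supervisord examples, not necessarily Rockstor's actual values):

```ini
; supervisord's socket/server block, per the upstream examples discussed
[unix_http_server]
file=/var/run/supervisor.sock
chmod=0700
username=dummy
password=dummy
```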
    
@Hooverdan96 Can you expand on this:

It would be nice to have more sophistication on this front, but I was just unclear on your particular aim here.
    
Let's see whether I can make it clearer 😄. The portion above seems to be taken directly from the example referenced in the conf file above (Supervisor/supervisor#717 (comment)), where the reasons for using a dummy user are discussed. In other setup procedures I have read about, the user/password combos are set using Linux user IDs, with the justification of integrating with nginx. However, in this instruction set (which is also referenced in the docs above) no users are specified for the systemd service/socket. In the gunicorn documentation the user is included again, to enable the nginx integration: https://docs.gunicorn.org/en/latest/deploy.html. In Rockstor's nginx conf I don't see mention of a designated user. Hence my conclusion that everything is running/connecting via root.
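The gunicorn deployment docs referenced above pair a socket unit with a dedicated user for the nginx handoff, roughly as follows (the www-data user is from those docs, not something this PR adopts):

```ini
# Pattern from the gunicorn deployment docs: a gunicorn.socket unit
# owned by the web-server user so nginx can reach the unix socket.
[Socket]
ListenStream=/run/gunicorn.sock
SocketUser=www-data
SocketMode=600
```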
    
@Hooverdan96 I think I remember something about supervisord having a Web-UI, which we have never used. I think the user/pass reference in that file relates to this UI component of supervisord. As for a user in systemd: yes, that would be nice, but it is way too much to take on in this PR, as it would mean a complete system-wide change to all that we do. We, after all, run as root to manage a system at that level. Ideally, and we have plans, we want to run what we can under, say, an unprivileged rockstor user. But that is a way off as yet, and we would need a lot of changes to handle that element of things. Take a look at the supervisor config re user/pass to check on the above. I think I remember having to add that a few years ago as it became required, whereas previously it was not, and there were some concerns at that time re default Web-UI access to supervisord via the default user/pass.

[EDIT] This was the change I was referencing re dummy/dummy:

See: Replace Buildout with Poetry #2432 #2434. And these are the changes I made back then: https://github.com/rockstor/rockstor-core/pull/2434/files#diff-2a61879e1360255f36ef99617d00c5f3084b48e17af87e8595c75bae6bc51e28
    
Socket invocation of gunicorn is not likely of interest to us currently, especially given we are, here, moving from supervisord to systemd. Given that could have enough ramifications on its own (threading etc.), we likely want to keep things as contained as we can and stick to straight service invocation, not adding in on-demand invocation via socket activation. I'm not averse to that later maybe, and have done this in another project via a systemd socket listener, but against this issue we want, ideally, a like-for-like, I think.
    
          
Yes, that was not my intent, for sure. So I will stick to the "no user" approach, i.e. it will run as root for now.

Pardon my ignorance here, but are you saying I should not consider creating a socket service?
    
          
That is my thinking as well, yes. We currently directly bind/listen to localhost on port 8000 instead of using a socket, so we might as well keep it that way for this PR. See the relevant bit of config:

It seems the socket we currently have set in supervisord is only used for supervisorctl for now, isn't it?
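(The exact config is elided above; illustratively, the sort of direct TCP bind being described might look like this in a supervisord program block, with a hypothetical path and module:)

```ini
; illustrative only: direct localhost:8000 bind, no unix socket
[program:gunicorn]
command=/opt/rockstor/.venv/bin/gunicorn --bind 127.0.0.1:8000 wsgi:application
```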
    
add service files for gunicorn, replication, data-collector and huey; update data_collector.py log paths for gunicorn
deprecate supervisord => rockstor.service; add individual services formerly managed by supervisord
adjust service units and scripts to refer back to rockstor.service; update log file references to be more generic, in prep for moving from wsgi to another asgi
rename file to focus on task rather than technology; align initrock.py with changed service name
(force-pushed from d5c433e to ec52e95)
updating other service files; add socket file to initrock
| "rockstor-replication", | ||
| "rockstor-scheduling", | ||
| "rockstor-data-collector", | 
@Hooverdan96 We don't have these integrated into our DB model etc. Likely we can leave them out for now. We have the others known to our backend, so adding these in here will likely throw test errors, or be inconsistent, unless you were planning to surface these also within the Web-UI? I don't think that is a requirement in dropping supervisord however; i.e. we didn't surface its services.

But on the other hand we may still have some supervisord commands that will need replacing with the new equivalents you've introduced. In which case these test entries, and potential accompanying tests, will then test the in-coming supervisord command replacements.

Apologies, just nipping in with a quick look and saw these. I know this is all in-draft.
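For a sense of the replacements implied, a rough supervisorctl-to-systemctl mapping (unit names per this draft; illustrative only):

```bash
# old supervisord-era commands        ->  systemd equivalents
supervisorctl status                  #   systemctl status 'rockstor-*'
supervisorctl restart data-collector  #   systemctl restart rockstor-data-collector.service
supervisorctl stop replication        #   systemctl stop rockstor-replication.service
```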
OK, just noticed you have begun work on the supervisord-to-systemd replacement in-code :).
```python
elif service_name == "sftp":
    # Delegate sshd's sftp subsystem status check to system.ssh.py call.
    return is_sftp_running(return_boolean=False)
elif service_name in ("replication", "data-collector", "ztask-daemon"):
```
Take care with the ztask-daemon service rename: I seem to remember there were DB elements still using this old name, which I left as-was (with the ztask to huey move) due to the depth of the changes required. But maybe it's better we get this name change over and done with, especially given your move to a generic name now. My concern revolves around DB names within our services model. Again, apologies for just nipping in on a draft here.
@Hooverdan96 & @FroggyFlox Nice to see this task finally underway now. It will be a relief to cast off the now rather ancient approach of supervisord. But I do worry about this move uncovering some dragons re threads/workers etc. All good really, though; plus this is the perfect time, given we are embarking on a new testing phase.
    
@phillxnet no apologies needed; if I didn't want help, I would not have pushed a draft here. I am welcoming the hints and highlights of pitfalls so I don't get too stuck in the wrong direction...
    
Not for this PR, but after reading more on systemd: would it make sense to look at the cgroup concept once the transition to systemd is complete? You might have mentioned that in another issue; I just don't remember.
    
@Hooverdan96 Re cgroup, I've no idea currently. But if it means all our stuff can be grouped re compute, then yes, as it could well help us define neater SELinux rules. And agreed, this would make for a nice issue/PR set, once we are done with a complete transition to systemd; which it looks like we are heading for in this next testing phase.
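If pursued later, systemd's cgroup handle for this would likely be a slice unit; a hypothetical sketch:

```ini
# rockstor.slice (hypothetical): one cgroup for all Rockstor units,
# with resource limits applied to the group as a whole.
[Slice]
CPUQuota=80%
MemoryHigh=2G
```

Each Rockstor service would then opt in with `Slice=rockstor.slice` in its `[Service]` section, and the resulting grouping is visible via `systemd-cgls`.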
    
Closing this draft in favor of a cleaned-up (and properly based) PR #3034
    
- remove supervisor references and config
- genericize scheduling service (currently huey, formerly ztaskd)
- add new corresponding services, update references in various files
- reflect change to systemd in relevant test cases
- harmonize rockstor-specific service unit nomenclature
- genericize log files
- update poetry project toml and lock file (commenting out supervisor for future removal)
- changes based on prior PRs rockstor#3031 & rockstor#3034
Not quite fixing #2650
This is a draft attempt to move the remaining supervisord services over to systemd.
(…supervisord prefix in the new services). The installation on an existing VM seemed to work: I am getting the initial set-up screen and can set up the instance.
However, and I am not sure that's related, I'm encountering this error when trying to select an update channel (based on the popup):
I ran the tests:
I am making this a draft PR, while gathering feedback on whether this is the right approach, or @phillxnet already has it completely in the works, or ...
Trying to do this with training wheels 😄 so any pointers would be appreciated.