Skip to content

Conversation

@fanf
Copy link
Member

@fanf fanf commented Sep 22, 2025

https://issues.rudder.io/issues/27574

This commit handle one annoyance and a semantic change for campaign hooks:

  • add clearer logs in case of warning or error with the corresponding log level and faulty script
  • we always go to post-hooks, even if pre-hooks failed, so that we let user a chance to correct state changed in pre-hooks

Clearer logs

Now, when a campaign hook fails or warn, it has a corresponding line in webapp logs:

2025-10-03 22:40:58+0200 ERROR hooks.campaigns.1bb22fc5-f41c-46fb-a44b-54b68b8254f2.pre-hooks - Campaign 'test campaign' pre-hooks returned code '24' in 'test-hook.sh'

This needed some adaptation:

  • also keep hook names in the history from RunHooks.asyncRunHistory
  • post process campaign hook history execution to log to correct level

Workflow change: always go to post-hook

The idea here is that is we do something in pre-hooks, we may want to undo it in post-hooks.
For that, we must always go to post-hooks - but we need to more info:

  • one to let the user know that we came to that post-hooks because something failed and the campaign wasn't exec
  • one to let the workflow know that after these post-hooks, the final campaign state must be "failure" even if all post hooks succeded.

And actually, it's the same bit of information: we just need to set a next-state value in post-hook before we go to them:

  • if it's from a failure in pre-hook, the next-state is failure ,
  • if it's from the running state, the next-state is finished.
    And we let the user know in the hook with a new environnment variable: CAMPAIGN_NEXT_STATE.

From a code point of view, I had to:

  • change CampaignEventState.PostHooks to store nextState: CampaignEventStateType
  • change the workflow to direct to post-hooks in place of failure after pre hooks, seting nextState
  • change serialisation to save the nextState in base for post-hooks
    And add a unit test to demonstrate the behavior

@fanf fanf force-pushed the bug_27574/post_hooks_for_campaigns_should_be_executed_even_even_if_pre_hooks_are_in_failure branch 2 times, most recently from 823b2ff to 9806fb6 Compare October 3, 2025 20:55
@fanf
Copy link
Member Author

fanf commented Oct 3, 2025

PR updated with a new commit

object HookReturnCode {

def isError(code: Int): Boolean = code < 0 || code > 0 && code < 32
def isWarning(code: Int): Boolean = code >= 32 && code < 64
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will go well

PureHooksLogger.For(logIdentifier).trace(s" -> stderr : ${result.stderr}")
}
_ <- ZIO.when(result.code >= 32 && result.code <= 64) { // warning
_ <- ZIO.when(HookReturnCode.isWarning(result.code)) { // warning
Copy link
Member Author

@fanf fanf Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change that will certainly lead to no problem at all, it's really not the kind of change that lead to error, at all.
Please double check definition of isWarning.

} else if (HookReturnCode.isError(result.code)) { // error
ScriptError(path, result.code, result.stdout, result.stderr, msg)
} else if (result.code >= 32 && result.code <= 64) { // warning
} else if (HookReturnCode.isWarning(result.code)) { // warning
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change that will certainly lead to no problem at all, it's really not the kind of change that lead to error, at all.
Please double check definition of isError and isWarning.

case HookExecutionHistory.DoNotKeep => (c, Nil)
// Noop are not real hook execution, just filter them out from the result
case HookExecutionHistory.Keep => (c, (x :: historyList).filterNot(_ == Noop))
val runAllSeq: IOResult[(HookReturnCode, List[(String, HookReturnCode)])] = {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the String which is the command name in the history of returned code

history match {
case HookExecutionHistory.DoNotKeep => (c, Nil)
// Noop are not real hook execution, just filter them out from the result
case HookExecutionHistory.Keep => (c, ((nextHookName, c) :: historyList).filterNot(_._2 == Noop))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we now keep the last code also as head of the history, because we need the name.
So we could just have no first returned value and use a non empty list with some default value and check the head, but I think we will want to have different global value and last one at some point - perhaps in case of warning for example.

Scheduled,
PreHooks(HookResults(Nil)),
PreHooks(HookResults(HookResult(e.id.value, 1, "pre-hooks", "", "") :: Nil)),
PostHooks(FailureType, HookResults(Nil)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look, directly from pre-hooks to post-hooks.

case (None, FailureType) => Failure("pre-hooks were in error and post-hooks completed successfully", "")
case (None, s) => getDefault(s)
case (Some(f1), FailureType) => Failure("pre-hooks and post-hooks were in error. Last error:", f1.cause)
case (Some(f1), _) => f1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic is just here to put something relevant in the failure message. We don't keep the whole error, so we don't know what hook errored at that point.
In the futur, we will just have the whole history in the campaign event UI and the user will be able to see what broke.

timeHooks1 <- currentTimeMillis
_ <-
PureHooksLogger.For(loggerName).trace(s"Campaign pre-hooks ran in ${timeHooks1 - timeHooks0} ms")
PureHooksLogger.For(loggerName).trace(s"Campaign ${hookType.entryName} ran in ${timeHooks1 - timeHooks0} ms")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all that foreach just for better logs :)

("CAMPAIGN_EVENT_ID", e.id.value),
("CAMPAIGN_EVENT_NAME", e.name)
)
.optAdd("CAMPAIGN_NEXT_STATE", nextState.map(_.entryName)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the new parameter, only defined in post-hooks

@fanf fanf marked this pull request as ready for review October 3, 2025 21:15
}

case PostHooksType =>
val state = event.state.asInstanceOf[PostHooks]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this not be avoided, by matching on the event.state: CampaignEventState instead ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but most 8f the time it's far too powerful for the workflow engine, those state's coverture becomes very hard to check.
But I agree that having that is a smell in the encoding that exposes a mismatch. I chose ro keep it because it's very obviously Not Good and claims it needs to be refactored and I didn't want to massively extends the risks with a big refactoring so late.

val nextState = {
val optError = res.results.collectFirst { case r if isError(r) => Failure("post-hooks were in error", r.stderr) }
(optError, state.nextState) match {
case (None, FailureType) => Failure("pre-hooks were in error and post-hooks completed successfully", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the FailureType not mean that the post-hook completed with a "failure", instead of "successfully" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no optError is None then it mean that post-hooks ran successfully, but we have nextState in Failure type, meaning that there was an error in pre-hooks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case you talk about is treated two lines below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Normation-Quality-Assistant
Copy link
Contributor

This PR is not mergeable to upper versions.
Since it is "Ready for merge" you must merge it by yourself using the following command:
rudder-dev merge https://github.com/Normation/rudder/pull/6611
-- Your faithful QA
Kant merge: "Live your life as though your every act were to become a universal law."
(https://ci.normation.com/jenkins/job/merge-accepted-pr/108127/console)

@fanf
Copy link
Member Author

fanf commented Oct 7, 2025

OK, squash merging this PR

@fanf fanf force-pushed the bug_27574/post_hooks_for_campaigns_should_be_executed_even_even_if_pre_hooks_are_in_failure branch from 23235cd to 4b4dc4f Compare October 7, 2025 19:44
@fanf fanf merged commit 4b4dc4f into Normation:branches/rudder/9.0 Oct 7, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants