Description
What happens
The safe-outputs job and conclusion job unconditionally add agent-output artifact download steps via buildAgentOutputDownloadSteps(). When the agent step fails before producing output (e.g., network failure, sandbox crash, permission error), the artifact doesn't exist. The download step is marked continue-on-error: true, so the job continues — but GH_AW_AGENT_OUTPUT is set to a path that doesn't exist. Downstream scripts that read that path emit ENOENT errors in the logs.
The failure issue title is always generic ([aw] <workflow> failed) with no pre-agent diagnostic context, making it hard to distinguish "agent never started" from "agent started and failed."
One improvement landed recently: commit 31dc15f added inference-access-specific context in handle_agent_failure.cjs:461-468. But the broader pre-agent failure cases (sandbox crash, network timeout, MCP server failure, etc.) still produce generic errors.
What should happen
- The download step's continue-on-error: true should be paired with a conditional check: downstream steps should skip when the artifact wasn't downloaded successfully
- GH_AW_AGENT_OUTPUT should only be set when the artifact actually exists
- Pre-agent failures should include the failure stage in the issue title (e.g., [aw] <workflow> failed (pre-agent) or [aw] <workflow> failed (sandbox setup))
Where in the code
All references are to main at 99b2107.
Unconditional download steps:
- compiler_safe_outputs_job.go:53: steps = append(steps, buildAgentOutputDownloadSteps()...)
- notify_comment.go:57: steps = append(steps, buildAgentOutputDownloadSteps()...) (conclusion job)
Download step with silent failure:
- artifacts.go:44: continue-on-error: true on the download step
- artifacts.go:37-62: buildAgentOutputDownloadSteps() sets env var regardless of download outcome
Downstream ENOENT paths:
- load_agent_output.cjs:55-63: reads process.env.GH_AW_AGENT_OUTPUT, attempts JSON.parse(fs.readFileSync(...)), catches ENOENT
- noop.cjs:15-17: calls loadAgentOutput(), returns silently on failure (no diagnostic)
- notify_comment_error.cjs:79-88: calls loadAgentOutput() in the error notification path
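One way to make these call sites distinguish "no artifact" from other failures is for the loader to report why nothing was loaded. The following is a minimal sketch, not the repository's loadAgentOutput(): the function name loadAgentOutputWithReason, the reason strings, and the shape of the returned object are all assumptions made for illustration.

```javascript
"use strict";
const fs = require("fs");

// Hypothetical helper (not the actual loadAgentOutput in load_agent_output.cjs).
// Instead of failing silently, it returns a reason code the caller can log.
function loadAgentOutputWithReason(env = process.env) {
  const outputPath = env.GH_AW_AGENT_OUTPUT;
  if (!outputPath) {
    // The env var was never set, e.g. because the download step was skipped.
    return { ok: false, reason: "env-unset" };
  }

  let raw;
  try {
    raw = fs.readFileSync(outputPath, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") {
      // The path is set but the artifact was never downloaded.
      return { ok: false, reason: "artifact-missing", path: outputPath };
    }
    return { ok: false, reason: "read-error", path: outputPath, detail: err.message };
  }

  try {
    return { ok: true, data: JSON.parse(raw) };
  } catch (err) {
    // The file exists but does not contain valid JSON.
    return { ok: false, reason: "invalid-json", path: outputPath, detail: err.message };
  }
}

module.exports = { loadAgentOutputWithReason };
```

A caller such as noop.cjs could then log the reason before returning, so that an "artifact-missing" result points the operator at the agent step rather than at the script itself.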
Generic failure title:
- handle_agent_failure.cjs:593-596: always uses [aw] <workflow> failed regardless of failure stage
Evidence
Source-level verification (2026-03-03):
- Confirmed buildAgentOutputDownloadSteps() is called unconditionally at both call sites
- Confirmed continue-on-error: true at artifacts.go:44
- Confirmed no conditional guard on downstream env var usage
Local reproduction:
- Ran noop.cjs with GH_AW_AGENT_OUTPUT pointing at a nonexistent file
- Output: ENOENT error logged, then silent return, with no indication of why the file is missing
- Same behavior from notify_comment_error.cjs
Proposed fix
- In artifacts.go, add an id: to the download step and use a step outcome check (if: steps.<id>.outcome == 'success') on the env-setting step, so GH_AW_AGENT_OUTPUT is only set when the artifact was actually downloaded
- In handle_agent_failure.cjs, detect the failure stage (pre-agent vs. during-agent) by checking whether the agent-output artifact exists, and include this context in the failure issue title
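The second item could look roughly like the sketch below. It is illustrative only: detectFailureStage and buildFailureTitle are hypothetical names, not functions in handle_agent_failure.cjs, and the stage labels are assumptions. It also treats a missing artifact as "pre-agent", which cannot by itself separate a sandbox setup failure from an agent that started and crashed before writing any output.

```javascript
"use strict";
const fs = require("fs");

// Hypothetical helper: infer the failure stage from whether the
// agent-output artifact exists on disk.
function detectFailureStage(agentOutputPath) {
  if (agentOutputPath && fs.existsSync(agentOutputPath)) {
    return "during-agent";
  }
  return "pre-agent";
}

// Hypothetical helper: keep the existing generic title, and append the
// stage only when the agent produced no output.
function buildFailureTitle(workflowName, agentOutputPath) {
  const base = `[aw] ${workflowName} failed`;
  return detectFailureStage(agentOutputPath) === "pre-agent"
    ? `${base} (pre-agent)`
    : base;
}

module.exports = { detectFailureStage, buildFailureTitle };
```

Leaving the title unchanged in the during-agent case keeps existing issue searches and deduplication on the generic title working; only the new pre-agent case gets a suffix.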
Impact
Frequency: Every pre-agent failure. In our pipeline this occurs ~2-3 times per run batch when there are infrastructure issues (network, permissions, sandbox initialization).
Cost: Moderate — the ENOENT errors are noise in already-failing runs, but they obscure the real failure cause. The generic issue title means operators must dig into the full run log to determine whether the agent even started. Fixing this would significantly reduce triage time for pre-agent failures.