🔍 Smoke Test Investigation - Run #18777690706
Summary
The Smoke Copilot workflow failed because the safe-outputs MCP server crashed during startup due to malformed JSON in the GH_AW_SAFE_OUTPUTS_CONFIG environment variable. The config contained unexpected backslash escaping that the JSON parser couldn't handle, causing the MCP server to exit before the agent could start. As a result, the agent ran successfully but had no access to safe-outputs tools, and the downstream create_issue job failed when it couldn't find agent_output.json.
Failure Details
- Run: #18777690706
- Commit: d37e7de
- Branch: copilot/update-copilot-agent-engine
- Trigger: workflow_dispatch
- Duration: 2.2 minutes
- Failed Jobs: create_issue (6s duration)
Root Cause Analysis
Primary Error Chain
1. Safe-Outputs MCP Server Startup Failure
From /tmp/gh-aw/aw-mcp/logs/run-18777690706/session-d7c597ca-46fe-4189-906e-b639a440aeca.log:21:
Error: Failed to parse GH_AW_SAFE_OUTPUTS_CONFIG: Unexpected token '\', "\{"create_"... is not valid JSON
at Object.<anonymous> (/tmp/gh-aw/safe-outputs/mcp-server.cjs:53:13)
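The parse failure can be reproduced in isolation. The sketch below is a minimal reproduction under one assumption: that the malformed value is the expected compact JSON preceded by a single stray backslash. That assumption fits the 53-character length and the preview in the error message, but the full raw value is not shown in the logs.

```javascript
// Expected compact config for this workflow (52 characters).
const expected = '{"create_issue":{"max":1,"min":1},"missing_tool":{}}';

// ASSUMPTION: the malformed value is the expected JSON with one leading
// backslash. The real raw value was not captured in the logs.
const malformed = "\\" + expected;

// Returns the parsed value, or the error that JSON.parse threw.
function tryParse(raw) {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: err };
  }
}

const good = tryParse(expected);
const bad = tryParse(malformed);

console.log(expected.length, malformed.length); // 52 53
console.log(good.ok, bad.ok, bad.ok ? "" : bad.error.name); // true false SyntaxError
```

The exact wording of the SyntaxError message varies between Node.js versions, so the sketch checks only the error type.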
Environment Variable Details:
- Variable: GH_AW_SAFE_OUTPUTS_CONFIG
- Length: 53 characters
- Expected: {"create_issue":{"max":1,"min":1},"missing_tool":{}}
- Actual Preview: "\{"create_"... (starts with a backslash before the opening brace)
2. MCP Client Connection Failure
2025-10-24T10:58:49.713Z [ERROR] Failed to start MCP client for safe-outputs: McpError: MCP error -32000: Connection closed
The Copilot CLI attempted to connect to the safe-outputs MCP server, but the server had already crashed, resulting in a closed connection.
3. Agent Runs Without Safe-Outputs
The agent job succeeded (1.1m duration) but had no access to safe-outputs tools. The agent attempted to complete its task (reviewing last 5 merged PRs) but encountered authentication issues and couldn't use the create_issue tool.
4. Downstream Job Failure
The create_issue job failed with:
Error reading agent output file: ENOENT: no such file or directory,
open '/tmp/gh-aw/safe-outputs/agent_output.json'
Investigation Findings
The Configuration Bug
The GH_AW_SAFE_OUTPUTS_CONFIG environment variable is being generated with improper JSON escaping. The variable shows:
- Config length: 53 characters
- Preview: "\{"create_"...
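This suggests the JSON is being double-escaped or incorrectly quoted, creating invalid JSON that the Node.js JSON parser rejects. The expected compact config is 52 characters long, so the reported length of 53 is consistent with a single stray leading backslash, although the full raw value was not captured.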
Expected vs Actual:
| Expected | Actual (Malformed) |
|---|---|
| {"create_issue":{...}} | "\{\"create_issue\":{...}}" |
Likely Source: This appears to be related to the recent changes on the copilot/update-copilot-agent-engine branch, possibly in:
- Workflow compilation logic that generates environment variables
- YAML escaping when embedding JSON in environment variables
- Changes to JSON validation/compaction functions
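Whichever of these is the source, a captured value can be classified to narrow the search. The sketch below distinguishes a valid config, a double-encoded one (a JSON string that itself contains JSON), and one with stray backslashes. The function names and category labels are hypothetical, chosen for illustration, and are not part of gh-aw.

```javascript
// True when the text parses as JSON and the result is an object or array.
function parsesToObject(text) {
  try {
    const v = JSON.parse(text);
    return v !== null && typeof v === "object";
  } catch {
    return false;
  }
}

// Classify a raw config string into one of four illustrative categories.
function classifyConfig(raw) {
  if (parsesToObject(raw)) return "valid";

  // Valid JSON string whose content is itself a JSON object.
  try {
    const outer = JSON.parse(raw);
    if (typeof outer === "string" && parsesToObject(outer)) return "double-encoded";
  } catch {
    // Not JSON at all; try the backslash check below.
  }

  // Drop backslashes that precede quotes or structural characters, then
  // drop one pair of wrapping quotes if present.
  let stripped = raw.replace(/\\(["{}\[\]:,])/g, "$1");
  if (stripped.length >= 2 && stripped.startsWith('"') && stripped.endsWith('"')) {
    stripped = stripped.slice(1, -1);
  }
  if (stripped !== raw && parsesToObject(stripped)) return "backslash-escaped";

  return "invalid";
}

const expectedJson = '{"create_issue":{"max":1,"min":1},"missing_tool":{}}';
console.log(classifyConfig(expectedJson));                 // valid
console.log(classifyConfig(JSON.stringify(expectedJson))); // double-encoded
console.log(classifyConfig("\\" + expectedJson));          // backslash-escaped
```

A "double-encoded" result points at a serializer being applied twice, while "backslash-escaped" points at shell or YAML quoting adding backslashes after serialization.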
Context: GitHub API Authentication Issue
Additionally, the agent encountered GitHub API authentication failures:
401 Bad credentials
Permission denied and could not request permission from user
This is a separate issue (GitHub MCP read-only token configuration), but it is worth noting because it would have prevented the agent from completing its task even if safe-outputs had been available.
Failed Jobs and Errors
Job Sequence
- ✅ activation - succeeded (2s)
- ✅ agent - succeeded (1.1m) - BUT safe-outputs MCP unavailable
- ✅ detection - succeeded (23s)
- ❌ create_issue - failed (6s)
- ⏭️ missing_tool - skipped
Error Summary
From audit report:
- Total Errors: 19
- Total Warnings: 6
Key Errors:
- Safe-outputs config parse error (session log:21)
- MCP connection closed (session log:35)
- Agent output file not found (create_issue job)
- GitHub API 401 errors (agent execution)
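The four key errors each have a distinctive substring in the logs quoted in this report, so a first-pass triage can be done with plain string matching. The sketch below is illustrative only: the signature strings are copied from the excerpts in this report, while the labels and the triage function are hypothetical and do not describe how the audit tooling works.

```javascript
// Signatures copied from the log excerpts in this report. The labels are
// illustrative, not identifiers used by the audit tooling.
const SIGNATURES = [
  { label: "safe-outputs config parse error", match: "Failed to parse GH_AW_SAFE_OUTPUTS_CONFIG" },
  { label: "MCP connection closed", match: "MCP error -32000: Connection closed" },
  { label: "agent output file not found", match: "ENOENT: no such file or directory" },
  { label: "GitHub API 401", match: "401 Bad credentials" },
];

// Count how many log lines contain each signature.
function triage(lines) {
  const counts = Object.fromEntries(SIGNATURES.map((s) => [s.label, 0]));
  for (const line of lines) {
    for (const { label, match } of SIGNATURES) {
      if (line.includes(match)) counts[label] += 1;
    }
  }
  return counts;
}

const sample = [
  "Error: Failed to parse GH_AW_SAFE_OUTPUTS_CONFIG: Unexpected token",
  "2025-10-24T10:58:49.713Z [ERROR] Failed to start MCP client for safe-outputs: McpError: MCP error -32000: Connection closed",
  "Error reading agent output file: ENOENT: no such file or directory,",
  "401 Bad credentials",
];
console.log(triage(sample));
```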
Comparison with Related Patterns
| Pattern | Issue | Similarity | Difference |
|---|---|---|---|
| OPENCODE_NO_SAFE_OUTPUTS | #2143, #2121 | Agent completes but no safe-outputs created | OpenCode: Agent didn't use tools. This: MCP crashed. |
| COPILOT_BASE64_MCP_CONFIG | #2262 | MCP config format issue | That: Base64 vs JSON. This: Malformed JSON escaping. |
| COPILOT_INVALID_JSON_ESCAPED_CHAR | #2270 | JSON escaping issue | That: Bad escape in --additional-mcp-config. This: Bad escape in env var. |
This is a NEW pattern: COPILOT_SAFE_OUTPUTS_MALFORMED_CONFIG
Recommended Actions
Critical Priority ⚠️
- Investigate workflow compilation for GH_AW_SAFE_OUTPUTS_CONFIG generation
  - Location: Likely in pkg/workflow/safe-outputs job builder
  - Check how JSON config is embedded in YAML environment variables
  - Why: This is the likely source of the malformed JSON
- Check recent changes on copilot/update-copilot-agent-engine branch
    git log --oneline -10 copilot/update-copilot-agent-engine -- pkg/workflow/
  - Look for changes to JSON escaping, environment variable generation, or config building
  - Why: Issue appeared on this branch
- Reproduce locally
    # Extract the actual value of GH_AW_SAFE_OUTPUTS_CONFIG, then try to parse it:
    CONFIG="$GH_AW_SAFE_OUTPUTS_CONFIG" node -e 'JSON.parse(process.env.CONFIG)'
    # Identify the exact escaping issue from the parse error
  - Why: Need to see the exact malformed JSON to fix it
High Priority
- Add validation before setting environment variable
    // Before setting GH_AW_SAFE_OUTPUTS_CONFIG:
    var test map[string]interface{}
    if err := json.Unmarshal([]byte(configJSON), &test); err != nil {
        return fmt.Errorf("invalid safe-outputs config JSON: %w", err)
    }
  - Why: Fail fast with clear error during compilation
- Improve safe-outputs MCP server error messages
    // In mcp-server.cjs:
    console.error('[safe-outputs-mcp-server] Raw config value:', process.env.GH_AW_SAFE_OUTPUTS_CONFIG);
    console.error('[safe-outputs-mcp-server] Config length:', process.env.GH_AW_SAFE_OUTPUTS_CONFIG?.length);
  - Why: Better debugging information for future failures
- Add pre-flight check in workflow
    - name: Validate Safe Outputs Config
      run: |
        echo "Config: $GH_AW_SAFE_OUTPUTS_CONFIG"
        echo "$GH_AW_SAFE_OUTPUTS_CONFIG" | jq . || exit 1
  - Why: Catch config issues before agent runs
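The improved error reporting could be wrapped in a small loader. The sketch below is a hedged illustration, not the actual mcp-server.cjs code: loadSafeOutputsConfig is a hypothetical name, and whether a missing config should be fatal is an assumption. It separates "no config" from "bad config" and includes the length and a short escaped preview in the message.

```javascript
// Load the safe-outputs config from an environment-like object.
// Throws distinct errors for "no config" and "bad config".
// ASSUMPTION: a missing config is treated as fatal; the real server may differ.
function loadSafeOutputsConfig(env) {
  const raw = env.GH_AW_SAFE_OUTPUTS_CONFIG;
  if (raw === undefined || raw.trim() === "") {
    throw new Error("GH_AW_SAFE_OUTPUTS_CONFIG is not set or empty");
  }
  try {
    return JSON.parse(raw);
  } catch (err) {
    // JSON.stringify makes backslashes and quotes in the preview visible.
    const preview = JSON.stringify(raw.slice(0, 20));
    throw new Error(
      "GH_AW_SAFE_OUTPUTS_CONFIG is not valid JSON " +
        `(length ${raw.length}, starts with ${preview}): ${err.message}`
    );
  }
}

// Usage: report the failure instead of crashing with a bare stack trace.
try {
  loadSafeOutputsConfig({ GH_AW_SAFE_OUTPUTS_CONFIG: "\\{}" });
} catch (err) {
  console.error("[safe-outputs-mcp-server]", err.message);
}
```

Printing an escaped preview rather than the whole value keeps the log line short while still showing a stray leading backslash.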
Medium Priority
- Fix GitHub API authentication issue
  - Agent received 401 errors when accessing GitHub API
  - Review GitHub MCP token permissions
  - Why: Separate issue but blocks agent functionality
- Review all environment variable JSON embedding
  - Check other places where JSON is embedded in YAML env vars
  - Ensure consistent escaping strategy
  - Why: Prevent similar issues elsewhere
- Add integration test for safe-outputs config
    func TestSafeOutputsConfigEnvVar(t *testing.T) {
        // Generate config as workflow would
        // Parse as JSON to verify validity
        // Test with various config combinations
    }
  - Why: Catch this in CI before deployment
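One consistent escaping strategy is to serialize with a JSON encoder and embed the result as a YAML single-quoted scalar, where backslashes are literal and the only character that needs escaping is the single quote (written twice). The sketch below round-trips several configs through that path. It is an assumption about how the embedding could be done, not a description of the gh-aw compiler; yamlSingleQuote and yamlSingleUnquote are hypothetical helpers, and the unquote step only mirrors what a YAML parser does for single-line single-quoted scalars.

```javascript
// Wrap a string as a YAML single-quoted scalar. In this style backslashes
// are literal and a single quote is written as two single quotes.
function yamlSingleQuote(s) {
  return "'" + s.replace(/'/g, "''") + "'";
}

// Inverse of yamlSingleQuote, for single-line scalars only.
function yamlSingleUnquote(q) {
  if (q.length < 2 || !q.startsWith("'") || !q.endsWith("'")) {
    throw new Error("not a single-quoted scalar");
  }
  return q.slice(1, -1).replace(/''/g, "'");
}

// Round-trip: object -> JSON -> YAML scalar -> JSON -> object.
const configs = [
  { create_issue: { max: 1, min: 1 }, missing_tool: {} },
  {},
  // Hypothetical field, included only to exercise quotes and backslashes.
  { create_issue: { title_prefix: "it's \"quoted\" \\ text" } },
];

for (const cfg of configs) {
  const json = JSON.stringify(cfg);
  const scalar = yamlSingleQuote(json);
  const back = JSON.parse(yamlSingleUnquote(scalar));
  console.log(JSON.stringify(back) === json); // true
}
```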
Prevention Strategies
- JSON Validation Pipeline
  - Validate all JSON strings before embedding in YAML
  - Use Go's json.Marshal() for guaranteed valid JSON
  - Verify with json.Unmarshal() before setting env vars
- Safe Escaping Strategy
    // Use proper YAML string escaping for JSON in env vars:
    // Option 1: Single-line JSON (no quotes needed in value)
    // Option 2: Base64 encode (if safe-outputs MCP supports it)
    // Option 3: Write to file instead of env var
- Early Detection
  - Add workflow pre-flight validation step
  - Test MCP server startup before agent execution
  - Add health check for all MCP servers
- Better Error Messages
  - Log raw config values when parse fails
  - Include config length and preview in error messages
  - Distinguish between "no config" vs "bad config"
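Option 2 of the safe escaping strategy can be sketched as follows. A base64 payload contains no quotes or backslashes, so there is nothing for YAML or the shell to escape. This assumes the safe-outputs MCP server would be changed to decode the value, which the report does not confirm; encodeConfig and decodeConfig are hypothetical names. Since #2262 involved a base64 value arriving where JSON was expected, producer and consumer would need to agree on the encoding explicitly.

```javascript
// Encode a config object as base64 of its compact JSON form.
function encodeConfig(config) {
  return Buffer.from(JSON.stringify(config), "utf8").toString("base64");
}

// Decode and parse; throws if the decoded payload is not valid JSON.
function decodeConfig(encoded) {
  return JSON.parse(Buffer.from(encoded, "base64").toString("utf8"));
}

const encoded = encodeConfig({ create_issue: { max: 1, min: 1 }, missing_tool: {} });
const decoded = decodeConfig(encoded);

// The encoded form uses only the base64 alphabet.
console.log(/^[A-Za-z0-9+\/=]+$/.test(encoded)); // true
console.log(decoded.create_issue.max);           // 1
```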
Technical Details
Environment Context
- Node.js: v24.10.0
- Copilot CLI: 0.0.349 (Commit: 3469b3e)
- Staged Mode: true (GH_AW_SAFE_OUTPUTS_STAGED=true)
- Platform: ubuntu-latest (GitHub Actions runner)
MCP Server Startup Sequence
1. Copilot CLI starts (10:58:49.425Z)
2. GitHub MCP server starts successfully (10:58:49.672Z)
3. Safe-outputs MCP server starts (10:58:49.679Z)
4. Safe-outputs attempts to parse GH_AW_SAFE_OUTPUTS_CONFIG (10:58:49.709Z)
5. JSON parse error → server crashes (10:58:49.709Z)
6. Connection closed (10:58:49.712Z)
7. Copilot CLI reports MCP client failure (10:58:49.713Z)
8. GitHub MCP connects successfully (10:58:49.857Z)
9. Agent proceeds without safe-outputs MCP
Expected Config Format
{
"create_issue": {
"max": 1,
"min": 1
},
"missing_tool": {}
}
Workflow Log Evidence
From workflow-logs/agent/25_Upload Safe Outputs.txt:14:
##[warning]No files were found with the provided path: /tmp/gh-aw/safe-outputs/outputs.jsonl.
No artifacts will be uploaded.
This confirms the agent never created safe-outputs because the MCP server was unavailable.
Historical Context
Similar Issues:
- [smoke-detector] 🔍 Smoke Test Investigation - Smoke OpenCode Run #18722224746: Agent Does Not Use Safe-Outputs MCP Tools #2143 (OpenCode): Agent didn't use safe-outputs tools (closed)
- [smoke-outpost] 🔍 Smoke Test Investigation - Smoke OpenCode: Missing agent_output.json File #2121 (OpenCode): Missing agent_output.json (closed)
- [smoke-detector] 🔍 Smoke Test Investigation - Smoke Copilot: Bad Escaped Character in MCP Config JSON #2270 (Copilot): Bad escaped character in MCP config (open)
- [smoke-detector] 🔍 Smoke Test Investigation - Smoke Claude: Invalid MCP Configuration Argument #2267 (Claude): Invalid MCP configuration argument (open)
Pattern Evolution: This is the 4th MCP configuration issue in the smoke tests, indicating ongoing challenges with:
- JSON escaping across different contexts (CLI args, env vars, files)
- Engine-specific configuration handling
- Workflow compilation and YAML generation
New Pattern: COPILOT_SAFE_OUTPUTS_MALFORMED_CONFIG
- Category: Configuration Error
- Severity: Critical
- First seen: 2025-10-24
- Investigation saved to: /tmp/gh-aw/cache-memory/investigations/2025-10-24-18777690706.json
Related Information
- Branch: copilot/update-copilot-agent-engine (active development)
- Workflow Source: .github/workflows/smoke-copilot.md
- MCP Server: /tmp/gh-aw/safe-outputs/mcp-server.cjs
- Related Issues: #2270, #2267, #2262, #2143, #2121
Investigation Metadata:
- Investigator: Smoke Detector
- Investigation Run: #18777733905
- Pattern ID: COPILOT_SAFE_OUTPUTS_MALFORMED_CONFIG
- Severity: Critical
- Is Flaky: No
- Category: Configuration Error
Labels: smoke-test, investigation, copilot, safe-outputs, configuration, critical, mcp
AI generated by Smoke Detector - Smoke Test Failure Investigator