[smoke-detector] 🔍 Smoke Test Investigation - Smoke Copilot: Safe-Outputs MCP Crashes Due to Malformed Config JSON #2280

@github-actions

🔍 Smoke Test Investigation - Run #18777690706

Summary

The Smoke Copilot workflow failed because the safe-outputs MCP server crashed during startup: the GH_AW_SAFE_OUTPUTS_CONFIG environment variable contained malformed JSON. The value began with an unexpected backslash that the JSON parser rejected, so the MCP server exited before the Copilot CLI could connect to it. As a result, the agent job itself succeeded but had no access to safe-outputs tools, and the downstream create_issue job failed when it couldn't find agent_output.json.

Failure Details

  • Run: #18777690706
  • Commit: d37e7de
  • Branch: copilot/update-copilot-agent-engine
  • Trigger: workflow_dispatch
  • Duration: 2.2 minutes
  • Failed Jobs: create_issue (6s duration)

Root Cause Analysis

Primary Error Chain

1. Safe-Outputs MCP Server Startup Failure

From /tmp/gh-aw/aw-mcp/logs/run-18777690706/session-d7c597ca-46fe-4189-906e-b639a440aeca.log:21:

Error: Failed to parse GH_AW_SAFE_OUTPUTS_CONFIG: Unexpected token '\', "\{"create_"... is not valid JSON
    at Object.<anonymous> (/tmp/gh-aw/safe-outputs/mcp-server.cjs:53:13)

Environment Variable Details:

  • Variable: GH_AW_SAFE_OUTPUTS_CONFIG
  • Length: 53 characters
  • Expected: {"create_issue":{"max":1,"min":1},"missing_tool":{}}
  • Actual Preview: "\{"create_"... (starts with a backslash before the opening brace)
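
The parse failure can be reproduced in isolation. The sketch below is an assumption-based reconstruction, not the captured value: it takes the expected config and prepends a single backslash, which gives a 53-character string whose first ten characters match the logged preview. The names `expectedValue`, `malformedValue`, and `tryParse` are illustrative only.

```javascript
// Expected config, copied from the report (52 characters).
const expectedValue = '{"create_issue":{"max":1,"min":1},"missing_tool":{}}';

// ASSUMPTION: the real value was the expected JSON with one stray leading
// backslash. This matches the logged length (53) and preview, but the full
// value was never captured in the logs.
const malformedValue = "\\" + expectedValue;

// Parse without throwing so both outcomes can be inspected side by side.
function tryParse(raw) {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

const goodResult = tryParse(expectedValue);
const badResult = tryParse(malformedValue);

console.log(expectedValue.length, malformedValue.length); // 52 53
console.log(goodResult.ok, badResult.ok); // true false
console.log(badResult.error); // exact wording depends on the Node.js version
```

If the reconstruction is right, a single extra character at position 0 is enough to make the whole value unparseable, which is why the server exited before registering any tools.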

2. MCP Client Connection Failure

2025-10-24T10:58:49.713Z [ERROR] Failed to start MCP client for safe-outputs: McpError: MCP error -32000: Connection closed

The Copilot CLI attempted to connect to the safe-outputs MCP server, but the server had already crashed, resulting in a closed connection.

3. Agent Runs Without Safe-Outputs

The agent job succeeded (1.1m duration) but had no access to safe-outputs tools. The agent attempted to complete its task (reviewing last 5 merged PRs) but encountered authentication issues and couldn't use the create_issue tool.

4. Downstream Job Failure

The create_issue job failed with:

Error reading agent output file: ENOENT: no such file or directory, 
open '/tmp/gh-aw/safe-outputs/agent_output.json'

Investigation Findings

The Configuration Bug

The GH_AW_SAFE_OUTPUTS_CONFIG environment variable is being generated with improper JSON escaping. The variable shows:

  • Config length: 53 characters
  • Preview: "\{"create_"...

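This suggests the JSON is being double-escaped or incorrectly quoted somewhere before it reaches the server, producing a value that the Node.js JSON parser rejects. The reported length of 53 is one more than the 52-character expected config, which is consistent with a single stray backslash at the start rather than escaping throughout the value.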

Expected vs Actual:

Expected               | Actual (Malformed)
-----------------------|------------------------
{"create_issue":{...}} | \{"create_issue":{...}}

Likely Source: This appears to be related to the recent changes on the copilot/update-copilot-agent-engine branch, possibly in:

  • Workflow compilation logic that generates environment variables
  • YAML escaping when embedding JSON in environment variables
  • Changes to JSON validation/compaction functions

Context: GitHub API Authentication Issue

Additionally, the agent encountered GitHub API authentication failures:

401 Bad credentials
Permission denied and could not request permission from user

This is a separate issue (GitHub MCP read-only token configuration), but it is worth noting because it would have prevented the agent from completing its task even if safe-outputs had been available.

Failed Jobs and Errors

Job Sequence

  1. activation - succeeded (2s)
  2. agent - succeeded (1.1m) - BUT safe-outputs MCP unavailable
  3. detection - succeeded (23s)
  4. create_issue - failed (6s)
  5. ⏭️ missing_tool - skipped

Error Summary

From audit report:

  • Total Errors: 19
  • Total Warnings: 6

Key Errors:

  1. Safe-outputs config parse error (session log:21)
  2. MCP connection closed (session log:35)
  3. Agent output file not found (create_issue job)
  4. GitHub API 401 errors (agent execution)

Comparison with Related Patterns

Pattern                           | Issue        | Similarity                                  | Difference
----------------------------------|--------------|---------------------------------------------|-----------
OPENCODE_NO_SAFE_OUTPUTS          | #2143, #2121 | Agent completes but no safe-outputs created | OpenCode: Agent didn't use tools. This: MCP crashed.
COPILOT_BASE64_MCP_CONFIG         | #2262        | MCP config format issue                     | That: Base64 vs JSON. This: Malformed JSON escaping.
COPILOT_INVALID_JSON_ESCAPED_CHAR | #2270        | JSON escaping issue                         | That: Bad escape in --additional-mcp-config. This: Bad escape in env var.

This is a NEW pattern: COPILOT_SAFE_OUTPUTS_MALFORMED_CONFIG

Recommended Actions

Critical Priority ⚠️

  • Investigate workflow compilation for GH_AW_SAFE_OUTPUTS_CONFIG generation

    • Location: Likely in pkg/workflow/ safe-outputs job builder
    • Check how JSON config is embedded in YAML environment variables
    • Why: This is the source of the malformed JSON
  • Check recent changes on copilot/update-copilot-agent-engine branch

    git log --oneline -10 copilot/update-copilot-agent-engine -- pkg/workflow/
    • Look for changes to JSON escaping, environment variable generation, or config building
    • Why: Issue appeared on this branch
  • Reproduce locally

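    # Export the actual value captured from the failing run (placeholder below)
    export GH_AW_SAFE_OUTPUTS_CONFIG='<value from failing run>'
    # Attempt to parse it; the SyntaxError message identifies the exact escaping issue
    node -e 'JSON.parse(process.env.GH_AW_SAFE_OUTPUTS_CONFIG)'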
    • Why: Need to see the exact malformed JSON to fix it
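
Once the actual value has been captured, locating the exact escaping issue is mostly a matter of finding where it diverges from the expected config. The helper below is a minimal sketch of that comparison. `firstDifference` and `describeChar` are hypothetical names, and `actualConfig` is an assumed reconstruction (expected JSON with one leading backslash), not the value from the run.

```javascript
// Return the index of the first differing character, or -1 if identical.
// If one string is a prefix of the other, the shorter length is returned.
function firstDifference(expected, actual) {
  const limit = Math.min(expected.length, actual.length);
  for (let i = 0; i < limit; i++) {
    if (expected[i] !== actual[i]) return i;
  }
  return expected.length === actual.length ? -1 : limit;
}

// Render one character with its code point so invisible or escaped
// characters are unambiguous in log output.
function describeChar(text, index) {
  if (index < 0 || index >= text.length) return "(end of string)";
  const code = text.charCodeAt(index).toString(16).toUpperCase().padStart(4, "0");
  return `${JSON.stringify(text[index])} (U+${code})`;
}

const expectedConfig = '{"create_issue":{"max":1,"min":1},"missing_tool":{}}';
// ASSUMPTION: stand-in for the captured value; replace with the real one.
const actualConfig = "\\" + expectedConfig;

const diffIndex = firstDifference(expectedConfig, actualConfig);
console.log("first difference at index", diffIndex);
console.log("expected:", describeChar(expectedConfig, diffIndex));
console.log("actual:  ", describeChar(actualConfig, diffIndex));
```

Printing the code point avoids ambiguity between a backslash that is really in the value and one added by whatever tool displays it.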

High Priority

  • Add validation before setting environment variable

    // Before setting GH_AW_SAFE_OUTPUTS_CONFIG:
    var test map[string]interface{}
    if err := json.Unmarshal([]byte(configJSON), &test); err != nil {
        return fmt.Errorf("invalid safe-outputs config JSON: %w", err)
    }
    • Why: Fail fast with clear error during compilation
  • Improve safe-outputs MCP server error messages

    // In mcp-server.cjs:
    console.error('[safe-outputs-mcp-server] Raw config value:', process.env.GH_AW_SAFE_OUTPUTS_CONFIG);
    console.error('[safe-outputs-mcp-server] Config length:', process.env.GH_AW_SAFE_OUTPUTS_CONFIG?.length);
    • Why: Better debugging information for future failures
  • Add pre-flight check in workflow

    - name: Validate Safe Outputs Config
      run: |
        echo "Config: $GH_AW_SAFE_OUTPUTS_CONFIG"
        echo "$GH_AW_SAFE_OUTPUTS_CONFIG" | jq . || exit 1
    • Why: Catch config issues before agent runs

Medium Priority

  • Fix GitHub API authentication issue

    • Agent received 401 errors when accessing GitHub API
    • Review GitHub MCP token permissions
    • Why: Separate issue but blocks agent functionality
  • Review all environment variable JSON embedding

    • Check other places where JSON is embedded in YAML env vars
    • Ensure consistent escaping strategy
    • Why: Prevent similar issues elsewhere
  • Add integration test for safe-outputs config

    func TestSafeOutputsConfigEnvVar(t *testing.T) {
        // Generate config as workflow would
        // Parse as JSON to verify validity
        // Test with various config combinations
    }
    • Why: Catch this in CI before deployment
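
The round-trip property that such a test would check can be sketched in Node.js as well, since that is the runtime that ultimately parses the value. This is only an illustration of the idea: `buildConfigEnvValue` is a hypothetical stand-in for the compiler step, the variable name and the `title_prefix` key are made up for the example, and the sketch covers only the JSON and environment-variable leg. A real test would also need to pass the value through the same YAML generation the workflow compiler uses, since that is where the escaping is suspected to go wrong.

```javascript
// Hypothetical stand-in for the compiler step that produces the env value.
function buildConfigEnvValue(config) {
  return JSON.stringify(config);
}

// Store the value in an environment variable, read it back, and check that
// parsing yields the same structure that went in.
function survivesEnvRoundTrip(config) {
  const name = "EXAMPLE_SAFE_OUTPUTS_CONFIG"; // illustrative variable name
  process.env[name] = buildConfigEnvValue(config);
  try {
    const parsed = JSON.parse(process.env[name]);
    return JSON.stringify(parsed) === JSON.stringify(config);
  } catch {
    return false;
  } finally {
    delete process.env[name];
  }
}

// A few config combinations, including characters that commonly break escaping.
const sampleConfigs = [
  { create_issue: { max: 1, min: 1 }, missing_tool: {} },
  { missing_tool: {} },
  { create_issue: { title_prefix: 'has "quotes" and \\ backslash' } },
];

const roundTripResults = sampleConfigs.map(survivesEnvRoundTrip);
console.log(roundTripResults);
```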

Prevention Strategies

  1. JSON Validation Pipeline

    • Validate all JSON strings before embedding in YAML
    • Use Go's json.Marshal() for guaranteed valid JSON
    • Verify with json.Unmarshal() before setting env vars
  2. Safe Escaping Strategy

    // Use proper YAML string escaping for JSON in env vars:
    // Option 1: Single-line JSON (no quotes needed in value)
    // Option 2: Base64 encode (if safe-outputs MCP supports it)
    // Option 3: Write to file instead of env var
  3. Early Detection

    • Add workflow pre-flight validation step
    • Test MCP server startup before agent execution
    • Add health check for all MCP servers
  4. Better Error Messages

    • Log raw config values when parse fails
    • Include config length and preview in error messages
    • Distinguish between "no config" vs "bad config"
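
The "Better Error Messages" points could look roughly like the loader below. It is a sketch under assumptions, not the code in mcp-server.cjs: `loadSafeOutputsConfig`, the `status` values, and the 20-character preview length are all invented for illustration. It separates a missing variable from an unparseable one and reports length and a preview for the latter.

```javascript
// Classify the config as "missing", "invalid", or "ok" instead of throwing,
// so the caller can log a precise message before deciding whether to exit.
function loadSafeOutputsConfig(env) {
  const raw = env.GH_AW_SAFE_OUTPUTS_CONFIG;
  if (raw === undefined || raw.trim() === "") {
    return { status: "missing", message: "GH_AW_SAFE_OUTPUTS_CONFIG is not set or is empty" };
  }

  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    // JSON.stringify makes stray backslashes and quotes visible in the preview.
    const preview = JSON.stringify(raw.slice(0, 20));
    return {
      status: "invalid",
      message:
        `GH_AW_SAFE_OUTPUTS_CONFIG is not valid JSON ` +
        `(length ${raw.length}, starts with ${preview}): ${err.message}`,
    };
  }

  // Valid JSON that is not an object (e.g. a number or array) is still unusable.
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { status: "invalid", message: "GH_AW_SAFE_OUTPUTS_CONFIG must be a JSON object" };
  }
  return { status: "ok", config: parsed };
}

const result = loadSafeOutputsConfig(process.env);
if (result.status !== "ok") {
  console.error(`[safe-outputs-mcp-server] ${result.message}`);
}
```

Returning a status rather than throwing leaves the exit policy to the caller, so a missing config and a corrupted one can be handled differently if that turns out to be useful.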

Technical Details

Environment Context

  • Node.js: v24.10.0
  • Copilot CLI: 0.0.349 (Commit: 3469b3e)
  • Staged Mode: true (GH_AW_SAFE_OUTPUTS_STAGED=true)
  • Platform: ubuntu-latest (GitHub Actions runner)

MCP Server Startup Sequence

1. Copilot CLI starts (10:58:49.425Z)
2. GitHub MCP server starts successfully (10:58:49.672Z)
3. Safe-outputs MCP server starts (10:58:49.679Z)
4. Safe-outputs attempts to parse GH_AW_SAFE_OUTPUTS_CONFIG (10:58:49.709Z)
5. JSON parse error → server crashes (10:58:49.709Z)
6. Connection closed (10:58:49.712Z)
7. Copilot CLI reports MCP client failure (10:58:49.713Z)
8. GitHub MCP connects successfully (10:58:49.857Z)
9. Agent proceeds without safe-outputs MCP

Expected Config Format

{
  "create_issue": {
    "max": 1,
    "min": 1
  },
  "missing_tool": {}
}

Workflow Log Evidence

From workflow-logs/agent/25_Upload Safe Outputs.txt:14:

##[warning]No files were found with the provided path: /tmp/gh-aw/safe-outputs/outputs.jsonl. 
No artifacts will be uploaded.

This confirms the agent never created safe-outputs because the MCP server was unavailable.

Historical Context

Similar Issues:

Pattern Evolution: This is the 4th MCP configuration issue in the smoke tests, indicating ongoing challenges with:

  • JSON escaping across different contexts (CLI args, env vars, files)
  • Engine-specific configuration handling
  • Workflow compilation and YAML generation

New Pattern: COPILOT_SAFE_OUTPUTS_MALFORMED_CONFIG

  • Category: Configuration Error
  • Severity: Critical
  • First seen: 2025-10-24
  • Investigation saved to: /tmp/gh-aw/cache-memory/investigations/2025-10-24-18777690706.json

Related Information


Investigation Metadata:

  • Investigator: Smoke Detector
  • Investigation Run: #18777733905
  • Pattern ID: COPILOT_SAFE_OUTPUTS_MALFORMED_CONFIG
  • Severity: Critical
  • Is Flaky: No
  • Category: Configuration Error

Labels: smoke-test, investigation, copilot, safe-outputs, configuration, critical, mcp

AI generated by Smoke Detector - Smoke Test Failure Investigator
