Skip to content

[refactor] Semantic Function Clustering Analysis: Refactoring Opportunities #2853

@github-actions

Description

@github-actions

🔧 Semantic Function Clustering Analysis

Analysis of repository: githubnext/gh-aw

Analysis performed using Serena semantic code analysis tools on the Go codebase to identify refactoring opportunities through function clustering and duplicate detection.

Executive Summary

Analyzed 173 non-test Go source files across 7 packages, identifying clear patterns of:

  • Near-duplicate functions with >85% similarity requiring consolidation
  • Scattered validation functions across 10+ files instead of centralized validation
  • Parse config functions spread across 17+ files following inconsistent patterns
  • Render MCP config functions duplicated across 3 engine implementations
  • Helper functions scattered when they could be centralized

The codebase is generally well-organized, especially the create_* pattern and engine structure. However, there are specific high-impact opportunities for improving maintainability through consolidation.

Full Report Details

Codebase Structure

By Package

Package Files Primary Purpose
pkg/workflow 104 Workflow execution, compilation, and step generation
pkg/cli 58 CLI commands and utilities
pkg/parser 6 Parsing utilities (YAML, frontmatter, JSON paths)
pkg/console 3 Console output formatting
pkg/constants 1 Application constants
pkg/logger 1 Logging utilities
pkg/timeutil 1 Time formatting utilities

Total: 174 Go files analyzed


Identified Issues

1. Near-Duplicate String Sanitization Functions

Issue: Two very similar string sanitization functions with overlapping logic

Functions

Function 1: SanitizeWorkflowName in pkg/workflow/strings.go:46

func SanitizeWorkflowName(name string) string {
    name = strings.ToLower(name)
    name = strings.ReplaceAll(name, ":", "-")
    name = strings.ReplaceAll(name, "\\", "-")
    name = strings.ReplaceAll(name, "/", "-")
    name = strings.ReplaceAll(name, " ", "-")
    // Replace non-alphanumeric (except . _ -) with "-"
    name = sanitizeNamePattern.ReplaceAllString(name, "-")
    // Consolidate multiple hyphens
    name = multipleHyphens.ReplaceAllString(name, "-")
    return name
}

Function 2: SanitizeIdentifier in pkg/workflow/workflow_name.go:18

func SanitizeIdentifier(name string) string {
    identifier := strings.ToLower(name)
    identifier = strings.ReplaceAll(identifier, " ", "-")
    identifier = strings.ReplaceAll(identifier, "_", "-")
    // Remove non-alphanumeric or hyphens
    identifier = identifierNonAlphanumeric.ReplaceAllString(identifier, "")
    // Consolidate multiple hyphens
    identifier = identifierMultipleHyphens.ReplaceAllString(identifier, "-")
    // Remove leading/trailing hyphens
    identifier = strings.Trim(identifier, "-")
    if identifier == "" {
        return "github-agentic-workflow"
    }
    return identifier
}

Similarity: ~85% similar logic with different edge case handling

Current Usage:

  • SanitizeWorkflowName: Used in pkg/workflow/copilot_engine.go for squid log naming
  • SanitizeIdentifier: Used in pkg/workflow/codex_engine.go for user agent generation

Recommendation:

  • Create unified SanitizeName function with options parameter for different behaviors
  • Alternatively, make SanitizeIdentifier call SanitizeWorkflowName with additional trimming
  • Place in centralized location: pkg/workflow/strings.go

Estimated Impact: Reduced duplication, single source of truth for name sanitization


2. Scattered Validation Functions

Issue: Validation functions distributed across 10+ files instead of centralized

Validation Functions by File

File Validation Functions
pkg/workflow/validation.go 9 validation functions (schema, expressions, containers, features)
pkg/workflow/strict_mode.go 5 validation functions (strict mode checks)
pkg/workflow/pip.go 4 validation functions (Python packages)
pkg/workflow/npm.go 1 validation function (NPX packages)
pkg/workflow/docker.go 1 validation function (Docker images)
pkg/workflow/permissions_validator.go 1 validation function (permissions)
pkg/workflow/mcp-config.go 3 validation functions (MCP configs)
pkg/workflow/expression_safety.go 2 validation functions (expression safety)
pkg/workflow/imports.go 1 validation function (permissions merging)
pkg/workflow/jobs.go 1 validation function (job dependencies)

Examples of Outliers:

  1. Validation in Pip.go:

    • pkg/workflow/pip.go:42: func (*Compiler).validatePythonPackagesWithPip
    • pkg/workflow/pip.go:106: func (*Compiler).validatePipPackages
    • pkg/workflow/pip.go:123: func (*Compiler).validateUvPackages
  2. Validation in NPM.go:

    • pkg/workflow/npm.go:19: func (*Compiler).validateNpxPackages
  3. Validation in Docker.go:

    • pkg/workflow/docker.go:42: func validateDockerImage

Analysis:

  • Package-specific validation functions (pip, npm, docker) are in their domain files, which is reasonable for encapsulation
  • However, this makes it harder to find all validation logic
  • validation.go exists but doesn't consolidate all validation

Recommendation:

  • Option A (Moderate): Keep domain-specific validations in their files but add cross-references in validation.go
  • Option B (Aggressive): Move all validation functions to pkg/workflow/validation/ subpackage organized by domain
    • validation/packages.go - pip, npm, go package validation
    • validation/containers.go - docker validation
    • validation/permissions.go - permission validation
    • validation/expressions.go - expression validation

Estimated Impact: Improved discoverability, easier testing of validation logic


3. Duplicated MCP Config Rendering Functions

Issue: MCP configuration rendering logic duplicated across multiple engine implementations

Render MCP Config Functions

In mcp-config.go (shared):

  • renderPlaywrightMCPConfig (line 23)
  • renderPlaywrightMCPConfigWithOptions (line 27)
  • renderSafeOutputsMCPConfig (line 64)
  • renderSafeOutputsMCPConfigWithOptions (line 68)
  • renderAgenticWorkflowsMCPConfig (line 98)
  • renderAgenticWorkflowsMCPConfigWithOptions (line 102)

In claude_mcp.go (Claude-specific):

  • (*ClaudeEngine).RenderMCPConfig
  • (*ClaudeEngine).renderGitHubClaudeMCPConfig
  • (*ClaudeEngine).renderPlaywrightMCPConfig
  • (*ClaudeEngine).renderClaudeMCPConfig
  • (*ClaudeEngine).renderCacheMemoryMCPConfig
  • (*ClaudeEngine).renderSafeOutputsMCPConfig
  • (*ClaudeEngine).renderAgenticWorkflowsMCPConfig

In copilot_engine.go (Copilot-specific):

  • (*CopilotEngine).RenderMCPConfig
  • (*CopilotEngine).renderGitHubCopilotMCPConfig
  • (*CopilotEngine).renderPlaywrightCopilotMCPConfig
  • (*CopilotEngine).renderSafeOutputsCopilotMCPConfig
  • (*CopilotEngine).renderAgenticWorkflowsCopilotMCPConfig
  • (*CopilotEngine).renderCopilotMCPConfig

In engine_helpers.go:

  • RenderGitHubMCPDockerConfig
  • RenderGitHubMCPRemoteConfig
  • RenderJSONMCPConfig

Analysis:

  • Similar rendering logic for Playwright, SafeOutputs, AgenticWorkflows repeated
  • Some consolidation exists in mcp-config.go but not fully leveraged
  • Each engine needs slight variations but core logic is duplicated

Recommendation:

  • Enhance mcp-config.go to have a more flexible MCPConfigRenderer with strategy pattern
  • Create engine-specific adapters that call shared rendering functions
  • Consolidate TOML vs YAML rendering logic

Example Refactor:

// In mcp-config.go
type MCPConfigFormat int
const (
    MCPFormatClaudeYAML MCPConfigFormat = iota
    MCPFormatCopilotYAML
    MCPFormatTOML
)

func RenderMCPConfigUnified(
    yaml *strings.Builder,
    toolName string,
    config map[string]any,
    format MCPConfigFormat,
    isLast bool,
) error {
    // Unified rendering logic with format-specific branches
}

Estimated Impact: Significant reduction in duplicated rendering code, easier to maintain MCP configs


4. Scattered Parse Config Functions

Issue: Parse configuration functions spread across 17+ files

Parse Config Distribution

File Parse Config Functions
pkg/workflow/tools_types.go 8 (parseGitHubTool, parseBashTool, etc.) ✓ Good
pkg/workflow/config.go 3 (parseLabelsFromConfig, etc.)
pkg/workflow/compiler.go 3 (parseOnSection, parseBaseSafeOutputConfig, etc.)
pkg/workflow/create_*.go 1 each in 6 files (parseIssuesConfig, parsePullRequestsConfig, etc.)
pkg/workflow/add_comment.go 1 (parseCommentsConfig)
pkg/workflow/threat_detection.go 1 (parseThreatDetectionConfig)
pkg/workflow/publish_assets.go 1 (parseUploadAssetConfig)
...and 8 more files 1 each

Analysis:

  • tools_types.go is well-organized with 8 parse functions for tools ✓
  • Each create_*.go file has its own parse config function - this is acceptable as it keeps related code together
  • Generic parsing helpers in config.go could be more discoverable

Recommendation:

  • Keep current organization for create_*.go files (one parse function per file is fine)
  • Consider adding a // Parse Config Functions comment section in config.go to improve discoverability
  • Ensure consistent naming: all should be parse*Config (most already follow this)

Estimated Impact: Low priority - current organization is mostly acceptable


5. Helper Functions Scattered Across Files

Issue: Helper/utility functions distributed rather than centralized

Helper Function Distribution

String/Escape Functions:

  • pkg/workflow/shell.go: shellEscapeArg, shellEscapeCommandString
  • pkg/workflow/redact_secrets.go: escapeSingleQuote
  • pkg/workflow/threat_detection.go: formatStringAsJavaScriptLiteral

Cleanup Functions:

  • pkg/cli/trial_command.go: cleanupTrialRepository, cleanupTrialSecrets, sanitizeRepoSlugForFilename
  • pkg/cli/remove_command.go: cleanupOrphanedIncludes, cleanupAllIncludes
  • pkg/workflow/copilot_engine.go: GetCleanupStep, generateAWFCleanupStep

Format Functions (13 in pkg/console/console.go - ✓ Good centralization):

  • All Format* functions properly centralized

Extract Package Functions:

  • pkg/workflow/pip.go: extractPipPackages, extractPipFromCommands
  • pkg/workflow/npm.go: extractNpxPackages, extractNpxFromCommands
  • pkg/workflow/dependabot.go: extractGoPackages

Analysis:

  • Escape functions serve different contexts but could share common utilities
  • Cleanup functions are command-specific, current organization is reasonable
  • Extract package functions follow good pattern using collectPackagesFromWorkflow

Recommendation:

  • Low Priority: Create pkg/workflow/escaping.go for consolidated escape utilities
  • Consider: Add a pkg/cli/cleanup_utils.go for shared cleanup logic if duplication increases

Estimated Impact: Low - current scatter is mostly justified by context


Detailed Function Clusters

Cluster 1: Creation Functions ✓

Pattern: create_* files with corresponding parse and build functions

Files:

  • pkg/workflow/create_agent_task.go
  • pkg/workflow/create_code_scanning_alert.go
  • pkg/workflow/create_discussion.go
  • pkg/workflow/create_issue.go
  • pkg/workflow/create_pr_review_comment.go
  • pkg/workflow/create_pull_request.go

Analysis: Excellent organization

  • Each output type has its own file
  • Each contains: config struct, parse function, build function
  • Follows consistent naming pattern
  • Easy to find and maintain

Example from create_issue.go:

  • CreateIssuesConfig struct
  • (*Compiler).parseIssuesConfig function
  • (*Compiler).buildCreateOutputIssueJob function

No changes needed - this is a model pattern for the codebase.


Cluster 2: Engine Implementation ✓

Pattern: *_engine.go files implementing CodingAgentEngine interface

Files:

  • pkg/workflow/agentic_engine.go (base + registry)
  • pkg/workflow/claude_engine.go
  • pkg/workflow/codex_engine.go
  • pkg/workflow/copilot_engine.go
  • pkg/workflow/custom_engine.go
  • pkg/workflow/engine.go (interface + base implementation)

Supporting Files:

  • pkg/workflow/engine_helpers.go (shared utilities)
  • pkg/workflow/engine_output.go (output collection)
  • pkg/workflow/engine_firewall_support.go (firewall integration)
  • pkg/workflow/engine_network_hooks.go (network hooks)

Analysis: Good organization

Minor improvement opportunity: Consolidate MCP rendering as discussed in Issue #3


Cluster 3: Compiler Functions

Pattern: compiler*.go files for workflow compilation

Files:

  • pkg/workflow/compiler.go (main compiler, 2000+ lines)
  • pkg/workflow/compiler_jobs.go (job generation)
  • pkg/workflow/compiler_yaml.go (YAML generation)

Analysis: Reasonable split

  • compiler.go handles parsing and validation
  • compiler_jobs.go handles job building
  • compiler_yaml.go handles final YAML output
  • Large but logically organized

No major issues - the compiler is complex and the split makes sense.


Cluster 4: Validation Functions

Pattern: validate* functions across multiple files

See Issue #2 above for full analysis and recommendations.


Cluster 5: Parsing Functions

Pattern: parse* functions for configuration and data extraction

Files with parse functions: 17+ files

Analysis: Mixed

  • Good: Tools parsing in tools_types.go
  • Acceptable: Config parsing in create_*.go files
  • Could improve: Generic parsing helpers could be more centralized

See Issue #4 above for recommendations.


Cluster 6: Rendering Functions

Pattern: render* and generate* functions for YAML/config generation

Major functions:

  • Runtime setup: pkg/workflow/runtime_setup.go

    • GenerateRuntimeSetupSteps
    • generateSetupStep
  • Engine output: pkg/workflow/engine_output.go

    • generateCleanupStep
    • (*Compiler).generateEngineOutputCollection
  • MCP configs: Multiple files (see Issue Add workflow: githubnext/agentics/weekly-research #3)

  • JavaScript formatting: pkg/workflow/js.go

    • FormatJavaScriptForYAML (used 12+ times across codebase ✓)

Analysis:


Refactoring Recommendations

Priority 1: High Impact, Medium Effort

1.1 Consolidate String Sanitization Functions

  • Effort: 2-3 hours
  • Files affected: 2 main files, ~4 call sites
  • Benefits:
    • Single source of truth for name sanitization
    • Easier to test edge cases
    • Reduced maintenance burden

Steps:

  1. Create unified function in pkg/workflow/strings.go:

    type SanitizeOptions struct {
        PreserveSpecialChars []rune  // chars to preserve (e.g., '.', '_')
        TrimHyphens          bool     // trim leading/trailing hyphens
        DefaultValue         string   // value if empty after sanitization
    }
    
    func SanitizeName(name string, opts *SanitizeOptions) string {
        // Unified implementation
    }
  2. Replace SanitizeWorkflowName implementation:

    func SanitizeWorkflowName(name string) string {
        return SanitizeName(name, &SanitizeOptions{
            PreserveSpecialChars: []rune{'.', '_', '-'},
        })
    }
  3. Replace SanitizeIdentifier implementation:

    func SanitizeIdentifier(name string) string {
        return SanitizeName(name, &SanitizeOptions{
            TrimHyphens:  true,
            DefaultValue: "github-agentic-workflow",
        })
    }
  4. Update tests in both strings_test.go and verify callers

1.2 Consolidate MCP Config Rendering

  • Effort: 6-8 hours
  • Files affected: mcp-config.go, claude_mcp.go, copilot_engine.go, codex_engine.go
  • Benefits:
    • 30-40% reduction in MCP rendering code
    • Single source of truth for MCP config format
    • Easier to add new MCP tools
    • Easier to support new engines

Steps:

  1. Enhance MCPConfigRenderer struct in mcp-config.go
  2. Create engine-specific rendering options
  3. Refactor Claude engine to use shared renderers
  4. Refactor Copilot engine to use shared renderers
  5. Update tests

Priority 2: Medium Impact, Low-Medium Effort

2.1 Improve Validation Function Discoverability

  • Effort: 3-4 hours
  • Files affected: Documentation and organization
  • Benefits: Easier to find and understand validation logic

Steps:

  1. Add comprehensive doc comment to validation.go listing all validation functions and their locations
  2. Consider creating pkg/workflow/validation/ subpackage (if desired)
  3. Add cross-reference comments in domain files pointing to related validation

Example addition to validation.go:

// Package workflow provides validation functions for workflow compilation.
//
// Validation Functions by Category:
//
// # Core Validations (this file)
//   - validateExpressionSizes: Checks expression length limits
//   - validateContainerImages: Validates Docker container images
//   - validateRuntimePackages: Validates runtime package declarations
//   - validateGitHubActionsSchema: Validates against GitHub Actions schema
//   - validateRepositoryFeatures: Checks repository features (discussions, issues)
//
// # Package Validations (domain-specific files)
//   - validatePipPackages (pip.go): Python pip package validation
//   - validateNpxPackages (npm.go): NPX package validation  
//   - validateDockerImage (docker.go): Docker image validation
//
// # Permission Validations
//   - ValidatePermissions (permissions_validator.go): Permission scope validation
//   - ValidatePermissions (imports.go): Imported permissions merging
//
// # Configuration Validations
//   - ValidateMCPConfigs (mcp-config.go): MCP server config validation
//
// # Safety Validations
//   - validateExpressionSafety (expression_safety.go): Expression injection checks
//   - validateStrictMode (strict_mode.go): Strict mode compliance

2.2 Standardize Parse Config Function Naming

  • Effort: 1-2 hours (verification and documentation)
  • Files affected: Review 17+ files
  • Benefits: Consistent naming makes functions easier to find

Steps:

  1. Audit all parse functions to ensure consistent parse*Config naming
  2. Document the pattern in a code style guide or CONTRIBUTING.md
  3. No code changes likely needed - most already follow convention

Priority 3: Nice-to-Have, Future Improvements

3.1 Consider Generics for Extract Package Functions

  • Effort: 4-5 hours
  • Applicability: Limited (Go 1.18+ required)
  • Benefits: Type-safe extraction with less boilerplate

Current Pattern:

func extractPipPackages(workflowData *WorkflowData) []string {
    return collectPackagesFromWorkflow(workflowData, extractPipFromCommands, "")
}

func extractNpxPackages(workflowData *WorkflowData) []string {
    return collectPackagesFromWorkflow(workflowData, extractNpxFromCommands, "npx")
}

Potential Generic Approach: (evaluate if worth the complexity)

type PackageExtractor[T any] interface {
    Extract(workflowData *WorkflowData) []T
}

Decision: Low priority - current approach is clear and works well


Implementation Checklist

Phase 1: High-Impact Quick Wins

  • Review and approve Priority 1 recommendations
  • Create feature branch: refactor/function-clustering-consolidation
  • Implement: Consolidate string sanitization functions (1.1)
  • Run tests: Verify no regressions
  • Code review and merge

Phase 2: MCP Config Consolidation

  • Design: Create unified MCP rendering architecture
  • Implement: Enhanced MCPConfigRenderer (1.2)
  • Refactor: Claude engine MCP rendering
  • Refactor: Copilot engine MCP rendering
  • Run tests: Verify all engines work correctly
  • Code review and merge

Phase 3: Validation Organization

  • Document: Add validation function catalog to validation.go (2.1)
  • Review: Consider validation subpackage if team prefers
  • Implement: Chosen validation organization approach
  • Update: Developer documentation

Phase 4: Future Improvements

  • Consider: Parse config standardization (2.2)
  • Evaluate: Generic package extractors (3.1)
  • Monitor: Helper function duplication for future consolidation

Analysis Metadata

  • Total Go Files Analyzed: 173 (non-test files)
  • Total Functions Cataloged: ~800+ functions
  • Function Clusters Identified: 6 major clusters
  • High-Priority Issues Found: 2 (string sanitization, MCP rendering)
  • Medium-Priority Issues Found: 2 (validation discoverability, parse config naming)
  • Low-Priority Issues Found: 1 (helper function scatter)
  • Detection Method: Serena semantic code analysis + pattern matching
  • Analysis Date: 2025-10-31
  • Analysis Tool: Claude Code + Serena MCP Server (gopls language service)

Conclusion

The gh-aw codebase demonstrates good overall organization, particularly in:

  • ✅ Create function pattern (create_*.go files)
  • ✅ Engine implementation structure
  • ✅ JavaScript formatting centralization
  • ✅ Tools parsing organization

Key improvements identified:

  1. Consolidate near-duplicate sanitization functions (highest impact)
  2. Reduce MCP rendering duplication across engines (high impact)
  3. Improve validation function discoverability (medium impact)

The recommendations focus on high-impact, practical refactoring that will improve maintainability without disrupting the existing architecture. The codebase doesn't require major restructuring - these are targeted improvements to reduce duplication and improve discoverability.


Generated by: gh-aw semantic analysis workflow
Workflow Run: §18966551002

AI generated by Semantic Function Refactoring

Sub-issues

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions