-
Notifications
You must be signed in to change notification settings - Fork 23
Description
🔧 Semantic Function Clustering Analysis
Analysis of repository: githubnext/gh-aw
Analysis performed using Serena semantic code analysis tools on the Go codebase to identify refactoring opportunities through function clustering and duplicate detection.
Executive Summary
Analyzed 173 non-test Go source files across 7 packages, identifying clear patterns of:
- Near-duplicate functions with >85% similarity requiring consolidation
- Scattered validation functions across 10+ files instead of centralized validation
- Parse config functions spread across 17+ files following inconsistent patterns
- Render MCP config functions duplicated across 3 engine implementations
- Helper functions scattered when they could be centralized
The codebase is generally well-organized, especially the create_* pattern and engine structure. However, there are specific high-impact opportunities for improving maintainability through consolidation.
Full Report Details
Codebase Structure
By Package
| Package | Files | Primary Purpose |
|---|---|---|
| pkg/workflow | 104 | Workflow execution, compilation, and step generation |
| pkg/cli | 58 | CLI commands and utilities |
| pkg/parser | 6 | Parsing utilities (YAML, frontmatter, JSON paths) |
| pkg/console | 3 | Console output formatting |
| pkg/constants | 1 | Application constants |
| pkg/logger | 1 | Logging utilities |
| pkg/timeutil | 1 | Time formatting utilities |
Total: 174 Go files analyzed
Identified Issues
1. Near-Duplicate String Sanitization Functions
Issue: Two very similar string sanitization functions with overlapping logic
Functions
Function 1: SanitizeWorkflowName in pkg/workflow/strings.go:46
func SanitizeWorkflowName(name string) string {
name = strings.ToLower(name)
name = strings.ReplaceAll(name, ":", "-")
name = strings.ReplaceAll(name, "\\", "-")
name = strings.ReplaceAll(name, "/", "-")
name = strings.ReplaceAll(name, " ", "-")
// Replace non-alphanumeric (except . _ -) with "-"
name = sanitizeNamePattern.ReplaceAllString(name, "-")
// Consolidate multiple hyphens
name = multipleHyphens.ReplaceAllString(name, "-")
return name
}Function 2: SanitizeIdentifier in pkg/workflow/workflow_name.go:18
func SanitizeIdentifier(name string) string {
identifier := strings.ToLower(name)
identifier = strings.ReplaceAll(identifier, " ", "-")
identifier = strings.ReplaceAll(identifier, "_", "-")
// Remove non-alphanumeric or hyphens
identifier = identifierNonAlphanumeric.ReplaceAllString(identifier, "")
// Consolidate multiple hyphens
identifier = identifierMultipleHyphens.ReplaceAllString(identifier, "-")
// Remove leading/trailing hyphens
identifier = strings.Trim(identifier, "-")
if identifier == "" {
return "github-agentic-workflow"
}
return identifier
}Similarity: ~85% similar logic with different edge case handling
Current Usage:
SanitizeWorkflowName: Used inpkg/workflow/copilot_engine.gofor squid log namingSanitizeIdentifier: Used inpkg/workflow/codex_engine.gofor user agent generation
Recommendation:
- Create unified
SanitizeNamefunction with options parameter for different behaviors - Alternatively, make
SanitizeIdentifiercallSanitizeWorkflowNamewith additional trimming - Place in centralized location:
pkg/workflow/strings.go
Estimated Impact: Reduced duplication, single source of truth for name sanitization
2. Scattered Validation Functions
Issue: Validation functions distributed across 10+ files instead of centralized
Validation Functions by File
| File | Validation Functions |
|---|---|
pkg/workflow/validation.go |
9 validation functions (schema, expressions, containers, features) |
pkg/workflow/strict_mode.go |
5 validation functions (strict mode checks) |
pkg/workflow/pip.go |
4 validation functions (Python packages) |
pkg/workflow/npm.go |
1 validation function (NPX packages) |
pkg/workflow/docker.go |
1 validation function (Docker images) |
pkg/workflow/permissions_validator.go |
1 validation function (permissions) |
pkg/workflow/mcp-config.go |
3 validation functions (MCP configs) |
pkg/workflow/expression_safety.go |
2 validation functions (expression safety) |
pkg/workflow/imports.go |
1 validation function (permissions merging) |
pkg/workflow/jobs.go |
1 validation function (job dependencies) |
Examples of Outliers:
-
Validation in Pip.go:
pkg/workflow/pip.go:42:func (*Compiler).validatePythonPackagesWithPippkg/workflow/pip.go:106:func (*Compiler).validatePipPackagespkg/workflow/pip.go:123:func (*Compiler).validateUvPackages
-
Validation in NPM.go:
pkg/workflow/npm.go:19:func (*Compiler).validateNpxPackages
-
Validation in Docker.go:
pkg/workflow/docker.go:42:func validateDockerImage
Analysis:
- Package-specific validation functions (pip, npm, docker) are in their domain files, which is reasonable for encapsulation
- However, this makes it harder to find all validation logic
validation.goexists but doesn't consolidate all validation
Recommendation:
- Option A (Moderate): Keep domain-specific validations in their files but add cross-references in
validation.go - Option B (Aggressive): Move all validation functions to
pkg/workflow/validation/subpackage organized by domainvalidation/packages.go- pip, npm, go package validationvalidation/containers.go- docker validationvalidation/permissions.go- permission validationvalidation/expressions.go- expression validation
Estimated Impact: Improved discoverability, easier testing of validation logic
3. Duplicated MCP Config Rendering Functions
Issue: MCP configuration rendering logic duplicated across multiple engine implementations
Render MCP Config Functions
In mcp-config.go (shared):
renderPlaywrightMCPConfig(line 23)renderPlaywrightMCPConfigWithOptions(line 27)renderSafeOutputsMCPConfig(line 64)renderSafeOutputsMCPConfigWithOptions(line 68)renderAgenticWorkflowsMCPConfig(line 98)renderAgenticWorkflowsMCPConfigWithOptions(line 102)
In claude_mcp.go (Claude-specific):
(*ClaudeEngine).RenderMCPConfig(*ClaudeEngine).renderGitHubClaudeMCPConfig(*ClaudeEngine).renderPlaywrightMCPConfig(*ClaudeEngine).renderClaudeMCPConfig(*ClaudeEngine).renderCacheMemoryMCPConfig(*ClaudeEngine).renderSafeOutputsMCPConfig(*ClaudeEngine).renderAgenticWorkflowsMCPConfig
In copilot_engine.go (Copilot-specific):
(*CopilotEngine).RenderMCPConfig(*CopilotEngine).renderGitHubCopilotMCPConfig(*CopilotEngine).renderPlaywrightCopilotMCPConfig(*CopilotEngine).renderSafeOutputsCopilotMCPConfig(*CopilotEngine).renderAgenticWorkflowsCopilotMCPConfig(*CopilotEngine).renderCopilotMCPConfig
In engine_helpers.go:
RenderGitHubMCPDockerConfigRenderGitHubMCPRemoteConfigRenderJSONMCPConfig
Analysis:
- Similar rendering logic for Playwright, SafeOutputs, AgenticWorkflows repeated
- Some consolidation exists in
mcp-config.gobut not fully leveraged - Each engine needs slight variations but core logic is duplicated
Recommendation:
- Enhance
mcp-config.goto have a more flexibleMCPConfigRendererwith strategy pattern - Create engine-specific adapters that call shared rendering functions
- Consolidate TOML vs YAML rendering logic
Example Refactor:
// In mcp-config.go
type MCPConfigFormat int
const (
MCPFormatClaudeYAML MCPConfigFormat = iota
MCPFormatCopilotYAML
MCPFormatTOML
)
func RenderMCPConfigUnified(
yaml *strings.Builder,
toolName string,
config map[string]any,
format MCPConfigFormat,
isLast bool,
) error {
// Unified rendering logic with format-specific branches
}Estimated Impact: Significant reduction in duplicated rendering code, easier to maintain MCP configs
4. Scattered Parse Config Functions
Issue: Parse configuration functions spread across 17+ files
Parse Config Distribution
| File | Parse Config Functions |
|---|---|
pkg/workflow/tools_types.go |
8 (parseGitHubTool, parseBashTool, etc.) ✓ Good |
pkg/workflow/config.go |
3 (parseLabelsFromConfig, etc.) |
pkg/workflow/compiler.go |
3 (parseOnSection, parseBaseSafeOutputConfig, etc.) |
pkg/workflow/create_*.go |
1 each in 6 files (parseIssuesConfig, parsePullRequestsConfig, etc.) |
pkg/workflow/add_comment.go |
1 (parseCommentsConfig) |
pkg/workflow/threat_detection.go |
1 (parseThreatDetectionConfig) |
pkg/workflow/publish_assets.go |
1 (parseUploadAssetConfig) |
| ...and 8 more files | 1 each |
Analysis:
tools_types.gois well-organized with 8 parse functions for tools ✓- Each
create_*.gofile has its own parse config function - this is acceptable as it keeps related code together - Generic parsing helpers in
config.gocould be more discoverable
Recommendation:
- Keep current organization for
create_*.gofiles (one parse function per file is fine) - Consider adding a
// Parse Config Functionscomment section inconfig.goto improve discoverability - Ensure consistent naming: all should be
parse*Config(most already follow this)
Estimated Impact: Low priority - current organization is mostly acceptable
5. Helper Functions Scattered Across Files
Issue: Helper/utility functions distributed rather than centralized
Helper Function Distribution
String/Escape Functions:
pkg/workflow/shell.go:shellEscapeArg,shellEscapeCommandStringpkg/workflow/redact_secrets.go:escapeSingleQuotepkg/workflow/threat_detection.go:formatStringAsJavaScriptLiteral
Cleanup Functions:
pkg/cli/trial_command.go:cleanupTrialRepository,cleanupTrialSecrets,sanitizeRepoSlugForFilenamepkg/cli/remove_command.go:cleanupOrphanedIncludes,cleanupAllIncludespkg/workflow/copilot_engine.go:GetCleanupStep,generateAWFCleanupStep
Format Functions (13 in pkg/console/console.go - ✓ Good centralization):
- All Format* functions properly centralized
Extract Package Functions:
pkg/workflow/pip.go:extractPipPackages,extractPipFromCommandspkg/workflow/npm.go:extractNpxPackages,extractNpxFromCommandspkg/workflow/dependabot.go:extractGoPackages
Analysis:
- Escape functions serve different contexts but could share common utilities
- Cleanup functions are command-specific, current organization is reasonable
- Extract package functions follow good pattern using
collectPackagesFromWorkflow
Recommendation:
- Low Priority: Create
pkg/workflow/escaping.gofor consolidated escape utilities - Consider: Add a
pkg/cli/cleanup_utils.gofor shared cleanup logic if duplication increases
Estimated Impact: Low - current scatter is mostly justified by context
Detailed Function Clusters
Cluster 1: Creation Functions ✓
Pattern: create_* files with corresponding parse and build functions
Files:
pkg/workflow/create_agent_task.gopkg/workflow/create_code_scanning_alert.gopkg/workflow/create_discussion.gopkg/workflow/create_issue.gopkg/workflow/create_pr_review_comment.gopkg/workflow/create_pull_request.go
Analysis: Excellent organization ✓
- Each output type has its own file
- Each contains: config struct, parse function, build function
- Follows consistent naming pattern
- Easy to find and maintain
Example from create_issue.go:
CreateIssuesConfigstruct(*Compiler).parseIssuesConfigfunction(*Compiler).buildCreateOutputIssueJobfunction
No changes needed - this is a model pattern for the codebase.
Cluster 2: Engine Implementation ✓
Pattern: *_engine.go files implementing CodingAgentEngine interface
Files:
pkg/workflow/agentic_engine.go(base + registry)pkg/workflow/claude_engine.gopkg/workflow/codex_engine.gopkg/workflow/copilot_engine.gopkg/workflow/custom_engine.gopkg/workflow/engine.go(interface + base implementation)
Supporting Files:
pkg/workflow/engine_helpers.go(shared utilities)pkg/workflow/engine_output.go(output collection)pkg/workflow/engine_firewall_support.go(firewall integration)pkg/workflow/engine_network_hooks.go(network hooks)
Analysis: Good organization ✓
- Each engine has dedicated file
- Shared utilities in
engine_helpers.go - Some duplication in MCP rendering (see Issue Add workflow: githubnext/agentics/weekly-research #3 above)
Minor improvement opportunity: Consolidate MCP rendering as discussed in Issue #3
Cluster 3: Compiler Functions
Pattern: compiler*.go files for workflow compilation
Files:
pkg/workflow/compiler.go(main compiler, 2000+ lines)pkg/workflow/compiler_jobs.go(job generation)pkg/workflow/compiler_yaml.go(YAML generation)
Analysis: Reasonable split ✓
compiler.gohandles parsing and validationcompiler_jobs.gohandles job buildingcompiler_yaml.gohandles final YAML output- Large but logically organized
No major issues - the compiler is complex and the split makes sense.
Cluster 4: Validation Functions
Pattern: validate* functions across multiple files
See Issue #2 above for full analysis and recommendations.
Cluster 5: Parsing Functions
Pattern: parse* functions for configuration and data extraction
Files with parse functions: 17+ files
Analysis: Mixed
- Good: Tools parsing in
tools_types.go - Acceptable: Config parsing in
create_*.gofiles - Could improve: Generic parsing helpers could be more centralized
See Issue #4 above for recommendations.
Cluster 6: Rendering Functions
Pattern: render* and generate* functions for YAML/config generation
Major functions:
-
Runtime setup:
pkg/workflow/runtime_setup.goGenerateRuntimeSetupStepsgenerateSetupStep
-
Engine output:
pkg/workflow/engine_output.gogenerateCleanupStep(*Compiler).generateEngineOutputCollection
-
MCP configs: Multiple files (see Issue Add workflow: githubnext/agentics/weekly-research #3)
-
JavaScript formatting:
pkg/workflow/js.goFormatJavaScriptForYAML(used 12+ times across codebase ✓)
Analysis:
- JavaScript formatting properly centralized ✓
- MCP rendering has duplication (Issue Add workflow: githubnext/agentics/weekly-research #3)
- Other rendering functions appropriately located in context
Refactoring Recommendations
Priority 1: High Impact, Medium Effort
1.1 Consolidate String Sanitization Functions
- Effort: 2-3 hours
- Files affected: 2 main files, ~4 call sites
- Benefits:
- Single source of truth for name sanitization
- Easier to test edge cases
- Reduced maintenance burden
Steps:
-
Create unified function in
pkg/workflow/strings.go:type SanitizeOptions struct { PreserveSpecialChars []rune // chars to preserve (e.g., '.', '_') TrimHyphens bool // trim leading/trailing hyphens DefaultValue string // value if empty after sanitization } func SanitizeName(name string, opts *SanitizeOptions) string { // Unified implementation }
-
Replace
SanitizeWorkflowNameimplementation:func SanitizeWorkflowName(name string) string { return SanitizeName(name, &SanitizeOptions{ PreserveSpecialChars: []rune{'.', '_', '-'}, }) }
-
Replace
SanitizeIdentifierimplementation:func SanitizeIdentifier(name string) string { return SanitizeName(name, &SanitizeOptions{ TrimHyphens: true, DefaultValue: "github-agentic-workflow", }) }
-
Update tests in both
strings_test.goand verify callers
1.2 Consolidate MCP Config Rendering
- Effort: 6-8 hours
- Files affected:
mcp-config.go,claude_mcp.go,copilot_engine.go,codex_engine.go - Benefits:
- 30-40% reduction in MCP rendering code
- Single source of truth for MCP config format
- Easier to add new MCP tools
- Easier to support new engines
Steps:
- Enhance
MCPConfigRendererstruct inmcp-config.go - Create engine-specific rendering options
- Refactor Claude engine to use shared renderers
- Refactor Copilot engine to use shared renderers
- Update tests
Priority 2: Medium Impact, Low-Medium Effort
2.1 Improve Validation Function Discoverability
- Effort: 3-4 hours
- Files affected: Documentation and organization
- Benefits: Easier to find and understand validation logic
Steps:
- Add comprehensive doc comment to
validation.golisting all validation functions and their locations - Consider creating
pkg/workflow/validation/subpackage (if desired) - Add cross-reference comments in domain files pointing to related validation
Example addition to validation.go:
// Package workflow provides validation functions for workflow compilation.
//
// Validation Functions by Category:
//
// # Core Validations (this file)
// - validateExpressionSizes: Checks expression length limits
// - validateContainerImages: Validates Docker container images
// - validateRuntimePackages: Validates runtime package declarations
// - validateGitHubActionsSchema: Validates against GitHub Actions schema
// - validateRepositoryFeatures: Checks repository features (discussions, issues)
//
// # Package Validations (domain-specific files)
// - validatePipPackages (pip.go): Python pip package validation
// - validateNpxPackages (npm.go): NPX package validation
// - validateDockerImage (docker.go): Docker image validation
//
// # Permission Validations
// - ValidatePermissions (permissions_validator.go): Permission scope validation
// - ValidatePermissions (imports.go): Imported permissions merging
//
// # Configuration Validations
// - ValidateMCPConfigs (mcp-config.go): MCP server config validation
//
// # Safety Validations
// - validateExpressionSafety (expression_safety.go): Expression injection checks
// - validateStrictMode (strict_mode.go): Strict mode compliance2.2 Standardize Parse Config Function Naming
- Effort: 1-2 hours (verification and documentation)
- Files affected: Review 17+ files
- Benefits: Consistent naming makes functions easier to find
Steps:
- Audit all parse functions to ensure consistent
parse*Confignaming - Document the pattern in a code style guide or CONTRIBUTING.md
- No code changes likely needed - most already follow convention
Priority 3: Nice-to-Have, Future Improvements
3.1 Consider Generics for Extract Package Functions
- Effort: 4-5 hours
- Applicability: Limited (Go 1.18+ required)
- Benefits: Type-safe extraction with less boilerplate
Current Pattern:
func extractPipPackages(workflowData *WorkflowData) []string {
return collectPackagesFromWorkflow(workflowData, extractPipFromCommands, "")
}
func extractNpxPackages(workflowData *WorkflowData) []string {
return collectPackagesFromWorkflow(workflowData, extractNpxFromCommands, "npx")
}Potential Generic Approach: (evaluate if worth the complexity)
type PackageExtractor[T any] interface {
Extract(workflowData *WorkflowData) []T
}Decision: Low priority - current approach is clear and works well
Implementation Checklist
Phase 1: High-Impact Quick Wins
- Review and approve Priority 1 recommendations
- Create feature branch:
refactor/function-clustering-consolidation - Implement: Consolidate string sanitization functions (1.1)
- Run tests: Verify no regressions
- Code review and merge
Phase 2: MCP Config Consolidation
- Design: Create unified MCP rendering architecture
- Implement: Enhanced MCPConfigRenderer (1.2)
- Refactor: Claude engine MCP rendering
- Refactor: Copilot engine MCP rendering
- Run tests: Verify all engines work correctly
- Code review and merge
Phase 3: Validation Organization
- Document: Add validation function catalog to validation.go (2.1)
- Review: Consider validation subpackage if team prefers
- Implement: Chosen validation organization approach
- Update: Developer documentation
Phase 4: Future Improvements
- Consider: Parse config standardization (2.2)
- Evaluate: Generic package extractors (3.1)
- Monitor: Helper function duplication for future consolidation
Analysis Metadata
- Total Go Files Analyzed: 173 (non-test files)
- Total Functions Cataloged: ~800+ functions
- Function Clusters Identified: 6 major clusters
- High-Priority Issues Found: 2 (string sanitization, MCP rendering)
- Medium-Priority Issues Found: 2 (validation discoverability, parse config naming)
- Low-Priority Issues Found: 1 (helper function scatter)
- Detection Method: Serena semantic code analysis + pattern matching
- Analysis Date: 2025-10-31
- Analysis Tool: Claude Code + Serena MCP Server (gopls language service)
Conclusion
The gh-aw codebase demonstrates good overall organization, particularly in:
- ✅ Create function pattern (
create_*.gofiles) - ✅ Engine implementation structure
- ✅ JavaScript formatting centralization
- ✅ Tools parsing organization
Key improvements identified:
- Consolidate near-duplicate sanitization functions (highest impact)
- Reduce MCP rendering duplication across engines (high impact)
- Improve validation function discoverability (medium impact)
The recommendations focus on high-impact, practical refactoring that will improve maintainability without disrupting the existing architecture. The codebase doesn't require major restructuring - these are targeted improvements to reduce duplication and improve discoverability.
Generated by: gh-aw semantic analysis workflow
Workflow Run: §18966551002
AI generated by Semantic Function Refactoring