
Detect Repeated String Formatting and Suggest CompositeFormat Optimization #903

@VahidN

Feature Request: New Roslyn Analyzer for Optimizing Repeated String Formatting with CompositeFormat

Description

In .NET applications, string formatting is a common operation used in logging, debugging, user messages, and dynamic content generation. However, repeated calls to string.Format with the same format string (e.g., in loops or high-frequency methods) incur unnecessary parsing overhead. Each invocation parses the format string anew, scanning for placeholders and validating its structure while producing the output, which wastes work in high-throughput scenarios such as web servers, batch processors, and real-time systems. (Interpolated strings compiled with C# 10 or later against .NET 6 or later are lowered to DefaultInterpolatedStringHandler calls, so they do not parse a format string at run time.)

.NET 8 introduced System.Text.CompositeFormat to address this: the format string is parsed once with CompositeFormat.Parse, and the resulting instance can be reused many times, eliminating redundant parsing and improving performance (e.g., an estimated 15-30% reduction in execution time, fewer allocations, and less GC pressure based on .NET runtime benchmarks; actual gains depend on the format string and arguments).
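A minimal sketch of the parse-once, reuse-many pattern described above, assuming a .NET 8 or later project. The names CompositeFormatDemo, Greeting, Render, and RequiredArguments are hypothetical; CompositeFormat.Parse, CompositeFormat.MinimumArgumentCount, and the string.Format(IFormatProvider?, CompositeFormat, ...) overloads are the .NET 8 APIs.

```csharp
using System.Text;

public static class CompositeFormatDemo
{
    // Hypothetical format, parsed once when the type is first initialized and then reused.
    private static readonly CompositeFormat Greeting =
        CompositeFormat.Parse("Hello, {0}! You have {1} new messages.");

    // Number of arguments the parsed format requires (highest placeholder index + 1).
    public static int RequiredArguments => Greeting.MinimumArgumentCount;

    public static string Render(string name, int count) =>
        // A null provider means the current culture, matching string.Format(string, ...).
        // This binds to the generic Format<TArg0, TArg1> overload, so the int is not boxed.
        string.Format(null, Greeting, name, count);
}
```

Because Parse runs in the static field initializer, an invalid format string fails once, when the type is first used, instead of on every call.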

I propose a new Roslyn-based analyzer that detects patterns of repeated string formatting with constant format strings and suggests refactoring to use CompositeFormat. This would help developers identify and optimize hot paths incrementally, in line with .NET's focus on high-performance, low-allocation APIs.

Diagnostic Details

  • ID: CAXXXX (to be assigned; CA1870 is already used by the "Use a cached 'SearchValues' instance" rule)
  • Severity: Info (or Warning for performance-critical contexts)
  • Category: Performance
  • Title: "Repeated string formatting with constant format string; consider using CompositeFormat for optimization"
  • Description: "The same format string is being parsed multiple times in a loop or frequently called method. Use CompositeFormat to parse once and reuse."
  • Help Link: Link to relevant .NET documentation on CompositeFormat (e.g., https://learn.microsoft.com/en-us/dotnet/api/system.text.compositeformat?view=net-9.0)

Detection Criteria

The analyzer should trigger when:

  • string.Format is called with a constant string literal as the format argument.
  • The call occurs inside a loop (e.g., for, foreach, while) or in a frequently invoked method (potentially identified through heuristics such as method attributes or call-graph analysis).
  • The format string is identical across multiple invocations.
  • The code targets .NET 8 or later (to ensure CompositeFormat availability).

Ignore cases like:

  • One-off formatting.
  • Dynamic format strings (e.g., built at runtime).
  • Simple interpolated strings where readability is prioritized over performance.
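The detection criteria could be prototyped roughly as follows. This is a hypothetical sketch, not the implementation of any shipped rule: it assumes a netstandard2.0 analyzer project referencing Microsoft.CodeAnalysis.CSharp with a recent LangVersion (C# 8 or later for the property pattern), uses the placeholder ID CAXXXX, and covers only the loop case; the frequently-called-method heuristic is left out. The .NET 8 check simply looks for the System.Text.CompositeFormat type in the compilation's references.

```csharp
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;
using Microsoft.CodeAnalysis.Operations;

// Hypothetical analyzer sketch; "CAXXXX" is a placeholder, not an assigned rule ID.
[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class RepeatedConstantFormatAnalyzer : DiagnosticAnalyzer
{
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "CAXXXX",
        title: "Repeated string formatting with constant format string; consider using CompositeFormat for optimization",
        messageFormat: "The format string is parsed on every call; consider caching it in a static readonly CompositeFormat",
        category: "Performance",
        defaultSeverity: DiagnosticSeverity.Info,
        isEnabledByDefault: true,
        description: "The same format string is being parsed multiple times in a loop or frequently called method. Use CompositeFormat to parse once and reuse.",
        helpLinkUri: "https://learn.microsoft.com/en-us/dotnet/api/system.text.compositeformat");

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterCompilationStartAction(start =>
        {
            // Criterion: CompositeFormat must be available (.NET 8 or later reference assemblies).
            if (start.Compilation.GetTypeByMetadataName("System.Text.CompositeFormat") is null)
            {
                return;
            }

            start.RegisterOperationAction(AnalyzeInvocation, OperationKind.Invocation);
        });
    }

    private static void AnalyzeInvocation(OperationAnalysisContext context)
    {
        var invocation = (IInvocationOperation)context.Operation;
        var method = invocation.TargetMethod;

        // Criterion: a string.Format overload whose "format" parameter is a string
        // (overloads that already take a CompositeFormat fail the type check below).
        if (method.Name != "Format" || method.ContainingType.SpecialType != SpecialType.System_String)
        {
            return;
        }

        var formatArgument = invocation.Arguments.FirstOrDefault(
            a => a.Parameter is { Name: "format", Type: { SpecialType: SpecialType.System_String } });

        // Criterion: the format must be a compile-time constant; runtime-built formats are ignored.
        if (formatArgument is null
            || !formatArgument.Value.ConstantValue.HasValue
            || !(formatArgument.Value.ConstantValue.Value is string))
        {
            return;
        }

        // Criterion: the call is nested inside a for, foreach, while, or do loop.
        for (var current = invocation.Parent; current != null; current = current.Parent)
        {
            if (current is ILoopOperation)
            {
                context.ReportDiagnostic(Diagnostic.Create(Rule, formatArgument.Syntax.GetLocation()));
                return;
            }
        }
    }
}
```

Interpolated strings never reach this callback, because they appear in the operation tree as interpolated-string operations rather than string.Format invocations.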

Code Fix

Provide an automated code fix that:

  1. Extracts the format string to a static readonly CompositeFormat field (e.g., in the class or a centralized formats class).
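  2. Replaces string.Format(format, args) with string.Format(null, compositeFormat, args); the null provider keeps the current-culture behavior of the original call, and calls that already pass an IFormatProvider keep it as the first argument.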
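  3. Keeps alignment and format specifiers (e.g., {0,8:N2}) unchanged in the extracted string, and relies on CompositeFormat instances being immutable after parsing, so the shared static field can be used from multiple threads.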
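One way to sanity-check steps 2 and 3 is to format the same template both ways and compare the results. In this sketch the type and member names are hypothetical; the alignment and format specifiers stay inside the extracted string, and CultureInfo.InvariantCulture stands in for the null provider only so the output is deterministic across machines.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class FormatSpecifierCheck
{
    // Hypothetical template with alignment (,8) and format specifiers (:N2, :yyyy-MM-dd).
    private const string Template = "Total: {0,8:N2} on {1:yyyy-MM-dd}";

    // The code fix would extract exactly this string; Parse keeps the specifiers intact.
    private static readonly CompositeFormat TemplateFormat = CompositeFormat.Parse(Template);

    public static (string Before, string After) Compare(decimal total, DateTime date)
    {
        // Invariant culture keeps the comparison deterministic; the actual fix would pass
        // null (current culture) to match the original string.Format(format, args) call.
        IFormatProvider culture = CultureInfo.InvariantCulture;
        string before = string.Format(culture, Template, total, date);
        string after = string.Format(culture, TemplateFormat, total, date);
        return (before, after);
    }
}
```

For example, Compare(12.5m, new DateTime(2024, 1, 2)) returns the same string twice: "Total:    12.50 on 2024-01-02".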

Example "Before" Code:

for (int i = 0; i < 10000; i++)
{
    string message = string.Format("Processing item {0} of {1}", i, 10000);
    // Log or use message
}

Example "After" Code (with fix):

using System.Text;

class Example
{
    private static readonly CompositeFormat ProcessingFormat = CompositeFormat.Parse("Processing item {0} of {1}");
    static void Run()
    {
        for (int i = 0; i < 10000; i++)
        {
            string message = string.Format(null, ProcessingFormat, i, 10000);
            // Log or use message
        }
    }
}

Benefits

  • Improves application performance in logging-heavy or high-throughput workloads.
  • Encourages best practices without requiring manual profiling.
  • Complements existing performance analyzers (e.g., for allocations or GC).

Additional Considerations

  • Benchmark Integration: Suggest using BenchmarkDotNet to measure improvements post-fix.
  • Best Practices: Recommend caching CompositeFormat instances as static readonly for reuse across threads.
  • Evolution: The analyzer could evolve to support .NET 9+ optimizations, like reduced allocations for complex formats.
  • Testing: Include unit tests for common scenarios (loops, logging integrations) and edge cases (dynamic formats, pre-.NET 8 targets).
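For the benchmark integration point, a BenchmarkDotNet comparison might look like the sketch below. It assumes a console project referencing the BenchmarkDotNet package and built in Release; the class and method names are hypothetical, and no particular speedup is implied.

```csharp
using System.Globalization;
using System.Text;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class FormatBenchmarks
{
    private const string Template = "Processing item {0} of {1}";

    // Parsed once and reused by every invocation of WithCompositeFormat.
    private static readonly CompositeFormat ProcessingFormat = CompositeFormat.Parse(Template);

    // Sample argument for both benchmarks.
    private readonly int _item = 42;

    [Benchmark(Baseline = true)]
    public string WithStringFormat() =>
        // Parses Template on every call; the int arguments are boxed to object.
        string.Format(CultureInfo.InvariantCulture, Template, _item, 10000);

    [Benchmark]
    public string WithCompositeFormat() =>
        // Reuses the parsed format; the generic overload avoids boxing.
        string.Format(CultureInfo.InvariantCulture, ProcessingFormat, _item, 10000);
}
```

Run it with BenchmarkRunner.Run<FormatBenchmarks>() (namespace BenchmarkDotNet.Running) from the program's entry point; [MemoryDiagnoser] adds allocation columns, which show whether avoiding boxing makes a measurable difference for a given format.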
