DotCompute

Universal Compute Framework for .NET 9+

DotCompute adds GPU and CPU acceleration to .NET applications through a C# API. Compute kernels are static methods marked with a [Kernel] attribute; the framework optimizes them for the available hardware backend, integrates with the IDE through Roslyn analyzers, and supports Native AOT.

Key Features

Modern C# API: Define kernels with [Kernel] attributes for cleaner code organization
Automatic Optimization: CPU/GPU backend selection based on workload characteristics
Developer Tools: Roslyn analyzer integration with real-time feedback and code fixes
Cross-Backend Debugging: Validation system to ensure consistent results across backends
Performance Monitoring: Built-in telemetry and profiling capabilities
Native AOT Support: Compatible with Native AOT compilation for improved startup times

Overview

DotCompute is a compute acceleration framework for .NET applications that provides:

CPU SIMD vectorization using AVX2/AVX512 instruction sets
CUDA GPU acceleration for NVIDIA hardware (Compute Capability 5.0+)
LINQ expression compilation to optimized kernels
Reactive Extensions integration for streaming compute
Native AOT compilation support
Unified memory management with automatic pooling

Production Status

Core Components

Kernel API: [Kernel] attribute-based development with source generators
CPU Backend: AVX2/AVX512 SIMD vectorization (benchmarked 3.7x speedup on vectorizable operations)
CUDA Backend: NVIDIA GPU support for Compute Capability 5.0+ devices
Memory Management: Unified buffers with pooling (measured 90% allocation reduction)
Developer Tools: 12 Roslyn diagnostic rules with 5 automated code fixes
Debugging: Cross-backend validation for result consistency
Optimization: Adaptive backend selection with performance profiling
Native AOT: Full trimming support with reduced startup times
Testing: Comprehensive test suite with integration and performance benchmarks

Backend Support

Backend	Status	Performance	Features
CPU	Production	3.7x measured speedup	AVX2/AVX512, multi-threading
CUDA	Production	GPU acceleration	P2P transfers, unified memory
Metal	In Development	-	macOS GPU support (planned)
ROCm	Planned	-	AMD GPU support (roadmap)
OpenCL	Experimental	-	Cross-vendor GPU support

Installation

dotnet add package DotCompute.Core --version 0.2.0-alpha
dotnet add package DotCompute.Backends.CPU --version 0.2.0-alpha
dotnet add package DotCompute.Backends.CUDA --version 0.2.0-alpha

🚀 Quick Start - Modern Kernel API

Step 1: Define Kernels with C# Attributes

using DotCompute.Core;
using System;

// Modern approach - pure C# with [Kernel] attribute
public static class MyKernels
{
    [Kernel]
    public static void VectorAdd(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> result)
    {
        int idx = Kernel.ThreadId.X;
        if (idx < result.Length)
        {
            result[idx] = a[idx] + b[idx];
        }
    }

    [Kernel]
    public static void MatrixMultiply(ReadOnlySpan<float> matA, ReadOnlySpan<float> matB,
                                     Span<float> result, int width)
    {
        int row = Kernel.ThreadId.Y;
        int col = Kernel.ThreadId.X;

        if (row < width && col < width)
        {
            float sum = 0.0f;
            for (int k = 0; k < width; k++)
            {
                sum += matA[row * width + k] * matB[k * width + col];
            }
            result[row * width + col] = sum;
        }
    }
}
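
To make the thread-index model concrete, the sketch below emulates it on the CPU without using any DotCompute APIs. A plain loop plays the role of Kernel.ThreadId.X and invokes the VectorAdd body once per index. The helper names (VectorAddBody, RunVectorAdd) are hypothetical and only illustrate the execution model; the actual source generator and backends may dispatch work differently.

```csharp
using System;

// Hypothetical stand-in for the kernel body: one call handles one index,
// mirroring what a single thread does with Kernel.ThreadId.X.
static void VectorAddBody(int idx, ReadOnlySpan<float> x, ReadOnlySpan<float> y, Span<float> dst)
{
    if (idx < dst.Length) // same bounds check as in the kernel
    {
        dst[idx] = x[idx] + y[idx];
    }
}

// Hypothetical sequential "launcher": visits every index in [0, threadCount).
static void RunVectorAdd(int threadCount, ReadOnlySpan<float> x, ReadOnlySpan<float> y, Span<float> dst)
{
    for (int idx = 0; idx < threadCount; idx++)
    {
        VectorAddBody(idx, x, y, dst);
    }
}

float[] left = { 1f, 2f, 3f, 4f };
float[] right = { 10f, 20f, 30f, 40f };
float[] sum = new float[left.Length];

// Launch more "threads" than elements to show why the bounds check matters.
RunVectorAdd(8, left, right, sum);

Console.WriteLine(string.Join(", ", sum)); // prints 11, 22, 33, 44
```

GPU launches are usually rounded up to a multiple of the block size, so some threads receive an index past the end of the data. The bounds check makes those extra invocations harmless.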

Step 2: Service Registration and Execution

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using DotCompute.Runtime;

// Configure services
var builder = Host.CreateApplicationBuilder(args);

// Add DotCompute with production optimizations
builder.Services.AddDotComputeRuntime();
builder.Services.AddProductionOptimization();  // Intelligent backend selection
builder.Services.AddProductionDebugging();     // Cross-backend validation

var app = builder.Build();

// Execute kernels with automatic optimization
var orchestrator = app.Services.GetRequiredService<IComputeOrchestrator>();

// Sample inputs for VectorAdd (plain arrays are assumed here for illustration)
float[] a = { 1f, 2f, 3f, 4f };
float[] b = { 5f, 6f, 7f, 8f };
float[] output = new float[a.Length];

// Automatic backend selection - uses GPU if available, CPU otherwise
var result = await orchestrator.ExecuteAsync("VectorAdd", a, b, output);

// Sample inputs for MatrixMultiply: two width x width matrices in row-major order
int width = 2;
float[] matA = { 1f, 2f, 3f, 4f };
float[] matB = { 5f, 6f, 7f, 8f };
float[] matResult = new float[width * width];

// Explicit backend selection if needed
var gpuResult = await orchestrator.ExecuteAsync("MatrixMultiply",
    matA, matB, matResult, width, backend: "CUDA");

Step 3: Real-Time IDE Experience

The Roslyn analyzer provides instant feedback as you type:

[Kernel]
public void BadKernel(object param) // ❌ DC001: Must be static
//           ~~~~~~~~~ // ❌ DC002: Invalid parameter type
{
    for (int i = 0; i < 1000; i++)   // ⚠️  DC010: Use Kernel.ThreadId.X
    {
        // Missing bounds check         // ⚠️  DC011: Add bounds validation
    }
}

// ✅ Auto-fixed version after applying IDE suggestions:
[Kernel]
public static void GoodKernel(Span<float> data)
{
    int idx = Kernel.ThreadId.X;
    if (idx >= data.Length) return;

    data[idx] = data[idx] * 2.0f;
}

🛠️ Developer Experience Features

Real-Time Code Analysis

// Visual Studio / VS Code integration provides:
// 🔍 Real-time diagnostics (12 rules)
// 💡 One-click automated fixes (5 fixes)
// 📊 Performance suggestions
// ⚡ GPU compatibility analysis

[Kernel]
public static void ImageBlur(ReadOnlySpan<byte> input, Span<byte> output, int width, int height)
{
    int x = Kernel.ThreadId.X;
    int y = Kernel.ThreadId.Y;

    if (x >= width || y >= height) return;

    // IDE shows: ✅ Optimal GPU pattern detected
    //           📊 Vectorization opportunity available
    //           ⚡ Expected 4-8x speedup on target hardware

    int idx = y * width + x;
    // Blur algorithm implementation...
}

Cross-Backend Debugging & Validation

// Automatic validation during development
services.AddProductionDebugging(); // Enables comprehensive validation

// Debug features:
// 🔍 CPU vs GPU result comparison
// 📊 Performance analysis and bottleneck detection
// 🧪 Determinism testing across runs
// 📋 Memory access pattern validation
// ⚠️  Automatic error detection and reporting

var debugInfo = await orchestrator.ValidateKernelAsync("MyKernel", testData);
if (debugInfo.HasIssues)
{
    foreach (var issue in debugInfo.Issues)
    {
        Console.WriteLine($"⚠️  {issue.Severity}: {issue.Message}");
        Console.WriteLine($"💡 Suggestion: {issue.Recommendation}");
    }
}
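
The CPU vs GPU result comparison listed above comes down to checking two output buffers element by element with a floating-point tolerance. The following is a minimal sketch of that idea, not DotCompute's validation code: FirstMismatch and its tolerance defaults are hypothetical, and the "GPU" array is stand-in data.

```csharp
using System;

// Hypothetical helper: returns the index of the first element where the two
// buffers differ by more than the combined tolerance, or -1 if they agree.
static int FirstMismatch(ReadOnlySpan<float> expected, ReadOnlySpan<float> actual,
                         float relTol = 1e-5f, float absTol = 1e-6f)
{
    if (expected.Length != actual.Length)
    {
        throw new ArgumentException("Buffers must have the same length.");
    }

    for (int i = 0; i < expected.Length; i++)
    {
        float diff = MathF.Abs(expected[i] - actual[i]);
        float limit = absTol + relTol * MathF.Abs(expected[i]);
        if (!(diff <= limit)) return i; // written this way so NaN counts as a mismatch
    }
    return -1;
}

float[] cpuResult = { 1.0f, 2.0f, 3.0f };
float[] gpuResult = { 1.0f, 2.000001f, 3.5f }; // stand-in data, not real GPU output

int mismatch = FirstMismatch(cpuResult, gpuResult);
Console.WriteLine(mismatch < 0
    ? "Results match within tolerance"
    : $"First mismatch at index {mismatch}");
```

Exact equality is usually too strict across backends, because fused multiply-add and a different summation order can change the last few bits of a result. A combined relative and absolute tolerance handles both large values and values near zero.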

Performance Intelligence & Monitoring

// Built-in performance profiling
services.AddProductionOptimization();

// Automatic features:
// 🤖 ML-powered backend selection
// 📊 Real-time performance monitoring
// 🎯 Workload pattern recognition
// ⚡ Automatic optimization suggestions
// 📈 Historical performance tracking

// Get performance insights
var metrics = await orchestrator.GetPerformanceMetricsAsync("VectorAdd");
Console.WriteLine($"Average execution time: {metrics.AverageExecutionTime}ms");
Console.WriteLine($"Recommended backend: {metrics.OptimalBackend}");
Console.WriteLine($"Expected speedup: {metrics.ExpectedSpeedup:F1}x");
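
For context on what an average execution time measures, here is a small standalone sketch that times a workload with Stopwatch. It does not use the orchestrator; AverageMilliseconds and its warmup and iteration counts are hypothetical choices for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs the action a few times to warm up the JIT,
// then returns the mean wall-clock time per call in milliseconds.
static double AverageMilliseconds(Action action, int warmup = 3, int iterations = 20)
{
    for (int i = 0; i < warmup; i++) action();

    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++) action();
    stopwatch.Stop();

    return stopwatch.Elapsed.TotalMilliseconds / iterations;
}

float[] data = new float[100_000];

double avgMs = AverageMilliseconds(() =>
{
    // Simple element-wise workload standing in for a kernel.
    for (int i = 0; i < data.Length; i++)
    {
        data[i] = data[i] * 2.0f + 1.0f;
    }
});

Console.WriteLine($"Average execution time: {avgMs:F3} ms");
```

Wall-clock timing like this is noisy. For published numbers, a harness such as BenchmarkDotNet, which the benchmark section of this README mentions, is the more reliable choice.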

LINQ Extensions

DotCompute.Linq provides GPU-accelerated LINQ operations through expression compilation:

using DotCompute.Linq;

// Standard LINQ automatically accelerated
var result = data
    .AsComputeQueryable()
    .Where(x => x > threshold)
    .Select(x => x * factor)
    .Sum();

// Reactive streaming with GPU acceleration
// Subscribe returns an IDisposable; dispose it to stop receiving values
var subscription = observable
    .ToComputeObservable()
    .Window(TimeSpan.FromSeconds(1))
    .SelectMany(w => w.Average())
    .Subscribe(avg => Console.WriteLine($"Average: {avg}"));

Features

Expression Compilation: Automatic conversion of LINQ expressions to optimized kernels
Streaming Compute: Reactive Extensions integration with adaptive batching
Kernel Fusion: Multiple operations combined into single kernel execution
Memory Optimization: Intelligent caching and buffer reuse
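
Kernel fusion is easiest to see by hand. The sketch below compares a Where/Select/Sum chain in standard LINQ with a single fused loop that produces the same result in one pass. FusedWhereSelectSum is a hypothetical helper written for this illustration; it is not the code DotCompute.Linq generates.

```csharp
using System;
using System.Linq;

// Hypothetical fused form of Where -> Select -> Sum:
// one pass over the data and no intermediate sequences.
static float FusedWhereSelectSum(ReadOnlySpan<float> data, float minExclusive, float scale)
{
    float sum = 0f;
    foreach (float x in data)
    {
        if (x > minExclusive) sum += x * scale;
    }
    return sum;
}

float[] values = { 1f, 5f, 2f, 8f, 3f };
float threshold = 2.5f;
float factor = 2f;

// Unfused: each operator is a separate stage in the LINQ pipeline.
float linq = values.Where(x => x > threshold).Select(x => x * factor).Sum();

// Fused: filter, transform, and reduce happen in the same loop body.
float fused = FusedWhereSelectSum(values, threshold, factor);

Console.WriteLine($"LINQ: {linq}, fused: {fused}"); // prints LINQ: 32, fused: 32
```

On a GPU the benefit is larger than on the CPU, since each unfused stage would otherwise need its own kernel launch and an intermediate buffer in device memory.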

Requirements

System Requirements

.NET 9.0 SDK or later
C# 13.0 language features
64-bit operating system (Windows, Linux, macOS)

For CUDA Support

NVIDIA GPU with Compute Capability 5.0 or higher
CUDA Toolkit 12.0 or later
Compatible NVIDIA drivers

Building from Source

# Clone the repository
git clone https://github.com/mivertowski/DotCompute.git
cd DotCompute

# Build the solution
dotnet build DotCompute.sln --configuration Release

# Run tests (CPU only)
dotnet test --filter "Category!=Hardware"

# Run all tests (requires NVIDIA GPU)
dotnet test

Architecture

Development Stack

graph TB
    A["C# Kernel with [Kernel] Attribute"] --> B[Source Generator]
    B --> C[Runtime Orchestrator]
    C --> D[Backend Selector]
    D --> E[CPU SIMD Engine]
    D --> F[CUDA GPU Engine]
    D --> G[Future: Metal/ROCm]

    H[Roslyn Analyzer] --> A
    I[Cross-Backend Debugger] --> C
    J[Performance Profiler] --> D

Component Layers

Kernel Development

Source Generator: Compile-time kernel wrapper generation from attributes
Roslyn Analyzer: 12 diagnostic rules with automated fixes
IDE Integration: Real-time feedback in Visual Studio and VS Code

Runtime Orchestration

IComputeOrchestrator: Unified execution interface
Backend Selector: Workload-based backend selection
Performance Monitor: Metrics collection with hardware counters
Memory Manager: Unified buffers with pooling
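
As a rough illustration of workload-based backend selection, the sketch below picks a backend from the input size and GPU availability. ChooseBackend and its threshold are hypothetical; the README does not document the actual selection logic, which is described as adaptive and profile-driven.

```csharp
using System;

// Hypothetical heuristic: small inputs stay on the CPU because transfer and
// launch overhead tends to dominate; large inputs go to the GPU when one exists.
// The default threshold is an arbitrary placeholder, not a measured value.
static string ChooseBackend(long elementCount, bool gpuAvailable, long gpuThreshold = 1_000_000)
{
    if (!gpuAvailable) return "CPU";
    return elementCount >= gpuThreshold ? "CUDA" : "CPU";
}

Console.WriteLine(ChooseBackend(10_000, gpuAvailable: true));      // prints CPU
Console.WriteLine(ChooseBackend(5_000_000, gpuAvailable: true));   // prints CUDA
Console.WriteLine(ChooseBackend(5_000_000, gpuAvailable: false));  // prints CPU
```

A real selector would likely weigh more than element count, for example arithmetic intensity and whether the data already resides on the device.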

Backend Acceleration

CPU Engine: AVX2/AVX512 SIMD vectorization
CUDA Engine: NVIDIA GPU support with memory optimization
Planned Backends: Metal (macOS), ROCm (AMD)
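
To show what SIMD vectorization means in plain .NET, the sketch below adds two arrays using System.Numerics.Vector<float>, which maps to the widest vector registers the JIT supports on the current machine. This is an assumption-labeled illustration of the general technique, not the CPU engine's implementation; AddVectorized is a hypothetical helper.

```csharp
using System;
using System.Numerics;

// Hypothetical helper: adds x and y into dst, processing Vector<float>.Count
// elements per iteration and finishing the remainder with a scalar loop.
// Assumes x and y are at least as long as dst.
static void AddVectorized(ReadOnlySpan<float> x, ReadOnlySpan<float> y, Span<float> dst)
{
    int lanes = Vector<float>.Count; // elements per hardware vector
    int i = 0;

    for (; i <= dst.Length - lanes; i += lanes)
    {
        var vx = new Vector<float>(x.Slice(i, lanes));
        var vy = new Vector<float>(y.Slice(i, lanes));
        (vx + vy).CopyTo(dst.Slice(i, lanes));
    }

    for (; i < dst.Length; i++) // scalar tail for leftover elements
    {
        dst[i] = x[i] + y[i];
    }
}

float[] left = new float[19];
float[] right = new float[19];
for (int n = 0; n < left.Length; n++)
{
    left[n] = n;
    right[n] = 100 - n;
}

float[] total = new float[left.Length];
AddVectorized(left, right, total);

Console.WriteLine($"Hardware accelerated: {Vector.IsHardwareAccelerated}, lanes: {Vector<float>.Count}");
Console.WriteLine($"total[0] = {total[0]}, total[18] = {total[18]}");
```

The array length of 19 is deliberately not a multiple of the lane count, so both the vector loop and the scalar tail run on typical hardware.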

Developer Tools

Debug Service: Cross-backend result validation
Profiling Service: Performance analysis and optimization
Telemetry Service: Performance tracking and historical analysis
Error Reporting: Comprehensive diagnostics with actionable insights

Performance

Benchmarked Performance

Operation	Dataset Size	Standard .NET	DotCompute CPU	Improvement
Vector Operations	100K elements	2.14ms	0.58ms	3.7x
Sum Reduction	100K elements	0.65ms	0.17ms	3.8x
Memory Allocations	Per operation	48 bytes	0 bytes	100% reduction

Benchmarks performed with BenchmarkDotNet on .NET 9.0. GPU performance requires CUDA-capable hardware and varies significantly based on data size and operation complexity.
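
The Improvement column is the ratio of the Standard .NET time to the DotCompute CPU time. A short check of the two timing rows, using only the figures from the table:

```csharp
using System;
using System.Globalization;

// Speedup = baseline time / accelerated time.
static double Speedup(double baselineMs, double acceleratedMs) => baselineMs / acceleratedMs;

string vectorOps = Speedup(2.14, 0.58).ToString("F1", CultureInfo.InvariantCulture);
string sumReduction = Speedup(0.65, 0.17).ToString("F1", CultureInfo.InvariantCulture);

Console.WriteLine($"Vector Operations: {vectorOps}x"); // prints Vector Operations: 3.7x
Console.WriteLine($"Sum Reduction: {sumReduction}x");  // prints Sum Reduction: 3.8x
```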

Performance Features

Automatic Backend Selection: Chooses between CPU and GPU based on workload
Memory Pooling: Reduces allocations by reusing buffers
Kernel Caching: Compiled kernels are cached for reuse
Native AOT Support: Enables faster startup times
Performance Profiling: Built-in metrics collection and analysis
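
Memory pooling can be illustrated with the standard ArrayPool<T> from System.Buffers. This sketch shows the general rent-and-return pattern that avoids a fresh allocation per operation; it is not DotCompute's memory manager, whose buffer types are not shown in this README.

```csharp
using System;
using System.Buffers;

ArrayPool<float> pool = ArrayPool<float>.Shared;

// Rent may return an array longer than requested, so track the logical length.
const int length = 1024;
float[] first = pool.Rent(length);
try
{
    Span<float> view = first.AsSpan(0, length);
    view.Fill(1f); // stand-in for real work on the buffer
}
finally
{
    pool.Return(first); // hand the array back so later operations can reuse it
}

// A later rent of the same size often gets the same array back,
// although the pool does not guarantee it.
float[] second = pool.Rent(length);
Console.WriteLine($"Rented length: {second.Length}, reused: {ReferenceEquals(first, second)}");
pool.Return(second);
```

Rented arrays are not cleared by default, so code that reads before writing should clear the buffer or pass clearArray: true to Return.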

Production Deployment

System Requirements

Minimum Requirements

.NET 9.0 Runtime
64-bit operating system
4GB RAM

For GPU Acceleration

NVIDIA GPU with Compute Capability 5.0+
CUDA Toolkit 12.0+
Compatible NVIDIA drivers

For Optimal Performance

CPU with AVX2/AVX512 support
16GB+ RAM for large datasets
NVMe SSD for improved I/O

Contributing

Contributions are welcome in the following areas:

Performance optimizations for specific hardware
Additional backend implementations (Metal, ROCm)
Documentation and examples
Bug reports and fixes
Test coverage improvements

Development Setup

git clone https://github.com/mivertowski/DotCompute.git
cd DotCompute

# Build the solution
dotnet build DotCompute.sln --configuration Release

# Run tests
dotnet test --configuration Release

# Run hardware-specific tests (requires NVIDIA GPU)
dotnet test --filter "Category=Hardware"

License

Licensed under the MIT License. See the LICENSE file for details.

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Documentation: API reference and guides in the docs/ directory

Project Status

DotCompute v0.2.0-alpha provides a foundation for GPU and CPU compute acceleration in .NET applications. The framework includes:

Attribute-based kernel definition system
CPU SIMD and CUDA GPU backends
Source generators and Roslyn analyzers
Cross-backend debugging capabilities
LINQ expression compilation (experimental)
Performance monitoring and profiling tools

The project continues to evolve with planned support for additional backends and optimization strategies.
