Add docs for CRAWLER_LICENSEE_PARALLELISM #126

Merged 1 commit on May 21, 2025
8 changes: 8 additions & 0 deletions service_config/crawler.md
@@ -6,6 +6,7 @@
- [CRAWLER\_GITHUB\_TOKEN](#crawler_github_token)
- [CRAWLER\_HOST](#crawler_host)
- [CRAWLER\_INSIGHTS\_KEY](#crawler_insights_key)
- [CRAWLER\_LICENSEE\_PARALLELISM](#crawler_licensee_parallelism)
- [CRAWLER\_NAME](#crawler_name)
- [CRAWLER\_QUEUE\_PREFIX](#crawler_queue_prefix)
- [CRAWLER\_QUEUE\_PROVIDER](#crawler_queue_provider)
@@ -34,6 +35,7 @@ The environmental variables for the cdcrawler-dev App Service include:
* CRAWLER_GITHUB_TOKEN
* CRAWLER_HOST
* CRAWLER_INSIGHTS_KEY
* CRAWLER_LICENSEE_PARALLELISM
* CRAWLER_NAME
* CRAWLER_QUEUE_AZURE_CONNECTION_STRING
* CRAWLER_QUEUE_PREFIX
@@ -87,6 +89,12 @@ Note that we only use this in the development environment, not in the production

We use [Azure Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview) to monitor the crawler application. This requires a key and this is where it is kept.

### CRAWLER_LICENSEE_PARALLELISM

This is the maximum number of `licensee` processes to run in parallel. `licensee` is a tool that collects license
information. The default value is `10`. Setting it to a smaller value can reduce CPU spikes and give the crawler
more uniform CPU usage.
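
To illustrate how a cap like this behaves, here is a minimal Python sketch. It is not the crawler's actual code. The helper names `read_parallelism` and `run_capped` are hypothetical, the tasks are plain callables standing in for `licensee` invocations, and the fallback to `10` for missing or invalid values is an assumption made for the example.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def read_parallelism(env=os.environ, default=10):
    """Read the cap from CRAWLER_LICENSEE_PARALLELISM.

    Falling back to the default on missing, non-numeric, or non-positive
    values is an assumption of this sketch, not documented crawler behavior.
    """
    raw = env.get("CRAWLER_LICENSEE_PARALLELISM", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def run_capped(tasks, limit):
    """Run callables with at most `limit` in flight at once.

    Returns the results in input order plus the peak concurrency observed.
    """
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def tracked(task):
        # Count this task as active and record the highest count seen.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            return task()
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the cap: extra tasks wait until a worker is free.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        results = list(pool.map(tracked, tasks))
    return results, state["peak"]
```

With a lower limit the same work is spread over a longer period instead of starting all at once, which is the mechanism behind the smoother CPU usage described above.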

### CRAWLER_NAME

This is a name to refer to the crawler with. Note that we set it in the App Service in the development environment and in [the Docker file](https://github.com/clearlydefined/crawler/blob/32a0d6b59edfda5d3226c50680e4a8338af395cd/Dockerfile) for the Prod environment.