
Fix autoscaled pool scaling behavior on 429 Too Many Requests #1437

@vdusek

Description


  • Crawlee does not currently handle 429 Too Many Requests responses correctly.
  • When a target server starts returning 429s, Crawlee does not slow down.
  • Instead, under the current autoscaled pool logic, Crawlee may actually scale concurrency up when responses get slower, because waiting on slow responses leaves the CPU with less work to do.
  • This creates a "death spiral": the slower the server responds, the more Crawlee increases concurrency, which can quickly overwhelm small websites.
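
The feedback loop described above can be illustrated with a toy model. This is a sketch under stated assumptions, not Crawlee's actual autoscaling code: `next_concurrency`, `simulate`, the 0.7 CPU threshold and the per-request CPU cost are all hypothetical. It assumes a scaler that looks only at local CPU load, and a server that answers every request beyond its capacity with a 429 that costs the client almost no CPU.

```python
def next_concurrency(current: int, cpu_utilization: float, max_concurrency: int = 200) -> int:
    """Toy CPU-only scaling rule (hypothetical, not Crawlee's implementation)."""
    if cpu_utilization < 0.7:
        # The local machine looks idle, so the rule scales up.
        return min(current + 1, max_concurrency)
    return max(current - 1, 1)


def simulate(steps: int, cpu_per_request: float, server_capacity: int) -> list[int]:
    """Return the concurrency chosen after each step of the toy model."""
    concurrency = 10
    history = []
    for _ in range(steps):
        # Only requests the server actually serves cost local CPU;
        # the rest come back as 429s and are assumed to be free.
        served = min(concurrency, server_capacity)
        cpu = served * cpu_per_request
        concurrency = next_concurrency(concurrency, cpu)
        history.append(concurrency)
    return history


history = simulate(steps=20, cpu_per_request=0.01, server_capacity=10)
print(history[0], history[-1])
```

In this model the server is saturated from the first step, yet concurrency rises on every step because the only signal the rule sees is an idle CPU.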

Proposed solution

  • Detect 429 responses and implement proper backoff logic (reducing the concurrency of the autoscaled pool, a cooldown period, ...).
  • Ensure the autoscaled pool does not interpret slow responses or 429s as a signal to increase concurrency.
  • Consider respecting Retry-After headers if present.
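
One possible shape for the proposal above, as a minimal sketch rather than a definitive design: halve the desired concurrency on each 429, block scale-up for a cooldown period, and extend that period when a Retry-After header asks for more. The names `parse_retry_after`, `RateLimitBackoff`, `on_response` and `try_scale_up` are hypothetical and not part of Crawlee's API; the halving factor and the 30-second default cooldown are assumptions.

```python
from __future__ import annotations

import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value: str | None, now: datetime | None = None) -> float | None:
    """Return the delay in seconds requested by a Retry-After value, or None."""
    if not value:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delay-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


class RateLimitBackoff:
    """Hypothetical helper that tracks desired concurrency under 429s."""

    def __init__(self, max_concurrency: int, min_concurrency: int = 1,
                 cooldown_secs: float = 30.0, clock=time.monotonic) -> None:
        self.max_concurrency = max_concurrency
        self.min_concurrency = min_concurrency
        self.cooldown_secs = cooldown_secs
        self.desired = max_concurrency
        self._clock = clock
        self._blocked_until = float("-inf")

    def on_response(self, status_code: int, retry_after: str | None = None) -> None:
        """Halve concurrency and start (or extend) the cooldown on a 429."""
        if status_code != 429:
            return
        self.desired = max(self.min_concurrency, self.desired // 2)
        requested = parse_retry_after(retry_after) or 0.0
        wait = max(self.cooldown_secs, requested)
        self._blocked_until = max(self._blocked_until, self._clock() + wait)

    def try_scale_up(self) -> int:
        """Raise concurrency by one only when no cooldown is active."""
        if self._clock() >= self._blocked_until and self.desired < self.max_concurrency:
            self.desired += 1
        return self.desired
```

A scaler could call `on_response` for every finished request and consult `try_scale_up` instead of scaling up directly, so that slow responses or idle CPU alone never raise concurrency during a cooldown.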

Labels

  • bug: Something isn't working.
  • t-tooling: Issues with this label are in the ownership of the tooling team.
