Closed
Labels
bug (Something isn't working), pri/high (High priority issue), status/approved (This issue is ready to be implemented)
Description
Describe the Bug
When no direct internet connection is available and the target site can only be reached through a proxy, the crawler job fails with a timeout.
Removing the metascraperLogo() plugin from the metascraper setup seems to resolve the issue.
Setting a generic HTTP_PROXY also seems to solve the problem, but this approach can occasionally break other internal network connections, and not all libraries respect the NO_PROXY variable.
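To illustrate why a blanket HTTP_PROXY is risky when NO_PROXY is ignored, here is a minimal sketch of the kind of check a proxy-aware client is expected to perform. The helpers `shouldBypassProxy` and `proxyFor`, the proxy URL, and the host names are hypothetical and are not part of Karakeep or metascraper; real libraries differ in the NO_PROXY syntax they accept.

```typescript
// Hypothetical helper (an assumption, not Karakeep code): returns true when
// `targetUrl` matches an entry of a NO_PROXY-style comma-separated list.
function shouldBypassProxy(targetUrl: string, noProxy: string): boolean {
  const host = new URL(targetUrl).hostname.toLowerCase();
  const entries = noProxy
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0);

  for (const entry of entries) {
    // A lone "*" means "never use the proxy".
    if (entry === "*") return true;
    // Treat ".example.com" and "example.com" alike: match the host itself
    // and any of its subdomains.
    const suffix = entry.startsWith(".") ? entry.slice(1) : entry;
    if (host === suffix || host.endsWith("." + suffix)) return true;
  }
  return false;
}

// Hypothetical helper: picks the proxy URL for a request, or undefined for a
// direct connection.
function proxyFor(
  targetUrl: string,
  proxyUrl: string | undefined,
  noProxy: string,
): string | undefined {
  if (!proxyUrl) return undefined;
  return shouldBypassProxy(targetUrl, noProxy) ? undefined : proxyUrl;
}
```

A library that skips a check like this sends internal traffic through the proxy as well, which is how a global HTTP_PROXY can break internal connections.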
Steps to Reproduce
- Block the target site, or internet access in general, for the crawler
- Set up a proxy for accessing the target site (Instagram, etc.)
- Add a link to the blocked site to Karakeep
Expected Behaviour
Scraping must rely 100% on the proxy settings.
Screenshots or Additional Context
The message "Done extracting metadata from the page." never appears in the worker logs:
2025-08-21T19:36:27.309Z info: [Crawler][172] Successfully navigated to "https://www.instagram.com/p/DKe1DhCtlUB". Waiting for the page to load ...
2025-08-21T19:36:30.045Z info: [Crawler][172] Finished waiting for the page to load.
2025-08-21T19:36:30.071Z info: [Crawler][172] Successfully fetched the page content.
2025-08-21T19:36:30.132Z info: [Crawler][172] Finished capturing page content and a screenshot. FullPageScreenshot: false
2025-08-21T19:36:30.148Z info: [Crawler][172] Will attempt to extract metadata from page ...
2025-08-21T19:36:30.296Z info: [Crawler][172] Will attempt to extract readable content ...
2025-08-21T19:36:30.560Z info: [Crawler][172] Done extracting readable content.
2025-08-21T19:36:30.652Z info: [Crawler][172] Stored the screenshot as assetId: f8d970d7-83f3-4908-a5c9-1c65a70048fd
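The symptom in the log above can be checked mechanically. The following is a rough sketch, assuming the worker log format shown here (`[Crawler][<job id>] <message>`); the function name `findStalledMetadataJobs` is hypothetical and not part of Karakeep.

```typescript
// Hypothetical diagnostic (an assumption, not Karakeep code): returns the IDs
// of crawler jobs that logged the start of metadata extraction but never
// logged "Done extracting metadata from the page."
function findStalledMetadataJobs(logText: string): string[] {
  const started = new Set<string>();
  const finished = new Set<string>();

  for (const line of logText.split("\n")) {
    // Capture the job ID and the message that follows "[Crawler][<id>]".
    const match = /\[Crawler\]\[(\d+)\]\s+(.*)$/.exec(line);
    if (!match) continue;
    const jobId = match[1];
    const message = match[2];
    if (message.startsWith("Will attempt to extract metadata from page")) {
      started.add(jobId);
    }
    if (message.startsWith("Done extracting metadata from the page")) {
      finished.add(jobId);
    }
  }

  // Jobs that started metadata extraction and never reported completion.
  return Array.from(started).filter((id) => !finished.has(id));
}
```

Run against the excerpt above, this would flag job 172, since its "Will attempt to extract metadata from page ..." line has no matching completion message.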
Device Details
No response
Exact Karakeep Version
0.26.0
Have you checked the troubleshooting guide?
- I have checked the troubleshooting guide and I haven't found a solution to my problem
MintBrain