Improve url downloads for file objects #8978
Merged
13 commits
- 4552e83 changes
- aff5a4b changes
- 795d787 add changeset (gradio-pr-bot)
- 43bcd28 add changeset (gradio-pr-bot)
- 98475a2 Ci security tweaks (#9010) (pngwn)
- d145350 change
- 6390dae changes
- eedb008 changes
- 54fe329 changes
- 52c5d88 changes
- 3c1bc10 changes
- 51e934f changes
- c05a489 changes
```diff
@@ -0,0 +1,5 @@
+---
+"gradio": patch
+---
+
+feat:Improve url downloads for file objects
```
```diff
@@ -2,16 +2,19 @@

 import base64
 import hashlib
+import ipaddress
 import json
 import logging
 import os
 import shutil
+import socket
 import subprocess
 import tempfile
 import warnings
 from io import BytesIO
 from pathlib import Path
 from typing import TYPE_CHECKING, Any
+from urllib.parse import urlparse

 import aiofiles
 import httpx
@@ -22,7 +25,7 @@
 from gradio import utils, wasm_utils
 from gradio.data_classes import FileData, GradioModel, GradioRootModel, JsonData
 from gradio.exceptions import Error
-from gradio.utils import abspath, get_upload_folder, is_in_or_equal
+from gradio.utils import abspath, get_hash_seed, get_upload_folder, is_in_or_equal

 with warnings.catch_warnings():
     warnings.simplefilter("ignore")  # Ignore pydub warning if ffmpeg is not installed
@@ -177,8 +180,12 @@ def encode_pil_to_bytes(pil_image, format="png"):
         return output_bytes.getvalue()


+hash_seed = get_hash_seed().encode("utf-8")
+
+
 def hash_file(file_path: str | Path, chunk_num_blocks: int = 128) -> str:
     sha1 = hashlib.sha1()
+    sha1.update(hash_seed)
     with open(file_path, "rb") as f:
         for chunk in iter(lambda: f.read(chunk_num_blocks * sha1.block_size), b""):
             sha1.update(chunk)
@@ -187,18 +194,21 @@ def hash_file(file_path: str | Path, chunk_num_blocks: int = 128) -> str:

 def hash_url(url: str) -> str:
     sha1 = hashlib.sha1()
+    sha1.update(hash_seed)
     sha1.update(url.encode("utf-8"))
     return sha1.hexdigest()


 def hash_bytes(bytes: bytes):
     sha1 = hashlib.sha1()
+    sha1.update(hash_seed)
     sha1.update(bytes)
     return sha1.hexdigest()


 def hash_base64(base64_encoding: str, chunk_num_blocks: int = 128) -> str:
     sha1 = hashlib.sha1()
+    sha1.update(hash_seed)
     for i in range(0, len(base64_encoding), chunk_num_blocks * sha1.block_size):
         data = base64_encoding[i : i + chunk_num_blocks * sha1.block_size]
         sha1.update(data.encode("utf-8"))
```
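The hunks above prefix every cache-key hash (`hash_file`, `hash_url`, `hash_bytes`, `hash_base64`) with `hash_seed`. The sketch below illustrates the two properties this relies on, using a hypothetical fixed seed in place of `get_hash_seed()` and hypothetical helper names `salted_sha1_stream` and `salted_url_key`. First, reading input in `chunk_num_blocks * block_size` pieces produces the same digest as hashing it in one pass. Second, a different seed changes every digest, presumably so that cache directory names derived from a URL cannot be recomputed without knowing the seed.

```python
import hashlib
import io
from typing import BinaryIO

# Hypothetical fixed seed; the PR obtains its seed from gradio's get_hash_seed().
HASH_SEED = b"example-seed"


def salted_sha1_stream(
    stream: BinaryIO, seed: bytes = HASH_SEED, chunk_num_blocks: int = 128
) -> str:
    """SHA-1 of seed + stream contents, read chunk_num_blocks * block_size bytes at a time."""
    sha1 = hashlib.sha1()
    sha1.update(seed)  # salt first, as the diff does with sha1.update(hash_seed)
    read_size = chunk_num_blocks * sha1.block_size  # 128 * 64 = 8192 bytes for SHA-1
    for chunk in iter(lambda: stream.read(read_size), b""):
        sha1.update(chunk)
    return sha1.hexdigest()


def salted_url_key(url: str, seed: bytes = HASH_SEED) -> str:
    """Seeded URL hash, analogous to hash_url; usable as a cache sub-directory name."""
    sha1 = hashlib.sha1()
    sha1.update(seed)
    sha1.update(url.encode("utf-8"))
    return sha1.hexdigest()


data = b"x" * 10_000  # spans more than one 8192-byte chunk
chunked = salted_sha1_stream(io.BytesIO(data))
print(chunked == hashlib.sha1(HASH_SEED + data).hexdigest())  # True

url = "https://example.com/image.png"
print(salted_url_key(url) == salted_url_key(url, seed=b"another-seed"))  # False
```

Because SHA-1 is a streaming hash, feeding the seed and then the chunks is equivalent to hashing the concatenation `seed + data`, which is what the first comparison checks.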
```diff
@@ -260,20 +270,51 @@ def save_file_to_cache(file_path: str | Path, cache_dir: str) -> str:
     return full_temp_file_path


+def check_public_url(url: str):
+    parsed_url = urlparse(url)
+    if parsed_url.scheme not in ["http", "https"]:
+        raise httpx.RequestError(f"Invalid URL: {url}")
+    hostname = parsed_url.hostname
+    if not hostname:
+        raise httpx.RequestError(f"Invalid URL: {url}")
+
+    try:
+        addrinfo = socket.getaddrinfo(hostname, None)
+    except socket.gaierror:
+        raise httpx.RequestError(f"Cannot resolve hostname: {hostname}") from None
```

Review comment: I was thinking about whether this would cause any issues if there's no internet connection, but I believe `socket.getaddrinfo` works for localhost addresses even without an internet connection, so this should be fine.

```diff
+
+    for family, _, _, _, sockaddr in addrinfo:
+        ip = sockaddr[0]
+        if family == socket.AF_INET6:
+            ip = ip.split("%")[0]  # Remove scope ID if present
+
+        if not ipaddress.ip_address(ip).is_global:
+            raise httpx.RequestError(
+                f"Non-public IP address found: {ip} for URL: {url}"
+            )
+
+    return True
+
+
 def save_url_to_cache(url: str, cache_dir: str) -> str:
     """Downloads a file and makes a temporary file path for a copy if does not already
     exist. Otherwise returns the path to the existing temp file."""
+    check_public_url(url)
+
     temp_dir = hash_url(url)
     temp_dir = Path(cache_dir) / temp_dir
     temp_dir.mkdir(exist_ok=True, parents=True)
     name = client_utils.strip_invalid_filename_characters(Path(url).name)
     full_temp_file_path = str(abspath(temp_dir / name))

     if not Path(full_temp_file_path).exists():
-        with sync_client.stream("GET", url, follow_redirects=True) as r, open(
+        with sync_client.stream("GET", url, follow_redirects=True) as response, open(
             full_temp_file_path, "wb"
         ) as f:
-            for chunk in r.iter_raw():
+            for redirect in response.history:
+                check_public_url(str(redirect.url))
+
+            for chunk in response.iter_raw():
                 f.write(chunk)

     return full_temp_file_path
```
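`check_public_url` accepts a URL only if its scheme is `http` or `https` and every address its hostname resolves to is globally routable according to `ipaddress`'s `is_global`. The sketch below mirrors that logic so it can run without a network. It differs in two deliberate ways: it raises `ValueError` instead of `httpx.RequestError`, and it accepts a `resolve` argument, a hypothetical injection point that defaults to `socket.getaddrinfo`, so tests can supply canned results through the hypothetical `fake_resolver` helper. Loopback, private (RFC 1918), and link-local addresses are all rejected. That includes `169.254.169.254`, which cloud metadata services commonly use.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def assert_public_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Raise ValueError unless url is http(s) and every resolved address is global."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"Invalid URL: {url}")
    try:
        addrinfo = resolve(parsed.hostname, None)
    except socket.gaierror:
        raise ValueError(f"Cannot resolve hostname: {parsed.hostname}") from None
    for family, _, _, _, sockaddr in addrinfo:
        ip = sockaddr[0]
        if family == socket.AF_INET6:
            ip = ip.split("%")[0]  # strip an IPv6 scope ID such as "%eth0"
        if not ipaddress.ip_address(ip).is_global:
            raise ValueError(f"Non-public IP address found: {ip} for URL: {url}")


def fake_resolver(ip: str, family: int = socket.AF_INET):
    """Hypothetical test helper: a getaddrinfo stand-in that always returns ip."""
    sockaddr = (ip, 0) if family == socket.AF_INET else (ip, 0, 0, 0)

    def resolve(host, port):
        return [(family, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", sockaddr)]

    return resolve


assert_public_url("https://example.com/cat.png", resolve=fake_resolver("8.8.8.8"))  # passes
for blocked in ["127.0.0.1", "10.0.0.5", "169.254.169.254"]:
    try:
        assert_public_url("http://internal.example/", resolve=fake_resolver(blocked))
    except ValueError as exc:
        print(exc)
```

The check rejects the URL if *any* resolved address is non-public. This matters when a hostname has several A/AAAA records and only one of them points inside the network.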
```diff
@@ -282,6 +323,8 @@ def save_url_to_cache(url: str, cache_dir: str) -> str:
 async def async_save_url_to_cache(url: str, cache_dir: str) -> str:
     """Downloads a file and makes a temporary file path for a copy if does not already
     exist. Otherwise returns the path to the existing temp file. Uses async httpx."""
+    check_public_url(url)
+
     temp_dir = hash_url(url)
     temp_dir = Path(cache_dir) / temp_dir
     temp_dir.mkdir(exist_ok=True, parents=True)
```
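In `async_save_url_to_cache`, the new `check_public_url(url)` call is synchronous. Its `socket.getaddrinfo` lookup therefore runs on the event-loop thread and blocks other tasks until resolution completes. The sketch below shows one possible non-blocking variant, which is not part of this PR: it resolves through asyncio's `loop.getaddrinfo`, which runs the lookup in the loop's default executor. As in the synchronous sketch, `ValueError` replaces `httpx.RequestError`, and the name `async_assert_public_url` is hypothetical.

```python
import asyncio
import ipaddress
import socket
from urllib.parse import urlparse


async def async_assert_public_url(url: str) -> None:
    """Async counterpart of the public-URL check; DNS runs off the event-loop thread."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"Invalid URL: {url}")
    loop = asyncio.get_running_loop()
    try:
        # loop.getaddrinfo delegates socket.getaddrinfo to the default executor.
        addrinfo = await loop.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        raise ValueError(f"Cannot resolve hostname: {parsed.hostname}") from None
    for family, _, _, _, sockaddr in addrinfo:
        ip = sockaddr[0]
        if family == socket.AF_INET6:
            ip = ip.split("%")[0]  # strip an IPv6 scope ID if present
        if not ipaddress.ip_address(ip).is_global:
            raise ValueError(f"Non-public IP address found: {ip} for URL: {url}")


async def main() -> None:
    # Numeric hosts are parsed locally, so neither call needs a DNS server.
    await async_assert_public_url("http://8.8.8.8/file.txt")
    try:
        await async_assert_public_url("http://127.0.0.1:8000/secret")
    except ValueError as exc:
        print(exc)  # Non-public IP address found: 127.0.0.1 for URL: http://127.0.0.1:8000/secret


asyncio.run(main())
```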
```diff
@@ -290,6 +333,9 @@ async def async_save_url_to_cache(url: str, cache_dir: str) -> str:

     if not Path(full_temp_file_path).exists():
         async with async_client.stream("GET", url, follow_redirects=True) as response:
+            for redirect in response.history:
+                check_public_url(str(redirect.url))
+
             async with aiofiles.open(full_temp_file_path, "wb") as f:
                 async for chunk in response.aiter_raw():
                     await f.write(chunk)
```
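Both download paths check redirects only after httpx has followed them. By the time the `response.history` loop runs, a request has already been sent to every hop in the chain. In httpx, `response.history` also contains only the intermediate redirect responses, with `history[0].url` being the original URL. The final `response.url` is therefore not among the URLs that loop checks. A common alternative is to turn off automatic redirects and validate each hop *before* requesting it. The sketch below shows that pattern; it is not what this PR implements. `fetch` is a hypothetical callable standing in for an HTTP client with redirect following disabled, and `check` stands in for `check_public_url`.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlparse

# fetch(url) -> (status_code, Location header or None). Hypothetical interface for
# an HTTP client with automatic redirect following turned off.
Fetch = Callable[[str], Tuple[int, Optional[str]]]
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def resolve_redirects(
    url: str, fetch: Fetch, check: Callable[[str], None], max_redirects: int = 5
) -> str:
    """Follow redirects manually, calling check(url) before each request is sent."""
    for _ in range(max_redirects + 1):
        check(url)  # raises before any request reaches a disallowed host
        status, location = fetch(url)
        if status not in REDIRECT_STATUSES or location is None:
            return url  # final, non-redirect response
        url = urljoin(url, location)  # Location headers may be relative
    raise ValueError("Too many redirects")


# Usage with canned responses instead of a real network.
responses = {
    "https://ok.example/x": (302, "https://ok.example/y"),
    "https://ok.example/y": (200, None),
    "https://cdn.example/a": (302, "/b"),
    "https://cdn.example/b": (301, "http://127.0.0.1/admin"),
}
requested = []


def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    requested.append(url)
    return responses[url]


def fake_check(url: str) -> None:
    # Stand-in for check_public_url: only rejects a literal loopback host here.
    if urlparse(url).hostname == "127.0.0.1":
        raise ValueError(f"Non-public URL: {url}")


print(resolve_redirects("https://ok.example/x", fake_fetch, fake_check))  # https://ok.example/y
try:
    resolve_redirects("https://cdn.example/a", fake_fetch, fake_check)
except ValueError as exc:
    print(exc)  # Non-public URL: http://127.0.0.1/admin
print(requested[-1])  # https://cdn.example/b (the loopback URL was never fetched)
```

A real client would still need to resolve and pin the checked address for the actual connection. Otherwise the DNS answer could change between the check and the request.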