Commit 88f1980

Add changelog, add e2e tests, update tools, remove WebUnblocker (#16)
* Add changelog, add e2e tests, update tools, remove WebUnblocker
* Add Geolocation and User Agent type params to universal scraper, remove parse parameter for universal scraper, update tests
1 parent 1489685 commit 88f1980

23 files changed: +1488 −835 lines

.github/workflows/lint_and_test.yml

Lines changed: 1 addition & 1 deletion
@@ -37,4 +37,4 @@ jobs:

       - name: Run tests
        run: |
-          uv run pytest --cov=src --cov-report xml --cov-report term --cov-fail-under=90 ./tests
+          uv run pytest --cov=src --cov-report xml --cov-report term --cov-fail-under=90 tests/unit tests/integration

.github/workflows/publish_to_pypi.yml

Lines changed: 2 additions & 2 deletions
@@ -2,8 +2,8 @@ name: Publish Python 🐍 distributions 📦 to PyPI

 on:
   push:
-    branches: [ "main" ]
-
+    tags:
+      - 'v[0-9]+.[0-9]+.[0-9]+'

 jobs:
   build-n-publish:
     name: Build and publish Python distribution to PyPI
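
With this change, a release is published only when a tag like `v0.2.0` is pushed, rather than on every push to `main`. The tag filter is a GitHub Actions glob pattern (where `.` is literal and `+` means one or more of the preceding character), not a regex. As a rough illustration only, a Python check accepting the same tag names could look like this; `is_release_tag` is hypothetical and not part of the repo:

```python
import re

# Approximate regex equivalent of the workflow glob 'v[0-9]+.[0-9]+.[0-9]+'.
# The glob's dots are literal, so the regex escapes them to accept the
# same v<major>.<minor>.<patch> tag names.
def is_release_tag(tag: str) -> bool:
    return re.fullmatch(r"v[0-9]+\.[0-9]+\.[0-9]+", tag) is not None

assert is_release_tag("v0.2.0")    # would trigger the publish workflow
assert not is_release_tag("main")  # branch pushes no longer publish
```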

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ ipython_config.py
 __pypackages__/

 # Environments
-.env
+*/.env
 .venv
 env/
 venv/

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Changelog
+
+## [0.2.0] - 2025-05-13
+
+### Added
+
+- Changelog
+- E2E tests
+- Geolocation and User Agent type parameters to universal scraper
+
+### Changed
+
+- Descriptions for tools
+- Descriptions for tool parameters
+- Default values for tool parameters
+
+### Removed
+
+- WebUnblocker tool
+- Parse parameter for universal scraper

Makefile

Lines changed: 8 additions & 3 deletions
@@ -7,7 +7,7 @@ virtualenv_dir ?= .venv

 .PHONY: install_deps
 install_deps: $(virtualenv_dir)
-	uv sync
+	uv sync --group dev

 .PHONY: lint
 lint: install_deps

@@ -22,11 +22,16 @@ format: $(virtualenv_dir)

 .PHONY: test
 test: install_deps
-	uv run pytest --cov=src --cov-report xml --cov-report term --cov-fail-under=90 ./tests
+	uv run pytest --cov=src --cov-report xml --cov-report term --cov-fail-under=90 tests/unit tests/integration
+
+.PHONY: test-e2e
+test-e2e:
+	uv sync --group dev --group e2e-tests
+	uv run pytest --cov=src --cov-report xml --cov-report term tests/e2e

 .PHONY: run
 run: install_deps
-	npx @modelcontextprotocol/inspector@0.3.0 \
+	npx @modelcontextprotocol/inspector \
 		uv \
 		--directory $(current_dir) \
 		run \

README.md

Lines changed: 0 additions & 10 deletions
@@ -239,16 +239,6 @@ make run
 ```
 Then access MCP Inspector at `http://localhost:5173`. You may need to add your username and password as environment variables in the inspector under `OXYLABS_USERNAME` and `OXYLABS_PASSWORD`.

-
-## 🛠️ Technical Details
-
-This server provides two main tools:
-
-1. **oxylabs_scraper**: Uses Oxylabs Web Scraper API for general website scraping
-2. **oxylabs_web_unblocker**: Uses Oxylabs Web Unblocker for hard-to-access websites
-
-[Web Scraper API](https://oxylabs.io/products/scraper-api/web) supports JavaScript rendering, parsed structured data, and cleaned HTML in Markdown format. [Web Unblocker](https://oxylabs.io/products/web-unblocker) offers JavaScript rendering and cleaned HTML, but doesn’t return parsed data.

 ---

 ## License

pyproject.toml

Lines changed: 11 additions & 3 deletions
@@ -1,6 +1,6 @@
 [project]
 name = "oxylabs-mcp"
-version = "0.1.7"
+version = "0.2.0"
 description = "Oxylabs MCP server"
 authors = [
     {name="Augis Braziunas", email="[email protected]"},

@@ -24,7 +24,7 @@ dependencies = [
     "lxml>=5.3.0",
     "lxml-html-clean>=0.4.1",
     "markdownify>=0.14.1",
-    "mcp[cli]>=1.6.0",
+    "mcp[cli]>=1.8.0",
     "pydantic>=2.10.5",
     "pydantic-settings>=2.8.1",
 ]

@@ -40,6 +40,12 @@ dev = [
     "pytest-mock>=3.14.0",
     "ruff>=0.9.1",
 ]
+e2e-tests = [
+    "agno>=1.4.5",
+    "anthropic>=0.50.0",
+    "google-genai>=1.13.0",
+    "openai>=1.77.0",
+]

 [build-system]
 requires = ["hatchling"]

@@ -89,7 +95,8 @@ lint.ignore = [
 ]

 [tool.ruff.lint.per-file-ignores]
-"tests/*" = ["D", "S101", "ARG001", "ANN", "PT011", "FBT"]
+"tests/*" = ["D", "S101", "ARG001", "ANN", "PT011", "FBT", "PLR2004"]
+"src/oxylabs_mcp/url_params.py" = ["E501"]

 [tool.ruff.lint.pycodestyle]
 max-line-length = 100

@@ -100,6 +107,7 @@ lines-after-imports = 2

 [tool.pytest.ini_options]
 asyncio_default_fixture_loop_scope = "session"
+asyncio_mode = "auto"

 [tool.black]
 line-length = 100
src/oxylabs_mcp/server.py

Lines changed: 53 additions & 71 deletions
@@ -1,83 +1,53 @@
 from typing import Any

 from mcp.server.fastmcp import Context, FastMCP
+from mcp.types import ToolAnnotations

 from oxylabs_mcp import url_params
 from oxylabs_mcp.config import settings
 from oxylabs_mcp.exceptions import MCPServerError
-from oxylabs_mcp.utils import (
-    convert_html_to_md,
-    get_content,
-    oxylabs_client,
-    strip_html,
-)
+from oxylabs_mcp.utils import get_content, oxylabs_client


-mcp = FastMCP("oxylabs_mcp", dependencies=["mcp", "httpx"])
+mcp = FastMCP("oxylabs_mcp")


-@mcp.tool(
-    name="oxylabs_universal_scraper",
-    description="Scrape url using Oxylabs Web API with universal scraper",
-)
-async def scrape_universal_url(
+@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
+async def universal_scraper(
     ctx: Context,  # type: ignore[type-arg]
     url: url_params.URL_PARAM,
-    parse: url_params.PARSE_PARAM = False,  # noqa: FBT002
     render: url_params.RENDER_PARAM = "",
+    user_agent_type: url_params.USER_AGENT_TYPE_PARAM = "",
+    geo_location: url_params.GEO_LOCATION_PARAM = "",
+    output_format: url_params.OUTPUT_FORMAT_PARAM = "",
 ) -> str:
-    """Scrape url using Oxylabs Web API with universal scraper."""
+    """Get a content of any webpage.
+
+    Supports browser rendering, parsing of certain webpages
+    and different output formats.
+    """
     try:
-        async with oxylabs_client(ctx, with_auth=True) as client:
+        async with oxylabs_client(ctx) as client:
             payload: dict[str, Any] = {"url": url}
-            if parse:
-                payload["parse"] = parse
+
             if render:
                 payload["render"] = render
+            if user_agent_type:
+                payload["user_agent_type"] = user_agent_type
+            if geo_location:
+                payload["geo_location"] = geo_location

             response = await client.post(settings.OXYLABS_SCRAPER_URL, json=payload)

             response.raise_for_status()

-            return get_content(response, parse)
-    except MCPServerError as e:
-        return e.stringify()
-
-
-@mcp.tool(
-    name="oxylabs_web_unblocker",
-    description="Scrape url using Oxylabs Web Unblocker",
-)
-async def scrape_with_web_unblocker(
-    ctx: Context,  # type: ignore[type-arg]
-    url: url_params.URL_PARAM,
-    render: url_params.RENDER_PARAM = "",
-) -> str:
-    """Scrape url using Oxylabs Web Unblocker.
-
-    This tool manages the unblocking process to extract public data
-    even from the most difficult websites.
-    """
-    headers: dict[str, Any] = {}
-    if render:
-        headers["X-Oxylabs-Render"] = render
-
-    try:
-        async with oxylabs_client(ctx, with_proxy=True, verify=False, headers=headers) as client:
-            response = await client.get(url)
-
-            response.raise_for_status()
-
-            return convert_html_to_md(strip_html(response.text))
+            return get_content(response, output_format=output_format)
     except MCPServerError as e:
         return e.stringify()


-@mcp.tool(
-    name="oxylabs_google_search_scraper",
-    description="Scrape Google Search results using Oxylabs Web API",
-)
-async def scrape_google_search(
+@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
+async def google_search_scraper(
     ctx: Context,  # type: ignore[type-arg]
     query: url_params.GOOGLE_QUERY_PARAM,
     parse: url_params.PARSE_PARAM = True,  # noqa: FBT002

@@ -90,10 +60,15 @@ async def scrape_google_search(
     geo_location: url_params.GEO_LOCATION_PARAM = "",
     locale: url_params.LOCALE_PARAM = "",
     ad_mode: url_params.AD_MODE_PARAM = False,  # noqa: FBT002
+    output_format: url_params.OUTPUT_FORMAT_PARAM = "",
 ) -> str:
-    """Scrape Google Search results using Oxylabs Web API."""
+    """Scrape Google Search results.
+
+    Supports content parsing, different user agent types, pagination,
+    domain, geolocation, locale parameters and different output formats.
+    """
     try:
-        async with oxylabs_client(ctx, with_auth=True) as client:
+        async with oxylabs_client(ctx) as client:
             payload: dict[str, Any] = {"query": query}

             if ad_mode:

@@ -124,16 +99,13 @@

             response.raise_for_status()

-            return get_content(response, parse)
+            return get_content(response, parse=parse, output_format=output_format)
     except MCPServerError as e:
         return e.stringify()


-@mcp.tool(
-    name="oxylabs_amazon_search_scraper",
-    description="Scrape Amazon Search results using Oxylabs Web API",
-)
-async def scrape_amazon_search(
+@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
+async def amazon_search_scraper(
     ctx: Context,  # type: ignore[type-arg]
     query: url_params.AMAZON_SEARCH_QUERY_PARAM,
     category_id: url_params.CATEGORY_ID_CONTEXT_PARAM = "",

@@ -147,10 +119,16 @@ async def scrape_amazon_search(
     domain: url_params.DOMAIN_PARAM = "",
     geo_location: url_params.GEO_LOCATION_PARAM = "",
     locale: url_params.LOCALE_PARAM = "",
+    output_format: url_params.OUTPUT_FORMAT_PARAM = "",
 ) -> str:
-    """Scrape Amazon Search results using Oxylabs Web API."""
+    """Scrape Amazon search results.
+
+    Supports content parsing, different user agent types, pagination,
+    domain, geolocation, locale parameters and different output formats.
+    Supports Amazon specific parameters such as category id, merchant id, currency.
+    """
     try:
-        async with oxylabs_client(ctx, with_auth=True) as client:
+        async with oxylabs_client(ctx) as client:
             payload: dict[str, Any] = {"source": "amazon_search", "query": query}

             context = []

@@ -184,16 +162,13 @@

             response.raise_for_status()

-            return get_content(response, parse)
+            return get_content(response, parse=parse, output_format=output_format)
     except MCPServerError as e:
         return e.stringify()


-@mcp.tool(
-    name="oxylabs_amazon_product_scraper",
-    description="Scrape Amazon Products using Oxylabs Web API",
-)
-async def scrape_amazon_products(
+@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
+async def amazon_product_scraper(
     ctx: Context,  # type: ignore[type-arg]
     query: url_params.AMAZON_SEARCH_QUERY_PARAM,
     autoselect_variant: url_params.AUTOSELECT_VARIANT_CONTEXT_PARAM = False,  # noqa: FBT002

@@ -204,10 +179,17 @@ async def scrape_amazon_products(
     domain: url_params.DOMAIN_PARAM = "",
     geo_location: url_params.GEO_LOCATION_PARAM = "",
     locale: url_params.LOCALE_PARAM = "",
+    output_format: url_params.OUTPUT_FORMAT_PARAM = "",
 ) -> str:
-    """Scrape Amazon Products using Oxylabs Web API."""
+    """Scrape Amazon products.
+
+    Supports content parsing, different user agent types, domain,
+    geolocation, locale parameters and different output formats.
+    Supports Amazon specific parameters such as currency and getting
+    more accurate pricing data with auto select variant.
+    """
     try:
-        async with oxylabs_client(ctx, with_auth=True) as client:
+        async with oxylabs_client(ctx) as client:
             payload: dict[str, Any] = {"source": "amazon_product", "query": query}

             context = []

@@ -235,7 +217,7 @@ async def scrape_amazon_products(

             response.raise_for_status()

-            return get_content(response, parse)
+            return get_content(response, parse=parse, output_format=output_format)
     except MCPServerError as e:
         return e.stringify()
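
Across all four tools, the diff drops explicit `name`/`description` arguments: FastMCP derives the tool name from the function name and the description from the docstring, and `ToolAnnotations(readOnlyHint=True)` advertises each tool as non-mutating. A minimal standalone sketch of that registration pattern (server name and tool body are illustrative, not the repo's code):

```python
from typing import Any

from mcp.server.fastmcp import FastMCP
from mcp.types import ToolAnnotations

mcp = FastMCP("example_server")  # hypothetical server name


@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def example_scraper(url: str, render: str = "") -> str:
    """Get the content of any webpage."""  # becomes the tool description
    payload: dict[str, Any] = {"url": url}
    # Optional parameters are added only when set, mirroring how the
    # tools above build their request payloads.
    if render:
        payload["render"] = render
    return str(payload)  # placeholder for the real scraper API call
```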
