docs

mdrxy · mdrxy · commit a820081c15be · 2025-07-14T12:37:44.000-04:00
diff --git a/docs/docs/integrations/tools/scrapegraph.ipynb b/docs/docs/integrations/tools/scrapegraph.ipynb
@@ -10,36 +10,6 @@
     "---"
    ]
   },
-  {
-   "cell_type": "raw",
-   "id": "f725a8a2",
-   "metadata": {
-    "vscode": {
-     "languageId": "raw"
-    }
-   },
-   "source": [
-    "**Note**: This notebook has been updated to include `SmartCrawlerTool` and remove `LocalScraperTool`. The SmartCrawlerTool provides advanced crawling capabilities for multi-page data extraction.\n",
-    "\n",
-    "### Updated Integration Details\n",
-    "\n",
-    "| Class | Package | Serializable | JS support | Package latest |\n",
-    "| :--- | :--- | :---: | :---: | :---: |\n",
-    "| [SmartScraperTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapegraph?style=flat-square&label=%20) |\n",
-    "| [SmartCrawlerTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapegraph?style=flat-square&label=%20) |\n",
-    "| [MarkdownifyTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapegraph?style=flat-square&label=%20) |\n",
-    "| [GetCreditsTool](https://python.langchain.com/docs/integrations/tools/scrapegraph) | langchain-scrapegraph | ✅ | ❌ | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapegraph?style=flat-square&label=%20) |\n",
-    "\n",
-    "### Updated Tool Features\n",
-    "\n",
-    "| Tool | Purpose | Input | Output |\n",
-    "| :--- | :--- | :--- | :--- |\n",
-    "| SmartScraperTool | Extract structured data from websites | URL + prompt | JSON |\n",
-    "| SmartCrawlerTool | Extract data from multiple pages with crawling | URL + prompt + crawl options | JSON |\n",
-    "| MarkdownifyTool | Convert webpages to markdown | URL | Markdown text |\n",
-    "| GetCreditsTool | Check API credits | None | Credit info |\n"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "a6f91f20",
@@ -188,7 +158,8 @@
    ]
   },
   {
-   "cell_type": "raw",
+   "cell_type": "markdown",
+   "id": "d5a88cf2",
    "metadata": {
     "vscode": {
      "languageId": "raw"
@@ -239,17 +210,21 @@
     "\n",
     "# SmartCrawler\n",
     "url = \"https://scrapegraphai.com/\"\n",
-    "prompt = \"What does the company do? and I need text content from their privacy and terms\"\n",
+    "prompt = (\n",
+    "    \"What does the company do? and I need text content from their privacy and terms\"\n",
+    ")\n",
     "\n",
     "# Use the tool with crawling parameters\n",
-    "result_crawler = smartcrawler.invoke({\n",
-    "    \"url\": url,\n",
-    "    \"prompt\": prompt,\n",
-    "    \"cache_website\": True,\n",
-    "    \"depth\": 2,\n",
-    "    \"max_pages\": 2,\n",
-    "    \"same_domain_only\": True\n",
-    "})\n",
+    "result_crawler = smartcrawler.invoke(\n",
+    "    {\n",
+    "        \"url\": url,\n",
+    "        \"prompt\": prompt,\n",
+    "        \"cache_website\": True,\n",
+    "        \"depth\": 2,\n",
+    "        \"max_pages\": 2,\n",
+    "        \"same_domain_only\": True,\n",
+    "    }\n",
+    ")\n",
     "\n",
     "print(\"\\nSmartCrawler Result:\")\n",
     "print(json.dumps(result_crawler, indent=2))\n",
@@ -279,19 +254,23 @@
     "\n",
     "# Example based on the provided code snippet\n",
     "url = \"https://scrapegraphai.com/\"\n",
-    "prompt = \"What does the company do? and I need text content from their privacy and terms\"\n",
+    "prompt = (\n",
+    "    \"What does the company do? and I need text content from their privacy and terms\"\n",
+    ")\n",
     "\n",
     "# Use the tool with crawling parameters\n",
-    "result = tool.invoke({\n",
-    "    \"url\": url,\n",
-    "    \"prompt\": prompt,\n",
-    "    \"cache_website\": True,\n",
-    "    \"depth\": 2,\n",
-    "    \"max_pages\": 2,\n",
-    "    \"same_domain_only\": True\n",
-    "})\n",
-    "\n",
-    "print(json.dumps(result, indent=2))\n"
+    "result = tool.invoke(\n",
+    "    {\n",
+    "        \"url\": url,\n",
+    "        \"prompt\": prompt,\n",
+    "        \"cache_website\": True,\n",
+    "        \"depth\": 2,\n",
+    "        \"max_pages\": 2,\n",
+    "        \"same_domain_only\": True,\n",
+    "    }\n",
+    ")\n",
+    "\n",
+    "print(json.dumps(result, indent=2))"
    ]
   },
   {
@@ -428,15 +407,21 @@
    "source": [
     "## API reference\n",
     "\n",
-    "For detailed documentation of all ScrapeGraph features and configurations head to the Langchain API reference: https://python.langchain.com/docs/integrations/tools/scrapegraph\n",
+    "For detailed documentation of all ScrapeGraph features and configurations, head to [the LangChain API reference](https://python.langchain.com/docs/integrations/tools/scrapegraph).\n",
     "\n",
-    "Or to the official SDK repo: https://github.com/ScrapeGraphAI/langchain-scrapegraph"
+    "Alternatively, see [the official SDK repo](https://github.com/ScrapeGraphAI/langchain-scrapegraph)."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d710dad8",
+   "metadata": {},
+   "source": []
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "langchain",
    "language": "python",
    "name": "python3"
   },
@@ -450,7 +435,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.9"
+   "version": "3.10.16"
   }
  },
  "nbformat": 4,