table-of-bot-metrics.md: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
| CCBot |[Common Crawl Foundation](https://commoncrawl.org)|[Yes](https://commoncrawl.org/ccbot)| Provides open crawl dataset, used for many purposes, including Machine Learning/AI. | Monthly at present. | Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers). |
| ChatGPT-User |[OpenAI](https://openai.com)| Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. |
| Claude-Web |[Anthropic](https://www.anthropic.com)| Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-| ClaudeBot |[Anthropic](https://www.anthropic.com)|Unclear at this time.| Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| ClaudeBot |[Anthropic](https://www.anthropic.com)|[Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler)| Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
| Diffbot |[Diffbot](https://www.diffbot.com/)| At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. |
| FacebookBot | Meta/Facebook |[Yes](https://developers.facebook.com/docs/sharing/bot/)| Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology"; unknown if used to train Meta AI specifically. |
| FriendlyCrawler | Unknown |[Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler)| We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is, but the data is used for training/machine learning. |