Sorry, seeing this for the first time. I think your DNS technique may already work for our crawlers?
Would it be possible to get this documented somewhere? I'd be afraid to start using this technique without some level of commitment from IA, for fear that it'd break your crawler.
I could be mistaken, but I haven't been able to find any documentation on how to verify the IA crawler. The closest thing I've found is checking the crawler's User-Agent string, but that's easily faked. My issue is that I want to invite the IA crawler to crawl my content while still detecting and blocking things like spammers.
Google and Bing both handle this with a reverse DNS lookup of the crawler's IP address, followed by a forward DNS lookup of the hostname that the reverse lookup returns.
Put another way: run `host` on the crawler's IP address to get its hostname, then run `host` on that hostname. If the second `host` command returns the same IP we started with, and the domain ends with googlebot.com, we're in business.
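For anyone wanting to automate the reverse-then-forward check described above, here is a minimal Python sketch using only the standard library. The function names `hostname_matches` and `is_verified_crawler` are hypothetical, and the list of allowed domain suffixes is supplied by the caller: it assumes the crawler operator publishes PTR records under a known domain (e.g. googlebot.com for Google). No such domain for IA is documented in this thread, so none is filled in here.

```python
import ipaddress
import socket


def hostname_matches(hostname: str, suffixes: tuple[str, ...]) -> bool:
    """True if hostname equals one of the suffixes or is a subdomain of one.

    Matching on "." + suffix rejects look-alikes such as "evilgooglebot.com".
    """
    name = hostname.rstrip(".").lower()
    return any(name == s or name.endswith("." + s) for s in suffixes)


def is_verified_crawler(ip: str, suffixes: tuple[str, ...]) -> bool:
    """Forward-confirmed reverse DNS check for a client IP (hypothetical helper)."""
    try:
        target = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IPv4/IPv6 address

    try:
        # Step 1: reverse (PTR) lookup of the connecting IP.
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False

    # Step 2: the PTR hostname must sit under an allowed domain.
    if not hostname_matches(hostname, suffixes):
        return False

    try:
        # Step 3: forward (A/AAAA) lookup of that hostname.
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return False

    # Step 4: the forward lookup must lead back to the original IP;
    # otherwise anyone who controls their own PTR record could claim the name.
    forward_ips = {ipaddress.ip_address(info[4][0]) for info in infos}
    return target in forward_ips
```

A call such as `is_verified_crawler(client_ip, ("googlebot.com",))` performs two live DNS lookups, so in practice you would probably cache results per IP rather than checking on every request.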
Here are Google's docs: https://support.google.com/webmasters/answer/80553?hl=en
And Bing's: https://www.bing.com/webmaster/help/how-to-verify-bingbot-3905dc26
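Could IA add this feature too? I think it would only require some DNS work on your side whenever you add a new crawler IP address: a reverse (PTR) record pointing to a hostname under your domain, plus a matching forward record for that hostname.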