
Blocking Bots on Your AVideo Platform

Daniel Neto edited this page Apr 18, 2025 · 3 revisions

AVideo – Bot Blocking Configuration

AVideo includes a native mechanism to identify and block unwanted automated access based on the User-Agent header sent by HTTP clients. This feature helps mitigate abusive crawling, bandwidth waste, content scraping, and bot-based attacks, while allowing reputable crawlers (e.g., Googlebot) to function normally.


🔐 Purpose

  • Protect your server resources from abusive or unknown bots.
  • Reduce unnecessary traffic and CPU usage.
  • Maintain compatibility with trusted bots like search engine crawlers.

⚙️ How It Works

  1. AVideo inspects the User-Agent header of every incoming HTTP request during early initialization (in include_config.php).
  2. If the header matches a value in $global['stopBotsWhiteList'], the request is allowed.
  3. Otherwise, if the header matches any pattern listed in $global['stopBotsList'], the request is blocked immediately.
  4. If the User-Agent is empty or matches neither list, the bot check is skipped and the request proceeds normally.

The check is enforced before sessions are started or the database is connected, ensuring low resource impact.
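The decision logic described above can be sketched in a few lines. This is an illustrative simplification, not the exact AVideo implementation: the whitelist takes precedence over the block list, and an empty User-Agent skips the check entirely.

```php
<?php
// Hypothetical sketch of the bot check; the real code lives in include_config.php.
function isBlockedBot(string $userAgent, array $stopBotsList, array $stopBotsWhiteList): bool
{
    if ($userAgent === '') {
        return false; // empty User-Agent: bot check is skipped
    }
    foreach ($stopBotsWhiteList as $allowed) {
        if (stripos($userAgent, $allowed) !== false) {
            return false; // whitelisted bot, always allowed
        }
    }
    foreach ($stopBotsList as $blocked) {
        if (stripos($userAgent, $blocked) !== false) {
            return true; // matches a blocked pattern, reject the request
        }
    }
    return false; // unrecognized client, allowed through
}
```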


📂 Configuration

Edit the file:

/var/www/html/AVideo/videos/configuration.php

1. Define Bot Block List

Add the following array to define patterns of unwanted bots:

$global['stopBotsList'] = array(
    'headless', 'bot', 'spider', 'rouwler', 'Nuclei', 'MegaIndex',
    'NetSystemsResearch', 'CensysInspect', 'slurp', 'crawler',
    'curl', 'fetch', 'loader'
);

These entries will be matched case-insensitively against the incoming User-Agent string.
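Because the comparison uses stripos(), a lowercase pattern still matches a mixed-case User-Agent, which you can verify directly:

```php
<?php
$ua = 'Mozilla/5.0 (compatible; MegaIndexBot/1.0)';

// stripos() ignores case, so the lowercase pattern 'megaindex'
// still matches the 'MegaIndexBot' token in the User-Agent string.
$matched = stripos($ua, 'megaindex') !== false;
var_dump($matched); // bool(true)
```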


2. Define Bot Whitelist (Optional)

To allow specific bots even when they match one of the blocked terms, define:

$global['stopBotsWhiteList'] = array(
    'google', 'facebook', 'bing', 'yahoo', 'yandex', 'twitter'
);

These bots are allowed regardless of partial matches in the block list.


🧾 Example Configuration Snippet

Here’s how your configuration might look with both lists added:

<?php
$global['webSiteRootURL'] = 'https://example.com/';
$global['systemRootPath'] = '/var/www/html/AVideo/';

$global['stopBotsList'] = array('headless', 'bot', 'spider', 'rouwler', 'Nuclei', 'MegaIndex', 'NetSystemsResearch', 'CensysInspect', 'slurp', 'crawler', 'curl', 'fetch', 'loader');
$global['stopBotsWhiteList'] = array('google', 'facebook', 'bing', 'yahoo', 'yandex', 'twitter');

require_once $global['systemRootPath'].'objects/include_config.php';

🛑 Do not modify other global variables like $global['salt'], database credentials, or system paths unless instructed.


🔍 Behavior Examples

Example 1 – Unwanted Bot Blocked

  • User-Agent: Mozilla/5.0 (compatible; MegaIndexBot/1.0)
  • Matches: MegaIndex (in stopBotsList) ✅
  • Not in: stopBotsWhiteList
    ➡️ Blocked with response:
Bot Found [MegaIndex] Mozilla/5.0 (compatible; MegaIndexBot/1.0)

Example 2 – Allowed Search Bot

  • User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • Matches: bot (in stopBotsList) ✅
  • Also matches: google (in stopBotsWhiteList) ✅
    ➡️ Allowed. Request continues as normal.

📌 Important Notes

  • Matching is based on stripos() (case-insensitive substring match).
  • If the request method is HEAD and $global['stopHeadRequests'] is set, the request will be blocked regardless of bot rules.
  • You may log rejected bot attempts by customizing the error_log() line inside include_config.php.
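For example, to also reject HEAD requests, you might add the following to configuration.php. The value shown is an assumption; the source only states that the variable must be "set", and the exact logging format inside include_config.php may differ between versions.

```php
// In videos/configuration.php, before include_config.php is required.
// Assumed value: any truthy setting should enable the behavior.
$global['stopHeadRequests'] = 1; // reject all HEAD requests regardless of bot rules
```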