tiktok_scraper is a Rust library designed to search for TikTok videos using a multi-layered strategy. It prioritizes performance and reliability by utilizing a Redis cache, the official TikTok Research API, and a fallback Selenium WebDriver scraper.
- Hybrid Search Strategy: Automatically switches between data sources based on availability:
  - Redis Cache: Returns previously fetched results to minimize latency.
  - TikTok API: Uses the official Research API (if a token is provided).
  - Scraper Fallback: Uses a headless browser to scrape results if the API is unavailable.
- Browser Pooling: Maintains a pool of WebDriver sessions to reduce initialization overhead.
- Stealth Scraping: Uses the WebDriver "eager" page load strategy and modified browser arguments to reduce the chance of bot detection.
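The lookup order described under Hybrid Search Strategy can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `Source`, `hybrid_search`, and the closure-based `api` and `scraper` parameters are hypothetical names, a `HashMap` stands in for Redis, and cache expiry is left out.

```rust
use std::collections::HashMap;

/// Which layer answered the query (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Source {
    Cache,
    Api,
    Scraper,
}

/// Try the cache first, then the API (only when a token is set), then the scraper.
/// The HashMap stands in for Redis; TTL handling is omitted.
fn hybrid_search(
    cache: &mut HashMap<String, Vec<String>>,
    api_token: &str,
    api: impl Fn(&str) -> Result<Vec<String>, String>,
    scraper: impl Fn(&str) -> Result<Vec<String>, String>,
    query: &str,
) -> Result<(Source, Vec<String>), String> {
    // 1. Cache hit: return the stored results immediately.
    if let Some(hit) = cache.get(query) {
        return Ok((Source::Cache, hit.clone()));
    }

    // 2. API: skipped when the token is empty; an API error falls through.
    if !api_token.is_empty() {
        if let Ok(videos) = api(query) {
            cache.insert(query.to_string(), videos.clone());
            return Ok((Source::Api, videos));
        }
    }

    // 3. Scraper fallback: its error, if any, is the final error.
    let videos = scraper(query)?;
    cache.insert(query.to_string(), videos.clone());
    Ok((Source::Scraper, videos))
}

fn main() {
    let mut cache = HashMap::new();
    // Stub data sources: the API always fails, the scraper always succeeds.
    let api = |_query: &str| -> Result<Vec<String>, String> {
        Err("API unavailable".to_string())
    };
    let scraper = |query: &str| -> Result<Vec<String>, String> {
        Ok(vec![format!("scraped result for {query}")])
    };

    // Empty token: the API step is skipped and the scraper answers.
    let (source, videos) = hybrid_search(&mut cache, "", &api, &scraper, "rust tips").unwrap();
    println!("{source:?}: {videos:?}");

    // Same query again: the cache answers.
    let (source, _) = hybrid_search(&mut cache, "", &api, &scraper, "rust tips").unwrap();
    println!("{source:?}");
}
```

Passing the data sources as closures keeps the ordering logic testable without Redis, network access, or a browser.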
Before using this library, ensure the following dependencies are running:
- Redis Server: Used for caching search results.
- Selenium WebDriver: A Chrome instance controlled via WebDriver. We recommend using Docker to run a compatible Selenium Standalone Chrome instance.
Run the following command to start the Selenium container with a 2 GB shared-memory allocation and an overridden session limit. Replace NUM with the maximum number of concurrent sessions; it should typically be at least the browser_instances value you configure:

    docker run -d \
      -p 4444:4444 \
      --shm-size="2g" \
      -e SE_NODE_MAX_SESSIONS=NUM \
      -e SE_NODE_OVERRIDE_MAX_SESSIONS=true \
      --name tiktok-chrome \
      selenium/standalone-chrome

Add the library to your Cargo.toml. If the library is local, use a path dependency:

    [dependencies]
    tiktok_hybrid = { path = "./tiktok_hybrid" }
    tokio = { version = "1", features = ["full"] }

Below is a basic example of how to initialize the client and perform a search.

    use tiktok_hybrid::{TikTokClient, TikTokConfig};

    #[tokio::main]
    async fn main() {
        // Configure the client
        let config = TikTokConfig {
            redis_url: "redis://127.0.0.1/".to_string(),
            api_token: "".to_string(), // Leave empty to force scraper usage
            webdriver_url: "http://127.0.0.1:4444".to_string(),
            browser_instances: 2, // Number of concurrent browsers
            search_limit: 10,     // Max videos to retrieve
            cache_ttl_sec: 3600,  // Cache duration in seconds
        };

        let client = TikTokClient::new(config)
            .await
            .expect("Failed to initialize TikTok Client");
        println!("Client initialized. Starting search...");

        // Search
        match client.search("linux terminal tips").await {
            Ok(videos) => {
                println!("Found {} videos:", videos.len());
                for video in videos {
                    println!("- Title: {}", video.title);
                    println!("  URL: {}", video.url);
                    println!("  Command: {}", video.download_cmd);
                }
            }
            Err(e) => eprintln!("Search failed: {}", e),
        }

        // Closes browser sessions
        client.shutdown().await;
    }

The TikTokConfig struct allows for the following customizations:
| Field | Type | Description |
|---|---|---|
| redis_url | String | Connection string for the Redis server. |
| api_token | String | Bearer token for the TikTok Research API. If empty, the API step is skipped. |
| webdriver_url | String | URL of the Selenium WebDriver endpoint. |
| browser_instances | usize | Number of browser sessions to keep open in the pool. |
| search_limit | usize | The maximum number of videos to fetch per query. |
| cache_ttl_sec | u64 | Time-to-live for cached results in seconds. |
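
To illustrate what browser_instances controls, here is a rough sketch of a fixed-size session pool. It is an assumption-laden illustration rather than the crate's code: `Session`, `SessionPool`, `acquire`, `release`, and `idle_count` are hypothetical names, and it uses blocking standard-library primitives where the real library is async.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};

/// Stand-in for an open WebDriver session (hypothetical type).
#[derive(Debug)]
struct Session {
    id: usize,
}

/// Fixed-size pool: sessions are created once up front and then reused.
struct SessionPool {
    idle: Mutex<VecDeque<Session>>,
    available: Condvar,
}

impl SessionPool {
    /// Create `browser_instances` sessions eagerly.
    fn new(browser_instances: usize) -> Arc<Self> {
        let idle: VecDeque<Session> = (0..browser_instances)
            .map(|id| Session { id })
            .collect();
        Arc::new(Self {
            idle: Mutex::new(idle),
            available: Condvar::new(),
        })
    }

    /// Block until a session is idle, then hand it to the caller.
    fn acquire(&self) -> Session {
        let mut idle = self.idle.lock().unwrap();
        loop {
            if let Some(session) = idle.pop_front() {
                return session;
            }
            // Nothing idle: wait until `release` signals a returned session.
            idle = self.available.wait(idle).unwrap();
        }
    }

    /// Return a session to the pool and wake one waiting caller.
    fn release(&self, session: Session) {
        self.idle.lock().unwrap().push_back(session);
        self.available.notify_one();
    }

    /// Number of sessions currently idle.
    fn idle_count(&self) -> usize {
        self.idle.lock().unwrap().len()
    }
}

fn main() {
    let pool = SessionPool::new(2);
    let first = pool.acquire();
    let second = pool.acquire();
    println!("checked out sessions {} and {}", first.id, second.id);

    pool.release(first);
    pool.release(second);
    println!("idle sessions: {}", pool.idle_count());
}
```

Because sessions are reused rather than recreated, the browser startup cost is paid once per session; a third concurrent caller in this sketch would wait until one of the two sessions is released.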
The project includes integration tests that exercise the scraper functionality.
To run the tests (ensure the Docker containers are running):

    # Run live scraping tests (requires internet connection and WebDriver)
    cargo test -- --ignored --test-threads=1

This project is licensed under the MIT License. See the LICENSE file for details.