Using LangChain, Jupyter, Django, Bright Data, and other tools, let's build an app that can find near real-time information on the keywords and topics you care about as they trend across Reddit and various subreddits.
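To give a taste of the early setup sections, here is a minimal sketch of loading API keys from a `.env` file and initializing a Google Gemini chat model through LangChain. The model name, prompt, and environment variable are assumptions for illustration, not the exact values used in the video.

```python
# Minimal sketch of the dotenv + LangChain + Gemini setup covered early in the video.
# The model name and .env variable below are assumptions for illustration.
from dotenv import load_dotenv
from langchain_google_genai import ChatGoogleGenerativeAI

load_dotenv()  # reads GOOGLE_API_KEY (and any other keys) from a local .env file

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # assumed model name
response = llm.invoke("Suggest three subreddits for tracking AI news.")
print(response.content)
```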
🎥 Watch Full Video: Building an AI-Powered Reddit Scraping System
- 00:00:00 Welcome
- 00:03:46 Demo (final code)
- 00:12:03 Using Search Engine Results
- 00:14:16 Setup your Python Project (section code)
- 00:20:36 Load API Keys with Dotenv Files (section code)
- 00:24:26 Intro to LangChain
- 00:26:19 Bright Data Serp API with Python & LangChain (section code)
- 00:38:01 Strip Notebook Outputs for Security with pre-commit (section code)
- 00:42:56 Setup Google Gemini Models with LangChain (section code)
- 00:52:43 LLM with Structured Output (section code)
- 00:59:58 LLM Tool Calling The Hard Way (section code)
- 01:08:19 Tool Calling with LangGraph (section code)
- 01:23:41 Search & Format Reddit Communities via LLM and Bright Data (section code)
- 01:29:38 Scrape Reddit with the Bright Data Crawl API (section code)
- 01:41:58 Get Crawl API Snapshot Progress (section code)
- 01:47:00 Download Data from the Crawl API (section code)
- 01:54:53 Automating Data Pulls for Users
- 01:58:39 Install & Start the Django Project (section code)
- 02:02:31 Combine Django with Jupyter (section code)
- 02:05:23 Implement Postgres Database with Django (section code)
- 02:15:19 Setup Redis for Django & Caching (section code)
- 02:22:36 Getting Started with Celery & Django (section code)
- 02:33:51 Webhooks & Cloudflare Tunnels
- 02:36:47 Setup Cloudflare Tunnel with a Custom Domain (section code)
- 02:45:24 Django Qstash for Webhook-based Background Tasks (section code)
- 02:52:55 Bright Data to Django Model Part 1 (section code)
- 03:02:16 Bright Data to Django Model Part 2 (section code)
- 03:09:38 Store Bright Data Snapshots (section code)
- 03:17:38 Helper Functions for Scraping Events Part 1 (section code)
- 03:24:56 Helper Functions for Scraping Events Part 2 (section code)
- 03:32:52 Saving Snapshot Scraping Results (section code)
- 03:38:29 Configure Scraping as Background Tasks (section code)
- 03:49:48 Run Background Scraping Tasks (section code)
- 03:53:48 Poll Scrape Status as Background Task (section code)
- 04:02:04 Tracking Scrape Event Finished At Time (section code)
- 04:08:36 A Webhook Handler View in Django (section code)
- 04:16:11 Tracking Scraping Snapshots through Webhooks with Django (section code)
- 04:25:48 Improved Auth Key for Webhooks (section code)
- 04:30:42 Webhook Handler for Reddit Posts (section code)
- 04:38:32 Adjust Data to Scrape (section code)
- 04:50:47 Background Sync Snapshot Reddit Results (section code)
- 05:04:39 Storing Reddit Communities in Django (section code)
- 05:16:53 Reddit AI Agent into Django Project (section code)
- 05:26:04 Topic Extraction Agent (section code)
- 05:32:50 Fuzzy Query to Scraping (section code)
- 05:40:45 Auto Scrape Reddit Communities on Save (section code)
- 05:52:37 Scraping Workflow as a Service Function (section code)
- 06:00:17 Store Queries & Topics (section code)
- 06:09:24 Topics to Reddit Communities (section code)
- 06:16:41 Full Query Automation (section code)
- 06:23:09 Reddit Community Trackability (section code)
- 06:28:19 Scheduled Background Task to Trigger Reddit Scraping (section code)
- 06:33:27 Django Management Command to Trigger Scraping (section code)
- 06:36:17 Final Query Commands (section code)
- 06:38:22 Thank you and next steps