kristiandiana/JustTheInstructions

Chrome extension with a custom TensorFlow model exported to ONNX for in-browser inference — extracts clean, step-by-step instructions from recipes & DIY pages.

JustTheInstructions Marquee


💡 Project Overview

Scrolling past paragraphs of filler text just to find recipe steps or DIY instructions? JustTheInstructions is a Google Chrome extension that solves this problem by detecting instructional content in real time as soon as a page loads — delivering clean, formatted instructions in just one click.

JustTheInstructions Demo Gif

What makes it different?

  • Automatic Detection: Pages are analyzed locally in under 500ms using a custom ONNX model.
  • Smart Notifications: A Chrome notification instantly appears when instructional content is detected — no need to open the extension.
  • One-Click Extraction: Click once, and the instructions are automatically extracted, cleaned, and formatted for you — ready in seconds instead of endless scrolling.
  • Optional Manual Mode: Notifications can be toggled off — you can still run the extension manually in just 2 clicks (open extension → click “Extract Instructions”).

📈 Performance & Impact

  • Real-Time Detection: Page content scored for instructional relevance in 300–500ms after load
  • One-Click Flow: Notification → click → clean instructions
  • Model Accuracy: 99.6% on a 100k-sentence test set (80/10/10 split)
  • Inference Speed: ~11ms/sentence (~90 sentences/sec)
  • Extraction Speed: Formatted instructions returned in ~2–10 seconds
  • Model Size: 2.5MB ONNX — small enough for fast extension loading & smooth performance on low-end laptops
  • Data Scale: ~1M labeled sentences scraped from 5,000 instructional websites
  • Privacy: Detection is 100% client-side; formatting only runs when you explicitly request it

🧭 Objectives

  1. Score instructional relevance with >99% accuracy while running entirely client-side.
  2. Deliver clean, formatted instructions in under 10 seconds end-to-end.
  3. Eliminate manual interaction — automated detection + one-click flow, with manual fallback.
  4. Maintain privacy-first design by limiting server calls to formatting only.

🧠 How It Works

  1. Real-Time Detection (Client-Side)

    • As soon as a page loads, text is segmented into sentences.
    • A TensorFlow-trained binary classifier (exported to ONNX) scores each sentence; the per-sentence scores yield a page-level confidence (low/medium/high instructional content).
    • Detection runs fully in-browser via ONNX Runtime (WebAssembly).
  2. Why ONNX Runtime?

    • 2.5MB model size — ~40% smaller than a comparable TensorFlow.js build, reducing download & memory usage
    • Faster inference due to WebAssembly optimizations
    • Better for weak devices: Lightweight enough for low-end laptops and Chromebooks
  3. Notification-Driven UX

    • If the score passes a threshold, you receive a Chrome notification — no need to open the extension.
    • Clicking the notification triggers automatic extraction.
    • Notifications Off? You can still run the model manually in 2 clicks (open → extract).
  4. Backend-Powered Formatting

    • Candidate sentences are sent to a lightweight Flask API, which:
      • Cleans & structures the text into Markdown-formatted steps
      • Adds prerequisites (ingredients, tools, etc.) when relevant
    • The API is Dockerized and deployed on GCP Cloud Run for scalable, low-latency performance.
  5. Privacy & Safety

    • Detection always runs locally — only when you explicitly click “Extract Instructions” is page data sent for formatting.
    • Sensitive domains (e.g., Gmail, banking) are automatically skipped.
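
The detection flow in step 1 can be sketched as follows. This is an illustrative Python mock, not the extension's JavaScript: the thresholds, the segmentation rule, and mean-score aggregation are all assumptions, and the real ONNX Runtime (WebAssembly) model call is replaced by stand-in scores.

```python
import re

# Illustrative thresholds -- the extension's actual cutoffs are not published.
LOW, HIGH = 0.4, 0.7

def split_sentences(text):
    """Naive segmentation on terminal punctuation (real tokenization may differ)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def page_confidence(sentence_scores):
    """Aggregate per-sentence classifier scores into a page-level bucket."""
    mean = sum(sentence_scores) / len(sentence_scores)
    if mean >= HIGH:
        return "high"
    if mean >= LOW:
        return "medium"
    return "low"

# Example: scores would come from the ONNX classifier, one per sentence.
sentences = split_sentences("Preheat the oven to 350F. Mix flour and sugar. Bake!")
scores = [0.92, 0.88, 0.81]                # stand-in for model output
assert page_confidence(scores) == "high"   # would trigger a Chrome notification
```

A "high" bucket is what would fire the notification in step 3; "low" pages are left alone.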

📊 Data Collection & Training

  • Scraped 5,000 instructional websites across five categories:
    🍲 Recipes • 🎨 Crafts • 🔌 Circuits • 🧪 DIY Science • 🛠️ General Tutorials
  • Collected ~1M labeled sentences via a custom BeautifulSoup + Pandas pipeline.
  • Dataset split into 80% training / 10% test / 10% validation.
  • Achieved 99.6% test accuracy (100k sentences).
  • Model exported to ONNX Runtime for optimized frontend inference.
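
A minimal sketch of what a BeautifulSoup + Pandas labeling pipeline like the one above might look like. The function name, tag selection, and sentence-splitting rule are assumptions for illustration, not the project's actual scraping code.

```python
import pandas as pd
from bs4 import BeautifulSoup

def labeled_sentences(html, label):
    """Extract paragraph/list-item text and emit one labeled row per sentence."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tag in soup.find_all(["p", "li"]):
        text = tag.get_text(" ", strip=True)
        for sent in filter(None, (s.strip() for s in text.split(". "))):
            rows.append({"sentence": sent.rstrip("."), "label": label})
    return pd.DataFrame(rows)

page = "<p>Preheat the oven. Mix the batter well.</p><li>Serve warm.</li>"
df = labeled_sentences(page, 1)  # label 1 = instructional
```

Running the same function over non-instructional pages with `label=0` would build the negative class of the binary dataset.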

Model Architecture:
Binary text classifier — Embedding → GlobalAveragePooling → Dense(16, ReLU) → Dropout(0.4) → Dense(16, ReLU) → Dropout(0.4) → Sigmoid (~64K parameters, Adam optimizer)
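
The ~64K figure is consistent with plausible hyperparameters. The vocabulary size and embedding dimension aren't stated, so the values below (4,000-word vocab, 16-dim embeddings) are assumptions chosen to match:

```python
def param_count(vocab_size, emb_dim):
    """Trainable parameters for the architecture described above."""
    embedding = vocab_size * emb_dim             # Embedding table
    dense1 = emb_dim * 16 + 16                   # Dense(16, ReLU)
    dense2 = 16 * 16 + 16                        # Dense(16, ReLU)
    output = 16 * 1 + 1                          # Dense(1, sigmoid)
    return embedding + dense1 + dense2 + output  # pooling/dropout add nothing

# Assumed hyperparameters: 4,000-word vocab, 16-dim embeddings.
print(param_count(4000, 16))  # 64561, i.e. the stated ~64K
```

Note the embedding table dominates: the dense layers contribute only ~560 parameters, which is why the exported model stays small.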

📄 Colab Notebooks (for reproducibility):


🛠️ Tech Stack

| Layer | Tech |
| --- | --- |
| Model | ONNX Runtime (browser), TensorFlow (training) |
| Reasoning | ONNX chosen for smaller bundle size & faster inference |
| Data Processing | Pandas, NumPy, BeautifulSoup |
| Frontend | JavaScript (Chrome Extension with DOM injection) |
| Backend API | Flask, Docker, Google Cloud Run |
| Formatting | GPT-4.1-nano (Markdown restructuring & cleaning) |
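
The formatting layer is LLM-backed (GPT-4.1-nano), so its output isn't deterministic; the sketch below only illustrates the shape of the Markdown the API might return. The function name and structure are hypothetical, not the actual backend code.

```python
def to_markdown_steps(title, steps, prerequisites=()):
    """Render extracted sentences in the style of the API's Markdown output."""
    lines = [f"## {title}", ""]
    if prerequisites:
        lines.append("**You will need:**")
        lines += [f"- {item}" for item in prerequisites]
        lines.append("")
    lines += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
    return "\n".join(lines)

md = to_markdown_steps("Pancakes", ["Mix the batter.", "Fry until golden."],
                       prerequisites=["flour", "eggs"])
```

The prerequisites block corresponds to the "ingredients, tools, etc." section the API adds when relevant.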


⭐️ Found this useful? Star this repo.
📝 Help others discover it: leave a review on the Chrome Web Store.
