Sightsense is a real-time computer vision tool designed to assist visually impaired users by recognizing objects, reading text, and delivering spoken feedback. It combines optical character recognition (OCR), an object detection model, and a text-to-speech engine to transform live camera input into meaningful audio cues.
## Core Components

- Camera Input
  Uses a webcam or device camera to capture video frames. The default implementation leverages OpenCV to access and read frames from the camera feed (see the setup sketch after this list).
- Object Detection
  A pretrained model, configured within the repository, processes each captured frame and draws bounding boxes or labels around recognized items (for example, YOLO or TensorFlow-based models).
- Text Recognition
  Integrates an OCR library (such as Tesseract) to parse text within the frame and return recognized strings.
- Text-to-Speech Engine
  Converts the recognized labels and text into audio output. Depending on the user's environment, the code might leverage libraries like pyttsx3, gTTS, or a cloud TTS API.
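How these pieces are wired together depends on the repository's actual configuration; the sketch below is a minimal, assumed setup using OpenCV for capture, an Ultralytics YOLO model for detection, pytesseract for OCR, and pyttsx3 for offline speech. The model file and helper names are illustrative, not taken from the repo.

```python
# Assumed component setup (opencv-python, ultralytics, pytesseract, pyttsx3);
# the actual repository may use different libraries or model files.
import cv2
import pytesseract
import pyttsx3
from ultralytics import YOLO

# Camera input: open the default device camera (index 0).
camera = cv2.VideoCapture(0)

# Object detection: load a pretrained model (file name is illustrative).
detector = YOLO("yolov8n.pt")

# Text-to-speech: initialize an offline engine.
speech = pyttsx3.init()

def speak(message: str) -> None:
    """Queue a message and block until it has been spoken aloud."""
    speech.say(message)
    speech.runAndWait()
```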
## How It Works

- Frame Acquisition
  - Continuously reads camera frames.
  - Passes raw images to the processing pipeline.
- Processing Pipeline
  - Object Detection: Processes each frame, identifies objects, and returns detection results (class labels, confidence scores, bounding box coordinates).
  - OCR Module (Conditional): If enabled, crops relevant regions containing text and applies OCR to extract text data.
- Output Generation
  - Merges object detection results (and OCR findings if available) into a textual description.
  - Uses a text-to-speech engine to generate human-audible feedback.
- Real-Time Feedback Loop
  - Plays the generated audio description.
  - Continues acquiring subsequent frames (a sketch of this loop follows the list).
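Continuing the sketch above (and reusing its `camera`, `detector`, and `speak` objects), one plausible shape for this loop is shown below. The confidence threshold, OCR toggle, and spoken phrasing are assumptions for illustration rather than the repository's actual behavior.

```python
# Real-time loop sketch; reuses `camera`, `detector`, and `speak` from the
# setup sketch above. Threshold and OCR toggle are illustrative assumptions.
ENABLE_OCR = True
CONFIDENCE_THRESHOLD = 0.5

while True:
    ok, frame = camera.read()          # frame acquisition
    if not ok:
        break

    # Object detection: keep labels above the confidence threshold.
    result = detector(frame, verbose=False)[0]
    labels = [
        detector.names[int(box.cls[0])]
        for box in result.boxes
        if float(box.conf[0]) >= CONFIDENCE_THRESHOLD
    ]

    # OCR module (conditional): read any text visible in the frame.
    text = ""
    if ENABLE_OCR:
        # pytesseract expects RGB, while OpenCV frames are BGR.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        text = pytesseract.image_to_string(rgb).strip()

    # Output generation: merge detections and OCR findings into a description.
    parts = []
    if labels:
        parts.append("I see " + ", ".join(sorted(set(labels))))
    if text:
        parts.append("Text reads: " + text)

    # Real-time feedback: speak the description, then continue with next frame.
    if parts:
        speak(". ".join(parts))

camera.release()
```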
## Repository Structure

| Folder/File | Description |
|---|---|
| src/ | Contains the primary vision processing scripts and object detection logic. |
| src/ocr/ | Houses OCR-related utility functions or Tesseract wrapper scripts. |
| src/audio/ | Implements the text-to-speech functionality, including engine setup. |
| requirements.txt | Specifies all Python dependencies needed for object detection, OCR, and audio. |
| main.py | Entry point to launch camera acquisition, run inferences, and output audio. |
## Installation

- Clone the repository.
- Install required Python packages: