A document redaction system that helps protect sensitive information in your documents using AI-powered redaction techniques.
The backend is built with Flask and consists of several key components:
- app.py - Main Application Entry Point
  - Handles all HTTP routes and API endpoints
  - Manages file uploads and document processing
  - Implements email notifications
  - Key endpoints:
    - /email/send - Sends notification emails
    - /document/add - Handles document uploads
    - /documents - Lists all documents
    - /document/hash/<hash> - Retrieves documents by hash
    - /structured - Processes structured data
    - /redact - Handles document redaction
- redaction_service.py
  - Core redaction logic implementation
  - Handles PDF processing and text extraction
  - Manages redaction patterns and rules
- ocr_redaction.py
  - OCR (Optical Character Recognition) implementation
  - Processes scanned documents
  - Extracts text from images within PDFs
- ollamahandler.py
  - Integration with Ollama AI model
  - Handles AI-powered text analysis
  - Manages model interactions and responses
- preprocessor.py
  - Document preprocessing utilities
  - Text cleaning and normalization
  - Format conversion helpers
- auto_emailer.py
  - Email notification system
  - Template management
  - Email sending utilities
- config.py
  - Configuration management
  - Environment variables
  - System settings
- model.py
  - Database models
  - Data structures
  - Schema definitions
- SQLite database implementation
- Document storage and retrieval
- Hash management for documents
- temp_uploads/ - Temporary file storage
- document_storage/ - Permanent document storage
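The component list above says that redaction_service.py manages redaction patterns and rules, but it does not show what a rule looks like. As a minimal sketch, assuming rules are expressed as regular expressions (the README does not confirm this), text redaction could work as follows. The names `REDACTION_PATTERNS` and `redact_text` and both patterns are hypothetical and are not taken from the project.

```python
import re

# Hypothetical patterns for illustration only; the project's real rules
# are not shown in this README.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact_text(text: str, patterns: dict = REDACTION_PATTERNS) -> str:
    """Replace every match of each pattern with a [LABEL] placeholder."""
    for label, pattern in patterns.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    sample = "Contact Jane at jane.doe@example.com or 555-123-4567."
    print(redact_text(sample))  # Contact Jane at [EMAIL] or [PHONE].
```

Pattern matching of this kind only catches values with a predictable shape. Names and other free-form details would presumably rely on the AI-powered analysis handled by ollamahandler.py.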
The frontend is built with React, TypeScript, and Vite, featuring a modern component-based architecture.
- App.tsx - Root Component
  - Application routing
  - Theme management
  - Global state setup
  - Toast notifications
- Components Directory (components/)
  - Reusable UI components
  - Theme toggle
  - Form elements
  - Layout components
- Pages Directory (pages/)
  - Route-based components
  - Main application views
  - Error pages
- Hooks Directory (hooks/)
  - Custom React hooks
  - API integration hooks
  - State management hooks
- Types Directory (types/)
  - TypeScript type definitions
  - Interface declarations
  - Type utilities
- Lib Directory (lib/)
  - Utility functions
  - API clients
  - Helper functions
- Docker
- Docker Compose
- Ollama
Ollama must be running with the default model, llama3.2:latest.
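Before starting the stack, it can help to confirm that Ollama is reachable and that the model is installed. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally installed models. It assumes Ollama listens on its default address, `http://localhost:11434`; the helper names `fetch_tags` and `model_is_available` are hypothetical and not part of this project.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; an assumption here
REQUIRED_MODEL = "llama3.2:latest"


def model_is_available(tags: dict, model: str = REQUIRED_MODEL) -> bool:
    """Return True if `model` appears in a parsed /api/tags response."""
    return any(entry.get("name") == model for entry in tags.get("models", []))


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """Fetch the list of locally installed models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    try:
        available = model_is_available(fetch_tags())
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"Ollama is not reachable at {OLLAMA_URL}: {exc}")
    else:
        if available:
            print(f"{REQUIRED_MODEL} is installed")
        else:
            print(f"Model missing; try: ollama pull {REQUIRED_MODEL}")
```

Keeping the response check in a separate pure function means it can be tested without a running server.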
- Clone the repository:
    git clone [your-repo-url]
    cd Hackfest25-23
- Start the application:
    docker-compose up --build

The application will be available at:
- Frontend: http://localhost:5173
- Backend API: http://localhost:5000
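Once the backend is reachable at http://localhost:5000, its endpoints can be called from any HTTP client. As a minimal sketch using only the Python standard library, the following builds the POST /email/send request with the JSON fields this README documents for that endpoint (`email`, `subject`, `contents`). The helper name `build_email_request` and the recipient address are hypothetical, and the response format is not documented here, so the example simply prints whatever comes back.

```python
import json
import urllib.request

BACKEND_URL = "http://localhost:5000"  # backend address from this README


def build_email_request(email: str, subject: str, contents: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /email/send."""
    payload = json.dumps(
        {"email": email, "subject": subject, "contents": contents}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BACKEND_URL}/email/send",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_email_request(
        "reviewer@example.com",  # placeholder recipient
        "Document Verification",
        "Please verify the redacted document",
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(response.status, response.read().decode("utf-8"))
    except OSError as exc:  # covers connection errors and HTTP error statuses
        print(f"Request failed: {exc}")
```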
- Upload Document
    POST /document/add
    Content-Type: multipart/form-data
    Body:
    - file: PDF file
    - email: [email protected]
- List Documents
    GET /documents?[email protected]
- Get Document by Hash
    GET /document/hash/<hash>
- Process Document
    POST /redact
    Content-Type: multipart/form-data
    Body:
    - files: PDF file(s)
    - method: redaction method
- Send Notification Email
    POST /email/send
    Content-Type: application/json
    Body:
    {
      "email": "[email protected]",
      "subject": "Document Verification",
      "contents": "Please verify the redacted document"
    }

- File upload validation
- Secure file storage
- Hash-based document retrieval
- Rate limiting
- CORS protection
- Input sanitization
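Three of the items above (file upload validation, hash-based document retrieval, and input sanitization) can be illustrated together. The sketch below is not the project's implementation. It assumes SHA-256 as the document hash, a hypothetical 10 MB size limit, and a hypothetical `documents` table; none of these are confirmed by this README. Parameterized SQL queries stand in for input sanitization at the database layer.

```python
import hashlib
import sqlite3

PDF_MAGIC = b"%PDF-"                 # leading bytes of a PDF file
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical limit, not from the project


def validate_pdf_upload(filename: str, data: bytes) -> None:
    """Raise ValueError unless the upload looks like a PDF within the size limit."""
    if not filename.lower().endswith(".pdf"):
        raise ValueError("only .pdf files are accepted")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file is too large")
    if not data.startswith(PDF_MAGIC):
        raise ValueError("file content is not a PDF")


def init_db(conn: sqlite3.Connection) -> None:
    """Create the hypothetical documents table, keyed by content hash."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "hash TEXT PRIMARY KEY, filename TEXT NOT NULL, email TEXT NOT NULL)"
    )


def add_document(conn: sqlite3.Connection, filename: str, email: str, data: bytes) -> str:
    """Validate the upload, record it under its SHA-256 digest, and return the digest."""
    validate_pdf_upload(filename, data)
    digest = hashlib.sha256(data).hexdigest()
    # Placeholders (?) keep user-supplied values out of the SQL text.
    conn.execute(
        "INSERT OR IGNORE INTO documents (hash, filename, email) VALUES (?, ?, ?)",
        (digest, filename, email),
    )
    conn.commit()
    return digest


def get_document_by_hash(conn: sqlite3.Connection, digest: str):
    """Return (filename, email) for a digest, or None if it is unknown."""
    return conn.execute(
        "SELECT filename, email FROM documents WHERE hash = ?", (digest,)
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    digest = add_document(conn, "report.pdf", "owner@example.com", b"%PDF-1.7 sample")
    print(digest, get_document_by_hash(conn, digest))
```

Because the key is derived from the file's bytes, uploading the same file twice yields the same hash, and `INSERT OR IGNORE` leaves the first record in place.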
- Follow PEP 8 style guide
- Add docstrings to all functions
- Implement error handling
- Write unit tests for new features
- Use TypeScript for type safety
- Follow React best practices
- Implement responsive design
- Maintain component reusability
This project is licensed under the MIT License.