
A lightweight LLM-based tool for document analysis. Upload PDFs or text files and ask questions to get AI-powered insights, summaries, and key information. 🚀 Features: ✅ Upload & analyze documents ✅ AI-powered Q&A & summarization ✅ Fast & user-friendly 🔧 Tech: Python, LangChain, OpenAI API, Streamlit 💡 Future: Multi-doc support, advanced NLP.


Sayeem3051/document-analysis


Document Analysis

A lightweight LLM-based tool for document analysis. Upload PDFs or text files and ask questions to get AI-powered insights, summaries, and key information.

🚀 Features

  • Upload & Analyze Documents: Easily upload PDF or text files for analysis.
  • AI-powered Q&A & Summarization: Get answers to your questions and summaries generated by advanced AI models.
  • Fast & User-friendly: Simple web interface for quick interactions.

🔧 Tech Stack

  • Python
  • LangChain – for orchestrating document processing and LLM interactions.
  • Mistral API – for natural language processing and understanding.
  • Streamlit – for building an interactive, web-based user interface.

🌐 Live Demo

Experience the application at ai-doc-reader.streamlit.app

🛠️ How It Works

  1. Document Upload:
    Users upload PDF or plain text documents through the Streamlit interface.

  2. Text Extraction:
    The backend parses and extracts text from the uploaded files.

  3. LLM Processing:
    Extracted content is passed through LangChain pipelines, which interact with the OpenAI API to:

    • Summarize content
    • Answer user questions
    • Extract key information
  4. User Interaction:
    Results (summaries, answers, highlights) are displayed in the Streamlit app as soon as the model responds.
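
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names (`extract_text`, `build_prompt`, `answer_question`), the prompt wording, and the 4000-character context limit are all assumptions, PDF parsing is left out, and the LLM is passed in as a plain callable so the example runs without an API key.

```python
from typing import Callable


def extract_text(filename: str, data: bytes) -> str:
    """Step 2: return the text of an uploaded plain-text file.

    PDF parsing is omitted from this sketch; a real backend would
    hand .pdf uploads to a PDF library instead of raising.
    """
    if filename.lower().endswith(".txt"):
        return data.decode("utf-8", errors="replace")
    raise ValueError(f"unsupported file type in this sketch: {filename}")


def build_prompt(document_text: str, question: str, max_chars: int = 4000) -> str:
    """Step 3a: combine the document and the question into one prompt.

    The text is truncated to max_chars (a hypothetical limit) so the
    prompt stays within the model's context window.
    """
    context = document_text[:max_chars]
    return (
        "Answer the question using only the document below.\n\n"
        f"Document:\n{context}\n\n"
        f"Question: {question}"
    )


def answer_question(
    filename: str, data: bytes, question: str, llm: Callable[[str], str]
) -> str:
    """Steps 2-4: extract text, query the model, return a cleaned answer."""
    text = extract_text(filename, data)
    return llm(build_prompt(text, question)).strip()


if __name__ == "__main__":
    # A stub stands in for the real model call.
    def fake_llm(prompt: str) -> str:
        return f"(stub answer for a prompt of {len(prompt)} characters)"

    print(answer_question("notes.txt", b"Streamlit renders the UI.",
                          "What renders the UI?", fake_llm))
```

Passing the model as a callable keeps the pipeline independent of any one provider; in the real app that callable would wrap the LangChain chain.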

📐 Architecture

  • Frontend:
    Built with Streamlit, providing a modern, reactive UI for uploads and queries.

  • Backend:
    Python-based, using LangChain to structure LLM tasks and OpenAI for processing. The backend handles:

    • File parsing and validation
    • Query routing to LLM
    • Result formatting
  • Extensibility:
    The project is modular, allowing easy integration of new LLM providers, custom analytics, or additional file formats.
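
To make the backend responsibilities and the extensibility point concrete, here is a rough sketch of upload validation, a provider registry, and result formatting. Every name in it (`validate_upload`, `register_provider`, `route_query`, `format_result`, the `echo` provider) and the 10 MB size limit are hypothetical and chosen only for illustration; the repository may organize this differently.

```python
import os
from typing import Callable, Dict

ALLOWED_EXTENSIONS = {".pdf", ".txt"}  # the two formats the README names
MAX_UPLOAD_BYTES = 10 * 1024 * 1024    # assumed limit, not taken from the project

# Maps a provider name to a function that turns a prompt into a reply.
PROVIDERS: Dict[str, Callable[[str], str]] = {}


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the lowercase file extension, or raise ValueError if invalid."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if not 0 < size_bytes <= MAX_UPLOAD_BYTES:
        raise ValueError("file is empty or exceeds the size limit")
    return ext


def register_provider(name: str):
    """Decorator that adds an LLM provider function to the registry."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        PROVIDERS[name] = fn
        return fn
    return decorator


@register_provider("echo")
def echo_provider(prompt: str) -> str:
    """Placeholder provider that repeats the prompt; no network access."""
    return f"echo: {prompt}"


def route_query(provider: str, prompt: str) -> str:
    """Send the prompt to the named provider."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return PROVIDERS[provider](prompt)


def format_result(question: str, answer: str) -> str:
    """Format a question/answer pair for display."""
    return f"Q: {question}\nA: {answer.strip()}"


if __name__ == "__main__":
    validate_upload("report.pdf", 1024)
    print(format_result("What is this?", route_query("echo", "What is this?")))
```

With a registry like this, supporting another LLM provider means registering one more function rather than changing the routing code.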

💡 Future Plans

  • Support for analyzing multiple documents simultaneously.
  • Enhanced NLP analytics (topic modeling, entity extraction).
  • Exportable analysis reports.

🛠️ Setup Instructions

  1. Clone the Repository

    git clone https://github.com/Sayeem3051/document-analysis.git
    cd document-analysis
  2. Install Dependencies

    • Requires Python 3.9+.
    • Install common dependencies (exact requirements file not detected, but typical packages are):
      pip install streamlit langchain openai
  3. Configure the Mistral API Key

    • Obtain your API key from mistral.ai.
    • Set it as an environment variable:
      export MISTRAL_API_KEY='your-api-key'
  4. Run the Application

    streamlit run app.py

    (Replace app.py with the main script name if different.)
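
Step 3 stores the API key in an environment variable. As a minimal sketch of how a Python script might read that key at startup and fail early with a clear message, consider the helper below. The function name `load_api_key` is hypothetical, and the variable name you pass must match exactly what you exported in step 3, since environment variable names are case-sensitive on Linux and macOS.

```python
import os
from typing import Mapping, Optional


def load_api_key(var_name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key stored in var_name, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running the app.")
    return key


if __name__ == "__main__":
    # A dict stands in for the real environment so the example runs anywhere.
    demo_env = {"EXAMPLE_API_KEY": "your-api-key"}
    print(load_api_key("EXAMPLE_API_KEY", demo_env))
```

Checking for the key once at startup surfaces a configuration mistake immediately instead of on the first question a user asks.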

📄 Usage

  • Open the web interface.
  • Upload a document (PDF or text).
  • Ask questions or request a summary.
  • Instantly view insights and answers.

🀝 Contributing

Contributions are welcome! Please open issues or pull requests for new features, bug fixes, or improvements.

👀 Author


This tool leverages cutting-edge LLMs and modern Python libraries to make document analysis easy and powerful for everyone!
