
Sreejit-Sengupto/BioInsight


AI-Powered Clinical Decision Support System

This project is an AI-powered clinical decision support system that assists healthcare professionals by analyzing patient reports and providing preliminary diagnostic insights. It leverages a Retrieval-Augmented Generation (RAG) architecture to deliver comprehensive and context-aware analysis.

About The Project

The system takes a patient's medical report (in PDF format) as input, extracts the relevant clinical data, and compares it against a knowledge base of existing medical cases built from the project's datasets. By identifying similarities and patterns, it generates a detailed report that includes:

  • Potential diseases or health concerns
  • Recommended precautions and lifestyle changes
  • Suggestions for FDA-approved medications and first-aid
  • Actionable insights for doctors, supported by data from similar cases
  • A clear list of the data points that informed the conclusion

This tool is designed to augment the expertise of medical professionals, not to replace it. The AI-generated analysis should always be reviewed and validated by a qualified doctor.
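
The report sections listed above can be pictured as a simple data structure. The following is a minimal sketch, not the project's actual output format: the class name `AnalysisReport`, its field names, and the `render` method are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnalysisReport:
    """Hypothetical container mirroring the five report sections."""
    potential_conditions: List[str] = field(default_factory=list)
    precautions: List[str] = field(default_factory=list)
    medication_and_first_aid: List[str] = field(default_factory=list)
    clinical_insights: List[str] = field(default_factory=list)
    supporting_data_points: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the report as plain text, one titled section at a time."""
        sections = [
            ("Potential diseases or health concerns", self.potential_conditions),
            ("Recommended precautions and lifestyle changes", self.precautions),
            ("Medication and first-aid suggestions", self.medication_and_first_aid),
            ("Actionable insights for doctors", self.clinical_insights),
            ("Data points that informed the conclusion", self.supporting_data_points),
        ]
        lines = []
        for title, items in sections:
            lines.append(title)
            # Empty sections are shown explicitly rather than silently dropped.
            lines.extend(f"  - {item}" for item in (items or ["(none reported)"]))
        return "\n".join(lines)
```

Keeping the supporting data points as their own field makes it easier for a reviewing doctor to trace each conclusion back to its evidence.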

Features

  • Multi-Disease Analysis: The knowledge base is built from datasets covering several conditions, including diabetes, kidney stones, heart disease, anemia, and thyroid disorders.
  • RAG Architecture: It uses a Retrieval-Augmented Generation (RAG) model to ground its analysis in real-world data, improving accuracy and relevance.
  • Vector Similarity Search: Employs Qdrant and sentence-transformer embeddings to efficiently find similar patient cases from the knowledge base.
  • Web-Enhanced Insights: Augments its analysis with information retrieved from the web at query time, so that recommendations can reflect current guidance.
  • Comprehensive Reporting: Generates detailed, multi-section reports to support clinical decision-making.

How It Works

  1. Data Preprocessing: Medical datasets (CSV files) are cleaned, transformed, and stored as a collection of documents.
  2. Embedding: The processed documents are converted into vector embeddings using a sentence-transformer model and stored in a Qdrant vector database.
  3. Patient Report Analysis: When a new patient report (PDF) is provided, the system extracts the clinical text.
  4. Similarity Search: The extracted text is used to query the Qdrant database, retrieving the most similar patient cases.
  5. Web Search: An AI-generated query is used to search the web for additional, relevant medical information.
  6. Report Generation: A large language model (Llama 3.1) synthesizes the information from the patient report, similar cases, and web search results to generate a final, comprehensive analysis.
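
The steps above can be sketched end to end with the standard library alone. This is a toy illustration under stated assumptions, not the project's implementation: a bag-of-words counter stands in for the sentence-transformer embeddings, an in-memory list stands in for Qdrant, and all function names, column names, and sample values are made up for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def row_to_document(row: Dict[str, object]) -> str:
    """Step 1: flatten one dataset row into a text document."""
    return ", ".join(f"{key}: {value}" for key, value in row.items())


def embed(text: str) -> Counter:
    """Step 2 (toy stand-in): token counts instead of a learned embedding."""
    return Counter(re.findall(r"[a-z0-9.]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, documents: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Step 4: return the k documents most similar to the query text."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(report_text: str, similar_cases: List[str], web_snippets: List[str]) -> str:
    """Step 6 (input side): combine all context into one prompt for the LLM."""
    parts = ["Patient report:", report_text, "", "Similar cases:"]
    parts += [f"- {case}" for case in similar_cases]
    parts += ["", "Web search results:"]
    parts += [f"- {snippet}" for snippet in web_snippets]
    return "\n".join(parts)


if __name__ == "__main__":
    # Made-up rows; the real datasets have their own columns.
    rows = [
        {"condition": "diabetes", "glucose": 182, "hba1c": 8.1},
        {"condition": "anemia", "hemoglobin": 9.2, "mcv": 71},
    ]
    documents = [row_to_document(row) for row in rows]
    report = "fasting glucose 182, hba1c 8.1"
    cases = [doc for _, doc in top_k(report, documents, k=1)]
    print(build_prompt(report, cases, web_snippets=["(web results would go here)"]))
```

In the real pipeline the embedding and search steps are handled by the sentence-transformer model and Qdrant, and the assembled prompt is sent to Llama 3.1; the control flow, however, follows the same retrieve-then-generate shape.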

Getting Started

Follow these steps to set up and run the project locally.

Prerequisites

  • Python 3.8+
  • pip for package management
  • A Hugging Face API token
  • A Qdrant account (for cloud storage) or a local Qdrant instance

Installation

  1. Clone the repository:

    git clone https://github.com/Sreejit-Sengupto/BioInsight.git
    cd BioInsight
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate
  3. Install the required packages:

    pip install -r requirements.txt
  4. Set up your environment variables: Create a .env file in the root directory and add your API keys:

    HUGGINGFACE_API_TOKEN="your_huggingface_api_token"
    QDRANT_API_KEY="your_qdrant_api_key"
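
One way to check that both keys are available before running the scripts is sketched below. It is an assumption-laden illustration: the project may load its settings differently (for example with a library such as python-dotenv), and the helper names `parse_env_file` and `load_settings` are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict

# The two variable names come from the .env example above.
REQUIRED_KEYS = ("HUGGINGFACE_API_TOKEN", "QDRANT_API_KEY")


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and comments."""
    values: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(path: Path = Path(".env")) -> Dict[str, str]:
    """Return the required keys, preferring real environment variables."""
    file_values = parse_env_file(path) if path.exists() else {}
    settings: Dict[str, str] = {}
    missing = []
    for key in REQUIRED_KEYS:
        value = os.environ.get(key) or file_values.get(key)
        if value:
            settings[key] = value
        else:
            missing.append(key)
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return settings
```

Failing early with a clear message avoids a less obvious authentication error later, when the embedding or retrieval script first contacts Hugging Face or Qdrant.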
    

Data Folder Setup

The data directory is not tracked by Git. You will need to create it and populate it with the necessary raw data files.

  1. Create the directory structure:

    mkdir -p data/raw
    
  2. Add the raw data files:

    Place your raw CSV data files in the data/raw/ directory. The project is configured to use the following files:

    • diabetes_classification.csv
    • kidney_stone_dataset.csv
    • heart_disease.csv
    • anemia.csv
    • thyroidDf.csv

    Your data directory should look like this:

    data/
    └── raw/
        ├── diabetes_classification.csv
        ├── kidney_stone_dataset.csv
        ├── heart_disease.csv
        ├── anemia.csv
        └── thyroidDf.csv
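
Before running the preprocessing step, it can help to confirm that the layout above is in place. The following is a small sketch that assumes the file names listed above; the helper name `missing_raw_files` is hypothetical and not part of the project.

```python
from pathlib import Path
from typing import List

# File names taken from the list above.
EXPECTED_FILES = (
    "diabetes_classification.csv",
    "kidney_stone_dataset.csv",
    "heart_disease.csv",
    "anemia.csv",
    "thyroidDf.csv",
)


def missing_raw_files(raw_dir: Path = Path("data/raw")) -> List[str]:
    """Return the expected CSV files that are not present in raw_dir."""
    return [name for name in EXPECTED_FILES if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing files in data/raw/:", ", ".join(missing))
    else:
        print("All expected raw data files are present.")
```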
    

Datasets

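The project uses the five CSV datasets listed under Data Folder Setup, covering diabetes, kidney stones, heart disease, anemia, and thyroid disorders.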

Usage

  1. Preprocess the data: Run the preprocessing script to prepare the datasets.

    python src/pre-processing/preprocess.py
  2. Create the embeddings: Generate embeddings from the preprocessed data and store them in your Qdrant database.

    python src/embeddings/create_embeddings.py
  3. Run the analysis: Place your patient report PDF in the data/raw/ directory (e.g., test_report_2.pdf) and run the retrieval script.

    python src/embeddings/retrieve_embeddings.py

    The system will output the final analysis to the console.
