AI-Powered Clinical Decision Support System

This project is an AI-powered clinical decision support system that assists healthcare professionals by analyzing patient reports and providing preliminary diagnostic insights. It leverages a Retrieval-Augmented Generation (RAG) architecture to deliver comprehensive and context-aware analysis.

About The Project

The system takes a patient's medical report (in PDF format) as input, extracts relevant clinical data, and compares it against a database of existing medical cases. By identifying similarities and patterns, it generates a detailed report that includes:

- Potential diseases or health concerns
- Recommended precautions and lifestyle changes
- Suggestions for FDA-approved medications and first aid
- Actionable insights for doctors, supported by data from similar cases
- A clear list of the data points that informed the conclusion

This tool is designed to augment the expertise of medical professionals, not to replace it. The AI-generated analysis should always be reviewed and validated by a qualified doctor.
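
As a rough illustration of the report sections listed above, the output could be modeled as a small data structure. This is only a sketch: the class name `AnalysisReport`, its field names, and the `reviewed_by_doctor` flag are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnalysisReport:
    """Hypothetical container mirroring the report sections described above."""

    potential_conditions: List[str] = field(default_factory=list)
    precautions: List[str] = field(default_factory=list)
    medication_suggestions: List[str] = field(default_factory=list)
    insights_for_doctors: List[str] = field(default_factory=list)
    supporting_data_points: List[str] = field(default_factory=list)
    reviewed_by_doctor: bool = False  # stays False until a clinician validates the report

    def to_text(self) -> str:
        """Render every section as a titled bullet list, followed by the review status."""
        sections = [
            ("Potential diseases or health concerns", self.potential_conditions),
            ("Recommended precautions and lifestyle changes", self.precautions),
            ("Medication and first-aid suggestions", self.medication_suggestions),
            ("Insights for doctors", self.insights_for_doctors),
            ("Data points that informed the conclusion", self.supporting_data_points),
        ]
        lines: List[str] = []
        for title, items in sections:
            lines.append(f"{title}:")
            lines.extend(f"  - {item}" for item in (items or ["(none)"]))
        status = "reviewed" if self.reviewed_by_doctor else "PENDING DOCTOR REVIEW"
        lines.append(f"Status: {status}")
        return "\n".join(lines)


# Placeholder values for illustration only; not real clinical output.
report = AnalysisReport(
    potential_conditions=["example condition"],
    supporting_data_points=["example lab value"],
)
print(report.to_text())
```

Keeping an explicit review flag in the structure is one way to make the "review by a qualified doctor" requirement visible in every rendered report.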

Features

- Multi-Disease Analysis: The system draws on datasets for several conditions, including diabetes, kidney stones, heart disease, anemia, and thyroid disease.
- RAG Architecture: Uses Retrieval-Augmented Generation (RAG) to ground its analysis in real-world case data, with the aim of improving accuracy and relevance.
- Vector Similarity Search: Employs Qdrant and sentence-transformer embeddings to efficiently find similar patient cases in the knowledge base.
- Web-Enhanced Insights: Augments its analysis with up-to-date information from the web to help keep recommendations current.
- Comprehensive Reporting: Generates detailed, multi-section reports to support clinical decision-making.
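
To make the vector similarity search feature more concrete, here is a minimal brute-force sketch in plain Python. It assumes cosine similarity as the metric, which this README does not confirm, and it uses toy 3-dimensional vectors in place of real sentence-transformer embeddings. A vector database such as Qdrant performs the same kind of ranking over an index instead of scanning every vector. The names `cosine_similarity`, `top_k`, and the case IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    cases: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k stored cases most similar to the query, best match first."""
    scored = [(case_id, cosine_similarity(query, vec)) for case_id, vec in cases.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors for illustration; real embeddings have hundreds of dimensions.
cases = {
    "case_a": [1.0, 0.0, 0.0],
    "case_b": [0.9, 0.1, 0.0],
    "case_c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.0, 0.0], cases, k=2)
print(results)
```

With this toy data, `case_a` ranks first because it points in exactly the same direction as the query, followed by `case_b`.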

How It Works

Data Preprocessing: Medical datasets (CSV files) are cleaned, transformed, and stored as a collection of documents.
Embedding: The processed documents are converted into vector embeddings using a sentence-transformer model and stored in a Qdrant vector database.
Patient Report Analysis: When a new patient report (PDF) is provided, the system extracts the clinical text.
Similarity Search: The extracted text is used to query the Qdrant database, retrieving the most similar patient cases.
Web Search: An AI-generated query is used to search the web for additional, relevant medical information.
Report Generation: A large language model (Llama 3.1) synthesizes the information from the patient report, similar cases, and web search results to generate a final, comprehensive analysis.
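
Two of the steps above, turning CSV rows into documents and assembling the final prompt, can be sketched with the standard library alone. This is an illustrative sketch, not the project's implementation: the helper names `row_to_document` and `build_prompt`, the "column: value" document format, the prompt wording, and the sample columns are all assumptions.

```python
import csv
import io
from typing import Dict, List


def row_to_document(row: Dict[str, str]) -> str:
    """Turn one CSV row into a 'column: value' string suitable for embedding."""
    return "; ".join(f"{key}: {value}" for key, value in row.items() if value)


def build_prompt(patient_text: str, similar_cases: List[str], web_snippets: List[str]) -> str:
    """Combine the patient report, retrieved cases, and web results into one prompt."""
    parts = ["Patient report:", patient_text.strip(), "", "Similar cases:"]
    parts += [f"{i}. {case}" for i, case in enumerate(similar_cases, start=1)]
    parts += ["", "Web search results:"]
    parts += [f"- {snippet}" for snippet in web_snippets]
    parts += ["", "Write a preliminary analysis for review by a doctor."]
    return "\n".join(parts)


# Made-up rows; the column names are illustrative, not the real dataset schema.
sample_csv = "age,glucose,diagnosis\n54,182,diabetic\n31,,healthy\n"
documents = [row_to_document(row) for row in csv.DictReader(io.StringIO(sample_csv))]

prompt = build_prompt(
    patient_text="Age 50, fasting glucose 175 mg/dL.",
    similar_cases=documents[:1],
    web_snippets=["(placeholder web snippet)"],
)
print(prompt)
```

Empty cells are skipped when a row is flattened, so missing values do not add noise to the text that gets embedded.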

Getting Started

Follow these steps to set up and run the project locally.

Prerequisites

- Python 3.8+
- pip for package management
- A Hugging Face API token
- A Qdrant account (for cloud storage) or a local Qdrant instance

Installation

Clone the repository:

```
git clone https://github.com/your-username/Healthcare-Assistant.git
cd Healthcare-Assistant
```

Create and activate a virtual environment:

```
python -m venv venv
source venv/bin/activate
```

Install the required packages:
```
pip install -r requirements.txt
```
Set up your environment variables: Create a .env file in the root directory and add your API keys:
```
HUGGINGFACE_API_TOKEN="your_huggingface_api_token"
QDRANT_API_KEY="your_qdrant_api_key"
```
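
For reference, a .env file in this format can be read with the standard library alone. The sketch below is an assumption about how such a file might be consumed. The project's scripts may well use a dedicated package instead, and the helper names `read_env_file` and `require` are hypothetical. The demo writes a throwaway file with dummy values rather than touching your real .env.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY="value" lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require(values: Dict[str, str], key: str) -> str:
    """Return a setting from the parsed file or the process environment, or fail loudly."""
    value = values.get(key) or os.environ.get(key, "")
    if not value:
        raise RuntimeError(f"Missing required setting: {key}")
    return value


# Demo with a temporary file and dummy values (not real credentials).
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        'HUGGINGFACE_API_TOKEN="hf_example"\nQDRANT_API_KEY="qd_example"\n',
        encoding="utf-8",
    )
    settings = read_env_file(env_path)
    token = require(settings, "HUGGINGFACE_API_TOKEN")

print(sorted(settings))
```

Failing early on a missing key gives a clearer error than an authentication failure deep inside a later API call.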

Data Folder Setup

The data directory is not tracked by Git. You will need to create it and populate it with the necessary raw data files.

Create the directory structure:
```
mkdir -p data/raw
```
Add the raw data files:

Place your raw CSV data files in the data/raw/ directory. The project is configured to use the following files:
- diabetes_classification.csv
- kidney_stone_dataset.csv
- heart_disease.csv
- anemia.csv
- thyroidDf.csv
Your data directory should look like this:
```
data/
└── raw/
    ├── diabetes_classification.csv
    ├── kidney_stone_dataset.csv
    ├── heart_disease.csv
    ├── anemia.csv
    └── thyroidDf.csv
```
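
Before running the pipeline, it can help to confirm that the layout above is in place. The following sketch checks for the five expected files. The function name `missing_data_files` is hypothetical, and the script is not part of the repository.

```python
from pathlib import Path
from typing import List

# File names copied from the list above.
EXPECTED_FILES = [
    "diabetes_classification.csv",
    "kidney_stone_dataset.csv",
    "heart_disease.csv",
    "anemia.csv",
    "thyroidDf.csv",
]


def missing_data_files(raw_dir: Path) -> List[str]:
    """Return the expected CSV names that are not present in raw_dir."""
    return [name for name in EXPECTED_FILES if not (raw_dir / name).is_file()]


# Run from the repository root so that data/raw resolves correctly.
missing = missing_data_files(Path("data/raw"))
if missing:
    print("Missing files in data/raw/:", ", ".join(missing))
else:
    print("All expected data files are present.")
```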

Datasets

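The project uses the five CSV datasets listed under Data Folder Setup, covering diabetes, kidney stones, heart disease, anemia, and thyroid disease.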

Usage

Preprocess the data: Run the preprocessing script to prepare the datasets.
```
python src/pre-processing/preprocess.py
```
Create the embeddings: Generate embeddings from the preprocessed data and store them in your Qdrant database.
```
python src/embeddings/create_embeddings.py
```
Run the analysis: Place your patient report PDF in the data/raw/ directory (e.g., test_report_2.pdf) and run the retrieval script.
```
python src/embeddings/retrieve_embeddings.py
```
The system will output the final analysis to the console.

Sreejit-Sengupto/BioInsight
