Blood-Brain Barrier (BBB) Permeability Prediction

This project uses RDKit to compute molecular descriptors from drug SMILES strings and applies machine learning to predict blood-brain barrier (BBB) permeability.

Features

Molecular Analysis: Extract 200+ molecular descriptors from SMILES notation
Similarity Analysis: Compare molecular similarity using Morgan fingerprints and Tanimoto coefficients
Exploratory Data Analysis: Comprehensive EDA with PCA visualization
Machine Learning Models: Random Forest, SVM, and Logistic Regression classifiers for BBB prediction
Model Evaluation: Cross-validation, confusion matrices, ROC curves, and feature importance analysis

Project Structure

BBB_permeability_prediction/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── .gitignore                # Git ignore file
├── LICENSE                   # MIT License
├── Makefile                  # Make targets (e.g. make predict)
├── setup.py                  # Package setup script
├── .github/workflows/        # GitHub Actions workflows
├── data/                     # Data directory
│   ├── sample_data.csv       # Sample dataset
│   └── README.md             # Data documentation
├── src/                      # Source code
│   ├── bbb.py                # Main analysis script
│   ├── ml_models.py          # Machine learning models
│   └── predict_bbb.py        # Prediction script
├── notebooks/                # Jupyter notebooks
│   └── bbb_analysis.ipynb    # Interactive analysis
├── docs/                     # Documentation
│   └── molecular_descriptors.md
├── results/                  # Output files
│   ├── plots/                # Generated plots
│   └── models/               # Trained models
└── tests/                    # Unit tests
    └── test_bbb.py

Installation

Prerequisites

Python 3.7 or higher
pip package manager

Setup

Clone the repository:

git clone <repository-url>
cd BBB_permeability_prediction

Create a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Usage

Basic Usage

Place your BBB dataset as data/BBB_datasets.csv with columns:
- SMILES: Chemical structure in SMILES notation
- Class: BBB permeability class (BBB+ or BBB-)

Run the analysis:

python src/bbb.py

Predict BBB permeability for new molecules:

python src/predict_bbb.py "CC(=O)NC1=CC=C(C=C1)O" "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"

Or, using the Makefile:

make predict SMILES="CC(=O)NC1=CC=C(C=C1)O"

Jupyter Notebook

For interactive analysis, use the Jupyter notebook:

jupyter notebook notebooks/bbb_analysis.ipynb

Dataset Format

The script expects a CSV file with the following columns:

| Column | Description | Example |
|--------|-------------|---------|
| SMILES | Chemical structure in SMILES notation | `CC(=O)NC1=CC=C(C=C1)O` |
| Class | BBB permeability class | `BBB+` or `BBB-` |
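A minimal sketch of loading and validating a file in this format, assuming pandas and RDKit are installed. The helper name `load_bbb_dataset`, the label encoding (BBB+ → 1, BBB- → 0), and the inline rows are illustrative assumptions, not the project's own code or data; rows whose SMILES RDKit cannot parse are dropped.

```python
import io

import pandas as pd
from rdkit import Chem, RDLogger

# Silence RDKit's parse-error messages for invalid SMILES.
RDLogger.DisableLog("rdApp.*")

# Hypothetical label encoding: BBB+ (permeable) -> 1, BBB- (non-permeable) -> 0.
LABELS = {"BBB+": 1, "BBB-": 0}


def load_bbb_dataset(source) -> pd.DataFrame:
    """Read a SMILES/Class CSV, drop unusable rows, and add a binary label."""
    df = pd.read_csv(source)
    missing = {"SMILES", "Class"} - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    # Keep only rows with a recognised class label.
    df = df[df["Class"].isin(list(LABELS))].copy()
    # Parse SMILES; MolFromSmiles returns None for strings it cannot parse.
    df["mol"] = df["SMILES"].apply(Chem.MolFromSmiles)
    df = df[df["mol"].notna()].copy()
    df["label"] = df["Class"].map(LABELS)
    return df.reset_index(drop=True)


# Illustrative rows only; the last SMILES is deliberately invalid.
csv_text = """SMILES,Class
CC(=O)NC1=CC=C(C=C1)O,BBB+
CN1C=NC2=C1C(=O)N(C(=O)N2C)C,BBB+
not_a_smiles,BBB-
"""
data = load_bbb_dataset(io.StringIO(csv_text))
print(data[["SMILES", "Class", "label"]])
```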

Molecular Descriptors

The script extracts 200+ molecular descriptors including:

Molecular weight
LogP (lipophilicity)
Number of rotatable bonds
Hydrogen bond donors/acceptors
Topological descriptors
And many more...
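One way to obtain a descriptor set of this size is RDKit's `Descriptors.descList`, a list of (name, function) pairs. This is a sketch assuming the script iterates over that list; the helper `compute_descriptors` is hypothetical, and the exact descriptor count depends on the installed RDKit version.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def compute_descriptors(smiles: str) -> dict:
    """Return every descriptor in Descriptors.descList for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    # descList holds (name, function) pairs; its length varies by RDKit version.
    return {name: func(mol) for name, func in Descriptors.descList}


paracetamol = compute_descriptors("CC(=O)NC1=CC=C(C=C1)O")
print(len(paracetamol), "descriptors")
# A few of the descriptors named above (TPSA is one of the topological ones).
for name in ("MolWt", "MolLogP", "NumRotatableBonds", "NumHDonors", "NumHAcceptors", "TPSA"):
    print(f"{name}: {paracetamol[name]:.2f}")
```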

Output

The script generates:

Molecular similarity analysis
PCA visualization of molecular descriptors
Statistical summaries of the dataset
Plots showing drug clustering by BBB permeability
Machine learning model training and evaluation
Feature importance analysis
Model performance comparison
Confusion matrices and ROC curves
Trained models saved for future predictions
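The PCA step can be sketched as standardizing the descriptor matrix and projecting it onto two principal components. This example assumes scikit-learn's `StandardScaler` and `PCA` and uses the four example molecules listed in this README; the helper `descriptor_matrix` and the NaN/inf replacement are assumptions, and a scatter plot of `coords` colored by BBB class would be the usual next step.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

EXAMPLES = {
    "Paracetamol": "CC(=O)NC1=CC=C(C=C1)O",
    "Caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "Theophylline": "CN1C2=C(C(=O)N(C1=O)C)NC=N2",
    "MDMA": "CC(CC1=CC2=C(C=C1)OCO2)NC",
}


def descriptor_matrix(smiles_list):
    """Hypothetical helper: one row of RDKit descriptors per molecule."""
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        rows.append([func(mol) for _, func in Descriptors.descList])
    # Replace NaN/inf so that scaling and PCA do not fail.
    return np.nan_to_num(np.asarray(rows, dtype=float), nan=0.0, posinf=0.0, neginf=0.0)


X = descriptor_matrix(EXAMPLES.values())
# Standardize so that large-valued descriptors do not dominate the components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)

for name, (pc1, pc2) in zip(EXAMPLES, coords):
    print(f"{name:>12}: PC1={pc1:7.2f}  PC2={pc2:7.2f}")
print("Explained variance ratio:", pca.explained_variance_ratio_.round(3))
```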

Example Molecules Analyzed

Paracetamol: CC(=O)NC1=CC=C(C=C1)O
Caffeine: CN1C=NC2=C1C(=O)N(C(=O)N2C)C
Theophylline: CN1C2=C(C(=O)N(C1=O)C)NC=N2
MDMA: CC(CC1=CC2=C(C=C1)OCO2)NC

Dependencies

RDKit: Cheminformatics toolkit
pandas: Data manipulation
numpy: Numerical computing
matplotlib: Plotting
seaborn: Statistical visualization
scikit-learn: Machine learning

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

RDKit community for the excellent cheminformatics toolkit
Original Colab notebook: BBB Analysis

Machine Learning Models

The project implements three machine learning algorithms:

1. Random Forest Classifier

Advantages: Handles non-linear relationships, provides feature importance
Parameters: 100 estimators, max depth 10, min samples split 5
Use Case: Best for interpretable predictions with feature importance

2. Support Vector Machine (SVM)

Advantages: Effective for high-dimensional data, good generalization
Parameters: RBF kernel, C=1.0, gamma='scale'
Use Case: Good performance on molecular descriptor data

3. Logistic Regression

Advantages: Fast training, interpretable coefficients
Parameters: C=1.0, max iterations 1000
Use Case: Baseline model and fast predictions
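The three configurations above map directly onto scikit-learn estimators. In this sketch the `build_models` helper, the `StandardScaler` in front of SVM and Logistic Regression, `probability=True` (to get probabilities for ROC curves), and `random_state` are all assumptions; the synthetic data from `make_classification` stands in for a real descriptor matrix.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(random_state: int = 42) -> dict:
    """Hypothetical helper returning the three classifiers described above."""
    return {
        "Random Forest": RandomForestClassifier(
            n_estimators=100, max_depth=10, min_samples_split=5, random_state=random_state
        ),
        # SVM and Logistic Regression are scale-sensitive, so (as an assumption)
        # features are standardized first; tree ensembles do not need this.
        "SVM": make_pipeline(
            StandardScaler(),
            SVC(kernel="rbf", C=1.0, gamma="scale", probability=True, random_state=random_state),
        ),
        "Logistic Regression": make_pipeline(
            StandardScaler(),
            LogisticRegression(C=1.0, max_iter=1000),
        ),
    }


# Synthetic stand-in for a descriptor matrix and BBB+/BBB- labels (1/0).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = build_models()
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```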

Model Evaluation

Cross-validation: 5-fold CV for robust performance estimation
Metrics: Accuracy, precision, recall, F1-score, AUC-ROC
Feature Importance: Top 20 most important molecular descriptors
Visualization: Confusion matrices, ROC curves, performance comparison
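A sketch of this evaluation scheme with scikit-learn's `cross_validate` and `StratifiedKFold`, followed by ranking the Random Forest's impurity-based feature importances. The synthetic data and the generic `descriptor_i` names are placeholders for real RDKit descriptors; shuffling and the seeds are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in: rows = molecules, columns = descriptors, y = 1 (BBB+) / 0 (BBB-).
X, y = make_classification(n_samples=300, n_features=30, n_informative=6, random_state=0)
feature_names = [f"descriptor_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(
    n_estimators=100, max_depth=10, min_samples_split=5, random_state=0
)

# 5-fold stratified CV keeps the BBB+/BBB- ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]
results = cross_validate(model, X, y, cv=cv, scoring=scoring)
for metric in scoring:
    fold_scores = results[f"test_{metric}"]
    print(f"{metric:>9}: {fold_scores.mean():.3f} ± {fold_scores.std():.3f}")

# Feature importance: refit on all data and list the top 20 descriptors.
model.fit(X, y)
top_k = 20
order = np.argsort(model.feature_importances_)[::-1][:top_k]
for rank, idx in enumerate(order, start=1):
    print(f"{rank:2d}. {feature_names[idx]}: {model.feature_importances_[idx]:.4f}")
```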

Future Enhancements

Implement machine learning models (Random Forest, SVM, Logistic Regression) (done)
Add feature importance analysis (done)
Cross-validation and model evaluation (done)
Hyperparameter tuning with GridSearch
Neural Networks (Deep Learning)
Web interface for drug prediction
API for batch processing
Integration with drug databases

Troubleshooting

Common Issues

RDKit installation issues: Try using conda instead of pip:
```
conda install -c conda-forge rdkit
```
Missing dataset: Ensure BBB_datasets.csv is in the data/ directory
Memory issues: For large datasets, consider processing in batches
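One way to process a large dataset in batches is to read the CSV with pandas' `chunksize` and append each batch of descriptors to an output file, so only one chunk is held in memory at a time. The function `write_descriptors_in_batches`, the batch size, and the output layout are hypothetical, not part of the project.

```python
import io
import os
import tempfile

import pandas as pd
from rdkit import Chem, RDLogger
from rdkit.Chem import Descriptors

RDLogger.DisableLog("rdApp.*")  # hide parse errors for invalid SMILES


def write_descriptors_in_batches(source, destination, batch_size=1000):
    """Hypothetical helper: compute descriptors chunk by chunk and append to a CSV."""
    names = [name for name, _ in Descriptors.descList]
    total = 0
    for i, chunk in enumerate(pd.read_csv(source, chunksize=batch_size)):
        records = []
        for smiles in chunk["SMILES"]:
            mol = Chem.MolFromSmiles(smiles)
            if mol is None:
                continue  # skip SMILES that RDKit cannot parse
            records.append([smiles] + [func(mol) for _, func in Descriptors.descList])
        batch = pd.DataFrame(records, columns=["SMILES"] + names)
        # Write the header with the first batch only, then append.
        batch.to_csv(destination, mode="w" if i == 0 else "a", header=(i == 0), index=False)
        total += len(batch)
    return total


csv_text = """SMILES
CC(=O)NC1=CC=C(C=C1)O
CN1C=NC2=C1C(=O)N(C(=O)N2C)C
CN1C2=C(C(=O)N(C1=O)C)NC=N2
"""
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "descriptors.csv")
    n = write_descriptors_in_batches(io.StringIO(csv_text), out_path, batch_size=2)
    result = pd.read_csv(out_path)
print(n, result.shape)
```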

Getting Help

Check the Issues page
Create a new issue with detailed error information
Include your Python version and operating system

Citation

If you use this project in your research, please cite:

@software{bbb_prediction,
  title={Blood-Brain Barrier Permeability Prediction},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/BBB_permeability_prediction}
}

Repository: orvelte/BBB_permeability_prediction
