Offensive Yoruba Language Detection App

This repository contains a machine learning-based application for detecting offensive language in Yoruba text. The app is built using either Streamlit or Flask and uses a Logistic Regression model trained on a dataset of tweets. Input text is converted into numerical features with TF-IDF vectorization, and the Logistic Regression model classifies those features.

Project Overview

The goal of this project is to detect offensive language in Yoruba text. The app takes a sentence as input and predicts whether it contains offensive language, contains hate speech, or is normal. The model is trained on a dataset of tweets and uses TF-IDF vectorization for feature extraction and Logistic Regression for classification.

Features

- Text Input: Users can input a sentence to check for offensive language.
- Real-Time Prediction: The app provides instant predictions using a pre-trained machine learning model.
- Clean and Preprocess Text: The app cleans and preprocesses the input text (e.g., removes emojis, URLs, and special characters) before making predictions.
- Streamlit and Flask Support: The app can be deployed using either Streamlit or Flask.
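
The cleaning step described above might look roughly like the sketch below. It is not the code from `app.py`: the function name `clean_text`, the regular expressions, and the choice to keep combining marks (so Yoruba tone marks and under-dots survive) are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical patterns; the project's actual rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[@#]\w+")  # mentions and hashtags


def clean_text(text: str) -> str:
    """Lowercase, then strip URLs, mentions, hashtags, digits, emojis and punctuation."""
    text = URL_RE.sub(" ", text.lower())
    text = TAG_RE.sub(" ", text)
    out = []
    for ch in text:
        major = unicodedata.category(ch)[0]  # "L" letter, "M" mark, "N" number, ...
        if major == "L" or ch.isspace():
            out.append(ch)
        elif major == "M" and out and not out[-1].isspace():
            out.append(ch)  # combining mark attached to the preceding letter
        else:
            out.append(" ")  # digits, punctuation, emojis, other symbols
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(out).split())


print(clean_text("Báwo ni @user! Check https://example.com #tag 123 😀"))
# báwo ni check
```

Filtering by Unicode category rather than with an ASCII-only pattern such as `[^a-z\s]` is a deliberate choice here: an ASCII filter would delete accented Yoruba letters along with the punctuation.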

Installation

Prerequisites

- Python 3.7 or higher
- pip (Python package manager)

Steps

Clone the Repository:

```
git clone https://github.com/DominionAkinrotimi/Yoruba-Offensive-Language-Detection-Model.git
cd Yoruba-Offensive-Language-Detection-Model
```

Create a Virtual Environment (Optional but Recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install Dependencies:
```
pip install -r requirements.txt
```
Download NLTK Data: The app uses NLTK for text preprocessing. Download the required NLTK data by running:
```
python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt')"
```

Usage

Streamlit App

Run the Streamlit App:
```
streamlit run app.py
```
Open the App: The app will open in your default web browser at http://localhost:8501.
Enter Text: Input a sentence in the text box and click Predict to see the result.

Flask App

Run the Flask App:
```
python main.py
```
Open the App: The app will be available at http://localhost:5000.
Enter Text: Input a sentence in the text box and click Predict to see the result.
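
Both front ends need to load the saved artifacts and run a prediction on the submitted sentence. The sketch below shows one way that could be done. The helper names `load_artifacts` and `predict_label` and the optional `clean` hook are hypothetical and are not taken from `app.py` or `main.py`; only the two file names come from this repository.

```python
import joblib


def load_artifacts(model_path="lr_model.pkl", vectorizer_path="tfidf_vectorizer.pkl"):
    """Load the fitted model and TF-IDF vectorizer that were saved with joblib."""
    return joblib.load(model_path), joblib.load(vectorizer_path)


def predict_label(text, model, vectorizer, clean=None):
    """Return the predicted label for a single sentence.

    `clean` is an optional callable (assumed, not from the repo) that applies
    the same text cleaning used at training time.
    """
    if clean is not None:
        text = clean(text)
    # Use transform, not fit_transform: the vocabulary must stay as fitted.
    features = vectorizer.transform([text])
    return model.predict(features)[0]
```

Typical use would be `model, vectorizer = load_artifacts()` once at startup, then `predict_label(sentence, model, vectorizer)` per request. The form of the returned label (string or integer code) depends on how the training labels were encoded, which this README does not state.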

Model Development

The model was developed using the following steps:

Data Cleaning:
- Convert text to lowercase.
- Remove emojis, URLs, hashtags, mentions, and special characters.
- Remove digits and extra spaces.
Text Preprocessing:
- Tokenize the text.
- Remove stopwords.
Feature Extraction:
- Use TF-IDF Vectorization to convert text into numerical features.
Model Training:
- Train a Logistic Regression model on the preprocessed data.
Model Saving:
- Save the trained model and vectorizer using joblib.
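
The feature extraction, training, and saving steps above can be sketched as follows. This is a minimal illustration, not the contents of the development notebook: the toy corpus uses made-up placeholder tokens instead of the tweet dataset, the label names and the `train_and_save` helper are assumptions, and NLTK tokenization and stopword removal are left out (the vectorizer's default tokenizer is used instead).

```python
import os

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder corpus with made-up tokens; NOT the project's tweet dataset.
TOY_TEXTS = [
    "greeting friendly welcome",
    "welcome friendly thanks",
    "insult rude mock",
    "rude mock jeer",
    "threat slur attack",
    "slur attack threat",
]
# Label names are an assumption; the real encoding is not given in this README.
TOY_LABELS = ["normal", "normal", "offensive", "offensive", "hate", "hate"]


def train_and_save(texts, labels, out_dir):
    """Fit TF-IDF + Logistic Regression and save both objects with joblib."""
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(texts)  # sparse document-term matrix

    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels)

    os.makedirs(out_dir, exist_ok=True)
    joblib.dump(model, os.path.join(out_dir, "lr_model.pkl"))
    joblib.dump(vectorizer, os.path.join(out_dir, "tfidf_vectorizer.pkl"))
    return model, vectorizer
```

Calling `train_and_save(TOY_TEXTS, TOY_LABELS, "sketch_artifacts")` writes the two files into a separate directory, so the pre-trained `lr_model.pkl` and `tfidf_vectorizer.pkl` shipped in the repository root are not overwritten. The vectorizer has to be saved alongside the model because new text must be transformed with the same fitted vocabulary.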

File Structure

```
offensive-language-detection/
├── app.py                  # Streamlit app
├── main.py                 # Flask app
├── requirements.txt        # List of dependencies
├── lr_model.pkl            # Trained Logistic Regression model
├── tfidf_vectorizer.pkl    # Fitted TF-IDF Vectorizer
├── README.md               # Project documentation
├── templates/              # Flask HTML templates
│   └── index.html          # Flask app homepage
└── notebooks/              # Jupyter notebooks for model development
    └── model_development.ipynb
```

Contributing

Contributions are welcome! If you'd like to contribute, please follow these steps:

Fork the repository.
Create a new branch for your feature or bugfix.
Commit your changes and push to the branch.
Submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

The dataset used for training was sourced from 🙈 (I'm shy).
Special thanks to the developers of scikit-learn, Streamlit, and Flask for their amazing libraries.

Contact

For questions or feedback, please contact:

Dominion Akinrotimi
Email: [email protected]

Enjoy using the Offensive Yoruba Language Detection App! 🚀
