This project is a submission for the Titanic - Machine Learning from Disaster competition hosted on Kaggle. The goal is to predict survival outcomes on the Titanic from passenger attributes using classical machine learning methods.
Using a combination of feature engineering, ensemble modeling, and cross-validation, this notebook achieves strong performance on the validation set and test submission.
Modeling Strategy:
- Feature extraction from raw data (titles, categorical encodings)
- Imputation for missing values
- One-hot encoding for categorical variables
- Ensemble classification using:
  - RandomForestClassifier
  - XGBRFClassifier
- Combined via soft voting in a VotingClassifier
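The soft-voting step above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual pipeline; GradientBoostingClassifier stands in for XGBRFClassifier so the sketch needs only scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)

# Toy data standing in for the engineered Titanic features.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# The notebook combines RandomForestClassifier and XGBRFClassifier;
# GradientBoostingClassifier is a stand-in for the second model here.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("gb", GradientBoostingClassifier(random_state=42)),
    ],
    voting="soft",  # average predicted probabilities instead of hard majority vote
)
ensemble.fit(X, y)
probs = ensemble.predict_proba(X)  # averaged class probabilities, shape (300, 2)
```

With voting="soft", the ensemble averages each member's predicted probabilities, which usually outperforms hard voting when the base models are well calibrated.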
Random Forest
Core Idea:
A Random Forest is an ensemble of decision trees trained on random subsets of the data and features. The final prediction is made by majority vote (classification) or averaging (regression).
Key Concepts:
- Bagging: Each tree is trained on a random sample (with replacement) of the training data.
- Feature Randomness: At each split, only a random subset of features is considered, which reduces correlation between trees.
- Result: Averaging many decorrelated trees keeps bias low while cutting variance, leading to strong generalization.
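The two ideas combine in a few lines. Below is a from-scratch sketch on synthetic data (not the notebook's code): each tree gets a bootstrap sample, max_features="sqrt" supplies the per-split feature randomness, and a majority vote produces the forest prediction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Bagging: each tree sees a bootstrap sample (drawn with replacement).
# Feature randomness: max_features="sqrt" limits candidate features per split.
trees = []
for i in range(25):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap indices
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X[idx], y[idx]))

# Majority vote across the 25 trees gives the forest's class prediction.
votes = np.stack([t.predict(X) for t in trees])
forest_pred = (votes.mean(axis=0) >= 0.5).astype(int)
```

In practice RandomForestClassifier does exactly this (plus out-of-bag scoring and parallelism), so the sketch is only for intuition.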
XGBoost
Core Idea:
XGBoost is a gradient boosting algorithm: trees are built sequentially, each trying to correct the errors of the last.
Key Concepts:
- Boosting: Unlike Random Forest, XGBoost builds trees one after another; each tree focuses on where the previous model did poorly.
- Gradient Descent: It minimizes a loss function by fitting each new tree to the gradient of that loss.
- Regularization: XGBoost penalizes overly complex trees (L1/L2 regularization), making it robust against overfitting.
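The boosting loop above can be demonstrated from scratch. This toy sketch uses squared loss on a synthetic regression task (not the competition data), where the negative gradient is simply the residual, so each shallow tree is fit to what the ensemble still gets wrong.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Boosting: each shallow tree fits the residual errors of the ensemble so far.
# For squared loss, the negative gradient is simply y - prediction.
pred = np.zeros_like(y)
learning_rate = 0.1  # shrinks each tree's contribution, a simple regularizer
for _ in range(100):
    residual = y - pred
    stump = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * stump.predict(X)

mse = np.mean((y - pred) ** 2)  # training error shrinks as trees are added
```

XGBoost follows the same recipe but adds second-order gradient information and the L1/L2 penalties (reg_alpha, reg_lambda) mentioned above.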
| Feature | Description |
|---|---|
| Pclass | Passenger class (proxy for wealth) |
| Sex | Binary-encoded gender |
| Age | Median-imputed age |
| SibSp, Parch | Number of siblings/spouses and parents/children aboard |
| Fare | Ticket price (proxy for economic status) |
| Embarked | Port of embarkation (one-hot encoded) |
| Title | Extracted honorific from the Name field |
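The Title feature can be pulled out with a single regex, since Kaggle's Name column places the honorific between the comma and the first period. The mini-frame below is illustrative, not the real dataset.

```python
import pandas as pd

# Hypothetical rows mirroring the format of Kaggle's Name column.
df = pd.DataFrame({"Name": [
    "Braund, Mr. Owen Harris",
    "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
    "Heikkinen, Miss. Laina",
]})

# Capture everything between ", " and the first "." as the honorific.
df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.", expand=False)
```

Rare titles (Dr, Rev, Countess, ...) are usually grouped into a catch-all bucket before encoding, so the model isn't fed near-empty categories.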
| Metric | Value |
|---|---|
| Cross-Validation Accuracy | 83.5% ± 3.9% |
| Validation Accuracy | 83.8% |
The model was trained using RepeatedStratifiedKFold with 10 folds repeated 3 times to reduce variance in cross-validation estimates.
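That evaluation scheme looks like the sketch below, here on synthetic data rather than the Titanic features: 10 folds times 3 repeats yields 30 accuracy estimates, whose mean and standard deviation give the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Toy data in place of the engineered Titanic features.
X, y = make_classification(n_samples=300, random_state=42)

# 10 folds x 3 repeats = 30 out-of-fold accuracy estimates.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=cv)

mean_acc, std_acc = scores.mean(), scores.std()  # reported as mean ± std
```

Stratification keeps the survived/perished ratio consistent across folds, and repeating the split averages out the luck of any single partition.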
- Title extraction from passenger names (Mr, Mrs, Miss, etc.) to improve prediction
- Ensemble learning to boost performance and reduce overfitting
- Cross-validation for more reliable model evaluation
- One-hot encoding with drop_first=True to avoid multicollinearity
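The drop_first=True point is easiest to see on a tiny example. With k categories, keeping only k-1 indicator columns drops one redundant column: the omitted level is implied when all remaining indicators are zero.

```python
import pandas as pd

# Small illustrative frame; the real notebook encodes the full dataset.
df = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

# Three ports -> two indicator columns; "C" (first alphabetically) is dropped
# and is implied when both Embarked_Q and Embarked_S are 0.
encoded = pd.get_dummies(df, columns=["Embarked"], drop_first=True)
```

Without the drop, the full set of indicators always sums to 1, a perfect linear dependence that can destabilize linear models (tree ensembles are less sensitive, but the leaner encoding costs nothing).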