The aim of this project is to scrape movie data from IMDb and combine it with the team members' Netflix viewing history to predict what kinds of movies we would enjoy watching as a team. This project is part of our Data Science practice, where we will apply techniques such as EDA, linear regression, web scraping with Beautiful Soup and Selenium, and feature engineering.
- IMDb: We will scrape movie data from the IMDb website, which is one of the most comprehensive sources for movie information.
- Netflix: We will use the team members' Netflix viewing history to understand what kinds of movies we would enjoy watching as a team.
- Web scraping with Beautiful Soup: We will use this Python library to parse the HTML of IMDb pages and extract movie data from it.
- Selenium: We will use this browser automation library to drive the scraping process, for example to navigate between pages and to load content that only appears once a page is rendered in a browser.
- EDA: We will perform exploratory data analysis to gain insights into the data and identify patterns.
- Linear regression: We will use linear regression to build a predictive model that can recommend movies based on our viewing history.
- Feature engineering: We will create new features from the existing data to improve the accuracy of our predictive model.
- Presentation File: We will create a visual and oral presentation to showcase our project and findings.
- Project Repository: We will create a GitHub repository to share our code and project details.
- Blog Post: We will publish a blog post (e.g., on Medium) to share our project and findings with the broader data science community.
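
The Beautiful Soup step listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scraper: the HTML snippet, the `movie`, `title`, `year`, and `rating` class names, and the `parse_movies` helper are all hypothetical, and real IMDb pages use different markup that would need to be inspected first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; real IMDb pages use
# different tags and class names.
SAMPLE_HTML = """
<ul>
  <li class="movie">
    <span class="title">Example Movie A</span>
    <span class="year">2010</span>
    <span class="rating">8.1</span>
  </li>
  <li class="movie">
    <span class="title">Example Movie B</span>
    <span class="year">1999</span>
    <span class="rating">7.4</span>
  </li>
</ul>
"""


def parse_movies(html):
    """Return one dict per movie entry found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for item in soup.find_all("li", class_="movie"):
        # Pull the text of each field and convert numeric fields.
        title = item.find("span", class_="title").get_text(strip=True)
        year = int(item.find("span", class_="year").get_text(strip=True))
        rating = float(item.find("span", class_="rating").get_text(strip=True))
        movies.append({"title": title, "year": year, "rating": rating})
    return movies


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie)
```

In a real run the HTML string would come from a fetched page (or from a Selenium-driven browser) rather than from a constant, and missing fields would need to be handled before converting them to numbers.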
This project aims to provide insights into what kinds of movies we would enjoy watching as a team, based on our Netflix viewing history and IMDb data. We will combine web scraping, exploratory data analysis, feature engineering, and linear regression to build a predictive model. The project will showcase our data science skills and give us valuable experience in working with real-world datasets.