Skip to content

romabansal/App-Rating-Data-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 

Repository files navigation

App-Rating-Data-Analysis

This project analyzes a publicly available dataset of app ratings from the Google Play Store using Python and Pandas. The project includes code for loading and cleaning the data, calculating basic statistics, creating visualizations, and drawing conclusions based on the analysis.

Getting Started

To get started with this project, you will need to download the dataset from Kaggle and have Python and Pandas installed on your machine.

Once you have downloaded the dataset, you can load it into a Pandas DataFrame using the following code:

import pandas as pd

Load the dataset

df = pd.read_csv("googleplaystore.csv")

Print the first five rows

print(df.head())

Cleaning the Data : The dataset may contain duplicates or missing values, so it is important to clean the data before analyzing it. To remove duplicates and missing values, you can use the following code:

Drop duplicates

df.drop_duplicates(subset="App", inplace=True)

Drop rows with missing values

df.dropna(inplace=True)

Analyzing the Data : Once the data is clean, you can start analyzing it. The code in this project includes examples of how to calculate basic statistics such as the average rating and number of downloads for each app category:

Calculate the average rating and number of downloads for each app category

category_stats = df.groupby("Category").agg({"Rating": "mean", "Installs": "sum"}) category_stats.rename(columns={"Rating": "Average Rating", "Installs": "Total Downloads"}, inplace=True)

Print the results

print(category_stats)

You can also create visualizations to help understand the data better, such as histograms and scatter plots:

import matplotlib.pyplot as plt

Create a histogram of the app ratings

df.hist(column="Rating")

Create a scatter plot of the app ratings versus the number of downloads

plt.scatter(df["Installs"], df["Rating"]) plt.xlabel("Number of Downloads") plt.ylabel("Rating")

Conclusions

Based on the analysis, you can draw several conclusions about the dataset. For example, you can see that the average rating is highest for apps in the "Events" and "Education" categories, while the total number of downloads is highest for apps in the "Communication" and "Social" categories. You can also see that there is a positive correlation between the number of downloads and the app rating, although the relationship is not linear.

This project demonstrates how to use Python and Pandas to analyze app rating data and draw meaningful insights. With further analysis and refinement, this data could be used to optimize app development and marketing strategies.

Acknowledgments

This project is based on a publicly available dataset from Kaggle. The code in this project was inspired by tutorials and examples from the Python and Pandas documentation.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published