DATA ANALYST
INSTACART
Tools
Microsoft Excel
Anaconda
Jupyter Notebook
Python
Python libraries Pandas, NumPy, Seaborn, Matplotlib, and SciPy
Skills
Data cleaning, wrangling, subsetting.
Combining & exporting data.
Data consistency checks.
Deriving new variables.
Grouping and aggregating data with Python.
Visualization with Python.
Code etiquette & Excel reporting.
Data
Data is the Instacart Online Grocery Shopping Dataset 2017, accessed from https://www.instacart.com/datasets/grocery-shopping-2017 via Kaggle on Nov 15th, 2023.
A data dictionary was provided.
Project brief
Instacart is an online grocery store that operates through an app.
The analysis aims to help Instacart learn more about its sales patterns, purchasing behaviors, and the variety of its customers.
Ordering and Pricing Findings
The 5 most frequently visited departments are: produce, dairy eggs, snacks, beverages, and frozen.
The 5 least visited departments are: international, alcohol, pets, others, and bulk.
Regular customers place the most orders. They are not big spenders, but they order more frequently, especially during the weekend.
Most sales occur between 9:00 am and 4:00 pm, when the majority of customers are regulars.
Most products are priced between $1 and $15, while a few are higher priced at $15 to $25.
The peaks on days zero and six mean that most money is spent on Friday and Saturday.
This might be due to people stocking up on things before the weekend.
Prices appear steady and do not change much during the day, fluctuating between 7.750 and 7.850.
The busiest days of the week are the weekend days, Saturday and Sunday. The busiest hours of the day for ordering are 9-16 (9:00 am - 4:00 pm).
Ads should be run on weekdays before 9 AM or after 4 PM.
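The department ranking and the busiest days and hours reported in these findings can be reproduced with simple counts. The sketch below is illustrative only: the table is a made-up toy sample, and the column names (`department`, `order_dow`, `order_hour_of_day`) are assumptions about how the merged data set is laid out, not the exact code used in the project.

```python
import pandas as pd

# Toy stand-in for the merged orders/products table (values are invented).
orders_products = pd.DataFrame({
    "department": ["produce", "produce", "dairy eggs", "snacks", "produce", "bulk"],
    "order_dow": [0, 0, 1, 6, 0, 3],
    "order_hour_of_day": [10, 14, 10, 9, 10, 20],
})

# Rank departments by number of ordered items, most frequent first.
department_counts = orders_products["department"].value_counts()
top_department = department_counts.index[0]

# Count rows per day of week and per hour of day to find the busiest slots.
orders_by_day = orders_products.groupby("order_dow").size()
orders_by_hour = orders_products.groupby("order_hour_of_day").size()

busiest_day = orders_by_day.idxmax()
busiest_hour = orders_by_hour.idxmax()

print(department_counts)
print("Busiest day:", busiest_day, "| busiest hour:", busiest_hour)
```

On the real data, `department_counts.head(5)` and `department_counts.tail(5)` would give the most and least visited departments.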
Profiling By Regions & Spenders
Married customers with children place most of the orders on Instacart.
Their peak days are Thursday, Friday, and Saturday, followed by a decline in orders on Sunday. One reason could be that they save that day for family gatherings. The other customer profiles maintain a fairly steady flow throughout the week.
Produce and dairy eggs are the top departments for all customers, but married customers with children place the most orders.
Regarding spending habits, customers do not appear to spend much on pricey products; they most likely spend on the basics they need.
Married customers with children order more than the other profiles in every region.
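A comparison of customer profiles across regions can be made with a pivot table. This is a minimal sketch on invented data; the column names `region`, `family_profile`, and `order_count` are hypothetical and stand in for whatever profile and region variables the analysis derived.

```python
import pandas as pd

# Toy customer-level table (values are invented for illustration).
customers = pd.DataFrame({
    "region": ["West", "West", "South", "South", "Midwest", "Northeast"],
    "family_profile": [
        "married with children", "single", "married with children",
        "married with children", "single", "married with children",
    ],
    "order_count": [12, 3, 8, 5, 4, 9],
})

# Sum orders per (region, profile) pair; missing combinations become 0.
orders_by_region_profile = pd.pivot_table(
    customers,
    values="order_count",
    index="region",
    columns="family_profile",
    aggfunc="sum",
    fill_value=0,
)

# Profile with the most orders in each region.
top_profile_per_region = orders_by_region_profile.idxmax(axis=1)

print(orders_by_region_profile)
print(top_profile_per_region)
```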
Customer Profile
Based on the scatter plot, customer income increases after the age of 40.
Married customers with children make up most of the customers who shop on Instacart.
In terms of loyalty, regular customers order more products than loyal or new customers.
Population Flow
The grey boxes in the first row of the population flow represent the original data sets as they were when I downloaded them.
The second row of boxes (colored) represents the data sets after I manipulated them, e.g., removed missing values and duplicates. This offers a visual overview of how the data flows through the data consistency checks.
The third row, where the arrows are also colored, represents the merges I performed between the data sets.
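The row counts behind a population flow of this kind can be tracked while cleaning and merging. The sketch below uses two invented toy tables; the table names and the choice of an inner merge on `order_id` are assumptions for illustration, not a record of the actual merges performed.

```python
import pandas as pd

# Toy "original" data sets (first row of the population flow); values invented.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "user_id": [10, 11, 11, 12],
})
products_in_orders = pd.DataFrame({
    "order_id": [1, 2, 3, 3],
    "product_id": [100, 101, None, 102],
})

# Consistency checks (second row): remove full duplicates and missing values.
orders_clean = orders.drop_duplicates()
products_clean = products_in_orders.dropna(subset=["product_id"])

# Merge (third row): indicator=True adds a _merge column showing match status.
merged = orders_clean.merge(
    products_clean, on="order_id", how="inner", indicator=True
)

# Row counts at each stage, ready to label the boxes of the flow chart.
population_flow = {
    "orders_original": len(orders),
    "orders_cleaned": len(orders_clean),
    "products_original": len(products_in_orders),
    "products_cleaned": len(products_clean),
    "merged": len(merged),
}
print(population_flow)
```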
Conclusions and Recommendations
If you are targeting ads while people are already spending, ads should be targeted for Fri-Sun from 9am-4pm. The ads should focus on the Produce (especially organic), Dairy & Eggs, Snacks, and Beverages departments.
Different groups of customers can be targeted thanks to the flags we created. These include:
Loyalty (how many orders placed)
Spending (how much they spend on average)
Frequency (how often they order)
Region (where they live)
Regular customers order most frequently, followed by loyal customers and then new customers. A savings program that rewards members for ordering more frequently would increase the consistency of sales. A survey for new customers would help reveal their shopping behavior, and a general survey would uncover the shopping habits of all customers.
In all 4 regions, most orders are placed by regular customers. Ads should target loyal and new customers in all regions to increase sales and incentivize more orders from these two groups.
In all 4 regions, low spenders dominate the order trends. A savings program and special offers on less popular products may incentivize all customers to spend more.
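Customer flags such as loyalty and spending can be derived from per-customer aggregates. The sketch below is illustrative only: the thresholds (40 and 10 orders, an average price of 10) and the column names `max_order` and `avg_price` are hypothetical, since the cut-offs used in the analysis are not stated here.

```python
import pandas as pd

# Toy per-customer aggregates (values are invented).
customers = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "max_order": [5, 25, 60, 40],        # highest order number per customer
    "avg_price": [6.5, 12.0, 8.0, 9.9],  # mean price of purchased products
})

# Hypothetical thresholds, chosen only to illustrate the flagging step.
LOYAL_MIN_ORDERS = 40
REGULAR_MIN_ORDERS = 10
HIGH_SPENDER_MIN_AVG = 10.0


def loyalty_flag(max_order: int) -> str:
    """Map a customer's order count to a loyalty label."""
    if max_order > LOYAL_MIN_ORDERS:
        return "Loyal customer"
    if max_order > REGULAR_MIN_ORDERS:
        return "Regular customer"
    return "New customer"


def spending_flag(avg_price: float) -> str:
    """Map a customer's average product price to a spending label."""
    return "High spender" if avg_price >= HIGH_SPENDER_MIN_AVG else "Low spender"


customers["loyalty_flag"] = customers["max_order"].apply(loyalty_flag)
customers["spending_flag"] = customers["avg_price"].apply(spending_flag)

print(customers)
```

Once such flags exist, grouping by them (for example `customers.groupby(["loyalty_flag", "spending_flag"]).size()`) gives the segment sizes that ads or savings programs could target.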
Ivonne Aspilcueta
Data Analyst
Hermosa Beach, CA, United States