FoodHub

Business Description: The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator company FoodHub offers access to multiple restaurants through a single smartphone app.

The app allows restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer’s location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin on the delivery order from the restaurants.

1. Objective

The data is related to the orders made by the registered customers in their online portal.
Questions to Consider:

Analyze data to gain insight into the demand of the different restaurants in order to enhance the customer experience.
Which are the top 5 restaurants in terms of the number of orders received?
Which is the most popular cuisine on weekends?
What percentage of the orders cost more than 20 dollars?
What is the mean order delivery time?
What is the net revenue generated by the company across all orders?
What percentage of orders take more than 60 minutes to get delivered from the time the order is placed?
How does the mean delivery time vary during weekdays and weekends?

2. Data Analysis

Utilized Python to analyze the data set

Data Description Rows and columns are present in the data -

# use the shape method to determine the number of rows and columns of the data
df.shape

Observations: (1898, 9)

Datatypes of the different columns in the dataset

# Use info() to print a concise summary of the DataFrame
df.info()

Observations:

There are 4 features (columns) which are int64
There is 1 feature (column), cost_of_the_order, which is float64
There are 3 features (columns) that are objects; however, the feature ‘rating’ should be an int64 or float64, so this should be investigated further.
Some ‘ratings’ were not given; therefore, the data type is labeled an object.
The ratings which were not given either have to be discarded, or the average of the rating for that cuisine or restaurant should be given.

Missing values in the data

# determine if there are missing values
print('Missing Values')
df.isna().sum()

Observations:

There are no missing values

Statistical summary of the data

# use the describe method to determine the statistical summary of the data
df.describe()

Observations:

The statistical summary was completed for all of the quantitative variables
The order_id and the customer_id are not relevant for the statistical summary as they are identifiers
The food_preparation_time statistical information:
average: 27.371970 minutes
minimum: 20 minutes
maximum: 35 minutes

## how many unique restaurant_names, cuisine_types and looking at other qualitative variables
df.describe(include = 'all')

Observations:

There are 178 unique restaurant_names, with Shake Shack (219 orders) being the top restaurant_name
There are 14 unique cuisine_types, with American (584 orders) being the top cuisine_type
The most orders are made on the Weekend, with 1351 orders
The rating should be an integer; however, because 736 orders have a rating of ‘Not given’ it is an object. And it shows that the majority of customers do not leave a rating.

Number of orders that are rated / not rated

# determine the number of orders which are not rated
df.groupby('rating').count()

Observations:

There are 736 orders which are not rated.

3. Exploratory Data Analysis (EDA)

Univariate Analysis Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.)

#Countplots for the qualitative variables
# displaying the 'restaurant_name' using a countplot
plt.figure(figsize = (25, 5))
sns.countplot(df.restaurant_name, order = df.restaurant_name.value_counts().index)

#add a title to the countplot
plt.title('Countplot of Restaurants by Name')

#rotate the xtick labels for readability
plt.xticks(rotation = 90)

# displaying the 'cuisine_type' using a countplot
plt.figure(figsize = (25, 5))
sns.countplot(df.cuisine_type, order = df.cuisine_type.value_counts().index)

#add a title to the countplot
plt.title('Countplot of cuisine_type')

#rotate the xtick labels for readability
plt.xticks(rotation = 45)

# displaying the 'day_of_the_week' using a countplot
plt.figure(figsize = (25, 5))
sns.countplot(df.day_of_the_week, order = df.day_of_the_week.value_counts().index)

#add a title to the countplot
plt.title('Countplot of day_of_the_week')

#set of boxplots of the quantitative variables
fig, axs = plt.subplots(1, 3)
#Boxplot for the cost_of_the_order
axs[0].boxplot(df['cost_of_the_order'])
axs[0].set_title('cost_of_the_order')

#Boxplot for the food_preparation_time
axs[1].boxplot(df['food_preparation_time'])
axs[1].set_title('food_preparation_time')

#Boxplot for the delivery_time
axs[2].boxplot(df['delivery_time'])
axs[2].set_title('delivery_time')

#adjust spacing for readability
fig.subplots_adjust(left=0.08, right=2, bottom=0.05, top=2,
                    hspace=0.5, wspace=0.5)

# set of distribution plots for the quantitative variables
# distribution plot for the cost_of_the_order
plt.figure(figsize = (15, 5))
sns.displot(df.cost_of_the_order, kde=True)
plt.title('Distribution of cost_of_the_order')
plt.xlabel('cost_of_the_order')
plt.ylabel('count')
          
# distribution plot for the food_preparation_time
plt.figure(figsize = (15, 5))
sns.displot(df.food_preparation_time, kde=True)
plt.title('Distribution of food_preparation_time')
plt.xlabel('food_preparation_time')
plt.ylabel('count')
          
# distribution plot for the delivery_time
plt.figure(figsize = (15, 5))
sns.displot(df.delivery_time, kde=True)
plt.title('Distribution of delivery_time')
plt.xlabel('delivery_time')
plt.ylabel('count')

#display plot
plt.show()

Countplot of Restaurants by Name

Countplot of Cuisine Type

Countplot of Day of Week

Box Plots

Distribution of Cost of the Order

Distribution of Food Preparation Time

Distribution of Delivery Time

Top 5 restaurants in terms of the number of orders received

#list the top 5 restaurants by the number of orders received
num_orders_per_restaurant = df.groupby('restaurant_name').count()
top_5_restaurants = num_orders_per_restaurant.sort_values(by = ['order_id'], ascending = False).head(5)
top_5_restaurants

Observations: The top 5 restaurants, ranked by the number of orders received, are:

Shake Shack - 219 orders
The Meatball Shop - 132 orders
Blue Ribbon Sushi - 119 orders
Blue Ribbon Fried Chicken - 96 orders
Parm - 68 orders

Most popular cuisine on weekends

# find the most popular cuisines on the weekends
cuisine_by_day_of_week = pd.crosstab(df['cuisine_type'],df['day_of_the_week'])
popular_cuisine_weekend = cuisine_by_day_of_week.sort_values(by = ['Weekend'] , ascending = False)
popular_cuisine_weekend

Observations: The most popular top 5, cuisines on the weekend are:

American - 415 orders
Japanese - 335 orders
Italian - 207 orders
Chinese - 163 orders
Mexican - 53 orders

Percentage of the orders cost more than 20 dollars

#find the number of orders greater than 20usd
greater_than_20usd = df['cost_of_the_order'] > 20
num_greater_than_20usd = greater_than_20usd.sum()

# to find the percentage of orders greater than 20usd, you will take the (number of orders greater than 20usd / total number of orders)*100
total_orders = df['cost_of_the_order'].count()

percent_orders_greater_20usd = (num_greater_than_20usd/total_orders)*100

print('The number of orders greater than 20usd is:', num_greater_than_20usd)
print('The number of total number of orders:', total_orders)
print('The percent of orders greater than 20usd is:', percent_orders_greater_20usd)

Observations:

The number of orders greater than 20usd is: 555
The number of total number of orders: 1898
The percent of orders greater than 20usd is: 29.24130663856691

Mean order delivery time

# find the mean order delivery time
mean_delivery_time = df['delivery_time'].mean()

print ('The mean order delivery time is:', mean_delivery_time)

Observations:

The mean order delivery time is approximately 24 minutes

The top 3 most frequent customers

#list the top 3 most frequent customers
freq_customers = df.groupby('customer_id').count()
top_3_customers = freq_customers.sort_values(by = ['order_id'], ascending = False).head(3)
top_3_customers

Observations:

The three most frequent customers are:

| Customer_id | number of orders |

| 52832 | 13 |
| 47440 | 10 |
| 83287 | 9 |

Multivariate Analysis Perform a multivariate analysis to explore relationships between the important variables in the dataset.

# create a pair plot to see if there are any relationships 
sns.pairplot(df)


# identify relationships between the cuisine type and the cost of the order
df.hist(by='cuisine_type',column = 'cost_of_the_order', figsize=(20,30))

# identify relationships between the cuisine type and the day_of_the_week
df.hist(by='cuisine_type',column = 'day_of_the_week', figsize=(20,30))

# identify relationships between the cuisine type and the rating
df.hist(by='cuisine_type',column = 'rating', figsize=(20,30))

# identify relationships between the cuisine type and the food_preparation_time
df.hist(by='cuisine_type',column = 'food_preparation_time', figsize=(20,30))

# identify relationships between the cuisine type and the delivery_time
df.hist(by='cuisine_type',column = 'delivery_time', figsize=(20,30))

# identify relationships between the cost_of_the_order and the day_of_the_week
df.hist(by='day_of_the_week',column = 'cost_of_the_order', figsize=(10,15))

# identify relationships between the food_preparation_time and the day_of_the_week
df.hist(by='day_of_the_week',column = 'food_preparation_time', figsize=(10,15))

# identify relationships between the food_preparation_time and the day_of_the_week
df.hist(by='day_of_the_week',column = 'delivery_time', figsize=(10,15))

# identify relationships between the cost_of_the_order and the rating
df.hist(by='rating',column = 'cost_of_the_order', figsize=(10,15))



# find the correlations between the features
corr = df.corr()
corr

# create the heatmap of the correlation between the features
sns.heatmap(corr, annot=True)

plt.show()

Pairplot

Cuisine Type and Cost of Order

Cuisine Type and Day of the Week

Cuisine Type and Rating

Cuisine Type and Food Preparation Time

Cuisine Type and Delivery Time

Cost of Order and Day of the Week

Food Preparation Time and Day of the Week

Delivery Time and Day of the Week

Rating and Cost of Order & Heatmap

Determine the number of restaurants with a rating count of more than 50 and with an average rating greater than 4 Restaurants meeting these criteria are able to become part of a promotional offer for advertising their restaurant.

# create a dataframe with the specific ratings for each restaurant 
all_ratings_by_restaurant = pd.crosstab(df['restaurant_name'],df['rating'] )

# need to drop the 'not given' responses in order to sum the total ratings for each restaurant
ratings_by_restaurant = all_ratings_by_restaurant.drop('Not given', axis=1)

# find the total number of reveiws each restaurant received
total_reviews = ratings_by_restaurant['total_reviews'] = ratings_by_restaurant.sum(axis=1)

# find the average rating for each restaurant
column = list(ratings_by_restaurant.columns)
temp_lst = ['3', '4', '5']

for column in temp_lst:
    ratings_by_restaurant['average_rating'] = (ratings_by_restaurant['3'] * 3 + ratings_by_restaurant['4'] * 4 + ratings_by_restaurant['5'] * 5)/ratings_by_restaurant['total_reviews']
    
# sort the total_reviews in descending order
ratings_by_restaurant.sort_values(by = ['total_reviews'], ascending = False)

temp_lst2 = ['total_reviews', 'average_rating']

# use a loop to go through all of the objects to determine if the restaurant has 50 or more reviews and an average of greater than 4.0
for column in temp_lst2:
    ratings_by_restaurant['More_50'] = ratings_by_restaurant['total_reviews'] >= 50
    ratings_by_restaurant['More_4'] = ratings_by_restaurant['average_rating'] >= 4

# determine which restaurants meet the criteria 
promotion_restaurants = ratings_by_restaurant[(ratings_by_restaurant['More_50'] == True) & (ratings_by_restaurant['More_4'] == True)]

# display the restaurants which meet the criteria
promotion_restaurants

Outcome_Sorted

Observations:

There are only 4 restaurants that meet the criteria for the promotional offer.
Blue Ribbon Fried Chicken
Blue Ribbon Sushi
Shake Shack
The Meatball Shop

FoodHub charges the restaurant 25% on orders having cost greater than 20 dollars and 15% on orders having cost greater than 5 dollars.

The net revenue generated by the company across all orders

# find the orders that are over 5 usd and over 20 usd
# create a column 'charge' with the cost *0.15 or cost*0.25 respectively%

#create a column 'charge' set to zero
df['charge'] = 0

# find the objects between 5 and 20 and multiply by 0.15 and find objects greater than 20 and multiply by 0.25
for index, row in df.iterrows():
    cost = row['cost_of_the_order']
    if cost > 5 and cost <= 20:
        df.at[index, 'charge'] = cost * 0.15
    elif cost > 20:
        df.at[index, 'charge'] = cost * 0.25

# find the total revenue by summing the 'charge' column
net_revenue = df['charge'].sum()

print('The net revenue generated by the company across all orders is:', net_revenue)

The net revenue generated by the company across all orders is: 6166.303

Observations: The total net revenue by the company across all orders is: 6166.31 USD

The Percentage of orders which take more than 60 minutes to get delivered from the time the order is placed? (Inclusive of food preparation time)

# find the number of orders which require more than 60 minutes to deliver the food

delivery_time_from_order = df['food_preparation_time'] + df['delivery_time']
delivery_time_from_order.describe()

df['over_60min'] = (delivery_time_from_order > 60).astype(int)

df['over_60min'].sum()

percent_delivery_over_60 = (df['over_60min'].sum() / df['order_id'].count()) * 100

print('The percent of orders that take more than 60 minutes to get delivered from the time the order is placed is:', percent_delivery_over_60)

The percent of orders that take more than 60 minutes to get delivered from the time the order is placed is: 10.537407797681771

Observations:

About 10.5% of the orders take over 60 minutes to get delivered from the time the order is placed

Mean delivery time during weekdays vs weekends

# find the mean delivery time grouped by the day_of_the_week 
by_day_of_the_week = df.groupby('day_of_the_week').mean()

mean_delivery_by_day_of_the_week = by_day_of_the_week['delivery_time']
#delivery_time_by_day_of_week = pd.crosstab(df.groupby('day_of_the_week'), df['delivery_time'])

#delivery_time_by_day_of_week = pd.crosstab(df['day_of_the_week'], df['delivery_time'])

by_day_of_the_week 
mean_delivery_by_day_of_the_week

day_of_the_week

Weekday 28.340037

Weekend 22.470022

Name: delivery_time, dtype: float64

Observation:

The weekend has a shorter delivery time.

4. Conclusion and Recommendations

There were 1898 orders analyzed.

There are 178 unique restaurants. Shake Shack was the restaurant with the most orders, 219.

There are 14 unique cuisine types. American was the most ordered cuisine with 584 orders.

Costs of the Orders The costs of the orders range from 4.47 USD to 35.41 USD with the mean cost being $16.50. The cost of the food has a positively skewed distribution. Only approximately 29.24% (555 orders) of the orders cost over 20 USD.

The days of the week were separated into two variables, weekday and weekend. 71% (1351/1898) of the orders occur on the weekend.

Reviews Food hub asks customers to review the food, and 4 unique reviews were offered: 3, 4, 5, Not given. 736 (38.7%) of the customers chose not to review their received order.

Not Given was the most common response.

Rating 3 - 188
Rating 4 - 386
Rating 5 - 588
Not Given - 736

Food Preparation and Delivery The food_preparation_time ranged from 20 minutes to 35 minutes, with the mean being 27.37 minutes. The food preparation time was uniformly distributed. The food delivery time ranged from 15 to 33 minutes, with a mean of 24.16 minutes. The delivery time was negatively skewed. The mean delivery time for:

Weekday 28.340037
Weekend 22.470022

The summary statistics for the delivery time for orders, including food preparation, are:

count 1898.000000
mean 51.533720
std 6.833603
min 35.000000
25% 47.000000
50% 52.000000
75% 56.000000
max 68.000000

Only ‘10%’ of the orders placed take over 60 minutes for delivery from the time it was placed.

Relationships There are no correlations identified between the quantitative variables.

The Cuisine and x The relationship shown by: The cuisine and cost of the order is positively skewed for all except The Southern cuisine and the day of the week shows the weekend is the most popular for all cuisines and the ratings given show Not Given was the highest received for 8 cuisines, however for Indian, 5 was the highest, Korean was equal between 4 and Not given, Mediterranean has more 5, Middle Eastern and Southern has 4 as the most given review, and Vietnamese had a tie between 3, 5 and Not given.

For the restaurants which received reviews, the following four received the most reviews:

Shake Shack - 133 reviews, average rating - 4.28
The Meatball Shop - 84 reviews, average rating - 4.51
Blue Ribbon Fried Chicken - 73 reviews, average rating - 4.32
Blue Ribbon Sushi - 64 reviews, average rating - 4.22

The top 5 restaurants ordered from are:

Shake Shack - 219 orders
The Meatball Shop - 132 orders
Blue Ribbon Sushi - 119 orders
Blue Ribbon Fried Chicken - 96 orders
Parm - 68 orders

The most popular top 5 cuisines on the weekend are:

American - 415 orders
Japanese - 335 orders
Italian - 207 orders
Chinese - 163 orders
Mexican - 53 orders

Net Revenue Almost all orders are charged a fee: ‘15%’ for orders over ‘5 USD’ and ‘25%’ for orders over ‘20 USD’. The company received a net revenue of $6166.31 across all orders.

Recommendations:

-As the mean averages are 16.50 USD, look at how the charges are broken down. -It may be beneficial to add 20% for orders that are over $10 USD.

-To gain more reviews from the customers add a 5 USD off coupon on their next order for them to leave a review. This would increase customer engagement and encourage return business.

-The food prep and delivery had reasonable time frames with 75% taking 56 minutes or less. -Foodhub could have delivery drivers closer to the restaurants which receive the most orders.