Carla A. Rudder, PMP

Logo

Data-driven professional with a strong mathematics, statistics, data analysis, and education background. I use data to find patterns and insights, drive innovation, and create meaningful change.

Technical Skills: Excel, SQL, Python, R, PowerBI, Tableau, and Certified Scrum Master

Applied Data Science & ML - MIT

BSc Applied Mathematics

MSc Mathematics Ed (Financial Math)

Ph.D. Mathematics Ed (Problem-Solving)

View My LinkedIn Profile

View My GitHub Profile

FoodHub

Business Description: The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator company FoodHub offers access to multiple restaurants through a single smartphone app.

The app allows restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer’s location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin on the delivery order from the restaurants.

1. Objective

The data is related to the orders made by the registered customers in their online portal.
Questions to Consider:

2. Data Analysis

Utilized Python to analyze the data set

Data Description Rows and columns are present in the data -

# use the shape method to determine the number of rows and columns of the data
df.shape

Observations: (1898, 9)

Datatypes of the different columns in the dataset

# Use info() to print a concise summary of the DataFrame
df.info()

Observations:

Missing values in the data

# determine if there are missing values
print('Missing Values')
df.isna().sum()

Observations:

Statistical summary of the data

# use the describe method to determine the statistical summary of the data
df.describe()

Observations:

## how many unique restaurant_names, cuisine_types and looking at other qualitative variables
df.describe(include = 'all')

Observations:

Number of orders that are rated / not rated

# determine the number of orders which are not rated
df.groupby('rating').count()

Observations:

3. Exploratory Data Analysis (EDA)

Univariate Analysis Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.)

#Countplots for the qualitative variables
# displaying the 'restaurant_name' using a countplot
plt.figure(figsize = (25, 5))
sns.countplot(df.restaurant_name, order = df.restaurant_name.value_counts().index)

#add a title to the countplot
plt.title('Countplot of Restaurants by Name')

#rotate the xtick labels for readability
plt.xticks(rotation = 90)

# displaying the 'cuisine_type' using a countplot
plt.figure(figsize = (25, 5))
sns.countplot(df.cuisine_type, order = df.cuisine_type.value_counts().index)

#add a title to the countplot
plt.title('Countplot of cuisine_type')

#rotate the xtick labels for readability
plt.xticks(rotation = 45)

# displaying the 'day_of_the_week' using a countplot
plt.figure(figsize = (25, 5))
sns.countplot(df.day_of_the_week, order = df.day_of_the_week.value_counts().index)

#add a title to the countplot
plt.title('Countplot of day_of_the_week')

#set of boxplots of the quantitative variables
fig, axs = plt.subplots(1, 3)
#Boxplot for the cost_of_the_order
axs[0].boxplot(df['cost_of_the_order'])
axs[0].set_title('cost_of_the_order')

#Boxplot for the food_preparation_time
axs[1].boxplot(df['food_preparation_time'])
axs[1].set_title('food_preparation_time')

#Boxplot for the delivery_time
axs[2].boxplot(df['delivery_time'])
axs[2].set_title('delivery_time')

#adjust spacing for readability
fig.subplots_adjust(left=0.08, right=2, bottom=0.05, top=2,
                    hspace=0.5, wspace=0.5)

# set of distribution plots for the quantitative variables
# distribution plot for the cost_of_the_order
plt.figure(figsize = (15, 5))
sns.displot(df.cost_of_the_order, kde=True)
plt.title('Distribution of cost_of_the_order')
plt.xlabel('cost_of_the_order')
plt.ylabel('count')
          
# distribution plot for the food_preparation_time
plt.figure(figsize = (15, 5))
sns.displot(df.food_preparation_time, kde=True)
plt.title('Distribution of food_preparation_time')
plt.xlabel('food_preparation_time')
plt.ylabel('count')
          
# distribution plot for the delivery_time
plt.figure(figsize = (15, 5))
sns.displot(df.delivery_time, kde=True)
plt.title('Distribution of delivery_time')
plt.xlabel('delivery_time')
plt.ylabel('count')

#display plot
plt.show()

Countplot of Restaurants by Name

Countplot of Cuisine Type

Countplot of Day of Week

Box Plots

Distribution of Cost of the Order

Distribution of Food Preparation Time

Distribution of Delivery Time

Top 5 restaurants in terms of the number of orders received

#list the top 5 restaurants by the number of orders received
num_orders_per_restaurant = df.groupby('restaurant_name').count()
top_5_restaurants = num_orders_per_restaurant.sort_values(by = ['order_id'], ascending = False).head(5)
top_5_restaurants

Observations: The top 5 restaurants, ranked by the number of orders received, are:

Most popular cuisine on weekends

# find the most popular cuisines on the weekends
cuisine_by_day_of_week = pd.crosstab(df['cuisine_type'],df['day_of_the_week'])
popular_cuisine_weekend = cuisine_by_day_of_week.sort_values(by = ['Weekend'] , ascending = False)
popular_cuisine_weekend

Observations: The most popular top 5, cuisines on the weekend are:

Percentage of the orders cost more than 20 dollars

#find the number of orders greater than 20usd
greater_than_20usd = df['cost_of_the_order'] > 20
num_greater_than_20usd = greater_than_20usd.sum()

# to find the percentage of orders greater than 20usd, you will take the (number of orders greater than 20usd / total number of orders)*100
total_orders = df['cost_of_the_order'].count()

percent_orders_greater_20usd = (num_greater_than_20usd/total_orders)*100

print('The number of orders greater than 20usd is:', num_greater_than_20usd)
print('The number of total number of orders:', total_orders)
print('The percent of orders greater than 20usd is:', percent_orders_greater_20usd)

Observations:

Mean order delivery time

# find the mean order delivery time
mean_delivery_time = df['delivery_time'].mean()

print ('The mean order delivery time is:', mean_delivery_time)

Observations:

The top 3 most frequent customers

#list the top 3 most frequent customers
freq_customers = df.groupby('customer_id').count()
top_3_customers = freq_customers.sort_values(by = ['order_id'], ascending = False).head(3)
top_3_customers

Observations:

| Customer_id | number of orders |

| 52832 | 13 |
| 47440 | 10 |
| 83287 | 9 |

Multivariate Analysis Perform a multivariate analysis to explore relationships between the important variables in the dataset.

# create a pair plot to see if there are any relationships 
sns.pairplot(df)


# identify relationships between the cuisine type and the cost of the order
df.hist(by='cuisine_type',column = 'cost_of_the_order', figsize=(20,30))

# identify relationships between the cuisine type and the day_of_the_week
df.hist(by='cuisine_type',column = 'day_of_the_week', figsize=(20,30))

# identify relationships between the cuisine type and the rating
df.hist(by='cuisine_type',column = 'rating', figsize=(20,30))

# identify relationships between the cuisine type and the food_preparation_time
df.hist(by='cuisine_type',column = 'food_preparation_time', figsize=(20,30))

# identify relationships between the cuisine type and the delivery_time
df.hist(by='cuisine_type',column = 'delivery_time', figsize=(20,30))

# identify relationships between the cost_of_the_order and the day_of_the_week
df.hist(by='day_of_the_week',column = 'cost_of_the_order', figsize=(10,15))

# identify relationships between the food_preparation_time and the day_of_the_week
df.hist(by='day_of_the_week',column = 'food_preparation_time', figsize=(10,15))

# identify relationships between the food_preparation_time and the day_of_the_week
df.hist(by='day_of_the_week',column = 'delivery_time', figsize=(10,15))

# identify relationships between the cost_of_the_order and the rating
df.hist(by='rating',column = 'cost_of_the_order', figsize=(10,15))



# find the correlations between the features
corr = df.corr()
corr

# create the heatmap of the correlation between the features
sns.heatmap(corr, annot=True)

plt.show()

Pairplot

Cuisine Type and Cost of Order

Cuisine Type and Day of the Week

Cuisine Type and Rating

Cuisine Type and Food Preparation Time

Cuisine Type and Delivery Time

Cost of Order and Day of the Week

Food Preparation Time and Day of the Week

Delivery Time and Day of the Week

Rating and Cost of Order & Heatmap

Determine the number of restaurants with a rating count of more than 50 and with an average rating greater than 4 Restaurants meeting these criteria are able to become part of a promotional offer for advertising their restaurant.

# create a dataframe with the specific ratings for each restaurant 
all_ratings_by_restaurant = pd.crosstab(df['restaurant_name'],df['rating'] )

# need to drop the 'not given' responses in order to sum the total ratings for each restaurant
ratings_by_restaurant = all_ratings_by_restaurant.drop('Not given', axis=1)

# find the total number of reveiws each restaurant received
total_reviews = ratings_by_restaurant['total_reviews'] = ratings_by_restaurant.sum(axis=1)

# find the average rating for each restaurant
column = list(ratings_by_restaurant.columns)
temp_lst = ['3', '4', '5']

for column in temp_lst:
    ratings_by_restaurant['average_rating'] = (ratings_by_restaurant['3'] * 3 + ratings_by_restaurant['4'] * 4 + ratings_by_restaurant['5'] * 5)/ratings_by_restaurant['total_reviews']
    
# sort the total_reviews in descending order
ratings_by_restaurant.sort_values(by = ['total_reviews'], ascending = False)

temp_lst2 = ['total_reviews', 'average_rating']

# use a loop to go through all of the objects to determine if the restaurant has 50 or more reviews and an average of greater than 4.0
for column in temp_lst2:
    ratings_by_restaurant['More_50'] = ratings_by_restaurant['total_reviews'] >= 50
    ratings_by_restaurant['More_4'] = ratings_by_restaurant['average_rating'] >= 4

# determine which restaurants meet the criteria 
promotion_restaurants = ratings_by_restaurant[(ratings_by_restaurant['More_50'] == True) & (ratings_by_restaurant['More_4'] == True)]

# display the restaurants which meet the criteria
promotion_restaurants

Outcome_Sorted

Observations:

FoodHub charges the restaurant 25% on orders having cost greater than 20 dollars and 15% on orders having cost greater than 5 dollars.

The net revenue generated by the company across all orders

# find the orders that are over 5 usd and over 20 usd
# create a column 'charge' with the cost *0.15 or cost*0.25 respectively%

#create a column 'charge' set to zero
df['charge'] = 0

# find the objects between 5 and 20 and multiply by 0.15 and find objects greater than 20 and multiply by 0.25
for index, row in df.iterrows():
    cost = row['cost_of_the_order']
    if cost > 5 and cost <= 20:
        df.at[index, 'charge'] = cost * 0.15
    elif cost > 20:
        df.at[index, 'charge'] = cost * 0.25

# find the total revenue by summing the 'charge' column
net_revenue = df['charge'].sum()

print('The net revenue generated by the company across all orders is:', net_revenue)

The net revenue generated by the company across all orders is: 6166.303

Observations: The total net revenue by the company across all orders is: 6166.31 USD

The Percentage of orders which take more than 60 minutes to get delivered from the time the order is placed? (Inclusive of food preparation time)

# find the number of orders which require more than 60 minutes to deliver the food

delivery_time_from_order = df['food_preparation_time'] + df['delivery_time']
delivery_time_from_order.describe()

df['over_60min'] = (delivery_time_from_order > 60).astype(int)

df['over_60min'].sum()

percent_delivery_over_60 = (df['over_60min'].sum() / df['order_id'].count()) * 100

print('The percent of orders that take more than 60 minutes to get delivered from the time the order is placed is:', percent_delivery_over_60)

The percent of orders that take more than 60 minutes to get delivered from the time the order is placed is: 10.537407797681771

Observations:

Mean delivery time during weekdays vs weekends

# find the mean delivery time grouped by the day_of_the_week 
by_day_of_the_week = df.groupby('day_of_the_week').mean()

mean_delivery_by_day_of_the_week = by_day_of_the_week['delivery_time']
#delivery_time_by_day_of_week = pd.crosstab(df.groupby('day_of_the_week'), df['delivery_time'])

#delivery_time_by_day_of_week = pd.crosstab(df['day_of_the_week'], df['delivery_time'])

by_day_of_the_week 
mean_delivery_by_day_of_the_week

day_of_the_week

Weekday 28.340037

Weekend 22.470022

Name: delivery_time, dtype: float64

Observation:

4. Conclusion and Recommendations

There were 1898 orders analyzed.

There are 178 unique restaurants. Shake Shack was the restaurant with the most orders, 219.

There are 14 unique cuisine types. American was the most ordered cuisine with 584 orders.

Costs of the Orders The costs of the orders range from 4.47 USD to 35.41 USD with the mean cost being $16.50. The cost of the food has a positively skewed distribution. Only approximately 29.24% (555 orders) of the orders cost over 20 USD.

The days of the week were separated into two variables, weekday and weekend. 71% (1351/1898) of the orders occur on the weekend.

Reviews Food hub asks customers to review the food, and 4 unique reviews were offered: 3, 4, 5, Not given. 736 (38.7%) of the customers chose not to review their received order.

Not Given was the most common response.

Food Preparation and Delivery The food_preparation_time ranged from 20 minutes to 35 minutes, with the mean being 27.37 minutes. The food preparation time was uniformly distributed. The food delivery time ranged from 15 to 33 minutes, with a mean of 24.16 minutes. The delivery time was negatively skewed. The mean delivery time for:

The summary statistics for the delivery time for orders, including food preparation, are:

Only ‘10%’ of the orders placed take over 60 minutes for delivery from the time it was placed.

Relationships There are no correlations identified between the quantitative variables.

The Cuisine and x The relationship shown by: The cuisine and cost of the order is positively skewed for all except The Southern cuisine and the day of the week shows the weekend is the most popular for all cuisines and the ratings given show Not Given was the highest received for 8 cuisines, however for Indian, 5 was the highest, Korean was equal between 4 and Not given, Mediterranean has more 5, Middle Eastern and Southern has 4 as the most given review, and Vietnamese had a tie between 3, 5 and Not given.

For the restaurants which received reviews, the following four received the most reviews:

The top 5 restaurants ordered from are:

The most popular top 5 cuisines on the weekend are:

Net Revenue Almost all orders are charged a fee: ‘15%’ for orders over ‘5 USD’ and ‘25%’ for orders over ‘20 USD’. The company received a net revenue of $6166.31 across all orders.

Recommendations:

-As the mean averages are 16.50 USD, look at how the charges are broken down. -It may be beneficial to add 20% for orders that are over $10 USD.

-To gain more reviews from the customers add a 5 USD off coupon on their next order for them to leave a review. This would increase customer engagement and encourage return business.

-The food prep and delivery had reasonable time frames with 75% taking 56 minutes or less. -Foodhub could have delivery drivers closer to the restaurants which receive the most orders.