Bike Sharing | R & Data Visualization

Overview

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations

Characters & Teams

Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.
Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

Scenario

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.

Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.

Moreno has assigned you to answer the question: How do annual members and casual riders use Cyclistic bikes differently?

Methodology

In order to answer the key business questions, I will follow the six steps of the Google’s Data Analytics Framework.

Ask – Deliver clear business objective
Prepare – Deliver description of data sources
Process – Deliver documentation of data cleaning and manipulation
Analyze – Deliver summary of analysis
Share – Deliver supporting visualizations and key findings
Act – Deliver top 3 recommendations

1. Ask

Key Deliverable: Deliver a clear statement of the business objective:

Questions that may reveal key insights about members vs casual riders:

What are the bike type preferences?
How often are rides taken?
What time of hour/day/week/month are preferred?
What is the average duration ride?
What is the average distance traveled?
What are the location preferences?

By leveraging data driven insights and visualizations we aim to present a strategy that resonates with our stakeholders, secures their endorsement for implementation, and drives the growth and profitability of Cyclistic.

Primary business objective: Investigate the factors and trends that distinguish casual riders from annual members.

2. Prepare

Key Objective: Deliver a description of data sources utilized:

All data was collected from a real world ride share company called Divvy Bikes. The divvy-tripdata has been made publically available by Motivate International Inc. under this license. The following files will be used to analyze the data from the previous 12 months (August 2022 – July 2023).

202208-divvy-tripdata.csv, 202209-divvy-publictripdata.csv, 202210-divvy-tripdata.csv, 202211-divvy-tripdata.csv, 202212-divvy-tripdata.csv, 202301-divvy-tripdata.csv, 202302-divvy-tripdata.csv, 202303-divvy-tripdata.csv, 202304-divvy-tripdata.csv, 202305-divvy-tripdata.csv, 202306-divvy-tripdata.csv, 202307-divvy-tripdata.csv

Files are provided in separate monthly .csv files each with 13 columns of data. The data includes the following information: rider ID number, type of bike ridden, time of ride start and finish, station name and ID of ride start and finish, geographical coordinates of the stations, and the type of client (member/casual).

Before we begin the analysis, lets check the quality of our data by ensuring the dataset meets the criteria for ROCCC:

Reliable
Original
Complete
Current
Consistent

The data is reliable and original because it has been vetted by the Google Data Analytics team and comes directly from a real world company (Divvy Bikes) who has made their data publicly available. We will check the data for completeness in the Process phase. The data is current as we have the most recent 12 months available, and the data appears to be consistent with the number of columns and naming conventions. We will inspect the consistency in more detail in the Process phase.

Data Sources: The data, sourced from Divvy Bikes, has been vetted, is reliable, original, complete for the specified time period, current, and demonstrates consistency in its structure, meeting the criteria for ROCCC and making it suitable for analysis.

3. Process (R Studio)

Key Objective: Deliver documentation of data cleaning and manipulation:

This analysis was conducted with R-Studio (Posit Cloud) due to the powerful capability for data manipulation, visualization, and reproducibility.

To set up the environment, all relevant packages were installed and loaded:

# Install relevant packages if not available if(!require("tidyverse")) install.packages("tidyverse") if(!require("janitor")) install.packages("janitor") if(!require("geosphere")) install.packages("geosphere") 

# Load the required packages 
library(tidyverse) 
library(janitor) 
library(geosphere) 

Loading required package: tidyverse 
── Attaching core tidyverse packages ─────── tidyverse 2.0.0 ── 
✔ dplyr 1.1.2 ✔ readr 2.1.4 ✔ forcats 1.0.0 ✔ stringr 1.5.0 ✔ ggplot2 3.4.2 ✔ tibble 3.2.1 ✔ lubridate 1.9.2 ✔ tidyr 1.3.0 ✔ purrr 1.0.1 

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── 
✖ dplyr::filter() masks stats::filter() 
✖ dplyr::lag() masks stats::lag() 

ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors Loading required package: janitor 

Attaching package: ‘janitor’ 

The following objects are masked from ‘package:stats’: chisq.test, fisher.test 

Loading required package: geosphere 

The legacy packages maptools, rgdal, and rgeos, underpinning the sp package, which was just loaded, will retire in October 2023. Please refer to R-spatial evolution reports for details, especially https://r-spatial.org/r/2023/05/15/evolution4.html. It may be desirable to make the sf package available; package maintainers should consider adding sf to Suggests:. The sp package is now running under evolution status 2 (status 2 uses the sf package in place of rgdal)

Next, we can read in the .csv files that have been uploaded to Kaggle using the read_csv() function. Note that 09/2022 was provided with the unique name “divvy-publictripdata.csv” and had to be read in accordingly.

# Read CSV and assign names to each data frame 
trips_2022_08 <- read_csv("/kaggle/input/cyclistic-tripdata/202208-divvy-tripdata.csv")
 
trips_2022_09 <- read_csv("/kaggle/input/cyclistic-tripdata/202209-divvy-publictripdata.csv")
 
trips_2022_10 <- read_csv("/kaggle/input/cyclistic-tripdata/202210-divvy-tripdata.csv") 

trips_2022_11 <- read_csv("/kaggle/input/cyclistic-tripdata/202211-divvy-tripdata.csv") 

trips_2022_12 <- read_csv("/kaggle/input/cyclistic-tripdata/202212-divvy-tripdata.csv") 

trips_2023_01 <- read_csv("/kaggle/input/cyclistic-tripdata/202301-divvy-tripdata.csv") 

trips_2023_02 <- read_csv("/kaggle/input/cyclistic-tripdata/202302-divvy-tripdata.csv") 

trips_2023_03 <- read_csv("/kaggle/input/cyclistic-tripdata/202303-divvy-tripdata.csv") 

trips_2023_04 <- read_csv("/kaggle/input/cyclistic-tripdata/202304-divvy-tripdata.csv") 

trips_2023_05 <- read_csv("/kaggle/input/cyclistic-tripdata/202305-divvy-tripdata.csv") 

trips_2023_06 <- read_csv("/kaggle/input/cyclistic-tripdata/202306-divvy-tripdata.csv") 

trips_2023_07 <- read_csv("/kaggle/input/cyclistic-tripdata/202307-divvy-tripdata.csv") 

Rows: 785932 Columns: 13 
── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 701339 Columns: 13 
── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 558685 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 337735 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 181806 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 190301 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 190445 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 258678 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 426590 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 604827 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 719618 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. 

Rows: 767650 Columns: 13 ── Column specification ──────────────────────────────────────────────────────── 
Delimiter: "," 
chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_... 
dbl (4): start_lat, start_lng, end_lat, end_lng 
dttm (2): started_at, ended_at 

ℹ Use `spec()` to retrieve the full column specification for this data. 
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Before we can combine the individual monthly .csv files into a single table, we must confirm that the data from each data set is bindable. We will utilize the function: compare_df_cols() from the “janitor” package to check that the column names and the number of columns in each file match. If all checks pass, it returns TRUE, indicating that the data frames can be row-bound without mismatching rows.

# Check whether the set of data frames are row-bindable compare_df_cols_same( 
trips_2022_08,trips_2022_09,trips_2022_10, trips_2022_11,trips_2022_12,trips_2023_01, trips_2023_02,trips_2023_03,trips_2023_04, trips_2023_05,trips_2023_06,trips_2023_07 )

TRUE

Once that is confirmed we can bind all the files into a single data frame: total_trips_original

# Combine all data frames into a single data frame total_trips_original <- bind_rows( trips_2022_08,trips_2022_09,trips_2022_10, trips_2022_11,trips_2022_12,trips_2023_01, trips_2023_02,trips_2023_03,trips_2023_04, trips_2023_05,trips_2023_06,trips_2023_07 )

Now that the data is combined into a single data frame, we will assess the data’s integrity and make changes to suit our needs. Some important items to check include:

Standardize column names
Check for duplicates
Confirm ride_ID has 16 characters
Confirm start date < end date
Replace NA values
Add additional columns

We can standardize column names by using the clean_names() function from the “janitor” package. This will remove special characters, spaces, and convert all column names to lowercase.

# Standardize & clean column names
total_trips_cleaned <- clean_names(total_trips_original)
colnames(total_trips_cleaned)

'ride_id''rideable_type''started_at''ended_at''start_station_name''start_station_id''end_station_name''end_station_id''start_lat''start_lng''end_lat''end_lng''member_casual'

Preview the first 6 rows of the data frame using base R function: head()

head(total_trips_cleaned)

ride_id	rideable_type	started_at	ended_at	start_station_name	start_station_id	end_station_name	end_station_id	start_lat	start_lng	end_lat	end_lng	member_casual
<chr>	<chr>	<dttm>	<dttm>	<chr>	<chr>	<chr>	<chr>	<dbl>	<dbl>	<dbl>	<dbl>	<chr>
550CF7EFEAE0C618	electric_bike	2022-08-07 21:34:15	2022-08-07 21:41:46	NA	NA	NA	NA	41.93	-87.69	41.94	-87.72	casual
DAD198F405F9C5F5	electric_bike	2022-08-08 14:39:21	2022-08-08 14:53:23	NA	NA	NA	NA	41.89	-87.64	41.92	-87.64	casual
E6F2BC47B65CB7FD	electric_bike	2022-08-08 15:29:50	2022-08-08 15:40:34	NA	NA	NA	NA	41.97	-87.69	41.97	-87.66	casual
F597830181C2E13C	electric_bike	2022-08-08 02:43:50	2022-08-08 02:58:53	NA	NA	NA	NA	41.94	-87.65	41.97	-87.69	casual
0CE689BB4E313E8D	electric_bike	2022-08-07 20:24:06	2022-08-07 20:29:58	NA	NA	NA	NA	41.85	-87.65	41.84	-87.66	casual
BFA7E7CC69860C20	electric_bike	2022-08-08 13:06:08	2022-08-08 13:19:09	NA	NA	NA	NA	41.79	-87.72	41.82	-87.69	casual

List the column information and data types (numeric, character, etc) using the base R function: str()

str(total_trips_cleaned)

spc_tbl_ [5,723,606 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ride_id           : chr [1:5723606] "550CF7EFEAE0C618" "DAD198F405F9C5F5" "E6F2BC47B65CB7FD" "F597830181C2E13C" ...
 $ rideable_type     : chr [1:5723606] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
 $ started_at        : POSIXct[1:5723606], format: "2022-08-07 21:34:15" "2022-08-08 14:39:21" ...
 $ ended_at          : POSIXct[1:5723606], format: "2022-08-07 21:41:46" "2022-08-08 14:53:23" ...
 $ start_station_name: chr [1:5723606] NA NA NA NA ...
 $ start_station_id  : chr [1:5723606] NA NA NA NA ...
 $ end_station_name  : chr [1:5723606] NA NA NA NA ...
 $ end_station_id    : chr [1:5723606] NA NA NA NA ...
 $ start_lat         : num [1:5723606] 41.9 41.9 42 41.9 41.9 ...
 $ start_lng         : num [1:5723606] -87.7 -87.6 -87.7 -87.7 -87.7 ...
 $ end_lat           : num [1:5723606] 41.9 41.9 42 42 41.8 ...
 $ end_lng           : num [1:5723606] -87.7 -87.6 -87.7 -87.7 -87.7 ...
 $ member_casual     : chr [1:5723606] "casual" "casual" "casual" "casual" ...
 - attr(*, "spec")=
  .. cols(
  ..   ride_id = col_character(),
  ..   rideable_type = col_character(),
  ..   started_at = col_datetime(format = ""),
  ..   ended_at = col_datetime(format = ""),
  ..   start_station_name = col_character(),
  ..   start_station_id = col_character(),
  ..   end_station_name = col_character(),
  ..   end_station_id = col_character(),
  ..   start_lat = col_double(),
  ..   start_lng = col_double(),
  ..   end_lat = col_double(),
  ..   end_lng = col_double(),
  ..   member_casual = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

Overview of the combined data frame using “dplyr” function: glimpse()

glimpse(total_trips_cleaned)

Rows: 5,723,606
Columns: 13
$ ride_id            <chr> "550CF7EFEAE0C618", "DAD198F405F9C5F5", "E6F2BC47B6…
$ rideable_type      <chr> "electric_bike", "electric_bike", "electric_bike", …
$ started_at         <dttm> 2022-08-07 21:34:15, 2022-08-08 14:39:21, 2022-08-…
$ ended_at           <dttm> 2022-08-07 21:41:46, 2022-08-08 14:53:23, 2022-08-…
$ start_station_name <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ start_station_id   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ end_station_name   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ end_station_id     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ start_lat          <dbl> 41.93, 41.89, 41.97, 41.94, 41.85, 41.79, 41.89, 41…
$ start_lng          <dbl> -87.69, -87.64, -87.69, -87.65, -87.65, -87.72, -87…
$ end_lat            <dbl> 41.94, 41.92, 41.97, 41.97, 41.84, 41.82, 41.89, 41…
$ end_lng            <dbl> -87.72, -87.64, -87.66, -87.69, -87.66, -87.69, -87…
$ member_casual      <chr> "casual", "casual", "casual", "casual", "casual", "…

Highlight the statistical summary of data frame with base R function: summary()

summary(total_trips_cleaned)

   ride_id          rideable_type        started_at                 
 Length:5723606     Length:5723606     Min.   :2022-08-01 00:00:00  
 Class :character   Class :character   1st Qu.:2022-09-28 13:56:43  
 Mode  :character   Mode  :character   Median :2023-02-16 13:53:51  
                                       Mean   :2023-02-01 23:55:22  
                                       3rd Qu.:2023-06-03 07:41:37  
                                       Max.   :2023-07-31 23:59:56  
                                                                    
    ended_at                   start_station_name start_station_id  
 Min.   :2022-08-01 00:05:00   Length:5723606     Length:5723606    
 1st Qu.:2022-09-28 14:12:20   Class :character   Class :character  
 Median :2023-02-16 14:04:56   Mode  :character   Mode  :character  
 Mean   :2023-02-02 00:13:43                                        
 3rd Qu.:2023-06-03 08:00:15                                        
 Max.   :2023-08-12 04:53:41                                        
                                                                    
 end_station_name   end_station_id       start_lat       start_lng     
 Length:5723606     Length:5723606     Min.   :41.64   Min.   :-87.92  
 Class :character   Class :character   1st Qu.:41.88   1st Qu.:-87.66  
 Mode  :character   Mode  :character   Median :41.90   Median :-87.64  
                                       Mean   :41.90   Mean   :-87.65  
                                       3rd Qu.:41.93   3rd Qu.:-87.63  
                                       Max.   :42.07   Max.   :-87.52  
                                                                       
    end_lat         end_lng       member_casual     
 Min.   : 0.00   Min.   :-88.16   Length:5723606    
 1st Qu.:41.88   1st Qu.:-87.66   Class :character  
 Median :41.90   Median :-87.64   Mode  :character  
 Mean   :41.90   Mean   :-87.65                     
 3rd Qu.:41.93   3rd Qu.:-87.63                     
 Max.   :42.18   Max.   :  0.00                     
 NA's   :6102    NA's   :6102

Check for duplicate rows by ‘ride ID’ to ensure all rows are unique.

num_duplicate_rows <- sum(
duplicated(total_trips_cleaned$ride_id))
cat("Number of rows with duplicates:", num_duplicate_rows, "\n")

Number of rows with duplicates: 0

Verify total distinct rows = total # rows (Rows: 5,723,606)

n_distinct(total_trips_cleaned)

5723606

Confirm each ride ID has 16 characters

sum(nchar(total_trips_cleaned$ride_id)!=16)

0

Confirm all dates are inside expected time frame (7/1/2022 – 7/31/2023)

max_date <- total_trips_cleaned %>% summarise(max_date = max(started_at))
min_date <- total_trips_cleaned %>% summarise(min_date = min(started_at))
print(max_date)
print(min_date)

# A tibble: 1 × 1
  max_date           
  <dttm>             
1 2023-07-31 23:59:56
# A tibble: 1 × 1
  min_date           
  <dttm>             
1 2022-08-01 00:00:00

From the glimpse() output we saw that there are a number of inputs listed as NA. We want to replace all NA inputs with the text “Missing” if possible.

First we will count the total number of NA values in each column of the data frame

rows_before_mutation <- nrow(total_trips_cleaned)
na_count <- sapply(total_trips_cleaned, function(col) sum(is.na(col)))
print(na_count)

           ride_id      rideable_type         started_at           ended_at 
                 0                  0                  0                  0 
start_station_name   start_station_id   end_station_name     end_station_id 
            868772             868904             925008             925149 
         start_lat          start_lng            end_lat            end_lng 
                 0                  0               6102               6102 
     member_casual 
                 0

We can see there are 6 columns with NA values in the output above. The end_lat and end_lng columns are type ‘dbl’ and cannot be adjusted with characters. All the other 4 columns are type ‘chr’ and can be replaced.

columns_to_mutate <- c("start_station_name", "start_station_id",
                       "end_station_name", "end_station_id")
total_trips_prepared <- total_trips_cleaned %>%
  mutate(across(all_of(columns_to_mutate), ~ replace_na(., "missing")))

Finally, we will confirm the number of rows mutated for each column

missing_recount <- sapply(total_trips_prepared[columns_to_mutate],
                          function(col) sum(col == "missing"))
print(missing_recount)

start_station_name   start_station_id   end_station_name     end_station_id 
            868772             868904             925008             925149

Replacing NA inputs is important to maintain data consistency and ensure that missing values are clearly labeled, preventing errors and enabling meaningful analysis and visualization.

Another item we want to check with this data is for incorrect times where start time is reported being later in the day than end times. This data is unable to be utilized without further information about what the cause of the error was. For this purpose we will remove these rows and list the total number of rows removed.

total_trips_transformed <- total_trips_prepared %>%
  filter(started_at <= ended_at)

rows_filtered_out <- nrow(total_trips_prepared) - nrow(total_trips_transformed)

cat("Number of rows filtered out:", rows_filtered_out, "\n")
Number of rows filtered out: 121

Now we can take a look at a summary of the remaining missing values and assess the number of distinct responses per column.

missing_summary <- sapply(total_trips_transformed, function(col) sum(is.na(col)))

total_inputs <- nrow(total_trips_transformed)

percent_difference <- (missing_summary / total_inputs) * 100

distinct_responses <- sapply(total_trips_transformed, function(col) length(unique(col)))

summary_table <- data.frame(
  column = names(total_trips_transformed),
  missing_Count = missing_summary,
  total_Inputs = rep(total_inputs, length(total_trips_transformed)),
  percent_difference = percent_difference,
  distinct_responses = distinct_responses
)

print(summary_table)

                               column missing_Count total_Inputs
ride_id                       ride_id             0      5723485
rideable_type           rideable_type             0      5723485
started_at                 started_at             0      5723485
ended_at                     ended_at             0      5723485
start_station_name start_station_name             0      5723485
start_station_id     start_station_id             0      5723485
end_station_name     end_station_name             0      5723485
end_station_id         end_station_id             0      5723485
start_lat                   start_lat             0      5723485
start_lng                   start_lng             0      5723485
end_lat                       end_lat          6102      5723485
end_lng                       end_lng          6102      5723485
member_casual           member_casual             0      5723485
                   percent_difference distinct_responses
ride_id                     0.0000000            5723485
rideable_type               0.0000000                  3
started_at                  0.0000000            4820913
ended_at                    0.0000000            4833032
start_station_name          0.0000000               1802
start_station_id            0.0000000               1503
end_station_name            0.0000000               1800
end_station_id              0.0000000               1506
start_lat                   0.0000000             782347
start_lng                   0.0000000             740445
end_lat                     0.1066134              13868
end_lng                     0.1066134              13983
member_casual               0.0000000                  2

From the out put we can see there are three distinct responses for “rideable_type” which are ‘classic’, ‘electric’ and ‘docked’. For the member_casual column there are two distinct responses: ‘member’ and ‘casual’.

We can also see that the remaining number of NA inputs only accounts for ~0.1% of the total values for end_lat and end_lng. This shouldn’t cause any problems.

And finally to wrap up the Process phase of the analysis we will modify the data frame using the mutate() function from tidyverse package to include new columns for date, year, month, day, hour, duration_s, duration_h, ride_distance_km

total_trips_transformed <- total_trips_transformed %>%
  mutate(
    date = as.Date(started_at),
    year = year(started_at),
    month = month(started_at, label = TRUE),
    day = wday(started_at, label = TRUE),
    hour = hour(started_at),
    duration_s = as.numeric(difftime(ended_at, started_at, units = "secs")),
    duration_h = round(duration_s / 2400,3),
    ride_distance_km = distGeo(
      matrix(c(total_trips_transformed$start_lng, total_trips_transformed$start_lat), ncol = 2),
      matrix(c(total_trips_transformed$end_lng, total_trips_transformed$end_lat), ncol = 2)
    ) / 1000
  )

glimpse(total_trips_transformed)

Rows: 5,723,485
Columns: 21
$ ride_id            <chr> "550CF7EFEAE0C618", "DAD198F405F9C5F5", "E6F2BC47B6…
$ rideable_type      <chr> "electric_bike", "electric_bike", "electric_bike", …
$ started_at         <dttm> 2022-08-07 21:34:15, 2022-08-08 14:39:21, 2022-08-…
$ ended_at           <dttm> 2022-08-07 21:41:46, 2022-08-08 14:53:23, 2022-08-…
$ start_station_name <chr> "missing", "missing", "missing", "missing", "missin…
$ start_station_id   <chr> "missing", "missing", "missing", "missing", "missin…
$ end_station_name   <chr> "missing", "missing", "missing", "missing", "missin…
$ end_station_id     <chr> "missing", "missing", "missing", "missing", "missin…
$ start_lat          <dbl> 41.93, 41.89, 41.97, 41.94, 41.85, 41.79, 41.89, 41…
$ start_lng          <dbl> -87.69, -87.64, -87.69, -87.65, -87.65, -87.72, -87…
$ end_lat            <dbl> 41.94, 41.92, 41.97, 41.97, 41.84, 41.82, 41.89, 41…
$ end_lng            <dbl> -87.72, -87.64, -87.66, -87.69, -87.66, -87.69, -87…
$ member_casual      <chr> "casual", "casual", "casual", "casual", "casual", "…
$ date               <date> 2022-08-07, 2022-08-08, 2022-08-08, 2022-08-08, 20…
$ year               <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 202…
$ month              <ord> Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, Aug, A…
$ day                <ord> Sun, Mon, Mon, Mon, Sun, Mon, Mon, Sun, Sun, Sun, S…
$ hour               <int> 21, 14, 15, 2, 20, 13, 14, 20, 21, 23, 20, 11, 22, …
$ duration_s         <dbl> 451, 842, 644, 903, 352, 781, 536, 1077, 683, 669, …
$ duration_h         <dbl> 0.188, 0.351, 0.268, 0.376, 0.147, 0.325, 0.223, 0.…
$ ride_distance_km   <dbl> 2.7247194, 3.3321432, 2.4866899, 4.7012385, 1.38687…

Now we are ready to start analyzing our data!

4. Analyze (R Studio)

Key Objective: Deliver summary of analysis:

The goal of the analysis phase is to assess any relationships we can find between members and casual riders. The following comparisons will be investigated for member vs casual:

Total # of trips
Mean/median/max/min of ride duration
Average ride time per day
Average ride time per week
Total # of rides per day
Total # of rides per hour
Total # of rides per month
Average distance per month
Average distance per day
Average distance per hour
Bike type preference
Top start stations
Top end stations

ride_summary <- total_trips_transformed %>%
  group_by(member_casual) %>%
  summarise(total_rides = n()) %>%
  mutate(percentage = (total_rides / sum(total_rides)) * 100)

print(ride_summary)

# A tibble: 2 × 3
  member_casual total_rides percentage
  <chr>               <int>      <dbl>
1 casual            2169497       37.9
2 member            3553988       62.1

Right off the bat we determine that casual riders make up 37.9%, and members make up 62.1% of the total number of rides.

We can take a look at the mean/median/max/min of the ride duration

summary(total_trips_transformed$duration_h)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
   0.000    0.136    0.240    0.459    0.428 1286.535

We can see the min value for all durations is 0. This confirms that we removed all negative values due to the start time > end time

Next we break up the mean/median/max/min by member casual to get an overview of the statistics.

aggregate(total_trips_transformed$duration_h ~ total_trips_transformed$member_casual, FUN = mean)
aggregate(total_trips_transformed$duration_h ~ total_trips_transformed$member_casual, FUN = median)
aggregate(total_trips_transformed$duration_h ~ total_trips_transformed$member_casual, FUN = max)
aggregate(total_trips_transformed$duration_h ~ total_trips_transformed$member_casual, FUN = min)

A data.frame: 2 × 2
total_trips_transformed$member_casual	total_trips_transformed$duration_h
<chr>	<dbl>
casual	0.7035874
member	0.3096731
A data.frame: 2 × 2
total_trips_transformed$member_casual	total_trips_transformed$duration_h
<chr>	<dbl>
casual	0.296
member	0.213
A data.frame: 2 × 2
total_trips_transformed$member_casual	total_trips_transformed$duration_h
<chr>	<dbl>
casual	1286.535
member	38.992
A data.frame: 2 × 2
total_trips_transformed$member_casual	total_trips_transformed$duration_h
<chr>	<dbl>
casual	0
member	0

We can see:

Casual rider average trip time = 0.70 hours
Member rider average trip time = 0.31 hours

So even though there are more members, casual riders average more than double the average member ride duration.

We will group the average ride time each day by member/casual

aggregate(total_trips_transformed$duration_h ~ total_trips_transformed$member_casual +
          total_trips_transformed$day, FUN = mean)

A data.frame: 14 × 3
total_trips_transformed$member_casual	total_trips_transformed$day	total_trips_transformed$duration_h
<chr>	<ord>	<dbl>
casual	Sun	0.8258949
member	Sun	0.3421641
casual	Mon	0.6817051
member	Mon	0.2958516
casual	Tue	0.6286331
member	Tue	0.2977303
casual	Wed	0.6018506
member	Wed	0.2943977
casual	Thu	0.5961304
member	Thu	0.2972516
casual	Fri	0.6846345
member	Fri	0.3088395
casual	Sat	0.8146626
member	Sat	0.3467641

Lets turn this into a plot using ‘tidyverse’ package: ggplot2()

average_dur <- total_trips_transformed %>%
  group_by(member_casual, day) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(duration_h)) %>%
  arrange(member_casual, day) %>%
  ggplot(aes(x = day, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Average Ride Duration per Week",
       x = "Day of the Week",
       y = "Average Duration (hours)",
       fill = "Member/Casual") 

print(average_dur)

The takeaway from this plot is that casual riders are averaging at least twice the duration of rides as member riders every single day. Both members and casual riders average longer ride duration on the weekends (Fri,Sat,Sun) than the weekdays.

Lets take a look at the total number of member/casual riders per day

total_riders <- total_trips_transformed %>%
  group_by(member_casual, day) %>%
  summarise(number_of_rides = n(),
            average_duration= mean(duration_h)) %>%
  arrange(member_casual, day) %>%
  ggplot(aes(x=day,y=number_of_rides, fill=member_casual))+
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Total Number of Rides per Day",
       x = "Day of the Week",
       y = "Number of Rides",
       fill = "Member/Casual") 

print(total_riders)

`summarise()` has grouped output by 'member_casual'. You can override using the
`.groups` argument.

This plot shows us that members tend to have more rides during the week than the weekend, whereas the casual riders are the opposite: they take more rides during the weekend.

Lets break it down even further and compare the total rides per hour

casual_hourly <- total_trips_transformed %>%
  group_by(hour, member_casual) %>%
  summarise(number_of_rides = n()) %>%
  ggplot(aes(x = hour, y = number_of_rides, fill = member_casual)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Total Number of Rides per Hour",
       x = "Hour",
       y = "Number of Rides",
       fill = "Member/Casual") +
  theme_minimal() +
  scale_y_continuous(labels = scales::comma) 

print(casual_hourly)

`summarise()` has grouped output by 'hour'. You can override using the
`.groups` argument.

This plot shows that the peak hour for both member/casual rides is hour 17 (5:00pm) which corresponds with the end of the workday. This corresponds with leaving work or the beginning of leisure time. There is a spike in member usage early in the morning around 7:00 – 8:00 which indicates that members appear to be using the bike service to ride to work in the morning.

One last time frame we’ll take a look at is the total number of member/casual riders per month.

casual_monthly <- total_trips_transformed %>%
  group_by(month, member_casual) %>%
  summarise(number_of_rides = n()) %>%
  ggplot(aes(x = month, y = number_of_rides, fill = member_casual)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Total Number of Rides per Month",
       x = "Month",
       y = "Number of Rides",
       fill = "Member/Casual") +
  theme_minimal()+
  scale_y_continuous(labels = scales::comma) 

print(casual_monthly)

`summarise()` has grouped output by 'month'. You can override using the
`.groups` argument.

Casual and member riders experience peak season during the summer months of Jun, July, August. There is a steady decline for members and a steep decline in casual riders during the winter months of November – March.

From these total number of trips comparison we can make the following conclusions:

Casual riders average much longer rides than members.
Casual riders prefer to ride on weekends vs members prefer weekdays.
Members tend to ride for work and leisure.
Casual riders tend to ride for leisure.
Members and casual riders prefer summer months

We can also look at ride distance. This is calculated using the distGeo() function from the geosphere package. First we’ll take a look at average distance traveled per day.

distance_month <- total_trips_transformed %>%
  group_by(month, member_casual) %>%
  filter(!is.na(ride_distance_km) | ride_distance_km != 0) %>%
  summarize(average_distance = mean(ride_distance_km)) %>%
  ggplot(aes(x = month, y = average_distance, 
             color = member_casual, group = member_casual)) +
  geom_line() +
  labs(title = "Average Distance per Month",
       x = "Month",
       y = "Average Distance",
       color = "Member/Casual") +
  theme_minimal()

print(distance_month)

`summarise()` has grouped output by 'month'. You can override using the
`.groups` argument.

Both members and casual riders follow similar trends for the average distance traveled per month. Members experience a slight jump in rides at the beginning of November but no distinctive conclusions can be made from this plot.

distance_day_line <- total_trips_transformed %>%
  group_by(day, member_casual) %>%
  filter(!is.na(ride_distance_km) | ride_distance_km != 0) %>%
  summarize(average_distance = mean(ride_distance_km)) %>%
  ggplot(aes(x = day, y = average_distance, color = member_casual, 
             group = member_casual)) +
  geom_line() +
  labs(title = "Average Distance per Day",
       x = "Day",
       y = "Average Distance",
       color = "Member/Casual") +
  theme_minimal()

print(distance_day_line)

`summarise()` has grouped output by 'day'. You can override using the `.groups`
argument.

If we zoom into the average distance traveled per day, we can see again that casual riders tend to travel further than members on the weekends. Members peak average occurs on Wednesday.

Finally, we will look at the average distance traveled per hour.

distance_hour_line <- total_trips_transformed %>%
  group_by(hour, member_casual) %>%
  filter(!is.na(ride_distance_km) | ride_distance_km != 0) %>%
  summarize(average_distance = mean(ride_distance_km)) %>%
  ggplot(aes(x = hour, y = average_distance, color = member_casual)) +
  geom_line() +
  labs(title = "Average Distance per Hour",
       x = "Hour",
       y = "Average Distance",
       color = "Member/Casual") +
  theme_minimal()

print(distance_hour_line)

`summarise()` has grouped output by 'hour'. You can override using the
`.groups` argument.

This plot gives us more information since there divergence between members and causal riders.

From these distance comparison we can make the following conclusions:

Casual riders make longer rides on weekends
Members make longer rides early mornings and late nights.
Casual riders make longer rides during the hours of 10:00am – 4:00pm

Next we can take a look at the bike preferences for members and casual riders. There are three different types of bike that Cyclistic offers: Classic, Electric, Docked.

pie_chart_member <- total_trips_transformed %>%
  filter(member_casual == "member") %>%
  group_by(rideable_type) %>%
  summarise(count = n()) %>%
  mutate(percentage = count / sum(count) * 100) %>%
  arrange(desc(percentage)) %>%  
  mutate(position = cumsum(percentage) - percentage / 2) %>%  
  ggplot(aes(x = "", y = percentage, fill = rideable_type)) +
  geom_bar(stat = "identity", width = 1, position = position_stack(vjust = 0.5),
           color = "white") +
  geom_text(aes(label = ifelse(percentage %in% c(0, 25, 50, 100), "", 
                               paste0(round(percentage), "%"))),
            position = position_stack(vjust = 0.5),
            check_overlap = TRUE,
            color = "white") +  
  coord_polar(theta = "y") +
  scale_x_discrete(breaks = NULL) +  
  labs(title = "Bike Type Distribution for Member Riders",
       fill = "Bike Type") +
  theme_void() +
  theme(legend.position = "right")

print(pie_chart_member)

Members do not utilize docked bikes. They split usage between electric bikes (52%) and classic bikes (48%)

pie_chart_casual <- total_trips_transformed %>%
  filter(member_casual == "casual") %>%
  group_by(rideable_type) %>%
  summarise(count = n()) %>%
  mutate(percentage = count / sum(count) * 100) %>%
  arrange(desc(percentage)) %>%  
  mutate(position = cumsum(percentage) - percentage / 2) %>%  
  ggplot(aes(x = "", y = percentage, fill = rideable_type)) +
  geom_bar(stat = "identity", width = 1, position = position_stack(vjust = 0.5),
           color = "white") +
  geom_text(aes(label = ifelse(percentage %in% c(0, 25, 50, 100), "", 
                               paste0(round(percentage), "%"))),
            position = position_stack(vjust = 0.5),
            check_overlap = TRUE,
            color = "white") +  
  coord_polar(theta = "y") +
  scale_x_discrete(breaks = NULL) +  
  labs(title = "Bike Type Distribution for Casual Riders",
       fill = "Bike Type") +
  theme_void() +
  theme(legend.position = "right")

print(pie_chart_casual)

Casual riders tend to favor electric bikes with a 58% preference, while classic bikes are used 36% of the time and 6% for docked bikes.

Lastly we will take a look at the location preferences of the riders. We will output a table for the start_station and end_station preferences of members and casual riders.

top_start_stations <- total_trips_transformed %>%
  group_by(member_casual, start_station_name) %>%
  filter(start_station_name != "missing") %>%
  summarise(total_rides = n()) %>%
  arrange(member_casual, desc(total_rides)) %>%
  group_by(member_casual) %>%
  top_n(10)

print(top_start_stations)

# List the top 10 end stations 
top_end_stations <- total_trips_transformed %>%
  group_by(member_casual, end_station_name) %>%
  filter(end_station_name != "missing") %>%
  summarise(total_rides = n()) %>%
  arrange(member_casual, desc(total_rides)) %>%
  group_by(member_casual) %>%
  top_n(10)

print(top_end_stations)

`summarise()` has grouped output by 'member_casual'. You can override using the
`.groups` argument.
Selecting by total_rides

# A tibble: 20 × 3
# Groups:   member_casual [2]
   member_casual start_station_name                 total_rides
   <chr>         <chr>                                    <int>
 1 casual        Streeter Dr & Grand Ave                  50025
 2 casual        DuSable Lake Shore Dr & Monroe St        30193
 3 casual        Michigan Ave & Oak St                    23119
 4 casual        Millennium Park                          22437
 5 casual        DuSable Lake Shore Dr & North Blvd       20973
 6 casual        Shedd Aquarium                           18706
 7 casual        Theater on the Lake                      16757
 8 casual        Wells St & Concord Ln                    14061
 9 casual        Dusable Harbor                           13933
10 casual        Indiana Ave & Roosevelt Rd               12346
11 member        Kingsbury St & Kinzie St                 25282
12 member        Clark St & Elm St                        23971
13 member        Clinton St & Washington Blvd             23101
14 member        Wells St & Concord Ln                    21505
15 member        University Ave & 57th St                 20876
16 member        Loomis St & Lexington St                 20525
17 member        Ellis Ave & 60th St                      19772
18 member        Wells St & Elm St                        19495
19 member        Clinton St & Madison St                  19178
20 member        Broadway & Barry Ave                     18679

`summarise()` has grouped output by 'member_casual'. You can override using the
`.groups` argument.
Selecting by total_rides
# A tibble: 20 × 3
# Groups:   member_casual [2]
   member_casual end_station_name                   total_rides
   <chr>         <chr>                                    <int>
 1 casual        Streeter Dr & Grand Ave                  52721
 2 casual        DuSable Lake Shore Dr & Monroe St        27567
 3 casual        Michigan Ave & Oak St                    24321
 4 casual        Millennium Park                          24130
 5 casual        DuSable Lake Shore Dr & North Blvd       23064
 6 casual        Theater on the Lake                      17863
 7 casual        Shedd Aquarium                           16629
 8 casual        Wells St & Concord Ln                    13586
 9 casual        Dusable Harbor                           12950
10 casual        Clark St & Armitage Ave                  12285
11 member        Kingsbury St & Kinzie St                 25487
12 member        Clinton St & Washington Blvd             24148
13 member        Clark St & Elm St                        23829
14 member        Wells St & Concord Ln                    22340
15 member        University Ave & 57th St                 21035
16 member        Loomis St & Lexington St                 20812
17 member        Clinton St & Madison St                  20242
18 member        Wells St & Elm St                        19611
19 member        Ellis Ave & 60th St                      19491
20 member        Broadway & Barry Ave                     18968

# Create a gradient color bar chart for casual riders
casual_bar_chart <- top_start_stations %>%
  filter(member_casual == "casual") %>%
  ggplot(aes(x = reorder(start_station_name, total_rides), y = total_rides, 
             fill = total_rides)) +
  geom_bar(stat = "identity") +  # No need to specify fill color here
  scale_fill_gradient(low = "lightblue", high = "darkblue") +  
  labs(title = "Top Start Stations for Casual Riders",
       x = "Start Station Name",
       y = "Total Rides") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none")  # Remove legend

print(casual_bar_chart)

# Create a gradient color bar chart for member riders
member_bar_chart <- top_start_stations %>%
  filter(member_casual == "member") %>%
  ggplot(aes(x = reorder(start_station_name, total_rides), y = total_rides, 
             fill = total_rides)) +
  geom_bar(stat = "identity") +  # No need to specify fill color here
  scale_fill_gradient(low = "peachpuff", high = "darkorange") +  
  labs(title = "Top Start Stations for Member Riders",
       x = "Start Station Name",
       y = "Total Rides") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "none")  

print(member_bar_chart)

Neither the members nor the casual riders overlap preferences for the top 5 utilized station locations. This can be very important for marketing purposes.

Casual riders have a clear preference for Streeter Dr & Grand Ave. station. Members are relatively evenly distributed among the top ten stations.

5. Share

Key Objective: Deliver supporting visualizations and key findings:

The visualizations above highlighed the following key differences between members and casual riders:

Casual riders make up 37.9% and members make up 62.1% of the total number of rides.
Casual riders ride longer than members.
Casual riders ride more on weekends and members ride more on weekdays.
Members tend to ride for work and leisure, casual riders tend to ride for leisure.
Members make longer rides early mornings and late nights, casual riders make longer rides during the midday hours of 10:00am – 4:00pm.
Casual riders prefer electric bikes.
Only casual riders utilize docked bikes.
Casual riders and member prefer to start at different stations.
Casual riders heavily utilize Streeter Dr. & Grand Ave. station.

6. Act

Key Objective: Deliver top 3 recommendations

Based on the results from the analysis, we make the following recommendations:

Membership promotions

Offer special discounts or benefits for weekend trips, reaching a specific number of trips, or for hitting a number of hours traveled milestone. All of these are tailored to convert casual riders to members
- Make casual riders aware of the milestone rewards program through targeted marketing campaigns, app notifications, and clear in-app displays. Highlight the potential savings and benefits they can unlock as members when they achieve these milestones.

Targeted Communications:

Use targeted email, app notifications, or SMS campaigns to communicate the benefits of membership to casual riders. Geolocation data for casual riders shows heavily utilization of the Streeter Dr. & Grand Ave. station. This should be the primary target for converting casual riders.
- Create messages that highlight how membership can improve the rental experience at Streeter Dr. & Grand Ave. station. Mention benefits such as priority access to bikes at this station or extended ride durations in the area. Make it clear that membership is tailored to enhance their rides at this specific location.

Station-Specific Feedback Surveys

Conduct station-specific feedback surveys at the most frequently used casual rider stations, with a focus on Streeter Dr. & Grand Ave. station, to gather insights on the factors that would motivate casual riders to convert to members.
- By directly engaging with casual riders at their preferred stations, Cyclistic can gain valuable feedback tailored to the needs and preferences of riders in specific locations. These surveys can help uncover station-specific pain points, identify desired benefits, and gather suggestions for membership incentives. The insights collected can inform targeted conversion strategies and enhance the overall rider experience.

Conclusion

To recap, these recommendations were developed through the use of data-driven insights generated from in-depth data analysis and visualization. This enabled us to present a convincing strategy that is in line with the priorities of our stakeholders and positions Cyclistic for growth and profitability. We expect that these strategic initiatives will resonate with our stakeholders, leading to a growth in membership and, ultimately, the achievement of our business goals.