Google Data Analytics Case Study 1 - Bike share
The business task is to help Cyclistic identify key
metrics that will allow them to grow their business. Cyclistic is a successful
bike sharing company that has a fleet of 5,824 bicycles that are geotracked and
locked into a network of 692 stations across Chicago. Cyclistic launched operations in 2016.
Marketing
strategies goal
The goal is to convert casual riders into
annual members by understanding how annual members and casual riders differ. The
marketing analyst team wants to know why casual riders would buy a membership
and how digital media could affect their marketing tactics. Insights that can
drive business decisions would be how member and casual riders use the bikes,
when they use them and how far do they ride.
Preparing
the data used
The data that was used was the monthly data from the year 2022. The data was located on a local hard
drive in a folder labeled data/zipfiles. It was decided that since there was too much data to combine
into one single spreadsheet file, that the data would be cleaned and then imported into a database table.
The csv files were scanned for any problems that needed to be addressed before importing to database
. This was completed by removing all rows with empty cells. Once the csv files were cleaned, they were
imported into DB browser for SQlite.
The combined csv files were placed into a SQL table names Cyclistic that gave me a total of 4,299,690 rows of information. Since further analysis will be conducted in Excel, this must be reduced by sampling. An online sample calculator was used to reduce the data to a more manageable level and to have the data fit into one spreadsheet table.
Using the sample size calculator with a confidence level of 99.9% produced a sample size of 1083. Next came the challenge of randomly sampling for the 1083 rows of information that was calculated. A SQL query was performed to pull a random sample of 1083 rows from the main table that was constructed by the twelve csv files.
The tools that I am
using are Excel, DB bowser for SQlite, Rstudio and Tableau. Excel will be the
main tool used as this is the tool that I am most comfortable with. Every
effort was made to ensure the data’s integrity and that the data is clean. As
stated previously, the data was cleaned my removing all rows with empty cells
and scanning the data for spelling mistakes. As instructed after downloading and cleaning
the data, new column were created for ride_length and day_of_week. A distance
traveled column was also created by using the latitude and longitude
information provided in the csv file. The formula was found via a Google
search, and is as follows:
=SIN(ABS(K2-I2)*PI()/180/2)^2+COS(I2*PI()/180)*COS(K2*PI()/180)*SIN(ABS(L2-J2)*PI()/180/2)^2
=2*ATAN2(SQRT(1-Q2),SQRT(Q2))
=S2*R2 *This converts the data to Miles
Analyzing
the Data
The information provided
in this pivot table clearly show that the casual and member riders are using
the bicycles very differently. This
shows that the casuals heavy use days are Saturday and Sunday, suggesting that
the casual riders are tourists or weekend adventurers. The members’ daily use
is fairly consistent, but with higher use on the weekdays (Monday-Friday),
suggesting that the members are city dwellers, people who both live and work in
the city.
The next data that was analyzed was the total rider per month by membership. This table considered rides by the casual and members, per month.
The information
provided by this graph shows more evidence that the casual riders are tourist
and urban explorers. As shown in the graph, peak season for all riders is in
the late spring to late fall timeframe (May-October), but for the casuals, peak
riding is June-August. The member riders
are more consistent throughout the year, with their use dipping naturally due
to the weather. The type of bike used per month is featured in the next graph.
As the data shows, the
classic bike was the most popular choice. There were some confusion as to what
docked bike was and how it differed from classic and electric. A search was
conducted as to what was meant by docked bike and there was no definitive answer,
only that classic bike and docked bike were the same thing. Since this was discussion forum information,
it was decided to leave docked bike in the calculations and to keep
classic/docked bike as separate entities.
Further analysis was
conducted on bike use to determine total use by bike type and by member status.
This bar chart was created using the R programming language. The aggregate data
clearly shows a preference for the classic bike for both the casual and the
members. However, the electric bike usage was significant.
To analyze this data
further, a chart was created to show total rides by bike type per week day.
This data is an average of the twelve months in a one week view. (1=Sunday,
7=Saturday)
The data provided by
Cyclistic included latitude and longitude information, so with the formula as
mentioned earlier, distance by mile was calculated. This first chart represents
the total miles rode per bike type per month by casual and member riders.
The data was also
viewed in a weekly format. This next chart explains on average, how casual and
member riders used the bikes per day of week. Miles rode per week shows that the
members are consistent in their use while the casual riders have low use Monday
thru Wednesday, with their highest use day on Saturday, followed by Sunday.
The last part of the data that was analyzed were the start and end stations. The first chart that will be presented represents most popular bike type per start station. Again, as proven by previous data points, the classic bike was the most popular. As the data in the chart shows, Dusable Lake Shore DR & North Blvd had the most riders per sample pulled. This station also shows a heavy preference for the classic bike. The chart represents the most popular start stations and not all start stations.
The next chart
describes how many times a rider rode to the same end station from a start
station. This analysis is to prove which stations are the most popular for all riders,
where are they starting from and where are they going. So the next chart
represents the point of view of the start station and how many times the riders
traveled to the same end station. The
last chart describes the most popular end stations and how many times riders
traveled from the same start station. The most used start station was located
at Streeter Dr & Grand Ave, and the most used end station was Dusable Lake
Shore DR & North Blvd.
Recommendations
1. The first recommendation to Cyclistic to
convert casual riders into members would be to develop a strong advertising
campaign starting in March thru May to help create awareness of the company
Cyclistic and what it offers. Advertising should be in both traditional media
as well as social media. In this advertising, a membership discount should be
offered to entice membership. Billboards should be displayed near the most
popular stations that advertise the spring limited time membership discount.
2. Upon
signup of any new membership should be the option for the member to enter in a
telephone number and email to their account. This can be used to remind the
members of an upcoming expiry to membership. The option to auto renew should be
developed into the system so there are no lapses in membership.
3. Develop a maintenance plan to conduct
annual maintenance on bicycles during the winter months when usage is low.
December thru February would be the optimal time to perform maintenance on
bikes. This is to minimize downtime for the bikes and to maximize availability
during peak usage.
*Update 1 - A more refined data cleaning method option.

Comments
Post a Comment