Google Data Analytics Case Study 1 - Bike share


Statement of the business task

The business task is to help Cyclistic identify key metrics that will allow them to grow their business. Cyclistic is a successful bike sharing company that has a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago.  Cyclistic launched operations in 2016.



Marketing strategies goal

 The goal is to convert casual riders into annual members by understanding how annual members and casual riders differ. The marketing analyst team wants to know why casual riders would buy a membership and how digital media could affect their marketing tactics. Insights that can drive business decisions would be how member and casual riders use the bikes, when they use them and how far do they ride.


Preparing the data used

   The data that was used was the monthly data from the year 2022. The data was located on a local hard

 drive in a folder labeled data/zipfiles.  It was decided that since there was too much data to combine

 into one single spreadsheet file, that the data would be cleaned and then imported into a database table.

 The csv files were scanned for any problems that needed to be addressed before importing to database

. This was completed by removing all rows with empty cells. Once the csv files were cleaned, they were

 imported into DB browser for SQlite.




The combined csv files were placed into a SQL table names Cyclistic that gave me a total of 4,299,690 rows of information. Since further analysis will be conducted in Excel, this must be reduced by sampling. An online sample calculator was used to reduce the data to a more manageable level and to have the data fit into one spreadsheet table.

Using the sample size calculator with a confidence level of 99.9% produced a sample size of 1083.  Next came the challenge of randomly sampling for the 1083 rows of information that was calculated. A SQL query was performed to pull a random sample of 1083 rows from the main table that was constructed by the twelve csv files.


 

 Processing the data

The tools that I am using are Excel, DB bowser for SQlite, Rstudio and Tableau. Excel will be the main tool used as this is the tool that I am most comfortable with. Every effort was made to ensure the data’s integrity and that the data is clean. As stated previously, the data was cleaned my removing all rows with empty cells and scanning the data for spelling mistakes.  As instructed after downloading and cleaning the data, new column were created for ride_length and day_of_week. A distance traveled column was also created by using the latitude and longitude information provided in the csv file. The formula was found via a Google search, and is as follows:

   =SIN(ABS(K2-I2)*PI()/180/2)^2+COS(I2*PI()/180)*COS(K2*PI()/180)*SIN(ABS(L2-J2)*PI()/180/2)^2

  =2*ATAN2(SQRT(1-Q2),SQRT(Q2))

=S2*R2 *This converts the data to Miles

Analyzing the Data

  I believe that I have been able to answer the question of how annual members and casual riders

 use Cyclistic bikes and the following information should tell the story. The first pivot table with graph

 was created by calculating the total_rides with the day_of_week.  The table gave an average time of

 both casual and member cyclists. This also gave the total number of riders per day, per the sample

 pulled. 




The information provided in this pivot table clearly show that the casual and member riders are using the bicycles very differently.  This shows that the casuals heavy use days are Saturday and Sunday, suggesting that the casual riders are tourists or weekend adventurers. The members’ daily use is fairly consistent, but with higher use on the weekdays (Monday-Friday), suggesting that the members are city dwellers, people who both live and work in the city.

The next data that was analyzed was the total rider per month by membership. This table considered rides by the casual and members, per month.

 


The information provided by this graph shows more evidence that the casual riders are tourist and urban explorers. As shown in the graph, peak season for all riders is in the late spring to late fall timeframe (May-October), but for the casuals, peak riding is June-August.  The member riders are more consistent throughout the year, with their use dipping naturally due to the weather. The type of bike used per month is featured in the next graph.



As the data shows, the classic bike was the most popular choice. There were some confusion as to what docked bike was and how it differed from classic and electric. A search was conducted as to what was meant by docked bike and there was no definitive answer, only that classic bike and docked bike were the same thing.  Since this was discussion forum information, it was decided to leave docked bike in the calculations and to keep classic/docked bike as separate entities.

Further analysis was conducted on bike use to determine total use by bike type and by member status. This bar chart was created using the R programming language. The aggregate data clearly shows a preference for the classic bike for both the casual and the members. However, the electric bike usage was significant.




To analyze this data further, a chart was created to show total rides by bike type per week day. This data is an average of the twelve months in a one week view. (1=Sunday, 7=Saturday)



The data provided by Cyclistic included latitude and longitude information, so with the formula as mentioned earlier, distance by mile was calculated. This first chart represents the total miles rode per bike type per month by casual and member riders.


The data was also viewed in a weekly format. This next chart explains on average, how casual and member riders used the bikes per day of week.  Miles rode per week shows that the members are consistent in their use while the casual riders have low use Monday thru Wednesday, with their highest use day on Saturday, followed by Sunday.


The last part of the data that was analyzed were the start and end stations. The first chart that will be presented represents most popular bike type per start station. Again, as proven by previous data points, the classic bike was the most popular. As the data in the chart shows, Dusable Lake Shore DR & North Blvd had the most riders per sample pulled.  This station also shows a heavy preference for the classic bike.  The chart represents the most popular start stations and not all start stations.

 


The next chart describes how many times a rider rode to the same end station from a start station. This analysis is to prove which stations are the most popular for all riders, where are they starting from and where are they going. So the next chart represents the point of view of the start station and how many times the riders traveled to the same end station.  The last chart describes the most popular end stations and how many times riders traveled from the same start station. The most used start station was located at Streeter Dr & Grand Ave, and the most used end station was Dusable Lake Shore DR & North Blvd.



Recommendations

1.       The first recommendation to Cyclistic to convert casual riders into members would be to develop a strong advertising campaign starting in March thru May to help create awareness of the company Cyclistic and what it offers. Advertising should be in both traditional media as well as social media. In this advertising, a membership discount should be offered to entice membership. Billboards should be displayed near the most popular stations that advertise the spring limited time membership discount.

2.     Upon signup of any new membership should be the option for the member to enter in a telephone number and email to their account. This can be used to remind the members of an upcoming expiry to membership. The option to auto renew should be developed into the system so there are no lapses in membership.

3.      Develop a maintenance plan to conduct annual maintenance on bicycles during the winter months when usage is low. December thru February would be the optimal time to perform maintenance on bikes. This is to minimize downtime for the bikes and to maximize availability during peak usage. 


*Update 1 - A more refined data cleaning method option.





Comments

Popular posts from this blog

SQL data cleaning method for Google Data Analytics Case Study 1 - Bike share