Exploratory Data Analysis: How do casual riders use the Divvy Bike-Share service?

(Google data analytics capstone project)

As part of the Google Data Analytics Certification, I completed a case study on Divvy* bike-share data. The dataset used in this case study can be found here, and the scenario is as follows: a bike-share company is hoping to gain more "member" riders (people who have purchased an annual membership) by converting "casual" riders (people who use the bike-share service but are not members) into members. My task is to use this data to identify meaningful differences in how casual and member riders use the bike-share service, and to report my findings along with my top three recommendations for how a marketing team could better target casual riders.

(*The scenario in this case study is built around a fictional bike-share company, "Cyclistic", but the data comes from a real bike-share company called "Divvy". In this report I refer to the real company "Divvy" rather than the fictional "Cyclistic", since I reference Divvy media in my documentation and two different company names would be confusing.)

TL;DR

Casual riders are more likely to: use the bike-share service on the weekend, end their trips at attractions such as parks and museums, and use electric bikes over classic bikes. More details on the work behind this project are below, and feel free to explore the data further in the Tableau dashboard here.

Full Report:

Business Task

Determine how casual and member riders use Divvy bikes differently, and provide a clear report of the results of this analysis to the marketing team.

Description of Data

Divvy's bike-share data can be found here. We will use the previous 12 months of data (as specified in the case study) for our analysis--"202111-divvy-tripdata.zip" to "202210-divvy-tripdata.zip". Each file corresponds to a particular month, and all contain the following information:

  • ride_id: a unique identifier for each ride
  • rideable_type: the type of bike used
  • started_at: time at which the ride started
  • ended_at: time at which the ride ended
  • start_station_name: name of the Divvy parking station the bike was picked up from
  • start_station_id: ID of the corresponding start station
  • end_station_name: name of the Divvy parking station the bike was dropped off at
  • end_station_id: ID of the corresponding end station
  • start_lat: latitude of the starting location
  • start_lng: longitude of the starting location
  • end_lat: latitude of the ending location
  • end_lng: longitude of the ending location
  • member_casual: type of rider (casual or member)
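
As a minimal sketch of how the twelve monthly archives could be enumerated and read, the snippet below builds the file names from "202111-divvy-tripdata.zip" through "202210-divvy-tripdata.zip" and yields one dictionary per ride. It uses only the Python standard library and is not taken from the project notebook. The helper names monthly_archive_names and read_rides are hypothetical, and it assumes each archive contains CSV files whose header row matches the column names listed above.

```python
import csv
import io
import zipfile


def monthly_archive_names(start_year=2021, start_month=11, months=12):
    """Return archive names for `months` consecutive months."""
    names = []
    year, month = start_year, start_month
    for _ in range(months):
        names.append(f"{year}{month:02d}-divvy-tripdata.zip")
        month += 1
        if month > 12:  # roll over from December to January
            year, month = year + 1, 1
    return names


def read_rides(zip_path):
    """Yield one dict per ride from every CSV member of a monthly archive.

    Assumption: the archive holds CSV files with a header row that uses
    the column names described in this report (ride_id, rideable_type, ...).
    """
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".csv"):
                continue  # ignore anything that is not a CSV file
            with archive.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.DictReader(text)
```

With the archives downloaded to the working directory, something like `rides = [r for name in monthly_archive_names() for r in read_rides(name)]` would collect every ride into one list. All values arrive as strings, so timestamps and coordinates still need parsing.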

Documentation of Cleaning/Manipulation of Data

The data was cleaned/manipulated using Python, and the process is documented in a Jupyter Notebook--available on my GitHub.

Quick summary of significant steps taken in this process:

  • Divvy has two types of bikes available to use: classic bikes and electric bikes. In the data, a third option, "docked_bike", is listed. Based on exploring Divvy data from other months (outside the 12-month scope), the timing of the introduction of electric bikes to the fleet, and other patterns in the data, I decided that "docked_bike" is the same as a classic bike and replaced these values accordingly. (Normally this would be a quick question I could ask the stakeholders to clarify, but for the purposes of this case study I am satisfied with the evidence I reference in the documentation.)
  • Any classic bike data points that do not have a recorded end station were removed from the dataset. Classic bikes must be parked at Divvy stations (unlike electric bikes, which can be parked anywhere), so a missing end station implies either that something is wrong with the data or that an unusual event occurred (e.g. a bike was lost). This subset accounted for only 0.1% of the entire dataset and is not indicative of a typical trip.
  • "False starts"--defined by having a trip duration of less than 2 minutes and the same start/end location--were removed from the dataset. False starts accounted for 2.5% of the total dataset. Since they are not indicative of a typical trip, I decided to remove them from the dataset.
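
The three cleaning steps above can be sketched roughly as follows. This is an illustration using only the standard library, not the code from the notebook. It assumes rides are dictionaries keyed by the column names listed earlier, that timestamps look like "2022-07-02 10:00:00", that the bike labels are "classic_bike" and "electric_bike" (only "docked_bike" is quoted in this report), and that "same start/end location" means identical, non-empty start and end station names. The function names are hypothetical.

```python
from datetime import datetime, timedelta

# Threshold for a "false start", as defined in the summary above.
FALSE_START_LIMIT = timedelta(minutes=2)


def is_false_start(ride):
    """True when a ride lasts under 2 minutes and ends where it began.

    Assumption: "same location" is approximated by matching station names.
    """
    started = datetime.fromisoformat(ride["started_at"])
    ended = datetime.fromisoformat(ride["ended_at"])
    same_station = (
        bool(ride["start_station_name"])
        and ride["start_station_name"] == ride["end_station_name"]
    )
    return (ended - started) < FALSE_START_LIMIT and same_station


def clean_rides(rides):
    """Apply the three cleaning steps and return a new list of rides."""
    cleaned = []
    for ride in rides:
        ride = dict(ride)  # copy so the caller's data is left untouched
        # Step 1: treat "docked_bike" as a classic bike.
        if ride["rideable_type"] == "docked_bike":
            ride["rideable_type"] = "classic_bike"
        # Step 2: drop classic bike rides with no recorded end station.
        if ride["rideable_type"] == "classic_bike" and not ride["end_station_name"]:
            continue
        # Step 3: drop false starts.
        if is_false_start(ride):
            continue
        cleaned.append(ride)
    return cleaned
```

Relabeling "docked_bike" before the end-station check matters: it ensures docked bikes with a missing end station are dropped along with the other classic bikes.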

Summary of Analysis

  • Casual riders tend to take trips on weekends, as opposed to member riders who typically use the service on weekdays. This could be due to casual riders using the service for fun weekend trips (supported by findings I will detail next), whereas member riders may be using the service for more consistent needs like commuting to work.
  • The most popular end stations among casual riders tend to be near parks, beaches, and museums. The most popular end stations among member riders tend to be more inland, perhaps near common areas of work (e.g. Clinton St & Washington Blvd near Accenture Tower, or University Ave & 57th St located within the University of Chicago).
  • Casual riders use electric bikes slightly more frequently than classic bikes, whereas member riders use classic bikes slightly more frequently than electric bikes. While electric bikes in and of themselves have desirable attributes (less tiring to ride, more modern, etc.), casual riders may prefer electric bikes due to their convenience: they can be parked anywhere, so it may be easier for a casual rider to pick up an electric bike from wherever they happen to be. Member riders with more consistent use of the bike-share service may have specific Divvy stations where they pick up and drop off bikes as part of their routine, and therefore finding a classic bike to use may be just as easy as finding an electric bike. Also, the use of classic bikes is free for members, whereas the use of electric bikes includes an additional charge.
  • Casual riders tend to use the bikes for longer periods of time than member riders, but not by a large amount (typically just a couple of minutes more).
  • All riders tend to use the bike-share service in warmer months as opposed to colder months, although members are slightly more likely than casual riders to use the service in cold months.
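
Two of the comparisons above, weekend usage and trip duration by rider type, could be computed along these lines. This is a hedged sketch using only the standard library, not the notebook's code. It assumes rides are dictionaries with the member_casual, started_at, and ended_at columns described earlier and timestamps like "2022-10-01 10:00:00". The function names are hypothetical, and the median is my choice of summary for a "typical" duration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def weekend_share(rides):
    """Fraction of each rider type's trips that start on Saturday or Sunday."""
    totals = defaultdict(int)
    weekend = defaultdict(int)
    for ride in rides:
        rider = ride["member_casual"]
        totals[rider] += 1
        # datetime.weekday(): Monday is 0, so Saturday and Sunday are 5 and 6.
        if datetime.fromisoformat(ride["started_at"]).weekday() >= 5:
            weekend[rider] += 1
    return {rider: weekend[rider] / totals[rider] for rider in totals}


def median_minutes(rides):
    """Median trip duration in minutes for each rider type."""
    durations = defaultdict(list)
    for ride in rides:
        started = datetime.fromisoformat(ride["started_at"])
        ended = datetime.fromisoformat(ride["ended_at"])
        minutes = (ended - started).total_seconds() / 60
        durations[ride["member_casual"]].append(minutes)
    return {rider: median(values) for rider, values in durations.items()}
```

The median is used rather than the mean because a handful of very long trips (for example, a bike left undocked) could otherwise dominate the comparison.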

Supporting Visualizations

See the Tableau dashboard above.

Top Three Recommendations

(The case study mentions that the marketing team uses "email, social media, and other channels" to promote the bike-share program.)

  • Connect with casual riders via email. Promote fun attractions that are easy to reach by Divvy bike, and emphasize that the Annual Membership would pay for itself after two weekend trips if they usually use the Single Ride service, or even after just one trip if they usually use the Day Pass.
  • If the marketing team is able to set up advertisements (promoting the membership program) on location at Divvy stations, they may focus these ads on the more popular end stations for casual riders.
  • When posting on social media to promote the electric bike fleet, include information on discounts for members.

Additional data that could expand findings: The data provided to the public is, of course, anonymized. However, if rider identifiers are available within the company, they would provide invaluable insight. For example, one could determine the average savings a membership would offer casual riders who end up spending more than members.