Bikes and Big Data

Ben Jestico is an MSc candidate from the University of Victoria.

If you own a smartphone, you have likely come across hundreds of fitness apps that track running and cycling routes using GPS. Apps such as Strava, MapMyRide, MapMyRun, and Garmin measure distance travelled, average speed, travel time, and calories burned to provide detailed information about your trip. Routes can be uploaded to share with friends, monitor progress, or brag about how far you went that day.

Ben Jestico collecting data at a local bike path.

The data generated by these apps warrant their own name and are commonly referred to as “crowdsourced data”, where citizens, or “the crowd”, collect information about their environment. The prevalence of GPS and route tracking means that this new type of data has exciting potential for researchers aiming to understand how crowdsourced data can be used to supplement existing data sources. These apps are marketed for “fitness” or “training”, which introduces some inherent biases in who uses them, but the wealth of information raises interesting questions about how, when, and where we can use these data.

Researchers aiming to understand cycling safety and ridership trends are often hindered by a lack of data on where cyclists ride and how many of them are riding there. Standard methods of collecting cycling data consist of manually counting cyclists at a single location during peak commuting periods of the day. These time periods are selected to capture cyclists commuting to or from work or school, who are likely to be regular cyclists travelling the same routes multiple times per week. Standard data collection surveys provide a high level of detail about the number of cyclists at a particular location and time, but are extremely limited in their spatial and temporal coverage. Fitness apps provide the opposite: they offer extensive spatial and temporal coverage, but they are a biased sample of cyclists, as only those using an app are tracked.

How can we have the best of both worlds: the detail of manually counting cyclists (an unbiased sample) combined with the high spatial and temporal coverage of fitness apps (a biased sample)? Exploring this question gives researchers a more in-depth look at crowdsourced data, helping them understand how these data can supplement existing data sources and provide valuable insight into ridership trends that can inform cycling safety and planning.