Last month, when MUNI announced that its real-time arrival predictions were temporarily offline, one of our users asked on Twitter if we could help.
Just like that, we launched crowdsourced real-time in San Francisco.
We’d been preparing for a moment like this. For the last year, we’ve been building the world’s first (functional) crowdsourced transit tracker. Why? Many cities (like our hometown of Montreal) still don’t have any real-time transit data. Other cities (like San Francisco) have real-time data, but coverage can be spotty, and vehicle locations are updated infrequently.
Our plan was to roll out crowdsourcing to cities without any real-time, and then, gradually, to cities like SF that already have real-time. Well, the best-laid plans of mice and men often go awry. When AT&T pulled the plug on their old 2G networks (on which MUNI’s tracking depended), we were asked to turn on crowdsourcing while MUNI updated their hardware.
We were already seeing some success in Montreal (our first crowdsourced city), so we said, why the hell not?
Here’s what happened when we launched SF:
- Braden asks for SF crowdsourcing on Twitter
- We launch SF crowdsourcing within 1 hour
- Within 2 hours we have a blog post live detailing our exploits
- Our blog post reaches the top of Hacker News
- Tens of thousands of people read about our experiment
- 14% of daily active users in SF are crowdsourcing real-time
Since January, we’ve been crowdsourcing MUNI transit departures. However, MUNI has done a great job fixing their real-time tracking more quickly than expected, so today we’ve decided to turn off crowdsourcing for SF — although it’s still piping along marvelously in NYC, as well as cities in Quebec and British Columbia.
We had a lot of internal debate on the team: should we keep crowdsourcing on, even though MUNI’s real-time tracker was working again? The plan had always been to shut it down once MUNI was back online (until we were ready to have both trackers rolling at once). While we tried to find a quick solution, we’re erring on the side of caution by turning it off for now. Here’s our rationale:
Reasons to keep crowdsourcing in SF
- Community: There’s an old adage that 99% of internet users consume content, while only 1% create it. In Transit’s case, more than 10% of SF riders were creating real-time data. This is a crazy engagement rate (you guys are crazy awesome) and came as a wonderful surprise.
- Data frequency: Our crowdsourced data is updated much more frequently than MUNI’s. We update vehicle locations every second vs. every minute or so in MUNI’s case.
- Data volume: We’ve crowdsourced up to 45% of trips on SF’s busiest lines. The N-Judah train is our most popular.
Reasons to put crowdsourcing on hold in SF
- Merging issues: our transit tracker updates positions every second, while MUNI’s tracker updates every minute. So what happens when our data and MUNI’s data disagree? If we merge the two reported positions into one bus on the map, but it turns out there were actually two buses, one might pass by without you knowing. On the other hand, if we judge that the two positions really are two different buses, but they’re actually the same bus, you’ll end up seeing “ghost buses” on the map, and nobody likes that. We’re hard at work on a solution to perfect merging.
- Predictions: our prediction algorithm works pretty well with crowdsourced vehicle locations because vehicle locations are being updated every few seconds. But MUNI doesn’t update vehicle locations that quickly, so our current prediction algorithm doesn’t work as well for their data as it does for ours. We couldn’t have both prediction engines running side-by-side, and right now, MUNI’s prediction accuracy is better.
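To make the merging trade-off concrete, here’s a minimal sketch of one way to decide whether a crowdsourced report and an agency report describe the same vehicle. This is not our production logic. The `Report` type, the `probably_same_vehicle` function, and the `max_speed_mps` and `gps_slack_m` defaults are all made up for illustration. The idea: two reports on the same route and direction probably belong to one bus if the distance between them is no more than a bus could plausibly have covered in the time between the two reports.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass(frozen=True)
class Report:
    """One vehicle position report (hypothetical shape, for illustration)."""
    route: str
    direction: str
    lat: float
    lon: float
    timestamp: float  # seconds since the epoch


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def probably_same_vehicle(
    crowd: Report,
    agency: Report,
    max_speed_mps: float = 15.0,  # assumed top speed, not a measured value
    gps_slack_m: float = 50.0,    # assumed allowance for GPS noise
) -> bool:
    """Guess whether two reports describe one vehicle."""
    if (crowd.route, crowd.direction) != (agency.route, agency.direction):
        return False
    # The agency report may be up to a minute older than the crowdsourced one,
    # so allow for the distance the vehicle could have travelled in that gap.
    age_gap_s = abs(crowd.timestamp - agency.timestamp)
    reachable_m = max_speed_mps * age_gap_s + gps_slack_m
    gap_m = haversine_m(crowd.lat, crowd.lon, agency.lat, agency.lon)
    return gap_m <= reachable_m
```

The thresholds are where the two failure modes trade off. Set them too loose and two real buses get merged into one, so one might pass by without you knowing. Set them too tight and a single bus shows up twice, which gives you a ghost bus.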
Deferring to MUNI’s real-time predictions feels like the best choice for our riders in SF… for now. We’ll obviously be perfecting our crowdsourced real-time tracker in all the cities where “good enough” real-time doesn’t exist (like Montreal, Victoria, and NYC’s lettered subway lines) so that when we re-launch in SF (and everywhere else), everything will be perfect.
SF has validated our hunch that crowdsourcing is a viable (and hyper-accurate) way of collecting transit data. With more than 10% of users helping us crowdsource, and up to 45% of trips being crowdsourced on some lines, we couldn’t have imagined a better first month.
But while we’re pleased, we’re not complacent. We know that in order for crowdsourced data to properly complement (and compete with) agency-provided data, it has to be available for 100% of lines, 100% of the time. Thankfully, our crowdsourcing chefs are cooking up a bunch of cool new features that will make it easier to improve real-time data (and more addictive to share it).
There are lots of hard problems yet to be solved. We’re confident that we can do it. But we still need all the help we can get. So shoot us your resume!
Cya soon, San Francisco.