Inside TransportAPI: how open data is helping you catch the bus
The number 14 bus gets stuck in traffic on Fulham Road for 20 minutes, arriving later than anticipated at South Kensington Tube station. Just as you’re getting off the bus, Citymapper alerts you that the District line has been suspended between Earl’s Court and Embankment, meaning you’ll almost certainly miss your train from Victoria. The app directs you up the Piccadilly line to Green Park and back down to Victoria, where you can catch the next train – if you hurry.
Feeding all this information to you from one app is no small feat of software engineering. It requires accurate real-time data from at least three sources (bus, Tube and train), all of which use different formats, location IDs and semantics. Enter TransportAPI, a five-year-old British company that is the only single source of public transport data in the UK. Whether you’re leaping onto a bus in Bristol or catching a cab in Canary Wharf, TransportAPI aggregates, harmonises and distributes the data, meaning you’ll know when you’ll reach your destination and how much it will cost.
TransportAPI is already a source of valuable data for travellers, but it wants to do more. Where are the empty seats on the top deck of the bus? Is it cheaper to hire a car and drive to Manchester or take the train? TransportAPI wants to make your apps, whether you use them as a consumer or build them as a developer, much more informative.
Making the most of open data
It only takes a glance at the Tube map to realise that running a transport system is a complicated business. The morass of real-time data produced by the country’s various transport systems is even more byzantine, which is why academics Jonathan Raper and David Mountain decided someone needed to make sense of it all.
Transport Buzz displays a map of transport-related tweets
In 2010, they set up TransportAPI, with the aim of knitting together all the real-time data from the various transport companies. “A lot of open data is not well documented,” Jonathan Raper, co-founder and managing director of TransportAPI, told Alphr. “We specialise in trying to understand the syntax and semantics of it.”
“A lot of the digital infrastructure around those services is relatively poor,” added Raper. “You may have all the sensor data you can shake a stick at, but the references that tell you where those sensors are located are not in as good shape. So we maintain a lot of infrastructure data, reference lists and lookup tables of different identifiers that are used by different services.”
A key part of TransportAPI’s job is marrying overlapping data from different sources. For example, bus and Tube services may serve the same location, but the two data feeds will use different identifiers for the same stop. TransportAPI pulls this information together so partner app providers such as Citymapper don’t have to. “We do a lot of harmonisation, error checking, validation and organisation,” said Raper.
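To make the idea concrete, here is a minimal sketch of that kind of harmonisation, written in Python. It is not TransportAPI's code: the feed names, stop identifiers, shared location keys and the `harmonise` function are all invented for illustration. It only shows the general shape of a lookup table that maps each feed's own stop ID to one shared location.

```python
from dataclasses import dataclass

# Hypothetical lookup table: (feed name, feed-specific stop ID) -> shared location.
# Every identifier below is made up for this example.
STOP_LOOKUP = {
    ("bus", "B-1001"): "south-kensington",
    ("tube", "T-SKN"): "south-kensington",
    ("bus", "B-2002"): "victoria",
    ("rail", "R-VIC"): "victoria",
}


@dataclass(frozen=True)
class Departure:
    feed: str      # which data feed the record came from
    stop_id: str   # the identifier that feed uses for the stop
    service: str   # route or line name
    minutes: int   # minutes until departure


def harmonise(departures):
    """Group departures by shared location; set aside unknown stop IDs."""
    by_location = {}
    unmatched = []
    for dep in departures:
        location = STOP_LOOKUP.get((dep.feed, dep.stop_id))
        if location is None:
            # Validation step: an ID missing from the reference list is
            # flagged rather than silently dropped.
            unmatched.append(dep)
            continue
        by_location.setdefault(location, []).append(dep)
    # Soonest departure first within each location.
    for deps in by_location.values():
        deps.sort(key=lambda d: d.minutes)
    return by_location, unmatched
```

Given one bus record and one Tube record that use different IDs for South Kensington, `harmonise` returns both under the same key, so an app can show them side by side without knowing either feed's numbering scheme.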
The journey-planner app makers could get all this information for themselves – often for free – but they would rather pay TransportAPI to provide all that data in one go. “We can aggregate,” said Raper, “which means developers don’t need to sign agreements with lots of different data providers and try to make sense of the different formats. In our service, once you’ve used one interface, you can use it in every transport area. If you build a bus app and then decide to add trains, all you have to do is change the URL to ‘/train’.”
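What a uniform interface of that kind might look like from the developer's side can be sketched in a few lines of Python. The base URL, path layout, stop codes and `limit` parameter here are placeholders invented for the example, not TransportAPI's documented endpoints. The point is only that the transport mode is a single path segment.

```python
from urllib.parse import quote, urlencode

# Placeholder base URL; the real service's address and paths may differ.
BASE_URL = "https://api.example.invalid/v1"


def departures_url(mode, stop_code, **params):
    """Build a departures URL in which the transport mode is one path segment."""
    path = f"{BASE_URL}/{quote(mode, safe='')}/stop/{quote(stop_code, safe='')}/departures"
    # Append a query string only when parameters were supplied.
    return f"{path}?{urlencode(params)}" if params else path


# Hypothetical stop codes: the same function serves both modes.
bus_url = departures_url("bus", "ABC123", limit=5)
train_url = departures_url("train", "VIC", limit=5)
```

Under these assumptions, moving an app from buses to trains means passing a different `mode` and stop code, while the code that fetches and parses the response stays the same.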
Raper claims that making sense of the various transport companies’ data is the biggest technical challenge his team faces. The company maintains databases based on “some very complex and difficult-to-handle data formats such as TransXChange, which is the representation of transport timetables. It has exceptionally difficult semantics associated with it. One of the dev team quipped that it’s so sophisticated it’s capable of encoding interplanetary travel timetables as well as the number 46 bus.”
Some transport companies even sabotage their data to stop the public or their rivals having access to it, according to Raper. “There are situations where things are deliberately obfuscated,” he said, accusing some transport operators of trying to protect their monopolies. “It’s a weapon, an example of a 21st-century digital competition tool for large organisations – to try and make their data very difficult to consume. There may be regulatory reasons why they’re being forced to release it, but there are operational reasons why they don’t want other people to see what they’re doing and how they’re doing it.”