Running with the pirate hunters: How AI is creating an online piracy arms race
By the time the first boxing bell has rung, or whistle is blown, the pirate hunters have begun. Before a punch is thrown or goal is scored, they’re tracking illegal streams, scouring sources, issuing takedown notices. When the match is over, they’ll want as many pirates sunk as possible.
Long gone are the days of stealing videos or filming in cinemas; anti-piracy measures have changed immeasurably since the advent of online streaming. When a boxing match, a football game or a fantasy drama airs, it does so into a world where an illegal channel can swiftly divert it to millions of non-paying viewers. For those committing piracy, the rewards can be worth thousands. For those hunting these pirates, defending intellectual property has never felt so urgent.
“If you look at some of the stats,” says Peter Oggel, vice president of technology for anti-piracy firm Irdeto. “Game of Thrones season seven was pirated more than a billion times. The Mayweather McGregor fight; we saw close to 3 million illegal viewers. It’s a quick calculation to see how much money is lost in that.”
Online piracy is big business, and when it comes to sport, it’s a fast business. After a match has started, those searching for pirates have the length of the game to identify and take down nefarious streams. All of this is made more difficult by the relative accessibility of modern piracy. Gone is the dominion of the torrent; now all you need is a streaming box and an internet connection.
“The growth of piracy devices, add-ons, and apps is the most serious emerging threat to the legal marketplace for content, including films, television shows, sports and news programs as well as a potential danger to consumers by spreading malware,” echoes a spokesperson for the anti-piracy group, Alliance for Creativity and Entertainment (ACE), which counts Netflix, HBO and Disney amongst its members.
A 2017 survey of 1,500 millennials, by SMG Insight and commissioned by the BT Sport Industry Awards, found that 54% had watched illegal streams of live sports. A third admitted to regularly watching pirated content, compared to only 4% of over-35s. The survey also found only 12-24% of 18-24-year-olds had a subscription to a service such as Sky or BT.
“Unless we are careful we will have a generation of young people who consider pirated sports content to be the norm,” the chairman of the Sport Industry Group (SIG), Nick Keller, said at the time. “That’s a significant challenge not just for rights holders but the whole sector – from sponsors and athletes to ticket holders.”
Given the scale these figures suggest, those responsible for fighting piracy are struggling to throw enough manpower at the problem. With the culture of pirated sport becoming a norm, there simply aren’t enough eyes to check the internet for content that’s somewhere it shouldn’t be. The eyes of a machine, however, are another matter.
Looking for logos
Irdeto, part of the multimedia conglomerate Naspers, is using convolutional neural networks to trawl the internet for sports games. Say, for example, there’s a match between Barcelona and Chelsea being shown on Sky Sports. Irdeto uses AI to hunt for unauthorised copies of that transmission that have been fed into illegal, ad-based channels. Currently, the most common way to do this is by teaching the artificial intelligence to recognise specific broadcaster logos.
“What the team did was set out to create a large data set of all possible channel logos,” explains Oggel. “I think we got dozens of channels with hundreds of thousands of samples, that led to a complete training data set of more than 3 million samples.”
The types of machine-learning models Oggel is talking about are typically trained to recognise day-to-day images. A box of cornflakes on a table, for example, or a piano. The issue with illegal streams of sport is having to negotiate different resolutions, aspect ratios or zooms. Sometimes a logo for a broadcaster will flash up on the screen or be obscured by other items, or appear on the microphone of a post-match interview.
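To make the resolution and aspect-ratio problem concrete, here is a minimal sketch of how one logo sample could be turned into several training variants by rescaling and stretching it. It is an illustration only, not Irdeto's pipeline: the function names `resize_nearest` and `logo_variants`, the scale and aspect-ratio values, and the tiny 4x4 "logo" are all assumptions made for the example.

```python
def resize_nearest(image, new_width, new_height):
    """Resize a 2D grid of pixel values using nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[y * height // new_height][x * width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


def logo_variants(logo, scales=(0.5, 1.0, 2.0), aspect_ratios=(0.75, 1.0, 1.33)):
    """Return rescaled and horizontally stretched copies of one logo sample.

    The scale and aspect-ratio values are illustrative assumptions.
    """
    height, width = len(logo), len(logo[0])
    variants = []
    for scale in scales:
        for aspect in aspect_ratios:
            # Round to the nearest whole pixel, never below one pixel.
            new_height = max(1, int(height * scale + 0.5))
            new_width = max(1, int(width * scale * aspect + 0.5))
            variants.append(resize_nearest(logo, new_width, new_height))
    return variants


# Hypothetical 4x4 "logo": 1 marks logo pixels, 0 marks background.
logo = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
variants = logo_variants(logo)
print(len(variants))  # 9: three scales times three aspect ratios
```

A real training set would start from genuine screenshots and would also need partial occlusion and logos that only flash up briefly, which this sketch does not model.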
“The other challenge you have with screenshots is that there might be logos on the billboards alongside the field,” notes Oggel. “Of course, you do not want the machine to mistakenly pick those up.”
Even when the pirate hunters tweak their system well enough to effectively recognise logos, flag the stream as an illegal piece of content and send out a takedown notice, the pirates often manage to stay one step ahead.
“The pirates have realised what we’re doing,” says Rory O’Connor, senior vice president of cybersecurity services for Irdeto. “What they’ve started doing is blanking the logos. The more mischievous ones are actually putting on other logos of other broadcasters.
“That’s where the next phase of the machine learning project comes in,” he adds. “We’re actually trying to teach [the system] to recognise things like football strips so it can actually determine which game is on from seeing Barcelona’s colours, or whomever else’s.”
The challenge with machine learning, O’Connor continues, is that you can give it a problem, and it will solve the problem, but then you have to tell it what the next problem is. This is where a team of human analysts comes in, investigating the latest developments from pirates on the dark web and feeding these back to the AI developers. “It’s a continuous battle,” he laughs. “Today the analysts are quite often hired on their knowledge of football leagues rather than specialist anti-piracy skills.”
Looking for faces, looking for leaks
Recognising team strips might be useful for some sports, but not for others. “Boxing is a big problem because they don’t have many clothes on,” chirps O’Connor, before Oggel mentions they’re also experimenting with facial recognition for identifying pirated streams (“So before they’re beaten up, maybe we’ll be able to recognise their faces”).
For content that isn’t a live event, like Game of Thrones, there is less time pressure to take down individual pirates, so the emphasis is on using the artificial intelligence to hunt down the source of the leak by recognising hidden watermarks in the video file.
“We train the system to look for watermarks so we can find where in the distribution chain the content is leaking from,” says O’Connor. “That’s also a continuous battle because the pirates start to crop the video or reduce the quality to remove the mark. We’ve even had commercial piracy devices that splice the video. What they will do is take the video from a number of sources and splice it together.”
He goes on to mention that some pirates will even take audio from one source and video from another, just to confuse their system: “It’s an arms race”.
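The idea of tracing a leak through a hidden watermark can be illustrated with a toy example. The sketch below is not Irdeto's system, which the article does not detail. It assumes a simple scheme in which each distributor's copy carries its own pseudo-random pattern of small brightness shifts, and the rights holder compares a leaked copy with the unmarked master. The distributor IDs, `STRENGTH`, `STEP` and all function names are hypothetical.

```python
import random

STRENGTH = 6  # how far the mark shifts each pixel value (assumed)
STEP = 8      # quantisation step that mimics a quality reduction (assumed)


def pattern_for(distributor_id, length):
    """Derive a repeatable +1/-1 pattern from a distributor ID."""
    rng = random.Random(distributor_id)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed(pixels, distributor_id):
    """Add the distributor's pattern to a copy of the master pixels."""
    pattern = pattern_for(distributor_id, len(pixels))
    return [p + STRENGTH * s for p, s in zip(pixels, pattern)]


def degrade(pixels):
    """Mimic a pirate lowering quality by coarsely quantising pixel values."""
    return [((p + STEP // 2) // STEP) * STEP for p in pixels]


def trace_leak(leaked, master, candidate_ids):
    """Return the candidate whose pattern best matches leaked minus master."""
    difference = [l - m for l, m in zip(leaked, master)]

    def score(distributor_id):
        pattern = pattern_for(distributor_id, len(master))
        return sum(d * s for d, s in zip(difference, pattern)) / len(master)

    return max(candidate_ids, key=score)


# Hypothetical master frame: 4,096 grey values kept away from 0 and 255.
rng = random.Random("master-frame")
master = [rng.randint(32, 223) for _ in range(4096)]

candidates = ["distributor-a", "distributor-b", "distributor-c"]
leaked = degrade(embed(master, "distributor-b"))
print(trace_leak(leaked, master, candidates))  # distributor-b
```

This toy only models a drop in quality. Cropping, splicing several sources together or swapping audio tracks, as O’Connor describes, would break the pixel-by-pixel alignment it relies on, which is part of why the contest keeps escalating.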
Nick Matthew, investigations manager for the intellectual property protection organisation FACT, ultimately characterises this arms race as one born of the ubiquity of interconnected devices: “Technology means more people than ever have a device of some sort or another.
“If you go back 15 years, most people had a computer in their home. Now, one person could have three or four different devices. Access is far greater, and as a result you are now seeing huge amounts of people from all over the world being able to go into broadcasts and [illegally] access them.”
As the pirate hunters use machine learning to automate their processes, will the unprecedented demand for free content encourage pirates to find new workarounds – perhaps even using machine learning to automate their own operations? The hope is that those with the funds and skills to artfully circumvent AI-based systems like Irdeto’s will be few and far between, but it goes to show how persistent the piracy industry is, and how novel measures like image recognition will need to face new countermeasures from the pirates.
All in the space of a handful of minutes, before the final bell rings.