Splunk and the Squeaky Dolphin: when Big Data goes rogue
The oddly named firm’s products analyse “Big Data”. As it claims on its website: “by monitoring and analysing everything from customer clickstreams and transactions to network activity and call records—and more—Splunk turns machine data into valuable insights no matter what business you’re in.” Even if that business is snooping.
It should come as no surprise, then, that Splunk has found a place in the biggest Big Data haul of all: GCHQ apparently uses its data-sentiment analysis software to figure out what people are thinking online. (Indeed, Splunk advertises on its site that the US Department of Defense and Homeland Security use its products, so it really should come as no surprise.)
According to one of the day’s Snowden leaks, GCHQ has been scooping up online data, such as YouTube views and Facebook Likes, and using Splunk to analyse that Big Data to try to predict trouble.
The programme, which honestly is called “Squeaky Dolphin”, collects online activity in real time, according to an NBC News report, going so far as to tap fibre cables to uncover what we’re doing on YouTube, Facebook and Blogspot. Though individuals’ data can be extracted, Squeaky Dolphin is “not interested in individuals, just broad trends”, according to one of the leaked slides. (Although, this being GCHQ, take that claim with a grain of salt.) It’s looking for Big Data, not Small Bits of Data.
It’s not much of an advertisement for Big Data that GCHQ’s examples come off as rather silly. It’s also worth noting that the slides in question were taken from a presentation to the NSA, so this would have been GCHQ’s attempt to impress its peers with its accomplishments, and it would likely have included the best examples possible.
So what insights has this amazing data-analysis tool revealed? How different browsers reflect different types of user. Apparently, Internet Explorer users show the least “openness to experience” and the most “agreeableness”, while Firefox users are a shade more likely to show “neuroticism”.
It’s unclear where the information comes from, or whether it is indeed the result of GCHQ’s own Splunk-based analysis, but GCHQ has decided that it’s important. Certainly I feel a lot more confident about the war against terror.
Much of the work is more serious, and is partly intended to make up for GCHQ’s failure to spot the growing online tensions in the Middle East ahead of the 2011 Arab Spring. One slide notes that it picked up web activity relating to rallies in Syria and Bahrain the day before they took place.
Scanning Twitter, public-facing Facebook pages and YouTube for certain keywords to see where the next flashpoint is going to be isn’t a stupid idea, but it’s surely possible to do that with public information. Tapping fibre cables and Google’s systems to develop questionable “sentiment” intelligence is yet another step too far, not least given the apparent results.
Indeed, it’s easy to question whether it’s even possible to analyse so much data (a real-time feed of Facebook, YouTube, Blogger and more is going to be rather busy) and get a meaningful result, though Splunk’s marketing department would likely disagree.
We asked the company for comment. Sherry Lowe, vice president and corporate spokesperson, said: “Splunk takes privacy very seriously. As with any software used by thousands of organisations around the world, we typically do not have visibility into how individual customers may use our products.”
Well, now Splunk knows how one of its customers uses its products, as do the rest of us, and it’s not a great ad for Big Data.