The news that the Government wants to set up a super-huge datacenter is somewhat worrying. Apparently, it wants to record all email traffic and probably every website you and I ever visit. You can bet that each file you download will be scrutinised for copyright-bashing illegality, and just wait till it spots what’s going on with your web browsing. See that slightly dodgy picture? That’s child porn, that is. Any minute now, the black helicopters will be circling overhead and the stormtroopers will explode.
Maybe I’ve been watching too much of the latest 24 box set, but it’s clear that some people not only believe all this computerised mumbo jumbo is possible, but that grotesque lack of respect for human rights is justified too. I’m not confusing the hormonally overloaded whimsy of Hollywood writers with the dim-witted ramblings of a politician. But there’s no doubt that there has been, in the past and present, significant crossbreeding between the two. Why try to explain a dodgy policy when TV will do it for you?
Of course, I don’t mind if The State decides it wants to read my emails. That it will read this column six weeks in advance of you is unfortunate, but I’m sure you’ll agree that all content needs to be thoroughly checked for correctness, both spelling and political.
What galls me is that this will do absolutely nothing to halt bad people in their evil ways. Do we really expect would-be bombers with conspicuously named email addresses to be emailing their buddies at equally conspicuous ones? Is it likely that someone will send, in a plain-text email: “It’s the number 27 bus, bomb will be on the top floor behind the first row of seats on the left. But it’s a secret, okay?”
Of course not. Liverpool has a marvellous cathedral and the chaffinch sings a wonderful song. That might not mean anything to you or me, but it’s perfectly good English. It could be code for “the uranium is in Manchester, and the bombing group is ready and waiting”. Words mean what we have agreed they mean. “Liverpool has….” is a pretty innocuous and bland phrase unless you know it has a special meaning. In that case the literal content of the message is irrelevant; what matters is the meaning the sender attaches to it and the recipient reads at the other end.
So just what sort of data mining is going to be possible with this huge data store? Well, to answer that we look to our friends at companies such as Google. I’m not suggesting that a search for “terrorist” will come up with 15 hits, all telling you where you can buy a terrorist at a bargain price, although that seems to be the usual response from Google these days. No, indexing data enables fast lookups and efficient pattern matching.
The question is how much of this can actually be joined up. I know it’s easy on 24: “just open a socket to my screen for that” and, as if by miracle, a cross-match has been made between border patrol immigration data and the sale of some underwear in Des Moines, Iowa. First, there’s the grating nonsense of computer operators handing passwords around to each other. Would you really have a password such as “AT1425” to gain access to all the Department of Defense data? Ah, it’s that clever socket thing again, isn’t it?
The reality is that data needs to be curated: sorted and structurally managed. Some stuff is well curated. I’m quite sure, for example, that my bank records are in a reasonably well-structured format. But what about my email accounts? If you don’t know I have them, how do you know where to look? In a world where I can create 14 new email aliases on free servers across the globe in the time it takes a CTU operative to drink their latte, how will anyone know where to start?