
Other tape formats were created that were better suited to storing teletext data, such as S-VHS in the late 1980s – but they never really took off like VHS. This means that even if it is easier to extract data from S-VHS, there is much less potential source material in the wild.
In 2011, coder Alistair Buxton found a pile of old VHS tapes in his attic, and having previously messed around with open-source DVR app MythTV, he started to wonder if he could extract teletext data from the dusty tapes.
“You can think of teletext like a barcode, but one pixel high – a series of black-and-white lines,” Alistair told me. Unfortunately, it’s not stored on the tape quite so simply. “After I captured some, I could immediately see that it wasn’t just black or white any more. Instead of solid lines, everything was blurred together.”
It was at this point that Alistair had a moment of genius – and realised that separating the signal from the noise is exactly the problem that barcode readers in smartphones have when the camera isn’t focused correctly. So to clean up the captured data he was able to run it through a deblurring algorithm.
The resulting data was better – but it still contained plenty of errors. Because the deblurring algorithm has to make so many guesses, it sometimes guesses incorrectly and picks out the wrong letters.
So, how to mitigate this? Brilliantly, thanks to the way teletext works, there’s an almost built-in checking system: simply recording for longer.
Bamboozle
Teletext was encoded into the broadcast TV signal, and under perfect circumstances you may need only five minutes of footage to capture every page, as five minutes should be long enough to include every subpage on carousel pages. Those were the ones that had “(3/5)” in the corner, where you’d wait what felt like a lifetime for the page to switch automatically to the next. The reason you had to wait – and the reason it was especially torturous if you were searching for a flight on a Teletext Holidays page with 70 subpages – is precisely because the data for each page was embedded in the live TV image and changed at regular intervals. When page one switched to page two, you’d have to wait for every single subpage to be broadcast in turn before you’d get back to page one.
But as we know, VHS recordings are of a much lower quality than the original broadcasts. And this is where a clever method of error correction comes in. Jason Robertson, another hobbyist who uses and has built upon Alistair’s software, tells me that he tends to capture 20-45 minutes’ worth of footage.
What this means is that when the data is slowly crunched through by his computer, each text page is captured a number of different times, and this allows for a form of error correction. Just as the last digit in a barcode is a check digit, making it possible to use mathematics to detect a misread, Alistair’s software compares the different versions of the semi-complete pages to fill in the blanks. For example – if two captures of a page contain “CEEFAX” and one contains “CEEFAB” in the same place, it’s a safe bet that “CEEFAX” is the correct word.
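The “CEEFAX” comparison can be sketched as a per-character majority vote. This is only an illustration of the idea, not Alistair’s actual code: the function name `merge_captures` is hypothetical, and it assumes each capture of a page has already been decoded into a string of the same layout.

```python
from collections import Counter


def merge_captures(captures):
    """Combine several noisy captures of the same text.

    For every character position, keep whichever character
    appears most often across the captures (a majority vote).
    """
    if not captures:
        return ""

    # Pad shorter captures with spaces so every position lines up.
    width = max(len(capture) for capture in captures)
    padded = [capture.ljust(width) for capture in captures]

    merged = []
    for column in zip(*padded):
        # column holds the characters seen at one position, one per capture.
        best_char, _count = Counter(column).most_common(1)[0]
        merged.append(best_char)
    return "".join(merged)


# Two captures agree, one has a misread final letter.
print(merge_captures(["CEEFAX", "CEEFAB", "CEEFAX"]))  # CEEFAX
```

The more captures there are of each page, the more likely the correct character is to win the vote, which is why recording for longer helps.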
To double-check this data, Jason has written his own teletext-editing software, along with code that generates a complete archive of the day’s pages in HTML, so you can go and relive 23 October 1998 as if you really were watching TV that day.
Fastext
Unfortunately, extracting text pages is hard work. Once the video has been captured from a tape, it takes about a day for a computer to crunch through a 5GB data file. But the future of teletext archaeology could be about to receive a massive speed boost.