For the past week, I’ve been holed up nights in a warehouse attempting to print the human genome on punched tape. Punched tape is an old data storage medium that encodes binary data by punching holes into paper tape that is typically rolled up like a reel of film. The reference human genome weighs in at some 3.2 billion base pairs and 3.2 billion is a big number. Really big. After a week of off-and-on printing with two punch machines, I’ve used up around 50 rolls of paper and the the work area is already starting to fill up but I’m not even 5% through one measly little chromosome yet.
I’m going to continue printing a few days a week until around March 17th. By that point, I expect the paper tape to have created quite an impressive jumble of genetic code.
There are a few ways to you can follow along:
- Twitch. No streaming schedule but I should be printing a few days a week for a few hours at least.
- Twitter. Maybe I’ll post stuff about this project but don’t get your hopes up.
- If you are around Seattle, stop by an open house from 1-5 PM on March 17th. More information to come.
This project uses a pair of tabletop paper tape reader+punch combo units made by GNT in the early 80s, although even back then punched tape was pretty dated. These machines use eight holes per row of data, with each hole encoding one bit of information (so one byte per row). At ten rows of holes per inch, storing 1MB requires about 2.5km of tape.
To make matters worse, I’m printing a human readable encoding of the human genome. Here’s how the four bases we all know and love are encoded:
- A = hole in column 2
- C = hole in column 4
- G = hole in column 6
- T = hole in column 8
The sequence “GATTACA” for example is printed as:
Being able to see and touch bits of data was a key appeal of paper tape, and the human readable encoding is very much in that spirit. Besides, it’s not like this project was ever about being practical anyways.
I’m using sequences from GRCh38 in the FASTA format. I originally wanted to sequence my own genome and print that out, but none of the companies I emailed about this ever responded. (Even better: a friend and I could both sequence ourselves and then the computer could combine them randomly! although I’m not sure I could endorse such a lewd performance)
FASTA files are not purely sequences of the letters
T but also encode ambiguity with additional letters such
Y (which represents a
T). By far the most common ambiguity in GRCh38 is
N, which stands for any one of
For the print, the ambiguities are just treated like bitwise or operations, so a
Y prints a hole in column 4 for the
C and one in column 8 for the
T. Therefore the FASTA sequence
CKTYGN is printed as:
A very simple python script reads the genetic data line by line, encodes it into a bit pattern for printing, and sends the binary data to punch machines over a serial connection.
No one manufactures real punched tape any more, so I had some rolls custom made and shipped across the country on a pallet (thanks Papertec!) The ten inch diameter reels I am using hold around 2000 feet or so of tape, and take about one hour and ten minutes to print. This means that printing the entire human genome—with its decadent 3.2 billion base pairs—would require approximately 13,300 reels of paper tape with a cumulative length of some 5000 miles. Printing would also take two years running two punches 24/7. (This is just another reason why I need some art school interns. The time between tape changes is pretty damn close to one hour and eight minutes too…) Sadly though, as I only have 500 reels of tape to work with, the best I can hope for is under 4% of the genome. More realistically, I may make it through 200 rolls in my month of printing.
While working on this project it was rather disheartening when I would tell people about it, their first response is usually either asking, “why?” or inquiring, “so how do plan to make money with that?” Because why not? Isn’t slowly filling a room with a paper tape genetic code a good enough reason on its own? And now that the project is finally happening, it is already meeting my expectations.
Checkout twitch if you are interested in tracking progress. And if you’re in the Seattle area, stay tuned for info about the open houses. In the meantime, I’ll be camped out nights in an unheated, creepy warehouse binge watching movies like Blood Salvage and Neon Maniacs while attempting to print the human genome at 600 baud. Sounds pretty typical actually.