TOSG wrote: Okay, here's my explanation. I wish that I had some brilliant, razor-like solution, but I'm afraid that I just used logic and persistence, with a few guiding principles.
1) Looking at a BLAST (http://130.14.29.110/BLAST/ ; look at "nblast") alignment of the sequence, I saw that there were three large regions of human DNA, and two regions of DNA sequence that did not match any known DNA source.
2) So, I hypothesized, as before, that the unmatched DNA was likely where the message was encoded (as Walter would have free rein to encode any message he liked there, rather than being constrained to a given sequence of human DNA).
3) Thus, I manually removed most of the regions that had homology with the human DNA, so I could focus on the (in my estimation) important part of the sequence.
4) So, I made a restriction map of this sequence (on the NEBCutter 2.0 program), showing all the restriction enzymes that cut the sequence.
5) I noticed that BamHI cut this sequence exactly twice (BamHI is a commonly used enzyme, so that tipped me off), and that cutting with it removed the rest of two of the human DNA regions. So, I cut the DNA with BamHI.
6) I then looked at this fragment in NEBCutter, again.
7) Given the clue that TaqI might be used (see my previous post), I searched for that, and found that it cut three times.
8) I counted that Walter had given instructions for decoding 61 letters, so, at three bases per letter, I looked for a DNA fragment that was 61*3 = 183 bases long.
9) Awesome! I found that two of the TaqI cut sites combined to leave behind a 183-base-pair sequence!
10) I put this sequence into a protein translator, and then ground away at decoding the letters, according to the same principles as last time.
So, there you have it! Probably not the most elegant way to do it, but it worked, and pretty fast.
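For anyone who would rather script the fragment hunt in steps 7 to 9 than read it off a restriction map, here is a rough Python sketch. It is only an illustration, not what was actually run: the function names (cut_positions, fragments_of_length) are made up for this example, and the input is a toy sequence rather than Walter's. The one hard fact it relies on is that TaqI recognizes TCGA and cuts after the T. It checks every pair of cut sites, not just neighbouring ones, since the 183-base piece only has to lie between two of the three sites.

```python
import re

# TaqI recognizes TCGA and cuts after the T (T^CGA), so every fragment
# to the right of a cut begins with CGA on the top strand.
TAQI_SITE = "TCGA"
TAQI_CUT_OFFSET = 1


def cut_positions(seq, site=TAQI_SITE, offset=TAQI_CUT_OFFSET):
    """Return the 0-based index where the fragment after each cut begins."""
    # A lookahead pattern reports every occurrence, including overlapping ones.
    pattern = "(?=" + re.escape(site) + ")"
    return [m.start() + offset for m in re.finditer(pattern, seq.upper())]


def fragments_of_length(seq, target_len, site=TAQI_SITE, offset=TAQI_CUT_OFFSET):
    """Return every stretch between two cut sites that is target_len bases long."""
    cuts = cut_positions(seq, site, offset)
    hits = []
    for i, start in enumerate(cuts):
        for end in cuts[i + 1:]:
            if end - start == target_len:
                hits.append(seq[start:end])
    return hits


# Toy input (NOT Walter's sequence): two TaqI sites placed 183 bases apart.
toy = "gg" + "tcga" + "a" * 179 + "tcga" + "gg"
hits = fragments_of_length(toy, 61 * 3)
print(len(hits), len(hits[0]))  # prints "1 183"
```

Because TCGA reads the same on both strands, scanning one strand is enough to find every TaqI site.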
For the curious, the DNA sequence is:
cgacatatgagcatggcgaccagcaccttcagtgcgcagtgtggcccggagcatcattacctggctgaacattcttctatttttaatggcgtcttcagccagcagcttaaaaacaaccttatctactttccctcctcctactttccctcctcccgcttgagggtaggccccatcccccccttt
And the protein sequence is: R H M S M A T S T F S A Q C G P E H H Y L A E H S S I F N G V F S Q Q L K N N L I Y F P S S Y F P S S R L R V G P I P P F
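If you want to check the translation yourself without an online translator, a minimal Python sketch follows. It assumes the standard genetic code and reading frame 1 (starting at the first base of the fragment). The names CODON_TABLE, FRAGMENT and translate are just labels chosen for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The 183-base TaqI fragment quoted in the post.
FRAGMENT = (
    "cgacatatgagcatggcgaccagcaccttcagtgcgcagtgtggcccggagcatcattacctggctgaa"
    "cattcttctatttttaatggcgtcttcagccagcagcttaaaaacaaccttatctactttccctcctcc"
    "tactttccctcctcccgcttgagggtaggccccatcccccccttt"
)


def translate(dna):
    """Translate DNA in frame 1, one letter per codon; trailing bases are ignored."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate(FRAGMENT)
print(len(FRAGMENT), len(protein))  # prints "183 61"
print(" ".join(protein))
```

The second line printed matches the 61-letter protein sequence given above, one letter for each of Walter's decoding instructions.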