Dispatch from the pit

It happened today. Out running along the lake in a cold drizzle, I felt it. The low pit of winter is past. We’re on the upslope to spring. There was a huge gaggle of cormorants (is gaggle the right term for cormorants?) on the fake island in Preverenges. They must be on their way to Scandinavia. They must be feeling it, too. (I took this picture the day before.)

I know it officially happened on December 21, when the balance of dark versus light hit bottom and the slow climb back into the sun began once again. But January is usually still too dark and cold and, well, winter for it to register. Today, however, despite the clouds and the rain, I finally feel like I’m climbing out of the hole.

You want fries with that?

Gydle has been silent the entire month of November. No excuses, I just didn’t have anything to say. Then I woke up this morning and my brain was teeming with ideas. Was it something I ate?

First, I have a great gift idea.

I got an e-mail the other day from “American Gut.” Imagine my excitement! The Human Food Project is live on IndieGoGo. For only $99 and a stool sample, you can get a list of the microbes colonizing your gut. Upscaling is a bargain – it’s $180 for two samples, $260 for three and a mere $320 for a family of four!

Holy hairballs, Batman! It’s not junk after all!

Greetings from hibernation nation. I did say I’d come out if something really big happened. Guess what? One of my current scientific obsessions was Big News today! No, don’t go away – it’s not the microbiome. It’s my other obsession: junk DNA. I’ve written about it before, here and here and here.

In a stunning “no doh?” development, a vast international array of researchers has discovered that the 99% of the human genome that was considered “useless junk” isn’t junk after all.

Spiraling down the double helix

It’s almost mid-month. I’m at about 19,000 words, about 3,000 words behind my carefully calculated NaNoWriMo goal. (I’ve made an Excel spreadsheet.) I took my blood pressure yesterday and realized that stressing about keeping up with my self-imposed word count is not helping anything. In fact, my scientific approach to this endeavor – just 2,000 words a day, gives me 5 days off – is totally ludicrous. Creativity doesn’t work like that. I should know better.

Take yesterday. The central theme of my plot involves people researching induced pluripotent stem cells. Informing myself on this seemed the sensible thing to do, so I started reading. Now, I don’t know about you, but it seems to me that the topic of stem cells has been around forever. But did you know that human embryonic stem cells were only discovered in 1998? We’ve only known about these suckers for a dozen years?

Watermarks


I’m not all that thrilled that my first post here is a techie one. I was kind of hoping I could write about flowers or something. But Mary was so impressed by my decoding skills that she prevailed upon me to write this. So blame her. Here is a picture of flowers anyway. For the record, my decoding skills are OK, but not great. I am mostly pretty good at it because I am so lazy. I’ll write more about that later.

In this post, I will describe how to figure out the encoding scheme for the DNA watermarks Mary described in her recent post.

My main goal is to give an example of how a code gets deciphered. It’s an art as well as a science. This particular code is not insanely difficult, so it makes a good example.

On to the watermarks.

I got the watermarks themselves from this paper. I also read an article that said that there were quotes from James Joyce and Richard Feynman in the watermarks. That is all the information I will need to decode them.

I will concentrate mostly on watermark #2, because it looked to me most amenable to analysis. Here it is:

TTAACTAGCTAACAACTGGCAGCATAAAACATATAGAACTACCTGCTATAAGTGATACAA
CTGTTTTCATAGTAAAACATACAACGTTGCTGATAGTACTCCTAAGTGATAGCTTAGTGC
GTTTAGCATATATTGTAGGCTTCATAATAAGTGATATTTTAGCTACGTAACTAAATAAAC
TAGCTATGACTGTACTCCTAAGTGATATTTTCATCCTTTGCAATACAATAACTACTACAT
CAATAGTGCGTGATATGCCTGTGCTAGATATAGAACACATAACTACGTTTGCTGTTTTCA
GTGATATGCTAGTTTCATCTATAGATATAGGCTGCTTAGATTCCCTACTAGCTATTTCTG
TAGGTGATATACGTCCATTGCATAAGTTAATGCATTTAACTAGCTGTGATACTATAGCAT
CCCCATTCCTAGTGCATATTTTCATCCTAGTGCTACGTGATATAATTGTACTAATGCCTG
TAGATAATTTAATGCCTGGCTCGTTTGTAGGTGATAATTTAGTGCCTGTAAAACATATAC
CTGAGTGCTCGTTGCGTGATAGTTCGTTCATGCATATACAACTAGGCTGCTGTGATATGG
TCACTGCCCTTACTGTGCTACATATTACTGCGAGGGGGATGACGTATAAACCTGTTGTAA
GTGATATGACGTATATAACTACTAGTGATATGACGTATAGGCTAGAACAACGTGATATGA
CGTATATGACTACTGTCCCAAACATCAGTGATATGACGTATACTATAATTTCTATAATAG
TGATAAATAAACCTGGGCTAAATACGTTCCTGAATACGTGGCATAAACCTGGGCTAACGA
GGAATACCCATAGTTTAGCAATAAGCTATAGTTCGTCATTTTTAAGGCGCGCCTTAACTA
GCTAA

My first thought is that I am strongly inclined to treat the bases in threes, because that is how the genome encodes amino acids. Taking 4 symbols in sets of three gives a total number of possibilities of 4^3 = 64, which is enough space for the alphabet and numbers and some symbols. It’s also just enough for upper- and lowercase letters and numbers and maybe a space but nothing else.
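To make that counting concrete, here is a small Python sketch (my own illustration, not part of the original analysis). It enumerates every three-base symbol and compares the total against the two candidate character sets; the variable names are made up for this example.

```python
from itertools import product

BASES = "ACGT"

# Every possible three-base symbol: 4 choices in each of 3 positions.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]
num_codons = len(codons)

# Theory 1: uppercase letters and digits, with the rest left for symbols.
uppercase_and_digits = 26 + 10

# Theory 2: upper- and lowercase letters, digits, and a space.
mixed_case_digits_space = 26 + 26 + 10 + 1

print(num_codons)                            # total three-base symbols
print(num_codons - uppercase_and_digits)     # room left for punctuation
print(num_codons - mixed_case_digits_space)  # room left under theory 2
```

The second theory leaves room for only one extra symbol, which is why a message containing several different punctuation marks points to the first.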

I can’t tell which is which, but I heard somewhere that one of the messages has a Web page in it, and that requires several punctuation marks, so I am sticking with the uppercase letters and symbols theory.

Either way, this code amounts to a single-substitution cipher, kind of like the cryptogram in the newspaper, only with punctuation included.

The start and end tags were given in the paper, so I can remove those. Additional noncoding data is also marked. Here is what I get for the second watermark, with the start and end tags and noncoding data removed, taken three at a time:

CAA CTG GCA GCA TAA AAC ATA TAG AAC TAC CTG CTA TAA GTG ATA
CAA CTG TTT TCA TAG TAA AAC ATA CAA CGT TGC TGA TAG TAC TCC
TAA GTG ATA GCT TAG TGC GTT TAG CAT ATA TTG TAG GCT TCA TAA
TAA GTG ATA TTT TAG CTA CGT AAC TAA ATA AAC TAG CTA TGA CTG
TAC TCC TAA GTG ATA TTT TCA TCC TTT GCA ATA CAA TAA CTA CTA
CAT CAA TAG TGC GTG ATA TGC CTG TGC TAG ATA TAG AAC ACA TAA
CTA CGT TTG CTG TTT TCA GTG ATA TGC TAG TTT CAT CTA TAG ATA
TAG GCT GCT TAG ATT CCC TAC TAG CTA TTT CTG TAG GTG ATA TAC
GTC CAT TGC ATA AGT TAA TGC ATT TAA CTA GCT GTG ATA CTA TAG
CAT CCC CAT TCC TAG TGC ATA TTT TCA TCC TAG TGC TAC GTG ATA
TAA TTG TAC TAA TGC CTG TAG ATA ATT TAA TGC CTG GCT CGT TTG
TAG GTG ATA ATT TAG TGC CTG TAA AAC ATA TAC CTG AGT GCT CGT
TGC GTG ATA GTT CGT TCA TGC ATA TAC AAC TAG GCT GCT GTG ATA
TGG TCA CTG CCC TTA CTG TGC TAC ATA TTA CTG CGA GGG GGA TGA
CGT ATA AAC CTG TTG TAA GTG ATA TGA CGT ATA TAA CTA CTA GTG
ATA TGA CGT ATA GGC TAG AAC AAC GTG ATA TGA CGT ATA TGA CTA
CTG TCC CAA ACA TCA GTG ATA TGA CGT ATA CTA TAA TTT CTA TAA
TAG TGA TAA ATA AAC CTG GGC TAA ATA CGT TCC TGA ATA CGT GGC
ATA AAC CTG GGC TAA CGA GGA ATA CCC ATA GTT TAG CAA TAA GCT
ATA GTT CGT CAT TTT TAA
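The tag stripping and grouping can be sketched in a few lines of Python. This is only an illustration under assumptions: the tag value TTAACTAGCTAA and the noncoding run GGCGCGCC are read off the raw sequence shown above (the text at the very start and very end that does not appear in the grouped listing), not copied from the paper, and the function name is hypothetical.

```python
# Assumption: both tags are the 12 bases seen at each end of the raw watermark.
TAG = "TTAACTAGCTAA"
# Assumption: the noncoding run sits just before the end tag, as in watermark #2.
NONCODING = "GGCGCGCC"

def to_codons(raw):
    """Strip tags and trailing noncoding data, then split into base triples."""
    seq = "".join(raw.split()).upper()  # drop line breaks and spaces
    if not (seq.startswith(TAG) and seq.endswith(TAG)):
        raise ValueError("expected the tag at both ends")
    seq = seq[len(TAG):-len(TAG)]
    if seq.endswith(NONCODING):
        seq = seq[:-len(NONCODING)]
    if len(seq) % 3 != 0:
        raise ValueError("message length is not a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

# A shortened stand-in for the real watermark: tag, six symbols, noncoding, tag.
sample = TAG + "CAACTGGCAGCATAAAAC" + NONCODING + TAG
sample_codons = to_codons(sample)
print(" ".join(sample_codons))
```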

The first place to start in decoding any code is a frequency analysis. I just count the number of times each three-base symbol appears in the text:

ATA: 41    TAG: 27    TAA: 25    CTG: 18
TGC: 16    GTG: 16    CTA: 15    CGT: 14
AAC: 13    TTT: 10    TGA: 10    TAC: 10
GCT: 10    TCA:  8    TCC:  7    CAT:  7
CAA:  7    TTG:  5    GTT:  4    GGC:  4
CCC:  4    ATT:  4    GCA:  3    TTA:  2
GGA:  2    CGA:  2    AGT:  2    ACA:  2
TGG:  1    GTC:  1    GGG:  1

I’ve listed the symbols here in decreasing order of frequency. There is one that really stands out: ATA occurs 41 times, and the next one is TAG that occurs 27 times. That’s a big gap. I am pretty confident that ATA will be a space. There is a simple test for that: do two ATAs ever occur next to each other? If not, then ATA is probably a space. And sure enough, ATA ATA never appears.
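Both the count and the doubled-symbol test are easy to automate. Here is a minimal Python sketch (an illustration, not necessarily how the table above was produced); the helper names are made up, and the sample is just the first row of the grouped watermark, so its counts are much smaller than the full table.

```python
from collections import Counter

def codon_frequencies(codons):
    """Count how often each three-base symbol occurs."""
    return Counter(codons)

def has_adjacent_pair(codons, symbol):
    """True if `symbol` ever appears twice in a row (unlikely for a space)."""
    return any(a == symbol and b == symbol for a, b in zip(codons, codons[1:]))

# First row of watermark #2 only; use the full list for the real analysis.
sample = "CAA CTG GCA GCA TAA AAC ATA TAG AAC TAC CTG CTA TAA GTG ATA".split()

freq = codon_frequencies(sample)
for codon, count in freq.most_common():
    print(codon, count)

print("ATA doubled?", has_adjacent_pair(sample, "ATA"))
print("GCA doubled?", has_adjacent_pair(sample, "GCA"))
```

A symbol that does double up, like GCA here, is a better candidate for a letter than for a space.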

The next two most frequent are TAG and TAA. I am not so sure about those; they could be commas, if the watermark is a list of names, or they might be letters. In English, the letters in order of decreasing frequency are ETAON RISHD LFCMU … so it is likely that those two are from the ETAON group. Let’s try it out with TAG as E and TAA as T:

????T? E????T? ????ET? ????E??T? ?E??E? ?E??TT? ?E???T ?E?????T? ????? ?T????E?? ???E E??T??????? ?E???E E??E???E???E? ???? ?T??T??? ?E????E? ???E??? T??T??E ?T?????E? ?E??T? ??????? ???? ??E??? ???????? ??????? ???T? ?? T??? ?? ?E??? ?? ???????? ?? ?T??TE?T ???T ??? ?? ???T?? ? ?E?T? ????T

I have labeled all the characters I don’t know as question marks. That’s not optimal, because you can’t tell which characters are the same and which are different, but it will have to do. It looks like I got the spaces right; the words look like words. I can’t tell about the other letters. But here I can try a “crib”: a known piece of plaintext. Notice those last three words? They might be an author name from a quote:

? ?E?T? ????T
- JAMES JOYCE

If TAG is an A and TAA is an E, then they fit! So let’s assume JAMES JOYCE is correct and try again:

M???E? A????E? M?C?AE? MO??A??E? SA?JAY ?AS?EE? CA?O?E ?A?????E? C??C? ME??YMA?? ???A A??E?O??C?? ?ACY?A ASSA?-?A?C?A? ??Y? ?E??E?S? ?AY-Y?A? C??A??? E??E??A ?E??SO?A? ?A??E? ???SO?? JO?? ??ASS? ???-???? ??????O ???E? ?O E??? ?O ?A??? ?O ????M??? ?O ?EC?EA?E ???E O?? O? ???E?? - JAMES JOYCE
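The crib step can be sketched in Python as follows. This is an illustration only, with hypothetical function names: it lines the guessed plaintext up against the last eleven symbols of the watermark, checks that the guess never assigns two letters to one symbol or one letter to two symbols, and then decodes with question marks for everything still unknown.

```python
def mapping_from_crib(codons, crib):
    """Pair each symbol with the crib character at the same position."""
    if len(codons) != len(crib):
        raise ValueError("crib and ciphertext must be the same length")
    mapping = {}
    for codon, char in zip(codons, crib):
        # setdefault returns the earlier guess if this symbol was seen before.
        if mapping.setdefault(codon, char) != char:
            raise ValueError(f"{codon} would need two different characters")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("two symbols would share one character")
    return mapping

def decode(codons, mapping):
    """Substitute known symbols; show unknown ones as question marks."""
    return "".join(mapping.get(codon, "?") for codon in codons)

# The last eleven symbols of watermark #2, lined up against the guess.
tail = "GTT TAG CAA TAA GCT ATA GTT CGT CAT TTT TAA".split()
table = mapping_from_crib(tail, "JAMES JOYCE")

# The first seven symbols of the watermark, decoded with the partial table.
head = "CAA CTG GCA GCA TAA AAC ATA".split()
print(decode(tail, table))
print(decode(head, table))
```

If the crib were wrong, the consistency checks would usually fail, because a repeated symbol such as GTT or TAA would have to stand for two different letters.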

That is starting to look like something! I see SA?JAY so I am going to guess that is SANJAY and I see ?EC?EA?E which I will guess is RECREATE. Those give me three very common letters: N, T, and R. Putting those in, I get:

M???E? A???RE? M?C?AE? MONTA??E? SANJAY ?AS?EE? CARO?E ?ART???E? C??C? MERRYMAN? N?NA A??ERO??C?? NACYRA ASSA?-?ARC?A? ??YN ?EN?ERS? RAY-Y?AN C??AN?? E??EN?A ?EN?SO?A? ?AN?E? ???SON? JO?N ??ASS? ???-??N? ?????TO ???E? TO ERR? TO ?A??? TO TR??M??? TO RECREATE ???E O?T O? ???E?? - JAMES JOYCE

At this point I could probably just Google the quote but that would be cheating. So I get the L from CAROLE and the I from NINA. And the same symbol occurs a lot at the end of words that looks like a comma:

MI??EL AL?IRE, MIC?AEL MONTA??E, SANJAY ?AS?EE, CAROLE LARTI??E, C??C? MERRYMAN, NINA AL?ERO?IC?, NACYRA ASSA?-?ARCIA, ??YN ?EN?ERS, RAY-Y?AN C??AN?, E??ENIA ?ENISO?A, ?ANIEL ?I?SON, JO?N ?LASS, ??I-?IN? ?I???TO LI?E, TO ERR, TO ?ALL, TO TRI?M??, TO RECREATE LI?E O?T O? LI?E?? - JAMES JOYCE

Now I am getting somewhere. I probably have enough letters right to try that first watermark, which may contain HTML code. Here is what I get from it:

J? CRAI? ?ENTER INSTIT?TE ?????A?C?E???IJ?LMNO??RST????Y?? ??????????????-????????????????????,?SYNT?ETIC ?ENOMICS, INC?????OCTY?E ?TML???TML???EA???TITLE??ENOME TEAM??TITLE????EA????O?Y??A ?RE????TT????????JC?I?OR????T?E JC?I??A?????RO?E YO???E ?ECO?E? T?IS ?ATERMAR? ?Y EMAILIN? ?S ?A ?RE???MAILTO?MRO?STI??JC?I?OR????ERE???A????????O?Y????TML?

Note the IJ?LMNO??RST. There is an alphabet in there! That really helps:

J? CRAIG VENTER INSTITUTE ?????ABCDEFGHIJKLMNOPQRSTUVWXYZ? ??????????????-????????????????????,?SYNTHETIC GENOMICS, INC????DOCTYPE HTML??HTML??HEAD??TITLE?GENOME TEAM??TITLE???HEAD??BODY??A HREF??HTTP???WWW?JCVI?ORG???THE JCVI??A??P?PROVE YOU?VE DECODED THIS WATERMARK BY EMAILING US ?A HREF??MAILTO?MROQSTIZ?JCVI?ORG??HERE???A???P???BODY???HTML?

That’s looking like good HTML. I will put the complete alphabet into the second watermark again:

MIKKEL ALGIRE, MICHAEL MONTAGUE, SANJAY VASHEE, CAROLE LARTIGUE, CHUCK MERRYMAN, NINA ALPEROVICH, NACYRA ASSAD-GARCIA, GWYN BENDERS, RAY-YUAN CHUANG, EVGENIA DENISOVA, DANIEL GIBSON, JOHN GLASS, ZHI-QING QI???TO LIVE, TO ERR, TO FALL, TO TRIUMPH, TO RECREATE LIFE OUT OF LIFE?? - JAMES JOYCE

And that’s pretty much it.

Odds and Ends

Just a quickie update on THREE things today:

FIRST, as I anticipated, my brother Dave cracked the Venter code. Actually, within minutes of reading my post, at 12:34 am his time, he was trying to explain it to me in a gmail chat.

Dave: You’re not going to believe this, but I already had a program that would decode it.
Me: No way.
Dave: Yep. A geocache puzzle was based on it so I added it to my code last summer. So for example: TTAACTAGCTAATGTCGTGCAATTGGAGTAGAGAACACAGAACGATTAACTAGCTAA decodes to: LTS*CRAIGVENTERLTS* (LTS* means letters)

Right. That’s SO obvious. Then a minute later, he writes:

Dave: Ok, I found the watermarks here (link to the PDF of Venter’s paper in Science magazine with pages of incomprehensible (to me) gibberish).

A couple of minutes pass…

Dave: Hmm… My table is only partly right.
Dave: Hmm… well, I will write a decoder tomorrow.
Me: get on it, willya? 😉

Last night he told me he had figured out the code. 

Me: Are you getting ready to do your guest post?
Dave: Yes I might write it tonight. Should I post Python code?
Me: up to you. Preferably English. We don’t want to lose my loyal masses you know. All 25 of them.
Dave: Haha. OK, maybe I’ll write it tonight.

This morning, (midnight his time), he writes:

Dave: OK I got the Venter decoder working. It decoded them all perfectly. Only issue now is how to write the article.

So prepare yourselves. Dave is going to reveal how he cracked the Venter code. This is going to be an internet scoop, so tell all your friends about it.

*** SECOND ITEM ***

Thanks to Brendan’s iPod adventure, I learned a valuable lesson about importing things into Switzerland today. In fact, I think I may have gained real insight into the economic protectionism that characterizes this itty bitty but ever so expensive country.

Thinking it would be complicated if not impossible to find a 5th generation nano in Switzerland (the wee beastie is not sold in stores anymore), I ordered a nice green specimen on eBay while I was in the US. Dave then shipped it out to me, declaring a $200 value on the customs tag. We Americans tell the truth! To my surprise, I was required to pay nearly $60 to pick up the package from the Post Office. No money, no iPod.

I raged all the way home. It’s a miracle I didn’t get into an accident. My head very nearly exploded. I immediately called up those responsible. They informed me that once the value exceeds 100 francs, you have to pay VAT, even if it’s a gift from a beloved uncle. On top of that, they charge a flat 35 franc fee to process the package. What? I said. That’s outrageous! The man explained that a lot of work was involved, and they had to recoup costs. If that’s the case, then these people are getting paid something like $300 an hour to rip packages open and then tape IOUs to them. Sounds peachy. Where can I sign up?

*** THIRD THING *** 

I did a post a little while ago about titles. Perusing Newswise recently, I came across two that riveted my attention, for different reasons. The first:

Sniffing Out Leukemia by Turning Dogs into Humans

Researchers at North Carolina State University are narrowing the search for genes involved in non-Hodgkin lymphoma – by turning dogs into humans.

Now that is really something! Has North Carolina State become Hogwarts? Why have a baby when you can turn the family dog into a human? No, really, if there was a contest for the worst and most misleading title+lead sentence of all time, I’d probably nominate this one. I did read far enough to understand that the research here was on the genetic level, not the organism level. No dogs became humans. Nobody is finding leukemia with an actual nose here. If you want to learn more, you’re welcome to follow this link.

Then yesterday, I saw this:

Breast Fat Injection Causes Confusion on Mammograms

Intrigued, I read the lead:

A breast augmentation procedure in which fat from other parts of the body is transferred to the breasts can cause false suspicion of breast cancer on follow-up mammograms, according to a study in the April issue of Plastic and Reconstructive Surgery®, the official medical journal of the American Society of Plastic Surgeons (ASPS).

Whoa!  Full stop! Do you mean to tell me that fat from other parts of my body – i.e. hips and thighs – could have been relocated to my breasts? THAT is a win-win if I’ve ever heard of one! Why didn’t anyone tell me about this? On second thought, back in my twenties, when it might have been interesting, I didn’t have that much fat on other parts of my body.

It’s just as well, because it turns out that the fat cells die, making them look suspiciously like tumors. The controversy here is that some radiologists say they can tell the difference, while others say they have to do biopsies just to be sure. I wonder which ones haven’t paid off their equipment yet?

*** NUMBER FOUR ***

Okay, I know I said three things, but I can’t leave this out. You’ll be happy to hear that blogging is improving my life. My husband Marc read about my weed distress and he actually agreed to help me rid the vegetable garden of its winter weed occupants. This is nothing short of a breakthrough!

There’s the evidence.

Designer DNA Dinged

Who says scientists and writers can’t play God? My sister recently alerted me to a story in which science and literature intersect in a very bizarre way. It’s weird enough that I thought I’d pass it on.

A little less than a year ago, maverick geneticist (and yacht owner) Craig Venter rocked the world (again) by announcing that he had created synthetic life. His team had developed a bacterial-like genome from DNA made in the laboratory.

First, they ordered DNA pieces 1,000 units in length from a company called Blue Heron that specializes in synthesizing DNA. Then they used some helpful yeasts to weave it together (the first microbial sweatshop?). Finally, they put this new synthetic genome into a cell whose genetic material had been removed. The new DNA took over the cell and promptly started manufacturing its own proteins, rather than the proteins the original cell would have made. Venter claimed he had created synthetic life.

The synthetic genome itself includes some 1,080,000 bases, which is a lot of information. In every genome, there are so-called “junk” portions of the DNA that don’t make proteins or do anything useful for the cell (as far as we know). In Venter’s synthetic genome, he made his own version of the “junk” section, creating special “genetic watermarks” that could be used to distinguish the synthetic cell and all its descendants from naturally-occurring bacteria. Basically (pardon the pun) what they did was to come up with a code that uses the four nucleic acid bases C, G, A, and T to encode all the letters of the alphabet and the numbers. (This would be a good puzzle for my brother Dave, who is always on the lookout for obscure codes. If he cracks it, I’ll let him post it on this blog.)

This code itself is encoded in the genetic watermarks, along with a URL that anyone who deciphers the code can e-mail, the names of the 46 authors and other key contributors and three quotations:

Image: Wikimedia Commons/Jerome Walker, Dennis Myts

“TO LIVE, TO ERR, TO FALL, TO TRIUMPH, TO RECREATE LIFE OUT OF LIFE.” – from A Portrait of the Artist as a Young Man, by James Joyce;

“SEE THINGS NOT AS THEY ARE, BUT AS THEY MIGHT BE.”-A quote from J. Robert Oppenheimer in the book American Prometheus; and

“WHAT I CANNOT BUILD, I CANNOT UNDERSTAND.” – a quote attributed to physicist Richard Feynman.

The news of this marvelous and ethically mindblowing feat reverberated around the world. Obama convened the White House Bioethics Commission, asking them to give him a report. The Vatican issued a press release, calling it an “interesting result” but said the scientists had not created life, just “replaced one of its motors.” (Which one? the outboard?) Venter has entered into an agreement with Exxon to create synthetic algae that will fix carbon dioxide and turn it into usable fuel. Now that would be a win-win, wouldn’t it?

In an interesting twist, Venter revealed in a conference last month that the James Joyce estate had sent him a “cease and desist” letter because he hadn’t asked permission to use the quote. Venter claimed fair use as stipulated in the US Copyright law. The Joyce estate, of which Joyce’s only living descendant Stephen is the executor, is notoriously aggressive with respect to copyright. In 2004 it threatened to sue the Irish government if there were any public readings of Joyce’s work at the 100th anniversary Bloomsday celebration of the author.

This one might be worth following. The synthetic critter cost Venter upwards of $40 million to make. If the Joyce estate sues, will Venter have to pay royalties on each appearance of the quote? Think about how fast these things will reproduce! Bacterial growth is the classic exponential growth example. If you start with just one bacterium, and it doubles every hour, by the end of the day you will have 16,777,216 bacteria! This could be the windfall Stephen has been waiting for!!
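For anyone who wants to check that number, here is the arithmetic as a tiny Python sketch. It assumes exactly one doubling per hour for 24 hours, with no bacteria dying off; the function name is made up for the example.

```python
def population(start, doublings):
    """Population after a number of doublings, assuming nothing dies."""
    return start * 2 ** doublings

# One bacterium, doubling once an hour for a 24-hour day.
after_one_day = population(1, 24)
print(f"{after_one_day:,}")
```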

Remember, though, once the cell is out there and reproducing, thanks to evolution, it will mutate. These little snippets of text will morph fairly quickly into something slightly different, and all bets will be off. Unless…

As one person posted in a comment to Carl Zimmer’s blog describing the news,

“If they wait long enough, maybe it will mutate into ‘Finnegan’s Wake.’”