“Dark Matter” of Life

October 18, 2011

A big problem for astronomers is the missing “dark matter” and “dark energy” unaccounted for when you add up all the known sources of matter and energy in the universe.

But for life on Earth, we have a similar problem: Where are all the species?

To detect a new species, we use polymerase chain reaction (PCR), a cyclic enzyme reaction that makes a million copies of a DNA sequence defined by two “primers,” bits of sequence before and after. But to do this, you have to assume that all life possesses a common gene, usually the one encoding a ribosomal RNA. Whatever gene you pick, you have to know ahead of time what it is. So it’s circular reasoning–how will you ever “see” a species that doesn’t have that gene? Or whose ribosomal gene sequence is too divergent (too distantly evolved) for your primers to pick up?

So Earth might have species of life we don’t know about with properties we cannot imagine. Perhaps species of microbes or plants that can make plastics or antibiotics, or generate electricity from water. What do you think those mystery life forms are up to?

  1. October 19, 2011 12:14 pm

    How nice with a new science-fiction-author-blog. I found you through Charlie Stross’s blog. Good luck.

    The astronomers’ problem arises from the fact that the dark matter is missing. That is, astronomers can measure its effects but they cannot see it. The earth harbours species that we don’t know (although we can estimate how many they are), but that they have properties we cannot imagine is highly unlikely. Well, if you are talking about potential darkmatter-life, you might be correct. If PCR of ribosomal RNA was the only way we identified new species, you could still have had something. However that’s not true.

    Most species have been found because they have been seen; mostly using the eyes, and sometimes using a microscope. Many new microbial species are found because of their effects on the environment: They make people sick, stones slippery, or smell funny. Recently, shot-gun sequencing has been the instrument of choice for finding unknown, microscopic, species. Using sequencing enables you to pretty much pick up any species as long as it is DNA or RNA based. Of all the species we have found, none (barring that incident with arsenic based life that doesn’t seem to pan out) lack RNA, and only some viruses lack DNA.

    We could assume that these unknown species cannot be seen, have no effect on the environment, and don’t use DNA. In that case you might be right. I think that these invisible, effect-less, non-DNA-based organisms create the force that connects all living things and binds the galaxy together. If we could only talk to them we could understand why the LHC cannot find the Higgs-Boson and we can all become Jedi.

    • October 19, 2011 1:30 pm

      Welcome aboard–some great thoughts!

      Numerically, the vast majority of all species (mostly microbes) have been “found” based solely on DNA sequence. In most cases, microbial genera are known by their small-subunit rRNA.

      Shot-gun sequencing has certainly revealed more. The problem with shot-gun is that you take what DNA is out there, so you pick up the most prevalent species. If a species represents only one millionth of a community, will you see it? Maybe, maybe not. What about one billionth? A human body contains 10^14 bacterial cells. So do a few cubic cm of soil. And every cubic cm of soil contains species found nowhere else.

      As for environmental effects, we are still discovering microbial contributions to the nitrogen cycle, production of NO2, N2O and so forth. And it’s possible for a species with important effects to lie dormant for a hundred years, then “wake up” when conditions are right. It’s a tough time to write the textbook on microbes, that’s for sure.
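The rare-species question in the comment above (one in a million, one in a billion) can be made concrete with a rough sketch. It assumes, purely for illustration, that each shotgun read is drawn independently and in proportion to abundance, and that any single read from the rare species would be recognized as new. Real samples violate both assumptions, and the function names here are hypothetical.

```python
import math


def detection_probability(fraction: float, reads: int) -> float:
    """Chance that at least one of `reads` reads comes from a species
    making up `fraction` of the community (idealized random sampling)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    # 1 - (1 - f)^n, computed with log1p/expm1 so tiny fractions stay accurate.
    return -math.expm1(reads * math.log1p(-fraction))


def reads_needed(fraction: float, confidence: float = 0.95) -> int:
    """Smallest number of reads giving at least `confidence` chance
    of seeing the species once, under the same assumptions."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log1p(-confidence) / math.log1p(-fraction))


if __name__ == "__main__":
    for fraction in (1e-6, 1e-9):
        print(fraction,
              detection_probability(fraction, 1_000_000),
              reads_needed(fraction))
```

Under these assumptions the number of reads needed scales roughly as the inverse of the species' share of the community, which is why "maybe, maybe not" is a fair answer at one in a million and a much harder one at one in a billion.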

      • October 19, 2011 2:17 pm

        Of course, you are right that most species exist only as names on sequences. I shouldn’t have used “most” there.

        You say “every cubic cm of soil contains species [plural] found nowhere else,” is that really true? Nowhere?

        The question of how to identify a species that constitutes just one in a billion is quite interesting, especially if you want to do it without bias and without some characteristic that you are looking for. As you say we might only become aware of a species once it “wakes up”. Is it not also reasonable to say that a species is only of interest if it is awake and has some effect?

        • October 19, 2011 2:33 pm

          “Every cubic cm of soil” — Yes, a statistical argument based on metagenomics (sequencing every genome in a sample) says that every time you take a new sample there is new stuff found nowhere else; and statistics says that will happen every time.

          Even wilder, most bacteria have an “open pangenome” — that means, every time you sequence a new isolate of a *known* species, the new isolate will have *new* genes never seen before(!) So even one “species” has infinite possible genes(??) This is real science, now, not SF.

          “a species is only of interest if it is awake and has some effect?”
          –What if it’s dormant, but possesses a gene to make something that cures malaria or cancer? BTW the reason actinomycetes have such genes involves predation and cannibalism, but that’s another story.

          –If the obscure species has an effect, how do we even know what it is? Suppose it’s making methane, or some other molecule with extreme effect on the atmosphere. If we don’t know who’s doing it, it’s left out of the climate calculations.

          • October 19, 2011 3:15 pm

            That is actually quite interesting. It is also, in many ways, a quirk of statistics and of the definitions of species and genome when you come to the microbial level.

            As to malaria, cancer and methane: that is exactly my point. If you are looking for a specific characteristic you can, usefully, test different samples for this characteristic. To try to do it the other way around – to test all samples and try to deduce what each might be important for – is intractable. Especially when there is a unique species in every sample and every species includes infinite genetic variation.

  2. October 19, 2011 12:44 pm

    Maybe I show my ignorance, but what is preventing the following approaches:

    1. Exhausting the search space of all possible primers (assuming primers can be reasonably short).

    2. If we can classify cells based on size, shape, and color (under a number of wavelengths) we should be able to connect sequences to shapes. Confirming that we aren’t making any mistakes by doing an analysis on the rate of amplification under PCR (cultures that contain cells that can’t be amplified will grow as quickly). Once we have attributed visual features to sequences, look for cells that we don’t yet have sequences for.

    3. Use retro-viruses to inject the primer into cells in a culture.

    4. Run very short DNA microarrays on non-amplified samples. Statistical analysis should show if some sequences exist but are not represented in samples that are PCR amplified.

    5. RNA-seq should clue us in on missing primers.

    6. Use bacteria for amplification.

    Turn the question around, how can we design life that is resistant to sequencing and analysis?

    Lots of repetitions to defeat assembly, non-obvious primers, change the genetic code to make primer discovery harder, hide information in RNA state/protein folding complexes, DNA that produces poisonous RNA to make amplification in bacteria harder, LINE sections to make DNA state time dependent, add noise in the form of useless plasmids, prevent DNA from completely unrolling with particular hidden proteins, DNA-destroying chemicals in vacuoles (you pop the cell, you pop the vacuoles, which kills the DNA), multiple nuclei with different DNA in the same cell that require each other to function.

    • October 19, 2011 1:39 pm

      OK, some great ideas here.

      “Exhausting the search space of all possible primers” — That’s what needs to be done, but it’s a bigger job than you might think. A primer is typically 20 bases, with 4 possibilities each. So how many primer pairs to try: (4^20)^2 if I got that right.

      There’s also the Clintonian/Gatesian argument, What do we mean by “possible”? Most microbiologists assume that (1) all life has cells (2) all cells use ribosomes with common ancestry, and therefore common rRNA on which to base the primers. Suppose a cell/virus/whatever doesn’t even have ribosomes? No wonder we can’t find life on Mars, if we can’t even find it here.

      So “how can we design life that is resistant to sequencing and analysis?” All it would need is a different genetic code (conversion of DNA alphabet to proteins).

      BTW, side topic–Evolution needs cooperation (as in shared genetic code) as much as competition, or else nobody could eat each other. But that’s another controversy.

      • October 19, 2011 4:22 pm

        With the caveat that I don’t really know what I’m talking about, I think it might be a little less than (4^20)^2, since the primer can be anywhere on the sequence as long as it matches, we can get away with using a primer and its inverse (as long as the primer doesn’t match one of the ends), and the primer doesn’t have to be a perfect match (I think).

        We could narrow our shot at finding a workable primer by using 5 or 10 nt CHIP assays.

        Still it’s a big number and PCR is absurdly expensive. There are probably better approaches than brute force.

        >Suppose a cell/virus/whatever doesn’t even have ribosomes? No wonder we can’t find life on Mars, if we can’t even find it here.

        I really agree with this argument and think some cheap but effective studies should be done that ignore our assumptions about life. Biologists in the room, go do that and tell me what you find. Also prions and nanobacteria.

        >So “how can we design life that is resistant to sequencing and analysis?” All it would need is a different genetic code (conversion of DNA alphabet to proteins).

        Khorana managed to crack the genetic code once; who’s to say he or a fellow traveler isn’t up to the task again? Is it possible to develop a cryptographically secure cell?

        I agree that the layer at which DNA is encoded into proteins is probably a good place to put security mechanisms (build logic gates out of RNA-RNA interactions).

        encrypted_DNA -> encrypted_RNA -> one_way_function -> plaintext_protein

        Keep the promoter networks and protein interaction networks complex enough that 3SAT is too costly.

        >Evolution needs cooperation (as in shared genetic code) as much as competition, or else nobody could eat each other. But that’s another controversy.

        Extend, Embrace, Exterminate, it’s not just for software anymore.

        Sorry for the long post. =/ Very excited about your blog!

  3. October 19, 2011 1:50 pm

    Bwa-ha-ha, I’m an unseen actinomycete, and you’ll NEVER find me with primers! I make nonribosomal peptides. My peptides can take out MRSA, but you’ll never find me hiding amongst all the soil’s wimpy mycorrhizae. Unless maybe you feed me real well with Luria broth and brain extract (aka zombie food).

  4. October 19, 2011 1:55 pm

    Sorry, microbial trolls are “uncultured” all right. 😉

    Actinomycete: You still use ribosomes to make most proteins, although we do appreciate your nonribosomal ones (like vancomycin). As for brain extract, you’re welcome to it. Clinicians use it all the time, though it smells foul.

  5. October 19, 2011 2:04 pm

    eo: Those are some interesting questions.

    1: Primers up to nonamers are regularly used to do reverse-transcription. But in order for PCR to be used for identification you need specific primers, so that you can use either a length-based or a sequence-based identification scheme. If you just amplify everything, you will just have more stuff that you can’t identify. As for longer primers, standard primers are about 20 nucleotides long. There are four nucleotide types. So that’s 4^20 possible primers, and then you need two for each reaction, so that makes 4^40 = 1.21e24 different reactions. If you run your PCR with 396-reaction plates and each plate takes 40 minutes to prepare and run, that makes 2.3e17 years, or about 18 million times the age of the universe. So, let us say that it is difficult.

    2: I would say that any organism we can culture and see can be described in sufficient detail to uniquely identify it.

    3-6 I don’t understand.

    The turn-around question is also interesting, but it assumes that the only way to identify an organism is by genetic sequencing.
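The brute-force estimate in point 1 of the comment above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the commenter's own figures (20-base primers, 396 reactions per plate, 40 minutes per plate). It further assumes plates are run one at a time, and it takes the age of the universe as roughly 13.8 billion years, which is my assumption and not a number from the comment. All names are hypothetical.

```python
PRIMER_LENGTH = 20            # bases per primer (figure from the comment)
BASES = 4                     # A, C, G, T
REACTIONS_PER_PLATE = 396     # figure from the comment
MINUTES_PER_PLATE = 40        # figure from the comment
MINUTES_PER_YEAR = 60 * 24 * 365.25
AGE_OF_UNIVERSE_YEARS = 13.8e9  # assumption, not from the comment


def primer_pair_count(length: int = PRIMER_LENGTH, bases: int = BASES) -> int:
    """Number of ordered (forward, reverse) primer pairs of a given length."""
    single_primers = bases ** length
    return single_primers ** 2


def years_to_screen(pairs: int,
                    per_plate: int = REACTIONS_PER_PLATE,
                    minutes_per_plate: float = MINUTES_PER_PLATE) -> float:
    """Years needed to run every pair, one plate after another."""
    plates = pairs / per_plate
    return plates * minutes_per_plate / MINUTES_PER_YEAR


if __name__ == "__main__":
    pairs = primer_pair_count()
    years = years_to_screen(pairs)
    print(f"primer pairs: {pairs:.3g}")
    print(f"years, one plate at a time: {years:.3g}")
    print(f"multiples of the age of the universe: {years / AGE_OF_UNIVERSE_YEARS:.3g}")
```

Running many plates in parallel divides the time by the number of machines, but the result stays many orders of magnitude beyond anything practical, which is the commenter's point.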

    • October 19, 2011 3:34 pm

      “If you are looking for a specific characteristic you can, usefully, test different samples for this characteristic. To try to do it the other way around – to test all samples and try to deduce what each might be important for – is intractable.”

      Actually to your comment above:

      In practice, the most valuable property of a microbe might be something not-quite what you’re looking for. A slight variation perhaps; a gene that makes something like penicillin only ten times stronger; or a version that works best on a different class of bug. So what drug companies do is just to screen new isolates from genera known to make lots of secondary products, then see what their genomes make and test a thousand products against a hundred pathogens and cancer types.

      BTW if you’re trying to run a lab, don’t let me distract you; I know what a task that is! After 20+ years of continuous funding, I have suggestions for keeping productive. Might be a good blog post there.

      • October 20, 2011 2:37 am

        I agree, and not-quite is still almost. That is, even when doing high throughput screening, you have to decide that you are looking for a cancer drug and not looking for an anti-hypertensive.

        A blog-post about keeping productive would be the ultimate irony. I look forward to it.

    • October 19, 2011 4:33 pm

      > 3-6 I don’t understand.

      3. The idea is that a retrovirus (like lentivirus) can be used to copy an arbitrary sequence into a host cell. If this arbitrary sequence happens to be a primer you can then PCR that cell. The problem is how general this approach is: will it work for cells whose internal architecture is really different? Probably not. What other tools do we have for forcing a sequence into another sequence?

      >The turn-around question is also interesting, but it assumes that the only way to identify an organism is by genetic sequencing.

      I agree, identification is a different ball game than preventing effective sequencing/analysis. One is a stealth bomber and the other is a battleship. So how do you make a stealth cell?

  6. October 19, 2011 5:23 pm

    Paul Davies has some extended thoughts on this in his The Eerie Silence.

    • October 19, 2011 8:43 pm

      Thanks, the Davies book looks good.
      IMHO the aliens would act pretty much like us, given the circumstances. Just as birds and bats both evolved to fly, and so they share common features.

      If we humans ever see so much as a chlorophyll signature (polarized light reflection) off a nearby planet, let alone something like pyramids, we’ll be aiming our radio out there; so I’d guess the aliens would do the same.

      So–since we’ve not seen them, I think they’re using something like neutrinos. I once convinced my Bio Sci Fi class I was taking attendance by neutrino beam from the back of the aud. Halfway through the term, a physics major gave a report showing it wasn’t so.

  7. October 19, 2011 6:18 pm

    Actually, we were working on this about ten years ago. For eukaryotes that is.

    The second structural loop of the ITS2 sequence (part of the very common ribosomal DNA sequence) is roughly species specific, although there are groups where it doesn’t work as an identifier. Still, we demonstrated that it worked as a species identifier in a wide variety of fungi and oomycetes, and other groups have shown it works in a wide variety of animals.

    Basically, if you have tissue, you can take a subsample and *carefully* amplify it with the well conserved rDNA primers, and identify species based on the ITS sequence. I don’t know if this trick works for prokaryotes, but then again, I’m a little hazy

    Now, where are the species?
    –most are parasites on something else, so the best place to look is other species. Plants have, on average, 5-10 parasitic fungal species that depend on them. There’s a few hundred thousand plant species in the world, so that’s a million species right there.
    –Wild temperate soils. Oddly enough, it looks like temperate soils are more diverse than tropical soils. I can personally verify that, when I counted fungal spores in oak savanna soils, there were things I couldn’t identify to kingdom, let alone species.
    –Wild tropical biomass. Tropical soils may be relatively depauperate (*we think*), but the aboveground is crawling with complicated parasitic/symbiotic ecosystems. The canopies of tropical trees are full of unknown or poorly known species. The diversity is the bacteria in the guts of the parasitoid wasps that grow in the bodies of the caterpillars that eat the epiphytic liverworts that grow on the epiphytic orchids that root in the moss mats of large figs. For example.
    –oceanic waters–if you like bacteria and viruses.
    –reefs (of all depths) if you like every phylum.
    –And then there’s the whole deep biosphere. It’s bacterial, or possibly eukaryotic to the size of nematodes, and it goes about 5 kilometers down. We think. Who knows how diverse it is?

    As for what they’re up to, my guess is that most of them are either ingesting or excreting right now. And making their hosts itch.

  8. October 20, 2011 5:31 am

    Perhaps a limiting case would be to define a species in ecological terms – i.e. it exists in so far as it interacts with other species, or at least affects them. (A bit like object orientation.) This would mean that there would be a sort of weirdness horizon – if you can think of a way a creature might work that wouldn’t have any impact on “normal” life at all, there’s your alternative ecosystem.

