
Cracking the genetic code

In the same way that Alan Turing decoded Enigma, the encryption machine used by the German army in World War II, several scientists managed to decipher the genetic code. Solving this puzzle has allowed us to understand how cells work and has made genetic manipulation possible.

INTRODUCTION

A code is a system for replacing the words in a message with other words or symbols, so that nobody can understand it unless they know the system. The genetic code is one example.

Hard as it may be to believe, all living beings (except for some bacteria) work in the same way biologically. As Jacques Monod once said, everything that is found to be true for E. coli must also be true for elephants.

From the cells of the blue whale, the largest animal on the planet, to those of a hummingbird, and including our own, all cells work alike. This is thanks to the genetic code, which allows the information in each gene to be passed on to proteins, the executors of that information.

Francis Crick named this flow of information the central dogma of molecular biology in 1958 (Figure 1). He claimed that information flows from DNA to RNA, and then from RNA to proteins, so that genetic information is transmitted and expressed in one direction only. Later, however, the scheme was modified. Crick claimed that only DNA can be duplicated and transcribed into RNA, but it has since been shown that some viruses replicate their RNA, and that RNA can be reverse-transcribed to generate DNA again.

Figure 1. Central dogma of molecular biology. Red arrows: the flow proposed by Francis Crick. Grey arrows: later modifications (Source: Quora)

THREE LANGUAGES OF CELLS

Three different languages are spoken inside cells, but they can be related to one another through the genetic code.

The one we already know is the language of deoxyribonucleic acid (DNA), a double-stranded molecule written with 4 letters that correspond to the nitrogenous bases: adenine (A), thymine (T), cytosine (C) and guanine (G).

Another language, very similar to the first, is that of RNA. It differs from DNA in three main respects: (i) it is single-stranded rather than double-stranded, (ii) its sugar is ribose instead of deoxyribose (hence the name ribonucleic acid) and (iii) it contains the base uracil (U) instead of T. Neither the change of sugar nor the substitution of U for T alters pairing with the base A, so RNA can be synthesized directly on a DNA template.
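The base-pairing rule can be sketched in a few lines of Python. This is only a toy illustration: the function name transcribe and the short template sequence are invented for the example, and the template strand is assumed to be written in the 3'-5' direction so that the resulting RNA reads 5'-3'.

```python
# RNA base that pairs with each base of the DNA template (A pairs with U in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5):
    """Build the RNA complementary to a DNA template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


print(transcribe("TACGGACTT"))  # AUGCCUGAA
```

Note that every A in the template is paired with U, where DNA replication would have used T.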

The last language we need to know is that of proteins, which are built from 20 amino acids. These amino acids make up each and every protein of any living organism. The order of the amino acids in the protein chain determines its function (Figure 2).

Figure 2. Table of 20 amino acids (Source: Compound Interest)

THE GENETIC CODE

As we have been saying, the genetic code is the set of rules by which the nucleotide sequence of a gene is translated, through an RNA intermediary, into the amino acid sequence of a protein. There are several types of RNA, but the one that interests us here is messenger RNA (mRNA), which is produced by transcription and read during translation.
Cells decode the mRNA by reading its nucleotides in groups of three, called codons (Figure 3). Since mRNA is a polymer of four different nucleotides, there are 64 possible combinations of three nucleotides (4³). This brings us to one of the code's characteristics: it is degenerate, meaning that several triplets can specify the same amino acid (synonymous codons). For example, proline is encoded by the triplets CCU, CCC, CCA and CCG.
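The count of 64 triplets and the idea of synonymous codons can be checked with a short Python sketch. It assumes the standard genetic code, written compactly as one amino acid letter per codon with the bases in U, C, A, G order ('*' marks a stop codon); the variable names are chosen only for this example.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code: one letter per codon, codons listed in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4 x 4 x 4 triplets, from UUU to GGG.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO_ACIDS))

# Group the codons by the amino acid (or stop signal) they specify.
synonyms = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    synonyms[amino_acid].append(codon)

print(len(codons))                   # 64
print(synonyms["P"])                 # ['CCU', 'CCC', 'CCA', 'CCG']
print(synonyms["M"], synonyms["W"])  # ['AUG'] ['UGG']
```

Proline (P) has four synonymous codons, while methionine (M) and tryptophan (W) have only one each.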

Figure 3. The genetic code with the table of 20 amino acids (Source: BioNinja)

The genetic code is not ambiguous, since each triplet has a single meaning. Every triplet makes sense: it either encodes a particular amino acid or signals the end of reading. Most amino acids are encoded by at least two codons; methionine and tryptophan are the only ones encoded by a single codon. Each codon, however, specifies only one amino acid or a stop signal. In addition, the code is unidirectional: all triplets are read in the 5′-3′ direction.
The AUG codon serves as the start codon, at which translation begins. There is only one start codon, which also codes for the amino acid methionine, while there are three stop codons (UAA, UAG and UGA). The stop codons cause the polypeptide to be released from the ribosome, where translation takes place.
The position of the start codon determines where translation of the mRNA begins and sets its reading frame. This last point is important because the same nucleotide sequence can encode completely different polypeptides depending on the frame in which it is read (Figure 4). Only one of the three possible reading frames of an mRNA encodes the correct protein, and a shift in the reading frame makes the message meaningless.
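To see how the start codon, the stop codons and the reading frame interact, here is a minimal Python sketch. It assumes the standard genetic code (one-letter amino acid symbols, '*' for stop); the functions translate and read_frame and the short mRNA are invented for the illustration, and real translation involves much more than this lookup.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code: one letter per codon, codons listed in UCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the polypeptide is released
        protein.append(amino_acid)
    return "".join(protein)


def read_frame(mrna, offset):
    """Read every complete triplet from the given offset (0, 1 or 2)."""
    return "".join(
        CODON_TABLE[mrna[i:i + 3]] for i in range(offset, len(mrna) - 2, 3)
    )


mrna = "CAUGCCUGAAUGGUAAGG"  # toy sequence, 5'->3'
print(translate(mrna))  # MPEW
for offset in range(3):
    print(offset, read_frame(mrna, offset))
# 0 HA*MVR
# 1 MPEW*
# 2 CLNGK
```

The same 18 nucleotides give three unrelated readings. Only the frame set by the start codon (offset 1 here) yields the intended peptide.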

Figure 4. Possible frameshifts (Source: marcoregalia.com)

 

As we said at the beginning, one of the main characteristics of the genetic code is that it is universal, since almost all living beings use it (with the exception of some bacteria). This matters because a genetic code shared by such diverse organisms is strong evidence of a common origin of life on Earth. Today's species probably evolved from an ancestral organism in which the genetic code was already present. Because the code is essential for cellular function, it tends to remain unchanged from generation to generation, which can explain its remarkable similarity across present-day organisms.

Although the human being remains an enigma for science, deciphering the genetic code was a revolution that has allowed us to delve into how our body works, and specifically how our cells work, and to cross the frontier into genetic manipulation.

 

REFERENCES

  • Alberts, B. et al. Biología molecular de la célula (2010). Editorial Omega, 5th edition
  • Cooper, G.M., Hausman, R.E. La Célula (2009). Editorial Marbán, 5th edition
  • Gotta Love Cells
  • BioNinja
  • Main picture: eldiario.es

Mireia Ramos

Sequencing the human genome

Genomics is a young science that has boomed in recent years thanks to advanced DNA sequencing technologies, advances in bioinformatics and increasingly sophisticated techniques for analysing whole genomes. In this article I discuss whole genomes and their sequencing, with particular attention to the Human Genome Project, which made the sequencing of the human genome possible.

WHY DO WE SEQUENCE?

Sequencing is the set of biochemical methods and techniques aimed at determining the order of nucleotides (A, T, C and G). Its goal is to establish the order of all the nucleotides in an organism's DNA.

The first organisms sequenced were two bacteria, Haemophilus influenzae and Mycoplasma genitalium, in 1995. One year later, the genome of a fungus (Saccharomyces cerevisiae) was sequenced.

From then on, more eukaryotic sequencing projects followed: Caenorhabditis elegans (a nematode) was sequenced in 1998, Drosophila melanogaster (the fruit fly) in 2000 and the human genome in 2001.

But why do we sequence? In the case of the human genome, the aim is to gain knowledge that helps to alleviate or prevent diseases.

Some of the sequenced organisms are model organisms, chosen for their:

  • Medical importance: some are pathogens, and we know the diseases they can cause.
  • Economic importance: some are organisms that humans eat, which can be improved with molecular techniques.
  • Value for the study of evolution: in 2007 more than 11 species of Drosophila were sequenced in an attempt to understand the evolutionary relationships between their chromosomes. Similar work has been done in mammals (ENCODE Project).

WHAT DO WE MEAN BY A SEQUENCED GENOME?

The human genome has 46 chromosomes, that is, 23 chromosome pairs (22 pairs of autosomes and 1 pair of sex chromosomes, XX in females and XY in males).

The sequenced human genome is about 3,200 Mb in size and covers 23 chromosomes plus the Y chromosome.

The sequence was obtained from a mixture of human genomes, in order to obtain a representation of the genome of humanity as a whole.

PARADOXES IN THE GENOME

A paradox is a statement that, despite apparently sound reasoning from true premises, leads to a self-contradictory or logically unacceptable conclusion. In genomes we find two clear paradoxes.

The first concerns the C-value, which represents the amount of DNA in the genome. We might expect that the larger and more complex an organism is, the bigger its genome will be. However, no such correlation exists. The reason is that the genome contains not only protein-coding sequences but also repetitive DNA. In addition, the most compact genomes are found in the least complex organisms.

The second paradox concerns the G-value, which represents the number of genes. There is no correlation between an organism's number of genes and its complexity. A clear example is that the human genome has around 20,000 genes while Arabidopsis thaliana (a herbaceous plant) has 25,000. The explanation lies in the world of RNA, which is more complex and is involved in gene regulation.

THE HUMAN GENOME PROJECT (HGP)

The human genome sequencing project has been the most important biomedical research project in history. It had a budget of 3 billion dollars and was carried out by an International Public Consortium formed by the USA, the UK, Japan, France, Germany, China and other countries. Its ultimate objective was to obtain the complete sequence of the human genome.

It started in 1990, but things got complicated in 1999 with the appearance of a private company, Celera Genomics, headed by the scientist J. Craig Venter, who threw down the challenge of obtaining the human sequence in record time, ahead of the Public Consortium's schedule.

In the end, the race was declared a draw. The Public Consortium sped up its work and obtained its draft at almost the same time. On 26th June 2000, in a ceremony at the White House with President Bill Clinton, the leading representatives of the two competing parties met: Craig Venter for Celera and Francis Collins, director of the Public Consortium. They announced the completion of two drafts of the complete human genome sequence (Video 1). It was a historic moment, comparable to the discovery of the double helix or the first Moon landing.

Video 1. Human Genome announcement at the White House (Source: YouTube)

The corresponding publications of both sequences did not appear until February 2001. The Public Consortium published its sequence in the journal Nature, while Celera did so in Science (Figure 1). Three years later, in 2004, the Consortium published the final, complete version of the human genome.

Figure 1. Covers of the issues of Nature and Science that published the drafts of the human genome sequence in February 2001 (Source: Bioinformática UAB)

PERSONAL GENOMES

The genome published in 2001 is the reference genome. Since then we have entered the era of personal genomes, genomes with a first name and a surname. Craig Venter was the first person to have his own genome sequenced, and the next was James Watson, one of the discoverers of the double helix.

It took 13 years to sequence the reference genome. Sequencing Craig Venter's genome took less time, and Watson's took only a few months.

CLINICAL APPLICATIONS OF SEQUENCING

Disease-causing genes have been identified without sequencing the entire genome, by sequencing only the exome. An exome is not the whole genome, but the part of the genome that corresponds to exons.
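The difference between a gene's full sequence and its exonic part can be pictured with a toy Python sketch. Everything in it is invented for illustration: the function extract_exons, the short sequence (exons in upper case, the intron in lower case) and the coordinates, which are assumed to be 0-based and end-exclusive. Real exome sequencing captures exons in the laboratory before reading them.

```python
def extract_exons(gene, exons):
    """Join the exon segments of a gene; coordinates are 0-based, end-exclusive."""
    return "".join(gene[start:end] for start, end in exons)


# Toy gene: two exons (upper case) separated by one intron (lower case).
gene = "ATGGCC" + "gtaagtttcag" + "GAATGGTAA"
exons = [(0, 6), (17, 26)]

exonic_sequence = extract_exons(gene, exons)
print(exonic_sequence)                  # ATGGCCGAATGGTAA
print(len(exonic_sequence), len(gene))  # 15 26
```

Only the 15 exonic nucleotides out of 26 would need to be read in this toy case.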

One example is the case of Nicholas Volker (Figure 2), the first case of genomic medicine. This child had a severe, intractable inflammatory bowel disease of unknown cause. Exome sequencing revealed a mutation in the XIAP gene, on the X chromosome, that replaced a functionally important amino acid with another. A bone marrow transplant saved the patient's life.

Figure 2. Nicholas Volker with his book One in a Billion, which tells his story (Source: Rare & Undiagnosed Network)

REFERENCES

  • L. Pray. Eukaryotic genome complexity. Nature Education 2008; 1(1):96
  •  Brown. Genomes 3, 3rd edition (2007)
  • Bioinformática UAB
  • BT.com
  • E. A. Worthey et al. Making a definitive diagnosis: Successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genetics in Medicine 2011; 13, 255-262
  • Main picture: Noticias InterBusca

Mireia Ramos