



Sequencing Pathogenic Microbes

Advances in molecular biology have led to remarkably fast and accurate methods for sequencing the genomes of disease-causing organisms. Sequencing a genome reveals the lineup of paired chemical bases that make up the pathogen’s DNA, the language of life.

Although efforts to sequence the genomes of pathogenic microbes are less well known than the $3 billion project to decode the entire human genome, their potential payoffs have begun stirring scientific excitement. Sequence information can be exploited in many ways: to demarcate genes, to locate therapeutic targets, to identify mutations that contribute to drug resistance, and to compare the genomes of variant strains to note differences that may affect the antigenicity or virulence of the microbe.

When scientists identify genes that are unique to a particular microbe, drugs can be targeted to these genes, and the products of these genes can be incorporated into experimental vaccines. Strategies can be devised to counteract genetic mutations that cause a microbe to become drug resistant. Once virulence genes are found, researchers can attempt to disable them. Genetic variations detected in different strains of the same pathogen can be used to study the population dynamics of these strains, such as the spread of a virulent or drug-resistant form of a pathogen in a susceptible population. Finally, understanding the genetic basis for both virulence and drug resistance may also help predict disease prognosis and influence the type and extent of patient care and treatment.

Because their genomes are small, microbes can be sequenced relatively quickly. The human genome -- which contains 3 billion base pairs of DNA, approximately 95 percent of which do not code for any genes -- will take 15 years to sequence. In contrast, a bacterium’s genome typically comprises 1 to 4 million base pairs, or megabases (Mb), of DNA, almost all of which encodes genes. At one-half to 1/100 the size of the smallest bacteria, viruses have even smaller genomes.
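The size comparison above can be made concrete with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the figures quoted in the paragraph; the variable and function names are illustrative choices, not part of any sequencing software.

```python
# Figures quoted in the paragraph above.
HUMAN_GENOME_BP = 3_000_000_000               # 3 billion base pairs
HUMAN_NONCODING_PERCENT = 95                  # approximate share that codes for no genes
BACTERIAL_GENOME_BP = (1_000_000, 4_000_000)  # typical range: 1 to 4 Mb

BP_PER_MEGABASE = 1_000_000                   # 1 Mb = 1 million base pairs


def to_megabases(base_pairs):
    """Convert a base-pair count to megabases (Mb)."""
    return base_pairs / BP_PER_MEGABASE


# Gene-coding portion of the human genome (integer arithmetic avoids rounding noise).
human_coding_bp = HUMAN_GENOME_BP * (100 - HUMAN_NONCODING_PERCENT) // 100

# How many times larger the human genome is than a typical bacterial genome.
smallest, largest = BACTERIAL_GENOME_BP
size_ratio_range = (HUMAN_GENOME_BP // largest, HUMAN_GENOME_BP // smallest)

print(f"Human genome: {to_megabases(HUMAN_GENOME_BP):,.0f} Mb")
print(f"Human gene-coding DNA: about {to_megabases(human_coding_bp):,.0f} Mb")
print(f"Human genome is {size_ratio_range[0]:,} to {size_ratio_range[1]:,} "
      "times the size of a typical bacterial genome")
```

On these figures the human genome is 750 to 3,000 times larger than a typical bacterial genome, and even its gene-coding portion (about 150 Mb) far exceeds an entire bacterial genome.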

When the first DNA sequencing methods were developed in the mid-1970s, an individual scientist could sequence only a few DNA base pairs per year. Today, teams of scientists at giant sequencing centers depend on computers, automation, robotics, and other advanced technologies to sequence and assemble more than 15 Mb of DNA annually. For organisms with larger genomes, sequencing may require collaboration and coordination among several such centers.

The first microbe sequencing project, that of Haemophilus influenzae, was completed three years ago with a speed that stunned scientists. Using newly developed techniques, investigators took a shotgun approach, sequencing thousands of fragments of the bacterium’s genome. Special computer programs read these sequences and stitched them together by comparing overlapping sequences. The result was one complete circle of DNA containing all of the genetic information of the bacterium.
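The idea of stitching fragments together by their overlaps can be illustrated with a toy program. The sketch below, in Python, uses a simple greedy strategy: it repeatedly merges the two fragments that share the longest overlap. It is not the software used for the Haemophilus influenzae project. The short sequence, the fragment boundaries, and the function names are all invented for illustration. The sketch also assumes error-free reads from a single strand of a linear sequence, whereas real assembly programs must cope with sequencing errors, repeated regions, both DNA strands, and circular genomes.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Merge fragments pairwise, always joining the pair with the largest overlap."""
    # Remove duplicates and fragments wholly contained in another fragment.
    contigs = list(dict.fromkeys(fragments))
    contigs = [f for f in contigs if not any(f != g and f in g for g in contigs)]

    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join; remaining pieces stay separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Invented 21-base "genome", cut into three overlapping fragments and shuffled.
genome = "ATGGCGTACCTGAAGTCTCGA"
fragments = [genome[10:21], genome[0:9], genome[5:14]]

contigs_out = greedy_assemble(fragments)
print(contigs_out)
```

In this example the three fragments overlap by four bases each, so the greedy merge recovers the original sequence as a single contiguous piece.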

Encouraged by this success, NIAID has funded projects to sequence the full genomes of 12 other medically important microbes, including the bacteria that cause tuberculosis, gonorrhea, chlamydia and cholera (see Table 1). Many of these microbes have been completely sequenced and are now being annotated and analyzed. During annotation, each gene’s position or placement on the genome is determined. This information is further analyzed to provide insight on important features of the genome that may affect the biology of the microbe and its ability to cause disease.


Table 1. Supported Microbe Sequencing Projects

Genome                                         Size         Principal Investigator
---------------------------------------------  -----------  ----------------------

Whole Genome Projects
Chlamydia trachomatis                          1.70 Mb [1]  Richard S. Stephens, UC Berkeley/Claire Fraser, TIGR [2]
Chlamydia pneumoniae                                        Claire Fraser, TIGR
Enterococcus faecalis                          3.00 Mb      Karen A. Ketchum, TIGR
Escherichia coli                                            Fred Blattner, Univ. of Wisconsin
Giardia lamblia                                12.00 Mb     Mitchell Sogin, Woods Hole [3]
Mycobacterium avium                            4.70 Mb      Robert D. Fleischmann, TIGR
Mycobacterium tuberculosis (clinical isolate)  4.40 Mb      Robert D. Fleischmann, TIGR
Neisseria gonorrhoeae                          2.20 Mb      David Dyer, Univ. Oklahoma - Oklahoma City
Salmonella typhimurium                         4.50 Mb      Richard Wilson, Washington Univ. - St. Louis
Staphylococcus aureus                                       Steven R. Gill, TIGR/John J. Iandolo, Oklahoma Univ., Health Sci. Center
Streptococcus pneumoniae                       2.20 Mb      Susan Hollingshead/John Glass, UAB [4]
Streptococcus pyogenes                         1.98 Mb      Joseph Ferretti, Univ. Oklahoma - Health Sci. Center
Treponema pallidum                             1.05 Mb      George Weinstock, Univ. Texas - Houston and TIGR
Trypanosoma brucei                                          Mark Adams, TIGR
Ureaplasma urealyticum                         0.75 Mb      Gail Cassell, UAB
Vibrio cholerae                                2.50 Mb      Rebecca Clayton, TIGR

Partial Genome Projects
Leishmania major                                            Kenneth D. Stuart, SBRI [5]
Plasmodium falciparum                                       Stephen Hoffman, NMRI [6]/Leda Cummings, TIGR

Gene Discovery Projects
Cryptosporidium parvum                                      Richard Nelson, SF Gen. Hosp./UCSF [7]
Plasmodium berghei (rodent malaria)                         John Dame, Univ. Florida - Gainesville
Plasmodium vivax                                            John Dame, Univ. Florida - Gainesville
Toxoplasma gondii                                           John Boothroyd, Stanford Univ.

[1] Mb: megabase, equal to 1 million base pairs
[2] TIGR: The Institute for Genomic Research
[3] Woods Hole: Woods Hole Marine Biological Laboratory
[4] UAB: University of Alabama at Birmingham
[5] SBRI: Seattle Biomedical Research Institute
[6] NMRI: Naval Medical Research Institute
[7] SF Gen. Hosp./UCSF: San Francisco General Hospital/University of California at San Francisco

Other NIAID-supported projects focus on sequencing the partial genomes of larger parasitic protozoa. Currently, NIAID contributes to a multi-agency effort to sequence the genome of Plasmodium falciparum, the organism responsible for the most deadly form of malaria. The Institute also supports sequencing efforts seeking to identify genes expressed by the protozoa that cause the AIDS-related opportunistic infections toxoplasmosis and cryptosporidiosis. Two other grants are supporting limited gene discovery projects for P. vivax and a rodent malaria, P. berghei.

NIAID grantees deposit the sequence information in specialized and public databases such as GenBank, run by the National Center for Biotechnology Information, where it can be accessed by anyone through the Internet. NIAID is working with the World Health Organization, the United Kingdom’s Wellcome Trust, and others to identify ways the research community and funding agencies can coordinate efforts and capitalize on the data accrued by these sequencing projects.

Sequencing Basics

DNA consists of two entwined, helical strings of chemical units represented by the letters A, C, T, and G. Depending on the genome’s size, DNA sequencing can generally be approached in either of two ways.

For relatively small genomes, every base pair of DNA can be sequenced. In this approach, the entire genome of an organism is cut and pasted into DNA carriers for easier handling. A "clone" refers to a single carrier that contains an inserted DNA fragment, and the collection of clones carrying different DNA fragments is referred to as a "library." The type of carrier used depends in part on the size of the genomic fragment.

For relatively large genomes, an alternative to whole-genome sequencing is to analyze only the parts of the genome that contain genes. In a living cell, only gene-encoding DNA is expressed and copied onto intermediate templates called RNA. In a method called expressed sequence tagging (EST), researchers isolate these RNA templates from cells and convert them back into a DNA form (complementary DNA, or cDNA), which is then pasted into carriers. These cDNA segments can then be used as molecular probes to identify intact expressed genes in larger, cloned pieces of genomic DNA.
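The EST idea described above, converting an RNA template to cDNA and using it as a probe against cloned genomic DNA, can be sketched in a few lines of Python. This is a toy illustration only. The sequences and function names are invented, and exact string matching stands in for laboratory hybridization, which tolerates mismatches. The sketch also ignores the fact that genes in many organisms are interrupted by non-coding stretches that are absent from the RNA.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Return the opposite strand of a DNA sequence, read in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def rna_to_cdna(rna):
    """Return the DNA strand complementary to an RNA template (U is read as T)."""
    return reverse_complement(rna.replace("U", "T"))


def find_probe(probe, genomic_dna):
    """List (strand, position) for every exact match of the probe in a clone.

    "+" means the probe matches the clone as written; "-" means the probe
    pairs with the written strand, so its reverse complement appears there.
    """
    hits = []
    for strand, target in (("+", probe), ("-", reverse_complement(probe))):
        start = genomic_dna.find(target)
        while start != -1:
            hits.append((strand, start))
            start = genomic_dna.find(target, start + 1)
    return hits


# Invented example: a short cloned genomic fragment and an RNA copied from it.
clone = "TTGACCATGGCTAAGGTCCGATTAG"
rna = "AUGGCUAAGGUC"

cdna = rna_to_cdna(rna)
hits = find_probe(cdna, clone)
print(cdna, hits)
```

Here the cDNA probe pairs with the written strand of the clone beginning at position 6 (counting from 0), which marks where the expressed sequence lies within the larger genomic fragment.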

In both the whole genome and EST sequencing approaches, the DNA within each carrier is sequenced from both ends. The overlap of such end sequences is used to arrange the clones so that the DNA they contain is in the same sequence order as that in the intact genome.


National Institutes of Health
National Institute of Allergy and Infectious Diseases

December 1998