Sequencing Pathogenic Microbes

Advances in molecular biology have led to remarkably fast and accurate methods for sequencing the genomes of disease-causing organisms. Sequencing a genome reveals the lineup of paired chemical bases that make up the pathogen’s DNA, the language of life.

Efforts to sequence the genomes of pathogenic microbes are less well known than the $3 billion project to decode the entire human genome, but their potential payoffs have begun to stir scientific excitement. Sequence information can be exploited in many ways: to demarcate genes, to locate therapeutic targets, to identify mutations that contribute to drug resistance, and to compare the genomes of variant strains to note differences that may affect the antigenicity or virulence of the microbe.

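Comparing the genomes of variant strains can be illustrated in its simplest form: once two sequences have been lined up base by base, every position where they disagree is a candidate difference worth examining. The following Python sketch is a toy illustration. It assumes the sequences are already aligned, are the same length, and differ only by single-base substitutions. Real strain comparisons must first align whole genomes and also handle insertions and deletions. The function name `point_differences` is hypothetical.

```python
def point_differences(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, variant base) wherever two aligned sequences differ.
    Assumes the sequences are already aligned and equal in length (substitutions only)."""
    if len(reference) != len(variant):
        raise ValueError("sketch expects pre-aligned sequences of equal length")
    # Walk both sequences in step and keep only the mismatched positions.
    return [(i, r, v) for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


print(point_differences("ATGCGTAC", "ATGAGTAT"))  # [(3, 'C', 'A'), (7, 'C', 'T')]
```

Each reported difference is only a starting point. Whether it actually changes antigenicity, virulence, or drug resistance has to be established experimentally.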
When scientists identify genes that are unique to a particular microbe, drugs can be targeted to these genes, and the products of these genes can be incorporated into experimental vaccines. Strategies can be devised to counteract genetic mutations that cause a microbe to become drug resistant. Once virulence genes are found, researchers can attempt to disable them. Genetic variations detected in different strains of the same pathogen can be used to study the population dynamics of these strains, such as the spread of a virulent or drug-resistant form of a pathogen in a susceptible population. Finally, understanding the genetic basis of both virulence and drug resistance may help predict disease prognosis and influence the type and extent of patient care and treatment.

Because their genomes are small, microbes can be sequenced relatively quickly. The human genome contains 3 billion base pairs of DNA, approximately 95 percent of which do not code for any genes, and will take 15 years to sequence. In contrast, a bacterium’s genome typically comprises 1 to 4 million base pairs, or megabases (Mb), of DNA, almost all of which encode genes. Viruses, which range from one-half to 1/100 the size of the smallest bacteria, have even smaller genomes.

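The size comparison can be made concrete with a little arithmetic. This short Python sketch uses only the figures quoted in the paragraph: 3 billion base pairs for the human genome, roughly 95 percent of which is non-coding, and 1 to 4 Mb for a typical bacterium. The variable and function names are illustrative only.

```python
# Genome sizes quoted in the text, in base pairs (bp).
HUMAN_BP = 3_000_000_000
HUMAN_NONCODING_FRACTION = 0.95             # "approximately 95 percent" do not code for genes
BACTERIA_BP_RANGE = (1_000_000, 4_000_000)  # "typically 1 to 4 million base pairs"


def fold_larger(big_bp: int, small_bp: int) -> float:
    """How many times larger one genome is than another."""
    return big_bp / small_bp


# The human genome relative to the smallest and largest typical bacterial genomes.
for bact_bp in BACTERIA_BP_RANGE:
    ratio = fold_larger(HUMAN_BP, bact_bp)
    print(f"{HUMAN_BP / 1e6:.0f} Mb is {ratio:.0f} times the size of a {bact_bp / 1e6:.0f} Mb bacterium")

# Roughly 5 percent of the human genome codes for genes.
human_coding_bp = HUMAN_BP * (1 - HUMAN_NONCODING_FRACTION)
print(f"Human coding DNA: about {human_coding_bp / 1e6:.0f} Mb")
```

By these figures, the human genome is 750 to 3,000 times larger than a typical bacterial genome. Its coding portion alone, about 150 Mb, is still dozens of times larger than an entire bacterial genome.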
When the first DNA sequencing methods were developed in the mid-1970s, an individual scientist could sequence only a few DNA base pairs per year. Today, teams of scientists at giant sequencing centers depend on computers, automation, robotics, and other advanced technologies to sequence and assemble more than 15 Mb of DNA annually. For organisms with larger genomes, sequencing may require collaboration and coordination among several such centers.

The first microbial genome sequencing project, for Haemophilus influenzae, was completed three years ago, and its speed stunned scientists. Using newly developed techniques, investigators took a shotgun approach, sequencing thousands of fragments of the bacterium’s genome. Special computer programs read these sequences and stitched them together by comparing overlapping sequences. The result was one complete circle of DNA containing all of the genetic information of the bacterium.

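The stitching step can be sketched with a greedy overlap merger. The program repeatedly finds the two fragments whose sequences overlap the most, where the end of one matches the start of the other, and joins them until no overlaps remain. This Python sketch is a simplified illustration, not the software used for H. influenzae. It ignores sequencing errors and repeated sequences, which greedy merging can mis-assemble, and it returns linear contigs rather than closing the final circle. The function names and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`,
    or 0 if no such overlap is at least `min_len` bases long."""
    start = 0
    while True:
        # Look for the first `min_len` bases of `b` somewhere in `a`.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Scanning left to right, the first full suffix match is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two reads with the longest overlap until none remain.
    Returns the resulting contigs (one string if the reads cover the genome)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads wholly contained in another read add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:
            return reads  # no overlaps left
        i, j = best_pair
        merged = reads[i] + reads[j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]


reads = ["TTGAGCAAT", "ATGCGTAC", "GTACCTTGA"]  # toy fragments, shuffled
print(greedy_assemble(reads))  # ['ATGCGTACCTTGAGCAAT']
```

Production assemblers are considerably more careful about repeats and errors. The core idea of ordering fragments by their overlaps is the same.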
Encouraged by this success, NIAID has funded projects to sequence the full genomes of 12 other medically important microbes, including the bacteria that cause tuberculosis, gonorrhea, chlamydia, and cholera (see Table 1). Many of these microbes have been completely sequenced and are now being annotated and analyzed. During annotation, the position of each gene on the genome is determined. This information is then analyzed further to provide insight into important features of the genome that may affect the biology of the microbe and its ability to cause disease.

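Annotation begins with locating genes. A very simplified stand-in for this step is to scan for open reading frames (ORFs). An ORF is a stretch that begins with the start codon ATG and runs, in the same three-base reading frame, to a stop codon (TAA, TAG, or TGA). The Python sketch below scans only the given strand, in its three reading frames. A real annotation effort also scans the opposite strand and uses more sophisticated gene-prediction methods. The function name and the `min_codons` threshold are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) coordinates of open reading frames on one strand.
    Each ORF runs from an ATG to the first in-frame stop codon; `end` is
    exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):              # the three possible reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                # open a candidate gene
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:   # codons before the stop
                    orfs.append((start, i + 3))
                start = None             # close it and keep scanning
    return sorted(orfs)


print(find_orfs("CCATGAAATTTGGGTAACC"))  # [(2, 17)]
```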
Table 1. Supported Microbe Sequencing Projects

| Genome | Size | Principal Investigator |
|---|---|---|
| Whole Genome Projects | | |
| Chlamydia trachomatis | 1.70 Mb¹ | Richard S. Stephens, UC Berkeley / Claire Fraser, TIGR² |
| Chlamydia pneumoniae | | Claire Fraser, TIGR |
| Enterococcus faecalis | 3.00 Mb | Karen A. Ketchum, TIGR |
| Escherichia coli | | Fred Blattner, Univ. of Wisconsin |
| Giardia lamblia | 12.00 Mb | Mitchell Sogin, Woods Hole³ |
| Mycobacterium avium | 4.70 Mb | Robert D. Fleischmann, TIGR |
| Mycobacterium tuberculosis (clinical isolate) | 4.40 Mb | Robert D. Fleischmann, TIGR |
| Neisseria gonorrhoeae | 2.20 Mb | David Dyer, Univ. Oklahoma - Oklahoma City |
| Salmonella typhimurium | 4.50 Mb | Richard Wilson, Washington Univ. - St. Louis |
| Staphylococcus aureus | | Steven R. Gill, TIGR / John J. Iandolo, Oklahoma Univ., Health Sci. Center |
| Streptococcus pneumoniae | 2.20 Mb | Susan Hollingshead / John Glass, UAB⁴ |
| Streptococcus pyogenes | 1.98 Mb | Joseph Ferretti, Univ. Oklahoma - Health Sci. Center |
| Treponema pallidum | 1.05 Mb | George Weinstock, Univ. Texas - Houston and TIGR |
| Trypanosoma brucei | | Mark Adams, TIGR |
| Ureaplasma urealyticum | 0.75 Mb | Gail Cassell, UAB |
| Vibrio cholerae | 2.50 Mb | Rebecca Clayton, TIGR |
| Partial Genome Projects | | |
| Leishmania major | | Kenneth D. Stuart, SBRI⁵ |
| Plasmodium falciparum | | Stephen Hoffman, NMRI⁶ / Leda Cummings, TIGR |
| Gene Discovery Projects | | |
| Cryptosporidium parvum | | Richard Nelson, SF Gen. Hosp./UCSF⁷ |
| Plasmodium berghei (rodent malaria) | | John Dame, Univ. Florida - Gainesville |
| Plasmodium vivax | | John Dame, Univ. Florida - Gainesville |
| Toxoplasma gondii | | John Boothroyd, Stanford Univ. |

¹ Mb: megabase, equal to 1 million base pairs
² TIGR: The Institute for Genomic Research
³ Woods Hole: Woods Hole Marine Biological Laboratory
⁴ UAB: University of Alabama at Birmingham
⁵ SBRI: Seattle Biomedical Research Institute
⁶ NMRI: Naval Medical Research Institute
⁷ SF Gen. Hosp./UCSF: San Francisco General Hospital/University of California at San Francisco

Other projects focus on sequencing the partial genomes of larger parasitic protozoa. Currently, NIAID contributes to a multi-agency effort to sequence the genome of Plasmodium falciparum, the organism responsible for the most deadly form of malaria. The Institute also supports sequencing efforts seeking to identify genes expressed by the protozoa that cause the AIDS-related opportunistic infections toxoplasmosis and cryptosporidiosis. Two other grants support limited gene discovery projects for P. vivax and for a rodent malaria parasite, P. berghei.

NIAID grantees deposit the sequence information in specialized and public databases such as GenBank, run by the National Center for Biotechnology Information, where anyone can access it through the Internet. NIAID is working with the World Health Organization, the United Kingdom’s Wellcome Trust, and others to identify ways the research community and funding agencies can coordinate efforts and capitalize on the data accrued by these sequencing projects.

Sequencing Basics

DNA consists of two entwined, helical strands of chemical units (bases) represented by the letters A, C, T, and G. Depending on the genome’s size, DNA sequencing can generally be approached in one of two ways.

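The two strands are not independent. A always pairs with T and C with G, and the strands run in opposite directions, so either strand fully determines the other. This is what makes it possible to read and compare sequences from either strand. A minimal Python sketch of deriving the partner strand follows. The name `reverse_complement` is a common convention, used here for illustration.

```python
# A pairs with T and C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written in its own 5'-to-3' direction:
    complement every base, then reverse the order (the strands are antiparallel)."""
    return strand.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying the function twice returns the original strand, which reflects the fact that each strand is the template for the other.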
For relatively small genomes, every base pair of DNA can be sequenced. In this approach, the entire genome of an organism is cut into fragments, which are pasted into DNA carriers for easier handling. A "clone" refers to a single carrier that contains an inserted DNA fragment, and the collection of clones carrying different DNA fragments is called a "library." The type of carrier used depends in part on the size of the genomic fragment.

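The clone and library vocabulary maps naturally onto a small data structure. This Python sketch simulates a library by cutting fragments at random positions in a genome and placing each one in a carrier chosen by fragment size. The carrier names are standard cloning vectors. The capacity numbers are rough, commonly quoted values and are an assumption here, not figures from this fact sheet. The class and function names are hypothetical.

```python
import random
from dataclasses import dataclass

# Rough insert capacities (in bp) often quoted for common carriers ("vectors").
# These numbers are illustrative assumptions, not figures from this fact sheet.
CARRIER_CAPACITY = [("plasmid", 10_000), ("cosmid", 45_000), ("BAC", 300_000)]


@dataclass
class Clone:
    carrier: str   # type of carrier holding the fragment
    start: int     # fragment position in the genome (known only because this is a simulation)
    insert: str    # the inserted DNA fragment


def choose_carrier(fragment_len: int) -> str:
    """Pick the smallest listed carrier that can hold a fragment of this length."""
    for name, capacity in CARRIER_CAPACITY:
        if fragment_len <= capacity:
            return name
    raise ValueError(f"{fragment_len} bp exceeds every listed carrier")


def build_library(genome: str, n_clones: int, frag_len: int, seed: int = 0) -> list[Clone]:
    """Simulate a library: cut fragments at random positions and put each in a carrier."""
    rng = random.Random(seed)
    carrier = choose_carrier(frag_len)
    library = []
    for _ in range(n_clones):
        start = rng.randrange(len(genome) - frag_len + 1)
        library.append(Clone(carrier, start, genome[start:start + frag_len]))
    return library


library = build_library("ACGT" * 5_000, n_clones=10, frag_len=2_000)
print(len(library), library[0].carrier)  # 10 plasmid
```

In a real library, the fragment positions are exactly what is unknown. Recovering them is the job of the overlap-based assembly described elsewhere in this fact sheet.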
For relatively large genomes, an alternative to whole-genome sequencing is to analyze only the parts of the genome that contain genes. In a living cell, only gene-encoding DNA is expressed, that is, copied onto intermediate templates called RNA. In the expressed sequence tag (EST) method, researchers isolate these RNA templates from cells and convert them back into DNA (complementary DNA, or cDNA), which is then pasted into carriers. These cDNA segments can then be used as molecular probes to identify intact expressed genes in larger, cloned pieces of genomic DNA.

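The chain from gene to RNA to cDNA to probe can be traced at the sequence level. This Python sketch makes several simplifying assumptions. The RNA copy has the same sequence as the gene's coding strand, with U in place of T. The cDNA is the complement of that RNA. A probe "finds" its gene wherever the genomic clone contains the sequence the probe would base-pair with. Real genes in eukaryotic parasites may contain introns, which split the genomic match into pieces, and real hybridization tolerates some mismatches. This sketch ignores both. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base and reverse the order (antiparallel partner strand)."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """The RNA copy matches the gene's coding strand, with U in place of T."""
    return coding_strand.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """cDNA is complementary to the RNA: switch U back to T, then take the partner strand."""
    return reverse_complement(rna.replace("U", "T"))


def probe_hits(cdna: str, genomic_clone: str) -> list[int]:
    """Positions in the clone (given strand) where the cDNA probe would base-pair,
    i.e. where the probe's complementary sequence occurs exactly."""
    target = reverse_complement(cdna)
    hits, i = [], genomic_clone.find(target)
    while i != -1:
        hits.append(i)
        i = genomic_clone.find(target, i + 1)
    return hits


gene = "ATGAAACCCGGGTAA"               # toy gene (coding strand)
clone = "TTTT" + gene + "CCCC"         # toy genomic clone containing it
cdna = reverse_transcribe(transcribe(gene))
print(cdna, probe_hits(cdna, clone))   # TTACCCGGGTTTCAT [4]
```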
In both the whole-genome and EST approaches, the DNA within each carrier is sequenced from both ends. The overlap of these end sequences is used to arrange the clones so that the DNA they contain is in the same order as in the intact genome.

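Ordering clones by their end sequences can be sketched as building a chain. Clone B is placed after clone A when the read from A's right end overlaps the read from B's left end. The following Python sketch assumes that neighboring clones overlap enough for this to happen, and that each clone has at most one such neighbor. Real projects also use overlaps with complete reads and the known approximate length of each insert. The record type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndReads:
    """Hypothetical record: a clone's name plus the sequences read from its two ends."""
    name: str
    left: str    # read from the clone's left end
    right: str   # read from the clone's right end


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`; 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def order_clones(clones: list[EndReads], min_len: int = 4) -> list[list[str]]:
    """Chain clones left to right: clone B follows clone A when A's right-end read
    overlaps B's left-end read. Returns one ordered chain per unlinked starting clone."""
    successor: dict[str, str] = {}
    for i, a in enumerate(clones):
        for j, b in enumerate(clones):
            if i != j and suffix_prefix_overlap(a.right, b.left, min_len):
                successor[a.name] = b.name  # sketch assumes at most one match per clone
    has_predecessor = set(successor.values())
    chains = []
    for clone in clones:
        if clone.name in has_predecessor:
            continue  # not the start of a chain
        chain = [clone.name]
        while chain[-1] in successor and successor[chain[-1]] not in chain:
            chain.append(successor[chain[-1]])
        chains.append(chain)
    return chains


clones = [
    EndReads("C3", "GATCCT", "TAAGGC"),
    EndReads("C1", "AACCGG", "ACGTAG"),
    EndReads("C2", "GTAGCT", "GATCCT"),
]
print(order_clones(clones))  # [['C1', 'C2', 'C3']]
```

If the end reads leave gaps, the function returns several shorter chains instead of one. This mirrors the gaps that must be closed before a genome is finished.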
National Institutes of Health
National Institute of Allergy and Infectious Diseases
December 1998