Analysis of Gene Expression of Human Brain
Y. Sun, R. Yolken and the Stanley Neuropathology Consortium
The Stanley Neurovirology Laboratory, Department of Pediatrics,
Johns Hopkins University School of Medicine, Baltimore, MD 21287
Normal cell functions are dictated by well-regulated patterns of
gene expression. Any insult that causes aberrant gene expression
will disturb the homeostasis of the cell as a unit. Therefore,
measurement of gene expression and identification of genes that
are abnormally expressed during a disease process may provide
clues to the pathogenesis of complex human diseases that involve
the interaction of genetic and environmental factors.
Schizophrenia is generally considered a disease of the human
brain. However, there are no specific neuropathological changes
in the brain that are consistently associated with the severity
and duration of the disease. The etiology of schizophrenia may be
multifactorial, involving genetic, infectious, immunological,
developmental and biochemical factors. Epidemiologic evidence on
the seasonality of births supports the possibility that viral
infection plays a role in the development of schizophrenia [1,2].
Mechanisms of virus-host interaction are heterogeneous. At the
molecular level, infectious agents can interact with the host by
interfering with its gene regulation and expression, leading to
functional or structural abnormalities and even death of host
cells. In addition, infectious agents can trigger activation of
the host immune response, which can result in the generation of
cytokines and other mediators that can affect brain function.
Because neither localized neuropathology nor specific pathologic
agents have been recognized in association with schizophrenia, it
is very difficult to use the conventional methods of
histopathology and immunology to investigate the role of a
pathologic agent in its pathogenesis. Alternatively, recently
described technology in molecular biology makes it possible to
study the gene expression of cells and offers an attractive
approach to the investigation of this disease.
A number of methods are available for the study of gene
expression. The most widely used technique is differential
display of gene expression from two different sources, i.e. case
and control. There are many versions of differential display,
each with characteristic advantages and disadvantages [3].
Success with this technique varies depending on the quantity and
quality of the target genes. Genes that are expressed in very low
amounts are not readily detected because the sensitivity of the
technique is limited by the need to visualize a band on a gel.
Polymerase chain reaction (PCR)-based differential display
methods improve sensitivity but at the cost of specificity.
Furthermore, PCR amplification tends to favor certain size ranges
of the template pool, because it amplifies some DNA fragments
better than others. Although longer PCR products can be amplified
with modified reaction conditions, it is difficult to adjust the
PCR conditions to suit all sizes in the same pool of cDNA
fragments. Therefore, differential display techniques are subject
to amplification bias in the relative abundance of each gene,
making it difficult to compare gene products quantitatively.
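The effect of amplification bias on relative abundance can be
illustrated with a short calculation. The sketch below is not part
of the original study. It assumes a simple exponential model of PCR
in which each cycle multiplies the copy number by (1 + efficiency),
and the efficiencies 0.95 and 0.80 are hypothetical values chosen
only for illustration.

```python
def amplify(initial_copies, efficiency, cycles):
    """Expected copy number after `cycles` rounds of PCR.

    Simplified model (an assumption, not taken from the paper): each
    cycle multiplies the copy number by (1 + efficiency), where
    efficiency ranges from 0 (no amplification) to 1 (perfect doubling).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def abundance_ratio(eff_a, eff_b, cycles, copies_a=1.0, copies_b=1.0):
    """Ratio of product A to product B after amplification."""
    return amplify(copies_a, eff_a, cycles) / amplify(copies_b, eff_b, cycles)


if __name__ == "__main__":
    # Hypothetical efficiencies: a short fragment (0.95) and a longer
    # fragment (0.80), both starting at the same abundance.
    for cycles in (0, 10, 25):
        ratio = abundance_ratio(0.95, 0.80, cycles)
        print(f"{cycles:2d} cycles: A/B = {ratio:.2f}")
```

Under these assumed values, two templates that start at equal
abundance differ roughly sevenfold after 25 cycles, whereas
templates amplified with identical efficiency keep their original
ratio.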
Recently, a team led by Kenneth Kinzler at the Johns Hopkins
University developed a new technique termed serial analysis of
gene expression (SAGE) [4]. This technique allows evaluation of
global gene expression in cells or tissue, and comparison of the
relative abundance of each gene transcript from two or more
sources by means of a computer program. Briefly, messenger RNA is
extracted and converted to cDNA with biotinylated oligo-dT in a
standard reaction. The cDNA is then digested with a restriction
endonuclease. Those restriction fragments that carry the
biotinylated oligo-dT are selectively collected by mixing with
streptavidin-coated magnetic beads in a strong magnetic field.
These fragments are then divided into two equal portions, and
each portion is ligated to one of a pair of linkers. Each linker
contains a recognition sequence for a type IIS DNA restriction
enzyme, which cuts the DNA at a distance of 10 to 20 bp from its
asymmetric recognition site. In this way, a short tag of equal
length is generated for each expressed gene. The tags from the
two portions are then ligated to each other and used as templates
for PCR. The sensitivity of detecting genes of low abundance is
greatly increased after amplification. Because the templates are
of equal size, all the PCR products are of the same length.
Therefore, unlike PCR of mixed-length templates, PCR bias is
unlikely to occur in SAGE. The tags are then released from the
linkers and ligated end-to-end by T4 DNA ligase to form
concatenated multi-tag chains. The concatenated chains are cloned
into a plasmid vector and sequenced. Clones with inserts of
various lengths are recorded by the number of tags and the tag
sequence information, and each tag sequence is searched against
GenBank. By accumulating the number of tags, the relative
abundance of each expressed gene is obtained. Genes that are
expressed aberrantly as compared to controls are considered
potential candidate genes of interest. Such genes will be
investigated by PCR, Northern blot hybridization and cDNA library
screening.
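The tag-counting step can be sketched in a few lines of Python.
This is an illustrative sketch only, not the computer program used
in the study. It assumes, following the protocol in ref. [4], that
tags are joined tail to tail into ditags and that ditags in the
sequenced insert are separated by the anchoring-enzyme site, taken
here to be CATG (NlaIII). The enzyme is not named in this paper, so
ANCHOR_SITE is an assumption, as are all function names. The input
is assumed to be insert sequence with vector sequence already
trimmed.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumption: NlaIII site, as in ref. [4]
TAG_LENGTH = 10       # tag length stated in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(concatemer, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Split a concatemer into ditags and return the tags they contain.

    Each ditag yields two tags: its first `tag_length` bases, and the
    reverse complement of its last `tag_length` bases.
    """
    tags = []
    for ditag in concatemer.upper().split(anchor):
        if len(ditag) < 2 * tag_length:
            continue  # empty piece or fragment too short for two tags
        tags.append(ditag[:tag_length])
        tags.append(reverse_complement(ditag[-tag_length:]))
    return tags


def count_tags(concatemers):
    """Accumulate tag counts over the insert sequences of many clones."""
    counts = Counter()
    for seq in concatemers:
        counts.update(extract_tags(seq))
    return counts


if __name__ == "__main__":
    # Build a toy insert from two tag sequences that appear in this paper.
    tag_a, tag_b = "AAAACATTCT", "GTGGCTCACG"
    ditag = tag_a + reverse_complement(tag_b)
    insert = ANCHOR_SITE + ditag + ANCHOR_SITE + ditag + ANCHOR_SITE
    for tag, n in count_tags([insert]).most_common():
        print(tag, n)
```

Splitting on the anchoring site is a simplification: a ditag that
happens to contain CATG internally would be split incorrectly, and
a fuller implementation would also discard duplicate ditags before
counting.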
Because of the unique efficiency of SAGE, it is possible to
analyze a large number of RNA species in a short period of time.
There are an estimated 150,000 human genes. SAGE tags of 10 bases
can represent about one million (i.e. 4^10 = 1,048,576) different
sequences. Therefore, all the expressed
genes can be represented by SAGE tags. RNA species that are
expressed in high abundance can be detected more readily because
their tags accumulate more quickly than those of genes of low
abundance. SAGE is a very attractive method for the thorough
evaluation of gene expression and can be used to detect RNA
species that are expressed in different amounts at different
developmental stages or in different disease states. We have used
SAGE for the analysis of RNA species expressed in human brain
tissue. Our data indicate that SAGE is a potentially useful
method for the analysis of RNA species in human brain. Through
the analysis of only 13 clones of concatenated tags, we obtained
sequence information for 151 tags (Table 1). More than two-thirds
(108/151) of the tags have no match in the GenBank database. This
finding suggests that most of the genes expressed in the adult
human brain are still uncharacterized. All of the tags that
matched GenBank entries are mRNA in nature, indicating that SAGE
is specific for messenger RNA and free of ribosomal RNA
contamination (Table 2). Another important observation is that
the majority of the genes are expressed in low amounts. This is
not surprising given that the brain is an organ of complex
functions. Many genes in the brain are still unknown, and new
brain genes are discovered continuously [5]. The one tag
(AAAACATTCT) that is most frequent in this collection does not
match any sequence in GenBank. It may represent a novel gene
highly expressed in the brain, but this needs to be confirmed by
additional analysis. For further information on the relative
abundance of these genes, more clones need to be analyzed.
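The arithmetic behind these statements can be checked with a short
script. The sketch below only restates figures given in the text
and in the Total row of Table 1 (151 tags, 43 matched, 108
unmatched). The function and variable names are illustrative
assumptions.

```python
def tag_space(tag_length, alphabet_size=4):
    """Number of distinct sequences of the given length over A, C, G, T."""
    return alphabet_size ** tag_length


def share(part, whole):
    """Fraction of `whole` made up by `part`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


ESTIMATED_GENES = 150_000  # estimate quoted in the text
TABLE1_TOTALS = {"tags": 151, "matched": 43, "unmatched": 108}

if __name__ == "__main__":
    space = tag_space(10)
    print(f"possible 10-base tags: {space:,}")
    print(f"tags per estimated gene: {space / ESTIMATED_GENES:.1f}")

    totals = TABLE1_TOTALS
    # The matched and unmatched counts should add up to the tag total.
    assert totals["matched"] + totals["unmatched"] == totals["tags"]
    print(f"unmatched share: {share(totals['unmatched'], totals['tags']):.1%}")
    print(f"matched share:   {share(totals['matched'], totals['tags']):.1%}")
```

The tag space exceeds the quoted gene estimate several times over,
which is the basis for the statement that 10-base tags can
represent all expressed genes. Distinct genes can still share a
tag by chance, so the margin is a capacity argument rather than a
guarantee of uniqueness.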
The above data indicate that SAGE can be an effective method for
the characterization of brain mRNA. Since SAGE relies on binding
of RNA at the 3′ poly(A) tail, the method should be able to
identify virtually all human mRNA species as well as microbial
RNAs that are polyadenylated, such as those found in the
myxovirus and paramyxovirus groups of negative-strand RNA
viruses. In light of its general applicability, SAGE has great
potential for the analysis of complex human brain diseases such
as schizophrenia and related disorders.
TABLE 1. ANALYSIS OF TAGS FROM 13 CLONES OF CONCATEMERS IN NORMAL HUMAN BRAIN
Occurrence | Number of tags | GenBank match | No GenBank match
4 | 1 | 0 | 1
3 | 1 | 1 | 0
2 | 13 | 4 | 9
1 | 136 | 38 | 98
Total | 151 | 43 | 108
TABLE 2. Tag Sequences Matched to GenBank
Tag sequence | n (frequency, %) | GenBank match
GTGGCTCACG | 3 (1.8) | Human HLA class I genomic survey sequence
GATCCCAACT | 2 (1.2) | Human mRNA for metallothionein from cadmium
TGATTTCACT | 2 (1.2) | Human cytochrome c oxidase subunit III (COIII)pse
TGTGCTGAAC | 2 (1.2) | Human transferrin mRNA complete cds
GGGAAACCCC | 2 (1.2) | Human fibroblast mRNA fragment with alu sequence
AAAATAAAGA | 1 (0.6) | Human HAPI mRNA
AACCCAAAAA | 1 (0.6) | Human 11kd protein mRNA
AAGCTCTCCT | 1 (0.6) | Human chromogranin A mRNA
ACCCTTGGCC | 1 (0.6) | H. sapiens CpG island DNA genomic Mse1 fragment
ACTTACCTGC | 1 (0.6) | Human mRNA for cytochrome c oxidase subunit VIB
AGAATCGCTT | 1 (0.6) | Human coatomer protein mRNA
AGGGCTTCCA | 1 (0.6) | Human HepG2 3′ region MboI cDNA
AGGGTGAACG | 1 (0.6) | Human synaptobrevin 2 gene
AGGTCAGGAG | 1 (0.6) | Human mRNA for HLA class II DR-beta
CCAACAAGAA | 1 (0.6) | Human mRNA for cell surface glycoprotein
CCACTGCACT | 1 (0.6) | Hum. cortex mRNA containing an Alu repetit. elem.
CCTAGCTGGA | 1 (0.6) | Human mRNA for T-cell cyclophilin
CCTGTGGTCC | 1 (0.6) | Human Down Syndr. region of chromos. 21 DNA
CTTGTAATCC | 1 (0.6) | Human Down Syndr. region of chromos. 21 DNA
ATGAAACCCT | 1 (0.6) | Human Down Syndr. region of chromos. 21 DNA
GAACACATCC | 1 (0.6) | H. sapiens mRNA for ribosomal protein L19
GACTGTGCCA | 1 (0.6) | Human cytoplasmic dynein light chain 1 mRNA
GCAAGCCAAC | 1 (0.6) | H. sapiens mitoch. DNA for loop attachment sequence
GGAGTGGACA | 1 (0.6) | Homo sapiens ribosomal proteins L18 mRNA
GGGGTAAGAA | 1 (0.6) | H. sapiens phosphatidylethanolamine binding protein
GTAAGTGTAC | 1 (0.6) | H. sapiens mitoch. DNA for loop attachment sequence
GTGGCACGTG | 1 (0.6) | Human clone AZA1 Alu repeat sequence
GTGGCAGGTG | 1 (0.6) | Human ferritin H-type chain pseudogene
GTGGCGCGCG | 1 (0.6) | H. sapiens DNA for loop attachment sequence
GTTCCCTGGC | 1 (0.6) | Human FAU1P pseudogene, trinucleotide repeat region
TACAAGAGGA | 1 (0.6) | Human mRNA for DNA binding protein, TAXREB107
TAGGATGGGG | 1 (0.6) | Human sodium/potassium-transporting ATPase beta-3
TATCCCAGAA | 1 (0.6) | Human kpni repeat mRNA
TATCCTGGAA | 1 (0.6) | Human AMP deaminase (AMPD3) gene, exon 6
TGCACTTCAA | 1 (0.6) | H. sapiens mRNA for high endothelial venule
TGTGGGGCTC | 1 (0.6) | Human mRNA for histidyl-tRNA synthetase (HRS)
TTTTACCAGT | 1 (0.6) | Human chloride channel regulatory protein mRNA
TGATCTCCAA | 1 (0.6) | Fatty acid synthase (human breast)
GTTTCAGGTA | 1 (0.6) | Homo sapiens calcium-ATPase mRNA
GTGAAACCCT | 1 (0.6) | H. sapiens mRNA for laminin
GCGAAACCCC | 1 (0.6) | Human ataxia-telangiectasia locus, exon 4
AGCCACTGCG | 1 (0.6) | Human coagulation factor XI gene
ACCGTGGGCT | 1 (0.6) | Human creatine kinase B isoenzyme gene, exon 3
REFERENCES
[1] O’Callaghan E, Gibson T, Colohan HA, Walshe D, Buckley P,
Waddington JR (1991) Season of birth in schizophrenia: evidence
for confinement of an excess of winter births to patients
without a history of mental disorder. Br J Psychiatry
158:764-769.
[2] Yolken RH, Torrey EF (1995) Viruses, schizophrenia and bipolar
disorder (review). Clin Microbiol Rev 8:131-145.
[3] Yee F, Yolken RH (1997) Identification of differentially
expressed RNA transcripts in neuropsychiatric disorders. Biol
Psychiatry 41:759-761.
[4] Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial
analysis of gene expression. Science 270:484-487.
[5] Adams MD, Dubnick M, Kerlavage AR, Moreno R, Kelly JM, Utterback
TR, Nagle JW, Fields C, Venter JC (1992) Sequence identification
of 2375 human brain genes. Nature 355:632-634.
Research supported by the Theodore and Vada Stanley Foundation.