Analysis of Gene Expression of Human Brain

Y. Sun, R. Yolken and the Stanley Neuropathology Consortium

The Stanley Neurovirology Laboratory, Department of Pediatrics,
Johns Hopkins University School of Medicine, Baltimore, MD 21287

 

Normal cell functions are dictated by well-regulated patterns of gene
expression. Any insult that causes aberrant gene expression will
disturb the homeostasis of a cell as a unit. Therefore,
measurement of gene expression and the identification of genes
that are abnormally expressed during a disease process may
provide a clue to the understanding of the pathogenesis of
complex human diseases which involve the interaction of genetic
and environmental factors.

Schizophrenia is generally considered a disease of the human
brain. However, there are no specific neuropathological changes
in the brain that are consistently associated with the severity
and duration of the disease. The etiology of schizophrenia may
involve multiple factors, including genetic, infectious,
immunological, developmental and biochemical factors.
Epidemiologic evidence on seasonality of births supports the
possibility of viral infection in the development of
schizophrenia [1,2]. Mechanisms of virus-host
interaction are heterogeneous. At the molecular level, infectious
agents can interact with hosts by interfering with their gene
regulation and expression, leading to functional or structural
abnormalities and even death of host cells. In addition,
infectious agents can trigger the activation of the host immune
response, which can result in the generation of cytokines and
other mediators that can affect brain function. Because there is
neither localized neuropathology nor specific pathologic agents
recognized in association with schizophrenia, it is very
difficult to take advantage of the conventional methodology of
histopathology and immunology for the investigation of the role
of a pathologic agent in the pathogenesis. Alternatively, recently
described technology in molecular biology makes it possible to
study the gene expression of cells and offers an attractive
approach to the investigation of this disease.

A number of methods are available for the study of gene expression.
The most widely used technique is the differential display of
gene expression from two different sources, i.e. case and
control. There are many versions of differential display with
characteristic advantages and disadvantages [3]. Success with
this technique varies depending on the quantity and quality of
the target genes. Genes that are expressed in very low amounts
are not readily detected because the sensitivity of the technique
is limited by the need to visualize a band on the gel. Polymerase
chain reaction (PCR)-based differential display methods improve
sensitivity, but at the cost of specificity. Furthermore, PCR
amplification tends to favor certain size ranges of the template
pool, as this technique amplifies certain DNA fragments better
than others. Although longer PCR products can be amplified
with modified reaction conditions, it is difficult to adjust the
PCR condition to suit all sizes in the same pool of cDNA
fragments. Therefore, differential display techniques are subject
to amplification bias in the relative abundance of each gene,
making it difficult to quantitatively compare gene products.
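
The effect of amplification bias on relative abundance can be
illustrated with a small calculation. The following is only a
sketch: the per-cycle efficiencies, the starting copy number and
the cycle count are hypothetical values chosen for illustration,
not measurements from this study.

```python
def amplify(copies, efficiency, cycles):
    """Return the copy number after `cycles` rounds of PCR.

    Each cycle multiplies the pool by (1 + efficiency), where
    efficiency is the fraction of templates copied per cycle
    (1.0 would be perfect doubling).
    """
    return copies * (1.0 + efficiency) ** cycles


# Hypothetical example: two cDNA fragments start at equal abundance,
# but the shorter one is copied slightly more efficiently.
start_copies = 1000
short_final = amplify(start_copies, efficiency=0.95, cycles=25)
long_final = amplify(start_copies, efficiency=0.85, cycles=25)

ratio = short_final / long_final
print(f"apparent short:long ratio after 25 cycles = {ratio:.1f}")
```

Under these assumed values, two fragments that were equally
abundant in the starting pool appear to differ more than threefold
after amplification, which is why the amplified products cannot be
compared quantitatively.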

Recently, a team led by Kenneth Kinzler at the Johns Hopkins
University developed a new technique termed serial analysis of
gene expression (SAGE) [4]. This technique allows evaluation of
global gene expression from cells or tissue, as well as
comparison of the relative abundance of each gene transcript
from two or more different sources through the application of a
computer program. Briefly, messenger RNA is extracted and
converted to cDNA with biotinylated oligo-dT in a standard
reaction. Then, the cDNA is digested with a restriction
endonuclease. Those restriction fragments of DNA with the
oligo-dT are selectively collected by mixing with streptavidin
coated magnetic beads in a strong magnetic field. Such fragments
are then divided into two equal pools, and each pool is ligated
to one of a pair of linkers. Each of the linkers contains a
recognition sequence for a type IIS DNA restriction enzyme, which
cuts the DNA at a distance of 10 to 20 bp from the asymmetric
recognition site. In
this way, a short tag of equal length is generated for each
expressed gene. Such tags are then ligated and used as templates
for PCR. The sensitivity of detecting genes of low abundance is
greatly increased after amplification. Because the templates are
of equal size, all the PCR products are of the same length.
Therefore, unlike PCR of mixed-length templates, PCR bias is
unlikely to occur in SAGE. The tags are then released from the
linkers and ligated end-to-end by T4 DNA ligase to form
concatenated multi-tag chains. The concatenated chains are then
cloned into a plasmid vector and sequenced. Clones with inserts
of various lengths are recorded by the number of tags and the tag
sequence information, and the tags are searched against GenBank.
By accumulating tag counts, the relative abundance of each
expressed gene is obtained. Genes that are expressed
aberrantly as compared to controls are considered potential
candidate genes of interest. Such genes will be investigated by
PCR, Northern blot hybridization and cDNA library screening.
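
The tag-counting step described above can be sketched in a few
lines of Python. This is a simplified illustration, not the
computer program used in this study. It assumes that the
restriction endonuclease leaves the site CATG (NlaIII, as in the
original SAGE report [4]; the enzyme is not named here), that tags
are joined in pairs (ditags) between two such sites, and that each
tag is exactly 10 bases long. The function names and the toy clone
insert are hypothetical.

```python
from collections import Counter

ANCHOR = "CATG"  # assumed restriction site (NlaIII); not named in the text
TAG_LEN = 10     # tag length used in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(insert):
    """Split one sequenced clone insert into its individual tags."""
    tags = []
    for ditag in insert.upper().split(ANCHOR):
        if len(ditag) < 2 * TAG_LEN:
            continue  # skip empty pieces and incomplete ditags
        tags.append(ditag[:TAG_LEN])                       # left-hand tag
        tags.append(reverse_complement(ditag[-TAG_LEN:]))  # right-hand tag
    return tags


def count_tags(inserts):
    """Accumulate tag counts over all sequenced clones."""
    counts = Counter()
    for insert in inserts:
        counts.update(tags_from_concatemer(insert))
    return counts


def relative_abundance(counts):
    """Express each tag count as a percentage of all tags observed."""
    total = sum(counts.values())
    return {tag: 100.0 * n / total for tag, n in counts.items()}


def make_ditag(tag_1, tag_2):
    """Build a toy ditag: tag_1 followed by the reverse complement of tag_2."""
    return tag_1 + reverse_complement(tag_2)


# Toy clone insert built from three tag sequences listed in Table 2.
toy_insert = (
    ANCHOR + make_ditag("GTGGCTCACG", "AAGCTCTCCT")
    + ANCHOR + make_ditag("GTGGCTCACG", "TGTGCTGAAC")
    + ANCHOR
)

counts = count_tags([toy_insert])
for tag, percent in sorted(relative_abundance(counts).items()):
    print(tag, counts[tag], f"{percent:.1f}%")
```

Splitting on the restriction site is a deliberate simplification:
a real analysis would also have to handle ditags of slightly
different lengths, duplicate ditags, and sequencing errors.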

Because of the unique efficiency of SAGE, it is possible to
analyze a large number of RNA species in a short period of time.
There are an estimated 150,000 human genes. SAGE tags of 10 bases
can cover about one million (i.e. 4^10 = 1,048,576) different
sequence combinations. Therefore, all the expressed genes can in
principle be represented by SAGE tags. RNA species that are
expressed in high abundance can be detected more readily because
they accumulate more quickly than genes of low abundance. SAGE is
a very attractive method for the thorough evaluation of gene
expression and can be used to detect RNA species that are
expressed in different amounts in different developmental stages
or disease status. We have used SAGE for the analysis of RNA
species expressed in human brain tissue. Our data indicate that
SAGE is a potentially useful method for the analysis of RNA
species in human brain. Through the analysis of only 13 clones of
concatenated tags, we obtained sequence information for 151 tags
(Table 1). More than two-thirds (108/151) of the tags have no
match in the GenBank database. This finding suggests that most
of the expressed genes in the adult human brain are still
uncharacterized. All of the tags that are matched to GenBank are
mRNA in nature, indicating that SAGE is specific for messenger
RNA devoid of ribosomal RNA contamination (Table 2). Another
important observation is that the majority of the genes are expressed
in low amounts. This is not surprising given the fact that the
brain is an organ of complex functions. Many more genes in the
brain are still unknown and new genes in the brain are discovered
continuously [5]. The most frequent tag in this collection
(AAAACATTCT) does not match any sequence in GenBank. It
may be a novel gene highly expressed in the brain but this needs
to be proven by additional analysis. For further information on
the relative abundance of these genes, more clones need to be
analyzed.
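
The arithmetic behind these figures can be checked directly. The
short sketch below uses only numbers quoted in the text and in
Table 1; the variable names are illustrative.

```python
TAG_LENGTH = 10            # bases per SAGE tag
ESTIMATED_GENES = 150_000  # estimate of human genes quoted in the text

# Four possible bases at each of the 10 positions.
possible_tags = 4 ** TAG_LENGTH
tags_per_gene = possible_tags / ESTIMATED_GENES

# Totals from Table 1.
matched, unmatched = 43, 108
total_tags = matched + unmatched
unmatched_fraction = unmatched / total_tags

print(f"possible 10-base tags: {possible_tags:,}")
print(f"possible tags per estimated gene: {tags_per_gene:.1f}")
print(f"tags without a GenBank match: {unmatched}/{total_tags} "
      f"({100 * unmatched_fraction:.1f}%)")
```

The number of possible 10-base tags exceeds the estimated number
of genes roughly sevenfold, which is the basis for expecting that
each expressed gene can be represented by a tag.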

The above data indicate that SAGE can be an effective method for the
characterization of brain mRNA. Since SAGE relies on the binding
of RNA at the 3′ poly A tail, the method should be able to
identify virtually all human mRNA species as well as microbial
RNAs which are polyadenylated, such as those which are found in
the myxovirus and paramyxovirus groups of negative-strand RNA
viruses. Thus, in light of its general applicability, SAGE has
great potential for the analysis of complex human brain diseases
such as schizophrenia and related disorders.

TABLE 1. ANALYSIS OF TAGS FROM 13 CLONES OF CONCATEMERS IN NORMAL HUMAN
BRAIN

                              GenBank search
Occurrence   Number of tags   Matched   Unmatched
4            1                0         1
3            2                1         0
2            13               4         9
1            136              38        98
Total        151              43        108

TABLE 2. Tag Sequences Matched to GenBank

Tag sequence   n (frequency)   GenBank match
GTGGCTCACG     3 (1.8)         Human HLA class I genomic survey sequence
GATCCCAACT     2 (1.2)         Human mRNA for metallothionein from cadmium
TGATTTCACT     2 (1.2)         Human cytochrome c oxidase subunit III (COIII)pse
TGTGCTGAAC     2 (1.2)         Human transferrin mRNA complete cds
GGGAAACCCC     2 (1.2)         Human fibroblast mRNA fragment with Alu sequence
AAAATAAAGA     1 (0.6)         Human HAPI mRNA
AACCCAAAAA     1 (0.6)         Human 11 kd protein mRNA
AAGCTCTCCT     1 (0.6)         Human chromogranin A mRNA
ACCCTTGGCC     1 (0.6)         H. sapiens CpG island DNA genomic MseI fragment
ACTTACCTGC     1 (0.6)         Human mRNA for cytochrome c oxidase subunit VIB
AGAATCGCTT     1 (0.6)         Human coatomer protein mRNA
AGGGCTTCCA     1 (0.6)         Human HepG2 3′ region MboI cDNA
AGGGTGAACG     1 (0.6)         Human synaptobrevin 2 gene
AGGTCAGGAG     1 (0.6)         Human mRNA for HLA class II DR-beta
CCAACAAGAA     1 (0.6)         Human mRNA for cell surface glycoprotein
CCACTGCACT     1 (0.6)         Hum. cortex mRNA containing an Alu repetit. elem.
CCTAGCTGGA     1 (0.6)         Human mRNA for T-cell cyclophilin
CCTGTGGTCC     1 (0.6)         Human Down Syndr. region of chromos. 21 DNA
CTTGTAATCC     1 (0.6)         Human Down Syndr. region of chromos. 21 DNA
ATGAAACCCT     1 (0.6)         Human Down Syndr. region of chromos. 21 DNA
GAACACATCC     1 (0.6)         H. sapiens mRNA for ribosomal protein L19
GACTGTGCCA     1 (0.6)         Human cytoplasmic dynein light chain 1 mRNA
GCAAGCCAAC     1 (0.6)         H. sapiens mitoch. DNA for loop attachment sequence
GGAGTGGACA     1 (0.6)         Homo sapiens ribosomal protein L18 mRNA
GGGGTAAGAA     1 (0.6)         H. sapiens phosphatidylethanolamine binding protein
GTAAGTGTAC     1 (0.6)         H. sapiens mitoch. DNA for loop attachment sequence
GTGGCACGTG     1 (0.6)         Human clone AZA1 Alu repeat sequence
GTGGCAGGTG     1 (0.6)         Human ferritin H-type chain pseudogene
GTGGCGCGCG     1 (0.6)         H. sapiens DNA for loop attachment sequence
GTTCCCTGGC     1 (0.6)         Human FAU1P pseudogene, trinucleotide repeat region
TACAAGAGGA     1 (0.6)         Human mRNA for DNA binding protein, TAXREB107
TAGGATGGGG     1 (0.6)         Human sodium/potassium-transporting ATPase beta-3
TATCCCAGAA     1 (0.6)         Human KpnI repeat mRNA
TATCCTGGAA     1 (0.6)         Human AMP deaminase (AMPD3) gene, exon 6
TGCACTTCAA     1 (0.6)         H. sapiens mRNA for high endothelial venule
TGTGGGGCTC     1 (0.6)         Human mRNA for histidyl-tRNA synthetase (HRS)
TTTTACCAGT     1 (0.6)         Human chloride channel regulatory protein mRNA
TGATCTCCAA     1 (0.6)         Fatty acid synthase (human breast)
GTTTCAGGTA     1 (0.6)         Homo sapiens calcium-ATPase mRNA
GTGAAACCCT     1 (0.6)         H. sapiens mRNA for laminin
GCGAAACCCC     1 (0.6)         Human ataxia-telangiectasia locus, exon 4
AGCCACTGCG     1 (0.6)         Human coagulation factor XI gene
ACCGTGGGCT     1 (0.6)         Human creatine kinase B isoenzyme gene, exon 3

REFERENCES

[1] O’Callaghan E, Gibson T, Colohan HA, Walshe D, Buckley P,
Waddington JR (1991) Season of birth in schizophrenia: evidence
for confinement of an excess of winter births to patients
without a history of mental disorder. Br J Psychiatry
158:764-769.

[2] Yolken RH, Torrey EF (1995) Viruses, schizophrenia and bipolar
disorder (review). Clin Microbiol Rev 8:131-145.

[3] Yee F, Yolken RH (1997) Identification of differentially
expressed RNA transcripts in neuropsychiatric disorders. Biol
Psychiatry 41:759-761.

[4] Velculescu VE, Zhang L, Vogelstein B, Kinzler KW (1995) Serial
analysis of gene expression. Science 270:484-487.

[5] Adams MD, Dubnick M, Kerlavage AR, Moreno R, Kelly JM,
Utterback TR, Nagle JW, Fields C, Venter JC (1992) Sequence
identification of 2375 human brain genes. Nature 355:632-634.

 

Research supported by the Theodore and Vada Stanley Foundation