Retroviral sequences of the human genome. Characterisation and Expression
Jonas Blomberg, MD, PhD, Section of Virology, Department of Medical Sciences,
Uppsala University, Academic Hospital, 751 85
Uppsala, Sweden. [email protected]
A model-based expert system for the detection of retroviral sequences,
RetroTector (c), was developed by Göran Sperber and JB. It processes a human
genome in 4-5 days when run on a cluster of seven office desktop computers.
Retroviral sequences are detected by “fragment threading”, an algorithm
(RETROVID) that finds an optimal succession of conserved retroviral motifs.
Proteins (‘puteins’) are deduced by a multifunctional algorithm (ORFID).
Putative extra ORFs are also deduced, based on splice sites and stops/shifts.
Results are stored in sequence tables from which subsets can easily be
selected, processed and exported. Data are presented graphically, together
with similar annotated sequences from GenBank and RepBase.
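The idea of finding an optimal succession of motif hits can be illustrated
with a small dynamic-programming sketch. This is not the RETROVID algorithm
itself, whose details are not given here. It assumes, purely for illustration,
that each motif hit has a rank (its expected order along the provirus), a
genomic position and a score, and that a valid chain must increase in both
rank and position with a bounded gap. The names best_chain and max_gap and all
numbers are hypothetical.

```python
from typing import List, Tuple

# A hit is (motif_rank, position, score). All values used here are made up.
Hit = Tuple[int, int, float]


def best_chain(hits: List[Hit], max_gap: int = 3000) -> Tuple[float, List[Hit]]:
    """Return the highest-scoring chain of hits in which both motif rank
    and position strictly increase, with at most max_gap between neighbours.

    Illustrative chaining by dynamic programming; an assumption, not the
    published RETROVID method.
    """
    if not hits:
        return 0.0, []
    ordered = sorted(hits, key=lambda h: h[1])  # sort by genomic position
    n = len(ordered)
    best = [h[2] for h in ordered]  # best chain score ending at each hit
    prev = [-1] * n                 # back-pointers for chain recovery
    for j in range(n):
        for i in range(j):
            rank_ok = ordered[i][0] < ordered[j][0]
            gap = ordered[j][1] - ordered[i][1]
            candidate = best[i] + ordered[j][2]
            if rank_ok and 0 < gap <= max_gap and candidate > best[j]:
                best[j] = candidate
                prev[j] = i
    end = max(range(n), key=best.__getitem__)
    total = best[end]
    chain: List[Hit] = []
    while end != -1:
        chain.append(ordered[end])
        end = prev[end]
    chain.reverse()
    return total, chain


# Toy input: the hit with rank 2 at position 150 is out of order and is skipped.
toy_hits = [(0, 100, 2.0), (2, 150, 5.0), (1, 400, 3.0), (2, 900, 4.0), (3, 1500, 1.0)]
score, chain = best_chain(toy_hits)
```

In this toy input the strong but misplaced hit is left out because including
it would prevent the later, correctly ordered hits from joining the chain.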
In total, 8173 retroviral elements were predicted in the April 2003 version of
the human genome. Compared with RepBase, RetroTector 010 missed 10xx retroviral
elements, while RepBase missed 500 elements found by RetroTector. 3661
predicted elements had a
clearly recognizable pol gene. They were classified according to the
similarity of their entire pol gene by an iterative clustering procedure,
which progressively removes elements of lower and lower similarity until a
manageable subset is obtained. The procedure preserves tree topology by pruning
only the finer branches, and is independent of human bias. Based on motif
usage, 6410 predicted elements were gamma- and 1613 betaretroviral, while 1290
were weakly similar to other genera or not classified. The sum (9313) exceeds
8173 because some weakly scoring elements received multiple genus assignments.
Clustering occurred on a sliding scale: some branches coincided well with
pre-existing HERV groups, others did not. Under the motto “Evolution, Not
Revolution”, the pruned tree was decorated with HERV groups from the literature
or from RepBase where possible. However, many branches have not yet been named.
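The iterative pruning used for the pol classification above can be
approximated by the following sketch. It is one possible reading of the
procedure, not the authors' implementation: it assumes a pairwise similarity
function returning percent identity, and it removes elements that are at least
as similar as the current threshold to an already kept representative, lowering
the threshold step by step until the set is small enough. The function name,
the thresholds and the toy similarities are all hypothetical.

```python
from typing import Callable, List


def prune_to_representatives(
    names: List[str],
    similarity: Callable[[str, str], int],
    target_size: int,
    start: int = 95,
    step: int = 5,
) -> List[str]:
    """Greedy redundancy pruning with a stepwise lowered threshold (in percent).

    In each round an element is kept only if it is less similar than the
    threshold to every representative kept so far, so near-identical
    elements (the finest branches) disappear first.
    """
    kept = list(names)
    threshold = start
    while len(kept) > target_size and threshold > 0:
        representatives: List[str] = []
        for name in kept:
            if all(similarity(name, rep) < threshold for rep in representatives):
                representatives.append(name)
        kept = representatives
        threshold -= step
    return kept


def toy_similarity(x: str, y: str) -> int:
    """Made-up percent identities: 98 within group a, 90 within b, else 50."""
    if x[0] != y[0]:
        return 50
    return {"a": 98, "b": 90}.get(x[0], 100)


subset = prune_to_representatives(
    ["a1", "a2", "b1", "b2", "c1"], toy_similarity, target_size=3
)
```

With these toy values the first round (threshold 95) removes a2, the second
(threshold 90) removes b2, and one representative per group remains, so the
coarse grouping is kept while the fine branches are pruned.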
Using the RetroTector (c) data set, ten broadly targeted real-time PCRs for
the detection and quantification of HERV-E, HERV-I/T, HERV-H and HERV-W were
constructed. They detect 1-10 nucleic acid copies and detect many, but
probably not all, members of the respective group, without cross-reaction
between groups. RNA from a panel of normal tissues, brain and plasma from
normal blood donors was analysed. Results of PCRs targeted to the same HERV
group could vary widely for the same tissue, probably because of heterogeneity
in expression between elements in the group. RNA expression in placenta was
high for HERV-W, while HERV-H and HERV-I/T expression was high in testis and
brain, and HERV-E expression in testis. HERV expression in brain diseases was
also studied. No clear connection between HERV expression and disease was seen
in a small number of samples.