Retroviral sequences of the human genome. Characterisation and Expression

 

Jonas Blomberg, MD, PhD, Section of

Virology, Dept. of Medical Sciences, Uppsala University. Academic Hospital, 751 85

Uppsala, Sweden. [email protected]

 

A model-based expert system for

detection of retroviral sequences, RetroTector (c), was developed by Göran

Sperber and JB. It processes a human genome in 4-5 days when run on a cluster of

seven office desktop computers.

Retroviral sequences are detected by

“fragment threading”, an algorithm (RETROVID) that finds an optimal succession

of conserved retroviral motifs. Proteins (‘puteins’) are deduced by a

multifunctional algorithm (ORFID). Putative extra ORFs are also deduced based on

splice sites and stops/shifts. Results are stored in sequence tables from which

subsets can easily be selected, processed and exported. Data are presented

graphically, together with similar annotated sequences from GenBank and RepBase.
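
The idea of finding an optimal succession of motif hits can be illustrated with a small dynamic-programming sketch. This is not the RETROVID implementation, whose scoring is not described here; it is a generic weighted chaining of hits under assumed rules. The names MotifHit, best_chain and max_gap are hypothetical, and the assumption is that hits must appear in the expected motif order, within a maximum distance of each other.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MotifHit:
    motif_index: int  # rank of the motif in the expected order along a provirus (assumed)
    position: int     # start coordinate of the hit in the genomic sequence
    score: float      # strength of the match to the conserved motif


def best_chain(hits: List[MotifHit], max_gap: int = 3000) -> List[MotifHit]:
    """Return the highest-scoring chain of hits that respects motif order.

    Illustrative only: a hit may follow another if it lies downstream,
    belongs to a later motif, and is at most max_gap bases away.
    """
    ordered = sorted(hits, key=lambda h: (h.position, h.motif_index))
    n = len(ordered)
    if n == 0:
        return []

    best = [h.score for h in ordered]  # best chain score ending at each hit
    prev = [-1] * n                    # back-pointer to the previous hit in that chain

    for j in range(n):
        for i in range(j):
            a, b = ordered[i], ordered[j]
            in_order = a.motif_index < b.motif_index
            close_enough = 0 < b.position - a.position <= max_gap
            if in_order and close_enough and best[i] + b.score > best[j]:
                best[j] = best[i] + b.score
                prev[j] = i

    # Trace back from the best-scoring end point.
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(ordered[end])
        end = prev[end]
    return chain[::-1]
```

With hits at positions 100, 150, 400 and 900 for motifs 0, 2, 1 and 2, the sketch prefers the complete chain 100, 400, 900 over the shorter chain 100, 150 whenever the summed score is higher. The quadratic loop is adequate for a toy example; a genome-scale tool would need something more economical.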

8173 retroviral elements were predicted in the April 2003 human genome version.

Compared with RepBase, RetroTector 010 missed 10xx retroviral elements, while

RepBase missed 500 elements found by RetroTector. 3661 predicted elements had a

clearly recognizable pol gene. They were classified according to

similarity of their entire pol gene by an iterative clustering procedure,

which progressively removes elements of lower and lower similarity until a

manageable subset is obtained. The procedure preserves tree topology by pruning

only the finer branches, and is independent of human bias. Based on motif usage,

6410 predicted elements were gamma- and 1613 betaretroviral, while 1290 were

weakly similar to other genera or not classified. The sum exceeds 8173 because

some weakly scoring elements received multiple genus assignments. Clustering occurred

on a sliding scale, where some branches coincided well with preexisting HERV

groups and others did not. Under the motto “Evolution, Not Revolution” the pruned tree

was decorated with HERV groups from the literature or from RepBase, when

possible. However, there are many branches which have not yet been named.
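
One way to read the iterative pruning described above is sketched below. It is an assumption about the general approach, not the procedure actually used: near-duplicate elements are collapsed onto a representative at a high similarity threshold, and the threshold is lowered step by step until the set is small enough. The function names, the thresholds and the toy identity measure are all hypothetical.

```python
from typing import Callable, List, Sequence


def identity(a: str, b: str) -> float:
    """Toy similarity: fraction of matching positions (assumes pre-aligned sequences)."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def prune_to_representatives(
    items: Sequence[str],
    similarity: Callable[[str, str], float],
    target_size: int,
    thresholds: Sequence[float],
) -> List[str]:
    """Collapse the most similar elements first, then progressively less similar ones.

    thresholds must be given in descending order. At each threshold an element
    is dropped if it is at least that similar to an element already kept, so
    only the finest branches are removed in the early rounds.
    """
    kept = list(items)
    for threshold in thresholds:
        if len(kept) <= target_size:
            break
        survivors: List[str] = []
        for item in kept:
            if all(similarity(item, rep) < threshold for rep in survivors):
                survivors.append(item)
        kept = survivors
    return kept
```

Because each round only removes elements that have a close relative still in the set, every major branch keeps at least one representative, which is the property the text attributes to the procedure. The greedy choice of representative depends on input order, so a real implementation would probably want a more principled rule.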

Using the RetroTector (c) data set,

ten broadly targeted real-time PCRs for detection and quantification of HERV-E,

HERV-I/T, HERV-H and HERV-W were constructed. They detect 1-10 nucleic acid

copies, and detect many, but probably not all, members of the respective group

without cross-reaction between groups. RNA from a panel of normal tissues, brain

and plasma from normal blood donors was analysed. Results of PCRs targeted to

the same HERV group could vary widely for the same tissue, probably due to

heterogeneity in expression between elements in the group. RNA expression in

placenta was high for HERV-W, while HERV-H and HERV-I/T expression were high in

testis and brain, and HERV-E in testis. HERV expression in brain diseases was

also studied. No clear connection between HERV expression and disease was seen

in a small number of samples.