Computational Promoter Recognition in Eukaryotic Genomic DNA
Studien zur Mustererkennung, Vol. 7
Uwe Ohler
ISBN 978-3-89722-988-4
206 pages, year of publication: 2002
price: 40.50 €
An important area of computational biology is the analysis of biological
sequence data with methods from statistical pattern
recognition. This thesis describes a system to identify the regulatory DNA
sequences known as promoters, which control the expression of genes
in their proximity. Promoters share a common structure because all the
genes they control are accessed by the same enzyme complex. However,
individual promoters differ considerably from one another; this enables a
gene to be activated specifically at a certain time or in a certain tissue,
and thus makes the development of a complex organism possible.
The thesis presents increasingly complex probabilistic models representing
the DNA sequence and structure of promoters, and shows how they can be used
to identify promoter regions in long DNA sequences. Among other methods,
different types of Markov chains and generalized hidden Markov models are
studied, and a Bayesian classification approach is compared to neural networks.
The system was successfully applied by the Drosophila Genome Project to the
complete genome of the fruit fly, and the results are compared with promoter
recognition in human sequences.
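The basic idea behind Bayesian classification with Markov chains can be illustrated with a small sketch: one chain is trained on promoter sequences and one on background DNA, and Bayes' rule turns the two class likelihoods into a posterior probability that a window is a promoter. This is only a minimal sketch, not the models from the thesis, which are considerably more elaborate. It assumes first-order chains, add-one (Laplace) smoothing, a 0.5 class prior, and made-up toy training sequences; the function names `train_markov_chain`, `log_likelihood` and `posterior_promoter` are hypothetical.

```python
import math
from itertools import product

ALPHABET = "ACGT"

def train_markov_chain(seqs, order=1, pseudocount=1.0):
    """Estimate P(x_i | previous `order` bases) with additive (Laplace) smoothing."""
    # Hypothetical helper: one count table per context of length `order`.
    counts = {ctx: {b: pseudocount for b in ALPHABET}
              for ctx in map("".join, product(ALPHABET, repeat=order))}
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1
    # Normalize each context's counts into a conditional distribution.
    return {ctx: {b: n / sum(c.values()) for b, n in c.items()}
            for ctx, c in counts.items()}

def log_likelihood(seq, model, order=1):
    """Log probability of seq under the chain, conditioned on its first `order` bases."""
    return sum(math.log(model[seq[i - order:i]][seq[i]])
               for i in range(order, len(seq)))

def posterior_promoter(seq, promoter_model, background_model, prior=0.5, order=1):
    """Bayes' rule: P(promoter | seq) from both class likelihoods and a class prior."""
    lp = log_likelihood(seq, promoter_model, order) + math.log(prior)
    lb = log_likelihood(seq, background_model, order) + math.log(1 - prior)
    m = max(lp, lb)  # subtract the maximum for numerical stability
    return math.exp(lp - m) / (math.exp(lp - m) + math.exp(lb - m))

# Toy data (assumption): AT-rich "promoters" versus GC-rich "background".
promoters = ["TATAAAAGGC", "CTATAAATAC", "GTATATAAGG"]
background = ["GCGCCGATGC", "CCGGCGTACG", "AGCGGCCGCA"]
pm = train_markov_chain(promoters)
bm = train_markov_chain(background)
print(posterior_promoter("TATAAATA", pm, bm))
print(posterior_promoter("GCGGCGCC", pm, bm))
```

In a real scan, such a score would be computed for windows sliding along a long genomic sequence, and windows whose posterior exceeds a threshold would be reported as candidate promoter regions.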