Date of Award
9-1-2021
Degree Name
Master of Science
Department
Computer Science
First Advisor
Huang, Chun-Hsi
Abstract
The genome of an organism contains all of its hereditary information, encoded in DNA. Genome databases are growing rapidly, and this growth in the amount of DNA data creates an increasing need to compress that data into less space for faster transmission and research activities. General text compression algorithms do not exploit the specific characteristics of a DNA sequence. Various tools have been developed using different algorithms and approaches, and many of them adapt Huffman encoding to take the characteristics of DNA sequences into account. These Huffman-based techniques center on the idea of selecting repeated sequences to form a skewed Huffman tree. They also involve constructing multiple Huffman trees when encoding. These implementations have demonstrated improved compression ratios compared to the standard Huffman tree. This research proposes a few improvements to one of these algorithms for selecting the repeat sequences, with the aim of obtaining better compression ratios.
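The general idea of treating repeated sequences as Huffman symbols can be illustrated with a minimal sketch. This is not the algorithm studied in the thesis; it is a simplified illustration under stated assumptions: the repeat list is supplied by hand rather than selected automatically, tokenization is greedy and longest-first, only a single Huffman tree is built, and the cost of storing the tree or repeat dictionary is ignored. The function names `huffman_code_lengths`, `tokenize`, and `encoded_bits` are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a frequency table."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: 0}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def tokenize(seq, repeats):
    """Greedily split seq into tokens, preferring the given repeats (longest first).

    Assumption: `repeats` is chosen by the caller; no repeat selection is done here.
    """
    ordered = sorted(repeats, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(seq):
        for r in ordered:
            if seq.startswith(r, i):
                tokens.append(r)
                i += len(r)
                break
        else:
            # No repeat matches at this position: emit a single nucleotide.
            tokens.append(seq[i])
            i += 1
    return tokens


def encoded_bits(tokens):
    """Total payload size in bits when tokens are Huffman coded (tree cost excluded)."""
    freqs = Counter(tokens)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)


if __name__ == "__main__":
    seq = "ACGACGACGT"
    print("per-nucleotide bits:", encoded_bits(list(seq)))
    print("with repeat 'ACG':  ", encoded_bits(tokenize(seq, ["ACG"])))
```

Treating a frequent repeat as one symbol skews the frequency table, so that symbol receives a short code and the payload shrinks. A real compressor would also have to account for the space needed to store the repeat dictionary and the tree, which this sketch leaves out.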
Access
This thesis is available for download only to the SIUC community. Current SIUC affiliates may also access this paper off campus by searching Dissertations & Theses @ Southern Illinois University Carbondale from ProQuest. Others should contact the interlibrary loan department of their local library or ProQuest's Dissertation Express service.