TY - JOUR
T1 - LifePrint
T2 - A novel k-tuple distance method for construction of phylogenetic trees
AU - Reyes-Prieto, Fabián
AU - García-Chéquer, Adda J.
AU - Jaimes-Díaz, Hueman
AU - Casique-Almazán, Janet
AU - Espinosa-Lara, Juana M.
AU - Palma-Orozco, Rosaura
AU - Méndez-Tenorio, Alfonso
AU - Maldonado-Rodríguez, Rogelio
AU - Beattie, Kenneth L.
PY - 2011
Y1 - 2011
N2 - Purpose: Here we describe LifePrint, a sequence alignment-independent k-tuple distance method to estimate relatedness between complete genomes. Methods: We designed a representative sample of all possible DNA tuples of length 9 (9-tuples). The final sample comprises 1878 tuples (called the LifePrint set of 9-tuples; LPS9) that are distinct from each other by at least two internal and noncontiguous nucleotide differences. For validation of our k-tuple distance method, we analyzed several real and simulated viroid genomes. Using different distance metrics, we scrutinized diverse viroid genomes to estimate the k-tuple distances between these genomic sequences. Then we used the estimated genomic k-tuple distances to construct phylogenetic trees using the neighbor-joining algorithm. A comparison of the accuracy of LPS9 and the previously reported 5-tuple method was made using symmetric differences between the trees estimated from each method and a simulated "true" phylogenetic tree. Results: The identified optimal search scheme for LPS9 allows only up to two nucleotide differences between each 9-tuple and the scrutinized genome. Similarity search results of simulated viroid genomes indicate that, in most cases, LPS9 is able to detect single-base substitutions between genomes efficiently. Analysis of simulated genomic variants with a high proportion of base substitutions indicates that LPS9 is able to discern relationships between genomic variants with up to 40% of nucleotide substitution. Conclusion: Our LPS9 method generates more accurate phylogenetic reconstructions than the previously proposed 5-tuples strategy. LPS9-reconstructed trees show higher bootstrap proportion values than distance trees derived from the 5-tuple method.
AB - Purpose: Here we describe LifePrint, a sequence alignment-independent k-tuple distance method to estimate relatedness between complete genomes. Methods: We designed a representative sample of all possible DNA tuples of length 9 (9-tuples). The final sample comprises 1878 tuples (called the LifePrint set of 9-tuples; LPS9) that are distinct from each other by at least two internal and noncontiguous nucleotide differences. For validation of our k-tuple distance method, we analyzed several real and simulated viroid genomes. Using different distance metrics, we scrutinized diverse viroid genomes to estimate the k-tuple distances between these genomic sequences. Then we used the estimated genomic k-tuple distances to construct phylogenetic trees using the neighbor-joining algorithm. A comparison of the accuracy of LPS9 and the previously reported 5-tuple method was made using symmetric differences between the trees estimated from each method and a simulated "true" phylogenetic tree. Results: The identified optimal search scheme for LPS9 allows only up to two nucleotide differences between each 9-tuple and the scrutinized genome. Similarity search results of simulated viroid genomes indicate that, in most cases, LPS9 is able to detect single-base substitutions between genomes efficiently. Analysis of simulated genomic variants with a high proportion of base substitutions indicates that LPS9 is able to discern relationships between genomic variants with up to 40% of nucleotide substitution. Conclusion: Our LPS9 method generates more accurate phylogenetic reconstructions than the previously proposed 5-tuples strategy. LPS9-reconstructed trees show higher bootstrap proportion values than distance trees derived from the 5-tuple method.
KW - Phylogeny
KW - Sequence alignment
KW - Similarity search
KW - Tuple
KW - Viroid
UR - http://www.scopus.com/inward/record.url?scp=79960982092&partnerID=8YFLogxK
U2 - 10.2147/AABC.S15021
DO - 10.2147/AABC.S15021
M3 - Article
C2 - 21918634
SN - 1178-6949
VL - 4
SP - 13
EP - 27
JO - Advances and Applications in Bioinformatics and Chemistry
JF - Advances and Applications in Bioinformatics and Chemistry
IS - 1
ER -