A complete characterization of pairs of binary phylogenetic trees with identical $A_k$-alignments
Mirko Wilde, Mareike Fischer
arXiv - QuanBio - Populations and Evolution, 2024-08-13
DOI: arxiv-2408.07011 (https://doi.org/arxiv-2408.07011)
Citations: 0
Abstract
Phylogenetic trees play a key role in the reconstruction of evolutionary
relationships. Typically, they are derived from aligned sequence data (such as
DNA, RNA, or proteins) using optimization criteria such as maximum
parsimony (MP). It is believed that MP is able to reconstruct the
"true" tree, i.e., the tree that generated the data, whenever the
number of substitutions required to explain the data with that tree is
relatively small compared to the size of the tree (measured in the number $n$
of leaves of the tree, which represent the species under investigation).
However, reconstructing the correct tree from any alignment first and foremost
requires the given alignment to perform differently on the "correct"
tree than on others. A special type of alignment, namely the so-called $A_k$-alignment, has gained
considerable interest in recent literature. These alignments consist of all
binary characters ("sites") which require precisely $k$ substitutions
on a given tree. It has been found that whenever $k$ is small enough (in
comparison to $n$), $A_k$-alignments uniquely characterize the trees that
generated them. However, recent literature has left a significant gap between
the cases $n\leq 2k+2$, in which no such characterization is possible,
and the cases $n\geq 4k$, in which this characterization works. It
is the main aim of the present manuscript to close this gap, i.e., to present a
full characterization of all pairs of trees that share the same
$A_k$-alignment. In particular, we show that indeed every binary phylogenetic
tree with $n$ leaves is uniquely defined by its $A_k$-alignments if $n\geq
2k+3$. By closing said gap, we also ensure that our result is optimal.
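
The definition of an $A_k$-alignment given above can be illustrated with a small brute-force sketch. It is not the authors' method: it assumes trees are encoded as nested 2-tuples of integer leaf labels (a hypothetical encoding chosen here), roots the tree at an arbitrary edge, and uses the standard Fitch algorithm to count the substitutions a binary character requires. The function names `fitch_score`, `leaf_labels`, and `ak_alignment` are illustrative only, and the enumeration over all $2^n$ characters is feasible only for very small $n$.

```python
from itertools import product


def fitch_score(tree, character):
    """Fitch pass on a rooted binary tree given as nested 2-tuples.

    `character` maps each leaf label to a state (0 or 1).
    Returns (candidate state set at this node, substitutions below it).
    """
    if not isinstance(tree, tuple):  # a leaf: its state is fixed
        return {character[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], character)
    right_set, right_cost = fitch_score(tree[1], character)
    common = left_set & right_set
    if common:  # children agree on some state: no extra substitution
        return common, left_cost + right_cost
    # children disagree: one additional substitution is needed
    return left_set | right_set, left_cost + right_cost + 1


def leaf_labels(tree):
    """Collect all leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaf_labels(tree[0]) + leaf_labels(tree[1])


def ak_alignment(tree, k):
    """All binary characters needing exactly k substitutions on `tree`.

    Each character is returned as a tuple of states ordered by sorted leaf label.
    """
    labels = sorted(leaf_labels(tree))
    sites = set()
    for states in product((0, 1), repeat=len(labels)):
        character = dict(zip(labels, states))
        if fitch_score(tree, character)[1] == k:
            sites.add(states)
    return sites


# Two different trees on n = 5 leaves (hypothetical examples), compared for k = 1.
tree_a = (((0, 1), 2), (3, 4))
tree_b = (((0, 2), 1), (3, 4))
same_alignment = ak_alignment(tree_a, 1) == ak_alignment(tree_b, 1)
print(same_alignment)
```

For these two five-leaf trees and $k=1$, we have $n = 2k+3$, and the two $A_1$-alignments differ, which is consistent with the bound stated in the abstract. The sketch checks individual pairs only; it does not replace the characterization itself.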