Problem 1 - Unweighted supertree estimation
Input:
A set of M (> 2) phylogenetic trees depicting evolutionary relationships for a set of N taxa. Individual trees cover overlapping taxa subsets from the taxa set. These trees depict conflicting evolutionary relations among these taxa subsets.
Output:
A supertree displaying the evolutionary relationships for all N taxa, such that the consensus relations among individual taxa subsets are preferably reflected in the supertree. Here, the branch lengths (weights) of the input trees are not used for supertree estimation. So, the problem is termed as unweighted supertree estimation.
Existing studies:
Existing studies either use matrix or graph based representation of input tree topologies, or decompose the trees into fixed or variable sized subtrees. So far, decomposition of input trees upto a minimum of triplets (subtrees of 3 taxa) have been proposed in literature. Such a method incurs cubic time and space complexity.
Our implementations (two different methods):
We have implemented a supertree method COSPEDTREE which uses evolutionary relations among individual couplets (taxa pair) to generate the supertree. The method uses cubic time complexity, and most importantly, quadratic space complexity, thus efficient for use in large scale biological datasets.
Link to main page of COSPEDTREE
We have extended the above method to generate COSPEDTREE-II , which also uses couplet relations to derive the final supertree. However, the output supertree is much better resolved compared to the earlier version. The running time of the new approach is also considerably lower.
Link to main page of COSPEDTREE-II
Problem 2 - Weighted Suprtree estimation
Input:
Here, the branch lengths (weights) of the input trees are present (input trees are weighted) and they are used for supertree estimation.
Output:
Output supertree quests for satisfying the following two objectives:
Existing studies:
Our implementation:
We have extended our previous unweighted supertree algorithm COSPEDTREE to incorporate branch length information. It is a two stage approach, described as follows.
Link to main page of COSPEDTREEBL
Problem Overview
Gene trees: Phylogenetic trees created by sampling a gene from a group of N taxa, and subsequently employing phylogenetic reconstruction methods.
Input: A set of M (> 1) gene trees covering a set of N taxa. These gene trees have conflicting topologies and branch lengths, due to independent site specific evolution of individual gene sequences. Even the consensus gene tree topology may not be the true species tree topology.
Reason for discordance: Mainly for one of the following three evolutionary processes: 1) Horizontal gene transfer (HGT), 2) Gene duplication / loss, and 3) incomplete lineage sorting (ILS) or deep coalescence (DC).
Objective: To infer a species tree from this set of incongruent gene trees.
Approach 1 - Supertree based species tree estimation
We have proposed a species tree estimation method COSPEDSPEC, which models ILS as the sole reason between gene tree / species tree incongruence.
Link to main page of COSPEDSpec
Approach 2 - Species tree estimation using couplet based internode and excess gene count
We have proposed another species tree estimation method IDXL, which models ILS as the sole reason between gene tree / species tree incongruence.
Approach 3 - Species tree estimation using couplet based accumulated coalescence rank and excess gene count
We have proposed two more species tree estimation methods named AcRNJ and AcRNJXL, which models ILS as the sole reason between gene tree / species tree incongruence.