Problem 1 - Unweighted supertree estimation
Input:
A set of M (> 2) phylogenetic trees depicting evolutionary relationships for a set of N taxa. Individual
trees cover overlapping taxa subsets from the taxa set. These trees depict conflicting evolutionary relations
among these taxa subsets.
Output:
A supertree displaying the evolutionary relationships for all N taxa, such that
the consensus relations among individual taxa subsets are preferably reflected in the
supertree. Here, the branch lengths (weights) of the input trees are not used for supertree
estimation. So, the problem is termed as unweighted supertree estimation.
Existing studies:
Existing studies either use matrix or graph based representation of input tree topologies, or decompose
the trees into fixed or variable sized subtrees. So far, decomposition of input trees upto
a minimum of triplets (subtrees of 3 taxa) have been proposed in literature. Such a method
incurs cubic time and space complexity.
Our implementations (two different methods):
We have implemented a supertree method COSPEDTREE which uses evolutionary relations
among individual couplets (taxa pair) to generate the supertree. The method uses cubic time complexity,
and most importantly, quadratic space complexity, thus efficient for use in large scale biological datasets.
Link to main page of COSPEDTREE
We have extended the above method to generate COSPEDTREE-II , which also uses couplet relations
to derive the final supertree. However, the output supertree is much better resolved compared to the
earlier version. The running time of the new approach is also considerably lower.
Problem 2 - Weighted Suprtree estimation
Input:
Here, the branch lengths (weights) of the input trees are present (input trees are weighted)
and they are used for supertree estimation.
Output:
Output supertree quests for satisfying the following two objectives:
Existing studies:
Our implementation:
We have extended our previous unweighted supertree algorithm COSPEDTREE to incorporate branch length information. It is a two stage approach, described as follows.
Link to main page of COSPEDTREEBL
Problem Overview
Gene trees: Phylogenetic trees created by sampling a gene from a group of N taxa,
and subsequently employing phylogenetic reconstruction methods.
Input: A set of M (> 1) gene trees covering a set of N taxa. These
gene trees have conflicting topologies and branch lengths, due to independent site specific
evolution of individual gene sequences. Even the consensus gene tree topology may not
be the true species tree topology.
Reason for discordance: Mainly for one of the
following three evolutionary processes: 1) Horizontal gene transfer (HGT),
2) Gene duplication / loss, and 3) incomplete lineage sorting (ILS)
or deep coalescence (DC).
Objective: To infer a species tree from this set of incongruent gene trees.
Approach 1 - Supertree based species tree estimation
We have proposed a species tree estimation method COSPEDSPEC,
which models ILS as the sole reason between gene tree / species tree incongruence.
COSPEDSPEC has cubic time complexity and quadratic space complexity, which makes it useful in
large biological datasets.
Link to main page of COSPEDSpec
Approach 2 - Species tree estimation using couplet based internode and excess gene count
We have proposed another species tree estimation method IDXL,
which models ILS as the sole reason between gene tree / species tree incongruence.
IDXL exhibits improved performance compared to the NJst approach, especially for
biological datasets (in terms of higher bootstrap support values on the recommended clades).
Approach 3 - Species tree estimation using couplet based accumulated coalescence rank and excess gene count
We have proposed two more species tree estimation methods named AcRNJ and AcRNJXL,
which models ILS as the sole reason between gene tree / species tree incongruence.
Both AcRNJ and AcRNJXL exhibit improved performance compared to STAR, especially for
the biological datasets.