Documentation

Clustering

All protein sequences were compared against each other with diamond (https://github.com/bbuchfink/diamond). The diamond output was then processed with SiLiX (http://lbbe.univ-lyon1.fr/SiLiX) to build protein clusters.
diamond v0.8.29.91 | by Benjamin Buchfink
diamond blastp -d diamdb -q input.fasta -o output_diamond -p 10 --seq 10000 --sensitive

silix version 1.2
silix -f FAM input.fasta output_diamond > output_silix
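
As a quick check on the clustering, the number of sequences per family can be tallied from the SiLiX output. This is a minimal sketch, not part of the original pipeline: it assumes that output_silix has two whitespace-separated columns (family identifier, then sequence identifier), and the function name family_sizes is hypothetical.

```shell
#!/usr/bin/env bash
# Count the number of sequences per family in a SiLiX output file.
# Assumption: column 1 is the family ID and column 2 the sequence ID.
family_sizes() {
    # Tally column 1, then sort so the output order is reproducible.
    awk '{ n[$1]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Usage (output_silix is the file produced by the silix command above):
# family_sizes output_silix
```

Each output line gives a family identifier followed by its size, separated by a tab.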

Sequence Alignments

Alignments were computed for each cluster with MAFFT version 7 (https://mafft.cbrc.jp/alignment/software).
MAFFT v7.266
mafft inputfile > outputfile
The alignments were then post-processed with HmmCleaner.pl version 0.180750 (https://metacpan.org/dist/Bio-MUST-Apps-HmmCleaner/view/bin/HmmCleaner.pl) using Singularity (https://gitlab.in2p3.fr/penel/docker-hmmcleaner).
Finally, the protein alignments and the CDS sequences were used to build CDS alignments.
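
The MAFFT step runs once per cluster, so it can be wrapped in a loop. The sketch below is an illustration only: the one-file-per-cluster layout (<family>.fasta), the directory arguments, the .aln output suffix and the MAFFT variable are assumptions, not details of the original pipeline. The HmmCleaner and CDS steps are not covered.

```shell
#!/usr/bin/env bash
# Align every cluster FASTA file of a directory with MAFFT (default options).
# Assumptions: one FASTA file per cluster, named <family>.fasta; the MAFFT
# variable only makes the executable name overridable.
MAFFT="${MAFFT:-mafft}"

align_clusters() {
    local in_dir="$1" out_dir="$2" fasta fam
    mkdir -p "$out_dir"
    for fasta in "$in_dir"/*.fasta; do
        [ -e "$fasta" ] || continue           # directory has no .fasta file
        fam="$(basename "$fasta" .fasta)"
        # Same call as "mafft inputfile > outputfile" above.
        "$MAFFT" "$fasta" > "$out_dir/$fam.aln"
    done
}

# Usage (directory names are hypothetical):
# align_clusters clusters alignments
```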

Species Tree

We selected all families containing exactly one gene for each of the 26 ciliate genomes (N=36 single-copy gene families present in 22 Paramecium + 4 Tetrahymena). Based on protein sequence alignments, we inferred the maximum-likelihood tree using the edge-linked partition model in IQ-TREE v1.6 (http://www.iqtree.org/; Chernomor et al., 2016; Nguyen et al., 2015).
Each gene was treated as a separate partition:
iqtree version 1.6
iqtree -spp partition.nexus
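
The content of partition.nexus is not shown here. One possible layout, given as a sketch under assumptions, is a NEXUS sets block with one charset per gene alignment file, where "*" selects every site of that file. The function name write_partitions and the alignment file names are hypothetical, and no per-partition model is set.

```shell
#!/usr/bin/env bash
# Write a NEXUS partition file with one charset per gene alignment.
# Assumptions: one protein alignment file per single-copy family, passed as
# arguments; "*" means that all sites of the file belong to the partition.
write_partitions() {
    local i=0 aln
    printf '%s\n' "#nexus"
    printf '%s\n' "begin sets;"
    for aln in "$@"; do
        i=$((i + 1))
        printf '    charset part%s = %s: *;\n' "$i" "$aln"
    done
    printf '%s\n' "end;"
}

# Usage (alignment names are hypothetical):
# write_partitions gene1.aln gene2.aln > partition.nexus
```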

Reconciled Phylogenetic Trees

For each gene family with at least 10 species, a gene tree was calculated with iqtree2 (IQ-TREE multicore version 2.2.0 COVID-edition, http://www.iqtree.org/) using a codon model:
iqtree2 -s inputalinfile -m TESTNEW -madd LG4M,LG4X -st CODON6 -alrt 1000
Reconciled gene trees (with duplication, loss and transfer events) were then computed with GeneRax (https://github.com/BenoitMorel/GeneRax) from each family's protein alignment and gene tree, together with the species tree (see above). Families were processed in blocks of 100 alignments.

generax -f input_famfile -r UndatedDTL -s input_rooted_species_tree -p directory_output
cat input_famfile
[FAMILIES]
- FAM001
starting_gene_tree = pathToTree/FAM001.newick
alignment = pathToAln/FAM001.aln
subst_model = GTR+G
- FAM002
starting_gene_tree = pathToTree/FAM002.newick
alignment = pathToAln/FAM002.aln
subst_model = GTR+G
- FAM003
starting_gene_tree = pathToTree/FAM003.newick
alignment = pathToAln/FAM003.aln
subst_model = GTR+G
.
.
.
- FAM100
starting_gene_tree = pathToTree/FAM100.newick
alignment = pathToAln/FAM100.aln
subst_model = GTR+G
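
A families file with this layout can be generated rather than written by hand. The following is a minimal sketch, assuming that trees are named <family>.newick, that alignments are named <family>.aln, and that every family uses GTR+G as in the listing above. The function name write_famfile is hypothetical.

```shell
#!/usr/bin/env bash
# Print a GeneRax families file for the family IDs given as arguments.
# Assumptions: $1 is the tree directory, $2 the alignment directory, and all
# families share the same substitution model (GTR+G).
write_famfile() {
    local tree_dir="$1" aln_dir="$2" fam
    shift 2
    printf '%s\n' "[FAMILIES]"
    for fam in "$@"; do
        printf -- '- %s\n' "$fam"
        printf 'starting_gene_tree = %s/%s.newick\n' "$tree_dir" "$fam"
        printf 'alignment = %s/%s.aln\n' "$aln_dir" "$fam"
        printf '%s\n' "subst_model = GTR+G"
    done
}

# Usage:
# write_famfile pathToTree pathToAln FAM001 FAM002 FAM003 > input_famfile
```

To work in blocks of 100 alignments, the function can be called once per block with the corresponding family identifiers.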