Difference between revisions of "Ggtree"

====Export Plot to PDF====

<tt>ggsave</tt> cannot plot <tt>phylo</tt> objects (like <tt>tree</tt>) directly like <tt>ape</tt> can. You must first apply your <tt>ggtree</tt> function to your phylo object, and assign the result to a new variable. Let's call that variable <tt>tree_save</tt>:

 tree_save <- ggtree(tree, layout="circular",aes(color=group, linetype="solid")) +
 geom_tiplab2(size=3.5) +
 scale_colour_manual(values = palette) +
 geom_cladelabel(node=amphipyrinae_clade, label="Amphipyrinae", fontsize=6, offset=1.9, align=TRUE) +
 geom_cladelabel(node=stiriini_clade, label="Stiriini",fontsize=6, offset=2.1, align=TRUE)

Now you can export <tt>tree_save</tt> to a PDF:

 ggsave(filename="moth_tree.pdf", plot=tree_save, width=30, height=30)

If the layout of your tree just isn't quite what you wanted, go back and play around with the geoms and geom-like functions until the PDF is to your liking.

====Cite ggtree====

Remember to cite <tt>ggtree</tt> if you use it in a published work!

 citation("ggtree")

==Challenge!==

<tt>ggtree</tt> allows you to add images to a tree. [https://guangchuangyu.github.io/2018/03/annotating-phylogenetic-tree-with-images-using-ggtree-and-ggimage/ Here's] a vignette concerning how to do this. It's a little sparse, but see if you can figure out how to add two moth images to the tree: one to represent Stiriini, and one to represent Amphipyrinae. Use the commands below to download images to put on the tree. The yellow one is for Stiriini:

 curl -LO http://hydrodictyon.eeb.uconn.edu/people/plewis/courses/phylogenetics/labs/moth1.png
 curl -LO http://hydrodictyon.eeb.uconn.edu/people/plewis/courses/phylogenetics/labs/moth4.png

==Running ggtree on your Computer==

You will need to install the following packages:

 BiocInstaller
 Biostrings
 ape
 ggplot2
 ggtree
 phytools
 ggrepel
 stringr
 stringi
 abind
 treeio

The package <tt>BiocInstaller</tt> is special. You can think of it as a ''meta''-package, as it is used to handle the [https://www.bioconductor.org/install/#why-biocLite installation and interoperability] of a suite of closely related open-source bioinformatics packages.

Install BiocInstaller like so:

 source("https://bioconductor.org/biocLite.R")
 biocLite()

If the above code fails to install BiocInstaller, check out their [https://bioconductor.org/packages/release/bioc/html/BiocInstaller.html website] to see if they have updated instructions on how to install the package. You can, and probably should, install Bioconductor packages using BiocInstaller, and not through the regular <tt>install.packages("package_name")</tt> method. To install packages via Bioconductor:

 source("https://bioconductor.org/biocLite.R")
 biocLite("ape")

Alternatively, for packages that are on CRAN:

 install.packages("ape")

Or install multiple CRAN packages like so:

 install.packages(c("ape", "ggplot2"))

Now load each of the above packages like so:

 library("ape")

== Getting Help ==

The [https://groups.google.com/forum/#!forum/bioc-ggtree Google Group] for ggtree is fairly active. The lead author of <tt>ggtree</tt> chimes in regularly to answer people's questions -- just be sure you've read the documentation first!

Speaking of documentation, there is the [https://www.bioconductor.org/packages/release/bioc/manuals/ggtree/man/ggtree.pdf <tt>ggtree</tt> manual], and lots of [http://www.bioconductor.org/packages/3.7/bioc/vignettes/ggtree/inst/doc/ggtree.html vignettes] concerning how to do particular things in <tt>ggtree</tt>.

== References ==

Keegan, K, Lafontaine, JD, Wahlberg, N, Wagner, D (in review). "Towards Resolving Amphipyrinae (Lepidoptera, Noctuoidea, Noctuidae): a Massively Polyphyletic Taxon." Systematic Entomology. https://doi.org/10.1101/271478

Yu G, Smith D, Zhu H, Guan Y and Lam TT (2017). “ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data.” Methods in Ecology and Evolution, 8, pp. 28-36. doi: 10.1111/2041-210X.12628, http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12628/abstract.

Latest revision as of 20:38, 21 February 2019

EEB 5349: Phylogenetics

by Kevin Keegan

Goals

To introduce you to the R package ggtree for plotting phylogenetic trees.


Getting Started

This tutorial is written with the cluster user in mind, but feel free to perform it with your own local version of R (>=3.4). There are instructions at the end of this tutorial on how to get your local version of R set up for this exercise.

Get Situated on the Cluster

Log onto the cluster like normal but with an added flag to allow for any graphics to be displayed on your computer.

 ssh username@bbcsrv3.biotech.uconn.edu -Y

Be sure to get off the head node to avoid litigation and subsequent incarceration:

 qlogin

Navigate to the folder you want to be working in for the R portion of the lab and download the tree file we'll be working with:

curl -OL http://hydrodictyon.eeb.uconn.edu/people/plewis/courses/phylogenetics/labs/moths.txt

For more information on the curl command and the options you can use with it, consult Wikipedia (https://en.wikipedia.org/wiki/CURL).
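If you would rather fetch the file from inside R than with curl, something like the following sketch should do the same job. The helper name fetch_if_missing is made up for this example; download.file comes with base R (the utils package).

```r
# Hypothetical helper: download a file only if it is not already present.
fetch_if_missing <- function(url, dest = basename(url)) {
  if (file.exists(dest)) {
    message("Using existing copy of ", dest)
    return(invisible(FALSE))   # nothing was downloaded
  }
  utils::download.file(url, destfile = dest, mode = "wb")
  invisible(TRUE)              # a download was attempted
}

# Usage (needs network access, so it is left commented out here):
# fetch_if_missing("http://hydrodictyon.eeb.uconn.edu/people/plewis/courses/phylogenetics/labs/moths.txt")
```

Skipping the download when the file already exists means you can re-run your script without hitting the server every time.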

Start R and Load Packages

See what versions of R are available:

module avail

Load R version 3.4.4:

module load R/3.4.4

Start R:

R

You'll need to load the following packages:

BiocInstaller
Biostrings
ape
ggplot2
ggtree
phytools
ggrepel
stringr
stringi
abind
treeio

You can load packages like so:

library("BiocInstaller")

To make it easier, you can load the easypackages library,

 library("easypackages")

and then load all the libraries at once with this command:

libraries("BiocInstaller","Biostrings","ape","ggplot2","ggtree","phytools","ggrepel","stringr","stringi","abind","treeio")
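Before loading everything, it can help to check which of these packages are actually installed. The following is only a sketch using base R; the variable names are chosen for this example.

```r
# Packages this tutorial loads (list copied from above).
pkgs <- c("BiocInstaller", "Biostrings", "ape", "ggplot2", "ggtree", "phytools",
          "ggrepel", "stringr", "stringi", "abind", "treeio")

# requireNamespace() returns TRUE/FALSE instead of stopping with an error,
# so it is safe to call on packages that may be absent.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of anything that still needs installing (character(0) if none).
missing_pkgs <- names(installed)[!installed]
print(missing_pkgs)
```

Anything that shows up in missing_pkgs needs to be installed before the libraries command above will succeed.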

Read in the Tree File

We're dealing with a tree in the Newick file format which the function read.newick from the package treeio can handle:

tree <- read.newick("moths.txt")

R can handle more than just Newick-formatted tree files. To see which other file formats produced by various phylogenetic software R can handle, check out treeio (https://bioconductor.org/packages/release/bioc/html/treeio.html). The functionality within treeio used to be part of the ggtree package itself, but the authors recently split ggtree in two, with one part (ggtree) handling mostly plotting and the other part (treeio) handling mostly file input/output operations.
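If you haven't seen the Newick format before, here is a small sketch of what it looks like. It uses a made-up four-taxon tree and ape's read.tree with its text argument (not treeio's read.newick), just so you can parse a string without needing a file.

```r
library(ape)

# A made-up four-taxon tree in Newick format, with branch lengths.
newick <- "((A:1,B:1):1,(C:1,D:1):1);"
toy <- read.tree(text = newick)

Ntip(toy)       # number of tips
toy$tip.label   # tip names, in the order they appear in the string
```

Parentheses group sister lineages, the numbers after the colons are branch lengths, and the semicolon ends the tree.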

Let's quickly plot the tree to see what it looks like using the plot function from the ape package:

plot(tree)

Notice the tree has all of its tips labeled. It's also a little cramped. You can expand the plot window to try to get the tree to display more legibly. We'll eventually use the function ggsave to control the dimensions of the plot when we finally export it to a PDF file. Don't worry about getting it to display well at the moment.

Now plot the tree using the ggtree package:

ggtree(tree)

What happened to our tree!? The plot function from the ape package plotted the tree with tip labels, but ggtree plotted just the bare bones of the tree. ggtree by default plots almost nothing, assuming you will add what you want to your tree plot. The grammar/logic of ggtree is meant to model that of ggplot2 and not the R language in general. The syntax of ggtree/ggplot2 makes them easily extendable and particularly useful for graphics, but is by no means intuitive to someone used to R and plotting trees using ape.

Adding/Altering Tree Elements with Geoms and Geom-Like Functions

ggtree has a variety of functions available to you that allow you to add different elements to a tree. Many of them have the prefix "geom_" and are collectively referred to as geoms. We'll only go over some of them. You start with a bare-bones tree and add elements to the tree, function by function, until you get the tree looking like you want it to. You'll see as we progress through this tutorial that visualizing trees in ggtree is a truly additive process.
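To make the "additive" idea concrete, here is a minimal sketch on a made-up toy tree (not the moth tree). The plot is stored in a variable and one layer is added at a time with +.

```r
library(ape)
library(ggtree)

toy <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")  # made-up tree

p <- ggtree(toy)            # bare branches only
p <- p + geom_tiplab()      # add tip labels
p <- p + geom_nodepoint()   # add a point at each internal node
print(p)
```

Each + returns a new plot object, so you can stop after any step and look at what you have so far.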

Tip Labels

OK, this tree would be more useful with tip labels. Let's add them using geom_tiplab:

ggtree(tree) + geom_tiplab()

This tree is a little crowded. You can expand the graphics window vertically to get it all to fit, but it might be better to do a circular tree:

ggtree(tree, layout="circular")

OK that's a bit easier to work with. Those tip labels are nice but a little big. geom_tiplab has a bunch of arguments that you can play around with, including one for the text size. You can read more about the available arguments for a given function in the ggtree manual. Plot the tree again but with smaller labels:

ggtree(tree, layout="circular") + geom_tiplab2(size=3.5)

Notice we are using geom_tiplab2 and not geom_tiplab to show labels on the circular tree. Don't ask me why there are two different tip label geoms for different tree layouts :)

The tree is still a little crowded, but at this point just play around with the size of the graphics window so you can work with it. We'll finalize how the tree looks later on using the ggsave function.

Clade Colors

In order to label clades, we need to tell ggtree which nodes subtend each clade we want to label. Just like with the plot function in ape, you can plot a tree with node numbers, see which nodes subtend the clade of interest and then tell ggtree the nodes that define the clades you want to label. Another way to get your node of interest is to use the findMRCA function (find most recent common ancestor) from the phytools package. We will pass the function two tip labels as arguments that define each clade of interest. In their study, Keegan et al (in review) found the Amphipyrinae (as currently classified taxonomically) is polyphyletic -- astoundingly polyphyletic. Let's color two clades: one for what they found to be true Amphipyrinae, and one for a tribe (Stiriini) currently classified taxonomically in Amphipyrinae, that they show to be far removed phylogenetically and thus has no business being classified within Amphipyrinae.

amphipyrinae_clade <- findMRCA(tree, c("*Redingtonia_alba_KLKDNA0031","MM01162_Amphipyra_perflua"))
stiriini_clade <- findMRCA(tree, c("*Chrysoecia_scira_KLKDNA0002","*Annaphila_diva_KLKDNA0180"))
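To see what kind of value those two calls return, here is a sketch on a made-up toy tree. It uses ape's getMRCA rather than phytools' findMRCA, purely to keep the example small; for this purpose the two play the same role.

```r
library(ape)

toy <- read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")  # made-up tree

# Node number of the most recent common ancestor of tips A and B.
ab_node <- getMRCA(toy, c("A", "B"))

# Internal nodes are numbered after the tips, so this is larger than Ntip(toy).
ab_node > Ntip(toy)

# The clade subtended by that node contains exactly the tips we asked about.
extract.clade(toy, ab_node)$tip.label
```

The result is just a single node number, which is what the clade-related functions later in this section expect.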

You can't (as far as I know) tell ggtree directly, as in ape, that the lineages descending from a given node should all be a certain color. What we need to do is define a group that consists of the clades we want colored, and tell ggtree to color the tree according to that group.

tree <- groupClade(tree, node=c(amphipyrinae_clade, stiriini_clade), group_name = "group")

In the above line of code, we apply the groupClade function to the object tree. We are not overwriting tree and making it consist of only the Amphipyrinae and Stiriini clades, just defining clades within tree. If you were to execute ggtree(tree, layout="circular") + geom_tiplab2(size=3.5) now, the tree would still look the same. We need to amend the command to tell it to style the tree by the grouping of clades we just made, called "group":

ggtree(tree, layout="circular",aes(color=group, linetype="solid")) +
geom_tiplab2(size=3.5)

As you can see the tree gets colored according to some default color scheme. We can define our own color scheme. Let's call it "palette":

palette <- c("#000000", "#009E73","#e5bc06")

The values in palette are colors written as hexadecimal values. You can Google one of these hexadecimal values and a little interactive hexadecimal color picker will pop up. Feel free to pick two colors of your choosing to use in the palette -- but leave #000000 as it is. When you're designing a figure for publication, be sure to consider how easily your colors can be distinguished from each other by colorblind folks.
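If you want to see what a hexadecimal color actually encodes without leaving R, base R can decode it for you. This is only a sketch; palette_hex is a name chosen for this example so it doesn't clash with anything you've already defined.

```r
# The palette from above; "palette_hex" is just a name chosen for this example.
palette_hex <- c("#000000", "#009E73", "#e5bc06")

# col2rgb() returns one column per color with red/green/blue rows (0-255).
rgb_values <- grDevices::col2rgb(palette_hex)
print(rgb_values)

# Going the other way: build a hex string from red, green and blue values.
grDevices::rgb(0, 158, 115, maxColorValue = 255)
```

Each pair of hex digits is one channel, so #009E73 means red 0, green 158 and blue 115.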

Now let's amend the ggtree command and tell it to use the colors we defined:

ggtree(tree, layout="circular",aes(color=group, linetype="solid")) + 
geom_tiplab2(size=3.5) + 
scale_colour_manual(values = palette)

The order in which clades are colored is determined by the order of clades in the groupClade command. Every lineage in the tree not within a defined clade (i.e. outside both amphipyrinae_clade and stiriini_clade) is automatically colored according to the first palette value. The first defined clade (amphipyrinae_clade) is colored according to the second palette value, and the second defined clade (stiriini_clade) is colored according to the third palette value.

Clade Labels

Let's add some labels to the two clades. It's relatively straightforward now that we've already defined the subtending nodes:

ggtree(tree, layout="circular",aes(color=group, linetype="solid")) + 
geom_tiplab2(size=3.5) + 
scale_colour_manual(values = palette) +
geom_cladelabel(node=amphipyrinae_clade, label="Amphipyrinae") +
geom_cladelabel(node=stiriini_clade, label="Stiriini")

OK we should move those labels so they're not directly over the tree:

ggtree(tree, layout="circular",aes(color=group, linetype="solid")) + 
geom_tiplab2(size=3.5) + 
scale_colour_manual(values = palette) +
geom_cladelabel(node=amphipyrinae_clade, label="Amphipyrinae", fontsize=6, offset=1.9, align=TRUE) +
geom_cladelabel(node=stiriini_clade, label="Stiriini",fontsize=6, offset=2.1, align=TRUE)

It would also look better if we could move the text "Stiriini" a little so it's not directly over the line. Use the offset.text argument of geom_cladelabel to move the text relative to the line.

You might have noticed that adding labels caused the rest of the tree to squish together. ggtree will try to fit everything into whatever size graphics window you have open. Try playing around with expanding and contracting the graphics window to see this functionality in action. Don't worry about getting everything to display perfectly in the graphics window, because we will use the function ggsave to create a PDF -- with definable dimensions -- to control how big the plot is, and thus how the tree looks with its many elements. You may wish to go back and change some of the tree elements after seeing your figure in PDF form.

Node Labels

Let's add some node labels. You can add labels that show the number of the node, but what you would probably like to do is show nodal support values (e.g. bootstraps) which are stored as node labels. We can display the node labels using geom_label.

ggtree(tree, layout="circular",aes(color=group, linetype="solid")) + 
geom_tiplab2(size=3.5) + 
scale_colour_manual(values = palette) +
geom_cladelabel(node=amphipyrinae_clade, label="Amphipyrinae", fontsize=6, offset=1.9, align=TRUE) +
geom_cladelabel(node=stiriini_clade, label="Stiriini",fontsize=6, offset=2.1, align=TRUE) +
geom_label(aes(label=label))

You should see A LOT of node labels appear. They get redrawn when you change the size of the graphics window which is quite mesmerizing to watch. Let's subset the node labels in order to just show the ones we want and reduce some of the clutter. We'll first create a dataframe from the data within tree:

q <- ggtree(tree)
d <- q$data

First let's select only internal nodes (we don't need to show the leaf node labels, as we've already done that with geom_tiplab2):

d <- d[!d$isTip,]

Now let's get rid of the root node, which is the one row in the ggtree data whose parent is itself:

d <- d[d$parent != d$node,]

And finally keep only the node labels greater than 75:

subset_labels <- d[as.double(d$label) > 75,]
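
One thing to watch for: as.double() returns NA for any label that is empty or not a number, and indexing a data frame with NA produces a row filled with NA. The sketch below shows a defensive variant on a made-up data frame (the values are invented for illustration and only mimic the columns used above). Wrapping the comparison in which() drops those rows.

```r
# Toy stand-in for the internal-node rows of q$data; values are invented.
d_toy <- data.frame(
  node  = 5:9,
  isTip = FALSE,
  label = c("", "100", "62", "88", "75"),
  stringsAsFactors = FALSE
)

# Empty or non-numeric labels become NA; suppressWarnings() silences any
# coercion warning that non-numeric text would trigger.
support <- suppressWarnings(as.double(d_toy$label))

# Plain logical indexing keeps an all-NA row wherever the comparison is NA.
naive <- d_toy[support > 75, ]

# which() returns only the positions that are TRUE, so NA rows are dropped.
safe <- d_toy[which(support > 75), ]

print(naive)
print(safe)
```

The same which() wrapper can be applied to d when building subset_labels if your tree has internal nodes without support values.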

Note that the object tree still has all of its labels. All we did was make a "copy" of tree called q (a ggtree plot object), and then we created a subset of the data in q called d, which we filtered further into subset_labels. Before, when we plotted the tree with node labels, we didn't specify which ones to label, so ggtree labeled all of them. Now alter your geom_label call so that it displays only the subset of node labels you just created. Right now the only argument to geom_label that we are using is aes. Look in the ggtree manual for the argument that lets you specify the data passed to geom_label.

Scale Bar and Title

Try adding a scale bar using the scale bar geom. I've added in some of the available arguments:

geom_treescale(x=2,y=1,fontsize=5,linesize=1,offset=0.5)

Add a title using ggtitle. Use it just like you would a geom:

ggtitle("This is a Title")

Export Plot to PDF

ggsave cannot plot phylo objects (like tree) directly like ape can. You must first apply your ggtree function to your phylo object, and assign the result to a new variable. Let's call that variable tree_save:

tree_save <- ggtree(tree, layout="circular",aes(color=group, linetype="solid")) + 
geom_tiplab2(size=3.5) + 
scale_colour_manual(values = palette) +
geom_cladelabel(node=amphipyrinae_clade, label="Amphipyrinae", fontsize=6, offset=1.9, align=TRUE) +
geom_cladelabel(node=stiriini_clade, label="Stiriini",fontsize=6, offset=2.1, align=TRUE)

Now you can export tree_save to a PDF. The width and height are in inches by default:

ggsave(filename="moth_tree.pdf", plot=tree_save, width=30, height=30)

If the layout of your tree just isn't quite what you wanted, go back and play around with the geoms and geom-like functions until the PDF is to your liking.

Cite ggtree

Remember to cite ggtree if you use it in a published work!

citation("ggtree")

Challenge!

ggtree allows you to add images to a tree. Here's a vignette concerning how to do this. It's a little sparse, but see if you can figure out how to add two moth images to the tree: one to represent Stiriini, and one to represent Amphipyrinae. Use the commands below to download images to put on the tree. The yellow one is for Stiriini:

curl -LO http://hydrodictyon.eeb.uconn.edu/people/plewis/courses/phylogenetics/labs/moth1.png
curl -LO http://hydrodictyon.eeb.uconn.edu/people/plewis/courses/phylogenetics/labs/moth4.png

Running ggtree on your Computer

You will need to install the following packages:

BiocInstaller
Biostrings
ape
ggplot2
ggtree
phytools
ggrepel
stringr
stringi
abind
treeio

The package BiocInstaller is special. You can think of it as a meta-package, as it is used to handle the installation and interoperability of a suite of closely related open-source bioinformatics packages.

Install BiocInstaller like so:

source("https://bioconductor.org/biocLite.R")
biocLite()

If the above code fails to install BiocInstaller, check out their website to see if they have updated instructions on how to install the package. You can, and probably should, install Bioconductor packages using BiocInstaller, and not through the regular install.packages("package_name") method. To install packages via Bioconductor:

source("https://bioconductor.org/biocLite.R")
biocLite("ape")

Alternatively:

install.packages("ape")

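Or install multiple CRAN packages at once (Bioconductor-only packages such as Biostrings, ggtree, and treeio are not on CRAN and need biocLite instead):

install.packages(c("ape", "phytools"))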

Now load each of the above packages with library(), for example:

library("ape")
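
Before loading, it can help to check which of the packages listed above are actually installed. The following is a minimal sketch using only base R. The variable names are illustrative, and it does not install anything itself.

```r
# Packages listed above, minus BiocInstaller (it is only the installer).
pkgs <- c("Biostrings", "ape", "ggplot2", "ggtree", "phytools",
          "ggrepel", "stringr", "stringi", "abind", "treeio")

# requireNamespace() returns FALSE rather than stopping when a package is absent.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  # Report what still needs installing instead of failing part-way through.
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach everything in one go once all packages are present.
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```

Anything reported as missing can be passed to biocLite(missing_pkgs). On R 3.5 and later, where Bioconductor has replaced biocLite with the BiocManager package, the equivalent is install.packages("BiocManager") followed by BiocManager::install(missing_pkgs).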

Getting Help

The Google Group for ggtree is fairly active. The lead author of ggtree chimes in regularly to answer people's questions -- just be sure you've read the documentation first!

Speaking of documentation, there is the ggtree manual, along with lots of vignettes on how to do particular things in ggtree.

References

Keegan, K, Lafontaine, JD, Wahlberg, N, Wagner, D (in review). "Towards Resolving Amphipyrinae (Lepidoptera, Noctuoidea, Noctuidae): a Massively Polyphyletic Taxon." Systematic Entomology. https://doi.org/10.1101/271478

Yu G, Smith D, Zhu H, Guan Y and Lam TT (2017). “ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data.” Methods in Ecology and Evolution, 8, pp. 28-36. doi: 10.1111/2041-210X.12628, http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12628/abstract.