Phylogenetic Analysis Assignment:

Goals: get experience in phylogenetic analysis by conducting a cladistic analysis; learn to identify convergent evolution and homology and to trace the evolutionary history of characteristics given a phylogeny; write part of a lab report in standard scientific paper format.

Background: The evolutionary relationships among different species are called phylogenetic relationships.  Phylogenetic relationships reflect the fact that, as speciation has occurred during the history of life, some species were derived from a common ancestral species recently, while others shared a common ancestor only in the distant past.  "Closely related" in an evolutionary, phylogenetic sense means that species were derived recently from a common ancestor -- not long ago, the ancestors of these different species belonged to the same species.

Systematics is the study of the phylogenetic (evolutionary) relationships among organisms.  Systematics is fundamental to evolutionary biology, since it is through understanding the evolutionary relationships among organisms that we can observe evolutionary history, and, by observing the history, make inferences about processes of evolution.  Systematics also provides information for classifying organisms into the standard levels of the Linnaean hierarchy (Kingdom, Phylum, Class, Order, Family, Genus, species).  Most groups of organisms are still not adequately studied systematically and may be incorrectly classified.

Systematics is generally based on the hierarchical pattern of homology: characteristics are shared among species suggesting that they evolved from an ancestor with those characteristics, and some species share many characteristics in common suggesting that they came recently from an ancestor with those characteristics.  The specific way that this information is used in most modern phylogenetic studies is through a method called cladistics.  The basic idea behind cladistics is that the similarities (homology) among species that will provide evidence of relationship within a group are those that have been derived within that group.  Similarities that were primitive to the group -- that is, presumably present in the ancestor to the group -- do not provide evidence about relationships within the group.  So studies of systematics should be based on derived character states.  Cladistics is the study of systematics based on derived character states.

To understand what primitive and derived character states mean, consider an example.  Suppose you want to know the evolutionary relationships among four game birds: a grouse, a ptarmigan, a quail, and a pheasant.  You observe that the grouse and the ptarmigan have feathers on their legs, while quail and pheasants have scales on their legs.  Thus, there are similarities among some species suggesting common ancestry.  However, scales on legs occur in all the birds related to these birds -- ducks, songbirds, etc. -- birds that other systematic studies have shown to be more distantly related to these four game birds than they are to each other.  This suggests that probably a distant ancestor to the game birds and all these other birds had legs with scales.  So having scaled legs cannot tell us anything about whether a quail and a pheasant are more closely related to each other than either is to a grouse or a ptarmigan.  The trait that is different, and presumably derived within the group, is having feathers on the legs.  This trait provides evidence that grouse and ptarmigan are closer evolutionary relatives than either is to a quail or a pheasant.

This example illustrates another aspect of most cladistic analyses.  To determine whether characters are primitive or derived, we can compare species in the group we are studying to another, more distantly related, species or group of species.  Traits that are found in the group and also in the distantly related species are probably primitive, and are not useful for phylogenetic analysis.  Traits that are found only within the group are probably derived, and ARE useful for phylogenetic analysis.  This method of determining which characters are primitive and which are derived is called outgroup comparison.  The species within the group you are interested in studying make up what is called the ingroup.  The more distantly related species (or group of species) that you use to identify which characters are derived is called the outgroup.
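The outgroup comparison described above can be sketched in a few lines of Python.  This is only an illustration under a simple assumption: any state that also occurs in the outgroup is treated as primitive, and any state found only in the ingroup is treated as derived.  The function name `polarize` and the way the leg coverings are coded are hypothetical; they just restate the game bird example.

```python
def polarize(ingroup_states, outgroup_states):
    """Label each ingroup species' state as 'primitive' or 'derived'.

    A state is treated as primitive if it also occurs in the outgroup;
    otherwise it is treated as derived (simple outgroup comparison).
    """
    primitive = set(outgroup_states.values())
    return {
        species: "primitive" if state in primitive else "derived"
        for species, state in ingroup_states.items()
    }


# Leg covering in the game bird example (hypothetical coding of the text).
ingroup = {"grouse": "feathers", "ptarmigan": "feathers",
           "quail": "scales", "pheasant": "scales"}
outgroup = {"duck": "scales", "songbird": "scales"}

polarity = polarize(ingroup, outgroup)
for species, label in polarity.items():
    print(species, label)
```

Here feathered legs come out as derived for the grouse and ptarmigan, which is the state that carries evidence about relationships within the group.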

While derived characteristics that are shared between species provide evidence that those species are closely related, they do not PROVE relationship.  It is possible for two species to independently evolve the same characteristics.  For example, birds and bats both have wings, but wings evolved independently in the two groups: bats are more closely related to other mammals than they are to birds, and birds are more closely related to alligators, lizards, and snakes than they are to mammals.  Independent evolution of the same characteristics in different species is called convergent evolution.  Convergent evolution does NOT provide evidence of phylogenetic relationship.  Traits that provide evidence of phylogenetic relationship are those that represent homology -- similarity due to common ancestry.

How do we determine whether the traits we are studying are similar because of homology or because of convergence?  In general, it is more likely that traits are similar because of common ancestry.  This is true because phylogenetic history is simply an extension of genetic history, going back very far in time.  Looking at genetics, we know that if two individuals have the same trait it is probably because an ancestor of theirs, at some point in time, had that trait.  The other possibility would be that the same mutation had occurred in both individuals.  Since mutation is rare, this is less likely.  For two species to develop the same traits independently, there must have been independent mutations leading to those traits in the two species.  Further, in each species the trait must then have independently become the common trait of that species.  This is possible, and has happened, but it is more likely that two species have the same traits because they were both derived from an ancestor with those traits.  As a result, if we look at many different traits, we should find that most reflect phylogeny, while a few may not.  To apply this principle in systematics, we study many different traits of the species whose phylogeny we want to determine, and we accept as the most likely phylogeny of the group the one that is supported by the most characters and therefore requires the smallest amount of convergent evolution.

We also need to evaluate our study by determining how well the different characteristics we've studied agree with each other about which possible phylogenetic tree is the best supported.  Systematists calculate a variety of different statistics about the tree.  One easy-to-calculate and widely used statistic is the consistency index (CI).  The consistency index measures the amount of homology versus convergent evolution in a tree, given the characters used to study the tree.  It is a proportion (so it ranges from 0 to 1), and the higher the consistency index, the lower the amount of convergent evolution, and the better the data support the tree.  It is calculated as:

 CI = minimum possible number of character state changes / actual number of character state changes
    = total number of derived character states for all characters / actual number of character state changes

The second form follows because each derived trait must evolve at least once on the tree, so the minimum possible number of character state changes is the total number of derived character states.  The actual number of character state changes is determined by looking at where on the tree these characters have apparently evolved to give modern species with the traits they have.

The actual number of character state changes for a tree is called the treelength.  The smaller the treelength, the closer the CI is to 1, and the better supported the hypothesis of phylogeny represented by the tree is.  As a rule of thumb, CI values under 0.6 indicate relatively poor support for the phylogeny, CI values between 0.6 and 0.8 indicate reasonably good support, and CI values above 0.8 indicate very good support.  Systematics thus works like other areas of science in that any possible way that species could be related to each other is a hypothesis.  From that hypothesis we would predict that related species share derived traits.  The hypothesis of phylogeny that is best supported is the one in which most of the traits we have studied reflect common ancestry, rather than independent evolution.  As with other areas of science, we can never prove our hypothesis to be true; we can only support or refute it with additional evidence.  As a result, a phylogeny that is developed in a systematic study is often called a "hypothesis of phylogeny" or a "hypothesis of relationship."
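The consistency index calculation and the rule of thumb given above can be sketched as a short Python example.  The function names and the numbers used (10 derived states, a treelength of 13) are hypothetical and chosen only for illustration; a CI of exactly 0.6 or 0.8 is assumed here to fall in the middle category.

```python
def consistency_index(num_derived_states, treelength):
    """CI = total number of derived character states / treelength."""
    if num_derived_states < 1:
        raise ValueError("need at least one derived character state")
    if treelength < num_derived_states:
        # every derived state must evolve at least once on the tree
        raise ValueError("treelength cannot be smaller than the number "
                         "of derived character states")
    return num_derived_states / treelength


def support_label(ci):
    """Apply the rule of thumb for interpreting a CI value."""
    if ci < 0.6:
        return "relatively poor"
    if ci <= 0.8:
        return "reasonably good"
    return "very good"


# Hypothetical study: 10 characters, one derived state each, treelength 13.
ci = consistency_index(10, 13)
print(round(ci, 2), support_label(ci))
```

With these made-up numbers the CI is 10 / 13, or about 0.77, which falls in the "reasonably good" range.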

Rules for the assignment: you will present your study in a brief (1-2 written pages plus one page with a figure and a table) statement written in the format of the results section of a scientific report.  You are encouraged to work in groups to identify characters and develop the phylogeny; it may be difficult for a single person to identify all the necessary characters.   The figure and table you use will be the same for all group members and must be checked, by the date given in the lecture syllabus, before you write your report.  You must write the written part of each paper individually.  Your report MUST NOT BE LONGER THAN 2 typed, double spaced pages when typed with at least 1" margins and a font size no smaller than 12 point.  You must e-mail the written part of the assignment to me (remember, be sure to keep a copy!)

You will present a figure and a table; these MUST be presented on the sheet in the lab manual.  The figure will show your best supported phylogeny and present the treelength and CI.

Methods: In this laboratory exercise, you will use an assigned group of organisms to learn to identify derived and primitive traits using outgroup comparison.  You will use the derived traits to develop a hypothesis of phylogeny, and you will determine which of the character states you have identified apparently show homology (and support your tree) and which show convergent evolution (and do not).

Characters and character states:  Systematists use some terminology to describe the primitive and derived characteristics used in a phylogeny.  A trait such as eye color or number of incisor teeth is called a character.  Each of these characters can come in several character states.  For example, the character of eye color could have the states blue, brown, red, and yellow within some group of species.  The character incisor number could have character states two, four, or six.

Phylogenetically informative character states: Evidence for phylogeny comes from derived character states that are shared among species.  To provide information about relationship, a derived character state must occur in at least two of the species being studied; if it occurs in only one, it does not show how that one species is related to any of the others, so it does not provide evidence for relationships.  Further, a derived state must not occur in all of the ingroup species.  If it does, it only shows that all of the ingroup species came from one ancestor, but not how different species within the ingroup are related.  To be phylogenetically informative, a character state must occur in at least two but not all of the ingroup species.
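The "at least two but not all" rule above can be written as a one-line check.  This is a minimal sketch that assumes each ingroup species is scored 1 if it has the derived state and 0 if it has the primitive state; the function name `is_informative` is hypothetical.

```python
def is_informative(ingroup_scores):
    """True if the derived state (1) occurs in at least two,
    but not all, of the ingroup species."""
    n_derived = sum(ingroup_scores)
    return 2 <= n_derived < len(ingroup_scores)


# Five ingroup species, scored 1 = derived, 0 = primitive.
print(is_informative([1, 1, 0, 0, 0]))  # shared by two species
print(is_informative([1, 0, 0, 0, 0]))  # found in only one species
print(is_informative([1, 1, 1, 1, 1]))  # found in every ingroup species
```

Only the first of these three example characters would count toward the informative characters needed for a study.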

Identifying primitive and derived character states in toucans:  You will use pictures of toucans available on the web to identify primitive and derived character states to study the phylogeny of a group of these toucans.  To determine primitive versus derived character states, you will compare the colors or patterns of different parts of the bodies of the toucans to an outgroup species, a kind of bird called a barbet.  Several phylogenetic studies have shown that barbets are close relatives of toucans, but that all toucans are more closely related to each other than they are to the barbets, so the barbets are appropriate outgroups for the toucans.

Different toucans will be assigned for this analysis for different semesters. A link to the list of toucans required for the current semester is given at the bottom of this page.

A link to the page of toucan pictures is also given at the bottom of this page.

The first step of your phylogenetic study is to identify ten phylogenetically informative characters in an assigned group of five toucans and one barbet to use as an outgroup.  Systematists describe primitive and derived states in various ways. For this study, you need to use a simple method in which you identify just one derived state for each character. Species without this derived state have the primitive state.  This means you need to be careful about how you decide on your characters.  Deciding on "head color" as a character might cause problems if you find some species have yellow heads, some purple heads, and some green heads: there would be more than one derived state.  You could, however, decide on "presence versus absence of a purple head" as your character.  If the outgroup does not have a purple head, then presence of a purple head would be the derived state; any toucan without a purple head would have the primitive state.  Once you have found ten phylogenetically informative characters in the toucans, enter this information in the table provided in your lab manual, as follows:

The next step is to determine the phylogeny of toucans best supported by these characteristics.  To do this, you will enter the data you have collected on toucans into a program on a web page.  This program is designed to help you find the best supported hypothesis or hypotheses of phylogeny for your data by allowing you to draw different trees easily, calculating a treelength for you, and allowing you to trace the evolution of different characters on your tree.  To prepare your data for entry into this program, you must put it in the format of the following made-up example dataset:

outgroup,     0,0,0,0,0,0,0,0,0,0
rat,          1,0,1,1,0,0,1,1,0,0
cat,          0,1,0,1,0,1,0,0,1,1
mouse,        1,0,1,0,0,0,1,0,0,0
bear,         0,1,0,0,1,1,0,1,1,1
dog,          0,1,0,0,1,1,0,1,0,1

Once you have your data in a word processing program in the format shown, you can copy it into the phylogeny drawing program (there is a link to that program at the bottom of this page).  At that point, you will use the phylogeny drawing program as described below to try to find the most likely phylogeny based on your data.
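If you want to check your data table before pasting it into the program, the format shown above can be read with a short Python sketch.  This is not the phylogeny drawing program itself; the function name `parse_matrix` is hypothetical, and the reading of 0 as the primitive state and 1 as the derived state is an assumption based on the outgroup row being all zeros.

```python
EXAMPLE = """\
outgroup,     0,0,0,0,0,0,0,0,0,0
rat,          1,0,1,1,0,0,1,1,0,0
cat,          0,1,0,1,0,1,0,0,1,1
mouse,        1,0,1,0,0,0,1,0,0,0
bear,         0,1,0,0,1,1,0,1,1,1
dog,          0,1,0,0,1,1,0,1,0,1
"""


def parse_matrix(text):
    """Read 'name, score, score, ...' lines into {name: [scores]}."""
    matrix = {}
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        name, *scores = [field.strip() for field in line.split(",")]
        matrix[name] = [int(score) for score in scores]
    # every species should be scored for the same number of characters
    if len({len(scores) for scores in matrix.values()}) > 1:
        raise ValueError("species have different numbers of characters")
    return matrix


matrix = parse_matrix(EXAMPLE)
print(len(matrix), "species,", len(matrix["outgroup"]), "characters")
```

A check like this catches a missing comma or a species with too few scores before the tree is drawn.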

Using the phylogeny drawing program:
 

  1. Type or copy data into the data input window at the bottom of the page.  Note that this window always starts out with a default made-up dataset for a group of species to show you the format for entering data (the made-up dataset is the one that is given above to show you this format.)  You can select this and delete it or copy or type something on top of it.  If you'd like to play with the program to see how it works before entering your toucan data, you can use this dataset.
  2. Once your data are entered in the data input window, click the "Read Extant Traits" button above the data input window.  The program should then draw a phylogenetic tree in the space above the data input window.  This tree is drawn in the order in which the species are entered in your data table.  Some of the branches will be shown in different colors (black, gray, and/or yellow) because the program automatically traces the evolution of your first character when it makes the initial tree.  This will be described more below.
  3. Note to the right of the tree that the treelength is calculated.  The box entitled "Tree Length" gives the treelength for the tree that is drawn based on ALL of the characters you entered.  The box entitled "Chosen Trait Tree Length" shows the number of times the character currently being traced evolves on your tree.  The character that is currently chosen is indicated in the box below the treelength boxes.
  4. Your goal is to find the tree with the smallest possible treelength for your data.  To do this, you can move branches on the tree.  When you move the mouse over a branch, it turns red.  If you click on that branch, it turns blue.  If you then click on another branch, the first branch you selected is moved to the point where you clicked.  There are two exceptions to this:
  5. You should use the characters, as traced on the tree, to help you find the best tree.  First, note how the character tracing works.  When you select a character to be traced, the tree shows which modern and ancestral species apparently had the primitive state and which had the derived state.  Primitive states are indicated in gray, derived in black.  Sometimes there is more than one equally likely way for a trait to have evolved; if this is the case some ancestral species might have had either the primitive or derived state.  Situations such as this are shown in yellow.
  6. Now that you know how the character tracing works, consider how to use it to help you find the best tree.  You should go through each character and try drawing the tree or trees supported by that character.  To do this, remember that sharing derived character states provides evidence that species are close relatives, so you should try drawing trees that group the species that share the derived character state for that character together.  Watch the treelength: if it decreases when you group species that share the derived state together, then you have progressed toward a shorter (more likely) tree.  If the treelength does not decrease, or increases, then the move you made did not help you find a better tree; the best thing to do in that case is to move the branch back where it was.  If you go through each of your characters and try moving the branches to make the tree supported by the derived state for that character, you should find as well supported a tree as is possible for your data. Once you have found the best supported tree, draw it on the page you will hand in, in your lab manual.
  7. Once you have the best tree for your data, look through the characters as they are traced on the tree again, and identify which show only homology (evolve just once and fully support the tree) and which show convergent evolution.  For your paper, you will be required to have this information.  You will also need to know the treelength of your shortest tree, and to calculate the consistency index, which is done by dividing the number of derived states for all your characters by the treelength.  Enter the treelength of the shortest tree and the consistency index in the spaces provided on the page you will hand in, in your lab manual.
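To see how a treelength can be counted for one particular tree, the steps above can be mirrored in a small Python sketch.  It uses Fitch parsimony, a standard way of counting the minimum number of changes on a fixed tree; whether the phylogeny drawing program uses this exact method is not stated here, so treat it as an assumption.  The function names, the nested-pair way of writing a tree, and the candidate tree itself are hypothetical; the data are the made-up example dataset, with 0 assumed to mean primitive and 1 derived.

```python
MATRIX = {
    "outgroup": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "rat":      [1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    "cat":      [0, 1, 0, 1, 0, 1, 0, 0, 1, 1],
    "mouse":    [1, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    "bear":     [0, 1, 0, 0, 1, 1, 0, 1, 1, 1],
    "dog":      [0, 1, 0, 0, 1, 1, 0, 1, 0, 1],
}

# One candidate tree (not necessarily the shortest), written as nested pairs.
TREE = ("outgroup", (("rat", "mouse"), ("cat", ("bear", "dog"))))


def fitch(tree, states):
    """Return (possible ancestral states, changes) for one character.

    tree   -- a species name (str) or a pair (left, right) of subtrees
    states -- dict mapping species name to 0 (primitive) or 1 (derived)
    """
    if isinstance(tree, str):          # a modern species: its state is known
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                         # branches agree: no extra change
        return shared, left_changes + right_changes
    # branches disagree: count one more change
    return left_set | right_set, left_changes + right_changes + 1


def treelength(tree, matrix):
    """Return (total changes, list of changes per character) for a tree."""
    n_characters = len(next(iter(matrix.values())))
    per_character = []
    for i in range(n_characters):
        states = {species: scores[i] for species, scores in matrix.items()}
        per_character.append(fitch(tree, states)[1])
    return sum(per_character), per_character


length, per_character = treelength(TREE, MATRIX)
print("treelength:", length)
print("changes per character:", per_character)
print("CI:", round(len(per_character) / length, 2))
```

For this candidate tree the sketch counts a treelength of 13, with characters 4, 8, and 9 each changing twice, so the CI is 10 / 13, or about 0.77.  Trying other arrangements of the nested pairs is the code equivalent of moving branches in the program.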
The required write-up: The written part of this assignment is to hand in the "Results" section of a scientific report.  See the preface to the lab manual to review the sections in scientific reports; read the part on the Results section carefully.   Your Results section must be TYPED, DOUBLE SPACED, and written in the style appropriate for scientific papers, as described in the preface to this manual.  Points will be deducted for incorrect grammar, spelling errors, and inappropriate style.

Here are the specific points that you must address in your Results section:

  1. Describe in words what the tree shows about the relationships among species.  Include all the relationships that you have discovered.  To do this, you should write a statement based on each one of the ancestral branches of the tree.
  2. Name all character states that fully support the tree.  A character fully supports a tree if it evolves just once (does not show any homoplasy -- no convergent evolution and no reversal) in the ancestor to more than one of the species, and then is retained in all descendant species.  Such characters show only homology, no homoplasy.  For each character that fully supports a tree, state the specific relationship among species that it supports (that is, which species it indicates to be more closely related to each other than to other species in the tree).  Tracing each character in the phylogeny drawing program should help you find this information.
  3. Name all character states that do NOT fully support the tree (that is, those that show homoplasy).  Note that at least one character state must show homoplasy; if you find none of yours do, then you must replace one of your characters with a character that has a state that does show homoplasy.  For each character state that does NOT fully support the tree, indicate the possible species/ancestors in the tree in which this character most likely evolved.  Note that such a state may have evolved within one or more modern species and/or within one or more ancestral species.  The "Tree Length for Chosen Trait" window in the phylogeny drawing program will show the number of times the trait you have chosen has changed on the tree; if this number is at least two, then the character state shows homoplasy (convergent evolution or reversal).  The phylogeny drawing program also shows (by coloring the branches black) where the derived state for each character evolved, so you can use it to identify the species (modern species or ancestors to modern species) in which characters evolved.  If the phylogeny drawing program shows some ancestral branches in yellow, it means that there is more than one equally likely way for the character to have evolved; in such cases, you need to describe just one of the possible ways it could have evolved.  In your description, indicate which form of homoplasy -- convergent evolution or reversal -- is occurring.
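The sorting of characters asked for in points 2 and 3 can be sketched as follows.  The function name `classify_characters` and the list of per-character counts are hypothetical; the counts stand in for the numbers you would read, one character at a time, from the program.  Note that a count of two or more only tells you that homoplasy is present; you still need to look at the traced tree to decide whether it is convergent evolution or reversal.

```python
def classify_characters(changes_per_character):
    """Split character numbers (starting at 1) into those that change once
    (homology only) and those that change two or more times (homoplasy)."""
    homology, homoplasy = [], []
    for number, changes in enumerate(changes_per_character, start=1):
        if changes >= 2:
            homoplasy.append(number)
        else:
            homology.append(number)
    return homology, homoplasy


# Hypothetical counts for ten characters on one tree.
counts = [1, 1, 1, 2, 1, 1, 1, 2, 2, 1]
homology, homoplasy = classify_characters(counts)
print("fully support the tree:", homology)
print("show homoplasy:", homoplasy)
```

With these made-up counts, characters 4, 8, and 9 would be the ones to discuss under point 3.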
Hand in your written results section AND the figure/table page from the lab manual (note that even though I have checked it once you MUST HAND IT IN AGAIN with the written results; otherwise I can't grade the written results, since they depend on the characters and tree you found!) by the due date given on the lecture syllabus.

See the list of required species for this semester's toucan phylogeny project by clicking here!

See the toucan pictures by clicking here!

Access the phylogeny drawing program by clicking here!