Saturday, December 13, 2008

phylogenetic tree

A phylogenetic tree is a graphical representation of the evolutionary relationship between taxonomic groups. The term phylogeny refers to the evolution or historical development of a plant or animal species, or even a human tribe or similar group. Taxonomy is the system of classifying plants and animals by grouping them into categories according to their similarities. A phylogenetic tree is a specific type of cladogram where the branch lengths are proportional to the predicted or hypothetical evolutionary time between organisms or sequences.
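Because the branch lengths on such a tree stand for evolutionary time, the total separation between two organisms can be read off by adding up the branch lengths along the path that connects them. Here is a minimal Python sketch of that idea. The node names and branch lengths are made up for illustration and do not describe any real organisms.

```python
# Hypothetical rooted tree: each node maps to (parent, branch_length).
# Branch lengths stand in for evolutionary time; the values are made up.
TREE = {
    "root": (None, 0.0),
    "X": ("root", 2.0),   # internal node: a hypothetical common ancestor
    "A": ("X", 1.0),
    "B": ("X", 1.5),
    "C": ("root", 3.0),
}


def path_to_root(node: str) -> list:
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while TREE[node][0] is not None:
        node = TREE[node][0]
        path.append(node)
    return path


def tree_distance(a: str, b: str) -> float:
    """Sum of branch lengths on the path joining tips a and b."""
    ancestors_b = set(path_to_root(b))
    # Most recent common ancestor: first ancestor of a that is also above b.
    mrca = next(n for n in path_to_root(a) if n in ancestors_b)

    def length_up_to_mrca(n: str) -> float:
        total = 0.0
        while n != mrca:
            parent, length = TREE[n]
            total += length
            n = parent
        return total

    return length_up_to_mrca(a) + length_up_to_mrca(b)


print(tree_distance("A", "B"))  # 2.5
print(tree_distance("A", "C"))  # 6.0
```

On a cladogram, the same calculation would be meaningless, because the branch lengths carry no information about time.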

Cladograms are branched diagrams, similar in appearance to family trees, that illustrate patterns of relatedness; unlike in a phylogenetic tree, their branch lengths are not necessarily proportional to the evolutionary time between related organisms or sequences. Bioinformaticians produce cladograms representing relationships between sequences, either DNA sequences or amino acid sequences. However, cladograms can rely on many types of data to show the relatedness of species. In addition to sequence homology information, comparative embryology, fossil records and comparative anatomy are all examples of the types of data used to classify species into phylogenetic taxa. So it is important to understand that the cladograms generated by bioinformatics tools are based on sequence data alone. That said, sequence relatedness can be a very powerful predictor of the relatedness of species.
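One simple way to put a number on sequence relatedness is percent identity. This is the share of aligned positions at which two sequences carry the same residue. The sketch below makes two assumptions: the sequences are already aligned (equal length, with '-' marking a gap), and gapped columns are simply skipped. Real tree-building tools usually use more elaborate distance measures, so treat this as an illustration only. The fragments are hypothetical, not real HSP70 sequence.

```python
def percent_identity(seq1: str, seq2: str) -> float:
    """Percent of aligned, non-gap columns at which two sequences match.

    Assumes the sequences are already aligned to the same length,
    with '-' marking a gap.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned fragments (hypothetical): 7 of 8 ungapped columns match.
print(percent_identity("MKT-AYIAK", "MKTLAYLAK"))  # 87.5
```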

Cladograms cannot be considered completely true and accurate descriptions of the evolutionary history of organisms, because a number of possible evolutionary pathways could produce the pattern of relatedness illustrated in any given cladogram. A cladogram only shows that two organisms, or sequences, are inferred to be more closely related to each other than to a third; it does not necessarily clarify the pathway that created the existing relationships. However, a cladogram can be used to formulate new hypotheses and to cast new light on existing data. One of the most ambitious cladograms produced to date can be viewed at the Tree of Life website, originated by David and Wayne Maddison at the University of Arizona (1).
Please take a moment to view the "Root of the Tree" link on the Tree of Life website. In this phylogenetic tree, the root is at the far left; it is called the root because it lies at the base of the cladogram, opposite the branches. Return to the home page, click on the link entitled "Popular Pages", and then select "Mammals". At the right side of this cladogram are the terminal nodes, which are located at the tips of the branches in any cladogram. In the Mammalia cladogram illustrated there, there are six terminal nodes, labeled Triconodonts, Monotremata, Multituberculata, Marsupialia, Palaeoryctoids, and Eutheria. The branching points between the root and the terminal nodes are internal nodes, and each internal node represents a hypothetical common ancestor. Each internal node is also at the base of a clade, which includes the common ancestral node plus all its descendants.
Sample a few more links on the Tree of Life. Be sure to read Darwin's quote on the home page and ponder how difficult it would be to get published in a scientific journal today if it were necessary to write this beautifully in order to succeed.
The Tree of Life is an example of a cladogram illustrating the relationships between taxa based on the collective evidence from many different fields of biology and bioscience. In contrast, the subject of this tutorial is the construction of cladograms with bioinformatics tools, where the cladograms are based on sequence data. First, use the Biology Workbench (2) to build a simple unrooted cladogram. The Workbench requires a free account, and access is granted immediately upon registration. Enter the site and scroll down the page until the five menu buttons are visible.
The "Session Tools" button allows the naming of a session, so that different jobs in progress can be saved under distinct sessions. Select "Session Tools", then select "Start New Session" and click on "Run" to change the name "Default Session" to a new name. The session will remain after the Workbench has been exited. To recall it later, click on the dot to the left of the session name under the "Session Tools" menu, and then select "Resume Session". The Workbench policy at the time of this writing is that old jobs are deleted only when an account has not been accessed for 6 months.
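The Tree of Life vocabulary maps directly onto a simple tree data structure. That vocabulary covers the root, the internal nodes (hypothetical common ancestors), the terminal nodes at the branch tips, and the clade (an ancestral node plus all its descendants). The sketch below uses a hypothetical four-taxon tree with made-up labels. It is not the Mammalia tree from the website, whose branching pattern is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; a node with no children is a terminal node (a branch tip)."""
    name: str
    children: list[Node] = field(default_factory=list)

    def is_terminal(self) -> bool:
        return not self.children

    def clade(self) -> list[str]:
        """This node plus all of its descendants, in depth-first order."""
        names = [self.name]
        for child in self.children:
            names.extend(child.clade())
        return names

    def terminal_nodes(self) -> list[str]:
        """Names of the branch tips at or below this node."""
        if self.is_terminal():
            return [self.name]
        return [tip for child in self.children for tip in child.terminal_nodes()]


# Hypothetical topology: N1 and N2 are internal nodes (common ancestors).
root = Node("root", [
    Node("A"),
    Node("N1", [
        Node("B"),
        Node("N2", [Node("C"), Node("D")]),
    ]),
])

print(root.terminal_nodes())       # ['A', 'B', 'C', 'D']
print(root.children[1].clade())    # ['N1', 'B', 'N2', 'C', 'D']
```

Calling `clade()` on any internal node returns that ancestor together with everything descended from it, which matches the definition of a clade given for the Tree of Life pages.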
Next, select "Protein Tools" from the menu buttons, highlight "Ndjinn Multiple Database Search", and click "Run". In the query box to the right of the term "Contains", type HSP70, for the molecular chaperone heat shock protein 70 kDa. Scroll down the database list and check the box to the left of the database entitled "PDBFINDER" before clicking the "Search" button. Among the results, find 2BUP (chaperone) and check the box to its left. Then select the menu button entitled "Import sequence(s)". This imports the sequence in FASTA format into the open session. Under the box of session options there should now be a listing for the 2BUP sequence, with a small box to its left. Notice that the main menu under "Protein Tools" offers more options, such as "Delete Protein Sequence", "Copy Protein Sequence" and "Add New Protein Sequences".
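FASTA is a plain-text format. Each record begins with a '>' header line naming the sequence, followed by one or more lines of residues in one-letter code. Below is a minimal Python reader. It assumes the identifier is the first word of the header, since the exact header layout the Workbench writes is not shown here. The record in the example is a placeholder, not the actual 2BUP entry.

```python
from __future__ import annotations


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into an {identifier: sequence} dict.

    Each record starts with a '>' header line; the identifier is taken to be
    the first whitespace-delimited word of the header. The sequence lines
    that follow are concatenated.
    """
    records: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


# Hypothetical record: header text and residues are placeholders.
example = """>2BUP_A placeholder header
ACDEFGHIK
LMNPQ
"""
records = parse_fasta(example)
print(records)  # {'2BUP_A': 'ACDEFGHIKLMNPQ'}
```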
For now, select "Ndjinn Multiple Database Search" again. Search the PDBFINDER database again by scrolling down the page and selecting it, but this time search using the PDB ID codes 1HKB, 1ATN and 1DKG, for hexokinase, actin and the molecular chaperone DnaK (use the OR operator between the PDB ID codes to search for all three at once). Import all three sequences simultaneously by checking the boxes to the left of the PDB ID codes used in the query and clicking on "Import sequence(s)". 1DKG returns three chains, A, B and D. Only chain D is the molecular chaperone; chains A and B are nucleotide exchange factors that co-crystallized with DnaK. Delete chains A and B by checking the boxes to the left of 1DKG_A and 1DKG_B, highlighting "Delete Protein Sequence", and clicking on "Run". Actin (1ATN) also returns two chains; chain A is the actin, so delete chain D in the same manner.
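The chain clean-up in this step amounts to keeping only the records whose PDBID_CHAIN identifiers are wanted. Here is a sketch in Python. It assumes the session's sequences are held in a dict keyed by identifiers such as 1DKG_D, which is a hypothetical representation because the text does not describe how the Workbench stores them. The sequences are placeholders. The wanted set is the final four sequences, so it also drops hexokinase chain B, which the next step deletes.

```python
from __future__ import annotations


def keep_chains(records: dict[str, str], wanted: set[str]) -> dict[str, str]:
    """Return only the records whose PDBID_CHAIN identifier is in `wanted`."""
    return {name: seq for name, seq in records.items() if name in wanted}


# Hypothetical session contents after the imports; sequences are placeholders.
session = {name: "PLACEHOLDER" for name in [
    "2BUP_A", "1DKG_A", "1DKG_B", "1DKG_D",
    "1ATN_A", "1ATN_D", "1HKB_A", "1HKB_B",
]}
wanted = {"2BUP_A", "1DKG_D", "1ATN_A", "1HKB_A"}
kept = keep_chains(session, wanted)
print(sorted(kept))  # ['1ATN_A', '1DKG_D', '1HKB_A', '2BUP_A']
```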
Hexokinase (a phosphotransferase) returns two chains as well. Both are hexokinase, but two identical sequences are not desirable in the cladogram, so delete chain B. Four sequences should remain: 1DKG_D, 1ATN_A, 1HKB_A, and 2BUP_A. Check the boxes to the left of each of these. Scroll down the Protein Tools menu, highlight "CLUSTALW - Multiple Sequence Alignment", and click "Run". The default parameters are sufficient for our purposes, so just select "Submit". When the sequence alignment is returned, scroll down the page to view the multiple alignment. The Workbench automatically returns an unrooted tree with the alignment.
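ClustalW computes pairwise distances between the sequences and builds its guide tree from them by neighbor joining, a method that produces an unrooted tree. Assuming the unrooted tree shown with the alignment comes from the same kind of calculation, here is a minimal neighbor-joining sketch in Python. It is not the Workbench's or ClustalW's code. The four-taxon distance matrix is hypothetical and chosen to be perfectly tree-like, so the algorithm recovers the branch lengths it was built from.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining and return it as Newick text.

    names: list of at least three taxon labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = dict(dist)                   # working copy; new nodes get added
    newick = {n: n for n in names}   # node label -> Newick subtree text
    active = list(names)
    created = 0

    def D(a, b):
        return d[frozenset((a, b))]

    while len(active) > 3:
        n = len(active)
        r = {a: sum(D(a, b) for b in active if b != a) for a in active}
        # Join the pair that minimises the neighbor-joining Q criterion.
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * D(p[0], p[1]) - r[p[0]] - r[p[1]])
        li = D(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch i -> new node
        lj = D(i, j) - li                                  # branch j -> new node
        created += 1
        u = f"_node{created}"        # new internal node (a common ancestor)
        newick[u] = f"({newick[i]}:{li:g},{newick[j]}:{lj:g})"
        for k in active:
            if k not in (i, j):
                d[frozenset((u, k))] = (D(i, k) + D(j, k) - D(i, j)) / 2
        active = [k for k in active if k not in (i, j)] + [u]

    # Three nodes left: join them at one central node, leaving the tree unrooted.
    a, b, c = active
    la = (D(a, b) + D(a, c) - D(b, c)) / 2
    lb = (D(a, b) + D(b, c) - D(a, c)) / 2
    lc = (D(a, c) + D(b, c) - D(a, b)) / 2
    return f"({newick[a]}:{la:g},{newick[b]}:{lb:g},{newick[c]}:{lc:g});"


# Hypothetical distances between four sequences A-D.
pairs = {("A", "B"): 5, ("A", "C"): 11, ("A", "D"): 7,
         ("B", "C"): 12, ("B", "D"): 8, ("C", "D"): 6}
dist = {frozenset(p): v for p, v in pairs.items()}
print(neighbor_joining(["A", "B", "C", "D"], dist))  # (C:5,D:1,(A:2,B:3):4);
```

In the Newick output, `(A:2,B:3)` is the pair joined first. The three top-level entries meet at a single central node, which is a common way of writing an unrooted tree.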
