GenBank and BLAST
A Beginning Beginner's Guide
These directions will take you FROM finding a sequence of either a nucleic acid or a protein TO showing how many other creatures have similar sequences and also TO a most-probable evolutionary tree of that sequence. So let's get started!
- Finding the sequence in the published literature.
Use a search engine such as Google® or Yahoo!® to look up the key words that should take you to where the sequence has been published. For example, if you are looking for the operator region of the lac-operon in E. coli, try these words: "lac-operon operator DNA sequence E. coli". Probably a number of articles will come up. Look at their titles. Remember that a good title states the article's main result. Find several which look promising and then skim through them - and I mean 'skim' - for a published sequence will be a long string (perhaps line after line, and even page after page!) of A's, G's, T's and C's (or of amino-acid letters, for a protein). Such strings will jump out at you! Copy them into a file on your computer (you don't want to sit there typing them all in - your typing errors would probably help 'evolution' along). Sometimes, the article will start the sequence with a header line starting with a ">" sign. Copy that line also (it might be all that is needed).
- Go to GenBank (note: not GeneBank):
http://www.ncbi.nlm.nih.gov/GenBank/GenbankSearch.html
You don't need to carry this with you, as you can merely Google or Yahoo! "GenBank" and it will come up as one of the top offerings. Look for the one ending in "...Search.html".
- Find and click on BLAST Sequence Similarity Searching
- Under "Basic BLAST" go to either
- Nucleotide blast or
- Protein blast
- Copy your sequence from your file and paste it into the box titled:
"Enter Query Sequence".
If you also have one of those ">statements" put that as the top line in this box. It helps the computer back at NIH to find your sequence out of the hundreds of millions it has in its memory.
- Give it a title - perhaps your last name and a trial number
- Set the Database to non-redundant (nr)
- Choose your Algorithm: blastp (for proteins) or blastn (for somewhat similar nucleotide sequences)
- Now comes the big moment!
Hit the button called BLAST
- Several screens go by - and this might take several minutes during weekdays as thousands of scientists are accessing BLAST and you are waiting in line to be served.
- Finally one comes up with a lot of horizontal colored lines. The top line is your protein or gene, and the lines under it are those of other organisms possessing something similar. You will probably notice how many other creatures share a lot of the characteristics of your sequence.
- Below the colored lines will be page upon page of comparisons between your sequence and each of those creatures taken one at a time - the full sequences with changes noted.
- DO NOT PRINT THIS WEB-PAGE OUT because it can easily be hundreds of pages long!
- Scroll down to the very BOTTOM of the above page. At the very bottom you will see a link titled: "Distance Tree of Results." Click on that and up will come the most probable evolutionary tree for your sequence.
- PRINT THIS PAGE of the tree! (preferably in color!)
- Make sure that page includes the color codes at the side, and also the binomial names of the organisms. The color codes will tell you in which kingdoms, phyla, etc., this sequence and similar ones are found.
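Before pasting into the query box, it can help to tidy what you copied from an article. The sketch below is only an illustration and is not part of GenBank or BLAST: the function names `parse_fasta` and `guess_program`, the 90% threshold, and the sample text are all assumptions made up for this example. It separates the ">" line from the letters, drops spaces and printed position numbers, and guesses whether Nucleotide blast or Protein blast is the better starting point.

```python
# Hypothetical helpers for tidying a copied sequence; not an NCBI tool.

def parse_fasta(text):
    """Split pasted text into (header, sequence).

    header is the first line starting with ">" (without the ">"), or "" if
    there is none. Spaces, digits and punctuation in the sequence lines are
    dropped. Assumes the text holds a single sequence.
    """
    header = ""
    letters = []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if not header:
                header = line[1:].strip()
            continue
        letters.extend(ch for ch in line if ch.isalpha())
    return header, "".join(letters).upper()


def guess_program(sequence):
    """Return "blastn" if the sequence looks like DNA/RNA, else "blastp".

    Assumption: a query that is at least 90% A, C, G, T, U or N is nucleotide.
    """
    if not sequence:
        raise ValueError("empty sequence")
    nucleotide_like = sum(sequence.count(base) for base in "ACGTUN")
    return "blastn" if nucleotide_like / len(sequence) >= 0.9 else "blastp"


# Illustrative text in the shape an article might print it.
pasted = """>example_query illustrative test sequence
  1 aattgtgagc ggataacaat t
"""
header, seq = parse_fasta(pasted)
print(header)              # the ">" line, ready to go on top of the query box
print(seq)                 # the cleaned letters
print(guess_program(seq))  # which Basic BLAST page to try first
```

The guess is only a starting point: a very short protein made mostly of A, C, G and T would be misread as DNA, so check it against what the article says the sequence is.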
FUN Searches
There are three levels of searches you might find interesting to make - recent mutations (last 200,000 years), mid-depth (last 1 billion years), and primordial (back to the beginnings of life on earth).
- Recent
- sickle-cell anemia
- HIV-resistance
- Mid-Depth
- photosynthesis
- electron transport proteins
- catalase
- flagella proteins
- the dehydrogenases
- β-hydroxy acids ↔ β-keto-acids (coenzyme = NAD+/NADH)
(do the following three share a common ancestor?)
- Lactic dehydrogenase (LDH)
- Malic dehydrogenase (MDH)
- Isocitric dehydrogenase (iCDH)
- Di-methylene dehydrogenases ( -CH2-CH2- → -CH=CH- )
(coenzymes FAD/FADH2 and CoA)
- succinic dehydrogenase (SDH)
- myoglobin and hemoglobin
- Primordial - these are things common to all life, so they must have started very early!
- transcriptase
- the various rRNA's
- the various tRNA's
- RNase (ribonuclease)
- the many different proteins making up the ribosome
- DNA-polymerase
- and a host of other urgenetic components
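For any of these searches, the "Distance Tree of Results" groups sequences by how different they are from one another. The sketch below shows the simplest such measure, the fraction of positions at which two already-aligned sequences differ. It is not the calculation NCBI performs; the function name `p_distance` and the three short sequences are made up for illustration, and the sequences are assumed to be aligned to the same length.

```python
# A minimal sketch of a pairwise distance; not NCBI's actual method.

def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("empty sequences")
    differences = sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)
    return differences / len(a)


# Made-up aligned fragments, for illustration only.
aligned = {
    "seq_A": "ACGTACGTAC",
    "seq_B": "ACGTACGTTC",
    "seq_C": "TCGAACGTTG",
}

# Print the distance for every pair of sequences.
names = sorted(aligned)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, p_distance(aligned[first], aligned[second]))
```

Pairs with a small distance would sit close together on a tree and pairs with a large distance farther apart, which is the kind of pattern to look for when asking whether LDH, MDH and iCDH share a common ancestor.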