- MEGAN (MEtaGenome ANalyzer) – a stand-alone metagenome analysis tool.
- Metagenomics and Our Microbial Planet – a website from the National Academies on metagenomics and the vital role of microbes on Earth.
- The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet – a report released by the National Research Council in March 2007. Also see the Report In Brief.
- IMG/M – the Integrated Microbial Genomes system for metagenome analysis, from the DOE-JGI.
- CAMERA – Cyberinfrastructure for Metagenomics; a data repository and tools for metagenomics research.
- A good overview of metagenomics from the Science Creative Quarterly
- List of metagenome projects from genomesonline.org
- MG-RAST – a free, publicly available metagenomics annotation pipeline and repository for pyrosequences, Sanger sequences, and other sequencing approaches.
- Human Microbiome Project
- MetaHIT – official website for the EU-funded project Metagenomics of the Human Intestinal Tract
- Annotathon – Bioinformatics Training Through Metagenomic Sequence Annotation
- Metagenomics – metagenomics research and applications
Next Gen. Sequencing
With IBM tossing its hat into the ring of “next-next-generation” sequencing, I’m starting to lose track of which generation is which. For the moment, I’m sort of lumping things together while I wait to see how the field plays out. In my mind, first generation is anything that relies on chain termination, second generation is chemistry-based pyrosequencing, and third generation is single-molecule sequencing based on a nanoscale mechanical process. It’s a crude divide, but it seems reasonably consistent.
At any rate, I decided I’d collect a few videos to illustrate each one. For Sanger, there are a LOT of videos, many of which are excellent, but I only wanted one. (Sorry if I didn’t pick yours.) For second and third generation DNA sequencing videos, the selection thins out, and two of them come from corporate sites rather than YouTube, which seems to be the general consensus repository for technology videos.
Personally, I find it interesting to see how each group is selling themselves. You’ll notice some videos press heavily on the technology, while others focus on the workflow.
As an aside, I also find it interesting to look for places where the illustrations don’t make sense… there’s a lovely place in the 454 video where two strands of DNA split from each other on the bead, leaving the two full strands and a complete primer sequence… mysterious! (Yes, I do enjoy looking for inconsistencies when I go to the movies.)
Ok, get out your popcorn.
First Generation:
Sanger sequencing
Second Generation:
Pyrosequencing
There was an article on probiotics by Tara Parker-Pope in the New York Times today. It addresses some important issues about probiotics that are rarely covered in the press (see Well – Probiotics – Looking Underneath the Yogurt Label – NYTimes.com).
On the one hand, the article does a decent job of pointing out that there is great strain-to-strain variation among microbes labelled as probiotics. In this regard, there is a great quote from Gregor Reid:
“Lactobacillus is just the bacterium,” said Gregor Reid, director of the Canadian Research and Development Center for Probiotics. “To say a product contains Lactobacillus is like saying you’re bringing George Clooney to a party. It may be the actor, or it may be an 85-year-old guy from Atlanta who just happens to be named George Clooney. With probiotics, there are strain-to-strain differences.”
Sure, James Watson has been known, especially recently, to say some outrageous things. But here is something I think everyone, scientists and the public alike, should read: an opinion piece by Watson in the NY Times today (Op-Ed Contributor – To Fight Cancer, Know the Enemy – NYTimes.com).
This piece is worth reading because it contains some critical ideas and wisdom that have been missing from discussions of the fight against cancer.
First, Watson discusses the critical importance of basic science and says that when he stressed this point to the National Cancer Institute advisory board many years ago, he was eventually booted off.
Second, he discusses how we have only recently begun to understand the basic biology of cancer (he also mentions how the Human Genome Project has helped with this). The genome project will, he says, allow us to determine most, if not all, of the major genetic changes that occur in cancer cells.
The Pervasive Effects of an Antibiotic on the Gut
PLoS Biology Vol. 6, No. 11, e280 doi:10.1371/journal.pbio.0060280
The format of Windows and Unix text files differs slightly. In Windows, lines end with both a carriage return and a line feed ASCII character (in that order), but Unix uses only a line feed. As a consequence, some Windows applications will not show the line breaks in Unix-format files. Likewise, Unix programs may display the carriage returns in Windows text files as Ctrl-M (^M) characters at the end of each line.
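To see the difference for yourself, here is a small sketch that writes the same two lines in each convention to a scratch directory and inspects the bytes with od -c, which prints a carriage return as \r and a line feed as \n. The scratch directory and the CR count (done with tr -cd, which deletes every character except the carriage return) are illustrative choices, not part of the conversion methods described here.

```shell
#!/bin/bash
# Work in a scratch directory so no real files are touched
dir=$(mktemp -d)

# The same two lines saved in each convention
printf 'line one\nline two\n' > "$dir/unixfile.txt"        # Unix: LF only
printf 'line one\r\nline two\r\n' > "$dir/winfile.txt"    # Windows: CR then LF

# od -c prints one character per byte: CR appears as \r, LF as \n
od -c "$dir/unixfile.txt"
od -c "$dir/winfile.txt"

# Count carriage returns in each file (tr -cd '\r' keeps only the CRs)
unix_cr=$(tr -cd '\r' < "$dir/unixfile.txt" | wc -c)
win_cr=$(tr -cd '\r' < "$dir/winfile.txt" | wc -c)
echo "CRs in Unix file: $unix_cr; CRs in Windows file: $win_cr"

rm -r "$dir"
```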
There are many ways to solve this problem. This document provides instructions for using FTP, screen capture, unix2dos and dos2unix, tr, awk, Perl, and vi to do the conversion. Before you use these utilities, the files you are converting must first be on a Unix computer.
Note: In the instructions below, replace unixfile.txt with the name of the Unix file you are transferring, and replace winfile.txt with the name of the Windows file you are transferring.
FTP
When using an FTP program to move a text file between Unix and Windows, be sure the file is transferred in ASCII format. This will ensure that the document is transformed into a text format appropriate for the host. Some FTP programs, especially graphical applications like Hummingbird FTP, do this automatically. If you are using FTP from the command line, however, before you begin the file transfer, be sure to enter at the FTP prompt:
ascii
Note: You need to use a client that supports secure FTP to transfer files to and from Indiana University’s central systems. For more, see “At IU, what SSH/SFTP clients are supported and where can I get them?”
Screen capture
You can also convert files from Unix to Windows format when transferring them to a PC with a communications program by selecting ASCII text download. Select this option with your communications program to capture all the text subsequently displayed to your screen, and then enter at the Unix prompt:
cat unixfile.txt
Most communications programs will add carriage returns to the stream of text as they save it to your computer’s hard drive. Once the file has finished displaying, abort the text download.
Note: This method may be slow for large text files. Also, no error checking is performed on the file as it is transferred.
dos2unix and unix2dos
On systems using Solaris, the utilities dos2unix and unix2dos are available. These utilities provide a straightforward method for converting files from the Unix command line.
To use either command, simply type the command followed by the name of the file you wish to convert, and the name of a file which will contain the converted results. Thus, to convert a Windows file to a Unix file, at the Unix prompt, enter:
dos2unix winfile.txt unixfile.txt
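Where dos2unix and unix2dos are not installed, the tr and awk tools mentioned earlier can do the same job. The sketch below is one way to do it, under two assumptions worth knowing: tr -d '\r' deletes every carriage return (including any stray ones in the middle of a line), and the awk program appends CR LF to every line, so it should only be run on files that do not already have Windows line endings. Note also that on many Linux systems dos2unix converts every file argument in place; the Solaris-style two-file form shown above corresponds to dos2unix -n winfile.txt unixfile.txt there.

```shell
#!/bin/bash
# Scratch directory so the example does not touch real files
dir=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$dir/winfile.txt"

# Windows -> Unix: delete every carriage return
tr -d '\r' < "$dir/winfile.txt" > "$dir/unixfile.txt"

# Unix -> Windows: print each line followed by CR LF
awk '{ printf "%s\r\n", $0 }' "$dir/unixfile.txt" > "$dir/roundtrip.txt"

# Check the results: no CRs should remain after tr, and the round trip
# should reproduce the original file byte for byte (cmp -s is silent)
crs_left=$(tr -cd '\r' < "$dir/unixfile.txt" | wc -c)
if cmp -s "$dir/winfile.txt" "$dir/roundtrip.txt"; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "CRs left after tr: $crs_left; round trip: $roundtrip"

rm -r "$dir"
```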
Linux? You geeks use Linux?
If you work in science, and you work on big datasets (such as analyzing next generation sequencing data), chances are that you use Linux for some of your work. I frequent several of our lab’s Red Hat servers for data analysis and code development purposes. However, these aren’t just my servers to use. Other lab members and, depending on the server, IT staff use them too. I try to remember to check and see who is on and what they’re running before getting too involved with something that’s going to hog memory or processor time. But, of course, I don’t always remember.
I decided to automate this process to take the remembering out of it. By adding a shell script plus some code to my profile file, I get the relevant information as soon as I log in over ssh, without having to invoke anything manually.
Shell Script
The code is based on the Bash shell, so it may or may not apply to your ssh login. I keep the shell script in my /home/user directory with the name “.greeting.sh”. The leading period hides it from a plain “ls” listing, so it doesn’t add to the clutter in my home directory. The code for “.greeting.sh” follows between the lines of # signs:
##################################################
#!/bin/bash
# Gather basic information about this login
UNAME=`whoami`
TIME=`date`
HOST=`hostname`
UCNT=`users | wc -w`    # counts login sessions; the same user can appear more than once
ULST=`users`
# Sum the %CPU column (field 3) of ps, skipping the header line, rounded to an integer.
# On multi-core machines this total can exceed 100.
PROC=`ps aux | awk 'NR > 1 { s += $3 } END { printf("%d\n", s + 0.5) }'`
# Used memory as a percentage of total memory, rounded to an integer
MPCT=`free | grep Mem | awk '{ printf("%d\n", $3 / $2 * 100 + 0.5) }'`
MYSHELL=$SHELL
echo
echo "$TIME"
echo "Shell: $MYSHELL"
echo "Hello $UNAME! Welcome to $HOST!"
if [ "$UCNT" -ge 2 ]
then
    echo "$UCNT users are currently logged into $HOST:"
    echo "$ULST"
else
    echo "No other users currently logged in."
fi
echo "System Status:"
if [ "$PROC" -ge 80 ]
then
    echo "High processor usage at ${PROC}%"
elif [ "$PROC" -ge 50 ]
then
    echo "Medium processor usage at ${PROC}%"
else
    echo "Low processor usage at ${PROC}%"
fi
if [ "$MPCT" -ge 80 ]
then
    echo "High memory usage at ${MPCT}%"
elif [ "$MPCT" -ge 50 ]
then
    echo "Medium memory usage at ${MPCT}%"
else
    echo "Low memory usage at ${MPCT}%"
fi
echo
exit 0
##################################################
When I log in, the script above prints the date, my current shell, a greeting that includes the hostname, whether other users are logged in (and, if so, a list of them), and the current processor and memory usage. I customize this script depending on the primary use of the server. If you have a server that should always be running a certain program, add a line that looks for that program. If it were called “myprogram”, you could add the following line to the script:
PROG=`ps aux | grep -v grep | grep myprogram | wc -l`
This counts the matching processes: PROG will be 1 if a single instance is running (more if several are) and 0 if it isn’t running. By later testing whether $PROG is -ge 1, the script can print a message saying whether the program is running.
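A minimal sketch of that test, wrapped in a hypothetical check_prog function so it can be reused for several programs. The function name and messages are illustrative; “myprogram” is the same placeholder as above, so substitute the name of the program your server should be running.

```shell
#!/bin/bash
# Report whether a named program is running (check_prog is a hypothetical helper)
check_prog() {
    local name=$1
    local count
    # Same pipeline as the PROG line: grep -v grep drops the grep processes themselves
    count=`ps aux | grep -v grep | grep -- "$name" | wc -l`
    if [ "$count" -ge 1 ]
    then
        echo "$name is running ($count instance(s))"
    else
        echo "Warning: $name is NOT running"
    fi
}

# "myprogram" is a placeholder name
check_prog myprogram
```

The call can go anywhere after the function definition in .greeting.sh, once per program you want to watch.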
Take note! Don’t forget to alter the permissions on the script to allow execution, using something like “chmod +x .greeting.sh”. Also note that the variables are defined using backticks (same key as the ~ on standard US QWERTY keyboards), not single quotes.
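The difference between backticks and single quotes is easy to check in a throwaway script. This sketch also shows $(...), the POSIX form of command substitution, which behaves like backticks here and is easier to nest; the greeting script would work the same with either. The date +%Y command is just a convenient example.

```shell
#!/bin/bash
# Backticks run the command and store its output
YEAR_CMD=`date +%Y`
# Single quotes store the literal text; nothing is run
YEAR_TXT='date +%Y'
# $(...) is equivalent to backticks and nests without extra escaping
YEAR_NEW=$(date +%Y)
echo "backticks:     $YEAR_CMD"
echo "single quotes: $YEAR_TXT"
echo "\$(...):        $YEAR_NEW"
```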
Automatically running
The script isn’t much use if you have to run it manually (if I remembered to do that, why would I need a script?), so I set it to run automatically right after an ssh login. As I said before, I use Bash on most of the Linux servers I work on. For this shell, there is a file called “.bash_profile” in each user’s home directory. Bash reads this file whenever it starts a login shell, which includes an interactive ssh login, and it is typically used to set common environment variables such as PATH. By adding code that runs the greeting script, the script’s output will be displayed immediately after login. Example code to add to the bottom of your profile file:
##################################################
if [ -e "/home/user/.greeting.sh" ]
then
/home/user/.greeting.sh
fi
##################################################
That’s all there is to it: a simple but powerful script that automatically gives you information at server login. Feel free to adapt it to your system and purpose.
Next Generation Seq Tools
Something I came across.
Integrated solutions
* CLCbio Genomics Workbench – de novo and reference assembly of Sanger, 454, Solexa, Helicos, and SOLiD data. Commercial next-gen-seq software that extends the CLCbio Main Workbench software. Includes SNP detection, browser and other features. Runs on Windows, Mac OS X and Linux.
* NextGENe – de novo and reference assembly of Illumina and SOLiD data. Uses a novel Condensation Assembly Tool approach where reads are joined via “anchors” into mini-contigs before assembly. Requires Windows or Mac OS.
* SeqMan Genome Analyser – Software for Next Generation sequence assembly of Illumina, 454 Life Sciences and Sanger data integrating with Lasergene Sequence Analysis software for additional analysis and visualization capabilities. Can use a hybrid templated/de novo approach. Early release commercial software. Compatible with Windows® XP X64 and Mac OS X 10.4.
Firefox?!?!
I know what you’re thinking. “Come on. A browser? As a bioinformatics tool?” You might actually be surprised. I think most people who do research spend at least some time online trying to track down information. Maybe it’s a protein name, or DNA elements in a chromosome segment. Maybe it’s a certain paper or topic in PubMed. Personally, I spend a good amount of time searching for answers, and I switch between databases and websites in different tabs to get information from different sources. Could there be a way to search faster?
Keyword Search To The Rescue!
Luckily, there is a faster way: the keyword search. A keyword search lets you create a bookmark shortcut to any search box, triggered by a keyword. Once a keyword search has been saved, that search can be invoked with just the keyword. I frequently use the UCSC Genome Browser for research, so I’ll use it as an example.
How To
- Navigate to the UCSC Genome Browser main page.
- In the top navigation panel click “Genomes”
- The default page should be the Human genome browser. If you are interested in a different organism, you can change it using the drop-down boxes. There should be an input box labeled “position or search term”. Right-click in the box.
- In the pop-up menu select “Add a Keyword for This Search…”. An “Add Bookmark” window will appear.
- In the “Name” box type a descriptive name. In this case use “UCSC Human Search”.
- In the “Keyword” box type the keyword you want to use. In this case use “ucsc”.
- Press the “Add” button to save this search.
Let’s test the keyword. Open a new blank Firefox tab by pressing CTRL+T or choosing File -> New Tab. In the address bar, type “ucsc MECP2” and press Enter. The “ucsc” keyword runs the query “MECP2” through the search box we saved. After a few seconds, the UCSC browser should show a page listing possible genes matching the symbol MECP2: the same results you would get by navigating to the UCSC Browser and typing MECP2 into its search box yourself.
What about direct chromosome positions? Let’s try it. Clear the text from the URL bar, type “ucsc chr1:1-20000000”, and press Enter. The page should change to show the first 20,000,000 base pairs of chromosome 1.
What other uses could it have? What about a “pubmed” keyword search? Or an Ensembl search? It can be particularly powerful if you combine these searches. If you were researching Rett Syndrome, you could search in one tab for “pubmed Rett Syndrome”. After reading a few papers and finding information on MECP2 in Rett Syndrome, all you have to do is hit CTRL+T to open another tab and type “ucsc MECP2” to find it in the genome browser. If you had a saved search for the NCBI Protein database, you could go even further by opening yet another tab and typing “protein MECP2_HUMAN” (assuming your keyword was protein). The result would be a page about the human MECP2 protein, where you could get the amino acid sequence. Your specific set of searches would depend on which databases you use most frequently in your research.
This kind of time savings can really add up. Plus you can show off your cool new hack to friends when they’re trying to search for something.
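Under the hood, a keyword bookmark is just a URL containing a %s placeholder, and Firefox substitutes whatever you type after the keyword for it. The shell sketch below mimics that substitution so you can see the kind of URL a keyword produces. The keyword_url function and its three URL templates are assumptions for illustration (the UCSC one assumes the hg38 human assembly), not the exact bookmarks Firefox will create from the search boxes, and only spaces are encoded here, whereas a browser escapes more characters.

```shell
#!/bin/bash
# Hypothetical helper: expand "keyword query..." into a search URL,
# the way a Firefox keyword bookmark fills in its %s placeholder.
keyword_url() {
    local keyword=$1
    shift
    local query="$*"
    local marker='%s'
    local template
    # Assumed URL templates; adjust to the searches you actually save
    case "$keyword" in
        ucsc)    template='https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=%s' ;;
        pubmed)  template='https://pubmed.ncbi.nlm.nih.gov/?term=%s' ;;
        protein) template='https://www.ncbi.nlm.nih.gov/protein/?term=%s' ;;
        *) echo "unknown keyword: $keyword" >&2; return 1 ;;
    esac
    # Minimal encoding: only spaces are escaped
    query=${query// /%20}
    # Splice the query in place of the first %s (quoted marker matches literally)
    echo "${template%%"$marker"*}${query}${template#*"$marker"}"
}

keyword_url ucsc chr1:1-20000000
```

For example, keyword_url pubmed Rett Syndrome prints the PubMed search URL with Rett%20Syndrome as the term.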
