- MEGAN MEtaGenome ANalyzer. A stand-alone metagenome analysis tool.
- Metagenomics and Our Microbial Planet A website on metagenomics and the vital role of microbes on Earth from the National Academies.
- The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet A report released by the National Research Council in March 2007. Also, see the Report In Brief.
- IMG/M The Integrated Microbial Genomes system, for metagenome analysis by the DOE-JGI.
- CAMERA Cyberinfrastructure for Metagenomics, data repository and tools for metagenomics research.
- A good overview of metagenomics from the Science Creative Quarterly
- list of Metagenome Projects from genomesonline.org
- MG-RAST publicly available, free, metagenomics annotation pipeline and repository for pyrosequences, Sanger sequences, and other sequence approaches.
- Human microbiome project
- MetaHIT official website for the EU-funded project : Metagenomics of the Human Intestinal Tract
- Annotathon Bioinformatics Training Through Metagenomic Sequence Annotation
- Metagenomics Metagenomics research and applications
Next Gen. Sequencing
With IBM tossing its hat into the ring of “next-next-generation” sequencing, I’m starting to get lost as to which generation is which. For the moment, I’m sort of lumping things together while I wait to see how the field plays out. In my mind, first generation is anything that requires chain termination, second generation is chemical-based pyrosequencing, and third generation is single-molecule sequencing based on a nano-scale mechanical process. It’s a crude divide, but it seems to have some consistency.
At any rate, I decided I’d collect a few videos to illustrate each one. For Sanger, there are a LOT of videos, many of which are quite excellent, but I only wanted one. (Sorry if I didn’t pick yours.) For second and third generation DNA sequencing videos, the selection kind of flattens out, and two of them come from corporate sites rather than YouTube, which seems to be the general consensus repository of technology videos.
Personally, I find it interesting to see how each group is selling themselves. You’ll notice some videos press heavily on the technology, while others focus on the workflow.
As an aside, I also find it interesting to look for places where the illustrations don’t make sense… there’s a lovely place in the 454 video where two strands of DNA split from each other on the bead, leaving the two full strands and a complete primer sequence… mysterious! (Yes, I do enjoy looking for inconsistencies when I go to the movies.)
Ok, get out your popcorn.
First Generation:
Sanger Entry: Link
Second Generation:
Pyrosequencing Entry: Link
There was an article on probiotics in the New York Times today. Written by Tara Parker-Pope, it addresses some important issues about probiotics that are rarely covered in the press (see Well – Probiotics – Looking Underneath the Yogurt Label – NYTimes.com).
On the one hand, the article does a decent job of pointing out that there is great strain to strain variation among microbes labelled as probiotics. In this regard there is a great quote by Gregor Reid:
“Lactobacillus is just the bacterium,” said Gregor Reid, director of the Canadian Research and Development Center for Probiotics. “To say a product contains Lactobacillus is like saying you’re bringing George Clooney to a party. It may be the actor, or it may be an 85-year-old guy from Atlanta who just happens to be named George Clooney. With probiotics, there are strain-to-strain differences.”
Sure, James Watson has been known, especially recently, to say some outrageous things. But here is something I think everyone, scientists and the public alike, should read: an opinion piece in the NY Times today by Watson (Op-Ed Contributor – To Fight Cancer, Know the Enemy – NYTimes.com).
This piece is worth reading because it contains some critical ideas and wisdom that have been missing in discussions of the fight against cancer.
First, Watson discusses the critical importance of basic science and says that when he expressed this importance to the National Cancer Institute advisory board many years ago, he was eventually booted off.
Second, he discusses how we have only recently begun to understand the basic biology of cancer (he also mentions how the human genome project has helped in this). The genome project will, he says, allow for the determination of most/all of the major genetic changes that occur in cancer cells.
The Pervasive Effects of an Antibiotic on the Gut
PLoS Biology Vol. 6, No. 11, e280 doi:10.1371/journal.pbio.0060280
Linux? You geeks use Linux?
If you work in science, and you work on big datasets (such as analyzing next generation sequencing data), chances are that you use Linux for some of your work. I frequent several of our lab’s Red Hat servers for data analysis and code development purposes. However, these aren’t just my servers to use. Other lab members and, depending on the server, IT staff use them too. I try to remember to check and see who is on and what they’re running before getting too involved with something that’s going to hog memory or processor time. But, of course, I don’t always remember.
I decided to automate this process to take the remembering part out. By adding in a shell script + some code in my profile file, my ssh login immediately displays relevant information without having to invoke it manually.
Shell Script
The code is written for the Bash shell, so it may or may not apply to your ssh login. I keep the shell script in my /home/user directory with the name “.greeting.sh”. Adding the leading period just makes it invisible to standard “ls” queries so it doesn’t add to the clutter in my home directory. The code for “.greeting.sh” follows between the lines of # signs:
##################################################
#!/bin/bash
# Collect session and system facts. Backticks run a command and capture its output.
UNAME=`whoami`
TIME=`date`
HOST=`hostname`
UCNT=`users | wc -w`   # number of login sessions, including this one
ULST=`users`
# Sum the %CPU column of ps (skipping the header row) and round to an integer.
# On multi-core machines this total can exceed 100.
PROC=`ps aux | awk 'NR > 1 { s += $3 } END { printf("%d\n", s + 0.5) }'`
# Used memory as a rounded percentage of total memory.
MPCT=`free | grep Mem | awk '{ printf("%d\n", $3 / $2 * 100 + 0.5) }'`
MYSHELL=$SHELL
echo
echo "$TIME"
echo "Shell: $MYSHELL"
echo "Hello $UNAME! Welcome to $HOST!"
if [ "$UCNT" -ge 2 ]
then
    echo "$UCNT users are currently logged into $HOST:"
    echo "$ULST"
else
    echo "No other users currently logged in."
fi
echo "System Status:"
if [ "$PROC" -ge 80 ]
then
    echo "High processor usage at ${PROC}%"
elif [ "$PROC" -ge 50 ]
then
    echo "Medium processor usage at ${PROC}%"
else
    echo "Low processor usage at ${PROC}%"
fi
if [ "$MPCT" -ge 80 ]
then
    echo "High memory usage at ${MPCT}%"
elif [ "$MPCT" -ge 50 ]
then
    echo "Medium memory usage at ${MPCT}%"
else
    echo "Low memory usage at ${MPCT}%"
fi
echo
exit 0
##################################################
The code above prints the following when logging in: the date, my current shell, a greeting with the hostname, whether other users are logged in (and the list of users if others are on), and information about current processor and memory usage. I customize this script depending on the primary use of the server. If you have a server that should always be running a certain program, add a line that looks for that program. If it were called “myprogram” you could add the following line to the script:
PROG=`ps aux | grep -v grep | grep myprogram | wc -l`
This sets PROG to the number of matching processes: 1 if a single instance is running, more if there are several, or 0 if it isn’t running. By adding a test later in the script that checks whether $PROG is -ge 1, a message could print saying whether the program is running or not.
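As a rough sketch of that later test, using the placeholder name “myprogram” from above: the helper report_prog is hypothetical (it is not part of the original script) and simply turns the count into a message.

```shell
#!/bin/bash
# Hypothetical helper: print a status line for a program,
# given its name ($1) and the number of running instances ($2).
report_prog() {
    if [ "$2" -ge 1 ]
    then
        echo "$1 is running ($2 instance(s))"
    else
        echo "WARNING: $1 is not running"
    fi
}

# Count matching processes as in the PROG line above, then report.
PROG=`ps aux | grep -v grep | grep myprogram | wc -l`
report_prog "myprogram" "$PROG"
```

Because grep matches anywhere in the ps output, a short or common program name may also count unrelated processes, so pick a pattern that is specific to your program.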
Take note! Don’t forget to alter the permissions on the script to allow execution, using something like “chmod +x .greeting.sh”. Also note that the variables are defined using backticks (same key as the ~ on standard US QWERTY keyboards), not single quotes.
Automatically running
The script isn’t much use if you have to run it manually (if I remembered to do that, why would I need a script?), so I like to set the script to run automatically immediately following an ssh login. As I said before, I use Bash on most of the Linux servers I use. For this shell, there is a file called “.bash_profile” in the home directory of each user. Bash reads this profile file whenever it starts as a login shell, which includes an interactive ssh login, and it is commonly used to set environment variables like PATH. By adding in code to run the greeting script, the output from the script will be displayed immediately after login. Example code to add to the bottom of your profile file:
##################################################
if [ -e "/home/user/.greeting.sh" ]
then
/home/user/.greeting.sh
fi
##################################################
That’s all there is to it. A simple but powerful script to automatically give you information on server login. Feel free to adapt it to your system and purpose.
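If you copy the same profile to several servers, a slightly more portable variant may be handy. This is only a sketch: the function name run_greeting is made up for illustration, and it assumes the script lives in your home directory. It uses $HOME instead of a hard-coded /home/user path and checks that the script is executable before running it.

```shell
#!/bin/bash
# Hypothetical helper: run the greeting script only if it exists and is
# executable. Defaults to $HOME/.greeting.sh when no path is given.
run_greeting() {
    script="${1:-$HOME/.greeting.sh}"
    if [ -x "$script" ]
    then
        "$script"
    fi
}

run_greeting
```

With the -x test, a missing script or a forgotten chmod simply results in no greeting rather than an error message at every login.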
Firefox?!?!
I know what you’re thinking. “Come on. A browser? As a bioinformatics tool?” You might actually be surprised. I think that most people who do research spend at least some amount of time online trying to track down information. Maybe it’s a protein name, or DNA elements in a chromosome segment. Maybe it’s a certain paper or topic through PubMed. Personally, I spend a good amount of time searching out answers. Furthermore, I switch between databases and websites in different tabs to get information from different sources. Could there be a way to search faster?
Keyword Search To The Rescue!
Luckily, there is a faster way: the keyword search. Basically the keyword search will allow you to make a bookmark shortcut to any search box using a keyword. Once a keyword search has been saved that particular search can be invoked with just the keyword. I frequently use the UCSC Genome Browser for research, so I’ll use this as an example.
How To
- Navigate to the UCSC Genome Browser main page.
- In the top navigation panel click “Genomes”
- The default page should be the Human genome browser. If you are interested in a different organism you can certainly change it using the drop-down boxes. There should be an input box labeled “position or search term”. Right click in the box.
- In the pop-up menu select “Add a Keyword for This Search…”. An “Add Bookmark” window will appear.
- In the “Name” box type a descriptive name. In this case use “UCSC Human Search”.
- In the “Keyword” box type the keyword you want to use. In this case use “ucsc”.
- Press the “Add” button to save this search.
Let’s test the keyword. Open a new blank Firefox tab by pressing CTRL+T or File -> New Tab. In the address bar type “ucsc MECP2” and press enter. The “ucsc” keyword triggers the query “MECP2” to be run through the search box we saved. After a few seconds a window for the UCSC browser should appear listing possible genes matching the symbol MECP2. If you had navigated to the UCSC Browser directly and typed MECP2 directly in the search box the results would have been the same.
What about direct chromosome positions? Let’s try it. Clear the text from the URL bar, type “ucsc chr1:1-20000000”, and press enter. The page should change to show the first 20,000,000 base pairs of chromosome 1.
What other uses could it have? What about a “pubmed” keyword search? Or an Ensembl search? It can be particularly powerful if you combine these searches. If you were researching Rett Syndrome, you could in one tab search for “pubmed Rett Syndrome”. After reading a few papers and finding information on MECP2 in Rett Syndrome all you have to do is hit CTRL+T to open another tab. Then type “ucsc MECP2” to find it in the genome browser. If you had a saved search for the NCBI Protein database you could go even further by opening yet another tab and typing “protein MECP2_HUMAN” (assuming your keyword was protein). The result would be a page about the MECP2 protein in humans where you could get the amino acid sequence. Your specific search set would depend on what databases you search most frequently in your research.
This kind of time savings can really add up. Plus you can show off your cool new hack to friends when they’re trying to search for something.
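Under the hood, a keyword bookmark is just a URL with a %s placeholder that Firefox replaces with whatever you type after the keyword. The sketch below mimics that substitution in the shell. The function names are made up for illustration, and the two URL templates are my assumptions about what such bookmarks might contain, so check the URLs in your own saved bookmarks.

```shell
#!/bin/bash
# Percent-encode a string for use in a URL (ASCII input only).
urlencode() {
    local s="$1" out="" c rest
    while [ -n "$s" ]
    do
        rest="${s#?}"        # everything after the first character
        c="${s%"$rest"}"     # the first character
        s="$rest"
        case "$c" in
            [a-zA-Z0-9.~_-]) out="$out$c" ;;
            *) out="$out$(printf '%%%02X' "'$c")" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Hypothetical lookup: map a keyword to a URL template and fill in the query.
keyword_url() {
    local keyword="$1" template query
    shift
    case "$keyword" in
        # Assumed templates; the real ones come from your saved bookmarks.
        ucsc)   template='https://genome.ucsc.edu/cgi-bin/hgTracks?position=%s' ;;
        pubmed) template='https://pubmed.ncbi.nlm.nih.gov/?term=%s' ;;
        *) echo "unknown keyword: $keyword" >&2; return 1 ;;
    esac
    query="$(urlencode "$*")"
    # Substitute the encoded query for the %s placeholder.
    printf '%s%s%s\n' "${template%%'%s'*}" "$query" "${template#*'%s'}"
}

keyword_url ucsc MECP2
keyword_url pubmed Rett Syndrome
```

Each call prints the URL that the corresponding keyword search would open, which can also be a convenient way to build links for notes or scripts.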
VNTI Is Dead
The golden age of Vector NTI has ended, and free software licenses are no longer available to academics. This move has been disturbing to many, and support for deactivated licenses hasn’t been the best so far. But after I sent a plea to the tech support services associated with VNTI, they came through with some help.
To answer an oft-asked question, DNA/RNA/Protein sequences CANNOT be exported after a license has expired. I know, I know, bad programming practice and bad PR practice. BUT if your data is locked in you can get a temporary license to export everything. For DNA / RNA molecules you can export into GenBank, EMBL, and FASTA file formats. For protein sequences you can export into GenPept, SWISS-PROT, or Protein FASTA format. File export DOES NOT work for Enzymes, Oligos, Gel Markers, Citations, BLAST Results, or Analysis Results. Those of you with extensive Oligo libraries will want to contact Tech Support directly for assistance in exporting or moving these files. Sorry guys. It may or may not be supported.
Exporting DNA/RNA Molecules
- Open your VNTI Database.
- Go to ‘DNA/RNA Molecules’ from the drop down box.
- Select all the molecules you want to export. For everything, select one molecule and either press CTRL+A or use ‘Edit’ -> ‘Select All’.
- Go to ‘Edit’ -> ‘Copy To’ -> ‘File…’. Make sure to choose the format you want. If you want all three, just repeat the process for each one.
Exporting Protein Sequences
The process is identical to exporting DNA / RNA molecules, except the Protein Molecules library must be used.
Getting a Temporary License
To get your temporary license e-mail Technical Support at bioinfosupport[AT]invitrogen.com. In your message just explain that you’ve been a user of the VNTI free license, but the license expired and you need a temporary one to export all your data.
Now, I’m glad that Life Sciences / Invitrogen has come through with some help for the community. Do I agree with the change in marketing? No. Do I think the transition was handled gracefully? No. But they could have elected to lock everyone’s data in permanently, and have instead elected to extend the olive branch. Hope this helps some of you out there with trapped data.
Previously I tried to get the word out on a change to the NIH policy for grant-supported research that required researchers to transfer a copy of the final work to a repository (PMC) that provides free access to the article. My personally biased opinion is that the policy was a great move, and that making scientific knowledge more widely available to everyone is a good thing.
Some publishers have already stepped up to embrace the new policy by transferring the paper to PMC for you, some well before the 1 year deadline. Others have no coherent plan and charge large fees for a paper to be transferred to PMC. For example, the American Psychological Society charges Wellcome Trust supported researchers $4,000 to send a copy of their paper to PubMed Central.
There already is controversy about the policy in Congress. House Bill HR 6845 was introduced (you can find it by querying ‘HR 6845’ here) as the Fair Copyright in Research Works Act. After glancing over it, it seems that the bill intends to reverse the NIH policy decision by making sure funding agencies can’t force the funded individuals to put their works in a public archive. While at the outset that may sound like it’s protecting the researcher by not ‘forcing’ them to make their work available, it seems to me it’s actually protection for publishers that don’t want to modify their business model. You publish with us, you transfer copyright to us, we get paid for others to view the work. That worked fine for a long time. But the world has changed. We now live in a world where information is instantly available. How about instead of reversing a policy that makes more information available to more people we try to work out a new publishing model?
Who knows where this whole thing will end up? I don’t have a clue. What I do know is that making scientific works available (even if after a waiting period) to a wider audience of researchers is a good thing that spurs more research and greater innovation. But that’s just my two cents.
