Protecting Your NIH Funding: New Public Access Policy

Posted by: Eli Roberson  :  Category: News

If you have or want a career in academic biomedical or basic science research in areas that impact human health, then your career depends on funding. For health-related research the NIH is your bread and butter. You publish papers as a graduate student and postdoc, get some expertise in a certain area, apply for funding, get funding, do research, publish more papers, apply for more grants; you get the picture.

Unfortunately for academic scientists the grant environment has become quite hostile lately. NIH appropriations have become flat in the face of a greater number of researchers, more expensive technologies, an economy that for all intents and purposes is in a recession, and ridiculous fuel costs (yes, that is germane as many suppliers now are applying a fuel surcharge to orders). Basically the same sum of money with decreased buying power is being split amongst a growing number of people doing more expensive research. Unfortunately this means fewer grants are being funded and fewer funded grants are being renewed. Every edge you can get in getting or maintaining a grant is important, which is why the new NIH public access policy is important.

The most important highlight of the new policy is that any NIH-supported research published in a peer-reviewed journal must have the final accepted manuscript deposited in PubMed Central within 12 months of publication. The upside is increased accessibility for people who don’t have the advantage of being at an institution with a million different site licenses for journals. The downside is, of course, more work for the researcher. The following types of papers are subject to the new policy, per the FAQ:

The Policy applies to any manuscript that:

  • Is peer-reviewed;
  • And, is accepted for publication in a journal on or after April 7, 2008;
  • And, arises from:
    • Any direct funding from an NIH grant or cooperative agreement active in Fiscal Year 2008, or;
    • Any direct funding from an NIH contract signed on or after April 7, 2008, or;
    • Any direct funding from the NIH Intramural Program, or;
    • An NIH employee
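
The criteria above read as a simple AND/OR rule, which can be sketched in code. This is only an illustration of the list as quoted here, not an official NIH tool; the `Manuscript` record, its field names, and the `policy_applies` function are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Cutoff date quoted in the policy criteria above.
POLICY_START = date(2008, 4, 7)


@dataclass
class Manuscript:
    """Hypothetical record holding only the facts the criteria ask about."""
    peer_reviewed: bool
    accepted_on: date
    grant_active_fy2008: bool = False           # NIH grant or cooperative agreement active in FY2008
    contract_signed_on: Optional[date] = None   # signing date of an NIH contract, if any
    intramural_funding: bool = False            # NIH Intramural Program funding
    nih_employee_author: bool = False           # work arises from an NIH employee


def policy_applies(m: Manuscript) -> bool:
    """Return True when every top-level condition and at least one funding source matches."""
    qualifying_contract = (
        m.contract_signed_on is not None and m.contract_signed_on >= POLICY_START
    )
    qualifying_source = (
        m.grant_active_fy2008
        or qualifying_contract
        or m.intramural_funding
        or m.nih_employee_author
    )
    return bool(
        m.peer_reviewed
        and m.accepted_on >= POLICY_START
        and qualifying_source
    )


# Example: a peer-reviewed paper accepted in May 2008 and funded by an active FY2008 grant.
paper = Manuscript(peer_reviewed=True, accepted_on=date(2008, 5, 1), grant_active_fy2008=True)
print(policy_applies(paper))
```

The three top-level bullets are joined by "and", while the four funding sources are alternatives, so a single matching source is enough.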

Complying with the new access policy is part of the terms of agreement for any NIH award from now on, so before you sign any copyright agreement to have a paper published, be sure that the agreement wouldn’t prevent you from depositing the work in PubMed Central within the time limit. Furthermore, references to your own published works in NIH grant proposals must include a PubMed Central ID (PMCID).

Don’t forget to stay in compliance with this policy, or the big bad NIH administrators may sneak out of your closet in the night and freeze your funding.

Bioinformatics Tool Chest: Bioconductor

Posted by: Eli Roberson  :  Category: General, Technical

Following up on the previous bioinformatics tool chest post, I thought I’d cover Bioconductor next. Bioconductor is actually an offshoot of the R project.

Now hold on, I know what you’re thinking. “But you talked about R last time, why do we have to talk about R again?!?” It’s simple really. Though Bioconductor is a derivative of R, its purpose truly is unique enough to deserve its own post.

Bioconductor (or BioC) is an open-source derivative of R focused on facilitating the analysis of genomic data. One might ask, why should I care? If you perform any kind of high-throughput SNP genotyping or gene expression analysis, this software suite gives you immediate access to free, open-source, extremely powerful data analysis options. Got Affymetrix CEL files for expression data? No problem. Bioconductor can load, normalize, analyze, and summarize that data for you. How about SNP genotyping data? Again, no problem. Want to check the copy number of your SNP data? You’ll have several options. Many Bioconductor packages are built using S4 methods and classes (the exact definition of which is unimportant for this article). The advantage of that coding system is that you can use and extend existing classes to perform your own custom-designed analysis methods. And even better, once you’ve worked out a new method, you can incorporate it into a package and submit it to Bioconductor for everyone to use!

The bottom line is this: if you need powerful, customizable, freely available analysis software (and who doesn’t after spending ridiculous amounts of money running many samples on high-throughput technology), then Bioconductor is a viable choice. If you have genomic data, give BioC a try, and if it’s useful to you, build your own packages for the whole community.

How Perl Saved the Human Genome Project

Posted by: Abbas  :  Category: General, News

The helix graphic is reproduced from Dr. Lincoln Stein’s article “How Perl Saved the Human Genome Project” as published in the September 1996 issue of The Perl Journal.
Reprinted courtesy of the Perl Journal, http://www.tpj.com Archive.
Lincoln Stein’s website is http://stein.cshl.org

DATE: Early February, 1996

LOCATION: Cambridge, England, in the conference room of the largest DNA sequencing center in Europe.

OCCASION: A high level meeting between the computer scientists of this center and the largest DNA sequencing center in the United States.

THE PROBLEM: Although the two centers use almost identical laboratory techniques, almost identical databases, and almost identical data analysis tools, they still can’t interchange data or meaningfully compare results.

THE SOLUTION: Perl.

The human genome project was inaugurated at the beginning of the decade as an ambitious international effort to determine the complete DNA sequence of human beings and several experimental animals. The justification for this undertaking is both scientific and medical. By understanding the genetic makeup of an organism in excruciating detail, it is hoped that we will be better able to understand how organisms develop from single eggs into complex multicellular beings, how food is metabolized and transformed into the constituents of the body, how the nervous system assembles itself into a smoothly functioning ensemble. From the medical point of view, the wealth of knowledge that will come from knowing the complete DNA sequence will greatly accelerate the process of finding the causes of (and potential cures for) human diseases.

Six years after its birth, the genome project is ahead of schedule. Detailed maps of the human and all the experimental animals have been completed (mapping out the DNA using a series of landmarks is an obligatory first step before determining the complete DNA sequence). The sequence of the smallest model organism, yeast, is nearly completed, and the sequence of the next smallest, a tiny soil-dwelling worm, isn’t far behind. Large scale sequencing efforts for human DNA started at several centers a number of months ago and will be in full swing within the year.

Bioinformatics Tool Chest: R Programming Language

Posted by: Eli Roberson  :  Category: General, Technical

Data

Scientists love data. Call it a character flaw, but most of us can’t get enough. More data, more! But the data alone are just the start. To really be useful, we have to do something with the data. Model it. Summarize it. Evangelize it. Something. Who hasn’t needed to plot a standard curve? Or find the mean value of a series of numbers? What should you do when you have these questions?
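
To make those two questions concrete, here is a minimal sketch of both calculations. It is written in Python purely for illustration, the concentration and absorbance values are made up, and `fit_line` is a hypothetical helper name. It fits the straight line behind a standard curve by ordinary least squares and reads an unknown sample off that line; plotting is left out to keep the sketch dependency-free.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up standards: known concentrations and their measured absorbance.
concentration = [0.0, 1.0, 2.0, 3.0, 4.0]
absorbance = [0.1, 0.3, 0.5, 0.7, 0.9]

# Question 1: the mean value of a series of numbers.
mean_absorbance = mean(absorbance)

# Question 2: the standard curve, then an unknown sample read off it.
slope, intercept = fit_line(concentration, absorbance)
unknown_absorbance = 0.6
unknown_concentration = (unknown_absorbance - intercept) / slope

print(f"mean absorbance: {mean_absorbance:.3f}")
print(f"standard curve: y = {slope:.3f} * x + {intercept:.3f}")
print(f"estimated concentration of unknown: {unknown_concentration:.3f}")
```

The same two steps, a summary statistic and a linear fit, are what a dedicated statistics environment provides out of the box.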

The Problem

Many scientists turn to our friend Excel to solve these problems. It’s easy to work with, and you can even make graphs easily. That isn’t necessarily a good thing, as perfectly nice people make really bad graphs because those fancy 3D features are so tantalizing. Everyone interested in bioinformatics or computational biology needs a tool in their tool chest that can handle:

  1. statistics
  2. figure and graph creation
  3. very large data sets

The Solution

Look no further, friends: your savior has arrived, and its name is R. R is a free, cross-platform, open-source derivative of the S language. In case you didn’t catch that last part: R is free. You can download R from the nearest mirror to get started.

The Good

  • Freely available
  • Open-source: you can compile it to your needs (OS, CPU, available memory, optimization levels)
  • Tons of add-on packages
  • Scriptable
  • Ability to write own functions and packages
  • Able to handle large datasets
  • Interfaces with compiled languages
  • Can save plots as PostScript files (print quality)
  • Extensive tutorials online, along with mailing lists and archives for troubleshooting

The Bad

  • Command-line interface
  • Can be slow reading large files
  • Interpreted language (can be slower than compiled code)
  • No tech support line
  • Steep learning curve for beginners, especially non-programmers