Bioinformatics Tool Chest: Vector NTI

Posted by: Eli Roberson  :  Category: General

Continuing on the topic of bioinformatics tools for researchers, I thought I’d move away from R for a bit. The tool for today is Vector NTI. Formerly, Vector NTI was a product of Informax, and to use it one had to purchase a license costing thousands of dollars. Then Vector was purchased by Invitrogen. This was great for researchers, because Invitrogen now offers annual, renewable licenses for free to academic researchers. All you need to do is sign up on the Invitrogen site for the Vector NTI User Community and confirm that you are actually an academic researcher to get your free license.

Enough of the “how to get it” spiel, why would you want to get Vector? Think of it as a Swiss Army knife of research software. A central feature of the software is the local database. The database stores DNA, RNA, and protein molecule sequences, restriction enzymes with recognition sequences, oligos with sequences, gel markers, citations, BLAST results, and analysis results. The database comes prepopulated with many molecules (especially from the Invitrogen product line), oligos, markers, etc. Furthermore, the database doesn’t just store sequences, but also features, such as genes and other key features. New molecules are easily added to the database. To add any of the molecules in the database to the current Vector tool, all you have to do is drag it from the database window into the tool window.

Okay, okay. I know what you’re saying, “Yeah yeah yeah, I can already store my data in a database.” But that isn’t all. Say you have a molecule that you want to design PCR primers for. Vector can do that for you, and help analyze multiplex PCR primers. Want to clone a DNA segment? Not a problem. Use the database to figure out the best vector and electronically create the molecule ahead of time. There are even cloning wizards!
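
As a rough illustration of the kind of numbers a primer design tool reports, here is a minimal Python sketch that computes GC content and a melting temperature estimate using the simple Wallace rule (2 °C per A/T, 4 °C per G/C). This is not Vector NTI's algorithm, which isn't described here; the primer sequence and function names are made up for the example.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 18-mer, invented purely for illustration.
primer = "AGCGTACGTTAGCCATGA"
print(f"GC content: {gc_content(primer):.2f}")
print(f"Estimated Tm: {wallace_tm(primer)} C")
```

The Wallace rule is only a rule of thumb for short oligos; dedicated software generally uses more detailed thermodynamic models.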

What about sequencing? Say you run some standard dye-terminator capillary sequencing on an ABI machine. You can actually load the *.abi file directly into Vector to analyze and edit the chromatogram. Say you’ve cloned the DNA you were interested in and sequenced it in both directions. Load the *.abi file into the Contig module of Vector, edit the chromatograms, and then align them with the electronic molecule to see if your product matches expectation.
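
The final comparison step can be sketched in a few lines of Python. This is only an illustration, not how Vector's Contig module works: it assumes both reads are already trimmed to the same length as the expected sequence, and the sequences and function names are invented for the example.

```python
from typing import List

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(expected: str, observed: str) -> List[int]:
    """0-based positions where two equal-length sequences differ."""
    if len(expected) != len(observed):
        raise ValueError("sequences must be the same length")
    return [i for i, (a, b) in enumerate(zip(expected, observed)) if a != b]


expected = "ATGGCCATTGTA"      # the electronic (designed) molecule
forward_read = "ATGGCCATTGTA"  # read from the forward primer
reverse_read = "TACAATGGGCAT"  # read from the reverse primer, opposite strand

print("forward mismatches:", mismatches(expected, forward_read))
print("reverse mismatches:",
      mismatches(expected, reverse_complement(reverse_read)))
```

A position flagged by only one of the two reads is a good place to go back and look at the chromatogram.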

And all of these things are just the beginning of what you can do with Vector NTI. If you want to find out more, get it yourself and try it out.

Protecting Your NIH Funding: New Public Access Policy

Posted by: Eli Roberson  :  Category: News

If you have or want a career in academic biomedical or basic science research in areas that impact human health, then your career depends on funding. For health-related research the NIH is your bread and butter. You publish papers as a graduate student and postdoc, get some expertise in a certain area, apply for funding, get funding, do research, publish more papers, apply for more grants; you get the picture.

Unfortunately for academic scientists the grant environment has become quite hostile lately. NIH appropriations have become flat in the face of a greater number of researchers, more expensive technologies, an economy that for all intents and purposes is in a recession, and ridiculous fuel costs (yes, that is germane as many suppliers now are applying a fuel surcharge to orders). Basically the same sum of money with decreased buying power is being split amongst a growing number of people doing more expensive research. Unfortunately this means fewer grants are being funded and fewer funded grants are being renewed. Every edge you can get in getting or maintaining a grant is important, which is why the new NIH public access policy is important.

The most important highlight of the new policy is that any NIH-supported research published in a peer-reviewed journal must have the final accepted manuscript deposited in PubMed Central within 12 months of publication. The upside is increased accessibility for people who don’t have the advantage of being at an institution with a million different site licenses for journals. The downside is, of course, more work for the researcher. Per the FAQ, the following types of papers are subject to the new policy:

The Policy applies to any manuscript that:

  • Is peer-reviewed;
  • And, is accepted for publication in a journal on or after April 7, 2008;
  • And, arises from:
    • Any direct funding from an NIH grant or cooperative agreement active in Fiscal Year 2008, or;
    • Any direct funding from an NIH contract signed on or after April 7, 2008, or;
    • Any direct funding from the NIH Intramural Program, or;
    • An NIH employee

Complying with the new access policy is part of the terms of agreement for any NIH award from now on, so before you sign any copyright agreement to have a paper published, be sure that the agreement won’t prevent you from depositing the work in PubMed Central within the time limit. Furthermore, references to your own published works in NIH grant proposals must include a PubMed Central ID (PMCID).

Don’t forget to stay in compliance with this policy, or the big bad NIH administrators may sneak out of your closet in the night and freeze your funding.

Bioinformatics Tool Chest: Bioconductor

Posted by: Eli Roberson  :  Category: General, Technical

Following up on the previous bioinformatics tool chest post, I thought I’d cover Bioconductor next. Bioconductor is actually an offshoot of the R project.

Now hold on, I know what you’re thinking. “But you talked about R last time, why do we have to talk about R again?!?” It’s simple really. Though Bioconductor is a derivative of R, its purpose truly is unique enough to deserve its own post.

Bioconductor (or BioC) is an open-source derivative of R focused on facilitating the analysis of genomic data. One might ask, why should I care? If you perform any kind of high-throughput SNP genotyping or gene expression analysis, this software suite gives you immediate access to free, open-source, extremely powerful data analysis options. Got Affymetrix CEL files for expression data? No problem. Bioconductor can load, normalize, analyze, and summarize that data for you. How about SNP genotyping data? Again, no problem. Want to check the copy number of your SNP data? You’ll have several options. Many Bioconductor packages are built using S4 methods and classes (the exact definitions of which are unimportant for this article). The advantage of that coding system is that you can use and extend existing classes to perform your own custom-designed analysis methods. And even better, once you’ve worked out a new method, you can incorporate it into a package and submit it to Bioconductor for everyone to use!

The bottom line is this: if you need powerful, customizable, freely available analysis software (and who doesn’t after spending ridiculous amounts of money running many samples on high-throughput technology) then Bioconductor is a viable choice. If you have genomic data, give BioC a try, and if it’s useful to you, build your own packages for the whole community.

How Perl Saved the Human Genome Project

Posted by: Abbas  :  Category: General, News

The helix graphic is reproduced from Dr. Lincoln Stein’s article “How Perl Saved the Human Genome Project” as published in the September 1996 issue of The Perl Journal.
Reprinted courtesy of the Perl Journal, http://www.tpj.com Archive.
Lincoln Stein’s website is http://stein.cshl.org

DATE: Early February, 1996

LOCATION: Cambridge, England, in the conference room of the largest DNA sequencing center in Europe.

OCCASION: A high level meeting between the computer scientists of this center and the largest DNA sequencing center in the United States.

THE PROBLEM: Although the two centers use almost identical laboratory techniques, almost identical databases, and almost identical data analysis tools, they still can’t interchange data or meaningfully compare results.

THE SOLUTION: Perl.

The human genome project was inaugurated at the beginning of the decade as an ambitious international effort to determine the complete DNA sequence of human beings and several experimental animals. The justification for this undertaking is both scientific and medical. By understanding the genetic makeup of an organism in excruciating detail, it is hoped that we will be better able to understand how organisms develop from single eggs into complex multicellular beings, how food is metabolized and transformed into the constituents of the body, how the nervous system assembles itself into a smoothly functioning ensemble. From the medical point of view, the wealth of knowledge that will come from knowing the complete DNA sequence will greatly accelerate the process of finding the causes of (and potential cures for) human diseases.

Six years after its birth, the genome project is ahead of schedule. Detailed maps of the human and all the experimental animals have been completed (mapping out the DNA using a series of landmarks is an obligatory first step before determining the complete DNA sequence). The sequence of the smallest model organism, yeast, is nearly completed, and the sequence of the next smallest, a tiny soil-dwelling worm, isn’t far behind. Large scale sequencing efforts for human DNA started at several centers a number of months ago and will be in full swing within the year.

Bioinformatics Tool Chest: R Programming Language

Posted by: Eli Roberson  :  Category: General, Technical

Data

Scientists love data. Call it a character flaw, but most of us can’t get enough. More data, more! But the data alone are just the start. To really be useful, we have to do something with the data. Model. Summarize. Evangelize it. Something. Who hasn’t needed to plot a standard curve? Or find the mean value of a series of numbers? What should you do when you have these questions?

The Problem

Many scientists turn to our friend Excel to solve these problems. It’s easy to work with, and you can even make graphs easily. That isn’t necessarily a good thing, as perfectly nice people make really bad graphs because those fancy 3D features are so tantalizing. Everyone interested in bioinformatics or computational biology needs a tool in their tool chest that can handle:

  1. statistics
  2. figure and graph creation
  3. very large data

The Solution

Look no further, friends: your savior has arrived, and its name is R. R is a free, cross-platform, open-source derivative of the S language. In case you didn’t catch that last part: R is free. You can download R from the nearest mirror to get started.

The Good

  • Freely available
  • Open-source: can be compiled to suit your needs (OS, CPU, available memory, optimization levels)
  • Tons of add on packages
  • Scriptable
  • Ability to write your own functions and packages
  • Able to handle large datasets
  • Interfaces with compiled languages
  • Can save plots as PostScript files (print quality)
  • Extensive tutorials online along with mailing lists and archives for troubleshooting

The Bad

  • Command-line interface
  • Can be slow reading large files
  • Interpreted language (can be slower than compiled code)
  • No tech support line
  • Steep learning curve for beginners, especially non-programmers

Useful Online Resources For Cancer Target Analysis

Posted by: Abbas  :  Category: News

Collection of cancer genes based on mutation data

http://www.sanger.ac.uk/genetics/CGP/Census/
  

Repository of microarray data from cancer genomics publications

www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Repository of cytogenetic abnormalities in human cancer

http://www.progenetix.de/

Repository of cytogenetic abnormalities in human cancer

www.ncbi.nlm.nih.gov/sky

Validated SNPs in cancer genes

http://snp500cancer.nci.nih.gov/home.cfm

Bioinformatics Search Engine

Posted by: Abbas  :  Category: General, News

PubMed is the free public interface to MEDLINE. It provides access to bibliographic information in MEDLINE as well as additional life science journals.
Searching for articles on PubMed requires some skill; moreover, it does not support some of the search strings that popular search engines (Google and Yahoo) accept.

Even though we can search through Google Scholar, it returns many false positives. However, Google lets us build a customised search engine through Google Co-op, so we can create our own search engine for a topic of interest.

I have created a bioinformatics search engine through Google Co-op. It searches only bioinformatics-related journals and returns fewer false positives. This BioMed search engine is still in beta, however, and needs further improvement.

You can access the BioMed - Search engine for bioinformatics by clicking here.

Computational Tools For Glycomics Studies

Posted by: Abbas  :  Category: Technical

Sugars are involved in almost every aspect of biology, from recognising pathogens to blood clotting. The glycome’s basic building blocks are far more numerous and varied than the four letters of the DNA alphabet or the score of amino acids that make proteins. In the late 1980s, researchers isolated the first gene for a glycosyltransferase, an enzyme that adds sugars to fats and proteins. The discovery gave scientists the first opportunity to study this process, which is usually called glycosylation, by manipulating the activity of such enzymes.

Fig: Carbohydrate only (no protein) - PDB id:2HYA

Glycomics, or glycobiology, is a discipline of biology that deals with the structure and function of oligosaccharides (chains of sugars). The entirety of carbohydrates in an organism is collectively referred to as the glycome. Ongoing glycomics projects will dramatically accelerate the understanding of the roles of carbohydrates in cell communication and will hopefully lead to novel therapeutic approaches for the treatment of human disease.

The Functional Glycomics Gateway is a comprehensive and free online resource that is the result of a collaboration between the Consortium for Functional Glycomics (CFG) and Nature Publishing Group. It is aimed at keeping you abreast of developments in the emerging field of functional glycomics.

http://www.functionalglycomics.org/static/index.shtml

The following resource annotates and cross-references carbohydrate-related data collections, which allows us to find important data for compounds of interest in a compact and well-structured representation.

http://www.glycosciences.de/sweetdb/

Many PDB files contain carbohydrate structures. Since there is no standard nomenclature for carbohydrates like the one that exists for amino acids, it is difficult to find the carbohydrate information. Sometimes entire oligosaccharides are encoded in one single residue. Information about carbohydrate linkages is often missing, and when it is present, it is not in a unique format and is therefore also difficult to find. pdb2linucs automatically extracts carbohydrate information from PDB files.

http://www.dkfz-heidelberg.de/spec/pdb2linucs/
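
To give a feel for why this is awkward to do by hand, here is a minimal Python sketch that scans the HETATM records of a PDB file for a few common sugar residue names. It is not how pdb2linucs works, and it will miss anything named differently, which is exactly the nomenclature problem described above. The short list of residue codes, the sample records and the function name are all assumptions made for the example.

```python
from typing import List, Set, Tuple

# A small, hand-picked set of common sugar residue names (an assumption;
# real files use many more codes, and some name sugars inconsistently).
SUGAR_CODES = {"NAG", "MAN", "BMA", "GAL", "FUC", "GLC"}


def sugar_residues(pdb_text: str) -> List[Tuple[str, str, int]]:
    """List (residue name, chain, residue number) for sugar HETATM records."""
    seen: Set[Tuple[str, str, int]] = set()
    found: List[Tuple[str, str, int]] = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM") or len(line) < 26:
            continue
        # PDB records are fixed-column: residue name in columns 18-20,
        # chain ID in column 22, residue number in columns 23-26.
        key = (line[17:20].strip(), line[21:22], int(line[22:26]))
        if key[0] in SUGAR_CODES and key not in seen:
            seen.add(key)
            found.append(key)
    return found


# Invented records, truncated after the residue number for brevity.
SAMPLE = """\
ATOM      1  N   ASN A  10
HETATM 1357  C1  NAG A 401
HETATM 1358  C2  NAG A 401
HETATM 1371  C1  MAN A 402
HETATM 1400  O   HOH A 501
"""

for name, chain, number in sugar_residues(SAMPLE):
    print(name, chain, number)
```

Note that this only finds the residues; it says nothing about how they are linked, which is the harder part.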

GlycoSuite comprises GlycoSuiteDB, the leading curated and annotated glycan database, and new bioinformatic tools which interface mass spectrometric data with the database.

https://glycosuite.proteomesystems.com/glycosuite/glycodb

A Complex Carbohydrate Structure Database, also known as CarbBank, is available, but due to lack of funding it is no longer updated.

http://www.boc.chem.uu.nl/sugabase/carbbank.html

Downloading Human Chromosome Sequences

Posted by: Abbas  :  Category: General, Technical

 


Sequencing of a genome often starts with a random shotgun sequencing strategy or with direct sequencing of genomic DNA. The DNA sequences of the clones or sequenced genome fragments often overlap, yielding enlarged DNA sequences (contigs).
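
The idea of overlapping fragments yielding a longer contig can be shown with a toy Python sketch. It merges two reads only when the end of one exactly matches the start of the other; real assemblers also have to cope with sequencing errors, repeats and reads from either strand, none of which is handled here. The reads and function names are invented for the example.

```python
from typing import Optional


def longest_overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_len: int = 3) -> Optional[str]:
    """Join two reads into one contig, or return None if they don't overlap."""
    size = longest_overlap(left, right, min_len)
    if size == 0:
        return None
    return left + right[size:]


read_a = "ATGCGTACGT"
read_b = "TACGTTAGC"
print(merge_reads(read_a, read_b))
```

The `min_len` threshold is there because very short matches happen by chance and shouldn't count as evidence that two fragments belong together.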

Genome Assembly

The genomic sequences are assembled into a series of genomic sequence contigs. These are then ordered, oriented with respect to each other, and placed along each chromosome with appropriately sized gaps inserted between adjacent contigs. The resulting genome assembly thus consists of a set of genomic sequence contigs and a specification for how to arrange the sequence contigs along each chromosome.
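
The ordering, orienting and gap-padding described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: gaps are written as runs of N, and the `Placement` record and function names are hypothetical, not part of any NCBI format or tool.

```python
from typing import List, NamedTuple

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


class Placement(NamedTuple):
    """One contig's place in the assembly (hypothetical record layout)."""
    sequence: str   # the contig's bases
    forward: bool   # False means the contig must be flipped
    gap_after: int  # estimated gap size, in bases, before the next contig


def build_chromosome(placements: List[Placement]) -> str:
    """Lay contigs out in the given order, padding each gap with N."""
    parts = []
    for p in placements:
        seq = p.sequence if p.forward else reverse_complement(p.sequence)
        parts.append(seq)
        parts.append("N" * p.gap_after)
    return "".join(parts)


layout = [
    Placement("ATGC", forward=True, gap_after=3),
    Placement("GGTA", forward=False, gap_after=0),
]
print(build_chromosome(layout))
```

Keeping the contigs and the layout specification separate mirrors the description above: the assembly is the set of contigs plus the instructions for arranging them.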

Finished Chromosomes

A chromosome sequence is considered finished when any gaps that remain cannot be closed using current cloning and sequencing technology. In practice, therefore, the sequence for a finished chromosome usually consists of a small number of genomic sequence contigs.

Unfinished Chromosomes

Genomic sequence contigs for unfinished chromosomes are assembled and laid out based largely on the clone tiling path. However, the tiling paths do not specify the orientation of the clone sequences or how they should be joined; therefore, data on the alignment of the input genomic sequences to each other and to other sequences are also used to guide the assembly. Genomic sequences that augment the initial set of genomic contigs based on the tiling path clones are also incorporated.

To download complete human chromosome sequences:

Each chromosome can be downloaded as a whole sequence in FASTA format from the NCBI FTP site, which maintains a section called Assembled_chromosomes. To download a chromosome's sequence, click the file whose name starts with hs_ref.

ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes/
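
Once a file is downloaded, reading it is straightforward. Below is a minimal Python sketch of a FASTA reader, shown on a few in-memory lines rather than a real chromosome file. The record names and sequences are invented, and the function name is hypothetical; for real work a library such as Biopython is usually the better choice.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record's name (first word after '>') to its full sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records[name] = "".join(chunks)
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


demo = [">chrTest demo record", "ATGC", "GGTA", ">other", "NNNN"]
for record_name, sequence in read_fasta(demo).items():
    print(record_name, len(sequence))
```

To run it on a downloaded chromosome, pass an open file handle instead of the list. Keep in mind that this holds the whole sequence in memory, and that a compressed download has to be unpacked first.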

Manual Annotation:
The Vega site, maintained by the Sanger Institute, presents data from the manual annotation of the human genome.

High-quality annotated human chromosome sequences
To download all annotated human contigs in one FASTA file:

ftp://ftp.sanger.ac.uk/pub/vega/human/

Identification of genes

Genes are found using three complementary approaches: (a) known genes are placed primarily by aligning mRNAs to the assembled genomic contigs; (b) additional genes are located based on alignment of ESTs to the assembled genomic contigs; and (c) previously unknown genes are predicted using hints provided by protein homologies.

BioPuppy OS

Posted by: Abbas  :  Category: News, Technical

BioPuppy OS is an electronic workbench for bioinformatics and computational biology, designed to meet the needs of beginners. BioPuppy is available as a combined live and installation CD (and for USB pen drives) and contains all the software required to boot a computer with ready-to-use bioinformatics tools. BioPuppy is based on Puppy Linux, being derived from Puppy-3.01-seamonkey. Another objective is a small footprint: the estimated size of BioPuppy is ~150MB.

INCLUDES: 
- Sequence Analysis Tools: Sim-4, Tigr-Glimmer 2, Genewise, Muscle, Sigma, HMMER, Clustal-W, mafft 
- Structure Prediction Tools: mfold, gibbs 
- Protein Structure Analysis Tools: Garlic, Rasmol 
- Phylogenetic Analysis Tools: Fast DNA, Phylip, Phylodraw
- Protein Modeling: Modeller 
- Docking: MGL Tools (Autodock tools, Python Molecular Viewer (PMV), Vision) 
- Systems Biology: Copasi
- On-line tools: BLAST, EMBOSS 

FOR MORE INFORMATION: 
To download and for more details visit http://biopuppy.org/