Useful Online Resources For Cancer Target Analysis

Posted by: Abbas  :  Category: News
Collection of cancer genes based on mutation data

http://www.sanger.ac.uk/genetics/CGP/Census/
  

Repository of microarray data from cancer genomics publications

www.broad.mit.edu/cgi-bin/cancer/datasets.cgi

Repository of cytogenetic abnormalities in human cancer

http://www.progenetix.de/

Repository of cytogenetic abnormalities in human cancer (NCBI SKY/M-FISH and CGH database)

www.ncbi.nlm.nih.gov/sky

Validated SNPs in cancer genes

http://snp500cancer.nci.nih.gov/home.cfm

Bioinformatics Search Engine

Posted by: Abbas  :  Category: General, News

PubMed is the free public interface to MEDLINE. It provides access to bibliographic information in MEDLINE as well as additional life science journals.
Searching for articles on PubMed takes some skill; moreover, it does not support some of the search strings that popular search engines (Google and Yahoo) accept.

Even though we can search through Google Scholar, it returns many false positives. However, Google lets us build a customised search engine through Google Co-op, so we can create our own search engine for a topic of interest.

I have created a search engine for bioinformatics through Google Co-op. It searches only bioinformatics-related journals and returns fewer false positives. This BioMed search engine is still in beta, however, and needs further improvement.

You can access the BioMed – Search engine for bioinformatics by clicking here.

Computational Tools For Glycomics Studies

Posted by: Abbas  :  Category: Technical

Sugars are involved in almost every aspect of biology, from recognising pathogens to blood clotting. The glycome’s basic building blocks are far more numerous and varied than the four letters of the DNA alphabet or the score of amino acids that make proteins. In the late 1980s, researchers isolated the first gene for a glycosyl transferase, an enzyme that adds sugars to fats and proteins. The discovery gave scientists the first opportunity to study this process, usually called glycosylation, by manipulating the activity of such enzymes.

Fig: Carbohydrate only (no protein) – PDB id:2HYA

Glycomics, or glycobiology, is a discipline of biology that deals with the structure and function of oligosaccharides (chains of sugars). The entirety of carbohydrates in an organism is collectively referred to as the glycome. Ongoing glycomics projects are expected to accelerate the understanding of the roles of carbohydrates in cell communication and, hopefully, lead to novel therapeutic approaches for the treatment of human disease.

The Functional Glycomics Gateway is a comprehensive and free online resource that is the result of a collaboration between the Consortium for Functional Glycomics (CFG) and Nature Publishing Group. It is aimed at keeping you abreast of developments in the emerging field of functional glycomics.

http://www.functionalglycomics.org/static/index.shtml

SWEET-DB annotates and cross-references carbohydrate-related data collections, making it possible to find important data for compounds of interest in a compact, well-structured representation.

http://www.glycosciences.de/sweetdb/

Many PDB files contain carbohydrate structures. Because carbohydrates have no standard nomenclature comparable to the one for amino acids, the carbohydrate information is difficult to find. Sometimes an entire oligosaccharide is encoded in a single residue. Information about carbohydrate linkages is often missing, and when present it is not in a unique format and is therefore also difficult to find. pdb2linucs automatically extracts carbohydrate information from PDB files.

http://www.dkfz-heidelberg.de/spec/pdb2linucs/
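To get a feel for why carbohydrate information is hard to find in a PDB file, here is a rough shell sketch that lists HETATM residues whose three-letter name appears in a short list of common sugar codes. This is only an illustration and is not a description of how pdb2linucs works. The sample records, the file name example.pdb and the function name list_sugar_residues are made up, and the code list is deliberately incomplete. As noted above, an oligosaccharide stored under a single residue name would be missed by such a name-based lookup.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Made-up records; only the columns read below (record name, residue
# name, chain, residue number) are meaningful in this sample.
cat > "$workdir/example.pdb" <<'EOF'
ATOM      1  N   ASN A  10      11.104  13.207   2.100  1.00 20.00           N
HETATM  900  C1  NAG A 401      12.000  14.000   3.000  1.00 30.00           C
HETATM  901  C2  NAG A 401      12.500  14.500   3.500  1.00 30.00           C
HETATM  915  C1  MAN A 402      15.000  16.000   5.000  1.00 35.00           C
HETATM  990  O   HOH A 501      20.000  21.000   9.000  1.00 40.00           O
EOF

# Print "RESNAME CHAIN RESNUM" once per sugar residue found.
# Assumption: the short code list below is illustrative, not exhaustive.
list_sugar_residues() {
    awk -v codes="NAG MAN BMA GAL FUC GLC SIA" '
        BEGIN {
            n = split(codes, c, " ")
            for (i = 1; i <= n; i++) sugar[c[i]] = 1
        }
        substr($0, 1, 6) == "HETATM" {
            res = substr($0, 18, 3)          # residue name, columns 18-20
            if (res in sugar) {
                chain = substr($0, 22, 1)    # chain identifier, column 22
                resnum = substr($0, 23, 4) + 0   # residue number, columns 23-26
                key = res " " chain " " resnum
                if (!(key in seen)) { seen[key] = 1; print key }
            }
        }
    ' "$1"
}

sugars=$(list_sugar_residues "$workdir/example.pdb")
echo "$sugars"
```

Water (HOH) and the protein ATOM records are skipped because their residue names are not in the list.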

GlycoSuite comprises GlycoSuiteDB, the leading curated and annotated glycan database, and new bioinformatic tools which interface mass spectrometric data with the database.

https://glycosuite.proteomesystems.com/glycosuite/glycodb

A Complex Carbohydrate Structure Database, also known as CarbBank, is available, but owing to lack of funding it is no longer updated.

http://www.boc.chem.uu.nl/sugabase/carbbank.html

Downloading Human Chromosome Sequences

Posted by: Abbas  :  Category: General, Technical

 


Sequencing of a genome often starts with a random shotgun sequencing strategy or with direct sequencing of genomic DNA. The DNA sequences of the clones or sequenced genome fragments often overlap, yielding enlarged DNA sequences (contigs).

Genome Assembly

The genomic sequences are assembled into a series of genomic sequence contigs. These are then ordered, oriented with respect to each other, and placed along each chromosome with appropriately sized gaps inserted between adjacent contigs. The resulting genome assembly thus consists of a set of genomic sequence contigs and a specification for how to arrange the sequence contigs along each chromosome.
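The "specification for how to arrange the sequence contigs" can be pictured as a small layout table. The sketch below is a toy illustration only: the contig names, sequences, file names and the three-column layout format (contig, orientation, gap length) are all invented and do not reflect the format NCBI actually uses.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Made-up contigs: name and sequence.
cat > "$workdir/contigs.txt" <<'EOF'
ctgA AACCG
ctgB TTGAC
ctgC GGG
EOF

# Made-up layout: contig name, orientation (+ or -), and the number of
# N characters to place after the contig as a sized gap.
cat > "$workdir/layout.txt" <<'EOF'
ctgB + 3
ctgA - 2
ctgC + 0
EOF

# Reverse-complement a DNA string (uppercase A, C, G, T only).
revcomp() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
        print out
    }' | tr 'ACGT' 'TGCA'
}

# Usage: build_chromosome CONTIGS_FILE LAYOUT_FILE
# Walks the layout in order, orients each contig, and appends the gap.
build_chromosome() {
    result=""
    while read -r name strand gap; do
        contig_seq=$(awk -v n="$name" '$1 == n { print $2 }' "$1")
        if [ "$strand" = "-" ]; then
            contig_seq=$(revcomp "$contig_seq")
        fi
        gap_ns=$(awk -v n="$gap" 'BEGIN { while (n-- > 0) printf "N" }')
        result="$result$contig_seq$gap_ns"
    done < "$2"
    printf '%s\n' "$result"
}

chromosome=$(build_chromosome "$workdir/contigs.txt" "$workdir/layout.txt")
echo "$chromosome"
```

The runs of N in the result stand for the sized gaps between adjacent contigs.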

Finished Chromosomes

A chromosome sequence is considered finished when any gaps that remain cannot be closed using current cloning and sequencing technology. In practice, therefore, the sequence for a finished chromosome usually consists of a small number of genomic sequence contigs.

Unfinished Chromosomes

Genomic sequence contigs for unfinished chromosomes are assembled and laid out based largely on the clone tiling path. However, the tiling paths do not specify the orientation of the clone sequences or how they should be joined; therefore, data on the alignment of the input genomic sequences to each other and to other sequences are also used to guide the assembly. Genomic sequences that augment the initial set of genomic contigs based on the tiling path clones are also incorporated.

To download complete human chromosome sequences:

Each chromosome can be downloaded as a whole sequence in FASTA format from the NCBI FTP site, which maintains a section called Assembled_chromosomes. To download a chromosome sequence, click the file whose name starts with hs_ref.

ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/Assembled_chromosomes/
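Once a chromosome file has been fetched and decompressed, a quick sanity check is to count its records and bases. A minimal sketch, assuming an uncompressed FASTA file; the file name example_chr.fa, its contents and the function name fasta_stats are made up for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Tiny stand-in for a downloaded chromosome file.
cat > "$workdir/example_chr.fa" <<'EOF'
>example_chr_part1
ACGTACGTNN
ACGT
>example_chr_part2
GGCC
EOF

# Print "<records> <bases>" for a FASTA file: header lines start
# with ">", every other line is counted as sequence.
fasta_stats() {
    awk '
        /^>/ { records++; next }
        { bases += length($0) }
        END { print records + 0, bases + 0 }
    ' "$1"
}

stats=$(fasta_stats "$workdir/example_chr.fa")
echo "records and bases: $stats"
```

A finished chromosome file is far larger, but awk reads it line by line, so the same function should still apply.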

Manual Annotation:
The Vega site, maintained by the Sanger Institute, presents data from the manual annotation of the human genome.

High-quality annotated human chromosome sequences
To download all annotated human contigs in one FASTA sequence file:

ftp://ftp.sanger.ac.uk/pub/vega/human/

Identification of genes

Genes are found using three complementary approaches: (a) known genes are placed primarily by aligning mRNAs to the assembled genomic contigs; (b) additional genes are located based on alignment of ESTs to the assembled genomic contigs; and (c) previously unknown genes are predicted using hints provided by protein homologies.

BioPuppy OS

Posted by: Abbas  :  Category: News, Technical

BioPuppy OS is an electronic workbench for bioinformatics and computational biology, designed to meet the needs of beginners. It is available as a combined live and installation CD (and on a USB pen drive) containing all the software required to boot a computer with ready-to-use bioinformatics tools. BioPuppy is derived from Puppy Linux (Puppy-3.01-seamonkey). Another objective is a small footprint: the estimated size of BioPuppy is about 150 MB.

INCLUDES: 
- Sequence Analysis Tools: Sim-4, Tigr-Glimmer 2, Genewise, Muscle, Sigma, HMMER, Clustal-W, mafft 
- Structure Prediction Tools: mfold, gibbs 
- Protein Structure Analysis Tools: Garlic, Rasmol 
- Phylogenetic Analysis tools: Fast DNA, Phylip, Phylodraw 
- Protein Modeling: Modeller 
- Docking: MGL Tools (Autodock tools, Python Molecular Viewer (PMV), Vision) 
- Systems Biology: Copasi
- On-line tools: BLAST, EMBOSS 

FOR MORE INFORMATION: 
To download BioPuppy and for more details, visit http://biopuppy.org/

Fastest Super Computer in the World!

Posted by: admin  :  Category: News

An article by Erica Ogg: 

“Fun fact: the fastest supercomputer in the world–used to monitor the U.S. nuclear weapons stockpile–is really just a PlayStation 3 on steroids. Roadrunner is based on the IBM QS22 blades, which are built using advanced versions of the Cell processor in Sony’s PS3. It also runs using x86 chips from Advanced Micro Devices, making it the world’s first hybrid supercomputer. 

“In total, Roadrunner takes up 278 refrigerator-size server racks, and connects 6,562 dual-core AMD Opteron and 12,240 Cell chips.” 

Full story: 
[link] 

Fulbright Scholar Awards Available!

Posted by: admin  :  Category: News

The Fulbright Scholar Program is offering a combined research/lecturing award in bioinformatics at the Faculty of Science and Faculty of Medicine, University of Iceland, Reykjavik for 2009-10. Grantees will teach undergraduate courses in gene sequences and possibly graduate courses in comparative genomics and gene expression analysis. They will also have access to all available facilities for their own research and ample time to collaborate with faculty and local scientists. The basic eligibility requirement is a Ph.D and university teaching experience. A full award description can be found at [link]

FOR MORE INFORMATION: 
For more information and the online application, please visit http://www.cies.org or contact Muriel Joffe (mjoffe@cies.iie.org). The application deadline is August 1, 2008. 

Parsing Sequences (Shell Scripting)

Posted by: admin  :  Category: Technical

Simple SHELL script for parsing BLAST output

1. To parse the sequence names from BLAST output.

"grep" is a very powerful Unix command for retrieving the lines that match a particular pattern from a file.

Syntax:
grep "pattern" input_file

Example: grep ">" Blast_output.txt

In the example above, grep retrieves the lines that contain the ">" symbol. In a BLAST output file every sequence name starts with ">", so this returns all the sequence names in the file. The quotes matter: an unquoted > would be treated by the shell as output redirection.
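A minimal sketch of this step, assuming a plain-text BLAST report. The sample content and the file name sample_blast.txt are invented for illustration; a real report has many more lines.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Tiny, made-up BLAST-style report.
cat > "$workdir/sample_blast.txt" <<'EOF'
Query= test_query
>gi|0001| hypothetical protein A
 Score = 50.1 bits (118)
Query: 1  MKVLA 5
Sbjct: 10 MKVLA 14
>gi|0002| hypothetical protein B
 Score = 42.0 bits (97)
Query: 1  MKVLA 5
Sbjct: 3  MRVLA 7
EOF

# Keep only lines that begin with ">". Anchoring with ^ avoids
# matching a ">" that happens to occur elsewhere in a line.
hit_names=$(grep '^>' "$workdir/sample_blast.txt")
echo "$hit_names"

# Count the hits instead of printing them.
hit_count=$(grep -c '^>' "$workdir/sample_blast.txt")
echo "hits: $hit_count"
```

Redirecting the first command to a file (for example `grep '^>' sample_blast.txt > names.txt`) would save the names for later use.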


2. Parsing the Sequence names and the sequences from the BLAST output 

"egrep" (equivalent to grep -E) is a powerful command for retrieving lines that match any of several patterns from a file.

Syntax:
egrep "pattern1|pattern2|pattern3" filename

Example:
Below is a combination of shell and Perl script for parsing the BLAST output.

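# Shell: keep hit names and subject lines, then strip the "Sbjct:" label
egrep ">|Sbjct:" Blast_output.txt | sed 's/Sbjct://' > output.txt

# Perl: print hit names as-is and only the sequence column of each subject line
open(FH, "<", "output.txt") or die "Cannot open output.txt: $!";
while (my $ln = <FH>)
{
    if ($ln !~ m/>/)
    {
        # remaining fields: start position, sequence, end position
        # (assumed whitespace-separated, so split on whitespace rather than tabs)
        my @temp = split(' ', $ln);
        print "$temp[1]\n";
    }
    else
    {
        print $ln;
    }
}
close(FH);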

In the example above, egrep retrieves the lines that match ">" or "Sbjct", and the result is stored in output.txt. The Perl script then parses the sequences.
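The same extraction can also be sketched in shell alone with awk. This assumes the text report layout in which subject lines look like `Sbjct: 10 MKVLA 14` (label, start, residues, end); the sample data, file names and the function name parse_blast are invented. Aligned residues may contain gap characters ("-"), which this sketch leaves in place.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Tiny, made-up BLAST-style report with a two-block alignment.
cat > "$workdir/sample_blast.txt" <<'EOF'
>gi|0001| hypothetical protein A
Query: 1  MKVLA 5
Sbjct: 10 MKVLA 14
Query: 6  GGHTW 10
Sbjct: 15 GGHSW 19
>gi|0002| hypothetical protein B
Query: 1  MKVLA 5
Sbjct: 3  MRVLA 7
EOF

# Print each hit name followed by its subject residues joined on one line.
parse_blast() {
    awk '
        /^>/ {                      # new hit: flush the previous sequence
            if (seq != "") print seq
            seq = ""
            print
            next
        }
        /^Sbjct:/ { seq = seq $3 }  # field 3 holds the aligned residues
        END { if (seq != "") print seq }
    ' "$1"
}

parse_blast "$workdir/sample_blast.txt" > "$workdir/parsed.txt"
cat "$workdir/parsed.txt"
```

Joining the blocks per hit gives one sequence line per name, which is often easier to feed into later steps than one line per alignment block.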

Perl for Bioinformatics

Posted by: admin  :  Category: Technical

1. Perl scripts make string processing easy when working with biological data such as genome or protein sequences.

2. File handling is easy in Perl.

3. Perl regular expressions are very flexible and make it easy to match similar patterns rather than only identical ones, for instance when matching a motif or a repeat in a sequence.

4. Perl imposes fewer strict rules on how scripts are written than many other languages, which makes it easy for biologists to learn in a short period.

5. Perl scripts can be combined with shell scripts for text processing.

6. Using Perl CGI and HTML, one can develop web pages. Perl CGI scripts are very similar to ordinary Perl scripts.

7. CPAN contains hundreds of Perl modules that are specific to sequence analysis.
E.g. FASTAParse, Peptide::Pubmed.

8. Perl can also be used for system administration.

9. The Perl Template Toolkit is another Perl product that can be used for developing advanced web pages.

10. With Perl DBIx modules it is easier to pass MySQL data (back end) to the web page (front end).

11. Processing or parsing an HTML file is very easy with CPAN modules.

12. File type conversion is possible in Perl using CPAN modules, e.g. DOC to PDF, HTML to PDF, etc.

13. The PerlMagick module can be used for image processing.

14. The Perl::Critic module helps you write better Perl code by criticizing your code structure.
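To illustrate the point about matching similar rather than identical patterns, here is a small sketch using grep -E from the shell, in keeping with the shell examples in the previous post. For a simple character-class pattern like this one, the same pattern string is also valid inside a Perl match. The protein fragment is made up; the pattern is the N-glycosylation sequon (N, anything but P, S or T, anything but P).

```shell
#!/bin/sh
# Made-up protein fragment used only to demonstrate the pattern.
protein="MKNGSAPLNPTQWNASV"

# N, any residue except P, then S or T, then any residue except P.
motif='N[^P][ST][^P]'

# -o prints each non-overlapping match on its own line.
matches=$(printf '%s\n' "$protein" | grep -oE "$motif")
echo "$matches"
```

The stretch "NPTQ" is skipped because a proline follows the asparagine, which is exactly the kind of near-miss an exact string search could not express.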

Bio – what?

Posted by: admin  :  Category: General

Bioinformatics (and sometimes computational biology) involves the use of techniques including applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems, usually at the molecular level. The core principle of these techniques is using computing resources to solve problems on scales of magnitude far too great for human discernment. Research in computational biology often overlaps with systems biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution.

In conclusion, bioinformatics is the application of computer science to biological systems.