Bioconductor is a free, open source and open development software project for the analysis and comprehension of genomic data generated by wet lab experiments in molecular biology.
Bioconductor is based primarily on the statistical R programming language, but does contain contributions in other programming languages.
It has two releases each year that follow the biannual releases of R. At any one time there is a release version, which corresponds to the released version of R, and a development version, which corresponds to the development version of R. Most users will find the release version appropriate for their needs. In addition there are a large number of genome annotation packages available that are mainly, but not solely, oriented towards different types of microarrays.
The project was started in the Fall of 2001 and is overseen by the Bioconductor core team, based primarily at the Fred Hutchinson Cancer Research Center with other members coming from various US and international institutions.
Most Bioconductor components are distributed as R packages, which are add-on modules for R. Initially most of the Bioconductor software packages focused on the analysis of single channel Affymetrix and two (or more) channel cDNA microarrays. As the project has matured, the functional scope of the software packages has broadened to include the analysis of all types of genomic data, such as SAGE, sequence, or SNP data.
The broad goals of the project are to:
- The R Project for Statistical Computing. R and the R package system provide a broad range of advantages to the Bioconductor project including:
- Documentation and reproducible research. Each Bioconductor package contains at least one vignette, which is a document that provides a textual, task-oriented description of the package's functionality. These vignettes come in several forms. Many are simple "how-to" documents that demonstrate how a particular task can be accomplished with that package's software. Others provide a more thorough overview of the package or discuss general issues related to it. In the future, the project aims to provide vignettes that are not tied to a specific package but instead demonstrate more complex concepts. As with all aspects of the Bioconductor project, users are encouraged to participate in this effort.
- Statistical and graphical methods. The Bioconductor project aims to provide access to a wide range of powerful statistical and graphical methods for the analysis of genomic data. Analysis packages are available for: pre-processing Affymetrix and cDNA array data; identifying differentially expressed genes; graph theoretical analyses; plotting genomic data. In addition, the R package system itself provides implementations for a broad range of state-of-the-art statistical and graphical techniques, including linear and non-linear modeling, cluster analysis, prediction, resampling, survival analysis, and time series analysis.
- Genome annotation. The Bioconductor project provides software for associating microarray and other genomic data in real time with biological metadata from web databases such as GenBank, LocusLink and PubMed (annotate package). Functions are also provided for incorporating the results of statistical analysis in HTML reports with links to annotation resources on the web. Software tools are available for assembling and processing genomic annotation data from databases such as GenBank, the Gene Ontology Consortium, LocusLink, UniGene, and the UCSC Human Genome Project (AnnotationDbi package). Data packages are distributed to provide mappings between different probe identifiers (e.g. Affy IDs, LocusLink, PubMed). Customized annotation libraries can also be assembled.
- Open source. The Bioconductor project has a commitment to full open source discipline, with distribution via a SourceForge.net-like platform. All contributions are expected to exist under an open source license such as Artistic 2.0, GPL2, or BSD. There are many different reasons why open-source software is beneficial to the analysis of microarray data and to computational biology in general. The reasons include:
- Open development. Users are encouraged to become developers by contributing either Bioconductor-compliant packages or documentation. Additionally, Bioconductor provides a mechanism for linking together different groups with common goals to foster collaboration on software, possibly at the level of shared development.
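The vignette mechanism described above is part of R itself, so it can be explored without any Bioconductor package installed. The following is a minimal sketch using base R's `vignette()` and `browseVignettes()`. The `grid` package is chosen only because it ships with R; substituting the name of an installed Bioconductor package is an assumption about the reader's setup, and on some installations the bundled vignettes may not be present, in which case the listing is simply empty.

```r
# List the vignettes shipped with one installed package.
# "grid" is used only because it is bundled with R; any installed
# package name (for example a Bioconductor package) could be substituted.
v <- vignette(package = "grid")

# The listing is a character matrix with one row per vignette
v$results[, c("Item", "Title"), drop = FALSE]

# Open a single vignette by name (starts a viewer in interactive sessions):
# vignette("grid", package = "grid")

# Browse every vignette of a package in a web browser:
# browseVignettes(package = "grid")
```

The two commented calls are left inactive because they open external viewers, which is only useful in an interactive session.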
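To make the idea of "identifying differentially expressed genes" concrete, here is a small base R sketch. It is not the method used by any particular Bioconductor package: it applies an ordinary Welch t-test to each gene and then a Benjamini-Hochberg adjustment. The expression matrix, the group labels, and the size of the simulated effect are all made up for illustration.

```r
set.seed(1)  # make the simulated data reproducible

n_genes <- 100
# Simulated log-expression matrix: rows are genes, columns are samples
expr <- matrix(
  rnorm(n_genes * 6),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", 1:6))
)
group <- factor(rep(c("control", "treated"), each = 3))

# Hypothetical effect: shift the first five genes upward in the treated group
treated <- group == "treated"
expr[1:5, treated] <- expr[1:5, treated] + 3

# Welch t-test for each gene (row), comparing treated against control samples
p_raw <- apply(expr, 1, function(x) t.test(x[treated], x[!treated])$p.value)

# Adjust for testing many genes at once (Benjamini-Hochberg procedure)
p_adj <- p.adjust(p_raw, method = "BH")

# Genes with the smallest adjusted p-values
head(sort(p_adj), 5)
```

With only three samples per group a plain t-test has little power, which is one reason dedicated analysis packages use more refined statistical models for this task.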
|| Release Date || Dependency |
|| May 1, 2002 || R 1.5 |
|| November 19, 2002 || R 1.6 |
|| May 29, 2003 || R 1.7 |
|| October 30, 2003 || R 1.8 |
|| May 17, 2004 || R 1.9 |
|| October 25, 2004 || R 2.0 |
|| May 18, 2005 || R 2.1 |
|| October 14, 2005 || R 2.2 |
|| April 27, 2006 || R 2.3 |
|| October 4, 2006 || R 2.4 |
|| April 26, 2007 || R 2.5 |
|| October 8, 2007 || R 2.6 |
|| May 1, 2008 || R 2.7 |
- Gentleman, R.; Carey, V.; Huber, W.; Irizarry, R.; Dudoit, S. (2005). Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer.
- Gentleman, R. (2008). R Programming for Bioinformatics. Chapman & Hall/CRC.
- Hahne, F.; Huber, W.; Gentleman, R.; Falcon, S. (2008). Bioconductor Case Studies. Springer.