Unless otherwise stated, all the code here is released under the GNU GPL license. Please note that THERE IS NO WARRANTY FOR ANY OF THE PROGRAMS. See section 11 of the GPL for further details. (For some applications the R or C/C++ code is not yet available; once the code is properly cleaned up and documented it will be released under the GPL as R packages or stand-alone C/C++ files.)
My main software development project now is Asterias (to which several of the applications below are related). Asterias is a set of web-based applications for the analysis of genomic and proteomic data. Currently, Asterias combines Python with R and C/C++, using MPI for parallelization, and aspires to become a standard for high-performance, distributed, web-based bioinformatics and biostatistics applications.
RJaCGH is an R package for the analysis of array CGH data using Hidden Markov Models. We incorporate distance between genes (using a non-homogeneous HMM) and do not fix in advance the number of states, but rather use a fully Bayesian approach based on Reversible Jump MCMC. This is a package developed by Oscar Rueda and myself. The package is available from CRAN and from the Asterias project site. Our new method is described in this COBRA preprint. (You can also download it from here.) This package is part of Oscar Rueda's PhD thesis ("Statistical methods for the analysis of copy number alterations in the genome"). You can download his thesis here.
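To illustrate what "non-homogeneous" means here, the following is a minimal sketch of a transition matrix that depends on the distance between adjacent genes. The exponential-decay form, the function name `transition_matrix`, and the `scale` parameter are assumptions chosen for illustration; this is not the parameterization used in RJaCGH.

```python
from math import exp


def transition_matrix(distance, stationary, scale=1.0):
    """Distance-dependent transition matrix (illustrative form only).

    Nearby genes tend to stay in the same hidden state; as the distance
    grows, each row approaches the `stationary` distribution.
    """
    w = exp(-distance / scale)  # weight on "stay in the same state"
    k = len(stationary)
    return [
        [w * (1.0 if i == j else 0.0) + (1.0 - w) * stationary[j]
         for j in range(k)]
        for i in range(k)
    ]


# Three hypothetical states: loss, no change, gain.
pi = [0.2, 0.5, 0.3]
close = transition_matrix(0.1, pi)   # nearly the identity matrix
far = transition_matrix(50.0, pi)    # rows close to pi
```

With a homogeneous HMM the same matrix would be used between every pair of adjacent genes; here a different matrix is computed for each gap.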
ADaCGH is a web tool for the analysis of array CGH to detect gains and losses in genomic DNA. We implement several very different approaches and also call IDClight to display additional gene information. ADaCGH is a web interface made with Python that uses R underneath (with R and C code written by myself and Oscar Rueda Palacio) and uses parallelization to speed up the computations.
SignS is a web tool for gene selection and finding molecular signatures when we have patient survival data. We implement two very different methods, and provide additional gene information in clickable tables and dendrograms thanks to calling our IDClight application. SignS is a web interface made with Python that uses R underneath. To greatly speed up the computations, we use MPI (which takes advantage of the 66 CPUs available on our servers).
GeneSrF is a web tool for gene selection in classification problems that uses random forest. Two approaches for gene selection are used: one is targeted towards identifying small, non-redundant sets of genes that have good predictive performance. The second is a more heuristic graphical approach that can be used to identify large sets of genes (including redundant genes) related to the outcome of interest. This is a web interface (using Python) of my varSelRF package.
varSelRF is an R package for variable selection using random forests, targeted towards gene expression data. Details can be found in "Variable selection from random forests: application to gene expression data". Download the source package.
Download the source package; it can be installed under GNU/Linux (and other Unixes) with the usual "R CMD INSTALL geSignatures_0.6-5.tar.gz". Download the Windows version; you can use the R menus to install it from a local ZIP package.
Pomelo is a web-based tool that can be used to find differentially expressed genes. It currently implements statistical tests for two-group (via t-tests) and multigroup (via ANOVA) comparisons, regression analysis, survival data (gene-wise Cox model) and contingency tables (using Fisher's exact test). We allow control of the Family Wise Error Rate (using the maxT approach) and the False Discovery Rate.
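The text above does not say which False Discovery Rate procedure Pomelo uses. As an illustration of FDR control in general, here is a minimal sketch of the Benjamini-Hochberg step-up adjustment; the function name `bh_adjust` is hypothetical and this is not Pomelo's code.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# One p-value per gene, e.g. from gene-wise t-tests.
gene_pvalues = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(gene_pvalues)
significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
```

Genes whose adjusted value falls at or below the chosen threshold are declared differentially expressed at that FDR level.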
Download the source code for the statistical tests. This software is released under the GNU GPL. A lot of the code borrows heavily from code in R. See the README file. This compressed file includes also a Windows executable.
FatiGO can be used to examine whether groups of genes are enriched in certain Gene Ontology terms. We use Fisher's exact test for contingency table with adjustments for multiple testing.
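As a rough sketch of the kind of test involved, the one-sided (over-representation) version of Fisher's exact test for a 2x2 table can be computed from the hypergeometric distribution. The function name and argument names are hypothetical, and the choice of a one-sided test is an assumption made for illustration; it is not taken from FatiGO.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    N: genes in the reference set; K: of those, annotated with the term;
    n: genes in the selected group; k: of those, annotated with the term.
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / total


# 3 selected genes out of 10, 4 of the 10 carry the term, 2 of the 3 do.
p = fisher_enrichment_p(k=2, n=3, K=4, N=10)
```

When many Gene Ontology terms are tested, the resulting p-values would then be adjusted for multiple testing.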
DNMAD stands for diagnosis and normalization of microarray data. It is a web server for cDNA microarray normalization and diagnosis, developed together with Juanma Vaquerizas (jvaquerizas AT cnio DOT es).
Tnasas, which stands for "this is not a substitute for a statistician", is a tool for building predictors from microarray data. It is useful as a benchmark (it offers several well tested methods) and as a pedagogical tool (against overoptimism when building predictors and ignoring several selection biases). Developed with Juanma Vaquerizas (jvaquerizas AT cnio DOT es) at CNIO, using R. The code (R with a tiny bit of C++) will soon be available under the GNU GPL.
PHYLOGR is an R package for the manipulation and analysis of phylogenetically simulated data sets and phylogenetically based analyses using GLS. You can download the source package here or you can get it from CRAN, where you can download both the source and Windows binaries.
ape is another R package for phylogenetics and evolution, but there is little overlap between ape and PHYLOGR.
This is a set of programs in RPL (Reverse Polish Lisp) to use the HP 48 calculator as a handheld computer to record behavioral data and to help in the execution of a behavioral experiment. Included are some utility functions in C++ for the processing and cleaning of the output.
I spent some time working on the loser/winner effects. This is a problem that requires game theory (what is a good strategy depends on what your opponents do) but I could not find simple analytical solutions. So I used genetic algorithms, which seems natural enough here since we are dealing with the evolution of behavioral strategies.
There are several libraries for genetic algorithms. I started using galib, a very nice library. However, my problem is one where fitness is the result of repeated interactions between the genotypes (and not something you evaluate in a sweep over the population at each generation); this is doable with galib, but I found it hard and awkward. Thus, to learn more C++ and to have more control, I wrote a set of classes and methods for genetic algorithms (ga.cpp, ga.h), along with the code for the loser/winner part (fighting.cpp) and a few helper functions.
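To make the distinction concrete, here is a minimal sketch of a genetic algorithm in which fitness accumulates over repeated pairwise encounters instead of being computed for each individual in isolation. It is not a translation of ga.cpp or fighting.cpp: the single-number genotype, the payoff rule in `contest`, and all parameter values are assumptions made for illustration, and it does not model winner/loser effects themselves.

```python
import random


def contest(a, b, rng):
    """Hypothetical payoff rule: the more aggressive genotype usually wins,
    but escalation costs both contestants. Returns (payoff_a, payoff_b)."""
    cost = a * b                      # both pay when both escalate
    p_a_wins = 0.5 + 0.5 * (a - b)    # in [0, 1] because a, b are in [0, 1]
    if rng.random() < p_a_wins:
        return 1.0 - cost, -cost
    return -cost, 1.0 - cost


def evolve(pop_size=20, generations=30, rounds=5, mut_sd=0.05, seed=1):
    """Evolve aggressiveness levels in [0, 1]; returns the final population."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [0.0] * pop_size
        # Fitness is the sum of payoffs over random pairwise encounters.
        for _ in range(rounds * pop_size):
            i, j = rng.sample(range(pop_size), 2)
            pay_i, pay_j = contest(pop[i], pop[j], rng)
            fitness[i] += pay_i
            fitness[j] += pay_j
        # Shift fitness to be positive so it can serve as selection weights.
        low = min(fitness)
        weights = [f - low + 1e-9 for f in fitness]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # Gaussian mutation, clipped to keep genotypes in [0, 1].
        pop = [min(1.0, max(0.0, p + rng.gauss(0.0, mut_sd))) for p in parents]
    return pop


final_population = evolve()
```

The key point is that the fitness of a genotype depends on which opponents it happened to meet, so it cannot be evaluated by a function of the genotype alone.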
Please note that the documentation is non-existent (you'll need to read the comments), that there are a few comments in Spanish and Spanglish, and that indentation and line width are "peculiar" (set to fit my monitor and usage of XEmacs). It's been a while since I worked on these issues. But I'd appreciate it if, when you use this code, you let me know.
apt-get install blitz
apt-get install r-mathlib
I've also run it with other GNU/Linux distributions.