sppPCA 1.0 – Sequential Projection Pursuit PCA

sppPCA 1.0

:: DESCRIPTION

The sppPCA method lets researchers perform exploratory data analysis on new -omic datasets that contain missing data. Because missing values do not have to be imputed, the low-dimensional projections of the data are not skewed by the inaccurate variance estimates that imputation often introduces. Sequential projection pursuit (SPP) is a computationally robust approach to the optimization task of identifying a small subset of orthogonal latent variables of interest (e.g., principal components).
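
As a rough illustration of analysing data with missing values without imputing them, the sketch below estimates each covariance entry from the samples in which both features are observed, then extracts components one at a time by power iteration and deflation. This is a swapped-in, simpler technique and not the SPP optimizer from the cited paper; the function names, the use of `None` for a missing value, and the row-per-sample data layout are all assumptions made for illustration.

```python
import math


def pairwise_covariance(rows):
    """Covariance matrix from pairwise-complete observations (None = missing)."""
    p = len(rows[0])
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            # Keep only samples where both feature i and feature j are observed.
            pairs = [(r[i], r[j]) for r in rows
                     if r[i] is not None and r[j] is not None]
            n = len(pairs)
            if n < 2:
                continue  # too few shared observations: entry stays 0.0
            mean_i = sum(a for a, _ in pairs) / n
            mean_j = sum(b for _, b in pairs) / n
            c = sum((a - mean_i) * (b - mean_j) for a, b in pairs) / (n - 1)
            cov[i][j] = cov[j][i] = c
    return cov


def leading_component(cov, iterations=500):
    """Power iteration: unit vector and Rayleigh quotient of the dominant eigenpair."""
    p = len(cov)
    # Arbitrary non-uniform starting vector (an assumption), normalised to unit length.
    start = [1.0 / (k + 1) for k in range(p)]
    scale = math.sqrt(sum(x * x for x in start))
    v = [x / scale for x in start]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # v lies in the null space; nothing left to extract
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j]
                     for i in range(p) for j in range(p))
    return v, eigenvalue


def sequential_components(rows, k):
    """Extract up to k components one at a time, deflating after each."""
    cov = pairwise_covariance(rows)
    components = []
    for _ in range(k):
        v, eigenvalue = leading_component(cov)
        if eigenvalue <= 0.0:
            break  # a pairwise-complete matrix need not be positive semidefinite
        components.append((v, eigenvalue))
        p = len(v)
        # Deflation: remove the variance explained by the component just found.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return components
```

Because each covariance entry may be computed from a different subset of samples, the resulting matrix is not guaranteed to be positive semidefinite, which is why the loop stops as soon as a non-positive eigenvalue appears. Calling `sequential_components(rows, 2)` on a list of samples returns at most two `(loadings, variance)` pairs.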

:: DEVELOPER

Computational Biology & Bioinformatics, Pacific Northwest National Laboratory

:: SCREENSHOTS

:: REQUIREMENTS

Windows
MATLAB
Java

:: DOWNLOAD

sppPCA

:: MORE INFORMATION

Citation

Biotechniques. 2013 Mar;54(3):165-8. doi: 10.2144/000113978.
Sequential projection pursuit principal component analysis - dealing with missing data associated with new -omics technologies.
Webb-Robertson BJ, Matzke MM, Metz TO, McDermott JE, Walker H, Rodland KD, Pounds JG, Waters KM.

PCAj – Population Structure Prediction System for Japanese

PCAj

:: DESCRIPTION

PCAj (Principal Component Analysis for Japanese) predicts the population structure of Japanese samples from genome-wide SNP genotypes. It creates a 2D scatterplot of the predicted principal components, which are based on probabilistic PCA.
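
Probabilistic PCA gives a closed-form posterior mean for a sample's latent coordinates, and the same formula applies when only a subset of the SNPs is observed. The sketch below assumes a model already fitted on a reference panel, with per-SNP means, a two-column loading matrix and a noise variance; these inputs, the function name and the `None` convention for an untyped SNP are hypothetical and are not taken from PCAj itself.

```python
def ppca_predict_2d(genotypes, means, loadings, noise_var):
    """Posterior mean of two latent coordinates under probabilistic PCA.

    genotypes: allele counts per SNP, None where the SNP was not typed
    means:     per-SNP mean from the (hypothetical) reference panel
    loadings:  per-SNP pair (w1, w2), the two columns of the loading matrix
    noise_var: isotropic noise variance, must be > 0
    """
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    # M = W_obs^T W_obs + noise_var * I, accumulated over observed SNPs only.
    m11 = m22 = float(noise_var)
    m12 = 0.0
    # b = W_obs^T (x_obs - mean_obs)
    b1 = b2 = 0.0
    for g, mu, (w1, w2) in zip(genotypes, means, loadings):
        if g is None:
            continue  # untyped SNP: it contributes nothing to M or b
        d = g - mu
        m11 += w1 * w1
        m12 += w1 * w2
        m22 += w2 * w2
        b1 += w1 * d
        b2 += w2 * d
    # Solve the 2x2 system M z = b by Cramer's rule (det > 0 since noise_var > 0).
    det = m11 * m22 - m12 * m12
    z1 = (m22 * b1 - m12 * b2) / det
    z2 = (m11 * b2 - m12 * b1) / det
    return z1, z2
```

Each sample then yields one `(z1, z2)` point for a 2D scatterplot. A sample with no typed SNPs falls back to the origin, and samples typed on different SNP subsets still land in the same coordinate system.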

:: DEVELOPER

Kumasaka Natsuhiko

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Linux / Mac OS X / Windows
Java

:: DOWNLOAD

PCAj

:: MORE INFORMATION

Citation

Kumasaka et al. (2010)
Establishment of a Standardized System to Perform Population Structure Analyses with Limited Sample Size or with Different Sets of SNP Genotypes.
Journal of Human Genetics, 55(8):525-33.

shellfish – Parallel PCA and data processing for Genome-wide SNP data

shellfish

:: DESCRIPTION

shellfish carries out a variety of tasks related to principal component analysis of genome-wide SNP data. Unlike other available software, it can run PCA computations in parallel, either on a computing cluster running the Sun Grid Engine or on a single machine with multiple processors. In addition to the PCA calculations, it automates data subsetting and allele matching, using plink and gtool for file format interconversion where necessary. The aim is that tasks that would otherwise require a complex series of shell commands and/or work in R can be carried out with a single, straightforward command.

:: DEVELOPER

Dan Davison

:: SCREENSHOTS

N/A

:: REQUIREMENTS

Mac OS X / Linux
Python

:: DOWNLOAD

shellfish

:: MORE INFORMATION