National Institute of Statistical Sciences
19 T.W. Alexander Drive P.O. Box 14006 Research Triangle Park, NC 27709-4006
919.685.9300  FAX 919.685.9310  admin@niss.org



PowerArray: A software environment for
statistical analysis of microarray data


Overview of PowerArray

PowerArray is a computer environment for the examination of high-dimensional data from -omic platforms. High-dimensional databases (e.g., microarray data sets) are large and logistically awkward; they are also complex to visualize and analyze statistically. PowerArray provides computational routines that match the workflow for cleaning, visually examining and statistically analyzing these databases.

Logically, there are three input data sets to consider. First, there is the assay data (denoted as Y in the figure), which is arranged in a large table where the rows are variables (e.g., genes) and the columns are biological samples. Next, there is a design table (X) that gives the experimental/logical relationships among the samples. Finally, there is a data set (A) that gives annotations for each of the variables (e.g., genes). PowerArray allows the scientist to use these three data sets to discover treatment or sample effects and put those effects into biological context. The three data sets are organized as an L-shaped structure, shown at the right.
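The relationship among the three tables can be sketched in a few lines of Python. This is only an illustration of the L-shaped layout described above, not PowerArray's internal representation: the class name LShapedData, the method values_for, and the sample and gene names are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class LShapedData:
    """Hypothetical container for the three linked tables (not PowerArray's API)."""
    Y: list[list[float]]  # assay data: rows = variables (genes), columns = samples
    X: list[dict]         # design table: one record per sample (column of Y)
    A: list[dict]         # annotation table: one record per variable (row of Y)

    def __post_init__(self):
        # X runs along the columns of Y and A along its rows,
        # so the dimensions of the three tables must agree.
        if len(self.A) != len(self.Y):
            raise ValueError("A needs one annotation record per row of Y")
        if any(len(row) != len(self.X) for row in self.Y):
            raise ValueError("every row of Y needs one value per sample in X")

    def values_for(self, row_index, **design_filter):
        """Return one variable's values for the samples matching the design filter."""
        return [
            y for y, x in zip(self.Y[row_index], self.X)
            if all(x.get(key) == value for key, value in design_filter.items())
        ]


# Toy example: 2 genes measured on 4 samples (made-up numbers).
data = LShapedData(
    Y=[[5.1, 5.3, 7.9, 8.2],
       [2.0, 2.1, 2.2, 1.9]],
    X=[{"sample": "s1", "treatment": "control"},
       {"sample": "s2", "treatment": "control"},
       {"sample": "s3", "treatment": "treated"},
       {"sample": "s4", "treatment": "treated"}],
    A=[{"gene": "geneA", "pathway": "p1"},
       {"gene": "geneB", "pathway": "p2"}],
)
treated_values = data.values_for(0, treatment="treated")
```

Keeping the design records aligned with the columns of Y and the annotation records aligned with its rows is what lets a sample-level question (which treatment?) and a variable-level question (which pathway?) be answered from the same matrix.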
        
PowerArray supports project management, platform-specific assay formats (e.g., Affymetrix CEL files), multivariate visualizations, statistical analysis ranging from simple t-tests to complex ANOVAs, and the building of final reports. There is seamless import from and export to Excel.

PowerArray has many functions that can be learned by working through the tutorial examples in chapters 2 and 3 of the user manual. Chapters 4-8 give a more thorough description of the functionality of PowerArray.

Key Features of PowerArray

PowerArray has numerous user-friendly features for data visualization, analysis and reporting. The top 10 features of PowerArray are listed here:
(1) PowerArray is the only software that works on three integrated blocks of data: Y for the data matrix, X for the design and A for the annotation. In Y, each row is a response variable (rather than each column being a variable). This is a common way to arrange high-dimensional data in Excel, but it is the opposite of the way experimental data are typically arranged in SAS.
(2) PowerArray implements a clear analysis flow for high-dimensional data. The usual analysis flow is: signal extraction => data cleaning => sample validation => linear modeling to get estimates and LSMeans => high-level analysis.
(3) PowerArray integrates data analysis with data visualization. Data visualizations are specifically designed for high-dimensional data (e.g., trellis graphs).
(4) PowerArray's high-dimensional linear model (GLM on a large number of variables) is 50-100 times faster than SAS. PowerArray can analyze an unbalanced two-way ANOVA with covariates for 20K genes and 350 chips and report/organize the estimates, LSMeans and p-values in 30-40 seconds.
(5) PowerArray provides a number of routines useful for cleaning large, awkward data sets. For example, proteomics data are usually very messy, and biologists can use PowerArray to clean them.
(6) PowerArray uses RSVD (robust singular value decomposition) for signal extraction from Affymetrix gene chips. RSVD performs much better than MAS5 and works for any number of chips; we have used it to generate signals for 1200 chips. RSVD also leads to a robust PCA that works with missing and messy data, and to a unique pathway analysis.
(7) PowerArray uses Projects to manage its data objects. Lists and clusters are explicit data objects in PowerArray.
(8) PowerArray integrates with Excel and PowerPoint for graphic reporting.
(9) PowerArray has several unique ways to present ANOVA results. For example, TriState, described in chapter 7.5.2 of the user manual, provides a convenient way to summarize and explore analysis results.
(10) PowerArray includes a unique method for sample validation, which is essential for data quality control.
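To make the "linear modeling to get estimates and LSMeans" step in feature (2) concrete, here is a minimal Python sketch of the simplest case: a two-group comparison applied to every row of Y. It is not PowerArray's GLM routine and handles no covariates or multi-factor designs; the function name two_group_row_test and the toy data are assumptions made for illustration. In this one-factor layout the group means play the role of the LSMeans.

```python
from math import sqrt
from statistics import mean, variance


def two_group_row_test(Y, groups, a, b):
    """For each row of Y, compare group b with group a using a pooled-variance t statistic.

    Y      : list of rows, one row per variable (gene), one value per sample
    groups : group label of each sample (column of Y), taken from the design table
    """
    idx_a = [j for j, g in enumerate(groups) if g == a]
    idx_b = [j for j, g in enumerate(groups) if g == b]
    n_a, n_b = len(idx_a), len(idx_b)
    results = []
    for row in Y:
        y_a = [row[j] for j in idx_a]
        y_b = [row[j] for j in idx_b]
        mean_a, mean_b = mean(y_a), mean(y_b)
        # Pooled within-group variance (equal-variance assumption).
        pooled = ((n_a - 1) * variance(y_a) + (n_b - 1) * variance(y_b)) / (n_a + n_b - 2)
        se = sqrt(pooled * (1 / n_a + 1 / n_b))
        diff = mean_b - mean_a
        t_stat = diff / se if se > 0 else float("nan")
        results.append({"mean_a": mean_a, "mean_b": mean_b, "diff": diff, "t": t_stat})
    return results


# Toy data: 2 variables, 2 control and 2 treated samples (made-up numbers).
Y = [[1.0, 3.0, 5.0, 7.0],
     [2.0, 4.0, 2.0, 4.0]]
groups = ["control", "control", "treated", "treated"]
results = two_group_row_test(Y, groups, "control", "treated")
```

The same loop structure, one small model per row with a shared design, is what makes analyses of this kind easy to run over thousands of variables.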

Installing PowerArray

System requirements to install PowerArray

  • Windows 98/NT/2000/XP
  • Microsoft .NET Framework redistributable (version 1.1 or above)
  • A minimum of 128 MB RAM
  • At least 200 MB free hard disk space
  • Microsoft Excel 2000 or above (required for Excel integration)
  • Microsoft PowerPoint 2000 or above (required for PowerPoint integration)

Before you install PowerArray, make sure you have installed the Microsoft .NET Framework redistributable. It is available free of charge from http://msdn.microsoft.com/netframework/downloads/updates/default.aspx

After downloading and saving the PowerArray distribution "PowerArray.zip", unzip the file in a directory of your choice and double-click the file "PowerArrayInstaller.msi" to start the installation. The installer will install PowerArray and add an icon to the desktop. Then start PowerArray by double-clicking the desktop icon.

Acknowledgements

The initial version of PowerArray was written while Jack Liu worked as a joint postdoc for the CIIT Centers for Health Research and the National Institute of Statistical Sciences. Their support is gratefully acknowledged. Scientists Kevin Gaido and Kim Lehmann, and statisticians Stan Young and Alan Karr, provided initial encouragement for this project. Since Jack Liu joined GSK, the Statistical Science group at RTP has supported the development of the software. In particular, David Cooper has been the key sponsor for the software.

GSK and SanofiAventis scientists have tested the software and made numerous suggestions that have greatly improved its usefulness. Matthew Newman is especially thanked for his continuous help with bug reporting.

GSK and SanofiAventis provided financial support for writing the manual and testing the software. Dhiral Phadke wrote the user manual, and Xiaohua Gong tested many of the routines.

An updated PowerArray installer is available.
(Current users will need to uninstall the old version first.)

Download Now!

User Manual released November 2005 (16 MB zip file; PDF format)

Data Set 1: Small, simple microarray data set.

Data Set 2: Larger, more complex data set with CEL files for input.




Entire site © 2002-2005, National Institute of Statistical Sciences. All Rights Reserved.
 This page updated on December 19, 2006 11:10 AM