
Reducing the dimensionality of the data

Because the dataset contains a large number of variables, it is hard to comprehend all of the relationships between them using a scatter plot or correlation matrix. A data reduction technique such as principal components analysis (PCA) reduces the dimensionality of the dataset whilst retaining as much of the variability in the data as possible. The first few principal components retain most of the variation in the original variables, so they can be used in place of the full set of variables to describe, more simply, the relationships between the original variables and the similarities between observations.

PCA is a mathematical technique that reduces dimensionality by creating a new set of variables called principal components. The first principal component is a linear combination of the original variables and explains as much variation as possible in the original data. Each subsequent component explains as much of the remaining variation as possible under the condition that it is uncorrelated with the previous components.
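The definition above can be sketched outside Excel. The following is a minimal illustration in Python with NumPy, not Analyse-it's implementation: the data is synthetic, and the names `X`, `Z`, `coefficients`, and `scores` are assumptions introduced for the example. It standardizes the variables, takes the eigenvectors of the correlation matrix as the component coefficients, and checks that the resulting components are uncorrelated with variances in decreasing order.

```python
import numpy as np

# Synthetic data for illustration only (not the tutorial's dataset):
# 200 observations of 4 correlated variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.8, 0.2, 0.0],
                   [0.0, 0.3, 0.9, 1.0]])
X = latent @ mixing + 0.3 * rng.normal(size=(200, 4))

# Standardize each variable to mean 0 and sample standard deviation 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigendecomposition of the correlation matrix.
R = np.corrcoef(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)

# eigh returns eigenvalues in ascending order; reorder so the first
# component is the one with the largest variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
coefficients = eigenvectors[:, order]  # one column per component

# Each component is a linear combination of the standardized variables.
scores = Z @ coefficients

# Covariance of the components: diagonal holds the component variances,
# off-diagonal entries are (numerically) zero because the components
# are uncorrelated.
score_cov = np.cov(scores, rowvar=False)
print(np.round(eigenvalues, 3))
print(np.round(score_cov, 3))
```

The printed covariance matrix should be diagonal up to rounding, with the diagonal matching the eigenvalues.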

  1. On the Analyse-it ribbon tab, in the PCA group, click Principal Components.
    The Principal Components task is added to the analysis task pane.
  2. Select Scale.
  3. Select Variances, Coefficients, and Color maps.
  4. Click Recalculate.
    The results are calculated and the analysis report opens.

The variances table shows the variance of each principal component, that is, the amount of variance in the original data that the component explains. These variances are also called the eigenvalues. Because the data was standardized, each original variable has a variance of 1, so a principal component with a variance of 1 accounts for variation equivalent to one of the original variables. For the same reason, the sum of all the variances equals the number of original variables.
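The claim that the variances sum to the number of variables can be checked numerically. This is a small sketch on randomly generated data (an assumption for illustration; the tutorial's 12 variables are not reproduced here). With standardized data the component variances are the eigenvalues of the correlation matrix, whose diagonal is all ones, so they sum to the number of variables.

```python
import numpy as np

# Randomly generated, correlated data: 100 observations of 12 variables.
# The values are illustrative only.
rng = np.random.default_rng(1)
n_obs, n_vars = 100, 12
X = rng.normal(size=(n_obs, n_vars)) @ rng.normal(size=(n_vars, n_vars))

# Correlation matrix of the variables (equivalent to the covariance
# matrix of the standardized variables).
R = np.corrcoef(X, rowvar=False)

# Component variances, sorted from largest to smallest.
variances = np.linalg.eigvalsh(R)[::-1]

total = variances.sum()
print(round(total, 6), n_vars)
```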

PCA variances table

There are many ad-hoc rules for choosing how many components to retain to describe the data adequately. According to the table, the first two principal components account for nearly 70% of the variance in the original 12 variables, whilst the first three components account for nearly 80%.
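Two common rules of thumb can be sketched as follows: keep enough components to reach a chosen cumulative proportion of variance, or keep components whose variance exceeds 1 (often called the Kaiser criterion). The variances below are hypothetical values chosen for the example, not the values from the tutorial's table, and the 80% threshold is likewise an arbitrary assumption.

```python
import numpy as np

# Hypothetical component variances for a 5-variable standardized dataset
# (they sum to 5). These are NOT the values from the tutorial's table.
variances = np.array([3.0, 1.2, 0.4, 0.25, 0.15])

# Proportion of total variance explained by each component, and the
# running total across components.
proportion = variances / variances.sum()
cumulative = np.cumsum(proportion)

# Rule 1: smallest number of components whose cumulative proportion
# reaches the chosen threshold (0.80 is an arbitrary choice).
threshold = 0.80
n_by_threshold = int(np.argmax(cumulative >= threshold)) + 1

# Rule 2: components with variance greater than 1, i.e. those that
# explain more than a single standardized variable would.
n_by_variance = int(np.sum(variances > 1.0))

print(np.round(cumulative, 2))
print(n_by_threshold, n_by_variance)
```

The two rules need not agree in general, which is one reason they are regarded as guidelines rather than tests.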

The coefficients table shows the linear combinations that make up each principal component, and the color map shows the structure of the components. Absolute values near zero indicate that a variable contributes little to the component, whereas larger absolute values indicate variables that contribute more. The overall sign of a component is arbitrary: reversing the sign of all of its coefficients gives an equivalent component, so the signs may even differ when the analysis is performed on different computers. Only the signs of the coefficients relative to one another within a component carry meaning.
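The sign behaviour can be illustrated with a short sketch on synthetic data. Reversing the sign of every coefficient in a component leaves its variance unchanged, and the reversed vector still satisfies the eigenvector equation. The helper `normalize_sign` is a hypothetical convention introduced here for comparing results between runs; it is not something the tutorial or Analyse-it describes.

```python
import numpy as np

# Synthetic data for illustration: 150 observations of 3 correlated variables.
rng = np.random.default_rng(2)
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.4],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(150, 3)) @ mixing
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

R = np.corrcoef(Z, rowvar=False)
eigenvalues, vectors = np.linalg.eigh(R)

# eigh sorts eigenvalues in ascending order, so the last column holds
# the coefficients of the first principal component.
v = vectors[:, -1]
flipped = -v

# Both sign choices give a component with the same variance.
var_original = np.var(Z @ v, ddof=1)
var_flipped = np.var(Z @ flipped, ddof=1)


def normalize_sign(component):
    """Hypothetical convention: make the largest-magnitude coefficient positive."""
    pivot = np.argmax(np.abs(component))
    return component if component[pivot] > 0 else -component


print(np.isclose(var_original, var_flipped))
print(np.allclose(normalize_sign(v), normalize_sign(flipped)))
```

Applying a fixed convention like this before comparing coefficient tables avoids mistaking a sign reversal for a genuine difference in results.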

PCA coefficients table

The principal components do not necessarily have a simple, interpretable structure, because they are created to maximize the amount of variance explained whilst remaining uncorrelated with the other components. Interpreting the coefficients in the table, we can see that the first component is an average of many different variables; the second component represents mainly crime, wellness, and, to a lesser extent, schools and housing quality; and the third component represents mainly green space, although it still has reasonably sized contributions from other variables.

Version 6.15
Published 18-Apr-2023
Copyright 2026 Analyse-it Software, Ltd, Leeds, United Kingdom.