Principal Components Analysis

PCA (Principal Components Analysis) is one of the most widely used multivariate techniques. It aids visualisation of multivariate data by finding linear combinations of the original variables (the principal components) that capture as much of the variation in the data as possible.

Chemetrica provides a very flexible implementation of PCA, with a wide range of options available from the PCA dialogue box:

  • Analysis can be based on the covariance matrix (mean-centred values), the correlation matrix (autoscaled values) or a simple cross-product matrix (no scaling or centring)

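  • Malinowski's indicator function (IND), real error (RE) and imbedded error (IE), together with F-tests on the eigenvalues, for choosing the number of significant components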

  • Scree plot output

  • Hotelling's T² and Rao tests for outliers

  • A separate validation set can be used to select the number of principal components in a model, with a PRESS plot

  • Sample residuals and leverage for identifying influential or badly modelled observations

  • Bi-Plots to relate variables and observations

  • Modelling power, which indicates which of the original variables are most important to the principal components model
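
The three matrix choices in the first option differ only in how the data are pre-treated before decomposition. The following is a minimal NumPy sketch of that idea, not Chemetrica's own code; the function name pca and its scaling argument are hypothetical and chosen purely for illustration.

```python
import numpy as np

def pca(X, scaling="covariance"):
    """Illustrative PCA via SVD (hypothetical helper, not Chemetrica's API).

    scaling: "covariance"    -> mean-centred columns
             "correlation"   -> autoscaled columns (centred, unit variance)
             "cross-product" -> raw data, no centring or scaling
    """
    X = np.asarray(X, dtype=float)
    if scaling == "covariance":
        Z = X - X.mean(axis=0)
    elif scaling == "correlation":
        Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    elif scaling == "cross-product":
        Z = X.copy()
    else:
        raise ValueError(f"unknown scaling: {scaling!r}")

    # Z = U S V^T; the columns of V are the loadings, U*S are the scores.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s
    loadings = Vt.T
    # Eigenvalues of Z^T Z. Dividing by (n - 1) gives the eigenvalues of the
    # covariance or correlation matrix; these are what a scree plot displays.
    eigenvalues = s ** 2
    return scores, loadings, eigenvalues

# Example with made-up data: 30 observations, 4 variables on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])
for option in ("covariance", "correlation", "cross-product"):
    _, _, eig = pca(X, option)
    print(option, np.round(eig / eig.sum(), 3))
```

With variables on very different scales, the covariance option tends to be dominated by the variable with the largest variance, whereas the correlation option gives each variable equal weight.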


Bi-plots show both scores and loadings on the same axes. The scores represent the observations and the loadings represent the variables, so a bi-plot shows how variables and observations are related.
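
As a rough illustration of what a bi-plot draws, the sketch below computes two-dimensional coordinates for observations and variables from mean-centred data. It assumes the standard SVD-based construction; the function name biplot_coordinates and the alpha parameter are hypothetical, and Chemetrica's own scaling of the two sets of points may differ.

```python
import numpy as np

def biplot_coordinates(X, alpha=1.0):
    """Two-dimensional bi-plot coordinates from mean-centred data (a sketch).

    alpha=1 gives observation points equal to the PCA scores and variable
    points equal to the loadings; alpha=0 moves the singular values onto
    the variable points instead.
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    G = U[:, :2] * s[:2] ** alpha          # one row per observation
    H = Vt[:2].T * s[:2] ** (1.0 - alpha)  # one row per variable
    return G, H, Z

# Made-up data: 10 observations of 4 variables driven by 2 underlying factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2)) @ rng.normal(size=(2, 4))
G, H, Z = biplot_coordinates(X)

# The inner product of an observation point and a variable point approximates
# the centred value of that variable for that observation.
print(np.abs(G @ H.T - Z).max())
```

Plotting the rows of G as points and the rows of H as arrows from the origin on the same axes gives the usual bi-plot picture.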

A PCA model retains only the first few principal components. The sample residual shows how much error this truncation introduces for each observation, and the leverage shows how much influence each observation has on the model. Taken together, the two measures distinguish three cases: observations with a high residual are poorly described by the model; observations with high leverage have the most influence on it; and observations with both a high residual and high leverage may be outliers that have strongly distorted the model, and should be examined further.
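
The two diagnostics can be sketched as follows. This assumes mean-centred data, the sum of squared residuals as the sample residual, and one common definition of leverage (1/n plus the squared normalised scores); Chemetrica's exact definitions are not given here, and the function name residual_and_leverage is hypothetical.

```python
import numpy as np

def residual_and_leverage(X, n_components):
    """Per-observation residual and leverage for a truncated PCA model (a sketch)."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)

    k = n_components
    T = U[:, :k] * s[:k]   # scores on the retained components
    P = Vt[:k].T           # loadings of the retained components

    # Sample residual: squared error left after reconstructing each row
    # from only the retained components.
    E = Z - T @ P.T
    residual = (E ** 2).sum(axis=1)

    # Leverage: 1/n plus the sum over components of t_ia^2 / (t_a' t_a),
    # which equals the squared entries of U for the retained components.
    leverage = 1.0 / len(Z) + (U[:, :k] ** 2).sum(axis=1)
    return residual, leverage

# Made-up data: 20 observations, 5 variables, with one row moved far from the rest.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[0] += 8.0
residual, leverage = residual_and_leverage(X, n_components=2)
for i in np.argsort(leverage)[::-1][:3]:
    print(i, round(float(residual[i]), 2), round(float(leverage[i]), 2))
```

Plotting residual against leverage puts the observations worth a closer look in the upper right of the plot.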
