PCA stands for Principle Component Analysis. It is a common tool in machine learning in general, and even more in chemometrics. PCA is a technique used to emphasize variation and bring out strong patterns in the spectra (read more about it here). In SCiO Lab, PCA is used to make data easier to explore and visualize by reducing the whole spectrum from a vector of 330 values (one per wavelength) to a shorter vector (typical 3-6 values), without losing too much of the information stored in the original spectrum.

For example, let’s look at the following data collection of four medicine types

spectra

 

Looking at the spectra, only three are observable.

Using the PCA view, each spectrum is visualized as a point in 3D space (you can rotate the view to better see the difference between types), you can clearly see the four different medicines, the blue and purple are well separated, while the green and orange are somewhat overlapping.

PCA1                        PCA3

 

 

From the PCA view, one can project that a classification model on this data will work well  on the blue and purple, but will have some difficulty discerning green from orange. The next figure shows the expected performance of the classification model (AKA “confusion matrix”)

 

classification

 

As expected, two medicines (blue and purple) are easily distinguished, and two (green and orange) have some confusion between them.