PCA

Created: 2022-12-02 15:32
#note

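Principal Component Analysis (PCA) is a method to reduce the dimensionality of the feature space. In particular, it is a form of feature extraction rather than feature elimination: instead of dropping some of the original features, we build new ones (the principal components) as combinations of all the variables, and then drop the least informative of those. This way we still retain the most valuable parts of all the variables.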

The main idea is that the directions in which our data has higher variance hold more information. So, we look for a projection of the data in which the greatest variance lies along the first coordinate (the first principal component), the second-greatest variance along the second coordinate, and so on. We then keep just the N components with the greatest variance.
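
The idea above can be sketched with NumPy. This is only a minimal illustration, assuming the components are obtained from the eigendecomposition of the covariance matrix (one common way to compute PCA); the function name `pca_project` and the random data are made up for the example.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X (n_samples x n_features) onto its leading principal components."""
    # Center each feature so that variance is measured around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the features (n_features x n_features).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so that the direction of greatest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep only the N leading components and project the data onto them.
    components = eigvecs[:, :n_components]
    return X_centered @ components, eigvals

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 3 features with very different spreads.
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
Z, eigvals = pca_project(X, n_components=2)
```

Here `Z` holds the data in the reduced 2-dimensional space, and the variance of each column of `Z` equals the corresponding eigenvalue, so the columns are ordered from most to least variance.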

Reducing dimensions with PCA changes the pairwise distances between our data points, especially the smallest ones. This can be a drawback of PCA.
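
A rough way to inspect this effect is to compare pairwise distances before and after the projection. The sketch below assumes random data and computes the components through an SVD of the centered data; the helper `pairwise_distances` is written here just for the example. It only measures the change: how strongly small distances are affected depends on the data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))  # hypothetical data: 50 samples, 5 features

# Center the data; the rows of Vt are the principal directions,
# ordered by decreasing variance.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T  # keep the 2 leading components

def pairwise_distances(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Use only the upper triangle so each pair is counted once.
iu = np.triu_indices(len(X), k=1)
d_orig = pairwise_distances(X)[iu]
d_proj = pairwise_distances(Z)[iu]

# Ratio below 1 means the pair ended up closer together after PCA.
ratio = d_proj / d_orig
```

Since the projection is orthogonal, no distance can grow, so every entry of `ratio` is at most 1; plotting `ratio` against `d_orig` shows which pairs are distorted most.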

References

  1. Towards Data Science
  2. Towards Data Science 2