Feature Extraction
Jun 20, 2023
Step 3 of data preprocessing
Part 1: Feature Selection
Part 2: Feature Extraction
The dimensionality of the data can be reduced without losing much of the information in the original feature set.
Feature extraction derives new features from the original feature set, producing a new feature subspace.
These techniques are used to:
- Reduce the number of features relative to the original feature set
- Reduce model complexity and model overfitting
- Enhance model computation efficiency and reduce generalization error
Difference between feature selection and feature extraction
Feature Selection:
- Feature selection algorithms keep the original features and choose a subset of them.
- It is used when the requirement is to retain the original features,
- and when model explainability is a key requirement.
Feature Extraction:
- Feature extraction algorithms transform the data onto a new feature space.
- It is used when deriving useful information from the data matters more than keeping the original features, so building a new feature subspace does not hurt the model.
- It is used to improve the predictive performance of models (see the short contrast below).
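To make the distinction concrete, here is a minimal sketch (the iris dataset and scikit-learn's SelectKBest/PCA are illustrative choices, not the only options): selection keeps a subset of the original columns, while extraction builds new ones.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keeps a subset of the ORIGINAL columns (interpretable).
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_selected = selector.transform(X)
print("indices of kept original features:", selector.get_support(indices=True))

# Feature extraction: builds NEW features as combinations of all columns.
X_extracted = PCA(n_components=2).fit_transform(X)
print("extracted feature subspace shape:", X_extracted.shape)  # (150, 2)
```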
Two categories of feature extraction:
1. Linear: assumes that the data falls on a linear subspace, or that classes of data can be distinguished linearly.
2. Non-linear: assumes that the pattern of the data is more complex and exists on a non-linear sub-manifold.
Unsupervised Feature Extraction:
These methods mostly concentrate on the variance and distribution of the data.
1. Principal Component Analysis (PCA):
- A linear, unsupervised method.
- The aim of PCA is to find orthogonal directions that represent the data with the least error.
- Equivalently, PCA maximizes the variance of the projected data to find the most variant orthonormal directions.
- The desired directions are the eigenvectors of the covariance matrix of the data, as the sketch below shows.
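As a minimal sketch of that last bullet (iris data assumed purely for illustration), the components recovered by eigendecomposition of the covariance matrix agree with scikit-learn's PCA up to the sign of each direction:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Center the data, then eigendecompose its covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # sort directions by variance, descending
components = eigvecs[:, order[:2]]         # two most variant orthonormal directions
X_manual = X_centered @ components

# scikit-learn's PCA gives the same projection, up to the sign of each component.
X_sklearn = PCA(n_components=2).fit_transform(X)
print(np.allclose(np.abs(X_manual), np.abs(X_sklearn)))  # True
```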
2. Kernel Principal Component Analysis (KPCA):
- KPCA finds a non-linear subspace of the data, which is useful when the data pattern is not linear.
- Kernel PCA uses the kernel method, which implicitly maps the data to a higher-dimensional space.
- Kernel PCA relies on the blessing of dimensionality: it assumes that in higher dimensions, the representation or discrimination of the data is easier (sketched below).
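A sketch of that idea (the concentric-circles toy data and an RBF kernel with gamma=10 are illustrative assumptions): the two circles are not linearly separable, but after the kernel mapping they tend to separate along the leading components.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: classes cannot be separated by a line in 2-D.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA only rotates the data; the circles stay entangled.
X_linear = PCA(n_components=2).fit_transform(X)

# Kernel PCA with an RBF kernel implicitly works in a higher-dimensional
# space, where the two circles become (approximately) linearly separable.
X_kernel = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

print(X_linear.shape, X_kernel.shape)  # (400, 2) (400, 2)
```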
There are many other unsupervised feature extraction techniques (a few are sketched in code after this list):
- Dual PCA
- Multidimensional Scaling
- Isomap
- Locally linear embedding
- Laplacian Eigenmap
- Maximum variance unfolding
- Autoencoders and Neural Networks
- t-distributed stochastic neighbor embedding (t-SNE)
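Several of these are available off the shelf. As a rough sketch (the swiss-roll toy dataset and these hyperparameters are assumptions for illustration):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding, TSNE

# A swiss roll is a 2-D manifold curled up inside 3-D space.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# Three non-linear embeddings of the same data down to 2-D.
X_isomap = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

print(X_isomap.shape, X_lle.shape, X_tsne.shape)  # (500, 2) each
```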
Supervised Feature Extraction:
1. Fisher Linear Discriminant Analysis (FLDA):
- Also referred to as Fisher Discriminant Analysis (FDA) or Linear Discriminant Analysis (LDA).
- Similar to PCA, FLDA computes a projection of the data along a direction.
- However, rather than maximizing the variance of the data, FLDA uses label information to find a projection that maximizes the ratio of between-class variance to within-class variance, as the comparison below illustrates.
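A minimal sketch of that contrast (iris data assumed for illustration): PCA never sees the labels, while scikit-learn's LinearDiscriminantAnalysis uses them and can return at most n_classes - 1 components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Unsupervised: directions of maximum variance, labels never seen.
X_pca = PCA(n_components=2).fit_transform(X)

# Supervised: directions maximizing between-class / within-class variance.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# LDA yields at most (n_classes - 1) components; iris has 3 classes -> 2.
print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```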
Other supervised techniques:
- Kernel Fisher Linear Discriminant Analysis
- Supervised PCA
- Metric Learning
[Figure: Applications of feature selection and feature extraction methods]
References:
- Paper: "A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction," Journal of Applied Science and Technology Trends.
- https://elearn.daffodilvarsity.edu.bd/pluginfile.php/1225702/mod_label/intro/Feature%20Selection%20with%20numerical%20and%20categorical%20data.pdf
- https://vitalflux.com/machine-learning-feature-selection-feature-extraction/
- https://downloads.hindawi.com/archive/2015/198363.pdf