Feature Extraction
Jun 20, 2023
Step 3 of data preprocessing
Part 1: Feature Selection
Part 2: Feature Extraction
The dimensionality of the data can be reduced without losing much of the information in the original feature set.
Feature extraction derives new features from the original feature set, producing a new feature subspace.
These techniques are used to:
- Reduce the number of features relative to the original feature set
- Reduce model complexity and model overfitting
- Enhance model computation efficiency and reduce generalization error
Difference between feature selection and feature extraction
Feature Selection:
- Feature selection algorithms keep the original features and choose a subset of them.
- It is used when the requirement is to retain the original features,
- and when model explainability is a key requirement.
Feature Extraction:
- Feature extraction algorithms transform the data onto a new feature space.
- It is used when deriving useful information from the data matters more than keeping the original features, so building a new feature subspace does not hurt the model.
- It is used to improve the predictive performance of models (see the short contrast below).
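To make the distinction concrete, here is a minimal sketch (the iris dataset and scikit-learn's SelectKBest/PCA are illustrative choices, not the only options): selection keeps a subset of the original columns, while extraction builds new ones.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keeps a subset of the ORIGINAL columns (interpretable).
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_selected = selector.transform(X)
print("indices of kept original features:", selector.get_support(indices=True))

# Feature extraction: builds NEW features as combinations of all columns.
X_extracted = PCA(n_components=2).fit_transform(X)
print("extracted feature subspace shape:", X_extracted.shape)  # (150, 2)
```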
Two categories of feature extraction:
1. Linear: assumes that the data falls on a linear subspace, or that classes of data can be distinguished linearly.
2. Non-linear: assumes that the pattern of the data is more complex and exists on a non-linear sub-manifold.
Unsupervised Feature Extraction:
These methods mostly concentrate on the variance and distribution of the data.
1. Principal Component Analysis (PCA):
- A linear, unsupervised method.
- The aim of PCA is to find orthogonal directions that represent the data with the least error.
- Equivalently, PCA maximizes the variance of the projected data to find the most variant orthonormal directions.
- The desired directions are the eigenvectors of the covariance matrix of the data, as the sketch below shows.
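As a minimal sketch of that last bullet (iris data assumed purely for illustration), the components recovered by eigendecomposition of the covariance matrix agree with scikit-learn's PCA up to the sign of each direction:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Center the data, then eigendecompose its covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # sort directions by variance, descending
components = eigvecs[:, order[:2]]         # two most variant orthonormal directions
X_manual = X_centered @ components

# scikit-learn's PCA gives the same projection, up to the sign of each component.
X_sklearn = PCA(n_components=2).fit_transform(X)
print(np.allclose(np.abs(X_manual), np.abs(X_sklearn)))  # True
```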
2. Kernel Principal Component Analysis (KPCA):
- KPCA finds a non-linear subspace of the data, which is useful when the data pattern is not linear.
- Kernel PCA uses the kernel method, which implicitly maps the data to a higher-dimensional space.
- Kernel PCA relies on the blessing of dimensionality: it assumes that in higher dimensions, the representation or discrimination of the data is easier (sketched below).
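A sketch of that idea (the concentric-circles toy data and an RBF kernel with gamma=10 are illustrative assumptions): the two circles are not linearly separable, but after the kernel mapping they tend to separate along the leading components.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: classes cannot be separated by a line in 2-D.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA only rotates the data; the circles stay entangled.
X_linear = PCA(n_components=2).fit_transform(X)

# Kernel PCA with an RBF kernel implicitly works in a higher-dimensional
# space, where the two circles become (approximately) linearly separable.
X_kernel = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

print(X_linear.shape, X_kernel.shape)  # (400, 2) (400, 2)
```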
There are many other unsupervised feature extraction techniques (a few are sketched in code after this list):
- Dual PCA
- Multidimensional Scaling
- Isomap
- Locally linear embedding
- Laplacian Eigenmap
- Maximum variance unfolding
- Autoencoders and Neural Networks
- t-distributed stochastic neighbor embedding (t-SNE)
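Several of these are available off the shelf. As a rough sketch (the swiss-roll toy dataset and these hyperparameters are assumptions for illustration):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding, TSNE

# A swiss roll is a 2-D manifold curled up inside 3-D space.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# Three non-linear embeddings of the same data down to 2-D.
X_isomap = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

print(X_isomap.shape, X_lle.shape, X_tsne.shape)  # (500, 2) each
```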
Supervised Feature Extraction:
1. Fisher Linear Discriminant Analysis (FLDA):
- Also referred to as Fisher Discriminant Analysis (FDA) or Linear Discriminant Analysis (LDA).
- Similar to PCA, FLDA computes a projection of the data along a direction.
- However, rather than maximizing the variance of the data, FLDA uses label information to find a projection that maximizes the ratio of between-class variance to within-class variance, as the comparison below illustrates.
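A minimal sketch of that contrast (iris data assumed for illustration): PCA never sees the labels, while scikit-learn's LinearDiscriminantAnalysis uses them and can return at most n_classes - 1 components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Unsupervised: directions of maximum variance, labels never seen.
X_pca = PCA(n_components=2).fit_transform(X)

# Supervised: directions maximizing between-class / within-class variance.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

# LDA yields at most (n_classes - 1) components; iris has 3 classes -> 2.
print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```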
Other supervised techniques:
- Kernel Fisher Linear Discriminant Analysis
- Supervised PCA
- Metric Learning
[Figure: Applications of feature selection and feature extraction methods]
References:
- Paper: "A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction," Journal of Applied Science and Technology Trends.
- https://elearn.daffodilvarsity.edu.bd/pluginfile.php/1225702/mod_label/intro/Feature%20Selection%20with%20numerical%20and%20categorical%20data.pdf
- https://vitalflux.com/machine-learning-feature-selection-feature-extraction/
- https://downloads.hindawi.com/archive/2015/198363.pdf