Feature Selection and Feature Extraction

Jun 13, 2023

Step 3 of data preprocessing

Part 1: Feature Selection

Dimensionality reduction (DR) is usually performed with one of two main approaches: feature selection (FS) or feature extraction (FE).

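High-dimensional datasets are costly to process and often contain redundant or noisy information. Reducing a high-dimensional dataset to a low-dimensional one by filtering out or removing that redundant and noisy information is a way to solve this problem, and it is known as dimensionality reduction.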

Advantages of dimensionality reduction:

Eliminates irrelevant and redundant patterns and noisy data.
Reduces the time and memory required to process the data.
Improves the quality of the data.
Lets the algorithm work more efficiently and often achieve better accuracy.
Reduces the cost of computing and makes the data easier to visualize.

Difference between feature selection and feature extraction:

Feature selection selects which of the original features to keep or remove from the dataset.
Feature extraction creates a projection of the data, resulting in entirely new input features. This is what "dimensionality reduction" usually means when the term is used in its narrower sense.
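The contrast between keeping original columns and projecting the data onto new ones can be sketched in a few lines. This is only an illustration: the toy rows, the column meanings and the projection weights are made up for the example and are not taken from any real dataset or fitted model.

```python
# Toy dataset: each row is one sample, columns are [height_cm, weight_kg, shoe_size].
rows = [
    [170.0, 65.0, 42.0],
    [160.0, 55.0, 38.0],
    [180.0, 80.0, 44.0],
]


def select_features(data, keep):
    """Feature selection: keep a subset of the original columns, unchanged."""
    return [[row[i] for i in keep] for row in data]


def extract_features(data, weights):
    """Feature extraction: build new columns as weighted sums of the originals."""
    return [[sum(w * x for w, x in zip(ws, row)) for ws in weights] for row in data]


# Selection: the first two columns survive exactly as they were.
selected = select_features(rows, keep=[0, 1])

# Extraction: one new feature that mixes height and weight.
# The weights are hypothetical, chosen by hand for illustration only.
projected = extract_features(rows, weights=[[0.5, 0.5, 0.0]])
```

After selection the values are still interpretable as height and weight. After extraction the single new column no longer corresponds to any one original feature.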

Feature Selection

Construct a subset of features that is as small as possible but still represents the vital characteristics of the whole input data.
Some information can be lost, because some features are excluded while the feature subset is being chosen.

Unsupervised: does not use the output variable

Correlation (for example, removing one of two highly correlated input features)
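One way to picture correlation-based unsupervised selection is to drop any feature that is almost a copy of one already kept. The sketch below assumes a Pearson correlation threshold of 0.95, which is an arbitrary choice for the example, and the column names and values are invented. Note that the target variable never appears.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Invented example data: length_in is nearly a rescaled copy of length_cm.
columns = {
    "length_cm": [1.0, 2.0, 3.0, 4.0, 5.0],
    "length_in": [0.39, 0.79, 1.18, 1.57, 1.97],
    "weight": [3.0, 1.0, 4.0, 1.0, 5.0],
}

kept = drop_correlated(columns)  # ["length_cm", "weight"]
```

The result depends on the order in which columns are visited: the first feature of a correlated pair is the one that survives.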

Supervised: we have an output variable and use it

Supervised methods are Filter, Wrapper, and Embedded.

Filter

Filter-based feature selection methods use statistical measures to score the correlation or dependence between each input variable and the target variable.
These scores are then used as the basis to choose (filter) the input variables that will be used in the model. No model has to be trained to compute them.
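The score-then-filter idea can be sketched as follows. The example assumes absolute Pearson correlation as the scoring statistic, which is only one of several possible choices, and the feature names, values and `k=2` are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_top_k(features, target, k):
    """Score every feature against the target and keep the k best-scoring ones."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Invented example data: rooms and area track the price, age does not.
features = {
    "rooms": [2.0, 3.0, 4.0, 5.0, 6.0],
    "age": [30.0, 5.0, 20.0, 10.0, 25.0],
    "area": [50.0, 70.0, 95.0, 110.0, 140.0],
}
price = [100.0, 150.0, 200.0, 240.0, 310.0]

top, scores = filter_top_k(features, price, k=2)
```

Each feature is scored on its own, so a filter like this is cheap but cannot see interactions between features.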

Filter method techniques

Information gain
Chi-squared test
Fisher’s score
Missing value ratio
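Of the techniques listed, the missing value ratio is the simplest to show. The sketch below assumes missing entries are represented as `None` and uses a cut-off of 0.4, both of which are choices made for this example rather than fixed conventions.

```python
def missing_value_ratio(values):
    """Fraction of entries in a column that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def drop_sparse(columns, max_ratio=0.4):
    """Keep only the columns whose missing value ratio is at most max_ratio."""
    return [name for name, values in columns.items()
            if missing_value_ratio(values) <= max_ratio]


# Invented example data: income is missing in four out of five rows.
columns = {
    "age": [25, 31, None, 40, 22],
    "income": [None, None, 52000, None, None],
    "city": ["A", "B", "B", "A", "C"],
}

kept = drop_sparse(columns)  # ["age", "city"]
```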

Wrapper

Wrapper feature selection methods create many models with different subsets of input features and select those features that result in the best performing model according to a performance metric.

Wrapper method techniques

Forward selection
Backward elimination
Exhaustive feature selection
Recursive feature elimination
Genetic algorithms
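Forward selection, the first technique in the list, can be sketched as a greedy loop that keeps adding whichever feature improves the model most. The model and metric here are assumptions made to keep the example self-contained: a 1-nearest-neighbour classifier scored by leave-one-out accuracy. The data is invented, with column 0 informative and column 1 noise.

```python
def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the given columns."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            d = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def forward_selection(X, y, n_features):
    """Greedily add the feature that raises the score most; stop when nothing helps."""
    selected, best_score = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [c]), c) for c in remaining]
        score, col = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(col)
        remaining.remove(col)
        best_score = score
    return selected, best_score


# Invented example data: column 0 separates the classes, column 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.8, 9.0], [3.0, 5.2], [3.2, 0.8], [2.8, 9.1]]
y = [0, 0, 0, 1, 1, 1]

selected, best_score = forward_selection(X, y, n_features=2)
```

Every candidate subset requires evaluating a model, which is why wrapper methods tend to be far more expensive than filter methods.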

Embedded

Embedded methods combine the qualities of filter and wrapper methods.
They find the best subset of features while the model is being trained.
A few algorithms perform this feature selection automatically as part of learning the model.

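Embedded method techniques

Penalized (regularized) regression models: L1 (Lasso), L2 (Ridge), Elastic Net (L1 + L2). Ridge only shrinks coefficients, so it is the L1 penalty that actually sets some of them to zero and drops features.
Decision trees
Ensembles of decision trees, such as random forests.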
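To see how an L1 penalty selects features while the model is being fitted, here is a minimal Lasso sketch using coordinate descent with soft thresholding. It assumes centred data with no intercept and minimises (1/(2n))·||y − Xw||² + alpha·||w||₁. The data and `alpha=0.5` are invented for the example, and this is a teaching sketch rather than a production solver.

```python
def soft_threshold(rho, alpha):
    """Shrink rho towards zero by alpha; values within [-alpha, alpha] become exactly 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Fit Lasso weights by cycling through the features one at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual left by all other features.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Invented, centred example data: y is roughly 3 * column 0; column 1 is unrelated.
X = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 2.0], [1.0, -2.0], [2.0, 1.0]]
y = [-6.1, -2.9, 0.1, 3.0, 5.9]

w = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, coef in enumerate(w) if coef != 0.0]
```

The coefficient of the unrelated column is driven to exactly zero, so that feature is dropped as a side effect of training. The surviving coefficient is shrunk below 3 by the penalty.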

Feature selection statistics

Numerical Input, Numerical output: regression predictive modeling problem

Pearson’s correlation coefficient (linear)
Spearman’s rank coefficient (non-linear, monotonic)
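The difference between the two coefficients shows up on a relationship that is monotonic but not linear. The sketch below computes Spearman's coefficient as the Pearson correlation of the ranks, assuming there are no tied values. The data (y = x³) is invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Rank of each value, starting at 1 (this sketch does not handle ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result


def spearman(x, y):
    """Spearman's rank coefficient: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [v ** 3 for v in x]  # monotonic but clearly non-linear

linear_score = pearson(x, y)   # below 1, because the relationship is curved
rank_score = spearman(x, y)    # 1 up to rounding, because the order is preserved
```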

Numerical Input, Categorical output: classification predictive modeling problem

ANOVA F-statistic (linear)
Kendall’s rank coefficient (non-linear; assumes the categorical variable is ordinal)
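For a numerical input and a categorical output, a one-way ANOVA F-statistic compares how far apart the class means are with how spread out the values are inside each class. The sketch below is a plain implementation of that ratio. The two example features and the labels are invented.

```python
def anova_f(values, labels):
    """One-way ANOVA F-statistic of a numerical feature grouped by class label."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    # Variation of the values around their own class mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


labels = ["a", "a", "a", "b", "b", "b"]
informative = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]  # class means differ a lot
noise = [5.0, 1.0, 9.0, 5.2, 0.8, 9.1]        # class means are almost equal

f_informative = anova_f(informative, labels)
f_noise = anova_f(noise, labels)
```

A large F value suggests the feature separates the classes well, so features can be ranked by it in the same way as any other filter score.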

Categorical input, Numerical output: regression predictive modeling problem

We can use the same “Numerical Input, Categorical Output” methods, but in reverse, with the roles of input and output swapped.

Categorical input, categorical output: classification predictive modeling problem

Chi-squared test (contingency tables).
Mutual information (information gain), from information theory.
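Both statistics can be computed directly from the counts in a contingency table. The sketch below assumes small in-memory lists of category labels, and the `colour`, `shape` and `label` values are invented so that one feature determines the label completely and the other is independent of it. Mutual information is measured in nats here (natural logarithm).

```python
from math import log


def contingency(x, y):
    """Count how often each (input category, output category) pair occurs."""
    table = {}
    for a, b in zip(x, y):
        table[(a, b)] = table.get((a, b), 0) + 1
    return table


def chi_squared(x, y):
    """Chi-squared statistic: observed counts versus counts expected under independence."""
    n = len(x)
    table = contingency(x, y)
    row = {a: x.count(a) for a in set(x)}
    col = {b: y.count(b) for b in set(y)}
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = table.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def mutual_information(x, y):
    """Mutual information between two categorical variables, in nats."""
    n = len(x)
    mi = 0.0
    for (a, b), count in contingency(x, y).items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((x.count(a) / n) * (y.count(b) / n)))
    return mi


label = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
colour = ["red", "red", "blue", "blue", "red", "blue", "red", "blue"]  # determines label
shape = ["sq", "ci", "sq", "ci", "sq", "ci", "ci", "sq"]               # independent of label

chi_colour, chi_shape = chi_squared(colour, label), chi_squared(shape, label)
mi_colour, mi_shape = mutual_information(colour, label), mutual_information(shape, label)
```

Both scores are zero for the independent feature and clearly positive for the dependent one, so either can be used to rank categorical inputs.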

Reference:

Paper: "A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction", Journal of Applied Science and Technology Trends.
https://elearn.daffodilvarsity.edu.bd/pluginfile.php/1225702/mod_label/intro/Feature%20Selection%20with%20numerical%20and%20categorical%20data.pdf
https://vitalflux.com/machine-learning-feature-selection-feature-extraction/