Feature Selection and Feature Extraction

Sowjanya Sadashiva
3 min read · Jun 13, 2023

Step 3 of data preprocessing

Part 1: Feature Selection

Dimensionality reduction (DR) is performed using two main methods: feature selection (FS) and feature extraction (FE).

Dimensionality reduction converts a high-dimensional dataset into a lower-dimensional one by filtering out or removing redundant and noisy information.

Advantages of dimensionality reduction:

  1. It eliminates irrelevant and redundant patterns and noisy data.
  2. It reduces the time and memory required to process the data.
  3. It improves the quality of the data.
  4. It lets the learning algorithm run more efficiently and often achieve better accuracy.
  5. It lowers the cost of computing and makes the data easier to visualize.

Difference between feature selection and feature extraction:

  • Feature selection chooses which existing features to keep in or remove from the dataset.
  • Feature extraction creates a projection of the data, resulting in entirely new input features.
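The contrast in the two bullets above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is installed; the Iris dataset, the ANOVA scoring function, PCA as the projection, and k=2 are choices made for the example, not something the article prescribes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 rows, 4 numerical features

# Feature selection: keep 2 of the 4 original columns, values unchanged.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
X_selected = selector.transform(X)
kept = selector.get_support(indices=True)  # indices of the kept columns

# Projection: build 2 entirely new features (principal components).
X_projected = PCA(n_components=2).fit_transform(X)

print("kept original columns:", kept)
print("selected columns are untouched originals:",
      np.array_equal(X_selected, X[:, kept]))
print("projected shape:", X_projected.shape)
```

The selected matrix is a plain subset of the original columns, while the projected columns are linear combinations of all four inputs and no longer correspond to any single original feature.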

Feature Selection

  • Construct a subset of features that is as small as possible but still represents the vital characteristics of the whole input data.
  • Some information can be lost, because choosing a feature subset means excluding some features.

Unsupervised: these methods do not use the output variable.

  • correlation
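As a rough sketch of correlation used without a target variable, the snippet below drops any column that is almost a copy of a column already kept. The helper name drop_correlated, the 0.9 threshold, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Return indices of columns to keep, skipping any column whose absolute
    Pearson correlation with an already-kept column exceeds the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature matrix
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Column 1 is nearly a rescaled copy of column 0; column 2 is independent.
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])

print(drop_correlated(X))  # column 1 is redundant with column 0
```

Note that the output variable never appears: the decision is based only on relationships among the inputs.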

Supervised: these methods use the output variable.

Supervised methods are Filter, Wrapper, and Embedded.

Filter

  • Filter-based feature selection methods use statistical measures to score the correlation or dependence between each input variable and the target variable.
  • These scores are the basis for choosing (filtering) the most relevant input variables to use in the model.

Filter method techniques

  • Information gain
  • Chi-squared test
  • Fisher’s score
  • Missing value ratio
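Two of the techniques listed above can be sketched as follows, assuming scikit-learn and NumPy are available. The Iris data, the artificially injected missing values, the 0.5 cutoff, and k=2 are illustrative assumptions only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Missing value ratio: drop columns whose fraction of NaNs exceeds a cutoff.
X_missing = X.copy()
X_missing[:100, 0] = np.nan          # make column 0 mostly missing (illustrative)
missing_ratio = np.isnan(X_missing).mean(axis=0)
keep_mask = missing_ratio <= 0.5     # 0.5 is an arbitrary cutoff
X_clean = X_missing[:, keep_mask]

# Chi-squared test: score each non-negative feature against the class label
# and keep the k highest-scoring features.
selector = SelectKBest(score_func=chi2, k=2).fit(X_clean, y)
X_top = selector.transform(X_clean)

print("missing ratio per column:", missing_ratio.round(2))
print("chi-squared scores:", selector.scores_.round(2))
print("shape after filtering:", X_top.shape)
```

Each feature is scored on its own, without training a predictive model, which is what makes filter methods cheap compared with wrapper methods. scikit-learn's chi2 expects non-negative feature values, which holds for the Iris measurements.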

Wrapper

  • Wrapper feature selection methods create many models with different subsets of input features and select those features that result in the best performing model according to a performance metric.

Wrapper method techniques

  • Forward selection
  • Backward elimination
  • Exhaustive feature selection
  • Recursive feature elimination
  • Genetic algorithms
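Forward selection and recursive feature elimination from the list above could look like this with scikit-learn. This is a sketch, not the only way to do it: the logistic regression model, the Iris data, 5-fold cross-validation, and the target of 2 features are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Forward selection: start with no features and repeatedly add the one
# that most improves the cross-validated score.
forward = SequentialFeatureSelector(
    model, n_features_to_select=2, direction="forward", cv=5
).fit(X, y)

# Recursive feature elimination: fit on all features, drop the weakest one
# (smallest coefficient magnitude), and repeat until 2 remain.
rfe = RFE(model, n_features_to_select=2).fit(X, y)

print("forward selection kept:", forward.get_support(indices=True))
print("RFE kept:", rfe.get_support(indices=True))
```

Setting direction="backward" on the same SequentialFeatureSelector gives backward elimination. Both approaches retrain the model many times, which is the main cost of wrapper methods.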

Embedded

  • Embedded methods combine qualities of both filter and wrapper methods.
  • They produce the best subset of features for the model being trained.
  • They rely on algorithms that perform feature selection automatically as part of learning the model.

Embedded method techniques

  • Penalized (regularized) regression models: L1 (Lasso), L2 (Ridge), and Elastic Net (L1 + L2)
  • Decision trees
  • Ensembles of decision trees, such as random forests
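A minimal sketch of an embedded method, assuming scikit-learn: an L1-penalized (Lasso) regression shrinks the coefficients of unhelpful features to zero while it trains, and SelectFromModel keeps the features with non-zero coefficients. The synthetic dataset and alpha=1.0 are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which actually drive the target.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)

# Fitting the model and selecting features happen in the same step.
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
selected = selector.get_support(indices=True)
coefficients = selector.estimator_.coef_

print("Lasso coefficients:", coefficients.round(2))
print("selected feature indices:", selected)
```

Tree-based models can be used the same way: a fitted random forest exposes feature_importances_, which SelectFromModel can threshold instead of coefficients.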

Feature selection statistics

Numerical input, numerical output: regression predictive modeling problem

  • Pearson’s correlation coefficient (linear)
  • Spearman’s rank coefficient (non-linear)
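The linear versus non-linear distinction between the two coefficients can be seen on a small made-up example, assuming SciPy is available. The data below is synthetic and chosen only to make the difference visible.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.linspace(1, 10, 50)
y_linear = 3 * x + 2        # perfectly linear relationship
y_monotonic = np.exp(x)     # increasing, but far from a straight line

r_linear, _ = pearsonr(x, y_linear)
r_curved, _ = pearsonr(x, y_monotonic)
rho_curved, _ = spearmanr(x, y_monotonic)

print(f"Pearson, linear target:     {r_linear:.3f}")
print(f"Pearson, exponential target: {r_curved:.3f}")
print(f"Spearman, exponential target: {rho_curved:.3f}")
```

Pearson's coefficient drops for the exponential target because the relationship is not a straight line, while Spearman's coefficient works on ranks and still reports a perfect monotonic relationship.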

Numerical input, categorical output: classification predictive modeling problem

  • ANOVA F-statistic (linear)
  • Kendall’s rank coefficient (non-linear; assumes the categorical variable is ordinal)
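For numerical inputs and a categorical target, an ANOVA F-test can be computed per feature. A brief sketch, assuming scikit-learn and using the Iris dataset purely as an example:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

X, y = load_iris(return_X_y=True)  # numerical inputs, 3-class target

# One F-statistic and p-value per feature: a large F means the feature's
# mean differs strongly between the classes.
F, p_values = f_classif(X, y)
ranking = np.argsort(F)[::-1]      # feature indices, strongest first

print("F-statistics:", F.round(1))
print("p-values:", p_values)
print("features ranked by F:", ranking)
```

The same scoring function can be passed to SelectKBest to keep only the top-ranked features.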

Categorical input, numerical output: regression predictive modeling problem

  • We can use the same “Numerical input, categorical output” methods, but in reverse.

Categorical input, categorical output: classification predictive modeling problem

  • Chi-squared test (contingency tables)
  • Mutual information (information gain), from information theory
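Both statistics can be tried on a tiny hand-made example, assuming SciPy and scikit-learn are available. The integer-coded arrays color and label are hypothetical toy data, not taken from any real dataset.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.metrics import mutual_info_score

# Toy categorical input (3 categories) and categorical output (2 classes).
color = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
label = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1])

# Contingency table: rows are input categories, columns are output classes.
table = np.zeros((3, 2), dtype=int)
np.add.at(table, (color, label), 1)

# Chi-squared test of independence on the contingency table.
chi2_stat, p_value, dof, expected = chi2_contingency(table)

# Mutual information between the two categorical variables (in nats).
mi = mutual_info_score(color, label)

print(table)
print(f"chi2 = {chi2_stat:.2f}, dof = {dof}, p = {p_value:.3f}")
print(f"mutual information = {mi:.3f}")
```

A larger chi-squared statistic (smaller p-value) or a larger mutual information suggests the input feature carries more information about the output class.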

References:

  1. Paper: “A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction,” Journal of Applied Science and Technology Trends.
  2. https://elearn.daffodilvarsity.edu.bd/pluginfile.php/1225702/mod_label/intro/Feature%20Selection%20with%20numerical%20and%20categorical%20data.pdf
  3. https://vitalflux.com/machine-learning-feature-selection-feature-extraction/

Sowjanya Sadashiva

I am a computer science enthusiast with a Master's degree in Computer Science and a specialization in Data Science.