Non-Parametric Functions and Point-Based Estimators
Jun 21, 2023
Supervised Learning, Part 3
Non-Parametric Functions
- They have a more flexible representation that is not fixed by a predetermined number of parameters.
- The number of parameters in non-parametric models can vary based on the size and complexity of the training data.
- These models have the ability to learn complex relationships from the data without imposing strong assumptions about the functional form.
- They learn directly from the training data without explicitly representing the function in terms of fixed parameters.
- The idea is that the model assigns weights or relevance to the training samples based on their proximity or similarity to the input data point.
- Examples of non-parametric models include decision trees, k-nearest neighbors (KNN), support vector machines (SVM), and random forests.
- In KNN, the prediction for an input data point is the weighted average (regression) or majority vote (classification) of its 'k' nearest neighbors.
- The weights are determined by the distances between the input data point and its k nearest neighbors.
- Other point-based, non-parametric representations include point-based density estimators, point-based mixture distributions, and tile coding.
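The distance-weighted KNN prediction described above can be sketched as follows. This is a minimal illustration: the function name, the inverse-distance weighting scheme, and the toy data are my own choices, not from the source.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, eps=1e-8):
    """Distance-weighted k-NN regression: nearer neighbors get larger weights."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every training point
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    weights = 1.0 / (dists[nearest] + eps)        # inverse-distance weights (eps avoids /0)
    return np.sum(weights * y_train[nearest]) / np.sum(weights)

# Toy 1-D data lying on the line y = 2x
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 2.0, 4.0, 6.0])
print(knn_predict(X_train, y_train, np.array([1.5]), k=2))  # → 3.0
```

Note that no parameters were fit: the training points themselves *are* the model, and each query re-weights them by proximity.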
Advantages of Non-Parametric Functions:
- Flexibility: Non-parametric models have a more flexible representation, allowing them to capture complex relationships and patterns in the data.
- No Assumptions: Non-parametric models make fewer assumptions about the underlying data distribution, making them more versatile.
Limitations of Non-Parametric Functions:
- Computationally Intensive: Non-parametric models can be computationally expensive, especially with large datasets or complex structures.
- Interpretability: Non-parametric models are often more difficult to interpret compared to parametric models.
Point-Based Estimators:
A non-parametric way to represent arbitrary functions, with precision limited by the number of points.
- The basic idea behind point-based density estimators is to assign a density value to each observed data point and then combine these values to estimate the density at any given point in the data space. The density value assigned to each point reflects the local density of data points around it.
- We place a local Gaussian (kernel) at each data point and sum them all to obtain the function over the data.
- Each point influences the function.
- In a point-based estimator, there is a set of points; we need to know the weight of each point based on the neighboring data, and when we add them all together we get the function.
- Each data point represents the location of a specific kernel function, and its weight represents the magnitude of its contribution.
- Kernel functions define how a point's contribution to the function value falls off as we move away from that point, in terms of a similarity measure.
- In kernel density estimation (KDE), the estimated density at any point x is obtained by summing the contributions from nearby data points, weighted by the kernel function. The kernel function K(u) determines the shape of the contribution from each data point. Common choices include the Gaussian (normal) kernel, the Epanechnikov kernel, and the triangular kernel.
- Point-based estimators do not assume a specific functional form for the underlying distribution. They can capture complex and non-linear relationships without being limited by pre-defined assumptions.
- They are generally robust to outliers and deviations from assumptions. They focus on the local behavior of the data and are less affected by extreme values or data points that do not conform to the assumed distribution.
- They can be applied to various tasks, such as density estimation, regression, classification, and anomaly detection.
- Point-based estimators can work well with relatively small datasets. They can capture intricate details and patterns even with limited samples, which can be advantageous in situations where data collection is expensive or time-consuming.
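The KDE construction described above — one Gaussian kernel centered on every data point, summed and normalized — can be sketched as follows. The helper name and the toy data are illustrative, not from the source.

```python
import numpy as np

def gaussian_kde(data, x, bandwidth=0.5):
    """Sum a Gaussian kernel centered on each data point, then normalize."""
    u = (x - data[:, None]) / bandwidth                 # scaled distances, shape (n, len(x))
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # one Gaussian bump per data point
    return kernels.sum(axis=0) / (len(data) * bandwidth)

data = np.array([1.0, 1.2, 3.0])       # a small cluster plus one isolated point
xs = np.linspace(0, 4, 81)
density = gaussian_kde(data, xs)
print(xs[np.argmax(density)])          # peak of the estimate, near x ≈ 1.1
```

Because every observation contributes its own bump, the estimate is tallest where points cluster, directly reflecting the local density of the data.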
Disadvantages of Point-Based Estimators:
- They can be computationally expensive for large datasets.
- The choice of bandwidth or other parameters may impact the quality of the estimation.
- They may not perform well in high-dimensional spaces due to the “curse of dimensionality.”
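The bandwidth sensitivity noted above can be seen in a tiny experiment (the values and data are illustrative): a very narrow kernel assigns near-zero density between clusters, while a very wide one smooths the clusters into a single bump.

```python
import numpy as np

def gaussian_kde_at(data, x, bandwidth):
    """Gaussian KDE evaluated at a single point x."""
    u = (x - data) / bandwidth
    return np.mean(np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * bandwidth))

data = np.array([0.0, 0.1, 2.0])       # two clusters with a gap around x = 1
for h in (0.05, 1.0):
    # h = 0.05: spiky estimate, essentially zero in the gap
    # h = 1.0:  oversmoothed, substantial density in the gap
    print(h, gaussian_kde_at(data, 1.0, bandwidth=h))
```

Neither extreme is "wrong" a priori, which is exactly why bandwidth selection matters so much for estimation quality.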