Descriptive Statistics and Inferential Statistics
statistics — 1
What is Descriptive Statistics?
- It summarizes and describes the important features of the data.
- Its methods are either graphical or numerical.
- Graphical: histograms, boxplots, scatterplots, pie charts, etc.
- Numerical: mean, median, mode, standard deviation, correlation coefficients.
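As one illustration of a numerical summary, the Pearson correlation coefficient can be computed directly from its definition. This is a minimal sketch; the function name `pearson_r` and the `hours` and `scores` values are made up for illustration.

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sum of cross-products of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]          # hypothetical hours studied
scores = [52, 55, 61, 64, 70]    # hypothetical exam scores
r = pearson_r(hours, scores)
print(round(r, 3))
```

A value of r close to 1 indicates a strong positive linear relationship, and a value close to -1 a strong negative one.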
What is Inferential Statistics?
- It generalizes from a sample to the population.
- It covers the ways in which statistics computed from samples can be used to decide whether the populations they were drawn from truly differ.
- Experiments are run on sample data, and the information gained from the sample is used to draw conclusions about the population.
- This includes point estimation, hypothesis testing, estimation by confidence intervals, and prediction.
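To make point estimation and confidence intervals concrete, here is a rough sketch that estimates a population mean from a sample. It assumes a large-sample normal approximation (for small samples a t-based interval is usually more appropriate); the function name `mean_confidence_interval` and the measurements are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Point estimate and normal-approximation interval for the population mean."""
    n = len(sample)
    point_estimate = statistics.mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical value from the standard normal distribution (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate, (point_estimate - margin, point_estimate + margin)

measurements = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9]  # made-up sample
estimate, (low, high) = mean_confidence_interval(measurements)
print(estimate, low, high)
```

The sample mean is the point estimate, and the interval expresses the uncertainty about the population mean that comes from having observed only a sample.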
Measures of Central Tendency / Measures of Location:
Mean: The arithmetic average of the values. It is affected by outliers.
Median: The middle value of the ordered data. It is defined for both a sample (sample median) and a population (population median).
- Insensitive to outliers.
Mode: The most frequent value.
- It works well with categorical variables.
The mean, median, and mode can be used to handle missing values through simple imputation.
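A small sketch with Python's standard `statistics` module shows how one extreme value moves the mean but barely changes the median or the mode. The salary figures are made up for illustration.

```python
import statistics

salaries = [30, 32, 35, 36, 36, 40]   # hypothetical values, in thousands
with_outlier = salaries + [400]       # the same data plus one extreme value

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(salaries)
median_after = statistics.median(with_outlier)
mode_after = statistics.mode(with_outlier)

print(mean_before, mean_after)        # the mean jumps once the outlier is added
print(median_before, median_after)    # the median barely moves
print(mode_after)                     # the mode is unchanged

# The mode also works on categorical data, where a mean is undefined.
favorite_color = statistics.mode(["red", "blue", "red", "green"])
print(favorite_color)
```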
The above attachment is a complete explanation of using mean, median, mode as a technique to handle missing values.
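The idea of simple imputation can also be sketched with the standard library alone. This is a minimal illustration, not a replacement for a full imputation tool: the function name `impute`, the use of `None` to mark missing values, and the `ages` data are all assumptions made for the example.

```python
import statistics

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Keep observed values as they are; fill only the missing ones.
    return [fill if v is None else v for v in values]

ages = [22, None, 25, 31, None, 25, 90]   # hypothetical data with gaps
print(impute(ages, "mean"))
print(impute(ages, "median"))
print(impute(["a", None, "b", "a"], "mode"))
```

Because the value 90 pulls the mean upward, the median is often the safer fill value for skewed numeric data, while the mode is the natural choice for categorical columns.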
Measures of Variability:
Range: The difference between the lowest and highest values.
Variance: A measure of how far the values are spread around their mean, computed as the average squared deviation from the mean.
- The sample variance divides by n - 1, the degrees of freedom, rather than by n.
Standard Deviation: The square root of the variance; a measure of how dispersed the data is in relation to the mean.
- A low standard deviation indicates that the values are close to the mean.
- A high standard deviation indicates that the values are spread out over a wider range.
- σ: population SD
- s: sample SD
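The difference between the population and sample versions comes down to the divisor. The sketch below computes both by hand and compares them with the corresponding functions in Python's `statistics` module; the data values are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values

data_range = max(data) - min(data)  # highest value minus lowest value

n = len(data)
mean = statistics.mean(data)
squared_deviations = sum((x - mean) ** 2 for x in data)

population_variance = squared_deviations / n        # divisor n
sample_variance = squared_deviations / (n - 1)      # divisor n - 1

population_sd = math.sqrt(population_variance)      # sigma
sample_sd = math.sqrt(sample_variance)              # s

print(data_range, population_variance, sample_variance)
print(population_sd, sample_sd)

# The standard library offers both versions directly:
# pvariance/pstdev for a population, variance/stdev for a sample.
print(statistics.pvariance(data), statistics.variance(data))
print(statistics.pstdev(data), statistics.stdev(data))
```

The sample versions are always slightly larger than the population versions computed from the same numbers, because they divide by n - 1 instead of n.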
Descriptive Statistics, Graphical Summary:
- Qualitative data: pie chart, bar graph.
- Quantitative data: histogram, dot plot, boxplot.
- Boxplot: It visualizes five key points of the data:
the lowest value, the first quartile, the median, the third quartile, and the highest value.
The boxplot is based on measures that are resistant to the presence of a few outliers: the median and the fourth spread, a measure of spread defined as fs = upper quartile - lower quartile.
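The five values behind a boxplot, and the fourth spread, can be sketched as follows. Quartile conventions differ between textbooks and libraries; this example assumes the "inclusive" method of `statistics.quantiles`, which may give slightly different quartiles than other definitions on some data sets. The function name `five_number_summary` and the data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (lowest, first quartile, median, third quartile, highest)."""
    # quantiles() sorts the data internally and returns the three cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

data = [1, 3, 5, 7, 9, 11, 13]   # made-up values
lowest, q1, median, q3, highest = five_number_summary(data)
fourth_spread = q3 - q1          # fs = upper quartile - lower quartile

print(lowest, q1, median, q3, highest)
print(fourth_spread)
```

Replacing 13 with a much larger value changes the highest value but leaves the quartiles, the median, and therefore the fourth spread untouched, which is why the boxplot is described as resistant to a few outliers.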
References:
- https://towardsdatascience.com/understanding-descriptive-statistics-c9c2b0641291
- Book: Probability and Statistics for Engineering and the Sciences, Jay L. Devore