Handle missing data — part 1
2 min read · May 19, 2023
Step 2 in data preparation.
What happens if there are missing values in the dataset?
Missing data can reduce the statistical power of a study and can produce biased estimates, leading to invalid conclusions.
- Missing data reduces statistical power, which is the probability that a test rejects the null hypothesis when it is false.
- It can bias the estimates of parameters.
- It can make the sample less representative of the population.
- It can complicate the analysis of the study.
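To make the loss of sample size concrete, here is a minimal sketch using pandas. The dataset and its column names (`age`, `income`, `gender`) are made up for illustration. It counts the missing values in each column and shows how few rows remain if every incomplete row is dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; np.nan / None mark missing values.
df = pd.DataFrame({
    "age":    [25, np.nan, 47, 51, np.nan, 38],
    "income": [42000, 58000, np.nan, 61000, 39000, 45000],
    "gender": ["F", "M", "M", None, "F", "F"],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Rows that remain if every row with at least one missing value is dropped.
complete_rows = df.dropna()

print(missing_per_column)
print(f"rows before: {len(df)}, complete rows: {len(complete_rows)}")
```

Even though each column is missing only one or two values, dropping incomplete rows discards most of this small table, which is how missing data eats into statistical power.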
Types of missing data:
- Missing completely at random (MCAR): The data are missing purely at random. The probability that a value is missing does not depend on any other variable or on any specific value or outcome, so there is no pattern in the missing data. The statistical advantage of MCAR data is that the analysis remains unbiased, although the smaller sample still reduces statistical power.
- Missing at random (MAR): There is a pattern in the missing values, but it depends on other observed variables (e.g., age, race, gender) rather than on the missing values themselves. In other words, the probability that a value is missing depends on the observed values (independent variables), not on the values that are missing.
- Missing not at random (MNAR): The probability that a value is missing depends on the missing value itself, so the missingness is related to the outcome and there is a pattern in the missing data. The standard methods for dealing with missing data cannot be applied directly, and the missingness is non-ignorable.
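The three mechanisms can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variables (`age`, `income`), the missing rates, and the thresholds are all invented for the example. It removes income values under each mechanism and compares the mean of the remaining values with the mean of the full data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical variables: age is always observed, income is partly missing.
age = rng.uniform(20, 70, size=n)
income = 20_000 + 800 * age + rng.normal(0, 10_000, size=n)

# MCAR: every income value has the same 30% chance of being missing.
mcar_missing = rng.random(n) < 0.3

# MAR: the chance of a missing income depends only on the observed age.
mar_missing = rng.random(n) < np.where(age > 45, 0.5, 0.1)

# MNAR: the chance of a missing income depends on the income value itself.
mnar_missing = rng.random(n) < np.where(income > np.median(income), 0.5, 0.1)

true_mean = income.mean()
observed_means = {
    "MCAR": income[~mcar_missing].mean(),
    "MAR": income[~mar_missing].mean(),
    "MNAR": income[~mnar_missing].mean(),
}

print(f"full data mean: {true_mean:.0f}")
for name, mean in observed_means.items():
    print(f"{name}: observed mean {mean:.0f}, difference {mean - true_mean:+.0f}")
```

Under these assumptions the MCAR mean stays close to the full-data mean, while the MAR and MNAR means are pulled downward because higher incomes are missing more often. In the MAR case the distortion is explained by the observed `age` column, so it can in principle be corrected using that column. In the MNAR case the cause is the unobserved income itself.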
Missing values can occur in both categorical and continuous data, and each type needs to be handled with appropriate techniques.
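As a rough sketch of how the two kinds of columns might be separated before choosing a technique, the snippet below uses pandas `select_dtypes`. The table and its column names are hypothetical, and treating every non-numeric column as categorical is a simplifying assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with one categorical and two continuous columns.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 51.0],
    "income": [42000.0, 58000.0, np.nan, np.nan],
    "city": ["Pune", None, "Delhi", "Pune"],
})

# Treat numeric columns as continuous and all remaining columns as categorical.
continuous_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Missing-value counts for each group of columns.
print("continuous:", df[continuous_cols].isna().sum().to_dict())
print("categorical:", df[categorical_cols].isna().sum().to_dict())
```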
Handling missing data — part 2: https://medium.com/@sowjanyasadashiva/handling-missing-values-part-2-4314ae5887ca