Data Preparation
Steps to prepare data
- Gather data from different data sources
- Handle missing values
- Feature extraction
- Feature selection
- Encode categorical values
- Numeric feature engineering
- Split data into training and test sets
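Here is a minimal scikit-learn sketch of several of these steps. It assumes a small hypothetical table with one numeric column (`age`, which has a missing value), one categorical column (`city`), and a label (`bought`). It covers handling missing values, encoding categorical values, scaling numeric features, and splitting the data. Feature extraction and feature selection are left out. Median imputation, one-hot encoding, and standard scaling are illustrative choices, not the only options.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for "gathered" data.
df = pd.DataFrame({
    "age": [25, 32, None, 41, 29, 37, 52, 46],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Mumbai", "Pune", "Delhi"],
    "bought": [0, 1, 0, 1, 0, 1, 1, 0],
})
X = df[["age", "city"]]
y = df["bought"]

# Split first so imputation and scaling statistics come from training data only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

preprocess = ColumnTransformer([
    # Numeric column: fill missing values with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["age"]),
    # Categorical column: one-hot encode, ignoring categories unseen during fit.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)
print(X_train_prepared.shape, X_test_prepared.shape)
```

Splitting before fitting the transformers keeps test-set statistics, such as the median, from leaking into training.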
Gather Data:
There are many sources from which we can download, fetch, or load data. Examples include Kaggle, scikit-learn's built-in datasets, the UCI Machine Learning Repository, and HTML pages scraped with Beautiful Soup.
In company-level projects, the data is typically input that users submit through web pages, which is then stored in a data warehouse.
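The snippet below is a rough sketch of pulling such data out of a warehouse. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in. A real warehouse would need its own driver or connector. The `signups` table and its columns are hypothetical.

```python
import sqlite3

import pandas as pd

# An in-memory SQLite database stands in for the company's data warehouse.
# The table name and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signups (user_id INTEGER, country TEXT, plan TEXT, signup_date TEXT)"
)
conn.executemany(
    "INSERT INTO signups VALUES (?, ?, ?, ?)",
    [
        (1, "IN", "free", "2023-01-05"),
        (2, "US", "pro", "2023-01-06"),
        (3, "IN", "pro", "2023-02-11"),
    ],
)
conn.commit()

# Pull only the rows and columns needed for the model into a DataFrame.
query = (
    "SELECT user_id, country, plan FROM signups "
    "WHERE signup_date >= ? ORDER BY user_id"
)
df = pd.read_sql_query(query, conn, params=("2023-01-06",))
conn.close()
print(df)
```

Filtering in SQL means only the rows needed for training leave the warehouse.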
Gathering the right amount of data is the first, and one of the most important, steps in machine learning. It is always advisable to have enough data to train the model so that it produces reliable results.
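A learning curve is one way to judge whether you have "sufficient" data. You train on increasing amounts of data and watch how the validation score changes. The sketch uses scikit-learn's bundled digits dataset as a placeholder for your own data. The `LogisticRegression` model is chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)

# Train on growing fractions of the data and cross-validate each size.
train_sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=2000),
    X,
    y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    shuffle=True,
    random_state=0,
)

for n, score in zip(train_sizes, val_scores.mean(axis=1)):
    print(f"{n:5d} training samples -> mean validation accuracy {score:.3f}")
```

If validation accuracy is still rising at the largest training size, collecting more data is likely to help. If it has plateaued, more data alone may not.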
Data collection from different sources:
1. Kaggle:
We can download datasets from Kaggle and use them in ML projects.
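Kaggle datasets usually arrive as a `.zip` archive, whether you download them from the dataset page or with the official `kaggle` command-line tool. pandas can read a single-file archive directly, because `read_csv` infers zip compression from the file extension. In this sketch, the archive name `house-prices.zip`, the file `train.csv`, and its contents are hypothetical. They are created on the fly so the example runs offline.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd


def load_downloaded_csv(path):
    """Read a CSV that may still be inside a single-file .zip archive.

    pandas infers zip compression from the '.zip' extension, so the
    archive does not need to be extracted first.
    """
    return pd.read_csv(path)


# Simulate a downloaded Kaggle archive; names and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "house-prices.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("train.csv", "id,area,price\n1,850,120000\n2,1200,175000\n")
    df = load_downloaded_csv(archive)

print(df.head())
```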
2. Scikit-learn datasets:
Example code:
from sklearn.datasets import load_iris
# Load the Iris dataset as a Bunch object with .data, .target, .feature_names, etc.
data = load_iris()
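Scikit-learn loaders can also return pandas objects, which is often more convenient for the preparation steps that follow. This assumes scikit-learn 0.23 or later, where the `as_frame` parameter is available.

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects instead of NumPy arrays.
iris = load_iris(as_frame=True)
df = iris.frame  # 4 feature columns plus a "target" column
print(df.shape)
print(iris.target_names)

# Alternatively, get the features and labels directly as a DataFrame and Series.
X, y = load_iris(return_X_y=True, as_frame=True)
```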
3. UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
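Many datasets in the UCI repository, especially older ones, are plain comma-separated files without a header row. Their column names are documented separately. The following is a minimal sketch of loading such a file with pandas. The column names are assumptions chosen for the Iris data. A few sample lines in the same format stand in for the real download so the example runs offline. In practice, you would pass the file's URL or a local path to the same function.

```python
import io

import pandas as pd

# Assumed column names; UCI data files often have no header row.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def load_uci_table(source):
    """Read a headerless, comma-separated UCI-style data file.

    `source` can be a local path, a URL, or a file-like object.
    """
    return pd.read_csv(source, header=None, names=COLUMNS)


# Sample lines in the same format as the classic iris.data file.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
df = load_uci_table(sample)
print(df)
```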
4. Web scraping: gathering information from web pages on the Internet.
Using Beautiful Soup (install it with: python -m pip install beautifulsoup4; the example also needs the requests package):
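from bs4 import BeautifulSoup
import requests
URL = "https://realpython.com/beautiful-soup-web-scraper-python/"
# Download the page; fail fast on network problems or HTTP errors.
page = requests.get(URL, timeout=10)
page.raise_for_status()
# Parse the raw HTML into a navigable tree and show the page title.
soup = BeautifulSoup(page.content, "html.parser")
print(soup.title.string)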
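Once the page is parsed, `find_all`, `find`, and `get_text` pull out the pieces you need. The sketch below parses a small static HTML string instead of a live page so it runs offline. The tag structure and the `product` and `price` class names are hypothetical. A real page needs its own selectors, so inspect its HTML in the browser first.

```python
from bs4 import BeautifulSoup

# A small static page stands in for page.content so the example runs offline.
html = """
<html>
  <head><title>Example Products</title></head>
  <body>
    <div class="product"><h2>Laptop</h2><span class="price">799</span></div>
    <div class="product"><h2>Phone</h2><span class="price">499</span></div>
    <a href="/page/2">Next</a>
  </body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)

# Collect one record per product block, located by its CSS class.
rows = []
for product in soup.find_all("div", class_="product"):
    name = product.find("h2").get_text(strip=True)
    price = float(product.find("span", class_="price").get_text(strip=True))
    rows.append({"name": name, "price": price})

# Links (e.g. pagination) can be followed with further requests.get calls.
links = [a["href"] for a in soup.find_all("a", href=True)]
print(rows, links)
```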
Techniques for collecting the data:
Data collection methods can be divided into two categories: quantitative and qualitative methods.
Quantitative method:
- Quantitative data collection is the process of gathering numerical data that can be analyzed using statistical methods.
- Quantitative data collection is often used to measure variables and establish relationships between variables.
- Quantitative data collection methods include: Surveys, Experiments, Controlled observations, Polls, Longitudinal studies, Interviews.
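As a small illustration of the statistical side, the sketch below summarizes hypothetical survey answers with pandas. It then measures the relationship between two numeric variables using Pearson correlation. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical survey responses: weekly study hours and exam score (0-100).
survey = pd.DataFrame({
    "study_hours": [2, 4, 5, 7, 8, 10],
    "exam_score": [55, 60, 66, 72, 80, 88],
})

# Descriptive statistics: count, mean, standard deviation, quartiles.
print(survey.describe())

# Pearson correlation measures the linear relationship between the two variables.
r = survey["study_hours"].corr(survey["exam_score"])
print(f"Pearson r = {r:.2f}")
```

A correlation close to 1 indicates a strong positive linear relationship. On its own, it does not establish causation.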
Qualitative method:
- Qualitative data collection is the process of gathering non-numerical information to understand people’s attitudes, behaviors, beliefs, and motivations.
- Qualitative data collection methods allow researchers to assess the “whys” and “hows” behind statistics.
- Qualitative data collection methods gather contextual information.
- Qualitative data collection methods include: observations, case studies, observational studies, paper surveys, and online surveys.
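Qualitative data is usually analyzed by coding responses into themes. Real thematic coding relies on careful human reading. The sketch below is a crude keyword-based stand-in that assumes a hypothetical codebook. It is shown only to illustrate how coded free-text answers can be tallied.

```python
from collections import Counter

# Hypothetical codebook: theme -> keywords that signal it in a response.
CODEBOOK = {
    "price": ["expensive", "cost", "price", "cheap"],
    "usability": ["confusing", "easy", "intuitive", "hard to use"],
    "support": ["support", "help", "response time"],
}


def code_response(text):
    """Return the set of themes whose keywords appear in a free-text answer."""
    lowered = text.lower()
    return {
        theme
        for theme, keywords in CODEBOOK.items()
        if any(keyword in lowered for keyword in keywords)
    }


responses = [
    "The app is easy to navigate but a bit expensive.",
    "Support took days to reply.",
    "Too confusing, and the price keeps going up.",
]

coded = [code_response(r) for r in responses]
theme_counts = Counter(theme for themes in coded for theme in themes)
print(coded)
print(theme_counts.most_common())
```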
Next stop: https://sowjanyasadashiva.medium.com/handle-missing-data-part-1-96dee61f74ab