Posts

MODELLING AND VISUALISATION

Modelling

Modelling in machine learning is the process of creating a mathematical or computational model that learns patterns from data to make predictions or decisions without being explicitly programmed to perform the task.

Steps:
1. Choose a model architecture (e.g., linear regression, decision tree, neural network).
2. Train the model on data (feed in input data and let the model learn from the outcomes).
3. Evaluate the model to see how well it performs.
4. Use the model to make predictions on new/unseen data.

Popular Python Modelling Libraries:

Library | Purpose | Key Features
scikit-learn | General ML | Wide variety of classical ML algorithms (classification, regression, clustering), easy API
XGBoost | Gradient Boosting | Fast and accurate gradient boosting implementation
LightGBM | Gradient Boosting | Fast, supports large datasets, better performance on categorical features
CatBoost | Gradient Boosting | Handles categorical features well automatically
TensorFlow | Deep Learning | Powerful, p...
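The four steps listed above (choose, train, evaluate, predict) can be sketched with scikit-learn, one of the libraries named in the excerpt. This is only an illustrative sketch, not the post's own code: the Iris dataset, the decision tree, the 75/25 split, and the `max_depth=3` setting are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed example data: the Iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold back 25% of the rows so evaluation uses data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 1: choose a model architecture (here, a shallow decision tree).
model = DecisionTreeClassifier(max_depth=3, random_state=0)

# Step 2: train the model on the training split.
model.fit(X_train, y_train)

# Step 3: evaluate the model on the held-out split.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")

# Step 4: predict the class of a new, unseen sample (hypothetical measurements).
new_sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = model.predict(new_sample)[0]
print("Predicted class index:", predicted_class)
```

Swapping in another estimator such as a linear model or a gradient boosting classifier would leave the `fit` / `predict` calls unchanged, which is the "easy API" the library list refers to.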

Data Cleaning

Data Cleaning and Preparation

Handling Missing Values

Missing values are entries in your dataset that have no recorded data. In Python and Pandas, missing values are typically represented as:

NaN (Not a Number), from NumPy
None, a Python built-in null object

Example:

    import pandas as pd
    import numpy as np

    data = {
        "Name": ["Ramya", None, "Swathi"],
        "Age": [24, 28, np.nan],
        "City": ["Chennai", "Delhi", None]
    }
    df = pd.DataFrame(data)
    print(df)

Output:

         Name   Age     City
    0   Ramya  24.0  Chennai
    1    None  28.0    Delhi
    2  Swathi   NaN     None

Why Should We Handle or Remove Missing Values?

Missing data can significantly affect the quality and outcome of data analysis or machine learning models.

1. Accuracy and Validity
If missing values are ignored, calculations like mean, sum, correlation, etc., can give incorrect re...
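As a rough sketch of how the missing entries in the example DataFrame might be found and handled, the snippet below counts them per column, drops incomplete rows, and fills the gaps. The fill values ("Unknown" for text columns, the column mean for Age) are assumptions made for illustration, not a recommendation from the post.

```python
import numpy as np
import pandas as pd

# Same example data as in the excerpt above.
df = pd.DataFrame({
    "Name": ["Ramya", None, "Swathi"],
    "Age": [24, 28, np.nan],
    "City": ["Chennai", "Delhi", None],
})

# Count missing entries per column; isna() treats both NaN and None as missing.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Option 1: drop every row that contains at least one missing value.
complete_rows = df.dropna()

# Option 2: fill the gaps instead of dropping rows.
# Assumed fill strategy: a placeholder string for text, the mean for Age.
age_mean = df["Age"].mean()  # pandas skips NaN by default when averaging
filled = df.fillna({"Name": "Unknown", "Age": age_mean, "City": "Unknown"})
print(filled)
```

Dropping rows is simple but here it discards two of the three records, which is why filling is often considered for small datasets.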