🚀 Introduction: The Quest for the Mean Temperature
Regression analysis is a core tool of predictive modeling in data science: it forecasts continuous values from known features. This analysis applies two regression techniques, Linear Regression and Decision Tree Regressor, to a daily weather dataset (KCLT.csv) to predict the actual mean temperature (actual_mean_temp).
The objective is to understand how these models handle the prediction task and to assess their performance using key metrics like Mean Squared Error (MSE) and R-squared (R²).
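Both metrics are built from the residuals (actual minus predicted). MSE is the average squared residual, so lower is better and its unit is the target's unit squared. R² is 1 − SS_res / SS_tot: the share of the target's variance around its mean that the model explains. The minimal sketch below computes both by hand and checks them against scikit-learn's `mean_squared_error` and `r2_score`. The four values are made up for illustration and are not taken from KCLT.csv.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def mse_r2(y_true, y_pred):
    """Return (MSE, R²) computed directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                   # average squared error
    ss_res = np.sum(residuals ** 2)                 # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    return mse, r2

# Illustrative values only (not from KCLT.csv)
y_true = [60.0, 65.0, 70.0, 75.0]
y_pred = [61.0, 64.0, 71.0, 74.0]

mse, r2 = mse_r2(y_true, y_pred)
print(f"manual:  MSE={mse:.3f}  R2={r2:.3f}")
print(f"sklearn: MSE={mean_squared_error(y_true, y_pred):.3f}  "
      f"R2={r2_score(y_true, y_pred):.3f}")
```

Here every prediction is off by exactly 1, so the MSE is 1.0, and R² is 1 − 4/125 = 0.968.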
Part 1: Simple Prediction with Linear Regression
The first task uses a simple Linear Regression model to establish a baseline prediction, relying on the strong, intuitive relationship between a day’s highest temperature and its average temperature.
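With a single feature, ordinary least squares has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The minimal sketch below fits that line by hand and confirms that scikit-learn's `LinearRegression` finds the same coefficients. The five temperature pairs are hypothetical, invented only to make the arithmetic easy to follow.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_simple_ols(x, y):
    """Closed-form least squares for y ≈ intercept + slope * x (one feature)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical daily max/mean temperature pairs (not from KCLT.csv)
max_temp = np.array([50.0, 60.0, 70.0, 80.0, 90.0])
mean_temp = np.array([41.0, 50.0, 61.0, 70.0, 79.0])

slope, intercept = fit_simple_ols(max_temp, mean_temp)

# scikit-learn expects a 2-D feature matrix, hence reshape(-1, 1)
model = LinearRegression().fit(max_temp.reshape(-1, 1), mean_temp)

print(f"manual:  slope={slope:.3f}  intercept={intercept:.3f}")
print(f"sklearn: slope={model.coef_[0]:.3f}  intercept={model.intercept_:.3f}")
```

For these numbers the fitted line is mean ≈ −7.0 + 0.96 · max: each extra degree of daily maximum adds just under one degree to the predicted mean.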
1.1. Model Setup (Q1)
- Feature (Independent Variable): actual_max_temp
- Target (Dependent Variable): actual_mean_temp
- Data Split: 80% Training, 20% Testing, with random_state=42.
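The setup above translates almost directly into scikit-learn. The sketch below is one way to wire it up, not the author's exact code. Because KCLT.csv isn't reproduced here, it builds a hypothetical stand-in DataFrame with simulated values. Only the column names actual_max_temp and actual_mean_temp come from the article, and the simulated relationship (mean ≈ max − 10 plus noise) is an assumption chosen purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for KCLT.csv: 365 simulated days.
# The column names match the article; the numbers are synthetic.
rng = np.random.default_rng(0)
max_temp = rng.uniform(30, 95, size=365)
mean_temp = max_temp - 10 + rng.normal(0, 3, size=365)
df = pd.DataFrame({"actual_max_temp": max_temp, "actual_mean_temp": mean_temp})

# Selecting with a list of columns keeps X 2-D, as scikit-learn requires
X = df[["actual_max_temp"]]
y = df["actual_mean_temp"]

# 80% training / 20% testing, reproducible via random_state=42
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"slope={model.coef_[0]:.3f}  intercept={model.intercept_:.3f}")
print(f"test MSE={mse:.3f}  test R2={r2:.3f}")
```

With 365 rows, a test_size of 0.2 yields 73 test rows and 292 training rows. Evaluating only on the held-out 20% is what makes the MSE and R² an estimate of performance on unseen days.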
Code Snippet: Data Loading and Preparation
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from…