Learning ML - First attempt at basic linear regression

August 20, 2022

Laugh all you want, but if the machines are going to turn us all into batteries, I want to know how they did it.

# Least squares and linear regression.

from pandas import read_csv, DataFrame
from sklearn.linear_model import LinearRegression
from sklearn import datasets
import numpy as np
# data = [[1, 1], [2,3], [4,3], [3,2], [5,5]]

print("***********")
dataset = read_csv("./data.csv", names=["x", "y"])
print(dataset)


# print(dataset.describe())
# print(dataset.shape)
X = dataset[['x']] # x is the independent variable; the double brackets return a 2-D DataFrame.
Y = dataset['y'] # note the difference here: y is the dependent variable, so single brackets return a 1-D Series.

# This is because we need to split our dataset into the "matrix" of independent variables
# (sklearn expects X to be 2-D, even with a single feature) and the vector of the dependent
# variable. Mathematically, a vector is a matrix that has just one column.

print("X: ", X)
print("y: ", Y)

model = LinearRegression()
model.fit(X, Y)

y_pred = model.predict(dataset[['x']])
print("y_pred", y_pred)
df = DataFrame({'Actual': dataset.loc[:, 'y'], 'Predicted': y_pred})
print(df)

print("Residual sum of squares: " + str(np.sum(np.square(df['Predicted'] - df['Actual']))))

print("Predict value for 6: ", model.predict([[6]]))
print("Predict value for 3.3: ", model.predict([[3.3]]))
print("Model Coefficients: ", model.coef_)
print("Model Intercept: ", model.intercept_)
print("Line equation: " + str(model.intercept_) + " + " + str(model.coef_[0])+ "x")
print("R(squared) = ", model.score(X, Y, sample_weight=None))
# How to find the "Sum of Squared Residuals":
# Residuals are the differences between the real data & the line,
# and we are summing the squares of these values.
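
# Another way to get the same numbers (my sketch, using sklearn.metrics, which is
# part of scikit-learn): mean_squared_error returns RSS / n, so multiplying by the
# number of rows reproduces the residual sum of squares, and R-squared is
# 1 - RSS / TSS, where TSS is the total sum of squares around the mean of y.
from sklearn.metrics import mean_squared_error
rss = mean_squared_error(df['Actual'], df['Predicted']) * len(df)
tss = np.sum(np.square(df['Actual'] - df['Actual'].mean()))
print("RSS via mean_squared_error: ", rss)
print("R(squared) via 1 - RSS/TSS: ", 1 - rss / tss)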


# References:
# - https://www.geeksforgeeks.org/how-to-calculate-residual-sum-of-squares-in-python/
# - https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.predict
# - https://www.pluralsight.com/guides/importing-and-splitting-data-into-dependent-and-independent-features-for-ml