Day 33
Jul 21, 2017 · 471 words
Polynomial regression as linear regression
There is no dedicated polynomial regression estimator in sklearn. Zico Kolter goes over this in his lecture: polynomial regression is really an extension of linear regression, because the additional feature terms ($x^2$, $x^3$, … $x^n$) can be treated as ordinary variables with linear coefficients. Scikit-learn has you create a PolynomialFeatures object of a particular degree, then use its fit_transform method to generate the feature matrix.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

# Sample 40 points from a sine curve
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()
x_plot = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)

# Expand the single input column into polynomial features [1, x, ..., x^5]
poly = PolynomialFeatures(degree=5)
X_ = poly.fit_transform(X)
x_plot_ = poly.transform(x_plot)  # reuse the already-fitted transformer

# An ordinary linear regression on the expanded features
reg = linear_model.LinearRegression()
reg.fit(X_, y)

plt.scatter(X, y)
plt.plot(x_plot, reg.predict(x_plot_))
That example fits a 5th degree polynomial to a sinusoidal sample.
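To make the "extra columns" idea concrete, here is a minimal sketch of what PolynomialFeatures actually produces for a single input column (the toy values here are my own, not from the original post):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One input column expanded to degree 3: columns are [1, x, x^2, x^3]
X = np.array([[2.0], [3.0]])
poly = PolynomialFeatures(degree=3)
X_expanded = poly.fit_transform(X)
print(X_expanded)
# [[ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```

Each power of $x$ becomes its own column, so a plain linear regression over these columns is exactly a cubic fit in the original variable.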
More Dimensions
I know how to perform a regression on one-dimensional input, but it would be cool to try adding a dimension. There’s no real extra coding involved (other than passing in an array of $x_i \in \mathbb{R}^n$ as opposed to $x_i \in \mathbb{R}^1$) but the hard part is actually showing the results…
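The "no real extra coding" claim can be sketched as follows, using synthetic two-column data of my own invention. The only visible change is that PolynomialFeatures now also emits cross terms between the two variables:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(40, 2) * 5              # two input columns instead of one
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]  # made-up target

poly = PolynomialFeatures(degree=5)
X_ = poly.fit_transform(X)           # now includes cross terms like x1*x2, x1^2*x2, ...
reg = LinearRegression().fit(X_, y)

# All monomials of total degree <= 5 in 2 variables: C(5+2, 2) = 21 columns
print(X_.shape)  # (40, 21)
```

The fitting call is identical; only the feature matrix got wider.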
3D Visualization
I’m using this resource to try out 3D plotting in Matplotlib. I’m going to start by just making a scatterplot of the data, looking both at the observed temperature for each point’s day and the number of minutes since midnight:
It’s hard to see here, but it should form a tilted saddle shape.
I got this far with the regression plot, but it’s definitely not right; it’s not accounting for changes in the y-axis. I’ve verified that the regression is working when changing the y variable, but I’m doing something wrong with the plotting process.
Fixed!
I finally got the plot to display correctly. I don't entirely understand how it works, since I'm unfamiliar with the specifics of the numpy.meshgrid() and Axes3D.plot_wireframe() methods. This post helped me figure out how to organize my Z-array. Visualization is not the most important part of this project, so I suppose it's not a big deal if I do a hack job on the plotting scripts.
Anyways, you can see the saddle shape now. The red surface represents a least squares regression on 5th degree polynomial features of the data. I can now take a theoretical temperature and time input (and convert that to minutes since midnight) and return an estimated power usage. I could do this for as many input variables as I wanted now, but I wouldn’t be able to really visualize that nicely (I had a hard enough time plotting in 3D). Pretty cool!
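For reference, here is a minimal sketch of how the Z-array for plot_wireframe can be organized, with synthetic stand-ins for the real (temperature, minutes-since-midnight, power) data:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection)
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Made-up data in place of the real measurements
rng = np.random.RandomState(0)
X = rng.rand(100, 2) * [30, 1440]   # temperature in [0, 30), minutes in [0, 1440)
y = 0.1 * X[:, 0] + np.sin(2 * np.pi * X[:, 1] / 1440)

poly = PolynomialFeatures(degree=5)
reg = LinearRegression().fit(poly.fit_transform(X), y)

# meshgrid turns two 1-D axes into matching 2-D coordinate arrays;
# predict at every grid point, then reshape the flat predictions back
# to the grid's 2-D shape -- that reshaped array is the Z-array
gx, gy = np.meshgrid(np.linspace(0, 30, 30), np.linspace(0, 1440, 30))
grid = np.column_stack([gx.ravel(), gy.ravel()])
Z = reg.predict(poly.transform(grid)).reshape(gx.shape)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], y)
ax.plot_wireframe(gx, gy, Z, color='red')
```

The key step is the ravel/reshape round trip: the regression only sees flat rows of (x1, x2) pairs, but plot_wireframe needs Z in the same 2-D shape that meshgrid produced.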
To do
I'm still using a basic linear regression technique for the estimator. I'm fine with a polynomial fit for now, but I might want to look at other estimator options (e.g. LassoCV, a regularized linear estimator that has cross-validation built in).
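Swapping in LassoCV would be a one-line change on top of the same polynomial features; a rough sketch (synthetic sine data as before, not the project's real data):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel()

X_ = PolynomialFeatures(degree=5).fit_transform(X)

# LassoCV adds an L1 penalty and picks its strength alpha by cross-validation,
# which tends to shrink unneeded polynomial coefficients toward zero
reg = LassoCV(cv=5).fit(X_, y)
print(reg.alpha_)
```

Note that Lasso can be sensitive to feature scale, and raw powers of $x$ span several orders of magnitude, so scaling the expanded features first would likely help.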