Day 33
Jul 21, 2017 · 471 words
Polynomial regression as linear regression
There is no dedicated polynomial regression estimator in sklearn. Zico Kolter goes over this in his lecture: polynomial regression is really an extension of linear regression, because the additional feature terms ($x^2$, $x^3$, … $x^n$) can be treated as ordinary variables with linear coefficients. Scikit-learn has you create a PolynomialFeatures object of a particular degree, then use its fit_transform method to generate the feature matrix.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

# Sample 40 points from a sine curve
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()
x_plot = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)

# Expand the single input column into polynomial features [1, x, ..., x^5]
poly = PolynomialFeatures(degree=5)
X_ = poly.fit_transform(X)
x_plot_ = poly.transform(x_plot)  # reuse the already-fitted transformer

# An ordinary linear regression on the expanded features
reg = linear_model.LinearRegression()
reg.fit(X_, y)

plt.scatter(X, y)
plt.plot(x_plot, reg.predict(x_plot_))
That example fits a 5th degree polynomial to a sinusoidal sample.
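To make the "extra columns" idea concrete, here is a minimal sketch of what PolynomialFeatures actually produces for a single input column (the toy values here are my own, not from the original post):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One input column expanded to degree 3: columns are [1, x, x^2, x^3]
X = np.array([[2.0], [3.0]])
poly = PolynomialFeatures(degree=3)
X_expanded = poly.fit_transform(X)
print(X_expanded)
# [[ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```

Each power of $x$ becomes its own column, so a plain linear regression over these columns is exactly a cubic fit in the original variable.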
More Dimensions
I know how to perform a regression on one-dimensional input, but it would be cool to try adding a dimension. There’s no real extra coding involved (other than passing in an array of $x_i \in \mathbb{R}^n$ as opposed to $x_i \in \mathbb{R}^1$) but the hard part is actually showing the results…
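The "no real extra coding" claim can be sketched as follows, using synthetic two-column data of my own invention. The only visible change is that PolynomialFeatures now also emits cross terms between the two variables:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(40, 2) * 5              # two input columns instead of one
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]  # made-up target

poly = PolynomialFeatures(degree=5)
X_ = poly.fit_transform(X)           # now includes cross terms like x1*x2, x1^2*x2, ...
reg = LinearRegression().fit(X_, y)

# All monomials of total degree <= 5 in 2 variables: C(5+2, 2) = 21 columns
print(X_.shape)  # (40, 21)
```

The fitting call is identical; only the feature matrix got wider.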
3D Visualization
I’m using this resource to try out 3D plotting in Matplotlib. I’m going to start by just making a scatterplot of the data, looking both at the observed temperature for each point’s day and the number of minutes since midnight:
It’s hard to see here, but it should form a tilted saddle shape.
I got this far with the regression plot, but it’s definitely not right; it’s not accounting for changes in the y-axis. I’ve verified that the regression is working when changing the y variable, but I’m doing something wrong with the plotting process.
Fixed!
I finally got the plot to display correctly. I don't entirely understand how it works, since I'm unfamiliar with the specifics of the numpy.meshgrid() and Axes3D.plot_wireframe() methods. This post helped me figure out how to organize my Z-array. Visualization is not the most important part of this project, so I suppose it's not a big deal if I do a hack job on the plotting scripts.
Anyways, you can see the saddle shape now. The red surface represents a least squares regression on 5th degree polynomial features of the data. I can now take a theoretical temperature and time input (and convert that to minutes since midnight) and return an estimated power usage. I could do this for as many input variables as I wanted now, but I wouldn’t be able to really visualize that nicely (I had a hard enough time plotting in 3D). Pretty cool!
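For reference, here is a minimal sketch of how the Z-array for plot_wireframe can be organized, with synthetic stand-ins for the real (temperature, minutes-since-midnight, power) data:

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection)
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Made-up data in place of the real measurements
rng = np.random.RandomState(0)
X = rng.rand(100, 2) * [30, 1440]   # temperature in [0, 30), minutes in [0, 1440)
y = 0.1 * X[:, 0] + np.sin(2 * np.pi * X[:, 1] / 1440)

poly = PolynomialFeatures(degree=5)
reg = LinearRegression().fit(poly.fit_transform(X), y)

# meshgrid turns two 1-D axes into matching 2-D coordinate arrays;
# predict at every grid point, then reshape the flat predictions back
# to the grid's 2-D shape -- that reshaped array is the Z-array
gx, gy = np.meshgrid(np.linspace(0, 30, 30), np.linspace(0, 1440, 30))
grid = np.column_stack([gx.ravel(), gy.ravel()])
Z = reg.predict(poly.transform(grid)).reshape(gx.shape)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], y)
ax.plot_wireframe(gx, gy, Z, color='red')
```

The key step is the ravel/reshape round trip: the regression only sees flat rows of (x1, x2) pairs, but plot_wireframe needs Z in the same 2-D shape that meshgrid produced.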
To do
I'm still using a basic linear regression technique for the estimator. I'm fine with a polynomial fit for now, but I might want to look at other estimator options (e.g. LassoCV, a regularized linear estimator that has cross-validation built in).
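Swapping in LassoCV would be a one-line change on top of the same polynomial features; a rough sketch (synthetic sine data as before, not the project's real data):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(40, 1), axis=0)
y = np.sin(X).ravel()

X_ = PolynomialFeatures(degree=5).fit_transform(X)

# LassoCV adds an L1 penalty and picks its strength alpha by cross-validation,
# which tends to shrink unneeded polynomial coefficients toward zero
reg = LassoCV(cv=5).fit(X_, y)
print(reg.alpha_)
```

Note that Lasso can be sensitive to feature scale, and raw powers of $x$ span several orders of magnitude, so scaling the expanded features first would likely help.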