Day 15

Jun 26, 2017 · 387 words

Today we got to hold our first real team meeting. Frank and Ajay joined me at the library, and I spent time explaining some of the work I had done and sharing the basics of how to use Pandas. For now I’m suggesting they spend time setting up Anaconda and Pandas and get a feel for how to use DataFrame and Series objects.
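For anyone on the team getting started, here is a minimal sketch of how the two object types relate. The readings and the column names `hour` and `kwh` are made up for illustration and are not data from this project.

```python
import pandas as pd

# Made-up hourly power readings (kWh); not data from this project.
usage = pd.Series([1.2, 0.9, 1.5, 3.8], name="kwh")

# A DataFrame is a table whose columns are Series sharing one index.
df = pd.DataFrame({
    "hour": [0, 1, 2, 3],
    "kwh": usage,
})

print(type(df["kwh"]))      # selecting one column gives back a Series
print(usage.mean())         # a Series carries its own summary methods
print(df[df["kwh"] > 1.0])  # a boolean mask keeps only the matching rows
```

Selecting a column, summarizing it, and filtering rows with a mask cover most of what is needed to start exploring a dataset.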

In the meantime I’ll continue working with probability distribution functions.

Kolmogorov–Smirnov test

Over the weekend I came across a statistical test designed for comparing probability distributions. It’s called the Kolmogorov–Smirnov test, or K-S test for short.

This page warns that the test is not valid when the distribution is estimated. I assume “the distribution” refers to the reference distribution being compared against. I am unsure how this applies to our situation, since in a way we are estimating the underlying normal distribution from the data itself. This other page also says not to use the K-S test when the expected normal distribution is estimated from the sample, and recommends the Lilliefors test for that case instead. I think that’s enough to convince me that the K-S test is not the best option.
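To make the difference concrete, here is a rough sketch of the idea behind the Lilliefors test, using only the standard library. It computes the usual K-S distance against a normal distribution fitted to the sample, then judges that distance against simulated normal samples that were fitted the same way, instead of against the standard K-S table. The function names, sample sizes, seeds, and the two synthetic samples are my own assumptions for illustration. A real analysis would more likely rely on a library implementation than on this Monte Carlo approximation.

```python
import random
from statistics import NormalDist, fmean, stdev

def ks_statistic_fitted_normal(sample):
    """Largest gap between the empirical CDF and a normal CDF whose
    mean and standard deviation are estimated from the same sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def lilliefors_p_value(sample, n_sim=1000, seed=0):
    """Monte Carlo p-value: how often does a truly normal sample of the
    same size give a statistic at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = ks_statistic_fitted_normal(sample)
    n = len(sample)
    hits = 0
    for _ in range(n_sim):
        sim = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if ks_statistic_fitted_normal(sim) >= observed:
            hits += 1
    return observed, hits / n_sim

# Synthetic samples for illustration only.
rng = random.Random(42)
normal_like = [rng.gauss(50.0, 5.0) for _ in range(100)]
skewed = [rng.expovariate(1 / 50.0) for _ in range(100)]

d_norm, p_norm = lilliefors_p_value(normal_like)
d_skew, p_skew = lilliefors_p_value(skewed)
print("normal-like:", d_norm, p_norm)
print("skewed:     ", d_skew, p_skew)
```

Simulating from a standard normal is enough here because the statistic does not change when the data are shifted or rescaled, as long as the mean and standard deviation are re-estimated each time.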

Other distribution tests

The K-S test is just one of many statistical methods for comparing data distributions. Here’s a list of other such tests. It’s pretty overwhelming how many options there are, and each test has a very specific purpose. I’m going to spend some time reading up on each of them.

Research paper on abnormal stock trading

I came across this interesting research study on analyzing abnormal stock trading patterns. It’s a lot of dense information to sift through, but it could prove very useful. It seems to tackle a problem very similar to ours.

Non-parametric tests

I have learned to distinguish between parametric and non-parametric tests. Parametric tests assume the data follow a particular family of distributions, most often the normal distribution, while non-parametric ones make no such assumption. So far you could say my work has been more like the former, since I assume that power usage should be normally distributed. I should emphasize that this is not set in stone, and I am very open to trying non-parametric methods since I don’t know what the ideal population distribution should look like.
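As one example of what a non-parametric test looks like, here is a minimal sketch of the Mann-Whitney U test, which compares two samples using only the ranks of the values and so never assumes a normal shape. This is not a method I have settled on. The function name and the two sample lists are hypothetical, and the p-value uses the plain large-sample normal approximation with no tie or continuity correction, so it is only rough.

```python
from statistics import NormalDist

def mann_whitney_u(a, b):
    """Return (U, two-sided p) via the large-sample normal approximation.
    No tie or continuity correction, so treat the p-value as rough."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    # Assign average ranks so that tied values share the same rank.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    p = 2 * NormalDist().cdf(z)  # z <= 0 because u is the smaller U value
    return u, p

# Hypothetical readings for two groups of days; illustration only.
group_a = [10, 11, 12, 13, 14, 15, 16, 17]
group_b = [20, 21, 22, 23, 24, 25, 26, 27]
u_stat, p_val = mann_whitney_u(group_a, group_b)
print(u_stat, p_val)
```

Because only the ordering of the values matters, the result would be the same if the readings were replaced by any other numbers in the same order.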

I have more to talk about but not enough time to write it all down. More to come tomorrow.