Day 52

Aug 18, 2017 · 295 words

Last time, I left the SingleRFModel and MultiRFModel implementation as a rather tangled collection of methods, and I wasn’t satisfied with how everything was organized. I also didn’t like the disconnect between how the training sets are built and how inputs for new predictions are built.

Old setup

  • All training feature matrices are created independently from the X and y window lists
  • The training feature matrices are merged along the column axis
  • Prediction inputs are created as a whole feature matrix from a single point in time

Proposed setup

  • Prediction inputs are created as a whole feature matrix from a single point in time
  • Training feature matrices are created using the above method


Unfortunately, the old setup is noticeably faster to process than the proposed one. The alternative is to make the _input_vector function a special case of the _get_feats call, with only one pair of input/output windows. That’s what I ended up doing. The _get_training_arrays method is still the slowest part of the training process, probably because of all the list comprehension loops. At some point I want to see if there’s a more efficient way to generate the training arrays from the training windows.
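To make the "special case" idea concrete, here is a minimal sketch of how a prediction input can reuse the training feature path by passing a single window pair. It is not the actual energize code: the names window_features, get_feats and input_vector, the horizon parameter, and the summary statistics used as features are all assumptions made for illustration.

```python
from statistics import mean


def window_features(x_window):
    """Hypothetical per-window features: mean, min, max and last value."""
    return [mean(x_window), min(x_window), max(x_window), x_window[-1]]


def get_feats(x_windows, y_windows):
    """Build one feature row per input window and one target row per output window."""
    features = [window_features(x) for x in x_windows]
    targets = [list(y) for y in y_windows]
    return features, targets


def input_vector(x_window, horizon):
    """Prediction input: the training path called with exactly one window pair.

    The output window is unknown at prediction time, so a placeholder of
    `horizon` None values stands in for it and is thrown away.
    """
    features, _ = get_feats([x_window], [[None] * horizon])
    return features[0]


# Toy series split into input windows of length 3 and output windows of length 2.
series = [1, 2, 3, 4, 5, 6, 7, 8]
x_windows = [series[i:i + 3] for i in range(4)]
y_windows = [series[i + 3:i + 5] for i in range(4)]

X, y = get_feats(x_windows, y_windows)   # training arrays
latest = input_vector(series[-3:], 2)    # single row for a new prediction
```

Because both paths go through the same function, a training row and a prediction row built from the same window are guaranteed to have identical features, which is the consistency the reorganization was after.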

I also finished implementing column-specific features for MultiRFModel and verified that they work. I switched from storing the submodels and their specific feature tables in a list to storing them in a dictionary (with column names as keys), so you don’t need placeholders if only certain columns have specific features.
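The difference between the two storage schemes can be sketched roughly as follows. The column names, feature names and the features_for helper are made up for illustration and are not taken from MultiRFModel itself.

```python
# Hypothetical output columns and features shared by every submodel.
columns = ["heating", "cooling", "lighting"]
default_features = ["hour", "weekday"]

# List-based storage: one slot per column, so columns without
# specific features still need a None placeholder in the right position.
specific_as_list = [None, ["outdoor_temp"], None]

# Dictionary-based storage: only the columns that need extras are listed.
specific_as_dict = {"cooling": ["outdoor_temp"]}


def features_for(column, specific, default):
    """Return the default features plus any column-specific ones."""
    return default + specific.get(column, [])


feature_sets = {
    col: features_for(col, specific_as_dict, default_features)
    for col in columns
}
```

With the dictionary, adding or removing a column no longer shifts positions, and a lookup for a column without specific features simply falls back to the defaults.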

Along the way I ironed out a lot of small bugs (such as incompatibilities between certain window sizes and feature frequencies).

Lastly I continued working on the documentation for the classes and their methods. You can find the most recent version in the energize module.
