Day 52
Aug 18, 2017 · 295 words
Last time, I left the `SingleRFModel` and `MultiRFModel` implementations as a rather tangled collection of methods. I wasn't satisfied with the overall organization, and I also didn't like the disjointedness between the process of training on old window sets and the process of predicting new ones.
Old setup
- All training feature matrices are created independently from the X and y window lists
- The training feature matrices are merged along the column axis
- Prediction inputs are created as a whole feature matrix from a single point in time
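The column-axis merge in the old setup can be sketched roughly like this, assuming NumPy arrays; `make_features` is an illustrative stand-in for the per-window feature extraction, not an actual method from the model:

```python
import numpy as np

def make_features(window, n_feats):
    """Toy stand-in for building one training feature matrix
    independently from a window list (one row per window entry)."""
    return np.array(
        [[w * (j + 1) for j in range(n_feats)] for w in window],
        dtype=float,
    )

x_windows = [[1.0, 2.0], [3.0, 4.0]]  # toy input windows

# Each feature matrix is created independently...
per_window = [make_features(w, n_feats=3) for w in x_windows]

# ...then merged along the column axis into one wide training matrix.
X_train = np.hstack(per_window)
print(X_train.shape)  # (2, 6)
```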
Proposed setup
- Prediction inputs are created as a whole feature matrix from a single point in time
- Training feature matrices are created using the above method
Unfortunately, the first method is noticeably faster than the latter. The alternative is to instead make the `_input_vector` function a special case of the `_get_feats` call with only one pair of input/output windows. That's what I ended up doing. The `_get_training_arrays` method is still the slowest part of the training process, probably due to all the list-comprehension loops. At some point I want to see if there's a more efficient way to generate the training arrays from the training windows.
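A minimal sketch of that refactor: the single-point prediction input reuses the general feature-building routine with exactly one window. The method names mirror the post, but the bodies here are illustrative assumptions, not the real implementation:

```python
import numpy as np

def _get_feats(input_windows):
    """Build a feature matrix with one row per input window.
    (Illustrative features: mean, min, max of each window.)"""
    return np.array([[np.mean(w), np.min(w), np.max(w)] for w in input_windows])

def _input_vector(window):
    """Prediction input as the special case of a single window:
    call _get_feats with a one-element window list and take row 0."""
    return _get_feats([window])[0]

windows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
X = _get_feats(windows)                  # training features, shape (2, 3)
x_new = _input_vector([7.0, 8.0, 9.0])  # prediction features, shape (3,)
print(X.shape, x_new.shape)
```

One code path now serves both training and prediction, which removes the disjointedness at the cost of the extra generality in the single-window case.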
I also finished the column-specific features implementation for `MultiRFModel` and verified that it works. I switched from storing the submodels and their specific feature tables in a list to a dictionary (with column names as keys), so you don't need placeholders when only certain columns have specific features.
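The benefit of the list-to-dict switch can be sketched like this; the structure and names are assumptions for illustration, and the real `MultiRFModel` internals may differ:

```python
# Old shape: parallel lists required a placeholder (None) for every
# column, even ones with no column-specific features.
columns = ["load", "temp", "humidity"]
extra_feats_list = [["holiday_flag"], None, None]  # placeholders needed

# New shape: a dict keyed by column name — columns without specific
# features are simply omitted, no placeholders required.
extra_feats = {"load": ["holiday_flag"]}

for col in columns:
    feats = extra_feats.get(col, [])  # empty list when nothing specific
    print(col, feats)
```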
Along the way I ironed out a lot of small bugs (like incompatibilities between certain window sizes and feature frequencies).
Lastly I continued working on the documentation for the classes and their methods. You can find the most recent version in the energize module.