How to use featuretools at the test time? It seems featuretools' feature definitions do not store train time statistics to accurately apply primitives like 'PERCENTILE' at the test time #2697

@nitinmnsn

I am creating a GitHub issue for better visibility. I have also asked the same question on StackOverflow.

I will demonstrate the issue with an example.

Let us say we want to use the 'PERCENTILE' primitive.

Imports:

import pandas as pd
import featuretools as ft

For training (create a simple dataset with one column and let featuretools compute a percentile feature on top of it):

df_train = pd.DataFrame({'index':[1,2,3,4,5], 'val':[1,2,3,4,5]})
es_train = ft.EntitySet("es_train")
es_train.add_dataframe(df_train,'df')
fm, fl = ft.dfs(entityset = es_train, trans_primitives=['percentile'], agg_primitives=[], target_dataframe_name='df')

output:

print(fm)
       val  PERCENTILE(val)
index                      
1        1              0.2
2        2              0.4
3        3              0.6
4        4              0.8
5        5              1.0

So far, everything is as expected.

Now suppose I get an example with the value 3 at test time. I would want it translated to 0.6, as per the training data. But that is not what happens:

df_test = pd.DataFrame({'index':[1], 'val':[3]})
es_test = ft.EntitySet("es_test")
es_test.add_dataframe(df_test,'df')
ft.calculate_feature_matrix(features = fl, entityset=es_test)

output:

       val  PERCENTILE(val)
index                      
1        3              1.0

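So the feature definitions in fl (the second output of ft.dfs) do not store the train-time statistics needed to compute the features at test time. The percentile is recomputed from the test data alone, so a single test row always gets 1.0, whatever its value. This would throw any machine-learning model into a tailspin.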
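The two outputs above can be reproduced with plain pandas. This is only a sketch, and it assumes that 'PERCENTILE' ranks values within whatever column it is given, the way pandas' rank(pct=True) does. That assumption matches both outputs shown here but has not been checked against the featuretools source. The names train_vals and test_vals are made up for the illustration.

```python
import pandas as pd

# Same values as df_train['val'] and df_test['val'] in the example above.
train_vals = pd.Series([1, 2, 3, 4, 5])
test_vals = pd.Series([3])

# Ranking within the training column reproduces the training output.
print(train_vals.rank(pct=True).tolist())  # [0.2, 0.4, 0.6, 0.8, 1.0]

# Ranking the single test row on its own: it is the only value,
# so it lands at the top of its own distribution.
print(test_vals.rank(pct=True).tolist())  # [1.0]
```

Nothing from the training column takes part in the second call, which is why the value 3 comes out as 1.0 instead of 0.6.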

What is the canonical way to apply featuretools at test time?
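To make the expected behavior concrete, here is a minimal sketch done outside featuretools with NumPy. TrainPercentile is a hypothetical helper, not a featuretools API, and it is not being proposed as the canonical answer. It stores the sorted training values and, at test time, reports the fraction of training values that are less than or equal to each new value. It assumes there are no missing values.

```python
import numpy as np
import pandas as pd


class TrainPercentile:
    """Hypothetical helper (not part of featuretools): percentile of new
    values relative to a stored training distribution."""

    def fit(self, values):
        # Keep the sorted training values as the train-time statistic.
        self.sorted_ = np.sort(np.asarray(values, dtype=float))
        return self

    def transform(self, values):
        values = np.asarray(values, dtype=float)
        # Number of training values <= each new value.
        counts = np.searchsorted(self.sorted_, values, side="right")
        return counts / len(self.sorted_)


df_train = pd.DataFrame({"index": [1, 2, 3, 4, 5], "val": [1, 2, 3, 4, 5]})
df_test = pd.DataFrame({"index": [1], "val": [3]})

pct = TrainPercentile().fit(df_train["val"])
df_test["PERCENTILE(val)"] = pct.transform(df_test["val"])
print(df_test)  # the test value 3 maps to 0.6
```

With tied training values this would differ slightly from pandas' default rank(pct=True), which averages the ranks of ties, whereas the sketch counts every tied value as less than or equal.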
