Description
It often happens that the time index is missing for some data entries even though their other features are present. The time index is still present for most rows, so time-based features could and should be computed.
I cannot find a way to create such features using featuretools. All the created features are NaN.
Example:
from datetime import timedelta

import featuretools as ft
import numpy as np
import pandas as pd
from featuretools import Timedelta

# 3-day training window and the first timestamp of the series
tw = Timedelta(3, unit="d")
ts = pd.to_datetime("01-01-2020 01:00:00")

# 8 evenly spaced timestamps over 168 hours; one of them is then set to missing
time_index = list(pd.date_range(ts, ts + timedelta(hours=168), periods=8, inclusive='both'))
time_index[4] = np.nan#np.datetime64("NaT")

# Parent dataframe (one instance) and child dataframe (8 observations of that instance)
df1 = pd.DataFrame({'ind': [1], 'time': ts})
df2 = pd.DataFrame({'ind': [1, 2, 3, 4, 5, 6, 7, 8], 'id': [1, 1, 1, 1, 1, 1, 1, 1], 'time': time_index, 'feat': [np.nan, 1, 2, 4, 8, 16, 48, 144]})

es = ft.EntitySet('es')
es.add_dataframe(df1, index='ind', dataframe_name='base', time_index='time')
es.add_dataframe(df2, index='ind', dataframe_name='data', time_index='time')
es.add_relationship('base', 'ind', 'data', 'id')

# Three cutoff times for instance 1: the last three timestamps
ct = pd.DataFrame({"instance_id": [1, 1, 1], "time": [time_index[-1], time_index[-2], time_index[-3]]})
es.add_last_time_indexes()

ft.dfs(entityset=es, target_dataframe_name='base', agg_primitives=['trend'],
       trans_primitives=[], cutoff_time=ct, cutoff_time_in_index=True,
       training_window=tw)[0].sort_index()
The output is all null features. However, if I comment out the one line that sets one of the time indices to NaN (time_index[4] = np.nan#np.datetime64("NaT")), the features are generated just fine.
How can we handle missing time indices in featuretools?
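One workaround I can think of, shown here only as a pandas-level sketch and not as anything featuretools documents, is to drop the rows whose time index is missing before adding the dataframe to the EntitySet. The names `df2_clean` and `df2_dropped` are made up for this illustration, and the sketch assumes that losing those rows is acceptable.

```python
import numpy as np
import pandas as pd

# Rebuild the child dataframe from the example (one timestamp is missing).
ts = pd.to_datetime("01-01-2020 01:00:00")
time_index = list(pd.date_range(ts, ts + pd.Timedelta(hours=168), periods=8))
time_index[4] = np.nan
df2 = pd.DataFrame({
    "ind": [1, 2, 3, 4, 5, 6, 7, 8],
    "id": [1] * 8,
    "time": time_index,
    "feat": [np.nan, 1, 2, 4, 8, 16, 48, 144],
})

# Make sure the column is datetime so the gap is represented as NaT.
df2["time"] = pd.to_datetime(df2["time"])

# Split the rows by whether their time index is known.
missing_time = df2["time"].isna()
df2_clean = df2.loc[~missing_time].reset_index(drop=True)  # hypothetical name
df2_dropped = df2.loc[missing_time]                        # hypothetical name

print(f"dropped {len(df2_dropped)} of {len(df2)} rows with a missing time index")
```

`df2_clean` could then be passed to `es.add_dataframe` in place of `df2`. This is not a real fix, though: the `feat` value of every dropped row is lost, so I would still like to know whether featuretools can handle such rows directly.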
Output of featuretools.show_info()
Featuretools version: 1.30.0
Featuretools installation directory: /home/nitin/miniconda3/envs/dst/lib/python3.10/site-packages/featuretools
SYSTEM INFO
python: 3.10.13.final.0
python-bits: 64
OS: Linux
OS-release: 6.5.0-26-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IN
LOCALE: en_IN.ISO8859-1
INSTALLED VERSIONS
numpy: 1.26.4
pandas: 2.2.1
tqdm: 4.65.0
cloudpickle: 2.2.1
dask: 2023.10.1
psutil: 5.9.8
pip: 23.3.1
setuptools: 68.2.2