Description
Bug Report: Data Leakage in auton_survival.estimators.SurvivalModel.fit
When weights is not passed (weights=None), the SurvivalModel.fit method trains the model on a dataset that contains the validation samples, leading to data leakage.
Cause:
A correct train/validation split is created internally within the .fit method. However, the subsequent internal call to _fit_dsm receives the original, full dataset instead of the training partition. The 'features' variable passed into '_fit_dsm' is only reassigned to the training partition inside a conditional branch that does not run when weights=None. In that case 'features' still holds the entire dataset passed to .fit, validation samples included.
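The variable-reassignment pattern described above can be reproduced in a few lines. This is a minimal sketch, not auton_survival's actual code: split, fit_buggy, fit_fixed and _fit_stub are hypothetical names, and _fit_stub only stands in for the internal _fit_dsm call by reporting which rows it would train and validate on.

```python
import random


def _fit_stub(train_rows, val_rows):
    """Hypothetical stand-in for _fit_dsm: report the rows it would use."""
    return set(train_rows), set(val_rows)


def split(rows, vsize=0.15, seed=0):
    """Shuffle rows and hold out a fraction `vsize` for validation."""
    rows = list(rows)
    rng = random.Random(seed)
    rng.shuffle(rows)
    n_val = int(len(rows) * vsize)
    return rows[n_val:], rows[:n_val]


def fit_buggy(features, vsize=0.15, weights=None):
    # The split itself is correct and stored under separate names...
    features_tr, features_val = split(features, vsize)
    if weights is not None:
        # ...but `features` is only reassigned on this branch.
        features = features_tr
    # With weights=None, `features` still contains the validation rows.
    return _fit_stub(features, features_val)


def fit_fixed(features, vsize=0.15, weights=None):
    features_tr, features_val = split(features, vsize)
    # Always train on the training partition, regardless of `weights`.
    return _fit_stub(features_tr, features_val)
```

Calling fit_buggy(range(100)) returns a training set of all 100 rows that contains every one of the 15 validation rows, while fit_fixed keeps the two sets disjoint. A check like this (do the training and validation row indices intersect?) is one way to confirm the leak in a real run.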
Impact:
This data leakage makes the reported validation loss an unreliable and overly optimistic metric. Because the validation samples are also trained on, the validation loss largely tracks the training loss and masks overfitting. Early stopping monitors this same validation loss, so it is much less likely to be triggered. As a result, models can appear more stable than they are while in reality overfitting heavily.
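The effect on early stopping can be illustrated with a generic patience-based rule. This is a sketch under stated assumptions: early_stop_epoch is a hypothetical helper (not the stopping rule auton_survival uses), and the loss curves are made-up toy numbers, not measurements. The leaked validation curve is approximated by the training curve, since leaked validation rows are also training rows.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch kept by early stopping: stop once the monitored
    loss has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Toy curves (illustrative only): training loss keeps falling,
# while loss on truly held-out rows turns upward as the model overfits.
train_loss = [1.0, 0.8, 0.65, 0.55, 0.47, 0.41, 0.36, 0.32, 0.29, 0.26]
heldout_loss = [1.0, 0.85, 0.75, 0.72, 0.74, 0.78, 0.83, 0.89, 0.95, 1.02]

# Leaked validation rows were trained on, so their loss behaves like training loss.
leaked_val_loss = train_loss

clean_stop = early_stop_epoch(heldout_loss)      # stops near the held-out minimum
leaked_stop = early_stop_epoch(leaked_val_loss)  # never triggers; keeps the last epoch
```

With these toy curves, the clean validation set keeps epoch 3, while the leaked one never stops early and keeps the final, most overfit epoch (9).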