Using numpy arrays as data source may lead to errors if inferred encoding is used #193

@PGijsbers

Description

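from gama import GamaClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Assumed setup: the original snippet does not show how `gama` was created.
gama = GamaClassifier()

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Add checks on individuals (reproducibility)
gama.fit(X_train, y_train)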

GAMA infers that some features are categorical (this is the expected behavior of inferred encoding, though the inference is incorrect for this dataset).
Encoding those features creates new feature names, so the names end up as a mix of int and str, e.g.: ['1_1', '1_2', 2, 3, ...]
This results in an error during evaluation: <class 'TypeError'> Feature names are only supported if all input features have string name.
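
The mixed names described above can be reproduced outside GAMA. The following is a minimal sketch, not GAMA's internal code: the column names are hypothetical and only mirror the example list, and the data values are made up for illustration.

```python
import pandas as pd

# Hypothetical column names mirroring the example above: encoded features
# get string names, untouched features keep their integer positions.
mixed = pd.DataFrame([[0, 1, 5.0, 7.0]], columns=["1_1", "1_2", 2, 3])

# Collect the distinct Python types used as column names.
name_types = sorted({type(c).__name__ for c in mixed.columns})
print(name_types)

# Casting every name to str is one way to make the names uniform again.
uniform = mixed.rename(columns=str)
print(list(uniform.columns))
```

A frame whose column names mix `int` and `str` like this is what the evaluation step rejects with the `TypeError` quoted above.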

Postponing on fixing this until #169 is merged.

For people encountering issues with this behavior, please use pandas dataframes for now.
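
As a sketch of that workaround, the arrays can be wrapped in a DataFrame with explicit string column names before calling `fit`. The helper name `to_named_frame` and the `x` prefix are hypothetical, and this assumes that string-only column names are enough to avoid the mixed-name error.

```python
import numpy as np
import pandas as pd


def to_named_frame(X, prefix="x"):
    """Wrap a 2-D array in a DataFrame whose column names are all strings."""
    X = np.asarray(X)
    columns = [f"{prefix}{i}" for i in range(X.shape[1])]
    return pd.DataFrame(X, columns=columns)


# Small stand-in array; in the report above this would be X_train.
X_demo = np.arange(6).reshape(3, 2)
X_frame = to_named_frame(X_demo)
print(list(X_frame.columns))
```

With the data from the report, this would be used as `gama.fit(to_named_frame(X_train), y_train)`; the target can stay as it is or be wrapped in a `pd.Series`.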
