[Question/Feedback] Difficulty Adding New Datasets for Research #2259

@JY132

Hi FedML team 👋,

First, thanks for the great framework! I’ve been experimenting with the examples in FedML/python/examples/federate/simulation, and I’d like to add a new dataset for my research. Apologies in advance, I’m new to this.

I noticed that under FedML/python/fedml/data/, each dataset folder (e.g., MNIST, CIFAR-10, etc.) seems to have its own custom data loading logic — partitioning, preprocessing, dataloader definitions, etc.

This makes it a bit hard to reuse a consistent data-loading pipeline for new image datasets or standardize partitioning methods (IID, non-IID, Dirichlet) across datasets.

Is this the intended design (for flexibility)? Or would it make sense to provide a standardized dataset interface / utility functions for common tasks like:

  • Splitting image datasets (IID, Dirichlet, shard, etc.)
  • Handling train/test and client_local partitioning
  • Registering new datasets in a unified way
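
To make the partitioning part of the request concrete, here is a minimal sketch of the kind of dataset-agnostic helpers I have in mind. The names `iid_partition` and `dirichlet_partition` are hypothetical and are not existing FedML functions. The sketch assumes labels are a plain sequence of integers and uses only the standard library; the Dirichlet proportions are sampled by normalizing independent Gamma(alpha, 1) draws.

```python
import random
from collections import defaultdict


def iid_partition(num_samples, num_clients, seed=0):
    """Shuffle sample indices and deal them out evenly to clients."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # Round-robin slicing keeps client sizes within one sample of each other.
    return {cid: sorted(indices[cid::num_clients]) for cid in range(num_clients)}


def dirichlet_partition(labels, num_clients, alpha=0.5, seed=0):
    """Label-skewed split: each class is divided among clients using
    proportions drawn from Dirichlet(alpha). Smaller alpha means more skew."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = {cid: [] for cid in range(num_clients)}
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)

        # Normalized Gamma(alpha, 1) draws form one Dirichlet(alpha) sample.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)

        # Turn cumulative proportions into cut points over this class.
        start, cumulative = 0, 0.0
        for cid, g in enumerate(gammas):
            cumulative += g / total
            # The last client takes the remainder so no sample is dropped.
            end = len(idxs) if cid == num_clients - 1 else int(cumulative * len(idxs))
            clients[cid].extend(idxs[start:end])
            start = end

    return {cid: sorted(v) for cid, v in clients.items()}
```

Both helpers return a dict mapping client id to a list of sample indices, so the same split could in principle be applied to any dataset that exposes its labels.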

This would make it much easier for researchers to add new datasets and compare results across them. Alternatively, is there any documentation or tutorial for this (e.g., where I should add or modify code in order to use a new dataset in experiments)?
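
For the "registering new datasets in a unified way" point, something like the following decorator-based registry is roughly what I am picturing. This is only an illustration: `DATASET_REGISTRY`, `register_dataset`, `load_dataset`, and the `toy_images` loader are all made-up names, not part of FedML, and the loader returns placeholder data rather than real images.

```python
from typing import Callable, Dict

# Hypothetical global registry: dataset name -> loader function.
DATASET_REGISTRY: Dict[str, Callable] = {}


def register_dataset(name):
    """Decorator that records a loader function under a dataset name."""
    def decorator(loader_fn):
        if name in DATASET_REGISTRY:
            raise ValueError(f"Dataset {name!r} is already registered")
        DATASET_REGISTRY[name] = loader_fn
        return loader_fn
    return decorator


def load_dataset(name, **kwargs):
    """Look up a registered loader by name and call it with kwargs."""
    try:
        loader = DATASET_REGISTRY[name]
    except KeyError:
        known = sorted(DATASET_REGISTRY)
        raise KeyError(f"Unknown dataset {name!r}; registered: {known}") from None
    return loader(**kwargs)


@register_dataset("toy_images")
def load_toy_images(num_samples=8):
    """Placeholder loader returning (features, labels) as plain lists."""
    features = [[float(i)] for i in range(num_samples)]
    labels = [i % 2 for i in range(num_samples)]
    return features, labels
```

With this shape, an experiment config would only need a dataset name, e.g. `features, labels = load_dataset("toy_images", num_samples=4)`, and the returned labels could then be passed to a shared partitioning utility.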

Thanks a lot!
