[Question/Feedback] Difficulty Adding New Datasets for Research

Hi FedML team 👋,

First, thanks for the great framework! I’ve been experimenting with the examples in FedML/python/examples/federate/simulation and would like to add a new dataset for my research. Apologies if this is a basic question; I’m new to FedML.

I noticed that under FedML/python/fedml/data/, each dataset folder (e.g., MNIST, CIFAR-10, etc.) seems to have its own custom data loading logic — partitioning, preprocessing, dataloader definitions, etc.

This makes it a bit hard to reuse a consistent data-loading pipeline for new image datasets or standardize partitioning methods (IID, non-IID, Dirichlet) across datasets.

Is this the intended design (for flexibility)? Or would it make sense to provide a standardized dataset interface or utility functions for common tasks such as:
- Splitting image datasets (IID, Dirichlet, shard, etc.)
- Handling train/test and client_local partitioning
- Registering new datasets in a unified way
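
To make the request more concrete, here is a rough sketch of the kind of dataset-agnostic partitioning helper I have in mind. This is not FedML's API: the names `partition_iid`, `partition_dirichlet`, `PARTITIONERS`, and `partition` are hypothetical, and the sketch assumes only that the dataset's labels are available as a 1-D array and that NumPy is installed.

```python
import numpy as np


def partition_iid(labels, num_clients, seed=0):
    """Shuffle all sample indices and split them into near-equal chunks."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    return [np.sort(part) for part in np.array_split(idx, num_clients)]


def partition_dirichlet(labels, num_clients, alpha=0.5, seed=0):
    """Label-skewed split: for each class, draw client proportions from
    Dirichlet(alpha) and hand out that class's samples accordingly.
    Smaller alpha gives more heterogeneous (non-IID) clients."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        # Indices of all samples of class c, in random order.
        idx = rng.permutation(np.flatnonzero(labels == c))
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn cumulative proportions into split points within idx.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].extend(part.tolist())
    return [np.sort(np.array(ix, dtype=int)) for ix in client_idx]


# Hypothetical registry so every dataset can share the same partitioners.
PARTITIONERS = {"iid": partition_iid, "dirichlet": partition_dirichlet}


def partition(labels, num_clients, method="iid", **kwargs):
    """Return a list with one array of sample indices per client."""
    return PARTITIONERS[method](labels, num_clients, **kwargs)
```

With something like this, a new image dataset would only need to expose its labels, e.g. `parts = partition(train_labels, num_clients=10, method="dirichlet", alpha=0.3)`, and the per-client index arrays could then be wrapped in whatever dataloader the dataset already uses. A shard-based partitioner could be added to the same registry.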

This would make it much easier for researchers to add new datasets and compare results across them. Alternatively, is there any documentation or tutorial for this (e.g., where I should add or modify code in order to use a new dataset in experiments)?

Thanks a lot! 

[Question/Feedback] Difficulty Adding New Datasets for Research #2259
