Hi FedML team 👋,
First, thanks for the great framework! I've been experimenting with the examples in FedML/python/examples/federate/simulation and I want to add a new dataset for my research. Apologies if this is a basic question — I'm new to the framework.
I noticed that under FedML/python/fedml/data/, each dataset folder (e.g., MNIST, CIFAR-10, etc.) has its own custom data-loading logic — partitioning, preprocessing, dataloader definitions, and so on.
This makes it hard to reuse a consistent data-loading pipeline for new image datasets, or to standardize partitioning methods (IID, non-IID, Dirichlet) across datasets.
Is this the intended design (for flexibility)? Or would it make sense to provide a standardized dataset interface / utility functions for common tasks like:
- Splitting image datasets (IID, Dirichlet, shard, etc.)
- Handling train/test and client_local partitioning
- Registering new datasets in a unified way
This would make it much easier for researchers to add new datasets and compare results across them. Alternatively, is there any documentation or tutorial for this (e.g., where I should add or modify code in order to run experiments with a new dataset)?
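To illustrate what I mean by a reusable partitioning utility, here is a rough sketch of a generic Dirichlet label-skew split (this is just my own example with NumPy, not FedML's actual implementation — the function name and signature are hypothetical):

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew.

    Smaller alpha -> more non-IID (each client dominated by a few classes).
    Hypothetical helper for illustration, not FedML's actual API.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # sample the fraction of this class that each client receives
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# toy example: 1000 samples, 10 classes, 5 clients
labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
```

A shared utility like this (plus IID and shard-based splits) could then be applied uniformly to any new image dataset.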
Thanks a lot!