---
title: Contribute to DataFlex Selector
createTime: 2025/06/30 19:19:16
permalink: /en/guide/translation/
icon: basil:lightning-alt-outline
---

# Adding a Custom Selector

This guide explains how to add and configure a custom data selector in the DataFlex framework, using `custom_selector` as an example. A custom selector lets you choose training samples dynamically during the training process.

## Step 1: Create the Selector Implementation File

First, create a new Python file at the path below to implement the core logic of your custom selector.

1. **File Path**: `DataFlex-Preview/src/dataflex/train/selector/custom_selector.py`
2. **File Content**: In this file, define a new class `CustomSelector` that inherits from `dataflex.train.selector.base_selector.Selector`.

```python
from dataflex.core.registry import register_selector
from .base_selector import logger, Selector

@register_selector('custom')
class CustomSelector(Selector):
    """
    An example implementation of a custom data selector.
    """
    def __init__(
        self,
        dataset,
        accelerator,
        data_collator,
        cache_dir,
    ):
        """
        Constructor for initializing the selector.
        """
        super().__init__(dataset, accelerator, data_collator, cache_dir)
        logger.info("CustomSelector initialized.")

    def select(self, model, step_id: int, num_samples: int, **kwargs):
        """
        The core selection logic.
        This method defines how to select samples from the dataset.

        Args:
            model: The current model.
            step_id (int): The current training step.
            num_samples (int): The number of samples to select.

        Returns:
            list: A list of indices of the selected samples.
        """
        # Example logic: simply return the indices 0 to num_samples - 1.
        # You can implement more complex selection algorithms here.
        return list(range(num_samples))
```
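
The placeholder `select` above always returns the first `num_samples` indices. As an illustration of a less trivial rule, the framework-independent sketch below keeps the samples with the highest per-sample scores, such as per-sample losses. `top_k_indices` is a hypothetical helper, and how the scores are computed from `model` is left to your algorithm; this is one example strategy, not a DataFlex API.

```python
import heapq
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], num_samples: int) -> List[int]:
    """Return the indices of the `num_samples` highest scores, best first.

    Hypothetical helper for illustration; not part of DataFlex.
    """
    # Never request more samples than there are scores.
    k = min(num_samples, len(scores))
    # Rank the indices by their score and keep the k largest.
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


# Example: four samples with made-up per-sample scores.
print(top_k_indices([0.2, 0.9, 0.5, 0.7], 2))  # [1, 3]
```

Inside `select`, you would compute one score per sample and return `top_k_indices(scores, num_samples)`; `heapq.nlargest` avoids fully sorting the scores when `num_samples` is much smaller than the dataset.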

### Key Points

* `@register_selector('custom')`: This decorator registers the `CustomSelector` class with the DataFlex framework under the unique name `custom`. You will use this name in the configuration file later.
* `CustomSelector(Selector)`: Your custom class must inherit from the framework's `Selector` base class.
* `__init__`: The constructor performs any initialization your selector needs. It calls `super().__init__(...)` so that the base class is initialized correctly.
* `select`: This is the core method, where you implement your data selection algorithm. Override it so that it returns the indices of the samples to use.
* `warmup` (optional): Override the `warmup` method if you want to control which data is selected during the warmup phase of training. By default, data is sampled randomly during warmup.
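
If you do override `warmup`, one simple option is to mimic the default random sampling. The function below is a standalone sketch: the real `warmup` signature is defined by the `Selector` base class and may differ, so `random_indices` and its `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional


def random_indices(dataset_size: int, num_samples: int, seed: Optional[int] = None) -> List[int]:
    """Sample distinct indices uniformly at random from range(dataset_size).

    Hypothetical helper for illustration; the real `warmup` signature is
    defined by DataFlex's `Selector` base class.
    """
    # A dedicated RNG leaves the global `random` state untouched and makes
    # the result reproducible when a seed is given.
    rng = random.Random(seed)
    # Sample without replacement, capped at the dataset size.
    k = min(num_samples, dataset_size)
    return rng.sample(range(dataset_size), k)
```

Passing a fixed seed makes the warmup subset reproducible across runs, which can help when comparing selectors.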

## Step 2: Import the New Module

For DataFlex to recognize and load your new selector, edit the `__init__.py` file in the same `selector` directory to expose the new module.

1. **File Path**: `DataFlex-Preview/src/dataflex/train/selector/__init__.py`
2. **Add Content**: Add the following line at the end of the file to import the `CustomSelector` class.

```python
from .custom_selector import *
```
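
The import in Step 2 matters because a class decorator runs only when the module that defines the class is executed. The sketch below illustrates this with a minimal, hypothetical name-to-class registry; it is not the actual implementation in `dataflex.core.registry`, and the duplicate-name check is an assumption added for illustration.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a selector registry: maps a name to a class.
SELECTOR_REGISTRY: Dict[str, type] = {}


def register_selector(name: str) -> Callable[[type], type]:
    """Return a class decorator that records the class under `name`."""
    def decorator(cls: type) -> type:
        if name in SELECTOR_REGISTRY:
            raise ValueError(f"selector {name!r} is already registered")
        SELECTOR_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


# The decorator runs when this class definition executes, e.g. on import.
@register_selector("custom")
class CustomSelector:
    pass


print(sorted(SELECTOR_REGISTRY))  # ['custom']
```

In this sketch, if the module were never imported, `SELECTOR_REGISTRY` would stay empty and looking up `"custom"` would raise `KeyError`.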

## Step 3: Configure the Selector Parameters

Finally, define your new selector and its parameters in the YAML configuration file so that it can be referenced by name in your experiments.

1. **File Path**: `DataFlex-Preview/src/dataflex/configs/components.yaml`
2. **Add Configuration**: Under the `selectors` block, add a new entry for your `custom` selector.

```yaml
selectors:
  ...
  # Add your custom selector configuration
  custom:
    name: custom
    params:
      cache_dir: ../dataflex_saves/custom_output
  ...
```

### Key Points

* `name: custom`: This matches the name registered with `@register_selector('custom')` in Step 1.
* `params`: All parameters defined under this block are passed as keyword arguments to the `__init__` constructor of `CustomSelector`. For example, the value of `cache_dir` here is passed to the `cache_dir` parameter of `__init__`, so each key under `params` should correspond to a parameter of the constructor.
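
To make the `params` mapping concrete, the sketch below instantiates a selector from an already-parsed config entry. It is an assumption about how such a loader could work, not DataFlex's actual code: `build_selector` is a hypothetical helper, the YAML entry is represented as a plain dict, and the runtime objects (`dataset`, `accelerator`, `data_collator`) are assumed to be supplied by the trainer rather than by the config file.

```python
from typing import Any, Dict


class CustomSelector:
    """Stand-in with the same constructor parameters as the tutorial's selector."""

    def __init__(self, dataset, accelerator, data_collator, cache_dir):
        self.dataset = dataset
        self.accelerator = accelerator
        self.data_collator = data_collator
        self.cache_dir = cache_dir


def build_selector(entry: Dict[str, Any], registry: Dict[str, type], **runtime_args: Any) -> Any:
    """Instantiate the class registered under entry['name'] (hypothetical helper)."""
    cls = registry[entry["name"]]
    params = entry.get("params") or {}
    # Runtime objects and YAML params become one set of keyword arguments;
    # a key the constructor does not accept raises TypeError.
    return cls(**runtime_args, **params)


# The `custom` entry from components.yaml, as a dict after YAML parsing.
custom_entry = {
    "name": "custom",
    "params": {"cache_dir": "../dataflex_saves/custom_output"},
}

selector = build_selector(
    custom_entry,
    {"custom": CustomSelector},
    dataset=[],
    accelerator=None,
    data_collator=None,
)
print(selector.cache_dir)  # ../dataflex_saves/custom_output
```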