Commit f11a1ce

committed
add dataflex doc sample
1 parent be18a54 commit f11a1ce

15 files changed

+271
-50
lines changed

docs/.vuepress/navbars/en/index.ts

Lines changed: 3 additions & 3 deletions
@@ -35,17 +35,17 @@ export const enNavbar = defineNavbarConfig([
     ]
   },
   {
-    text: 'Start with Dataflow',
+    text: 'Dataflex Selector',
     items: [
       {
         text: 'Installation',
-        link: '/en/notes/guide/quickstart/install.md',
+        link: '/en/notes/guide/selector/install.md',
         icon: 'material-symbols-light:download-rounded',
         activeMatch: '^/guide/'
       },
       {
         text: 'Quick Start',
-        link: '/en/notes/guide/quickstart/quickstart.md',
+        link: '/en/notes/guide/selector/tutorial.md',
         icon: 'solar:flag-2-broken',
         activeMatch: '^/guide/'
       }

docs/.vuepress/notes/en/guide.ts

Lines changed: 4 additions & 4 deletions
@@ -13,17 +13,17 @@ export const Guide: ThemeNote = defineNoteConfig({
       items: [
         'intro',
         'framework',
+        'install',
       ],
     },
     {
-      text: 'Start with Dataflex',
+      text: 'Dataflex Selector',
       collapsed: false,
       icon: 'carbon:idea',
-      prefix: 'quickstart',
+      prefix: 'selector',
       items: [
-        'install',
         'quickstart',
-        'translation',
+        'tutorial',
       ],
     },
   ],

docs/.vuepress/notes/zh/guide.ts

Lines changed: 2 additions & 2 deletions
@@ -13,6 +13,7 @@ export const Guide: ThemeNote = defineNoteConfig({
       items: [
         'intro',
         'framework',
+        'install',
       ],
     },
     {
@@ -21,9 +22,8 @@ export const Guide: ThemeNote = defineNoteConfig({
       icon: 'carbon:idea',
       prefix: 'quickstart',
       items: [
-        'install',
         'quickstart',
-        'translation',
+        'tutorial',
       ],
     },
   ],

docs/en/notes/guide/basicinfo/framework.md

Lines changed: 2 additions & 1 deletion
@@ -5,4 +5,5 @@ createTime: 2025/06/13 14:59:56
 permalink: /en/guide/basicinfo/framework/
 ---
 
-# Framework Design
+# Framework Design
+
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+title: Installation
+icon: material-symbols-light:download-rounded
+createTime: 2025/06/09 10:29:31
+permalink: /en/guide/install/
+---
+# Installation
+You can use the following commands to install dataflex.
+
+```bash
+git clone https://github.com/OpenDCAI/DataFlex-Preview.git
+cd DataFlex-Preview
+pip install -e .
+pip install llamafactory
+```

docs/en/notes/guide/quickstart/install.md

Lines changed: 0 additions & 7 deletions
This file was deleted.

docs/en/notes/guide/quickstart/quickstart.md

Lines changed: 0 additions & 8 deletions
This file was deleted.

docs/en/notes/guide/quickstart/translation.md

Lines changed: 0 additions & 9 deletions
This file was deleted.
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+---
+title: Quick Start
+createTime: 2025/06/30 19:19:16
+permalink: /en/guide/quickstart/
+icon: solar:flag-2-broken
+---
+
+# Quick Start
+
+The launch command is similar to [LlamaFactory](https://github.com/hiyouga/LLaMA-Factory).
+Below is an example using [LESS](https://arxiv.org/abs/2402.04333) :
+
+```bash
+FORCE_TORCHRUN=1 DISABLE_VERSION_CHECK=1 dataflex-cli train examples/train_lora/less.yaml
+```
+
+Unlike vanilla LlamaFactory, your `.yaml` config file must also include **DataFlex-specific parameters**.
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+---
+title: Contribute to Dataflex Selector
+createTime: 2025/06/30 19:19:16
+permalink: /en/guide/translation/
+icon: basil:lightning-alt-outline
+---
+
+# Less Algorithm
+
+This document will detail how to add and configure a custom data selector in the DataFlex framework, enabling dynamic sample selection during the training process, using `custom_selector` as an example.
+## Step 1: Create the Selector Implementation File
+
+First, create a new Python file in the specified project path to implement the core logic of your custom selector.
+
+1. **File Path**: `DataFlex-Preview/src/dataflex/train/selector/custom_selector.py`
+2. **File Content**: In this file, define a new class `CustomSelector` that inherits from `dataflex.train.selector.base_selector.Selector`.
+
+```python
+from dataflex.core.registry import register_selector
+from .base_selector import logger, Selector
+
+@register_selector('custom')
+class CustomSelector(Selector):
+    """
+    An example implementation of a custom data selector.
+    """
+    def __init__(
+        self,
+        dataset,
+        accelerator,
+        data_collator,
+        cache_dir,
+    ):
+        """
+        Constructor for initializing the selector.
+        """
+        super().__init__(dataset, accelerator, data_collator, cache_dir)
+        logger.info(f"CustomSelector initialized.")
+
+    def select(self, model, step_id: int, num_samples: int, **kwargs):
+        """
+        The core selection logic.
+        This method defines how to select samples from the dataset.
+
+        Args:
+            model: The current model.
+            step_id (int): The current training step.
+            num_samples (int): The number of samples to select.
+
+        Returns:
+            list: A list of indices of the selected samples.
+        """
+        # Example logic: simply return a list of indices from 0 to num_samples-1.
+        # You can implement more complex selection algorithms here.
+        return list(range(num_samples))
+```
+
+### Key Points Explanation:
+
+* `@register_selector('custom')`: This decorator registers your `CustomSelector` class into the DataFlex framework and assigns it a unique name, `custom`. This name will be used in configuration files later.
+* `CustomSelector(Selector)`: Your custom class must inherit from the `Selector` base class provided by the framework.
+* `__init__`: The constructor is used to perform necessary initialization tasks. It calls `super().__init__(...)` to ensure that the base class initialization logic is executed correctly.
+* `select`: This is the core method where you implement your data selection algorithm. You should override this method according to your needs.
+* `warmup` (optional): You can also override the `warmup` method if you need to select data for the warmup phase of training. By default, data is randomly sampled during the warmup phase.
+
+## Step 2: Import the New Module
+
+In order for DataFlex to recognize and load your newly created selector, you need to edit the `__init__.py` file in this directory to expose your new module.
+
+1. **File Path**: `DataFlex-Preview/src/dataflex/train/selector/__init__.py`
+2. **Add Content**: Add the following line at the end of the file to import the `CustomSelector` class.
+
+```python
+from .custom_selector import *
+```
+
+## Step 3: Configure the Selector Parameters
+
+Finally, define your new selector and its parameters in a YAML configuration file so it can be easily called during experiments.
+
+1. **File Path**: `DataFlex-Preview/src/dataflex/configs/components.yaml`
+2. **Add Configuration**: Under the `selectors` configuration block, add a new entry for your `custom` selector.
+
+```yaml
+selectors:
+  ...
+  # Add your custom selector configuration
+  custom:
+    name: custom
+    params:
+      cache_dir: ../dataflex_saves/custom_output
+  ...
+```
+
+### Key Points Explanation:
+
+* `params::` All parameters defined under this block will be passed as keyword arguments to the `__init__` constructor of the `CustomSelector` class. For example, the value of `cache_dir` here will be passed to the `cache_dir` parameter of the `__init__` method.
+
+```
+
+This English version of the tutorial can be used directly for documentation purposes, README files, or other resources where an English version is required.
+```
