Improve the API documentation for 62 operators in generaltext #127
Merged
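- Sample-level evaluation operators (8): BertSampleEvaluator, BleuSampleEvaluator, CiderSampleEvaluator, LangkitSampleEvaluator, LexicalDiversitySampleEvaluator, NgramSampleEvaluator, PerspectiveSampleEvaluator, PresidioSampleEvaluator
- Dataset-level evaluation operators (2): Task2VecDatasetEvaluator, VendiDatasetEvaluator
- Add complete test code, real example inputs and outputs, and detailed result analysis for each operator
- Create targeted test data files to improve documentation readability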
- Sample-level evaluators (8): BertSampleEvaluator, BleuSampleEvaluator, CiderSampleEvaluator, LangkitSampleEvaluator, LexicalDiversitySampleEvaluator, NgramSampleEvaluator, PerspectiveSampleEvaluator, PresidioSampleEvaluator
- Dataset-level evaluators (2): Task2VecDatasetEvaluator, VendiDatasetEvaluator
- Translated from the Chinese version with complete test code, real example inputs/outputs, and detailed result analysis
- All 10 operators now have comprehensive English documentation
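- AlphaWordsFilter: add test cases, example inputs/outputs, and result analysis
- BlocklistFilter: add test cases, example inputs/outputs, and result analysis
- CapitalWordsFilter: add test cases, example inputs/outputs, and result analysis
- CharNumberFilter: add test cases, example inputs/outputs, and result analysis
- ColonEndFilter: add test cases, example inputs/outputs, and result analysis
- ContentNullFilter: add test cases, example inputs/outputs, and result analysis
- CurlyBracketFilter: add test cases, example inputs/outputs, and result analysis
- HashDeduplicateFilter: add test cases, example inputs/outputs, and result analysis
- HtmlEntityFilter: add test cases, example inputs/outputs, and result analysis
- IDCardFilter: add test cases, example inputs/outputs, and result analysis

All documents have been verified by actually running the tests and include complete usage examples and detailed explanations of the results.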
- Translate AlphaWordsFilter.md
- Translate BlocklistFilter.md
- Translate CapitalWordsFilter.md
- Translate CharNumberFilter.md
- Translate ColonEndFilter.md
- Translate ContentNullFilter.md
- Translate CurlyBracketFilter.md
- Translate IDCardFilter.md
- Translate HashDeduplicateFilter.md
- Translate HtmlEntityFilter.md

All translations maintain the original structure, parameter tables, examples, and detailed documentation.
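- Remove ellipses (...) from all output-format tables
- Unify the format of the Overview section and remove hyperlinks
- Fix table formatting in 26 documents (13 Chinese, 13 English)
- Fix the Overview hyperlink in the English version of HashDeduplicateFilter
- Ensure all documents follow the unified conventions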
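- Correct the overview: change "ensures sentence length does not exceed the threshold" to "detects and filters text that lacks punctuation"
- Update the output format description: emphasize that the retained text contains appropriate punctuation
- Revise the application scenarios: highlight the filtering of text that lacks punctuation
- Improve the notes: explain that text lacking punctuation is identified by detecting overlong sentence fragments
- Sync the Chinese and English documents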
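The detection idea in the notes above (overlong sentence fragments signal missing punctuation) can be sketched roughly as follows. This is a minimal illustration, not the operator's actual implementation: the function names, the punctuation set, and the 50-word threshold are all assumptions made for this example, and whitespace word counting only suits space-delimited languages.

```python
import re

# Sentence-ending marks, ASCII and full-width. The exact set is an assumption.
_SENTENCE_END = re.compile(r"[.!?;。!?;]")


def longest_fragment_words(text: str) -> int:
    """Return the word count of the longest run of text between sentence-ending marks."""
    fragments = _SENTENCE_END.split(text)
    return max(len(fragment.split()) for fragment in fragments)


def has_enough_punctuation(text: str, max_words: int = 50) -> bool:
    """Keep text only if no unpunctuated fragment exceeds max_words (hypothetical threshold)."""
    return longest_fragment_words(text) <= max_words
```

A text made of short, punctuated sentences passes, while a long run of words with no sentence-ending mark is flagged for removal.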
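- Add a "📦 API Key Configuration" section
- Describe the two configuration methods: an environment variable and PerspectiveAPIServing
- Provide a link for obtaining an API key
- Update the notes and add a reference link to the configuration instructions
- Follow the format of the PerspectiveSampleEvaluator document
- Sync the Chinese and English documents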
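The environment-variable route can be illustrated with a small helper. This is only a sketch: the variable name `PERSPECTIVE_API_KEY` and the `resolve_api_key` helper are hypothetical, and the name that PerspectiveAPIServing actually reads is given in the operator documentation, not here.

```python
import os
from typing import Optional

# Hypothetical variable name, chosen for this example only.
ENV_VAR = "PERSPECTIVE_API_KEY"


def resolve_api_key(explicit_key: Optional[str] = None, env_var: str = ENV_VAR) -> str:
    """Prefer an explicitly passed key, then fall back to the environment variable."""
    key = (explicit_key or os.environ.get(env_var, "")).strip()
    if not key:
        raise RuntimeError(
            f"No API key found: pass one explicitly or set the {env_var} environment variable."
        )
    return key
```

Failing early with a clear message avoids a less obvious error later, when the first request is sent.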
Completed refine operator documentation:
- RemoveEmojiRefiner, RemoveExtraSpacesRefiner, LowercaseRefiner
- RemoveNumberRefiner, RemovePunctuationRefiner, RemoveRepetitionsPunctuationRefiner
- HtmlUrlRemoverRefiner, HtmlEntityRefiner, TextNormalizationRefiner
- RemoveContractionsRefiner, RemoveImageRefsRefiner, ReferenceRemoverRefiner
- RemoveEmoticonsRefiner, RemoveStopwordsRefiner, StemmingLemmatizationRefiner
- NERRefiner, SpellingCorrectionRefiner, PIIAnonymizeRefiner

Completed result analysis for eval operators:
- BertSampleEvaluator, BleuSampleEvaluator, CiderSampleEvaluator (Chinese and English)

All documents include:
- Complete example code and test data
- Actual run results and detailed analysis
- Application scenarios and notes
- NLTK data configuration instructions (StemmingLemmatizationRefiner)
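Add unified configuration instructions to the documentation of the 5 operators that depend on NLTK:
- RemoveStopwordsRefiner
- BlocklistFilter
- AlphaWordsFilter
- StopWordFilter
- StemmingLemmatizationRefiner

Unified format of the instructions:
1. Recommended method: manual download plus environment variable configuration
2. Automatic method: data is downloaded automatically on first use
3. State explicitly that the data is downloaded from https://github.com/nltk/nltk_data
4. Explain how to avoid downloads hanging because of network problems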
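The recommended method (manual download plus environment variable) might look like the sketch below. NLTK reads the `NLTK_DATA` environment variable when it is imported, so the variable has to be set before `import nltk`. The helper name `configure_nltk_data` and the directory you pass to it are assumptions for this example.

```python
import os
from pathlib import Path


def configure_nltk_data(data_dir: str) -> Path:
    """Point NLTK at a manually downloaded data directory via NLTK_DATA.

    Call this before `import nltk`, because NLTK builds its search path at import time.
    """
    path = Path(data_dir).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"NLTK data directory not found: {path}")
    os.environ["NLTK_DATA"] = str(path)
    return path
```

If `nltk` has already been imported, appending the directory to `nltk.data.path` has a similar effect. Either way, the operators can then find the data locally without attempting a network download.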
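Move the NLTK data configuration section so that it follows the run function description and precedes the example usage, consistent with the format of the other operator documents.

Unified document structure:
1. Overview
2. __init__ function
3. run function
4. NLTK data configuration (if needed)
5. Example usage
6. Result analysis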
Sync the following content from the Chinese documents to the English documents:
- Example usage code
- Default output format
- Example inputs and outputs
- Result analysis
- Application scenarios and notes

Completed refine operators:
1. RemoveEmojiRefiner
2. RemoveExtraSpacesRefiner
3. LowercaseRefiner
4. RemoveNumberRefiner
5. RemovePunctuationRefiner
6. RemoveRepetitionsPunctuationRefiner
7. HtmlUrlRemoverRefiner
8. TextNormalizationRefiner
9. RemoveContractionsRefiner
10. HtmlEntityRefiner
11. RemoveImageRefsRefiner
12. ReferenceRemoverRefiner
13. RemoveEmoticonsRefiner
14. RemoveStopwordsRefiner (with NLTK configuration)
15. StemmingLemmatizationRefiner (with NLTK configuration)
16. NERRefiner
17. SpellingCorrectionRefiner
18. PIIAnonymizeRefiner

All documents now include complete example code, test data, and detailed analysis.
Filter: 34 operators
Refine: 18 operators
Eval: 10 operators
Tests were run for all of them, and the test results were added to the documentation.