@DongBaiYue DongBaiYue commented Nov 10, 2025

PR types

New features

PR changes

APIs

Description

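Add a global param to refined_recompute.
Memory capacity differs between devices. The global parameter allows fine-grained control over the fraction of layers that are recomputed, which makes it easier to balance memory usage against performance on each device.
Meaning of refined_recompute: "global:x": of the n ErnieDecoderLayer layers, x layers are not recomputed and the remaining n-x layers are recomputed. The default value is 0.
n is determined by the model structure. For ERNIE VL 28B, n=14.

  • On A100 (80GB), all layers can be recomputed, which is the default.
  • On P800 (96GB), global:3 can be set to disable recompute for 3 of the 14 layers in exchange for better performance.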
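
To make the semantics concrete, here is a minimal sketch of how a "global:x" entry could be turned into per-layer recompute decisions. It is not the code from this PR: the helper names `parse_global_skip` and `recompute_flags` are hypothetical, the comma-separated "key:value" string format is assumed, and the choice to exempt the first x layers (rather than the last x, or an evenly spaced set) is also an assumption, since the description only states how many layers skip recompute.

```python
from typing import List


def parse_global_skip(refined_recompute: str) -> int:
    """Return x from a 'global:x' entry, or 0 (the default) if absent.

    Assumes a comma-separated list of 'key:value' entries.
    """
    for entry in refined_recompute.split(","):
        key, sep, value = entry.strip().partition(":")
        if key == "global" and sep:
            return int(value)
    return 0


def recompute_flags(num_layers: int, skip: int) -> List[bool]:
    """Return one flag per decoder layer; True means the layer is recomputed.

    Assumption: the first `skip` layers are the ones exempt from recompute.
    """
    if not 0 <= skip <= num_layers:
        raise ValueError(f"skip must be in [0, {num_layers}], got {skip}")
    return [index >= skip for index in range(num_layers)]


# ERNIE VL 28B example from the description: n = 14, global:3
flags = recompute_flags(14, parse_global_skip("global:3"))
print(sum(flags), "of", len(flags), "layers are recomputed")
```

With n=14 and global:3, 11 layers are recomputed and 3 keep their activations, trading extra memory for less recomputation in the backward pass.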

paddle-bot bot commented Nov 10, 2025

Thanks for your contribution!

codecov-commenter commented Nov 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@344c970). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #2899   +/-   ##
==========================================
  Coverage           ?   31.33%           
==========================================
  Files              ?      420           
  Lines              ?    68486           
  Branches           ?        0           
==========================================
  Hits               ?    21458           
  Misses             ?    47028           
  Partials           ?        0           

@DongBaiYue (Author) commented

/re-run all-failed

@yongqiangma yongqiangma left a comment

LGTM
