
Conversation

@sundb
Owner

@sundb sundb commented Feb 10, 2025

No description provided.

@codecov-commenter

codecov-commenter commented Feb 10, 2025

Codecov Report

Attention: Patch coverage is 96.21993% with 11 lines in your changes missing coverage. Please review.

Project coverage is 69.01%. Comparing base (1cd622b) to head (266d378).
Report is 6 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/defrag.c | 96.84% | 9 Missing ⚠️ |
| src/module.c | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #371      +/-   ##
============================================
+ Coverage     68.84%   69.01%   +0.16%     
============================================
  Files           118      118              
  Lines         67984    68024      +40     
============================================
+ Hits          46804    46945     +141     
+ Misses        21180    21079     -101     
| Files with missing lines | Coverage Δ |
|---|---|
| src/kvstore.c | 96.91% <100.00%> (ø) |
| src/server.c | 88.89% <100.00%> (-0.02%) ⬇️ |
| src/server.h | 100.00% <ø> (ø) |
| src/module.c | 9.29% <0.00%> (ø) |
| src/defrag.c | 89.38% <96.84%> (+6.09%) ⬆️ |

... and 11 files with indirect coverage changes

@sundb sundb force-pushed the active-defrag branch 6 times, most recently from 1055283 to 2c4daa6 on February 12, 2025 at 07:55
src/ae.c Outdated
 eventLoop->setsize = setsize;
 eventLoop->timeEventHead = NULL;
-eventLoop->timeEventNextId = 0;
+eventLoop->timeEventNextId = 1;
Owner Author


Modification reason: valkey-io/valkey#1242 (comment)
Actually, I'd rather not go back to this modification; let's see what you think.


If we want to track whether a timer event has been created, we can just check whether timeEventHead is NULL, right?
It's a small and unrelated change; maybe we can avoid it.
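For illustration, a minimal sketch of that alternative check, assuming the usual ae.c event loop layout (time events form a linked list headed by timeEventHead); the helper name is hypothetical:

```c
/* Hypothetical helper: rather than reserving timer id 0 to mean "no timer",
 * check whether any time event is registered via the list head. */
static int anyTimeEventRegistered(aeEventLoop *eventLoop) {
    return eventLoop->timeEventHead != NULL;
}
```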

Owner Author


This modification was for removing a timer by id; it only serves to avoid ambiguity, so I agree to delete it.

Owner Author


done with f6971a9 (#371)

keyobj = createStringObject(keyobj->ptr, sdslen(keyobj->ptr));
}

serverLog(LL_DEBUG,"key %s %s: deleting it", (char*)keyobj->ptr, notify_type == NOTIFY_EXPIRED ? "expired" : "evicted");
Owner Author


@guybe7 I want to delete this line, because sometimes when the defrag test fails this log fills the entire screen and the output can only be viewed by downloading it.
https://github.com/sundb/redis/actions/runs/13263200768/job/37024279281



TBH I think this log is useful... maybe we can just turn off debug logs in the relevant tests?

Owner Author


That won't work, since this test requires other debug logs.
It's OK; apart from the hassle of downloading the output it isn't a real problem, so let's keep it.

defrag.start_frag_pct = getAllocatorFragmentation(NULL);
defrag.timeproc_end_time = 0;
defrag.timeproc_overage_us = 0;
defrag.timeproc_id = aeCreateTimeEvent(server.el, 0, activeDefragTimeProc, NULL, NULL);
Owner Author


Note that activeDefragCycle() doesn't trigger defragmentation immediately; it waits until the next timer event of the event loop.


it is ok for me

Comment on lines +1346 to +1361
long waitedUs = getMonotonicUs() - defrag.timeproc_end_time;
/* Given the elapsed wait time between calls, compute the necessary duty time needed to
* achieve the desired CPU percentage.
* With: D = duty time, W = wait time, P = percent
* Solve: D P
* ----- = -----
* D + W 100
* Solving for D:
* D = P * W / (100 - P)
*
* Note that dutyCycleUs addresses starvation. If the wait time was long, we will compensate
* with a proportionately long duty-cycle. This won't significantly affect perceived
* latency, because clients are already being impacted by the long cycle time which caused
* the starvation of the timer. */
dutyCycleUs = targetCpuPercent * waitedUs / (100 - targetCpuPercent);

Owner Author


Here is an important modification: before, we calculated how long defragmentation should run based on hz and the CPU percent; now we measure the time elapsed between the end of the last defragmentation and the start of this run (i.e. the interval between two defragmentation runs) and derive the duty time from that interval and the CPU percent.
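To make the formula concrete, here is a small worked example; the numbers (25% target CPU, 30 ms wait) are illustrative, not values from the PR:

```c
/* D = P * W / (100 - P): with a 25% target and a 30 ms wait since the
 * previous run ended, the next duty cycle is
 *   25 * 30000 / (100 - 25) = 10000 us,
 * i.e. 10 ms of defrag per 30 ms of other work -> 10 / (10 + 30) = 25%. */
long waitedUs = 30000;      /* illustrative elapsed wait time */
int targetCpuPercent = 25;  /* illustrative target CPU percentage */
long dutyCycleUs = targetCpuPercent * waitedUs / (100 - targetCpuPercent); /* 10000 */
```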

Owner Author

@sundb sundb Feb 15, 2025


The design idea of active_defrag_cycle_us is to ensure that defrag runs for at least this long no matter what:

  1. If the computed defrag running time is less than active_defrag_cycle_us, we still run defrag for active_defrag_cycle_us and then increase the delay until the next defrag timer.
  2. If the interval between defrag runs is long (a command takes a long time to execute), the defrag time is allocated according to that interval.

So active_defrag_cycle_us is a lower limit.

I was thinking that we could change the design and drop this lower limit. In redis#13752 we already reduce CPU usage when defragmentation isn't effective. Another reason is that if the server is really busy with commands, we should not try to compensate for the missed defragmentation time, which may aggravate the latency; defragmentation should give way to command execution.
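A minimal sketch of the lower-limit behaviour described in point 1, using the field names from the snippet quoted earlier in this thread; the overage bookkeeping is illustrative and may not match the PR's exact code:

```c
/* Never run a cycle shorter than the configured minimum; remember the
 * difference as "overage" so it can be paid back by stretching the
 * interval before the next timer invocation. */
if (dutyCycleUs < server.active_defrag_cycle_us) {
    defrag.timeproc_overage_us = server.active_defrag_cycle_us - dutyCycleUs;
    dutyCycleUs = server.active_defrag_cycle_us;
}
```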


> So active_defrag_cycle_us is a lower limit.

To be honest, it is unexpected at first glance. Generally, I would consider it an upper limit.

> If the computed defrag running time is less than active_defrag_cycle_us, we still run defrag for active_defrag_cycle_us and then increase the delay until the next defrag timer.

Might this conflict with redis#13752?

> D = P * W / (100 - P)

So the duty time is longer if the wait time is longer? That makes the blocking issue worse.

Owner Author


> To be honest, it is unexpected at first glance. Generally, I would consider it an upper limit.

Me too.

> Might this conflict with redis#13752?

There will be some effect, because even if we reduce the rate this time, the missed time will eventually be made up.

> So the duty time is longer if the wait time is longer? That makes the blocking issue worse.

Yes, that's what I'm worried about, and it's unbounded.

Owner Author

@sundb sundb Feb 19, 2025


EDIT: I added an upper limit for the duty time in 7bb496e.
EDIT: Reverted that change and eliminated the lower limit in 208ba72.
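For reference, a minimal sketch of what such an upper limit could look like; the cap value and its placement are assumptions (7bb496e may have differed, and per the edit above the change was later reverted):

```c
/* Hypothetical cap: never let a single duty cycle exceed 50 ms, however long
 * the timer was starved, so defrag cannot monopolize the main thread. */
static const long DEFRAG_MAX_DUTY_CYCLE_US = 50 * 1000;
if (dutyCycleUs > DEFRAG_MAX_DUTY_CYCLE_US) dutyCycleUs = DEFRAG_MAX_DUTY_CYCLE_US;
```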

 proc wait_for_defrag_stop {maxtries delay {expect_frag 0}} {
     wait_for_condition $maxtries $delay {
-        [s active_defrag_running] eq 0
+        [s active_defrag_running] eq 0 && ($expect_frag == 0 || [s allocator_frag_ratio] <= $expect_frag)
Owner Author


This test bug also existed before this PR, but now, due to the use of timers, the runs are spaced out longer than before, so some of the tests below become more fragile.

@sundb
Owner Author

sundb commented Feb 13, 2025

#372 was created so that modules can defragment global data incrementally.

@sundb
Owner Author

sundb commented Feb 14, 2025

In c1c1363 (#371)
I combined the callback's target and private data into a single context, and the context is created dynamically when the stage is created instead of being static.
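A rough sketch of what that combined context could look like; the struct, field, and function names are hypothetical, not the PR's actual identifiers:

```c
/* Hypothetical per-stage context: the formerly separate "target" and
 * private data now travel together in one heap allocation. */
typedef struct defragStageContext {
    void *target;   /* object/kvstore this stage works on */
    void *privdata; /* extra state formerly passed separately */
} defragStageContext;

static defragStageContext *createStageContext(void *target, void *privdata) {
    defragStageContext *ctx = zmalloc(sizeof(*ctx)); /* allocated when the stage is created */
    ctx->target = target;
    ctx->privdata = privdata;
    return ctx; /* freed when the stage completes, instead of living in a static */
}
```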

@sundb sundb requested a review from ShooterIT February 18, 2025 06:09
src/defrag.c Outdated
}

/* A kvstoreHelperPreContinueFn */
static doneStatus defragLaterStep(monotime endtime, void *ctx) {
Owner Author


agree.
