Refactor of ActiveDefrag to reduce latencies #371

sundb · 2025-02-10T07:54:50Z

No description provided.

codecov-commenter · 2025-02-10T08:05:35Z

Codecov Report

Attention: Patch coverage is 96.21993% with 11 lines in your changes missing coverage. Please review.

Project coverage is 69.01%. Comparing base (1cd622b) to head (266d378).
Report is 6 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/defrag.c | 96.84% | 9 Missing ⚠️ |
| src/module.c | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable     #371      +/-   ##
============================================
+ Coverage     68.84%   69.01%   +0.16%     
============================================
  Files           118      118              
  Lines         67984    68024      +40     
============================================
+ Hits          46804    46945     +141     
+ Misses        21180    21079     -101

| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/kvstore.c | `96.91% <100.00%> (ø)` | |
| src/server.c | `88.89% <100.00%> (-0.02%)` | ⬇️ |
| src/server.h | `100.00% <ø> (ø)` | |
| src/module.c | `9.29% <0.00%> (ø)` | |
| src/defrag.c | `89.38% <96.84%> (+6.09%)` | ⬆️ |

... and 11 files with indirect coverage changes

sundb · 2025-02-12T08:04:24Z

src/ae.c

    eventLoop->setsize = setsize;
    eventLoop->timeEventHead = NULL;
-    eventLoop->timeEventNextId = 0;
+    eventLoop->timeEventNextId = 1;


Reason for the change: valkey-io/valkey#1242 (comment)
Actually, I would rather not revisit this modification; let's see what you think.

If we want to track whether a timer event has been created, we can check whether timeEventHead equals NULL, right?
It is a small, unrelated change, so maybe we can avoid it.

This modification is for removing a timer by id, and it is only there to avoid ambiguity. I agree to delete it.

done with f6971a9 (#371)
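The trade-off discussed in this thread can be illustrated with a toy model. This is only a sketch: `EventLoop`, `TimeEvent`, `addTimeEvent` and `hasTimeEvents` are hypothetical stand-ins that borrow the field names `timeEventHead` and `timeEventNextId` from the diff above; they are not the real ae.c definitions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the event loop structures (hypothetical, simplified). */
typedef struct TimeEvent {
    long long id;
    struct TimeEvent *next;
} TimeEvent;

typedef struct EventLoop {
    TimeEvent *timeEventHead;
    long long timeEventNextId;
} EventLoop;

/* Registers a timer at the head of the list and returns its id,
 * or -1 if allocation fails. */
long long addTimeEvent(EventLoop *el) {
    TimeEvent *te = malloc(sizeof(*te));
    if (te == NULL) return -1;
    te->id = el->timeEventNextId++;
    te->next = el->timeEventHead;
    el->timeEventHead = te;
    return te->id;
}

/* The reviewer's suggestion: a non-NULL head means some timer exists. */
int hasTimeEvents(const EventLoop *el) {
    return el->timeEventHead != NULL;
}
```

In this model, when `timeEventNextId` starts at 0 the first timer receives id 0, so a caller cannot use 0 as a "no timer" marker. Checking `timeEventHead` avoids touching the id counter, but it only answers whether any timer exists, not whether a specific one does.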

sundb · 2025-02-12T08:08:47Z

src/db.c

        keyobj = createStringObject(keyobj->ptr, sdslen(keyobj->ptr));
    }

-    serverLog(LL_DEBUG,"key %s %s: deleting it", (char*)keyobj->ptr, notify_type == NOTIFY_EXPIRED ? "expired" : "evicted");


@guybe7 I want to delete this line, because when a defrag test fails, this log sometimes fills the entire screen and the output can only be downloaded.
https://github.com/sundb/redis/actions/runs/13263200768/job/37024279281

TBH I think this log is useful. Maybe we can just turn off debug logs in the relevant tests?

That won't work, since this test requires other debug logs.
It's OK. Apart from the trouble of downloading the output there is no other problem, so let's keep it.

sundb · 2025-02-12T10:31:11Z

src/defrag.c

+    defrag.start_frag_pct = getAllocatorFragmentation(NULL);
+    defrag.timeproc_end_time = 0;
+    defrag.timeproc_overage_us = 0;
+    defrag.timeproc_id = aeCreateTimeEvent(server.el, 0, activeDefragTimeProc, NULL, NULL);


Note that activeDefragCycle() doesn't trigger defragmentation immediately; it waits until the next iteration of the timer event loop.

That is OK with me.

sundb · 2025-02-12T13:10:50Z

src/defrag.c

+        long waitedUs = getMonotonicUs() - defrag.timeproc_end_time;
+        /* Given the elapsed wait time between calls, compute the necessary duty time needed to
+         * achieve the desired CPU percentage.
+         * With:  D = duty time, W = wait time, P = percent
+         * Solve:    D          P
+         *         -----   =  -----
+         *         D + W       100
+         * Solving for D:
+         *     D = P * W / (100 - P)
+         *
+         * Note that dutyCycleUs addresses starvation. If the wait time was long, we will compensate
+         * with a proportionately long duty-cycle. This won't significantly affect perceived
+         * latency, because clients are already being impacted by the long cycle time which caused
+         * the starvation of the timer. */
+        dutyCycleUs = targetCpuPercent * waitedUs / (100 - targetCpuPercent);
+


This is an important modification. Previously, we calculated how long defragmentation should run from hz and the CPU percentage. Now we measure the time between the end of the last defragmentation run and the start of the current one (i.e. the interval between two runs), and then derive the run time from that interval and the CPU percentage.
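As a worked illustration of the formula in the comment above, here is a minimal sketch of the duty-time calculation in isolation. The function name `dutyTimeUs` and its guard clauses are assumptions made for this example; only the formula D = P * W / (100 - P) comes from the diff.

```c
#include <stdint.h>

/* Hypothetical helper, not the PR's code.
 * Given the time waited since the last run (W, in microseconds) and a
 * target CPU percentage (P), returns the duty time D such that
 * D / (D + W) = P / 100, i.e. D = P * W / (100 - P). */
int64_t dutyTimeUs(int64_t waited_us, int percent) {
    if (percent <= 0 || waited_us <= 0) return 0;
    if (percent >= 100) percent = 99; /* assumption: avoid dividing by zero */
    return (int64_t)percent * waited_us / (100 - percent);
}
```

For example, with P = 10 and W = 9000 us the result is 1000 us, and 1000 / (1000 + 9000) is exactly 10%. With P = 50 the duty time equals the wait time.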

The design idea of active_defrag_cycle_us is to ensure that defrag runs for at least this long, no matter what.

If the defrag running time is less than active_defrag_cycle_us, we still run defrag for active_defrag_cycle_us, then increase the next timer for defrag.

If the interval between defrag runs is long (for example, because a command takes a long time to execute), the defrag time is allocated in proportion to this interval.

So active_defrag_cycle_us is a lower limit.

I was thinking that we could change the design so that we don't need this lower limit. In redis#13752, we would reduce CPU usage when defragmentation isn't effective. Another reason is that if command execution is really busy, I feel we should not try to compensate for the lost defragmentation time, which may aggravate the delay; defragmentation should give way to command execution.

> So active_defrag_cycle_us is a lower limit.

To be honest, this is unexpected at first glance. I would generally consider it an upper limit.

> If the defrag running time is less than active_defrag_cycle_us, we still run defrag for active_defrag_cycle_us, then increase the next timer for defrag.

Could this conflict with redis#13752?

> D = P * W / (100 - P)

So the duty time is longer if the wait time is longer? That makes the blocking issue worse.

> To be honest, this is unexpected at first glance. I would generally consider it an upper limit.

Me too.

> Could this conflict with redis#13752?

There will be some effect, because even if we reduce the rate this time, it will eventually make up for it.

> So the duty time is longer if the wait time is longer? That makes the blocking issue worse.

Yes, that's what I'm worried about, and it's unbounded.

EDIT: ~~I added an upper limit for duty time in 7bb496e~~
Reverted that change and eliminated the lower limit in 208ba72.
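To make the concern in this thread concrete, the sketch below compares the uncapped duty time with a capped variant. It is illustrative only: `HYPOTHETICAL_MAX_DUTY_US`, `uncappedDutyUs` and `cappedDutyUs` are names invented for this example, and the cap value is arbitrary; none of them reflect what 7bb496e or 208ba72 actually contain.

```c
#include <stdint.h>

/* Arbitrary illustrative cap (assumption), in microseconds. */
#define HYPOTHETICAL_MAX_DUTY_US 10000

/* D = P * W / (100 - P), assuming 0 < percent < 100. */
int64_t uncappedDutyUs(int64_t waited_us, int percent) {
    return (int64_t)percent * waited_us / (100 - percent);
}

/* Same calculation, clamped so a long wait cannot produce an
 * arbitrarily long defrag run. */
int64_t cappedDutyUs(int64_t waited_us, int percent) {
    int64_t duty = uncappedDutyUs(waited_us, percent);
    return duty > HYPOTHETICAL_MAX_DUTY_US ? HYPOTHETICAL_MAX_DUTY_US : duty;
}
```

With P = 25, a 300 ms wait (for instance, a slow command) gives an uncapped duty time of 100 ms, which would block the event loop for another 100 ms. The capped variant limits that run to 10 ms, at the cost of not reaching the target CPU percentage after long waits.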

sundb · 2025-02-12T13:14:53Z

tests/unit/memefficiency.tcl

+    proc wait_for_defrag_stop {maxtries delay {expect_frag 0}} {
        wait_for_condition $maxtries $delay {
-            [s active_defrag_running] eq 0
+            [s active_defrag_running] eq 0 && ($expect_frag == 0 || [s allocator_frag_ratio] <= $expect_frag)


This test bug also existed before this PR, but because defrag now runs from a timer, its runs are spaced further apart than before, so some of the tests below become more fragile.

sundb · 2025-02-13T07:51:24Z

#372 was created so that modules can defrag global data incrementally.

sundb · 2025-02-14T08:30:24Z

In c1c1363 (#371),
I combined the callback's target and private data into a single context. The context is now created dynamically when the stage is created, instead of being static.
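The idea of a per-stage, dynamically created context might look roughly like the sketch below. All names here (`Stage`, `stageCreate`, `stageRelease`, `KeysCtx`, `keysStageStep`, `runStage`) are hypothetical and chosen for illustration; the actual structures in src/defrag.c may differ.

```c
#include <stdlib.h>

/* Hypothetical callback types: a stage step returns 1 when the stage is done. */
typedef int (*stageFn)(void *ctx);
typedef void (*stageCtxFreeFn)(void *ctx);

/* A stage owns one opaque context instead of separate target/private values. */
typedef struct Stage {
    stageFn fn;
    void *ctx;
    stageCtxFreeFn ctx_free; /* may be NULL if the context needs no cleanup */
} Stage;

Stage *stageCreate(stageFn fn, void *ctx, stageCtxFreeFn ctx_free) {
    Stage *s = malloc(sizeof(*s));
    if (s == NULL) return NULL;
    s->fn = fn;
    s->ctx = ctx;
    s->ctx_free = ctx_free;
    return s;
}

/* Releases the stage together with its context. */
void stageRelease(Stage *s) {
    if (s == NULL) return;
    if (s->ctx_free) s->ctx_free(s->ctx);
    free(s);
}

/* Example context bundling everything one stage needs (illustrative fields). */
typedef struct KeysCtx {
    int dbid;
    long cursor;
} KeysCtx;

/* Example step: advances the cursor and finishes after three steps. */
int keysStageStep(void *ctx) {
    KeysCtx *k = ctx;
    k->cursor++;
    return k->cursor >= 3;
}

/* Runs a stage until its step function reports completion; returns step count. */
int runStage(Stage *s) {
    int steps = 0;
    do {
        steps++;
    } while (!s->fn(s->ctx));
    return steps;
}
```

Because each stage carries its own heap-allocated context and a free function, releasing a stage list reduces to calling `stageRelease` on every element, with no static state shared between stages.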

Co-authored-by: ShooterIT <[email protected]>

src/defrag.c

sundb · 2025-02-18T08:24:42Z

src/defrag.c

+}
+
+/* A kvstoreHelperPreContinueFn */
+static doneStatus defragLaterStep(monotime endtime, void *ctx) {


At the same time, add the upper limit for the periodic process

Co-authored-by: ShooterIT <[email protected]>

sundb force-pushed the active-defrag branch 6 times, most recently from 1055283 to 2c4daa6 Compare February 12, 2025 07:55

Refactor of ActiveDefrag to reduce latencies

492ce51

sundb force-pushed the active-defrag branch from 2c4daa6 to 492ce51 Compare February 12, 2025 07:57

sundb mentioned this pull request Feb 13, 2025

Add support for module to defrag incremental #372

Open

sundb added 2 commits February 14, 2025 16:15

Always use context for various stages

c1c1363

Refine comment for context

c432b55

sundb and others added 11 commits February 15, 2025 10:45

Refine comment

9d79bdc

Add license

a5f01ae

Simplify the release of stage list

7d4a3a0

dont use bool

091ed0a

Revert the change of timeEventNextId

f6971a9

Co-authored-by: ShooterIT <[email protected]>

Revert the license change of db.c

1b53310

Revert some test changes

6786d48

Fix test

d476272

Add free fn for stage

1569913

Move defrag_later into defragKeysCtx

339da6f

Style

3e68663

sundb requested a review from ShooterIT February 18, 2025 06:09

ShooterIT reviewed Feb 18, 2025

Move forward ctx pointer

7f855a1

oranagra mentioned this pull request Feb 18, 2025

Add RM_DefragRedisModuleDict module API #373

Closed

Revert the uppper limit, and eliminate the lower limit

208ba72

sundb force-pushed the active-defrag branch from e9b2c0e to 208ba72 Compare February 19, 2025 08:10

Revert minor change

b093f2b

sundb force-pushed the active-defrag branch from be5c8bd to b093f2b Compare February 19, 2025 11:49

sundb and others added 4 commits February 19, 2025 20:41

Cleanup

a3ea9ac

Use constant DEFRAG_CYCLE_US to replace config active-defrag-cycle-us

5887864

At the same time, Add the upper limit for perodic process

Revert test

266d378

Refine the comment for DEFRAG_CYCLE_US

bef633c

Co-authored-by: ShooterIT <[email protected]>

sundb closed this Mar 17, 2025

sundb mentioned this pull request Apr 14, 2025

Avoid using debug log level in tests that produce many keys redis/redis#13942

Merged


5 participants
