base: master
ReplayCache #10
Conversation
// If we don't write a full cache line, we first need to read it to fill
// in the missing bytes
if (req.size != 4) {
    // TODO: additional costs
Just curious, what is the additional cost here?
This is just a placeholder, copied from a spot in your cache implementation where there was an actual cost increment.
IIRC it applies when we write e.g. a single byte: we first need to fetch the remaining 3 bytes to have a full word to update and write back.
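To illustrate where that cost would come from, here is a minimal sketch of a sub-word store cost model (the names are hypothetical, not the actual simulator code):

#include <cassert>

// Hypothetical cost model for a sub-word store: writing less than a full
// 4-byte word forces a read-modify-write, because the bytes of the word
// that are not being written must be fetched before the merged word can
// be written back.
unsigned storeCost(unsigned req_size, unsigned read_cycles, unsigned write_cycles) {
    assert(req_size >= 1 && req_size <= 4);
    unsigned cycles = write_cycles;
    if (req_size != 4)
        cycles += read_cycles; // extra fetch of the missing bytes
    return cycles;
}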
Ah okay. I had that cost somewhere else.
What piece of code are you referring to, then? I inherited this from CacheMem.hpp:488-496.
Oh, this is something that @iiKoe added later, I think.
// Perform the eviction based on the policy.
switch (policy) {
case LRU:
I do not remember which eviction policy the ReplayCache paper used; did they use a pseudorandom eviction policy as well? (I may be confusing it with another paper.)
I will take a look and confirm what, if any, policy they mention.
}
writeback_queue.clear();

return max_cycles;
I am slightly confused: is the returned max_cycles a sum of all cycles spent writing back, or the maximum of the pending cycles across the writebacks? The @return doc and the code behavior do not seem to match. I would assume line 154 should be max_cycles += ... instead of just =?
Maybe I am missing something?
Perhaps I should add a clarifying comment, but my current understanding is that, even if there are many pending writebacks, fully writing back all items takes at most the number of cycles of the longest pending writeback, due to the async/pipelined nature.
I could also add an invariant check somewhere that asserts that each next item in the queue has a pending_cycles value that is strictly smaller than the previous one.
In other words, it would suffice to rewrite this function to just peek at the end of the queue and return its pending_cycles value.
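For what it's worth, a minimal sketch of that model (the Writeback struct and function name are hypothetical, not the actual simulator code):

#include <algorithm>
#include <cstdint>
#include <deque>

// Hypothetical entry of the pending writeback queue.
struct Writeback {
    uint32_t addr;
    unsigned pending_cycles; // cycles until this writeback completes
};

// Because the writebacks proceed in parallel (pipelined), draining the whole
// queue costs only as much as the single longest pending writeback, so the
// result is the maximum rather than the sum of pending_cycles.
unsigned drainWritebacks(std::deque<Writeback> &writeback_queue) {
    unsigned max_cycles = 0;
    for (const Writeback &wb : writeback_queue)
        max_cycles = std::max(max_cycles, wb.pending_cycles);
    writeback_queue.clear();
    return max_cycles;
}

If the queue ordering guarantees that the entry with the largest pending_cycles sits at a known end, the loop reduces to peeking at that single entry.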
Got it. I was not aware that it does the writeback in parallel.
Now that I think about it, I am unsure. I'll mark this as a point of discussion for today's meeting.
The clwb and fence look logical (bar the small comments) as of now, assuming the compiler emits the correct instructions for region formation. The normal cache implementation also looks good. Do we have compiler support as well (to run and test whether all asserts work)?
No, I intend to start working on compiler support only in a few weeks, once most of the cache has been verified with manual examples. This is my current assembly input (as a custom benchmark) that I used to test the very basics of my implementation:
#define RC_START_REGION auipc x31, 0
#define RC_EXPLICITLY_END_REGION auipc x31, 1
#define RC_CLWB auipc x31, 2
#define RC_FENCE auipc x31, 3
#define RC_POWER_FAILURE_NEXT auipc x31, 4
.data
x: .skip 4
y: .skip 4
.text
.global main
main:
RC_START_REGION
li t1, 1
// End the current region before a (conditional) branch,
// the branch target will start a new region
RC_FENCE
RC_EXPLICITLY_END_REGION
j a
a:
// This is a branch target, so it will start a new region
RC_START_REGION
// ...
li a0, 1
la t0, x
sw a0, 0(t0)
RC_CLWB
// No need to end the region when jumping out of a branch target,
// because the destination we jump to will already have a region boundary
j end
b:
// This is a branch target, so it will start a new region
RC_START_REGION
// ...
li a0, 2
la t0, y
sw a0, 0(t0)
RC_CLWB
// We directly flow to the end of the conditional branch,
// so we do not need to end the region here either
end:
// Request a power failure for demonstration purposes
RC_POWER_FAILURE_NEXT
// Regular boundary
RC_FENCE
RC_START_REGION
// ...
addi a0, a0, 10
// End the region before returning, as that is just a form of branching
RC_FENCE
RC_EXPLICITLY_END_REGION
ret
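For context, a hedged sketch of how such auipc x31, imm markers could be decoded on the simulator side (the decoder itself is an assumption; only the standard RISC-V auipc encoding and the immediate values from the defines above are relied on):

#include <cstdint>
#include <optional>

// The markers above are plain auipc instructions targeting x31, with the
// U-type immediate selecting the ReplayCache operation (0 = start region,
// 1 = explicitly end region, 2 = clwb, 3 = fence, 4 = power failure next).
enum class RcOp { StartRegion, EndRegion, Clwb, Fence, PowerFailureNext };

std::optional<RcOp> decodeRcMarker(uint32_t insn) {
    const uint32_t opcode = insn & 0x7F;        // auipc opcode is 0b0010111
    const uint32_t rd     = (insn >> 7) & 0x1F; // destination register field
    if (opcode != 0x17 || rd != 31)
        return std::nullopt;                    // not a ReplayCache marker
    switch (insn >> 12) {                       // imm[31:12] of the auipc
        case 0: return RcOp::StartRegion;
        case 1: return RcOp::EndRegion;
        case 2: return RcOp::Clwb;
        case 3: return RcOp::Fence;
        case 4: return RcOp::PowerFailureNext;
        default: return std::nullopt;
    }
}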
if (req.size != 4) {
    // TODO: additional costs
    // TODO: in CacheMem.hpp, all 4 bytes are read, here we only read the remaining bytes.
    // Is that behavior correct?
A cache always reads/writes in blocks, right? The exact location within that block is then indicated by the "offset".
@iiKoe we always wrote 4 bytes, IIRC, right?
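As a reference point, a minimal sketch of the usual block/offset split, assuming 4-byte blocks and 256 sets (the field widths are illustrative, not taken from this code):

#include <cstdint>

// Hypothetical address decomposition for 4-byte blocks and 256 sets: the
// offset selects the byte within the block, the index selects the set, and
// the remaining upper bits form the tag.
struct CacheAddr {
    uint32_t offset; // byte within the 4-byte block (addr[1:0])
    uint32_t index;  // set index (addr[9:2] for 256 sets)
    uint32_t tag;    // remaining upper bits (addr[31:10])
};

CacheAddr decompose(uint32_t addr) {
    CacheAddr a;
    a.offset = addr & 0x3;
    a.index  = (addr >> 2) & 0xFF;
    a.tag    = addr >> 10;
    return a;
}

Reads and writes then always transfer the whole block; the offset only picks out which bytes of that block the request touches.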
We should discuss this in our meeting 😄
writeToCache(line);

// TODO: stats
if (stats) stats->incCacheWrites(line.blocks.size);
Consider moving this incWrites call inside writeToCache(); then it becomes slightly less error-prone.
I can do that, but then it is not consistent with the MEM_READ part of this handleHit function.
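A hedged sketch of what that refactor could look like (the Cache/CacheLine/Stats shapes here are stand-ins, not the actual simulator types):

#include <cstddef>

// Hypothetical stand-ins for the simulator's types.
struct Stats {
    std::size_t cache_writes = 0;
    void incCacheWrites(std::size_t n) { cache_writes += n; }
};
struct Blocks { std::size_t size = 4; };
struct CacheLine { Blocks blocks; };

struct Cache {
    Stats *stats = nullptr;

    // Folding the stats update into writeToCache() means every caller gets
    // the counter bump for free, instead of each call site (such as
    // handleHit) having to remember it.
    void writeToCache(const CacheLine &line) {
        // ... existing write logic would go here ...
        if (stats)
            stats->incCacheWrites(line.blocks.size);
    }
};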
Repo cleanup and upgrade to LLVM 16
This results in a significant speedup of about 50% for long-running benchmarks.
aad2eac to c58b8f9
This prevents relocation errors caused by basic blocks ending up too far apart due to the insertion of ReplayCache instructions.
No description provided.