base: master
ReplayCache #10
Conversation
// If we don't write a full cache line, we first need to read it to fill
// in the missing bytes
if (req.size != 4) {
    // TODO: additional costs
Just curious, what is the additional cost here?
This is just a placeholder, copied from a spot in your cache implementation where there was an actual cost increment.
IIRC it applies when we write e.g. a single byte: we first need to fetch the remaining 3 bytes to have a full word to update and write back.
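To illustrate where that cost would come from, here is a minimal sketch of a sub-word store cost model (the names are hypothetical, not the actual simulator code):

#include <cassert>

// Hypothetical cost model for a sub-word store: writing less than a full
// 4-byte word forces a read-modify-write, because the bytes of the word
// that are not being written must be fetched before the merged word can
// be written back.
unsigned storeCost(unsigned req_size, unsigned read_cycles, unsigned write_cycles) {
    assert(req_size >= 1 && req_size <= 4);
    unsigned cycles = write_cycles;
    if (req_size != 4)
        cycles += read_cycles; // extra fetch of the missing bytes
    return cycles;
}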
Ah okay. I had that cost somewhere else.
What piece of code are you referring to, then? I inherited this from CacheMem.hpp:488-496.
Oh, this is something that @iiKoe added later, I think.
// Perform the eviction based on the policy.
switch (policy) {
case LRU:
I do not remember which eviction policy the ReplayCache paper used; did they use a pseudorandom eviction policy as well? (I may be confusing it with another paper.)
I will take a look and confirm what, if any, policy they mention.
}
writeback_queue.clear();

return max_cycles;
I am slightly confused: is the returned max_cycles a sum of all cycles spent writing back, or the maximum of the pending cycles across the writebacks? The @return doc and the code behavior do not seem to match. I would assume line 154 should be max_cycles += ... instead of just =?
Maybe I am missing something?
Perhaps I should add a clarifying comment, but my current understanding is that, even if there are many pending writebacks, fully writing back all items takes at most the number of cycles of the longest pending writeback, due to the async/pipelined nature.
I could also add an invariant check somewhere that asserts that each next item in the queue has a pending_cycles value that is strictly smaller than the previous one.
In other words, it would suffice to rewrite this function to just peek at the end of the queue and return its pending_cycles value.
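For what it's worth, a minimal sketch of that model (the Writeback struct and function name are hypothetical, not the actual simulator code):

#include <algorithm>
#include <cstdint>
#include <deque>

// Hypothetical entry of the pending writeback queue.
struct Writeback {
    uint32_t addr;
    unsigned pending_cycles; // cycles until this writeback completes
};

// Because the writebacks proceed in parallel (pipelined), draining the whole
// queue costs only as much as the single longest pending writeback, so the
// result is the maximum rather than the sum of pending_cycles.
unsigned drainWritebacks(std::deque<Writeback> &writeback_queue) {
    unsigned max_cycles = 0;
    for (const Writeback &wb : writeback_queue)
        max_cycles = std::max(max_cycles, wb.pending_cycles);
    writeback_queue.clear();
    return max_cycles;
}

If the queue ordering guarantees that the entry with the largest pending_cycles sits at a known end, the loop reduces to peeking at that single entry.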
Got it. I was not aware that it does the writeback in parallel.
Now that I think about it, I am unsure. I'll mark this as a point of discussion for today's meeting.
The clwb and fence look logical (bar the small comments) as of now, assuming the compiler emits the correct instructions for region formation. The normal cache implementation also looks good. Do we have compiler support as well (to run and test whether all asserts work)?
No, I intend to start working on compiler support only in a few weeks, once most of the cache has been verified with manual examples. This is my current assembly input (as a custom benchmark) that I used to test the very basics of my implementation:
#define RC_START_REGION auipc x31, 0
#define RC_EXPLICITLY_END_REGION auipc x31, 1
#define RC_CLWB auipc x31, 2
#define RC_FENCE auipc x31, 3
#define RC_POWER_FAILURE_NEXT auipc x31, 4
.data
x: .skip 4
y: .skip 4
.text
.global main
main:
RC_START_REGION
li t1, 1
// End the current region before a (conditional) branch,
// the branch target will start a new region
RC_FENCE
RC_EXPLICITLY_END_REGION
j a
a:
// This is a branch target, so it will start a new region
RC_START_REGION
// ...
li a0, 1
la t0, x
sw a0, 0(t0)
RC_CLWB
// No need to end the region when jumping out of a branch target,
// because the destination we jump to will already have a region boundary
j end
b:
// This is a branch target, so it will start a new region
RC_START_REGION
// ...
li a0, 2
la t0, y
sw a0, 0(t0)
RC_CLWB
// We directly flow to the end of the conditional branch,
// so we do not need to end the region here either
end:
// Request a power failure for demonstration purposes
RC_POWER_FAILURE_NEXT
// Regular boundary
RC_FENCE
RC_START_REGION
// ...
addi a0, a0, 10
// End the region before returning, as that is just a form of branching
RC_FENCE
RC_EXPLICITLY_END_REGION
ret
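For context, a hedged sketch of how such auipc x31, imm markers could be decoded on the simulator side (the decoder itself is an assumption; only the standard RISC-V auipc encoding and the immediate values from the defines above are relied on):

#include <cstdint>
#include <optional>

// The markers above are plain auipc instructions targeting x31, with the
// U-type immediate selecting the ReplayCache operation (0 = start region,
// 1 = explicitly end region, 2 = clwb, 3 = fence, 4 = power failure next).
enum class RcOp { StartRegion, EndRegion, Clwb, Fence, PowerFailureNext };

std::optional<RcOp> decodeRcMarker(uint32_t insn) {
    const uint32_t opcode = insn & 0x7F;        // auipc opcode is 0b0010111
    const uint32_t rd     = (insn >> 7) & 0x1F; // destination register field
    if (opcode != 0x17 || rd != 31)
        return std::nullopt;                    // not a ReplayCache marker
    switch (insn >> 12) {                       // imm[31:12] of the auipc
        case 0: return RcOp::StartRegion;
        case 1: return RcOp::EndRegion;
        case 2: return RcOp::Clwb;
        case 3: return RcOp::Fence;
        case 4: return RcOp::PowerFailureNext;
        default: return std::nullopt;
    }
}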
if (req.size != 4) {
    // TODO: additional costs
    // TODO: in CacheMem.hpp, all 4 bytes are read, here we only read the remaining bytes.
    // Is that behavior correct?
A cache always reads/writes in blocks, right? The exact location within that block is then indicated by the "offset".
@iiKoe we always wrote 4 bytes, IIRC, right?
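As a reference point, a minimal sketch of the usual block/offset split, assuming 4-byte blocks and 256 sets (the field widths are illustrative, not taken from this code):

#include <cstdint>

// Hypothetical address decomposition for 4-byte blocks and 256 sets: the
// offset selects the byte within the block, the index selects the set, and
// the remaining upper bits form the tag.
struct CacheAddr {
    uint32_t offset; // byte within the 4-byte block (addr[1:0])
    uint32_t index;  // set index (addr[9:2] for 256 sets)
    uint32_t tag;    // remaining upper bits (addr[31:10])
};

CacheAddr decompose(uint32_t addr) {
    CacheAddr a;
    a.offset = addr & 0x3;
    a.index  = (addr >> 2) & 0xFF;
    a.tag    = addr >> 10;
    return a;
}

Reads and writes then always transfer the whole block; the offset only picks out which bytes of that block the request touches.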
We should discuss this in our meeting 😄
writeToCache(line);

// TODO: stats
if (stats) stats->incCacheWrites(line.blocks.size);
Consider moving this incWrites call inside writeToCache(); then it becomes slightly less error-prone.
I can do that, but then it is not consistent with the MEM_READ part of this handleHit function.
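A hedged sketch of what that refactor could look like (the Cache/CacheLine/Stats shapes here are stand-ins, not the actual simulator types):

#include <cstddef>

// Hypothetical stand-ins for the simulator's types.
struct Stats {
    std::size_t cache_writes = 0;
    void incCacheWrites(std::size_t n) { cache_writes += n; }
};
struct Blocks { std::size_t size = 4; };
struct CacheLine { Blocks blocks; };

struct Cache {
    Stats *stats = nullptr;

    // Folding the stats update into writeToCache() means every caller gets
    // the counter bump for free, instead of each call site (such as
    // handleHit) having to remember it.
    void writeToCache(const CacheLine &line) {
        // ... existing write logic would go here ...
        if (stats)
            stats->incCacheWrites(line.blocks.size);
    }
};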
Repo cleanup and upgrade to LLVM 16
This results in a significant speedup of about 50% for long-running benchmarks.
aad2eac to c58b8f9
This prevents relocation errors caused by basic blocks ending up too far apart due to the insertion of ReplayCache instructions.
No description provided.