Conversation

@jochembroekhoff
Collaborator

No description provided.

// If we don't write a full cache line, we first need to read it to fill
// in the missing bytes
if (req.size != 4) {
    // TODO: additional costs
Collaborator

Just curious, what is the additional cost here?

Collaborator Author

This is just a placeholder, copied from where there was an actual cost increment in your cache implementation.
IIRC it applies when we write e.g. a single byte: we then need to fetch the remaining 3 bytes first to have a complete word to update and write back.
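For reference, a minimal sketch of the cost I have in mind (the cycles accounting and the cost constant are assumptions, not code from this PR):

// Sketch: a store narrower than the 4-byte word triggers a read-modify-write,
// so we also charge for fetching the bytes the store does not cover.
if (req.size != 4) {
    unsigned missing_bytes = 4 - req.size;            // e.g. 3 for a byte store
    cycles += missing_bytes * MEM_READ_COST_PER_BYTE; // hypothetical constant
}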

Collaborator

Ah okay. I had that cost somewhere else.

Collaborator Author

Which piece of code are you referring to, then? I inherited this from CacheMem.hpp:488-496.

Collaborator

Oh, this is something that @iiKoe added later, I think.


// Perform the eviction based on the policy.
switch (policy) {
case LRU:
Collaborator

I do not remember which eviction policy the ReplayCache paper used; did they use a pseudorandom eviction policy as well? (I may be confusing it with another paper.)

Collaborator Author

I will take a look and confirm what, if any, policy they mention.
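If it does turn out to be pseudorandom, the switch above could grow a case along these lines (the enum value, rng, and set layout are assumptions, not from this PR):

// Sketch: a pseudorandom eviction case next to LRU (names assumed).
switch (policy) {
case LRU:
    victim = set.lruWay();            // hypothetical helper returning the LRU way
    break;
case PSEUDO_RANDOM:
    victim = rng() % set.ways.size(); // any cheap PRNG, e.g. an LFSR
    break;
}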

}
writeback_queue.clear();

return max_cycles;
Collaborator

I am slightly confused: is the returned max_cycles the sum of all cycles spent writing back, or the maximum of the pending cycles of the writebacks? (The @return doc and the code behavior do not seem to match.) I would assume line 154 should be max_cycles += ... instead of just =?

Maybe I am missing something?

Collaborator Author

Perhaps I should add a clarifying comment, but my current understanding is that, even with many pending writebacks, fully writing back all items takes at most as many cycles as the longest pending writeback, due to the async/pipelined nature.
I could also add an invariant check somewhere that asserts that each next item in the queue has a pending_cycles value strictly smaller than the previous one.
In other words, it would suffice to rewrite this function to just peek at the end of the queue and return its pending_cycles value.
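Roughly like this, as a sketch under the assumptions above (the Writeback type and its pending_cycles field are placeholders for whatever the queue actually holds):

#include <algorithm>
#include <cstdint>
#include <deque>

struct Writeback { uint64_t pending_cycles; /* ... */ };

// Sketch: if the queued writebacks drain in a pipelined fashion, draining
// all of them takes only as long as the largest remaining pending_cycles.
uint64_t drainCycles(std::deque<Writeback> &writeback_queue) {
    uint64_t max_cycles = 0;
    for (const Writeback &wb : writeback_queue)
        max_cycles = std::max(max_cycles, wb.pending_cycles);
    writeback_queue.clear();
    // With the ordering invariant above, the scan reduces to a single peek
    // at one end of the queue.
    return max_cycles;
}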

Collaborator

Got it. I was not aware that it does the writeback in parallel.

Collaborator Author

Now that I think about it, I am unsure. I'll mark this as a point of discussion for today's meeting.

@souravmohapatra
Collaborator

The clwb and fence look logical (bar the small comments) as of now, assuming the compiler emits the correct instructions for region formation. The normal cache implementation also looks good. Do we have compiler support as well (to run and test whether all the asserts work)?

@jochembroekhoff
Collaborator Author

No, I intend to start working on compiler support only in a few weeks, once most of the cache has been verified with manual examples.

This is my current assembly input (as a custom benchmark) that I used to test the very basics of my implementation:

#define RC_START_REGION auipc x31, 0
#define RC_EXPLICITLY_END_REGION auipc x31, 1
#define RC_CLWB auipc x31, 2
#define RC_FENCE auipc x31, 3
#define RC_POWER_FAILURE_NEXT auipc x31, 4

.data
x: .skip 4
y: .skip 4

.text

.global main
main:
    RC_START_REGION
    li t1, 1

    // End the current region before a (conditional) branch,
    // the branch target will start a new region
    RC_FENCE
    RC_EXPLICITLY_END_REGION

    j a

a:
    // This is a branch target, so it will start a new region
    RC_START_REGION

    // ...
    li a0, 1
    la t0, x
    sw a0, 0(t0)
    RC_CLWB

    // No need to end the region when jumping out of a branch target,
    // because the destination we jump to will already have a region boundary
    j end

b:
    // This is a branch target, so it will start a new region
    RC_START_REGION

    // ...
    li a0, 2
    la t0, y
    sw a0, 0(t0)
    RC_CLWB

    // We directly flow to the end of the conditional branch,
    // so we do not need to end the region here either

end:
    // Request a power failure for demonstration purposes
    RC_POWER_FAILURE_NEXT

    // Regular boundary
    RC_FENCE
    RC_START_REGION

    // ...
    addi a0, a0, 10

    // End the region before returning, as that is just a form of branching
    RC_FENCE
    RC_EXPLICITLY_END_REGION

    ret

if (req.size != 4) {
    // TODO: additional costs
    // TODO: in CacheMem.hpp, all 4 bytes are read, here we only read the remaining bytes.
    //       Is that behavior correct?
Collaborator

A cache always reads/writes in whole blocks, right? The exact location within that block is then indicated by the offset.

@iiKoe we always wrote 4 bytes, IIRC?
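Something like this is what I have in mind (line size assumed to be 4 bytes here, names made up for illustration):

#include <cstdint>

// Sketch (sizes and names assumed): how a byte address maps onto a cache line.
constexpr uint32_t BLOCK_SIZE = 4;                                     // bytes per line

uint32_t lineOffset(uint32_t addr) { return addr % BLOCK_SIZE; }       // byte within the line
uint32_t lineAddr(uint32_t addr)   { return addr - lineOffset(addr); } // aligned line address

// The cache moves whole lines of BLOCK_SIZE bytes to/from memory; the offset
// only selects which bytes within the line a request touches.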

Collaborator

We should discuss this in our meeting 😄

writeToCache(line);

// TODO: stats
if (stats) stats->incCacheWrites(line.blocks.size);
Collaborator

Consider moving this incWrites call to inside writeToCache(); then it becomes slightly less error-prone.
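I.e., something along these lines (signatures assumed, not this PR's code):

// Sketch of the suggested refactor: count the write where it happens.
void Cache::writeToCache(CacheLine &line) {
    // ... existing write logic ...
    if (stats)
        stats->incCacheWrites(line.blocks.size); // counted once, at the write itself
}

// Call sites then no longer need to remember the stats update:
writeToCache(line);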

Collaborator Author

I can do that, but then it is not consistent with the MEM_READ part of this handleHit function.

@jochembroekhoff changed the title from Replay Cache to ReplayCache on Mar 21, 2023