- 
                Notifications
    You must be signed in to change notification settings 
- Fork 47
Revise Save/Restore for true pit snapshot. #401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0a657fd    to
    450e3b3      
    Compare
  
    Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
7b0fe82    to
    923c3d4      
    Compare
  
    Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
| absl::StatusOr<T> LoadObject() { | ||
| VMSDK_ASSIGN_OR_RETURN(auto buffer, LoadChunk()); | ||
| if (buffer->size() != sizeof(T)) { | ||
| return absl::InternalError("Mismatched size protocol error"); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider asserting using CHECK or DCHECK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think crashing is the correct response to an invalid RDB file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to use DCHECK such that we'll get a clear signal for such in the lab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I can support that. I thought you were complaining about the enable_if ugliness. Done.
        
          
                src/index_schema.cc
              
                Outdated
          
        
      | if (index_schema) { | ||
| VMSDK_RETURN_IF_ERROR(index_schema->LoadIndexExtension( | ||
| ctx, RDBChunkInputStream(supplemental_iter.IterateChunks()))); | ||
| bool backfilling = | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: there is no point in defining a variable if used only once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which variable? backfilling is used in the next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right but my point is that it is used just by the next line. I'm suggesting to change the next line to:
if (!supplemental_content->mutation_queue_header().backfilling()) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect I was printing it earlier. But I changed it as suggested.
| break; | ||
| } | ||
| } | ||
| size_t oracle_key_count = | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we break if non-vector index if found?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that exactly what this does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oracle_index would point to the first vector index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in latest update. No validation when there aren't any non-vector fields. tests were extended to include vector-only, non-vector-only and mixed indexes.
| I might had missed it, but I haven't seen that the queued multi/exec are being serialized as well. | 
| 
 Good catch. | 
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Signed-off-by: Allen Samuels <[email protected]>
Revises the save/restore logic to support saving of all index data and pending mutation queue as discussed earlier.
This new logic is fully backward and forward compatible. RDB files written using the previous binaries (aka V1) will still be loaded and subject to the same stale data correction and forced backfill logic as existed before this PR. In other words, for RDB files written with the old code, the new code handles them exactly the same.
New format files (aka V2) contain additional data that will be ignored by old code. In other words, V2 RDB files can still be loaded into prior code which will ignore the extra data, i.e., they are treated like V1 files and be subjected to stale data correction and forced backfill.
V2 files which are loaded into the new code are not subjected to stale data correction and forced backfill. Once a V2 file is loaded it can "open for business" immediately.
The V2 format contains an additional section for the contents of non-vector indexes (Tags, Numeric). These sections contain the key and attribute-value for each entry in the index. This external format differs from the internal format, allowing the internal data structures to be freely optimized in the future.
The V2 format also has a section for the mutation queue and index backfill status. This additional section records the keys that were in the mutation queue at the time of the save as well as the index backfill status (complete or incomplete). This allows the recreation of the mutation queue and backfill status (backfill needed or not) to match the state at the time of the save. (One small quirk is that the saved mutation queue doesn't bother to record whether the saved keys are in the mutation queue due to a backfill or not. This behavior should not affect the client-visible semantics, but would affect internal counters for the ingestion pipeline as they would be mislead by the source of the mutation.
Two controls are provided to override the normal behavior of this code.
The configurable:
search.rdb_write_v2(default "yes") controls whether a generated RDB file is in the V1 or V2 format.The configurable:
search.rdb_read_v2(default "yes") controls whether the V2 data is used. Setting this to "no" will force the code to treat a V2 file as a V1 file, i.e., to ignore the extra V1 data.FIxes #41