Closed
Changes from 1 commit
73 changes: 73 additions & 0 deletions vmsdk/src/cluster_map.h
@@ -0,0 +1,73 @@
/*
* Copyright (c) 2025, valkey-search contributors
* All rights reserved.
* SPDX-License-Identifier: BSD 3-Clause
*
*/

#ifndef VMSDK_SRC_CLUSTER_MAP_H_
#define VMSDK_SRC_CLUSTER_MAP_H_

#include <bitset>
#include <string>
#include <vector>

#include "src/query/fanout_template.h"
#include "src/valkeymodule.h"

namespace vmsdk {

struct ShardInfo {
  // shard_id is the primary node id
  std::string shard_id;
  std::string primary_address;
  std::vector<std::string> replica_addresses;
  std::vector<uint16_t> owned_slots;
Member

I believe an ordered set would be more useful than a vector. When you fingerprint, it will eliminate any order dependency.

Collaborator Author

Should this be a std::unordered_set or a std::set? Does the order matter here?
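
To make the ordering question concrete, here is a minimal sketch of an order-independent fingerprint. It is not the PR's implementation: `FingerprintSlots`, `FingerprintSlotVector`, and the FNV-1a style mixing are assumptions chosen only for illustration. The point is that `std::set` iterates in ascending key order, so the hash does not depend on insertion order, whereas `std::unordered_set` iterates in an unspecified order and would need sorting or a commutative combine step before hashing.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper (not from the PR): hashes slots in ascending order
// using FNV-1a style mixing over the two bytes of each slot number.
inline std::uint64_t FingerprintSlots(const std::set<std::uint16_t>& slots) {
  std::uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  const std::uint64_t kPrime = 1099511628211ULL;  // FNV-1a 64-bit prime
  for (std::uint16_t slot : slots) {  // std::set iterates in sorted order
    hash ^= static_cast<std::uint64_t>(slot & 0xFF);
    hash *= kPrime;
    hash ^= static_cast<std::uint64_t>(slot >> 8);
    hash *= kPrime;
  }
  return hash;
}

// Hypothetical helper: normalizes a vector (any order, duplicates allowed)
// by copying it into a set before hashing.
inline std::uint64_t FingerprintSlotVector(
    const std::vector<std::uint16_t>& slots) {
  return FingerprintSlots(
      std::set<std::uint16_t>(slots.begin(), slots.end()));
}
```

With this shape, two shards that report the same slots in a different order produce the same fingerprint.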

  // Hash of owned_slots vector
  uint64_t slots_fingerprint;
};

class ClusterMap {
  // flexible to add other getter methods
 public:
  // create a new cluster map in the background
  static std::shared_ptr<ClusterMap> CreateNewClusterMap(ValkeyModuleCtx* ctx);

  // slot ownership checks
  bool IsSlotOwned(uint16_t slot) const;

  // shard lookups
  const ShardInfo* GetShardById(const std::string& shard_id) const;
Member

How is shard-not-found signalled? Also, use string_view to pass in string parameters.

Collaborator Author

If the shard is not found, this function returns nullptr. Another option is to return std::optional.
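
The two options discussed in this thread can be compared side by side in a small sketch. Everything here is a simplified stand-in rather than the PR's code: `ShardTable`, `Add`, `FindShard`, and `FindShardCopy` are hypothetical names, and a `std::map` with `std::less<>` replaces `absl::flat_hash_map` so the example compiles with only the standard library while still accepting `std::string_view` keys without building a temporary `std::string`.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Simplified stand-in for the PR's ShardInfo.
struct ShardInfo {
  std::string shard_id;
  std::string primary_address;
};

// Hypothetical container used only to illustrate the lookup signatures.
class ShardTable {
 public:
  void Add(ShardInfo info) {
    std::string key = info.shard_id;  // copy the key before moving info
    shards_.insert_or_assign(std::move(key), std::move(info));
  }

  // Option A: nullptr signals "not found"; nothing is copied.
  const ShardInfo* FindShard(std::string_view shard_id) const {
    auto it = shards_.find(shard_id);
    return it == shards_.end() ? nullptr : &it->second;
  }

  // Option B: std::optional makes absence explicit in the type,
  // at the cost of copying the ShardInfo on a hit.
  std::optional<ShardInfo> FindShardCopy(std::string_view shard_id) const {
    auto it = shards_.find(shard_id);
    if (it == shards_.end()) return std::nullopt;
    return it->second;
  }

 private:
  // std::less<> enables heterogeneous lookup with std::string_view.
  std::map<std::string, ShardInfo, std::less<>> shards_;
};
```

The pointer form avoids a copy but leaves the null check to the caller's discipline. The optional form documents absence in the signature, which may matter more if `ShardInfo` stays small.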

  const std::string& GetShardIdBySlot(uint16_t slot) const;
  const absl::flat_hash_map<std::string, ShardInfo>& GetAllShards() const;

  // get cluster level slot fingerprint
  uint64_t GetClusterSlotsFingerprint() const;

  // get fingerprint for a specific shard
  uint64_t GetShardSlotsFingerprint(const std::string& shard_id) const;

 private:
  // 1: slot is owned by this cluster, 0: slot is not owned by this cluster
  std::bitset<16384> owned_slots_;

  // slot-to-shard lookup
  std::array<std::string, 16384> slot_to_shard_id_;

  absl::flat_hash_map<std::string, ShardInfo> shards_;

  // Cluster-level fingerprint (hash of all shard fingerprints)
  uint64_t cluster_slots_fingerprint_;

  // Pre-computed target lists
  std::vector<valkey_search::query::fanout::FanoutSearchTarget>
      primary_targets_;
  std::vector<valkey_search::query::fanout::FanoutSearchTarget>
      replica_targets_;
Member

I don't see any interfaces that get any of these. Also, in looking at FanoutSearchTarget, I'm wondering how that's different from some per-Node information structure which belongs to this class. I'd recommend moving that struct into this class and naming it something like NodeInfo (parallel to ShardInfo).

Collaborator Author

I think we can create a NodeInfo struct in this class. All of the current FanoutSearchTarget code could also move into the cluster map object, including the selection of nodes (kPrimary/kReplica/kRandom/kAll).
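
A rough sketch of what this proposal could look like follows. None of it is taken from the PR or from the existing `FanoutSearchTarget` code: the `NodeInfo` fields, `ShardNodes`, `TargetMode`, and `SelectTargets` are all hypothetical, and the selection semantics (every primary, every replica, or every node) are assumed for illustration. `kRandom` is left out here because it needs a source of randomness.

```cpp
#include <string>
#include <vector>

// Hypothetical per-node record, parallel to ShardInfo.
// Field names are assumptions, not the PR's definition.
struct NodeInfo {
  std::string node_id;
  std::string address;
  bool is_primary = false;
};

// Hypothetical grouping of the nodes that make up one shard.
struct ShardNodes {
  NodeInfo primary;
  std::vector<NodeInfo> replicas;
};

// Assumed subset of the selection modes named in the thread.
enum class TargetMode { kPrimary, kReplica, kAll };

// Builds the target list on demand instead of storing one
// pre-computed vector per mode.
inline std::vector<NodeInfo> SelectTargets(
    const std::vector<ShardNodes>& shards, TargetMode mode) {
  std::vector<NodeInfo> targets;
  for (const ShardNodes& shard : shards) {
    if (mode == TargetMode::kPrimary || mode == TargetMode::kAll) {
      targets.push_back(shard.primary);
    }
    if (mode == TargetMode::kReplica || mode == TargetMode::kAll) {
      targets.insert(targets.end(), shard.replicas.begin(),
                     shard.replicas.end());
    }
  }
  return targets;
}
```

A single accessor of this shape would also address the observation that no interface currently exposes the stored target vectors.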

  std::vector<valkey_search::query::fanout::FanoutSearchTarget> random_targets_;
Member

Does random_targets make sense here? How do we randomize from one call to the next?

Collaborator Author

I think kRandom would return the same set of nodes for a certain period. When CreateNewClusterMap is called to refresh the cluster map, another set of random nodes is selected.
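
For contrast with a list that is fixed until the next refresh, here is a minimal sketch of per-call randomization. It is an alternative for discussion, not the PR's behavior: `ShardNodes` and `PickRandomTargets` are hypothetical names, nodes are reduced to address strings, and choosing exactly one node per shard is an assumption about what kRandom should mean.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical shard layout: one primary address plus replica addresses.
struct ShardNodes {
  std::string primary;
  std::vector<std::string> replicas;
};

// Picks one node per shard, uniformly among the primary and its replicas.
// The caller owns the generator, so each call can yield a different set.
template <typename Rng>
std::vector<std::string> PickRandomTargets(
    const std::vector<ShardNodes>& shards, Rng& rng) {
  std::vector<std::string> targets;
  targets.reserve(shards.size());
  for (const ShardNodes& shard : shards) {
    // Index 0 is the primary; 1..N map to replicas[0..N-1].
    std::uniform_int_distribution<std::size_t> pick(0, shard.replicas.size());
    const std::size_t index = pick(rng);
    targets.push_back(index == 0 ? shard.primary : shard.replicas[index - 1]);
  }
  return targets;
}
```

Computing the selection per call costs one small vector per query. Caching it at refresh time, as described above, avoids that cost but pins traffic to the same nodes until the map is rebuilt.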

  std::vector<valkey_search::query::fanout::FanoutSearchTarget> all_targets_;
};

} // namespace vmsdk

#endif // VMSDK_SRC_CLUSTER_MAP_H_