Description
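In the paper, every cluster starts with a uniform sampling weight of 1/k. In each subsequent round, each cluster's sampling weight is the previous round's weight reweighted by the softmax of reward_diff.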
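Written out, the update described in the paper reads roughly as follows. Here $\Delta r_i^{(t)}$ is cluster $i$'s mean `reward_diff` in round $t$ and $k$ is the number of clusters. This formulation assumes the product is renormalized so the weights sum to 1; the description above does not state that step explicitly:

$$
s_i^{(t)} = \frac{\exp\big(\Delta r_i^{(t)}\big)}{\sum_{j=1}^{k} \exp\big(\Delta r_j^{(t)}\big)}, \qquad
w_i^{(0)} = \frac{1}{k}, \qquad
w_i^{(t+1)} = \frac{w_i^{(t)}\, s_i^{(t)}}{\sum_{j=1}^{k} w_j^{(t)}\, s_j^{(t)}}
$$

Unrolling the recursion from the uniform start gives

$$
w_i^{(T)} = \frac{\exp\big(\sum_{t=0}^{T-1} \Delta r_i^{(t)}\big)}{\sum_{j=1}^{k} \exp\big(\sum_{t=0}^{T-1} \Delta r_j^{(t)}\big)}
$$

Under this reading, after the first round $w^{(1)} = s^{(0)}$, which is the same as using the softmax of the latest `reward_diff` directly. From the second round on, the weights depend on the cumulative `reward_diff` rather than only the most recent round.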
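However, the implementation in iter.py appears to use the softmax of reward_diff directly as the next round's sampling weights. Which of the two was used to produce the results reported in the paper?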
```python
import numpy as np
from collections import Counter

def select_new_iter(rewards_gathered, dataset, indices_path, calculate_method="exp_reward_diff"):
    # ... (omitted)
    merged_df = subset_df.merge(rewards_df, left_index=True, right_index=True)
    merged_df = merged_df.groupby('cluster')['reward_diff'].mean().reset_index()
    if calculate_method == "ppl":
        merged_df['exp_reward_diff'] = merged_df['reward_diff']
    else:
        merged_df['exp_reward_diff'] = np.exp(merged_df['reward_diff'])
    merged_df['exp_reward_diff'] = merged_df['exp_reward_diff'] / merged_df['exp_reward_diff'].sum()
    size = (len(dataset) * portion) / K / round
    exp_reward_diff = merged_df['exp_reward_diff']
    # The line below uses the softmax directly as the new round's sampling weights,
    # without multiplying by the previous round's weights. Isn't this different from the paper?
    select_new_iter = np.random.choice(K, size=int(size), p=exp_reward_diff, replace=True)
    selected_clusters_size = Counter(select_new_iter)
    # ... (omitted)
```
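The following numerical sketch compares the two rules. It assumes the renormalized reading of the paper's update. The helper names `softmax`, `code_rule` and `paper_rule`, and the two rounds of per-cluster `reward_diff` values, are hypothetical and chosen only for illustration. This is not code from the repository.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over per-cluster mean reward_diff values."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def code_rule(reward_diff: np.ndarray) -> np.ndarray:
    """What iter.py appears to do: next-round weights = softmax(reward_diff)."""
    return softmax(reward_diff)


def paper_rule(prev_weights: np.ndarray, reward_diff: np.ndarray) -> np.ndarray:
    """The paper's rule as described in this issue: reweight the previous
    weights by softmax(reward_diff). Renormalizing is an assumption here."""
    w = prev_weights * softmax(reward_diff)
    return w / w.sum()


# Hypothetical per-cluster mean reward_diff for two rounds (K = 3 clusters).
K = 3
rounds = [np.array([0.5, 0.0, -0.5]), np.array([-0.5, 0.0, 0.5])]

w_paper = np.full(K, 1.0 / K)  # uniform initial weights 1/K
for t, rd in enumerate(rounds):
    w_paper = paper_rule(w_paper, rd)  # carries over the previous round's weights
    w_code = code_rule(rd)             # uses only the current round's reward_diff
    print(f"round {t + 1}: paper={np.round(w_paper, 3)} code={np.round(w_code, 3)}")
```

In round 1 both rules give identical weights. In round 2 the `reward_diff` values are reversed, so every cluster's cumulative `reward_diff` is zero. The paper's rule therefore returns to uniform weights (1/3 each). The code's rule follows round 2 alone and favors the last cluster.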