Does the implementation of the sampling-weight update for each new round under KMQ-Iterative differ from the paper? #10

@Aurelius84

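In the paper, every cluster starts with the same sampling weight, 1/k. In each later round, a cluster's sampling weight is obtained by weighting its previous-round weight by the softmax of reward_diff.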
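
To make sure I am reading the paper correctly, here is the update as I understand it. The notation ($w_i^{(t)}$ for the weight of cluster $i$ in round $t$, $\Delta r_i^{(t)}$ for its mean reward_diff) and the renormalization over clusters are my own assumptions, not copied from the paper:

```math
w_i^{(0)} = \frac{1}{k}, \qquad
w_i^{(t+1)} = \frac{w_i^{(t)} \, s_i^{(t)}}{\sum_{j=1}^{k} w_j^{(t)} \, s_j^{(t)}}, \qquad
s_i^{(t)} = \frac{\exp\left(\Delta r_i^{(t)}\right)}{\sum_{j=1}^{k} \exp\left(\Delta r_j^{(t)}\right)}
```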

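But judging from the implementation in iter.py, the softmax of reward_diff is used directly as the next round's sampling weight. Which of the two approaches produced the results reported in the paper?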

def select_new_iter(rewards_gathered, dataset, indices_path, calculate_method="exp_reward_diff"):
    # ... (omitted)
    merged_df = subset_df.merge(rewards_df, left_index=True, right_index=True)
    merged_df = merged_df.groupby('cluster')['reward_diff'].mean().reset_index()
    if calculate_method == "ppl":
        merged_df['exp_reward_diff'] = merged_df['reward_diff']
    else:
        merged_df['exp_reward_diff'] = np.exp(merged_df['reward_diff'])
        merged_df['exp_reward_diff'] = merged_df['exp_reward_diff'] / merged_df['exp_reward_diff'].sum()
    size = (len(dataset) * portion) / K / round
    exp_reward_diff = merged_df['exp_reward_diff']
    
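    # The line below uses the softmax directly as the new round's sampling weight,
    # without multiplying by the previous round's weight. Doesn't this differ from the paper?
    select_new_iter = np.random.choice(K, size=int(size), p=exp_reward_diff, replace=True)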
    selected_clusters_size = Counter(select_new_iter)
    # ... (omitted)
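
To illustrate the difference I am asking about, here is a minimal standalone sketch of the two update rules. It is not code from the repository: the function names and the toy reward_diff values are hypothetical, and renormalizing the multiplicative update so it sums to 1 is my assumption about what the paper intends.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()


def update_weights_multiplicative(prev_weights, reward_diff):
    """My reading of the paper: scale the previous weights by softmax(reward_diff).

    The renormalization to sum 1 is an assumption, added so the result
    can be passed as `p` to np.random.choice.
    """
    w = np.asarray(prev_weights, dtype=float) * softmax(reward_diff)
    return w / w.sum()


def update_weights_softmax_only(reward_diff):
    """What the quoted iter.py snippet appears to do: ignore the previous weights."""
    return softmax(reward_diff)


K = 3
weights = np.full(K, 1.0 / K)  # uniform initial weights, 1/k each

# Hypothetical per-cluster mean reward_diff values for two rounds.
reward_diff_round1 = np.array([0.2, 0.0, -0.1])
reward_diff_round2 = np.array([0.0, 0.3, -0.2])

paper_r1 = update_weights_multiplicative(weights, reward_diff_round1)
code_r1 = update_weights_softmax_only(reward_diff_round1)

paper_r2 = update_weights_multiplicative(paper_r1, reward_diff_round2)
code_r2 = update_weights_softmax_only(reward_diff_round2)

print("round 1:", paper_r1, code_r1)
print("round 2:", paper_r2, code_r2)
```

Under these assumptions the two rules agree in the first round, because the initial weights are uniform, and diverge from the second round onward: the multiplicative rule accumulates the history of reward_diff (it equals the softmax of the summed reward_diff across rounds), while the softmax-only rule depends on the latest round alone.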
