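How Large Models Avoid the “Repeater” (Degenerate Repetition) Problem When Generating Replies

Preface

Before reading this article, it helps to be familiar with the commonly used PyTorch operators. The core ones and what they do are listed below: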

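torch.where(condition, x, y): selects tensor elements according to a condition.
- condition: the boolean mask.
- x: the value chosen where the condition is True.
- y: the value chosen where the condition is False.
Tensor.scatter_(dim, index, src): writes values from the source tensor into the output tensor at the given indices.
torch.gather(input, dim, index): the inverse of scatter; reads out the values at the given indices.
torch.sort(input, dim=-1, descending=False): sorts a tensor along a dimension and returns the sorted values together with their indices.
torch.softmax(input, dim=None): turns scores into a softmax probability distribution.
torch.cumsum(input, dim, dtype=None): cumulative sum along a dimension.
Tensor.masked_fill(mask, value): fills the positions where the mask is True with the given value.
torch.topk(input, k, dim=None): returns the k largest values and their indices.
torch.multinomial(input, num_samples, replacement=False): draws samples according to a probability distribution.
torch.div(input, other, rounding_mode=None): element-wise division.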
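As a quick orientation, here is a minimal sketch that exercises several of these operators on a hand-made score tensor. The values and the names `logits`, `seen`, `adjusted` and `filtered` are made up for illustration and do not come from any library source.

```python
import torch

# Made-up scores for a batch of 1 over a 5-token vocabulary.
logits = torch.tensor([[2.0, -1.0, 0.5, 3.0, 0.0]])
# Hypothetical ids of tokens that have already been generated.
seen = torch.tensor([[3, 1]])

# gather: read the scores stored at the indices in `seen`.
picked = torch.gather(logits, 1, seen)                      # [[3.0, -1.0]]

# where: halve positive scores, double negative ones.
adjusted = torch.where(picked > 0, picked / 2, picked * 2)  # [[1.5, -2.0]]

# scatter_: write the adjusted scores back at the same indices (on a copy).
updated = logits.clone()
updated.scatter_(1, seen, adjusted)           # [[2.0, -2.0, 0.5, 1.5, 0.0]]

# topk + masked_fill: keep the 2 largest scores, push the rest to -inf.
top_values, top_indices = torch.topk(updated, k=2, dim=-1)
threshold = top_values[..., -1, None]         # smallest score that is kept
filtered = updated.masked_fill(updated < threshold, float("-inf"))

# softmax, sort and cumsum: probabilities and their running total.
probs = torch.softmax(filtered, dim=-1)
sorted_probs, sorted_idx = torch.sort(probs, dim=-1, descending=True)
cumulative = torch.cumsum(sorted_probs, dim=-1)
```

After the masking step only tokens 0 and 3 keep a non-zero probability, and the last entry of `cumulative` is 1.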

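Overview of Generation Strategies

Large models in the Hugging Face transformers library typically inherit from PreTrainedModel and call the generate method of GenerationMixin at inference time. A model produces its answer mainly through the following search and sampling methods:

Contrastive Search

Contrastive Search is an improved decoding strategy that aims to balance fluency and diversity. It avoids repetition by comparing each candidate token against the preceding context and penalizing candidates that are too similar to it. The implementation details are omitted here.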
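To make the idea concrete, here is a minimal sketch of the scoring rule as described in the contrastive search paper: each of the top-k candidates is scored by its model probability minus a penalty equal to its highest cosine similarity with any previous token's hidden state. The function name `contrastive_pick`, its arguments and the toy vectors are assumptions for illustration; this is not the transformers implementation.

```python
import torch
import torch.nn.functional as F


def contrastive_pick(candidate_probs, candidate_hidden, context_hidden, alpha=0.6):
    """Pick one of k candidate tokens (hypothetical helper).

    candidate_probs:  (k,)   model probability of each candidate
    candidate_hidden: (k, d) hidden state each candidate would produce
    context_hidden:   (t, d) hidden states of the tokens generated so far
    """
    # Cosine similarity between every candidate and every context token: (k, t).
    sim = F.cosine_similarity(
        candidate_hidden.unsqueeze(1), context_hidden.unsqueeze(0), dim=-1
    )
    # Degeneration penalty: similarity to the most similar previous token.
    penalty = sim.max(dim=-1).values
    # Trade off model confidence against similarity to the context.
    score = (1.0 - alpha) * candidate_probs - alpha * penalty
    return int(torch.argmax(score)), score


# Toy data: candidate 0 is more probable but identical to the context,
# candidate 1 is less probable but orthogonal to it.
cand_probs = torch.tensor([0.6, 0.4])
cand_hidden = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
ctx_hidden = torch.tensor([[1.0, 0.0]])

best, scores = contrastive_pick(cand_probs, cand_hidden, ctx_hidden, alpha=0.6)
greedy_best, _ = contrastive_pick(cand_probs, cand_hidden, ctx_hidden, alpha=0.0)
```

With `alpha=0.6` the less probable but dissimilar candidate wins, while `alpha=0.0` reduces the rule to picking the most probable candidate.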

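Multinomial Sampling

Unlike greedy search, which always picks the highest-probability token as the next token, Multinomial Sampling draws the next token at random from the probability distribution the model assigns over the whole vocabulary. This adds randomness to generation and helps the model break out of repetition loops.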
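The difference is easy to see on a toy distribution. The sketch below uses a made-up three-token distribution (`probs` is not model output) and compares repeated greedy picks with repeated multinomial draws.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable

# Made-up next-token distribution over a 3-token vocabulary.
probs = torch.tensor([0.5, 0.3, 0.2])

# Greedy search: the same distribution always yields the same token.
greedy = [int(torch.argmax(probs)) for _ in range(5)]

# Multinomial sampling: tokens are drawn in proportion to their probability.
sampled = torch.multinomial(probs, num_samples=1000, replacement=True)
counts = torch.bincount(sampled, minlength=3)
```

Greedy search returns token 0 every time, whereas the sampled counts spread across all three tokens roughly in proportion to `probs`.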

Setting do_sample to True is all it takes to enable it:

outputs = model.generate(**inputs, do_sample=True, num_beams=1, max_new_tokens=100)

The core of the sampling loop in the source looks like this (excerpt):

while True:
    # forward pass to get next token
    outputs = self(
        **model_inputs,
        return_dict=True,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
    )

    next_token_logits = outputs.logits[:, -1, :]

    next_token_scores = logits_processor(input_ids, next_token_logits)
    next_token_scores = logits_warper(input_ids, next_token_scores)

    # turn the processed scores into probabilities and draw one token per sequence
    probs = nn.functional.softmax(next_token_scores, dim=-1)
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
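The excerpt above depends on the surrounding library code, so here is a self-contained sketch of the same sampling step that can be run on its own. Everything model-related is a stand-in: `toy_logits` and `VOCAB_SIZE` are hypothetical, and a plain temperature division is assumed in place of the real logits processors and warpers.

```python
import torch
from torch import nn

VOCAB_SIZE = 6  # hypothetical vocabulary size


def toy_logits(input_ids):
    """Hypothetical stand-in for a forward pass: favors repeating the last token."""
    logits = torch.zeros(input_ids.shape[0], VOCAB_SIZE)
    logits.scatter_(1, input_ids[:, -1:], 2.0)
    return logits


def sample(input_ids, max_new_tokens=8, temperature=1.0):
    """Append max_new_tokens sampled tokens to each sequence in the batch."""
    for _ in range(max_new_tokens):
        next_token_logits = toy_logits(input_ids)
        # Assumed warper: temperature scaling only.
        next_token_scores = torch.div(next_token_logits, temperature)
        probs = nn.functional.softmax(next_token_scores, dim=-1)
        # One draw per sequence, then drop the sample dimension.
        next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
        input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
    return input_ids


torch.manual_seed(0)
prompt = torch.tensor([[1], [2]])
out = sample(prompt, max_new_tokens=8)
```

Even though the toy model favors repeating the last token, sampling leaves room for other tokens to be drawn, which is the property the article relies on for breaking loops.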
