MindSpore (昇思) 25-Day Learning Check-in Camp, Day 17 | Text Decoding Principles, with MindNLP as an Example

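Text decoding means repeatedly predicting the next word from the content produced so far. The model is first trained on a large amount of text. Once it has learned from that data, it predicts the next word, appends it to the text, and predicts again. It is a bit like teaching a parrot: if you keep saying "Hello, handsome guy" to it, one day when you say "Hello" it will naturally continue with "handsome guy". Text decoding works the same way.
The difference is scale. The model has seen far more text, so besides "Hello, handsome guy" it has also learned "Hello, beautiful lady". How does it decide which one to say? It looks at the preceding context. If, in the articles we fed it, the word "woman" usually appears together with "beautiful lady", then when "woman" appears earlier and "Hello" follows, the model assigns "beautiful lady" a higher probability than "handsome guy" and therefore prefers "beautiful lady". More precisely, at each step the model outputs a probability distribution over its whole vocabulary, conditioned on the preceding text, and the decoding strategy decides which token to pick from that distribution.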
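To make the idea concrete, here is a minimal Python sketch. It rests on a simplifying assumption: instead of a neural network like GPT-2, it uses plain word counts (a trigram model) over a hypothetical toy corpus. The names CORPUS, train_trigram, next_word_probs and greedy_decode are made up for this illustration. It shows that the same word "hello" leads to different most-likely continuations depending on the word before it, and that decoding is simply choosing a word, appending it, and repeating.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is a list of words.
CORPUS = [
    ["man", "hello", "handsome", "guy"],
    ["man", "hello", "handsome", "guy"],
    ["woman", "hello", "beautiful", "lady"],
    ["woman", "hello", "beautiful", "lady"],
    ["woman", "hello", "handsome", "guy"],
]

def train_trigram(corpus):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for i in range(2, len(sentence)):
            counts[(sentence[i - 2], sentence[i - 1])][sentence[i]] += 1
    return counts

def next_word_probs(counts, context):
    """Turn the counts for the last two words of `context` into probabilities."""
    counter = counts.get(tuple(context[-2:]), Counter())
    total = sum(counter.values())
    return {word: c / total for word, c in counter.items()}

def greedy_decode(counts, prompt, max_new_words=5):
    """Repeatedly append the single most probable next word (greedy decoding)."""
    words = list(prompt)
    for _ in range(max_new_words):
        key = tuple(words[-2:])
        if key not in counts:  # this context never appeared in training: stop
            break
        words.append(counts[key].most_common(1)[0][0])
    return words

counts = train_trigram(CORPUS)
print(next_word_probs(counts, ["woman", "hello"]))  # "beautiful" 2/3, "handsome" 1/3
print(greedy_decode(counts, ["woman", "hello"]))    # ['woman', 'hello', 'beautiful', 'lady']
print(greedy_decode(counts, ["man", "hello"]))      # ['man', 'hello', 'handsome', 'guy']
```

Always taking the single most likely word, as greedy_decode does, is called greedy decoding. It returns the same continuation every time for the same prompt. The MindNLP example in this post instead sets do_sample=True, so repeated samples can produce different continuations.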

import mindspore
from mindnlp.transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("iiBcai/gpt2", mirror='modelscope')

# add the EOS token as PAD token to avoid warnings
model = GPT2LMHeadModel.from_pretrained("iiBcai/gpt2", pad_token_id=tokenizer.eos_token_id, mirror='modelscope')

# encode context the generation is conditioned on
input_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='ms')

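# fix the random seed so the sampled outputs are reproducible
mindspore.set_seed(0)
# set top_k = 5, top_p = 0.95 and num_return_sequences = 3
sample_outputs = model.generate(
    input_ids,
    do_sample=True,         # sample instead of always taking the most likely token
    max_length=50,          # total length in tokens, prompt included
    top_k=5,                # only consider the 5 most likely next tokens
    top_p=0.95,             # then keep the smallest subset whose probabilities sum to >= 0.95
    num_return_sequences=3  # return 3 independent samples
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))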

Output:
----------------------------------------------------------------------------------------------------
0: I enjoy walking with my cute dog.

"My dog loves the smell of the dog. I'm so happy that she's happy with me.

"I love to walk with my dog. I'm so happy that she's happy
1: I enjoy walking with my cute dog. I'm a big fan of my cat and her dog, but I don't have the same enthusiasm for her. It's hard not to like her because it is my dog.

My husband, who
2: I enjoy walking with my cute dog, but I'm also not sure I would want my dog to walk alone with me."

She also told The Daily Beast that the dog is very protective.

"I think she's very protective of

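As this example shows, given the prompt "I enjoy walking with my cute dog", the model keeps writing a continuation. Each of the three sampled continuations stops mid-sentence because generation ends once max_length=50 tokens (prompt included) is reached. Overall, the results look quite good.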
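The generate call sets do_sample=True, top_k=5 and top_p=0.95. Below is a minimal pure-Python sketch of what that kind of filtering does to a single next-token distribution. It is an illustration, not MindNLP's implementation. The logits and tokens are hypothetical. Applying top-k before top-p, with renormalization in between, is an assumption modelled on how Hugging Face transformers chains its sampling filters, which MindNLP's transformers API is designed to resemble.

```python
import math
import random

def softmax(logits):
    """Convert a dict of token -> logit into token -> probability."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens and renormalize. Then keep the smallest
    prefix of them whose cumulative probability reaches top_p (nucleus)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)
    ranked = [(t, p / k_total) for t, p in ranked]  # renormalize after top-k
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:  # the nucleus is complete
            break
    total = sum(p for _, p in kept)
    return {t: p / total for t, p in kept}  # renormalize the survivors

def sample_next(logits, top_k=5, top_p=0.95, rng=random):
    """Sample one token from the filtered, renormalized distribution."""
    filtered = top_k_top_p_filter(softmax(logits), top_k, top_p)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]

# Hypothetical next-token logits after "I enjoy walking with my cute dog"
logits = {".": 3.0, ",": 2.5, "and": 2.0, "every": 1.0, "in": 0.5, "banana": -2.0}
rng = random.Random(0)  # seeded so the samples are reproducible
print(top_k_top_p_filter(softmax(logits), top_k=5, top_p=0.95))
print([sample_next(logits, rng=rng) for _ in range(5)])
```

With these hypothetical logits, top_k=5 drops "banana". Then top_p=0.95 also drops "in", because the first four tokens already cover about 96% of the renormalized probability. Sampling from the surviving tokens is what lets the three outputs above differ from one another while still avoiding very unlikely words.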
