Summarization pipeline breaks after saving a quantized model to disk

Problem description

I'm building an application that uses the sshleifer/distilbart-cnn-12-6 Hugging Face model to generate text summaries. The model's size is a problem for my application, so I decided to shrink it a bit with PyTorch's dynamic quantization. After applying quantization I ran into some strange behavior: the model breaks once I save it and reload it from disk. Am I doing something wrong, or is this a bug in transformers? I should add that I did not see this problem with another model (nateraw/bert-base-uncased-emotion).

>>> import torch  # version 1.8.1+cpu
>>> from transformers import pipeline  # version 4.5.1
>>> sample_text = """
The Queen has conducted her first in-person royal duty since her husband, the
Duke of Edinburgh, died on Friday. The monarch hosted a ceremony in which
the Earl Peel formally stood down as Lord Chamberlain, whose office organises
royal ceremonies. During a private event held at Windsor Castle, the Queen
accepted her former royal aide's wand and office insignia. The Royal Family is
observing two weeks of mourning. The duke's funeral will take place at Windsor
on Saturday. A royal official said members of the family would continue "to
undertake engagements appropriate to the circumstances".
"""
>>> p = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
>>> quantized_model = torch.quantization.quantize_dynamic(
...     p.model, {torch.nn.Linear}, dtype=torch.qint8
... )
>>> quantized_pipeline = pipeline("summarization", model=quantized_model, tokenizer=p.tokenizer)
>>> quantized_pipeline(sample_text)
[{'summary_text': ' The Queen has been in the spotlight for the first time in the past week . During a private event at Windsor Castle,the Queen was at the centre of the royal family . The Queen is the Queen,and the Queen has a lot to do well,but the Queen is in the news this week .'}]
>>> quantized_pipeline.save_pretrained("/tmp/my_model")
>>> loaded_pipeline = pipeline("summarization", model="/tmp/my_model")
>>> loaded_pipeline(sample_text)
[{'summary_text': ' high high highhighhighhigh high high High High HighHighHighHigh high high highest highest highest lowest lowest lowest highest highesthighesthighesthighest highest highest Highest Highest Highesthighesthighest lowest lowest lows lows highs highs highs lows lows lows low low lowlowlowlow low lowLowLowLow low low lowest lowest safest safest safest safe safe safesafesafesafe safe safe safest safest safer safer safer safe safe secure secure secure secured secured secured secure securesecuresecuresecure secure secureSecureSecureSecure secure secure securing securing securing secured secured securing securing secure secure Secure Secure Secure secure secure obtain obtain obtain obtained obtained obtained obtain obtain obtaining obtaining obtaining obtain obtain attain attain attain obtain obtainGetGetGet Get Get GetGetGet get get get Get'}]

The summary generated after quantization is not great, but it at least has something in common with the input text. After saving the model and reloading it from disk, the output is pure garbage.
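For context, the size saving that motivates the quantization step can be checked on a small stand-in model (the layer sizes below are arbitrary illustrations, not the BART architecture):

```python
import io

import torch
import torch.nn as nn

# Small stand-in model; dynamic quantization replaces its nn.Linear
# layers with int8-weight counterparts.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
)
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def serialized_size(m: nn.Module) -> int:
    """Bytes taken by the module's state dict when serialized with torch.save."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes

print(serialized_size(model), serialized_size(quantized))
# int8 weights should make the quantized state dict roughly 4x smaller
```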

Loading the model from disk also prints a warning (no such message appears when quantized_pipeline is instantiated):

Some weights of the model checkpoint at /tmp/my_model were not used when initializing BartForConditionalGeneration: ['model.encoder.layers.0.self_attn.k_proj.scale','model.encoder.layers.0.self_attn.k_proj.zero_point','model.encoder.layers.0.self_attn.k_proj._packed_params.dtype','model.encoder.layers.0.self_attn.k_proj._packed_params._packed_params','model.encoder.layers.0.self_attn.v_proj.scale','model.encoder.layers.0.self_attn.v_proj.zero_point','model.encoder.layers.0.self_attn.v_proj._packed_params.dtype','model.encoder.layers.0.self_attn.v_proj._packed_params._packed_params','model.encoder.layers.0.self_attn.q_proj.scale','model.encoder.layers.0.self_attn.q_proj.zero_point','model.encoder.layers.0.self_attn.q_proj._packed_params.dtype','model.encoder.layers.0.self_attn.q_proj._packed_params._packed_params','model.encoder.layers.0.self_attn.out_proj.scale','model.encoder.layers.0.self_attn.out_proj.zero_point','model.encoder.layers.0.self_attn.out_proj._packed_params.dtype','model.encoder.layers.0.self_attn.out_proj._packed_params._packed_params','model.encoder.layers.0.fc1.scale','model.encoder.layers.0.fc1.zero_point','model.encoder.layers.0.fc1._packed_params.dtype','model.encoder.layers.0.fc1._packed_params._packed_params','model.encoder.layers.0.fc2.scale','model.encoder.layers.0.fc2.zero_point','model.encoder.layers.0.fc2._packed_params.dtype','model.encoder.layers.0.fc2._packed_params._packed_params',...]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
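The warning hints at what is likely going wrong: `from_pretrained` rebuilds a plain float model that expects `weight`/`bias` entries, while the saved quantized checkpoint instead contains `scale`, `zero_point`, and `_packed_params` entries. Those keys are ignored, so the Linear layers keep their fresh random initialization, which would explain the garbage output. The key mismatch can be reproduced on a tiny stand-in model with torch alone:

```python
import torch
import torch.nn as nn

# Compare the state-dict keys of a float Linear with those of its
# dynamically quantized counterpart.
float_model = nn.Sequential(nn.Linear(4, 4))
quantized = torch.quantization.quantize_dynamic(
    nn.Sequential(nn.Linear(4, 4)), {nn.Linear}, dtype=torch.qint8
)

print(sorted(float_model.state_dict()))
# ['0.bias', '0.weight']
print(sorted(quantized.state_dict()))
# ['0._packed_params._packed_params', '0._packed_params.dtype',
#  '0.scale', '0.zero_point']
```

These are exactly the key patterns listed in the warning above (`...scale`, `...zero_point`, `..._packed_params.dtype`, `..._packed_params._packed_params`).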

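A workaround commonly used for dynamically quantized PyTorch models is to bypass `save_pretrained` entirely: serialize the state dict with `torch.save`, and on load re-create the model, apply the same `quantize_dynamic` call, and only then call `load_state_dict`. A minimal sketch on a stand-in model (with the real pipeline you would rebuild the model via `AutoModelForSeq2SeqLM.from_pretrained` before quantizing, then pass it to `pipeline(..., model=..., tokenizer=...)`; the names below are illustrative):

```python
import torch
import torch.nn as nn

def make_model(seed: int) -> nn.Module:
    # Stand-in for rebuilding the float model; the seed only makes the
    # sketch deterministic and gives the two copies different weights.
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))

quantized = torch.quantization.quantize_dynamic(
    make_model(0), {nn.Linear}, dtype=torch.qint8
)

# Save the quantized weights with torch.save instead of save_pretrained.
torch.save(quantized.state_dict(), "/tmp/quantized_state.pt")

# To reload: rebuild the float model, apply the SAME quantize_dynamic
# call, and only then load the saved state dict into it.
reloaded = torch.quantization.quantize_dynamic(
    make_model(1), {nn.Linear}, dtype=torch.qint8
)
state = torch.load("/tmp/quantized_state.pt", weights_only=False)  # kwarg needs torch >= 1.13
reloaded.load_state_dict(state)

x = torch.randn(1, 8)
assert torch.equal(quantized(x), reloaded(x))
```

Alternatively, `torch.save(quantized_model, path)` pickles the whole module object, at the cost of tying the checkpoint to the exact class definitions available at load time.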