Problem description
I load a RoBERTa model with TFRobertaModel.from_pretrained('roberta-base') and train it with Keras. I have additional layers on top of RoBERTa, and I need the bare RoBERTa to be initialized with all of its pretrained parameters. I run the code on Colab. For the past few weeks, loading RoBERTa produced the following warning, but everything worked and the model trained correctly even though the 'lm_head' weights were not loaded:
Some weights of the model checkpoint at roberta-base were not used when initializing ROBERTA: ['lm_head']
But now, presumably because the transformers version on Colab has changed, the same code produces a new warning. It indicates that many more encoder and bias weights were not loaded, and as a result the loss no longer decreases:
Some layers from the model checkpoint at roberta-base were not used when initializing ROBERTA: ['lm_head', 'encoder/layer_._3/attention/self/value/bias:0', 'encoder/layer_._10/attention/self/value/bias:0', 'encoder/layer_._10/attention/self/key/kernel:0', 'pooler/dense/bias:0', 'embeddings/position_embeddings/embeddings:0', ... (several hundred further entries, covering essentially every embedding, encoder, and pooler weight in the model)]
Can anyone help me with this: how do I load RoBERTa so that all of its weights are correctly initialized from the checkpoint?
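The post itself contains no fix, but one way to confirm whether the symptom is real (my own suggestion, not from the original post) is to compare the supposedly pretrained weights against a freshly initialized model of the same architecture; if they are close, loading silently failed:

```python
import numpy as np
from transformers import RobertaConfig, TFRobertaModel

# Model loaded from the checkpoint (the case under suspicion).
pretrained = TFRobertaModel.from_pretrained("roberta-base")

# Same architecture, randomly initialized, for comparison.
random_init = TFRobertaModel(RobertaConfig.from_pretrained("roberta-base"))
random_init(random_init.dummy_inputs)  # forward pass to build the weights

# If the checkpoint loaded correctly, the tensors should differ.
looks_random = np.allclose(
    pretrained.weights[0].numpy(), random_init.weights[0].numpy()
)
print("weights look randomly initialized:", looks_random)
```

A `True` result here would confirm that `from_pretrained` discarded the checkpoint weights, and that the problem lies in the transformers version rather than in the training code; pinning the previously working transformers version on Colab would then be the obvious workaround to try.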
Solution
No effective solution to this problem has been found yet; we are still looking into it.
If you have found a good solution, please send it to us together with a link to this page.
Contact email: dio#foxmail.com (replace # with @)