I have NMT models trained with OpenNMT-py from different generations: some predate flash attention, while the most recent ones (currently in training) use flash attention. The models were converted with onmt_release_model, with quantization set to int8.
The problem occurs when flash_attention is set to True while creating the ctranslate2.Translator object. The GPU is an RTX 3090.
I don't know whether this is purely an architecture issue, or whether it is related to the conversion process from opennmt-py.
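For reference, a minimal sketch of the failing setup is below. The model directory name is a placeholder, and the snippet assumes a CTranslate2 version that exposes the flash_attention load option and a CUDA device; it only defines the loader, since actually running it requires a converted model and a GPU.

```python
# Hypothetical repro sketch -- "model_ct2" is a placeholder for a model
# converted from OpenNMT-py via onmt_release_model with int8 quantization.
def make_translator(model_dir: str, flash: bool):
    # Imported lazily so the sketch can be read without ctranslate2 installed.
    import ctranslate2

    # flash_attention is a load-time option; with flash=True the models above
    # emit degenerate repeated tokens, with flash=False they translate normally.
    return ctranslate2.Translator(
        model_dir,
        device="cuda",
        compute_type="int8",
        flash_attention=flash,
    )
```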
Some output samples on the Flores200 benchmark:
sss of of of of of of of of of of of
sss in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in patients patients patients in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in patients in in in in in in in in in in in in in patients patients patients patients patients patients patients patients patients patients patients patients patients in in patients patients patients patients patients in countries countries in in in in in in in in in in in in in in in in in in in in in in in in
ssss
ssmmmmmmmm
ss
__opt_src_en__opt_src_en__opt_src_en
sss
sss of of of of of of of of of of of
sss tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax
sssmmmmmmmmmmmmmmmmmmm
1 Answer
It should work with both old and new versions of Onmt-py. Sorry, I don't have enough information to help you.
FYI, I am going to disable the flash attention feature in a future ctranslate2 release, as it does not bring much improvement in inference performance and makes the package significantly heavier.