
Conversation

wj-Mcat
Contributor

@wj-Mcat wj-Mcat commented Jun 27, 2022

PR types

New features

PR changes

Models

Description

Add OPT (Open Pre-trained Transformer Language Models), based on the facebook/opt-* models.
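A quick usage sketch of what the new model enables (the checkpoint name and generate arguments are illustrative; the exact API follows the PR discussion below):

import paddle
from paddlenlp.transformers import GPTTokenizer, OPTForCausalLM

# The OPT checkpoints reuse GPT's BPE tokenizer, as discussed below.
tokenizer = GPTTokenizer.from_pretrained("facebook/opt-125m")
model = OPTForCausalLM.from_pretrained("facebook/opt-125m")
model.eval()

input_ids = paddle.to_tensor([tokenizer("My name is")["input_ids"]])
output_ids, _ = model.generate(input_ids, max_length=20, decode_strategy="greedy_search")
print(tokenizer.convert_ids_to_string(output_ids[0].tolist()))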

def gen_cache(self, memory):
    incremental_cache = self.self_attn.gen_cache(memory,
                                                 type=self.self_attn.Cache)
    return incremental_cache
Contributor

If the code above hasn't been modified, you could just import it from gpt for now.

Contributor Author

OK

Contributor

Do the lines above still differ from GPT's?

Contributor Author

There are still some differences on the 350m model, so this module is not imported.

Contributor

What exactly is different? Supporting both normalize_before=True and False should be enough, right?

@guoshengCS
Contributor

Also remember to add it to transformers/auto/modeling.py.

@guoshengCS
Contributor

One more question: if no tokenizer is added, how can this model be loaded and created via Auto?

@wj-Mcat
Contributor Author

wj-Mcat commented Jun 28, 2022

Since our models will ultimately be named facebook/opt-*, we only need to add the corresponding entries to the gpt tokenizer's pretrained_init_configuration and pretrained_resource_files_map fields.
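For example, entries along these lines (a sketch only; the resource URLs are placeholders, and the vocab_file/merges_file keys assume the GPT BPE tokenizer's resource layout):

# Hypothetical entries added to the GPT tokenizer so that the facebook/opt-* names resolve.
pretrained_init_configuration = {
    "facebook/opt-125m": {},
}
pretrained_resource_files_map = {
    "vocab_file": {
        "facebook/opt-125m": "https://example.com/opt-125m/vocab.json",
    },
    "merges_file": {
        "facebook/opt-125m": "https://example.com/opt-125m/merges.txt",
    },
}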

At the same time, the model names in the gpt tokenizer and in opt modeling must stay consistent so that everything is saved into the same directory correctly.

Besides that, is there anything else to watch out for?

@guoshengCS
Contributor

guoshengCS commented Jun 28, 2022

Since our models will ultimately be named facebook/opt-*, we only need to add the corresponding entries to the gpt tokenizer's pretrained_init_configuration and pretrained_resource_files_map fields.

At the same time, the model names in the gpt tokenizer and in opt modeling must stay consistent so that everything is saved into the same directory correctly.

Besides that, is there anything else to watch out for?

The question is how HF handles this: it doesn't seem to be provided in the code, and the tokenizer config doesn't specify using GPT's tokenizer either.

@wj-Mcat
Contributor Author

wj-Mcat commented Jul 4, 2022

I have completed the work for the 125m ~ 2.7b models, including code alignment and logit alignment. But the 6.7b ~ 66b model weights are too big to fit in a single weight file, so they are split into many weight files in huggingface transformers.

In conclusion, there are some things to do:

  • import the layers from gpt.modeling that are identical to those in opt.modeling, to avoid duplicated code.
  • find a way to handle the very large weight files that are split into multiple pieces (see the sketch below).
  • make OPT loadable via AutoModel.
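A possible way to handle the sharded checkpoints (a sketch only; it assumes the huggingface-style pytorch_model-*.bin shard naming, and renaming keys to the Paddle layout is out of scope here):

import glob
import torch  # only used here to read the HF-format shards in a conversion script

def merge_sharded_checkpoint(shard_dir):
    # Each shard holds a disjoint subset of the parameters, so a plain dict update merges them.
    state_dict = {}
    for shard in sorted(glob.glob(f"{shard_dir}/pytorch_model-*.bin")):
        state_dict.update(torch.load(shard, map_location="cpu"))
    return state_dict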

"pad_token_id": 1,
"num_hidden_layers": 24,
"num_attention_heads": 16,
"max_position_embeddings": 2048
Contributor

If the weights are provided as community (ecosystem) models, these configs shouldn't be placed here either; you can refer to codegen. Some code paths decide whether a model is built-in or a community model based on whether it appears in pretrained_init_configuration, so keeping these here could cause confusion.

Contributor Author

Yes, I have removed the OPT configuration; it will be loaded automatically from the corresponding path.

hidden_dropout_prob: float = 0.1,
max_position_embeddings: int = 512,
type_vocab_size: Optional[int] = None,
initializer_range=0.02):
Contributor

If OPT doesn't have this, just drop initializer_range; there is no need to stay consistent with GPT. What we mainly want to keep consistent with GPT is the TransformerDecoder part, so that some of the FasterGeneration work can be reused.

Contributor Author

@wj-Mcat wj-Mcat Jul 8, 2022

OPT's original configuration uses init_std, which serves the same purpose as initializer_range in GPT, so this parameter was added.
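As a minimal sketch of that equivalence (the helper below is illustrative, not the PR's actual initialization code):

import paddle

def init_weights(layer, initializer_range=0.02):
    # OPT's init_std plays the same role as GPT's initializer_range: the std of the
    # normal initializer used for Linear and Embedding weights.
    if isinstance(layer, (paddle.nn.Linear, paddle.nn.Embedding)):
        layer.weight.set_value(
            paddle.normal(mean=0.0, std=initializer_range, shape=layer.weight.shape))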

return decoder_outputs


class OPTForPretraining(OPTPretrainedModel):
Contributor

If HF doesn't have these ForPretraining and PretrainingCriterion classes either, don't add them for now.

Contributor Author

OK, I will remove it.

raise e


OPTForCausalLM = OPTLMHeadModel
Contributor

Just use OPTForCausalLM directly instead of OPTLMHeadModel above; GPT keeps the extra name mainly for backward compatibility.

Contributor Author

OK, I will rename it.

decode_strategy = kwargs.get('decode_strategy')
if decode_strategy == "beam_search":
    raise AttributeError(
        "'beam_search' is not supported yet in the faster version of GPT"
Contributor

GPT->OPT

Contributor

If the Faster part isn't finished yet, remove it for now and merge without it.

Contributor Author

Integration with FasterGeneration will be done in the next PR, so let's remove it.

Contributor Author

At this time, the OPT model cannot work with FasterGeneration, so we should disable the prepare_faster_entry method. I raise an error at the top of this method; what do you think? @guoshengCS

def prepare_faster_entry(self, kwargs):
    # TODO(wj-Mcat): this error will be removed when opt can play with FasterGeneration. 
    raise AttributeError(
        "FasterGeneration is not supported in OPT Model, please keep eyes on the latest feature of PaddleNLP"
    )
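    # NOTE: everything below is kept for the upcoming FasterOPT integration,
    # but it is unreachable until the raise above is removed.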

    from paddlenlp.ops import FasterOPT
    use_fp16_decoding = kwargs.get('use_fp16_decoding', False)
    decode_strategy = kwargs.get('decode_strategy')
    
    if decode_strategy == "beam_search":
        raise AttributeError(
            "'beam_search' is not supported yet in the faster version of OPT"
        )
    # Currently, FasterTransformer only support restricted size_per_head.
    size_per_head = self.opt.config["hidden_size"] // self.opt.config[
        "num_attention_heads"]
    if size_per_head not in [32, 64, 80, 96, 128]:
        raise AttributeError(
            "'size_per_head = %d' is not supported yet in the faster version of OPT"
            % size_per_head)
    if kwargs['forced_bos_token_id'] is not None:
        # not support for min_length yet in the faster version
        raise AttributeError(
            "'forced_bos_token_id != None' is not supported yet in the faster version"
        )
    if kwargs['min_length'] != 0:
        # not support for min_length yet in the faster version
        raise AttributeError(
            "'min_length != 0' is not supported yet in the faster version")
    self._faster_entry = FasterOPT(
        self, use_fp16_decoding=use_fp16_decoding).forward
    return self._faster_entry

word_embed_proj_dim: int,
norm: Optional[Layer] = None,
normalize_before: bool = False,
remove_final_layer_norm: bool = False):
Contributor

Is remove_final_layer_norm still needed now? HF keeps it mainly for compatibility with earlier released weights; the latest weights shouldn't need it, right?

Contributor Author

Yes, we should remove it. I will do it in the next commit.

Contributor Author

    def gen_cache(self, memory):
        incremental_cache = self.self_attn.gen_cache(memory,
                                                     type=self.self_attn.Cache)
        return incremental_cache

This method behaves the same as in GPT, but the class itself behaves differently.

def gen_cache(self, memory):
    incremental_cache = self.self_attn.gen_cache(memory,
                                                 type=self.self_attn.Cache)
    return incremental_cache
Contributor

Do the lines above still differ from GPT's?

Contributor Author

@wj-Mcat wj-Mcat left a comment

I have finished all the code and the local tests for loading, saving, and generation.

So, please review it when you are free. @guoshengCS

Comment on lines +288 to +290
if not init_class:
    init_class = init_kwargs.pop("tokenizer_class", None)

Contributor Author

Both init_class and tokenizer_class are supported in this module; you can refer to:

init_class = init_kwargs.pop("init_class", None)
if init_class is None:
    init_class = init_kwargs.pop("tokenizer_class", None)
if init_class:
    class_name = cls._name_mapping[init_class]
    import_class = importlib.import_module(
        f"paddlenlp.transformers.{class_name}.tokenizer")
    tokenizer_class = getattr(import_class, init_class)
    logger.info(
        "We are using %s to load '%s'." %
        (tokenizer_class, pretrained_model_name_or_path))
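So, as a usage sketch (this assumes the saved opt checkpoint directory ships a tokenizer_config.json whose init_class / tokenizer_class field points at GPTTokenizer):

from paddlenlp.transformers import AutoTokenizer

# tokenizer_config.json is assumed to contain e.g. {"tokenizer_class": "GPTTokenizer", ...},
# so the lookup above resolves to paddlenlp.transformers.gpt.tokenizer.GPTTokenizer.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")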

def gen_cache(self, memory):
    incremental_cache = self.self_attn.gen_cache(memory,
                                                 type=self.self_attn.Cache)
    return incremental_cache
Contributor Author

There are still some differences on the 350m model, so this module is not imported.

@@ -1,4 +1,4 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.k
Contributor

Is the "All Rights Reserved.k" here a typo?

Contributor

Please double-check this.

Contributor Author

Yes, I indeed hadn't noticed this; it has been fixed on my side.

def gen_cache(self, memory):
    incremental_cache = self.self_attn.gen_cache(memory,
                                                 type=self.self_attn.Cache)
    return incremental_cache
Contributor

What exactly is different? Supporting both normalize_before=True and False should be enough, right?

@wj-Mcat
Contributor Author

wj-Mcat commented Jul 12, 2022

The main point under discussion now is why the TransformerDecoderLayer class is not imported from GPT and reused. The key difference is the activation function:

  • In GPT, gelu is used and it is hard-coded:
        tgt = self.dropout2(
            self.linear2(F.gelu(self.linear1(tgt), approximate=True)))
        tgt = residual + tgt
  • In OPT, the activation function is obtained from the configuration and applied dynamically during computation:
        tgt = self.dropout2(self.linear2(self.activation(self.linear1(tgt))))
        tgt = residual + tgt

So at the code level this class is not imported directly.
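A minimal sketch of the configurable version (an illustrative class only, not the PR's actual TransformerDecoderLayer; resolving the activation by name via getattr is one common pattern):

import paddle
import paddle.nn.functional as F

class FFNBlock(paddle.nn.Layer):
    def __init__(self, d_model, dim_feedforward, activation="relu", act_dropout=0.1):
        super().__init__()
        self.linear1 = paddle.nn.Linear(d_model, dim_feedforward)
        self.linear2 = paddle.nn.Linear(dim_feedforward, d_model)
        self.dropout2 = paddle.nn.Dropout(act_dropout)
        # Look the activation up from the config instead of hard-coding gelu.
        self.activation = getattr(F, activation)

    def forward(self, tgt):
        residual = tgt
        tgt = self.dropout2(self.linear2(self.activation(self.linear1(tgt))))
        return residual + tgt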

@wj-Mcat
Contributor Author

wj-Mcat commented Jul 13, 2022

ping @guoshengCS

@wj-Mcat
Contributor Author

wj-Mcat commented Aug 8, 2022

ping @guoshengCS

**OPT** (opt, batch_size=4, max_length=32)

<p align="left">
<img src="../docs/imgs/opt_perf.png" width="800" height ="400" />
Contributor

See if these images can be uploaded to a GitHub issue or PR and referenced here via GitHub links, to keep the repo size down.

Contributor Author

OK, I will adjust these two.

Contributor Author

@wj-Mcat wj-Mcat Aug 8, 2022

Uploaded the image files:

  • GPT Performance

gpt_perf

  • OPT Performance

opt_perf

  • Bart Performance

bart_perf


demo = Demo(model_name_or_path="facebook/opt-1.3b",
            max_predict_len=10,
            repetition_penalty=1.2)
Contributor

Why introduce the repetition_penalty parameter here? Does the output quality suffer without it?

Contributor Author

It's not necessary; I will remove it in the next commit.

        pos_emb, linear_weight, normalize_before, topk, topp,
        max_out_len, head_num, size_per_head, num_layer, bos_id,
        eos_id, temperature, use_fp16_decoding):
    helper = LayerHelper('fusion_opt', **locals())
Contributor

Note: please migrate this part later following #2795.

@guoshengCS guoshengCS merged commit 70e6a31 into PaddlePaddle:develop Aug 9, 2022