Add squeezebert #872
Conversation
Model weights
@renmada Thanks for your contribution, please sign the CLA agreement, thanks.
The main problem is the use of `config` in layers. Please also provide examples with a README. You can refer to https://github.com/PaddlePaddle/PaddleNLP/blob/develop/examples/language_model/bert/run_glue.py
Examples:
    .. code-block:: python

        from paddlenlp.transformers import SqueezeBertTokenizer
        tokenizer = SqueezeBertTokenizer.from_pretrained('SqueezeBert-small-discriminator')
Didn't see `SqueezeBert-small-discriminator`.
class SqueezeBertEmbeddings(nn.Layer):
    """Construct the embeddings from word, position and token_type embeddings."""
"""
Construct the embeddings from word, position and token_type embeddings.
"""
self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.embedding_size)
self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.embedding_size)

# self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
No need for an explanation like this.
class SqueezeBertEmbeddings(nn.Layer):
    """Construct the embeddings from word, position and token_type embeddings."""

    def __init__(self, config):
Do not pass `config` down to every layer. Please refer to https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/bert/modeling.py
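For reference, a minimal sketch of that convention, with every hyperparameter declared explicitly in `__init__`; the parameter names and defaults below mirror `bert/modeling.py` and are assumptions for SqueezeBert, not the final signature:

    import paddle.nn as nn

    class SqueezeBertEmbeddings(nn.Layer):
        """Construct the embeddings from word, position and token_type embeddings."""

        def __init__(self,
                     vocab_size,
                     embedding_size=768,
                     hidden_dropout_prob=0.1,
                     max_position_embeddings=512,
                     type_vocab_size=2):
            super().__init__()
            # Each hyperparameter is an explicit argument, not a config lookup.
            self.word_embeddings = nn.Embedding(vocab_size, embedding_size)
            self.position_embeddings = nn.Embedding(max_position_embeddings, embedding_size)
            self.token_type_embeddings = nn.Embedding(type_vocab_size, embedding_size)
            self.layer_norm = nn.LayerNorm(embedding_size)
            self.dropout = nn.Dropout(hidden_dropout_prob)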
# position_ids (1, len position emb) is contiguous in memory and exported when serialized
self.register_buffer("position_ids", paddle.arange(config.max_position_embeddings).expand((1, -1)))

def forward(self, input_ids=None, token_type_ids=None, position_ids=None, inputs_embeds=None):
In PaddleNLP, only `input_ids` is allowed in the embedding layer. Please refer to https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/bert/modeling.py
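As a hedged sketch, a `forward` for the `SqueezeBertEmbeddings` above that takes ids only (no `inputs_embeds`); the position-id derivation follows `bert/modeling.py`, and the optional arguments are assumptions:

        def forward(self, input_ids, token_type_ids=None, position_ids=None):
            # Requires `import paddle` at module level.
            if position_ids is None:
                # Positions 0..seq_len-1, derived from input_ids as in bert/modeling.py.
                ones = paddle.ones_like(input_ids, dtype="int64")
                position_ids = paddle.cumsum(ones, axis=-1) - ones
                position_ids.stop_gradient = True
            if token_type_ids is None:
                token_type_ids = paddle.zeros_like(input_ids, dtype="int64")
            embeddings = (self.word_embeddings(input_ids)
                          + self.position_embeddings(position_ids)
                          + self.token_type_embeddings(token_type_ids))
            embeddings = self.layer_norm(embeddings)
            return self.dropout(embeddings)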
class SqueezeBertSelfAttention(nn.Layer):
    def __init__(self, config, cin, q_groups=1, k_groups=1, v_groups=1):
Please declare all params here instead of using `config`. Please refer to https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/gpt/modeling.py
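A hedged sketch of what that could look like here, with the hyperparameters spelled out (`hidden_size` stands in for `cin`; the grouped pointwise 1-D convolutions follow the SqueezeBert design, but the exact names are assumptions):

    import paddle.nn as nn

    class SqueezeBertSelfAttention(nn.Layer):
        def __init__(self, hidden_size, num_attention_heads,
                     attention_probs_dropout_prob=0.1,
                     q_groups=1, k_groups=1, v_groups=1):
            super().__init__()
            if hidden_size % num_attention_heads != 0:
                raise ValueError(
                    "hidden_size (%d) is not a multiple of num_attention_heads (%d)"
                    % (hidden_size, num_attention_heads))
            self.num_attention_heads = num_attention_heads
            self.attention_head_size = hidden_size // num_attention_heads
            # SqueezeBert computes Q/K/V with grouped pointwise 1-D convolutions.
            self.query = nn.Conv1D(hidden_size, hidden_size, 1, groups=q_groups)
            self.key = nn.Conv1D(hidden_size, hidden_size, 1, groups=k_groups)
            self.value = nn.Conv1D(hidden_size, hidden_size, 1, groups=v_groups)
            self.dropout = nn.Dropout(attention_probs_dropout_prob)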
class SqueezeBertLayer(nn.Layer):
    def __init__(self, config):
The same config problem as above.
class SqueezeBertPreTrainedModel(PretrainedModel):
    base_model_prefix = "squeezebert"
Need more documentation here.
@register_base_model
class SqueezeBertModel(SqueezeBertPreTrainedModel):
    def __init__(self, **kwargs):
Need more documentation here. Please refer to https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/bert/modeling.py
# Tie weights if needed
self.tie_weights()

def tie_weights(self):
Why do we need `tie_weights`? I didn't see any layer that has a `get_output_embeddings` function.
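For context, a sketch of the pattern `tie_weights` normally serves (the method names here follow the HuggingFace-style convention, not this PR's code): a head exposes `get_output_embeddings()`, and `tie_weights` shares the input embedding weight with it. Without such a layer the call has nothing to tie:

    def tie_weights(self):
        # No-op unless some layer actually exposes get_output_embeddings().
        get_output = getattr(self, "get_output_embeddings", None)
        output_embeddings = get_output() if get_output is not None else None
        if output_embeddings is not None:
            # Share one matrix between the input lookup and the output projection.
            output_embeddings.weight = self.get_input_embeddings().weight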
PR types