Skip to content

Conversation

@solrex
Copy link
Contributor

@solrex solrex commented Apr 20, 2023

PR types

Bug fixes

PR changes

APIs

Description

Fix the bug that raises a NotImplementedError exception when calling GPTChineseTokenizer.get_vocab().

@paddle-bot
Copy link

paddle-bot bot commented Apr 20, 2023

Thanks for your contribution!

@codecov
Copy link

codecov bot commented Apr 20, 2023

Codecov Report

Merging #5736 (fed57eb) into develop (7d48f0e) will decrease coverage by 0.01%.
The diff coverage is 80.00%.

@@             Coverage Diff             @@
##           develop    #5736      +/-   ##
===========================================
- Coverage    61.70%   61.69%   -0.01%     
===========================================
  Files          487      487              
  Lines        68285    68290       +5     
===========================================
+ Hits         42133    42134       +1     
- Misses       26152    26156       +4     
Impacted Files Coverage Δ
paddlenlp/transformers/gpt/tokenizer.py 91.62% <80.00%> (-0.45%) ⬇️

... and 4 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@sijunhe sijunhe requested a review from wj-Mcat April 20, 2023 11:26
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

非常感谢解决修复此问题,不过这里还需要考虑:added_tokens_encoder,建议调整为以下代码:

Suggested change
return {self.sp.IdToPiece(i): i for i in range(0, self.sp.GetPieceSize())}
return dict({self.sp.IdToPiece(i): i for i in range(0, self.sp.GetPieceSize())}, **self.added_tokens_encoder)

具体可参考:

def get_vocab(self):
return dict(self.vocab.token_to_idx, **self.added_tokens_encoder)

Raising NotImplementedError exception when calling GPTChineseTokenizer.get_vocab().
@CLAassistant
Copy link

CLAassistant commented Apr 21, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@sijunhe sijunhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sijunhe sijunhe merged commit 4b8c0c4 into PaddlePaddle:develop Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants