BugFix: GPTChineseTokenizer.get_vocab() not implemented #5736
Conversation
Thanks for your contribution!
Codecov Report
@@            Coverage Diff            @@
##           develop    #5736    +/-   ##
==========================================
- Coverage    61.70%   61.69%   -0.01%
==========================================
  Files          487      487
  Lines        68285    68290       +5
==========================================
+ Hits         42133    42134       +1
- Misses       26152    26156       +4
... and 4 files with indirect coverage changes
Thank you very much for fixing this issue. However, added_tokens_encoder also needs to be taken into account here. I suggest adjusting the code as follows:
-    return {self.sp.IdToPiece(i): i for i in range(0, self.sp.GetPieceSize())}
+    return dict({self.sp.IdToPiece(i): i for i in range(0, self.sp.GetPieceSize())}, **self.added_tokens_encoder)
For reference, see:
PaddleNLP/paddlenlp/transformers/bert/tokenizer.py (lines 452 to 453 in 7d48f0e):

    def get_vocab(self):
        return dict(self.vocab.token_to_idx, **self.added_tokens_encoder)
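Applying the suggestion above, the fixed method would look roughly like the sketch below. This is only an illustration based on the suggested diff: it assumes the tokenizer keeps its sentencepiece processor in self.sp and inherits added_tokens_encoder from the base tokenizer class, as the suggestion implies.

```python
def get_vocab(self):
    # Base vocabulary from the sentencepiece model: piece string -> id.
    vocab = {self.sp.IdToPiece(i): i for i in range(self.sp.GetPieceSize())}
    # Merge in tokens registered after loading, mirroring BertTokenizer.get_vocab().
    vocab.update(self.added_tokens_encoder)
    return vocab
```

Functionally this is the same as the one-line version in the suggestion; splitting it out just makes the merge with added_tokens_encoder explicit.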
Calling GPTChineseTokenizer.get_vocab() raises a NotImplementedError exception.
sijunhe left a comment:
lgtm
PR types
Bug fixes
PR changes
APIs
Description
Fix the bug where calling GPTChineseTokenizer.get_vocab() raised a NotImplementedError exception.
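For illustration, a quick way to check the fix might look like the following sketch. The pretrained checkpoint name here is only an example assumption; substitute whichever GPT Chinese weights you actually load.

```python
from paddlenlp.transformers import GPTChineseTokenizer

# Example checkpoint name (assumption); use the weights you normally load.
tokenizer = GPTChineseTokenizer.from_pretrained("gpt-cpm-large-cn")

# Before this fix, get_vocab() raised NotImplementedError.
vocab = tokenizer.get_vocab()
print(len(vocab))  # size of the token -> id mapping, including any added tokens
```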