BugFix: GPTChineseTokenizer.get_vocab() not implemented #5736
Conversation
Thanks for your contribution!
Codecov Report
@@            Coverage Diff            @@
##           develop    #5736    +/-   ##
==========================================
- Coverage    61.70%   61.69%   -0.01%
==========================================
  Files          487      487
  Lines        68285    68290       +5
==========================================
+ Hits         42133    42134       +1
- Misses       26152    26156       +4
... and 4 files with indirect coverage changes
Thank you very much for fixing this issue. However, added_tokens_encoder also needs to be taken into account here. I suggest adjusting the code as follows:
-    return {self.sp.IdToPiece(i): i for i in range(0, self.sp.GetPieceSize())}
+    return dict({self.sp.IdToPiece(i): i for i in range(0, self.sp.GetPieceSize())}, **self.added_tokens_encoder)
For reference, see:
PaddleNLP/paddlenlp/transformers/bert/tokenizer.py (lines 452 to 453 in 7d48f0e):

    def get_vocab(self):
        return dict(self.vocab.token_to_idx, **self.added_tokens_encoder)
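Applying the suggestion above, the fixed method would look roughly like the sketch below. This is only an illustration based on the suggested diff: it assumes the tokenizer keeps its sentencepiece processor in self.sp and inherits added_tokens_encoder from the base tokenizer class, as the suggestion implies.

```python
def get_vocab(self):
    # Base vocabulary from the sentencepiece model: piece string -> id.
    vocab = {self.sp.IdToPiece(i): i for i in range(self.sp.GetPieceSize())}
    # Merge in tokens registered after loading, mirroring BertTokenizer.get_vocab().
    vocab.update(self.added_tokens_encoder)
    return vocab
```

Functionally this is the same as the one-line version in the suggestion; splitting it out just makes the merge with added_tokens_encoder explicit.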
Calling GPTChineseTokenizer.get_vocab() raises a NotImplementedError exception.
sijunhe left a comment:
lgtm
PR types
Bug fixes
PR changes
APIs
Description
Fix the bug where calling GPTChineseTokenizer.get_vocab() raised a NotImplementedError exception.
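For illustration, a quick way to check the fix might look like the following sketch. The pretrained checkpoint name here is only an example assumption; substitute whichever GPT Chinese weights you actually load.

```python
from paddlenlp.transformers import GPTChineseTokenizer

# Example checkpoint name (assumption); use the weights you normally load.
tokenizer = GPTChineseTokenizer.from_pretrained("gpt-cpm-large-cn")

# Before this fix, get_vocab() raised NotImplementedError.
vocab = tokenizer.get_vocab()
print(len(vocab))  # size of the token -> id mapping, including any added tokens
```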