Update demo #72
…rp & LCQMC datasets
I don't seem to see the newly added dataset.
example(obj:`list[str]`): List of input data, containing query, title and label if it has a label.
tokenizer(obj:`PretrainedTokenizer`): This tokenizer inherits from :class:`~paddlenlp.transformers.PretrainedTokenizer`
    which contains most of the methods. Users should refer to the superclass for more information regarding methods.
label_list(obj:`list[str]`): All the labels that the data has.
Is this no longer needed?
Removed.
encoded_inputs = tokenizer(
    text=example["text"],
    max_seq_len=max_seq_length,
    pad_to_max_seq_len=True)
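To make the effect of pad_to_max_seq_len=True concrete, here is fixed-length padding in plain Python: every example ends up exactly max_seq_len ids long. The helper name pad_to_fixed_length and the pad id 0 are assumptions for illustration; this is not PaddleNLP's tokenizer code.

```python
from typing import List


def pad_to_fixed_length(ids: List[int], max_seq_len: int, pad_id: int = 0) -> List[int]:
    """Hypothetical helper: truncate or right-pad `ids` to exactly `max_seq_len`.

    This naive version does not try to preserve special tokens
    (e.g. a trailing [SEP]) when it truncates.
    """
    if max_seq_len <= 0:
        raise ValueError("max_seq_len must be positive")
    truncated = ids[:max_seq_len]  # drop ids beyond the limit
    return truncated + [pad_id] * (max_seq_len - len(truncated))  # right-pad


print(pad_to_fixed_length([101, 7, 8, 102], 6))  # [101, 7, 8, 102, 0, 0]
print(pad_to_fixed_length([101, 7, 8, 9, 10, 102], 4))  # [101, 7, 8, 9]
```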
Same as above.
Same reply as above.
- query, title = example[0], example[1]
+ query, title = example["query"], example["title"]
  query_ids = np.array(tokenizer.encode(query), dtype="int64")
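With load_dataset(), each example is read by field name (example["query"]) rather than by position. The following is a minimal sketch of a dict-based conversion step, assuming a hypothetical ToyTokenizer whose encode() maps whitespace-separated words to ids, standing in for a word-level tokenizer such as JiebaTokenizer. The optional label handling follows the docstring's "label if it has a label" wording and is an assumption about the surrounding code.

```python
from typing import Any, Dict, List

import numpy as np


class ToyTokenizer:
    """Hypothetical stand-in for a word-level tokenizer that exposes encode()."""

    def __init__(self, vocab: Dict[str, int], unk_id: int = 1):
        self.vocab = vocab
        self.unk_id = unk_id

    def encode(self, text: str) -> List[int]:
        # Whitespace split for illustration; a Chinese word tokenizer would segment instead.
        return [self.vocab.get(tok, self.unk_id) for tok in text.split()]


def convert_example(example: Dict[str, Any], tokenizer: ToyTokenizer):
    """Map a dict-style example to int64 id arrays, keeping the label if present."""
    query, title = example["query"], example["title"]  # fields by name, not position
    query_ids = np.array(tokenizer.encode(query), dtype="int64")
    title_ids = np.array(tokenizer.encode(title), dtype="int64")
    if "label" in example:
        label = np.array([example["label"]], dtype="int64")
        return query_ids, title_ids, label
    return query_ids, title_ids  # test data without labels


tok = ToyTokenizer({"how": 2, "to": 3, "cook": 4, "rice": 5})
q, t, y = convert_example({"query": "how to cook rice", "title": "cook rice", "label": 1}, tok)
print(q.tolist(), t.tolist(), y.tolist())  # [2, 3, 4, 5] [4, 5] [1]
```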
Let's switch all of these to the __call__() method.
This is JiebaTokenizer, not PretrainedTokenizer, so it does not need to be changed.
We need to discuss when to use the data.Pad API versus the tokenizer's pad_to_max_seq option.
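For this discussion, the alternative to fixed-length padding in the tokenizer is padding each batch only up to its longest sequence. A plain-NumPy sketch of that batch-level idea follows; pad_batch is a hypothetical helper, not the data.Pad implementation.

```python
from typing import Sequence

import numpy as np


def pad_batch(batch: Sequence[Sequence[int]], pad_val: int = 0) -> np.ndarray:
    """Hypothetical collate step: pad every sequence to the longest one in this batch."""
    if not batch:
        raise ValueError("batch must contain at least one sequence")
    max_len = max(len(seq) for seq in batch)
    out = np.full((len(batch), max_len), pad_val, dtype="int64")
    for i, seq in enumerate(batch):
        out[i, :len(seq)] = seq  # copy the ids; the tail keeps pad_val
    return out


print(pad_batch([[5, 6, 7], [8], [9, 10]]).tolist())  # [[5, 6, 7], [8, 0, 0], [9, 10, 0]]
```

Fixed-length padding gives every batch the same shape; batch-level padding avoids spending computation on padding when a whole batch is short, at the cost of tensor shapes that vary from batch to batch.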
- train_ds, dev_ds, test_ds = ChnSentiCorp.get_datasets(
-     ['train', 'dev', 'test'])
+ train_ds, dev_ds, test_ds = load_dataset(
+     "chnsenticorp", splits=["train", "dev", "test"], lazy=False)
Is lazy=False the default option? @smallv0221 Is our rule that lazy=True is needed only in the Iterable scenario?
Yes, it is the default. lazy=True is only needed in the Iterable scenario.
The lazy parameter of load_dataset() defaults to None:
https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/datasets/experimental/dataset.py#L59
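To make the eager/lazy distinction concrete, here is a plain-Python sketch, not PaddleNLP code: read_examples and load are hypothetical names, and the sketch uses a plain bool rather than modeling how load_dataset() resolves a None default. Eager loading materializes a list (len() and indexing work); lazy loading returns a generator that streams examples once, which is what an Iterable dataset needs.

```python
from typing import Dict, Iterator, List, Union

Example = Dict[str, str]


def read_examples(lines: List[str]) -> Iterator[Example]:
    """Hypothetical reader: yield one {'text', 'label'} dict per tab-separated line."""
    for line in lines:
        text, label = line.rstrip("\n").split("\t")
        yield {"text": text, "label": label}


def load(lines: List[str], lazy: bool = False) -> Union[List[Example], Iterator[Example]]:
    """lazy=False: materialize every example up front (map-style, random access).
    lazy=True: hand back the generator so examples are produced on demand, single pass."""
    examples = read_examples(lines)
    return examples if lazy else list(examples)


raw = ["good food\t1", "too expensive\t0"]
eager = load(raw)
print(len(eager), eager[1]["label"])  # 2 0
stream = load(raw, lazy=True)
print(next(stream)["text"])  # good food
```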
…nto update-demo
2. remove dataset field_indices, num_discard_samples
LGTM @ZeyuChen
LGTM
PR types
New features
PR changes
APIs
Describe