Add BQCorpus Dataset #562
It seems an __init__.py is missing in paddlenlp/dataset/.
Actually, there is no need to modify __init__.py to implement new datasets. @ZeyuChen @frozenfish123
class BQCorpus(DatasetBuilder):
    """
    BQCorpus: the largest dataset available for for the banking and finance sector
Typo here.
    META_INFO = collections.namedtuple('META_INFO', ('file', 'md5'))
    SPLITS = {
        'train': META_INFO(
            os.path.join('BQCorpus', 'train.tsv'),
The file path seems wrong. Have you checked your dataset using load_dataset()?
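A quick way to sanity-check the split paths before trying load_dataset() might look like the sketch below. It reuses the META_INFO shape from the diff. The helper names `md5_of` and `check_splits` and the `data_home` argument are hypothetical and are not part of PaddleNLP's API. It only confirms that each listed file exists under a given directory and that its MD5 matches.

```python
import collections
import hashlib
import os

# Same shape as the META_INFO tuple in the diff under review.
META_INFO = collections.namedtuple('META_INFO', ('file', 'md5'))


def md5_of(path):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in chunks so large dataset files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(8192), b''):
            digest.update(chunk)
    return digest.hexdigest()


def check_splits(data_home, splits):
    """Hypothetical helper: map each split name to 'ok', 'missing' or 'md5 mismatch'."""
    report = {}
    for name, info in splits.items():
        full_path = os.path.join(data_home, info.file)
        if not os.path.isfile(full_path):
            report[name] = 'missing'
        elif info.md5 and md5_of(full_path) != info.md5:
            report[name] = 'md5 mismatch'
        else:
            report[name] = 'ok'
    return report
```

Calling `check_splits(data_home, SPLITS)` with the directory where the archive was extracted would show which entries in SPLITS point at files that are not there, which is the usual symptom of a wrong relative path.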
Sorry, I've actually been stuck on this problem for a long time and I don't know how to fix it... I've been working on it for the past two days and still can't solve it. Sorry for slowing down your progress :(
    """
    BQCorpus: the largest dataset available for for the banking and finance sector

    by frozenfish123@Wuhan University
Please also give original author information.
"""
BQ Corpus (Bank Question Corpus), a large-scale domain-specific Chinese corpus for sentence semantic
equivalence identification, is constructed from online Webank customer service logs.
The BQ corpus contains 120,000 question pairs with manual annotation.
The following two lines are examples extracted from training data:
'''
"微粒贷开通" 你好,我的微粒贷怎么没有开通呢 0
为什么借款后一直没有给我回拨电话 怎么申请借款后没有打电话过来呢! 1
'''
Provider: Intelligent Computing Research Center, Harbin Institute of Technology(Shenzhen)
Contacts: Qingcai Chen (email: [email protected]; Fax: +86-755-26033182)
More Info: https://www.luge.ai/
"""
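For illustration, rows in the format shown in the docstring could be parsed roughly as follows. This is a minimal sketch that assumes the three fields (two sentences and a 0/1 label) are tab-separated, which the excerpt does not show explicitly. `parse_row` is a hypothetical name, not a function from the PR.

```python
def parse_row(line):
    """Split one row into (sentence1, sentence2, label); return None if malformed.

    Assumes tab-separated fields and a label of '0' or '1'.
    """
    fields = line.rstrip('\n').split('\t')
    if len(fields) != 3:
        return None
    sentence1, sentence2, label = fields
    if label not in ('0', '1'):
        return None
    return sentence1, sentence2, int(label)


# Usage with the second example row from the docstring (tabs assumed).
row = parse_row('为什么借款后一直没有给我回拨电话\t怎么申请借款后没有打电话过来呢!\t1\n')
```

Returning None for malformed rows lets a reader loop skip header lines or broken records instead of raising.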
Sorry, I have been preparing for my final exam these days and didn't reply in time. Here are my changes in response to this comment.
Great description! Good luck in the final.
Got it.
We have decided to merge your dataset and fix it for you because of our release schedule. Excellent work! Thanks for your contribution.
PR types
database
PR changes
Modified bq_corpus.py according to the review comments.
Description