GitHub - joe9724/top-keyword: Extract hot keywords of questions on Zhihu and make some use of these words.

TopKeyword

Extract trending keywords from Zhihu question titles and put those keywords to use.

Dependencies

Jieba (Chinese word segmentation)
Spark, Python version (PySpark)

Steps

Export the MySQL question table to question.txt -> segment the text with Jieba -> store the segmentation results -> count word frequencies with Spark -> sort and select the Top 10 -> run some ML predictions
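The segmentation and counting steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual wordcount.py: the function names (tokenize, segment_file, top_keywords, main), the space-separated layout of question_word.txt, and the choice to drop only whitespace tokens are all assumptions made for the example. Stop-word filtering and the ML step are left out.

```python
def tokenize(line, cut=None):
    """Segment one title into words, dropping whitespace-only tokens.

    `cut` defaults to jieba.lcut; it is a parameter only so the helper
    can be exercised without Jieba installed (an assumption of this sketch).
    """
    if cut is None:
        import jieba  # third-party: pip install jieba
        cut = jieba.lcut
    return [w for w in cut(line) if w.strip()]


def segment_file(src, dst, cut=None):
    """Write one line of space-separated words per input title (assumed format)."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(" ".join(tokenize(line.strip(), cut)) + "\n")


def top_keywords(sc, path, n=10):
    """Classic Spark word count, then the n most frequent (word, count) pairs."""
    return (
        sc.textFile(path)
        .flatMap(lambda line: line.split())       # one record per word
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)          # sum counts per word
        .takeOrdered(n, key=lambda kv: -kv[1])    # largest counts first
    )


def main():
    """Hypothetical entry point tying the steps together."""
    from pyspark import SparkContext  # third-party: pip install pyspark

    segment_file("question_title.txt", "question_word.txt")
    sc = SparkContext(appName="TopKeyword")
    try:
        for word, count in top_keywords(sc, "question_word.txt"):
            print(word, count)
    finally:
        sc.stop()
```

Calling main() from a script submitted with spark-submit would print the ten most frequent words. Segmenting up front and letting Spark split on whitespace keeps Jieba out of the Spark closures, which is one simple way to mirror the "store the segmentation results" step.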

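Directory structure

└── TopKeyword
    ├── question_title.txt # recent question titles scraped from Zhihu
    ├── question.txt       
    ├── question_word.txt  # the titles after word segmentation
    ├── README.md
    ├── wordcount.py       # main logic: word segmentation and word-frequency counting
    ├── machinelearning.py # some ML work on the results
    └── wordcount_result   # Spark word-count output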
        ├── part-00000
        └── _SUCCESS
