
Scrape the Zhihu Hot List in Just 30 Lines of Code!

2023-11-05 · Author: 考证青年

Everyone is surely familiar with Zhihu: many of us sneak a peek at its hot list every morning to see what's trending. But browsing the site in plain sight can get awkward fast when you suddenly notice your boss standing behind you!

Today I'll show you how to grab the Zhihu hot list quickly and elegantly, in just 30 lines of code. Without further ado, let's get started!

First, import the libraries we need and define the request headers. Note that the cookie value inside headers must be replaced with one copied from your own logged-in Zhihu session!

import requests
from bs4 import BeautifulSoup
import json
import re

headers = {
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
    'cookie': ''  # replace with the cookie value from your own logged-in session
}
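As a quick sanity check (my addition, not part of the original script), you can verify that the cookie actually authenticates you before scraping anything. A minimal sketch, assuming that an expired or missing cookie gets redirected to the sign-in page:

import requests

# Assumption: requests follows the redirect, so an invalid cookie
# ends up on a URL containing 'signin'
check = requests.get('https://www.zhihu.com/hot', headers=headers)
if check.status_code != 200 or 'signin' in check.url:
    raise SystemExit('Cookie looks invalid; copy a fresh one from your browser.')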


Next, grab the links to all the questions on the hot list:

url = 'https://www.zhihu.com/hot'
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, 'html.parser')
all_a = soup.select('div.HotItem-content a')
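One caveat (my addition, not in the original post): not every hot-list entry links to a question; some point to videos or topic pages, and those would break the question_id extraction below. A defensive filter might look like the following sketch, after which you would loop over question_links instead of all_a:

# Keep only entries that are real question links (assumption: other
# hot-list items, e.g. video or special pages, use different URL prefixes)
question_links = [a for a in all_a
                  if a.get('href', '').startswith('https://www.zhihu.com/question/')]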

Finally, request each question link and fetch all the answers under that question:

for a in all_a:
    link = a['href']
    print(link + a['title'])
    # The question id is the last path segment of the question URL
    question_id = link.replace('https://www.zhihu.com/question/', '')
    res2 = requests.get(link, headers=headers)
    soup2 = BeautifulSoup(res2.text, 'html.parser')
    # The answer count is shown in the list header of the question page
    counts_content = soup2.select('h4.List-headerText')[0].get_text()
    counts_re = re.search(r'\d+\.?\d*', counts_content)
    counts = int(counts_re.group())
    print('This question has {} answers'.format(counts))
    # The v4 answers API returns 20 answers per page
    for i in range(0, counts, 20):
        href = 'https://www.zhihu.com/api/v4/questions/{}/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cattachment%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Cis_labeled%2Cpaid_info%2Cpaid_info_content%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_recognized%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics%3Bdata%5B%2A%5D.settings.table_of_content.enabled&limit=20&offset={}&platform=desktop&sort_by=default'.format(question_id, str(i))
        print(href)
        res3 = requests.get(href, headers=headers)
        datas = json.loads(res3.text).get('data')
        for data in datas:
            # Each answer body is HTML; parse it and keep only the text
            content = data.get('content')
            soup = BeautifulSoup(content, 'html.parser')
            print(soup.text)
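A more robust variant (my suggestion, not part of the original script) is to let the API itself signal the last page rather than trusting the scraped answer count, which can confuse the regex above when counts are rendered with thousands separators. In this sketch, href_template is a hypothetical name standing for the long API URL above with its two {} placeholders, and paging.is_end is an assumption about the v4 response shape; the script's existing imports are assumed:

import time

offset = 0
while True:
    resp = requests.get(href_template.format(question_id, offset), headers=headers).json()
    for answer in resp.get('data', []):
        print(BeautifulSoup(answer.get('content', ''), 'html.parser').text)
    # Assumption: v4 responses include a paging object with an is_end flag
    if resp.get('paging', {}).get('is_end', True):
        break
    offset += 20
    time.sleep(1)  # be polite: throttle successive API requests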


That's all there is to scraping the Zhihu hot list. The script prints each hot-list link and title, the answer count, the paginated API URLs, and the plain text of every answer; the output is long, so only its shape is described here.

Pretty simple, right? Of course, if you want other information, you can adapt the program to extract it; learning to generalize from one example is the point.
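For example, the include parameter in the API URL already requests fields such as voteup_count and the author's follower_count, so each answer's JSON carries them. A small sketch of pulling them out inside the inner loop (field names taken from the include list, but treat the exact response shape as an assumption):

for data in datas:
    author = data.get('author', {})  # assumption: author object carries name/follower_count
    print('Author: {} ({} followers), upvotes: {}'.format(
        author.get('name', 'anonymous'),
        author.get('follower_count', 0),
        data.get('voteup_count', 0)))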

The complete program:

import requests
from bs4 import BeautifulSoup
import json
import re

headers = {
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36',
    'cookie': ''  # replace with the cookie value from your own logged-in session
}
url = 'https://www.zhihu.com/hot'
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, 'html.parser')
all_a = soup.select('div.HotItem-content a')
for a in all_a:
    link = a['href']
    print(link + a['title'])
    # The question id is the last path segment of the question URL
    question_id = link.replace('https://www.zhihu.com/question/', '')
    res2 = requests.get(link, headers=headers)
    soup2 = BeautifulSoup(res2.text, 'html.parser')
    # The answer count is shown in the list header of the question page
    counts_content = soup2.select('h4.List-headerText')[0].get_text()
    counts_re = re.search(r'\d+\.?\d*', counts_content)
    counts = int(counts_re.group())
    print('This question has {} answers'.format(counts))
    # The v4 answers API returns 20 answers per page
    for i in range(0, counts, 20):
        href = 'https://www.zhihu.com/api/v4/questions/{}/answers?include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cattachment%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Cis_labeled%2Cpaid_info%2Cpaid_info_content%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cis_recognized%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics%3Bdata%5B%2A%5D.settings.table_of_content.enabled&limit=20&offset={}&platform=desktop&sort_by=default'.format(question_id, str(i))
        print(href)
        res3 = requests.get(href, headers=headers)
        datas = json.loads(res3.text).get('data')
        for data in datas:
            # Each answer body is HTML; parse it and keep only the text
            content = data.get('content')
            soup = BeautifulSoup(content, 'html.parser')
            print(soup.text)
