Python Requests Web Scraping in Practice: GET/POST Requests and Getting Past Basic Anti-Scraping Checks | 极客日志

星河入梦 · Published 2025/2/7 · Updated 2026/6/221 views

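Getting Started with Web Scraping Using Python Requests

This article covers the basics of web scraping with the Python requests library: GET and POST requests, passing parameters, handling the response object, and saving files. It also shows how setting the User-Agent header gets past basic anti-scraping checks, and adds notes on exception handling, Session management, and basic HTML parsing.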

1. Environment Setup

First make sure Python 3.x is installed, then install the requests library with pip:

pip install requests

To parse HTML content, it is worth installing BeautifulSoup4 as well:

pip install beautifulsoup4

2. Basic Requests: GET and POST

2.1 GET Request Example

Send a GET request to fetch a web page and handle its character encoding.

import requests

url = "http://www.baidu.com"
response = requests.get(url)

# Use the encoding detected from the body, or assign one manually
response.encoding = response.apparent_encoding

print(f"Status code: {response.status_code}")
print(f"Response text length: {len(response.text)}")
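To see why the encoding step matters: requests decodes the raw bytes in response.content using response.encoding, and when a text response declares no charset it falls back to ISO-8859-1. The sketch below reproduces that effect offline with plain bytes; the sample string is only a stand-in for a real response body.

```python
# Stand-in for response.content: UTF-8 bytes sent by a server
raw = "百度一下，你就知道".encode("utf-8")

# Decoding with the wrong charset (the ISO-8859-1 fallback) gives mojibake
wrong = raw.decode("iso-8859-1")

# Decoding with the real charset recovers the text
right = raw.decode("utf-8")

print(wrong == right)  # False
print(right)           # 百度一下，你就知道
```

Assigning response.encoding before reading response.text selects which of these two decodings you get.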

2.2 POST Request Example

POST is the usual method for submitting data to a server.

import requests

url = "http://httpbin.org/post"
data = {
    "key": "value",
    "number": 123
}

response = requests.post(url, data=data)
print(f"Status code: {response.status_code}")
print(response.text)
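Some APIs expect a JSON body rather than a form. As a rough illustration, the sketch below builds both kinds of request without sending them, using requests.Request(...).prepare(), so the difference between the data= and json= arguments can be inspected offline. The payload mirrors the example above.

```python
import json

import requests

payload = {"key": "value", "number": 123}
url = "http://httpbin.org/post"

# data= sends the dictionary as an HTML form
form_req = requests.Request("POST", url, data=payload).prepare()
# json= serializes the dictionary to a JSON document
json_req = requests.Request("POST", url, json=payload).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # key=value&number=123
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # the original dictionary, number still an int
```

Note that the form encoding turns 123 into text, while the JSON body keeps it as a number.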

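3. Passing Parameters and Headers

3.1 URL Parameters

In a GET request, parameters can be passed as a dictionary, and the library appends them to the URL as a query string.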

import requests

params = {"name": "hezhi", "age": 20}
response = requests.get("http://httpbin.org/get", params=params)
print(response.url)  # the final URL with the query string appended
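The URL assembly can also be checked without touching the network. The following sketch prepares a request and inspects the resulting URL; the extra "tag" and "q" parameters are made up here to show how list values and non-ASCII text are handled.

```python
from urllib.parse import parse_qs, urlsplit

import requests

params = {
    "name": "hezhi",
    "age": 20,
    "tag": ["python", "requests"],  # a list becomes a repeated key
    "q": "爬虫",                     # non-ASCII text is percent-encoded as UTF-8
}

# Build the request but do not send it
prepared = requests.Request("GET", "http://httpbin.org/get", params=params).prepare()
print(prepared.url)

# Decode the query string back into a dictionary to confirm the round trip
print(parse_qs(urlsplit(prepared.url).query))
```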

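3.2 Setting Request Headers (Headers)

Sending a browser-like User-Agent header is enough to get past basic anti-scraping checks.

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
}

response = requests.get("https://www.zhihu.com", headers=headers)
print(f"Status code: {response.status_code}")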
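For context on why the header matters: without it, requests identifies itself as python-requests/<version>, which is trivial for a server to filter. The sketch below prints that default and then sets the browser string once on a Session so that every request reuses it. Nothing is sent over the network, and the BROWSER_UA name is just a label chosen for this example.

```python
import requests

# What requests sends when no User-Agent is supplied
default_ua = requests.utils.default_user_agent()
print(default_ua)  # starts with "python-requests/"

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"
)

# Headers set on the session are merged into every request it prepares
session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})

prepared = session.prepare_request(requests.Request("GET", "https://www.zhihu.com"))
print(prepared.headers["User-Agent"])
```

The other session defaults, such as Accept-Encoding, stay in place because update() only overrides the keys it is given.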

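4. Response Data Handling and File Saving

4.1 Text and Binary

response.text returns the body decoded to a string, while response.content returns the raw bytes, which is what binary files such as images need.

4.2 Example: Saving an Image

import requests

url = "https://www.baidu.com/img/baidu_jgylogo3.gif"
response = requests.get(url)

if response.status_code == 200:
    # Open in binary mode and write the raw bytes
    with open("baidu_logo.gif", "wb") as f:
        f.write(response.content)
    print("Image saved")
else:
    print("Download failed")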
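When saving many files, hard-coding each file name gets tedious. One possible approach, sketched here with the standard library only, derives the name from the URL path. The helpers filename_from_url and save_bytes are hypothetical names introduced for illustration, and the bytes written are a placeholder for response.content.

```python
import tempfile
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of the URL, or a default if there is none."""
    name = PurePosixPath(unquote(urlsplit(url).path)).name
    return name or default


def save_bytes(content, url, directory="."):
    """Write raw bytes (e.g. response.content) to a file named after the URL."""
    target = Path(directory) / filename_from_url(url)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target


# Usage with placeholder bytes instead of a real download
with tempfile.TemporaryDirectory() as tmp:
    saved = save_bytes(b"GIF89a", "https://www.baidu.com/img/baidu_jgylogo3.gif", tmp)
    print(saved.name)  # baidu_jgylogo3.gif
```

A production scraper would also want to sanitize names taken from untrusted URLs before writing to disk.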

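5. Advanced Techniques

5.1 Managing Cookies with a Session Object

import requests

session = requests.Session()
# Custom UA, applied to every request made through this session
session.headers.update({"User-Agent": "Mozilla/5.0..."})

# Step 1: fetch the login page; the session stores any cookies the server sets
session.get("https://example.com/login")

# Step 2: submit the login form; stored cookies are sent automatically
login_data = {"username": "user", "password": "pass"}
session.post("https://example.com/login", data=login_data)

# Step 3: access the protected page with the same session
profile = session.get("https://example.com/profile")
print(profile.text)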
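The cookie handling of a Session can be observed without a real login. In this sketch a cookie is placed in the session's jar by hand (standing in for one a server would set), and two requests are prepared but not sent. The cookie name and value are invented for the example.

```python
import requests

session = requests.Session()

# Stand-in for a cookie the server would set after login
session.cookies.set("sessionid", "abc123", domain="example.com", path="/")

# The session attaches the cookie to requests for the matching domain...
same_site = session.prepare_request(
    requests.Request("GET", "https://example.com/profile")
)
# ...and leaves it off requests to other domains
other_site = session.prepare_request(
    requests.Request("GET", "https://example.org/")
)

print(same_site.headers.get("Cookie"))   # sessionid=abc123
print(other_site.headers.get("Cookie"))  # None
```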

5.2 Exception Handling

import requests
from requests.exceptions import ConnectionError, HTTPError, Timeout

try:
    response = requests.get("http://www.baidu.com", timeout=5)
    response.raise_for_status()  # raises HTTPError for 4xx and 5xx status codes
except Timeout:
    print("Request timed out")
except ConnectionError:
    print("Network connection error")
except HTTPError as e:
    print(f"HTTP error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
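Catching exceptions handles a failure once it has happened; transient failures can often be absorbed by retrying. A minimal sketch, assuming the urllib3 Retry class that ships alongside requests: the function name build_retrying_session and the chosen status codes are illustrative, not something the article prescribes.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retrying_session(total=3, backoff_factor=0.5):
    """Return a Session that retries failed connections and selected 5xx responses."""
    retry = Retry(
        total=total,                             # maximum number of retries
        backoff_factor=backoff_factor,           # controls the growing delay between attempts
        status_forcelist=[500, 502, 503, 504],   # also retry on these status codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Use the retrying adapter for both schemes
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


session = build_retrying_session()
print(session.get_adapter("https://example.com/").max_retries.total)  # 3
```

Requests made with session.get(url, timeout=5) then go through the retry policy before any exception reaches the try/except block. By default urllib3 retries only idempotent methods such as GET, so POST requests are not repeated.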

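5.3 Simple HTML Parsing

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>Sample</title></head>
<body><p id="notice">Notice!</p></body></html>
"""

# html.parser ships with Python; the 'lxml' parser needs a separate `pip install lxml`
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.title.string)  # prints: Sample
print(soup.p['id'])       # prints: notice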
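A common next step is pulling every link out of a listing. The sketch below does this on a made-up HTML snippet; the base URL https://example.com/ is an assumption used only to show how relative links can be resolved with urljoin.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

html_doc = """
<ul id="news">
  <li><a href="/a/1">First</a></li>
  <li><a href="/a/2">Second</a></li>
  <li><a>No link here</a></li>
</ul>
"""

BASE_URL = "https://example.com/"  # hypothetical page the HTML came from

soup = BeautifulSoup(html_doc, "html.parser")
news = soup.find("ul", id="news")

# href=True skips <a> tags that have no href attribute
links = [
    (a.get_text(strip=True), urljoin(BASE_URL, a["href"]))
    for a in news.find_all("a", href=True)
]
print(links)
```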
