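HTTPX, a Python HTTP Client Library: Core Usage and Async Support

Introduction

Making HTTP requests is an everyday task in Python. The classic requests library is simple to use but only supports synchronous calls; aiohttp is powerful, but its API is more involved and async-only. HTTPX, a newer HTTP client, combines the strengths of both: it offers a synchronous API that closely mirrors requests and a native asynchronous API that works with asyncio. HTTPX also supports HTTP/2, although not by default: it is enabled by installing the http2 extra (pip install 'httpx[http2]') and passing http2=True to a client, after which multiple requests can be multiplexed over a single connection.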

Installation

Install httpx with pip:

pip install httpx

For the command-line client, install the cli extra:

pip install 'httpx[cli]'

Basic Requests

GET requests

Call httpx.get to send a request:

import httpx

r = httpx.get('https://httpbin.org/get')
print(r.status_code)    # status code
print(r.text)           # response body decoded as text

To add query parameters to the URL, pass a dict as the params argument:

import httpx

r = httpx.get('https://httpbin.org/get', params={'q': 'python', 'cat': '1001'})
print(r.url)            # the URL that was actually requested
print(r.text)

For specific response types such as JSON, the body can be parsed directly:

r = httpx.get('https://httpbin.org/get')
data = r.json()
print(data['args'])     # the parsed query arguments, as a dict

For non-text responses, the body is also available as bytes:

>>> r.content
b'<!doctype html>\n<html>\n<head>\n<title>Example Domain</title>...'

POST requests

To send a POST request, call post() instead of get() and pass the form fields as the data argument:

r = httpx.post('https://httpbin.org/post', data={'form_email': 'user@example.com', 'form_password': '123456'})

By default, httpx encodes a dict passed as data using application/x-www-form-urlencoded. To send JSON instead, pass the json argument and httpx serializes it for you:

params = {'key': 'value'}
r = httpx.post('https://httpbin.org/post', json=params)

Uploading files

Files are uploaded with the files argument. Opening the file in a with block makes sure it is closed once the request has been sent:

with open('report.xls', 'rb') as f:
    r = httpx.post('https://httpbin.org/post', files={'upload-file': f})

To send non-file form fields along with the upload, pass them as data:

data = {'message': 'Hello, world!'}
with open('report.xls', 'rb') as f:
    r = httpx.post("https://httpbin.org/post", data=data, files={'file': f})
print(r.text)

Adding headers

To send custom HTTP headers, pass a dict as the headers argument:

r = httpx.get('https://www.baidu.com/', headers={'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit'})

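Reading the response headers:

print(r.headers)
# Headers({'content-type': 'text/html; charset=utf-8', ...})

print(r.headers['Content-Type'])   # header lookups are case-insensitive
# 'text/html; charset=utf-8'

To send cookies with a request, pass a dict as the cookies argument:

cs = {'token': '12345', 'status': 'working'}
r = httpx.get(url, cookies=cs)

Cookies that the server sets in its response are parsed by httpx and exposed on r.cookies, so there is no need to parse the Set-Cookie header by hand. Assuming the server set a token cookie:

print(r.cookies['token'])
# 12345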

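Timeouts and Retries

httpx applies a default timeout of 5 seconds to every request. To change it, pass a timeout argument in seconds. A single number applies to every phase of the request; httpx.Timeout lets you set the connect, read, write, and pool timeouts separately:

try:
    # Connect timeout of 3.1 seconds; 27 seconds for read, write, and pool
    r = httpx.get(url, timeout=httpx.Timeout(27.0, connect=3.1))
except httpx.TimeoutException as e:
    print(e)

Timeouts can also be disabled entirely: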

httpx.get('https://github.com/', timeout=None)

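A simple retry loop for timeouts and other request failures:

def gethtml(url):
    i = 0
    while i < 3:
        try:
            html = httpx.get(url, timeout=5).text
            return html
        except httpx.RequestError:
            # httpx.RequestError covers timeouts and connection failures
            i += 1
    return None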

Redirects

By default, httpx does not follow redirects for any HTTP method. Pass follow_redirects=True to enable them:

>>> r = httpx.get('http://github.com/', follow_redirects=True)
>>> r.url
URL('https://github.com/')
>>> r.status_code
200
>>> r.history
[<Response [301 Moved Permanently]>]

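Advanced Usage: Client Connection Pooling

An httpx.Client() instance uses HTTP connection pooling: requests to the same host reuse an existing connection instead of opening a new one each time. This can noticeably improve performance by lowering latency across requests, reducing CPU usage and round trips, and easing network congestion.

Basic usage

The recommended way to use a Client is as a context manager, which ensures connections are cleaned up when the with block exits: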

with httpx.Client() as client:
    r = client.get('https://example.com')

Alternatively, close the connection pool explicitly with close():

client = httpx.Client()
try:
    r = client.get('https://example.com')
finally:
    client.close()

Sharing configuration across requests

Arguments passed to the Client constructor apply to every request the client sends:

url = 'http://httpbin.org/headers'
headers = {'user-agent': 'my-app/0.0.1'}
with httpx.Client(headers=headers) as client:
    r = client.get(url)
    print(r.json()['headers']['User-Agent'])
    # 'my-app/0.0.1'

In addition, base_url prepends a URL to every outgoing request:

with httpx.Client(base_url='http://httpbin.org') as client:
    r = client.get('/headers')
    print(r.request.url)
    # URL('http://httpbin.org/headers')

Monitoring download progress

To monitor the progress of a large download, stream the response and check the response.num_bytes_downloaded attribute:

import tempfile
import httpx

with tempfile.NamedTemporaryFile() as download_file:
    url = "https://speed.hetzner.de/100MB.bin"
    with httpx.stream("GET", url) as response:
        total = int(response.headers["Content-Length"])
        for chunk in response.iter_bytes():
            download_file.write(chunk)
            percent = response.num_bytes_downloaded / total
            print(f'percent: {percent:.2%}')

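Proxy Configuration

httpx can route requests through an HTTP proxy. Since httpx 0.26 this is done with the proxy argument; the older proxies argument was deprecated in 0.26 and removed in 0.28:

with httpx.Client(proxy="http://localhost:8030") as client:
    r = client.get('https://example.com')

For more advanced cases, mount a transport per URL scheme. For example, to route HTTP and HTTPS requests through two different proxies:

proxy_mounts = {
    "http://": httpx.HTTPTransport(proxy="http://localhost:8030"),
    "https://": httpx.HTTPTransport(proxy="http://localhost:8031"),
}

with httpx.Client(mounts=proxy_mounts) as client:
    r = client.get('https://example.com')

Proxy credentials can be passed in the userinfo part of the proxy URL:

with httpx.Client(proxy="http://username:password@localhost:8030") as client:
    r = client.get('https://example.com')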

Async Support

Sending async requests

import asyncio
import httpx

async def main():
    async with httpx.AsyncClient() as client:
        r = await client.get('https://www.example.com/')
        print(r.status_code)

asyncio.run(main())

Opening and closing clients

Using async with:

async with httpx.AsyncClient() as client:
    r = await client.get('https://www.example.com/')

Closing the client explicitly:

client = httpx.AsyncClient()
try:
    r = await client.get('https://www.example.com/')
finally:
    await client.aclose()

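Streaming responses asynchronously

async with httpx.AsyncClient() as client:
    async with client.stream('GET', 'https://www.example.com/') as response:
        async for chunk in response.aiter_bytes():
            print(chunk)

The async response streaming methods are:

Response.aread() - for conditionally reading the response inside a stream block.
Response.aiter_bytes() - for streaming the response content as bytes.
Response.aiter_text() - for streaming the response content as text.
Response.aiter_lines() - for streaming the response content as lines of text.
Response.aiter_raw() - for streaming the raw response bytes, without applying content decoding.
Response.aclose() - for closing the response. This is rarely needed, because a stream block closes the response automatically on exit.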

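Error Handling

HTTPX provides a hierarchy of exception classes for different kinds of failure. The base class is httpx.HTTPError, with two main branches: httpx.RequestError and its subclasses (timeouts, connection errors, and so on) for failures while sending the request, and httpx.HTTPStatusError, which raise_for_status() raises for 4xx and 5xx responses.

import httpx

try:
    r = httpx.get('https://httpbin.org/status/404')
    r.raise_for_status()
except httpx.HTTPStatusError as e:
    print(f"HTTP Error: {e.response.status_code}")
except httpx.ConnectTimeout:
    print("Connect timeout")
except httpx.ReadTimeout:
    print("Read timeout")
except httpx.RequestError as e:
    print(f"Other request error: {e}")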

Authentication and Authorization

HTTPX supports several authentication schemes, including Basic auth and token-based auth.

import httpx

with httpx.Client(auth=httpx.BasicAuth(username='user', password='pass')) as client:
    r = client.get('https://httpbin.org/basic-auth/user/pass')
    print(r.status_code)

For token authentication, the token is usually set directly in the headers:

headers = {'Authorization': 'Bearer YOUR_TOKEN_HERE'}
r = httpx.get(url, headers=headers)

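Summary

With both synchronous and asynchronous APIs, optional HTTP/2 support, and connection pooling, HTTPX is a strong choice for modern Python network programming. By using Client objects sensibly, getting comfortable with the async patterns, and handling exceptions carefully, developers can build fast and reliable HTTP clients.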