Python Web Scraping in Practice: From Basic Requests to Data Visualization | 极客日志


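Python web scraping automates the retrieval of web page content by imitating browser behavior. This article sets up a local server with FastAPI as the target site, sends requests with requests, and parses the HTML with regular expressions. The workflow covers fetching, cleaning, and storing data, visualizing it with pyecharts, and recording run logs with the logging module. It is aimed at beginners who want to learn the core logic of a crawler along with some engineering practice.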

kaikai | Published 2026/3/26 | Updated 2026/7/20 | 30 views


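Setting Up a Local Test Environment

Before scraping external websites, it helps to demonstrate how a crawler works against a local web server that acts as the target site. Here the server is built quickly with the FastAPI framework.

Basic Routing and Responses

Create a simple FastAPI application that returns an HTML page and image resources.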

from fastapi import FastAPI, Response
import uvicorn

app = FastAPI()

@app.get("/index.html")
def main():
    with open("source/html/index.html", "rb") as f:
        data = f.read()
    return Response(content=data, media_type="text/html")

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)

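Once it is running, open http://127.0.0.1:8000/index.html in a browser to see the page. Note that any image referenced in the HTML (such as 0.jpg) also needs a matching route on the server.

Optimizing Image Resource Loading

Writing one function per image would make the code highly redundant. FastAPI's path parameters let a single route handle image requests dynamically.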

@app.get("/images/{path}")
def get_pic(path: str):
    # Build the file path, e.g. source/images/0.jpg
    file_path = f"source/images/{path}"
    with open(file_path, "rb") as f:
        data = f.read()
    return Response(content=data, media_type="image/jpeg")
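
The route above always declares image/jpeg, which would mislabel a PNG or GIF placed in the same folder. A minimal sketch of one way around this, using the standard library's mimetypes module. The helper name guess_media_type and the fallback type are assumptions for illustration, not part of the original project:

```python
import mimetypes

def guess_media_type(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a Content-Type from the file extension (hypothetical helper)."""
    media_type, _encoding = mimetypes.guess_type(filename)
    # guess_type returns None for the type when the extension is unknown
    return media_type or default

print(guess_media_type("0.jpg"))
print(guess_media_type("logo.png"))
```

The returned string could then be passed as media_type when building the Response in get_pic.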



Core Principles and Steps of a Crawler

Installing Dependencies

pip install requests

Sending a Request

import requests

# Fetch the page from the local test server and decode the raw bytes as UTF-8
response = requests.get("http://127.0.0.1:8000/index.html")
html_content = response.content.decode("utf-8")
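
response.content is raw bytes, and calling .decode("utf-8") on it raises UnicodeDecodeError if the page was not actually saved as UTF-8. A small sketch of a more forgiving decode step. The name decode_body is a hypothetical helper, and the sample bytes stand in for a real response body:

```python
def decode_body(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode response bytes, replacing undecodable bytes instead of raising."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # errors="replace" substitutes U+FFFD for each invalid byte sequence
        return raw.decode(encoding, errors="replace")

# Stand-in for response.content
sample_bytes = "<h1>页面</h1>".encode("utf-8")
print(decode_body(sample_bytes))
```

In the crawler this would be used as html_content = decode_body(response.content).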

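Scraping Image Resources

Parsing Image Links

import re

def get_pic_urls(html):
    """Collect the src value from each line where src="..." is followed by a width attribute."""
    urls = []
    # Process the HTML line by line (splitting on spaces would separate src from width)
    for line in html.splitlines():
        # [^"]* stops at the closing quote, unlike a greedy .*
        match = re.search(r'src="([^"]*)" width', line)
        if match:
            urls.append(match.group(1))
    return urls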
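
A line-based regular expression depends on the exact attribute order and on one tag per line. As a point of comparison, here is a minimal sketch that collects image links with the standard library's html.parser instead. The class name ImgSrcParser and the sample markup are made up for illustration, since the real index.html is not shown in the article:

```python
from html.parser import HTMLParser

class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag (illustrative class name)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(value)

# Made-up sample markup: note the second tag puts width before src
sample_html = (
    '<div><img src="/images/0.jpg" width="200">\n'
    '<img width="200" src="/images/1.jpg"></div>'
)

parser = ImgSrcParser()
parser.feed(sample_html)
print(parser.urls)
```

Because the parser works on tags rather than lines, it finds both links regardless of attribute order.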

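Saving Files Locally

import os

def save_pics(url_list):
    # Create the output directory if it does not exist yet
    os.makedirs("./source/spyder", exist_ok=True)
    for num, url in enumerate(url_list):
        # Normalize the path so exactly one slash joins host and path
        clean_url = url.lstrip('/')
        full_url = f"http://127.0.0.1:8000/{clean_url}"
        pic_data = requests.get(full_url).content

        # Save each image under a sequential file name
        with open(f"./source/spyder/{num}.jpg", "wb") as f:
            f.write(pic_data)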
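
Building the full URL by string concatenation is easy to get wrong when a src value may or may not start with a slash. A short sketch of the same step using urllib.parse.urljoin from the standard library. The helper name to_absolute and the sample src values are assumptions for illustration:

```python
from urllib.parse import urljoin

# URL of the page the links were scraped from
BASE_URL = "http://127.0.0.1:8000/index.html"

def to_absolute(src: str, base: str = BASE_URL) -> str:
    """Resolve a src value against the page URL (hypothetical helper)."""
    return urljoin(base, src)

# Root-relative, page-relative, and already absolute forms
for src in ["/images/0.jpg", "images/1.jpg", "http://127.0.0.1:8000/images/2.jpg"]:
    print(to_absolute(src))
```

All three forms resolve to a URL under http://127.0.0.1:8000/images/, so the download loop does not need to special-case the leading slash.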

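Processing Structured Data (GDP Example)

Data Cleaning and Integration

import requests
import re

def fetch_gdp():
    # Keep the lists local so repeated calls do not accumulate results
    country_list = []
    gdp_list = []

    resp = requests.get("http://localhost:8000/gdp.html")
    html = resp.content.decode("utf-8")

    # Process the page line by line
    for line in html.splitlines():
        # Extract the country name
        c_match = re.match(r'.*<a><font>(.*)</font></a>', line)
        if c_match:
            country_list.append(c_match.group(1))

        # Extract the GDP value (the text between ￥ and 亿元)
        g_match = re.match(r'.*￥(.*)亿元', line)
        if g_match:
            gdp_list.append(g_match.group(1))

    # Pair each country with its GDP value
    return list(zip(country_list, gdp_list))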
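
The values captured by the regular expression are still strings. Before charting or sorting, they usually need to be converted to numbers. A minimal cleaning sketch, assuming the GDP text may contain thousands separators and stray whitespace. The function name clean_gdp and the sample rows are placeholders, not real figures or part of the original code:

```python
def clean_gdp(pairs):
    """Convert (country, gdp_text) pairs to (country, float), largest first."""
    cleaned = []
    for country, gdp_text in pairs:
        # Drop thousands separators and surrounding whitespace before converting
        number = gdp_text.replace(",", "").strip()
        try:
            cleaned.append((country.strip(), float(number)))
        except ValueError:
            # Skip rows whose value is not numeric rather than guessing
            continue
    return sorted(cleaned, key=lambda item: item[1], reverse=True)

# Placeholder rows for illustration only, not real GDP figures
sample = [("Country A", "1,200.5"), ("Country B", " 3400 "), ("Country C", "n/a")]
print(clean_gdp(sample))
```

Rows that cannot be parsed are dropped here. Logging them instead would be a reasonable alternative if silent loss is a concern.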

Multitasking for Concurrency

import multiprocessing

def task_pic():
    # Run the image-scraping logic
    pass

def task_gdp():
    # Run the GDP-scraping logic
    pass

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=task_pic)
    p2 = multiprocessing.Process(target=task_gdp)
    
    p1.start()
    p2.start()
    
    p1.join()
    p2.join()
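
A Process does not hand its target's return value back to the parent, so collecting results takes extra plumbing. Since scraping is mostly waiting on the network, a thread pool is a common alternative. The following is a sketch of that alternative, not the article's method. The task bodies are placeholders and run_all is a hypothetical name:

```python
from concurrent.futures import ThreadPoolExecutor

def task_pic():
    # Placeholder for the image-scraping logic
    return "pics done"

def task_gdp():
    # Placeholder for the GDP-scraping logic
    return "gdp done"

def run_all():
    # Submit both tasks, then wait for and collect their return values
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(task_pic), pool.submit(task_gdp)]
        return [f.result() for f in futures]

if __name__ == "__main__":
    print(run_all())
```

Results come back in submission order because the futures list is iterated in the order the tasks were submitted.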

Data Visualization

Drawing a Pie Chart

from pyecharts.charts import Pie
import pyecharts.options as opts

def render_chart(data):
    # data is a list of (name, value) pairs
    pie = Pie(init_opts=opts.InitOpts(width="1400px", height="800px"))
    # {b} is the slice name and {d} is its percentage of the total
    pie.add("GDP ranking", data, label_opts=opts.LabelOpts(formatter='{b}:{d}%'))
    pie.set_global_opts(title_opts=opts.TitleOpts(title="World GDP Ranking, 2020"))
    pie.render("gdp_chart.html")
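
A pie chart with one slice per country becomes hard to read when the list is long. One option is to keep the largest entries and merge the rest into a single slice before charting. A minimal sketch under that assumption. The function name top_n_with_other, the label "Other", and the sample numbers are all illustrative:

```python
def top_n_with_other(data, n=5, other_label="Other"):
    """Keep the n largest (name, value) pairs and sum the rest into one slice."""
    ranked = sorted(data, key=lambda item: item[1], reverse=True)
    head, tail = ranked[:n], ranked[n:]
    if tail:
        head.append((other_label, sum(value for _, value in tail)))
    return head

# Placeholder numbers for illustration only
sample = [("A", 50.0), ("B", 30.0), ("C", 10.0), ("D", 6.0), ("E", 4.0)]
print(top_n_with_other(sample, n=3))
```

The returned list keeps the (name, value) shape, so it could be passed to a charting function that expects such pairs.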

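Log Monitoring and Debugging

Configuring Logging

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(filename)s[line:%(lineno)d] - %(levelname)s: %(message)s',
    filename="app.log",
    # "w" overwrites app.log on every run
    filemode="w"
)

logging.info("System started successfully")
logging.error("An exception occurred")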
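
A fixed error message says little about what went wrong. When a scraping step raises, logging the traceback is usually more useful. A small sketch of wrapping a task so that failures are recorded. The names safe_run and broken_task and the logger name "spider" are hypothetical choices for illustration:

```python
import logging

logger = logging.getLogger("spider")  # logger name is an arbitrary choice

def safe_run(task, *args):
    """Run task(*args); log the outcome and return None if it raises."""
    try:
        result = task(*args)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("Task %s failed", task.__name__)
        return None
    logger.info("Task %s finished", task.__name__)
    return result

def broken_task():
    # Stand-in for a scraping step that fails
    raise ValueError("simulated failure")

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(safe_run(len, "abc"))
    print(safe_run(broken_task))
```

Returning None on failure keeps the crawler running. Re-raising after logging would suit cases where later steps depend on the result.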
