Python Web Scraping in Practice: Accurately Scraping Ctrip Hotel Price Data

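This Python scraper uses the requests and jsonpath libraries to automate the collection of Ctrip hotel price data. The workflow covers locating the data interface, constructing dynamic parameters, managing cookies, and applying anti-blocking measures (IP proxies, User-Agent randomization). It parses JSONP responses to extract fields such as hotel name, price, and rating, and supports paginated scraping across multiple cities. The data can be cleaned and stored as Excel or JSON for price monitoring and visualization. Follow the platform's rules and keep the request rate under control to stay compliant.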

DebugKing | Published 2026/3/22 | Updated 2026/5/3119 views

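Preface

Ctrip is one of China's leading online travel platforms. Its hotel price data covers real-time room rates, room types, promotions, and user ratings, which makes it an important source for travel data analysis, price monitoring, and competitor analysis. Unlike static pages, Ctrip's hotel pages combine dynamic loading, anti-scraping checks, and data encryption, so they are harder to scrape. This article walks through page analysis, anti-blocking strategies, and dynamic data scraping to show how to collect Ctrip hotel price data efficiently with Python and obtain structured hotel price information despite the platform's restrictions.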

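Abstract

This article covers the full workflow of a Ctrip hotel price scraper. It focuses on three techniques: scraping dynamically loaded page data, handling encrypted request-header parameters, and scraping across pages and cities. The scraper sends HTTP requests with requests, parses JSON with jsonpath, and disguises its request fingerprint with fake-useragent. A worked example extracts the hotel name, price, room type, rating, and location. The result is a scraper script that supports multiple cities and dates, along with a basic approach to data cleaning and visualization.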

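1. Technical Principles and Environment Setup

1.1 Core Technical Principles

The main difficulties in scraping Ctrip hotel prices, and how to address them:

The hotel list is returned as JSON by an asynchronous interface rather than rendered directly in the HTML, so the data interface must first be located by capturing network traffic;
The request headers include validation fields such as Referer and Origin, and some interfaces return complete data only when a Cookie is sent;
High-frequency requests trigger IP bans or CAPTCHAs, so proxy IPs, request delays, and User-Agent randomization are needed;
Prices change in real time, and the interface parameters include dynamic values such as timestamps and city codes, so the request parameters must be built dynamically.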
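
As an illustration of the request-delay and User-Agent randomization ideas listed above, here is a minimal standard-library sketch. The names `USER_AGENT_POOL`, `pick_headers`, and `jittered_delay` are hypothetical and not part of the article's script; in practice fake-useragent could supply the User-Agent strings instead of a fixed pool.

```python
import random

# Hypothetical pool of User-Agent strings used only for this illustration.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def pick_headers(rng=random):
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": rng.choice(USER_AGENT_POOL)}


def jittered_delay(base_seconds, jitter_seconds, rng=random):
    """Return a delay in [base, base + jitter] so requests are not evenly spaced."""
    return base_seconds + rng.uniform(0, jitter_seconds)
```

A caller would typically run `time.sleep(jittered_delay(2.0, 1.5))` before each request and pass `pick_headers()` to the HTTP client.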

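1.2 Environment Configuration

Tool / Library	Version	Purpose
Python	3.8+	Core development language
requests	2.31.0	Sends HTTP requests
jsonpath	0.82	Parses JSON data (efficient extraction of nested fields)
fake-useragent	1.4.0	Generates random User-Agent strings
python-dotenv	1.0.0	Manages environment variables (stores sensitive parameters)
pandas	2.1.4	Data cleaning and structured storage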

Installation Command

pip install requests jsonpath fake-useragent python-dotenv pandas openpyxl

openpyxl is included because pandas relies on it to write .xlsx files.
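
The table above gives python-dotenv the job of keeping sensitive parameters out of the source code. The sketch below shows the same idea using only the standard library: the Cookie is read from an environment variable and added to the headers only when present. The variable name `CTRIP_COOKIE` and the function `build_headers` are assumptions made for this example.

```python
import os


def build_headers(cookie_env_var="CTRIP_COOKIE"):
    """Build headers, adding Cookie only when the environment provides one."""
    headers = {"Accept": "application/json"}
    # CTRIP_COOKIE is a hypothetical variable name chosen for this sketch.
    cookie = os.environ.get(cookie_env_var, "").strip()
    if cookie:
        headers["Cookie"] = cookie
    return headers
```

With python-dotenv installed, calling `load_dotenv()` first would populate the environment from a local .env file before `build_headers()` runs.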

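2. Hands-On Development: Ctrip Hotel Price Scraper

2.1 Breaking Down the Approach

Capture traffic to locate the Ctrip hotel list data interface (Chrome F12, Network panel);
Analyze the interface's request parameters (city code, check-in / check-out dates, page number, and so on);
Build valid request headers and parameters that mimic a real request;
Send the request, receive the JSON data, and extract the core fields with jsonpath;
Scrape multiple pages automatically, handling the pagination marker returned by the interface;
Clean and structure the data, then output it as Excel or JSON.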
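
The pagination step above can be sketched independently of any network code. In this minimal example the page fetcher is passed in as a function, and the loop stops at the first empty page. `crawl_pages`, `fake_fetch`, and the sample records are hypothetical stand-ins, not the article's implementation.

```python
from typing import Callable, Dict, List


def crawl_pages(fetch_page: Callable[[int], List[Dict]], max_page: int) -> List[Dict]:
    """Collect results page by page, stopping at the first empty page."""
    results: List[Dict] = []
    for page in range(1, max_page + 1):
        items = fetch_page(page)
        if not items:
            break  # an empty page means there is nothing further to fetch
        results.extend(items)
    return results


# Stand-in for a real request: two pages of fake data, then nothing.
FAKE_PAGES = {1: [{"name": "Hotel A"}, {"name": "Hotel B"}], 2: [{"name": "Hotel C"}]}


def fake_fetch(page: int) -> List[Dict]:
    return FAKE_PAGES.get(page, [])


hotels = crawl_pages(fake_fetch, max_page=5)
print(len(hotels))  # 3
```

Injecting the fetcher keeps the loop testable without touching the network, and a real fetcher can be swapped in later.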

2.2 Complete Code Implementation

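import json
import os
import time

import pandas as pd
import requests
from dotenv import load_dotenv
from fake_useragent import UserAgent
from jsonpath import jsonpath

# Load environment variables (such as the Cookie) from a local .env file
load_dotenv()


class CtripHotelSpider:
    """Crawl Ctrip hotel list data page by page.

    NOTE: the endpoint URL, header values, parameter names, status field,
    city codes, and JSONPath expressions below are placeholders. Replace
    them with the values observed in the Chrome DevTools Network panel.
    """

    def __init__(self):
        """Initialize request headers, pacing settings, and the result store."""
        # Random User-Agent to vary the request fingerprint
        self.ua = UserAgent()
        self.headers = {
            "User-Agent": self.ua.random,
            "Accept": "application/json, text/plain, */*",
            "Referer": "https://hotels.ctrip.com/",  # placeholder; copy from the captured request
            "Origin": "https://hotels.ctrip.com",  # placeholder; copy from the captured request
            # Cookie comes from the environment; CTRIP_COOKIE is a placeholder variable name
            "Cookie": os.getenv("CTRIP_COOKIE", ""),
            # Copy any further headers from the captured request
        }
        self.timeout = 10  # request timeout in seconds (placeholder value)
        self.delay = 2  # pause before each request in seconds (placeholder value)
        self.all_hotel_data = []  # accumulated hotel records
        # Hotel list endpoint found by capturing traffic (placeholder, not a real URL)
        self.api_url = "https://example.invalid/hotel-list"

    def build_request_params(self, city, check_in, check_out, page=1):
        """Build the query parameters for one page of results."""
        timestamp_ms = int(time.time() * 1000)
        params = {
            "cityId": self.get_city_code(city),  # city name mapped to a city code
            "checkIn": check_in,
            "checkOut": check_out,
            "pageIndex": page,
            "pageSize": 20,  # placeholder; add other static parameters from the captured request
            "callback": "jsonp_" + str(timestamp_ms),  # JSONP callback name
            "_": timestamp_ms,  # cache-busting timestamp
        }
        return params

    def get_city_code(self, city_name):
        """Map a city name to its city code (placeholder values; verify before use)."""
        city_code_map = {
            "Beijing": "1",
            "Shanghai": "2",
        }
        return city_code_map.get(city_name, "1")  # fall back to the first entry

    def get_hotel_data(self, params):
        """Send the request and return the parsed JSON, or None on failure."""
        try:
            # Pause before each request to keep the request rate low
            time.sleep(self.delay)
            response = requests.get(
                url=self.api_url,
                headers=self.headers,
                params=params,
                timeout=self.timeout,
            )
            # Raise for 4xx / 5xx status codes
            response.raise_for_status()
            response_text = response.text.strip()
            if not response_text.startswith(("{", "[")):
                # JSONP response: keep only the payload between the outer parentheses
                json_start = response_text.find("(") + 1
                json_end = response_text.rfind(")")
                json_data = json.loads(response_text[json_start:json_end])
            else:
                json_data = response.json()
            return json_data
        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
            return None
        except json.JSONDecodeError as e:
            print(f"Failed to parse JSON: {e}")
            return None

    @staticmethod
    def _first(data, expr, default=""):
        """Return the first JSONPath match, or default (jsonpath returns False on no match)."""
        matches = jsonpath(data, expr)
        return matches[0] if matches else default

    def parse_hotel_data(self, json_data):
        """Extract the core hotel fields from one page of JSON data."""
        # "code" and 200 are placeholders for the interface's status field and success value
        if not json_data or json_data.get("code") != 200:
            print("The interface returned no valid data")
            return []
        hotel_list = []
        # All JSONPath expressions below are placeholders for the real field paths
        hotels = self._first(json_data, "$..hotelList", [])
        for hotel in hotels:
            try:
                hotel_info = {
                    "hotel_name": self._first(hotel, "$.name", "Unknown"),
                    "hotel_id": self._first(hotel, "$.id", ""),
                    "price": self._first(hotel, "$.price", None),
                    "score": self._first(hotel, "$.score", None),
                    "star_rating": self._first(hotel, "$.star", ""),
                    "room_type": self._first(hotel, "$.roomType", ""),
                    "address": self._first(hotel, "$.address", ""),
                    "distance": self._first(hotel, "$.distance", ""),
                }
                hotel_list.append(hotel_info)
            except Exception as e:
                print(f"Failed to parse a hotel record: {e}")
                continue
        return hotel_list

    def crawl_hotels(self, city, check_in, check_out, max_page=1):
        """Crawl up to max_page pages for one city and date range."""
        print(f"Crawling hotels in {city} ({check_in} to {check_out})")
        for page in range(1, max_page + 1):
            print(f"Crawling page {page}...")
            # Build the parameters for this page
            params = self.build_request_params(city, check_in, check_out, page)
            # Fetch and parse the page
            json_data = self.get_hotel_data(params)
            page_hotels = self.parse_hotel_data(json_data)
            if not page_hotels:
                print(f"Page {page} returned no data; stopping")
                break
            self.all_hotel_data.extend(page_hotels)
            print(f"Page {page} done: {len(page_hotels)} hotels")
        print(f"Finished: {len(self.all_hotel_data)} hotels in total")
        return self.all_hotel_data

    def save_to_excel(self, file_path="ctrip_hotels.xlsx"):
        """Sort the collected data by price and save it to an Excel file."""
        if not self.all_hotel_data:
            print("No data to save")
            return
        try:
            df = pd.DataFrame(self.all_hotel_data)
            # Coerce prices to numbers so that sorting does not fail on mixed types
            df["price"] = pd.to_numeric(df["price"], errors="coerce")
            df = df.sort_values(by="price", ascending=True)
            # Writing .xlsx requires the openpyxl package
            df.to_excel(file_path, index=False, engine="openpyxl")
            print(f"Data saved to {file_path}")
        except Exception as e:
            print(f"Failed to save data: {e}")

    def print_sample_results(self, sample_num=5):
        """Print the first sample_num records as a table."""
        if not self.all_hotel_data:
            print("No data to display")
            return
        print(f"First {sample_num} records:")
        df = pd.DataFrame(self.all_hotel_data[:sample_num])
        print(df.to_string(index=False))


if __name__ == "__main__":
    # Create the spider
    spider = CtripHotelSpider()
    # Example arguments; adjust the city, dates, and page count as needed
    spider.crawl_hotels(
        city="Shanghai",
        check_in="2026-06-01",
        check_out="2026-06-02",
        max_page=3,
    )
    # Print a sample of the results
    spider.print_sample_results(sample_num=5)
    # Save everything to Excel
    spider.save_to_excel()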
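
The JSONP handling in `get_hotel_data` can also be isolated into a small helper that is easy to test. The sketch below is one possible approach, using only the standard library, and assumes the JSONP wrapper has the common form `callback({...});`. The name `parse_json_or_jsonp` is hypothetical.

```python
import json
import re

# Matches "callbackName( ... )" with an optional trailing semicolon.
_JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_json_or_jsonp(text):
    """Parse plain JSON, or JSONP of the form callback({...})."""
    match = _JSONP_RE.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)
```

Matching on the wrapper's shape, rather than searching for a particular callback name, keeps the helper independent of how the callback parameter is generated.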

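2.5 How the Core Code Works

Code module	Core principle	Key role
`__init__` method	Initializes the request headers (random User-Agent, Cookie), the interface URL, and related settings	Mimics a real user request and uses the Cookie to pass basic authentication
`build_request_params` method	Builds the interface parameters dynamically (timestamp, city code)	Satisfies Ctrip's dynamic parameter checks and supports multiple cities and dates
`get_hotel_data` method	Sends a GET request and handles JSONP-formatted responses	Unwraps the JSONP wrapper so the data parses correctly
`parse_hotel_data` method	Extracts nested JSON fields with jsonpath	Avoids KeyError and parses deeply nested hotel data efficiently
`crawl_hotels` method	Builds the pagination parameters in a loop and scrapes pages in batch	Automates paginated scraping and stops when a page returns no data
`save_to_excel` method	Converts the data to a DataFrame, sorts it, and saves it as Excel	Stores the data in structured form for later price analysis and visualization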
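
To illustrate the "first match or default" pattern that the table credits with avoiding KeyError, here is a simplified dotted-path lookup in plain Python. It is not the jsonpath library and supports only dictionary keys and list indexes; `get_path` and the sample field names are assumptions made for this example.

```python
def get_path(data, path, default=None):
    """Follow a dotted path such as 'price.amount' through nested dicts and lists."""
    current = data
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            # Any missing step yields the default instead of raising
            return default
    return current


# Hypothetical record shape, used only to exercise the helper.
hotel = {"name": "Hotel A", "price": {"amount": 428}, "rooms": [{"type": "Twin"}]}
```

For instance, `get_path(hotel, "rooms.0.type")` walks into the first room, while a path that does not exist returns the supplied default.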

