[Python Crawler] Building a Smart Photo Album Downloader: A Complete Walkthrough from Crawler to GUI


🌈 Homepage: 创客白泽 - ZEEKLOG blog
🔥 Series: 🐍 "Python Open-Source Projects in Practice"


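🌟 Abstract
This article walks through the development of a Python-based smart downloader for photo albums. The project combines web crawling with a PySide6 graphical interface to cover the full workflow from search and preview to batch download. It examines the key techniques involved (multithreaded crawling, request simulation, and Qt interface development) and shows how to build a full-featured, user-friendly download tool. The article contains more than 7,000 characters of explanation, the complete project code, screenshots, and an architecture overview, and is intended as a practical guide for people learning Python crawling and GUI development.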
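📖 Contents
- Project overview
- Core features
- Screenshots
- Technical architecture
- Implementation steps
- Key code walkthrough
- Optimization suggestions
- Source code
- Summary and outlook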
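🏆 Project overview
The photo album downloader is a tool for image collectors. It addresses three pain points of downloading by hand:
- Efficiency: manual downloading is slow and tedious
- Organization: scattered images are hard to manage systematically
- Preview: there is no quick way to browse an album's full contents
The project uses a layered architecture:
User interface layer → (commands) → Business logic layer → (data requests) → Web crawler layer
Web crawler layer → (returned data) → Business logic layer → (UI updates) → User interface layer
Tech stack:
- Front end: PySide6 + QSS styling
- Networking: Requests + BeautifulSoup
- Asynchronous processing: QThread + threading
- Image handling: QPixmap + QNetworkAccessManager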
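The asynchronous-processing entry pairs QThread with plain threading. As a minimal stdlib-only sketch of the underlying idea (a background worker that reports progress and can be cancelled cooperatively), the following uses a `threading.Event` as the cancel flag and a plain callback where the project uses a Qt signal. The names `run_batch`, `fetch`, and `on_progress` are hypothetical and do not appear in the project.

```python
import threading
from typing import Callable, Iterable, List


def run_batch(urls: Iterable[str],
              fetch: Callable[[str], bool],
              on_progress: Callable[[int], None],
              cancel: threading.Event) -> threading.Thread:
    """Process urls on a background thread, reporting percent complete."""
    items: List[str] = list(urls)

    def worker() -> None:
        for done, url in enumerate(items, start=1):
            if cancel.is_set():          # cooperative cancellation point
                break
            fetch(url)                   # stand-in for the real download call
            on_progress(int(done / len(items) * 100))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Usage: a fake fetch that only records what it was asked to download.
seen: List[str] = []
progress: List[int] = []
cancel_flag = threading.Event()
batch = run_batch(["a", "b", "c", "d"],
                  fetch=lambda u: seen.append(u) or True,
                  on_progress=progress.append,
                  cancel=cancel_flag)
batch.join()
```

In a Qt application the callback would typically be replaced by a signal so that widget updates happen on the GUI thread.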
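🛠️ Core features
1. Smart search
- Fuzzy keyword search (for example "古风 汉服", i.e. classical style, hanfu)
- Direct URL parsing (album pages and search pages are recognized automatically)
- Automatic multi-page crawling (supports 1,000+ results)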
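The search behaviour listed above can be sketched as two small pure functions: one that decides how to treat the text typed into the search box, and one that builds the URL of a given result page. The markers `/gallery` and `searchid` and the paginated URL pattern are taken from the full source listing; the function names `classify_input` and `search_page_url` are hypothetical, and the example paths in the usage comments are made up.

```python
import re
from typing import Optional

BASE_URL = "https://www.mxd009.cc"  # site root used throughout the article


def classify_input(text: str) -> str:
    """Return 'gallery', 'search', 'listing', or 'keyword' for a user query."""
    text = text.strip()
    if not text.startswith(BASE_URL):
        return "keyword"          # plain keywords go through the search form
    if "/gallery" in text:
        return "gallery"          # a single album page
    if "searchid=" in text:
        return "search"           # an existing search-result page
    return "listing"              # any other page on the site


def search_page_url(search_url: str, page: int) -> Optional[str]:
    """Build the URL of a zero-based result page from a search-result URL."""
    found = re.search(r"searchid=(\d+)", search_url)
    if not found:
        return None
    if page == 0:
        return search_url         # the first page is the redirect target itself
    return f"{BASE_URL}/e/search/result/index.php?page={page}&searchid={found.group(1)}"


# Usage (hypothetical paths):
# classify_input(BASE_URL + "/gallery/123.html")
# search_page_url(BASE_URL + "/e/search/result/?searchid=42", 2)
```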
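2. Visual preview

3. Batch download management
- Resumable downloads (files that already exist on disk are skipped)
- Automatic categorized storage (by author/topic)
- Real-time progress display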
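A rough sketch of how categorized storage and skip-if-exists resuming fit together is shown below. The full listing resumes by skipping files already on disk rather than by requesting byte ranges, and this sketch follows the same approach. The helper names `safe_name`, `target_path`, and `pending`, the character filter, and the example.com URLs are all assumptions made for illustration.

```python
import os
import re
import tempfile


def safe_name(name: str) -> str:
    """Replace characters that are invalid in file names on common systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", name).strip()
    return cleaned or "untitled"


def target_path(root: str, author: str, album: str, image_url: str) -> str:
    """Map an image URL to root/author/album/filename."""
    filename = os.path.basename(image_url.split("?", 1)[0])
    return os.path.join(root, safe_name(author), safe_name(album), filename)


def pending(paths):
    """Keep only the files that are not on disk yet (skip-if-exists resume)."""
    return [p for p in paths if not os.path.exists(p)]


# Usage: pretend 001.jpg was saved by an earlier run, then ask what is left.
root = tempfile.mkdtemp()
first = target_path(root, "Some/Author", "Album: 1", "https://example.com/a/001.jpg")
second = target_path(root, "Some/Author", "Album: 1", "https://example.com/a/002.jpg")
os.makedirs(os.path.dirname(first), exist_ok=True)
open(first, "wb").close()
todo = pending([first, second])
```

Skipping by file existence is only reliable if a file appears under its final name after it has been written completely, for example by writing to a temporary name and renaming it at the end.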
4. User-friendly design
- Emoji-based icons
- Adaptive dark/light themes
- Operation history
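🎨 Screenshots
Main window

Figure: dark-themed design with the search result list on the left and the preview area on the right
Thumbnail browsing

Download in progress
Figure: real-time progress display during a multithreaded download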
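⚙️ Implementation steps
1. Environment setup
    # Create a virtual environment
    python -m venv venv
    source venv/bin/activate
    # Install dependencies
    pip install PySide6 requests beautifulsoup4
2. Crawler core
The crawler is implemented in three stages:
- Request simulation stage:
    def submit_search(keywords):
        form_data = {"keyboard": keywords, "show": "title", "tempid": "1", "tbname": "news"}
        # Browser-like headers omitted...
- Page parsing stage: BeautifulSoup CSS selectors extract each album's cover image, title, link, category, author, and image count (see parse_gallery_items_from_root in the full source).
- Anti-anti-crawling strategy:
- Random User-Agent rotation
- Randomized request interval (0.5-2 s)
- Automatic retries (up to 3 attempts)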
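The three measures above can be combined in one small helper. This is a minimal sketch under stated assumptions: the helper name `polite_get`, its parameters, and the two User-Agent strings are illustrative and not taken from the project. The HTTP call is passed in as `fetch` so the retry logic can be exercised without network access.

```python
import random
import time
from typing import Callable, Optional, Sequence, Tuple

USER_AGENTS: Sequence[str] = (
    # Illustrative values only; use current, real browser strings in practice.
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
)


def polite_get(url: str,
               fetch: Callable[[str, dict], str],
               max_retries: int = 3,
               delay_range: Tuple[float, float] = (0.5, 2.0),
               sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call fetch up to max_retries times, pausing randomly before each try."""
    for attempt in range(1, max_retries + 1):
        sleep(random.uniform(*delay_range))                   # randomized interval
        headers = {"User-Agent": random.choice(USER_AGENTS)}  # rotate the UA
        try:
            return fetch(url, headers)
        except Exception:
            if attempt == max_retries:
                return None                                   # retries exhausted
    return None


# With requests installed, fetch could be written as:
#   lambda u, h: requests.get(u, headers=h, timeout=10).text
```

Passing `sleep` as a parameter keeps the default behaviour (a real pause) while letting tests substitute a no-op.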
3. GUI development
Component tree
    MainWindow
    ├── SearchPanel
    ├── ResultTree
    ├── PreviewArea
    │   ├── ImageLabel
    │   └── ThumbnailButton
    └── StatusBar
        ├── ProgressBar
        └── SpeedLabel
Style customization
    /* Dark theme example */
    QTreeWidget {
        background-color: #3c3f41;
        border: 1px solid #555;
        alternate-background-color: #383b3d;
    }
🔍 Key code walkthrough
1. Multithreaded downloader
    from PySide6.QtCore import QThread, Signal

    class DownloadWorker(QThread):
        progress = Signal(int)

        def run(self):
            # get_image_urls, download_image and cancel_requested are defined elsewhere
            urls = self.get_image_urls()
            for index, url in enumerate(urls, start=1):
                if self.cancel_requested:
                    break
                self.download_image(url)
                self.progress.emit(int(index / len(urls) * 100))
2. Smart URL handling
    def process_url(url):
        if "searchid" in url:
            return parse_search_result(url)
        elif "/gallery" in url:
            return parse_gallery(url)
        else:
            return parse_index_page(url)
3. Image caching system
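The article does not show the cache itself, and the full listing contains no dedicated cache class, so the following is only an assumed sketch of one plausible shape: a small in-memory LRU cache keyed by image URL. The class name `ImageCache`, its methods, and the default capacity are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Optional


class ImageCache:
    """Tiny in-memory LRU cache mapping image URLs to raw bytes."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        data = self._items.get(url)
        if data is not None:
            self._items.move_to_end(url)      # mark as most recently used
        return data

    def put(self, url: str, data: bytes) -> None:
        self._items[url] = data
        self._items.move_to_end(url)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used

    def get_or_load(self, url: str, loader: Callable[[str], bytes]) -> bytes:
        cached = self.get(url)
        if cached is None:
            cached = loader(url)              # only hit the network on a miss
            self.put(url, cached)
        return cached
```

Caching raw bytes instead of QPixmap objects keeps the cache usable from worker threads; the pixmap can then be built on the GUI thread when the image is displayed.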

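🚀 Optimization suggestions
Performance
- Record finished downloads in an SQLite cache
- Add an option to package downloads as a zip archive
- Write EXIF metadata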
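For the first suggestion, a record of finished downloads could look roughly like the sketch below, which uses Python's built-in sqlite3 module. The table name `downloaded`, its columns, and the three function names are assumptions made for illustration; nothing like this exists in the current source.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database that records finished downloads."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloaded ("
        " url TEXT PRIMARY KEY,"
        " filepath TEXT NOT NULL,"
        " saved_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def mark_downloaded(conn: sqlite3.Connection, url: str, filepath: str) -> None:
    """Insert or update the record for one image."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO downloaded (url, filepath) VALUES (?, ?)",
            (url, filepath),
        )


def is_downloaded(conn: sqlite3.Connection, url: str) -> bool:
    """Return True if the URL has already been recorded."""
    row = conn.execute("SELECT 1 FROM downloaded WHERE url = ?", (url,)).fetchone()
    return row is not None
```

A downloader could call `is_downloaded` before each request and `mark_downloaded` after each successful save; passing a file path instead of `:memory:` makes the record persist between runs.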
Extended features

📥 Source code
    import logging
    import os
    import re
    import sys
    import threading
    from collections import deque
    from typing import Dict, List
    from urllib.parse import urljoin, urlparse, urlunparse

    import requests
    from bs4 import BeautifulSoup
    from PySide6.QtCore import QEvent, QSize, Qt, QThread, QUrl, Signal
    from PySide6.QtGui import QFont, QPixmap
    from PySide6.QtNetwork import QNetworkAccessManager, QNetworkRequest
    from PySide6.QtWidgets import (
        QApplication, QDialog, QGridLayout, QHBoxLayout, QLabel, QLineEdit,
        QMessageBox, QProgressBar, QPushButton, QScrollArea, QStyle,
        QTreeWidget, QTreeWidgetItem, QVBoxLayout, QWidget,
    )

    # Configure logging
    logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
    logger = logging.getLogger(__name__)

    # Constants
    BASE_URL = "https://www.mxd009.cc"

    # Application styling
    APP_STYLE = """
    QWidget {
        background-color: #2b2b2b;
        color: #e0e0e0;
        font-family: 'Segoe UI', Arial;
    }
    QLineEdit {
        background-color: #3c3f41;
        border: 1px solid #555;
        border-radius: 4px;
        padding: 5px;
        color: #e0e0e0;
        selection-background-color: #3d8ec9;
    }
    QPushButton {
        background-color: #3c3f41;
        border: 1px solid #555;
        border-radius: 4px;
        padding: 5px 10px;
        min-width: 80px;
        color: #e0e0e0;
    }
    QPushButton:hover {
        background-color: #4e5254;
        border: 1px solid #666;
    }
    QPushButton:pressed {
        background-color: #2d2f30;
    }
    QTreeWidget {
        background-color: #3c3f41;
        border: 1px solid #555;
        alternate-background-color: #383b3d;
    }
    QHeaderView::section {
        background-color: #3c3f41;
        padding: 5px;
        border: 1px solid #555;
    }
    QProgressBar {
        border: 1px solid #555;
        border-radius: 3px;
        text-align: center;
        background-color: #3c3f41;
    }
    QProgressBar::chunk {
        background-color: #4CAF50;
        width: 10px;
    }
    QLabel {
        color: #e0e0e0;
    }
    QScrollArea {
        border: 1px solid #555;
        background-color: #3c3f41;
    }
    QDialog {
        background-color: #2b2b2b;
    }
    """


    def submit_search(keywords: str) -> str:
        """Submit search request and return redirected URL"""
        form_data = {"keyboard": keywords, "show": "title", "tempid": "1", "tbname": "news"}
        search_url = f"{BASE_URL}/e/search/index.php"
        session = requests.Session()
        try:
            response = session.post(search_url, data=form_data, allow_redirects=False, timeout=10)
        except requests.RequestException as e:
            print("Request failed:", e)
            return ""
        if response.status_code == 302:
            new_location = response.headers.get("Location")
            return urljoin(search_url, new_location)
        print("No redirect occurred, status code:", response.status_code)
        return ""


    def get_total_count(soup: BeautifulSoup) -> int:
        """Extract total gallery count from page"""
        biaoqian_div = soup.find("div", class_="biaoqian")
        if biaoqian_div:
            p_text = biaoqian_div.find("p").get_text(strip=True)
            match = re.search(r"(\d+)", p_text)
            if match:
                return int(match.group(1))
        return 0


    def parse_gallery_items_from_root(soup: BeautifulSoup) -> List[Dict[str, str]]:
        """Extract all gallery info from page"""
        gallery_root = soup.find("div", class_="box galleryList")
        items = []
        if not gallery_root:
            return items
        for li in gallery_root.select("ul.databox > li"):
            img_tag = li.select_one("div.img-box img")
            ztitle_tag = li.select_one("p.ztitle a")
            rtitle_tag = li.select_one("p.rtitle a")
            author_tag = li.select_one("p.ztitle font")
            count_tag = li.select_one("em.num")
            href = ztitle_tag["href"] if ztitle_tag and ztitle_tag.has_attr("href") else ""
            full_link = urljoin(BASE_URL, href)
            count = 0
            if count_tag:
                text = count_tag.get_text(strip=True)  # e.g. '15P'
                match = re.search(r'\d+', text)
                if match:
                    count = int(match.group(0))
            rtitle = rtitle_tag.get_text(strip=True) if rtitle_tag else ""
            if author_tag:
                author = author_tag.get_text(strip=True)
            else:
                author = rtitle
            item = {
                "img": img_tag["src"] if img_tag else "",
                "ztitle": ztitle_tag.get_text(strip=True) if ztitle_tag else "",
                "ztitle_href": full_link,
                "author": author,
                "rtitle": rtitle,
                "count": str(count),
            }
            items.append(item)
        return items


    def crawl_single_gallery(url) -> List[Dict[str, str]]:
        """Extract info from single gallery page"""
        items = []
        try:
            response = requests.get(url, timeout=10)
            soup = BeautifulSoup(response.text, "html.parser")
            total_count = 0
            # Check for login prompt
            tishi_div = soup.find('div', id='tishi')
            if tishi_div:
                total_text = tishi_div.find('p').get_text()
                # The pattern matches the site's own (Chinese) page text
                match = re.search(r'全本(\d+)张图片', total_text)
                if match:
                    total_count = int(match.group(1))
            else:
                # For logged-in users (not used in this program)
                page_div = soup.find('div', id='page')
                if page_div:
                    span = page_div.find('span', string=re.compile(r'\d+/\d+'))
                    if span:
                        match = re.search(r'\d+/(\d+)', span.text)
                        if match:
                            total_count = int(match.group(1))
            # Get gallery info
            gallery_div = soup.find('div', class_='gallerypic')
            if not gallery_div:
                return items
            title = ""
            jieshao_div = soup.find('div', class_='gallery_jieshao')
            if jieshao_div:
                title = jieshao_div.find('h1').get_text(strip=True)
            type_author = [a.get_text(strip=True) for a in soup.select('.gallery_renwu_title a')]
            first_img = gallery_div.find('img')
            item = {
                "img": first_img["src"] if first_img else "",
                "ztitle": title,
                "ztitle_href": url,
                "author": type_author[1],
                "rtitle": type_author[0],
                "count": str(total_count),
            }
            items.append(item)
            return items
        except Exception as e:
            print("Request failed:" + str(e))
            return items


    def crawl_all_pages(search_url: str) -> List[Dict[str, str]]:
        """Crawl all pages of search results"""
        all_results = []
        page = 0
        if "searchid" in search_url:
            searchid_match = re.search(r"searchid=(\d+)", search_url)
            if not searchid_match:
                print("Could not extract searchid")
                return []
            searchid = searchid_match.group(1)
            total = 0
            while True:
                # Construct paginated URL
                if page == 0:
                    page_url = search_url
                else:
                    page_url = f"{BASE_URL}/e/search/result/index.php?page={page}&searchid={searchid}"
                print(f"\n[Crawling page {page + 1}] {page_url}")
                try:
                    response = requests.get(page_url, timeout=10)
                    soup = BeautifulSoup(response.text, "html.parser")
                except Exception as e:
                    print("Request failed:", e)
                    break
                if page == 0:
                    total = get_total_count(soup)
                    print(f"[Total] Gallery count: {total}")
                results = parse_gallery_items_from_root(soup)
                if not results:
                    print("[End] No data on current page, ending early.")
                    break
                all_results.extend(results)
                if len(all_results) >= total:
                    print("[Complete] All items crawled.")
                    break
                page += 1
            return all_results
        else:
            search_url = re.sub(r'_\d+\.html$', '.html', search_url)
            try:
                response = requests.get(search_url, timeout=10)
            except Exception as e:
                print("Request failed:", e)
                return []
            if not response:
                return []
            soup = BeautifulSoup(response.text, "html.parser")
            # Get total pages
            page_div = soup.find("div", class_="layui-box layui-laypage layui-laypage-default")
            total_pages = 1
            if page_div:
                span = page_div.find("span")
                if span:
                    match = re.search(r'\d+/(\d+)', span.text.strip())
                    if match:
                        total_pages = int(match.group(1))
            print(f"Total pages: {total_pages}")
            for index in range(1, total_pages + 1):
                page_url = re.sub(r'\.html$', f'_{index}.html', search_url)
                print(f"\n[Crawling page {index}] {page_url}")
                try:
                    response = requests.get(page_url, timeout=10)
                    soup = BeautifulSoup(response.text, "html.parser")
                except Exception as e:
                    print("Request failed:", e)
                    break
                results = parse_gallery_items_from_root(soup)
                if not results:
                    print("[End] No data on current page, ending early.")
                    break
                all_results.extend(results)
            return all_results


    class GalleryCrawler(QWidget):
        def __init__(self, cookies=None):
            super().__init__()
            self.setWindowTitle("📷 Photo Album Downloader")
            self.resize(1000, 700)
            self.setStyleSheet(APP_STYLE)
            self.init_ui()
            # Set window icon
            self.setWindowIcon(self.style().standardIcon(QStyle.SP_DirIcon))
            self.download_queue = deque()
            self.current_worker = None
            self.cancel_requested = False
            self.selected_items = None
            self.is_downloading = False
            self.cookies = cookies
            self.headers = self._default_headers()
            self.session = requests.Session()
            self.session.headers.update(self.headers)
            if cookies:
                self.session.cookies.update(cookies)

        def init_ui(self):
            # Main layout
            main_layout = QVBoxLayout(self)
            main_layout.setContentsMargins(10, 10, 10, 10)
            main_layout.setSpacing(10)

            # Search bar with emoji
            search_layout = QHBoxLayout()
            self.search_input = QLineEdit()
            self.search_input.setPlaceholderText("🔍 Enter keywords or paste a URL...")
            self.search_input.setClearButtonEnabled(True)
            self.search_btn = QPushButton("🔎 Search")
            self.search_btn.setStyleSheet("""
                QPushButton { background-color: #4CAF50; font-weight: bold; }
                QPushButton:hover { background-color: #5CBF60; }
            """)
            self.search_btn.clicked.connect(self.start_search)
            search_layout.addWidget(self.search_input)
            search_layout.addWidget(self.search_btn)

            # Content area
            content_layout = QHBoxLayout()
            content_layout.setSpacing(15)

            # Left panel (tree view)
            left_panel = QVBoxLayout()
            left_panel.setSpacing(10)

            # Tree widget with emoji headers
            self.tree = QTreeWidget()
            self.tree.setHeaderLabels(["#️⃣ No.", "📛 Title", "🏷️ Category", "🔗 Link", "🖼️ Count", "👤 Author"])
            self.tree.setColumnWidth(0, 50)
            self.tree.setColumnWidth(1, 300)
            self.tree.setColumnWidth(2, 120)
            self.tree.setColumnWidth(3, 250)
            self.tree.setColumnWidth(4, 50)
            self.tree.setColumnWidth(5, 100)
            self.tree.setStyleSheet("""
                QTreeWidget::item { padding: 5px; }
                QTreeWidget::item:hover { background-color: #4e5254; }
            """)
            self.tree.itemSelectionChanged.connect(self.on_tree_selection_changed)

            # Button panel with emoji
            btn_layout = QHBoxLayout()
            self.btn_download_selected = QPushButton("💾 Download selected")
            self.btn_download_selected.setStyleSheet("""
                QPushButton { background-color: #2196F3; font-weight: bold; }
                QPushButton:hover { background-color: #31A6FF; }
            """)
            self.btn_download_selected.clicked.connect(self.download_selected)
            self.btn_download_all = QPushButton("📥 Download all")
            self.btn_download_all.setStyleSheet("""
                QPushButton { background-color: #FF9800; font-weight: bold; }
                QPushButton:hover { background-color: #FFA820; }
            """)
            self.btn_download_all.clicked.connect(self.download_all)
            self.btn_cancel_download = QPushButton("❌ Cancel")
            self.btn_cancel_download.setEnabled(False)
            self.btn_cancel_download.clicked.connect(self.cancel_download)
            self.btn_show_all_more = QPushButton("🖼️ Show all thumbnails")
            self.btn_show_all_more.clicked.connect(self.show_allthumbnails)
            btn_layout.addWidget(self.btn_download_selected)
            btn_layout.addWidget(self.btn_download_all)
            btn_layout.addWidget(self.btn_cancel_download)
            btn_layout.addWidget(self.btn_show_all_more)
            left_panel.addLayout(search_layout)
            left_panel.addWidget(self.tree)
            left_panel.addLayout(btn_layout)

            # Right panel (preview)
            right_panel = QVBoxLayout()
            right_panel.setSpacing(10)

            # Preview area with emoji title
            preview_title = QLabel("🖼️ Preview")
            preview_title.setAlignment(Qt.AlignCenter)
            preview_title.setStyleSheet("font-size: 16px; font-weight: bold;")
            self.image_label = QLabel()
            self.image_label.setAlignment(Qt.AlignCenter)
            self.image_label.setMinimumSize(320, 320)
            self.image_label.setStyleSheet("""
                QLabel {
                    background-color: #3c3f41;
                    border: 2px dashed #555;
                    border-radius: 5px;
                }
            """)
            self.image_label.setText("👆 Select an item to preview")
            self.link_label = QLabel()
            self.link_label.setText('<a href="https://www.mxd009.cc/">🌐 View online</a>')
            self.link_label.setOpenExternalLinks(True)
            self.link_label.setTextInteractionFlags(Qt.TextBrowserInteraction)
            self.link_label.setAlignment(Qt.AlignCenter)
            self.btn_show_more = QPushButton("🖼️ Show more thumbnails")
            self.btn_show_more.setStyleSheet("font-weight: bold;")
            self.btn_show_more.clicked.connect(self.show_thumbnails)
            right_panel.addWidget(preview_title)
            right_panel.addWidget(self.image_label, stretch=1)
            right_panel.addWidget(self.link_label)
            right_panel.addWidget(self.btn_show_more, alignment=Qt.AlignHCenter)
            content_layout.addLayout(left_panel, 3)
            content_layout.addLayout(right_panel, 1)

            # Status bar
            status_layout = QHBoxLayout()
            self.album_label = QLabel("🔄 Ready")
            self.album_label.setStyleSheet("font-weight: bold; color: #4CAF50;")
            self.progress_bar = QProgressBar()
            self.progress_bar.setMinimum(0)
            self.progress_bar.setMaximum(100)
            self.progress_bar.setValue(0)
            self.progress_bar.setFormat("📊 Progress: %p%")
            self.progress_bar.setStyleSheet("""
                QProgressBar { text-align: center; border-radius: 3px; }
                QProgressBar::chunk { background-color: #4CAF50; border-radius: 2px; }
            """)
            status_layout.addWidget(self.album_label)
            status_layout.addWidget(self.progress_bar)

            # Add to main layout
            main_layout.addLayout(content_layout)
            main_layout.addLayout(status_layout)

        def _default_headers(self):
            """Default headers to mimic browser behavior"""
            return {
                "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
                # Only advertise encodings that requests decodes without extra packages
                "accept-encoding": "gzip, deflate",
                "accept-language": "zh-CN,zh;q=0.9",
                "cache-control": "max-age=0",
                "sec-ch-ua": '"Google Chrome";v="137", "Chromium";v="137", "Not/A)Brand";v="24"',
                "sec-ch-ua-mobile": "?0",
                "sec-ch-ua-platform": '"Windows"',
                "sec-fetch-dest": "document",
                "sec-fetch-mode": "navigate",
                "sec-fetch-site": "same-origin",
                "upgrade-insecure-requests": "1",
                "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
            }

        def show_allthumbnails(self):
            img_list = []
            for i in range(self.tree.topLevelItemCount()):
                item = self.tree.topLevelItem(i)
                img = item.data(0, Qt.ItemDataRole.UserRole + 1)
                if img:
                    img_list.append(img)
            # Remove duplicates while preserving order
            img_urls = list(dict.fromkeys(img_list))
            if len(img_urls) == 0:
                QMessageBox.warning(self, "Notice", "Please run a search first.")
                return
            dialog = ThumbnailViewer(img_urls, self)
            dialog.exec()

        def show_thumbnails(self):
            selected_item = self.tree.currentItem()
            if selected_item is None:
                QMessageBox.warning(self, "Notice", "Please select an item first.")
                return
            album_title = selected_item.text(1)
            album_url = selected_item.text(3)
            total_count = int(selected_item.text(4))
            author = selected_item.text(5)
            # Get all image URLs (the worker is only used as a helper here and is never started)
            worker = DownloadWorker(self.session, author, album_title, album_url, total_count)
            image_urls = worker.get_image_urls()
            dialog = ThumbnailViewer(image_urls, self)
            dialog.exec()

        def cancel_download(self):
            self.cancel_requested = True
            self.download_queue.clear()  # Clear pending tasks
            if self.current_worker and self.current_worker.isRunning():
                # terminate() stops the thread abruptly; a file being written may be left incomplete
                self.current_worker.terminate()
                self.current_worker.wait()
            self.progress_bar.setValue(0)
            self.is_downloading = False
            self.btn_cancel_download.setEnabled(False)
            self.btn_download_all.setEnabled(True)
            self.search_btn.setEnabled(True)
            self.enable_search()
            QMessageBox.information(self, "Notice", "The download was cancelled.")
            self.set_album("")

        def set_album(self, value: str):
            def update():
                self.album_label.setText(value)
            self.run_on_main_thread(update)

        def set_progress(self, value: int):
            def update():
                self.progress_bar.setValue(value)
            self.run_on_main_thread(update)

        def on_tree_selection_changed(self):
            selected_items = self.tree.selectedItems()
            if not selected_items:
                self.image_label.setText("👆 Select an item to preview")
                self.image_label.setPixmap(QPixmap())  # Clear image
                self.link_label.setText('<a href="https://www.mxd009.cc/">🌐 View online</a>')
                return
            item = selected_items[0]
            img_url = item.data(0, Qt.ItemDataRole.UserRole + 1)
            album_index = item.text(0)
            album_title = item.text(1)
            if not self.is_downloading:
                self.set_album(f"📌 {album_index} {album_title}")
                self.set_progress(0)
            if not img_url:
                self.image_label.setText("🖼️ No thumbnail")
                self.image_label.setPixmap(QPixmap())
                self.link_label.setText('<a href="https://www.mxd009.cc/">🌐 View online</a>')
                return
            self.link_label.setText(f'<a href="{item.text(3)}">🌐 View online</a>')
            # Async image loading to prevent UI freezing
            threading.Thread(target=self.load_image_from_url, args=(img_url,), daemon=True).start()

        def load_image_from_url(self, url):
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
                data = response.content
            except Exception:
                self.run_on_main_thread(lambda: self.image_label.setText("❌ Failed to load image"))
                return

            def set_pix():
                # Build the QPixmap on the GUI thread; only the raw bytes come from the worker thread
                pixmap = QPixmap()
                if not pixmap.loadFromData(data):
                    self.image_label.setText("❌ Failed to load image")
                    return
                scaled = pixmap.scaled(self.image_label.size(),
                                       Qt.AspectRatioMode.KeepAspectRatio,
                                       Qt.TransformationMode.SmoothTransformation)
                self.image_label.setPixmap(scaled)

            self.run_on_main_thread(set_pix)

        def start_search(self):
            keyword = self.search_input.text().strip()
            if not keyword:
                QMessageBox.warning(self, "Notice", "Please enter a keyword.")
                return
            if keyword.startswith(BASE_URL) and "/gallery" in keyword:
                target = self.search_by_gallery  # single album page
            elif keyword.startswith(BASE_URL):
                target = self.search_by_url      # search-result or listing page
            else:
                target = self.search_and_load    # plain keyword search
            self.search_btn.setEnabled(False)
            self.tree.clear()
            if not self.is_downloading:
                self.set_album("🔍 Searching...")
                self.set_progress(0)
            self.selected_items = None
            self.image_label.setText("👆 Select an item to preview")
            threading.Thread(target=target, args=(keyword,), daemon=True).start()

        def search_by_gallery(self, search_url):
            results = crawl_single_gallery(search_url)
            if not results:
                self.show_message("No results found.")
                self.enable_search()
                return
            self.load_results(results)
            self.enable_search()

        def search_and_load(self, keyword):
            search_url = submit_search(keyword)
            if not search_url:
                self.show_message("Search failed: no valid redirect URL was returned.")
                self.enable_search()
                return
            self.search_by_url(search_url)

        def search_by_url(self, search_url):
            results = crawl_all_pages(search_url)
            if not results:
                self.show_message("No results found.")
                self.enable_search()
                return
            self.load_results(results)
            self.enable_search()

        def load_results(self, results):
            def update_ui():
                self.tree.clear()
                for idx, item in enumerate(results, 1):
                    tree_item = QTreeWidgetItem([
                        str(idx), item["ztitle"], item["rtitle"],
                        item["ztitle_href"], item["count"], item["author"],
                    ])
                    tree_item.setData(0, Qt.ItemDataRole.UserRole + 1, item["img"])
                    self.tree.addTopLevelItem(tree_item)
            self.run_on_main_thread(update_ui)

        def download_selected(self):
            if self.is_downloading:
                QMessageBox.information(self, "Notice", "A download is already in progress. Please wait for it to finish.")
                return
            items = self.tree.selectedItems()
            if not items:
                QMessageBox.information(self, "Notice", "Please select an item to download first.")
                return
            album_index = items[0].text(0)
            album_title = items[0].text(1)
            download_url = items[0].text(3)
            total_count = int(items[0].text(4))
            author = items[0].text(5)
            self.is_downloading = True
            self.progress_bar.setValue(0)
            self.set_album(f"⏬ Downloading: {album_index} {album_title}")
            self.worker = DownloadWorker(self.session, author, album_title, download_url, total_count)
            self.worker.progress.connect(self.set_progress)

            def finish_handler(msg):
                QMessageBox.information(self, "Done", msg)
                self.is_downloading = False
                self.btn_download_selected.setEnabled(True)
                self.btn_download_all.setEnabled(True)
                self.search_btn.setEnabled(True)

            self.worker.album_finished.connect(finish_handler)
            self.worker.message.connect(lambda err: QMessageBox.critical(self, "Error", err))
            self.worker.start()

        def download_all(self):
            if self.is_downloading:
                QMessageBox.information(self, "Notice", "A download is already in progress. Please wait for it to finish.")
                return
            count = self.tree.topLevelItemCount()
            if count == 0:
                QMessageBox.information(self, "Notice", "Nothing to download.")
                return
            self.download_queue.clear()
            self.cancel_requested = False
            self.is_downloading = True
            self.btn_cancel_download.setEnabled(True)
            self.btn_download_all.setEnabled(False)
            self.search_btn.setEnabled(False)
            for i in range(count):
                item = self.tree.topLevelItem(i)
                album_index = item.text(0)
                album_title = item.text(1)
                download_url = item.text(3)
                total_count = int(item.text(4))
                author = item.text(5)
                self.download_queue.append((album_index, author, album_title, download_url, total_count))
            self.progress_bar.setValue(0)
            self.start_next_download()

        def start_next_download(self):
            if self.cancel_requested:
                QMessageBox.information(self, "Cancelled", "Download cancelled.")
                self.is_downloading = False
                self.btn_cancel_download.setEnabled(False)
                self.btn_download_all.setEnabled(True)
                self.search_btn.setEnabled(True)
                self.enable_search()
                return
            if not self.download_queue:
                QMessageBox.information(self, "Done", "✅ All downloads finished!")
                self.is_downloading = False
                self.btn_cancel_download.setEnabled(False)
                self.btn_download_all.setEnabled(True)
                self.search_btn.setEnabled(True)
                self.enable_search()
                return
            album_index, author, album_title, url, total_count = self.download_queue.popleft()
            self.progress_bar.setValue(0)
            self.set_album(f"⏬ Downloading: {album_index} {album_title}")
            self.current_worker = DownloadWorker(self.session, author, album_title, url, total_count)
            self.current_worker.progress.connect(self.set_progress)
            self.current_worker.message.connect(lambda err: QMessageBox.critical(self, "Error", err))

            def on_finished(msg):
                logger.info(msg)
                self.start_next_download()

            self.current_worker.album_finished.connect(on_finished)
            self.current_worker.start()

        def show_message(self, text):
            def msg():
                QMessageBox.information(self, "Notice", text)
            self.run_on_main_thread(msg)

        def enable_search(self):
            def en():
                self.search_btn.setEnabled(True)
            self.run_on_main_thread(en)

        def run_on_main_thread(self, func):
            # Post an event to this widget; customEvent() then runs func on the GUI thread
            QApplication.instance().postEvent(self, _FuncEvent(func))

        def customEvent(self, event):
            if isinstance(event, _FuncEvent):
                event.func()


    class DownloadWorker(QThread):
        progress = Signal(int)
        # Named album_finished so it does not shadow QThread's built-in finished signal
        album_finished = Signal(str)
        message = Signal(str)

        def __init__(self, session, author, title, url, total_count):
            super().__init__()
            self.session = session
            self.author = author
            self.title = title
            self.url = url
            self.total_count = total_count

        def run(self):
            try:
                success = self.process_album()
                if success:
                    self.album_finished.emit(f"✅ {self.title} downloaded")
                else:
                    self.album_finished.emit(f"❌ {self.title} failed")
            except Exception as e:
                self.message.emit(f"⚠️ {self.title} error: {e}")
                # Still report completion so a queued batch can move on
                self.album_finished.emit(f"❌ {self.title} failed")

        def safe_request(self, url, timeout=10):
            try:
                response = self.session.get(url, timeout=timeout)
                response.raise_for_status()
                response.encoding = response.apparent_encoding
                return response
            except Exception:
                return None

        def download_image(self, url, filepath):
            if os.path.exists(filepath):
                return True  # already downloaded in an earlier run
            tmp_path = filepath + ".part"
            try:
                response = self.session.get(url, timeout=15, stream=True)
                response.raise_for_status()
                os.makedirs(os.path.dirname(filepath), exist_ok=True)
                # Write to a temporary name first so an interrupted download is never mistaken for a finished one
                with open(tmp_path, "wb") as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
                os.replace(tmp_path, filepath)
                return True
            except Exception:
                return False

        def get_image_urls(self):
            response = self.safe_request(self.url)
            if not response:
                return []
            soup = BeautifulSoup(response.text, "html.parser")
            img_tags = soup.select("div.gallerypic img")
            if not img_tags:
                return []
            first_img_url = img_tags[0].get("src", "")
            # Extract extension (like .jpg or .png)
            ext_match = re.search(r'\.(jpg|png|jpeg|gif|webp)$', first_img_url, re.IGNORECASE)
            extension = ext_match.group(0) if ext_match else ".jpg"  # Default .jpg
            is_numbered = re.search(r"/(\d{3})\.[a-zA-Z]+$", first_img_url)
            image_urls = []
            # Image index of the first image (0 when it cannot be determined)
            img_index = 0
            img_index_match = re.search(r'/(\d+)\.(jpg|png|jpeg|gif|webp)$', first_img_url)
            if img_index_match:
                img_index = int(img_index_match.group(1))
            parsed = urlparse(first_img_url)
            base_path = os.path.dirname(parsed.path)
            base_url = urlunparse((parsed.scheme, parsed.netloc, base_path, '', '', ''))
            if is_numbered:
                for i in range(1, self.total_count + 1):
                    if img_index > 1:
                        i = i + img_index
                    img_url = f"{base_url}/{i:03d}{extension}"
                    image_urls.append(img_url)
            else:
                for i in range(0, self.total_count):
                    if img_index > 1:
                        i = i + img_index
                    img_url = f"{base_url}/{i}{extension}"
                    image_urls.append(img_url)
            return image_urls

        def process_album(self):
            image_urls = self.get_image_urls()
            if not image_urls:
                return False
            album_dir = os.path.join(self.author, self.title) if self.author else self.title
            os.makedirs(album_dir, exist_ok=True)
            total_count = len(image_urls)
            success_count = 0
            for index, img_url in enumerate(image_urls):
                # Keep the original file name from the URL
                filename_from_url = os.path.basename(img_url)
                # Construct save path
                filename = os.path.join(album_dir, filename_from_url)
                if self.download_image(img_url, filename):
                    success_count += 1
                self.progress.emit(int(success_count / total_count * 100))
            return success_count > 0


    class _FuncEvent(QEvent):
        def __init__(self, func):
            super().__init__(QEvent.Type.User)
            self.func = func


    class ThumbnailViewer(QDialog):
        def __init__(self, image_urls, parent=None):
            super().__init__(parent)
            title = f"🖼️ Thumbnail preview ({len(image_urls)} images)"
            self.setWindowTitle(title)
            self.resize(900, 700)
            self.setStyleSheet("""
                QLabel { background-color: #3c3f41; border: 1px solid #555; }
            """)
            scroll = QScrollArea(self)
            scroll.setWidgetResizable(True)
            container = QWidget()
            scroll.setWidget(container)
            layout = QVBoxLayout(self)
            layout.addWidget(scroll)
            grid = QGridLayout(container)
            grid.setSpacing(10)
            grid.setContentsMargins(15, 15, 15, 15)
            self.nam = QNetworkAccessManager(self)
            self.thumb_size = QSize(160, 160)  # Thumbnail size
            for i, url in enumerate(image_urls):
                # Create a vertical container for image and index label
                widget = QWidget()
                widget.setStyleSheet("background-color: #3c3f41; border-radius: 5px;")
                v_layout = QVBoxLayout(widget)
                v_layout.setContentsMargins(5, 5, 5, 5)
                v_layout.setSpacing(5)
                label = QLabel("⏳ Loading...")
                label.setFixedSize(self.thumb_size)
                label.setAlignment(Qt.AlignCenter)
                label.setStyleSheet("""
                    QLabel {
                        background-color: #2b2b2b;
                        border: 1px solid #555;
                        border-radius: 3px;
                    }
                """)
                number_label = QLabel(f"#{i + 1}")
                number_label.setAlignment(Qt.AlignCenter)
                number_label.setStyleSheet("font-weight: bold; color: #4CAF50;")
                v_layout.addWidget(label)
                v_layout.addWidget(number_label)
                grid.addWidget(widget, i // 5, i % 5)  # 5 per row
                self.load_image_async(url, label)

        def load_image_async(self, url, label):
            request = QNetworkRequest(QUrl(url))
            request.setAttribute(QNetworkRequest.Http2AllowedAttribute, False)  # Disable HTTP/2
            reply = self.nam.get(request)

            def handle_finished():
                pixmap = QPixmap()
                if pixmap.loadFromData(reply.readAll()):
                    label.setPixmap(pixmap.scaled(self.thumb_size, Qt.KeepAspectRatio, Qt.SmoothTransformation))
                else:
                    label.setText("❌ Failed to load")
                reply.deleteLater()

            reply.finished.connect(handle_finished)


    if __name__ == "__main__":
        app = QApplication(sys.argv)
        # Set application font
        font = QFont()
        font.setFamily("Segoe UI")
        font.setPointSize(10)
        app.setFont(font)
        window = GalleryCrawler()
        window.show()
        sys.exit(app.exec())
✨ Summary and outlook
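This project combines crawling with GUI development to produce a full-featured photo album download tool. Its main technical highlights are:
- Smart parsing engine: recognizes several URL formats automatically
- Efficient downloading: multithreading plus resumable downloads
- Polished interface: a carefully styled Qt UI
Three directions for future work:
- Mobile support (Kivy framework)
- AI-based recommendations (based on content analysis)
- A browser extension version
Author's notes: the key breakthroughs during development were crawling dynamically loaded content and getting Qt's cross-thread communication right. Beginners are advised to focus on BeautifulSoup's CSS selectors and QThread's signal-slot mechanism.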