[Python Web Scraping] Building a Smart Photo-Album Downloader: A Complete Implementation from Crawler to GUI


🌈 Author homepage: 创客白泽 - ZEEKLOG blog
🔥 Series: 🐍 《Python开源项目实战》 (Python Open-Source Projects in Practice)



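🌟 Abstract

This article walks through the development of a Python-based smart downloader for photo albums. The project combines web scraping with a PySide6 graphical interface to cover the full workflow of searching, previewing, and batch downloading. It examines the key techniques involved (multithreaded crawling, request simulation, and Qt interface development) and shows how they fit together into a complete, user-friendly download tool. The article includes more than 7,000 characters of explanation, the full project code, screenshots, and an architecture overview, and is intended as a practical guide for readers learning Python scraping and GUI development.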


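📖 Contents

  1. Project overview
  2. Core features
  3. Demo
  4. Technical architecture
  5. Implementation steps
  6. Key code walkthrough
  7. Optimization suggestions
  8. Source code download
  9. Summary and outlook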

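🏆 Project Overview

The photo-album downloader is a tool for people who collect images. It addresses three pain points of downloading by hand:

  1. Efficiency: manual downloading is slow and tedious
  2. Organization: scattered images are hard to manage systematically
  3. Preview: there is no quick way to browse everything in an album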

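The project uses a layered architecture:

User interface layer → (commands) → business logic layer → (data requests) → web crawler layer
Web crawler layer → (returned data) → business logic layer → (UI updates) → user interface layer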

Technology stack:

  • Front-end UI: PySide6 + QSS styling
  • Networking: Requests + BeautifulSoup
  • Asynchronous processing: QThread + threading
  • Image handling: QPixmap + QNetworkAccessManager

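🛠️ Core Features

1. Smart search

  • Fuzzy keyword search (e.g. "古风 汉服", that is, "traditional style, Hanfu")
  • Direct URL parsing (album pages and search pages are detected automatically)
  • Automatic multi-page crawling (supports 1,000+ results)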
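
The multi-page crawl comes down to building one URL per results page. The sketch below assumes the paginated pattern that `crawl_all_pages` uses in the full source (`/e/search/result/index.php?page=N&searchid=ID`); `build_page_url` is a hypothetical helper name and is not part of the original program.

```python
import re
from typing import Optional

BASE_URL = "https://www.mxd009.cc"  # same constant as in the full source


def build_page_url(search_url: str, page: int) -> Optional[str]:
    """Return the URL of a zero-based results page, or None if no searchid is present."""
    match = re.search(r"searchid=(\d+)", search_url)
    if match is None:
        return None
    if page == 0:
        # The first page is the URL that the search redirect points to.
        return search_url
    return f"{BASE_URL}/e/search/result/index.php?page={page}&searchid={match.group(1)}"
```

A crawler loop can call this with increasing `page` values and stop once a page yields no items or the reported total has been reached.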

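2. Visual preview

Selecting a result shows its cover image in the preview pane, and a separate dialog displays thumbnails in a grid.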

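3. Batch download management

  • Resumable downloads (files already on disk are skipped)
  • Automatic folder organization (by author/topic)
  • Real-time progress display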
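
The folder organization and the skip-if-present behaviour can be sketched in a few lines. The full source joins author and album title into a directory and skips files that already exist; the file-name sanitizing in `safe_name` is an addition made for this illustration, and all three function names are hypothetical.

```python
import os
import re


def safe_name(text: str) -> str:
    """Replace characters that are invalid in file names on common platforms."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", text).strip()
    return cleaned or "untitled"


def target_path(root: str, author: str, title: str, image_url: str) -> str:
    """Build <root>/<author>/<title>/<file name taken from the URL>."""
    filename = os.path.basename(image_url)
    return os.path.join(root, safe_name(author), safe_name(title), filename)


def needs_download(path: str) -> bool:
    """A file that already exists on disk is treated as done and skipped."""
    return not os.path.exists(path)
```

Note that this kind of resume works per file: an interrupted run picks up at the first missing image, but a partially written file has to be deleted on failure so that it is not mistaken for a finished one.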

4. User-friendly design

  • Emoji-based icon set
  • Adaptive dark/light theme
  • Operation history

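🎨 Demo

Main window

Figure: dark-themed main window, with the search results list on the left and the preview area on the right.

Thumbnail browser

Download in progress

Figure: real-time progress display during a multithreaded download.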


⚙️ Implementation Steps

1. Environment setup

    # Create and activate a virtual environment
    python -m venv venv
    source venv/bin/activate

    # Install dependencies
    pip install PySide6 requests beautifulsoup4

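2. Crawler core

The crawler is built in three stages:

  1. Request simulation

    def submit_search(keywords):
        form_data = {
            "keyboard": keywords,
            "show": "title",
            "tempid": "1",
            "tbname": "news",
        }
        # Browser-like headers omitted...

  2. Page parsing

Result pages are parsed with BeautifulSoup CSS selectors (for example ul.databox > li), as done by parse_gallery_items_from_root in the full source.

  3. Anti-anti-scraping measures

  • Random User-Agent rotation
  • Randomized request intervals (0.5-2 s)
  • Automatic retries (up to 3 attempts)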
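
The three measures above can be combined into one small helper. This is a minimal sketch rather than the project's code: `fetch_with_retry`, the `USER_AGENTS` pool, and the injected `get` and `sleep` callables are all assumptions made for illustration. Passing `get` in (for example `requests.Session().get`) keeps the helper free of hard dependencies and easy to exercise without network access.

```python
import random
import time
from typing import Callable, Optional, Sequence

# Illustrative values only; a real pool would hold current browser strings.
USER_AGENTS: Sequence[str] = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
)


def fetch_with_retry(
    url: str,
    get: Callable[..., object],
    max_attempts: int = 3,
    delay_range: tuple = (0.5, 2.0),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[object]:
    """Call get(url, headers=..., timeout=...) up to max_attempts times.

    Returns whatever `get` returns, or None once every attempt has failed.
    """
    for attempt in range(1, max_attempts + 1):
        # Pick a different User-Agent for each attempt.
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            return get(url, headers=headers, timeout=10)
        except Exception:
            if attempt == max_attempts:
                return None
            # Wait a random interval before the next attempt.
            sleep(random.uniform(*delay_range))
    return None
```

With Requests, HTTP error statuses do not raise on their own, so a caller that wants 4xx/5xx responses retried would wrap `session.get` in a function that also calls `raise_for_status()`. The same random `sleep` can be called between successive page requests to space them out.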

3. GUI development

Widget tree

    MainWindow
    ├── SearchPanel
    ├── ResultTree
    ├── PreviewArea
    │   ├── ImageLabel
    │   └── ThumbnailButton
    └── StatusBar
        ├── ProgressBar
        └── SpeedLabel

Style customization

    /* Dark theme example */
    QTreeWidget {
        background-color: #3c3f41;
        border: 1px solid #555;
        alternate-background-color: #383b3d;
    }
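
One way to support both a dark and a light theme is to render the same QSS rule from a palette. The sketch below is an assumption about how that could be organized, not the project's implementation: `DARK` reuses the colors from the rule above, while the `LIGHT` values and the `tree_stylesheet` helper are made up for illustration.

```python
DARK = {"bg": "#3c3f41", "border": "#555", "alt_bg": "#383b3d"}
LIGHT = {"bg": "#ffffff", "border": "#c0c0c0", "alt_bg": "#f5f5f5"}  # illustrative values


def tree_stylesheet(palette: dict) -> str:
    """Render the QTreeWidget rule for a given palette."""
    return (
        "QTreeWidget {\n"
        f"    background-color: {palette['bg']};\n"
        f"    border: 1px solid {palette['border']};\n"
        f"    alternate-background-color: {palette['alt_bg']};\n"
        "}"
    )
```

The returned string can be passed to a widget's `setStyleSheet` method, so switching themes means rendering the rule again with the other palette.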

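🔍 Key Code Walkthrough

1. Multithreaded downloader

    from PySide6.QtCore import QThread, Signal

    class DownloadWorker(QThread):
        progress = Signal(int)

        def run(self):
            urls = self.get_image_urls()
            for done, url in enumerate(urls, start=1):
                if self.cancel_requested:
                    break
                self.download_image(url)
                # Report completion as an integer percentage
                self.progress.emit(int(done / len(urls) * 100))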
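
The loop inside `DownloadWorker.run` can be studied without Qt. The sketch below is a Qt-free stand-in under stated assumptions: `on_progress` plays the role of `progress.emit`, a `threading.Event` plays the role of the `cancel_requested` flag, and the function name and signature are hypothetical. It reports progress by items processed, whereas `process_album` in the full source derives the percentage from the number of successful downloads.

```python
import threading
from typing import Callable, Iterable, List


def download_all(
    urls: Iterable[str],
    download_one: Callable[[str], bool],
    on_progress: Callable[[int], None],
    cancel: threading.Event,
) -> int:
    """Download each URL in order, reporting an integer percentage after each one.

    Returns the number of successful downloads. Stops early when `cancel` is set.
    """
    url_list: List[str] = list(urls)
    succeeded = 0
    for done, url in enumerate(url_list, start=1):
        # Cooperative cancellation: checked once per item.
        if cancel.is_set():
            break
        if download_one(url):
            succeeded += 1
        on_progress(int(done / len(url_list) * 100))
    return succeeded
```

Checking a flag between items lets the worker stop at a clean boundary, which is generally safer than terminating a thread in the middle of a file write.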

2. Smart URL handling

    def process_url(url):
        if "searchid" in url:
            return parse_search_result(url)
        elif "/gallery" in url:
            return parse_gallery(url)
        else:
            return parse_index_page(url)
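
The same dispatch can be made slightly stricter by looking at the parsed URL instead of the raw string. This is a sketch, not the project's code: `classify_url` and its return labels are hypothetical, and it only reports which parser would apply.

```python
from urllib.parse import parse_qs, urlparse


def classify_url(url: str) -> str:
    """Return 'search', 'gallery', or 'index' for a site URL."""
    parsed = urlparse(url)
    # A search result page carries a searchid query parameter.
    if "searchid" in parse_qs(parsed.query):
        return "search"
    # A single album page has /gallery in its path.
    if "/gallery" in parsed.path:
        return "gallery"
    return "index"
```

Checking the query string separately from the path avoids misclassifying a page whose path merely contains the text "searchid".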

3. Image caching


🚀 Optimization Suggestions

Performance

  1. Record completed downloads in an SQLite cache
  2. Add zip-archive downloads
  3. Write EXIF metadata
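
The first suggestion could look roughly like this. It is a minimal sketch using the standard-library `sqlite3` module; the table name, its columns, and the three helper functions are assumptions made for illustration and do not exist in the current source.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the download-record database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS downloaded ("
        "url TEXT PRIMARY KEY, filepath TEXT NOT NULL)"
    )
    return conn


def mark_downloaded(conn: sqlite3.Connection, url: str, filepath: str) -> None:
    """Record that `url` was saved to `filepath`."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO downloaded (url, filepath) VALUES (?, ?)",
            (url, filepath),
        )


def is_downloaded(conn: sqlite3.Connection, url: str) -> bool:
    """Return True if `url` has already been recorded."""
    row = conn.execute("SELECT 1 FROM downloaded WHERE url = ?", (url,)).fetchone()
    return row is not None
```

Compared with checking the file system, a record keyed by URL still works after files are moved or renamed, at the cost of keeping the database in sync with the disk.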

Feature extensions

📥 Source Code Download

    import logging
    import os
    import re
    import sys
    import threading
    from collections import deque
    from typing import Dict, List
    from urllib.parse import urljoin, urlparse, urlunparse

    import requests
    from bs4 import BeautifulSoup
    from PySide6.QtCore import QEvent, QSize, Qt, QThread, QUrl, Signal
    from PySide6.QtGui import QFont, QPixmap
    from PySide6.QtNetwork import QNetworkAccessManager, QNetworkRequest
    from PySide6.QtWidgets import (
        QApplication, QDialog, QGridLayout, QHBoxLayout, QLabel, QLineEdit,
        QMessageBox, QProgressBar, QPushButton, QScrollArea, QStyle,
        QTreeWidget, QTreeWidgetItem, QVBoxLayout, QWidget,
    )

    # Configure logging
    logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
    logger = logging.getLogger(__name__)

    # Constants
    BASE_URL = "https://www.mxd009.cc"

    # Application styling
    APP_STYLE = """
    QWidget {
        background-color: #2b2b2b;
        color: #e0e0e0;
        font-family: 'Segoe UI', Arial;
    }
    QLineEdit {
        background-color: #3c3f41;
        border: 1px solid #555;
        border-radius: 4px;
        padding: 5px;
        color: #e0e0e0;
        selection-background-color: #3d8ec9;
    }
    QPushButton {
        background-color: #3c3f41;
        border: 1px solid #555;
        border-radius: 4px;
        padding: 5px 10px;
        min-width: 80px;
        color: #e0e0e0;
    }
    QPushButton:hover {
        background-color: #4e5254;
        border: 1px solid #666;
    }
    QPushButton:pressed {
        background-color: #2d2f30;
    }
    QTreeWidget {
        background-color: #3c3f41;
        border: 1px solid #555;
        alternate-background-color: #383b3d;
    }
    QHeaderView::section {
        background-color: #3c3f41;
        padding: 5px;
        border: 1px solid #555;
    }
    QProgressBar {
        border: 1px solid #555;
        border-radius: 3px;
        text-align: center;
        background-color: #3c3f41;
    }
    QProgressBar::chunk {
        background-color: #4CAF50;
        width: 10px;
    }
    QLabel {
        color: #e0e0e0;
    }
    QScrollArea {
        border: 1px solid #555;
        background-color: #3c3f41;
    }
    QDialog {
        background-color: #2b2b2b;
    }
    """


    def submit_search(keywords: str) -> str:
        """Submit search request and return redirected URL"""
        form_data = {"keyboard": keywords, "show": "title", "tempid": "1", "tbname": "news"}
        SEARCH_URL = f"{BASE_URL}/e/search/index.php"
        session = requests.Session()
        response = session.post(SEARCH_URL, data=form_data, allow_redirects=False, timeout=10)
        if response.status_code == 302:
            new_location = response.headers.get("Location")
            return urljoin(SEARCH_URL, new_location)
        else:
            print("No redirect occurred, status code:", response.status_code)
            return ""


    def get_total_count(soup: BeautifulSoup) -> int:
        """Extract total gallery count from page"""
        biaoqian_div = soup.find("div", class_="biaoqian")
        if biaoqian_div:
            p_text = biaoqian_div.find("p").get_text(strip=True)
            match = re.search(r"(\d+)", p_text)
            if match:
                return int(match.group(1))
        return 0


    def parse_gallery_items_from_root(soup: BeautifulSoup) -> List[Dict[str, str]]:
        """Extract all gallery info from page"""
        gallery_root = soup.find("div", class_="box galleryList")
        items = []
        if not gallery_root:
            return items
        for li in gallery_root.select("ul.databox > li"):
            img_tag = li.select_one("div.img-box img")
            ztitle_tag = li.select_one("p.ztitle a")
            rtitle_tag = li.select_one("p.rtitle a")
            author_tag = li.select_one("p.ztitle font")
            count_tag = li.select_one("em.num")
            href = ztitle_tag["href"] if ztitle_tag and ztitle_tag.has_attr("href") else ""
            full_link = urljoin(BASE_URL, href)
            count = 0
            if count_tag:
                text = count_tag.get_text(strip=True)  # '15P'
                match = re.search(r'\d+', text)
                if match:
                    count = int(match.group(0))
            rtitle = rtitle_tag.get_text(strip=True) if rtitle_tag else ""
            if author_tag:
                author = author_tag.get_text(strip=True)
            else:
                author = rtitle
            item = {
                "img": img_tag["src"] if img_tag else "",
                "ztitle": ztitle_tag.get_text(strip=True) if ztitle_tag else "",
                "ztitle_href": full_link,
                "author": author,
                "rtitle": rtitle,
                "count": str(count),
            }
            items.append(item)
        return items


    def crawl_single_gallery(url) -> List[Dict[str, str]]:
        """Extract info from single gallery page"""
        items = []
        total_count = 0  # stays 0 when no count can be found on the page
        title = ""
        try:
            response = requests.get(url, timeout=10)
            soup = BeautifulSoup(response.text, "html.parser")
            # Check for login prompt
            tishi_div = soup.find('div', id='tishi')
            if tishi_div:
                total_text = tishi_div.find('p').get_text()
                match = re.search(r'全本(\d+)张图片', total_text)
                if match:
                    total_count = int(match.group(1))
            else:
                # For logged-in users (not used in this program)
                page_div = soup.find('div', id='page')
                if page_div:
                    span = page_div.find('span', string=re.compile(r'\d+/\d+'))
                    if span:
                        match = re.search(r'\d+/(\d+)', span.text)
                        if match:
                            total_count = int(match.group(1))
            # Get gallery info
            gallery_div = soup.find('div', class_='gallerypic')
            if not gallery_div:
                return items
            jieshao_div = soup.find('div', class_='gallery_jieshao')
            if jieshao_div:
                title = jieshao_div.find('h1').get_text(strip=True)
            type_author = [a.get_text(strip=True) for a in soup.select('.gallery_renwu_title a')]
            first_img = gallery_div.find('img')
            item = {
                "img": first_img["src"] if first_img else "",
                "ztitle": title,
                "ztitle_href": url,
                "author": type_author[1],
                "rtitle": type_author[0],
                "count": str(total_count),
            }
            items.append(item)
            return items
        except Exception as e:
            result = "Request failed:" + str(e)
            print(result)
            return items


    def crawl_all_pages(search_url: str) -> List[Dict[str, str]]:
        """Crawl all pages of search results"""
        all_results = []
        page = 0
        if "searchid" in search_url:
            searchid_match = re.search(r"searchid=(\d+)", search_url)
            if not searchid_match:
                print("Could not extract searchid")
                return []
            searchid = searchid_match.group(1)
            while True:
                # Construct paginated URL
                if page == 0:
                    page_url = search_url
                else:
                    page_url = f"{BASE_URL}/e/search/result/index.php?page={page}&searchid={searchid}"
                print(f"\n[Crawling page {page + 1}] {page_url}")
                try:
                    response = requests.get(page_url, timeout=10)
                    soup = BeautifulSoup(response.text, "html.parser")
                except Exception as e:
                    print("Request failed:", e)
                    break
                if page == 0:
                    total = get_total_count(soup)
                    print(f"[Total] Gallery count: {total}")
                results = parse_gallery_items_from_root(soup)
                if not results:
                    print("[End] No data on current page, ending early.")
                    break
                all_results.extend(results)
                if len(all_results) >= total:
                    print("[Complete] All items crawled.")
                    break
                page += 1
            return all_results
        else:
            search_url = re.sub(r'_\d+\.html$', '.html', search_url)
            try:
                response = requests.get(search_url, timeout=10)
            except Exception as e:
                print("Request failed:", e)
                return []
            if not response:
                # A Response is falsy for 4xx/5xx status codes
                return []
            soup = BeautifulSoup(response.text, "html.parser")
            # Get total pages
            page_div = soup.find("div", class_="layui-box layui-laypage layui-laypage-default")
            total_pages = 1
            if page_div:
                span = page_div.find("span")
                if span:
                    match = re.search(r'\d+/(\d+)', span.text.strip())
                    if match:
                        total_pages = int(match.group(1))
            print(f"Total pages: {total_pages}")
            for index in range(1, total_pages + 1):
                page_url = re.sub(r'\.html$', f'_{index}.html', search_url)
                print(f"\n[Crawling page {index + 1}] {page_url}")
                try:
                    response = requests.get(page_url, timeout=10)
                    soup = BeautifulSoup(response.text, "html.parser")
                except Exception as e:
                    print("Request failed:", e)
                    break
                results = parse_gallery_items_from_root(soup)
                if not results:
                    print("[End] No data on current page, ending early.")
                    break
                all_results.extend(results)
            return all_results


    class GalleryCrawler(QWidget):
        def __init__(self, cookies=None):
            super().__init__()
            self.setWindowTitle("📷 写真专辑下载器")
            self.resize(1000, 700)
            self.setStyleSheet(APP_STYLE)
            self.init_ui()
            # Set window icon
            self.setWindowIcon(self.style().standardIcon(QStyle.SP_DirIcon))
            self.download_queue = deque()
            self.current_worker = None
            self.cancel_requested = False
            self.selected_items = None
            self.is_downloading = False
            self.cookies = cookies
            self.headers = self._default_headers()
            self.session = requests.Session()
            self.session.headers.update(self.headers)
            if cookies:
                self.session.cookies.update(cookies)

        def init_ui(self):
            # Main layout
            main_layout = QVBoxLayout(self)
            main_layout.setContentsMargins(10, 10, 10, 10)
            main_layout.setSpacing(10)

            # Search bar with emoji
            search_layout = QHBoxLayout()
            self.search_input = QLineEdit()
            self.search_input.setPlaceholderText("🔍 输入关键词搜索或直接粘贴网址...")
            self.search_input.setClearButtonEnabled(True)
            self.search_btn = QPushButton("🔎 搜索")
            self.search_btn.setStyleSheet("""
                QPushButton { background-color: #4CAF50; font-weight: bold; }
                QPushButton:hover { background-color: #5CBF60; }
            """)
            self.search_btn.clicked.connect(self.start_search)
            search_layout.addWidget(self.search_input)
            search_layout.addWidget(self.search_btn)

            # Content area
            content_layout = QHBoxLayout()
            content_layout.setSpacing(15)

            # Left panel (tree view)
            left_panel = QVBoxLayout()
            left_panel.setSpacing(10)

            # Tree widget with emoji headers
            self.tree = QTreeWidget()
            self.tree.setHeaderLabels(["#️⃣ 序号", "📛 主标题", "🏷️ 分类", "🔗 链接", "🖼️ 数量", "👤 作者"])
            self.tree.setColumnWidth(0, 50)
            self.tree.setColumnWidth(1, 300)
            self.tree.setColumnWidth(2, 120)
            self.tree.setColumnWidth(3, 250)
            self.tree.setColumnWidth(4, 50)
            self.tree.setColumnWidth(5, 100)
            self.tree.setStyleSheet("""
                QTreeWidget::item { padding: 5px; }
                QTreeWidget::item:hover { background-color: #4e5254; }
            """)
            self.tree.itemSelectionChanged.connect(self.on_tree_selection_changed)

            # Button panel with emoji
            btn_layout = QHBoxLayout()
            self.btn_download_selected = QPushButton("💾 下载选中")
            self.btn_download_selected.setStyleSheet("""
                QPushButton { background-color: #2196F3; font-weight: bold; }
                QPushButton:hover { background-color: #31A6FF; }
            """)
            self.btn_download_selected.clicked.connect(self.download_selected)
            self.btn_download_all = QPushButton("📥 下载全部")
            self.btn_download_all.setStyleSheet("""
                QPushButton { background-color: #FF9800; font-weight: bold; }
                QPushButton:hover { background-color: #FFA820; }
            """)
            self.btn_download_all.clicked.connect(self.download_all)
            self.btn_cancel_download = QPushButton("❌ 取消下载")
            self.btn_cancel_download.setEnabled(False)
            self.btn_cancel_download.clicked.connect(self.cancel_download)
            self.btn_show_all_more = QPushButton("🖼️ 显示全部缩略图")
            self.btn_show_all_more.clicked.connect(self.show_allthumbnails)
            btn_layout.addWidget(self.btn_download_selected)
            btn_layout.addWidget(self.btn_download_all)
            btn_layout.addWidget(self.btn_cancel_download)
            btn_layout.addWidget(self.btn_show_all_more)
            left_panel.addLayout(search_layout)
            left_panel.addWidget(self.tree)
            left_panel.addLayout(btn_layout)

            # Right panel (preview)
            right_panel = QVBoxLayout()
            right_panel.setSpacing(10)

            # Preview area with emoji title
            preview_title = QLabel("🖼️ 预览区域")
            preview_title.setAlignment(Qt.AlignCenter)
            preview_title.setStyleSheet("font-size: 16px; font-weight: bold;")
            self.image_label = QLabel()
            self.image_label.setAlignment(Qt.AlignCenter)
            self.image_label.setMinimumSize(320, 320)
            self.image_label.setStyleSheet("""
                QLabel {
                    background-color: #3c3f41;
                    border: 2px dashed #555;
                    border-radius: 5px;
                }
            """)
            self.image_label.setText("👆 请选择项目预览")
            self.link_label = QLabel()
            self.link_label.setText('<a href="https://www.mxd009.cc/">🌐 在线浏览</a>')
            self.link_label.setOpenExternalLinks(True)
            self.link_label.setTextInteractionFlags(Qt.TextBrowserInteraction)
            self.link_label.setAlignment(Qt.AlignCenter)
            self.btn_show_more = QPushButton("🖼️ 显示更多缩略图")
            self.btn_show_more.setStyleSheet("font-weight: bold;")
            self.btn_show_more.clicked.connect(self.show_thumbnails)
            right_panel.addWidget(preview_title)
            right_panel.addWidget(self.image_label, stretch=1)
            right_panel.addWidget(self.link_label)
            right_panel.addWidget(self.btn_show_more, alignment=Qt.AlignHCenter)
            content_layout.addLayout(left_panel, 3)
            content_layout.addLayout(right_panel, 1)

            # Status bar
            status_layout = QHBoxLayout()
            self.album_label = QLabel("🔄 准备就绪")
            self.album_label.setStyleSheet("font-weight: bold; color: #4CAF50;")
            self.progress_bar = QProgressBar()
            self.progress_bar.setMinimum(0)
            self.progress_bar.setMaximum(100)
            self.progress_bar.setValue(0)
            self.progress_bar.setFormat("📊 下载进度: %p%")
            self.progress_bar.setStyleSheet("""
                QProgressBar { text-align: center; border-radius: 3px; }
                QProgressBar::chunk { background-color: #4CAF50; border-radius: 2px; }
            """)
            status_layout.addWidget(self.album_label)
            status_layout.addWidget(self.progress_bar)

            # Add to main layout
            main_layout.addLayout(content_layout)
            main_layout.addLayout(status_layout)

        def _default_headers(self):
            """Default headers to mimic browser behavior"""
            return {
                "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
                "accept-encoding": "gzip, deflate, br, zstd",
                "accept-language": "zh-CN,zh;q=0.9",
                "cache-control": "max-age=0",
                "sec-ch-ua": '"Google Chrome";v="137", "Chromium";v="137", "Not/A)Brand";v="24"',
                "sec-ch-ua-mobile": "?0",
                "sec-ch-ua-platform": '"Windows"',
                "sec-fetch-dest": "document",
                "sec-fetch-mode": "navigate",
                "sec-fetch-site": "same-origin",
                "upgrade-insecure-requests": "1",
                "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
            }

        def show_allthumbnails(self):
            img_list = []
            for i in range(self.tree.topLevelItemCount()):
                item = self.tree.topLevelItem(i)
                img = item.data(0, Qt.ItemDataRole.UserRole + 1)
                if img:
                    img_list.append(img)
            # Remove duplicates while preserving order
            img_urls = list(dict.fromkeys(img_list))
            if len(img_urls) == 0:
                QMessageBox.warning(self, "提示", "请先搜索")
                return
            dialog = ThumbnailViewer(img_urls, self)
            dialog.exec()

        def show_thumbnails(self):
            selected_item = self.tree.currentItem()
            if selected_item is None:
                QMessageBox.warning(self, "提示", "请先选择一项")
                return
            album_title = selected_item.text(1)
            album_url = selected_item.text(3)
            total_count = int(selected_item.text(4))
            author = selected_item.text(5)
            # Get all image URLs
            worker = DownloadWorker(self.session, author, album_title, album_url, total_count)
            image_urls = worker.get_image_urls()
            dialog = ThumbnailViewer(image_urls, self)
            dialog.exec()

        def cancel_download(self):
            self.cancel_requested = True
            self.download_queue.clear()  # Clear pending tasks
            if self.current_worker and self.current_worker.isRunning():
                self.current_worker.terminate()
                self.current_worker.wait()
            self.progress_bar.setValue(0)
            self.is_downloading = False
            self.btn_cancel_download.setEnabled(False)
            self.btn_download_all.setEnabled(True)
            self.search_btn.setEnabled(True)
            self.enable_search()
            QMessageBox.information(self, "提示", "下载任务已被取消。")
            self.set_album("")

        def set_album(self, value: str):
            def update():
                self.album_label.setText(value)
            self.run_on_main_thread(update)

        def set_progress(self, value: int):
            def update():
                self.progress_bar.setValue(value)
            self.run_on_main_thread(update)

        def on_tree_selection_changed(self):
            selected_items = self.tree.selectedItems()
            if not selected_items:
                self.image_label.setText("👆 请选择项目预览")
                self.image_label.setPixmap(QPixmap())  # Clear image
                self.link_label.setText('<a href="https://www.mxd009.cc/">🌐 在线浏览</a>')
                return
            item = selected_items[0]
            img_url = item.data(0, Qt.ItemDataRole.UserRole + 1)
            album_index = item.text(0)
            album_title = item.text(1)
            if not self.is_downloading:
                self.set_album(f"📌 {album_index}{album_title}")
                self.set_progress(0)
            if not img_url:
                self.image_label.setText("🖼️ 无缩略图")
                self.image_label.setPixmap(QPixmap())
                self.link_label.setText('<a href="https://www.mxd009.cc/">🌐 在线浏览</a>')
                return
            self.link_label.setText(f'<a href="{item.text(3)}">🌐 在线浏览</a>')
            # Async image loading to prevent UI freezing
            threading.Thread(target=self.load_image_from_url, args=(img_url,), daemon=True).start()

        def load_image_from_url(self, url):
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()
                data = response.content
            except Exception:
                self.run_on_main_thread(lambda: self.image_label.setText("❌ 加载图片失败"))
                return

            def set_pix():
                # QPixmap must only be used in the GUI thread, so decode here
                pixmap = QPixmap()
                if not pixmap.loadFromData(data):
                    self.image_label.setText("❌ 加载图片失败")
                    return
                scaled = pixmap.scaled(self.image_label.size(),
                                       Qt.AspectRatioMode.KeepAspectRatio,
                                       Qt.TransformationMode.SmoothTransformation)
                self.image_label.setPixmap(scaled)
            self.run_on_main_thread(set_pix)

        def start_search(self):
            keyword = self.search_input.text().strip()
            if not keyword:
                QMessageBox.warning(self, "提示", "请输入关键词!")
                return
            if keyword.startswith("https://www.mxd009.cc") and "searchid" not in keyword and "/gallery" not in keyword:
                threading.Thread(target=self.search_by_url, args=(keyword,), daemon=True).start()
            elif keyword.startswith("https://www.mxd009.cc") and "/gallery" in keyword:
                threading.Thread(target=self.search_by_gallery, args=(keyword,), daemon=True).start()
            else:
                threading.Thread(target=self.search_and_load, args=(keyword,), daemon=True).start()
            self.search_btn.setEnabled(False)
            self.tree.clear()
            if not self.is_downloading:
                self.set_album("🔍 搜索中...")
                self.set_progress(0)
            self.selected_items = None
            self.image_label.setText("👆 请选择项目预览")

        def search_by_gallery(self, search_url):
            results = crawl_single_gallery(search_url)
            if not results:
                self.show_message("未找到任何结果。")
                self.enable_search()
                return
            self.load_results(results)
            self.enable_search()

        def search_and_load(self, keyword):
            search_url = submit_search(keyword)
            if not search_url:
                self.show_message("搜索失败,未获取有效跳转链接。")
                self.enable_search()
                return
            self.search_by_url(search_url)

        def search_by_url(self, search_url):
            results = crawl_all_pages(search_url)
            if not results:
                self.show_message("未找到任何结果。")
                self.enable_search()
                return
            self.load_results(results)
            self.enable_search()

        def load_results(self, results):
            def update_ui():
                self.tree.clear()
                for idx, item in enumerate(results, 1):
                    tree_item = QTreeWidgetItem([
                        str(idx), item["ztitle"], item["rtitle"],
                        item["ztitle_href"], item["count"], item["author"],
                    ])
                    tree_item.setData(0, Qt.ItemDataRole.UserRole + 1, item["img"])
                    self.tree.addTopLevelItem(tree_item)
            self.run_on_main_thread(update_ui)

        def download_selected(self):
            if self.is_downloading:
                QMessageBox.information(self, "提示", "当前任务正在下载中,请等待完成。")
                return
            items = self.tree.selectedItems()
            if not items:
                QMessageBox.information(self, "提示", "请先选中要下载的项。")
                return
            album_index = items[0].text(0)
            album_title = items[0].text(1)
            download_url = items[0].text(3)
            total_count = int(items[0].text(4))
            author = items[0].text(5)
            self.is_downloading = True
            self.progress_bar.setValue(0)
            self.set_album(f"⏬ 正在下载: {album_index}{album_title}")
            self.worker = DownloadWorker(self.session, author, album_title, download_url, total_count)
            self.worker.progress.connect(self.set_progress)

            def finish_handler(msg):
                QMessageBox.information(self, "完成", msg)
                self.is_downloading = False
                self.btn_download_selected.setEnabled(True)
                self.btn_download_all.setEnabled(True)
                self.search_btn.setEnabled(True)

            self.worker.finished.connect(finish_handler)
            self.worker.message.connect(lambda err: QMessageBox.critical(self, "错误", err))
            self.worker.start()

        def download_all(self):
            if self.is_downloading:
                QMessageBox.information(self, "提示", "当前任务正在下载中,请等待完成。")
                return
            count = self.tree.topLevelItemCount()
            if count == 0:
                QMessageBox.information(self, "提示", "无数据可下载。")
                return
            self.download_queue.clear()
            self.cancel_requested = False
            # Mark the batch as running so a second download cannot be started in parallel
            self.is_downloading = True
            self.btn_cancel_download.setEnabled(True)
            self.btn_download_all.setEnabled(False)
            self.search_btn.setEnabled(False)
            for i in range(count):
                item = self.tree.topLevelItem(i)
                album_index = item.text(0)
                album_title = item.text(1)
                download_url = item.text(3)
                total_count = int(item.text(4))
                author = item.text(5)
                self.download_queue.append((album_index, author, album_title, download_url, total_count))
            self.progress_bar.setValue(0)
            self.start_next_download()

        def start_next_download(self):
            if self.cancel_requested:
                QMessageBox.information(self, "取消", "下载已取消。")
                self.btn_cancel_download.setEnabled(False)
                self.btn_download_all.setEnabled(True)
                self.search_btn.setEnabled(True)
                self.enable_search()
                return
            if not self.download_queue:
                QMessageBox.information(self, "完成", "✅ 全部下载完成!")
                self.is_downloading = False
                self.btn_cancel_download.setEnabled(False)
                self.btn_download_all.setEnabled(True)
                self.search_btn.setEnabled(True)
                self.enable_search()
                return
            album_index, author, album_title, url, total_count = self.download_queue.popleft()
            self.progress_bar.setValue(0)
            self.set_album(f"⏬ 正在下载: {album_index}{album_title}")
            self.current_worker = DownloadWorker(self.session, author, album_title, url, total_count)
            self.current_worker.progress.connect(self.set_progress)
            self.current_worker.message.connect(lambda err: QMessageBox.critical(self, "错误", err))

            def on_finished(msg):
                logger.info(msg)
                self.start_next_download()

            self.current_worker.finished.connect(on_finished)
            self.current_worker.start()

        def show_message(self, text):
            def msg():
                QMessageBox.information(self, "提示", text)
            self.run_on_main_thread(msg)

        def enable_search(self):
            def en():
                self.search_btn.setEnabled(True)
            self.run_on_main_thread(en)

        def run_on_main_thread(self, func):
            # PySide6 main thread call
            QApplication.instance().postEvent(self, _FuncEvent(func))

        def customEvent(self, event):
            event.func()


    class DownloadWorker(QThread):
        progress = Signal(int)
        finished = Signal(str)
        message = Signal(str)

        def __init__(self, session, album_title, title, url, total_count):
            # Callers pass the author as album_title; it becomes the parent folder
            super().__init__()
            self.session = session
            self.album_title = album_title
            self.title = title
            self.url = url
            self.total_count = total_count

        def run(self):
            try:
                success = self.process_album()
                if success:
                    self.finished.emit(f"✅ {self.title} 下载完成")
                else:
                    self.finished.emit(f"❌ {self.title} 下载失败")
            except Exception as e:
                self.message.emit(f"⚠️ {self.title} 异常: {e}")

        def safe_request(self, url, timeout=10):
            try:
                response = self.session.get(url, timeout=timeout)
                response.raise_for_status()
                response.encoding = response.apparent_encoding
                return response
            except Exception:
                return None

        def download_image(self, url, filepath):
            if os.path.exists(filepath):
                return True
            try:
                response = self.session.get(url, timeout=15, stream=True)
                response.raise_for_status()
                os.makedirs(os.path.dirname(filepath), exist_ok=True)
                with open(filepath, "wb") as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
                return True
            except Exception:
                # Remove a partially written file so it is not mistaken for a finished one
                if os.path.exists(filepath):
                    os.remove(filepath)
                return False

        def get_image_urls(self):
            response = self.safe_request(self.url)
            if not response:
                return []
            soup = BeautifulSoup(response.text, "html.parser")
            img_tags = soup.select("div.gallerypic img")
            if not img_tags:
                return []
            first_img_url = img_tags[0].get("src", "")
            # Extract extension (like .jpg or .png)
            ext_match = re.search(r'\.(jpg|png|jpeg|gif|webp)$', first_img_url, re.IGNORECASE)
            extension = ext_match.group(0) if ext_match else ".jpg"  # Default .jpg
            is_numbered = re.search(r"/(\d{3})\.[a-zA-Z]+$", first_img_url)
            image_urls = []
            # Image index
            img_index_match = re.search(r'/(\d+)\.(jpg|png|jpeg|gif|webp)$', first_img_url)
            if not img_index_match:
                # Without a numeric file name the remaining URLs cannot be derived
                return image_urls
            img_index = int(img_index_match.group(1))
            parsed = urlparse(first_img_url)
            base_path = os.path.dirname(parsed.path)
            base_url = urlunparse((parsed.scheme, parsed.netloc, base_path, '', '', ''))
            # Shift the generated numbers when the first image index is greater than 1
            offset = img_index if img_index > 1 else 0
            if is_numbered:
                for i in range(1, self.total_count + 1):
                    image_urls.append(f"{base_url}/{i + offset:03d}{extension}")
            else:
                for i in range(0, self.total_count):
                    image_urls.append(f"{base_url}/{i + offset}{extension}")
            return image_urls

        def process_album(self):
            image_urls = self.get_image_urls()
            if not image_urls:
                return False
            album_dir = os.path.join(self.album_title, self.title) if self.album_title else self.title
            os.makedirs(album_dir, exist_ok=True)
            total_count = len(image_urls)
            success_count = 0
            for img_url in image_urls:
                # Keep the file name (and extension) from the URL
                filename_from_url = os.path.basename(img_url)
                # Construct save path
                filename = os.path.join(album_dir, filename_from_url)
                if self.download_image(img_url, filename):
                    success_count += 1
                self.progress.emit(int(success_count / total_count * 100))
            return success_count > 0


    class _FuncEvent(QEvent):
        def __init__(self, func):
            super().__init__(QEvent.Type.User)
            self.func = func


    class ThumbnailViewer(QDialog):
        def __init__(self, image_urls, parent=None):
            super().__init__(parent)
            title = f"🖼️ 缩略图预览 (共 {len(image_urls)} 张)"
            self.setWindowTitle(title)
            self.resize(900, 700)
            self.setStyleSheet("""
                QLabel { background-color: #3c3f41; border: 1px solid #555; }
            """)
            scroll = QScrollArea(self)
            scroll.setWidgetResizable(True)
            container = QWidget()
            scroll.setWidget(container)
            layout = QVBoxLayout(self)
            layout.addWidget(scroll)
            grid = QGridLayout(container)
            grid.setSpacing(10)
            grid.setContentsMargins(15, 15, 15, 15)
            self.nam = QNetworkAccessManager(self)
            self.thumb_size = QSize(160, 160)  # Thumbnail size
            for i, url in enumerate(image_urls):
                # Create a vertical container for image and index label
                widget = QWidget()
                widget.setStyleSheet("background-color: #3c3f41; border-radius: 5px;")
                v_layout = QVBoxLayout(widget)
                v_layout.setContentsMargins(5, 5, 5, 5)
                v_layout.setSpacing(5)
                label = QLabel("⏳ 加载中...")
                label.setFixedSize(self.thumb_size)
                label.setAlignment(Qt.AlignCenter)
                label.setStyleSheet("""
                    QLabel {
                        background-color: #2b2b2b;
                        border: 1px solid #555;
                        border-radius: 3px;
                    }
                """)
                number_label = QLabel(f"#{i+1}")
                number_label.setAlignment(Qt.AlignCenter)
                number_label.setStyleSheet("font-weight: bold; color: #4CAF50;")
                v_layout.addWidget(label)
                v_layout.addWidget(number_label)
                grid.addWidget(widget, i // 5, i % 5)  # 5 per row
                self.load_image_async(url, label)

        def load_image_async(self, url, label):
            request = QNetworkRequest(QUrl(url))
            request.setAttribute(QNetworkRequest.Http2AllowedAttribute, False)  # Disable HTTP/2
            reply = self.nam.get(request)

            def handle_finished():
                pixmap = QPixmap()
                if pixmap.loadFromData(reply.readAll()):
                    label.setPixmap(pixmap.scaled(
                        self.thumb_size, Qt.KeepAspectRatio, Qt.SmoothTransformation))
                else:
                    label.setText("❌ 加载失败")
                reply.deleteLater()

            reply.finished.connect(handle_finished)


    if __name__ == "__main__":
        app = QApplication(sys.argv)
        # Set application font
        font = QFont()
        font.setFamily("Segoe UI")
        font.setPointSize(10)
        app.setFont(font)
        window = GalleryCrawler()
        window.show()
        sys.exit(app.exec())

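✨ Summary and Outlook

By combining scraping with GUI development, this project delivers a full-featured photo-album download tool. Its technical highlights are:

  1. Smart parsing engine: recognizes several URL formats automatically
  2. Efficient downloading: multithreading plus resumable downloads
  3. Polished interface: a professionally styled Qt UI

Three directions for future work:

  • Mobile support (Kivy framework)
  • AI-based recommendations (based on content analysis)
  • A browser-extension version

Author's note: the key breakthroughs during development were scraping dynamically loaded content and getting Qt's cross-thread communication right. Beginners may want to focus on BeautifulSoup's CSS selectors and QThread's signal-slot mechanism.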
