Python Web Scraping for Beginners: Workflow, Modules, and Frameworks Explained

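# GET request example (section 3.1): fetch a page and print its HTML.
import requests

url = 'https://example.com'
response = requests.get(url, timeout=10)  # a timeout keeps the call from hanging indefinitely
response.raise_for_status()  # raise an exception on 4xx/5xx responses
print(response.text)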
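
Once `response.text` holds the page source, the next step in a typical crawl is pulling data out of it. The following is a minimal sketch using the standard-library `re` module (listed under section 2.2). The sample HTML string and the helper names `extract_title` and `extract_links` are illustrative assumptions, not part of the original tutorial.

```python
import re

# Hypothetical sample standing in for response.text from the GET example.
SAMPLE_HTML = """
<html><head><title>Example Domain</title></head>
<body>
  <a href="https://example.com/a">First</a>
  <a href="https://example.com/b">Second</a>
</body></html>
"""

# Non-greedy groups capture the href value and the link text.
LINK_PATTERN = re.compile(r'<a\s+href="(.*?)">(.*?)</a>', re.S)
TITLE_PATTERN = re.compile(r'<title>(.*?)</title>', re.S)

def extract_title(html):
    """Return the page title, or None if no <title> tag is found."""
    match = TITLE_PATTERN.search(html)
    return match.group(1).strip() if match else None

def extract_links(html):
    """Return a list of (url, text) tuples, one per matching anchor tag."""
    return LINK_PATTERN.findall(html)

if __name__ == '__main__':
    print(extract_title(SAMPLE_HTML))
    for url, text in extract_links(SAMPLE_HTML):
        print(f'{text}: {url}')
```

Regular expressions are adequate for simple, predictable markup like this sample. For irregular real-world pages, a parser such as XPath or BeautifulSoup (sections 2.3 and 2.4) is usually the more robust choice.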

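# POST request example (section 3.2): submit form fields to a login endpoint.
import requests

url = 'https://example.com/login'
data = {'username': 'user', 'password': 'pass'}  # placeholder credentials
response = requests.post(url, data=data, timeout=10)  # data= sends a form-encoded body
print(response.text)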
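
Passing a dict to `data=` makes Requests send the fields as a form-encoded request body. The sketch below shows roughly what that body looks like, using only the standard library's `urllib.parse`. The helper name `encode_form` is hypothetical, and this approximates the encoding rather than reproducing what Requests does internally.

```python
from urllib.parse import urlencode, parse_qs

def encode_form(fields):
    """Encode a dict of form fields as application/x-www-form-urlencoded text."""
    return urlencode(fields)

body = encode_form({'username': 'user', 'password': 'pass'})
print(body)

# Special characters are percent-encoded and spaces become '+'.
print(encode_form({'q': 'web scraping', 'lang': 'zh-CN'}))

# parse_qs reverses the encoding, which helps when inspecting captured traffic.
print(parse_qs(body))
```

Seeing the encoded form makes it easier to compare a script's request against the one a browser sends when debugging with a capture tool such as Fiddler (section 5.1).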

# Proxy example (section 3.3): route HTTP and HTTPS traffic through a local proxy.
import requests

proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8080',
}
response = requests.get('https://example.com', proxies=proxies, timeout=10)

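# AJAX data example (section 3.4): call a JSON endpoint directly and pretty-print the result.
import json
import requests

url = 'https://api.example.com/data'
headers = {'User-Agent': 'Mozilla/5.0'}  # some servers reject requests without a browser-like User-Agent
response = requests.get(url, headers=headers, timeout=10)
data = response.json()  # parse the JSON body into Python objects
print(json.dumps(data, indent=4, ensure_ascii=False))  # ensure_ascii=False keeps non-ASCII text readable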
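
AJAX endpoints typically return nested JSON, so much of the work after `response.json()` is picking out the fields of interest. Here is a minimal sketch on a made-up payload. The field names `items`, `title`, and `price` and the helper `extract_items` are assumptions for illustration only; a real API will have its own structure.

```python
import json

# Hypothetical response body; a real API's structure will differ.
RAW = '{"code": 0, "items": [{"title": "Book", "price": 25.5}, {"title": "Pen"}]}'

def extract_items(raw_text):
    """Parse a JSON string and return (title, price) pairs, tolerating missing keys."""
    payload = json.loads(raw_text)
    results = []
    for item in payload.get('items', []):
        # dict.get returns None instead of raising KeyError when a field is absent.
        results.append((item.get('title'), item.get('price')))
    return results

if __name__ == '__main__':
    for title, price in extract_items(RAW):
        print(title, price)
```

Using `dict.get` rather than direct indexing is a deliberate choice here: scraped APIs often omit fields for some records, and one missing key should not abort the whole run.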

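# Multithreading example (section 3.5): fetch each URL in its own thread.
import threading
import requests

def fetch(url):
    response = requests.get(url, timeout=10)
    print(f'{url}: {response.status_code}')

urls = ['https://example.com']
threads = []
for url in urls:
    t = threading.Thread(target=fetch, args=(url,))
    threads.append(t)
    t.start()

# Wait for every worker thread to finish before exiting.
for t in threads:
    t.join()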
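
Starting one thread per URL works for a handful of pages but places no limit on concurrency. One common alternative, sketched here on the assumption that a fixed-size pool is acceptable, is `concurrent.futures.ThreadPoolExecutor` from the standard library. The `fetch_status` stub fakes the network call so the sketch runs offline; in practice it would wrap `requests.get`. Both `fetch_status` and `crawl` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_status(url):
    """Stand-in for a real request; swap in requests.get(url, timeout=10).status_code."""
    if 'missing' in url:
        raise ValueError(f'cannot fetch {url}')
    return 200

def crawl(urls, fetch=fetch_status, max_workers=4):
    """Run fetch over urls in a bounded thread pool; return {url: result or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so results can be labelled.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed URL should not stop the rest
                results[url] = exc
    return results

if __name__ == '__main__':
    sample = ['https://example.com', 'https://example.com/missing']
    for url, outcome in crawl(sample).items():
        print(f'{url}: {outcome}')
```

Capping `max_workers` also keeps the crawler from opening too many simultaneous connections to the same site.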

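Python Web Scraping for Beginners

1. Getting Started

1.1 What Is a Web Crawler

1.2 Basic Crawler Workflow

1.3 HTTP Requests and Responses

1.3.1 Request

1.3.2 Response

2. Core Modules

2.1 Requests

2.2 re (Regular Expressions)

2.3 XPath

2.4 BeautifulSoup

2.5 json

2.6 threading

3. Worked Examples

3.1 GET Request Example

3.2 POST Request Example

3.3 Adding a Proxy

3.4 Fetching AJAX Data

3.5 Using Multithreading

4. Crawler Frameworks

4.1 The Scrapy Framework

4.2 Scrapy Architecture Diagram

4.3 Main Scrapy Components

4.4 How Scrapy Works

4.5 Building a Scrapy Crawler in Four Steps

5. Common Tools

5.1 Fiddler

5.2 XPath Helper

6. Distributed Crawling

6.1 Scrapy-Redis

6.2 Distribution Strategies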
