跳到主要内容Python CSV 模块完整教程 | 极客日志Python
Python CSV 模块完整教程
综述由AI生成详细讲解了 Python 标准库 csv 模块的使用方法,包括基础读写、参数配置、Dialect 定制、字典读写器及生产环境优化技巧。内容涵盖代码示例、错误处理、大数据量处理方案及数据库交互案例,旨在帮助开发者高效处理 CSV 数据。
雪落无声32 浏览 Python CSV 模块完整教程
本教程全面覆盖 Python 标准库 csv 模块的所有知识点,代码逐行注释,包含生产环境实战案例。
环境要求
- Python 3.12+
- 虚拟环境目录:
.venv
使用虚拟环境
.venv\Scripts\activate
.venv\Scripts\python.exe chapter01_intro\01_what_is_csv.py
第 1 章:CSV 模块基础介绍
1.1 什么是 CSV 格式
CSV(Comma-Separated Values,逗号分隔值)是一种通用的、简单的数据存储格式,被广泛应用于数据交换、数据存储和数据处理场景。
CSV 格式的特点:
- 纯文本格式,可用任何文本编辑器打开
- 每行代表一条记录
- 字段之间用逗号(或其他分隔符)分隔
- 第一行通常是表头(字段名)
- 跨平台兼容性好
基本结构示例:
姓名,年龄,城市
张三,25,北京
李四,30,上海
王五,28,广州
1.2 CSV 模块常量
import csv
print(f"csv.QUOTE_ALL = {csv.QUOTE_ALL}")
print(f"csv.QUOTE_MINIMAL = {csv.QUOTE_MINIMAL}")
print(f"csv.QUOTE_NONNUMERIC = {csv.QUOTE_NONNUMERIC}")
print(f"csv.QUOTE_NONE = {csv.QUOTE_NONE}")
print()
()
f"csv.QUOTE_NOTNULL = {csv.QUOTE_NOTNULL}"
print
f"csv.QUOTE_STRINGS = {csv.QUOTE_STRINGS}"
1.3 快速入门示例
1.3.1 写入 CSV 文件
import csv
output_file = 'sample_output.csv'
header = ['姓名', '年龄', '城市', '职业']
data = [
['张三', '25', '北京', '工程师'],
['李四', '30', '上海', '设计师'],
['王五', '28', '广州', '教师'],
]
with open(output_file, 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
writer.writerow(header)
writer.writerows(data)
print(f"✓ CSV 文件已创建:{output_file}")
1.3.2 读取 CSV 文件
import csv
with open('sample_output.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
for row_num, row in enumerate(reader, start=1):
print(f"第{row_num}行:{row}")
1.3.3 使用 StringIO(内存中操作)
import csv
from io import StringIO
csv_data = """
姓名,年龄,城市
张三,25,北京
李四,30,上海
王五,28,广州
"""
string_io = StringIO(csv_data)
reader = csv.reader(string_io)
for row in reader:
print(row)
第 2 章:csv.reader - 读取 CSV 文件
2.1 基本读取操作
2.1.1 最基本的读取方式
import csv
with open('data.csv', 'r', newline='', encoding='utf-8') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
print(f"读取到:{row}")
2.1.2 获取行号
import csv
with open('data.csv', 'r', newline='', encoding='utf-8') as csvfile:
reader = csv.reader(csvfile)
for line_num, row in enumerate(reader, start=1):
print(f"第{line_num}行:{row}")
2.1.3 分别处理表头和数据行
import csv
with open('data.csv', 'r', newline='', encoding='utf-8') as csvfile:
reader = csv.reader(csvfile)
headers = next(reader)
print(f"表头:{headers}")
for row_num, row in enumerate(reader, start=1):
print(f"数据行{row_num}: {row}")
2.1.4 转换为列表(全部加载到内存)
import csv
with open('data.csv', 'r', newline='', encoding='utf-8') as csvfile:
reader = csv.reader(csvfile)
all_rows = list(reader)
print(f"总行数:{len(all_rows)}")
print(f"第一行:{all_rows[0]}")
2.2 reader 函数的参数详解
2.2.1 delimiter - 字段分隔符
import csv
from io import StringIO
csv_semicolon = """
姓名;年龄;城市
张三;25;北京
李四;30;上海
"""
string_io = StringIO(csv_semicolon)
reader = csv.reader(string_io, delimiter=';')
for row in reader:
print(row)
2.2.2 quotechar - 引号字符
import csv
from io import StringIO
csv_single_quote = """
姓名,年龄,描述
张三,25,'喜欢编程,热爱 Python'
李四,30,'设计师,擅长 UI/UX'
"""
string_io = StringIO(csv_single_quote)
reader = csv.reader(string_io, quotechar="'")
for row in reader:
print(row)
2.2.3 doublequote - 双写引号处理
import csv
from io import StringIO
csv_data = '''姓名,描述
张三,"他说:""你好"""
李四,"擅长""Python""编程"'''
string_io = StringIO(csv_data)
reader = csv.reader(string_io, doublequote=True)
for row in reader:
print(row)
2.2.4 escapechar - 转义字符
import csv
from io import StringIO
csv_data = """
姓名,描述
张三,喜欢\,编程
李四,擅长\"Python\""""
string_io = StringIO(csv_data)
reader = csv.reader(string_io, escapechar='\\')
for row in reader:
print(row)
2.3 实际应用场景
2.3.1 数据统计分析
import csv
with open('sales_data.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
headers = next(reader)
total_sales = 0
total_quantity = 0
row_count = 0
for row in reader:
total_sales += int(row[4])
total_quantity += int(row[5])
row_count += 1
print(f"统计结果:")
print(f" 总记录数:{row_count}")
print(f" 总销售额:¥{total_sales:,}")
print(f" 总数量:{total_quantity}")
print(f" 平均单价:¥{total_sales / total_quantity:.2f}")
2.3.2 按类别分组统计
import csv
from collections import defaultdict
with open('sales_data.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
next(reader)
category_stats = defaultdict(lambda: {'sales': 0, 'quantity': 0})
for row in reader:
category = row[2]
sales = int(row[4])
quantity = int(row[5])
category_stats[category]['sales'] += sales
category_stats[category]['quantity'] += quantity
print("按类别统计:")
for category, stats in sorted(category_stats.items()):
print(f" {category}: 销售额¥{stats['sales']:,}, 数量{stats['quantity']}")
2.3.3 数据筛选(按条件过滤)
import csv
with open('sales_data.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
headers = next(reader)
filtered_rows = []
for row in reader:
sales = int(row[4])
if sales > 5000:
filtered_rows.append(row)
print(f"筛选结果(销售额>5000): {len(filtered_rows)} 条")
for row in filtered_rows:
print(f" {row[0]} - {row[1]}: ¥{row[4]}")
第 3 章:csv.writer - 写入 CSV 文件
3.1 基本写入操作
3.1.1 最基本的写入方式
import csv
output_file = 'output_basic.csv'
with open(output_file, 'w', newline='', encoding='utf-8') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(['姓名', '年龄', '城市'])
writer.writerow(['张三', '25', '北京'])
writer.writerow(['李四', '30', '上海'])
print(f"✓ 文件已创建:{output_file}")
3.1.2 使用 writerows() 批量写入
import csv
output_file = 'output_batch.csv'
header = ['产品', '价格', '库存']
data = [
['iPhone', '5999', '100'],
['iPad', '3999', '50'],
['MacBook', '9999', '30'],
]
with open(output_file, 'w', newline='', encoding='utf-8') as csvfile:
writer = csv.writer(csvfile)
writer.writerow(header)
writer.writerows(data)
3.2 writer 函数的参数详解
3.2.1 delimiter - 字段分隔符
import csv
from io import StringIO
string_io = StringIO()
writer = csv.writer(string_io, delimiter=';')
writer.writerow(['姓名', '年龄', '城市'])
writer.writerow(['张三', '25', '北京'])
print(string_io.getvalue())
3.2.2 quoting - 引号规则
import csv
from io import StringIO
data = [
['纯文本', '100'],
['包含,逗号', '200'],
['包含"引号', '300'],
]
string_io = StringIO()
writer = csv.writer(string_io, quoting=csv.QUOTE_MINIMAL)
writer.writerows(data)
print("QUOTE_MINIMAL:", string_io.getvalue())
string_io = StringIO()
writer = csv.writer(string_io, quoting=csv.QUOTE_ALL)
writer.writerows(data)
print("QUOTE_ALL:", string_io.getvalue())
string_io = StringIO()
writer = csv.writer(string_io, quoting=csv.QUOTE_NONNUMERIC)
writer.writerows(data)
print("QUOTE_NONNUMERIC:", string_io.getvalue())
3.2.3 lineterminator - 行终止符
import csv
from io import StringIO
string_io = StringIO()
writer = csv.writer(string_io, lineterminator='\n')
writer.writerow(['A', 'B'])
writer.writerow(['1', '2'])
print(repr(string_io.getvalue()))
第 4 章:Dialect 和格式参数
4.1 查看内置 Dialect
import csv
dialects = csv.list_dialects()
print(f"已注册的 Dialect: {dialects}")
for dialect_name in dialects:
dialect = csv.get_dialect(dialect_name)
print(f"\n{dialect_name} dialect 配置:")
print(f" delimiter: '{dialect.delimiter}'")
print(f" quotechar: '{dialect.quotechar}'")
print(f" doublequote: {dialect.doublequote}")
print(f" skipinitialspace: {dialect.skipinitialspace}")
print(f" lineterminator: {repr(dialect.lineterminator)}")
print(f" quoting: {dialect.quoting}")
print(f" escapechar: {dialect.escapechar}")
4.2 使用 Dialect
import csv
from io import StringIO
string_io = StringIO()
writer = csv.writer(string_io, dialect='excel-tab')
writer.writerow(['姓名', '年龄', '城市'])
writer.writerow(['张三', '25', '北京'])
print(string_io.getvalue())
4.3 自定义 Dialect
import csv
from io import StringIO
csv.register_dialect(
'myexcel',
delimiter=';',
quotechar="'",
quoting=csv.QUOTE_ALL
)
string_io = StringIO()
writer = csv.writer(string_io, dialect='myexcel')
writer.writerow(['姓名', '年龄'])
writer.writerow(['张三', '25'])
print(string_io.getvalue())
csv.unregister_dialect('myexcel')
4.4 Sniffer 自动检测格式
import csv
from io import StringIO
sample = "姓名;年龄;城市\n张三;25;北京"
sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)
print(f"检测到的分隔符:'{dialect.delimiter}'")
print(f"检测到的引号字符:'{dialect.quotechar}'")
string_io = StringIO(sample)
reader = csv.reader(string_io, dialect=dialect)
for row in reader:
print(row)
第 5 章:DictReader 和 DictWriter
5.1 DictReader - 字典形式读取
5.1.1 基本用法
import csv
with open('employees.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.DictReader(f)
for row in reader:
print(f"{row['姓名']} 在 {row['部门']} 担任 {row['职位']}")
5.1.2 DictReader vs 普通 reader 对比
import csv
with open('data.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
headers = next(reader)
for row in reader:
print(f"{row[0]} 在 {row[1]} 担任 {row[2]}")
with open('data.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.DictReader(f)
for row in reader:
print(f"{row['姓名']} 在 {row['部门']} 担任 {row['职位']}")
5.1.3 手动指定字段名
import csv
from io import StringIO
no_header_data = """
张三,技术部,软件工程师,15000
李四,设计部,UI 设计师,12000
"""
string_io = StringIO(no_header_data)
reader = csv.DictReader(string_io, fieldnames=['姓名', '部门', '职位', '薪资'])
for row in reader:
print(f"{row['姓名']}: {row['职位']}, 薪资{row['薪资']}")
5.2 DictWriter - 字典形式写入
5.2.1 基本用法
import csv
output_file = 'output_dict.csv'
fieldnames = ['姓名', '年龄', '城市', '职业']
data = [
{'姓名': '张三', '年龄': '25', '城市': '北京', '职业': '工程师'},
{'姓名': '李四', '年龄': '30', '城市': '上海', '职业': '设计师'},
]
with open(output_file, 'w', newline='', encoding='utf-8') as f:
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({'姓名': '王五', '年龄': '28', '城市': '广州', '职业': '教师'})
writer.writerows(data)
5.2.2 处理缺失字段
import csv
from io import StringIO
fieldnames = ['姓名', '年龄', '城市', '职业', '备注']
data = [
{'姓名': '张三', '年龄': '25', '城市': '北京'},
{'姓名': '李四', '年龄': '30', '城市': '上海', '职业': '设计师'},
]
string_io = StringIO()
writer = csv.DictWriter(string_io, fieldnames=fieldnames, restval='N/A')
writer.writeheader()
writer.writerows(data)
print(string_io.getvalue())
第 6 章:高级用法和实际案例
6.1 错误处理
import csv
from io import StringIO
csv_data = "姓名,年龄\n张三,25\n李四,30"
string_io = StringIO(csv_data)
with open('data.csv', 'r', newline='', encoding='utf-8', errors='replace') as f:
reader = csv.reader(f)
for row in reader:
print(row)
6.2 大文件处理
import csv
def process_large_file(filepath):
"""逐行处理大文件,内存占用低"""
with open(filepath, 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
next(reader)
for row in reader:
yield row
for row in process_large_file('large_file.csv'):
pass
def process_in_batches(filepath, batch_size=1000):
"""批量处理"""
with open(filepath, 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
next(reader)
batch = []
for row in reader:
batch.append(row)
if len(batch) >= batch_size:
process_batch(batch)
batch = []
if batch:
process_batch(batch)
6.3 最佳实践总结
6.3.1 始终使用 newline=''
with open('file.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
with open('file.csv', 'w', encoding='utf-8') as f:
writer = csv.writer(f)
6.3.2 始终指定编码
with open('file.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
with open('file.csv', 'w', newline='', encoding='utf-8-sig') as f:
writer = csv.writer(f)
6.3.3 使用上下文管理器
with open('file.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
for row in reader:
print(row)
f = open('file.csv', 'r', newline='', encoding='utf-8')
reader = csv.reader(f)
f.close()
6.3.4 优先使用 DictReader/DictWriter
with open('data.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.DictReader(f)
for row in reader:
print(row['name'])
with open('data.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
next(reader)
for row in reader:
print(row[0])
第 7 章:生产环境高级特性
7.1 Python 3.12+ 新增引号规则
import csv
import sys
print(f"当前 Python 版本:{sys.version}")
string_io = StringIO()
writer = csv.writer(string_io, quoting=csv.QUOTE_STRINGS)
writer.writerow(['用户 ID', '用户名', '年龄', '余额'])
writer.writerow(['U001', '张三', 25, 1500.50])
print(string_io.getvalue())
data = [
['订单号', '客户名', '折扣', '备注'],
['ORD001', '张三', None, 'VIP 客户'],
['ORD002', '李四', 0.15, None],
]
string_io = StringIO()
writer = csv.writer(string_io, quoting=csv.QUOTE_NOTNULL)
writer.writerows(data)
print(string_io.getvalue())
7.2 CSV 与数据库交互
7.2.1 数据库导出为 CSV
import csv
import sqlite3
def export_table_to_csv(conn, table_name, output_file, where_clause=None):
"""将数据库表导出为 CSV 文件"""
cursor = conn.cursor()
cursor.execute(f"PRAGMA table_info({table_name})")
columns = [col[1] for col in cursor.fetchall()]
query = f"SELECT * FROM {table_name}"
if where_clause:
query += f" WHERE {where_clause}"
cursor.execute(query)
rows = cursor.fetchall()
with open(output_file, 'w', newline='', encoding='utf-8-sig') as f:
writer = csv.writer(f)
writer.writerow(columns)
writer.writerows(rows)
return len(rows)
conn = sqlite3.connect('production.db')
count = export_table_to_csv(conn, 'employees', 'export_employees.csv')
print(f"导出 {count} 条记录")
conn.close()
7.2.2 CSV 导入数据库
import csv
import sqlite3
def import_csv_to_table(conn, csv_file, table_name):
"""将 CSV 文件导入数据库表"""
cursor = conn.cursor()
success_count = 0
with open(csv_file, 'r', newline='', encoding='utf-8') as f:
reader = csv.DictReader(f)
for row in reader:
columns = list(row.keys())
placeholders = ', '.join(['?' for _ in columns])
column_names = ', '.join(columns)
query = f"INSERT INTO {table_name} ({column_names}) VALUES ({placeholders})"
cursor.execute(query, list(row.values()))
success_count += 1
conn.commit()
return success_count
conn = sqlite3.connect('production.db')
count = import_csv_to_table(conn, 'import_data.csv', 'employees')
print(f"导入 {count} 条记录")
conn.close()
7.2.3 批量导入优化
import csv
import sqlite3
def import_batch(conn, csv_file, table_name, batch_size=1000):
"""批量导入,性能提升 10 倍以上"""
cursor = conn.cursor()
total_count = 0
batch = []
with open(csv_file, 'r', newline='', encoding='utf-8') as f:
reader = csv.DictReader(f)
columns = None
for row in reader:
if columns is None:
columns = list(row.keys())
batch.append(list(row.values()))
if len(batch) >= batch_size:
placeholders = ', '.join(['?' for _ in columns])
column_names = ', '.join(columns)
query = f"INSERT INTO {table_name} ({column_names}) VALUES ({placeholders})"
cursor.executemany(query, batch)
total_count += len(batch)
batch = []
if batch:
cursor.executemany(query, batch)
total_count += len(batch)
conn.commit()
return total_count
7.3 数据验证和 Schema 验证
import csv
import re
from collections import defaultdict
class FieldValidator:
"""字段验证器基类"""
def __init__(self, name, required=True, allow_empty=False):
self.name = name
self.required = required
self.allow_empty = allow_empty
def validate(self, value):
if value is None or value == '':
if self.required and not self.allow_empty:
return False, f"{self.name}: 必填字段不能为空"
return True, None
return self._validate_value(value)
def _validate_value(self, value):
return True, None
class StringValidator(FieldValidator):
"""字符串验证器"""
def __init__(self, name, min_length=None, max_length=None, pattern=None, **kwargs):
super().__init__(name, **kwargs)
self.min_length = min_length
self.max_length = max_length
self.pattern = re.compile(pattern) if pattern else None
def _validate_value(self, value):
if self.min_length and len(value) < self.min_length:
return False, f"{self.name}: 长度不能少于 {self.min_length}"
if self.max_length and len(value) > self.max_length:
return False, f"{self.name}: 长度不能超过 {self.max_length}"
if self.pattern and not self.pattern.match(value):
return False, f"{self.name}: 格式不匹配"
return True, None
class IntegerValidator(FieldValidator):
"""整数验证器"""
def __init__(self, name, min_value=None, max_value=None, **kwargs):
super().__init__(name, **kwargs)
self.min_value = min_value
self.max_value = max_value
def _validate_value(self, value):
try:
num = int(value)
if self.min_value and num < self.min_value:
return False, f"{self.name}: 不能小于 {self.min_value}"
if self.max_value and num > self.max_value:
return False, f"{self.name}: 不能大于 {self.max_value}"
return True, None
except ValueError:
return False, f"{self.name}: 必须是整数"
class EmailValidator(StringValidator):
"""邮箱验证器"""
def __init__(self, name, **kwargs):
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
super().__init__(name, pattern=pattern, **kwargs)
validators = [
StringValidator('用户名', min_length=3, max_length=20),
IntegerValidator('年龄', min_value=0, max_value=150),
EmailValidator('邮箱'),
]
data = {
'用户名': '张三',
'年龄': '25',
'邮箱': '[email protected]'
}
for validator in validators:
is_valid, error = validator.validate(data.get(validator.name))
if is_valid:
print(f"✓ {validator.name}: 有效")
else:
print(f"✗ {validator.name}: {error}")
7.4 性能优化技巧
7.4.1 生成器逐行处理
import csv
def process_with_generator(filepath):
"""使用生成器逐行处理,内存占用低"""
with open(filepath, 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
next(reader)
for row in reader:
yield row
7.4.2 字符串拼接优化
from io import StringIO
import csv
result = ""
for i in range(10000):
result += f"row{i},data{i}\n"
lines = []
for i in range(10000):
lines.append(f"row{i},data{i}")
result = '\n'.join(lines)
output = StringIO()
writer = csv.writer(output)
for i in range(10000):
writer.writerow([f'row{i}', f'data{i}'])
result = output.getvalue()
7.4.3 批量写入优化
import csv
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
for row in data:
writer.writerow(row)
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
writer.writerows(data)
7.4.4 使用__slots__减少内存
class Employee:
def __init__(self, id, name, age):
self.id = id
self.name = name
self.age = age
class EmployeeOptimized:
__slots__ = ['id', 'name', 'age']
def __init__(self, id, name, age):
self.id = id
self.name = name
self.age = age
核心知识点速查
基本读取
import csv
with open('file.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.reader(f)
for row in reader:
print(row)
基本写入
import csv
with open('file.csv', 'w', newline='', encoding='utf-8') as f:
writer = csv.writer(f)
writer.writerow(['A', 'B', 'C'])
writer.writerows([[1, 2, 3], [4, 5, 6]])
DictReader
import csv
with open('file.csv', 'r', newline='', encoding='utf-8') as f:
reader = csv.DictReader(f)
for row in reader:
print(row['column_name'])
DictWriter
import csv
with open('file.csv', 'w', newline='', encoding='utf-8') as f:
fieldnames = ['name', 'age']
writer = csv.DictWriter(f, fieldnames=fieldnames)
writer.writeheader()
writer.writerow({'name': '张三', 'age': 25})
重要提示
- 始终使用
newline='' - 防止在 Windows 上产生空行
- 始终指定
encoding='utf-8' - 正确处理中文字符
- 使用上下文管理器 (
with 语句) - 确保文件正确关闭
- 优先使用 DictReader/DictWriter - 代码可读性更好
- 大批量数据使用批量操作 - 提升性能
- 生产环境添加数据验证 - 保证数据质量
许可证
相关免费在线工具
- curl 转代码
解析常见 curl 参数并生成 fetch、axios、PHP curl 或 Python requests 示例代码。 在线工具,curl 转代码在线工具,online
- Base64 字符串编码/解码
将字符串编码和解码为其 Base64 格式表示形式即可。 在线工具,Base64 字符串编码/解码在线工具,online
- Base64 文件转换器
将字符串、文件或图像转换为其 Base64 表示形式。 在线工具,Base64 文件转换器在线工具,online
- Markdown转HTML
将 Markdown(GFM)转为 HTML 片段,浏览器内 marked 解析;与 HTML转Markdown 互为补充。 在线工具,Markdown转HTML在线工具,online
- HTML转Markdown
将 HTML 片段转为 GitHub Flavored Markdown,支持标题、列表、链接、代码块与表格等;浏览器内处理,可链接预填。 在线工具,HTML转Markdown在线工具,online
- JSON 压缩
通过删除不必要的空白来缩小和压缩JSON。 在线工具,JSON 压缩在线工具,online