Python diskcache: A Guide to the Disk-Backed Cache Library | 极客日志

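This article introduces diskcache, a persistent Python caching library backed by SQLite and the file system. Unlike the purely in-memory cachetools, diskcache keeps its data on disk, so entries survive a restart and the cache is not bounded by available memory. The article covers installation, eviction policies (such as LRS and LRU), basic reads and writes, expiration, cleanup (clear everything, by tag, by key), the memoizing decorator, queue operations, transactions, and the sharded FanoutCache for multi-process use. Example code shows how to configure, use, and monitor a cache. The library suits long-lived caches and large data sets.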

云朵棉花糖 · Published 2026/3/23 · Updated 2026/7/2 · 9.6K views

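I previously wrote about cachetools, which is purely in-memory: fast, but the cached data is lost on restart. diskcache instead stores entries in SQLite, a lightweight embedded database that needs no separate server process. The data persists across restarts and is not limited by memory, so caching a large volume of data does not risk losing entries to memory exhaustion.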

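Feature	diskcache	cachetools
Storage	Mainly disk (memory as a secondary layer)	Memory only
Persistence	✅ Yes (data survives a restart)	❌ No
Data size	Suited to large data (bounded by disk)	Suited to small data (bounded by memory)
Speed	Slower (disk I/O)	Very fast (pure memory)
Use cases	Long-lived caches, large data	Short-lived caches, small data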

Installation

pip install diskcache

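Eviction policies

The EVICTION_POLICY values in the source code show which eviction policies are available:

'least-recently-stored' - the default; evicts by store time
'least-recently-used' - evicts by access time (every access writes to the database)
'least-frequently-used' - evicts by access frequency (every access writes to the database)
'none' - disables automatic eviction

By default, then, the cache uses LRS and evicts entries in the order in which they were stored.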
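To make the difference between the first two policies concrete, here is a small in-memory model. It only illustrates the eviction order and is not how diskcache implements it (diskcache tracks store and access times in SQLite). The `ToyCache` class and its `capacity` parameter are made up for this sketch.

```python
from collections import OrderedDict


class ToyCache:
    """Tiny in-memory model of LRS vs. LRU eviction order (illustration only)."""

    def __init__(self, capacity, policy):
        self.capacity = capacity
        self.policy = policy        # 'least-recently-stored' or 'least-recently-used'
        self.items = OrderedDict()  # oldest entry first

    def set(self, key, value):
        self.items.pop(key, None)   # re-storing a key makes it the newest entry
        self.items[key] = value
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the oldest entry

    def get(self, key):
        value = self.items.get(key)
        if key in self.items and self.policy == 'least-recently-used':
            self.items.move_to_end(key)  # LRU: a read also refreshes the entry
        return value


def surviving_keys(policy):
    cache = ToyCache(capacity=2, policy=policy)
    cache.set('a', 1)
    cache.set('b', 2)
    cache.get('a')      # read 'a' before the cache overflows
    cache.set('c', 3)   # forces one eviction
    return sorted(cache.items)


print(surviving_keys('least-recently-stored'))  # ['b', 'c']: 'a' was stored first
print(surviving_keys('least-recently-used'))    # ['a', 'c']: 'b' was read least recently
```

Under LRS a read never changes what gets evicted, which is why that policy needs no database write on access.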

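Basic set and get

from diskcache import Cache

# 1. Create a cache object.
# Pass a directory path. If the directory does not exist it is created,
# and a cache.db file is created inside it.
# If no directory is given, a temporary directory is created automatically
# and cache.db is created there.
cache = Cache("cache")

# 2. Store a value.
# Note: read=True is meant for file-like values; for a short str it has no effect.
cache.set('name', '张三', expire=60, read=True, tag='姓名', retry=True)

# 3. Read the value back. With expire_time=True the expiry timestamp is returned;
# with tag=True the tag stored with the entry is returned as well.
name = cache.get('name', default=False, expire_time=True, tag=True)
print(name)  # ('张三', 1770617370.6258903, '姓名')

After running the code above, a cache directory appears in the current working directory, containing a cache.db file. Because cache.db is a SQLite database file, we can try reading it with pandas.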

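import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///cache/cache.db')
pd.set_option('display.max_columns', None)  # do not limit the number of columns
pd.set_option('display.width', None)        # do not limit the display width

if __name__ == '__main__':
    res = pd.read_sql('SELECT * FROM cache;', con=engine)
    print(res)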

   rowid   key  raw    store_time   expire_time   access_time  access_count  tag  size  mode filename value
0      1  name    1  1.770605e+09  1.770605e+09  1.770605e+09             0  姓名     0     1     None    张三

Reading from the cache

Getting a specific key

from diskcache import Cache

cache = Cache("cache")

# get() returns None (or the given default) when the key does not exist
print(cache.get('key_not_exist'))  # None

from diskcache import Cache

cache = Cache("cache")

# read() raises KeyError when the key does not exist
print(cache.read('key_not_exist'))
# Traceback (most recent call last):
#   File "test4.py", line 4, in <module>
#     print(cache.read('key_not_exist'))
#           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "site-packages/diskcache/core.py", line 1252, in read
#     raise KeyError(key)
# KeyError: 'key_not_exist'

Getting the most recent item

from diskcache import Cache

cache = Cache("cache")

print(cache.peekitem())            # ('key5', 'value5')
print(cache.peekitem(last=False))  # ('key1', 'value1')

Expiration

from diskcache import Cache

cache = Cache("cache")

result = cache.add("key1", "value1", expire=60)
print(result)  # True
result = cache.add("key2", "value2", expire=30)
print(result)  # True

from diskcache import Cache

cache = Cache("cache")

# Once the entry has expired, get() returns the default value
name = cache.get('name', default=False, expire_time=True, tag=True)
print(name)  # (False, None, None)

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///cache/cache.db')
pd.set_option('display.max_columns', None)  # do not limit the number of columns
pd.set_option('display.width', None)        # do not limit the display width

if __name__ == '__main__':
    res = pd.read_sql('SELECT * FROM cache;', con=engine)
    print(res)
    #    rowid   key  raw    store_time   expire_time   access_time  access_count  tag  size  mode filename value
    # 0      1  name    1  1.770617e+09  1.770617e+09  1.770617e+09             0  姓名     0     1     None    张三

Cleaning up the cache

Removing expired items

import time

from diskcache import Cache

cache = Cache("cache")
cache.set('name', '张三', expire=60, read=True, tag='姓名', retry=True)
cache.set('age', 30, expire=30, read=True, tag='年龄', retry=True)

time.sleep(35)
count = cache.expire()  # removes expired items and returns how many were removed
print(count)  # 1

Clearing everything

from diskcache import Cache

cache = Cache("cache")
cache.set('name', '张三', expire=60, read=True, tag='姓名', retry=True)
cache.set('age', 30, expire=60, read=True, tag='年龄', retry=True)

count = cache.clear()  # removes all items and returns how many were removed
print(count)  # 2

Forcing a cull

from diskcache import Cache

cache = Cache("cache")
cache.set('name', '张三', expire=60, read=True, tag='姓名', retry=True)
cache.set('age', 30, expire=60, read=True, tag='年龄', retry=True)

count = cache.cull()  # removes expired items, then evicts until the size limit is met
print(count)

Removing by tag

from diskcache import Cache

cache = Cache("cache")
cache.set('name', '张三', expire=60, read=True, tag='姓名', retry=True)
cache.set('age', 30, expire=60, read=True, tag='年龄', retry=True)

count = cache.evict("年龄")  # removes every item carrying this tag
print(count)  # 1

Removing by key

from diskcache import Cache

cache = Cache("cache")
cache.set('name', '张三', expire=60, read=True, tag='姓名', retry=True)
cache.set('age', 30, expire=60, read=True, tag='年龄', retry=True)

result = cache.delete("name")  # True if the key was found and deleted
print(result)  # True

from diskcache import Cache

cache = Cache("cache")
cache.set('name', '张三', expire=60, read=True, tag='姓名', retry=True)
cache.set('age', 30, expire=60, read=True, tag='年龄', retry=True)

result = cache.pop("name")  # removes the key and returns its value
print(result)  # 张三

Adding items

from diskcache import Cache

cache = Cache("cache")

# add() stores the value only if the key is not already present
result = cache.add("key1", "value1")
print(result)  # True

Refreshing the expiry

from diskcache import Cache

cache = Cache("cache")

# touch() sets a new expiry (in seconds) on an existing key
result = cache.touch("key1", 60)
print(result)  # True

Checking for a key

from diskcache import Cache

cache = Cache("cache")
cache.set("key1", "value1")
print("key1" in cache)  # True

Modifying values

from diskcache import Cache

cache = Cache("cache")
cache.set('age', 30, expire=60, read=True, tag='年龄', retry=True)

cache.incr("age", delta=5)
print(cache.get("age"))  # 35
cache.decr("age", delta=3)
print(cache.get("age"))  # 32

Checking the cache

from diskcache import Cache

cache = Cache("cache")

# check() verifies cache consistency and returns a list of warnings
warnings = cache.check()
print(warnings)  # []

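Automatic caching

from diskcache import Cache

cache = Cache('cache')


# The cache.memoize() decorator caches the function's results automatically
@cache.memoize(expire=60)
def compute_expensive_operation(x):
    # Simulate an expensive computation
    print("call function")
    return x * x * x


# First call: the result is computed and cached
result = compute_expensive_operation(3)
print(result)  # prints "call function" and then 27

# Second call: the result comes from the cache and the function body does not run
result = compute_expensive_operation(3)
print(result)  # prints 27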

def wrapper(*args, **kwargs):
    """Wrapper for callable to cache arguments and return values."""
    key = wrapper.__cache_key__(*args, **kwargs)
    result = self.get(key, default=ENOVAL, retry=True)
    if result is ENOVAL:
        result = func(*args, **kwargs)
        if expire is None or expire > 0:
            self.set(key, result, expire, tag=tag, retry=True)
    return result

Queue operations

Pushing

from diskcache import Cache

cache = Cache("cache")

result = cache.push("first", prefix="test")
print(result)  # test-500000000000000
result = cache.push("second", prefix="test")
print(result)  # test-500000000000001
result = cache.push("third", prefix="test", side="front")  # insert at the front of the queue
print(result)  # test-499999999999999

Pulling

from diskcache import Cache

cache = Cache("cache")

result = cache.pull("test")
print(result)  # ('test-499999999999999', 'third')
result = cache.pull("test")
print(result)  # ('test-500000000000000', 'first')
result = cache.pull("test")
print(result)  # ('test-500000000000001', 'second')

from diskcache import Cache

cache = Cache("cache")

result = cache.pull("test")
print(result)  # ('test-499999999999999', 'third')
result = cache.pull("test", side="back")
print(result)  # ('test-500000000000001', 'second')
result = cache.pull("test")
print(result)  # ('test-500000000000000', 'first')

Peeking

from diskcache import Cache

cache = Cache("cache")

# peek() returns an item without removing it
print(cache.peek(prefix="test"))               # ('test-499999999999999', 'third')
print(cache.peek(prefix="test", side="back"))  # ('test-500000000000001', 'second')

Closing the connection

from diskcache import Cache

cache = Cache("cache")
cache.close()

Resetting settings

SELECT key, value FROM Settings;

                     key                  value
0                  count                      0
1                   size                      0
2                   hits                      0
3                 misses                      0
4             statistics                      0
5              tag_index                      0
6        eviction_policy  least-recently-stored
7             size_limit             1073741824
8             cull_limit                     10
9     sqlite_auto_vacuum                      1
10     sqlite_cache_size                   8192
11   sqlite_journal_mode                    wal
12      sqlite_mmap_size               67108864
13    sqlite_synchronous                      1
14    disk_min_file_size                  32768
15  disk_pickle_protocol                      5

from diskcache import Cache

cache = Cache("cache")

value = cache.reset("cull_limit", 30)
print(value)  # 30

                     key                  value
0                  count                      2
1                   size                      0
2                   hits                      0
3                 misses                      0
4             statistics                      0
5              tag_index                      0
6        eviction_policy  least-recently-stored
7             size_limit             1073741824
8             cull_limit                     30
...

Viewing settings

from diskcache import Cache

cache = Cache("cache")

value = cache.reset("cull_limit", update=False)
print(value)  # 10

Transactions

from diskcache import Cache

cache = Cache("cache")
items = {"key1": "value1", "key2": "value2", "key3": "value3"}

# Batch the writes in one transaction; the author reports a 2-5x speedup
with cache.transact():
    for key, value in items.items():
        cache[key] = value

Hit statistics

from diskcache import Cache

cache = Cache("cache")

# Enable statistics
cache.stats(enable=True)

# Get the hit and miss counts
hits, misses = cache.stats()
print(f"hits: {hits}; misses: {misses}")  # hits: 5; misses: 2

# Check cache consistency
warnings = cache.check()
print(warnings)  # []

# Get the cache volume on disk (bytes)
size = cache.volume()
print(size)  # 32768

Deque: a queue-style cache

from diskcache import Deque

cache = Deque([0, 1, 2, 3, 4], "cache")
cache.append(5)
print(list(cache))  # [0, 1, 2, 3, 4, 5]

   rowid              key  raw    store_time expire_time   access_time  access_count   tag  size  mode filename  value
0      1  500000000000000    1  1.770714e+09        None  1.770714e+09             0  None     0     1     None      0
1      2  500000000000001    1  1.770714e+09        None  1.770714e+09             0  None     0     1     None      1
2      3  500000000000002    1  1.770714e+09        None  1.770714e+09             0  None     0     1     None      2
3      4  500000000000003    1  1.770714e+09        None  1.770714e+09             0  None     0     1     None      3
4      5  500000000000004    1  1.770714e+09        None  1.770714e+09             0  None     0     1     None      4
5      6  500000000000005    1  1.770714e+09        None  1.770714e+09             0  None     0     1     None      5

# From the diskcache source: the backing Cache is created with eviction disabled
self._cache = Cache(directory, eviction_policy='none')

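Index: a dict-style cache

from diskcache import Index

# A persistent dictionary that never evicts
index = Index("cache", {"a": 1, "b": 2}, c=3)
index.update([('d', 4), ('e', 5)])
print(list(index))  # ['a', 'b', 'c', 'd', 'e']

from diskcache import Cache, Index

# Build an Index on top of an existing Cache
cache = Cache("cache")
index = Index.fromcache(cache)

from diskcache import Index

index = Index("cache")
index['f'] = 6
print(index.get('c'))  # 3

from diskcache import Index

index = Index("cache")
result = index.setdefault('h', 9)
# 8 in the author's run, which implies 'h' was already stored as 8;
# if 'h' were missing, setdefault would store and return 9
print(result)  # 8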

FanoutCache: a sharded cache

from multiprocessing import Pool

from diskcache import FanoutCache

# Create a sharded cache with 8 shards. In FanoutCache, size_limit is the
# total for the whole cache and is divided evenly among the shards,
# so each shard here is limited to 1 GB / 8 = 128 MB.
distributed_cache = FanoutCache(
    directory='./cache',
    shards=8,
    size_limit=1024 ** 3,  # 1 GB in total
)


def fetch_page(url):
    return f"content of {url}"


def spider_worker(url):
    # The key is routed to its shard automatically
    if distributed_cache.get(url, default=None) is None:
        data = fetch_page(url)
        distributed_cache.set(url, data, expire=86400)  # cache for 24 hours
    return url


url_list = [
    "https://example1.com",
    "https://example2.com",
    "https://example3.com",
    "https://example4.com",
]

if __name__ == '__main__':
    # The cache is shared by all worker processes
    with Pool(processes=8) as pool:
        # Process the URL list in parallel
        results = pool.map(spider_worker, url_list)

self._shards = tuple(
    Cache(
        directory=op.join(directory, '%03d' % num),
        timeout=timeout,
        disk=disk,
        size_limit=size_limit,
        **settings,
    )
    for num in range(shards)
)

import pandas as pd
from sqlalchemy import create_engine

# Each shard has its own cache.db; this reads the one in shard 001
engine = create_engine('sqlite:///cache/001/cache.db')
pd.set_option('display.max_columns', None)  # do not limit the number of columns
pd.set_option('display.width', None)        # do not limit the display width

if __name__ == '__main__':
    res = pd.read_sql('SELECT * FROM cache;', con=engine)
    # res = pd.read_sql('SELECT key, value FROM Settings;', con=engine)
    print(res)

#    rowid                   key  raw    store_time   expire_time   access_time  access_count   tag  size  mode filename                            value
# 0      1  https://example4.com    1  1.770632e+09  1.770718e+09  1.770632e+09             0  None     0     1     None  content of https://example4.com

