
Section 9: A Demo of Loading the openai/clip-vit-large-patch14-336 Vision Model with Hugging Face

Ne0inhk

14 Jan 2025 — 4 min read


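Introduction

This post shows how to load the vision model openai/clip-vit-large-patch14-336 with Hugging Face. I am writing this down because most current large models load both their vision model and their large language model through the Hugging Face libraries, and I am working on LLaVA and similar models myself. With that in mind, this section walks through loading a ViT vision model with Hugging Face.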

1. Model Loading

Loading a model with Hugging Face is very simple. The code is as follows:

    from transformers import CLIPVisionModel, CLIPImageProcessor

    if __name__ == '__main__':
        vit_path = 'D:/clip-vit-large-patch14-336'
        img_path = 'dogs.jpg'

        image_processor = CLIPImageProcessor.from_pretrained(vit_path)  # load the image preprocessor
        vision_tower = CLIPVisionModel.from_pretrained(vit_path)  # load the vision model
        vision_tower.requires_grad_(False)  # freeze the model

        for name, param in vision_tower.named_parameters():
            print(name, param.requires_grad)
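Because from_pretrained is pointed at a local folder here, a missing file is a common reason for the load to fail. A minimal sketch of a pre-flight check follows. The helper name missing_model_files is hypothetical, and the file list is an assumption based on the usual single-file Hugging Face checkpoint layout (config.json, preprocessor_config.json, plus weights in either model.safetensors or pytorch_model.bin); sharded checkpoints use other names and are not covered.

```python
import os
import tempfile

# Assumed layout of a single-file Hugging Face checkpoint folder.
REQUIRED_FILES = ("config.json", "preprocessor_config.json")
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")  # any one is enough


def missing_model_files(model_dir):
    """Return the names that appear to be missing from model_dir (hypothetical helper)."""
    present = set(os.listdir(model_dir)) if os.path.isdir(model_dir) else set()
    missing = [name for name in REQUIRED_FILES if name not in present]
    if not any(name in present for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing


# Demo on a throwaway folder that only contains config.json.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "config.json"), "w").close()
    demo_missing = missing_model_files(demo_dir)

print(demo_missing)
```

In practice one would call missing_model_files(vit_path) before the two from_pretrained calls and stop early if the list is not empty.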

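2. Controlling Gradient Updates in Hugging Face

The vision model is usually frozen while training is done with LoRA, so how do we turn off gradients for the vision model? To answer that, I looked further into how gradients are set. The code is as follows:

    vision_tower.requires_grad_(False)  # freeze the model
    for name, param in vision_tower.named_parameters():
        print(name, param.requires_grad)

The first line of the code above freezes the vision model's gradients, and the two lines after it verify that the freeze took effect. Calling vision_tower.requires_grad_(False) freezes the parameters; without that call, they keep the default requires_grad=True and receive gradients during backpropagation. I won't go into more detail here; if you want to see it for yourself, just run the code above.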
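To see the effect of requires_grad_(False) without downloading the CLIP weights, here is a minimal sketch on a small stand-in network. The layer sizes and the names tiny_tower and count_trainable are made up for illustration; only requires_grad_ and parameters are real PyTorch calls, the same mechanism used on vision_tower.

```python
import torch.nn as nn

# Stand-in for the vision tower; the sizes are arbitrary.
tiny_tower = nn.Sequential(nn.Linear(8, 16), nn.GELU(), nn.Linear(16, 4))


def count_trainable(module):
    """Number of parameter elements that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


trainable_before = count_trainable(tiny_tower)  # parameters default to requires_grad=True
tiny_tower.requires_grad_(False)                # freeze every parameter in place
trainable_after = count_trainable(tiny_tower)
print(trainable_before, trainable_after)

# Unfreeze only the last layer, e.g. to train a small head on frozen features.
tiny_tower[2].requires_grad_(True)
trainable_head = count_trainable(tiny_tower)
print(trainable_head)
```

Because requires_grad_ can be called on any submodule, the same pattern allows freezing the whole tower and then re-enabling selected layers.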

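3. Image Preprocessing

Before feeding an image to the model, we need to preprocess it. Conveniently, Hugging Face ships an image processor matched to each vision model, so we only need PIL to read the image and pad it to a square. The code is as follows:

    from PIL import Image

    image = Image.open(img_path).convert('RGB')  # read the image with PIL

    def expand2square(pil_img, background_color):
        width, height = pil_img.size  # get the image width and height
        if width == height:  # already square: return it unchanged
            return pil_img
        elif width > height:  # wider than tall: build a width x width canvas
            result = Image.new(pil_img.mode, (width, width), background_color)
            # width is the longer side, so paste the original at x=0, y=(width - height) // 2
            result.paste(pil_img, (0, (width - height) // 2))
            return result
        else:  # taller than wide: build a height x height canvas
            result = Image.new(pil_img.mode, (height, height), background_color)
            result.paste(pil_img, ((height - width) // 2, 0))
            return result

    # pad with the processor's mean color, then run the standard preprocessing
    image = expand2square(image, tuple(int(x * 255) for x in image_processor.image_mean))
    image = image_processor.preprocess(image, return_tensors='pt')['pixel_values'][0]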
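The padding geometry in expand2square can be checked without opening an image. The sketch below recomputes the canvas side and paste offset for a given size, along with the background color derived from the mean. The helper square_geometry is hypothetical, and CLIP_IMAGE_MEAN is hard-coded under the assumption that it matches image_processor.image_mean for this checkpoint (the standard OpenAI CLIP mean).

```python
# Assumed to equal image_processor.image_mean for this checkpoint.
CLIP_IMAGE_MEAN = (0.48145466, 0.4578275, 0.40821073)


def square_geometry(width, height):
    """Return (side, (x_offset, y_offset)) for centering an image on a square canvas."""
    side = max(width, height)
    return side, ((side - width) // 2, (side - height) // 2)


# Same conversion as in the call to expand2square: scale the mean to 0-255 and truncate.
background_color = tuple(int(x * 255) for x in CLIP_IMAGE_MEAN)

landscape = square_geometry(640, 480)  # wider than tall: offset is on the y axis
portrait = square_geometry(300, 500)   # taller than wide: offset is on the x axis
print(background_color, landscape, portrait)
```

Padding with the mean color is a sensible choice because, after the processor subtracts the mean during normalization, the padded border ends up close to zero.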

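4. Model Inference

The last step is model inference. The code is as follows:

    image_forward_out = vision_tower(image.unsqueeze(0), output_hidden_states=True)
    feature = image_forward_out['last_hidden_state']
    print(feature.shape)

Note that the output contains several fields. Among them, hidden_states holds 25 entries, one for the embedding output plus one for each of the 24 transformer blocks, and the last entry of hidden_states has the same values as last_hidden_state. The results are shown in the figure below: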

[Figure: printed structure of the model output]
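The shape printed by the inference code and the count of 25 both follow from the model configuration. Here is a quick worked calculation, with the configuration values written in by hand as assumptions (image size 336, patch size 14, hidden size 1024, 24 layers, which is what the ViT-L/14-336 name implies; the checkpoint's config.json is the authoritative source):

```python
# Values assumed for clip-vit-large-patch14-336; check config.json to confirm.
image_size = 336
patch_size = 14
hidden_size = 1024
num_hidden_layers = 24

patches_per_side = image_size // patch_size   # 24 patches along each side
num_patches = patches_per_side ** 2           # 576 patches in total
num_tokens = num_patches + 1                  # plus one class token

expected_feature_shape = (1, num_tokens, hidden_size)  # batch of one image
num_hidden_states = num_hidden_layers + 1              # embedding output + each block

print(expected_feature_shape, num_hidden_states)
```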

hidden_states compared with last_hidden_state:

5. Complete Code

    from transformers import CLIPVisionModel, CLIPImageProcessor
    from PIL import Image

    if __name__ == '__main__':
        vit_path = 'E:/clip-vit-large-patch14-336'
        img_path = 'dogs.jpg'

        image_processor = CLIPImageProcessor.from_pretrained(vit_path)  # load the image preprocessor
        vision_tower = CLIPVisionModel.from_pretrained(vit_path)  # load the vision model
        vision_tower.requires_grad_(False)  # freeze the model

        for name, param in vision_tower.named_parameters():
            print(name, param.requires_grad)

        image = Image.open(img_path).convert('RGB')  # read the image with PIL

        def expand2square(pil_img, background_color):
            width, height = pil_img.size  # get the image width and height
            if width == height:  # already square: return it unchanged
                return pil_img
            elif width > height:  # wider than tall: build a width x width canvas
                result = Image.new(pil_img.mode, (width, width), background_color)
                # width is the longer side, so paste the original at x=0, y=(width - height) // 2
                result.paste(pil_img, (0, (width - height) // 2))
                return result
            else:  # taller than wide: build a height x height canvas
                result = Image.new(pil_img.mode, (height, height), background_color)
                result.paste(pil_img, ((height - width) // 2, 0))
                return result

        # pad with the processor's mean color, then run the standard preprocessing
        image = expand2square(image, tuple(int(x * 255) for x in image_processor.image_mean))
        image = image_processor.preprocess(image, return_tensors='pt')['pixel_values'][0]

        image_forward_out = vision_tower(image.unsqueeze(0), output_hidden_states=True)
        feature = image_forward_out['last_hidden_state']
        print(feature.shape)

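Summary

This post covered one way to load a vision model with Hugging Face; the other key point is gradient freezing. Note that I have only verified this usage for the ViT model. Other models are untested, so I make no claims about them.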
