Generating Images of 100 Animal Species with LoRA + Stable Diffusion


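Summary (AI-generated): an animal image generation system built on Stable Diffusion and LoRA. The project includes the full training pipeline and a PyQt5 graphical interface, and generates animal images from text prompts. The article covers the model architecture (VAE, CLIP, U-Net), how LoRA fine-tuning works, the training code (configuration, data processing, early stopping), and the UI design. Generation quality is measured with the CLIP score; the system runs on CPU or GPU and exposes its generation parameters for adjustment.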

Elasticer. Published 2026/4/5; updated 2026/5/20 (26 views)

Code: https://github.com/xiaozhou-alt/Animals_Generation

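1. Project Overview

This is an animal image generation system based on Stable Diffusion and LoRA. It generates animal images from text descriptions and ships with the full training pipeline and a graphical interface that supports custom parameters and on-demand generation.

Key features

Efficient training: lightweight fine-tuning of the Stable Diffusion model with LoRA (Low-Rank Adaptation)
User-friendly interface: a PyQt5 GUI with live parameter adjustment and image preview
Quality-oriented generation: a tuned generation pipeline with color correction and post-processing
Cross-platform: runs on both CPU and GPU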

[Figure: a selection of generated animal images]

2. Folder Structure

Animals_Creation/
├── README.md
├── demo.gif # demo animation
├── demo.mp4 # demo video
├── demo.py # main demo script
├── icons/ # icon resources
├── train.py
├── log/ # logs
├── model/
│   └── LCM-runwayml-stable-diffusion-v1-5/ # Stable Diffusion model
│       ├── feature_extractor/ # feature extractor
│       ├── model_index.json # model index file
│       ├── safety_checker/ # safety checker
│       ├── scheduler/ # scheduler
│       ├── text_encoder/ # text encoder
│       ├── tokenizer/ # tokenizer
│       ├── unet/ # UNet model
│       └── vae/ # variational autoencoder
├── output/
│   ├── evaluation_results.xlsx # evaluation results (Excel)
│   ├── lora_models/ # LoRA model weights
│   │   └── clip-31.475.safetensors
│   ├── training_history.xlsx # training history
│   └── pic/
└── requirements.txt

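3. Dataset

The dataset contains images of 100 different animal classes. The images were scraped from the web and cleaned by hand by a single person; because the dataset is large, some animal folders still contain roughly 1%-1.5% noisy images. The dataset is organized as follows:

Number of classes: 100 animal species
Data split: 80% training set, 20% validation set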
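
As a rough illustration of the 80/20 split described above, the sketch below shuffles sample indices and cuts them into training and validation parts. The function name `split_indices` and the fixed seed are assumptions made for this example; the training script later in the article performs its split with PyTorch's `random_split` and a configurable `validation_split`.

```python
import random


def split_indices(n_items, val_fraction, seed=0):
    """Shuffle indices 0..n_items-1 and return (train_indices, val_indices)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_val = int(n_items * val_fraction)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(1000, 0.2)
print(len(train_idx), len(val_idx))  # 800 200
```

Splitting indices rather than file lists keeps the sketch independent of how the images are stored on disk.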

Haojing ZHOU. 100 种动物识别数据集 (100-species animal recognition dataset) [DS/OL]. V1. Science Data Bank, 2025 [2025-08-30]. https://cstr.cn/31253.11.sciencedb.29221. CSTR: 31253.11.sciencedb.29221.

@misc{动物识别,
  author    = {Haojing ZHOU},
  title     = {100 种动物识别数据集},
  year      = {2025},
  doi       = {10.57760/sciencedb.29221},
  url       = {https://doi.org/10.57760/sciencedb.29221},
  note      = {CSTR:31253.11.sciencedb.29221},
  publisher = {ScienceDB}
}

vae = AutoencoderKL.from_pretrained(
    config.pretrained_model_name_or_path,
    subfolder="vae"
)

# Encode images into the latent space, then apply the SD v1 scaling factor
latents = vae.encode(pixel_values).latent_dist.sample()
latents = latents * 0.18215 # scaling factor
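
The effect of the VAE on tensor shapes can be sketched with plain arithmetic. The snippet assumes the Stable Diffusion v1 layout (4 latent channels, 8x spatial downsampling); the helper names `latent_shape`, `scale_latent` and `unscale_latent` are hypothetical and exist only for this illustration.

```python
VAE_SCALE = 0.18215  # scaling factor used in the snippet above


def latent_shape(batch_size, height, width, latent_channels=4, downsample=8):
    """Shape of the latent tensor produced for a batch of RGB images."""
    return (batch_size, latent_channels, height // downsample, width // downsample)


def scale_latent(value):
    """Applied after encoding, before the latents are fed to the U-Net."""
    return value * VAE_SCALE


def unscale_latent(value):
    """Applied before decoding latents back to pixels."""
    return value / VAE_SCALE


print(latent_shape(1, 256, 256))  # (1, 4, 32, 32)
print(latent_shape(1, 512, 512))  # (1, 4, 64, 64)
```

Halving the resolution from 512 to 256 therefore cuts the number of latent values per image by a factor of four, which is where most of the training-time saving comes from.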

text_encoder = CLIPTextModel.from_pretrained(
    config.pretrained_model_name_or_path,
    subfolder="text_encoder"
)
tokenizer = CLIPTokenizer.from_pretrained(
    config.pretrained_model_name_or_path,
    subfolder="tokenizer"
)

self.prompt_templates = [
    "a photo of a {}",
    "a high quality image of a {}",
    # more templates...
]

unet = UNet2DConditionModel.from_pretrained(
    config.pretrained_model_name_or_path,
    subfolder="unet"
)
# Noise prediction
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

noise_scheduler = DDPMScheduler.from_pretrained(
    config.pretrained_model_name_or_path,
    subfolder="scheduler"
)

noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
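
What `add_noise` computes is the closed-form DDPM forward step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The sketch below reproduces it in pure Python. The "scaled linear" beta schedule and its endpoints (0.00085 to 0.012 over 1000 steps) are assumed here as the usual Stable Diffusion v1 settings; the real values come from the scheduler config that `from_pretrained` loads.

```python
import math


def alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative product of (1 - beta_t) for a scaled-linear beta schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    values, prod = [], 1.0
    for i in range(num_steps):
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        values.append(prod)
    return values


def add_noise(x0, noise, t, acp):
    """x_t = sqrt(acp[t]) * x0 + sqrt(1 - acp[t]) * noise, element-wise."""
    signal = math.sqrt(acp[t])
    sigma = math.sqrt(1.0 - acp[t])
    return [signal * x + sigma * n for x, n in zip(x0, noise)]


acp = alphas_cumprod()
print(add_noise([1.0, -1.0], [0.5, 0.5], t=0, acp=acp))    # almost the clean input
print(add_noise([1.0, -1.0], [0.5, 0.5], t=999, acp=acp))  # dominated by the noise term
```

Sampling a random timestep per image, as the training loop does, exposes the U-Net to every noise level along this schedule.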

def prepare_unet_for_lora(unet, rank=2, alpha=16):
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=alpha,
        target_modules=["to_q","to_k","to_v","to_out.0"],
        lora_dropout=0.0,
        bias="none",
    )
    unet = get_peft_model(unet, lora_config)
    return unet

# LoRA parameters
rank = 2
lora_alpha = 16
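
To make the roles of `rank` and `lora_alpha` concrete, here is a minimal pure-Python sketch of the LoRA update W' = W + (alpha / r)·B·A for a single linear layer. It illustrates the arithmetic only, not how the peft library is implemented; `matmul`, `lora_effective_weight` and `lora_param_count` are hypothetical helper names.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * B @ A; W stays frozen, only A and B are trained."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [0.0]]   # d_out x rank
a = [[0.0, 1.0]]     # rank x d_in
print(lora_effective_weight(w, b, a, alpha=2, rank=1))  # [[1.0, 2.0], [0.0, 1.0]]
print(lora_param_count(320, 320, rank=2))               # 1280
```

With `rank = 2` and `lora_alpha = 16` the update is scaled by 8. A 320x320 projection (chosen here only as an example size) gains 1,280 trainable parameters next to its 102,400 frozen ones.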

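# Imports used by the training script below (Workbook is assumed to be openpyxl's)
import glob
import os
import random

import numpy as np
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
from openpyxl import Workbook
from peft import LoraConfig, get_peft_model
from PIL import Image
from torch.utils.data import DataLoader, Dataset, random_split
from transformers import (CLIPModel, CLIPProcessor, CLIPTextModel, CLIPTokenizer,
                          get_cosine_schedule_with_warmup)

# Configuration - key optimization points
class Config:
    # Data parameters - reduced data volume
    data_root = "/kaggle/input/animals/Animal/Animal" # root path of the animal dataset
    output_dir = "/kaggle/working/output" # directory for all output files
    lora_model_dir = os.path.join(output_dir, "lora_models") # where LoRA weights are saved
    history_file = os.path.join(output_dir, "training_history.xlsx") # training history file
    sample_output_dir = os.path.join(output_dir, "validation_samples") # validation sample output
    evaluation_file = os.path.join(output_dir, "evaluation_results.xlsx") # evaluation results file
    comparison_dir = os.path.join(output_dir, "comparison_samples") # comparison sample output

    # Model parameters - lower resolution
    pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5" # SD 1.5 as the base model
    resolution = 256 # lower resolution to reduce compute (was 512)
    center_crop = True # center crop
    random_flip = True # random horizontal flip (data augmentation)

    # LoRA parameters - simplified LoRA
    rank = 2 # lower LoRA rank (was 4)
    lora_alpha = 16 # lower LoRA alpha (was 32)

    # Training parameters - key optimizations
    train_batch_size = 1 # batch size
    gradient_accumulation_steps = 4 # gradient accumulation steps
    num_train_epochs = 10 # number of epochs
    learning_rate = 1e-5 # learning rate
    lr_scheduler_type = "cosine_with_warmup" # learning-rate scheduler type
    lr_warmup_steps = 200 # warmup steps
    max_grad_norm = 0.5 # gradient clipping threshold
    use_ema = True # enable EMA for stability
    gradient_checkpointing = True # gradient checkpointing (saves GPU memory)
    mixed_precision = "fp16" # mixed-precision training

    # Early stopping - uses the CLIP score as the metric
    early_stopping_patience = 5 # patience
    early_stopping_delta = 0.02 # minimum CLIP-score improvement
    validation_split = 0.1 # validation fraction

    # Validation parameters
    num_validation_samples = 5 # number of animal classes generated during validation
    num_inference_steps = 20 # inference steps during validation
    num_final_inference_steps = 100 # inference steps for the final evaluation
    guidance_scale = 7.5 # guidance scale (CFG)

    # Maximum samples per class
    max_samples_per_class = 100 # cap on the samples used per animal class

    # Evaluation parameters
    num_evaluation_samples = 10 # number of evaluation samples
    clip_model_name = "openai/clip-vit-base-patch32" # CLIP model name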

# 1. Data processing and preparation - with a per-class sample cap
class AnimalDataset(Dataset):
    def __init__(self, data_root, tokenizer, size=384, center_crop=True, random_flip=True, max_samples_per_class=100):
        self.data_root = data_root
        self.tokenizer = tokenizer
        self.size = size # target resolution
        self.center_crop = center_crop
        self.random_flip = random_flip
        self.max_samples_per_class = max_samples_per_class
        # Collect every image path and its class (animal name)
        self.image_paths = []
        self.class_names = []
        # Sub-folders are assumed to be named after the animal in English
        subfolders = [f.name for f in os.scandir(data_root) if f.is_dir()]
        for class_name in subfolders:
            class_dir = os.path.join(data_root, class_name)
            image_files = glob.glob(os.path.join(class_dir, "*.jpg")) + \
                          glob.glob(os.path.join(class_dir, "*.png")) + \
                          glob.glob(os.path.join(class_dir, "*.jpeg"))
            # Cap the number of samples per class
            if len(image_files) > max_samples_per_class:
                image_files = random.sample(image_files, max_samples_per_class)
            for img_path in image_files:
                self.image_paths.append(img_path)
                self.class_names.append(class_name)
        # Prompt templates shared by all classes
        self.prompt_templates = [
            "a photo of a {}",
            "a high quality image of a {}",
            "a clear picture of a {}",
            "a realistic image of a {}",
            "a cute {}",
            "a wild {} in its natural habitat",
            "a close-up of a {}"
        ]

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        class_name = self.class_names[idx]
        # Load and preprocess the image
        image = Image.open(image_path).convert("RGB")
        # Resize, optionally with a center crop
        if self.center_crop:
            # aspect-preserving center crop followed by a resize
            image = self._center_crop(image)
        else:
            image = image.resize((self.size, self.size), Image.Resampling.LANCZOS)
        # Random horizontal flip (data augmentation)
        if self.random_flip and random.random() < 0.5:
            image = image.transpose(Image.FLIP_LEFT_RIGHT)
        # Convert to a CHW float tensor in [-1, 1]
        image_tensor = (torch.tensor(np.array(image).astype(np.float32)/127.5)-1.0).permute(2,0,1)
        # Pick a random prompt template for this image
        prompt_template = random.choice(self.prompt_templates)
        prompt = prompt_template.format(class_name)
        # Tokenize the prompt
        tokenized_input = self.tokenizer(
            prompt,
            max_length=self.tokenizer.model_max_length,
            padding="max_length",
            truncation=True,
            return_tensors="pt"
        )
        input_ids = tokenized_input.input_ids.squeeze(0)
        return {
            "pixel_values": image_tensor,
            "input_ids": input_ids,
            "prompt": prompt,
            "class_name": class_name
        }

    def _center_crop(self, image):
        # Crop the largest centered square, then resize to the target size
        width, height = image.size
        new_size = min(width, height)
        left = (width - new_size)/2
        top = (height - new_size)/2
        right = (width + new_size)/2
        bottom = (height + new_size)/2
        image = image.crop((left, top, right, bottom))
        image = image.resize((self.size, self.size), Image.Resampling.LANCZOS)
        return image
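
The crop box computed by `_center_crop` can be checked in isolation. The sketch below is an integer-only variant (an assumption of this example, chosen to avoid fractional pixel coordinates); `center_crop_box` is a hypothetical helper, not part of the training script.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


print(center_crop_box(640, 480))  # (80, 0, 560, 480)
print(center_crop_box(300, 500))  # (0, 100, 300, 400)
```

The returned tuple has the same layout as the box passed to `Image.crop` in the method above.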

# Early stopping (PyTorch-side implementation) - uses the CLIP score as the metric
class EarlyStopping:
    def __init__(self, patience=3, delta=0.05, verbose=False):
        self.patience = patience
        self.delta = delta
        self.verbose = verbose
        self.counter = 0
        self.best_score = None
        self.early_stop = False

    def __call__(self, clip_score):
        if self.best_score is None:
            self.best_score = clip_score
        elif clip_score < self.best_score + self.delta:
            self.counter += 1
            if self.verbose:
                print(f"EarlyStopping counter: {self.counter} out of {self.patience}")
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = clip_score
            self.counter = 0

# Prepare the UNet for LoRA - uses the peft library
def prepare_unet_for_lora(unet, rank=2, alpha=16):
    # Configure LoRA on the attention projections
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=alpha,
        target_modules=["to_q","to_k","to_v","to_out.0"],
        lora_dropout=0.0,
        bias="none",
    )
    # Wrap the UNet with LoRA adapters
    unet = get_peft_model(unet, lora_config)
    unet.print_trainable_parameters()
    return unet

# Compute the validation CLIP score
def compute_validation_clip_score(config, unet, text_encoder, vae, tokenizer, device):
    # List all animal classes
    animal_classes = [f.name for f in os.scandir(config.data_root) if f.is_dir()]
    # Randomly pick the animals used for validation
    selected_animals = random.sample(animal_classes, min(config.num_validation_samples, len(animal_classes)))
    print(f"Selected animals for validation CLIP score: {selected_animals}")
    # Build the generation pipeline
    pipe = StableDiffusionPipeline.from_pretrained(
        config.pretrained_model_name_or_path,
        text_encoder=text_encoder,
        vae=vae,
        unet=unet,
        tokenizer=tokenizer,
        safety_checker=None,
        torch_dtype=torch.float16 if config.mixed_precision == "fp16" else torch.float32
    ).to(device)
    # Load the CLIP model and processor
    clip_model = CLIPModel.from_pretrained(config.clip_model_name).to(device)
    clip_processor = CLIPProcessor.from_pretrained(config.clip_model_name)
    clip_scores = []
    for animal in selected_animals:
        prompt = f"a high quality photo of a {animal}"
        # Generate an image
        with torch.autocast(device.type):
            image = pipe(
                prompt,
                num_inference_steps=config.num_inference_steps,
                guidance_scale=config.guidance_scale,
                height=config.resolution,
                width=config.resolution
            ).images[0]
        # Compute the CLIP score
        with torch.no_grad():
            # Preprocess the image and the text
            inputs = clip_processor(
                text=[prompt],
                images=image,
                return_tensors="pt",
                padding=True
            ).to(device)
            # Run CLIP
            outputs = clip_model(**inputs)
            # Similarity (CLIP score)
            logits_per_image = outputs.logits_per_image # image-text similarity
            clip_score = logits_per_image.item()
            print(f"Animal: {animal}, CLIP Score: {clip_score:.4f}")
            clip_scores.append(clip_score)
    avg_clip_score = np.mean(clip_scores)
    print(f"Average Validation CLIP Score: {avg_clip_score:.4f}")
    return avg_clip_score
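
The number read from `logits_per_image` is a scaled cosine similarity between the image embedding and the text embedding. The sketch below shows that arithmetic on toy vectors. The scale of 100 is assumed as the approximate value of CLIP's learned logit scale in the released checkpoints, and `clip_score` here is a stand-in for illustration, not the library call.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_score(image_embedding, text_embedding, logit_scale=100.0):
    """Scaled cosine similarity, mirroring what logits_per_image reports."""
    return logit_scale * cosine_similarity(image_embedding, text_embedding)


print(round(clip_score([1.0, 0.0], [1.0, 1.0]), 2))  # 70.71
```

Read this way, a checkpoint name such as clip-31.475.safetensors corresponds to an average cosine similarity of roughly 0.31 between generated images and their prompts.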

# 2. Training function (with early stopping and history logging)
def train_lora_with_earlystopping(config):
    # Initialize the model components
    tokenizer = CLIPTokenizer.from_pretrained(
        config.pretrained_model_name_or_path,
        subfolder="tokenizer"
    )
    text_encoder = CLIPTextModel.from_pretrained(
        config.pretrained_model_name_or_path,
        subfolder="text_encoder"
    )
    vae = AutoencoderKL.from_pretrained(
        config.pretrained_model_name_or_path,
        subfolder="vae"
    )
    unet = UNet2DConditionModel.from_pretrained(
        config.pretrained_model_name_or_path,
        subfolder="unet"
    )
    # Add LoRA adapters to the UNet
    unet = prepare_unet_for_lora(unet, config.rank, config.lora_alpha)
    # Set up the noise scheduler
    noise_scheduler = DDPMScheduler.from_pretrained(
        config.pretrained_model_name_or_path,
        subfolder="scheduler"
    )
    # Enable gradient checkpointing to save GPU memory
    if config.gradient_checkpointing:
        unet.enable_gradient_checkpointing()
    # Move the models to the GPU if one is available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    text_encoder.to(device)
    vae.to(device)
    unet.to(device)

    # Set up the optimizer (LoRA parameters only)
    lora_params = []
    for name, param in unet.named_parameters():
        if param.requires_grad:
            # only parameters that require gradients are optimized
            lora_params.append(param)
    # A more stable optimizer configuration
    optimizer = torch.optim.AdamW(
        lora_params,
        lr=config.learning_rate,
        betas=(0.9, 0.999),
        eps=1e-8,
        weight_decay=0.01
    )
    # Prepare the dataset and data loaders
    full_dataset = AnimalDataset(
        config.data_root,
        tokenizer,
        size=config.resolution,
        center_crop=config.center_crop,
        random_flip=config.random_flip,
        max_samples_per_class=config.max_samples_per_class
    )
    # Split into training and validation sets
    val_size = int(len(full_dataset) * config.validation_split)
    train_size = len(full_dataset) - val_size
    train_dataset, val_dataset = random_split(full_dataset, [train_size, val_size])
    train_dataloader = DataLoader(
        train_dataset,
        batch_size=config.train_batch_size,
        shuffle=True,
        num_workers=2
    )
    val_dataloader = DataLoader(
        val_dataset,
        batch_size=config.train_batch_size,
        shuffle=False,
        num_workers=2
    )

    # Total number of optimizer updates
    num_update_steps_per_epoch = len(train_dataloader) // config.gradient_accumulation_steps
    max_train_steps = config.num_train_epochs * num_update_steps_per_epoch
    # Learning-rate scheduler
    lr_scheduler = get_cosine_schedule_with_warmup(
        optimizer=optimizer,
        num_warmup_steps=config.lr_warmup_steps,
        num_training_steps=max_train_steps
    )
    # Early stopping (CLIP score as the metric)
    early_stopping = EarlyStopping(
        patience=config.early_stopping_patience,
        delta=config.early_stopping_delta,
        verbose=True
    )
    # Excel workbook used to record the training history
    history_wb = Workbook()
    history_ws = history_wb.active
    history_ws.title = "Training History"
    history_ws.append(["Epoch", "Step", "Train Loss", "Validation Loss", "CLIP Score", "Learning Rate", "Best CLIP Score", "Gradient Norm"])

    # Training loop
    global_step = 0
    best_clip_score = 0.0
    for epoch in range(config.num_train_epochs):
        unet.train()
        total_loss = 0
        optimizer.zero_grad()
        current_grad_norm = 0.0  # initialize the gradient norm
        for step, batch in enumerate(train_dataloader):
            # Move the batch to the device
            pixel_values = batch["pixel_values"].to(device)
            input_ids = batch["input_ids"].to(device)
            # Encode the images into the latent space
            with torch.no_grad():
                latents = vae.encode(pixel_values).latent_dist.sample()
                latents = latents * 0.18215 # scaling factor
            # Sample noise and a random timestep per image
            noise = torch.randn_like(latents)
            bsz = latents.shape[0]
            timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (bsz,), device=device).long()
            # Add noise to the latents
            noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
            # Text embeddings
            with torch.no_grad():
                encoder_hidden_states = text_encoder(input_ids)[0]
            # Predict the noise residual
            noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
            # Loss, scaled for gradient accumulation
            loss = F.mse_loss(noise_pred, noise, reduction="mean") / config.gradient_accumulation_steps
            # Track the unscaled loss on every step so the epoch average is correct
            total_loss += loss.item() * config.gradient_accumulation_steps
            # Backpropagate
            loss.backward()
            # Update once every gradient_accumulation_steps batches
            if (step + 1) % config.gradient_accumulation_steps == 0:
                # Gradient clipping
                torch.nn.utils.clip_grad_norm_(lora_params, config.max_grad_norm)
                # Gradient norm (after clipping), for monitoring
                current_grad_norm = 0
                for p in lora_params:
                    if p.grad is not None:
                        param_norm = p.grad.data.norm(2)
                        current_grad_norm += param_norm.item() ** 2
                current_grad_norm = current_grad_norm ** 0.5
                # Update the parameters
                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()
                global_step += 1
                # Log every 50 optimizer updates
                if global_step % 50 == 0:
                    avg_loss = total_loss / (step + 1)
                    current_lr = lr_scheduler.get_last_lr()[0]
                    print(f"Epoch {epoch}, Step {global_step}, Loss: {avg_loss:.4f}, LR: {current_lr:.6f}, Grad Norm: {current_grad_norm:.6f}")

        # After each epoch, compute the validation loss and the CLIP score
        # (compute_validation_loss is a helper not shown in this article)
        val_loss = compute_validation_loss(unet, vae, text_encoder, val_dataloader, noise_scheduler, device)
        avg_train_loss = total_loss / len(train_dataloader)
        print(f"Epoch {epoch} completed. Train Loss: {avg_train_loss:.4f}, Validation Loss: {val_loss:.4f}")
        # CLIP score
        clip_score = compute_validation_clip_score(config, unet, text_encoder, vae, tokenizer, device)
        # Record the epoch in the history
        current_lr = lr_scheduler.get_last_lr()[0]
        history_ws.append([epoch, global_step, avg_train_loss, val_loss, clip_score, current_lr, best_clip_score, current_grad_norm])
        # Early-stopping check (based on the CLIP score)
        early_stopping(clip_score)
        # Save the best model
        if clip_score > best_clip_score:
            best_clip_score = clip_score
            # Save the LoRA weights (save_lora_weights is a helper not shown in this article)
            save_path = os.path.join(config.lora_model_dir, f"lora_weights_epoch_{epoch}.safetensors")
            save_lora_weights(unet, save_path)
            print(f"Saved best model with CLIP score: {best_clip_score:.4f}")
        # Save the training history
        history_wb.save(config.history_file)
        # Stop if early stopping was triggered
        if early_stopping.early_stop:
            print("Early stopping triggered")
            break
    print("Training completed!")
    return unet, text_encoder, vae, tokenizer
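
The learning-rate curve produced by a cosine schedule with linear warmup can be sketched without any framework. The multiplier below follows the usual half-cosine form; `lr_multiplier` is a hypothetical helper. The step count in the example is an upper bound derived from the configuration (100 classes, at most 100 images each, 10% held out, batch size 1, 4 accumulation steps, 10 epochs), not a measured value.

```python
import math


def lr_multiplier(step, warmup_steps, total_steps):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)            # linear warmup from 0 to 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))  # half-cosine decay


updates_per_epoch = (100 * 100 * 9 // 10) // 4   # at most 2250 optimizer updates
total_steps = 10 * updates_per_epoch             # at most 22500
base_lr = 1e-5
for step in (0, 100, 200, 11350, 22500):
    print(step, base_lr * lr_multiplier(step, 200, total_steps))
```

The warmup covers less than 1% of such a run, so almost all updates happen on the decaying part of the curve.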

# 3. Validation and sample generation
def generate_validation_samples(config, unet, text_encoder, vae, tokenizer):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # List all animal classes
    animal_classes = [f.name for f in os.scandir(config.data_root) if f.is_dir()]
    # Randomly pick num_validation_samples animals (5 by default)
    selected_animals = random.sample(animal_classes, config.num_validation_samples)
    print(f"Selected animals for validation: {selected_animals}")
    # Build the generation pipeline
    pipe = StableDiffusionPipeline.from_pretrained(
        config.pretrained_model_name_or_path,
        text_encoder=text_encoder,
        vae=vae,
        unet=unet,
        tokenizer=tokenizer,
        safety_checker=None, # disable the safety checker to speed up generation
        torch_dtype=torch.float16 if config.mixed_precision == "fp16" else torch.float32
    ).to(device)
    # Generate one image per animal
    all_images = []
    all_titles = []
    for animal in selected_animals:
        prompt = f"a high quality photo of a {animal}"
        # Generate the image (with more inference steps)
        with torch.autocast(device.type):
            image = pipe(
                prompt,
                num_inference_steps=config.num_final_inference_steps,
                guidance_scale=config.guidance_scale,
                height=config.resolution,
                width=config.resolution
            ).images[0]
        # Save the image
        save_path = os.path.join(config.sample_output_dir, f"{animal}.png")
        image.save(save_path)
        print(f"Generated image for {animal} saved at {save_path}")
        # Keep it for the comparison image
        all_images.append(image)
        all_titles.append(animal)
    # Build the comparison image (create_comparison_image is a helper not shown in this article)
    comparison_path = os.path.join(config.comparison_dir, "animal_comparison.png")
    create_comparison_image(all_images, all_titles, comparison_path)
    return selected_animals

# 4. Evaluation - generation quality measured with the CLIP score
def evaluate_with_clip_score(config, unet, text_encoder, vae, tokenizer):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Load the CLIP model and processor
    clip_model = CLIPModel.from_pretrained(config.clip_model_name).to(device)
    clip_processor = CLIPProcessor.from_pretrained(config.clip_model_name)
    # List all animal classes
    animal_classes = [f.name for f in os.scandir(config.data_root) if f.is_dir()]
    # Randomly pick the animals used for evaluation
    selected_animals = random.sample(animal_classes, config.num_evaluation_samples)
    print(f"Selected animals for evaluation: {selected_animals}")
    # Build the generation pipeline
    pipe = StableDiffusionPipeline.from_pretrained(
        config.pretrained_model_name_or_path,
        text_encoder=text_encoder,
        vae=vae,
        unet=unet,
        tokenizer=tokenizer,
        safety_checker=None,
        torch_dtype=torch.float16 if config.mixed_precision == "fp16" else torch.float32
    ).to(device)
    # Collected evaluation results
    evaluation_results = []
    for animal in selected_animals:
        prompt = f"a high quality photo of a {animal}"
        # Generate the image (with more inference steps)
        with torch.autocast(device.type):
            image = pipe(
                prompt,
                num_inference_steps=config.num_final_inference_steps,
                guidance_scale=config.guidance_scale,
                height=config.resolution,
                width=config.resolution
            ).images[0]
        # Save the image
        save_path = os.path.join(config.sample_output_dir, f"eval_{animal}.png")
        image.save(save_path)
        # Compute the CLIP score
        with torch.no_grad():
            # Preprocess the image and the text
            inputs = clip_processor(
                text=[prompt],
                images=image,
                return_tensors="pt",
                padding=True
            ).to(device)
            # Run CLIP
            outputs = clip_model(**inputs)
            # Similarity (CLIP score)
            logits_per_image = outputs.logits_per_image # image-text similarity
            clip_score = logits_per_image.item()
            print(f"Animal: {animal}, CLIP Score: {clip_score:.4f}")
            evaluation_results.append({
                "animal": animal,
                "prompt": prompt,
                "clip_score": clip_score,
                "image_path": save_path
            })
    # Average CLIP score
    avg_clip_score = np.mean([result["clip_score"] for result in evaluation_results])
    print(f"Average CLIP Score: {avg_clip_score:.4f}")
    # Save the evaluation results to Excel
    evaluation_wb = Workbook()
    evaluation_ws = evaluation_wb.active
    evaluation_ws.title = "Evaluation Results"
    evaluation_ws.append(["Animal", "Prompt", "CLIP Score", "Image Path"])
    for result in evaluation_results:
        evaluation_ws.append([result["animal"], result["prompt"], result["clip_score"], result["image_path"]])
    evaluation_ws.append([])
    evaluation_ws.append(["Average CLIP Score", avg_clip_score])
    evaluation_wb.save(config.evaluation_file)
    print(f"Evaluation results saved to {config.evaluation_file}")
    return evaluation_results, avg_clip_score

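# Imports used by the demo application below
import os
import time
import torch
from diffusers import AutoencoderKL, DDPMScheduler, StableDiffusionPipeline, UNet2DConditionModel
from PIL import Image, ImageEnhance
from PyQt5.QtCore import QThread, pyqtSignal
from PyQt5.QtGui import QFont
from PyQt5.QtWidgets import QHBoxLayout, QMainWindow, QMessageBox, QWidget
from transformers import CLIPTextModel, CLIPTokenizer

# Configuration class - with color-related parameters
class Config:
    pretrained_model_name_or_path = "model/LCM-runwayml-stable-diffusion-v1-5" # local base model path
    resolution = 512 # output resolution (512x512 by default)
    rank = 2 # LoRA rank (controls the capacity of the fine-tune)
    lora_alpha = 16 # LoRA scaling factor
    device = "cpu" # device (cpu/cuda; cuda requires a GPU build of PyTorch)
    num_final_inference_steps = 100 # default inference steps (more steps: finer result, longer runtime)
    guidance_scale = 5.0 # guidance scale (prompt influence; lower values give more natural colors)
    contrast_factor = 1.0 # contrast factor (1.0 keeps the image, <1 lowers contrast, >1 raises it)
    saturation_factor = 1.0 # saturation factor (same convention; affects color vividness)
    brightness_factor = 1.0 # brightness factor (same convention; affects overall lightness)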

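# Load LoRA weights
def load_lora_weights(unet, load_path):
    # Load the LoRA state dict from a local file onto the configured device
    lora_state_dict = torch.load(load_path, map_location=torch.device(Config.device))
    # Non-strict loading: the LoRA weights only cover some UNet layers.
    # Note that strict=False silently skips keys that do not exist in `unet`,
    # so the UNet must already expose matching LoRA parameter names.
    unet.load_state_dict(lora_state_dict, strict=False)
    return unet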

# Tokenizer loading with a fallback
def load_tokenizer_with_fix(model_path):
    try:
        # Try the normal path first: the tokenizer folder inside the model directory
        tokenizer = CLIPTokenizer.from_pretrained(
            os.path.join(model_path, "tokenizer")
        )
        return tokenizer
    except Exception as e:
        print(f"Error while loading the tokenizer: {e}")
        print("Trying to repair the tokenizer configuration...")
        # Fallback: point directly at vocab.json and merges.txt (the core tokenizer files)
        from transformers import CLIPTokenizerFast
        vocab_file = os.path.join(model_path, "tokenizer", "vocab.json")
        merges_file = os.path.join(model_path, "tokenizer", "merges.txt")
        if os.path.exists(vocab_file) and os.path.exists(merges_file):
            tokenizer = CLIPTokenizerFast(
                vocab_file=vocab_file,
                merges_file=merges_file,
                model_max_length=77, # fixed CLIP input length (longer is truncated, shorter is padded)
                pad_token="!", # padding token (brings inputs to a common length)
                additional_special_tokens=["<|startoftext|>","<|endoftext|>"] # special delimiter tokens
            )
            return tokenizer
        else:
            raise Exception(f"Tokenizer files not found: {vocab_file} or {merges_file}")

# Image color correction
def adjust_image_colors(image):
    """Adjust contrast, saturation and brightness so the image looks more natural."""
    # 1. Contrast (separation between light and dark detail)
    enhancer = ImageEnhance.Contrast(image)
    image = enhancer.enhance(Config.contrast_factor)
    # 2. Saturation (color vividness; avoids a gray cast)
    enhancer = ImageEnhance.Color(image)
    image = enhancer.enhance(Config.saturation_factor)
    # 3. Brightness (overall lightness; avoids under- or over-exposure)
    enhancer = ImageEnhance.Brightness(image)
    image = enhancer.enhance(Config.brightness_factor)
    return image
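
Each Pillow enhancer blends the image with a "degenerate" version of itself: `factor = 1.0` returns the original, `0.0` returns the degenerate image, and values above 1 extrapolate. The sketch below mimics this for single-channel pixel values. It is a simplified model under that assumption (black as the degenerate image for brightness, the mean gray level for contrast) and may differ from Pillow in rounding details.

```python
def blend(degenerate, pixel, factor):
    """Interpolate from the degenerate value towards the pixel, clamped to 0..255."""
    value = degenerate + factor * (pixel - degenerate)
    return max(0, min(255, round(value)))


def adjust_brightness(pixels, factor):
    """Brightness: the degenerate image is black."""
    return [blend(0, p, factor) for p in pixels]


def adjust_contrast(pixels, factor):
    """Contrast: the degenerate image is a flat gray at the mean level."""
    mean = round(sum(pixels) / len(pixels))
    return [blend(mean, p, factor) for p in pixels]


pixels = [50, 100, 150]
print(adjust_brightness(pixels, 0.5))  # [25, 50, 75]
print(adjust_contrast(pixels, 2.0))    # [0, 100, 200]
print(adjust_contrast(pixels, 1.0))    # [50, 100, 150]
```

This is why the three factors in `Config` default to 1.0: at that value the correction step leaves the generated image unchanged.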

# Model loader
class ModelLoader:
    def __init__(self, config, lora_model_path):
        self.config = config # global configuration
        self.lora_model_path = lora_model_path # path to the LoRA weights
        # Components are created later in load_models()
        self.tokenizer = None # text tokenizer
        self.text_encoder = None # text encoder (turns token ids into embeddings)
        self.vae = None # variational autoencoder (decodes latents to pixels)
        self.unet = None # denoising network (core of the diffusion process, updates the latents)
        self.pipe = None # assembled generation pipeline

    def load_models(self):
        # 1. Tokenizer (via the fallback loader, to survive path/config problems)
        self.tokenizer = load_tokenizer_with_fix(self.config.pretrained_model_name_or_path)
        # 2. Text encoder (CLIP; turns text into semantic embeddings)
        text_encoder_path = os.path.join(self.config.pretrained_model_name_or_path, "text_encoder")
        self.text_encoder = CLIPTextModel.from_pretrained(text_encoder_path)
        # 3. VAE (decodes the diffusion latents into image pixels)
        vae_path = os.path.join(self.config.pretrained_model_name_or_path, "vae")
        self.vae = AutoencoderKL.from_pretrained(vae_path)
        # 4. UNet (diffusion core; produces the latents by iterative denoising)
        unet_path = os.path.join(self.config.pretrained_model_name_or_path, "unet")
        self.unet = UNet2DConditionModel.from_pretrained(unet_path)
        # 5. Load the LoRA weights into the UNet (adapts the model to animal images)
        self.unet = load_lora_weights(self.unet, self.lora_model_path)
        # 6. Move all components to the configured device (cpu/cuda)
        self.text_encoder.to(self.config.device)
        self.vae.to(self.config.device)
        self.unet.to(self.config.device)
        # 7. Scheduler (controls the pace of the denoising steps)
        scheduler_path = os.path.join(self.config.pretrained_model_name_or_path, "scheduler")
        scheduler = DDPMScheduler.from_pretrained(scheduler_path)
        # 8. Assemble the pipeline (one object exposing all components)
        self.pipe = StableDiffusionPipeline(
            vae=self.vae,
            text_encoder=self.text_encoder,
            tokenizer=self.tokenizer,
            unet=self.unet,
            scheduler=scheduler,
            safety_checker=None, # safety checker off (avoids false positives on animal images)
            feature_extractor=None,
            requires_safety_checker=False
        )
        return self.pipe

# 生成线程类 - 增加色彩校正步骤
class GenerateThread(QThread):
    # 定义信号：生成完成（返回 PIL 图像）、错误（返回错误信息）、进度更新（进度百分比 + 剩余时间）
    finished = pyqtSignal(Image.Image)
    error = pyqtSignal(str)
    progress_updated = pyqtSignal(int, float)

    def __init__(self, pipe, animal_name, num_inference_steps, guidance_scale, contrast_factor, saturation_factor, brightness_factor):
        super().__init__()
        self.pipe = pipe # 生成流水线
        self.animal_name = animal_name # 目标动物名称（用户输入）
        self.num_inference_steps = num_inference_steps # 推理步数
        self.guidance_scale = guidance_scale # 引导尺度
        # 色彩调整参数（从 UI 获取，覆盖全局配置）
        self.contrast_factor = contrast_factor
        self.saturation_factor = saturation_factor
        self.brightness_factor = brightness_factor
        self.start_time = 0 # 生成开始时间（计算总耗时）
        self.step_times = [] # 每步耗时（估算剩余时间）

    def run(self):
        try:
            # 1. 优化提示词（增加环境/光照描述，提升生成质量）
            prompt = (
                f"a high quality photo of a {self.animal_name}, natural lighting, "
                f"realistic colors, in natural habitat, detailed texture"
            )
            # 2. 文本编码（生成'条件嵌入'和'无条件嵌入'，用于引导生成）
            with torch.no_grad():
                # 禁用梯度计算，减少内存占用
                # 条件嵌入：基于提示词的向量（引导模型生成符合提示的内容）
                text_inputs = self.pipe.tokenizer(
                    prompt,
                    padding="max_length", # 补全到 77 长度
                    max_length=self.pipe.tokenizer.model_max_length,
                    truncation=True, # 超过 77 长度截断
                    return_tensors="pt", # 返回 PyTorch 张量
                )
                text_input_ids = text_inputs.input_ids
                text_embeddings = self.pipe.text_encoder(text_input_ids.to(self.pipe.device))[0]
                # 无条件嵌入：基于空提示词的向量（作为对比，让模型更'关注'条件提示词）
                max_length = text_input_ids.shape[-1]
                uncond_input = self.pipe.tokenizer([""], padding="max_length", max_length=max_length, return_tensors="pt",)
                uncond_embeddings = self.pipe.text_encoder(uncond_input.input_ids.to(self.pipe.device))[0]
                # 合并两种嵌入（Stable Diffusion 要求输入格式：[无条件，条件]）
                text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
            # 3. 初始化 Latent（隐空间向量，扩散模型的'初始噪声'）
            latents = torch.randn(
                (1, self.pipe.unet.config.in_channels, Config.resolution // 8, Config.resolution // 8),
                # 尺寸=分辨率/8（VAE 下采样比例）
                generator=torch.Generator(device=Config.device), # 随机生成器（保证可复现）
                device=Config.device,
            )
            # 4. 配置调度器（设置推理步数）
            self.pipe.scheduler.set_timesteps(self.num_inference_steps, device=Config.device)
            # 5. 迭代扩散去噪（核心步骤：逐步将噪声转为符合提示词的 latent）
            self.start_time = time.time()
            self.step_times = []
            for i, t in enumerate(self.pipe.scheduler.timesteps):
                step_start_time = time.time()
                # 复制 latent（对应两种嵌入：无条件 + 条件）
                latent_model_input = torch.cat([latents]*2)
                # 调度器缩放输入（匹配当前去噪步骤的噪声水平）
                latent_model_input = self.pipe.scheduler.scale_model_input(latent_model_input, t)
                # UNet 预测噪声（输入：当前 latent+ 时间步 t+ 文本嵌入，输出：预测的噪声）
                noise_pred = self.pipe.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
                # 分离噪声（无条件预测 vs 条件预测）
                noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
                # 引导式去噪（用引导尺度控制提示词影响：噪声=无条件噪声 + 引导尺度*(条件噪声 - 无条件噪声)）
                noise_pred = noise_pred_uncond + self.guidance_scale *(noise_pred_text - noise_pred_uncond)
                # 调度器更新 latent（根据预测噪声去除当前步骤的噪声）
                latents = self.pipe.scheduler.step(noise_pred, t, latents).prev_sample
                # 6. 计算进度与剩余时间（反馈给 UI）
                step_time = time.time() - step_start_time
                self.step_times.append(step_time)
                progress = int((i + 1) / self.num_inference_steps * 100) # 进度百分比
                steps_remaining = self.num_inference_steps -(i + 1) # 剩余步数
                # 估算剩余时间（用最近 5 步的平均耗时，避免初始步骤波动影响）
                if len(self.step_times) >= 5:
                    avg_step_time = sum(self.step_times[-5:]) / 5
                else:
                    avg_step_time = sum(self.step_times) / len(self.step_times) if self.step_times else 0
                remaining_time = avg_step_time * steps_remaining
                # 发送进度信号（UI 接收后更新进度条）
                self.progress_updated.emit(progress, remaining_time)
            # 7. Decode the latent into an image (the VAE maps latent vectors back to pixels)
            latents = latents / 0.18215  # undo the VAE latent scaling factor (0.18215 for Stable Diffusion v1.x)
            with torch.no_grad():
                image = self.pipe.vae.decode(latents).sample
            # 8. Post-processing (normalize → PIL → color correction)
            image = (image / 2 + 0.5).clamp(0, 1)  # map [-1, 1] to [0, 1]
            image = image.cpu().permute(0, 2, 3, 1).float().numpy()  # (B, C, H, W) → (B, H, W, C)
            image = (image[0] * 255).round().astype("uint8")  # first image of the batch as 8-bit pixels (0-255)
            image = Image.fromarray(image)  # convert to a PIL image
            # Apply color correction (adjust_image_colors is defined earlier)
            image = adjust_image_colors(image)
            # Make sure the output matches the configured resolution
            if image.size != (Config.resolution, Config.resolution):
                image = image.resize((Config.resolution, Config.resolution), Image.LANCZOS)  # high-quality resampling
            # 9. Emit the finished signal (the UI displays the image on receipt)
            self.finished.emit(image)
        except Exception as e:
            # Emit the error signal (the UI shows a message box on receipt)
            self.error.emit(str(e))
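
The guidance step in the loop above can be looked at in isolation. The following is a minimal sketch in plain Python, using short lists in place of tensors; the function name `apply_guidance` and the toy values are illustrative assumptions and do not appear in the project code.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance on plain lists.

    Moves the prediction away from the unconditional estimate along the
    direction of the text-conditioned estimate.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_text)]


uncond = [0.0, 1.0, -2.0]  # toy unconditional prediction (hypothetical values)
text = [1.0, 1.0, 0.0]     # toy text-conditioned prediction (hypothetical values)

for scale in (0.0, 1.0, 7.5):
    print(scale, apply_guidance(uncond, text, scale))
```

With a scale of 0 the result equals the unconditional prediction, with a scale of 1 it equals the text-conditioned one, and larger values extrapolate past it, which is why a higher guidance scale makes the prompt weigh more heavily.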

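The remaining-time estimate used by the generation thread can also be written as a small standalone function. This is a sketch under the assumption that step durations are collected in a list, as `self.step_times` is above; the name `estimate_remaining` and the sample timings are hypothetical.

```python
def estimate_remaining(step_times, total_steps, window=5):
    """Estimate the seconds left from the mean of the most recent step durations."""
    if not step_times:
        return 0.0
    recent = step_times[-window:]  # at most `window` latest steps
    avg = sum(recent) / len(recent)
    return avg * (total_steps - len(step_times))


# A slow first step (for example warm-up) no longer affects the estimate
# once five newer steps have been recorded.
times = [9.0, 3.0, 3.0, 3.0, 3.0, 3.0]
print(estimate_remaining(times, total_steps=20))  # 14 steps left at 3.0 s each
```
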
class AnimalGeneratorApp(QMainWindow):
    def __init__(self):
        super().__init__()
        self.pipe = None  # generation pipeline (assigned once the model is loaded)
        self.current_image = None  # most recently generated image
        self.initUI()  # build the UI

    def initUI(self):
        # 1. Basic settings (font, window title, size)
        font = QFont("SimHei")  # font with Chinese glyphs (avoids garbled text)
        font.setPointSize(10)
        self.setFont(font)
        self.setWindowTitle('Animal Image Generator')
        self.setGeometry(100, 100, 1100, 800)  # window position and size
        # 2. Central widget and main layout (two columns: control panel + image area)
        central_widget = QWidget()
        self.setCentralWidget(central_widget)
        main_layout = QHBoxLayout(central_widget)
        main_layout.setContentsMargins(15, 15, 15, 15)
        main_layout.setSpacing(20)
        # 3. Left: control panel (model settings, generation parameters, progress)
        control_panel = self.create_control_panel()
        main_layout.addWidget(control_panel, 3)  # stretch factor 3
        # 4. Right: image area (default image, generated image, watermark)
        image_panel = self.create_image_panel()
        main_layout.addWidget(image_panel, 5)  # stretch factor 5 (the image area gets more width)
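
To get a feel for the 3:5 split, the arithmetic can be done by hand. The sketch below assumes the layout is exactly as wide as the 1100 px window and ignores size hints and minimum sizes, which Qt also takes into account, so the real widths may differ; `split_by_stretch` is a hypothetical helper, not a Qt function.

```python
def split_by_stretch(total_width, margins, spacing, stretches):
    """Share the width left after margins and spacing in proportion to the stretch factors."""
    left, right = margins
    usable = total_width - left - right - spacing * (len(stretches) - 1)
    total = sum(stretches)
    return [usable * s / total for s in stretches]


# Values taken from initUI: 1100 px wide, 15 px margins, 20 px spacing, stretch 3 and 5
control_w, image_w = split_by_stretch(1100, (15, 15), 20, [3, 5])
print(control_w, image_w)
```

Under these assumptions the control panel gets roughly 394 px and the image area roughly 656 px, enough for the 512 px minimum set on the image labels.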

# Method of AnimalGeneratorApp
def generate_image(self):
    # Pre-checks (avoid pointless work)
    if not self.pipe:
        QMessageBox.warning(self, "Error", "Please load the model first")
        return
    animal_name = self.animal_edit.text().strip()
    if not animal_name:
        QMessageBox.warning(self, "Error", "Please enter an animal name")
        return
    # 1. Read the parameters from the UI (these override the global config)
    Config.resolution = self.resolution_spin.value()
    num_inference_steps = self.steps_spin.value()
    guidance_scale = self.guidance_spin.value()
    contrast_factor = self.contrast_spin.value()
    saturation_factor = self.saturation_spin.value()
    brightness_factor = self.brightness_spin.value()
    # 2. Update the UI state (disable the generate/save buttons, show the progress bar)
    self.generate_btn.setEnabled(False)
    self.save_btn.setEnabled(False)
    self.progress_bar.setVisible(True)
    self.progress_bar.setRange(0, 100)
    self.progress_bar.setValue(0)
    self.progress_label.setText("Preparing to generate (the first run may take a while)...")
    self.statusBar().showMessage("Generating image, please wait...")
    # 3. Start the generation thread (pass the parameters, connect the signals)
    self.gen_thread = GenerateThread(
        self.pipe, animal_name, num_inference_steps, guidance_scale,
        contrast_factor, saturation_factor, brightness_factor
    )
    # Signal wiring: finished → show image, error → message box, progress → progress bar
    self.gen_thread.finished.connect(self.on_generation_finished)
    self.gen_thread.error.connect(self.on_generation_error)
    self.gen_thread.progress_updated.connect(self.on_progress_updated)
    self.gen_thread.start()  # start the thread (runs its run method)
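
The worker-plus-signals pattern can be tried without Qt. The sketch below is a standard-library analogue, not the project's `GenerateThread`: a `queue.Queue` stands in for the three Qt signals, and the `worker` function and the `"image"` placeholder are assumptions made for illustration.

```python
import queue
import threading


def worker(events, steps):
    """Stand-in for GenerateThread.run: report progress, then a result or an error."""
    try:
        for i in range(steps):
            # ... one denoising step would run here ...
            events.put(("progress", int((i + 1) / steps * 100)))
        events.put(("finished", "image"))  # placeholder for the generated PIL image
    except Exception as exc:
        events.put(("error", str(exc)))


events = queue.Queue()
thread = threading.Thread(target=worker, args=(events, 4))
thread.start()
thread.join()  # a GUI would not block here; it would keep handling events

received = []
while not events.empty():
    received.append(events.get())
print(received)
```

The heavy work stays off the main thread, and the main thread only consumes small messages, which is the same division of labour that keeps the PyQt5 window responsive during generation.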

# Completion callback (method of AnimalGeneratorApp): receives the thread's signal and shows the image
def on_generation_finished(self, image):
    self.current_image = image  # keep the image so it can be saved later
    pixmap = self.pil2pixmap(image)  # convert the PIL image to a PyQt5 QPixmap for display
    # Show the image (scaled to the image area, aspect ratio preserved)
    self.image_label.setPixmap(pixmap.scaled(
        self.image_label.width(),
        self.image_label.height(),
        Qt.KeepAspectRatio,
        Qt.SmoothTransformation
    ))
    # Restore the UI state (re-enable the buttons, update the messages)
    self.generate_btn.setEnabled(True)
    self.save_btn.setEnabled(True)
    self.progress_bar.setValue(100)
    self.progress_label.setText("Generation complete!")
    self.statusBar().showMessage("Image generated successfully!")
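
Scaling with the aspect ratio preserved comes down to picking the smaller of the two scale factors. The helper below is a plain-Python sketch of that idea; `fit_keep_aspect` is a hypothetical name, and Qt's own rounding may differ by a pixel.

```python
def fit_keep_aspect(src_w, src_h, box_w, box_h):
    """Largest size with the source aspect ratio that still fits inside the box."""
    scale = min(box_w / src_w, box_h / src_h)
    return round(src_w * scale), round(src_h * scale)


print(fit_keep_aspect(512, 512, 700, 600))   # square image, limited by the box height
print(fit_keep_aspect(1024, 512, 600, 600))  # wide image, limited by the box width
```
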

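# Method of AnimalGeneratorApp
def create_image_panel(self):
    panel = QWidget()
    layout = QVBoxLayout(panel)
    # Image container (white card with a drop shadow)
    image_container = QWidget()
    image_container.setStyleSheet("""
        background-color: white;
        border-radius: 8px;
        padding: 15px;
    """)
    # Qt Style Sheets have no box-shadow property, so the shadow is a graphics effect
    # (needs QGraphicsDropShadowEffect from PyQt5.QtWidgets and QColor from PyQt5.QtGui)
    shadow = QGraphicsDropShadowEffect(image_container)
    shadow.setBlurRadius(8)
    shadow.setOffset(0, 4)
    shadow.setColor(QColor(0, 0, 0, 25))  # roughly rgba(0, 0, 0, 0.1)
    image_container.setGraphicsEffect(shadow)
    image_layout = QVBoxLayout(image_container)
    # 1. Bottom layer: default image (70% opacity)
    self.default_image_label = QLabel()
    self.default_image_label.setAlignment(Qt.AlignCenter)
    self.default_image_label.setMinimumSize(512, 512)
    self.load_default_image()  # load the placeholder image (e.g. 'Please generate an animal image')
    # 2. Middle layer: generated image (empty at first, filled after generation)
    self.image_label = QLabel()
    self.image_label.setAlignment(Qt.AlignCenter)
    self.image_label.setMinimumSize(512, 512)
    self.image_label.setStyleSheet("background-color: transparent;")  # transparent, so the bottom layer stays visible
    # 3. Top layer: watermark (aligned to the bottom-right corner)
    self.watermark_label = QLabel("Made by: 热心市民小周")
    self.watermark_label.setStyleSheet("""
        color: rgba(100, 100, 100, 150); /* semi-transparent gray */
        font-size: 12px;
        padding: 5px;
        background-color: rgba(255, 255, 255, 100);
        border-radius: 2px;
    """)
    self.watermark_label.setAlignment(Qt.AlignRight | Qt.AlignBottom)
    # A grid layout stacks the layers (widgets in the same cell; later ones sit on top)
    grid_layout = QGridLayout()
    grid_layout.addWidget(self.default_image_label, 0, 0)  # bottom layer
    grid_layout.addWidget(self.image_label, 0, 0)  # middle layer
    grid_layout.addWidget(self.watermark_label, 0, 0)  # top layer
    image_layout.addLayout(grid_layout)
    layout.addWidget(image_container, 1)
    return panel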

Inference steps	Avg. inference time (CPU)	Avg. total time
20	68.64s	456.65s
100	349.35s	823.45s
200	683.96s	1209.65s
400	1356.86s	1863.25s
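
The table can be turned into a per-step cost with a few lines of arithmetic. The sketch below copies the four rows as they are; treating the difference between total and inference time as "time outside the denoising loop" is an assumption, since the article does not define how the total is measured.

```python
# (steps, avg. inference time in s, avg. total time in s), copied from the table
rows = [
    (20, 68.64, 456.65),
    (100, 349.35, 823.45),
    (200, 683.96, 1209.65),
    (400, 1356.86, 1863.25),
]

for steps, infer_s, total_s in rows:
    per_step = infer_s / steps    # seconds per denoising step
    overhead = total_s - infer_s  # assumed: time spent outside the denoising loop
    print(f"{steps:>3} steps: {per_step:.2f} s/step, {overhead:.2f} s outside the loop")
```

On these figures the CPU cost stays close to 3.4 to 3.5 s per step regardless of the step count, so inference time grows roughly linearly with the number of steps.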
