An Animal Image Generation System for 100 Species Based on LoRA and Stable Diffusion
This project presents an animal image generation system built on Stable Diffusion and LoRA. The system covers 100 animal categories, is trained with PyTorch, and uses LoRA for parameter-efficient fine-tuning. It includes a complete training pipeline, data augmentation, early stopping, and a CLIP-score-based evaluation scheme. A PyQt5 GUI supports real-time parameter adjustment and image preview. Memory optimization and color correction are implemented, and both CPU and GPU environments are supported.

Source code: https://github.com/xiaozhou-alt/Animals_Generationn
This is an animal image generation system based on Stable Diffusion and LoRA. It generates high-quality animal images from text descriptions, ships with a complete training pipeline and a user-friendly GUI, and supports custom parameter adjustment and real-time image generation.
Key Features
Some generated animal images:
Animals_Creation/
├── README.md
├── demo.gif # demo animation
├── demo.mp4 # demo video
├── demo.py # main demo script
├── icons/ # icon resources
├── train.py
├── log/ # logs
├── model/
│ └── LCM-runwayml-stable-diffusion-v1-5/ # Stable Diffusion model
│ ├── feature_extractor/ # feature extractor
│ ├── model_index.json # model index file
│ ├── safety_checker/ # safety checker
│ ├── scheduler/ # scheduler
│ ├── text_encoder/ # text encoder
│ ├── tokenizer/ # tokenizer
│ ├── unet/ # UNet model
│ └── vae/ # variational autoencoder
├── output/
│ ├── evaluation_results.xlsx # evaluation results (Excel)
│ ├── lora_models/ # LoRA model weights
│ │ └── clip-31.475.safetensors
│ ├── training_history.xlsx # training history
│ └── pic/
└── requirements.txt
The dataset used in this project contains images of 100 animal categories. Because the images were scraped from the web and cleaned entirely by hand, and the dataset is fairly large, some animal folders contain roughly 1%-1.5% noisy images. The dataset is organized as follows:
During training, data augmentation (rotation, translation, scaling, brightness adjustment, etc.) is used to expand the training samples and improve the model's generalization.
See class.txt for the list of animal categories:
antelope badger bat …
Dataset download: 100-species animal recognition dataset (ScienceDB)
Citation: if you use this dataset, please cite it as follows:
Haojing ZHOU. 100 种动物识别数据集 [DS/OL]. V1. Science Data Bank, 2025 [2025-08-30]. https://cstr.cn/31253.11.sciencedb.29221. CSTR: 31253.11.sciencedb.29221.
or
@misc{动物识别, author = {Haojing ZHOU}, title = {100 种动物识别数据集}, year = {2025}, doi = {10.57760/sciencedb.29221}, url = {https://doi.org/10.57760/sciencedb.29221}, note = {CSTR:31253.11.sciencedb.29221}, publisher = {ScienceDB}}
Stable Diffusion uses the Latent Diffusion Model architecture: by compressing high-dimensional images into a low-dimensional latent space before running the diffusion process, it greatly improves computational efficiency. The model consists of four core components: a variational autoencoder (VAE), a CLIP text encoder, a U-Net, and a noise scheduler (DDPMScheduler).
In Stable Diffusion, the VAE handles the two-way conversion between images and the latent space. Its encoder compresses an input image x into a latent representation z, and its decoder reconstructs the latent representation back into an image x̂. The code uses a pretrained AutoencoderKL model:
vae = AutoencoderKL.from_pretrained(
    config.pretrained_model_name_or_path,
    subfolder="vae"
)
The VAE learns the latent distribution of the data via variational inference. For an input image x, the encoder outputs the mean μ and variance σ² of the latent distribution, and the latent representation is sampled with the reparameterization trick: z = μ + ε · σ, ε ~ N(0, I)
In this project, the encoded latents are then scaled:
latents = vae.encode(pixel_values).latent_dist.sample()
latents = latents * 0.18215  # scaling factor
The scaling factor 0.18215 is a constant fixed when Stable Diffusion was pretrained; it normalizes the VAE's latent distribution to a range better suited to the diffusion process.
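The reparameterization trick above can be sketched in a few lines of numpy. This is an illustration with made-up toy values, not the project's actual latents:

```python
import numpy as np

# Minimal sketch of the VAE reparameterization trick (toy values, not real
# encoder output): sample z = mu + eps * sigma so the sampling step stays
# differentiable with respect to mu and sigma.
rng = np.random.default_rng(0)

mu = np.array([0.5, -1.0, 2.0])       # encoder-predicted mean
log_var = np.array([0.0, -2.0, 1.0])  # encoder-predicted log-variance
sigma = np.exp(0.5 * log_var)

eps = rng.standard_normal(mu.shape)   # eps ~ N(0, I)
z = mu + eps * sigma                  # differentiable sample

# Stable Diffusion then scales the latents by a fixed constant:
latents = z * 0.18215
```

With ε = 0 the sample collapses to the mean, which is why the gradient can flow through μ and σ during training.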
Text guidance is the defining feature of Stable Diffusion, implemented by the CLIP (Contrastive Language-Image Pretraining) text encoder. It converts a text description into a fixed-dimensional vector representation, linking text and image semantics:
text_encoder = CLIPTextModel.from_pretrained(
    config.pretrained_model_name_or_path,
    subfolder="text_encoder"
)
tokenizer = CLIPTokenizer.from_pretrained(
    config.pretrained_model_name_or_path,
    subfolder="tokenizer"
)
The CLIP text encoder is trained with contrastive learning, so its text embeddings t live in the same semantic space as the image embeddings. For an input text w (e.g. "a photo of a cat"), tokenization and encoding yield the text features: t = text_encoder(tokenizer(w))
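The shared embedding space means image-text matching reduces to cosine similarity scaled by a temperature. The toy vectors below stand in for real CLIP embeddings and only illustrate the mechanism:

```python
import numpy as np

# Sketch of CLIP-style matching: embeddings are L2-normalized and compared
# by dot product (cosine similarity), scaled by a learned temperature.
# The vectors here are toy stand-ins, not real CLIP output.
def cosine_similarity(a, b):
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

text_emb = np.array([0.2, 0.9, 0.1])        # "a photo of a cat"
cat_img_emb = np.array([0.25, 0.85, 0.05])  # close to the text
dog_img_emb = np.array([-0.7, 0.1, 0.7])    # far from the text

logit_scale = 100.0  # CLIP-style temperature; the "CLIP score" used later
score_cat = logit_scale * cosine_similarity(text_emb, cat_img_emb)
score_dog = logit_scale * cosine_similarity(text_emb, dog_img_emb)
```

This is the same quantity that `logits_per_image` reports in the evaluation code further below.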
In this project, a set of diverse prompt templates makes the text embeddings more robust:
self.prompt_templates = [
    "a photo of a {}",
    "a high quality image of a {}",
    # more templates...
]
The U-Net is the core diffusion module of Stable Diffusion and is responsible for predicting noise in the latent space. It takes the noisy latent z_t, the timestep t, and the text embedding c as input, and outputs the noise prediction ε_θ(z_t, t, c):
unet = UNet2DConditionModel.from_pretrained(
    config.pretrained_model_name_or_path,
    subfolder="unet"
)
# noise prediction
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
The U-Net uses an encoder-decoder structure with skip connections to preserve detail, and injects timestep embeddings and text-condition embeddings for conditional generation. The loss is the mean squared error between the predicted and true noise: L = E_{z_0, ε, t} [ ||ε - ε_θ(z_t, t, c)||² ]
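The loss itself is nothing more than an elementwise MSE over latent tensors; a numpy sketch with random arrays standing in for the model's prediction makes that concrete:

```python
import numpy as np

# Sketch of the diffusion training loss: MSE between the true noise and the
# model's prediction. The arrays stand in for latent tensors; no real model.
rng = np.random.default_rng(0)

noise = rng.standard_normal((2, 4, 8, 8))                    # "true" noise eps
noise_pred = noise + 0.1 * rng.standard_normal(noise.shape)  # imperfect prediction

mse_loss = np.mean((noise_pred - noise) ** 2)

# A perfect predictor would drive the loss to zero:
perfect_loss = np.mean((noise - noise) ** 2)
```

In the real training loop this corresponds to `F.mse_loss(noise_pred, noise, reduction="mean")`.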
The noise scheduler controls how noise is added and removed during diffusion. During training it adds noise to clean samples according to a specific schedule; at inference it generates images step by step from pure noise:
noise_scheduler = DDPMScheduler.from_pretrained(
    config.pretrained_model_name_or_path,
    subfolder="scheduler"
)
The diffusion process follows a Markov chain; in the forward process the noise level increases step by step: z_t = √α_t z_{t-1} + √(1-α_t) ε, ε ~ N(0, I)
where α_t are noise coefficients predefined by the scheduler. In the project, noise is added through the scheduler:
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
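What `add_noise` computes can be written in closed form using the cumulative product of the per-step α values. The linear beta schedule below is illustrative, not necessarily the project's exact schedule:

```python
import numpy as np

# Sketch of the closed-form DDPM forward process:
#   z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps
# where alpha_bar_t is the cumulative product of the per-step alphas.
rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # illustrative linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def add_noise(z0, eps, t):
    return np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps

z0 = rng.standard_normal((4, 8, 8))
eps = rng.standard_normal(z0.shape)

z_early = add_noise(z0, eps, 10)   # still close to the clean latent
z_late = add_noise(z0, eps, 999)   # almost pure noise
```

At large t, ᾱ_t is nearly zero, so the sample is essentially pure Gaussian noise — which is exactly why inference can start from `torch.randn`.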
Fully fine-tuning a large pretrained model requires enormous compute. LoRA (Low-Rank Adaptation) freezes the pretrained weights and trains only low-rank matrices, enabling efficient fine-tuning:
def prepare_unet_for_lora(unet, rank=2, alpha=16):
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=alpha,
        target_modules=["to_q", "to_k", "to_v", "to_out.0"],
        lora_dropout=0.0,
        bias="none",
    )
    unet = get_peft_model(unet, lora_config)
    return unet
The core idea of LoRA is to express the weight update as a low-rank factorization. For a pretrained weight W ∈ R^{d×k}, LoRA learns two low-rank matrices W_B ∈ R^{d×r} and W_A ∈ R^{r×k} with r ≪ min(d, k) to approximate the update: W' = W + W_B W_A
In the project, LoRA is applied to the U-Net's attention modules, specifically the query (to_q), key (to_k), value (to_v) projections and the output projection (to_out.0):
Attention(Q + ΔQ, K + ΔK, V + ΔV)
where ΔQ = W_B^Q W_A^Q, and ΔK, ΔV are analogous. This design lets the model retain its pretrained knowledge while efficiently learning task-specific knowledge.
In the project configuration, a small rank (rank=2) and alpha value (lora_alpha=16) are used:
# LoRA parameters
rank = 2
lora_alpha = 16
This configuration dramatically reduces the number of trainable parameters: print_trainable_parameters() shows that only a fraction of a percent of the parameters are trained (about 0.05% in the sample run below), which greatly lowers memory and compute costs. The LoRA weight file is also small (typically a few MB), making it easy to store and share.
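The parameter savings follow directly from the factorization. The numpy sketch below uses toy dimensions (d = k = 64) and the usual LoRA scaling convention α/r; the matrices are random stand-ins, not real attention weights:

```python
import numpy as np

# Sketch of a LoRA update W' = W + (alpha / r) * B @ A with toy dimensions.
rng = np.random.default_rng(0)

d, k, r, alpha = 64, 64, 2, 16
W = rng.standard_normal((d, k))         # frozen pretrained weight
B = rng.standard_normal((d, r)) * 0.01  # trainable, d x r
A = rng.standard_normal((r, k)) * 0.01  # trainable, r x k

W_prime = W + (alpha / r) * (B @ A)

full_params = d * k          # parameters a full update would train
lora_params = d * r + r * k  # parameters LoRA actually trains
```

Even in this toy case LoRA trains only 256 of 4096 parameters (6.25%); with the huge d and k of real attention layers the ratio shrinks far further.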
The Config class centralizes all key hyperparameters and reflects the optimization strategy under limited resources:
- Lowering the resolution (256x256) and reducing the LoRA rank (rank=2) significantly cut GPU memory usage, making training feasible on an ordinary GPU.
- Gradient clipping (max_grad_norm=0.5) effectively prevents gradient explosion during training.
- max_samples_per_class caps the number of samples per class, mitigating class imbalance in the animal dataset and preventing overfitting to over-represented classes.
# Configuration - key optimizations
class Config:
    # Data parameters - reduce data volume
    data_root = "/kaggle/input/animals/Animal/Animal"
    output_dir = "/kaggle/working/output"
    lora_model_dir = os.path.join(output_dir, "lora_models")
    history_file = os.path.join(output_dir, "training_history.xlsx")
    sample_output_dir = os.path.join(output_dir, "validation_samples")
    evaluation_file = os.path.join(output_dir, "evaluation_results.xlsx")
    comparison_dir = os.path.join(output_dir, "comparison_samples")
    # Model parameters - lower resolution
    pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
    resolution = 256
    center_crop = True
    random_flip = True
    # LoRA parameters - keep LoRA small
    rank = 2
    lora_alpha = 16
    # Training parameters - key optimizations
    train_batch_size = 1
    gradient_accumulation_steps = 4
    num_train_epochs = 10
    learning_rate = 1e-5
    lr_scheduler_type = "cosine_with_warmup"
    lr_warmup_steps = 200
    max_grad_norm = 0.5
    use_ema = True
    gradient_checkpointing = True
    mixed_precision = "fp16"
    # Early stopping - CLIP score as the metric
    early_stopping_patience = 5
    early_stopping_delta = 0.02
    validation_split = 0.1
    # Validation parameters
    num_validation_samples = 5
    num_inference_steps = 20
    num_final_inference_steps = 100
    guidance_scale = 7.5
    # Max samples per class
    max_samples_per_class = 100  # matches the AnimalDataset default
    num_evaluation_samples = 10  # value missing in the original; assumed
    clip_model_name = "openai/clip-vit-base-patch32"  # value missing in the original; assumed
The AnimalDataset class implements loading and preprocessing of the animal image dataset. Its key features:
- It assumes a "root / animal class / image files" directory hierarchy and discovers class names by scanning subfolders.
- Classes with more than max_samples_per_class images are randomly subsampled, keeping class sizes roughly balanced.
- Prompts mention scenes (close-up, natural habitat) and attributes (cute, wild), enriching the model's conditioning signal.
# 1. Data processing and preparation - with a per-class sample cap
class AnimalDataset(Dataset):
    def __init__(self, data_root, tokenizer, size=384, center_crop=True, random_flip=True, max_samples_per_class=100):
        self.data_root = data_root
        self.tokenizer = tokenizer
        self.size = size
        self.center_crop = center_crop
        self.random_flip = random_flip
        self.max_samples_per_class = max_samples_per_class
        self.image_paths = []
        self.class_names = []
        subfolders = [f.name for f in os.scandir(data_root) if f.is_dir()]
        for class_name in subfolders:
            class_dir = os.path.join(data_root, class_name)
            image_files = glob.glob(os.path.join(class_dir, "*.jpg")) + \
                          glob.glob(os.path.join(class_dir, "*.png")) + \
                          glob.glob(os.path.join(class_dir, "*.jpeg"))
            if len(image_files) > max_samples_per_class:
                image_files = random.sample(image_files, max_samples_per_class)
            for img_path in image_files:
                self.image_paths.append(img_path)
                self.class_names.append(class_name)
        self.prompt_templates = [
            "a photo of a {}",
            "a high quality image of a {}",
            "a clear picture of a {}",
            "a realistic image of a {}",
            # more templates...
        ]
- Images are resized with LANCZOS resampling to preserve quality.
- Text is encoded with the CLIP tokenizer into input ids the model understands.
- Fixed-length encoding (padding and truncation) makes batching straightforward.
    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        class_name = self.class_names[idx]
        image = Image.open(image_path).convert("RGB")
        if self.center_crop:
            image = self._center_crop(image)
        else:
            image = image.resize((self.size, self.size), Image.Resampling.LANCZOS)
        if self.random_flip and random.random() < 0.5:
            image = image.transpose(Image.FLIP_LEFT_RIGHT)
        # scale pixels to [-1, 1] and move channels first
        image_tensor = (torch.tensor(np.array(image).astype(np.float32) / 127.5) - 1.0).permute(2, 0, 1)
        prompt_template = random.choice(self.prompt_templates)
        prompt = prompt_template.format(class_name)
        tokenized_input = self.tokenizer(
            prompt,
            max_length=self.tokenizer.model_max_length,
            padding="max_length",
            truncation=True,
            return_tensors="pt"
        )
        input_ids = tokenized_input.input_ids.squeeze(0)
        return {
            "pixel_values": image_tensor,
            "input_ids": input_ids,
            "prompt": prompt,
            "class_name": class_name
        }

    def _center_crop(self, image):
        # crop the largest centered square, then resize to the target size
        width, height = image.size
        new_size = min(width, height)
        left = (width - new_size) / 2
        top = (height - new_size) / 2
        right = (width + new_size) / 2
        bottom = (height + new_size) / 2
        image = image.crop((left, top, right, bottom))
        image = image.resize((self.size, self.size), Image.Resampling.LANCZOS)
        return image
Conventional loss-based early stopping may not accurately reflect the quality of a generative model, so this project stops on the CLIP score instead. Key points:
- Training stops when the CLIP score (image-text matching) no longer improves significantly, which better matches the quality goal of a generation task.
- patience: number of epochs the score may fail to improve; set to 5 to give the model room to optimize.
- delta: minimum improvement threshold; 0.02 ensures only meaningful gains count.
# Early stopping (PyTorch implementation) - CLIP score as the metric
class EarlyStopping:
    def __init__(self, patience=3, delta=0.05, verbose=False):
        self.patience = patience
        self.delta = delta
        self.verbose = verbose
        self.counter = 0
        self.best_score = None
        self.early_stop = False

    def __call__(self, clip_score):
        if self.best_score is None:
            self.best_score = clip_score
        elif clip_score < self.best_score + self.delta:
            self.counter += 1
            if self.verbose:
                print(f"EarlyStopping counter: {self.counter} out of {self.patience}")
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = clip_score
            self.counter = 0
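The stopping behavior is easy to sanity-check with a synthetic score sequence. The class body is copied into the snippet so it runs standalone:

```python
# Self-contained demo of CLIP-score early stopping; the class is re-declared
# here (without logging) so the snippet runs on its own.
class EarlyStopping:
    def __init__(self, patience=3, delta=0.05):
        self.patience = patience
        self.delta = delta
        self.counter = 0
        self.best_score = None
        self.early_stop = False

    def __call__(self, clip_score):
        if self.best_score is None:
            self.best_score = clip_score
        elif clip_score < self.best_score + self.delta:
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = clip_score
            self.counter = 0

stopper = EarlyStopping(patience=2, delta=0.02)
stopped_at = None
# scores improve, then plateau: training stops after 2 non-improving epochs
for epoch, score in enumerate([28.0, 29.5, 30.1, 30.11, 30.10]):
    stopper(score)
    if stopper.early_stop:
        stopped_at = epoch
        break
```

Note that a gain smaller than delta (30.10 → 30.11) still counts as "no improvement", which is exactly the intended behavior.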
The function below adapts the UNet, Stable Diffusion's core component, with LoRA and is the key to parameter-efficient fine-tuning:
- r (rank): rank of the low-rank matrices; 2 sharply cuts the number of trainable parameters.
- lora_alpha: scaling factor that, together with the rank, controls the magnitude of the update.
- target_modules: modules that receive LoRA adapters; here the attention query, key, value projections and the output projection.
- get_peft_model injects the LoRA adapters into the UNet, so only the small adapter is trained rather than the whole model. print_trainable_parameters() reports the trainable fraction, roughly 0.05% of the full model here.
# Prepare the UNet for LoRA - using the peft library
def prepare_unet_for_lora(unet, rank=2, alpha=16):
    # configure LoRA
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=alpha,
        target_modules=["to_q", "to_k", "to_v", "to_out.0"],
        lora_dropout=0.0,
        bias="none",
    )
    # apply LoRA to the UNet
    unet = get_peft_model(unet, lora_config)
    unet.print_trainable_parameters()
    return unet
The CLIP (Contrastive Language-Image Pretraining) score quantifies how well a generated image matches its text description and is an important generation-quality metric:
- logits_per_image is the image-text matching score; higher means a better match.
- torch.autocast and torch.no_grad keep the computation efficient.
# Compute the validation CLIP score
def compute_validation_clip_score(config, unet, text_encoder, vae, tokenizer, device):
    animal_classes = [f.name for f in os.scandir(config.data_root) if f.is_dir()]
    selected_animals = random.sample(animal_classes, min(config.num_validation_samples, len(animal_classes)))
    print(f"Selected animals for validation CLIP score: {selected_animals}")
    pipe = StableDiffusionPipeline.from_pretrained(
        config.pretrained_model_name_or_path,
        text_encoder=text_encoder,
        vae=vae,
        unet=unet,
        tokenizer=tokenizer,
        safety_checker=None,
        torch_dtype=torch.float16 if config.mixed_precision == "fp16" else torch.float32
    ).to(device)
    clip_model = CLIPModel.from_pretrained(config.clip_model_name).to(device)
    clip_processor = CLIPProcessor.from_pretrained(config.clip_model_name)
    clip_scores = []
    for animal in selected_animals:
        prompt = f"a high quality photo of a {animal}"
        with torch.autocast(device.type):
            image = pipe(
                prompt,
                num_inference_steps=config.num_inference_steps,
                guidance_scale=config.guidance_scale,
                height=config.resolution,
                width=config.resolution
            ).images[0]
        with torch.no_grad():
            inputs = clip_processor(
                text=[prompt],
                images=image,
                return_tensors="pt",
                padding=True
            ).to(device)
            outputs = clip_model(**inputs)
            logits_per_image = outputs.logits_per_image
            clip_score = logits_per_image.item()
        print(f"Animal: {animal}, CLIP Score: {clip_score:.4f}")
        clip_scores.append(clip_score)
    avg_clip_score = np.mean(clip_scores)
    print(f"Average Validation CLIP Score: {avg_clip_score:.4f}")
    return avg_clip_score
Model initialization: the first part of the training function loads and configures Stable Diffusion's core components:
- Gradient checkpointing trades a little extra compute time for substantial memory savings.
# 2. Training function (with early stopping and history logging)
def train_lora_with_earlystopping(config):
    # initialize model components
    tokenizer = CLIPTokenizer.from_pretrained(
        config.pretrained_model_name_or_path,
        subfolder="tokenizer"
    )
    text_encoder = CLIPTextModel.from_pretrained(
        config.pretrained_model_name_or_path,
        subfolder="text_encoder"
    )
    vae = AutoencoderKL.from_pretrained(
        config.pretrained_model_name_or_path,
        subfolder="vae"
    )
    unet = UNet2DConditionModel.from_pretrained(
        config.pretrained_model_name_or_path,
        subfolder="unet"
    )
    # add the LoRA adapters to the UNet
    unet = prepare_unet_for_lora(unet, config.rank, config.lora_alpha)
    # set up the noise scheduler
    noise_scheduler = DDPMScheduler.from_pretrained(
        config.pretrained_model_name_or_path,
        subfolder="scheduler"
    )
    # enable gradient checkpointing to save memory
    if config.gradient_checkpointing:
        unet.enable_gradient_checkpointing()
    # move the models to the GPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    text_encoder.to(device)
    vae.to(device)
    unet.to(device)
Optimizer and data loading:
- AdamW with a modest weight decay (0.01) guards against overfitting.
- AnimalDataset loads the data; DataLoader handles batching and multi-process preprocessing.
# set up the optimizer (only LoRA parameters are optimized)
    lora_params = []
    for name, param in unet.named_parameters():
        if param.requires_grad:
            lora_params.append(param)
    optimizer = torch.optim.AdamW(
        lora_params,
        lr=config.learning_rate,
        betas=(0.9, 0.999),
        eps=1e-8,
        weight_decay=0.01
    )
    # prepare the dataset and data loaders
    full_dataset = AnimalDataset(
        config.data_root,
        tokenizer,
        size=config.resolution,
        center_crop=config.center_crop,
        random_flip=config.random_flip,
        max_samples_per_class=config.max_samples_per_class
    )
    val_size = int(len(full_dataset) * config.validation_split)
    train_size = len(full_dataset) - val_size
    train_dataset, val_dataset = random_split(full_dataset, [train_size, val_size])
    train_dataloader = DataLoader(
        train_dataset,
        batch_size=config.train_batch_size,
        shuffle=True,
        num_workers=2
    )
    val_dataloader = DataLoader(
        val_dataset,
        batch_size=config.train_batch_size,
        shuffle=False,
        num_workers=2
    )
Training schedule and logging setup:
    # compute the total number of training steps
    num_update_steps_per_epoch = len(train_dataloader) // config.gradient_accumulation_steps
    max_train_steps = config.num_train_epochs * num_update_steps_per_epoch
    # learning-rate scheduler
    lr_scheduler = get_cosine_schedule_with_warmup(
        optimizer=optimizer,
        num_warmup_steps=config.lr_warmup_steps,
        num_training_steps=max_train_steps
    )
    # initialize early stopping (CLIP score as the metric)
    early_stopping = EarlyStopping(
        patience=config.early_stopping_patience,
        delta=config.early_stopping_delta,
        verbose=True
    )
    # create an Excel workbook for the training history
    history_wb = Workbook()
    history_ws = history_wb.active
    history_ws.title = "Training History"
    history_ws.append(["Epoch", "Step", "Train Loss", "Validation Loss", "CLIP Score", "Learning Rate", "Best CLIP Score", "Gradient Norm"])
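The warmup-then-cosine schedule can be sketched without any framework. The multiplier below mirrors the common linear-warmup + cosine-decay shape; it is an illustration, not the exact `transformers` implementation:

```python
import math

# Sketch of a cosine-with-warmup LR multiplier: linear warmup from 0 to 1
# over warmup_steps, then cosine decay from 1 down to 0.
def cosine_with_warmup(step, warmup_steps, total_steps):
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

warmup, total = 200, 2000  # toy numbers in the spirit of lr_warmup_steps=200
lr_at_0 = cosine_with_warmup(0, warmup, total)
lr_at_warmup_end = cosine_with_warmup(warmup, warmup, total)
lr_mid = cosine_with_warmup((warmup + total) // 2, warmup, total)
lr_at_end = cosine_with_warmup(total, warmup, total)
```

Warmup avoids large updates while the fresh LoRA adapters are still near zero; the cosine tail anneals the step size toward the end of training.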
Core training loop:
The loop implements Stable Diffusion's noise-prediction training. Key steps:
- The VAE is frozen with torch.no_grad(), and the latents are scaled by the standard 0.18215 factor used throughout Stable Diffusion.
# training loop
    global_step = 0
    best_clip_score = 0.0
    for epoch in range(config.num_train_epochs):
        unet.train()
        total_loss = 0
        optimizer.zero_grad()
        current_grad_norm = 0.0
        for step, batch in enumerate(train_dataloader):
            pixel_values = batch["pixel_values"].to(device)
            input_ids = batch["input_ids"].to(device)
            # encode the images into the latent space
            with torch.no_grad():
                latents = vae.encode(pixel_values).latent_dist.sample()
                latents = latents * 0.18215  # scaling factor
            # sample noise
            noise = torch.randn_like(latents)
            bsz = latents.shape[0]
            timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (bsz,), device=device).long()
            # add noise to the latents
            noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
            # get the text embeddings
            with torch.no_grad():
                encoder_hidden_states = text_encoder(input_ids)[0]
            # predict the noise residual
            noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
            # compute the loss
            loss = F.mse_loss(noise_pred, noise, reduction="mean") / config.gradient_accumulation_steps
            # backpropagation
            loss.backward()
            # gradient accumulation
            if (step + 1) % config.gradient_accumulation_steps == 0:
                # gradient clipping
                torch.nn.utils.clip_grad_norm_(lora_params, config.max_grad_norm)
                # compute the gradient norm for monitoring
                current_grad_norm = 0
                for p in lora_params:
                    if p.grad is not None:
                        param_norm = p.grad.data.norm(2)
                        current_grad_norm += param_norm.item() ** 2
                current_grad_norm = current_grad_norm ** 0.5
                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()
                global_step += 1
            total_loss += loss.item() * config.gradient_accumulation_steps
            if global_step % 50 == 0:  # logging interval (reconstructed; the original value was lost)
                avg_loss = total_loss / (step + 1)
                current_lr = lr_scheduler.get_last_lr()[0]
                print(f"Epoch {epoch}, Step {global_step}, Loss: {avg_loss:.4f}, LR: {current_lr:.6f}, Grad Norm: {current_grad_norm:.6f}")
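Dividing the loss by `gradient_accumulation_steps` before each `backward()` is what makes accumulation equivalent to a larger batch. A toy linear-regression gradient shows the identity:

```python
import numpy as np

# Sketch of why accumulating gradients of loss/K over K micro-batches equals
# the gradient of the mean loss over the combined batch. Toy MSE gradients.
rng = np.random.default_rng(0)

K = 4                                 # like gradient_accumulation_steps = 4
w = rng.standard_normal(3)
X = rng.standard_normal((8, 3))
y = rng.standard_normal(8)

def grad_mse(w, X, y):
    # gradient of mean((Xw - y)^2) with respect to w
    return 2.0 * X.T @ (X @ w - y) / len(y)

# full-batch gradient
full_grad = grad_mse(w, X, y)

# accumulated gradient over K micro-batches, each scaled by 1/K
accum_grad = np.zeros_like(w)
for Xb, yb in zip(np.split(X, K), np.split(y, K)):
    accum_grad += grad_mse(w, Xb, yb) / K
```

The two gradients match exactly, so batch size 1 with 4 accumulation steps behaves like an effective batch of 4 at a quarter of the peak memory.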
Per-epoch post-processing:
Key operations executed at the end of each training epoch:
        # after each epoch, compute the validation loss and CLIP score
        val_loss = compute_validation_loss(unet, vae, text_encoder, val_dataloader, noise_scheduler, device)
        avg_train_loss = total_loss / len(train_dataloader)
        print(f"Epoch {epoch} completed. Train Loss: {avg_train_loss:.4f}, Validation Loss: {val_loss:.4f}")
        # compute the CLIP score
        clip_score = compute_validation_clip_score(config, unet, text_encoder, vae, tokenizer, device)
        # log to the history
        current_lr = lr_scheduler.get_last_lr()[0]
        history_ws.append([epoch, global_step, avg_train_loss, val_loss, clip_score, current_lr, best_clip_score, current_grad_norm])
        # early-stopping check (based on the CLIP score)
        early_stopping(clip_score)
        # save the best model
        if clip_score > best_clip_score:
            best_clip_score = clip_score
            save_path = os.path.join(config.lora_model_dir, f"lora_weights_epoch_{epoch}.safetensors")
            save_lora_weights(unet, save_path)
            print(f"Saved best model with CLIP score: {best_clip_score:.4f}")
        # save the training history
        history_wb.save(config.history_file)
        # check early stopping
        if early_stopping.early_stop:
            print("Early stopping triggered")
            break
    print("Training completed!")
    return unet, text_encoder, vae, tokenizer
Sample training output:
Starting LoRA training...
trainable params: 398,592 || all params: 859,919,556 || trainable%: 0.0464
Epoch 0, Step 0, Loss: 0.0044, LR: 0.000000, Grad Norm: 0.000000
...
Average Validation CLIP Score: 30.1487
Saved best model with CLIP score: 30.1487
...
Evaluation results saved to /kaggle/working/output/evaluation_results.xlsx
All done! Average CLIP Score: 31.1877
Validation sample generation:
After training, this function generates representative samples for visual inspection:
- It uses more inference steps (100) to produce higher-quality samples.
# 3. Validation and sample generation
def generate_validation_samples(config, unet, text_encoder, vae, tokenizer):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    animal_classes = [f.name for f in os.scandir(config.data_root) if f.is_dir()]
    selected_animals = random.sample(animal_classes, config.num_validation_samples)
    print(f"Selected animals for validation: {selected_animals}")
    pipe = StableDiffusionPipeline.from_pretrained(
        config.pretrained_model_name_or_path,
        text_encoder=text_encoder,
        vae=vae,
        unet=unet,
        tokenizer=tokenizer,
        safety_checker=None,  # disable the safety checker to speed up generation
        torch_dtype=torch.float16 if config.mixed_precision == "fp16" else torch.float32
    ).to(device)
    all_images = []
    all_titles = []
    for animal in selected_animals:
        prompt = f"a high quality photo of a {animal}"
        with torch.autocast(device.type):
            image = pipe(
                prompt,
                num_inference_steps=config.num_final_inference_steps,
                guidance_scale=config.guidance_scale,
                height=config.resolution,
                width=config.resolution
            ).images[0]
        save_path = os.path.join(config.sample_output_dir, f"{animal}.png")
        image.save(save_path)
        print(f"Generated image for {animal} saved at {save_path}")
        all_images.append(image)
        all_titles.append(animal)
    comparison_path = os.path.join(config.comparison_dir, "animal_comparison.png")
    create_comparison_image(all_images, all_titles, comparison_path)
    return selected_animals
Quantitative evaluation:
This function provides a comprehensive quantitative evaluation after training:
# 4. Evaluation - CLIP Score for generation quality
def evaluate_with_clip_score(config, unet, text_encoder, vae, tokenizer):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    clip_model = CLIPModel.from_pretrained(config.clip_model_name).to(device)
    clip_processor = CLIPProcessor.from_pretrained(config.clip_model_name)
    animal_classes = [f.name for f in os.scandir(config.data_root) if f.is_dir()]
    selected_animals = random.sample(animal_classes, config.num_evaluation_samples)
    print(f"Selected animals for evaluation: {selected_animals}")
    pipe = StableDiffusionPipeline.from_pretrained(
        config.pretrained_model_name_or_path,
        text_encoder=text_encoder,
        vae=vae,
        unet=unet,
        tokenizer=tokenizer,
        safety_checker=None,
        torch_dtype=torch.float16 if config.mixed_precision == "fp16" else torch.float32
    ).to(device)
    evaluation_results = []
    for animal in selected_animals:
        prompt = f"a high quality photo of a {animal}"
        with torch.autocast(device.type):
            image = pipe(
                prompt,
                num_inference_steps=config.num_final_inference_steps,
                guidance_scale=config.guidance_scale,
                height=config.resolution,
                width=config.resolution
            ).images[0]
        save_path = os.path.join(config.sample_output_dir, f"eval_{animal}.png")
        image.save(save_path)
        with torch.no_grad():
            inputs = clip_processor(
                text=[prompt],
                images=image,
                return_tensors="pt",
                padding=True
            ).to(device)
            outputs = clip_model(**inputs)
            logits_per_image = outputs.logits_per_image
            clip_score = logits_per_image.item()
        print(f"Animal: {animal}, CLIP Score: {clip_score:.4f}")
        # dict keys and sheet headers below are reconstructed from context
        evaluation_results.append({
            "animal": animal,
            "prompt": prompt,
            "clip_score": clip_score,
            "image_path": save_path
        })
    avg_clip_score = np.mean([result["clip_score"] for result in evaluation_results])
    print(f"Average CLIP Score: {avg_clip_score:.4f}")
    # write the results to Excel
    evaluation_wb = Workbook()
    evaluation_ws = evaluation_wb.active
    evaluation_ws.title = "Evaluation Results"
    evaluation_ws.append(["Animal", "Prompt", "CLIP Score", "Image Path"])
    for result in evaluation_results:
        evaluation_ws.append([result["animal"], result["prompt"], result["clip_score"], result["image_path"]])
    evaluation_ws.append([])
    evaluation_ws.append(["Average CLIP Score", avg_clip_score])
    evaluation_wb.save(config.evaluation_file)
    print(f"Evaluation results saved to {config.evaluation_file}")
    return evaluation_results, avg_clip_score
pretrained_model_name_or_path points to the local Stable Diffusion base model directory, which must contain the text_encoder, unet, vae and other core components; contrast/saturation/brightness_factor are post-processing color-correction parameters, applied in real time via PIL to counter the grayish, washed-out look that diffusion outputs can have.
# Configuration class - with color-related parameters
class Config:
    pretrained_model_name_or_path = "model/LCM-runwayml-stable-diffusion-v1-5"
    resolution = 512
    rank = 2
    lora_alpha = 16
    device = "cpu"
    num_final_inference_steps = 100
    guidance_scale = 5.0
    contrast_factor = 1.0
    saturation_factor = 1.0
    brightness_factor = 1.0
Three core utility functions handle LoRA weight loading, tokenizer loading fallbacks, and image color correction; they are the foundation for running the model and polishing its output.
LoRA (Low-Rank Adaptation) is a lightweight fine-tuning technique: loading pretrained LoRA weights quickly adapts the base model to the animal-image-generation task without retraining the whole model.
# load the LoRA weights
def load_lora_weights(unet, load_path):
    # note: for .safetensors checkpoints, safetensors.torch.load_file is needed instead of torch.load
    lora_state_dict = torch.load(load_path, map_location=torch.device(Config.device))
    unet.load_state_dict(lora_state_dict, strict=False)
    return unet
The tokenizer converts a prompt into vectors the model can understand; this function works around the common problem of a local model's tokenizer path failing to load by providing a fallback:
# work around tokenizer loading problems
def load_tokenizer_with_fix(model_path):
    try:
        tokenizer = CLIPTokenizer.from_pretrained(
            os.path.join(model_path, "tokenizer")
        )
        return tokenizer
    except Exception as e:
        print(f"Error loading tokenizer: {e}")
        print("Trying to repair the tokenizer configuration...")
        from transformers import CLIPTokenizerFast
        vocab_file = os.path.join(model_path, "tokenizer", "vocab.json")
        merges_file = os.path.join(model_path, "tokenizer", "merges.txt")
        if os.path.exists(vocab_file) and os.path.exists(merges_file):
            tokenizer = CLIPTokenizerFast(
                vocab_file=vocab_file,
                merges_file=merges_file,
                max_length=77,
                pad_token="!",
                additional_special_tokens=["<|startoftext|>", "<|endoftext|>"]
            )
            return tokenizer
        else:
            raise Exception(f"Tokenizer files not found: {vocab_file} or {merges_file}")
Diffusion outputs often suffer from low contrast and dull colors; this function post-processes generated images with PIL's ImageEnhance module to improve their look:
# image color correction
def adjust_image_colors(image):
    enhancer = ImageEnhance.Contrast(image)
    image = enhancer.enhance(Config.contrast_factor)
    enhancer = ImageEnhance.Color(image)
    image = enhancer.enhance(Config.saturation_factor)
    enhancer = ImageEnhance.Brightness(image)
    image = enhancer.enhance(Config.brightness_factor)
    return image
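Mathematically, each enhancer interpolates between the image and a degenerate version of it (black for brightness, flat gray for contrast); a factor of 1.0 is a no-op and larger factors push away from the degenerate image. The numpy sketch below approximates the contrast and brightness cases; it mimics ImageEnhance semantics but is not Pillow's exact code:

```python
import numpy as np

# Approximate ImageEnhance on a float image in [0, 255]:
# out = degenerate + factor * (image - degenerate)
def enhance_brightness(img, factor):
    return np.clip(img * factor, 0, 255)              # degenerate = black

def enhance_contrast(img, factor):
    mean = img.mean()                                 # degenerate = flat gray
    return np.clip(mean + factor * (img - mean), 0, 255)

img = np.array([[50.0, 100.0], [150.0, 200.0]])

same = enhance_contrast(img, 1.0)        # factor 1.0: unchanged
punchy = enhance_contrast(img, 1.5)      # values pushed away from the mean
brighter = enhance_brightness(img, 1.2)  # all values scaled up
```

Saturation works the same way, with the grayscale version of the image as the degenerate endpoint.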
Stable Diffusion consists of five components — tokenizer, text encoder, UNet, VAE and scheduler. The ModelLoader class loads them from local files, assembles them into a ready-to-use StableDiffusionPipeline, and integrates the LoRA weights.
The text encoder maps text to vectors, the UNet maps vectors to latents, the VAE maps latents to images, and the scheduler paces the denoising steps. Only the UNet receives LoRA weights: it is the core generative layer of a diffusion model, so fine-tuning it alone is enough to shift the generation style (toward animal images) without touching the other components.
# model loading class
class ModelLoader:
    def __init__(self, config, lora_model_path):
        self.config = config
        self.lora_model_path = lora_model_path
        self.tokenizer = None
        self.text_encoder = None
        self.vae = None
        self.unet = None
        self.pipe = None

    def load_models(self):
        self.tokenizer = load_tokenizer_with_fix(self.config.pretrained_model_name_or_path)
        text_encoder_path = os.path.join(self.config.pretrained_model_name_or_path, "text_encoder")
        self.text_encoder = CLIPTextModel.from_pretrained(text_encoder_path)
        vae_path = os.path.join(self.config.pretrained_model_name_or_path, "vae")
        self.vae = AutoencoderKL.from_pretrained(vae_path)
        unet_path = os.path.join(self.config.pretrained_model_name_or_path, "unet")
        self.unet = UNet2DConditionModel.from_pretrained(unet_path)
        self.unet = load_lora_weights(self.unet, self.lora_model_path)
        self.text_encoder.to(self.config.device)
        self.vae.to(self.config.device)
        self.unet.to(self.config.device)
        scheduler_path = os.path.join(self.config.pretrained_model_name_or_path, "scheduler")
        scheduler = DDPMScheduler.from_pretrained(scheduler_path)
        self.pipe = StableDiffusionPipeline(
            vae=self.vae,
            text_encoder=self.text_encoder,
            tokenizer=self.tokenizer,
            unet=self.unet,
            scheduler=scheduler,
            safety_checker=None,
            feature_extractor=None,
            requires_safety_checker=False
        )
        return self.pipe
Image generation is slow (especially on CPU); running it on the main thread would freeze the UI. GenerateThread subclasses QThread, moves generation into a worker thread, and reports progress back to the main (UI) thread through signals.
Three pyqtSignal signals give the worker thread non-blocking communication with the UI: progress updates stream in real time, and the finished/error signals trigger the follow-up UI actions.
# generation thread - with a color-correction step
class GenerateThread(QThread):
    finished = pyqtSignal(Image.Image)
    error = pyqtSignal(str)
    progress_updated = pyqtSignal(int, float)

    def __init__(self, pipe, animal_name, num_inference_steps, guidance_scale, contrast_factor, saturation_factor, brightness_factor):
        super().__init__()
        self.pipe = pipe
        self.animal_name = animal_name
        self.num_inference_steps = num_inference_steps
        self.guidance_scale = guidance_scale
        self.contrast_factor = contrast_factor
        self.saturation_factor = saturation_factor
        self.brightness_factor = brightness_factor
        self.start_time = 0
        self.step_times = []

    def run(self):
        try:
            prompt = (
                f"a high quality photo of a {self.animal_name}, natural lighting, "
                f"realistic colors, in natural habitat, detailed texture"
            )
            with torch.no_grad():
                # encode the prompt and an empty (unconditional) prompt
                text_inputs = self.pipe.tokenizer(
                    prompt,
                    padding="max_length",
                    max_length=self.pipe.tokenizer.model_max_length,
                    truncation=True,
                    return_tensors="pt"
                )
                text_input_ids = text_inputs.input_ids
                text_embeddings = self.pipe.text_encoder(text_input_ids.to(self.pipe.device))[0]
                max_length = text_input_ids.shape[-1]
                uncond_input = self.pipe.tokenizer([""], padding="max_length", max_length=max_length, return_tensors="pt")
                uncond_embeddings = self.pipe.text_encoder(uncond_input.input_ids.to(self.pipe.device))[0]
                text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
                # start from random latents (1/8 of the pixel resolution)
                latents = torch.randn(
                    (1, self.pipe.unet.config.in_channels, Config.resolution // 8, Config.resolution // 8),
                    generator=torch.Generator(device=Config.device),
                    device=Config.device,
                )
                self.pipe.scheduler.set_timesteps(self.num_inference_steps, device=Config.device)
                self.start_time = time.time()
                self.step_times = []
                for i, t in enumerate(self.pipe.scheduler.timesteps):
                    step_start_time = time.time()
                    # classifier-free guidance: conditional and unconditional in one batch
                    latent_model_input = torch.cat([latents] * 2)
                    latent_model_input = self.pipe.scheduler.scale_model_input(latent_model_input, t)
                    noise_pred = self.pipe.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
                    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
                    noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)
                    latents = self.pipe.scheduler.step(noise_pred, t, latents).prev_sample
                    # progress / remaining-time estimate from recent step times
                    step_time = time.time() - step_start_time
                    self.step_times.append(step_time)
                    progress = int((i + 1) / self.num_inference_steps * 100)
                    steps_remaining = self.num_inference_steps - (i + 1)
                    if len(self.step_times) >= 5:
                        avg_step_time = sum(self.step_times[-5:]) / 5
                    else:
                        avg_step_time = sum(self.step_times) / len(self.step_times)
                    remaining_time = avg_step_time * steps_remaining
                    self.progress_updated.emit(progress, remaining_time)
                # decode the latents back to an image
                latents = 1 / 0.18215 * latents
                image = self.pipe.vae.decode(latents).sample
                image = (image / 2 + 0.5).clamp(0, 1)
                image = image.cpu().permute(0, 2, 3, 1).float().numpy()
                image = (image[0] * 255).round().astype("uint8")
                image = Image.fromarray(image)
                image = adjust_image_colors(image)
                if image.size != (Config.resolution, Config.resolution):
                    image = image.resize((Config.resolution, Config.resolution), Image.LANCZOS)
            self.finished.emit(image)
        except Exception as e:
            self.error.emit(str(e))
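The classifier-free guidance line in the loop above combines the two noise predictions by extrapolation. A numpy sketch with toy vectors in place of UNet outputs shows the limiting cases:

```python
import numpy as np

# Sketch of classifier-free guidance:
#   noise_pred = uncond + g * (text - uncond)
# Toy vectors stand in for the UNet's conditional/unconditional predictions.
def cfg_combine(uncond, text, guidance_scale):
    return uncond + guidance_scale * (text - uncond)

uncond = np.array([0.1, -0.2, 0.3])
text = np.array([0.4, 0.1, -0.1])

no_guidance = cfg_combine(uncond, text, 0.0)  # g=0: pure unconditional
plain = cfg_combine(uncond, text, 1.0)        # g=1: pure conditional
strong = cfg_combine(uncond, text, 7.5)       # g>1: extrapolate toward the text
```

Guidance scales above 1 (the project uses 5.0-7.5) over-weight the text condition, trading sample diversity for prompt adherence.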
AnimalGeneratorApp subclasses QMainWindow and is the interaction hub of the tool: it builds the UI layout, binds button events, and handles the thread signals (progress, image, error). The window splits into a control panel on the left and an image display area on the right; the core logic is shown below:
class AnimalGeneratorApp(QMainWindow):
    def __init__(self):
        super().__init__()
        self.pipe = None
        self.current_image = None
        self.initUI()

    def initUI(self):
        font = QFont("SimHei")
        font.setPointSize(10)
        self.setFont(font)
        self.setWindowTitle('Animal Image Generator')
        self.setGeometry(100, 100, 1100, 800)
        central_widget = QWidget()
        self.setCentralWidget(central_widget)
        main_layout = QHBoxLayout(central_widget)
        main_layout.setContentsMargins(15, 15, 15, 15)
        main_layout.setSpacing(20)
        control_panel = self.create_control_panel()
        main_layout.addWidget(control_panel, 3)
        image_panel = self.create_image_panel()
        main_layout.addWidget(image_panel, 5)

    def generate_image(self):
        if not self.pipe:
            QMessageBox.warning(self, "Error", "Please load a model first")
            return
        animal_name = self.animal_edit.text().strip()
        if not animal_name:
            QMessageBox.warning(self, "Error", "Please enter an animal name")
            return
        Config.resolution = self.resolution_spin.value()
        num_inference_steps = self.steps_spin.value()
        guidance_scale = self.guidance_spin.value()
        contrast_factor = self.contrast_spin.value()
        saturation_factor = self.saturation_spin.value()
        brightness_factor = self.brightness_spin.value()
        self.generate_btn.setEnabled(False)
        self.save_btn.setEnabled(False)
        self.progress_bar.setVisible(True)
        self.progress_bar.setRange(0, 100)
        self.progress_bar.setValue(0)
        self.progress_label.setText("Preparing to generate (the first load may take a while)...")
        self.statusBar().showMessage("Generating image, please wait...")
        self.gen_thread = GenerateThread(
            self.pipe, animal_name, num_inference_steps, guidance_scale,
            contrast_factor, saturation_factor, brightness_factor
        )
        self.gen_thread.finished.connect(self.on_generation_finished)
        self.gen_thread.error.connect(self.on_generation_error)
        self.gen_thread.progress_updated.connect(self.on_progress_updated)
        self.gen_thread.start()

    def on_generation_finished(self, image):
        self.current_image = image
        pixmap = self.pil2pixmap(image)
        self.image_label.setPixmap(pixmap.scaled(
            self.image_label.width(), self.image_label.height(), Qt.KeepAspectRatio, Qt.SmoothTransformation
        ))
        self.generate_btn.setEnabled(True)
        self.save_btn.setEnabled(True)
        self.progress_bar.setValue(100)
        self.progress_label.setText("Done")  # status strings reconstructed; originals were lost
        self.statusBar().showMessage("Image generated")
The image display area on the right uses three stacked layers: a default placeholder, the generated result, and a watermark:
    def create_image_panel(self):
        panel = QWidget()
        layout = QVBoxLayout(panel)
        image_container = QWidget()
        # note: box-shadow is not a supported Qt style-sheet property and is ignored
        image_container.setStyleSheet("""
            background-color: white;
            border-radius: 8px;
            box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
            padding: 15px;
        """)
        image_layout = QVBoxLayout(image_container)
        self.default_image_label = QLabel()
        self.default_image_label.setAlignment(Qt.AlignCenter)
        self.default_image_label.setMinimumSize(512, 512)
        self.load_default_image()
        self.image_label = QLabel()
        self.image_label.setAlignment(Qt.AlignCenter)
        self.image_label.setMinimumSize(512, 512)
        self.image_label.setStyleSheet("background-color: transparent;")
        self.watermark_label = QLabel("制作者:热心市民小周")
        self.watermark_label.setStyleSheet("""
            color: rgba(100, 100, 100, 150);
            font-size: 12px;
            padding: 5px;
            background-color: rgba(255, 255, 255, 100);
            border-radius: 2px;
        """)
        self.watermark_label.setAlignment(Qt.AlignRight | Qt.AlignBottom)
        # stack all three labels in the same grid cell
        grid_layout = QGridLayout()
        grid_layout.addWidget(self.default_image_label, 0, 0)
        grid_layout.addWidget(self.image_label, 0, 0)
        grid_layout.addWidget(self.watermark_label, 0, 0)
        image_layout.addLayout(grid_layout)
        layout.addWidget(image_container, 1)
        return panel
The final interface looks like this:
Gradient norm (GN): measures the scale of parameter updates
| Inference steps | Avg. inference time (CPU) | Avg. total time |
|---|---|---|
| 20 | 68.64s | 456.65s |
| 100 | 349.35s | 823.45s |
| 200 | 683.96s | 1209.65s |
| 400 | 1356.86s | 1863.25s |
The system can generate high-quality images of many animals, including:
More validation output samples can be found in
output/pic
Tiger images generated with different numbers of inference steps:
An example of using the UI:
100-class animal image generation
