2024 Baidu Commercial AI Technology Innovation Competition, Track 2: Advertisement Image Caption Generation (AI Studio)


Project Overview

2024 Baidu Commercial AI Technology Innovation Competition, Track 2: Advertisement Image Caption Generation

Competition Background

In Baidu's commercial marketing scenarios, generating fine-grained captions for advertisement images carries significant value. Image captions can serve as features that enrich the ad system's understanding of multimodal content and improve model generalization, and they can also supply high-quality training samples for text-to-image models, strengthening those models' text controllability. With the arrival of the large-model era, using multimodal large language models (MLLMs) to generate image captions has become standard industry practice (DALL·E 3, Sora, Stable Diffusion 3). This track's task is advertisement image caption generation: through high-quality data and modeling optimization, the goal is to improve the accuracy and completeness of generated captions.

Task Description

Competition Task

This task provides real Baidu commercial advertisement images together with Chinese captions, roughly 1 million samples in total. Participants split the data into training and validation sets themselves and train a multimodal large language model, improving its multimodal understanding and generation capabilities to complete the ad image captioning task.
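Since participants carve out their own validation set from the provided samples, a minimal sketch of one way to do a reproducible random split (the 5% ratio and the seed below are arbitrary choices, not part of the competition setup):

```python
import random

def split_dataset(lines, val_ratio=0.05, seed=42):
    """Shuffle sample lines with a fixed seed and split them into (train, val)."""
    rng = random.Random(seed)
    shuffled = lines[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_ratio))
    return shuffled[n_val:], shuffled[:n_val]

# Example with 100 fake sample lines in the competition's tab-separated format
samples = [f"id_{i}\t<base64>\t<caption>" for i in range(100)]
train, val = split_dataset(samples)
print(len(train), len(val))  # 95 5
```

Writing each split back out as a text file (one sample per line) keeps it compatible with the conversion tooling used below.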

Baseline Program

This baseline demo uses a PaddlePaddle-based multimodal large model and walks through data preparation, model training and inference, and result evaluation, so that participants can get started with the competition task quickly.

(1) Environment and Code Setup

In [2]

# Demo working directory
%cd /home/aistudio/work
# List the files in the working directory
!ls /home/aistudio/work
# If the code is not present, extract the archive to obtain the paddlemix code
!nohup tar -xvf paddlemix.tar

/home/aistudio/work
nohup.out  paddlemix  paddlemix.tar  weights
nohup: ignoring input and appending output to 'nohup.out'

Base Environment

Ubuntu 20.04.6 LTS

CUDA 11.8

CUDNN 8.9.X

Python >= 3.8

PaddlePaddle >= 2.6.1 (GPU build)

For PaddlePaddle installation, see the official installation guide.

In [1]

# Install the dependencies in the AI Studio environment
%cd /home/aistudio/work/paddlemix
!pip install --upgrade pip
!pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

/home/aistudio/work/paddlemix Looking in indexes: https://mirror.baidu.com/pypi/simple/, https://mirrors.aliyun.com/pypi/simple/ Requirement already satisfied: pip in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (23.3.1) Collecting pip Downloading https://mirrors.aliyun.com/pypi/packages/8a/6a/19e9fe04fca059ccf770861c7d5721ab4c2aebc539889e97c7977528a53b/pip-24.0-py3-none-any.whl (2.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 369.3 kB/s eta 0:00:0000:0100:01 Installing collected packages: pip Attempting uninstall: pip Found existing installation: pip 23.3.1 Uninstalling pip-23.3.1: Successfully uninstalled pip-23.3.1 Successfully installed pip-24.0 Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple, https://mirrors.aliyun.com/pypi/simple/ Requirement already satisfied: numpy in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from -r requirements.txt (line 1)) (1.26.4) Collecting tensorboardX (from -r requirements.txt (line 2)) Downloading https://mirrors.aliyun.com/pypi/packages/44/71/f3e7c9b2ab67e28c572ab4e9d5fa3499e0d252650f96d8a3a03e26677f53/tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.7/101.7 kB 333.9 kB/s eta 0:00:00a 0:00:01 Requirement already satisfied: opencv-python in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from -r requirements.txt (line 3)) (4.9.0.80) Requirement already satisfied: Pillow in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from -r requirements.txt (line 4)) (10.3.0) Collecting ftfy (from -r requirements.txt (line 5)) Downloading https://mirrors.aliyun.com/pypi/packages/f4/f0/21efef51304172736b823689aaf82f33dbc64f54e9b046b75f5212d5cee7/ftfy-6.2.0-py3-none-any.whl (54 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.4/54.4 kB 365.0 kB/s eta 0:00:00a 0:00:01 Collecting regex (from -r requirements.txt (line 6)) Downloading 
https://mirrors.aliyun.com/pypi/packages/07/17/5d92509b4dccacf9767d8607112c19667e15db2428014440bae4356b8aff/regex-2024.5.15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (775 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 775.1/775.1 kB 221.6 kB/s eta 0:00:00a 0:00:01 Collecting einops>=0.6.1 (from -r requirements.txt (line 7)) Downloading https://mirrors.aliyun.com/pypi/packages/44/5a/f0b9ad6c0a9017e62d4735daaeb11ba3b6c009d69a26141b258cd37b5588/einops-0.8.0-py3-none-any.whl (43 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.2/43.2 kB 402.0 kB/s eta 0:00:00a 0:00:01 Collecting tiktoken (from -r requirements.txt (line 8)) Downloading https://mirrors.aliyun.com/pypi/packages/e7/8c/7d1007557b343d5cf18349802e94d3a14397121e9105b4661f8cd753f9bf/tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 429.2 kB/s eta 0:00:00a 0:00:01 Requirement already satisfied: packaging in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from tensorboardX->-r requirements.txt (line 2)) (24.0) Requirement already satisfied: protobuf>=3.20 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from tensorboardX->-r requirements.txt (line 2)) (3.20.3) Requirement already satisfied: wcwidth<0.3.0,>=0.2.12 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from ftfy->-r requirements.txt (line 5)) (0.2.13) Requirement already satisfied: requests>=2.26.0 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from tiktoken->-r requirements.txt (line 8)) (2.31.0) Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken->-r requirements.txt (line 8)) (3.3.2) Requirement already satisfied: idna<4,>=2.5 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken->-r requirements.txt (line 8)) 
(3.7) Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken->-r requirements.txt (line 8)) (2.2.1) Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages (from requests>=2.26.0->tiktoken->-r requirements.txt (line 8)) (2024.2.2) Installing collected packages: tensorboardX, regex, ftfy, einops, tiktoken Successfully installed einops-0.8.0 ftfy-6.2.0 regex-2024.5.15 tensorboardX-2.6.2.2 tiktoken-0.7.0

(2) Data Preparation

Demo training data is provided under the data directory. The data is a plain-text file with one sample per line and three tab-separated columns:

  • image id
  • base64-encoded image
  • image caption, used for training and testing

An example image with its caption is shown below (all captions provided in this competition are in Chinese). Captions typically contain fine-grained descriptions of each subject in the image (a person's appearance, clothing, and expression; object colors), the relationships between subjects, the background, the style, and so on.
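The three-column layout above can be read with a few lines of standard-library Python; this sketch decodes the base64 column back to raw image bytes (the example line below is fabricated, with placeholder bytes standing in for a real image):

```python
import base64

def parse_sample(line):
    """Split one tab-separated data line into (image_id, image_bytes, caption)."""
    image_id, image_b64, caption = line.rstrip("\n").split("\t")
    return image_id, base64.b64decode(image_b64), caption

# Build a tiny example line (the bytes stand in for a real encoded image)
raw = base64.b64encode(b"fake-image-bytes").decode("ascii")
line = "img_001\t" + raw + "\t一辆红色双层巴士行驶在街道上"
image_id, image_bytes, caption = parse_sample(line)
print(image_id, caption)  # img_001 一辆红色双层巴士行驶在街道上
```

With real data, `image_bytes` can be handed to `PIL.Image.open(io.BytesIO(image_bytes))` to recover the image.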


In [4]

# Copy the training set to /home/aistudio/work/dataset
!mkdir /home/aistudio/work/dataset
!cp -a /home/aistudio/data/data268898/train_samples.txt /home/aistudio/work/dataset/
# Convert the labels to conversation format
%cd /home/aistudio/work/paddlemix
!python tools/convert_labels.py ../dataset/train_samples.txt ../dataset/train_demo
# Inspect the converted data files
%ls ../dataset/train_demo

/home/aistudio/work/paddlemix
num of training set: 941
num of validation set: 50
chat_template.json  train.json  val.json

The converted conversation-format data looks like this:

{
    "id": "identity_0",
    "conversations": [
        [ <prompt>, <image caption> ]
    ],
    "image": <base64-encoded image>
}

Note: train_samples.txt is the demo training data and contains only a small number of samples.

The official competition training data is caption_train_data.tar.gz, available in the same directory; extract it before use.
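The conversion is done by tools/convert_labels.py; purely as an illustration of the mapping, a rough sketch based on the sample format shown above (the exact field layout and the prompt string here are assumptions, not the script's actual code):

```python
import json

PROMPT = "请描述图片内容"  # assumed prompt; chat.py above uses the same string

def to_chat_record(idx, line):
    """Map one 'id<TAB>base64<TAB>caption' line to a conversation-format record."""
    _, image_b64, caption = line.rstrip("\n").split("\t")
    return {
        "id": f"identity_{idx}",
        "conversations": [[PROMPT, caption]],
        "image": image_b64,
    }

record = to_chat_record(0, "img_001\tAAAA\t一辆红色巴士")
print(json.dumps(record, ensure_ascii=False))
```

Collecting such records for every input line and dumping them with `json.dump` would produce files shaped like the train.json / val.json pair above.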

(3) Model Training and Inference

A. Model Introduction

In [ ]

# Fetch the pretrained weights
%cd /home/aistudio/work
!mkdir weights
%cd weights
!cp -a /home/aistudio/data/data274378/pretrained_models.zip .
!nohup unzip pretrained_models.zip
!rm -rf pretrained_models.zip

Example Image for Inference


In [5]

# Run the single-GPU program that captions one image; inference needs about 23 GB of GPU memory
%cd /home/aistudio/work/paddlemix
!python chat.py ../weights/pretrained_models/mgen-vl-chat-7b

/home/aistudio/work/paddlemix /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") [2024-06-03 12:50:53,791] [ WARNING] - bfloat16 is not supported on your device,change to float16 [2024-06-03 12:50:53,792] [    INFO] - We are using <class 'models.mgen_vl.tokenizer.MGenVLTokenizer'> to load '../weights/pretrained_models/mgen-vl-chat-7b'. [2024-06-03 12:50:54,559] [    INFO] - We are using <class 'models.mgen_vl.tokenizer.MGenVLTokenizer'> to load '../weights/pretrained_models/mgen-vl-chat-7b'. [2024-06-03 12:50:55,130] [    INFO] - We are using <class 'paddlenlp.transformers.mgen.configuration.MGenConfig'> to load '../weights/pretrained_models/mgen-vl-chat-7b'. [2024-06-03 12:50:55,130] [    INFO] - Loading configuration file ../weights/pretrained_models/mgen-vl-chat-7b/config.json [2024-06-03 12:50:55,131] [    INFO] - We are using <class 'models.mgen_vl.modeling.MGenLMHeadModel'> to load '../weights/pretrained_models/mgen-vl-chat-7b'. [2024-06-03 12:50:55,132] [    INFO] - Loading weights file ../weights/pretrained_models/mgen-vl-chat-7b/model_state.pdparams [2024-06-03 12:51:33,823] [    INFO] - Loaded weights file from disk, setting weights to model. W0603 12:51:33.827841 110349 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8 W0603 12:51:33.829231 110349 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9. [2024-06-03 12:51:40,461] [    INFO] - Downloading config.json from https://bj.bcebos.com/paddlenlp/models/community/mgen-7b/config.json 100%|██████████████████████████████████████████| 867/867 [00:00<00:00, 4.69MB/s] [2024-06-03 12:51:40,520] [    INFO] - We are using <class 'paddlenlp.transformers.mgen.configuration.MGenConfig'> to load 'mgen-7b'. 
[2024-06-03 12:51:40,569] [    INFO] - Found /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 12:51:40,570] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 12:52:54,338] [    INFO] - All model checkpoint weights were used when initializing MGenLMHeadModel. [2024-06-03 12:52:54,338] [    INFO] - All the weights of MGenLMHeadModel were initialized from the model checkpoint at ../weights/pretrained_models/mgen-vl-chat-7b. If your task is similar to the task the model of the checkpoint was trained on, you can already use MGenLMHeadModel for predictions without further training. [2024-06-03 12:52:54,345] [    INFO] - Loading configuration file ../weights/pretrained_models/mgen-vl-chat-7b/generation_config.json W0603 12:52:56.965139 110349 dygraph_functions.cc:52647] got different data type, run type protmotion automatically, this may cause data type been changed. prompt: 请描述图片内容 response: 这辆红色的双层巴士正在行驶中,车头的灯光亮着,向街道中心方向行驶。它正在两条白色实线的后面,而另一辆汽车则在其前方行驶。整个场景看起来是在交通繁忙的街道上进行的。 ------------------ It took 5.228573560714722 seconds!

In [ ]

""" 推理脚本chat.py ***此处为推理过程讲解代码,不需要运行;运行上面cell的命令即可*** """ import os import sys import paddle import time import random import numpy as np from PIL import Image from io import BytesIO import base64 from PIL import Image, ImageFile ImageFile.LOAD_TRUNCATED_IMAGES = True # 允许加载截断的图像 from auto import ( AutoConfigMIX, AutoModelMIX, AutoProcessorMIX, AutoTokenizerMIX, ) from utils.log import logger seed = 24 paddle.seed(seed) random.seed(seed) np.random.seed(seed) dtype = "bfloat16" #dtype = "float16" if not paddle.amp.is_bfloat16_supported(): logger.warning("bfloat16 is not supported on your device,change to float16") dtype = "float16" model_name_or_path = sys.argv[1] tokenizer = AutoTokenizerMIX.from_pretrained(model_name_or_path) processor, _ = AutoProcessorMIX.from_pretrained(model_name_or_path) model_config = AutoConfigMIX.from_pretrained(model_name_or_path, dtype=dtype) model = AutoModelMIX.from_pretrained(model_name_or_path, config=model_config, dtype=dtype) model.eval() prompt = "请描述图片内容" start = time.time() query1 = [ {"image": "https://bj.bcebos.com/v1/paddlenlp/models/community/GroundingDino/000000004505.jpg"}, {"text": prompt}, ] input = processor(query=query1, return_tensors="pd") query1 = tokenizer.from_list_format(query1) response, history = model.chat(tokenizer, query=query1, history=None, images=input["images"]) response = response.replace("\n", " ").replace("\r", " ") print("prompt: %s" % prompt) print("response: %s" % response) print("------------------") end = time.time() length = end - start print("It took", length, "seconds!")

Model Structure

The baseline model is a multimodal large language model composed of three parts: an image encoder, an image-text connector, and a large language model (LLM).

The paddlemix codebase provides the baseline model MGen-VL-7B; an excerpt is shown below (see the paddlemix/models/mgen_vl directory for the full code):

In [ ]

""" MGen-VL-7B模型 ***此处为模型结构的讲解代码,不需要运行*** """ import paddle from paddlenlp.generation import GenerationConfig from paddlenlp.transformers import AutoConfig, AutoModel, PretrainedTokenizer from paddlenlp.transformers.model_outputs import ( BaseModelOutputWithPast, CausalLMOutputWithPast, ) from paddlenlp.transformers.model_utils import PretrainedModel from paddlenlp.transformers.mgen.modeling import MGenPretrainedModel from .visual import Vision class MGen(PretrainedModel): def __init__(self, config): super().__init__(config) llm_config = AutoConfig.from_pretrained(config.llm_pretrained_model_name_or_path) self.llm = AutoModel.from_config(config=llm_config, dtype=config.dtype) class MGenLMHeadModel(MGenPretrainedModel): def __init__(self, config): super().__init__(config) self.visual = Vision(config.visual) self.transformer = MGen(config) self.lm_head = paddle.nn.Linear( in_features=config.hidden_size, out_features=config.vocab_size, bias_attr=False )

B. Modify the Training Configuration

The training configuration covers the dataset settings, the warm-start model path, training hyperparameters, the model output path, and so on; participants can modify it to suit their actual training needs.

In [3]

%cd /home/aistudio/work/paddlemix
# Training configs include full SFT fine-tuning, LoRA fine-tuning, etc.
# Inspect the LoRA fine-tuning config. batch-size defaults to 1; LoRA fine-tuning uses about 25 GB of GPU memory
!cat config/mgen_vl/lora_sft_argument.json

/home/aistudio/work/paddlemix
{
  "model_name_or_path": "../weights/pretrained_models/mgen-vl-chat-7b",
  "dataset": {
    "train": [{"name": "chatml_dataset", "data_files": "../dataset/train_demo/train.json", "chat_template": "../dataset/train_demo/chat_template.json"}],
    "eval": [{"name": "chatml_dataset", "data_files": "../dataset/train_demo/val.json", "chat_template": "../dataset/train_demo/chat_template.json"}]
  },
  "mixtoken": false,
  "output_dir": "../ckpt/mgen_vl_lora_sft_ckpts",
  "overwrite_output_dir": true,
  "remove_unused_columns": false,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 16,
  "per_device_eval_batch_size": 1,
  "eval_accumulation_steps": 16,
  "num_train_epochs": 1,
  "learning_rate": 1e-05,
  "weight_decay": 0.1,
  "adam_beta2": 0.95,
  "warmup_ratio": 0.01,
  "lr_scheduler_type": "cosine",
  "logging_steps": 1,
  "save_steps": 10,
  "max_steps": 10,
  "evaluation_strategy": "epoch",
  "save_strategy": "steps",
  "max_length": 2048,
  "bf16": false,
  "fp16": true,
  "fp16_opt_level": "O2",
  "do_train": true,
  "do_eval": false,
  "disable_tqdm": true,
  "load_best_model_at_end": false,
  "eval_with_do_generation": false,
  "skip_memory_metrics": false,
  "benchmark": false,
  "save_total_limit": 2,
  "freeze_include": ["*visual*"],
  "freeze_exclude": ["*visual.attn_pool*"],
  "lora": true,
  "lora_rank": 64,
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "lora_target_modules": [".*attn.c_attn.*", ".*attn.c_proj.*", ".*mlp.w1.*", ".*mlp.w2.*"]
}

Parameter Descriptions

Training parameters that participants may customize:

"model_name_or_path" #设置实际使用的模型名称或模型路径 "dataset": { "train":[{"name": "chatml_dataset", "data_files": "train.json"}], "eval": [{"name": "chatml_dataset", "data_files": "val.json"}] }, #数据集配置 "output_dir": #模型存储路径 "overwrite_output_dir": # 覆盖输出目录,默认False "per_device_train_batch_size": #训练batch大小 “gradient_accumulation_steps”: #在执行backward更新过程之前,用于累积梯度的更新步骤数。 "per_device_eval_batch_size" #评估batch大小 "eval_accumulation_steps" : 评估累积步数 "save_strategy": #训练期间要采用保存模型策略。可选择: #“no”:在训练期间不进行任何保存。 #“epoch”:每个epoch后保存。 #“steps”:每“Save_steps”保存一次。 "save_steps": #每多少个steps保存一次模型 "max_steps": #最大训练steps "save_total_limit": #最多保存多少个模型 "evaluation_strategy": #评估策略。可选择: #“no”:在训练期间不进行任何评估。 #“epoch”:每个epoch后评估。 #“steps”:每“eval_steps”评估一次。 "do_train": #是否进行训练 "bf16": #是否使用bf16训练,默认False,仅支持a100,h100 "fp16": #是使用fp16训练,默认True "fp16_opt_level": #混合精度训练等级,可选O1,O2 "learning_rate": #学习率 "adam_beta2": #optimizer中beta2参数 "warmup_ratio": #学习率warm up比例 "weight_decay": #权重衰减 "lr_scheduler_type": #学习率衰减策略,可选cosine、linear "logging_steps": #日志打印间隔 "max_length": #模型最大长度,默认2048 "benchmark": #是否开启benchmark模式,默认False "skip_memory_metrics": #是否跳过内存指标,默认False "freeze_include" #设置需要冻结的层,如["*visual*"],默认None "freeze_exclude" #设置不需要冻结的层,如["*visual.attn_pool*"],默认None "lora": #是否使用LoRA策略,默认False "lora_rank": #LoRA rank "lora_alpha": #LoRA alpha "lora_dropout": #LoRA dropout "lora_target_modules": #LoRA target modules,如 [ ".*attn.c_attn.*", ".*attn.c_proj.*", ".*mlp.w1.*", ".*mlp.w2.*"] "tensor_parallel_degree": # 模型并行系数,设置为N则进行N卡间模型并行。 "sharding_parallel_degree": #显存优化策略,可选参数。详情参考 [《ZeRO: Memory Optimizations Toward Training Trillion Parameter Models》](https://arxiv.org/abs/1910.02054) "sharding": #显存优化策略stage选择,目前支持stage1、stage2。 "pipeline_parallel_degree": #流水线并行。详情参考[飞桨大语言模型工具链](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/llm/README.md) 

C. Model Training

In [4]

# Train using the prepared configuration file
# LoRA fine-tuning needs about 25 GB of GPU memory; full SFT training needs more than 32 GB
%cd /home/aistudio/work/paddlemix
# Single-GPU LoRA fine-tuning
!python supervised_finetune.py config/mgen_vl/lora_sft_argument.json

/home/aistudio/work/paddlemix /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") [2024-06-03 16:11:13,755] [    INFO] - The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-). [2024-06-03 16:11:13,756] [    INFO] - ============================================================ [2024-06-03 16:11:13,756] [    INFO] -      Model Configuration Arguments [2024-06-03 16:11:13,756] [    INFO] - paddle commit id              : fbf852dd832bc0e63ae31cd4aa37defd829e4c03 [2024-06-03 16:11:13,756] [    INFO] - paddlenlp commit id           : adf9e6fac8a915327f72dc2543242bd14ed9580d [2024-06-03 16:11:13,756] [    INFO] - aistudio_repo_id              : None [2024-06-03 16:11:13,756] [    INFO] - aistudio_repo_license         : Apache License 2.0 [2024-06-03 16:11:13,756] [    INFO] - aistudio_repo_private         : True [2024-06-03 16:11:13,757] [    INFO] - aistudio_token                : None [2024-06-03 16:11:13,757] [    INFO] - freeze_exclude                : ['*visual.attn_pool*'] [2024-06-03 16:11:13,757] [    INFO] - freeze_include                : ['*visual*'] [2024-06-03 16:11:13,757] [    INFO] - from_aistudio                 : False [2024-06-03 16:11:13,757] [    INFO] - lora                          : True [2024-06-03 16:11:13,757] [    INFO] - lora_alpha                    : 16 [2024-06-03 16:11:13,757] [    INFO] - lora_dropout                  : 0.05 [2024-06-03 16:11:13,757] [    INFO] - lora_path                     : None [2024-06-03 16:11:13,757] [    INFO] - lora_rank                     : 64 [2024-06-03 16:11:13,757] [    INFO] - lora_target_modules           : ['.*attn.c_attn.*', '.*attn.c_proj.*', 
'.*mlp.w1.*', '.*mlp.w2.*'] [2024-06-03 16:11:13,757] [    INFO] - model_name_or_path            : ../weights/pretrained_models/mgen-vl-chat-7b [2024-06-03 16:11:13,757] [    INFO] - neftune                       : False [2024-06-03 16:11:13,757] [    INFO] - neftune_noise_alpha           : 5.0 [2024-06-03 16:11:13,757] [    INFO] - num_prefix_tokens             : 128 [2024-06-03 16:11:13,757] [    INFO] - prefix_tuning                 : False [2024-06-03 16:11:13,757] [    INFO] - save_to_aistudio              : False [2024-06-03 16:11:13,757] [    INFO] - text_model_name_or_path       : None [2024-06-03 16:11:13,757] [    INFO] - use_flash_attention           : False [2024-06-03 16:11:13,757] [    INFO] - [2024-06-03 16:11:13,758] [    INFO] - ============================================================ [2024-06-03 16:11:13,758] [    INFO] -       Data Configuration Arguments [2024-06-03 16:11:13,758] [    INFO] - paddle commit id              : fbf852dd832bc0e63ae31cd4aa37defd829e4c03 [2024-06-03 16:11:13,758] [    INFO] - paddlenlp commit id           : adf9e6fac8a915327f72dc2543242bd14ed9580d [2024-06-03 16:11:13,758] [    INFO] - chat_template                 : None [2024-06-03 16:11:13,758] [    INFO] - dataset                       : {'train': [{'name': 'chatml_dataset', 'data_files': '../dataset/train_demo/train.json', 'chat_template': '../dataset/train_demo/chat_template.json'}], 'eval': [{'name': 'chatml_dataset', 'data_files': '../dataset/train_demo/val.json', 'chat_template': '../dataset/train_demo/chat_template.json'}]} [2024-06-03 16:11:13,758] [    INFO] - eval_with_do_generation       : False [2024-06-03 16:11:13,758] [    INFO] - lazy                          : False [2024-06-03 16:11:13,758] [    INFO] - max_length                    : 2048 [2024-06-03 16:11:13,758] [    INFO] - mixtoken                      : False [2024-06-03 16:11:13,758] [    INFO] - save_generation_output        : False [2024-06-03 16:11:13,758] [    INFO] - splits           
             : None [2024-06-03 16:11:13,758] [    INFO] - src_length                    : 1024 [2024-06-03 16:11:13,758] [    INFO] - task_name                     : None [2024-06-03 16:11:13,758] [    INFO] - [2024-06-03 16:11:13,758] [    INFO] - ============================================================ [2024-06-03 16:11:13,758] [    INFO] -    Generation Configuration Arguments [2024-06-03 16:11:13,759] [    INFO] - paddle commit id              : fbf852dd832bc0e63ae31cd4aa37defd829e4c03 [2024-06-03 16:11:13,759] [    INFO] - paddlenlp commit id           : adf9e6fac8a915327f72dc2543242bd14ed9580d [2024-06-03 16:11:13,759] [    INFO] - top_k                         : 1 [2024-06-03 16:11:13,759] [    INFO] - top_p                         : 1.0 [2024-06-03 16:11:13,759] [    INFO] - [2024-06-03 16:11:13,759] [ WARNING] - Process rank: -1, device: gpu, world_size: 1, distributed training: False, 16-bits training: True [2024-06-03 16:11:13,759] [    INFO] - We are using <class 'paddlenlp.transformers.mgen.configuration.MGenConfig'> to load '../weights/pretrained_models/mgen-vl-chat-7b'. [2024-06-03 16:11:13,759] [    INFO] - Loading configuration file ../weights/pretrained_models/mgen-vl-chat-7b/config.json [2024-06-03 16:11:13,760] [    INFO] - We are using <class 'models.mgen_vl.modeling.MGenLMHeadModel'> to load '../weights/pretrained_models/mgen-vl-chat-7b'. [2024-06-03 16:11:13,764] [    INFO] - Loading weights file ../weights/pretrained_models/mgen-vl-chat-7b/model_state.pdparams [2024-06-03 16:11:48,504] [    INFO] - Loaded weights file from disk, setting weights to model. W0603 16:11:48.516564 80910 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8 W0603 16:11:48.517910 80910 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9. 
[2024-06-03 16:11:54,896] [    INFO] - Downloading config.json from https://bj.bcebos.com/paddlenlp/models/community/mgen-7b/config.json 100%|██████████████████████████████████████████| 867/867 [00:00<00:00, 4.97MB/s] [2024-06-03 16:11:54,962] [    INFO] - We are using <class 'paddlenlp.transformers.mgen.configuration.MGenConfig'> to load 'mgen-7b'. [2024-06-03 16:11:55,010] [    INFO] - Found /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 16:11:55,011] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 16:13:03,448] [    INFO] - All model checkpoint weights were used when initializing MGenLMHeadModel. [2024-06-03 16:13:03,449] [    INFO] - All the weights of MGenLMHeadModel were initialized from the model checkpoint at ../weights/pretrained_models/mgen-vl-chat-7b. If your task is similar to the task the model of the checkpoint was trained on, you can already use MGenLMHeadModel for predictions without further training. [2024-06-03 16:13:03,455] [    INFO] - Loading configuration file ../weights/pretrained_models/mgen-vl-chat-7b/generation_config.json [2024-06-03 16:13:04,351] [    INFO] - Freeze parameters: ['*visual*'] and exclude parameters: ['*visual.attn_pool*'] [2024-06-03 16:13:04,358] [    INFO] - Freeze parameters successfully. [2024-06-03 16:13:04,360] [    INFO] - We are using <class 'models.mgen_vl.tokenizer.MGenVLTokenizer'> to load '../weights/pretrained_models/mgen-vl-chat-7b'. [2024-06-03 16:13:05,173] [    INFO] - We are using <class 'models.mgen_vl.tokenizer.MGenVLTokenizer'> to load '../weights/pretrained_models/mgen-vl-chat-7b'. [2024-06-03 16:13:06,272] [    INFO] - Frozen parameters: 9.66e+09 || Trainable parameters:1.12e+08 || Total parameters:9.77e+09|| Trainable:1.15% [2024-06-03 16:13:06,487] [    INFO] - The global seed is set to 42, local seed is set to 43 and random seed is set to 42. 
[2024-06-03 16:13:06,545] [    INFO] - max_steps is given, it will override any value given in num_train_epochs [2024-06-03 16:13:06,545] [    INFO] - Using half precision [2024-06-03 16:13:06,583] [    INFO] - ============================================================ [2024-06-03 16:13:06,583] [    INFO] -     Training Configuration Arguments [2024-06-03 16:13:06,583] [    INFO] - paddle commit id              : fbf852dd832bc0e63ae31cd4aa37defd829e4c03 [2024-06-03 16:13:06,584] [    INFO] - paddlenlp commit id           : adf9e6fac8a915327f72dc2543242bd14ed9580d [2024-06-03 16:13:06,584] [    INFO] - _no_sync_in_gradient_accumulation: True [2024-06-03 16:13:06,584] [    INFO] - adam_beta1                    : 0.9 [2024-06-03 16:13:06,584] [    INFO] - adam_beta2                    : 0.95 [2024-06-03 16:13:06,584] [    INFO] - adam_epsilon                  : 1e-08 [2024-06-03 16:13:06,584] [    INFO] - amp_custom_black_list         : None [2024-06-03 16:13:06,584] [    INFO] - amp_custom_white_list         : None [2024-06-03 16:13:06,584] [    INFO] - amp_master_grad               : False [2024-06-03 16:13:06,584] [    INFO] - benchmark                     : False [2024-06-03 16:13:06,585] [    INFO] - bf16                          : False [2024-06-03 16:13:06,585] [    INFO] - bf16_full_eval                : False [2024-06-03 16:13:06,585] [    INFO] - current_device                : gpu:0 [2024-06-03 16:13:06,585] [    INFO] - data_parallel_rank            : 0 [2024-06-03 16:13:06,585] [    INFO] - dataloader_drop_last          : False [2024-06-03 16:13:06,585] [    INFO] - dataloader_num_workers        : 0 [2024-06-03 16:13:06,585] [    INFO] - dataset_rank                  : 0 [2024-06-03 16:13:06,585] [    INFO] - dataset_world_size            : 1 [2024-06-03 16:13:06,585] [    INFO] - device                        : gpu [2024-06-03 16:13:06,585] [    INFO] - disable_tqdm                  : True [2024-06-03 16:13:06,585] [    INFO] - distributed_dataloader   
     : False [2024-06-03 16:13:06,585] [    INFO] - do_eval                       : True [2024-06-03 16:13:06,585] [    INFO] - do_export                     : False [2024-06-03 16:13:06,586] [    INFO] - do_predict                    : False [2024-06-03 16:13:06,586] [    INFO] - do_train                      : True [2024-06-03 16:13:06,586] [    INFO] - eta_min                       : 1e-05 [2024-06-03 16:13:06,586] [    INFO] - eval_accumulation_steps       : 16 [2024-06-03 16:13:06,586] [    INFO] - eval_batch_size               : 1 [2024-06-03 16:13:06,586] [    INFO] - eval_steps                    : None [2024-06-03 16:13:06,586] [    INFO] - evaluation_strategy           : IntervalStrategy.EPOCH [2024-06-03 16:13:06,586] [    INFO] - flatten_param_grads           : False [2024-06-03 16:13:06,586] [    INFO] - force_reshard_pp              : False [2024-06-03 16:13:06,586] [    INFO] - fp16                          : True [2024-06-03 16:13:06,586] [    INFO] - fp16_full_eval                : False [2024-06-03 16:13:06,586] [    INFO] - fp16_opt_level                : O2 [2024-06-03 16:13:06,587] [    INFO] - gradient_accumulation_steps   : 16 [2024-06-03 16:13:06,587] [    INFO] - greater_is_better             : None [2024-06-03 16:13:06,587] [    INFO] - group_by_modality_length      : False [2024-06-03 16:13:06,587] [    INFO] - hybrid_parallel_topo_order    : None [2024-06-03 16:13:06,587] [    INFO] - ignore_data_skip              : False [2024-06-03 16:13:06,587] [    INFO] - ignore_load_lr_and_optim      : False [2024-06-03 16:13:06,587] [    INFO] - label_names                   : None [2024-06-03 16:13:06,587] [    INFO] - lazy_data_processing          : True [2024-06-03 16:13:06,587] [    INFO] - learning_rate                 : 1e-05 [2024-06-03 16:13:06,587] [    INFO] - load_best_model_at_end        : False [2024-06-03 16:13:06,587] [    INFO] - load_sharded_model            : False [2024-06-03 16:13:06,587] [    INFO] - local_process_index        
   : 0 [2024-06-03 16:13:06,587] [    INFO] - local_rank                    : -1 [2024-06-03 16:13:06,588] [    INFO] - log_level                     : -1 [2024-06-03 16:13:06,588] [    INFO] - log_level_replica             : -1 [2024-06-03 16:13:06,588] [    INFO] - log_on_each_node              : True [2024-06-03 16:13:06,588] [    INFO] - logging_dir                   : ../ckpt/mgen_vl_lora_sft_ckpts/runs/Jun03_16-11-13_jupyter-248974-7742098 [2024-06-03 16:13:06,588] [    INFO] - logging_first_step            : False [2024-06-03 16:13:06,588] [    INFO] - logging_steps                 : 1 [2024-06-03 16:13:06,588] [    INFO] - logging_strategy              : IntervalStrategy.STEPS [2024-06-03 16:13:06,588] [    INFO] - logical_process_index         : 0 [2024-06-03 16:13:06,588] [    INFO] - lr_end                        : 1e-07 [2024-06-03 16:13:06,588] [    INFO] - lr_scheduler_name             : CosineDecayWithWarmup [2024-06-03 16:13:06,588] [    INFO] - lr_scheduler_type             : SchedulerType.COSINE [2024-06-03 16:13:06,589] [    INFO] - max_evaluate_steps            : -1 [2024-06-03 16:13:06,589] [    INFO] - max_grad_norm                 : 1.0 [2024-06-03 16:13:06,589] [    INFO] - max_steps                     : 10 [2024-06-03 16:13:06,589] [    INFO] - metric_for_best_model         : None [2024-06-03 16:13:06,589] [    INFO] - minimum_eval_times            : None [2024-06-03 16:13:06,589] [    INFO] - mm_projector_lr               : None [2024-06-03 16:13:06,589] [    INFO] - no_cuda                       : False [2024-06-03 16:13:06,589] [    INFO] - num_cycles                    : 0.5 [2024-06-03 16:13:06,589] [    INFO] - num_train_epochs              : 1 [2024-06-03 16:13:06,589] [    INFO] - optim                         : OptimizerNames.ADAMW [2024-06-03 16:13:06,589] [    INFO] - optimizer_name_suffix         : None [2024-06-03 16:13:06,589] [    INFO] - output_dir                    : ../ckpt/mgen_vl_lora_sft_ckpts [2024-06-03 
16:13:06,589] [    INFO] - overwrite_output_dir          : True [2024-06-03 16:13:06,590] [    INFO] - past_index                    : -1 [2024-06-03 16:13:06,590] [    INFO] - per_device_eval_batch_size    : 1 [2024-06-03 16:13:06,590] [    INFO] - per_device_train_batch_size   : 1 [2024-06-03 16:13:06,590] [    INFO] - pipeline_parallel_config      : [2024-06-03 16:13:06,590] [    INFO] - pipeline_parallel_degree      : -1 [2024-06-03 16:13:06,590] [    INFO] - pipeline_parallel_rank        : 0 [2024-06-03 16:13:06,590] [    INFO] - power                         : 1.0 [2024-06-03 16:13:06,590] [    INFO] - prediction_loss_only          : False [2024-06-03 16:13:06,590] [    INFO] - process_index                 : 0 [2024-06-03 16:13:06,590] [    INFO] - profiler_options              : None [2024-06-03 16:13:06,590] [    INFO] - recompute                     : False [2024-06-03 16:13:06,590] [    INFO] - remove_unused_columns         : False [2024-06-03 16:13:06,590] [    INFO] - report_to                     : ['visualdl'] [2024-06-03 16:13:06,591] [    INFO] - resume_from_checkpoint        : None [2024-06-03 16:13:06,591] [    INFO] - run_name                      : ../ckpt/mgen_vl_lora_sft_ckpts [2024-06-03 16:13:06,591] [    INFO] - save_on_each_node             : False [2024-06-03 16:13:06,591] [    INFO] - save_sharded_model            : False [2024-06-03 16:13:06,591] [    INFO] - save_steps                    : 10 [2024-06-03 16:13:06,591] [    INFO] - save_strategy                 : IntervalStrategy.STEPS [2024-06-03 16:13:06,591] [    INFO] - save_total_limit              : 2 [2024-06-03 16:13:06,591] [    INFO] - scale_loss                    : 32768 [2024-06-03 16:13:06,591] [    INFO] - seed                          : 42 [2024-06-03 16:13:06,591] [    INFO] - sep_parallel_degree           : -1 [2024-06-03 16:13:06,591] [    INFO] - sharding                      : [] [2024-06-03 16:13:06,591] [    INFO] - sharding_degree               : -1 [2024-06-03 
16:13:06,591] [    INFO] - sharding_parallel_config      : [2024-06-03 16:13:06,591] [    INFO] - sharding_parallel_degree      : -1 [2024-06-03 16:13:06,592] [    INFO] - sharding_parallel_rank        : 0 [2024-06-03 16:13:06,592] [    INFO] - should_load_dataset           : True [2024-06-03 16:13:06,592] [    INFO] - should_load_sharding_stage1_model: False [2024-06-03 16:13:06,592] [    INFO] - should_log                    : True [2024-06-03 16:13:06,592] [    INFO] - should_save                   : True [2024-06-03 16:13:06,592] [    INFO] - should_save_model_state       : True [2024-06-03 16:13:06,592] [    INFO] - should_save_sharding_stage1_model: False [2024-06-03 16:13:06,592] [    INFO] - skip_memory_metrics           : False [2024-06-03 16:13:06,592] [    INFO] - skip_profile_timer            : True [2024-06-03 16:13:06,592] [    INFO] - tensor_parallel_config        : [2024-06-03 16:13:06,592] [    INFO] - tensor_parallel_degree        : -1 [2024-06-03 16:13:06,592] [    INFO] - tensor_parallel_rank          : 0 [2024-06-03 16:13:06,593] [    INFO] - to_static                     : False [2024-06-03 16:13:06,593] [    INFO] - train_batch_size              : 1 [2024-06-03 16:13:06,593] [    INFO] - unified_checkpoint            : False [2024-06-03 16:13:06,593] [    INFO] - unified_checkpoint_config     : [2024-06-03 16:13:06,593] [    INFO] - use_auto_parallel             : False [2024-06-03 16:13:06,593] [    INFO] - use_hybrid_parallel           : False [2024-06-03 16:13:06,593] [    INFO] - warmup_ratio                  : 0.01 [2024-06-03 16:13:06,593] [    INFO] - warmup_start_lr               : 1e-06 [2024-06-03 16:13:06,593] [    INFO] - warmup_steps                  : 0 [2024-06-03 16:13:06,593] [    INFO] - weight_decay                  : 0.1 [2024-06-03 16:13:06,593] [    INFO] - weight_name_suffix            : None [2024-06-03 16:13:06,593] [    INFO] - world_size                    : 1 [2024-06-03 16:13:06,593] [    INFO] - [2024-06-03 
16:13:06,796] [    INFO] - Starting training from resume_from_checkpoint : None /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/distributed/parallel.py:410: UserWarning: The program will return to single-card operation. Please check 1, whether you use spawn or fleetrun to start the program. 2, Whether it is a multi-card program. 3, Is the current environment multi-card. warnings.warn( [2024-06-03 16:13:07,009] [    INFO] - ***** Running training ***** [2024-06-03 16:13:07,009] [    INFO] -   Num examples = 941 [2024-06-03 16:13:07,009] [    INFO] -   Num Epochs = 1 [2024-06-03 16:13:07,009] [    INFO] -   Instantaneous batch size per device = 1 [2024-06-03 16:13:07,009] [    INFO] -   Total train batch size (w. parallel, distributed & accumulation) = 16 [2024-06-03 16:13:07,010] [    INFO] -   Gradient Accumulation steps = 16 [2024-06-03 16:13:07,010] [    INFO] -   Total optimization steps = 10 [2024-06-03 16:13:07,010] [    INFO] -   Total num train samples = 160 [2024-06-03 16:13:07,014] [    INFO] -   Number of trainable parameters = 112,197,632 (per device) [2024-06-03 16:13:40,289] [    INFO] - loss: 4.10069227, learning_rate: 9.755e-06, global_step: 1, interval_runtime: 33.2723, interval_samples_per_second: 0.48088113648145003, interval_steps_per_second: 0.030055071030090627, cpu_mem_used: 3597, cpu_mem_used_peak: 3597, gpu_max_memory_allocated: 39711, gpu_max_memory_reserved: 22892, epoch: 0.017 [2024-06-03 16:14:12,932] [    INFO] - loss: 3.83077121, learning_rate: 9.045e-06, global_step: 2, interval_runtime: 32.6433, interval_samples_per_second: 0.49014685168038125, interval_steps_per_second: 0.030634178230023828, cpu_mem_used: 3599, cpu_mem_used_peak: 3599, gpu_max_memory_allocated: 40997, gpu_max_memory_reserved: 22892, epoch: 0.034 [2024-06-03 16:14:45,624] [    INFO] - loss: 4.09264421, learning_rate: 7.939e-06, global_step: 3, interval_runtime: 32.6919, interval_samples_per_second: 0.4894185031432671, 
interval_steps_per_second: 0.030588656446454195, cpu_mem_used: 3601, cpu_mem_used_peak: 3601, gpu_max_memory_allocated: 40997, gpu_max_memory_reserved: 24079, epoch: 0.051 [2024-06-03 16:15:18,206] [    INFO] - loss: 3.98077416, learning_rate: 6.545e-06, global_step: 4, interval_runtime: 32.5826, interval_samples_per_second: 0.49105903631278036, interval_steps_per_second: 0.030691189769548773, cpu_mem_used: 3602, cpu_mem_used_peak: 3602, gpu_max_memory_allocated: 40997, gpu_max_memory_reserved: 24079, epoch: 0.068 [2024-06-03 16:15:50,844] [    INFO] - loss: 4.17841101, learning_rate: 5e-06, global_step: 5, interval_runtime: 32.6377, interval_samples_per_second: 0.490230646721887, interval_steps_per_second: 0.030639415420117937, cpu_mem_used: 3604, cpu_mem_used_peak: 3604, gpu_max_memory_allocated: 40997, gpu_max_memory_reserved: 24079, epoch: 0.085 [2024-06-03 16:16:23,506] [    INFO] - loss: 4.15968752, learning_rate: 3.455e-06, global_step: 6, interval_runtime: 32.6626, interval_samples_per_second: 0.4898575328425619, interval_steps_per_second: 0.03061609580266012, cpu_mem_used: 3606, cpu_mem_used_peak: 3606, gpu_max_memory_allocated: 40997, gpu_max_memory_reserved: 24079, epoch: 0.102 [2024-06-03 16:16:56,197] [    INFO] - loss: 4.08507586, learning_rate: 2.061e-06, global_step: 7, interval_runtime: 32.6903, interval_samples_per_second: 0.48944121553963, interval_steps_per_second: 0.030590075971226875, cpu_mem_used: 3607, cpu_mem_used_peak: 3607, gpu_max_memory_allocated: 40997, gpu_max_memory_reserved: 24079, epoch: 0.119 [2024-06-03 16:17:28,905] [    INFO] - loss: 4.09213161, learning_rate: 9.549e-07, global_step: 8, interval_runtime: 32.7079, interval_samples_per_second: 0.4891788932419736, interval_steps_per_second: 0.03057368082762335, cpu_mem_used: 3609, cpu_mem_used_peak: 3609, gpu_max_memory_allocated: 40997, gpu_max_memory_reserved: 24079, epoch: 0.136 [2024-06-03 16:18:01,683] [    INFO] - loss: 4.00875616, learning_rate: 2.447e-07, global_step: 9, 
interval_runtime: 32.7787, interval_samples_per_second: 0.48812212421484724, interval_steps_per_second: 0.030507632763427953, cpu_mem_used: 3611, cpu_mem_used_peak: 3611, gpu_max_memory_allocated: 40997, gpu_max_memory_reserved: 24079, epoch: 0.153 [2024-06-03 16:18:34,424] [    INFO] - loss: 4.09759045, learning_rate: 0.0, global_step: 10, interval_runtime: 32.7402, interval_samples_per_second: 0.48869639076058796, interval_steps_per_second: 0.030543524422536748, cpu_mem_used: 3612, cpu_mem_used_peak: 3612, gpu_max_memory_allocated: 40997, gpu_max_memory_reserved: 24079, epoch: 0.17 [2024-06-03 16:18:34,424] [    INFO] - Saving model checkpoint to ../ckpt/mgen_vl_lora_sft_ckpts/checkpoint-10 [2024-06-03 16:18:34,425] [    INFO] - tokenizer config file saved in ../ckpt/mgen_vl_lora_sft_ckpts/checkpoint-10/tokenizer_config.json [2024-06-03 16:18:34,425] [    INFO] - Special tokens file saved in ../ckpt/mgen_vl_lora_sft_ckpts/checkpoint-10/special_tokens_map.json [2024-06-03 16:18:34,441] [    INFO] - Chat-template config file saved in ../ckpt/mgen_vl_lora_sft_ckpts/checkpoint-10/chat_template.json [2024-06-03 16:18:35,168] [    INFO] - Configuration saved in ../ckpt/mgen_vl_lora_sft_ckpts/checkpoint-10/config.json [2024-06-03 16:18:35,168] [    INFO] - Saving optimizer files. [2024-06-03 16:18:39,763] [    INFO] - ***** Running Evaluation ***** [2024-06-03 16:18:39,763] [    INFO] -   Num examples = 50 [2024-06-03 16:18:39,763] [    INFO] -   Total prediction steps = 50 [2024-06-03 16:18:39,763] [    INFO] -   Pre device batch size = 1 [2024-06-03 16:18:39,764] [    INFO] -   Total Batch size = 1 [2024-06-03 16:19:12,510] [    INFO] - eval_loss: 4.002026557922363, eval_runtime: 32.7465, eval_samples_per_second: 1.5268818574332808, eval_steps_per_second: 1.5268818574332808, epoch: 0.17 [2024-06-03 16:19:12,511] [    INFO] - Training completed. 
[2024-06-03 16:19:12,789] [    INFO] - train_runtime: 365.497, train_samples_per_second: 0.43776011449488345, train_steps_per_second: 0.027360007155930215, train_loss: 4.06265344619751, gpu_mem_max_memory_allocated: 42988742656, gpu_mem_max_memory_reserved: 25248953088, init_mem_cpu_alloc_delta: 442368, init_mem_gpu_alloc_delta: -100686336, init_mem_cpu_peaked_delta: 0, init_mem_gpu_peaked_delta: 1120950784, train_mem_cpu_alloc_delta: 2769502208, train_mem_gpu_alloc_delta: 1497066496, train_mem_cpu_peaked_delta: 0, train_mem_gpu_peaked_delta: 2536745472, before_init_mem_cpu: 3508011008, before_init_mem_gpu: 39055617024, epoch: 0.17 [2024-06-03 16:19:12,790] [    INFO] - Saving model checkpoint to ../ckpt/mgen_vl_lora_sft_ckpts [2024-06-03 16:19:12,790] [    INFO] - tokenizer config file saved in ../ckpt/mgen_vl_lora_sft_ckpts/tokenizer_config.json [2024-06-03 16:19:12,791] [    INFO] - Special tokens file saved in ../ckpt/mgen_vl_lora_sft_ckpts/special_tokens_map.json [2024-06-03 16:19:12,803] [    INFO] - Chat-template config file saved in ../ckpt/mgen_vl_lora_sft_ckpts/chat_template.json [2024-06-03 16:19:13,370] [    INFO] - Configuration saved in ../ckpt/mgen_vl_lora_sft_ckpts/config.json [2024-06-03 16:19:13,371] [    INFO] - ***** train metrics ***** [2024-06-03 16:19:13,371] [    INFO] -   before_init_mem_cpu          =     3345MB [2024-06-03 16:19:13,371] [    INFO] -   before_init_mem_gpu          =    37246MB [2024-06-03 16:19:13,371] [    INFO] -   epoch                        =       0.17 [2024-06-03 16:19:13,371] [    INFO] -   gpu_mem_max_memory_allocated =    40997MB [2024-06-03 16:19:13,371] [    INFO] -   gpu_mem_max_memory_reserved  =    24079MB [2024-06-03 16:19:13,371] [    INFO] -   init_mem_cpu_alloc_delta     =        0MB [2024-06-03 16:19:13,371] [    INFO] -   init_mem_cpu_peaked_delta    =        0MB [2024-06-03 16:19:13,371] [    INFO] -   init_mem_gpu_alloc_delta     =      -97MB [2024-06-03 16:19:13,371] [    INFO] -   
init_mem_gpu_peaked_delta    =     1069MB [2024-06-03 16:19:13,371] [    INFO] -   train_loss                   =     4.0627 [2024-06-03 16:19:13,371] [    INFO] -   train_mem_cpu_alloc_delta    =     2641MB [2024-06-03 16:19:13,372] [    INFO] -   train_mem_cpu_peaked_delta   =        0MB [2024-06-03 16:19:13,372] [    INFO] -   train_mem_gpu_alloc_delta    =     1427MB [2024-06-03 16:19:13,372] [    INFO] -   train_mem_gpu_peaked_delta   =     2419MB [2024-06-03 16:19:13,372] [    INFO] -   train_runtime                = 0:06:05.49 [2024-06-03 16:19:13,372] [    INFO] -   train_samples_per_second     =     0.4378 [2024-06-03 16:19:13,372] [    INFO] -   train_steps_per_second       =     0.0274
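The trainer emits one log line per optimization step, so the loss curve can be recovered from the log text with a regular expression over the `loss` and `global_step` fields shown above. A small sketch (the field names are taken from the log output; the helper name is illustrative):

```python
import re

# Matches the "loss: ..." and "global_step: ..." fields in a trainer log line.
STEP_RE = re.compile(r"loss: ([0-9.]+),.*?global_step: (\d+)")

def parse_loss_curve(log_text):
    """Return a list of (global_step, loss) pairs extracted from trainer logs."""
    return [(int(step), float(loss)) for loss, step in STEP_RE.findall(log_text)]

line = "[2024-06-03 16:13:40,289] [    INFO] - loss: 4.10069227, learning_rate: 9.755e-06, global_step: 1"
print(parse_loss_curve(line))  # [(1, 4.10069227)]
```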

D. Model Inference


In [5]

# After training, pass the saved weights to run model inference.
# If LoRA fine-tuning was used, the LoRA parameters must be merged first; the provided
# merge script folds them into the backbone model and saves the merged weights.
# Remove intermediate checkpoints
%cd /home/aistudio/work
!rm -rf ckpt/mgen_vl_lora_sft_ckpts/checkpoint-*
# Merge the LoRA parameters
%cd /home/aistudio/work/paddlemix
!python merge_lora_params.py --model_name_or_path ../weights/pretrained_models/mgen-vl-chat-7b \
    --lora_path ../ckpt/mgen_vl_lora_sft_ckpts \
    --merge_model_path ../ckpt/mgen_vl_lora_merge
!ls ../ckpt/mgen_vl_lora_merge
# With the LoRA parameters merged, load the merged weights for inference
!python chat.py ../ckpt/mgen_vl_lora_merge

/home/aistudio/work /home/aistudio/work/paddlemix /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") [2024-06-03 16:19:39,206] [    INFO] - We are using <class 'paddlenlp.transformers.mgen.configuration.MGenConfig'> to load '../weights/pretrained_models/mgen-vl-chat-7b'. [2024-06-03 16:19:39,207] [    INFO] - Loading configuration file ../weights/pretrained_models/mgen-vl-chat-7b/config.json [2024-06-03 16:19:39,207] [    INFO] - We are using <class 'models.mgen_vl.modeling.MGenLMHeadModel'> to load '../weights/pretrained_models/mgen-vl-chat-7b'. [2024-06-03 16:19:39,208] [    INFO] - Loading weights file ../weights/pretrained_models/mgen-vl-chat-7b/model_state.pdparams [2024-06-03 16:21:54,116] [    INFO] - Loaded weights file from disk, setting weights to model. W0603 16:21:54.125304 96853 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8 W0603 16:21:54.126637 96853 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9. [2024-06-03 16:22:00,303] [    INFO] - Found /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 16:22:00,305] [    INFO] - We are using <class 'paddlenlp.transformers.mgen.configuration.MGenConfig'> to load 'mgen-7b'. [2024-06-03 16:22:00,357] [    INFO] - Found /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 16:22:00,358] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 16:23:09,180] [    INFO] - All model checkpoint weights were used when initializing MGenLMHeadModel. [2024-06-03 16:23:09,181] [    INFO] - All the weights of MGenLMHeadModel were initialized from the model checkpoint at ../weights/pretrained_models/mgen-vl-chat-7b. 
If your task is similar to the task the model of the checkpoint was trained on, you can already use MGenLMHeadModel for predictions without further training. [2024-06-03 16:23:09,188] [    INFO] - Loading configuration file ../weights/pretrained_models/mgen-vl-chat-7b/generation_config.json [2024-06-03 16:23:11,553] [    INFO] - Loading the LoRA weights from ../ckpt/mgen_vl_lora_sft_ckpts/lora_model_state.pdparams [2024-06-03 16:23:11,748] [    INFO] - Load lora weight successfully [2024-06-03 16:23:16,255] [    INFO] - Configuration saved in ../ckpt/mgen_vl_lora_merge/config.json [2024-06-03 16:23:16,262] [    INFO] - Configuration saved in ../ckpt/mgen_vl_lora_merge/generation_config.json [2024-06-03 16:24:52,641] [    INFO] - The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at ../ckpt/mgen_vl_lora_merge/model_state.pdparams.index.json. cp -a ../ckpt/mgen_vl_lora_sft_ckpts/trainer_state.json ../ckpt/mgen_vl_lora_merge cp -a ../ckpt/mgen_vl_lora_sft_ckpts/lora_config.json ../ckpt/mgen_vl_lora_merge cp -a ../ckpt/mgen_vl_lora_sft_ckpts/tokenizer_config.json ../ckpt/mgen_vl_lora_merge cp -a ../ckpt/mgen_vl_lora_sft_ckpts/SimSun.ttf ../ckpt/mgen_vl_lora_merge cp -a ../ckpt/mgen_vl_lora_sft_ckpts/all_results.json ../ckpt/mgen_vl_lora_merge cp -a ../ckpt/mgen_vl_lora_sft_ckpts/train_results.json ../ckpt/mgen_vl_lora_merge cp -a ../ckpt/mgen_vl_lora_sft_ckpts/training_args.bin ../ckpt/mgen_vl_lora_merge cp -a ../ckpt/mgen_vl_lora_sft_ckpts/chat_template.json ../ckpt/mgen_vl_lora_merge cp -a ../ckpt/mgen_vl_lora_sft_ckpts/special_tokens_map.json ../ckpt/mgen_vl_lora_merge cp -a ../ckpt/mgen_vl_lora_sft_ckpts/mgen.tiktoken ../ckpt/mgen_vl_lora_merge --------merge lora successfully---------- SimSun.ttf     model_state-00002-of-00002.pdparams all_results.json     model_state.pdparams.index.json chat_template.json     special_tokens_map.json 
config.json     tokenizer_config.json generation_config.json     train_results.json lora_config.json     trainer_state.json mgen.tiktoken     training_args.bin model_state-00001-of-00002.pdparams /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") [2024-06-03 16:25:00,465] [ WARNING] - bfloat16 is not supported on your device,change to float16 [2024-06-03 16:25:00,467] [    INFO] - We are using <class 'models.mgen_vl.tokenizer.MGenVLTokenizer'> to load '../ckpt/mgen_vl_lora_merge'. [2024-06-03 16:25:01,327] [    INFO] - We are using <class 'models.mgen_vl.tokenizer.MGenVLTokenizer'> to load '../ckpt/mgen_vl_lora_merge'. [2024-06-03 16:25:02,385] [    INFO] - We are using <class 'paddlenlp.transformers.mgen.configuration.MGenConfig'> to load '../ckpt/mgen_vl_lora_merge'. [2024-06-03 16:25:02,385] [    INFO] - Loading configuration file ../ckpt/mgen_vl_lora_merge/config.json [2024-06-03 16:25:02,386] [    INFO] - We are using <class 'models.mgen_vl.modeling.MGenLMHeadModel'> to load '../ckpt/mgen_vl_lora_merge'. [2024-06-03 16:25:02,387] [    INFO] - Loading weights file ../ckpt/mgen_vl_lora_merge/model_state.pdparams.index.json W0603 16:25:02.391919 106661 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8 W0603 16:25:02.393275 106661 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9. [2024-06-03 16:25:09,104] [    INFO] - Found /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 16:25:09,774] [    INFO] - We are using <class 'paddlenlp.transformers.mgen.configuration.MGenConfig'> to load 'mgen-7b'. 
[2024-06-03 16:25:09,826] [    INFO] - Found /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 16:25:09,827] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/mgen-7b/config.json Loading checkpoint shards: 100%|██████████████████| 2/2 [02:02<00:00, 61.38s/it] [2024-06-03 16:27:41,779] [    INFO] - All model checkpoint weights were used when initializing MGenLMHeadModel. [2024-06-03 16:27:41,780] [    INFO] - All the weights of MGenLMHeadModel were initialized from the model checkpoint at ../ckpt/mgen_vl_lora_merge. If your task is similar to the task the model of the checkpoint was trained on, you can already use MGenLMHeadModel for predictions without further training. [2024-06-03 16:27:41,786] [    INFO] - Loading configuration file ../ckpt/mgen_vl_lora_merge/generation_config.json W0603 16:27:44.315424 106661 dygraph_functions.cc:52647] got different data type, run type protmotion automatically, this may cause data type been changed. prompt: 请描述图片内容 response: 这张图片展示了一条城市街道,路上有车辆和行人。一辆红色的 Beacon Bus 正在行驶,而一辆银色的汽车紧随其后。另外两辆汽车则停在距离红色巴士较远的地方。街道上还有一名行人,走在巴士和汽车之间。 在场景的中心,有一个公交车站,有两个人在等待。其中一人站在巴士旁边,而另一人则在距离巴士约五米处等待。此外,还有一个停在路边的摩托车。 这张图片描绘了一个繁忙的城市环境,交通在有条不紊地前进。 ------------------ It took 10.416214227676392 seconds!
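Conceptually, merging LoRA parameters folds each low-rank update back into the corresponding base weight, so inference needs no extra adapter branches. The sketch below shows the arithmetic only, using the standard LoRA convention `W' = W + (alpha / r) * A @ B`; it is not the actual PaddleMIX `merge_lora_params.py` implementation, and all names are illustrative:

```python
import numpy as np

def merge_lora_weight(W, lora_A, lora_B, lora_alpha, r):
    """Return the merged weight W + (alpha / r) * (A @ B).

    W:      (in_dim, out_dim) base weight
    lora_A: (in_dim, r) low-rank down-projection
    lora_B: (r, out_dim) low-rank up-projection
    """
    return W + (lora_alpha / r) * (lora_A @ lora_B)

# The merged layer computes the same output as base weight + LoRA branch:
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
A = rng.standard_normal((8, 2))
B = rng.standard_normal((2, 4))
x = rng.standard_normal((3, 8))
merged = merge_lora_weight(W, A, B, lora_alpha=16, r=2)
assert np.allclose(x @ merged, x @ W + (16 / 2) * (x @ A @ B))
```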

In [6]

# The model weights are large; clean up intermediate results to save space
# (AI Studio project storage is limited to 100G)
%cd /home/aistudio/work
!rm -rf ckpt/mgen_vl_lora_sft_ckpts
!rm -rf weights/pretrained_models
!rm -rf dataset/train_samples.txt

/home/aistudio/work

At this point, the model's fine-tuning and inference are complete.

(4) Preparing the Submission

This competition requires contestants to submit a submission.zip file containing the model weights, model source code, and inference script; if extra environment dependencies are needed, a dependency file must be provided as well. The submission folder should contain:

| requirements.txt -> environment dependencies
| weights          -> model weights
| run.sh           -> run script
| src              -> model source code
|   |-- predict.py -> inference script
|   |-- ...

requirements.txt should list any additional dependencies the contestant needs;

The contestant's trained model weights must be copied into the weights folder;

The src folder should contain the model source code and the inference script predict.py, which produces the inference results;

Contestants may modify the demo's predict.py as needed, as long as the inference program can still be invoked and runs end to end;

The evaluation platform calls run.sh to run inference on the evaluation dataset; run.sh in turn invokes predict.py, as follows:

sh run.sh <test data input path> <result output path> <model weights> <python environment path>

For the Python environment path: when testing locally, contestants can use the default Python environment path on AI Studio and do not need to pass this argument to the script.

Notes

  1. Contestants must make sure the submitted code runs in the evaluation environment, accepts the test set input, and produces the corresponding result file; otherwise no valid score can be obtained.
  2. Contestants must make sure the number of output results matches the number of test samples; otherwise no valid score can be obtained.
  3. Contestants must make sure the file names match the example, and that the model weights include the model configuration file config.json.

In [12]

# Use test_samples from the dataset as the test set and the LoRA-trained model to
# prepare the submission folder and run the result-generation script.
# (The competition's real test set is not provided to contestants.)
%cd /home/aistudio/work
!cp -a /home/aistudio/data/data268898/test_samples.txt /home/aistudio/work/dataset/
!mkdir submission
!mkdir submission/src
!cp -a paddlemix/* submission/src/
!cp -a paddlemix/requirements.txt submission/
!cp -a paddlemix/run.sh submission/
!mkdir submission/weights
!mv -f ckpt/mgen_vl_lora_merge submission/weights/
%cd /home/aistudio/work/submission
!sh run.sh ../../dataset/test_samples.txt ../../dataset/results.txt ../weights

/home/aistudio/work /home/aistudio/work/submission run.sh: 9: [[: not found /opt/conda/envs/python35-paddle120-env/bin/python /opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.") bfloat16 is not supported on your device,change to float16 [2024-06-03 17:22:30,846] [    INFO] - We are using <class 'models.mgen_vl.tokenizer.MGenVLTokenizer'> to load '../weights/mgen_vl_lora_merge'. [2024-06-03 17:22:31,650] [    INFO] - We are using <class 'models.mgen_vl.tokenizer.MGenVLTokenizer'> to load '../weights/mgen_vl_lora_merge'. [2024-06-03 17:22:32,252] [    INFO] - We are using <class 'paddlenlp.transformers.mgen.configuration.MGenConfig'> to load '../weights/mgen_vl_lora_merge'. [2024-06-03 17:22:32,252] [    INFO] - Loading configuration file ../weights/mgen_vl_lora_merge/config.json [2024-06-03 17:22:32,253] [    INFO] - We are using <class 'models.mgen_vl.modeling.MGenLMHeadModel'> to load '../weights/mgen_vl_lora_merge'. [2024-06-03 17:22:32,254] [    INFO] - Loading weights file ../weights/mgen_vl_lora_merge/model_state.pdparams.index.json W0603 17:22:32.258863 206732 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8 W0603 17:22:32.260078 206732 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9. [2024-06-03 17:22:38,665] [    INFO] - Found /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 17:22:38,667] [    INFO] - We are using <class 'paddlenlp.transformers.mgen.configuration.MGenConfig'> to load 'mgen-7b'. 
[2024-06-03 17:22:38,731] [    INFO] - Found /home/aistudio/.paddlenlp/models/mgen-7b/config.json [2024-06-03 17:22:38,732] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/mgen-7b/config.json Loading checkpoint shards: 100%|██████████████████| 2/2 [00:44<00:00, 22.39s/it] [2024-06-03 17:23:51,112] [    INFO] - All model checkpoint weights were used when initializing MGenLMHeadModel. [2024-06-03 17:23:51,112] [    INFO] - All the weights of MGenLMHeadModel were initialized from the model checkpoint at ../weights/mgen_vl_lora_merge. If your task is similar to the task the model of the checkpoint was trained on, you can already use MGenLMHeadModel for predictions without further training. [2024-06-03 17:23:51,119] [    INFO] - Loading configuration file ../weights/mgen_vl_lora_merge/generation_config.json ------predicting starts------ [2024-06-03 17:23:51,141] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data [2024-06-03 17:23:51,179] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data W0603 17:23:52.525509 206732 dygraph_functions.cc:52647] got different data type, run type protmotion automatically, this may cause data type been changed. 图中是一个铁质的圆形井盖,上面有圆形的孔洞,看起来比较厚实,铁质井盖中间的圆形井盖上也设计了孔洞,可以让雨水自由流通。 [2024-06-03 17:23:55,364] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data [2024-06-03 17:23:55,397] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data 图中是一名咖啡师正在磨咖啡豆,另一名员工在吧台后面等待下一步的指令。 [2024-06-03 17:23:57,192] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. 
make sure use MGenVLProcessor for get input data [2024-06-03 17:23:57,226] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data 图中是一名女子正在挖掘一种黑色的粘土。 [2024-06-03 17:23:58,340] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data [2024-06-03 17:23:58,382] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data 图中是一个小型的挖掘机,它被用于挖掘泥土。这个设备在建筑施工中非常有用,可以完成挖掘、铲土、铲运、开沟等作业。 [2024-06-03 17:24:01,026] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data [2024-06-03 17:24:01,061] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data 图中画了一部动漫中的机器人,有双剑和双炮,机器人身体绿色和蓝色。 [2024-06-03 17:24:02,798] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data [2024-06-03 17:24:02,836] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data 图中是一个学校里的学生在打篮球,四个学生在球场上进行篮球比赛,有一个学生把篮球投进了篮框,另外三个学生分别站在场地的四个角上。 [2024-06-03 17:24:05,609] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data [2024-06-03 17:24:05,637] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. 
make sure use MGenVLProcessor for get input data 图中是一个身穿白色V领连衣裙的短发女性,她面带微笑,右手撑着自己的头。 [2024-06-03 17:24:07,622] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data [2024-06-03 17:24:07,667] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data 图中是一名农民正在用铁铲处理大量的白菜,白菜堆积如山,场面壮观。 [2024-06-03 17:24:09,258] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data [2024-06-03 17:24:09,301] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data 图中有一瓶酒和一个装有烤肉的盘子,盘子上还有辣椒和花瓣点缀。 [2024-06-03 17:24:11,259] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data [2024-06-03 17:24:11,288] [ WARNING] - The content in <img>..</img> is too long,will use [self.image_pad_tag] * IMG_TOKEN_SPAN replace. make sure use MGenVLProcessor for get input data 画面上是一位男性和一位女性拥抱在一起,女性靠在男性身上,两人都带着黑色的帽子,背景是橙色,两人正面朝向彼此。 ------predicting finished-----

In [14]

# After inference finishes, inspect the results in dataset/results.txt
%cd /home/aistudio/work
!cat dataset/results.txt | head -n 5

/home/aistudio/work demo99 图中是一个铁质的圆形井盖,上面有圆形的孔洞,看起来比较厚实,铁质井盖中间的圆形井盖上也设计了孔洞,可以让雨水自由流通。 demo199 图中是一名咖啡师正在磨咖啡豆,另一名员工在吧台后面等待下一步的指令。 demo299 图中是一名女子正在挖掘一种黑色的粘土。 demo399 图中是一个小型的挖掘机,它被用于挖掘泥土。这个设备在建筑施工中非常有用,可以完成挖掘、铲土、铲运、开沟等作业。 demo499 图中画了一部动漫中的机器人,有双剑和双炮,机器人身体绿色和蓝色。

Because the final submission is large (close to 20G), packaging and downloading it on the platform may be slow or may stall. The recommended procedure is:

  1. First download the large files, e.g. the model weight files ending in ".pdparams" under the model folder, keeping the network connection stable;
  2. Once the large files are downloaded, remove them from submission;
  3. Compress the submission folder into submission.zip and download it;
  4. Locally, merge the large files back into submission, compress it again, and submit it to the evaluation platform.

(5) Result Evaluation

Once the submission file is ready, contestants can submit it to the competition's evaluation platform to obtain an evaluation result.

Note: because the model files are large and inference is time-consuming, evaluation may take more than an hour.

(6) Algorithm Optimization Rules

Optimization requirements:
  • Contestants must optimize the model within the provided paddlepaddle code framework;
  • Contestants must not change the network structure of the large-language-model part, so that the LLM parameter count stays unchanged;
  • Contestants may use the samples to fine-tune the large language model's parameters, or adjust model structures other than the large language model.
Optimization directions:
  • Data: clean the dataset, introduce higher-quality data, enlarge the training set, etc.;
  • Model: use a stronger image encoder, enrich the connection between the image and the large language model, etc.;
  • Training hyperparameter tuning;
  • Post-processing, etc.

References

  1. https://github.com/PaddlePaddle/PaddleNLP
  2. https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models

