Preface
Having covered the theory behind Hugging Face argument passing, we now move on to practice. Using the Llama3 model as the running example, this article demonstrates how to build the argument objects, instantiate the model and tokenizer, and wire everything into the Trainer in preparation for training.
Building the Argument System
In large-model development, argument management typically falls into three categories: model configuration, data processing, and training control. The exact fields vary with project requirements, but the core structure stays the same, as the sketch below illustrates.
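As a quick illustration of that three-way split, the following sketch wires three dataclasses into HfArgumentParser so each group can be populated from the command line. This is a minimal sketch: the ModelArguments and DataArguments fields shown here are illustrative placeholders (the full ModelArguments definition appears below), while TrainingArguments is the built-in Hugging Face class.

from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser, TrainingArguments


@dataclass
class ModelArguments:
    # Illustrative placeholder field; the full definition is given later in this article.
    model_name_or_path: Optional[str] = field(default=None)


@dataclass
class DataArguments:
    # Illustrative placeholder field for the data-processing group.
    dataset_dir: Optional[str] = field(default=None)


if __name__ == "__main__":
    # HfArgumentParser turns every dataclass field into a --flag and returns
    # one populated instance per dataclass.
    parser = HfArgumentParser((ModelArguments, DataArguments, TrainingArguments))
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()

Run it, for example, as python train.py --model_name_or_path meta-llama/Meta-Llama-3-8B --dataset_dir ./data --output_dir ./output; the required --output_dir flag comes from TrainingArguments.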
Model Argument Configuration
These arguments cover the model path, revision control, and the precision settings used at load time, for example specifying where the pretrained weights live or overriding default config entries:
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelArguments:
    """
    Arguments pertaining to which model/config/tokenizer we are going to fine-tune, or train from scratch.
    """

    model_name_or_path: Optional[str] = field(
        default=None,
        metadata={
            "help": (
                "The model checkpoint for weights initialization. Don't set if you want to train a model from scratch."
            )
        },
    )
    tokenizer_name_or_path: Optional[str] = field(
        default=None,
        metadata={
            "help": (
                "The tokenizer for weights initialization. Don't set if you want to train a model from scratch."
            )
        },
    )
    model_type: Optional[str] = field(
        default=None
    )
    config_overrides: Optional[str] = field(
        default=None,
        metadata={
            "help": (
                "Override some existing default config settings when a model is trained from scratch. Example: "
                "n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index"
            )
        },
    )
    cache_dir: Optional[str] = field(
        default=None,
        metadata={"help": "Where do you want to store the pretrained models downloaded from huggingface.co"},
    )
    model_revision: str = field(
        default="main",
        metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
    )
    use_auth_token: bool = field(
        default=False,
        metadata={
            "help": (
                "Will use the token generated when running `huggingface-cli login` (necessary to use this script "
                "with private models)."
            )
        },
    )
    torch_dtype: Optional[str] = field(
        default=None,
        metadata={
            "help": (
                "Override the default `torch.dtype` and load the model under this dtype. If `auto` is passed, the "
                "dtype will be automatically derived from the model's weights."
            ),
            "choices": ["auto", "bfloat16", "float16", "float32"],
        },
    )
    low_cpu_mem_usage: bool = field(
        default=False,
        metadata={
            "help": (
                "Create the model as an empty shell, then only materialize its parameters when the pretrained "
                "weights are loaded. Setting it to True benefits loading time and RAM consumption for large models."
            )
        },
    )

