➜ LlamaFactory git:(main) ✗ uv run llamafactory-cli train examples/train_qlora/qwen3-coco.yaml
[WARNING|2026-02-06 17:47:42] llamafactory.hparams.parser:148>> We recommend enable `upcast_layernorm` in quantized training.
Qwen3VLVideoProcessor {"crop_size": null,"data_format":"channels_first","default_to_square": true,"device": null,"do_center_crop": null,"do_convert_rgb": true,"do_normalize": true,"do_rescale": true,"do_resize": true,"do_sample_frames": true,"fps":2,"image_mean":[0.5,0.5,0.5],"image_std":[0.5,0.5,0.5],"input_data_format": null,"max_frames":768,"merge_size":2,"min_frames":4,"num_frames": null,"pad_size": null,"patch_size":16,"processor_class":"Qwen3VLProcessor","resample":3,"rescale_factor":0.00392156862745098,"return_metadata": false,"size":{"longest_edge":25165824,"shortest_edge":4096},"temporal_patch_size":2,"video_metadata": null,"video_processor_type":"Qwen3VLVideoProcessor"}
[INFO|processing_utils.py:1116] 2026-02-06 17:47:50,292>> loading configuration file processor_config.json from cache at None
[INFO|processing_utils.py:1199] 2026-02-06 17:47:50,543>> Processor Qwen3VLProcessor:
- image_processor: Qwen2VLImageProcessorFast {"crop_size": null,"data_format":"channels_first","default_to_square": true,"device": null,"disable_grouping": null,"do_center_crop": null,"do_convert_rgb": true,"do_normalize": true,"do_pad": null,"do_rescale": true,"do_resize": true,"image_mean":[0.5,0.5,0.5],"image_processor_type":"Qwen2VLImageProcessorFast","image_std":[0.5,0.5,0.5],"input_data_format": null,"max_pixels": null,"merge_size":2,"min_pixels": null,"pad_size": null,"patch_size":16,"processor_class":"Qwen3VLProcessor","resample":3,"rescale_factor":0.00392156862745098,"return_tensors": null,"size":{"longest_edge":16777216,"shortest_edge":65536},"temporal_patch_size":2}
- tokenizer: Qwen2TokenizerFast(name_or_path='Qwen/Qwen3-VL-2B-Instruct', vocab_size=151643, model_max_length=262144, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token':'<|im_end|>','pad_token':'<|endoftext|>','additional_special_tokens':['<|im_start|>','<|im_end|>','🎵','📷','🖼️','📄','📝','💬','🔤','📊']}
[INFO|trainer.py:2519] 2026-02-06 17:47:58,649>> ***** Running training *****
[INFO|trainer.py:2520] 2026-02-06 17:47:58,649>> Num examples = 600
[INFO|trainer.py:2521] 2026-02-06 17:47:58,649>> Num Epochs = 2
[INFO|trainer.py:2522] 2026-02-06 17:47:58,649>> Instantaneous batch size per device = 2
[INFO|trainer.py:2525] 2026-02-06 17:47:58,649>> Total train batch size (w. parallel, distributed & accumulation)=8
[INFO|trainer.py:2526] 2026-02-06 17:47:58,649>> Gradient Accumulation steps = 4
[INFO|trainer.py:2527] 2026-02-06 17:47:58,649>> Total optimization steps = 150
[INFO|trainer.py:2528] 2026-02-06 17:47:58,651>> Number of trainable parameters = 8,716,288
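
Note on the banner above: the step count follows from the other numbers it prints. The sketch below is a rough check, not the Trainer's actual code. It assumes that the total batch size is the per-device batch size times the accumulation steps times the device count, and that one optimizer update happens per full accumulation window; the variable names are illustrative.

```python
import math

# Values copied from the "Running training" banner in the log.
num_examples = 600
per_device_batch_size = 2
grad_accum_steps = 4
total_batch_size = 8
num_epochs = 2

# Assumption: total batch = per-device batch * accumulation * device count.
num_devices = total_batch_size // (per_device_batch_size * grad_accum_steps)

# Batches the dataloader yields per epoch on each device.
batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))

# One optimizer update per `grad_accum_steps` batches.
updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
total_steps = updates_per_epoch * num_epochs

print(num_devices, batches_per_epoch, updates_per_epoch, total_steps)
# prints: 1 300 75 150
```

With 75 updates per epoch, each logging entry below (one per 10 steps, judging by the epoch increments of about 0.13) covers 10/75 of an epoch.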
{'loss':4.3662,'grad_norm':5.828382968902588,'learning_rate':6e-06,'epoch':0.13}
{'loss':4.389,'grad_norm':6.548262119293213,'learning_rate':9.978353953249023e-06,'epoch':0.27}
{'loss':4.0005,'grad_norm':6.604191303253174,'learning_rate':9.736983212571646e-06,'epoch':0.4}
{'loss':3.4562,'grad_norm':5.726210117340088,'learning_rate':9.24024048078213e-06,'epoch':0.53}
{'loss':3.1868,'grad_norm':3.4086873531341553,'learning_rate':8.51490528712831e-06,'epoch':0.67}
{'loss':2.9764,'grad_norm':2.155060529646077e-06,'epoch':0.8}
{'loss':2.9609,'grad_norm':2.26679611206547,'learning_rate':6.545084971874738e-06,'epoch':0.93}
{'loss':2.7471,'grad_norm':1.8668205738067627,'learning_rate':5.406793373339292e-06,'epoch':1.07}
{'loss':2.9607,'grad_norm':2.023541438752585e-06,'epoch':1.2}
{'loss':2.7321,'grad_norm':1.6290875673294067,'learning_rate':3.12696703292044e-06,'epoch':1.33}
{'loss':2.6867,'grad_norm':2.182967628112793,'learning_rate':2.1083383191600676e-06,'epoch':1.47}
{'loss':2.7761,'grad_norm':1.878283852992554,'learning_rate':1.2455998350925042e-06,'epoch':1.6}
{'loss':2.6362,'grad_norm':1.8889576196670532,'learning_rate':5.852620357053651e-07,'epoch':1.73}
{'loss':2.6991,'grad_norm':2.0048000812530518,'learning_rate':1.6292390268568103e-07,'epoch':1.87}
{'loss':2.6784,'grad_norm':1.9924118518829346,'learning_rate':1.3537941026914302e-09,'epoch':2.0}
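
Note on the learning rates above: the logged values look consistent with a linear warmup followed by cosine decay. The sketch below is an assumption-based reconstruction, not something the log states: the peak learning rate (1e-5), the warmup length (15 steps) and the one-step offset between the logged step and the scheduler step are all inferred from the numbers, and `cosine_lr` / `logged_lr` are hypothetical helper names.

```python
import math

PEAK_LR = 1e-5       # assumed, inferred from the logged values
WARMUP_STEPS = 15    # assumed, inferred from the logged values
TOTAL_STEPS = 150    # from "Total optimization steps = 150"


def cosine_lr(sched_step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if sched_step < WARMUP_STEPS:
        return PEAK_LR * sched_step / WARMUP_STEPS
    progress = (sched_step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def logged_lr(global_step: int) -> float:
    """Assumption: the entry logged at step N shows the rate used for step N,
    i.e. the scheduler value after N - 1 scheduler steps."""
    return cosine_lr(global_step - 1)


# Compare against a few entries from the log (steps 10, 20, 70, 150).
for step, seen in [(10, 6e-06), (20, 9.978353953249023e-06),
                   (70, 6.545084971874738e-06), (150, 1.3537941026914302e-09)]:
    print(step, logged_lr(step), seen)
```

The entries at epoch 0.8 and 1.2 carry no `learning_rate` field in this log. Under the same assumptions, the sketch gives roughly 7.60e-06 for step 60 and 4.25e-06 for step 90.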
100%|███████████████████████████████████████████████████████████████████████████|150/150 [01:30<00:00,1.65it/s]
[INFO|trainer.py:4309] 2026-02-06 17:49:31,374>> Saving model checkpoint to saves/qwen3-2b-coco-3000/lora/sft/checkpoint-150
{'train_runtime':93.701,'train_samples_per_second':12.807,'train_steps_per_second':1.601,'train_loss':3.1501645787556964,'epoch':2.0}
100%|███████████████████████████████████████████████████████████████████████████|150/150 [01:31<00:00,1.63it/s]
epoch                    = 2.0
total_flos               = 1679346GF
train_loss               = 3.1502
train_runtime            = 0:01:33.70
train_samples_per_second = 12.807
train_steps_per_second   = 1.601
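
Note on the summary metrics: they can be cross-checked against the rest of the log. The sketch below assumes that throughput is the number of samples (or steps) processed divided by `train_runtime`, and that `train_loss` is close to the mean of the fifteen logged loss values. Both are assumptions about how the figures were derived, and the variable names are illustrative.

```python
# Values copied from the log.
train_runtime = 93.701      # seconds
num_examples = 600
num_epochs = 2
total_steps = 150
logged_losses = [4.3662, 4.389, 4.0005, 3.4562, 3.1868, 2.9764, 2.9609, 2.7471,
                 2.9607, 2.7321, 2.6867, 2.7761, 2.6362, 2.6991, 2.6784]

# Assumed definitions of the summary metrics.
samples_per_second = num_examples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime
mean_logged_loss = sum(logged_losses) / len(logged_losses)

print(round(samples_per_second, 3))  # 12.807
print(round(steps_per_second, 3))    # 1.601
print(round(mean_logged_loss, 4))    # 3.1502
```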
Figure saved at: saves/qwen3-2b-coco-3000/lora/sft/training_loss.png
[WARNING|2026-02-06 17:49:33] llamafactory.extras.ploting:148>> No metric eval_loss to plot.
[WARNING|2026-02-06 17:49:33] llamafactory.extras.ploting:148>> No metric eval_accuracy to plot.
[INFO|modelcard.py:456] 2026-02-06 17:49:33,450>> Dropping the following result as it does not have all the necessary fields:{'task':{'name':'Causal Language Modeling','type':'text-generation'}}