➜ LlamaFactory git:(main) ✗ uv run llamafactory-cli train examples/train_qlora/qwen3-coco.yaml
[WARNING|2026-02-06 17:47:42] llamafactory.hparams.parser:148>> We recommend enable `upcast_layernorm` in quantized training.
Qwen3VLVideoProcessor {"crop_size": null,"data_format":"channels_first","default_to_square": true,"device": null,"do_center_crop": null,"do_convert_rgb": true,"do_normalize": true,"do_rescale": true,"do_resize": true,"do_sample_frames": true,"fps":2,"image_mean":[0.5,0.5,0.5],"image_std":[0.5,0.5,0.5],"input_data_format": null,"max_frames":768,"merge_size":2,"min_frames":4,"num_frames": null,"pad_size": null,"patch_size":16,"processor_class":"Qwen3VLProcessor","resample":3,"rescale_factor":0.00392156862745098,"return_metadata": false,"size":{"longest_edge":25165824,"shortest_edge":4096},"temporal_patch_size":2,"video_metadata": null,"video_processor_type":"Qwen3VLVideoProcessor"}
[INFO|processing_utils.py:1116]2026-02-06 17:47:50,292>> loading configuration file processor_config.json from cache at None
[INFO|processing_utils.py:1199]2026-02-06 17:47:50,543>> Processor Qwen3VLProcessor:- image_processor: Qwen2VLImageProcessorFast {...}
[INFO|trainer.py:2519]2026-02-06 17:47:58,649>>***** Running training *****
[INFO|trainer.py:2520]2026-02-06 17:47:58,649>> Num examples = 600
[INFO|trainer.py:2521]2026-02-06 17:47:58,649>> Num Epochs = 2
[INFO|trainer.py:2522]2026-02-06 17:47:58,649>> Instantaneous batch size per device = 2
[INFO|trainer.py:2525]2026-02-06 17:47:58,649>> Total train batch size (w. parallel, distributed & accumulation)=8
[INFO|trainer.py:2526]2026-02-06 17:47:58,649>> Gradient Accumulation steps = 4
[INFO|trainer.py:2527]2026-02-06 17:47:58,649>> Total optimization steps = 150
[INFO|trainer.py:2528]2026-02-06 17:47:58,651>> Number of trainable parameters = 8,716,288
{'loss': 4.3662, 'grad_norm': 5.828382968902588, 'learning_rate': 6e-06, 'epoch': 0.13}
{'loss': 4.389, 'grad_norm': 6.548262119293213, 'learning_rate': 9.978353953249023e-06, 'epoch': 0.27}
{'loss': 4.0005, 'grad_norm': 6.604191303253174, 'learning_rate': 9.736983212571646e-06, 'epoch': 0.4}
{'loss': 3.4562, 'grad_norm': 5.726210117340088, 'learning_rate': 9.24024048078213e-06, 'epoch': 0.53}
{'loss': 3.1868, 'grad_norm': 3.4086735313341553, 'learning_rate': 8.514905871600676e-06, 'epoch': 0.67}
{'loss': 2.9764, 'grad_norm': 2.15506052988623, 'learning_rate': 7.600080639646077e-06, 'epoch': 0.8}
{'loss': 2.9609, 'grad_norm': 2.26679112060547, 'learning_rate': 6.545084971874738e-06, 'epoch': 0.93}
{'loss': 2.7471, 'grad_norm': 1.8668205738067627, 'learning_rate': 5.406793373339292e-06, 'epoch': 1.07}
{'loss': 2.9607, 'grad_norm': 2.0235414505004883, 'learning_rate': 4.246571438752585e-06, 'epoch': 1.2}
{'loss': 2.7321, 'grad_norm': 1.6290875673294067, 'learning_rate': 3.12696703292044e-06, 'epoch': 1.33}
{'loss': 2.6867, 'grad_norm': 2.1829676628112793, 'learning_rate': 2.108338191600676e-06, 'epoch': 1.47}
{'loss': 2.7761, 'grad_norm': 1.8782838582992554, 'learning_rate': 1.2455998350925042e-06, 'epoch': 1.6}
{'loss': 2.6362, 'grad_norm': 1.8889576196670532, 'learning_rate': 5.852620357053651e-07, 'epoch': 1.73}
{'loss': 2.6991, 'grad_norm': 2.0048000812530518, 'learning_rate': 1.6292390268568103e-07, 'epoch': 1.87}
{'loss': 2.6784, 'grad_norm': 1.992411882946, 'learning_rate': 1.3537941026914302e-09, 'epoch': 2.0}
100%|███████████████████████████████████████████████████████████████████████████|150/150[01:30<00:00,1.65it/s]
[INFO|trainer.py:4309]2026-02-06 17:49:31,374>> Saving model checkpoint to saves/qwen3-2b-coco-3000/lora/sft/checkpoint-150
{'train_runtime': 93.701, 'train_samples_per_second': 12.807, 'train_steps_per_second': 1.601, 'train_loss': 3.1501645787556964, 'epoch': 2.0}
100%|███████████████████████████████████████████████████████████████████████████|150/150[01:31<00:00,1.63it/s]
epoch                    = 2.0
total_flos               = 1679346GF
train_loss               = 3.1502
train_runtime            = 0:01:33.70
train_samples_per_second = 12.807
train_steps_per_second   = 1.601
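The batch-size header and the final throughput figures in this log agree with each other, and a short calculation can confirm it. This is only a sketch, not LlamaFactory code: the device count is not printed in the log and is inferred here from 8 / (2 × 4), and all variable names are illustrative.

```python
import math

# Values copied from the training log above.
num_examples = 600
num_epochs = 2
per_device_batch = 2
grad_accum_steps = 4
total_batch = 8
train_runtime_s = 93.701

# Assumption: the device count is inferred from the batch sizes, not logged.
num_devices = total_batch // (per_device_batch * grad_accum_steps)

# One optimizer step consumes one total batch of examples.
steps_per_epoch = math.ceil(num_examples / total_batch)
total_steps = steps_per_epoch * num_epochs

# Throughput over the whole run, both epochs included.
samples_per_second = num_examples * num_epochs / train_runtime_s
steps_per_second = total_steps / train_runtime_s

print("devices:", num_devices, "total steps:", total_steps)
print("samples/s:", round(samples_per_second, 3))
print("steps/s:", round(steps_per_second, 3))
```

Under these assumptions the computed step count and throughput can be compared directly with the `Total optimization steps`, `train_samples_per_second`, and `train_steps_per_second` values reported by the trainer.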
Figure saved at: saves/qwen3-2b-coco-3000/lora/sft/training_loss.png
[WARNING|2026-02-06 17:49:33] llamafactory.extras.ploting:148>> No metric eval_loss to plot.
[WARNING|2026-02-06 17:49:33] llamafactory.extras.ploting:148>> No metric eval_accuracy to plot.
[INFO|modelcard.py:456]2026-02-06 17:49:33,450>> Dropping the following result as it does not have all the necessary fields:{'task':{'name':'Causal Language Modeling','type':'text-generation'}}
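The `learning_rate` values logged every 10 steps look like linear warmup followed by cosine decay. The sketch below reproduces several of them under assumptions that are inferred from the numbers and not shown anywhere in this log: a peak rate of 1e-5, 15 warmup steps (10% of 150), decay to zero, and each log entry reporting the rate at scheduler index `step - 1`. The function name `lr_at` is hypothetical.

```python
import math


def lr_at(index, peak_lr=1e-5, warmup_steps=15, total_steps=150):
    """Linear warmup to peak_lr, then half-cosine decay to zero.

    All defaults are assumptions inferred from the logged values.
    """
    if index < warmup_steps:
        return peak_lr * index / warmup_steps
    progress = (index - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: the entry logged at step N carries the rate for index N - 1.
for logged_step in (10, 20, 70, 150):
    print(logged_step, lr_at(logged_step - 1))
```

If the actual YAML config uses a different scheduler or warmup setting, the match with the logged values would be coincidental, so the config file remains the authoritative source.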