{"object":"list","data":[{"id":"chatglm3","object":"model","created":0,"owned_by":"xinference","model_type":"LLM","address":"0.0.0.0:38145","accelerators":["0"],"model_name":"chatglm3","model_lang":["en","zh"],"model_ability":["chat","tools"],"model_description":"ChatGLM3 is the third generation of ChatGLM, still open-source and trained on Chinese and English data.","model_format":"pytorch","model_size_in_billions":6,"model_family":"chatglm3","quantization":"4-bit","model_hub":"modelscope","revision":"v1.0.2","context_length":8192,"replica":1}]}
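The model listing above is easier to scan once it is parsed. The following is a minimal Python sketch, not part of the original session: it embeds a copy of that response (with `model_description` left out for brevity) and pulls out a few fields. The function name `summarize_models` and the choice of fields are illustrative assumptions.

```python
import json
from typing import Any, Dict, List

# Copy of the listing above; "model_description" is omitted for brevity.
RESPONSE = (
    '{"object":"list","data":[{"id":"chatglm3","object":"model","created":0,'
    '"owned_by":"xinference","model_type":"LLM","address":"0.0.0.0:38145",'
    '"accelerators":["0"],"model_name":"chatglm3","model_lang":["en","zh"],'
    '"model_ability":["chat","tools"],"model_format":"pytorch",'
    '"model_size_in_billions":6,"model_family":"chatglm3","quantization":"4-bit",'
    '"model_hub":"modelscope","revision":"v1.0.2","context_length":8192,"replica":1}]}'
)


def summarize_models(body: str) -> List[Dict[str, Any]]:
    """Return one small dict per model found under the "data" key."""
    payload = json.loads(body)
    return [
        {
            "id": model["id"],
            "address": model["address"],
            "abilities": model["model_ability"],
            "quantization": model["quantization"],
            "context_length": model["context_length"],
        }
        for model in payload["data"]
    ]


for entry in summarize_models(RESPONSE):
    print(entry)
```

Here the single running model is `chatglm3`, a 6B PyTorch model with 4-bit quantization, one replica, and the `chat` and `tools` abilities.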
(xinference) root@master:~# xinference --help
Usage: xinference [OPTIONS] COMMAND [ARGS]...

  Xinference command-line interface for serving and deploying models.

Options:
  -v, --version       Show the current version of the Xinference tool.
  --log-level TEXT    Set the logger level. Options listed from most log to
                      least log are: DEBUG > INFO > WARNING > ERROR > CRITICAL
                      (Default level is INFO)
  -H, --host TEXT     Specify the host address for the Xinference server.
  -p, --port INTEGER  Specify the port number for the Xinference server.
  --help              Show this message and exit.

Commands:
  cached         List all cached models in Xinference.
  cal-model-mem  calculate gpu mem usage with specified model size and...
  chat           Chat with a running LLM.
  engine         Query the applicable inference engine by model name.
  generate       Generate text using a running LLM.
  launch         Launch a model with the Xinference framework with the...
  list           List all running models in Xinference.
  login          Login when the cluster is authenticated.
  register       Register a new model with Xinference for deployment.
  registrations  List all registered models in Xinference.
  remove-cache   Remove selected cached models in Xinference.
  stop-cluster   Stop a cluster using the Xinference framework with the...
  terminate      Terminate a deployed model through unique identifier...
  unregister     Unregister a model from Xinference, removing it from...
  vllm-models    Query and display models compatible with vLLM.
(xinference) root@master:~# xinference launch --help
Usage: xinference launch [OPTIONS]

  Launch a model with the Xinference framework with the given parameters.

Options:
  -e, --endpoint TEXT           Xinference endpoint.
  -n, --model-name TEXT         Provide the name of the model to be
                                launched.  [required]
  -t, --model-type TEXT         Specify type of model, LLM as default.
  -en, --model-engine TEXT      Specify the inference engine of the model
                                when launching LLM.
  -u, --model-uid TEXT          Specify UID of model, default is None.
  -s, --size-in-billions TEXT   Specify the model size in billions of
                                parameters.
  -f, --model-format TEXT       Specify the format of the model, e.g.
                                pytorch, ggmlv3, etc.
  -q, --quantization TEXT       Define the quantization settings for the
                                model.
  -r, --replica INTEGER         The replica count of the model, default is
                                1.
  --n-gpu TEXT                  The number of GPUs used by the model,
                                default is "auto".
  -lm, --lora-modules <TEXT TEXT>...
                                LoRA module configurations in the format
                                name=path. Multiple modules can be
                                specified.
  -ld, --image-lora-load-kwargs <TEXT TEXT>...
  -fd, --image-lora-fuse-kwargs <TEXT TEXT>...
  --worker-ip TEXT              Specify which worker this model runs on by
                                ip, for distributed situation.
  --gpu-idx TEXT                Specify which GPUs of a worker this model
                                can run on, separated with commas.
  --trust-remote-code BOOLEAN   Whether or not to allow for custom models
                                defined on the Hub in their own modeling
                                files.
  -ak, --api-key TEXT           Api-Key for access xinference api with
                                authorization.
  --help                        Show this message and exit.
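To show how the `launch` options above fit together, here is a small Python sketch that assembles a command line from the fields of a model record such as the `chatglm3` entry in the JSON listing. It only builds and prints the command; it does not run it. The helper `build_launch_command` and the mapping from record fields to flags are assumptions made for illustration, and the optional `engine` value is deliberately left unset because the transcript does not show which engine was used (the `xinference engine` subcommand is the place to look one up).

```python
import shlex
from typing import Any, Dict, List, Optional

# Fields taken from the chatglm3 record in the model listing.
MODEL: Dict[str, Any] = {
    "model_name": "chatglm3",
    "model_format": "pytorch",
    "model_size_in_billions": 6,
    "quantization": "4-bit",
    "replica": 1,
}


def build_launch_command(model: Dict[str, Any], engine: Optional[str] = None) -> List[str]:
    """Build an argv list for `xinference launch` (hypothetical helper)."""
    argv = [
        "xinference", "launch",
        "--model-name", model["model_name"],
        "--model-format", model["model_format"],
        "--size-in-billions", str(model["model_size_in_billions"]),
        "--quantization", model["quantization"],
        "--replica", str(model["replica"]),
    ]
    if engine is not None:
        # Only added when the caller supplies an engine name.
        argv += ["--model-engine", engine]
    return argv


# Print a shell-safe rendering of the command instead of executing it.
print(shlex.join(build_launch_command(MODEL)))
```

Keeping the command as a list of arguments, rather than a single string, means it can later be handed to `subprocess.run` without any shell quoting concerns.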
(xinference) root@master:~# xinference engine --help
Usage: xinference engine [OPTIONS]

  Query the applicable inference engine by model name.

Options:
  -n, --model-name TEXT         The model name you want to query.
                                [required]
  -en, --model-engine TEXT      Specify the `model_engine` to query the
                                corresponding combination of other
                                parameters.
  -f, --model-format TEXT       Specify the `model_format` to query the
                                corresponding combination of other
                                parameters.
  -s, --model-size-in-billions TEXT
                                Specify the `model_size_in_billions` to
                                query the corresponding combination of other
                                parameters.
  -q, --quantization TEXT       Specify the `quantization` to query the
                                corresponding combination of other
                                parameters.
  -e, --endpoint TEXT           Xinference endpoint.
  -ak, --api-key TEXT           Api-Key for access xinference api with
                                authorization.
  --help                        Show this message and exit.
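The `engine` subcommand accepts only `--model-name` as a required option; the others narrow the query. Note that its long size option is `--model-size-in-billions`, whereas `launch` uses `--size-in-billions`. The Python sketch below builds such a query from whichever filters are supplied and prints it without running it. The helper name `build_engine_query` and its keyword arguments are illustrative assumptions; only the flags themselves come from the help text above.

```python
import shlex
from typing import List, Optional, Union


def build_engine_query(
    model_name: str,
    model_engine: Optional[str] = None,
    model_format: Optional[str] = None,
    size_in_billions: Optional[Union[int, str]] = None,
    quantization: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `xinference engine` (hypothetical helper)."""
    argv = ["xinference", "engine", "--model-name", model_name]
    optional_flags = [
        ("--model-engine", model_engine),
        ("--model-format", model_format),
        ("--model-size-in-billions", size_in_billions),
        ("--quantization", quantization),
    ]
    for flag, value in optional_flags:
        if value is not None:
            # Skip filters the caller did not provide.
            argv += [flag, str(value)]
    return argv


# Narrow the query to the format and size reported for chatglm3.
query = build_engine_query("chatglm3", model_format="pytorch", size_in_billions=6)
print(shlex.join(query))
```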