ramalama-run - Man Page

run specified AI Model as a chatbot

Synopsis

ramalama run [options] model [arg ...]

Model Transports

Transports                Prefix                         Web Site
URL based                 https://, http://, file://     https://web.site/ai.model, file://tmp/ai.model
HuggingFace               huggingface://, hf://, hf.co/  huggingface.co
Ollama                    ollama://                      ollama.com
OCI Container Registries  oci://                         opencontainers.org
                                                         Examples: quay.io, Docker Hub, Artifactory

RamaLama defaults to the Ollama registry transport. This default can be overridden in the ramalama.conf file or via the RAMALAMA_TRANSPORT environment variable. For example, export RAMALAMA_TRANSPORT=huggingface changes RamaLama to use the HuggingFace transport.
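The same default can be set persistently in the ramalama.conf file. A minimal sketch, assuming a transport key under the [ramalama] table (see the ramalama.conf documentation for the authoritative syntax):

[ramalama]
transport = "huggingface"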

Modify individual model transports by specifying the huggingface://, oci://, ollama://, https://, http://, file:// prefix to the model.

URL support means that if a model is hosted on a web site, or even stored on your local system, you can run it directly.
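For example, each of the following runs a model over a different transport; USER and the model names are placeholders rather than real registry contents:

ramalama run ollama://tinyllama
ramalama run huggingface://USER/mymodel
ramalama run oci://quay.io/USER/mymodel:latest
ramalama run https://web.site/ai.model
ramalama run file:///tmp/mymodel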

Options

--authfile=path

path of the authentication file for OCI registries

--ctx-size, -c

size of the prompt context (default: 2048, 0 = loaded from model)

--help, -h

show this help message and exit

--name, -n

name of the container to run the Model in

--seed=

Specify a seed for the model interaction, rather than using a random seed

--temp="0.8"

Temperature of the response from the AI Model. llama.cpp explains this as:

The lower the number is, the more deterministic the response.

The higher the number is, the more creative the response, but also the more likely it is to hallucinate when set too high.

    Usage: Lower numbers are good for virtual assistants where we need deterministic responses. Higher numbers are good for roleplay or creative tasks like editing stories.

--tls-verify=true

require HTTPS and verify certificates when contacting OCI registries
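For example, several of the options above can be combined in a single invocation; the flag values here are illustrative, and granite is the model used elsewhere in this page:

ramalama run --name=assistant --ctx-size=4096 --seed=1234 --temp="0.2" granite

A low temperature such as 0.2 biases the model toward deterministic answers, per the --temp note above, while a fixed --seed makes a run reproducible.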

Description

Run the specified AI Model as a chatbot. RamaLama pulls the specified AI Model from the registry if it does not exist in local storage. By default a chatbot prompt is started. When arguments are specified, they are passed to the AI Model and the output is returned without entering the chatbot.

Examples

Run command without arguments starts a chatbot

ramalama run granite
>

Run command with a locally downloaded model

ramalama run file:///tmp/mymodel
>

Run command with a prompt passed as an argument

ramalama run merlinite "when is the summer solstice"
The summer solstice, which is the longest day of the year, will happen on June ...

Run command with a custom prompt and a file passed via stdin

cat file.py | ramalama run quay.io/USER/granite-code:1.0 'what does this program do?'

This program is a Python script that allows the user to interact with a terminal. ...
 [end of text]

NVIDIA CUDA Support

See ramalama-cuda(7) for setting up the host Linux system for CUDA support.

See Also

ramalama(1), ramalama-cuda(7)

History

Aug 2024, Originally compiled by Dan Walsh <dwalsh@redhat.com>

Referenced By

ramalama(1), ramalama-stop(1).