ramalama-cann - Man Page

Setting Up RamaLama with Ascend NPU Support on Linux systems

This guide walks through the steps required to set up RamaLama with Ascend NPU support.
- Background
- Hardware
- Model
- Docker
- History

Background

Ascend NPU is a range of AI processors built around a Neural Processing Unit (NPU). It efficiently handles matrix-matrix multiplication, dot products, and scalar operations.

CANN (Compute Architecture for Neural Networks) is a heterogeneous computing architecture for AI scenarios, providing support for multiple AI frameworks on the top and serving AI processors and programming at the bottom. It plays a crucial role in bridging the gap between upper and lower layers, and is a key platform for improving the computing efficiency of Ascend AI processors. Meanwhile, it offers a highly efficient and easy-to-use programming interface for diverse application scenarios, allowing users to rapidly build AI applications and services based on the Ascend platform.

Hardware

Ascend NPU

Verified devices

Table: Supported Hardware List

| Ascend NPU                     | Status  |
| ------------------------------ | ------- |
| Atlas A2 Training series       | Support |
| Atlas 800I A2 Inference series | Support |

Notes:

  • If you have trouble with an Ascend NPU device, please create an issue with the [CANN] prefix/tag.
  • If you are running successfully on an Ascend NPU device that is not listed, please help update the "Supported Hardware List" table above.

Model

Currently, Ascend NPU acceleration is only supported when the llama.cpp backend is selected. For supported models, please refer to the page llama.cpp/backend/CANN.md.

Docker

Install the Ascend driver

The driver provides NPU acceleration using the AI cores of your Ascend NPU. CANN is a hierarchical set of APIs that helps you quickly build AI applications and services on Ascend NPU.

For more information about Ascend NPU, see the Ascend Community.

Make sure the CANN toolkit is installed. You can download it from here: CANN Toolkit

Make sure the Ascend Docker runtime is installed. You can download it from here: Ascend-docker-runtime
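The prerequisites above can be checked before building. The following is a minimal sketch, not an official procedure. It assumes the driver package puts the npu-smi utility on PATH and that the toolkit lives under /usr/local/Ascend/ascend-toolkit (a common default; set CANN_HOME if yours differs). The helper name check_prereqs is hypothetical.

```shell
#!/bin/sh
# Assumed default toolkit location; override CANN_HOME if installed elsewhere.
CANN_HOME="${CANN_HOME:-/usr/local/Ascend/ascend-toolkit}"

# check_prereqs (hypothetical helper): prints one line per check and
# returns non-zero if the driver tool or the toolkit directory is missing.
check_prereqs() {
    status=0
    # The Ascend driver is assumed to provide the npu-smi command.
    if command -v npu-smi >/dev/null 2>&1; then
        echo "driver: npu-smi found"
    else
        echo "driver: npu-smi not found"
        status=1
    fi
    # The toolkit check only looks for the directory, not a specific version.
    if [ -d "$CANN_HOME" ]; then
        echo "toolkit: found at $CANN_HOME"
    else
        echo "toolkit: not found at $CANN_HOME"
        status=1
    fi
    return "$status"
}
```

After sourcing the script, run check_prereqs; if it succeeds, npu-smi info should list the NPUs visible to the host. The sketch does not verify the Ascend Docker runtime, whose install location varies.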

Build Images

Go to the ramalama directory and build the image using make:

make build IMAGE=cann
make install

You can test with:

export ASCEND_VISIBLE_DEVICES=0
ramalama --image quay.io/ramalama/cann:latest serve -d -p 8080 -name ollama://smollm:135m

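In another window, check the running podman container. Because -name is written with a single dash, it appears to be parsed as -n ame, which is why the container is listed under the name "ame".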

$ podman ps
CONTAINER ID   IMAGE                                                         COMMAND                  CREATED             STATUS             PORTS                                          NAMES
80fc31c131b0   quay.io/ramalama/cann:latest                                  "/bin/bash -c 'expor…"   About an hour ago   Up About an hour                                                  ame
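Beyond checking that the container is running, you can probe the server itself. The following is a minimal sketch, assuming the llama.cpp server started by ramalama serve answers on the /health endpoint at port 8080 (the port passed with -p above) and that curl is available. The helper names health_url and wait_for_server are hypothetical.

```shell
#!/bin/sh
# Assumptions: the server is reachable on localhost at the port given to
# "ramalama serve -p", and /health is the llama.cpp server health endpoint.
HOST="${HOST:-127.0.0.1}"
PORT="${PORT:-8080}"

# health_url (hypothetical helper): builds the URL to probe.
health_url() {
    printf 'http://%s:%s/health\n' "$HOST" "$PORT"
}

# wait_for_server (hypothetical helper): polls once per second until the
# endpoint answers or the retry budget (default 10) runs out.
wait_for_server() {
    tries="${1:-10}"
    while [ "$tries" -gt 0 ]; do
        if curl -fsS "$(health_url)" >/dev/null 2>&1; then
            echo "server is up"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    echo "server did not respond"
    return 1
}
```

After sourcing the script, wait_for_server 30 gives the model up to roughly 30 seconds to load before reporting a failure.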

For other usage guides, see the RamaLama README.md.

History

Mar 2025, Originally compiled