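Ollama Guide: Integrating with Python and JavaScript

What Is Ollama?

Ollama is a tool and framework for running large language models (LLMs) locally. It simplifies deploying and using LLMs in a local environment, so developers can run capable AI models on their own machines without relying on cloud services.

Ollama exposes a REST API and provides dedicated SDKs for Python and JavaScript developers, which keeps integration simple.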
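
To make the REST API concrete, here is a minimal sketch of a raw chat request built with only the Python standard library. It assumes the default local address http://localhost:11434 and the /api/chat endpoint; the helper names build_chat_request and send_chat_request are made up for illustration, and sending the request requires a running Ollama server with the model already pulled.

```python
import json
import urllib.request


def build_chat_request(model, messages, host="http://localhost:11434"):
    """Build (but do not send) a POST request for the /api/chat endpoint."""
    # stream=False asks the server for one JSON object instead of a stream
    payload = {"model": model, "messages": messages, "stream": False}
    return urllib.request.Request(
        url=f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(request) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["message"]["content"]
```

With a server running, send_chat_request(build_chat_request('gemma3', [{'role': 'user', 'content': 'Why is the sky blue?'}])) would return the reply text. The SDKs shown in the rest of this guide handle these details for you.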

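Installing Ollama

First, install Ollama locally. Download the installer for your operating system from the Ollama website.

After installation, verify it with the following command:

ollama --version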

Pulling a Model

Before using Ollama, pull the model you need. Using gemma3 as an example:

ollama pull gemma3

More models are available in the Ollama model library.

Using Ollama in Python

Installing the Python SDK

First, install the Ollama Python SDK:

pip install ollama

Basic Chat

Here is a simple chat example:

from ollama import chat
from ollama import ChatResponse

# Send a chat message
response: ChatResponse = chat(model='gemma3', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])

# Print the response content
print(response['message']['content'])

# Or access fields directly on the response object
print(response.message.content)

Streaming Responses

For scenarios that need to display output as it is generated, enable streaming:

from ollama import chat

# Enable streaming
stream = chat(
    model='gemma3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

# Print each chunk as it arrives
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
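
When streaming, you often want the complete reply as well as the live output, for example to store it in a conversation history. The helper below is a small sketch of that idea; collect_stream and its on_text callback are hypothetical names, and it only assumes each chunk supports chunk['message']['content'] as in the loop above.

```python
def collect_stream(chunks, on_text=None):
    """Join the text of streamed chat chunks into one string.

    chunks: any iterable of objects supporting chunk['message']['content'].
    on_text: optional callback invoked with each piece of text as it arrives.
    """
    parts = []
    for chunk in chunks:
        text = chunk['message']['content']
        if on_text is not None:
            on_text(text)
        parts.append(text)
    return ''.join(parts)
```

For example, full_text = collect_stream(stream, on_text=lambda t: print(t, end='', flush=True)) prints the reply incrementally and also returns it as a single string.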

Async Client

For asynchronous applications, use AsyncClient:

import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    response = await AsyncClient().chat(model='gemma3', messages=[message])
    print(response.message.content)

# Run the async function
asyncio.run(chat())

Streaming with the Async Client

import asyncio
from ollama import AsyncClient

async def chat():
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    async for part in await AsyncClient().chat(model='gemma3', messages=[message], stream=True):
        print(part['message']['content'], end='', flush=True)

# Run the async function
asyncio.run(chat())

Custom Client

You can create a custom client to configure specific options:

from ollama import Client

# Create a custom client
client = Client(
    host='http://localhost:11434',
    headers={'x-some-header': 'some-value'}
)

# Send a request
response = client.chat(model='gemma3', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])

print(response.message.content)

Other API Features

The Ollama Python SDK also provides other functions:

import ollama

# List all models
models = ollama.list()
print(models)

# Show model information
model_info = ollama.show('gemma3')
print(model_info)

# Generate text (non-chat mode)
response = ollama.generate(model='gemma3', prompt='Write a poem about spring')
print(response.response)

# Delete a model
# ollama.delete('gemma3')

# Copy a model
# ollama.copy('gemma3', 'user/gemma3')

# Embed text (a dedicated embedding model such as nomic-embed-text is recommended)
embedding = ollama.embed(model='nomic-embed-text', input='The sky is blue because of Rayleigh scattering')
print(embedding['embeddings'])
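
Embedding vectors are usually compared rather than printed. As an illustration, the sketch below computes cosine similarity between two vectors in plain Python; cosine_similarity is a helper written for this guide, not part of the SDK.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)
```

ollama.embed also accepts a list of strings as input and returns one vector per string in embeddings, so two texts can be compared with cosine_similarity(result['embeddings'][0], result['embeddings'][1]).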

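Advanced Capabilities

Ollama's newer capabilities are all available directly through the SDK's chat / generate calls. Note: some capabilities require model support; each example below indicates which models apply.

Thinking: reasoning models (such as qwen3 and deepseek-r1) can emit their reasoning separately. When you pass the think parameter, message.thinking in the response holds the reasoning and message.content holds the final answer.

from ollama import chat

response = chat(
    model='qwen3',
    messages=[{'role': 'user', 'content': 'Prove that the square root of 2 is irrational'}],
    think=True,
)

print('Reasoning:', response.message.thinking)
print('Final answer:', response.message.content)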

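Tool Calling: lets the model decide whether to call a function you provide. This requires a model that supports tool calling (such as llama3.1).

from ollama import chat

tools = [{
    'type': 'function',
    'function': {
        'name': 'get_weather',
        'description': 'Get the weather for a given city',
        'parameters': {
            'type': 'object',
            'properties': {
                'city': {'type': 'string', 'description': 'City name'},
            },
            'required': ['city'],
        },
    },
}]

response = chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'What is the weather like in Beijing today?'}],
    tools=tools,
)

# If the model decides to call get_weather, the request appears in tool_calls
print(response.message.tool_calls)

Once you have tool_calls, run the corresponding function yourself, append the assistant's message and then the function result (as a message with role: 'tool') to messages, and call chat again to continue the conversation.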
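
The follow-up step described above can be sketched as a small dispatcher. This is only an illustration: the get_weather body, AVAILABLE_TOOLS, and run_tool_calls are hypothetical, and the sketch assumes each tool call exposes .function.name and .function.arguments (a dict) and that tool results are sent back with a 'tool_name' key.

```python
def get_weather(city):
    """Hypothetical local tool; a real one would query a weather service."""
    return f'Sunny in {city}'


# Map tool names (as declared in the tools schema) to local functions
AVAILABLE_TOOLS = {'get_weather': get_weather}


def run_tool_calls(tool_calls, available=AVAILABLE_TOOLS):
    """Execute each requested tool and return role='tool' messages."""
    results = []
    for call in tool_calls or []:  # tool_calls may be None when no tool is requested
        name = call.function.name
        func = available.get(name)
        if func is None:
            output = f'Unknown tool: {name}'
        else:
            output = func(**call.function.arguments)
        results.append({'role': 'tool', 'content': str(output), 'tool_name': name})
    return results
```

In use, you would append response.message to messages, extend messages with run_tool_calls(response.message.tool_calls), and then call chat again with the updated messages so the model can phrase the final answer.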

Structured Outputs: pass a JSON Schema via format so that the model returns data conforming to that structure.

from ollama import chat
import json

response = chat(
    model='gemma3',
    messages=[{'role': 'user', 'content': 'Describe Beijing; return the city name and population'}],
    format={
        'type': 'object',
        'properties': {
            'city': {'type': 'string'},
            'population': {'type': 'integer'},
        },
        'required': ['city', 'population'],
    },
)

print(json.loads(response.message.content))
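
Even with a schema in place, it can be worth checking the parsed result before the rest of your program relies on it. The sketch below does this with the standard library only; CityInfo and parse_city_info are hypothetical names that mirror the city / population schema used above.

```python
import json
from dataclasses import dataclass


@dataclass
class CityInfo:
    city: str
    population: int


def parse_city_info(content):
    """Parse the model's JSON reply and check the two expected fields."""
    data = json.loads(content)
    city = data['city']              # KeyError if the field is missing
    population = data['population']
    if not isinstance(city, str):
        raise TypeError('city must be a string')
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(population, bool) or not isinstance(population, int):
        raise TypeError('population must be an integer')
    return CityInfo(city=city, population=population)
```

Calling parse_city_info(response.message.content) then gives a typed object instead of a bare dict, and fails loudly if the reply does not have the expected shape.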

Vision: gemma3 at 4b and larger accepts image input. Pass base64-encoded images in the message's images field.

from ollama import chat
import base64

with open('photo.jpg', 'rb') as f:
    image = base64.b64encode(f.read()).decode()

response = chat(model='gemma3', messages=[{
    'role': 'user',
    'content': 'Describe this image',
    'images': [image],
}])

print(response.message.content)

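Model residency (keep_alive): use keep_alive to control how long the model stays in memory after a request, which avoids repeated loading. Set it to 0 to unload the model immediately and free GPU memory.

from ollama import chat

# Keep the model loaded for 5 minutes after this request
response = chat(
    model='gemma3',
    messages=[{'role': 'user', 'content': 'Hello'}],
    keep_alive='5m',
)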

Using Ollama in JavaScript

Installing the JavaScript SDK

Install the Ollama SDK in your Node.js project:

npm install ollama

Basic Chat

import ollama from 'ollama'

// Send a chat message
const response = await ollama.chat({
    model: 'gemma3',
    messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})

// Print the response content
console.log(response.message.content)

Streaming Responses

import ollama from 'ollama'

const message = { role: 'user', content: 'Why is the sky blue?' }
const response = await ollama.chat({
    model: 'gemma3',
    messages: [message],
    stream: true,
})

// Print each part as it arrives
for await (const part of response) {
    process.stdout.write(part.message.content)
}

Using Ollama in the Browser

In a browser environment, import the browser module instead:

import ollama from 'ollama/browser'

// Use it in the browser
const response = await ollama.chat({
    model: 'gemma3',
    messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})

console.log(response.message.content)

Custom Client

import { Ollama } from 'ollama'

// Create a custom client
const ollama = new Ollama({
    host: 'http://localhost:11434',
    headers: { 'x-some-header': 'some-value' }
})

// Send a request
const response = await ollama.chat({
    model: 'gemma3',
    messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})

console.log(response.message.content)

Other API Features

import ollama from 'ollama'

// List all models
const models = await ollama.list()
console.log(models)

// Show model information (the JavaScript SDK takes a request object)
const modelInfo = await ollama.show({ model: 'gemma3' })
console.log(modelInfo)

// Generate text (non-chat mode)
const response = await ollama.generate({
    model: 'gemma3',
    prompt: 'Write a poem about spring'
})
console.log(response.response)

// Embed text (a dedicated embedding model such as nomic-embed-text is recommended)
const embedding = await ollama.embed({
    model: 'nomic-embed-text',
    input: 'The sky is blue because of Rayleigh scattering'
})
console.log(embedding.embeddings)

Advanced Capabilities

Like the Python SDK, the JavaScript SDK supports the following capabilities.

Thinking:

import ollama from 'ollama'

const response = await ollama.chat({
    model: 'qwen3',
    messages: [{ role: 'user', content: 'Prove that the square root of 2 is irrational' }],
    think: true,
})

console.log('Reasoning:', response.message.thinking)
console.log('Final answer:', response.message.content)

Tool Calling:

import ollama from 'ollama'

const tools = [{
    type: 'function',
    function: {
        name: 'get_weather',
        description: 'Get the weather for a given city',
        parameters: {
            type: 'object',
            properties: {
                city: { type: 'string', description: 'City name' },
            },
            required: ['city'],
        },
    },
}]

const response = await ollama.chat({
    model: 'llama3.1',
    messages: [{ role: 'user', content: 'What is the weather like in Beijing today?' }],
    tools,
})

console.log(response.message.tool_calls)

Structured Outputs:

import ollama from 'ollama'

const response = await ollama.chat({
    model: 'gemma3',
    messages: [{ role: 'user', content: 'Describe Beijing; return the city name and population' }],
    format: {
        type: 'object',
        properties: {
            city: { type: 'string' },
            population: { type: 'integer' },
        },
        required: ['city', 'population'],
    },
})

console.log(JSON.parse(response.message.content))

Vision:

import ollama from 'ollama'
import { readFileSync } from 'node:fs'

const image = readFileSync('./photo.jpg').toString('base64')

const response = await ollama.chat({
    model: 'gemma3',
    messages: [{
        role: 'user',
        content: 'Describe this image',
        images: [image],
    }],
})

console.log(response.message.content)

Model residency (keep_alive):

import ollama from 'ollama'

// Keep the model loaded for 5 minutes after this request
const response = await ollama.chat({
    model: 'gemma3',
    messages: [{ role: 'user', content: 'Hello' }],
    keep_alive: '5m',
})

Error Handling

When using Ollama, handle the errors that may occur:

Python Error Handling

import ollama
from ollama import ResponseError

model = 'does-not-yet-exist'

try:
    ollama.chat(model)
except ResponseError as e:
    print('Error:', e.error)
    if e.status_code == 404:
        print('Model not found; pull it first')
        # ollama.pull(model)
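
The commented-out pull above hints at a "pull on 404, then retry" pattern. Below is one possible sketch of it. The function name chat_with_auto_pull is hypothetical, and the chat and pull callables are passed in as parameters (for example ollama.chat and ollama.pull) so the retry logic does not depend on a running server.

```python
def chat_with_auto_pull(model, messages, chat_fn, pull_fn):
    """Call chat_fn; on a 404-style error, pull the model once and retry.

    chat_fn is called as chat_fn(model=..., messages=...).
    pull_fn is called as pull_fn(model).
    """
    try:
        return chat_fn(model=model, messages=messages)
    except Exception as exc:
        # Only errors carrying status_code == 404 trigger a pull;
        # everything else is re-raised unchanged.
        if getattr(exc, 'status_code', None) != 404:
            raise
    pull_fn(model)
    return chat_fn(model=model, messages=messages)
```

A call such as chat_with_auto_pull('gemma3', messages, ollama.chat, ollama.pull) would download the model the first time it is missing. Pulling can take a long time for large models, so an interactive application may prefer to ask the user first.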

JavaScript Error Handling

import ollama from 'ollama'

try {
    const response = await ollama.chat({
        model: 'does-not-yet-exist',
        messages: [{ role: 'user', content: 'Why is the sky blue?' }],
    })
    console.log(response.message.content)
} catch (error) {
    console.error('Error:', error.message)
    // The SDK's ResponseError exposes the HTTP status as status_code
    if (error.status_code === 404) {
        console.log('Model not found; pull it first')
        // await ollama.pull({ model: 'does-not-yet-exist' })
    }
}

Practical Examples

Python Chatbot

from ollama import chat

class ChatBot:
    def __init__(self, model='gemma3'):
        self.model = model
        self.history = []

    def send_message(self, message):
        # Add the user message to the history
        self.history.append({'role': 'user', 'content': message})

        # Send the full history so the model sees prior turns
        response = chat(model=self.model, messages=self.history)

        # Add the assistant's reply to the history
        assistant_message = response['message']
        self.history.append(assistant_message)

        return assistant_message['content']

    def reset(self):
        self.history = []

# Usage example
bot = ChatBot()
response = bot.send_message('Hello, tell me about yourself')
print(response)

JavaScript Chat Application

import ollama from 'ollama'

class ChatBot {
    constructor(model = 'gemma3') {
        this.model = model
        this.history = []
    }

    async sendMessage(message) {
        // Add the user message to the history
        this.history.push({ role: 'user', content: message })

        // Send the full history so the model sees prior turns
        const response = await ollama.chat({
            model: this.model,
            messages: this.history
        })

        // Add the assistant's reply to the history
        const assistantMessage = response.message
        this.history.push(assistantMessage)

        return assistantMessage.content
    }

    reset() {
        this.history = []
    }
}

// Usage example
const bot = new ChatBot()
const response = await bot.sendMessage('Hello, tell me about yourself')
console.log(response)