vLLM/GPT-OSS Tool Calling Developer Guide¶
This guide explains how ppxai implements tool calling for vLLM backends (including GPT-OSS-120B) and provides a working approach that avoids the HarmonyError issues.
Quick Start: See examples/prompt_based_tools.py for a complete standalone example.
Critical Finding: Harmony Format is Mandatory¶
"GPT-OSS should not be used without using the Harmony format as it will not work correctly." — OpenAI Harmony Documentation
GPT-OSS was trained specifically on the Harmony response format. This is not an optional feature—it's the only correct way to use the model.
Harmony Format Architecture¶
| Component | Purpose |
|---|---|
| `<\|start\|>...<\|end\|>` | Message boundaries |
| `analysis` channel | Chain-of-thought reasoning (not shown to users) |
| `final` channel | User-facing responses |
| `commentary` channel | Tool/function calls |
| `<\|recipient\|>` token | Routes output to specific tools |
| `<\|thinking\|>` token | Internal reasoning |
Why Tool Parsing Can't Be Disabled¶
The Harmony format uses special control tokens (<|recipient|>, <|thinking|>, <|call|>, etc.) that the model always outputs as part of its response structure. If vLLM doesn't parse these tokens, they leak into the response as raw text—which causes the HarmonyError.
From vLLM Issue #22337:
"Without proper Harmony parsing, tool call data appeared in the content field as JSON text instead of being parsed into the proper structure."
Sources¶
- OpenAI Harmony GitHub - Official renderer library
- OpenAI Cookbook - Harmony Format - Format specification
- vLLM Blog - GPT-OSS Support - vLLM integration details
- vLLM Issue #22337 - Tool calling implementation issues
The Problem¶
When using vLLM with `--enable-auto-tool-choice` and GPT-OSS models, you may encounter a `HarmonyError` raised while vLLM parses the model's output. This error occurs when Harmony control tokens aren't properly parsed. Since GPT-OSS was trained on the Harmony format, the model always outputs these tokens. See vLLM issue #23567.
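A quick way to recognize this failure mode in your own client code is to check response content for raw Harmony control tokens. The helper below is an illustrative sketch (not part of ppxai); the token literals are taken from the published Harmony format.

```python
# Control-token literals from the Harmony format specification
HARMONY_CONTROL_TOKENS = (
    "<|start|>", "<|end|>", "<|message|>",
    "<|channel|>", "<|call|>", "<|return|>",
)

def has_leaked_harmony_tokens(text: str) -> bool:
    """Return True if raw Harmony control tokens appear in response content,
    indicating the server failed to parse the model's output structure."""
    return any(token in text for token in HARMONY_CONTROL_TOKENS)
```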
ppxai's Solution: Two Tool Calling Modes¶
ppxai supports two distinct approaches:
| Mode | Config Flag | vLLM Flags Required | Reliability |
|---|---|---|---|
| Prompt-Based | `native_tool_calling: false` | None | ✅ High |
| Native | `native_tool_calling: true` | `--enable-auto-tool-choice` | ⚠️ May hit HarmonyError |
Recommended: Prompt-Based Tool Calling¶
This mode bypasses vLLM's Harmony parsing entirely.
```json
{
  "providers": {
    "vllm-gpt-oss": {
      "name": "GPT-OSS 120B (vLLM)",
      "base_url": "http://your-vllm-endpoint:8000/v1",
      "api_key_env": "VLLM_API_KEY",
      "default_model": "openai/gpt-oss-120b",
      "capabilities": {
        "native_tool_calling": false
      }
    }
  }
}
```
How Prompt-Based Tool Calling Works¶
1. System Prompt Injection¶
When native_tool_calling: false, ppxai injects tool definitions into the system prompt:
````python
# From ppxai/engine/tools/manager.py (lines 290-359)
def get_tools_prompt(self) -> str:
    prompt = "# IMPORTANT: You Have Access to Tools\n\n"
    prompt += "## How to Call a Tool\n\n"
    prompt += "To use a tool, respond ONLY with a JSON code block:\n\n"
    prompt += "```json\n{\"tool\": \"tool_name\", \"arguments\": {\"param\": \"value\"}}\n```\n\n"
    prompt += "## Available Tools:\n\n"
    for tool in tools:
        prompt += f"### {tool.name}\n{tool.description}\n"
        # ... parameter descriptions
````
2. Multi-Strategy Response Parsing¶
ppxai parses tool calls from text using multiple strategies (in order):
````python
# From ppxai/engine/tools/parser.py (lines 212-308)
def parse_tool_call(text: str, get_tool: ToolLookupFunc) -> Optional[Dict[str, Any]]:
    """
    Parsing strategies:
    1. Entire response as JSON
    2. JSON in markdown code blocks (```json ... ```)
    3. Brace-based extraction (find {"tool" patterns)
    4. Tool inference from arguments (for models without 'tool' key)
    """
````
Strategy 1: Entire Response as JSON¶
```python
text_stripped = text.strip()
if text_stripped.startswith('{') and text_stripped.endswith('}'):
    data = _try_parse_json(text_stripped)
    if data:
        normalized = _normalize_tool_call(data, get_tool)
```
Strategy 2: Markdown Code Blocks¶
````python
code_block_pattern = r'```(?:json)?\s*([\s\S]*?)```'
matches = re.findall(code_block_pattern, text)
for match in matches:
    # Parse JSON from code block
    ...
````
Strategy 3: Brace-Based Extraction¶
```python
for pattern in ['{"tool"', "{'tool'"]:
    start = text.find(pattern, start_idx)
    # Count braces to find complete JSON object
    depth = 0
    for char in text[start:]:
        if char == '{': depth += 1
        elif char == '}': depth -= 1
        if depth == 0: break
```
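The brace-counting step above is an excerpt; a self-contained version of the same technique is sketched below (the helper name is illustrative). Note it is deliberately naive, like the excerpt: it does not account for braces inside JSON string values.

```python
def extract_json_object(text, start):
    """Return the complete JSON object opening at text[start], found by
    counting brace depth; None if the braces never balance."""
    depth = 0
    for i, char in enumerate(text[start:]):
        if char == '{':
            depth += 1
        elif char == '}':
            depth -= 1
            if depth == 0:
                return text[start:start + i + 1]
    return None
```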
Strategy 4: Tool Inference (No 'tool' Key)¶
For models that output raw arguments without a tool name:
```python
# From ppxai/engine/tools/parser.py (lines 34-81)
TOOL_INFERENCE_RULES = [
    {
        "tool": "web_search",
        "required": ["query"],
        "allowed": {"query", "num_results", "top_n", "count", "limit"},
        "aliases": {"num_results": ["top_n", "count", "limit"]}
    },
    {
        "tool": "read_file",
        "required": ["path", "filepath"],
        "allowed": {"path", "filepath", "line_start", "line_end"},
        "aliases": {"filepath": ["path"]}
    },
    # ... more rules
]
```
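These rules can be applied with a small matcher. The sketch below is illustrative, not ppxai's implementation; it assumes (as the alias lists suggest) that `required` means at least one of those keys must be present, and that all provided keys must fall inside `allowed`.

```python
# Abridged copy of the rules so the example is self-contained
TOOL_INFERENCE_RULES = [
    {
        "tool": "web_search",
        "required": ["query"],
        "allowed": {"query", "num_results", "top_n", "count", "limit"},
    },
    {
        "tool": "read_file",
        "required": ["path", "filepath"],
        "allowed": {"path", "filepath", "line_start", "line_end"},
    },
]

def infer_tool(args):
    """Guess the tool name from bare arguments (no 'tool' key present)."""
    keys = set(args)
    for rule in TOOL_INFERENCE_RULES:
        # Match: non-empty, all keys allowed, at least one required key present
        if keys and keys <= rule["allowed"] and any(k in keys for k in rule["required"]):
            return rule["tool"]
    return None
```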
3. Handling GPT-OSS Nested Tool Calls¶
GPT-OSS 120B sometimes outputs double-wrapped structures:
```json
{
  "tool": "apply_patch",
  "arguments": {
    "tool": "apply_patch",
    "arguments": {
      "file_path": "/path",
      "unified_diff": "..."
    }
  }
}
```
ppxai unwraps this automatically:
```python
# From ppxai/engine/tools/parser.py (lines 129-135)
if isinstance(args, dict) and "tool" in args and "arguments" in args:
    # Nested tool call - unwrap it
    args = args["arguments"]
```
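Generalized to tolerate more than one level of wrapping, the unwrap step can be sketched as follows (a standalone illustration, not ppxai's code):

```python
def unwrap_tool_call(data):
    """Unwrap {"tool": ..., "arguments": ...} structures nested inside
    'arguments', however many levels deep."""
    args = data.get("arguments", {})
    while isinstance(args, dict) and "tool" in args and "arguments" in args:
        args = args["arguments"]
    return {"tool": data.get("tool"), "arguments": args}
```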
4. Parameter Name Normalization¶
Different models use different parameter names. ppxai normalizes them:
```python
# From ppxai/engine/tools/manager.py (lines 179-203)
PARAM_ALIAS_GROUPS = [
    {"filepath", "file_path", "filePath", "file"},
    {"path", "directory", "dir_path", "dir", "folder"},
    {"command", "cmd", "shell_command"},
    {"query", "query_text", "search_query"},
    {"unified_diff", "diff", "patch"},
    {"url", "link", "webpage", "uri"},
    {"location", "city", "place"},
]
```
Example: Model outputs {"file": "/path"} → normalized to {"filepath": "/path"}
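One way to apply alias groups against a tool's expected parameter names is sketched below. The helper name and signature are illustrative; ppxai's actual normalization lives in `manager.py`.

```python
# Abridged alias groups for a self-contained example
PARAM_ALIAS_GROUPS = [
    {"filepath", "file_path", "filePath", "file"},
    {"command", "cmd", "shell_command"},
    {"query", "query_text", "search_query"},
]

def normalize_params(args, expected):
    """Rename argument keys to the names the tool schema expects: if the
    schema wants one name from an alias group and the model used another
    name from the same group, rename it."""
    normalized = dict(args)
    for group in PARAM_ALIAS_GROUPS:
        wanted = expected & group
        if not wanted:
            continue  # schema uses no name from this group
        target = next(iter(wanted))
        for alias in group:
            if alias in normalized and target not in normalized:
                normalized[target] = normalized.pop(alias)
    return normalized
```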
Implementation for Your Application¶
Step 1: Define Tool Schema¶
```python
tools = [
    {
        "name": "read_file",
        "description": "Read contents of a file",
        "parameters": {
            "type": "object",
            "properties": {
                "filepath": {"type": "string", "description": "Path to file"}
            },
            "required": ["filepath"]
        }
    }
]
```
Step 2: Generate System Prompt¶
````python
def generate_tool_prompt(tools):
    prompt = """# Tool Usage Instructions

To use a tool, respond with ONLY a JSON block:

```json
{"tool": "tool_name", "arguments": {"param": "value"}}
```

## Available Tools:

"""
    for tool in tools:
        prompt += f"### {tool['name']}\n{tool['description']}\n"
        for param, info in tool['parameters']['properties'].items():
            required = "required" if param in tool['parameters'].get('required', []) else "optional"
            prompt += f"- {param} ({required}): {info.get('description', '')}\n"
        prompt += "\n"
    prompt += """## Rules:
- Output ONLY the JSON block when calling a tool
- After receiving results, continue or respond to user
- You CAN access files and run commands - use tools proactively!"""
    return prompt
````
Step 3: Parse Tool Calls from Response¶
````python
import json
import re

def parse_tool_call(text, tools):
    """Parse tool call from model response."""
    tool_names = {t['name'] for t in tools}

    # Strategy 1: Entire response is JSON
    text_stripped = text.strip()
    if text_stripped.startswith('{') and text_stripped.endswith('}'):
        try:
            data = json.loads(text_stripped)
            if data.get('tool') in tool_names:
                return normalize_tool_call(data)
        except json.JSONDecodeError:
            pass

    # Strategy 2: JSON in code blocks
    pattern = r'```(?:json)?\s*({[\s\S]*?})\s*```'
    for match in re.findall(pattern, text):
        try:
            data = json.loads(match)
            if data.get('tool') in tool_names:
                return normalize_tool_call(data)
        except json.JSONDecodeError:
            pass

    # Strategy 3: Find {"tool" pattern
    start = text.find('{"tool"')
    if start != -1:
        depth = 0
        for i, char in enumerate(text[start:]):
            if char == '{': depth += 1
            elif char == '}': depth -= 1
            if depth == 0:
                try:
                    data = json.loads(text[start:start+i+1])
                    if data.get('tool') in tool_names:
                        return normalize_tool_call(data)
                except json.JSONDecodeError:
                    pass
                break
    return None

def normalize_tool_call(data):
    """Unwrap nested structures and normalize parameters."""
    tool = data.get('tool') or data.get('name')
    args = data.get('arguments', {})

    # Unwrap GPT-OSS nested structure
    if isinstance(args, dict) and 'tool' in args and 'arguments' in args:
        args = args['arguments']

    # Normalize parameter names
    aliases = {
        'filepath': ['file_path', 'filePath', 'file', 'path'],
        'command': ['cmd', 'shell_command'],
        'query': ['search_query', 'query_text'],
    }
    for canonical, alias_list in aliases.items():
        for alias in alias_list:
            if alias in args and canonical not in args:
                args[canonical] = args.pop(alias)
                break
    return {'tool': tool, 'arguments': args}
````
Step 4: Chat Loop with Tool Execution¶
```python
async def chat_with_tools(client, messages, tools, max_iterations=10):
    tool_prompt = generate_tool_prompt(tools)

    # Prepend tool prompt to system message
    if messages[0]['role'] == 'system':
        messages[0]['content'] = tool_prompt + "\n\n" + messages[0]['content']
    else:
        messages.insert(0, {'role': 'system', 'content': tool_prompt})

    for iteration in range(max_iterations):
        # Call model (no tools parameter - prompt-based mode)
        response = await client.chat.completions.create(
            model="openai/gpt-oss-120b",
            messages=messages,
            stream=True
        )
        full_response = ""
        async for chunk in response:
            content = chunk.choices[0].delta.content or ""
            full_response += content
            print(content, end="", flush=True)

        # Check for tool call
        tool_call = parse_tool_call(full_response, tools)
        if tool_call:
            # Execute tool
            result = await execute_tool(tool_call['tool'], tool_call['arguments'])
            # Add assistant message and tool result to history
            messages.append({'role': 'assistant', 'content': full_response})
            messages.append({'role': 'user', 'content': f"Tool result:\n{result}"})
            continue  # Next iteration

        # No tool call - return final response
        return full_response
    return full_response
```
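The loop above assumes an `execute_tool` helper. A minimal async dispatcher might look like the following (registry, decorator, and handler names are all illustrative):

```python
import asyncio

# Hypothetical registry mapping tool names to async handler functions
TOOL_HANDLERS = {}

def register_tool(name):
    def decorator(fn):
        TOOL_HANDLERS[name] = fn
        return fn
    return decorator

@register_tool("read_file")
async def read_file(filepath: str) -> str:
    with open(filepath, "r", encoding="utf-8") as f:
        return f.read()

async def execute_tool(name, arguments):
    """Dispatch a parsed tool call; return error text (never raise) so the
    model can see what went wrong and retry."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Error: unknown tool '{name}'"
    try:
        return await handler(**arguments)
    except TypeError as exc:  # wrong or missing argument names
        return f"Error calling {name}: {exc}"
    except Exception as exc:
        return f"Error calling {name}: {exc}"
```

Returning errors as strings (rather than raising) keeps the chat loop alive and lets the model self-correct on the next iteration.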
Key Files in ppxai¶
| File | Purpose |
|---|---|
| ppxai/engine/tools/parser.py | Multi-strategy tool call parsing (309 lines) |
| ppxai/engine/tools/manager.py | Tool registration, prompt generation, parameter normalization |
| ppxai/engine/chat.py | Chat loop with tool iteration |
| tests/test_engine_tool_parsing.py | Comprehensive test suite (896 lines) |
Testing Your Implementation¶
Use these test cases from ppxai's test suite:
````python
# Test: Simple JSON tool call
text = '{"tool": "read_file", "arguments": {"filepath": "/etc/hosts"}}'
assert parse_tool_call(text, tools) == {
    "tool": "read_file",
    "arguments": {"filepath": "/etc/hosts"}
}

# Test: Code block format
text = '''Here's what I'll do:
```json
{"tool": "web_search", "arguments": {"query": "python asyncio"}}
```'''
assert parse_tool_call(text, tools)['tool'] == "web_search"

# Test: GPT-OSS nested structure
text = '''{"tool": "apply_patch", "arguments": {"tool": "apply_patch", "arguments": {"file_path": "/app.py", "unified_diff": "..."}}}'''
result = parse_tool_call(text, tools)
assert result['arguments']['file_path'] == "/app.py"  # Unwrapped

# Test: Parameter alias normalization
text = '{"tool": "read_file", "arguments": {"file": "/etc/hosts"}}'
result = parse_tool_call(text, tools)
assert 'filepath' in result['arguments']  # Normalized from 'file'
````
Summary¶
- Use prompt-based mode (`native_tool_calling: false`) to avoid HarmonyError
- Inject tool descriptions into the system prompt with JSON format instructions
- Parse responses using multiple strategies (JSON, code blocks, brace matching)
- Handle GPT-OSS quirks: nested structures, parameter name variations
- Normalize parameters to match your tool schemas
This approach is model-agnostic and works reliably with GPT-OSS, Llama, Qwen, and other models served via vLLM or Ollama.
Implications for ppxai¶
Given that Harmony format is mandatory for GPT-OSS, here are the implications:
Current State (v1.14.x)¶
| Approach | Status | Notes |
|---|---|---|
| Native (`native_tool_calling: true`) | ✅ Works | Requires vLLM with Harmony fix (PR #30205) |
| Prompt-based (`native_tool_calling: false`) | ✅ Works | Fallback for older vLLM versions |
vLLM Harmony Fix¶
The Harmony parsing issue has been fixed in vLLM (PR #30205). Check your vLLM version to determine which mode to use.
Recommended Configuration¶
With fixed vLLM (PR #30205+): Use native tool calling for best performance:
```json
{
  "providers": {
    "vllm-gpt-oss": {
      "base_url": "http://your-vllm:8000/v1",
      "default_model": "openai/gpt-oss-120b",
      "capabilities": {
        "native_tool_calling": true
      }
    }
  }
}
```
With older vLLM (pre-fix): Use prompt-based mode to avoid HarmonyError:
```json
{
  "providers": {
    "vllm-gpt-oss": {
      "base_url": "http://your-vllm:8000/v1",
      "default_model": "openai/gpt-oss-120b",
      "capabilities": {
        "native_tool_calling": false
      }
    }
  }
}
```
Future Considerations¶
- Reasoning channel extraction - The `analysis` channel contains chain-of-thought that could be displayed as "thinking" tokens (like DeepSeek R1)
- Token filtering - Strip any leaked control tokens from responses (edge cases)
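For the token-filtering idea, a regex-based pass could be sketched as follows. The pattern is an assumption based on the `<|name|>` shape of Harmony control tokens, not ppxai's implementation.

```python
import re

# Matches Harmony-style control tokens such as <|start|>, <|channel|>, <|call|>
CONTROL_TOKEN_RE = re.compile(r"<\|[a-z_]+\|>")

def strip_control_tokens(text: str) -> str:
    """Remove any leaked Harmony control tokens from response text."""
    return CONTROL_TOKEN_RE.sub("", text)
```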
Architecture Decision¶
ppxai's prompt-based tool calling is the correct approach for GPT-OSS because:
- Reliability - Bypasses vLLM's unstable Harmony parser
- Portability - Same code works with Ollama, LM Studio, other backends
- Control - ppxai parses tool calls, not the inference server
- Flexibility - Can adapt to model quirks without vLLM changes