Introducing Advanced Tool Use on the Claude Developer Platform
Overview
Anthropic has released three beta features enabling Claude to dynamically discover, learn, and execute tools more efficiently:
- Tool Search Tool – Allows Claude to discover tools on-demand without loading all definitions upfront
- Programmatic Tool Calling – Enables Claude to orchestrate tools through code rather than sequential API calls
- Tool Use Examples – Provides concrete usage patterns beyond schema definitions
Tool Search Tool
The Problem
Large tool libraries consume significant tokens. A five-server setup (GitHub, Slack, Sentry, Grafana, Splunk) totals approximately 55K tokens before any conversation begins. At Anthropic's own scale, the announcement notes, "tool definitions consume 134K tokens before optimization."
Tool selection errors occur frequently, especially with similarly named tools like notification-send-user versus notification-send-channel.
The Solution
Rather than loading all tool definitions upfront, the Tool Search Tool discovers relevant tools as needed. According to the documentation, this achieves "an 85% reduction in token usage while maintaining access to your full tool library."
Internal testing showed accuracy improvements: Opus 4 improved from 49% to 74%, and Opus 4.5 improved from 79.5% to 88.1%.
Implementation
Tools are marked with defer_loading: true to enable on-demand discovery:
{
"tools": [
{"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
{
"name": "github.createPullRequest",
"description": "Create a pull request",
"defer_loading": true
}
]
}
For MCP servers, entire servers can be deferred while keeping high-use tools loaded.
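The mechanism behind the regex search tool can be pictured as a pattern match over deferred tool names and descriptions. The sketch below is illustrative only, not Anthropic's implementation; the toy registry and function names are assumptions:

```python
import re

# Toy registry standing in for a deferred tool library.
TOOLS = [
    {"name": "notification-send-user",
     "description": "Send a notification to a single user"},
    {"name": "notification-send-channel",
     "description": "Send a notification to a channel"},
    {"name": "github.createPullRequest",
     "description": "Create a pull request"},
]

def search_tools(pattern):
    """Return tool definitions whose name or description matches the pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [t for t in TOOLS
            if rx.search(t["name"]) or rx.search(t["description"])]

# Claude issues a search instead of holding every definition in context.
matches = search_tools(r"notification.*user")
```

Only the matching definitions would then be expanded into context, which is where the token savings come from.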
When to Use
Best for:
- Tool definitions consuming >10K tokens
- Tool selection accuracy issues
- MCP systems with multiple servers
- 10+ tools available
Less beneficial for:
- Small libraries (<10 tools)
- Frequently used tools in every session
- Compact tool definitions
Programmatic Tool Calling
The Problem
Traditional tool calling creates two bottlenecks:
Context pollution – Intermediate results accumulate unnecessarily. Analyzing a 10MB log file for error patterns loads the entire file into context despite only needing a summary.
Inference overhead – Each tool call requires a full model inference pass. A five-tool workflow means five separate inferences plus manual result synthesis.
The Solution
Claude writes Python code that orchestrates tools, processes outputs, and controls what enters its context. The code executes in a sandboxed environment, with only final results visible to the model.
Example use case: Finding team members who exceeded Q3 travel budgets.
Traditional approach: Fetch 20 team members → fetch each person's 50-100 expense items (20 calls) → fetch budgets → all 2,000+ line items enter context → Claude manually sums and compares.
With Programmatic Tool Calling: Claude writes code that fetches data in parallel, filters locally, and returns only the 2-3 people who exceeded limits.
Efficiency Gains
- Token savings: 37% reduction (43,588 to 27,297 tokens on complex tasks)
- Latency: Eliminates 19+ inference passes by orchestrating 20+ calls in one code block
- Accuracy: Knowledge retrieval improved from 25.6% to 28.5%; GIA benchmarks from 46.5% to 51.2%
Implementation
Step 1 – Mark tools as callable from code:
{
"tools": [
{"type": "code_execution_20250825", "name": "code_execution"},
{
"name": "get_team_members",
"allowed_callers": ["code_execution_20250825"]
}
]
}
Step 2 – Claude generates orchestration code in Python
Step 3 – Tools execute within the Code Execution environment, not Claude's context
Step 4 – Only final output returns to Claude
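The budget example above might translate into orchestration code along these lines. The tool functions here are local stubs with sample data, standing in for real calls that would be proxied into the sandbox; all names and figures are illustrative:

```python
# Stubs standing in for real tool calls executed inside the sandbox.
def get_team_members(team):
    return [{"id": "u1", "name": "Jane"},
            {"id": "u2", "name": "Alex"},
            {"id": "u3", "name": "Sam"}]

def get_expenses(user_id, quarter):
    items = {"u1": [1200, 800, 450], "u2": [300, 250], "u3": [2500, 1900]}
    return items[user_id]

def get_budget(user_id, quarter):
    return {"u1": 3000, "u2": 500, "u3": 4000}[user_id]

# Orchestration: fetch everything, aggregate locally, return only violators.
# In practice these fetches could also run in parallel.
over_budget = []
for member in get_team_members("eng"):
    total = sum(get_expenses(member["id"], "Q3"))
    budget = get_budget(member["id"], "Q3")
    if total > budget:
        over_budget.append({"name": member["name"],
                            "total": total, "budget": budget})
# Only this short list, not the raw expense items, enters Claude's context.
```

The loop touches every expense line item, but the model only ever sees the final `over_budget` list.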
When to Use
Most beneficial for:
- Large datasets requiring aggregation
- Multi-step workflows (3+ dependent calls)
- Filtering/transforming results before Claude sees them
- Parallel operations across many items
Less beneficial for:
- Single-tool invocations
- Tasks requiring Claude to see all intermediate results
- Quick lookups with small responses
Tool Use Examples
The Problem
JSON schemas define structural validity but cannot express usage patterns: when to include optional parameters, which combinations make sense, or API conventions.
Example: A support ticket API with a required title field and many optional nested fields leaves questions unanswered:
- Date format: "2024-11-06", "Nov 6, 2024", or ISO 8601?
- ID conventions: UUID, "USR-12345", or "12345"?
- When to populate nested structures?
- Relationship between escalation.level and priority?
The Solution
Tool Use Examples show concrete usage patterns:
{
"name": "create_ticket",
"input_examples": [
{
"title": "Login page returns 500 error",
"priority": "critical",
"labels": ["bug", "authentication", "production"],
"reporter": {
"id": "USR-12345",
"name": "Jane Smith",
"contact": {"email": "[email protected]", "phone": "+1-555-0123"}
},
"due_date": "2024-11-06",
"escalation": {"level": 2, "notify_manager": true, "sla_hours": 4}
},
{
"title": "Add dark mode support",
"labels": ["feature-request", "ui"],
"reporter": {"id": "USR-67890", "name": "Alex Chen"}
},
{"title": "Update API documentation"}
]
}
Internal testing showed an accuracy improvement from 72% to 90% on complex parameter handling.
When to Use
Most beneficial for:
- Complex nested structures with non-obvious patterns
- Many optional parameters with conditional inclusion
- Domain-specific API conventions
- Similar tools needing clarification
Less beneficial for:
- Simple single-parameter tools
- Standard formats (URLs, emails)
- Validation better handled by schema constraints
Best Practices
Layer Features Strategically
Start with the biggest bottleneck:
- Context bloat from tool definitions → Tool Search Tool
- Large intermediate results → Programmatic Tool Calling
- Parameter errors → Tool Use Examples
Layer additional features as needed; they're complementary.
Tool Search Setup
Use clear, descriptive tool names and descriptions:
{
"name": "search_customer_orders",
"description": "Search for customer orders by date range, status, or total amount. Returns order details including items, shipping, and payment info."
}
Keep the three to five most-used tools always loaded; defer the rest.
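One way to apply the keep-the-hot-tools rule is to partition a registry by observed usage before building the request. This sketch assumes you track call counts yourself; the helper and registry are hypothetical:

```python
def partition_by_usage(tools, usage_counts, keep=5):
    """Keep the `keep` most-used tools fully loaded; defer all others."""
    ranked = sorted(tools,
                    key=lambda t: usage_counts.get(t["name"], 0),
                    reverse=True)
    loaded = [dict(t) for t in ranked[:keep]]                  # always in context
    deferred = [dict(t, defer_loading=True) for t in ranked[keep:]]
    return loaded + deferred

# Illustrative registry of eight tools with usage stats for three of them.
registry = [{"name": f"tool_{i}", "description": "..."} for i in range(8)]
usage = {"tool_3": 40, "tool_1": 25, "tool_6": 10}
tools = partition_by_usage(registry, usage, keep=3)
```

The resulting list can be passed directly as the `tools` array of a request, with the search tool prepended.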
Programmatic Tool Calling Setup
Document return formats clearly to help Claude write correct parsing logic:
{
"name": "get_orders",
"description": "Retrieve orders for a customer. Returns: List of order objects with id, total (float), status, items array, and created_at timestamp"
}
Opt in only tools that are idempotent and safe to run in parallel.
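With the return shape documented, the code Claude writes against get_orders can parse it directly. A sketch with sample data standing in for the real tool response (the orders themselves are invented):

```python
# Sample response matching the documented shape: a list of order objects
# with id, total (float), status, items array, and created_at timestamp.
orders = [
    {"id": "ord-1", "total": 129.99, "status": "shipped",
     "items": ["widget"], "created_at": "2025-01-15T09:30:00Z"},
    {"id": "ord-2", "total": 24.50, "status": "pending",
     "items": ["gadget", "cable"], "created_at": "2025-02-02T14:05:00Z"},
    {"id": "ord-3", "total": 310.00, "status": "shipped",
     "items": ["monitor"], "created_at": "2025-02-20T11:12:00Z"},
]

# Typical local aggregation: total spend on shipped orders only.
shipped_total = sum(o["total"] for o in orders if o["status"] == "shipped")
```

Because the description pins down field names and types, this parsing logic can be generated without trial-and-error calls.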
Tool Use Examples Setup
Craft examples showing realistic data with minimal, partial, and full specification patterns. Keep examples concise (1-5 per tool) and focus on actual ambiguities.
Getting Started
Enable features with the beta header:
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    betas=["advanced-tool-use-2025-11-20"],
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    messages=[{"role": "user", "content": "..."}],
    tools=[
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
        {"type": "code_execution_20250825", "name": "code_execution"},
    ],
)
See Platform Documentation for detailed guides and cookbooks.
Real-World Application
Claude for Excel uses Programmatic Tool Calling to read and modify spreadsheets with thousands of rows without overloading the model's context window, a workload that is impractical with conventional sequential tool calls.