"ToolFactory: Automating Tool Generation by Leveraging LLM to Understand REST API Documentations"

The podcast below on this paper was generated with Google's Illuminate.

Rohan Paul

Feb 10, 2025

https://arxiv.org/abs/2501.16945

The paper addresses the challenge of automatically creating AI-compatible tools from REST API documentation, which is often unstructured and lacks standardization. This hinders the seamless integration of APIs into AI agents, especially in research domains where APIs are less thoroughly documented than commercial ones.

This paper introduces ToolFactory, an open-source pipeline to automate AI tool generation from unstructured API documents. It leverages a knowledge base and an evaluation method to improve tool reliability and handle poorly documented APIs.

-----

📌 Prompt tuning enables efficient API information extraction. APILlama, a small prompt-tuned model, rivals GPT-3.5 in performance. This suggests that lightweight, task-focused tuning is effective for structured data tasks, reducing reliance on large models.

📌 ToolFactory's validation pipeline is critical for real-world API integration. It addresses the noisy nature of API documentation by incorporating automated error diagnosis and ensuring tool functionality before agent use.

📌 The parameter knowledge base enhances tool usability in specific domains. By inferring parameter values, ToolFactory overcomes documentation gaps. This method demonstrates the value of domain-specific knowledge for tool agents.

-----

Methods Explored in this Paper 🔧:

→ The ToolFactory pipeline is introduced. It automates the generation of AI-usable tools from REST API documentation written in natural language.

→ APILlama, a prompt-tuned Llama-3 model, is developed. It extracts structured API information from documentation. Prompt tuning with soft prompts efficiently encodes the JSON schema for extraction. This minimizes trainable parameters and improves efficiency.

→ An API Extraction Benchmark dataset is created. It contains 167 API documents with varying structures. This benchmark is used to train and evaluate ToolFactory. A JSON schema is designed to standardize extracted API information.

→ A tool validation pipeline is implemented. It verifies tool functionality and diagnoses errors using GPT-4o. Error types include URL issues, request failures, and incorrect parameter values. A parameter knowledge base is built from validated tools to infer missing parameter information in new APIs.
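
To make the validation step concrete, here is a minimal sketch of how a generated tool call could be sorted into the three error types listed above. It is not the paper's implementation: the paper uses GPT-4o for diagnosis, whereas this sketch swaps in simple status-code rules. The names `Response`, `diagnose`, `fake_send`, the category labels, and the `api.example.org` URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical type)."""
    status: int
    body: str = ""


# Error categories from the summary; the string labels are assumptions.
OK = "ok"
URL_ISSUE = "url_issue"
REQUEST_FAILURE = "request_failure"
BAD_PARAMETER = "incorrect_parameter_value"


def diagnose(url: str, params: dict,
             send: Callable[[str, dict], Response]) -> str:
    """Call a tool once and classify the outcome with rule-based heuristics."""
    parsed = urlparse(url)
    # A URL without an http(s) scheme or a host cannot be called at all.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return URL_ISSUE
    try:
        resp = send(url, params)
    except OSError:
        # Connection errors and timeouts are subclasses of OSError.
        return REQUEST_FAILURE
    if resp.status == 404:
        return URL_ISSUE          # assumed: endpoint path does not exist
    if resp.status in (400, 422):
        return BAD_PARAMETER      # assumed: server rejected the parameter values
    if resp.status >= 400:
        return REQUEST_FAILURE
    return OK


def fake_send(url: str, params: dict) -> Response:
    """Offline sender for demonstration: accepts only species='human'."""
    if params.get("species") != "human":
        return Response(400, "unknown species")
    return Response(200, "{}")
```

Passing the sender in as a callable keeps the sketch runnable offline; in practice it could wrap a real HTTP client, and the rule-based branches could be replaced by a model-based diagnosis as described in the paper.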

-----

Key Insights 💡:

→ Prompt tuning is effective for structured information extraction from API documentation. APILlama, a prompt-tuned small model, achieves performance comparable to larger models like GPT-3.5.

→ Automated tool generation pipelines like ToolFactory significantly reduce the effort required to integrate REST APIs into AI agent workflows.

→ Validating generated tools and building parameter knowledge bases are crucial for improving the reliability and usability of AI tools, especially for less structured API documentation.
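
As a rough illustration of the parameter knowledge base idea, the sketch below stores values that worked in validated tools and proposes them for a same-named parameter in a new API. The matching rule (exact match on a normalized parameter name) is an assumption made for simplicity and is not taken from the paper; the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class ParameterKnowledgeBase:
    """Stores known-good parameter values harvested from validated tools."""

    def __init__(self) -> None:
        # normalized parameter name -> counts of values that worked
        self._values: dict[str, Counter] = defaultdict(Counter)

    @staticmethod
    def _key(name: str) -> str:
        # Normalize so that 'taxon_id', 'taxonId', and 'Taxon-ID' all match.
        return "".join(ch for ch in name.lower() if ch.isalnum())

    def add_validated(self, name: str, value: str) -> None:
        """Record a value that produced a successful call."""
        self._values[self._key(name)][value] += 1

    def infer(self, name: str, top_k: int = 3) -> list[str]:
        """Return candidate values for a parameter, most frequent first."""
        counts = self._values.get(self._key(name))
        if not counts:
            return []
        return [value for value, _ in counts.most_common(top_k)]


kb = ParameterKnowledgeBase()
kb.add_validated("taxon_id", "9606")
kb.add_validated("taxon_id", "9606")
kb.add_validated("taxonId", "10090")
candidates = kb.infer("Taxon-ID")  # ['9606', '10090']
```

Candidates returned this way would still need to be tried against the new API, since a value that is valid for one service is only a guess for another.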

-----

Results 📊:

→ APILlama achieves a 97% Valid Ratio in generating correctly structured JSON files.

→ APILlama achieves 1.00 Method Accuracy and 0.92 Parameter Precision, demonstrating high accuracy in functional parameter extraction.

→ ToolFactory generates tools with a 52% success rate based on extracted information from API documentation in the benchmark.

→ The knowledge-base parameter inference method improves parameter value prediction: it finds valid parameter values for 33 tools, versus 17 for GPT-4o.
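
For readers who want a feel for how metrics like these are typically computed, here is a small sketch. The definitions are assumptions, since the summary does not spell them out: Valid Ratio is taken as the share of model outputs that parse as JSON and contain a set of required keys, and Parameter Precision as the share of extracted parameter names that also appear in the reference annotation. `REQUIRED_KEYS` is a hypothetical schema, not the paper's.

```python
import json

# Hypothetical required fields; the paper's actual JSON schema may differ.
REQUIRED_KEYS = {"endpoint", "method", "parameters"}


def is_valid(raw: str) -> bool:
    """True if the output parses as a JSON object with all required keys."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(doc, dict) and REQUIRED_KEYS.issubset(doc)


def valid_ratio(outputs: list[str]) -> float:
    """Fraction of outputs that are structurally valid."""
    if not outputs:
        return 0.0
    return sum(is_valid(o) for o in outputs) / len(outputs)


def parameter_precision(predicted: list[str], reference: list[str]) -> float:
    """Fraction of predicted parameter names that are in the reference."""
    pred, ref = set(predicted), set(reference)
    if not pred:
        return 0.0
    return len(pred & ref) / len(pred)


outputs = [
    '{"endpoint": "/genes", "method": "GET", "parameters": []}',
    '{"endpoint": "/genes"}',   # parses, but required keys are missing
    'not json',                 # does not parse
]
ratio = valid_ratio(outputs)                                        # 1 of 3 valid
precision = parameter_precision(["id", "species", "format"],
                                ["id", "species"])                  # 2 of 3 correct
```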
