Docker-based system makes complex knowledge extraction as simple as running a container
Three AI agents team up to extract knowledge while learning from their mistakes.
OneKE introduces a dockerized system that uses multiple AI agents to extract structured knowledge from diverse data sources such as web pages and PDFs.
-----
https://arxiv.org/abs/2412.20005
Original Problem 🤔:
Previous knowledge extraction systems struggle with complex schemas and error handling. They focus mainly on the capabilities of individual models rather than on building comprehensive, practical systems.
-----
Solution in this Paper 🛠️:
→ OneKE deploys three specialized agents: the Schema Agent analyzes data types and generates output schemas, the Extraction Agent pulls knowledge using various LLMs, and the Reflection Agent handles error correction.
→ The system includes a Configure Knowledge Base that stores predefined schemas and historical cases to improve extraction accuracy.
→ The Schema Agent processes raw HTML and PDF data using document_loaders from LangChain, standardizing formats for extraction.
→ Extraction Agent uses semantic similarity matching to retrieve relevant cases and incorporates them as few-shot examples .
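The case-retrieval step in the last bullet can be sketched roughly as follows. This is a minimal illustration, not OneKE's code: a bag-of-words cosine similarity stands in for a real sentence-embedding model, and the names `embed`, `retrieve_cases`, `build_prompt`, and the sample `CASES` are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_cases(query, cases, k=2):
    """Return the k stored cases whose text is most similar to the query."""
    q = embed(query)
    ranked = sorted(cases, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, cases, k=2):
    """Place the retrieved cases before the query as few-shot examples."""
    shots = [f"Text: {c['text']}\nAnswer: {c['answer']}"
             for c in retrieve_cases(query, cases, k)]
    shots.append(f"Text: {query}\nAnswer:")
    return "\n\n".join(shots)

# Hypothetical historical cases, invented for illustration only.
CASES = [
    {"text": "Marie Curie won the Nobel Prize in Physics.",
     "answer": '{"person": ["Marie Curie"]}'},
    {"text": "The Amazon river flows through Brazil.",
     "answer": '{"location": ["Amazon", "Brazil"]}'},
    {"text": "Albert Einstein received the Nobel Prize in 1921.",
     "answer": '{"person": ["Albert Einstein"]}'},
]

if __name__ == "__main__":
    print(build_prompt("Who won the Nobel Prize in Chemistry?", CASES))
```

In a real setup the `embed` function would presumably be replaced by a learned embedding model, with the rest of the flow unchanged.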
-----
Key Insights 🔍:
→ Multi-agent architecture enables flexible handling of diverse data formats and schemas
→ Case Repository enables continuous learning from past mistakes
→ Schema-guided approach improves extraction accuracy across domains
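To make the first two insights concrete, here is a hedged sketch of how three agents and a case repository might fit together. Everything in it is an assumption made for illustration: the function names, the `SCHEMAS` table, the stubbed `fake_llm`, and especially the rule-based check in `reflection_agent`, which is only a simple stand-in for OneKE's LLM-driven reflection.

```python
from dataclasses import dataclass, field

# Hypothetical predefined schemas: task name -> expected output fields.
SCHEMAS = {"NER": ["person", "location"]}

@dataclass
class CaseRepository:
    """Stores corrected mistakes so later runs can reuse them."""
    cases: list = field(default_factory=list)

    def add(self, text, wrong, corrected):
        self.cases.append({"text": text, "wrong": wrong, "corrected": corrected})

def schema_agent(task):
    """Pick the output schema for a task (here: a plain lookup)."""
    return SCHEMAS[task]

def extraction_agent(text, schema, llm):
    """Delegate extraction to whatever model callable is supplied."""
    return llm(text, schema)

def reflection_agent(text, schema, result, repo):
    """Stand-in check: keep only values found in the text, fill missing fields.

    Any correction is logged to the case repository.
    """
    corrected = {key: [v for v in result.get(key, []) if v in text]
                 for key in schema}
    if corrected != result:
        repo.add(text, result, corrected)
    return corrected

def run_pipeline(text, task, llm, repo):
    schema = schema_agent(task)
    raw = extraction_agent(text, schema, llm)
    return reflection_agent(text, schema, raw, repo)

def fake_llm(text, schema):
    """Stub model: hallucinates one entity and omits the 'location' field."""
    return {"person": ["Harry Potter", "Gandalf"]}

repo = CaseRepository()
output = run_pipeline("Harry Potter arrived at Hogwarts.", "NER", fake_llm, repo)

if __name__ == "__main__":
    print(output)
    print(len(repo.cases), "case(s) stored")
```

Keeping each agent as a plain function with the repository passed in explicitly makes it easy to swap the stub for a real model call without touching the control flow.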
-----
Results 📊:
→ Tested on CrossNER and NYT-11-HRL datasets using LLaMA-3-8B-Instruct and GPT-4-turbo
→ Case Retrieval method showed significant improvements in both Named Entity Recognition and Relation Extraction tasks
→ Successfully extracted structured data from a 17-page Harry Potter chapter and from web news articles