
"OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System"

Podcast on this paper generated with Google's Illuminate.

Docker-based system makes complex knowledge extraction as simple as running a container

Three AI agents team up to extract knowledge while learning from their mistakes.

OneKE introduces a dockerized system in which multiple AI agents extract structured knowledge from diverse data sources such as web pages and PDFs.

-----

https://arxiv.org/abs/2412.20005

Original Problem 🤔:

Previous knowledge extraction systems struggle with complex schemas and error handling; they focus mainly on individual model capabilities rather than on building comprehensive, practical systems.

-----

Solution in this Paper 🛠️:

→ OneKE deploys three specialized agents: a Schema Agent that analyzes data types and generates output schemas, an Extraction Agent that extracts knowledge using various LLMs, and a Reflection Agent that handles error correction.

→ The system includes a Configure Knowledge Base that stores predefined schemas and historical cases to improve extraction accuracy.

→ The Schema Agent processes raw HTML and PDF data with LangChain's document_loaders, standardizing formats for extraction.

→ The Extraction Agent uses semantic similarity matching to retrieve relevant cases and incorporates them as few-shot examples.
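The case-retrieval step can be sketched as: score every stored case against the new input, keep the top k, and prepend them to the prompt as few-shot examples. This is a minimal sketch under stated assumptions, not OneKE's implementation. A toy bag-of-words cosine similarity stands in for a real sentence-embedding model, and the names `embed`, `retrieve_cases`, `build_prompt` and the `(input, output)` case format are hypothetical.

```python
import math
import re
from collections import Counter

# A case pairs an input text with the extraction result stored for it (hypothetical format).
Case = tuple[str, str]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_cases(query: str, repo: list[Case], k: int = 2) -> list[Case]:
    """Return the k stored cases whose input text is most similar to the query."""
    q = embed(query)
    return sorted(repo, key=lambda case: cosine(q, embed(case[0])), reverse=True)[:k]


def build_prompt(query: str, schema: str, repo: list[Case], k: int = 2) -> str:
    """Prepend the retrieved cases to the query as few-shot examples."""
    shots = "\n\n".join(
        f"Input: {text}\nOutput: {answer}" for text, answer in retrieve_cases(query, repo, k)
    )
    return f"Schema: {schema}\n\n{shots}\n\nInput: {query}\nOutput:"


repo = [
    ("Marie Curie was born in Warsaw.", '{"person": "Marie Curie", "birthplace": "Warsaw"}'),
    ("Apple released a new iPhone.", '{"organization": "Apple", "product": "iPhone"}'),
    ("Alan Turing was born in London.", '{"person": "Alan Turing", "birthplace": "London"}'),
]
prompt = build_prompt("Ada Lovelace was born in London.", "person, birthplace", repo, k=1)
print(prompt)
```

Here the Alan Turing case shares the most terms with the query, so it becomes the single few-shot example. Swapping `embed` for a dense embedding model would keep the same retrieve-then-prompt structure.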

-----

Key Insights 🔍:

→ Multi-agent architecture enables flexible handling of diverse data formats and schemas

→ Case Repository enables continuous learning from past mistakes

→ Schema-guided approach improves extraction accuracy across domains
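How the three agents and the case store might fit together can be sketched as a small loop. This is a hedged sketch, not the paper's code. It assumes a predefined-schema lookup for the Schema Agent, a stubbed LLM for the Extraction Agent, and a Reflection Agent that only checks for missing schema fields (the paper's agents are LLM-driven and richer). `SCHEMAS`, `schema_agent`, `extraction_agent`, `reflection_agent`, `run` and `fake_llm` are all hypothetical names.

```python
import json
from typing import Callable

# Hypothetical predefined schemas, standing in for the Configure Knowledge Base.
SCHEMAS = {
    "ner": ["person", "location", "organization"],
    "re": ["head", "relation", "tail"],
}

LLM = Callable[[str], str]  # any function that maps a prompt to a JSON string


def schema_agent(task: str) -> list[str]:
    """Look up a predefined schema (OneKE's Schema Agent can also generate one)."""
    return SCHEMAS[task]


def extraction_agent(llm: LLM, text: str, fields: list[str], feedback: str = "") -> dict:
    """Ask the LLM for the schema fields, appending any reflection feedback."""
    prompt = f"Extract {fields} as JSON from: {text}\n{feedback}"
    return json.loads(llm(prompt))


def reflection_agent(result: dict, fields: list[str]) -> str:
    """Return correction feedback if schema fields are missing, else an empty string."""
    missing = [f for f in fields if f not in result]
    return f"Missing fields: {missing}. Please fix." if missing else ""


def run(task: str, text: str, llm: LLM, case_repo: list[dict], max_rounds: int = 3) -> dict:
    fields = schema_agent(task)
    feedback = ""
    for round_no in range(max_rounds):
        result = extraction_agent(llm, text, fields, feedback)
        feedback = reflection_agent(result, fields)
        if not feedback:
            # Store the accepted result, flagging whether it needed correction,
            # so later runs can retrieve it as a case.
            case_repo.append({"text": text, "result": result, "corrected": round_no > 0})
            return result
    raise RuntimeError(f"No valid extraction after {max_rounds} rounds: {feedback}")


def fake_llm(prompt: str) -> str:
    """Stub LLM: omits 'relation' until the prompt carries reflection feedback."""
    if "Missing fields" in prompt:
        return '{"head": "Alan Turing", "relation": "born_in", "tail": "London"}'
    return '{"head": "Alan Turing", "tail": "London"}'


case_repo: list[dict] = []
print(run("re", "Alan Turing was born in London.", fake_llm, case_repo))
print(case_repo)
```

The first extraction misses `relation`, the reflection step feeds that back, and the corrected result is stored with `corrected: True`. Keeping such corrected cases is one simple way a case store could let later extractions benefit from earlier mistakes.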

-----

Results 📊:

→ Tested on the CrossNER and NYT-11-HRL datasets using LLaMA-3-8B-Instruct and GPT-4-turbo

→ The Case Retrieval method yielded significant improvements on both Named Entity Recognition and Relation Extraction tasks

→ Successfully extracted structured data from a 17-page Harry Potter chapter and from web news articles
