
"Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction"

The podcast on this paper is generated with Google's Illuminate.

Semantic-aware embeddings that understand the functional flow of conversations.

Dialog2Flow (D2F) maps dialog utterances into an action-based latent space for automated workflow extraction.

📚 https://arxiv.org/abs/2410.18481

Original Problem 🎯:

Automatically extracting structured workflows from raw dialog data remains a major challenge. This capability is crucial for dialog system design, discourse analysis, and training both AI and human agents. Current methods either require manual annotation or use ad-hoc approaches.

-----

Solution in this Paper 🔧:

• Introduces Dialog2Flow (D2F) embeddings that map utterances to a latent space where they cluster by their communicative functions

• Builds a unified dataset from 20 task-oriented dialog datasets with standardized action annotations

• Implements a novel soft contrastive loss that leverages semantic information of actions to guide representation learning

• Creates the first sentence embedding model specifically pre-trained for dialog flow extraction

• Maps dialogs as continuous trajectories in latent space with distinct action-related regions

• Clusters D2F embeddings to convert dialogs into sequences of action IDs for workflow extraction
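The last step above, turning dialogs into sequences of action IDs and then into a workflow graph, can be sketched with a simple transition-count model. This is an illustrative reconstruction, not the paper's exact procedure: the action IDs here stand in for the cluster IDs that clustering D2F embeddings would produce, and the `extract_flow` helper is a hypothetical name.

```python
from collections import Counter

def extract_flow(dialogs):
    """Build a transition graph from dialogs encoded as action-ID sequences.

    dialogs: list of action-ID sequences, one per dialog (IDs would come
    from clustering D2F utterance embeddings).
    Returns a dict mapping (src, dst) edges to transition probabilities.
    """
    counts = Counter()      # how often each (src, dst) transition occurs
    out_totals = Counter()  # total outgoing transitions per source node
    for seq in dialogs:
        path = ["START"] + list(seq) + ["END"]  # add virtual endpoints
        for src, dst in zip(path, path[1:]):
            counts[(src, dst)] += 1
            out_totals[src] += 1
    # Normalize counts into per-source transition probabilities
    return {edge: c / out_totals[edge[0]] for edge, c in counts.items()}

flow = extract_flow([
    ["greet", "request_info", "inform", "bye"],
    ["greet", "inform", "bye"],
])
# e.g. flow[("greet", "inform")] == 0.5 since half of "greet" turns
# are followed directly by "inform" in this toy data
```

The resulting edge-probability dict is the adjacency structure of the extracted workflow graph; thresholding low-probability edges would prune noise.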

-----

Key Insights 💡:

• Soft contrastive loss outperforms standard supervised contrastive loss by better capturing semantic relationships

• The approach works consistently across domains even with limited training data

• Embeddings successfully cluster utterances by communicative functions rather than just semantic similarity

• The unified dataset is the largest to date with standardized per-turn action annotations
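The soft contrastive loss highlighted above can be sketched as follows. This is a hedged reconstruction of the general idea, not the paper's exact formulation: instead of one-hot positive/negative targets, the target distribution over pairs is derived from the semantic similarity of the action labels themselves (here via pre-computed label embeddings), so utterances with related actions are pulled together proportionally.

```python
import numpy as np

def soft_contrastive_loss(emb, label_emb, temp=0.1):
    """Soft contrastive loss sketch (assumed formulation).

    emb:       (N, d) L2-normalized utterance embeddings
    label_emb: (N, d) L2-normalized embeddings of each utterance's action label
    The soft targets are a softmax over label-label similarities, replacing
    the hard one-hot targets of standard supervised contrastive learning.
    """
    n = emb.shape[0]
    mask = ~np.eye(n, dtype=bool)  # exclude self-pairs

    def row_softmax(m):
        m = m - m.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(m)
        return e / e.sum(axis=1, keepdims=True)

    sim = np.where(mask, emb @ emb.T / temp, -np.inf)            # predicted
    lab_sim = np.where(mask, label_emb @ label_emb.T / temp, -np.inf)  # targets
    p = row_softmax(sim)
    q = row_softmax(lab_sim)
    # Cross-entropy of the predicted pair distribution against soft targets
    return -np.mean(np.sum(q * np.log(p + 1e-12), axis=1))
```

With one-hot `q` this reduces to the standard supervised contrastive objective; the soft targets are what let semantically related actions share gradient signal.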

-----

Results 📊:

• D2F achieves a 6.86% average difference from reference graphs across domains vs. 27.90% for the best baseline

• Shows superior qualitative and quantitative results compared to various sentence embeddings

• Maintains consistent performance even in domains contributing only 0.11% of the training data

• Extracts graphs closest in complexity to reference graphs across all tested domains
