Documents serve as the backbone of knowledge preservation, decision-making, and information dissemination across domains such as law, finance, medicine, and academia. They encapsulate structured and unstructured data, requiring sophisticated processing to extract, synthesize, and verify critical insights. As AI-driven document understanding advances, ensuring factual accuracy, reasoning over complex contexts, and attributing information to reliable sources becomes increasingly vital.
The RARA workshop addresses persistent challenges in document processing that have been amplified in the era of Large Language Models (LLMs). Extracting information from complex, multi-modal documents often requires reasoning over multiple sections, linking disparate pieces of evidence, and ensuring consistency across sources. Attribution remains a key issue, as models struggle to trace generated content back to reliable references, making verification difficult. While LLMs have improved contextual understanding, they also introduce risks such as hallucinations, overgeneralization, and opacity in their decision-making.
Our workshop explores frameworks that ground document understanding through four key emerging pillars:
- Reasoning: Structured mechanisms to navigate complex inference chains and resolve ambiguities
- Agents: Specialized components working together on complex tasks with access to executable tools
- Retrieval: Methods for finding and integrating relevant information to ground AI responses
- Attribution: Techniques to ensure AI-generated content remains traceable to reliable sources
Speakers
Mohit Bansal, University of North Carolina at Chapel Hill
Graham Neubig, Carnegie Mellon University, All Hands AI
Hamed Zamani, University of Massachusetts Amherst
Call for Papers
We invite submissions to the first international workshop on Reasoning, Agents, Retrieval, and Attribution (RARA) for grounding documents, to be held in conjunction with ICDM 2025.
As documents continue to serve as the foundation of knowledge across domains like law, finance, healthcare, and academia, AI systems must develop more sophisticated capabilities to process, interpret, and reason over these complex information sources. The RARA workshop aims to bring together researchers and practitioners working at the intersection of document understanding, reasoning systems, multi-agent architectures, and information retrieval to address key challenges in this rapidly evolving field.
Topics of Interest
We welcome original research papers on topics including but not limited to:
- Complex Reasoning: Multi-hop inference across document sections, logical consistency maintenance, ambiguity resolution in domain-specific texts
- Agent Architectures: Multi-agent coordination frameworks, tool-augmented document understanding, planning strategies for document analysis
- Document-Specific Agents: Citation verification, fact-checking, table extraction, chart interpretation, formula extraction, document summarization and comparison agents
- Domain-Specific Document Processing: Specialized techniques for legal, financial, healthcare, academic, technical, and government documents
- Advanced Retrieval: Dense/sparse retrieval for multi-modal documents, cross-document information synthesis, retrieval-augmented generation for document processing
- Attribution Mechanisms: Source tracing in AI-generated content, confidence calibration in document analysis, verification of AI-generated claims
- Multi-Modal Processing: Handling diverse document formats including charts, tables, infographics, diagrams, flowcharts, forms, and other visually rich elements
- Document Structure: Layout analysis, semantic segmentation, hierarchical document modeling
- Benchmarks & Evaluation: Novel datasets, evaluation frameworks, metrics for document reasoning, attribution quality assessment, agent performance measurement
Workshop Schedule
November 12, 2025 • All times are local
| Time | Type | Title/Speaker | Mode |
|---|---|---|---|
| 8:30 - 8:40 | Opening | Nedim Lipka - Opening Speech | In-person |
| 8:40 - 9:15 | Keynote | Graham Neubig - Large Language Models for Information Synthesis over Long Contexts | Virtual |
| 9:15 - 9:30 | Paper | From Roots to Rewards: Dynamic Tree Reasoning with Reinforcement Learning | Virtual |
| 9:30 - 9:45 | Paper | Agentic Meta-Orchestrator for Multi-task Copilots | In-person |
| 9:45 - 10:20 | Keynote | Hamed Zamani - Retrieval-Augmented Reasoning | Virtual |
| 10:20 - 10:35 | ☕ Coffee Break | | |
| 10:35 - 10:50 | Paper | From Regulations to IDS: A Tool-Augmented LLM Pipeline for Automated BIM Rule Checks | Virtual |
| 10:50 - 11:05 | Paper | Attribution Quality in AI-Generated Content: Benchmarking Style Embeddings and LLM Judges | In-person |
| 11:05 - 11:20 | Paper | Retrieval and Augmentation of Domain Knowledge for Text-to-SQL Semantic Parsing | Virtual |
| 11:20 - 11:30 | BUFFER - Overflow / Networking | | |
| 11:30 - 12:05 | Keynote | Mohit Bansal - Multimodal Retrieval for Understanding and Generation | Virtual |
| 12:05 - 12:20 | Paper | "Lost-in-the-Later": Framework for Quantifying Contextual Grounding in Large Language Models | In-person |
| 12:20 - 12:30 | Closing | Manan Suri, Puneet Mathur - Concluding Statement | In-person |
Important Dates
| Event | Date |
|---|---|
| Workshop papers submission | Aug 29, 2025 |
| Notification of paper acceptance | |
| Camera-ready deadline and copyright form | Sep 25, 2025 |
| Workshop date | Nov 12, 2025 |
Organizers
University of Maryland, College Park
Adobe Research
University of Maryland, College Park
Adobe Research
Adobe Research
Adobe Research
Arizona State University