RARA

Documents serve as the backbone of knowledge preservation, decision-making, and information dissemination across domains such as law, finance, medicine, and academia. They encapsulate structured and unstructured data, requiring sophisticated processing to extract, synthesize, and verify critical insights. As AI-driven document understanding advances, ensuring factual accuracy, reasoning over complex contexts, and attributing information to reliable sources become increasingly vital.

The RARA workshop addresses persistent challenges in document processing that have been amplified in the era of Large Language Models (LLMs). Extracting information from complex, multi-modal documents often requires reasoning over multiple sections, linking disparate pieces of evidence, and ensuring consistency across sources. Attribution remains a key issue: models struggle to trace generated content back to reliable references, which makes verification difficult. While LLMs have improved contextual understanding, they also introduce risks such as hallucinations, overgeneralization, and opacity in their decision-making. Our workshop explores frameworks that ground document understanding through four key emerging pillars:

Reasoning: Structured mechanisms to navigate complex inference chains and resolve ambiguities
Agents: Specialized components working together on complex tasks with access to executable tools
Retrieval: Methods for finding and integrating relevant information to ground AI responses
Attribution: Techniques to ensure AI-generated content remains traceable to reliable sources

Speakers

Mohit Bansal

University of North Carolina at Chapel Hill

Graham Neubig

Carnegie Mellon University, All Hands AI

Hamed Zamani

University of Massachusetts Amherst

Call for Papers

We invite submissions to the first international workshop on Reasoning, Agents, Retrieval, and Attribution (RARA) for grounding documents, to be held in conjunction with ICDM 2025.

As documents continue to serve as the foundation of knowledge across domains like law, finance, healthcare, and academia, AI systems must develop more sophisticated capabilities to process, interpret, and reason over these complex information sources. The RARA workshop aims to bring together researchers and practitioners working at the intersection of document understanding, reasoning systems, multi-agent architectures, and information retrieval to address key challenges in this rapidly evolving field.

Topics of Interest

We welcome original research papers on topics including but not limited to:

Complex Reasoning: Multi-hop inference across document sections, logical consistency maintenance, ambiguity resolution in domain-specific texts
Agent Architectures: Multi-agent coordination frameworks, tool-augmented document understanding, planning strategies for document analysis
Document-Specific Agents: Citation verification, fact-checking, table extraction, chart interpretation, formula extraction, document summarization and comparison agents
Domain-Specific Document Processing: Specialized techniques for legal, financial, healthcare, academic, technical, and government documents
Advanced Retrieval: Dense/sparse retrieval for multi-modal documents, cross-document information synthesis, retrieval-augmented generation for document processing
Attribution Mechanisms: Source tracing in AI-generated content, confidence calibration in document analysis, verification of AI-generated claims
Multi-Modal Processing: Handling diverse document formats including charts, tables, infographics, diagrams, flowcharts, forms, and other visually rich elements
Document Structure: Layout analysis, semantic segmentation, hierarchical document modeling
Benchmarks & Evaluation: Novel datasets, evaluation frameworks, metrics for document reasoning, attribution quality assessment, agent performance measurement

Workshop Schedule

November 12, 2025 • All times are local

Time	Type	Title/Speaker	Mode
8:30 - 8:40	Opening	Nedim Lipka - Opening Speech	In-person
8:40 - 9:15	Keynote	Graham Neubig - Large Language Models for Information Synthesis over Long Contexts (30 min + 5 min Q&A)	Virtual
9:15 - 9:30	Paper	From Roots to Rewards: Dynamic Tree Reasoning with Reinforcement Learning (Ahmed Bahloul, Simon Malberg)	Virtual
9:30 - 9:45	Paper	Agentic Meta-Orchestrator for Multi-task Copilots (Xiaofeng Zhu, Yunshen Zhou)	In-person
9:45 - 10:20	Keynote	Hamed Zamani - Retrieval-Augmented Reasoning (30 min + 5 min Q&A)	Virtual
10:20 - 10:35	Break	☕ Coffee Break
10:35 - 10:50	Paper	From Regulations to IDS: A Tool-Augmented LLM Pipeline for Automated BIM Rule Checks (Ivan Perov, Anastasiia Filatova, Egor Timoschak, Denis Nasonov)	Virtual
10:50 - 11:05	Paper	Attribution Quality in AI-Generated Content: Benchmarking Style Embeddings and LLM Judges (Misam Abbas)	In-person
11:05 - 11:20	Paper	Retrieval and Augmentation of Domain Knowledge for Text-to-SQL Semantic Parsing (Manasi Patwardhan, Ayush Agarwal, Shabbirhussain Bhaisaheb, Aseem Arora, Lovekesh Vig, Sunita Sarawagi)	Virtual
11:20 - 11:30	Buffer	Overflow / Networking
11:30 - 12:05	Keynote	Mohit Bansal - Multimodal Retrieval for Understanding and Generation (30 min + 5 min Q&A)	Virtual
12:05 - 12:20	Paper	"Lost-in-the-Later": Framework for Quantifying Contextual Grounding in Large Language Models (Yufei Tao, Adam Hiatt, Rahul Seetharaman, Ameeta Agrawal)	In-person
12:20 - 12:30	Closing	Manan Suri, Puneet Mathur - Concluding Statement	In-person

Important Dates

Workshop paper submission deadline	Aug 29, 2025
Notification of paper acceptance	Sep 19, 2025 (originally Sep 15, 2025)
Camera-ready deadline and copyright form	Sep 25, 2025
Workshop date	Nov 12, 2025