Workshop on Grounding Documents with Reasoning, Agents, Retrieval, and Attribution

November 12, 2025 @ ICDM 2025, Washington, DC.


Documents serve as the backbone of knowledge preservation, decision-making, and information dissemination across domains such as law, finance, medicine, and academia. They encapsulate structured and unstructured data, requiring sophisticated processing to extract, synthesize, and verify critical insights. As AI-driven document understanding advances, ensuring factual accuracy, reasoning over complex contexts, and attributing information to reliable sources become increasingly vital.

The RARA workshop addresses persistent challenges in document processing that have been amplified in the era of Large Language Models (LLMs). Extracting information from complex, multi-modal documents often requires reasoning over multiple sections, linking disparate pieces of evidence, and ensuring consistency across sources. Attribution remains a key issue, as models struggle to trace generated content back to reliable references, making verification difficult. While LLMs have improved contextual understanding, they also introduce risks such as hallucinations, overgeneralization, and opacity in their decision-making. Our workshop explores frameworks that ground document understanding through four key emerging pillars:

  • Reasoning: Structured mechanisms to navigate complex inference chains and resolve ambiguities
  • Agents: Specialized components working together on complex tasks with access to executable tools
  • Retrieval: Methods for finding and integrating relevant information to ground AI responses
  • Attribution: Techniques to ensure AI-generated content remains traceable to reliable sources

Speakers

Mohit Bansal
University of North Carolina at Chapel Hill
Graham Neubig
Carnegie Mellon University, All Hands AI
Hamed Zamani
University of Massachusetts Amherst

Call for Papers

We invite submissions to the first international workshop on Reasoning, Agents, Retrieval, and Attribution (RARA) for grounding documents, to be held in conjunction with ICDM 2025.

As documents continue to serve as the foundation of knowledge across domains like law, finance, healthcare, and academia, AI systems must develop more sophisticated capabilities to process, interpret, and reason over these complex information sources. The RARA workshop aims to bring together researchers and practitioners working at the intersection of document understanding, reasoning systems, multi-agent architectures, and information retrieval to address key challenges in this rapidly evolving field.

Topics of Interest

We welcome original research papers on topics including but not limited to:

  • Complex Reasoning: Multi-hop inference across document sections, logical consistency maintenance, ambiguity resolution in domain-specific texts
  • Agent Architectures: Multi-agent coordination frameworks, tool-augmented document understanding, planning strategies for document analysis
  • Document-Specific Agents: Citation verification, fact-checking, table extraction, chart interpretation, formula extraction, document summarization and comparison agents
  • Domain-Specific Document Processing: Specialized techniques for legal, financial, healthcare, academic, technical, and government documents
  • Advanced Retrieval: Dense/sparse retrieval for multi-modal documents, cross-document information synthesis, retrieval-augmented generation for document processing
  • Attribution Mechanisms: Source tracing in AI-generated content, confidence calibration in document analysis, verification of AI-generated claims
  • Multi-Modal Processing: Handling diverse document formats including charts, tables, infographics, diagrams, flowcharts, forms, and other visually rich elements
  • Document Structure: Layout analysis, semantic segmentation, hierarchical document modeling
  • Benchmarks & Evaluation: Novel datasets, evaluation frameworks, metrics for document reasoning, attribution quality assessment, agent performance measurement


Workshop Schedule

November 12, 2025 • All times in local time

Time | Type | Title/Speaker | Mode
8:30 - 8:40 | Opening | Nedim Lipka - Opening Speech | In-person
8:40 - 9:15 | Keynote | Graham Neubig - Large Language Models for Information Synthesis over Long Contexts (30 min + 5 min Q&A) | Virtual
9:15 - 9:30 | Paper | From Roots to Rewards: Dynamic Tree Reasoning with Reinforcement Learning (Ahmed Bahloul, Simon Malberg) | Virtual
9:30 - 9:45 | Paper | Agentic Meta-Orchestrator for Multi-task Copilots (Xiaofeng Zhu, Yunshen Zhou) | In-person
9:45 - 10:20 | Keynote | Hamed Zamani - Retrieval-Augmented Reasoning (30 min + 5 min Q&A) | Virtual
10:20 - 10:35 | ☕ Coffee Break
10:35 - 10:50 | Paper | From Regulations to IDS: A Tool-Augmented LLM Pipeline for Automated BIM Rule Checks (Ivan Perov, Anastasiia Filatova, Egor Timoschak, Denis Nasonov) | Virtual
10:50 - 11:05 | Paper | Attribution Quality in AI-Generated Content: Benchmarking Style Embeddings and LLM Judges (Misam Abbas) | In-person
11:05 - 11:20 | Paper | Retrieval and Augmentation of Domain Knowledge for Text-to-SQL Semantic Parsing (Manasi Patwardhan, Ayush Agarwal, Shabbirhussain Bhaisaheb, Aseem Arora, Lovekesh Vig, Sunita Sarawagi) | Virtual
11:20 - 11:30 | BUFFER - Overflow / Networking
11:30 - 12:05 | Keynote | Mohit Bansal - Multimodal Retrieval for Understanding and Generation (30 min + 5 min Q&A) | Virtual
12:05 - 12:20 | Paper | "Lost-in-the-Later": Framework for Quantifying Contextual Grounding in Large Language Models (Yufei Tao, Adam Hiatt, Rahul Seetharaman, Ameeta Agrawal) | In-person
12:20 - 12:30 | Closing | Manan Suri, Puneet Mathur - Concluding Statement | In-person


Important Dates


Workshop papers submission: Aug 29, 2025
Notification of paper acceptance: Sep 15, 2025 → Sep 19, 2025
Camera-ready deadline and copyright form: Sep 25, 2025
Workshop date: Nov 12, 2025

Organizers

Manan Suri
University of Maryland, College Park
Puneet Mathur
Adobe Research
Dinesh Manocha
University of Maryland, College Park
Nedim Lipka
Adobe Research
Franck Dernoncourt
Adobe Research
Ryan A. Rossi
Adobe Research
Vivek Gupta
Arizona State University
Ramit Sawhney
Georgia Institute of Technology