Brief: AI agents are reshaping Document AI, but can they be trusted in production? This tutorial walks through the agent stack for enterprise documents, from parsing to retrieval to reasoning, exposing where current systems silently fail and what it takes to close the gap. Drawing on recent benchmarks (ParseBench, MADQA), open-source tools (DRAG), and lessons from deploying agentic workflows in industry, we examine the tension between flexibility and reliability: agents that self-correct through visual reflection, agents that learn search strategies from experience, and the emerging need for systems that compile learned intelligence into deterministic, auditable pipelines. We conclude with open research problems and an invitation to collaborate.
Speaker's Bio: Jordy Van Landeghem received an M.A. degree in Linguistics (2015), an M.Sc. degree in Artificial Intelligence (2017), and a Ph.D. degree in Computer Science (2024), all from KU Leuven, Belgium. He completed research internships at Oracle and Nuance Communications and spent seven years as Lead AI Research Engineer at Contract.fit, a European IDP start-up. His doctoral research on "Intelligent Automation for AI-Driven Document Understanding" spans probabilistic deep learning, calibration, uncertainty quantification, and out-of-distribution robustness. He spearheaded the DUDE benchmark and the ICDAR 2023 competition, with further publications at ICML, ICCV, and WACV. Most recently, he was a Senior ML Engineer at Instabase, leading GenAI and Agentic AI efforts while collaborating on MADQA. He now runs an independent global AI/ML consultancy from Belgium (Probably Approximately Human BV).
Brief TBC
Speaker's Bio: Brandon Smock is a Senior Applied Scientist for Document Intelligence at Kensho Technologies, with deep expertise in machine learning and algorithm development. During his tenure at Microsoft as a Principal Applied Scientist, he spearheaded the development of the Table Transformer (TATR), a state-of-the-art deep learning approach to recognizing and extracting data from tables in unstructured documents. The Table Transformer models have since been downloaded over two million times in a single month on Hugging Face, placing them among the most popular object detection models available. His work is characterized by a strong focus on scalable, data-centric machine learning, including automated cleaning of large-scale crowd-sourced data and the creation of realistic synthetic training data. Brandon has presented his work at venues including CVPR and ICDAR, and continues to push the boundaries of document intelligence research.
Brief TBC
Speaker's Bio: Dr. Sheraz Ahmed has been associated with DFKI for over fifteen years and is currently a Principal Researcher there. He completed his PhD at TU Kaiserslautern on the topic of generic frameworks for information segmentation in document images, and has since become a leading scientific voice in the Smart Data & Knowledge Services research area. He has attracted international attention through his publications on the application of machine learning to document analysis and life sciences, as well as his work on explainable AI systems. He is also the founder and CEO of DeepReader GmbH, a company he established to bridge the gap between academic research and industry applications. His research interests span document understanding, explainable AI, pattern recognition, anomaly detection, genome analysis, and natural language processing. In recognition of his contributions, Sheraz was honored with the prestigious DFKI Research Fellow Award.