Trustworthy Large Language Models (LLMs) and Agents
- Mentors
- Leilani Gilpin
- Organization
- UC OSPO
- Technologies
- Python, PostgreSQL, JavaScript, Flask, TypeScript, Hugging Face, Amazon Web Services
- Topics
- Responsible AI, Agents, Large Language Models (LLMs), Human-AI Interaction, Decision-making, Reasoning, Trust & Safety
LLM hallucinations have been tackled with a variety of techniques, most notably modern retrieval methods such as Retrieval-Augmented Generation (RAG). These techniques show that combining non-parametric memory, retrieved and ranked from sources such as knowledge graphs and databases, with the LLM's parametric memory reduces factually incorrect generations. However, explaining hallucinations remains unsolved, especially on logical reasoning tasks, and the impact of the uncertainty these hallucinations introduce on human trust and decision-making remains poorly understood.
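To make the retrieval-augmented setup concrete, the sketch below retrieves and ranks passages from a small in-memory store and conditions generation on them. It is a minimal sketch only: the sentence-transformers encoder, the Flan-T5 generation pipeline, and the toy document set are illustrative assumptions, not components chosen by this project.

```python
# Minimal RAG sketch: retrieve supporting passages (non-parametric memory)
# and condition the LLM's generation (parametric memory) on them.
# Model names and the in-memory "knowledge base" are illustrative only.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

documents = [
    "Retrieval-Augmented Generation combines retrieved evidence with an LLM's parametric memory.",
    "Prolog knowledge bases encode facts and rules for logical inference.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
doc_embeddings = encoder.encode(documents, convert_to_tensor=True)

def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Rank stored passages by semantic similarity to the question."""
    q_emb = encoder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, doc_embeddings, top_k=top_k)[0]
    return [documents[hit["corpus_id"]] for hit in hits]

generator = pipeline("text2text-generation", model="google/flan-t5-small")  # assumed model

def answer(question: str) -> str:
    """Ground the prompt in retrieved context before generating."""
    context = " ".join(retrieve(question))
    prompt = f"Answer using only the context.\nContext: {context}\nQuestion: {question}"
    return generator(prompt, max_new_tokens=64)[0]["generated_text"]

print(answer("What does RAG combine?"))
```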
To address these problems, we propose two complementary efforts:
1. Conduct a first-of-its-kind study that develops a new conversational explanation paradigm under compositional uncertainty and evaluates it with human participants.
2. In parallel, develop a neuro-symbolic framework for explaining hallucinations in large language models using agentic Prolog knowledge-base rules (see the sketch after this list).
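As a rough illustration of the neuro-symbolic direction, the sketch below checks claims asserted by an LLM against a tiny Prolog knowledge base and flags unprovable claims as candidate hallucinations. The pyswip binding, the hand-written rules, and the fixed claim-to-query mapping are all assumptions for illustration; designing the actual agentic rules and the translation from model output to queries is part of the project.

```python
# Sketch of a symbolic consistency check: claims extracted from an LLM's
# output are verified against Prolog rules; unprovable claims are flagged
# as candidate hallucinations. Rules and the claim-to-query mapping are
# illustrative placeholders, not the project's actual knowledge base.
from pyswip import Prolog

prolog = Prolog()
prolog.assertz("parent(alice, bob)")
prolog.assertz("parent(bob, carol)")
prolog.assertz("grandparent(X, Z) :- parent(X, Y), parent(Y, Z)")

def is_supported(query: str) -> bool:
    """True if the Prolog knowledge base can prove the query."""
    return len(list(prolog.query(query))) > 0

# Claims an LLM might assert, paired with their symbolic form.
claims = {
    "Alice is Carol's grandparent": "grandparent(alice, carol)",
    "Carol is Alice's parent": "parent(carol, alice)",
}

for text, query in claims.items():
    status = "supported" if is_supported(query) else "possible hallucination"
    print(f"{text}: {status}")
```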
To accomplish these objectives, we first build a lightweight full-stack web application that collects quantitative and qualitative human feedback for model alignment. We also design LLM explanation paradigms grounded simultaneously in multiple sources (both open and closed source), develop models (LLMs and agents), and build a pipeline that stores the human feedback in a common data format in a secure database, where it can later be used to adjust the weights of the reward model.
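A minimal sketch of the feedback-collection step is shown below, assuming a Flask route that writes to a PostgreSQL table named `feedback`; the table schema, payload fields, and connection string are placeholders rather than the project's actual design.

```python
# Minimal sketch of the human-feedback collection endpoint: a Flask route
# that validates a JSON payload and writes it to PostgreSQL in one common
# format. Table name, columns, and connection string are placeholders.
import os

import psycopg2
from flask import Flask, jsonify, request

app = Flask(__name__)
DB_URL = os.environ.get("DATABASE_URL", "postgresql://localhost/feedback_db")

@app.post("/feedback")
def submit_feedback():
    payload = request.get_json(force=True)
    required = {"session_id", "rating", "comment"}
    if not required.issubset(payload):
        return jsonify({"error": "missing fields"}), 400

    # Store quantitative (rating) and qualitative (comment) feedback together,
    # keyed by session, so it can later feed reward-model updates.
    conn = psycopg2.connect(DB_URL)
    try:
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO feedback (session_id, rating, comment) VALUES (%s, %s, %s)",
                (payload["session_id"], payload["rating"], payload["comment"]),
            )
    finally:
        conn.close()
    return jsonify({"status": "stored"}), 201

if __name__ == "__main__":
    app.run(debug=True)
```

A client would then POST JSON such as `{"session_id": "s1", "rating": 4, "comment": "explanation was clear"}` to `/feedback`, giving a single record format for both quantitative and qualitative signals.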