Cascaded Language Models for Cost-effective Human-AI Decision-Making

Authors: Claudio Fanconi, Mihaela van der Schaar

Affiliations: University of Cambridge

Venue: NeurIPS 2025 (Poster) & ICML 2025 Workshop on Multi Agent Systems

Representative figure for Cascaded Language Models for Cost-effective Human-AI Decision-Making

Abstract

A challenge in human-AI decision-making is balancing prediction correctness, knowledge and reasoning costs, and confidence in whether to abstain or escalate to human experts. This paper presents a cascaded LLM framework that adaptively delegates tasks across three tiers: a base model, a stronger but costlier model, and a human expert when the model cascade abstains. The method uses a two-stage policy: first, a deferral policy decides whether to accept the base model's answer or regenerate with the larger model; second, an abstention policy decides whether the cascade output is certain enough or should be referred to a human. An online learning mechanism incorporates human feedback to adapt to changing task difficulty. Evaluations on ARC-Easy, ARC-Challenge, MMLU, MedQA, and MedMCQA show that the cascaded strategy improves accuracy in most settings while reducing cost and handling abstentions in a principled way.