KTO - Kahneman-Tversky Optimization

Created: 2026-03-03 10:16
#quicknote

Kahneman-Tversky Optimization (KTO) applies behavioural economics, specifically prospect theory's loss aversion, to LLM alignment. Unlike DPO - Direct Preference Optimization, which requires paired preferences (A > B), KTO learns from unpaired outputs labelled simply as desirable or undesirable. That kind of feedback is cheaper and more abundant than preference pairs, which makes KTO more practical for real-world feedback. Introduced by Ethayarajh et al. (2024), KTO matches or exceeds preference-based methods at scales from 1B to 30B parameters. Part of the LLM Training and Alignment Evolution.
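For reference, the original prospect-theory value function takes only a few lines. This is a minimal sketch that assumes the Tversky & Kahneman (1992) estimates $\alpha = 0.88$ and $\lambda = 2.25$ as defaults. Those constants come from that study, not from KTO, and the name `kt_value` is hypothetical. KTO itself replaces the power curve with a logistic function, which keeps the concave-for-gains, convex-for-losses shape and is easier to optimise.

```python
def kt_value(z: float, alpha: float = 0.88, lam: float = 2.25) -> float:
    """Prospect-theory value of an outcome z, measured from a reference point of 0.

    Defaults are the Tversky & Kahneman (1992) estimates, used here as an assumption.
    """
    if z >= 0:
        return z ** alpha  # gains: concave, diminishing sensitivity
    return -lam * (-z) ** alpha  # losses: convex and scaled up by lam (loss aversion)


gain = kt_value(100.0)
loss = kt_value(-100.0)
# With the same exponent on both sides, |loss| / gain equals lam.
print(f"gain={gain:.2f} loss={loss:.2f} ratio={-loss / gain:.2f}")
```

For a symmetric ±100 outcome, the loss is felt $\lambda$ times as strongly as the gain. KTO borrows this asymmetry.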

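Loss aversion principle: Humans weigh losses more heavily than equivalent gains (loss-aversion coefficient $\lambda \approx 1.5$–$2.5$). KTO is a human-aware loss (HALO): it scores each output with a Kahneman-Tversky-style value function relative to a reference point, and separate weights for desirable and undesirable outputs allow undesirable outputs to be penalised more heavily than desirable outputs are rewarded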
No paired preferences needed: Only requires binary labels (good/bad), which are far more abundant than paired comparisons; many products already collect feedback in this form, e.g. thumbs-up/down ratings on responses
Performance: Matches or exceeds DPO and IPO on alignment benchmarks across multiple model scales
When to use KTO: When you have abundant binary feedback but scarce preference pairs, when scaling alignment without expensive annotation, or when diverse feedback sources (logs, ratings) are available
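The loss-aversion mechanics above can be made concrete with a per-batch KTO loss. In the KTO paper's notation, the implied reward is $r_\theta(x,y) = \log \frac{\pi_\theta(y|x)}{\pi_{\text{ref}}(y|x)}$. The reference point $z_0$ is a non-negative estimate of $\mathrm{KL}(\pi_\theta \| \pi_{\text{ref}})$. The value is $v = \lambda_D \sigma(\beta(r_\theta - z_0))$ for desirable outputs and $v = \lambda_U \sigma(\beta(z_0 - r_\theta))$ for undesirable ones, and the loss is $\mathbb{E}[\lambda_y - v(x,y)]$. The sketch below uses plain floats and assumes that summed sequence log-probabilities and a precomputed $z_0$ are passed in. In the paper, $z_0$ is estimated from mismatched prompt/output pairs in the batch and not backpropagated through. The names `Example` and `kto_loss` and the defaults $\beta = 0.1$, $\lambda_D = \lambda_U = 1$ are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    policy_logp: float  # log pi_theta(y|x), summed over response tokens (assumed precomputed)
    ref_logp: float     # log pi_ref(y|x) under the frozen reference model
    desirable: bool     # binary label: True = good, False = bad


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def kto_loss(batch, z0, beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Mean KTO loss over a batch of unpaired, binary-labelled examples (sketch)."""
    # Reference point: KL estimate clamped at 0; in an autograd framework it would be detached.
    z0 = max(z0, 0.0)
    total = 0.0
    for ex in batch:
        r = ex.policy_logp - ex.ref_logp  # implied reward: log-ratio vs reference model
        if ex.desirable:
            # Gain: value grows as the reward rises above the reference point.
            total += lambda_d - lambda_d * sigmoid(beta * (r - z0))
        else:
            # Loss: value grows as the reward falls below the reference point.
            total += lambda_u - lambda_u * sigmoid(beta * (z0 - r))
    return total / len(batch)


# Policy that raised the good output's likelihood and lowered the bad one's.
good_batch = [Example(-10.0, -12.0, True), Example(-15.0, -12.0, False)]
# Policy that did the opposite.
bad_batch = [Example(-14.0, -12.0, True), Example(-9.0, -12.0, False)]
print(kto_loss(good_batch, z0=0.0), kto_loss(bad_batch, z0=0.0))
```

Setting `lambda_u` above `lambda_d` is how this sketch would express loss aversion. An undesirable output sitting exactly at the reference point then contributes `lambda_u / 2` to the loss instead of `lambda_d / 2`.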

Aspect	DPO	KTO
Input	Paired preferences (A > B)	Binary labels (good/bad)
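Foundation	Bradley-Terry preference model	Prospect theory (Kahneman-Tversky value function)
Data needs	Preference pairs for the same prompt	Unpaired examples, each labelled desirable or undesirable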
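The difference in input format is easiest to see in the data records themselves. One simple way to reuse existing DPO-style data for KTO is to split each pair into two independent examples: the chosen output labelled desirable and the rejected output labelled undesirable. Binary feedback logs can be used directly, with no partner output needed. This is a minimal sketch, and the record field names (`prompt`, `chosen`, `rejected`, `completion`, `label`) and helper functions are hypothetical.

```python
from typing import List, Tuple, TypedDict


class PreferencePair(TypedDict):
    prompt: str
    chosen: str    # preferred output (A)
    rejected: str  # dispreferred output (B)


class BinaryExample(TypedDict):
    prompt: str
    completion: str
    label: bool  # True = desirable, False = undesirable


def pairs_to_binary(pairs: List[PreferencePair]) -> List[BinaryExample]:
    """Split each A > B pair into two unpaired, binary-labelled examples."""
    out: List[BinaryExample] = []
    for p in pairs:
        out.append({"prompt": p["prompt"], "completion": p["chosen"], "label": True})
        out.append({"prompt": p["prompt"], "completion": p["rejected"], "label": False})
    return out


def ratings_to_binary(logs: List[Tuple[str, str, bool]]) -> List[BinaryExample]:
    """Turn (prompt, response, thumbs_up) feedback logs into KTO examples directly."""
    return [{"prompt": q, "completion": a, "label": up} for q, a, up in logs]


pairs: List[PreferencePair] = [{"prompt": "2+2?", "chosen": "4", "rejected": "5"}]
logs = [("Capital of France?", "Paris", True), ("Capital of Peru?", "Quito", False)]
print(pairs_to_binary(pairs) + ratings_to_binary(logs))
```

The conversion only works in one direction. Binary-labelled logs generally cannot be turned into DPO pairs unless the same prompt happens to have both a good and a bad response.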


Resources

Tags