As autonomous driving technology evolves, the next major challenge is to create systems that not only follow rules but also drive smoothly, safely, and comfortably, much like a skilled human. This research project addresses this challenge by developing the “PrefDrive” framework, which integrates nuanced human driving preferences—such as maintaining safe distances or ensuring smooth acceleration—into autonomous driving models using Large Language Models (LLMs). The goal is to create a system that can align with a wide range of requirements, from basic operational needs like traffic rule compliance to more human-like driving behaviors.
The core of PrefDrive is its pioneering use of Direct Preference Optimization (DPO), a preference learning technique, in the autonomous driving domain. This approach trains the model by having it learn from pairs of “chosen” (desirable) and “rejected” (undesirable) driving actions for a given scenario, allowing it to discern the optimal human choice. For this research, we built and publicly released a comprehensive dataset of 74,040 driving preference sequences. By implementing memory-efficient techniques like LoRA and 4-bit quantization, we have also made advanced LLM fine-tuning accessible on consumer-grade hardware, broadening research opportunities in the field.
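To make the mechanism concrete, the sketch below shows the standard binary DPO objective that PrefDrive builds on, written in PyTorch. The function name, tensor names, and the β value are illustrative assumptions for exposition, not code from the released repository.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Binary DPO loss over a batch of (chosen, rejected) driving-action pairs.

    Each *_logps tensor holds the summed log-probability of the corresponding
    action text under the trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: beta * (log pi(y|x) - log pi_ref(y|x)).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen action's implicit reward above the rejected one's.
    margin = chosen_rewards - rejected_rewards
    loss = -F.logsigmoid(margin).mean()
    return loss, margin.detach()  # the margin is useful for monitoring training
```

In practice the policy here would be a LoRA-adapted, 4-bit-quantized LLM, which is what allows this optimization to run on consumer-grade GPUs as described above.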
However, real-world driving decisions are rarely a simple binary choice. For a single correct action, there are often multiple potential incorrect actions, each carrying a different degree of risk. To capture this more complex decision-making landscape, we extended the project into “Multi-PrefDrive”. This framework trains the model by pairing one “chosen” action with multiple “rejected” alternatives, such as actions that are “aggressive,” “inattentive,” or “overcautious,” enabling the model to develop a more nuanced understanding of the spectrum of possible driving errors.
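As a rough illustration of what such a preference record might look like (the field names and scenario text are hypothetical, not drawn from the released dataset):

```python
# One hypothetical Multi-PrefDrive preference record: a single chosen action
# paired with several rejected alternatives of differing risk.
sample = {
    "scenario": "Approaching a signalized intersection behind a braking lead vehicle.",
    "chosen": "Ease off the throttle and keep a roughly two-second gap.",
    "rejected": [
        {"action": "Accelerate and change lanes to pass before the light.", "label": "aggressive"},
        {"action": "Hold current speed without reacting to the lead vehicle.", "label": "inattentive"},
        {"action": "Brake hard to a full stop well before the intersection.", "label": "overcautious"},
    ],
}
```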
Multi-PrefDrive implements the Plackett-Luce preference model to rank the chosen action against the full set of rejected alternatives. Experiments in the CARLA simulator demonstrated that this multi-preference approach improves markedly on standard DPO, with an 11.0% gain in overall driving score and the largest gains in safety: the framework achieved an 83.6% reduction in infrastructure collisions and, in certain environments, eliminated traffic light violations entirely. This work validates that teaching AI to understand complex human judgment is a critical step toward creating safer and more reliable autonomous vehicles.
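A minimal sketch of the Plackett-Luce ranking loss is given below, assuming DPO-style implicit rewards have already been computed per candidate and that the rejected actions carry a fixed order from least to most undesirable; both are assumptions for exposition, and the paper's exact formulation may differ.

```python
import torch

def plackett_luce_loss(rewards):
    """Negative log-likelihood of a full ranking under the Plackett-Luce model.

    rewards: tensor of shape (batch, K + 1); column 0 is the chosen action's
    implicit reward, columns 1..K are the rejected alternatives ordered from
    least to most undesirable. Each rank position is drawn from the remaining
    candidates with softmax probability, so the NLL is a sum of log-sum-exp
    terms over suffixes.
    """
    nll = rewards.new_zeros(rewards.shape[0])
    num_candidates = rewards.shape[1]
    for k in range(num_candidates - 1):  # the last remaining item contributes log(1) = 0
        tail = rewards[:, k:]  # candidates not yet placed in the ranking
        nll = nll + torch.logsumexp(tail, dim=1) - rewards[:, k]
    return nll.mean()
```

With a single rejected alternative (K = 1) this reduces exactly to the binary DPO loss sketched earlier, which is one way to see Multi-PrefDrive as a strict generalization of PrefDrive.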
@inproceedings{Li2025d,
title = {Multi-PrefDrive: Optimizing Large Language Models for Autonomous Driving Through Multi-Preference Tuning},
author = {Yun Li and Ehsan Javanmardi and Simon Thompson and Kai Katsumata and Alex Orsholits and Manabu Tsukada},
url = {https://liyun0607.github.io/},
year = {2025},
date = {2025-10-19},
booktitle = {2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
address = {Hangzhou, China},
abstract = {This paper introduces Multi-PrefDrive, a framework that significantly enhances LLM-based autonomous driving through multidimensional preference tuning. Aligning LLMs with human driving preferences is crucial yet challenging, as driving scenarios involve complex decisions where multiple incorrect actions can correspond to a single correct choice. Traditional binary preference tuning fails to capture this complexity. Our approach pairs each chosen action with multiple rejected alternatives, better reflecting real-world driving decisions. By implementing the Plackett-Luce preference model, we enable nuanced ranking of actions across the spectrum of possible errors. Experiments in the CARLA simulator demonstrate that our algorithm achieves an 11.0% improvement in overall score and an 83.6% reduction in
infrastructure collisions, while showing perfect compliance with traffic signals in certain environments. Comparative analysis against DPO and its variants reveals Multi-PrefDrive's superior discrimination between chosen and rejected actions, achieving a margin value of 25, an ability that translates directly into enhanced driving performance. We implement memory-efficient techniques including LoRA and 4-bit quantization to enable deployment on consumer-grade hardware and will open-source our training code and multi-rejected dataset to advance research in LLM-based autonomous driving systems. Project Page (https://liyun0607.github.io/)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{Li2025c,
title = {PrefDrive: Enhancing Autonomous Driving through Preference-Guided Large Language Models},
author = {Yun Li and Ehsan Javanmardi and Simon Thompson and Kai Katsumata and Alex Orsholits and Manabu Tsukada},
url = {https://github.com/LiYun0607/PrefDrive/
https://huggingface.co/liyun0607/PrefDrive
https://huggingface.co/datasets/liyun0607/PrefDrive},
doi = {10.1109/IV64158.2025.11097672},
year = {2025},
date = {2025-06-22},
urldate = {2025-06-22},
booktitle = {36th IEEE Intelligent Vehicles Symposium (IV2025)},
address = {Cluj-Napoca, Romania},
abstract = {This paper presents PrefDrive, a novel framework that integrates driving preferences into autonomous driving models through large language models (LLMs). While recent advances in LLMs have shown promise in autonomous driving, existing approaches often struggle to align with specific driving behaviors (e.g., maintaining safe distances, smooth acceleration patterns) and operational requirements (e.g., traffic rule compliance, route adherence). We address this challenge by developing a preference learning framework that combines multimodal perception with natural language understanding. Our approach leverages Direct Preference Optimization (DPO) to fine-tune LLMs efficiently on consumer-grade hardware, making advanced autonomous driving research more accessible to the broader research community. We introduce a comprehensive dataset of 74,040 sequences, carefully annotated with driving preferences and driving decisions, which, along with our trained model checkpoints, will be made publicly available to facilitate future research. Through extensive experiments in the CARLA simulator, we demonstrate that our preference-guided approach significantly improves driving performance across multiple metrics, including distance maintenance and trajectory smoothness. Results show up to 28.1% reduction in traffic rule violations and 8.5% improvement in navigation task completion while maintaining appropriate distances from obstacles. The framework demonstrates robust performance across different urban environments, showcasing the effectiveness of preference learning in autonomous driving applications. },
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
We are part of the University of Tokyo's Graduate School of Information Science and Technology, Department of Creative Informatics, and focus on computer networks and cyber-physical systems.
Address
4F, I-REF building, Graduate School of Information Science and Technology, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
Room 91B1, Bld 2 of Engineering Department, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan