
As autonomous driving technology matures, the next major challenge is no longer simply obeying rules, but driving as smoothly, safely, and comfortably as a human. In this research project we developed PrefDrive, a framework that uses large language models (LLMs), the core of recent AI advances, to build nuanced human driving preferences (for example, keeping an appropriate following distance and accelerating and braking smoothly) into an autonomous driving model. The aim is a system that can satisfy a broad range of requirements, from basic ones such as traffic-rule compliance to more human-like driving behavior.
The core of PrefDrive is its pioneering application of Direct Preference Optimization (DPO), a preference-learning technique, to autonomous driving. For a given traffic situation, the model is shown a pair consisting of a desirable driving action (chosen) and an undesirable one (rejected), and it learns from the comparison what humans consider optimal. For this work we built and released a dedicated driving-preference dataset of 74,040 sequences. By combining memory-efficiency techniques such as LoRA and 4-bit quantization, we also made it possible to fine-tune capable LLMs on ordinary lab-grade GPUs, lowering the barrier to entry for this line of research.
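For concreteness, the canonical DPO objective (Rafailov et al., 2023) trains the policy π_θ to raise the likelihood of the chosen action y_w over the rejected action y_l relative to a frozen reference model π_ref; the paper's exact formulation may differ in detail, but the standard loss is:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Here x is the driving context, σ is the logistic function, and β is a temperature controlling how far the policy may drift from the reference model. The memory-efficiency side can be sketched with the Hugging Face peft and bitsandbytes libraries; the base model name and hyperparameters below are illustrative assumptions, not the project's actual configuration.

```python
# Minimal sketch: LoRA fine-tuning on top of a 4-bit quantized base model.
# Model name and hyperparameters are placeholders, not the project's setup.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # store frozen base weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,   # run matmuls in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",              # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                    # low-rank adapter dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],     # adapt attention projections only
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the small adapters are trained
```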
Real-world driving decisions, however, are rarely a simple binary choice: a single correct action typically stands against several incorrect alternatives that differ in how dangerous they are. To capture this nuance, we extended the project into Multi-PrefDrive. The new framework pairs each desirable action with a set of undesirable ones labeled, for example, "too aggressive," "inattentive," or "overly cautious" (see the sketch below). This trains the model to single out the best behavior more precisely from among diverse kinds of errors.
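A minimal sketch of what one such multi-rejected training record could look like; the field names and scenario are hypothetical, not the released dataset's actual schema:

```python
# One multi-rejected preference record (illustrative schema, not the real one).
sample = {
    "context": (
        "Approaching a signalized intersection; light just turned yellow; "
        "lead vehicle 18 m ahead; ego speed 42 km/h."
    ),
    "chosen": "Ease off the throttle and brake smoothly to stop before the line.",
    "rejected": [
        {"action": "Accelerate to clear the intersection before the red.",
         "error_type": "too aggressive"},
        {"action": "Hold current speed and decide at the stop line.",
         "error_type": "inattentive"},
        {"action": "Brake hard immediately to a full stop in traffic.",
         "error_type": "overly cautious"},
    ],
}
```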
To handle multiple alternatives, Multi-PrefDrive implements the Plackett-Luce model, a probabilistic model over rankings. Experiments in the CARLA simulator showed that this approach outperforms conventional DPO, with especially dramatic gains in safety: infrastructure collisions dropped by 83.6%, and in certain environments red-light violations were eliminated entirely. These results indicate that teaching AI the nuanced structure of human judgment is essential for safer, more trustworthy autonomous driving.
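For reference, the Plackett-Luce model scores a full ranking by repeatedly selecting the top remaining candidate with a softmax over scores. Writing s(y | x) = β log(π_θ(y | x) / π_ref(y | x)) for the DPO-style implicit reward (a common choice; the paper may parameterize it differently), the probability of ranking the chosen action y_1 above K rejected alternatives y_2, …, y_{K+1} is:

```latex
P(y_1 \succ y_2 \succ \cdots \succ y_{K+1} \mid x)
  = \prod_{k=1}^{K+1}
    \frac{\exp\!\big(s(y_k \mid x)\big)}
         {\sum_{j=k}^{K+1} \exp\!\big(s(y_j \mid x)\big)}
```

Training minimizes the negative log of this probability; with a single rejected alternative (K = 1) the model reduces to the Bradley-Terry form underlying pairwise DPO.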
@inproceedings{Li2025d,
title = {Multi-PrefDrive: Optimizing Large Language Models for Autonomous Driving Through Multi-Preference Tuning},
author = {Yun Li and Ehsan Javanmardi and Simon Thompson and Kai Katsumata and Alex Orsholits and Manabu Tsukada},
url = {https://liyun0607.github.io/},
doi = {10.1109/IROS60139.2025.11247608},
year = {2025},
date = {2025-10-19},
urldate = {2025-10-19},
booktitle = {2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
address = {Hangzhou, China},
abstract = {This paper introduces Multi-PrefDrive, a framework that significantly enhances LLM-based autonomous driving through multidimensional preference tuning. Aligning LLMs with human driving preferences is crucial yet challenging, as driving scenarios involve complex decisions where multiple incorrect actions can correspond to a single correct choice. Traditional binary preference tuning fails to capture this complexity. Our approach pairs each chosen action with multiple rejected alternatives, better reflecting real-world driving decisions. By implementing the Plackett-Luce preference model, we enable nuanced ranking of actions across the spectrum of possible errors. Experiments in the CARLA simulator demonstrate that our algorithm achieves an 11.0% improvement in overall score and an 83.6% reduction in infrastructure collisions, while showing perfect compliance with traffic signals in certain environments. Comparative analysis against DPO and its variants reveals Multi-PrefDrive's superior discrimination between chosen and rejected actions, achieving a margin value of 25, an ability that translates directly into enhanced driving performance. We implement memory-efficient techniques including LoRA and 4-bit quantization to enable deployment on consumer-grade hardware and will open-source our training code and multi-rejected dataset to advance research in LLM-based autonomous driving systems. Project Page (https://liyun0607.github.io/)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{Li2025c,
title = {PrefDrive: Enhancing Autonomous Driving through Preference-Guided Large Language Models},
author = {Yun Li and Ehsan Javanmardi and Simon Thompson and Kai Katsumata and Alex Orsholits and Manabu Tsukada},
url = {https://github.com/LiYun0607/PrefDrive/
https://huggingface.co/liyun0607/PrefDrive
https://huggingface.co/datasets/liyun0607/PrefDrive},
doi = {10.1109/IV64158.2025.11097672},
year = {2025},
date = {2025-06-22},
urldate = {2025-06-22},
booktitle = {36th IEEE Intelligent Vehicles Symposium (IV2025)},
address = {Cluj-Napoca, Romania},
abstract = {This paper presents PrefDrive, a novel framework that integrates driving preferences into autonomous driving models through large language models (LLMs). While recent advances in LLMs have shown promise in autonomous driving, existing approaches often struggle to align with specific driving behaviors (e.g., maintaining safe distances, smooth acceleration patterns) and operational requirements (e.g., traffic rule compliance, route adherence). We address this challenge by developing a preference learning framework that combines multimodal perception with natural language understanding. Our approach leverages Direct Preference Optimization (DPO) to fine-tune LLMs efficiently on consumer-grade hardware, making advanced autonomous driving research more accessible to the broader research community. We introduce a comprehensive dataset of 74,040 sequences, carefully annotated with driving preferences and driving decisions, which, along with our trained model checkpoints, will be made publicly available to facilitate future research. Through extensive experiments in the CARLA simulator, we demonstrate that our preference-guided approach significantly improves driving performance across multiple metrics, including distance maintenance and trajectory smoothness. Results show up to 28.1% reduction in traffic rule violations and 8.5% improvement in navigation task completion while maintaining appropriate distances from obstacles. The framework demonstrates robust performance across different urban environments, showcasing the effectiveness of preference learning in autonomous driving applications. },
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}