To complement standalone autonomous driving, networked autonomous driving, in which vehicles are connected over a network to obtain information about areas that on-board sensors cannot see, is indispensable. Because computers carry out all of the computation that autonomous driving requires (self-localization, object detection, road recognition, and trajectory planning) in cyberspace, there is no need for a single driving agent that, like a human, unifies perception, judgment, and operation; performing these functions in a distributed environment can make traffic dramatically safer and more efficient.
Keywords: Vision-Language Model (VLM), Cooperative ITS, wireless networks, cloud and edge computing, dynamic maps, ad hoc networks, 5th-generation mobile networks (5G), ISO/ETSI standards
We create a software space that produces a planet-scale sense of togetherness over the Internet. Moving away from today's largely one-way mass media and streaming services, we support interaction between a venue and a large, distributed audience, fundamentally transforming how both on-site and remote participants experience live music. We integrate and further develop an object-based approach, in which recorded sound and video are interpreted three-dimensionally, decomposed into multiple audiovisual objects for transmission, and flexibly rendered to match the playback equipment, together with technologies spanning audio and video, VR, haptics, IoT, sensing, and architecture.
@inproceedings{Li2025b,
title = {State-Guided Spatial Cross-Attention for Enhanced End-to-End Autonomous Driving},
author = {Dongyang Li and Ehsan Javanmardi and Manabu Tsukada},
year = {2025},
date = {2025-09-30},
urldate = {2025-09-30},
booktitle = {IEEE International Automated Vehicle Validation Conference (IAVVC 2025)},
address = {Baden-Baden, Germany},
abstract = {Handling near-accident scenarios is a significant challenge for end-to-end autonomous driving (E2E-AD), as these situations often involve sudden environmental changes, complex interactions with other road users, and high-risk decision-making under uncertainty. Unlike routine driving tasks, near-accident scenarios require rapid and precise responses based on external perception and internal vehicle dynamics. Successfully navigating such situations demands not only a comprehensive understanding of the surrounding environment but also an accurate assessment of the ego vehicle's state, including speed, acceleration, and steering angle, to ensure safe and reliable control. However, conventional E2E-AD models struggle to handle these safety-critical situations effectively. Standard approaches primarily rely on raw sensor inputs to learn driving policies, often overlooking the crucial role of vehicle state information in decision-making. Since many near-accident scenarios involve conditions where the same environmental observation could require vastly different responses depending on the ego vehicle's motion state (such as whether the vehicle is braking, accelerating, or experiencing traction loss), ignoring these internal dynamics can lead to unsafe or suboptimal actions. Furthermore, E2E-AD models typically learn a direct mapping from sensory inputs to control outputs, making it difficult to generalize to highly dynamic and unpredictable interactions, such as emergency evasive maneuvers or sudden braking events. To address these challenges, we propose a state-guided cross-attention mechanism that explicitly models the interaction between the ego vehicle's states and its perception of the environment. By incorporating vehicle state information into the decision-making process, our approach ensures that the model can dynamically adjust its attention to critical sensory inputs based on real-time driving conditions. This allows the autonomous system to make more context-aware decisions, improving its ability to respond effectively to complex and safety-critical scenarios.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
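To illustrate the mechanism named in the entry above: in state-guided cross-attention, the ego-vehicle state vector acts as the attention query, while perception features supply keys and values. The sketch below is a minimal, single-head, projection-free version in plain Python; the function names and setup are our simplification for illustration, not the paper's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def state_guided_attention(state_query, feature_keys, feature_values):
    """Single-head cross-attention with the ego-vehicle state as the query.

    state_query    : state vector (e.g. speed, acceleration, steering angle)
    feature_keys   : one key vector per perception feature
    feature_values : one value vector per perception feature

    Returns a context vector: a state-conditioned weighted sum of the
    perception features. Illustrative only; no learned projections.
    """
    d = len(state_query)
    scores = [sum(q * k for q, k in zip(state_query, key)) / math.sqrt(d)
              for key in feature_keys]
    weights = softmax(scores)
    dim = len(feature_values[0])
    return [sum(w * v[i] for w, v in zip(weights, feature_values))
            for i in range(dim)]
```

The point of the construction is that the same perception features yield different context vectors as the state query changes, which is the behavior the abstract argues is missing from purely sensor-driven E2E models.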
@inproceedings{Li2025c,
title = {PrefDrive: Enhancing Autonomous Driving through Preference-Guided Large Language Models},
author = {Yun Li and Ehsan Javanmardi and Simon Thompson and Kai Katsumata and Alex Orsholits and Manabu Tsukada},
url = {https://github.com/LiYun0607/PrefDrive/
https://huggingface.co/liyun0607/PrefDrive
https://huggingface.co/datasets/liyun0607/PrefDrive},
year = {2025},
date = {2025-06-22},
urldate = {2025-06-22},
booktitle = {36th IEEE Intelligent Vehicles Symposium (IV2025)},
address = {Cluj-Napoca, Romania},
abstract = {This paper presents PrefDrive, a novel framework that integrates driving preferences into autonomous driving models through large language models (LLMs). While recent advances in LLMs have shown promise in autonomous driving, existing approaches often struggle to align with specific driving behaviors (e.g., maintaining safe distances, smooth acceleration patterns) and operational requirements (e.g., traffic rule compliance, route adherence). We address this challenge by developing a preference learning framework that combines multimodal perception with natural language understanding. Our approach leverages Direct Preference Optimization (DPO) to fine-tune LLMs efficiently on consumer-grade hardware, making advanced autonomous driving research more accessible to the broader research community. We introduce a comprehensive dataset of 74,040 sequences, carefully annotated with driving preferences and driving decisions, which, along with our trained model checkpoints, will be made publicly available to facilitate future research. Through extensive experiments in the CARLA simulator, we demonstrate that our preference-guided approach significantly improves driving performance across multiple metrics, including distance maintenance and trajectory smoothness. Results show up to 28.1% reduction in traffic rule violations and 8.5% improvement in navigation task completion while maintaining appropriate distances from obstacles. The framework demonstrates robust performance across different urban environments, showcasing the effectiveness of preference learning in autonomous driving applications. },
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
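The Direct Preference Optimization objective mentioned in the PrefDrive abstract can be written down compactly. The sketch below computes the standard DPO loss for a single preference pair; the variable names and the default beta are illustrative, not taken from the paper.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    policy_* : log-probabilities under the model being fine-tuned
    ref_*    : log-probabilities under the frozen reference model
    beta     : temperature controlling deviation from the reference
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)): small when the policy prefers the chosen
    # response more strongly (relative to the reference) than the rejected one
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Because the loss needs only log-probabilities from two forward passes and no reward model, it is cheap enough to run on consumer-grade hardware, which is the accessibility point the abstract makes.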
@inproceedings{Jiang2025,
title = {Towards Efficient Roadside LiDAR Deployment: A Fast Surrogate Metric Based on Entropy-Guided Visibility},
author = {Yuze Jiang and Ehsan Javanmardi and Manabu Tsukada and Hiroshi Esaki},
url = {https://arxiv.org/abs/2504.06772},
year = {2025},
date = {2025-06-22},
urldate = {2025-06-22},
booktitle = {36th IEEE Intelligent Vehicles Symposium (IV2025)},
address = {Cluj-Napoca, Romania},
abstract = {The deployment of roadside LiDAR sensors plays a crucial role in the development of Cooperative Intelligent Transport Systems (C-ITS). However, the high cost of LiDAR sensors necessitates efficient placement strategies to maximize detection performance. Traditional roadside LiDAR deployment methods rely on expert insight, making them time-consuming. Automating this process, however, demands extensive computation, as it requires not only visibility evaluation but also assessing detection performance across different LiDAR placements. To address this challenge, we propose a fast surrogate metric, the Entropy-Guided Visibility Score (EGVS), based on information gain to evaluate object detection performance in roadside LiDAR configurations. EGVS leverages Traffic Probabilistic Occupancy Grids (TPOG) to prioritize critical areas and employs entropy-based calculations to quantify the information captured by LiDAR beams. This eliminates the need for direct detection performance evaluation, which typically requires extensive labeling and computational resources. By integrating EGVS into the optimization process, we significantly accelerate the search for optimal LiDAR configurations. Experimental results using the AWSIM simulator demonstrate that EGVS strongly correlates with Average Precision (AP) scores and effectively predicts object detection performance. This approach offers a computationally efficient solution for roadside LiDAR deployment, facilitating scalable smart infrastructure development.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
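The entropy-based calculation described in the entry above can be illustrated with a toy version: per-cell Shannon entropy of the occupancy probability, weighted by a traffic prior and accumulated only over cells that a LiDAR beam actually reaches. This is a minimal sketch assuming a simple additive weighting; the paper's exact EGVS formulation may differ.

```python
import math

def binary_entropy(p):
    """Shannon entropy (bits) of a Bernoulli occupancy probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def visibility_score(cells):
    """Toy entropy-guided visibility score over grid cells.

    cells: iterable of (occupancy_prob, traffic_weight, beam_reaches),
    where traffic_weight is a prior taken from something like a traffic
    probabilistic occupancy grid. Sums traffic-weighted entropy over the
    cells actually hit by a LiDAR beam.
    """
    return sum(w * binary_entropy(p) for p, w, hit in cells if hit)
```

Cells with occupancy probability near 0.5 are maximally uncertain, so observing them yields the most information; a placement that covers many high-traffic, high-entropy cells scores well without any detector being run.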
@misc{Tsukada2025b,
title = {V2X Communication Technologies in the Era of End-to-End Autonomous Driving},
author = {Manabu Tsukada},
url = {https://sites.google.com/view/b-stem-iot/},
year = {2025},
date = {2025-06-22},
urldate = {2025-06-22},
abstract = {Autonomous driving technology is undergoing a significant paradigm shift from traditional rule-based systems to integrated End-to-End (E2E) deep learning architectures. This transition necessitates a fundamental rethinking of Vehicle-to-Everything (V2X) communication, as existing V2X standards, primarily designed for rule-based systems, may not fully leverage the capabilities or address the needs of E2E models. This presentation explores the evolution required for V2X technologies in the E2E era. We contrast rule-based and E2E architectures, highlighting the limitations of current V2X approaches like object-level message sharing for E2E systems that benefit from richer data. While intermediate feature sharing via V2X is promising, its practical implementation faces hurdles, notably the heterogeneity of sensors, AI models, and tasks across vehicles. To address these challenges, we introduce a research approach aiming to maximize V2X value through an E2E pipeline encompassing data foundation (Co3SOP dataset for collaborative 3D semantic occupancy), perception adaptation (PHCP framework for heterogeneous collaboration during inference), and decision optimization (PrefDrive integrating LLMs with preference learning). Through these interconnected efforts, we aim to unlock the full potential of V2X communication to enhance the safety, efficiency, and robustness of E2E autonomous driving systems.},
howpublished = {The 2nd Workshop on Secure connected vehicles: Digital Twin, UAVs, and Smart Transportation, at IEEE IV 2025},
keywords = {},
pubstate = {published},
tppubtype = {presentation}
}
@workshop{Hu2025,
title = {A Low PAPR Layered Multi-User OTFS Modulation},
author = {Dou Hu and Jin Nakazato and Kazuki Maruta and Omid Abbassi Aghda and Rui Dinis and Manabu Tsukada},
year = {2025},
date = {2025-06-17},
urldate = {2025-06-17},
booktitle = {AI-Driven Connectivity for Vehicular and Wireless Networks in VTC2025-Spring},
address = {Oslo, Norway},
abstract = {In modern communication systems, meeting the growing demand for high-capacity transmission requires developing efficient and robust modulation techniques. To address this, we propose a low-PAPR page-style Orthogonal Time Frequency Space (OTFS) modulation framework that enhances communication capacity while maintaining a low peak-to-average power ratio (PAPR). The proposed design introduces a novel pilot signal placement and analysis method, improving channel estimation accuracy and system performance in high-mobility multi-user scenarios. This paper provides an overview of recent advancements in OTFS-based multi-user communication systems, emphasizing their contributions to enhancing spectral efficiency, reliability, and robustness. Through extensive simulations, we demonstrate the effectiveness of the proposed framework in achieving superior BER performance, improved interference mitigation, and robust transmission capabilities compared to traditional methods, validating its suitability for next-generation communication networks.},
howpublished = {Workshop on AI-Driven Connectivity for Vehicular and Wireless Networks in VTC2025-Spring},
keywords = {},
pubstate = {published},
tppubtype = {workshop}
}
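PAPR, the quantity the OTFS design above keeps low, is simply the ratio of a signal's peak instantaneous power to its average power, usually expressed in decibels. A minimal computation in plain Python (the function name is ours):

```python
import math

def papr_db(samples):
    """Peak-to-average power ratio (dB) of complex baseband samples."""
    powers = [abs(s) ** 2 for s in samples]
    peak = max(powers)
    average = sum(powers) / len(powers)
    return 10.0 * math.log10(peak / average)
```

A constant-envelope waveform has a PAPR of 0 dB, the best case for power-amplifier efficiency; signals with occasional large peaks force the amplifier to back off, which is why low-PAPR designs matter.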
@inproceedings{Orsholits2025,
title = {Context-Rich Interactions in Mixed Reality through Edge AI Co-Processing},
author = {Alex Orsholits and Manabu Tsukada},
url = {https://link.springer.com/chapter/10.1007/978-3-031-87772-8_3},
doi = {10.1007/978-3-031-87772-8_3},
isbn = {978-3-031-87771-1},
year = {2025},
date = {2025-04-09},
urldate = {2025-04-09},
booktitle = {The 39-th International Conference on Advanced Information Networking and Applications (AINA 2025)},
address = {Barcelona, Spain},
abstract = {Spatial computing is evolving towards leveraging data streaming for computationally demanding applications, facilitating a shift to lightweight, untethered, and standalone devices. These devices are therefore ideal candidates for co-processing, where real-time context understanding and low-latency data streaming are fundamental for seamless, general-purpose Mixed Reality (MR) experiences. This paper demonstrates and evaluates a scalable approach to augmented contextual understanding in MR by implementing multi-modal edge AI co-processing through a Hailo-8 AI accelerator, a low-power ARM-based single board computer (SBC), and the Magic Leap 2 AR headset. The proposed system utilises the native WebRTC streaming capabilities of the Magic Leap 2 to continuously stream camera data to the edge co-processor, where a collection of vision AI models (object detection, pose estimation, face recognition, and depth estimation) are executed. The resulting inferences are then streamed back to the headset for spatial re-projection and transmitted to cloud-based systems for further integration with large-scale AI models, such as LLMs and VLMs. This seamless integration enhances real-time contextual understanding in MR while facilitating advanced multi-modal, multi-device collaboration, supporting richer, scalable spatial cognition across distributed systems.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{Sugizaki2024,
title = {Digital Twin Based Open Platform for IoT Offloading Control: Enabling System Transparency and User Participation},
author = {Yusuke Sugizaki and Jin Nakazato and Manabu Tsukada},
year = {2024},
date = {2024-11-28},
booktitle = {International Conference on Intelligent Computing and its Emerging Applications (ICEA2024)},
address = {Tokyo, Japan},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@manual{塚田学2024c,
title = {デジタルツインが拓く新たな建築空間},
author = {塚田学},
url = {http://jabs.aij.or.jp/backnumber/1794.php},
year = {2024},
date = {2024-11-20},
urldate = {2024-11-20},
volume = {139},
number = {1794},
pages = {16-17},
edition = {建築雑誌(特集:データが再構成する建築)},
abstract = {Advances in digital technology in the building sector have accelerated in step with the growth of the Internet. Open architecture, interoperability, and open platforms underpinned the Internet's spread, and the same elements play an important role in the digitalization of architecture. In particular, the rise of Building Information Modeling (BIM) has enabled information management across the entire building lifecycle and forms the foundation of digital twins.
A digital twin tightly couples a physical building with its digital representation and enables the use of real-time data. This technology has the potential to transform every phase of a building's design, construction, operation, and maintenance. For example, real-time data collected from sensor networks can be used to realize optimal control and preventive maintenance of air-conditioning systems. Furthermore, combined with augmented reality (AR) and virtual reality (VR), digital twins are expected to create new ways of using and experiencing architectural space.
Realizing such digital twins, however, involves many challenges, including data standardization, interoperability, and security. In particular, integrating BIM data with real-time data obtained from IoT devices, and building platforms to manage and exploit those data effectively, are key issues.
This article introduces recent research addressing these challenges: a study on comprehensive management of data assets using semantic digital twins, and a study on visualizing and updating BIM data through a web-based platform. These studies are an important step toward realizing digital twins for architecture and suggest the shape of future buildings reconstituted by real-time data.},
keywords = {},
pubstate = {published},
tppubtype = {manual}
}
@inproceedings{Orsholits2024,
title = {PLATONE: An Immersive Geospatial Audio Spatialization Platform},
author = {Alex Orsholits and Yiyuan Qian and Eric Nardini and Yusuke Obuchi and Manabu Tsukada},
doi = {10.1109/MetaCom62920.2024.00020},
year = {2024},
date = {2024-08-12},
urldate = {2024-08-12},
booktitle = {The 2nd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (MetaCom 2024)},
address = {Hong Kong, China},
abstract = {In the rapidly evolving landscape of mixed reality (MR) and spatial computing, the convergence of physical and virtual spaces is becoming increasingly crucial for enabling immersive, large-scale user experiences and shaping inter-reality dynamics. This is particularly significant for immersive audio at city-scale, where the 3D geometry of the environment must be considered, as it drastically influences how sound is perceived by the listener. This paper introduces PLATONE, a novel proof-of-concept MR platform designed to augment urban contexts with environment-dependent spatialized audio. It leverages custom hardware for localization and orientation, alongside a cloud-based pipeline for generating real-time binaural audio. By utilizing open-source 3D building datasets, sound propagation effects such as occlusion, reverberation, and diffraction are accurately simulated. We believe that this work may serve as a compelling foundation for further research and development.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
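Of the propagation effects PLATONE simulates, the simplest is distance attenuation. The sketch below implements only the free-field inverse-distance gain law (the clamped reference distance is a common convention in audio engines, not necessarily PLATONE's choice); occlusion, reverberation, and diffraction require the 3D building geometry and are out of scope here.

```python
import math

def attenuated_gain(listener, source, ref_distance=1.0):
    """Free-field inverse-distance gain between a source and a listener.

    Gain is 1.0 inside ref_distance and falls off as 1/d beyond it.
    listener and source are (x, y, z) positions in metres.
    """
    d = math.dist(listener, source)
    return ref_distance / max(d, ref_distance)
```

Doubling the distance halves the amplitude (a 6 dB drop), which is the baseline on top of which geometry-dependent effects like occlusion are applied.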
@inproceedings{Takada2024,
title = {Design of Digital Twin Architecture for 3D Audio Visualization in AR},
author = {Tokio Takada and Jin Nakazato and Alex Orsholits and Manabu Tsukada and Hideya Ochiai and Hiroshi Esaki},
doi = {10.1109/MetaCom62920.2024.00044},
year = {2024},
date = {2024-08-12},
urldate = {2024-08-12},
booktitle = {The 2nd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (MetaCom 2024)},
address = {Hong Kong, China},
abstract = {Digital twins have recently attracted attention from academia and industry as a technology connecting physical space and cyberspace. Digital twins are compatible with Augmented Reality (AR) and Virtual Reality (VR), enabling us to understand information in cyberspace. In this study, we focus on music and design an architecture for a 3D representation of music using a digital twin. Specifically, we organize the requirements for a digital twin for music and design the architecture. We establish a method to perform 3D representation in cyberspace and map the recorded audio data in physical space. In this paper, we implemented the physical space representation using a smartphone as an AR device and employed a visual positioning system (VPS) for self-positioning. For evaluation, in addition to system errors in the 3D representation of audio data, we conducted a questionnaire evaluation with several users as a user study. From these results, we evaluated the effectiveness of the implemented system. At the same time, we also found issues we need to improve in the implemented system in future works.},
key = {CREST},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
@online{hanlin2025,
title = {Co3SOP: A Collaborative 3D Semantic Occupancy Prediction Dataset and Benchmark for Autonomous Driving},
url = {https://github.com/tlab-wide/Co3SOP},
year = {2025},
date = {2025-04-13},
urldate = {2025-04-13},
abstract = {To facilitate 3D semantic occupancy prediction in collaborative scenarios, we present a simulated dataset featuring a 3D semantic occupancy voxel sensor in Carla, which precisely and comprehensively annotates every surrounding voxel with semantic and occupancy states. In addition, we establish two benchmarks with varying detection ranges to investigate the impact of vehicle collaboration across different spatial extents and propose a baseline model that allows collaborative feature fusion. Experiments on our proposed benchmark demonstrate the superior performance of our baseline model.},
keywords = {},
pubstate = {published},
tppubtype = {online}
}
@online{V2X_E2E_Simulator2024,
title = {V2X End-to-End simulator},
url = {https://github.com/tlab-wide/V2X_E2E_Simulator
https://tlab-wide.github.io/V2X_E2E_Simulator/},
year = {2025},
date = {2025-03-31},
keywords = {},
pubstate = {published},
tppubtype = {online}
}
@online{AVVV2024,
title = {Autonomous Vehicle V2X Visualiser (AVVV)},
url = {https://github.com/tlab-wide/avvv_etsi
https://tlab-wide.github.io/avvv_etsi/},
year = {2024},
date = {2024-11-24},
abstract = {The AVVV project, standing for Autonomous Vehicle V2X Visualiser, aims to analyse and visualise V2X communications. V2X refers to the communications between the autonomous vehicle and everything else, including road-side units (RSUs) and other intelligent vehicles (via their on-board units, or OBUs).},
keywords = {},
pubstate = {published},
tppubtype = {online}
}
@inproceedings{Trumpp2024,
title = {RaceMOP: Mapless Online Path Planning for Multi-Agent Autonomous Racing using Residual Policy Learning},
author = {Raphael Trumpp and Ehsan Javanmardi and Jin Nakazato and Manabu Tsukada and Marco Caccamo},
url = {http://github.com/raphajaner/racemop},
doi = {10.1109/IROS58592.2024.10801657},
year = {2024},
date = {2024-09-14},
urldate = {2024-09-14},
booktitle = {The 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)},
address = {Abu Dhabi, UAE},
abstract = {The interactive decision-making in multi-agent autonomous racing offers insights valuable beyond the domain of self-driving cars. Mapless online path planning is particularly of practical appeal but poses a challenge for safely overtaking opponents due to the limited planning horizon. Accordingly, this paper introduces RaceMOP, a novel method for mapless online path planning designed for multi-agent racing of F1TENTH cars. Unlike classical planners that depend on predefined racing lines, RaceMOP operates without a map, relying solely on local observations to overtake other race cars at high speed. Our approach combines an artificial potential field method as a base policy with residual policy learning to introduce long-horizon planning capabilities. We advance the field by introducing a novel approach for policy fusion with the residual policy directly in probability space. Our experiments for twelve simulated racetracks validate that RaceMOP is capable of long-horizon decision-making with robust collision avoidance during overtaking maneuvers. RaceMOP demonstrates superior handling over existing mapless planners while generalizing to unknown racetracks, paving the way for further use of our method in robotics. We make the open-source code for RaceMOP available at http://github.com/raphajaner/racemop.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
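The entry above mentions fusing the base and residual policies directly in probability space. One plausible reading, shown purely for illustration (RaceMOP's actual fusion rule is defined in the paper and its open-source code), is a normalized element-wise product of the two action distributions:

```python
def fuse_policies(base_probs, residual_probs):
    """Fuse two action distributions by normalized element-wise product.

    base_probs and residual_probs are probabilities over the same discrete
    action set. Actions favoured by both policies gain mass; the result is
    renormalized to sum to 1. Illustrative reading, not RaceMOP's exact rule.
    """
    fused = [b * r for b, r in zip(base_probs, residual_probs)]
    total = sum(fused)
    return [f / total for f in fused]
```

Fusing in probability space, rather than adding raw actions, lets the residual policy sharpen or veto the base policy's choices while the combination remains a valid distribution.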
@inproceedings{Tao2023c,
title = {Flowsim: A Modular Simulation Platform for Microscopic Behavior Analysis of City-Scale Connected Autonomous Vehicles},
author = {Ye Tao and Ehsan Javanmardi and Jin Nakazato and Manabu Tsukada and Hiroshi Esaki},
url = {https://github.com/tlab-wide/flowsim
https://arxiv.org/abs/2306.05738},
doi = {10.1109/ITSC57777.2023.10421900},
year = {2023},
date = {2023-09-24},
urldate = {2023-09-24},
booktitle = {The 26th edition of the IEEE International Conference on Intelligent Transportation Systems (ITSC 2023)},
address = {Bilbao, Bizkaia, Spain},
abstract = {As connected autonomous vehicles (CAVs) become increasingly prevalent, there is a growing need for simulation platforms that can accurately evaluate CAV behavior in large-scale environments. In this paper, we propose Flowsim, a novel simulator specifically designed to meet these requirements. Flowsim offers a modular and extensible architecture that enables the analysis of CAV behaviors in large-scale scenarios. It provides researchers with a customizable platform for studying CAV interactions, evaluating communication and networking protocols, assessing cybersecurity vulnerabilities, optimizing traffic management strategies, and developing and evaluating policies for CAV deployment. Flowsim is implemented in pure Python in approximately 1,500 lines of code, making it highly readable, understandable, and easily modifiable. We verified the functionality and performance of Flowsim via a series of experiments based on realistic traffic scenarios. The results show the effectiveness of Flowsim in providing a flexible and powerful simulation environment for evaluating CAV behavior and data flow. Flowsim is a valuable tool for researchers, policymakers, and industry professionals who are involved in the development, evaluation, and deployment of CAVs. The code of Flowsim is publicly available on GitHub under the MIT license. },
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}