Publication
2025
Vishal Chauhan, Anubhav Anubhav, Robin Sidhu, Yu Asabe, Kanta Tanaka, Chia-Ming Chang, Xiang Su, Ehsan Javanmardi, Takeo Igarashi, Alex Orsholits, Kantaro Fujiwara, Manabu Tsukada, "A Silent Negotiator? Cross-cultural VR Evaluation of Smart Pole Interaction Units in Dynamic Shared Spaces", In: The ACM Symposium on Virtual Reality Software and Technology (VRST2025), Montreal, Canada, 2025. Proceedings Article | Abstract | BibTeX | Links:
@inproceedings{Chauhan2025b,
title = {A Silent Negotiator? Cross-cultural VR Evaluation of Smart Pole Interaction Units in Dynamic Shared Spaces},
author = {Vishal Chauhan and Anubhav Anubhav and Robin Sidhu and Yu Asabe and Kanta Tanaka and Chia-Ming Chang and Xiang Su and Ehsan Javanmardi and Takeo Igarashi and Alex Orsholits and Kantaro Fujiwara and Manabu Tsukada},
url = {https://github.com/tlab-wide/Smartpole-VR-AWSIM.git},
doi = {10.1145/3756884.3765991},
year = {2025},
date = {2025-11-12},
urldate = {2025-11-12},
booktitle = {The ACM Symposium on Virtual Reality Software and Technology (VRST2025) },
address = {Montreal, Canada},
abstract = {As autonomous vehicles (AVs) enter pedestrian-centric environments, existing vehicle-mounted external human–machine interfaces (eHMIs) often fall short in shared spaces due to line-of-sight limitations, inconsistent signaling, and increased cognitive burden on pedestrians. To address these challenges, we introduce the Smart Pole Interaction Unit (SPIU), an infrastructure-based eHMI that decouples intent signaling from vehicles and provides context-aware, elevated visual cues. We evaluate SPIU using immersive VR-AWSIM simulations in four high-risk urban scenarios: four-way intersections, autonomous mixed traffic, blindspots, and nighttime crosswalks. The experiment was developed in Japan and replicated in Norway, where forty participants engaged in 32 trials each under both SPIU-present and SPIU-absent conditions. Behavioral (response time) and subjective (acceptance scale) data were collected. Results show that SPIU significantly improves pedestrian decision-making, with reductions ranging from 40% to over 80% depending on scenario and cultural context, particularly in complex or low-visibility scenarios. Cross-cultural analyses highlight SPIU's adaptability across differing urban and social contexts. We release our open-source Smartpole-VR-AWSIM framework to support reproducibility and global advancement of infrastructure-based eHMI research through reproducible and immersive behavioral studies.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
As autonomous vehicles (AVs) enter pedestrian-centric environments, existing vehicle-mounted external human–machine interfaces (eHMIs) often fall short in shared spaces due to line-of-sight limitations, inconsistent signaling, and increased cognitive burden on pedestrians. To address these challenges, we introduce the Smart Pole Interaction Unit (SPIU), an infrastructure-based eHMI that decouples intent signaling from vehicles and provides context-aware, elevated visual cues. We evaluate SPIU using immersive VR-AWSIM simulations in four high-risk urban scenarios: four-way intersections, autonomous mixed traffic, blindspots, and nighttime crosswalks. The experiment was developed in Japan and replicated in Norway, where forty participants engaged in 32 trials each under both SPIU-present and SPIU-absent conditions. Behavioral (response time) and subjective (acceptance scale) data were collected. Results show that SPIU significantly improves pedestrian decision-making, with reductions ranging from 40% to over 80% depending on scenario and cultural context, particularly in complex or low-visibility scenarios. Cross-cultural analyses highlight SPIU's adaptability across differing urban and social contexts. We release our open-source Smartpole-VR-AWSIM framework to support reproducibility and global advancement of infrastructure-based eHMI research through reproducible and immersive behavioral studies.
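To make the reported reductions concrete, the sketch below shows one way a per-scenario response-time reduction could be computed from trial data; the column names and numbers are hypothetical and are not taken from the released Smartpole-VR-AWSIM framework or the study's data.

```python
import pandas as pd

# Hypothetical per-condition response times (seconds); not the study's released data.
trials = pd.DataFrame({
    "scenario":  ["intersection", "intersection", "night_crosswalk", "night_crosswalk"],
    "condition": ["no_spiu", "spiu", "no_spiu", "spiu"],
    "rt_s":      [4.2, 1.9, 6.1, 1.1],
})

# Mean response time per scenario and condition, then percent reduction with SPIU present
mean_rt = trials.groupby(["scenario", "condition"])["rt_s"].mean().unstack("condition")
mean_rt["reduction_pct"] = 100 * (mean_rt["no_spiu"] - mean_rt["spiu"]) / mean_rt["no_spiu"]
print(mean_rt)
```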
Yun Li, Ehsan Javanmardi, Simon Thompson, Kai Katsumata, Alex Orsholits, Manabu Tsukada, "Multi-PrefDrive: Optimizing Large Language Models for Autonomous Driving Through Multi-Preference Tuning", In: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hangzhou, China, 2025. Proceedings Article | Abstract | BibTeX | Links:
@inproceedings{Li2025d,
title = {Multi-PrefDrive: Optimizing Large Language Models for Autonomous Driving Through Multi-Preference Tuning},
author = {Yun Li and Ehsan Javanmardi and Simon Thompson and Kai Katsumata and Alex Orsholits and Manabu Tsukada},
url = {https://liyun0607.github.io/},
year = {2025},
date = {2025-10-19},
booktitle = {2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
address = {Hangzhou, China},
abstract = {This paper introduces Multi-PrefDrive, a framework that significantly enhances LLM-based autonomous driving through multidimensional preference tuning. Aligning LLMs with human driving preferences is crucial yet challenging, as driving scenarios involve complex decisions where multiple incorrect actions can correspond to a single correct choice. Traditional binary preference tuning fails to capture this complexity. Our approach pairs each chosen action with multiple rejected alternatives, better reflecting real-world driving decisions. By implementing the Plackett-Luce preference model, we enable nuanced ranking of actions across the spectrum of possible errors. Experiments in the CARLA simulator demonstrate that our algorithm achieves an 11.0% improvement in overall score and an 83.6% reduction in infrastructure collisions, while showing perfect compliance with traffic signals in certain environments. Comparative analysis against DPO and its variants reveals Multi-PrefDrive’s superior discrimination between chosen and rejected actions, achieving a margin value of 25, which translates directly into enhanced driving performance. We implement memory-efficient techniques including LoRA and 4-bit quantization to enable deployment on consumer-grade hardware and will open-source our training code and multi-rejected dataset to advance research in LLM-based autonomous driving systems. Project Page (https://liyun0607.github.io/)},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This paper introduces Multi-PrefDrive, a framework that significantly enhances LLM-based autonomous driving through multidimensional preference tuning. Aligning LLMs with human driving preferences is crucial yet challenging, as driving scenarios involve complex decisions where multiple incorrect actions can correspond to a single correct choice. Traditional binary preference tuning fails to capture this complexity. Our approach pairs each chosen action with multiple rejected alternatives, better reflecting real-world driving decisions. By implementing the Plackett-Luce preference model, we enable nuanced ranking of actions across the spectrum of possible errors. Experiments in the CARLA simulator demonstrate that our algorithm achieves an 11.0% improvement in overall score and an 83.6% reduction in infrastructure collisions, while showing perfect compliance with traffic signals in certain environments. Comparative analysis against DPO and its variants reveals Multi-PrefDrive’s superior discrimination between chosen and rejected actions, achieving a margin value of 25, which translates directly into enhanced driving performance. We implement memory-efficient techniques including LoRA and 4-bit quantization to enable deployment on consumer-grade hardware and will open-source our training code and multi-rejected dataset to advance research in LLM-based autonomous driving systems. Project Page (https://liyun0607.github.io/)
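For readers unfamiliar with the Plackett-Luce formulation mentioned above, the following is a minimal PyTorch sketch of a preference loss that ranks one chosen action above several rejected alternatives, in the spirit of multi-preference tuning. It is not the authors' released training code; the tensor shapes, the DPO-style reward construction, and the beta value are illustrative assumptions.

```python
import torch

def multi_pref_loss(policy_chosen_logp, policy_rejected_logps,
                    ref_chosen_logp, ref_rejected_logps, beta=0.1):
    """Plackett-Luce style loss: rank one chosen action above K rejected alternatives.

    Assumed shapes: chosen tensors are (batch,), rejected tensors are (batch, K).
    Rewards follow the DPO convention beta * (log pi_policy - log pi_reference).
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)            # (B,)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)   # (B, K)

    # P(chosen ranked first) = exp(r_c) / (exp(r_c) + sum_k exp(r_k));  loss = -log P
    all_rewards = torch.cat([chosen_reward.unsqueeze(1), rejected_rewards], dim=1)
    loss = torch.logsumexp(all_rewards, dim=1) - chosen_reward
    margin = (chosen_reward.unsqueeze(1) - rejected_rewards).mean()          # chosen-vs-rejected gap
    return loss.mean(), margin

# Toy usage: batch of 4 driving decisions, each with 3 rejected alternatives
B, K = 4, 3
loss, margin = multi_pref_loss(torch.randn(B), torch.randn(B, K),
                               torch.randn(B), torch.randn(B, K))
print(float(loss), float(margin))
```

The margin returned alongside the loss corresponds to the chosen-versus-rejected reward gap that the abstract refers to.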
Vishal Chauhan, Anubhav Anubhav, Chia-Ming Chang, Xiang Su, Jin Nakazato, Ehsan Javanmardi, Alex Orsholits, Takeo Igarashi, Kantaro Fujiwara, Manabu Tsukada, "Towards the Future of Pedestrian-AV Interaction: Human Perception vs. LLM Insights on Smart Pole Interaction Unit in Shared Spaces", In: International Journal of Human–Computer Studies (IJHCS), vol. 205, pp. 103628, 2025, ISSN: 1071-5819. Journal Article | Abstract | BibTeX | Links:
@article{Chauhan2025,
title = {Towards the Future of Pedestrian-AV Interaction: Human Perception vs. LLM Insights on Smart Pole Interaction Unit in Shared Spaces},
author = {Vishal Chauhan and Anubhav Anubhav and Chia-Ming Chang and Xiang Su and Jin Nakazato and Ehsan Javanmardi and Alex Orsholits and Takeo Igarashi and Kantaro Fujiwara and Manabu Tsukada},
doi = {10.1016/j.ijhcs.2025.103628},
issn = {1071-5819},
year = {2025},
date = {2025-09-13},
urldate = {2025-09-13},
journal = {International Journal of Human–Computer Studies (IJHCS)},
volume = {205},
pages = {103628},
abstract = {As autonomous vehicles (AVs) reshape urban mobility, establishing effective communication between pedestrians and self-driving vehicles has become a critical safety imperative. This work investigates the integration of Smart Pole Interaction Units (SPIUs) as external human–machine interfaces (eHMIs) in shared spaces and introduces an innovative approach to enhance pedestrian–AV interactions. To provide subjective evidence on SPIU usability, we conduct a group design study (“Humans”) involving 25 participants (aged 18–40). We evaluate user preferences and interaction patterns using group discussion materials, revealing that 90% of the participants strongly prefer real-time multi-AV interactions facilitated by SPIU over conventional eHMI systems, where a pedestrian must look at multiple AVs individually. Furthermore, they emphasize inclusive design through multi-sensory communication channels—visual, auditory, and tactile signals—specifically addressing the needs of vulnerable road users (VRUs), including those with impairments. To complement these non-expert, real-world insights, we employ three leading Large Language Models (LLMs) (ChatGPT-4, Gemini-Pro, and Claude 3.5 Sonnet) as “experts” due to their extensive training data. Using the advantages of the multimodal vision-language processing capabilities of these LLMs, identical questions (text and images) used in human discussions are posed to generate text responses for pedestrian–AV interaction scenarios. Responses generated from LLMs and recorded conversations from human group discussions are used to extract the most frequent words. A keyword frequency analysis from both humans and LLMs is performed with three categories, Context, Safety, and Important. Our findings indicate that LLMs employ safety-related keywords 30% more frequently than human participants, suggesting a more structured, safety-centric approach. Among LLMs, ChatGPT-4 demonstrates superior response latency, Claude shows a closer alignment with human responses, and Gemini-Pro provides structured and contextually relevant insights. Our results from “Humans” and “LLMs” establish SPIU as a promising system for facilitating trust-building and safety-ensuring interactions among pedestrians, AVs, and delivery robots. Integrating diverse stakeholder feedback, we propose a prototype SPIU design to advance pedestrian–AV interactions in shared urban spaces, positioning SPIU as crucial infrastructure hubs for safe and trustworthy navigation.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
As autonomous vehicles (AVs) reshape urban mobility, establishing effective communication between pedestrians and self-driving vehicles has become a critical safety imperative. This work investigates the integration of Smart Pole Interaction Units (SPIUs) as external human–machine interfaces (eHMIs) in shared spaces and introduces an innovative approach to enhance pedestrian–AV interactions. To provide subjective evidence on SPIU usability, we conduct a group design study (“Humans”) involving 25 participants (aged 18–40). We evaluate user preferences and interaction patterns using group discussion materials, revealing that 90% of the participants strongly prefer real-time multi-AV interactions facilitated by SPIU over conventional eHMI systems, where a pedestrian must look at multiple AVs individually. Furthermore, they emphasize inclusive design through multi-sensory communication channels—visual, auditory, and tactile signals—specifically addressing the needs of vulnerable road users (VRUs), including those with impairments. To complement these non-expert, real-world insights, we employ three leading Large Language Models (LLMs) (ChatGPT-4, Gemini-Pro, and Claude 3.5 Sonnet) as “experts” due to their extensive training data. Using the advantages of the multimodal vision-language processing capabilities of these LLMs, identical questions (text and images) used in human discussions are posed to generate text responses for pedestrian–AV interaction scenarios. Responses generated from LLMs and recorded conversations from human group discussions are used to extract the most frequent words. A keyword frequency analysis from both humans and LLMs is performed with three categories, Context, Safety, and Important. Our findings indicate that LLMs employ safety-related keywords 30% more frequently than human participants, suggesting a more structured, safety-centric approach. Among LLMs, ChatGPT-4 demonstrates superior response latency, Claude shows a closer alignment with human responses, and Gemini-Pro provides structured and contextually relevant insights. Our results from “Humans” and “LLMs” establish SPIU as a promising system for facilitating trust-building and safety-ensuring interactions among pedestrians, AVs, and delivery robots. Integrating diverse stakeholder feedback, we propose a prototype SPIU design to advance pedestrian–AV interactions in shared urban spaces, positioning SPIU as crucial infrastructure hubs for safe and trustworthy navigation.
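The keyword-frequency comparison described above can be illustrated with a short sketch; the category word lists and the example snippets below are hypothetical stand-ins, not the study's actual lexicons or transcripts.

```python
from collections import Counter
import re

# Hypothetical category lexicons; the paper's actual keyword lists are not reproduced here.
CATEGORIES = {
    "Safety":    {"safety", "collision", "risk", "stop", "yield"},
    "Context":   {"intersection", "crosswalk", "night", "traffic", "shared"},
    "Important": {"trust", "visibility", "signal", "priority", "attention"},
}

def category_frequencies(text):
    """Count how often each category's keywords occur in a transcript."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {cat: sum(tokens[w] for w in words) for cat, words in CATEGORIES.items()}

# Illustrative snippets standing in for human discussion transcripts and LLM responses
human_text = "participants said trust and visibility matter most at the crosswalk"
llm_text = "the model stressed safety, collision risk, and a clear stop signal at the intersection"

for label, text in [("Humans", human_text), ("LLMs", llm_text)]:
    print(label, category_frequencies(text))
```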
Shangkai Zhang, Alex Orsholits, Ehsan Javanmardi, Manabu Tsukada, "AWSIM-VR: A Tightly-Coupled Virtual Reality Extension for Human-in-the-Loop Pedestrian-Autonomous Vehicle Interaction", In: 3rd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (IEEE MetaCom 2025), Seoul, Republic of Korea, 2025. Proceedings Article | Abstract | BibTeX
@inproceedings{Zhang2025,
title = {AWSIM-VR: A Tightly-Coupled Virtual Reality Extension for Human-in-the-Loop Pedestrian-Autonomous Vehicle Interaction},
author = {Shangkai Zhang and Alex Orsholits and Ehsan Javanmardi and Manabu Tsukada},
year = {2025},
date = {2025-08-27},
booktitle = {3rd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (IEEE MetaCom 2025)},
address = {Seoul, Republic of Korea},
abstract = {Effective communication between autonomous vehicles (AVs) and pedestrians is crucial for ensuring future urban traffic safety. While external Human-Machine Interfaces (eHMIs) have emerged as promising solutions, current evaluation methodologies — particularly Virtual Reality (VR)-based studies — typically rely on scripted or pre-defined autonomous vehicle behaviors, limiting realism and neglecting pedestrians' active role in interactions. To address this, we introduce AWSIM-VR, a tightly-coupled VR extension of the AWSIM autonomous driving simulator, enabling real-time, human-in-the-loop pedestrian-AV interactions by directly integrating unmodified, real autonomous driving software (Autoware) into the simulation loop. Unlike previous systems, AWSIM-VR provides authentic, bidirectional interaction: pedestrians' actions dynamically influence vehicle decision-making and eHMI responses in real-time, closely emulating real-world AV scenarios. In user studies directly comparing AWSIM-VR to existing methodologies, participants reported significantly higher perceived realism and immersion, underscoring the importance of authentic autonomous behaviors in VR-based pedestrian interaction research. By directly utilizing production-level autonomous driving stacks, AWSIM-VR represents a significant methodological advancement, enabling more realistic, effective, and safer development and evaluation of eHMIs and AV technologies.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Effective communication between autonomous vehicles (AVs) and pedestrians is crucial for ensuring future urban traffic safety. While external Human-Machine Interfaces (eHMIs) have emerged as promising solutions, current evaluation methodologies — particularly Virtual Reality (VR)-based studies — typically rely on scripted or pre-defined autonomous vehicle behaviors, limiting realism and neglecting pedestrians' active role in interactions. To address this, we introduce AWSIM-VR, a tightly-coupled VR extension of the AWSIM autonomous driving simulator, enabling real-time, human-in-the-loop pedestrian-AV interactions by directly integrating unmodified, real autonomous driving software (Autoware) into the simulation loop. Unlike previous systems, AWSIM-VR provides authentic, bidirectional interaction: pedestrians' actions dynamically influence vehicle decision-making and eHMI responses in real-time, closely emulating real-world AV scenarios. In user studies directly comparing AWSIM-VR to existing methodologies, participants reported significantly higher perceived realism and immersion, underscoring the importance of authentic autonomous behaviors in VR-based pedestrian interaction research. By directly utilizing production-level autonomous driving stacks, AWSIM-VR represents a significant methodological advancement, enabling more realistic, effective, and safer development and evaluation of eHMIs and AV technologies.
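As a rough illustration of the human-in-the-loop coupling described above, the sketch below shows a ROS 2 node that forwards an AV pose into a VR update callback. The topic name, message type, and vr_update hook are assumptions for illustration only and are not AWSIM-VR's actual interface; running it requires a ROS 2 environment.

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped

def vr_update(pose):
    # Placeholder for pushing the AV pose into the VR scene / eHMI renderer.
    print(f"AV at x={pose.position.x:.2f}, y={pose.position.y:.2f}")

class AvToVrBridge(Node):
    def __init__(self):
        super().__init__("av_to_vr_bridge")
        # Hypothetical topic name; a real setup would subscribe to the vehicle's published pose.
        self.create_subscription(PoseStamped, "/vehicle/pose", self.on_pose, 10)

    def on_pose(self, msg: PoseStamped):
        vr_update(msg.pose)

def main():
    rclpy.init()
    rclpy.spin(AvToVrBridge())
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```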
Naren Bao, Alex Orsholits, Manabu Tsukada, "4D Path Planning via Spatiotemporal Voxels in Urban Airspaces", In: 3rd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (IEEE MetaCom 2025), Seoul, Republic of Korea, 2025. Proceedings Article | Abstract | BibTeX
@inproceedings{Bao2025,
title = {4D Path Planning via Spatiotemporal Voxels in Urban Airspaces},
author = {Naren Bao and Alex Orsholits and Manabu Tsukada},
year = {2025},
date = {2025-08-27},
booktitle = {3rd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (IEEE MetaCom 2025)},
address = {Seoul, Republic of Korea},
abstract = {This paper presents an approach to four-dimensional (4D) path planning for unmanned aerial vehicles (UAVs) in complex urban environments. We introduce a spatiotemporal voxel-based representation that effectively models both spatial and temporal dimensions of urban airspaces. By integrating the 4D spatio-temporal ID framework with reinforcement learning techniques, our system generates efficient and safe flight paths while considering dynamic obstacles and environmental constraints. The proposed method combines off-line pretraining and online fine-tuning of reinforcement learning models to achieve computational efficiency without compromising path quality. Experiments conducted using PLATEAU datasets in various urban scenarios demonstrate that our approach outperforms traditional path planning algorithms by 24% in safety metrics and 18% in efficiency metrics. Our framework advances the state-of-the-art in urban air mobility by providing a scalable solution for airspace management in increasingly congested urban environments.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This paper presents an approach to four-dimensional (4D) path planning for unmanned aerial vehicles (UAVs) in complex urban environments. We introduce a spatiotemporal voxel-based representation that effectively models both spatial and temporal dimensions of urban airspaces. By integrating the 4D spatio-temporal ID framework with reinforcement learning techniques, our system generates efficient and safe flight paths while considering dynamic obstacles and environmental constraints. The proposed method combines off-line pretraining and online fine-tuning of reinforcement learning models to achieve computational efficiency without compromising path quality. Experiments conducted using PLATEAU datasets in various urban scenarios demonstrate that our approach outperforms traditional path planning algorithms by 24% in safety metrics and 18% in efficiency metrics. Our framework advances the state-of-the-art in urban air mobility by providing a scalable solution for airspace management in increasingly congested urban environments.
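The spatiotemporal-voxel idea can be illustrated with a small, self-contained sketch: dynamic obstacles occupy (x, y, z) voxels at specific time steps, and the planner searches over (x, y, z, t) states, including a "wait" action. The grid size, obstacle set, and breadth-first search below are illustrative assumptions, not the paper's spatio-temporal ID encoding or its reinforcement-learning planner.

```python
from collections import deque

OCCUPIED = {(2, 2, 0, 3), (2, 2, 0, 4)}   # voxel (2, 2, 0) is blocked at times t=3 and t=4
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1), (0, 0, 0)]
SIZE = 6                                   # side length of the toy airspace cube

def plan_4d(start, goal, horizon=20):
    """Return a collision-free sequence of (x, y, z, t) voxels from start to goal."""
    q = deque([(start + (0,), [start + (0,)])])
    seen = {start + (0,)}
    while q:
        (x, y, z, t), path = q.popleft()
        if (x, y, z) == goal:
            return path
        for dx, dy, dz in MOVES:                       # spatial moves and waiting in place
            nxt = (x + dx, y + dy, z + dz, t + 1)      # every action consumes one time step
            if (all(0 <= c < SIZE for c in nxt[:3]) and t + 1 <= horizon
                    and nxt not in OCCUPIED and nxt not in seen):
                seen.add(nxt)
                q.append((nxt, path + [nxt]))
    return None

print(plan_4d(start=(0, 0, 0), goal=(4, 4, 0)))
```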
Yun Li, Ehsan Javanmardi, Simon Thompson, Kai Katsumata, Alex Orsholits, Manabu Tsukada, "PrefDrive: Enhancing Autonomous Driving through Preference-Guided Large Language Models", In: 36th IEEE Intelligent Vehicles Symposium (IV2025), Cluj-Napoca, Romania, 2025. Proceedings Article | Abstract | BibTeX | Links:
@inproceedings{Li2025c,
title = {PrefDrive: Enhancing Autonomous Driving through Preference-Guided Large Language Models},
author = {Yun Li and Ehsan Javanmardi and Simon Thompson and Kai Katsumata and Alex Orsholits and Manabu Tsukada},
url = {https://github.com/LiYun0607/PrefDrive/
https://huggingface.co/liyun0607/PrefDrive
https://huggingface.co/datasets/liyun0607/PrefDrive},
doi = {10.1109/IV64158.2025.11097672},
year = {2025},
date = {2025-06-22},
urldate = {2025-06-22},
booktitle = {36th IEEE Intelligent Vehicles Symposium (IV2025)},
address = {Cluj-Napoca, Romania},
abstract = {This paper presents PrefDrive, a novel framework that integrates driving preferences into autonomous driving models through large language models (LLMs). While recent advances in LLMs have shown promise in autonomous driving, existing approaches often struggle to align with specific driving behaviors (e.g., maintaining safe distances, smooth acceleration patterns) and operational requirements (e.g., traffic rule compliance, route adherence). We address this challenge by developing a preference learning framework that combines multimodal perception with natural language understanding. Our approach leverages Direct Preference Optimization (DPO) to fine-tune LLMs efficiently on consumer-grade hardware, making advanced autonomous driving research more accessible to the broader research community. We introduce a comprehensive dataset of 74,040 sequences, carefully annotated with driving preferences and driving decisions, which, along with our trained model checkpoints, will be made publicly available to facilitate future research. Through extensive experiments in the CARLA simulator, we demonstrate that our preference-guided approach significantly improves driving performance across multiple metrics, including distance maintenance and trajectory smoothness. Results show up to 28.1% reduction in traffic rule violations and 8.5% improvement in navigation task completion while maintaining appropriate distances from obstacles. The framework demonstrates robust performance across different urban environments, showcasing the effectiveness of preference learning in autonomous driving applications. },
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
This paper presents PrefDrive, a novel framework that integrates driving preferences into autonomous driving models through large language models (LLMs). While recent advances in LLMs have shown promise in autonomous driving, existing approaches often struggle to align with specific driving behaviors (e.g., maintaining safe distances, smooth acceleration patterns) and operational requirements (e.g., traffic rule compliance, route adherence). We address this challenge by developing a preference learning framework that combines multimodal perception with natural language understanding. Our approach leverages Direct Preference Optimization (DPO) to fine-tune LLMs efficiently on consumer-grade hardware, making advanced autonomous driving research more accessible to the broader research community. We introduce a comprehensive dataset of 74,040 sequences, carefully annotated with driving preferences and driving decisions, which, along with our trained model checkpoints, will be made publicly available to facilitate future research. Through extensive experiments in the CARLA simulator, we demonstrate that our preference-guided approach significantly improves driving performance across multiple metrics, including distance maintenance and trajectory smoothness. Results show up to 28.1% reduction in traffic rule violations and 8.5% improvement in navigation task completion while maintaining appropriate distances from obstacles. The framework demonstrates robust performance across different urban environments, showcasing the effectiveness of preference learning in autonomous driving applications.
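For context on the preference objective underlying PrefDrive, the following is a minimal PyTorch sketch of the pairwise DPO loss on chosen versus rejected driving actions. It is not the released training code; the tensor shapes and beta value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Pairwise DPO objective: prefer the chosen driving action over the rejected one,
    measured relative to a frozen reference model. Inputs are summed token
    log-probabilities of shape (batch,)."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # -log sigmoid(margin): minimized when the chosen reward exceeds the rejected reward
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy usage on random log-probabilities for a batch of 8 preference pairs
B = 8
print(float(dpo_loss(torch.randn(B), torch.randn(B), torch.randn(B), torch.randn(B))))
```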
Alex Orsholits, Manabu Tsukada, "Context-Rich Interactions in Mixed Reality through Edge AI Co-Processing", In: The 39-th International Conference on Advanced Information Networking and Applications (AINA 2025), Barcelona, Spain, 2025, ISBN: 978-3-031-87771-1.Proceedings Article | Abstract | BibTeX | Links:
@inproceedings{Orsholits2025,
title = {Context-Rich Interactions in Mixed Reality through Edge AI Co-Processing},
author = {Alex Orsholits and Manabu Tsukada},
url = {https://link.springer.com/chapter/10.1007/978-3-031-87772-8_3},
doi = {10.1007/978-3-031-87772-8_3},
isbn = {978-3-031-87771-1},
year = {2025},
date = {2025-04-09},
urldate = {2025-04-09},
booktitle = {The 39-th International Conference on Advanced Information Networking and Applications (AINA 2025)},
address = {Barcelona, Spain},
abstract = {Spatial computing is evolving towards leveraging data streaming for computationally demanding applications, facilitating a shift to lightweight, untethered, and standalone devices. These devices are therefore ideal candidates for co-processing, where real-time context understanding and low-latency data streaming are fundamental for seamless, general-purpose Mixed Reality (MR) experiences. This paper demonstrates and evaluates a scalable approach to augmented contextual understanding in MR by implementing multi-modal edge AI co-processing through a Hailo-8 AI accelerator, a low-power ARM-based single board computer (SBC), and the Magic Leap 2 AR headset. The proposed system utilises the native WebRTC streaming capabilities of the Magic Leap 2 to continuously stream camera data to the edge co-processor, where a collection of vision AI models (object detection, pose estimation, face recognition, and depth estimation) is executed. The resulting inferences are then streamed back to the headset for spatial re-projection and transmitted to cloud-based systems for further integration with large-scale AI models, such as LLMs and VLMs. This seamless integration enhances real-time contextual understanding in MR while facilitating advanced multi-modal, multi-device collaboration, supporting richer, scalable spatial cognition across distributed systems.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Spatial computing is evolving towards leveraging data streaming for computationally demanding applications, facilitating a shift to lightweight, untethered, and standalone devices. These devices are therefore ideal candidates for co-processing, where real-time context understanding and low-latency data streaming are fundamental for seamless, general-purpose Mixed Reality (MR) experiences. This paper demonstrates and evaluates a scalable approach to augmented contextual understanding in MR by implementing multi-modal edge AI co-processing through a Hailo-8 AI accelerator, a low-power ARM-based single board computer (SBC), and the Magic Leap 2 AR headset. The proposed system utilises the native WebRTC streaming capabilities of the Magic Leap 2 to continuously stream camera data to the edge co-processor, where a collection of vision AI models (object detection, pose estimation, face recognition, and depth estimation) is executed. The resulting inferences are then streamed back to the headset for spatial re-projection and transmitted to cloud-based systems for further integration with large-scale AI models, such as LLMs and VLMs. This seamless integration enhances real-time contextual understanding in MR while facilitating advanced multi-modal, multi-device collaboration, supporting richer, scalable spatial cognition across distributed systems.
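The co-processing loop described above can be sketched at a high level as a producer-consumer pipeline: frames arrive from the headset, an accelerator-side model produces detections, and compact results are returned for re-projection. The frame source, detector, and result sink below are stubs standing in for the Magic Leap 2 WebRTC stream and Hailo-8 inference, which are not shown.

```python
import asyncio
import json
import random

async def frame_source(queue):
    # Stand-in for the headset's WebRTC video track
    for frame_id in range(5):
        await queue.put({"id": frame_id})
        await asyncio.sleep(0.03)          # roughly 30 fps pacing
    await queue.put(None)                  # end-of-stream marker

def run_detector(frame):
    # Stand-in for accelerator inference; returns normalized bounding boxes
    return [{"label": "person", "box": [random.random() for _ in range(4)]}]

async def co_processor(queue):
    while (frame := await queue.get()) is not None:
        detections = run_detector(frame)
        # In the real pipeline this compact JSON would be streamed back to the headset
        print(json.dumps({"frame": frame["id"], "detections": detections}))

async def main():
    queue = asyncio.Queue(maxsize=4)
    await asyncio.gather(frame_source(queue), co_processor(queue))

asyncio.run(main())
```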
Alex Orsholits, Manabu Tsukada, "Edge Vision AI Co-Processing for Dynamic Context Awareness in Mixed Reality", IEEE VR 2025, Poster, 2025, (Honorable mention).Miscellaneous | Abstract | BibTeX | Links:
@misc{Orsholits2025b,
title = {Edge Vision AI Co-Processing for Dynamic Context Awareness in Mixed Reality},
author = {Alex Orsholits and Manabu Tsukada},
url = {https://www.youtube.com/watch?v=xxahKZl4K9w
https://ieeevr.org/2025/awards/conference-awards/#poster-honorable},
doi = {10.1109/VRW66409.2025.00293},
year = {2025},
date = {2025-03-08},
urldate = {2025-03-08},
booktitle = {2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW)},
address = {Saint-Malo, France},
abstract = {Spatial computing is evolving towards leveraging data streaming for computationally demanding applications, facilitating a shift to lightweight, untethered, and standalone devices. These devices are ideal candidates for co-processing, where real-time scene context understanding and low-latency data streaming are fundamental for general-purpose Mixed Reality (MR) experiences. This poster demonstrates and evaluates a scalable approach to augmented contextual understanding in MR by implementing edge AI co-processing through a Hailo-8 AI accelerator, a low-power ARM-based single board computer (SBC), and the Magic Leap 2 AR headset. The resulting inferences are streamed back to the headset for spatial reprojection into the user’s vision.},
howpublished = {IEEE VR 2025, Poster},
note = {Honorable mention},
keywords = {},
pubstate = {published},
tppubtype = {misc}
}
Spatial computing is evolving towards leveraging data streaming for computationally demanding applications, facilitating a shift to lightweight, untethered, and standalone devices. These devices are ideal candidates for co-processing, where real-time scene context understanding and low-latency data streaming are fundamental for general-purpose Mixed Reality (MR) experiences. This poster demonstrates and evaluates a scalable approach to augmented contextual understanding in MR by implementing edge AI co-processing through a Hailo-8 AI accelerator, a low-power ARM-based single board computer (SBC), and the Magic Leap 2 AR headset. The resulting inferences are streamed back to the headset for spatial reprojection into the user’s vision.
2024
Vishal Chauhan, Anubhav Anubhav, Chia-Ming Chang, Jin Nakazato, Ehsan Javanmardi, Alex Orsholits, Takeo Igarashi, Kantaro Fujiwara, Manabu Tsukada, "Connected Shared Spaces: Expert Insights into the Impact of eHMI and SPIU for Next-Generation Pedestrian-AV Communication", In: International Conference on Intelligent Computing and its Emerging Applications (ICEA2024), Tokyo, Japan, 2024. Proceedings Article | BibTeX
@inproceedings{Chauhan2024b,
title = {Connected Shared Spaces: Expert Insights into the Impact of eHMI and SPIU for Next-Generation Pedestrian-AV Communication},
author = {Vishal Chauhan and Anubhav Anubhav and Chia-Ming Chang and Jin Nakazato and Ehsan Javanmardi and Alex Orsholits and Takeo Igarashi and Kantaro Fujiwara and Manabu Tsukada},
year = {2024},
date = {2024-11-28},
urldate = {2024-11-28},
booktitle = {International Conference on Intelligent Computing and its Emerging Applications (ICEA2024)},
address = {Tokyo, Japan},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Alex Orsholits, Eric Nardini, Manabu Tsukada, "PLATONE: Assessing Simulation Accuracy of Environment-Dependent Audio Spatialization", In: International Conference on Intelligent Computing and its Emerging Applications (ICEA2024), Tokyo, Japan, 2024. Proceedings Article | BibTeX
@inproceedings{Orsholits2024b,
title = {PLATONE: Assessing Simulation Accuracy of Environment-Dependent Audio Spatialization},
author = {Alex Orsholits and Eric Nardini and Manabu Tsukada},
year = {2024},
date = {2024-11-28},
booktitle = {International Conference on Intelligent Computing and its Emerging Applications (ICEA2024)},
address = {Tokyo, Japan},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Vishal Chauhan, Anubhav Anubhav, Chia-Ming Chang, Jin Nakazato, Ehsan Javanmardi, Alex Orsholits, Takeo Igarashi, Kantaro Fujiwara, Manabu Tsukada, "Transforming Pedestrian and Autonomous Vehicles Interactions in Shared Spaces: A Think-Tank Study on Exploring Human-Centric Designs", In: 16th International ACM Conference on Automotive User Interfaces and Interactive Vehicular Applications (AutoUI 2024), Work in Progress (WiP), pp. 1-8, California, USA, 2024. Proceedings Article | Abstract | BibTeX | Links:
@inproceedings{Chauhan2024,
title = {Transforming Pedestrian and Autonomous Vehicles Interactions in Shared Spaces: A Think-Tank Study on Exploring Human-Centric Designs},
author = {Vishal Chauhan and Anubhav Anubhav and Chia-Ming Chang and Jin Nakazato and Ehsan Javanmardi and Alex Orsholits and Takeo Igarashi and Kantaro Fujiwara and Manabu Tsukada},
doi = {10.1145/3641308.3685037},
year = {2024},
date = {2024-09-22},
urldate = {2024-09-22},
booktitle = {16th International ACM Conference on Automotive User Interfaces and Interactive Vehicular Applications (AutoUI 2024), Work in Progress (WiP)},
pages = {1-8},
address = {California, USA},
abstract = {Our research focuses on the smart pole interaction unit (SPIU) as an infrastructure external human-machine interface (HMI) to enhance pedestrian interaction with autonomous vehicles (AVs) in shared spaces. We extensively study SPIU with external human-machine interfaces (eHMI) on AVs as an integrated solution. To discuss interaction barriers and enhance pedestrian safety, we engaged 25 participants aged 18-40 to brainstorm design solutions for pedestrian-AV interactions, emphasising effectiveness, simplicity, visibility, and clarity. Findings indicate a preference for real-time SPIU interaction over eHMI on AVs in multiple AV scenarios. However, the combined use of SPIU and eHMI on AVs is crucial for building trust in decision-making. Consequently, we propose innovative design solutions for both SPIU and eHMI on AVs, discussing their pros and cons. This study lays the groundwork for future autonomous mobility solutions by developing human-centric eHMI and SPIU prototypes as ieHMI.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Our research focuses on the smart pole interaction unit (SPIU) as an infrastructure external human-machine interface (HMI) to enhance pedestrian interaction with autonomous vehicles (AVs) in shared spaces. We extensively study SPIU with external human-machine interfaces (eHMI) on AVs as an integrated solution. To discuss interaction barriers and enhance pedestrian safety, we engaged 25 participants aged 18-40 to brainstorm design solutions for pedestrian-AV interactions, emphasising effectiveness, simplicity, visibility, and clarity. Findings indicate a preference for real-time SPIU interaction over eHMI on AVs in multiple AV scenarios. However, the combined use of SPIU and eHMI on AVs is crucial for building trust in decision-making. Consequently, we propose innovative design solutions for both SPIU and eHMI on AVs, discussing their pros and cons. This study lays the groundwork for future autonomous mobility solutions by developing human-centric eHMI and SPIU prototypes as ieHMI.
Tokio Takada, Jin Nakazato, Alex Orsholits, Manabu Tsukada, Hideya Ochiai, Hiroshi Esaki, "Design of Digital Twin Architecture for 3D Audio Visualization in AR", In: The 2nd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (MetaCom 2024), Hong Kong, China, 2024. Proceedings Article | Abstract | BibTeX | Links:
@inproceedings{Takada2024,
title = {Design of Digital Twin Architecture for 3D Audio Visualization in AR},
author = {Tokio Takada and Jin Nakazato and Alex Orsholits and Manabu Tsukada and Hideya Ochiai and Hiroshi Esaki},
doi = {10.1109/MetaCom62920.2024.00044},
year = {2024},
date = {2024-08-12},
urldate = {2024-08-12},
booktitle = {The 2nd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (MetaCom 2024)},
address = {Hong Kong, China},
abstract = {Digital twins have recently attracted attention from academia and industry as a technology connecting physical space and cyberspace. Digital twins are compatible with Augmented Reality (AR) and Virtual Reality (VR), enabling us to understand information in cyberspace. In this study, we focus on music and design an architecture for a 3D representation of music using a digital twin. Specifically, we organize the requirements for a digital twin for music and design the architecture. We establish a method to perform 3D representation in cyberspace and map the recorded audio data in physical space. In this paper, we implemented the physical space representation using a smartphone as an AR device and employed a visual positioning system (VPS) for self-positioning. For evaluation, in addition to system errors in the 3D representation of audio data, we conducted a questionnaire evaluation with several users as a user study. From these results, we evaluated the effectiveness of the implemented system. At the same time, we also found issues we need to improve in the implemented system in future works.},
key = {CREST},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Digital twins have recently attracted attention from academia and industry as a technology connecting physical space and cyberspace. Digital twins are compatible with Augmented Reality (AR) and Virtual Reality (VR), enabling us to understand information in cyberspace. In this study, we focus on music and design an architecture for a 3D representation of music using a digital twin. Specifically, we organize the requirements for a digital twin for music and design the architecture. We establish a method to perform 3D representation in cyberspace and map the recorded audio data in physical space. In this paper, we implemented the physical space representation using a smartphone as an AR device and employed a visual positioning system (VPS) for self-positioning. For evaluation, in addition to system errors in the 3D representation of audio data, we conducted a questionnaire evaluation with several users as a user study. From these results, we evaluated the effectiveness of the implemented system. At the same time, we also found issues we need to improve in the implemented system in future works.
Alex Orsholits, Yiyuan Qian, Eric Nardini, Yusuke Obuchi, Manabu Tsukada, "PLATONE: An Immersive Geospatial Audio Spatialization Platform", In: The 2nd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (MetaCom 2024), Hong Kong, China, 2024. Proceedings Article | Abstract | BibTeX | Links:
@inproceedings{Orsholits2024,
title = {PLATONE: An Immersive Geospatial Audio Spatialization Platform},
author = {Alex Orsholits and Yiyuan Qian and Eric Nardini and Yusuke Obuchi and Manabu Tsukada},
doi = {10.1109/MetaCom62920.2024.00020},
year = {2024},
date = {2024-08-12},
urldate = {2024-08-12},
booktitle = {The 2nd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (MetaCom 2024)},
address = {Hong Kong, China},
abstract = {In the rapidly evolving landscape of mixed reality (MR) and spatial computing, the convergence of physical and virtual spaces is becoming increasingly crucial for enabling immersive, large-scale user experiences and shaping inter-reality dynamics. This is particularly significant for immersive audio at city-scale, where the 3D geometry of the environment must be considered, as it drastically influences how sound is perceived by the listener. This paper introduces PLATONE, a novel proof-of-concept MR platform designed to augment urban contexts with environment-dependent spatialized audio. It leverages custom hardware for localization and orientation, alongside a cloud-based pipeline for generating real-time binaural audio. By utilizing open-source 3D building datasets, sound propagation effects such as occlusion, reverberation, and diffraction are accurately simulated. We believe that this work may serve as a compelling foundation for further research and development.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
In the rapidly evolving landscape of mixed reality (MR) and spatial computing, the convergence of physical and virtual spaces is becoming increasingly crucial for enabling immersive, large-scale user experiences and shaping inter-reality dynamics. This is particularly significant for immersive audio at city-scale, where the 3D geometry of the environment must be considered, as it drastically influences how sound is perceived by the listener. This paper introduces PLATONE, a novel proof-of-concept MR platform designed to augment urban contexts with environment-dependent spatialized audio. It leverages custom hardware for localization and orientation, alongside a cloud-based pipeline for generating real-time binaural audio. By utilizing open-source 3D building datasets, sound propagation effects such as occlusion, reverberation, and diffraction are accurately simulated. We believe that this work may serve as a compelling foundation for further research and development.
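To give a feel for the geometry behind binaural spatialization, the sketch below computes a source's azimuth relative to the listener's heading, an interaural time difference (Woodworth approximation, most accurate for sources in the front hemisphere), and simple inverse-distance attenuation. The head radius and positions are illustrative; the occlusion, reverberation, and diffraction effects that PLATONE simulates from 3D building data are not modeled here.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, assumed average head radius

def binaural_cues(listener_xy, listener_yaw_deg, source_xy):
    """Azimuth (deg, positive = right of heading), ITD (s), and 1/d gain for a 2D source."""
    dx, dy = source_xy[0] - listener_xy[0], source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    # Listener faces +y when yaw is 0; wrap the relative bearing into (-180, 180]
    azimuth = (math.degrees(math.atan2(dx, dy)) - listener_yaw_deg + 180) % 360 - 180
    theta = math.radians(azimuth)
    itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (math.sin(theta) + theta)   # Woodworth approximation
    gain = 1.0 / max(distance, 1.0)                                    # simple inverse-distance law
    return {"azimuth_deg": azimuth, "itd_s": itd, "gain": gain}

print(binaural_cues(listener_xy=(0.0, 0.0), listener_yaw_deg=0.0, source_xy=(10.0, 10.0)))
```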
Tokio Takada, Jin Nakazato, Alex Orsholits, Manabu Tsukada, Hideya Ochiai, Hiroshi Esaki, "Design of a Digital Twin Architecture for 3D Audio Visualization in AR (in Japanese)", Multimedia, Distributed, Cooperative, and Mobile (DICOMO2024) Symposium, Hanamaki, Iwate, Japan, 2024. Conference | Abstract | BibTeX
@conference{髙田季生2024,
title = {ARにおける3D音響可視化に向けたデジタルツインアーキテクチャの設計},
author = {髙田季生 and 中里仁 and Alex Orsholits and 塚田学 and 落合秀也 and 江崎浩},
year = {2024},
date = {2024-06-26},
urldate = {2024-06-26},
booktitle = {マルチメディア、分散、協調とモバイル(DICOMO2024)シンポジウム},
address = {岩手県花巻市},
abstract = {近年,デジタルツインは学界および産業界から注目を集めており,フィジカル空間とサイバー空間を繋ぐ技術として着目されている.デジタルツインはAugmented Reality(AR)やVirtual Reality(VR)との相性がよく,ユーザは複雑な物理的実体やプロセスを理解できる.本研究では,音響に焦点を当て,デジタルツインを用いた音響の立体的な表現をするためのアーキテクチャの提案を行う.具体的には,音楽向けのデジタルツインの要件を整理し,アーキテクチャの設計を行った.既に収録した音声データをサイバー空間上にて立体表現を行い,フィジカル空間にマッピングするための手法を確立する.本稿では,フィジカル空間の表現にはARデバイスとしてGoogle Pixelを用いて実装を行い,自己位置の推定についてはVPSを用いた.評価においては,音声データの立体表現におけるシステム誤差に加えて,ユーザスタディとして複数人を体験者として実地調査を行った.これらの結果より,実装システムの有効性を評価することができ,一方で実装システムの改善を必要とする課題も発見できた.これらの成果を本稿で報告する.
},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}
In recent years, digital twins have attracted attention from academia and industry as a technology connecting physical space and cyberspace. Digital twins are well suited to Augmented Reality (AR) and Virtual Reality (VR), allowing users to understand complex physical entities and processes. In this study, we focus on audio and propose an architecture for representing sound three-dimensionally using a digital twin. Specifically, we organize the requirements for a digital twin for music and design the architecture. We establish a method for representing previously recorded audio data three-dimensionally in cyberspace and mapping it onto physical space. In this paper, the physical-space representation was implemented using a Google Pixel as the AR device, and a visual positioning system (VPS) was used for self-localization. For evaluation, in addition to measuring system errors in the 3D representation of the audio data, we conducted a field study with multiple participants as a user study. From these results, we were able to evaluate the effectiveness of the implemented system, while also identifying issues that require improvement in future work. We report these findings in this paper.