
Level-4 autonomous flight of unmanned aerial vehicles (UAVs) is increasingly in demand for applications such as logistics, surveillance, and disaster response. However, ensuring safe navigation and precise path planning in dynamic environments remains a major challenge. Traditional methods focus heavily on spatial path generation under static assumptions, making them ill-suited to moving or newly emerging obstacles. Furthermore, stable self-localization is difficult in environments where GPS is unavailable. To address these issues, this project aims to construct a scientifically rigorous and certifiable autonomous flight framework by integrating 4D spatiotemporal voxel representations, predictive world models, safety-constrained reinforcement learning, and Vision-Language Model / Vision-Language-Action (VLM/VLA) modules.
The project introduces a 4D spatiotemporal voxel structure that represents static and dynamic obstacles in a unified way, with adaptive temporal resolution for predicting the future occupancy of nearby airspace. This supports efficient real-time updates as objects move. Building on this representation, world models simulate environmental evolution, predicting the trajectories of surrounding objects and environmental disturbances such as gusts or reduced visibility. Through short-horizon rollouts and counterfactual scenarios, the system turns path planning into a risk-aware, forward-looking process rather than a purely reactive one.
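As a rough illustration of the representation described above, the sketch below implements a sparse 4D voxel map with adaptive temporal resolution: near-future time slices are binned finely, far-future slices coarsely. All names and parameters (`SpatioTemporalVoxelGrid`, the resolutions, the constant-velocity obstacle model) are illustrative assumptions, not the project's actual implementation.

```python
class SpatioTemporalVoxelGrid:
    """Sparse 4D occupancy map keyed by (ix, iy, iz, it).

    Near-future time slices are discretized at `fine_dt`, slices beyond
    `fine_horizon` at `coarse_dt` (adaptive temporal resolution).
    All parameters are illustrative, not from the project.
    """

    def __init__(self, voxel=1.0, fine_dt=0.2, coarse_dt=1.0,
                 fine_horizon=2.0, horizon=6.0):
        self.voxel = voxel
        self.fine_dt = fine_dt
        self.coarse_dt = coarse_dt
        self.fine_horizon = fine_horizon
        self.horizon = horizon
        self.occupied = set()          # set of (ix, iy, iz, it) keys

    def _time_index(self, t):
        # Fine bins up to fine_horizon, coarse bins beyond it.
        if t <= self.fine_horizon:
            return round(t / self.fine_dt)
        n_fine = round(self.fine_horizon / self.fine_dt)
        return n_fine + round((t - self.fine_horizon) / self.coarse_dt)

    def _key(self, pos, t):
        ix, iy, iz = (round(c / self.voxel) for c in pos)
        return (ix, iy, iz, self._time_index(t))

    def _sample_times(self):
        # Dense time samples early in the horizon, sparse samples later.
        n_fine = int(self.fine_horizon / self.fine_dt)
        times = [i * self.fine_dt for i in range(n_fine + 1)]
        t = self.fine_horizon
        while t < self.horizon:
            t += self.coarse_dt
            times.append(t)
        return times

    def mark_dynamic(self, pos, vel):
        """Rasterize a constant-velocity prediction of a moving obstacle."""
        for t in self._sample_times():
            p = tuple(c + v * t for c, v in zip(pos, vel))
            self.occupied.add(self._key(p, t))

    def is_free(self, pos, t):
        """True if the voxel containing `pos` is predicted free at time t."""
        return self._key(pos, t) not in self.occupied
```

For example, after `grid.mark_dynamic((0.0, 0.0, 10.0), (1.0, 0.0, 0.0))`, a query at the obstacle's predicted position two seconds ahead reports the voxel as occupied, while distant voxels remain free.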
For decision-making, reinforcement learning (RL) is applied to optimize path following and obstacle avoidance, with explicit safety constraints. Offline pretraining on expert trajectories (e.g., A*, RRT*) provides efficient initial policies, while runtime shields based on control barrier functions and reachability analysis suppress unsafe actions during online learning. To enable navigation in GPS-denied environments, a VLM/VLA module integrates camera-based perception with natural-language instructions, allowing UAVs to understand and execute high-level mission commands (e.g., “survey this area and return”) even where mapping or GPS data are unreliable.
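The runtime-shield idea can be sketched as a minimal control-barrier-function filter: the RL policy's commanded velocity passes through unchanged when the barrier condition holds, and is minimally corrected otherwise. The barrier h(x) = ||x − x_obs|| − r_safe, the gain `alpha`, and the one-step projection are illustrative assumptions, not the project's actual shield design.

```python
import math

def cbf_shield(pos, vel_cmd, obstacle, r_safe=2.0, alpha=1.0):
    """Filter an RL velocity command through a distance barrier.

    Barrier: h(x) = ||x - obstacle|| - r_safe. Safety requires
    dh/dt >= -alpha * h. If the command violates this, it is minimally
    corrected along the barrier gradient (a one-step projection).
    """
    dx = [p - o for p, o in zip(pos, obstacle)]
    dist = math.sqrt(sum(d * d for d in dx))
    h = dist - r_safe                       # signed safety margin
    grad = [d / dist for d in dx]           # unit gradient of h w.r.t. pos
    h_dot = sum(g * v for g, v in zip(grad, vel_cmd))
    if h_dot >= -alpha * h:
        return list(vel_cmd)                # command is safe: pass through
    # Adding lam * grad raises h_dot by exactly lam (since ||grad|| = 1),
    # restoring the constraint with the smallest possible change.
    lam = -alpha * h - h_dot
    return [v + lam * g for v, g in zip(vel_cmd, grad)]
```

A gentle approach toward an obstacle is passed through unchanged, whereas an aggressive closing velocity is scaled back just enough to satisfy the barrier condition; this pass-through property is what lets the RL policy keep learning online without the shield distorting safe behavior.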
This research will present a new framework for Level-4 autonomous flight, combining rigorous spatiotemporal representation, learning-based control, formal safety assurance, and multimodal task understanding. Expected outcomes include measurable improvements in safety and efficiency in dynamic environments, enhanced GPS-independent localization and navigation, and a data representation foundation for future integration with Air Traffic Management (ATM) and Unmanned Traffic Management (UTM) systems. Looking forward, the project envisions extensions to multi-UAV coordination, weather-aware operations, certification frameworks based on learning assurance, and real-world flight demonstrations in logistics and emergency response scenarios.
@inproceedings{Bao2025,
title = {4D Path Planning via Spatiotemporal Voxels in Urban Airspaces},
author = {Naren Bao and Alex Orsholits and Manabu Tsukada},
doi = {10.1109/MetaCom65502.2025.00022},
year = {2025},
booktitle = {3rd Annual IEEE International Conference on Metaverse Computing, Networking, and Applications (IEEE MetaCom 2025)},
address = {Seoul, Republic of Korea},
abstract = {This paper presents an approach to four-dimensional (4D) path planning for unmanned aerial vehicles (UAVs) in complex urban environments. We introduce a spatiotemporal voxel-based representation that effectively models both spatial and temporal dimensions of urban airspaces. By integrating the 4D spatio-temporal ID framework with reinforcement learning techniques, our system generates efficient and safe flight paths while considering dynamic obstacles and environmental constraints. The proposed method combines off-line pretraining and online fine-tuning of reinforcement learning models to achieve computational efficiency without compromising path quality. Experiments conducted using PLATEAU datasets in various urban scenarios demonstrate that our approach outperforms traditional path planning algorithms by 24% in safety metrics and 18% in efficiency metrics. Our framework advances the state-of-the-art in urban air mobility by providing a scalable solution for airspace management in increasingly congested urban environments.},
}