Diffusion-based models are recognized for their effectiveness in using real-world driving data to generate realistic and diverse traffic scenarios. These models employ guided sampling to incorporate specific traffic preferences and enhance scenario realism. However, guiding the sampling process to conform to traffic rules and preferences can cause deviations from real-world traffic priors, potentially leading to unrealistic behaviors. To address this challenge, we introduce a multi-guided diffusion model that utilizes a novel training strategy to closely adhere to traffic priors, even when employing various combinations of guides. This model adopts a multi-task learning framework, enabling a single diffusion model to process various guide inputs. For increased guided sampling precision, our model is fine-tuned using the Direct Preference Optimization (DPO) algorithm. This algorithm optimizes preferences based on guide scores, effectively navigating the complexities and challenges associated with the expensive and often non-differentiable gradient calculations during the guided sampling fine-tuning process. Evaluated on the nuScenes dataset, our model provides a strong baseline for balancing realism, diversity, and controllability in traffic scenario generation.
We present supplementary videos for a few settings based on the quantitative results in the main paper. Samples generated from STRIVE and MuDi show the typical trade-off between realism and controllability. Our novel model, MuDi-Pro, effectively addresses this trade-off.
No vehicle collision (Rule 1) and no off-road (Rule 2). Rule 1 is designed to prevent collisions between vehicles, whereas Rule 2 is particularly vital since collisions between non-ego vehicles and the environment are not accounted for during training. Ignoring Rule 2 could lead to a significant number of environmental collisions involving non-ego vehicles.
Goal waypoint (Rule 3). The vehicle is required to pass through a specified goal waypoint at a predetermined time step within its future trajectory.
Target speed (Rule 4). At each time step, vehicles must adhere to a specified speed. In this study, the target speed is set to the 75th percentile of the speeds within the initially generated trajectory of each agent.
Max speed (Rule 5). Vehicles are not permitted to exceed a predefined maximum speed limit. Guidance is applied only to vehicles that surpass this threshold. For this work, the maximum allowable speed has been set at $9\,m/s$ for both cars and trucks.
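The waypoint- and speed-based rules above can be sketched as simple per-agent cost functions that guided sampling would drive toward zero. This is a minimal illustration under assumed conventions (a trajectory as a (T, 2) array of positions, a 0.1 s time step, and the function names are ours), not the paper's implementation:

```python
import numpy as np

def goal_waypoint_cost(traj, goal_xy, goal_t):
    """Rule 3 sketch: squared distance to the goal waypoint at step goal_t.
    traj: (T, 2) array of (x, y) positions."""
    return float(np.sum((traj[goal_t] - np.asarray(goal_xy)) ** 2))

def target_speed_cost(traj, dt=0.1):
    """Rule 4 sketch: penalize deviation from the 75th-percentile speed
    of the initially generated trajectory."""
    speeds = np.linalg.norm(np.diff(traj, axis=0), axis=1) / dt
    target = np.percentile(speeds, 75)
    return float(np.mean((speeds - target) ** 2))

def max_speed_cost(traj, v_max=9.0, dt=0.1):
    """Rule 5 sketch: penalize only the portion of speed above v_max,
    so vehicles under the limit receive no guidance."""
    speeds = np.linalg.norm(np.diff(traj, axis=0), axis=1) / dt
    return float(np.mean(np.maximum(speeds - v_max, 0.0) ** 2))
```

For example, a straight trajectory moving at a constant 5 m/s incurs zero cost under all three rules (it hits its own waypoints, matches its own 75th-percentile speed, and stays below the 9 m/s cap).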
In this scene, the vehicles are supposed to avoid map collisions and vehicle collisions. STRIVE+opt seems to generate trajectories satisfying the rules, but the orange truck shows unrealistic behaviour. MuDi shows realistic behaviour for most vehicles, but the orange car does not follow the rules, colliding with the map. In contrast, MuDi-Pro generates realistic trajectories with all vehicles satisfying the rules.
In this scene, vehicles are supposed to avoid collisions with the map and other vehicles while reducing speed. STRIVE+opt generates trajectories that exhibit unrealistic behaviour for most vehicles, indicating that the optimization has compromised their data-driven ability. MuDi produces realistic trajectories for most of the vehicles, but the purple car goes off-road. In contrast, MuDi-Pro generates plausible and realistic trajectories in scenes where STRIVE+opt and MuDi fail to do so.
In this scene, the vehicles are supposed to avoid map collisions and vehicle collisions without exceeding the maximum speed. STRIVE+opt seems to generate trajectories satisfying the rules, but the green car shows unrealistic behaviour. MuDi generates realistic trajectories for most vehicles, but the green car goes off-road. In contrast, MuDi-Pro generates trajectories as plausible and realistic as the ground-truth trajectories in a scene where STRIVE+opt and MuDi fail to do so.
This project page template is based on
this page.
For any questions, please contact Seungjun Yu at seungjunyu@postech.ac.kr