Since the 1960s, several major subfields of artificial intelligence, such as computer vision, graphics, and robotics, have developed largely independently. But the community has recently recognized that progress toward self-driving cars requires an integrated effort across these fields. This motivated us to create KITTI-360, the successor to the KITTI dataset: a new suburban driving dataset comprising richer input modalities, comprehensive semantic instance annotations, and accurate localization to facilitate research at the intersection of vision, graphics, and robotics. To get an overview of the dataset, we invite you to enjoy the (overly) dramatic cinematic trailer we have produced.
We recently opened all of our evaluation servers for submissions. In this blog post, we are excited to share some of the novel tasks that KITTI-360 considers at the intersection of vision, graphics, and robotics.
How to parse the scene?
Semantic scene understanding is one of the key capabilities of autonomous vehicles. With KITTI-360, we establish scene perception benchmarks in both the 2D image space and the 3D domain:
- 2D Semantic/Instance Segmentation
- 3D Semantic/Instance Segmentation
- 3D Bounding Box Detection
- Semantic Scene Completion
We evaluated several baselines to bootstrap the leaderboards and to assess the difficulty of each task.
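For the segmentation benchmarks, the standard summary metric is mean intersection-over-union (mIoU) across classes. As a rough illustration of how such a score is computed (a minimal sketch, not the official KITTI-360 evaluation code; the function name and array format are our own), one can accumulate a confusion matrix over flat label arrays and average the per-class IoU:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Sketch of mean IoU: pred/gt are flat arrays of integer class labels."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    # Confusion matrix: rows index ground truth, columns index predictions.
    conf = np.bincount(gt * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf)                 # true positives per class
    fp = conf.sum(axis=0) - tp         # false positives per class
    fn = conf.sum(axis=1) - tp         # false negatives per class
    denom = tp + fp + fn
    # Classes absent from both prediction and ground truth are ignored (NaN).
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return np.nanmean(iou), iou

# Example: three classes, one boundary pixel misclassified.
miou, per_class = mean_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
```

Real evaluation additionally handles ignore labels and accumulates the confusion matrix over all test images before computing IoU, but the core formula is the same.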
For some of the tasks, we show interactive plots on our website to illustrate submission results. Here is an example for the semantic scene completion task: