Version v0.1 of the SEmantic Large-Scale Multimodal Acquisitions dataset, or SELMA Dataset for short, has just been released.
The paper preprint is available on arXiv: 2204.09788.

The dataset was created by the SCANLAB and LTTM research groups at the University of Padova.

On this page, we provide an overview of the dataset's main features.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Modalities (Sensors):

One of the major focuses of this dataset is the multimodal aspect of the data samples provided. In particular, we rendered 30’909 unique scenes in 27 environmental conditions, capturing each of them with the following sensors:

  • 7 Color Cameras:
    • 3 frontal cameras (FRONT_LEFT, FRONT, FRONT_RIGHT), rectified and spaced 25 cm apart; they can be used as 3 distinct stereo setups with baselines of 25 cm (FL-F), 50 cm (FL-FR), and 25 cm (F-FR).
    • 1 desk camera (DESK).
    • 3 ring cameras (LEFT, RIGHT, BACK), facing in the remaining cardinal directions, providing a 360° view.
  • 7 Depth Cameras, placed in coincidence with the color cameras to provide 2.5D information. The depth is provided in CARLA format: to get the depth in meters, one must load the PNG file as an RGB image and compute d = 1000*(R + G*256 + B*256*256)/(256*256*256 - 1), as shown in the sketch after this list.
  • 3 LiDARs (64 channels, ~100k points per scan, provided in binary PLY format):
    • 1 top LiDAR (TOP), with a 360° view of the scene.
    • 2 front-lateral LiDARs placed on the headlights with 270° of aperture, overlapping in front of the ego vehicle. The three LiDAR point clouds can be merged to achieve higher density in that region.
  • 1 Surface Normal Camera (estimated):
    • For a standard subset of our dataset (30’909 samples in total), we also provide the pre-computed surface normals of the scene, as seen from the desk camera. They were estimated from the ground-truth depth using the technique described in [1, 2].
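As anticipated above, here is a minimal sketch of how a depth PNG can be decoded to meters and how a LiDAR scan can be loaded from its binary PLY. Only the conversion formula comes from the description above; the function names, the NumPy/Pillow/plyfile choices, and the x/y/z vertex property names are our own assumptions.

    import numpy as np
    from PIL import Image
    from plyfile import PlyData

    def carla_depth_to_meters(png_path):
        # Load the CARLA-format depth PNG as an RGB image.
        rgb = np.asarray(Image.open(png_path).convert("RGB"), dtype=np.float64)
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        # 24-bit fixed-point decoding: the normalized channel value is
        # scaled to CARLA's 1000 m maximum range.
        return 1000.0 * (r + g * 256.0 + b * 256.0**2) / (256.0**3 - 1.0)

    def load_lidar_ply(ply_path):
        # Read the binary PLY and stack the (assumed) x/y/z vertex
        # properties into an (N, 3) point array.
        vertex = PlyData.read(ply_path)["vertex"]
        return np.stack([vertex["x"], vertex["y"], vertex["z"]], axis=-1)

The depth function returns an H×W array of distances in meters; the PLY loader returns a single ~100k-point scan, which can be concatenated with the other two LiDAR scans once they are brought into a common reference frame.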

Semantic Classes:

To better align with Cityscapes, the standard synthetic-to-real segmentation benchmark, we generated the semantic labels of each sample (both for the cameras and the LiDARs) using a strict superset (36 classes in total) of the classes considered in the benchmark:

Class Name    | Raw ID | Train ID | Cityscapes ID (35 classes) | Color (RGB)
------------- | ------ | -------- | -------------------------- | ---------------
Unlabeled     | 0      | -1       | 0                          | (000, 000, 000)
Building      | 1      | 2        | 11                         | (070, 070, 070)
Fence         | 2      | 4        | 13                         | (190, 153, 153)
Other         | 3      | -1       | 0                          | (000, 000, 000)
Pole          | 5      | 5        | 17                         | (153, 153, 153)
Road Line     | 6      | 0        | 7                          | (110, 190, 160)
Road          | 7      | 0        | 7                          | (128, 064, 128)
Sidewalk      | 8      | 1        | 8                          | (244, 035, 232)
Vegetation    | 9      | 8        | 21                         | (107, 142, 035)
Ego Vehicle   | 10     | -1       | 1                          | (000, 000, 000)
Wall          | 11     | 3        | 12                         | (102, 102, 156)
Traffic Sign  | 12     | 7        | 20                         | (220, 220, 000)
Sky           | 13     | 10       | 23                         | (070, 130, 180)
Ground        | 14     | -1       | 6                          | (081, 000, 081)
Bridge        | 15     | -1       | 15                         | (150, 100, 100)
Rail Track    | 16     | -1       | 10                         | (230, 150, 140)
Guard Rail    | 17     | -1       | 14                         | (180, 165, 180)
Traffic Light | 18     | 6        | 19                         | (250, 170, 030)
Static        | 19     | -1       | 4                          | (110, 190, 160)
Dynamic       | 20     | -1       | 5                          | (111, 074, 000)
Water         | 21     | -1       | 36 (extra)                 | (045, 060, 150)
Terrain       | 22     | 9        | 22                         | (152, 251, 152)
Person        | 40     | 11       | 24                         | (220, 020, 060)
Rider         | 41     | 12       | 25                         | (255, 000, 000)
Car           | 100    | 13       | 26                         | (000, 000, 142)
Truck         | 101    | 14       | 27                         | (000, 000, 070)
Bus           | 102    | 15       | 28                         | (000, 060, 100)
Tram/Train    | 103    | 16       | 31                         | (000, 080, 100)
Motorcycle    | 104    | 17       | 32                         | (000, 000, 230)
Bicycle       | 105    | 18       | 33                         | (119, 011, 032)
Unknown       | 255    | -1       | 0                          | (000, 000, 000)
[Figure: desk-camera pixel and LiDAR point class frequencies of the SELMA dataset, compared to Cityscapes.]
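Since the Raw IDs are sparse while the Train IDs form the contiguous 19-class set (0-18) commonly used for Cityscapes-style training, a lookup table is a convenient way to remap label maps. The following is a minimal sketch under that assumption; the dictionary transcribes the Raw ID → Train ID columns of the table above, with 255 standing in for -1 as the usual ignore index.

    import numpy as np

    # Raw ID -> Train ID, transcribed from the table above (-1 stored as 255).
    RAW_TO_TRAIN = {
        0: 255, 1: 2, 2: 4, 3: 255, 5: 5, 6: 0, 7: 0, 8: 1, 9: 8,
        10: 255, 11: 3, 12: 7, 13: 10, 14: 255, 15: 255, 16: 255,
        17: 255, 18: 6, 19: 255, 20: 255, 21: 255, 22: 9, 40: 11,
        41: 12, 100: 13, 101: 14, 102: 15, 103: 16, 104: 17,
        105: 18, 255: 255,
    }

    def remap_to_train_ids(raw_labels):
        # Build a 256-entry lookup table once, then index it with the
        # integer-valued raw label array (any shape).
        lut = np.full(256, 255, dtype=np.uint8)
        for raw_id, train_id in RAW_TO_TRAIN.items():
            lut[raw_id] = train_id
        return lut[raw_labels]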

Dataset Structure

The acquisitions are archived per scene and per sensor. A scene is the combination of a town and an environmental condition (weather and time-of-day). For instance, Town02_Opt_CloudyNoon is the scene acquired in Town02_Opt with a Cloudy sky at Noon. From the website, it is possible to select the data to download by combining different towns and environmental conditions. Furthermore, it is possible to download only a subset of the sensor data (e.g., only the data acquired by CAM_LEFT, the camera on the left side of the vehicle).

Weather: Clear, Cloudy, Wet, WetCloudy, SoftRain, MidRainy, HardRain
Time-of-day: Noon, Sunset, Night
Town: Town01_Opt, Town02_Opt, Town03_Opt, Town04_Opt, Town05_Opt, Town06_Opt, Town07_Opt, Town10HD_Opt
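As a hypothetical illustration (the list variables are ours), the scene folder names can be enumerated by combining the options above according to the <Town>_<Weather><Time-of-day> pattern of the Town02_Opt_CloudyNoon example:

    from itertools import product

    TOWNS = ["Town01_Opt", "Town02_Opt", "Town03_Opt", "Town04_Opt",
             "Town05_Opt", "Town06_Opt", "Town07_Opt", "Town10HD_Opt"]
    WEATHERS = ["Clear", "Cloudy", "Wet", "WetCloudy",
                "SoftRain", "MidRainy", "HardRain"]
    TIMES_OF_DAY = ["Noon", "Sunset", "Night"]

    # Scene name = <Town>_<Weather><TimeOfDay>, e.g. "Town02_Opt_CloudyNoon".
    scene_names = [f"{town}_{weather}{tod}"
                   for town, weather, tod in product(TOWNS, WEATHERS, TIMES_OF_DAY)]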

Download Instructions

1. Navigate to https://scanlab.dei.unipd.it/app/dataset

2. Login or register to access the data. Registration is required so that we can monitor the traffic to/from our server and avoid overloading it. We will only use your e-mail for very limited service communications (new dataset releases, server status).

3. Select the subset of data you want to download using the selection tool available on the dataset page. We suggest starting with the SELMA random split (SELMA rand), which can be downloaded without setting any selection filter.

4. When prompted, download and unzip the archive. The zip contains the downloader toolkit: an executable downloader and a JSON file with the list of files to download.

5. Enable the execution of the downloader for your platform by changing its permissions accordingly (chmod +x downloader-<yourOS>.sh on Unix systems; double-click and accept on Windows).

6. Execute the downloader and… you’re set! You will find a new “CV” folder inside the dataset-downloader-toolkit folder with all the data.

The data will be downloaded into a subfolder structure <SCENE_NAME>/<SENSOR_NAME>, where <SCENE_NAME> is obtained as <Town_EnvironmentalCondition>.
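As a final, hypothetical example, the downloaded files can be indexed by scene and sensor, assuming only the <SCENE_NAME>/<SENSOR_NAME> layout and the “CV” root folder from step 6 (the function name is ours):

    from pathlib import Path

    def index_selma(root="CV"):
        # Map (scene, sensor) -> sorted list of sample files, walking the
        # <SCENE_NAME>/<SENSOR_NAME> folder structure described above.
        index = {}
        for scene_dir in sorted(Path(root).iterdir()):
            if not scene_dir.is_dir():
                continue
            for sensor_dir in sorted(scene_dir.iterdir()):
                if sensor_dir.is_dir():
                    index[(scene_dir.name, sensor_dir.name)] = sorted(sensor_dir.glob("*"))
        return index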