New: WildDash 2 with 4256 public frames, new labels & panoptic GT!
See also: RailSem19 dataset for rail scene understanding.
For all metrics, higher scores are better. To participate in the benchmark, check our submission instructions.
Our benchmark evaluates the negative impact of common visual hazards on an algorithm's output performance. The impact is calculated with this formula:
$$\text{impact} = \frac{\min(\text{metric}_{\text{low}}, \text{metric}_{\text{high}})}{\max(\text{metric}_{\text{none}}, \text{metric}_{\text{low}})} - 1.0$$
The scores $\text{metric}_{\text{none}}$, $\text{metric}_{\text{low}}$ and $\text{metric}_{\text{high}}$ are evaluated on subsets of the benchmark dataset that correspond to the identified severity of the hazard (e.g. the subset $\text{Blur}_{\text{high}}$ contains images with strong visible blur). Positive impacts are truncated to zero.
An impact of -10% for Blur means that the algorithm's performance is expected to drop by 10 percent when the input image contains considerable blur, compared with giving the same algorithm a similar image without noticeable blur.
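The impact formula for a single hazard can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's evaluation code: the function name `hazard_impact` and the example scores are hypothetical, and the three inputs are assumed to be the metric values already computed on the none/low/high subsets.

```python
def hazard_impact(metric_none: float, metric_low: float, metric_high: float) -> float:
    """Impact of one hazard, given its per-subset scores (higher is better).

    Hypothetical helper: implements
    min(metric_low, metric_high) / max(metric_none, metric_low) - 1.0
    and truncates positive impacts to zero.
    """
    denominator = max(metric_none, metric_low)
    if denominator <= 0.0:
        raise ValueError("reference score max(metric_none, metric_low) must be positive")
    impact = min(metric_low, metric_high) / denominator - 1.0
    # Truncate positive impacts to zero.
    return min(impact, 0.0)


# Hypothetical scores: 50.0 without blur, 48.0 with slight blur, 45.0 with heavy blur.
print(f"{hazard_impact(50.0, 48.0, 45.0):.1%}")  # -10.0%
```

Because min(metric_low, metric_high) can never exceed metric_low, and metric_low can never exceed max(metric_none, metric_low), the ratio in this formula is at most 1. The clamp to zero therefore acts only as a safeguard here.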
The following hazards are currently evaluated:
Blur: Image is noticeably affected by blur (e.g. motion blur, defocusing, compression artifacts...)
Coverage: Normally visible parts of the road are covered (e.g. unusual lane markings, snow, leaves...)
Distortion: Visible lens distortion
Hood: Parts of the ego-vehicle other than the windscreen are visible (e.g. car hood, mirrors)
Occl: Objects are partially occluded or cut off by image border
Overexp.: The scene is overexposed
Particle: Particles in the air obstruct the view (e.g. heavy rain, snow, fog)
Screen: The windscreen is interfering (e.g. interior reflections, wipers, rain on the windscreen...)
Underexp.: The image is underexposed
Variation: Intra-class variations within the image (e.g. unusual appearances of a class, such as unique cars)
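The per-hazard evaluation can be sketched as follows: the frames are split into none/low/high subsets according to each hazard's severity, a metric is computed on each subset, and the impact formula is applied. This is an illustration under stated assumptions, not the benchmark's actual pipeline. The frame dictionaries, their `hazards` and `score` fields, the default severity `"none"` for unlabeled hazards, and the `mean_score` metric are all hypothetical. Real segmentation metrics such as PQ are accumulated over a whole subset, which is why `metric` receives the list of frames and not a single frame.

```python
from typing import Callable, Dict, List, Sequence

SEVERITIES = ("none", "low", "high")


def impacts_per_hazard(
    frames: Sequence[dict],
    hazards: Sequence[str],
    metric: Callable[[List[dict]], float],
) -> Dict[str, float]:
    """Impact per hazard (hypothetical frame format: {"hazards": {name: severity}, ...})."""
    results: Dict[str, float] = {}
    for hazard in hazards:
        # Split frames into severity subsets; an unlabeled hazard counts as "none" (assumption).
        subsets = {
            s: [f for f in frames if f["hazards"].get(hazard, "none") == s]
            for s in SEVERITIES
        }
        if any(not subset for subset in subsets.values()):
            continue  # impact is undefined if a severity subset is empty
        m = {s: metric(subsets[s]) for s in SEVERITIES}
        denominator = max(m["none"], m["low"])
        if denominator <= 0.0:
            continue  # no usable reference score
        impact = min(m["low"], m["high"]) / denominator - 1.0
        results[hazard] = min(impact, 0.0)  # positive impacts are truncated to zero
    return results


def mean_score(subset: List[dict]) -> float:
    """Hypothetical stand-in metric: mean of a per-frame score."""
    return sum(f["score"] for f in subset) / len(subset)


frames = [
    {"hazards": {"blur": "none"}, "score": 0.8},
    {"hazards": {"blur": "low"}, "score": 0.7},
    {"hazards": {"blur": "high"}, "score": 0.6},
    {"hazards": {}, "score": 0.8},
]
result = impacts_per_hazard(frames, ["blur", "hood"], mean_score)
print({k: round(v, 3) for k, v in result.items()})  # {'blur': -0.25}
```

In this toy data no frame has a low or high severity for "hood", so that hazard is skipped rather than reported.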
More details on evaluation metrics and negative test cases can also be found on the FAQ page.