The overlap between the tracked box and the ground truth is judged by the PASCAL criterion:
\[
\frac{|T^i \cap GT^i|}{|T^i \cup GT^i|} \geq 0.5,
\]
where T^i denotes the tracked bounding box in frame i, and GT^i denotes the ground truth bounding box in frame i. For n_tp, n_fp, n_fn denoting the number of true positives, false positives and false negatives in a video, precision = n_tp/(n_tp + n_fp) and recall = n_tp/(n_tp + n_fn). The F-score combines the two:
\[
F = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.
\]
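As an illustration of the definitions above, here is a minimal Python sketch; it is not the Matlab code distributed with the benchmark. Several details are assumptions made for this example only: the box format `(x, y, width, height)`, the use of `None` for a frame without a box, and the rule that a frame failing the overlap test counts as both a false positive and a false negative. The names `iou` and `f_score` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the intersection rectangle (zero if disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def f_score(tracked, ground_truth, threshold=0.5):
    """F-score of one video from per-frame boxes (None = no box in that frame)."""
    n_tp = n_fp = n_fn = 0
    for t, gt in zip(tracked, ground_truth):
        if t is not None and gt is not None and iou(t, gt) >= threshold:
            n_tp += 1
        else:
            # Assumption: a reported box that fails the test is a false
            # positive, and a present target that is not matched is a
            # false negative (both can apply to the same frame).
            if t is not None:
                n_fp += 1
            if gt is not None:
                n_fn += 1
    precision = n_tp / (n_tp + n_fp) if n_tp + n_fp else 0.0
    recall = n_tp / (n_tp + n_fn) if n_tp + n_fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Calling `f_score(tracked, ground_truth)` with two equally long lists of per-frame boxes returns a value between 0 and 1 for that video.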
Having a large video dataset, we can provide a comprehensive performance evaluation without discussing each video separately, and thus avoid the risk of being trapped in the peculiarities of a single video. We visualize the performance on a set of videos by sorting the videos according to the outcome of the evaluation metric. Sorted this way, the graph gives a bird's eye view, in cumulative rendition, of the quality of the tracker on the whole dataset. Note that the order of the videos differs from tracker to tracker. These plots are referred to as \emph{survival curves}, a term borrowed from medicine, where such curves are used to test the effectiveness of treatments on patients. The survival curve indicates how many sequences the tracker survives, and for what percentage of their length in frames (by the PASCAL 50% overlap criterion). The survival curve is estimated by the Kaplan-Meier estimator.
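The two steps described above can be sketched in Python: sorting the per-video scores of one tracker, and computing a Kaplan-Meier product-limit estimate. This is a rough illustration under stated assumptions, not the benchmark's own implementation. It assumes that each video contributes one duration (for example, the fraction of its frames on which the tracker survives) and that `observed` marks whether the failure was actually seen. The function names are hypothetical.

```python
def survival_order(scores):
    """Per-video scores of one tracker, sorted from best to worst."""
    return sorted(scores, reverse=True)


def kaplan_meier(durations, observed=None):
    """Product-limit estimate of the survival function.

    durations: one value per video (assumed: how long the tracker survives).
    observed:  True where the failure was seen, False where the video is
               censored; defaults to all True.
    Returns a list of (t, S(t)) pairs, one per distinct failure time.
    """
    if observed is None:
        observed = [True] * len(durations)
    event_times = sorted({d for d, e in zip(durations, observed) if e})
    curve = []
    survival = 1.0
    for t in event_times:
        # Videos still "alive" just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Failures occurring exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve
```

When no video is censored, the estimate at `t` reduces to the fraction of videos whose duration is strictly greater than `t`, which is what the sorted-score plot shows.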
To evaluate the relative performance of the nineteen trackers on each of the 315 video sequences objectively, we performed the Grubbs outlier test per video to identify trackers whose performance stands out from the rest. We use the one-sided Grubbs statistic.
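A minimal Python sketch of a one-sided Grubbs test for the best score on one video follows. It uses the standard textbook form of the test, which is assumed rather than confirmed to match the benchmark's Matlab function. The critical value of the t-distribution, `t_crit`, is passed in by the caller (from a table or a statistics package) at level alpha/N with N-2 degrees of freedom. All function names are hypothetical.

```python
import math
import statistics


def grubbs_statistic(scores):
    """One-sided Grubbs statistic for the maximum: (max - mean) / s."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    if sd == 0:
        return 0.0  # all trackers score the same, nothing stands out
    return (max(scores) - mean) / sd


def grubbs_critical(n, t_crit):
    """Critical value for n samples, given the t-distribution critical
    value t_crit at level alpha / n with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("Grubbs test needs at least three samples")
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit**2 / (n - 2 + t_crit**2))


def best_is_outlier(scores, t_crit):
    """True if the best score on this video is an outlier among all scores."""
    return grubbs_statistic(scores) > grubbs_critical(len(scores), t_crit)
```

Here `scores` would hold the scores of all trackers on a single video, so that the test is repeated once per video.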
Matlab functions:
Compute F-score for a video:
Plot survival curves: a survival curve depicts the performance of a tracker on all videos in the dataset. The videos are sorted according to the outcome of the evaluation metric. Sorted this way, the graph gives a bird’s eye view, in cumulative rendition, of the quality of the tracker on the whole dataset. Note that the order of the videos differs from tracker to tracker. The survival curve indicates how many sequences the tracker survives, and for what percentage of their length in frames.
Kaplan-Meier estimator:
Grubbs test:
Plot color charts: a color chart also depicts the performance of a tracker on a selected set of videos. Unlike the survival curve, the color chart keeps the order of the videos within their categories intact. It uses color to indicate the evaluation score, ranging from black (worst) to white (best). By placing the color charts of two trackers next to each other, we can visually compare their performance on individual videos.