by Wu, Song; Troupiotis-Kapeliaris, Alexandros; Zissis, Dimitris; Torp, Kristian; Zimanyi, Esteban; Sakr, Mahmoud
Reference: Proceedings (IEEE International Conference on Mobile Data Management), pages 109-118
Publication: Published, 2024-10-01
Peer-reviewed article
Abstract: Recent advances, especially in deep learning, allow ship targets to be detected effectively in surveillance videos. However, translating these detections into the real-world locations of ships has not been sufficiently explored. The common approach in the literature is to use a transformation matrix to convert a pixel to a real-world coordinate. This approach has three shortcomings: first, a set of reference point pairs has to be prepared manually to establish the matrix; second, the matrix always maps a pixel to the same real-world coordinate, ignoring that there is no one-to-one correspondence between discrete pixel coordinates and continuous real-world coordinates; third, the approach works with only one camera. In light of this, we propose a technique, PixelToRegion, that explicitly takes into account the uncertainty in coordinate conversion by mapping each pixel to a spatial polygon. We then propose a new algorithm, MCbSLE, that estimates ship locations using pixel sets from multiple cameras. MCbSLE improves the precision of location estimation through spatial intersection between polygons from different cameras. Experiments are conducted under 16 carefully designed multi-camera settings to evaluate MCbSLE with respect to four factors: different ports, the number of cameras, the distance between cameras, and camera headings. Results on one day of ship trajectory data show that (1) MCbSLE achieves a 79.8% accuracy in the number of coordinates when there are no more than 10 ships in the camera views; and (2) using multiple cameras can improve the precision of location estimation by one order of magnitude compared with using one camera.
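The idea of mapping a pixel to a polygon and then intersecting polygons from several cameras can be illustrated with a small sketch. This is not the authors' PixelToRegion or MCbSLE implementation, which this record does not describe in detail. It assumes, purely for illustration, that each camera has a known 3x3 homography, that a pixel is a unit square whose four corners are projected to the ground plane, and that the resulting polygons are convex. All function names and the example matrices are hypothetical.

```python
def apply_homography(H, x, y):
    """Project point (x, y) through a 3x3 homography H (nested lists)."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def signed_area(poly):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i - 1]
        x2, y2 = poly[i]
        total += x1 * y2 - x2 * y1
    return 0.5 * total


def pixel_to_polygon(H, u, v):
    """Assumption: pixel (u, v) covers the unit square [u, u+1] x [v, v+1].
    Returns its projected quadrilateral in counter-clockwise order."""
    corners = [(u, v), (u + 1, v), (u + 1, v + 1), (u, v + 1)]
    poly = [apply_homography(H, x, y) for x, y in corners]
    return poly if signed_area(poly) >= 0 else poly[::-1]


def clip_convex(subject, clip):
    """Sutherland-Hodgman clipping: intersection of two convex,
    counter-clockwise polygons. Returns [] if they do not overlap."""
    def inside(p, a, b):
        # True if p lies on the left of (or on) the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of line p-q with line a-b.
        d1 = (q[0] - p[0], q[1] - p[1])
        d2 = (b[0] - a[0], b[1] - a[1])
        denom = d1[0] * d2[1] - d1[1] * d2[0]
        t = ((a[0] - p[0]) * d2[1] - (a[1] - p[1]) * d2[0]) / denom
        return (p[0] + t * d1[0], p[1] + t * d1[1])

    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        if not current:
            break
        for j in range(len(current)):
            p, q = current[j - 1], current[j]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
    return output


def polygon_area(poly):
    """Unsigned area; 0.0 for an empty or degenerate polygon."""
    return abs(signed_area(poly)) if len(poly) >= 3 else 0.0


# Hypothetical homographies for two cameras viewing the same ground plane.
H_A = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
H_B = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]]

region_a = pixel_to_polygon(H_A, 0, 0)      # uncertainty region from camera A
region_b = pixel_to_polygon(H_B, 0, 0)      # uncertainty region from camera B
overlap = clip_convex(region_a, region_b)   # region consistent with both cameras
```

In this toy setup, each single-camera region covers 4 square units while the overlap covers 1, which shows in miniature how intersecting regions from different cameras can narrow a location estimate.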