Abstract: Siamese-network-based trackers convert the general object tracking as a similarity matching task between a template and a search region. Using convolutional feature cross correlation (Xcorr) ...
Abstract: Open-vocabulary semantic segmentation aims to assign pixel-level labels using arbitrary text queries, but existing CLIP-based methods often produce diffuse similarity maps and struggle with ...