Abstract: Deep learning has become a dominant force in artificial intelligence; however, its substantial energy consumption and associated carbon emissions have received relatively little attention.
Aligning objects with corresponding textual descriptions is a fundamental challenge and a realistic requirement in vision-language understanding. While recent multimodal embedding models excel at ...
Abstract: Text-Pedestrian Image Retrieval uses a textual description of a pedestrian's appearance to identify the corresponding pedestrian image. This task involves modality discrepancy and the ...