Object detection in images using natural language prompts has recently become popular due to the extensive development of open-vocabulary perception tasks. We propose a fine-tuning approach, named Reframing, for enhancing open-vocabulary object detection models. The cornerstone of our method is transforming a user's query into a prompt that is better aligned with the requirements of a specific detection model. This transformation is performed by an additional Large Language Model, which undergoes further training. We propose online and offline modes of Reframing, both of which use deep learning techniques and feedback from the detection models to refine the query transformation. We have applied our methodology to various detection models and show that it significantly improves their prediction accuracy. The code and dataset are available at https://github.com/ZoyaV/reframing.
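As a rough illustration of what "feedback from the detection models" could look like, the sketch below scores candidate rewrites of a query by the IoU between the box a detector returns and a reference box, then keeps the best-scoring rewrite. This is an assumption made for illustration only, not the authors' training procedure: the IoU-based score, the `best_reframing` helper, and the `toy_detector` stand-in are all hypothetical, and a real setup would call an actual open-vocabulary detector and use the score to train the language model.

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_reframing(
    candidates: Sequence[str],
    detect: Callable[[str], Box],
    target: Box,
) -> Tuple[str, float]:
    """Return the candidate prompt whose detection best matches the target box.

    `detect` is a hypothetical callable mapping a prompt to one predicted box.
    """
    scored = [(iou(detect(prompt), target), prompt) for prompt in candidates]
    score, prompt = max(scored, key=lambda item: item[0])
    return prompt, score


def toy_detector(prompt: str) -> Box:
    """Hypothetical stand-in for a detector: only reacts to the word 'mug'."""
    if "mug" in prompt:
        return (10.0, 10.0, 50.0, 50.0)
    return (0.0, 0.0, 20.0, 20.0)


target_box: Box = (12.0, 12.0, 50.0, 50.0)
candidate_prompts = ["the thing I drink from", "a ceramic mug on the table"]
chosen_prompt, chosen_score = best_reframing(candidate_prompts, toy_detector, target_box)
print(chosen_prompt, round(chosen_score, 4))
```

In this toy setting the detector-friendly rewrite wins because its box overlaps the reference box far more than the box returned for the vague query; the same kind of score could in principle serve as a feedback signal when tuning the rewriting model.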
DOI: 10.1007/978-3-031-74186-9_11
Avshalumov, M., Volovikova, Z., Yudin, D., Panov, A. (2025). Reframing: Detector-Specific Prompt Tuning for Enhancing Open-Vocabulary Object Detection. In: Hybrid Artificial Intelligent Systems: 19th International Conference, HAIS 2024, Salamanca, Spain, October 9–11, 2024, Proceedings, Part II, pp. 128–140.