Open-world vision applications use computer vision techniques to analyze and understand visual data in dynamic environments. Researchers are developing novel approaches for learning image-region representations, i.e., representations that focus on regions within an image. Such region-level representations offer advantages such as a more comprehensive understanding of visual content, enhanced adaptability, and integration of contextual information. The challenges of learning image-region representations for open-world vision applications include model generalization and data availability. This research proposes learning frameworks for two tasks: Open-Vocabulary Multi-Label Classification (OVML) and Open-Vocabulary Semantic Segmentation (OVS).