Object detection is a fundamental problem in computer vision and plays a critical role in downstream tasks such as video analysis and image captioning, as well as in real-world applications such as embodied vision and autonomous driving. However, existing detection models often generalize poorly and are costly to deploy in real-world settings, where they face challenges such as flexible label spaces, unknown categories, and unseen image domains. To address these limitations, this thesis focuses on advancing object detection in the wild. Specifically, it explores open-set object detection, domain-generalized object detection, and the enrichment of objects using generative models. The aim is to develop methods that enhance the adaptability and robustness of object detection models in diverse and unpredictable environments.