A major challenge that arises in Weakly Supervised Object Detection (WSOD) is that only image-level labels are available, whereas WSOD trains instance-level object detectors. A typical approach to WSOD is to 1) generate a series of region proposals for each image and assign the image-level label to all the proposals in that image; 2) train a classifier using all the proposals; and 3) use the classifier to select proposals with high confidence scores as the positive instances for another round of training. In this way, the image-level labels are iteratively transferred to instance-level labels. We aim to resolve the following two fundamental problems within this paradigm. First, existing proposal generation algorithms are not yet robust, thus the object proposals are often inaccurate. Second, the selected positive instances are sometimes noisy and unreliable, which hinders the training at subsequent iterations. We adopt two separate neural networks, one to focus on each problem, to better utilize the specific characteristic of region proposal refinement and positive instance selection. Further, to leverage the mutual benefits of the two tasks, the two neural networks are jointly trained and reinforced iteratively in a progressive manner, starting with easy and reliable instances and then gradually incorporating difficult ones at a later stage when the selection classifier is more robust. Extensive experiments on the PASCAL VOC dataset show that our method achieves state-of-the-art performance.