WeShap: Weak Supervision Source Evaluation with Shapley Values
Authors: Naiqing Guan, Nick Koudas
Summary: Efficient data annotation is a significant bottleneck in training modern machine learning models. The Programmatic Weak Supervision (PWS) pipeline offers a solution by using multiple weak supervision sources to label data automatically, thereby speeding up the annotation process. Because these weak supervision sources contribute unequally to the accuracy of PWS, a robust and efficient metric is needed to evaluate them. Such a metric is crucial not only for understanding the behavior and performance of the PWS pipeline but also for facilitating corrective measures. In our study, we introduce WeShap values as an evaluation metric that quantifies the average contribution of weak supervision sources within a proxy PWS pipeline, leveraging the theoretical underpinnings of Shapley values. We demonstrate efficient computation of WeShap values using dynamic programming, achieving quadratic computational complexity relative to the number of weak supervision sources. Our experiments demonstrate the versatility of WeShap values across various applications, including the identification of beneficial or detrimental labeling functions, refinement of the PWS pipeline, and rectification of mislabeled data. Furthermore, WeShap values aid in understanding the behavior of the PWS pipeline and in scrutinizing specific instances of mislabeled data. Although WeShap values are initially derived from a specific proxy PWS pipeline, we empirically demonstrate that they generalize to other PWS pipeline configurations. Our findings indicate a noteworthy average improvement of 4.8 points in downstream model accuracy through revision of the PWS pipeline, compared to previous state-of-the-art methods, underscoring the efficacy of WeShap values in enhancing data quality for training machine learning models.
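To make the idea of a source's "average contribution" concrete, the following is a minimal illustrative sketch, not the authors' method. It assumes a plain majority-vote label model as the utility function, which is only a stand-in for the proxy pipeline the summary mentions, and it computes exact Shapley values by brute-force enumeration of subsets. This costs exponential time, in contrast to the quadratic dynamic-programming computation reported above. All names here (`ABSTAIN`, `majority_vote_accuracy`, `shapley_values`, and the toy data) are hypothetical.

```python
from itertools import combinations
from math import factorial

ABSTAIN = -1  # assumed convention: a labeling function (LF) that does not vote


def majority_vote_accuracy(votes, labels, subset, n_classes=2):
    """Expected accuracy of majority vote using only the LFs in `subset`.

    Ties are broken uniformly at random among the tied classes; when no LF
    votes, the prediction is a uniform random guess over all classes.
    """
    total = 0.0
    for row, y in zip(votes, labels):
        counts = {}
        for j in subset:
            if row[j] != ABSTAIN:
                counts[row[j]] = counts.get(row[j], 0) + 1
        if not counts:
            total += 1.0 / n_classes
            continue
        top = max(counts.values())
        winners = [c for c, n in counts.items() if n == top]
        if y in winners:
            total += 1.0 / len(winners)
    return total / len(labels)


def shapley_values(votes, labels, n_classes=2):
    """Exact Shapley value of each LF, enumerating every subset of the others."""
    m = len(votes[0])
    values = [0.0] * m
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):
            # Shapley weight for a coalition of this size that excludes LF j.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                with_j = majority_vote_accuracy(votes, labels, subset + (j,), n_classes)
                without_j = majority_vote_accuracy(votes, labels, subset, n_classes)
                values[j] += weight * (with_j - without_j)
    return values


# Toy data: LF 0 is always right, LF 1 is always wrong, LF 2 always abstains.
toy_labels = [0, 1, 0, 1]
toy_votes = [[y, 1 - y, ABSTAIN] for y in toy_labels]
toy_values = shapley_values(toy_votes, toy_labels)
```

In this toy setting the always-correct source receives a positive value, the always-wrong source a negative one, and the abstaining source exactly zero. This is the kind of signal the summary describes using to identify beneficial or detrimental labeling functions.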