In deep learning, should we abandon the superstition of a 1:1 ratio between positive and negative samples?

2024-04-29

Q: In deep learning, should we abandon the superstition of a 1:1 ratio between positive and negative samples?

A: The principle of a 1:1 positive-to-negative ratio is really about getting the loss to converge. The intuition is an analogy to 1 + (-1) = 0 or x * (1/x) = 1: for a given number of positive (or negative) samples, there should exist a corresponding number of negative (or positive) samples, a kind of reciprocal, such that under the loss the two sides balance out and yield a "unit element" (or "zero element" / convergence factor). So the real task for loss convergence is to uncover the actual quantitative relationship between positive and negative samples; a 1:1 ratio just happens to be the relationship that holds in many simple classification problems. The asker mentions scenarios where positives and negatives are extremely imbalanced, yet people still try to force the data back into a 1:1 ratio. Perhaps the framing of the positive/negative relationship itself is not reasonable there. It may be worth revisiting the initial data representation and checking whether changing how the samples are measured in the loss works better than forcing the ratio.
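
As one concrete illustration of "changing how samples are measured" instead of resampling to 1:1, the sketch below uses a class-weighted loss in PyTorch. This is an assumption on my part, not something the answer prescribes: the imbalance counts, the pos_weight choice, and the toy batch are all illustrative.

```python
import torch
import torch.nn as nn

# Illustrative sketch (assumed setup, not the answer's prescription):
# rather than discarding negatives to force a 1:1 ratio, re-weight the loss
# so each positive sample counts as much as (num_neg / num_pos) negatives.
num_pos, num_neg = 1_000, 99_000                 # assumed extreme imbalance, 1:99
pos_weight = torch.tensor([num_neg / num_pos])   # change the "measurement", keep all data

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Toy batch: raw model outputs (logits) and binary labels.
logits = torch.randn(8, 1)
labels = torch.randint(0, 2, (8, 1)).float()

loss = criterion(logits, labels)
print(loss.item())
```

The design point is that re-weighting keeps every sample and only changes how much each one contributes to the loss, which is often preferable to throwing away negatives just to hit a 1:1 count.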