“Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology”, Paul E. Meehl (1978):

Theories in “soft” areas of psychology lack the cumulative character of scientific knowledge. They tend neither to be refuted nor corroborated, but instead merely fade away as people lose interest. Although intrinsic difficulties of the subject matter (20 are listed) contribute to this, the excessive reliance on significance testing, a poor way of doing science, is partly responsible.

Karl Popper’s approach, with modifications, would be prophylactic. Since the null hypothesis is quasi-always false, tables summarizing research in terms of patterns of “significant differences” are little more than complex, causally uninterpretable outcomes of statistical power functions. Multiple paths to estimating numerical point values (“consistency tests”) are better, even if approximate with rough tolerances; and lacking this, ranges, orderings, second-order differences, curve peaks and valleys, and function forms should be used.
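The power-function point can be illustrated with a small Monte Carlo sketch (the function name and the particular numbers below are my own, purely illustrative). If the null hypothesis is never exactly true, then whether a study earns an asterisk is governed almost entirely by its statistical power, i.e. by sample size:

```python
import math
import random
import statistics

random.seed(0)

def t_significant(n, true_diff, sd=1.0, trials=500):
    """Fraction of two-group studies (n subjects per group) that
    reject H0 at the 5% level when the true mean difference is
    true_diff -- small, but never exactly zero."""
    hits = 0
    for _ in range(trials):
        a = [random.gauss(0.0, sd) for _ in range(n)]
        b = [random.gauss(true_diff, sd) for _ in range(n)]
        # Large-sample two-sample test: |t| > 1.96 corresponds to p < 0.05.
        se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
        if abs((statistics.mean(b) - statistics.mean(a)) / se) > 1.96:
            hits += 1
    return hits / trials

# The same tiny real difference (0.1 standard deviations) throughout;
# only the sample size changes, yet the rate of "significant" results
# climbs steadily: the table of asterisks traces the power function.
for n in (25, 100, 400, 1600):
    print(n, t_significant(n, 0.1))
```

With the effect held fixed, the rejection rate rises from near the nominal 5% to well over half, so a literature table mixing studies of different sizes is mixing outcomes of different power functions, not different truths.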

Such methods are usual in the developed sciences, which seldom report tests of statistical significance. Consistency tests of a conjectural taxometric model yielded 94% success with zero false negatives.
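As a hedged sketch of what such a “consistency test” might look like (the numbers and function names are hypothetical illustrations, not Meehl’s), a theory that risks a numerical point prediction with a stated tolerance can fail in a way that a merely directional null-hypothesis test cannot:

```python
import math

def consistency_test(observed, predicted, tolerance):
    """Popper-style risky test: the theory passes only if the observed
    estimate lands inside a pre-stated numerical tolerance band."""
    return abs(observed - predicted) <= tolerance

def r_significant(r, n):
    """Approximate test of H0: rho = 0 for a sample correlation r,
    via Fisher's z transform (adequate for n not too small)."""
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
    return abs(z) > 1.96

# Hypothetical theory: it predicts a correlation of 0.40, with a rough
# tolerance of +/- 0.10.
print(consistency_test(0.43, 0.40, 0.10))  # True: inside the band
print(consistency_test(0.12, 0.40, 0.10))  # False: nonzero, but wrong

# Yet with n = 500 subjects, r = 0.12 is comfortably "significant":
print(r_significant(0.12, 500))            # True
```

Rejecting the quasi-always-false null counts r = 0.12 as a corroboration; the tolerance test, correctly, does not.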

…I make no claim to bibliographic completeness on the large theme of “What’s wrong with ‘soft’ psychology.” A beautiful hatchet job, which in my opinion should be required reading for all PhD candidates, is by the sociologist Andreski 1972. Perhaps the easiest way to convince yourself is by scanning the literature of soft psychology over the last 30 years and noticing what happens to theories. Most of them suffer the fate that General MacArthur ascribed to old generals—they never die, they just slowly fade away. In the developed sciences, theories tend either to become widely accepted and built into the larger edifice of well-tested human knowledge or else they suffer destruction in the face of recalcitrant facts and are abandoned, perhaps regretfully as a “nice try.” But in fields like personology and social psychology, this seems not to happen. There is a period of enthusiasm about a new theory, a period of attempted application to several fact domains, a period of disillusionment as the negative data come in, a growing bafflement about inconsistent and unreplicable empirical results, multiple resort to ad hoc excuses, and then finally people just sort of lose interest in the thing and pursue other endeavors.

Since I do not want to step on toes lest my propaganda fall on deaf ears, I dare not mention what strike me as the most egregious contemporary examples, so let us go back to the late 1930s and early 1940s when I was a student. In those days we were talking about level of aspiration. You could not pick up a psychological journal—even the Journal of Experimental Psychology—without finding at least one and sometimes several articles on level of aspiration in schizophrenics, or in juvenile delinquents, or in Phi Beta Kappas, or whatever. It was supposed to be a great powerful theoretical construct that would explain all kinds of things about the human mind from psychopathology to politics. What happened to it? Well, I have looked into some of the recent textbooks of general psychology and have found that either they do not mention it at all—the very phrase is missing from the index—or if they do, it gets cursory treatment in a couple of sentences. There is no doubt something to the notion. We all agree (from common sense) that people differ in what they demand or expect of themselves, and that this probably has something to do, sometimes, with their performance. But it did not get integrated into the total nomological network, nor did it get clearly liquidated as a nothing concept. It did not get killed or resurrected or transformed or solidified; it just kind of dried up and blew away, and we no longer wanted to talk about it or do experimental research on it.

…I am not making some nit-picking statistician’s correction. I am saying that the whole business is so radically defective as to be scientifically almost pointless. This is not a technical hassle about whether Fisbee should have used the Varimax rotation, or how he estimated the communalities, or that perhaps some of the higher order interactions that are marginally statistically-significant should have been lumped together as a part of the error term, or that the covariance matrices were not quite homogeneous. I am not a statistician, and I am not making a statistical complaint. I am making a philosophical complaint or, if you prefer, a complaint in the domain of scientific method. I suggest that when a reviewer tries to “make theoretical sense” out of such a table of favorable and adverse statistical-significance test results, what the reviewer is actually engaged in, willy-nilly or unwittingly, is placing meaningless substantive constructions on the properties of the statistical power function, and almost nothing else.

…Well, I am not intimidated by Fisher’s genius, because my complaint is not in the field of mathematical statistics, and as regards inductive logic and philosophy of science, it is well-known that Sir Ronald permitted himself a great deal of dogmatism. I remember my amazement when the late Rudolf Carnap said to me, the first time I met him, “But, of course, on this subject Fisher is just mistaken: surely you must know that.” My statistician friends tell me that it is not clear just how useful the statistical-significance test has been in biological science either, but I set that aside as beyond my competence to discuss.