"[...] many statistical results from scientific studies that showed great significance early in the analysis are less and less robust in later studies. For instance, a pharmaceutical company may release a new drug with great fanfare that showed extremely promising results in clinical trials, and then later, when numbers from its use in the general public trickle back, shows much smaller effects. Or a scientific observation of mate choice in swallows may first show a clear preference for symmetry, but as time passes and more species are examined or the same species is re-examined, the effect seems to fade.
This isn't surprising at all. It's what we expect, and there are many very good reasons for the shift.
Regression to the mean: As the number of data points increases, we expect the average values to regress to the true mean…and since often the initial work is done on the basis of promising early results, we expect more data to even out a fortuitously significant early outcome.
The file drawer effect: Results that are not significant are hard to publish, and end up stashed away in a cabinet. However, as a result becomes established, contrary results become more interesting and publishable.
Investigator bias: It's difficult to maintain scientific dispassion. We'd all love to see our hypotheses validated, so we tend to consciously or unconsciously select reseults that favor our views.
Commercial bias: Drug companies want to make money. They can make money off a placebo if there is some statistical support for it; there is certainly a bias towards exploiting statistical outliers for profit.
Population variance: Success in a well-defined subset of the population may lead to a bit of creep: if the drug helps this group with well-defined symptoms, maybe we should try it on this other group with marginal symptoms. And it doesn't…but those numbers will still be used in estimating its overall efficacy.
Simple chance: This is a hard one to get across to people, I've found. But if something is significant at the p=0.05 level, that still means that 1 in 20 experiments with a completely useless drug will still exhibit a significant effect.
Statistical fishing: I hate this one, and I see it all the time. The planned experiment revealed no significant results, so the data is pored over and any significant correlation is seized upon and published as if it was intended. See previous explanation. If the data set is complex enough, you'll always find a correlation somewhere, purely by chance.
[...] Yes, science is hard. Especially when you are dealing with extremely complex phenomena with multiple variables, it can be extremely difficult to demonstrate the validity of a hypothesis (I detest the word "prove" in science, which we don't do, and we know it; Lehrer should, too). What the decline effect demonstrates, when it occurs, is that just maybe the original hypothesis was wrong." P.Z.Myers, Science is not Dead, blog Pharyngula.