Dr. H. Gilbert Welch is an American physician and cancer screening researcher. As a former professor at the Dartmouth Institute for Health Policy and Clinical Practice, he has published many peer-reviewed papers about the harms of early detection and specifically of cancer screening—the systematic search for cancer before it causes symptoms.
Welch is also a science writer. His first book, published in 2004, was Should I Be Tested for Cancer? Maybe Not and Here’s Why. Welch, along with researchers Lisa Schwartz and Steven Woloshin, wrote Overdiagnosed: Making People Sick in the Pursuit of Health, which deals with screening and other cases where medicine probably causes more harm than good. His latest book, published in 2015, is titled Less Medicine, More Health: 7 Assumptions That Drive Too Much Medical Care.
Welch and I discussed why diagnosing cancer early is not always a good thing.
When we are discussing problems of screening, how can we make the message clear that not all medical care is being criticized?
I am a conventionally trained physician and believe medical care can do a lot of good—particularly for people who are sick and injured. Making a timely diagnosis in people who are sick is really important. What I am worried about is when medical care expands to the population that is well, because it is hard to make a well person better, but it is not that hard to make them worse.
We might involve a thousand people in a screening program for ten years, and one person is helped. This is good, but an important question is: What happened to the other 999? That is where I have been in my career for the past twenty years.
What is the main idea behind screening and its problems?
In the past, doctors waited for problems to develop in a population and then diagnosed and treated that fraction. The idea of screening or early detection is to advance in time the moment of diagnosis in the same population. The assumption behind screening is that the people diagnosed early will be those destined to develop problems.
However, the reality has been different; whenever we look hard for early forms of disease, we find that more people have them. Thus, not all of them will develop problems. Because we do not know who is going to develop problems, we tend to treat all of them. This means we are treating some people for whom the disease would never be a problem. It is the overdiagnosed and needlessly treated fraction that cannot be helped but can be harmed.
Overdiagnosis happens to relatively few individuals. A more common problem of screening is the disease scare: a false positive result. Many individuals require multiple visits and multiple tests before we are sure they don’t have cancer. Patients understand that medications can be harmful, but they cannot imagine how a test could be harmful. They think that it is always good to know, but they do not recognize the cascade of events that a test can trigger. Even a perfectly safe test can lead to a series of events that can harm people.
Finally, to promote screening we need to scare people about the disease (“that’s why you need to be screened”). In other words, we are making everybody more worried about the future. Ironically, part of being healthy is being not too worried about health. Screening is responsible for injecting some “dis-ease” into the population.
What is the effect of screening/early detection on survival statistics?
With more detection, the typical patient now does better. Patients with the disease appear to survive longer. This happens because people who are overdiagnosed or have less severe forms of disease are included in the “disease” group. Screening effects are really misleading: the harder you look, the more you find, and everyone appears to do better. It is related to the popularity paradox of screening: the more overdiagnosis screening causes, the more popular screening becomes.
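The arithmetic behind this survival illusion can be sketched in a few lines. The numbers below are hypothetical and chosen only for illustration; they do not come from the interview or from any study. The sketch assumes that screening adds overdiagnosed cases that never cause death and that it changes nothing for patients with progressive disease.

```python
def five_year_survival(survivors, diagnosed):
    """Fraction of diagnosed patients still alive five years after diagnosis."""
    return survivors / diagnosed


# Hypothetical cohort (illustrative assumptions, not data from the interview).
progressive_cases = 1000    # cancers that would cause problems
progressive_deaths = 600    # deaths within five years, unchanged by screening here
overdiagnosed_cases = 1000  # extra cases found only by screening, never fatal

survival_before = five_year_survival(
    progressive_cases - progressive_deaths,
    progressive_cases,
)
survival_after = five_year_survival(
    progressive_cases - progressive_deaths + overdiagnosed_cases,
    progressive_cases + overdiagnosed_cases,
)

print(f"Five-year survival without screening: {survival_before:.0%}")
print(f"Five-year survival with screening:    {survival_after:.0%}")
print(f"Deaths in both scenarios:             {progressive_deaths}")
```

Under these assumptions, measured survival rises from 40 percent to 70 percent while the number of deaths stays at 600. The improvement comes entirely from adding people to the denominator who were never going to die of the disease.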
What have we learned about cancer progression and its relationship with screening?
Cancer is much more heterogeneous than we thought. Abnormalities that meet the pathological definition of cancer could have very different natural histories; they have variable growth rates.
It has been described as the barnyard pen of cancers. There are three animals in the barnyard: the birds, the rabbits, and the turtles. The goal of screening is to fence them in, to catch them early. However, we cannot catch the birds, because they are already gone. Birds are the most aggressive cancers; they have already spread by the time they are detectable. Screening does not help with those cancers. Sometimes we can treat them, but they are the worst type.
It is possible to catch the rabbits if you build enough fences. The rabbits are the cancers that can be detected earlier and will bother patients. So screening may help in these cases. For screening to be of help, treatment needs to be more effective early than it is late. Sometimes this is not true. In the case of breast cancer, for example, a two-centimeter tumor can be treated as well as a one-centimeter tumor.
Finally, we don’t need any fences for the turtles, because they are not going anywhere. Turtles meet the pathological definition of cancer. However, they are either not growing or growing so slowly that they will never cause problems until the patient dies from something else. Or they are regressing—some cancers start and they disappear; perhaps they are recognized by a well-functioning immune system.
The unfortunate reality is that screening is very good at finding turtles. Doctors are not able to distinguish turtles from rabbits; thus we treat everybody, creating the major harm of early detection: overdiagnosis and overtreatment.
How has screening affected the incidence of prostate cancer?
Note how the incidence of prostate cancer in the United States bounces around (see Figure 1). There is no known tumor biology or carcinogenic process that can explain this graph. It looks more like a financial chart than a cancer incidence chart. And this is not a small-number problem; it is the most common cancer in the database.
The graph can be divided into four phases. It begins in 1975 with the growth of transurethral resection of the prostate (TURP), which at the time was a common prostate surgery done to help men with large prostates. With more pieces of prostates being sent to pathologists, the incidence of prostate cancer slowly increased. The second phase is PSA promotion, when hospitals started to offer free PSA tests, knowing they would make their money back in subsequent blood tests, biopsies, and treatments. Around 1995, the retrenchment era began with urologists recognizing that they should not offer PSA screening for men with less than ten years of life expectancy, since they cannot be helped by screening. Finally, the discouragement phase took place after the U.S. Preventive Services Task Force argued against PSA screening. It is remarkable that the incidence at present is almost the same as in 1975. In other words, this is a scrutiny-dependent cancer. I do not know of a more powerful example of how the health care system affects the apparent amount of cancer.
Among common cancer screening programs (for cervical, colorectal, breast, and prostate cancers), what are their effects on the mortality of those cancers?
We never had a randomized trial of cervical cancer screening; it was implemented before we considered randomized trials. There is a lot of observational data that suggests it is helpful, but it does not explain the 80 percent reduction in cervical cancer mortality. For instance, we have seen an 80 percent reduction in stomach cancer mortality, and it is a cancer that we do not screen for. Colon cancer mortality is also declining, and the fall started before the introduction of screening.
Screening for cervical cancer and colorectal cancer has had some effect on the mortality of those cancers. Breast cancer screening has had only a little effect on breast cancer mortality. The big effect in breast and prostate cancer is better treatment: we learned those cancers are hormonal diseases.
How do you see the risk-to-benefit ratio of those cancer screening programs?
In general, people consider colorectal and cervical cancer screening to be on the side of more benefit than harm. I think this is largely because the problem of cancer overdiagnosis is less evident in those cases. Because they detect precancerous lesions, overdiagnosis takes place at a prior step: dysplastic polyps or cervical dysplasia. In colorectal cancer screening, there are complications from colonoscopy and from polypectomies (e.g., bleeding, perforations). In cervical cancer screening, there are complications from cryotherapy and excisions for precancerous lesions (e.g., bleeding, preterm birth).
Cancer screening has a mix of effects. Most screening, including PSA and mammography, does help a few people but also harms others. This is the conundrum we must be clear about. So, screening is not a public health imperative; it’s a choice.
Screening can distract people from more important things they can be doing for their health. It can also divert resources from other more important interventions. There are two very different aspects to the word prevention. One is health promotion through behavioral advice, such as do not smoke, eat real food, move regularly, and find meaningful relationships. These are not sexy or technological but are very important to health. The other emerged when the prevention movement got medicalized: it became a technological imperative to look for early forms of disease.
We also have to be sensible with the overdiagnosis problem. We have to stop thinking the best test is the one that finds more cancers. Typically, that is how tests are promoted, “This test finds more cancer than that one.” That is not a good test; we are not looking to find more cancers; we want to find a few cancers that matter.
How can we make screening better, for instance to find those cancers we can make a difference in?
This is best exemplified in the case of lung cancer screening. In the United States, lung cancer is the most common cause of cancer death; it is a big problem. There is a really well-defined risk group, which can be identified by a single question: “Do you smoke?” We have a really common cause of death and an easy way to find a high risk group—it is a perfect situation for screening.
Lung cancer was the first cancer studied for screening, and it happened in the 1980s using chest X-ray. The results were terribly disappointing: screening led to more deaths, not fewer. This happened because screening triggered operations, and some died from those operations. The idea of overdiagnosis in lung cancer was crazy, but it happened. Then spiral CT came along. Importantly, the investigators responsible for the spiral CT trial knew about overdiagnosis. What they did was groundbreaking. When the spiral CT found a small lesion that looked worrisome, they did not act and did not biopsy immediately; they waited three months to see whether the lesion was growing. They were making use of the diagnostic value of time. Time provides information both about the genetics of the tumor and the body’s reaction to it. I think that is a step forward.
Everything changes when you move to a genuine high-risk population (recall that regular cigarette smokers are twenty times more likely than non-smokers to die from lung cancer). They are much less likely to be overdiagnosed and much more likely to be helped. But there are not a lot of risk factors as common and powerful as cigarette smoking. Most cancers are sporadic and not the result of some obvious risk factor.
All-cause mortality is not reduced in population-wide cancer screening trials. Can you explain why it matters?
It begins with what counts as a cancer death. In the context of evaluating screening, I want cancer death to include not only deaths from cancer but also deaths due to interventions performed as part of looking for and treating the cancer. That is not what happens. That is why all-cause mortality is important. If we are going to tell people that screening “saves lives,” I would like to know if it changes their risk of death. Otherwise, you are playing a game in which you care more about one type of death than another.
A good example is the classic Minnesota Colon Cancer Control Study. It now has thirty years of follow-up. There are three arms in the study: annual and biennial screening and control group. After thirty years, 2 percent of the annual group and 3 percent of the control group died from colon cancer. This is the benefit: 1 percentage point, or to put it in relative terms, a 33 percent reduction in colon cancer death. However, all-cause mortality was the same in all groups (Figure 2). It is hard to say that is saving lives; it may be trading one form of death for another.
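The relationship between the absolute and relative figures quoted above can be checked with a short calculation. This is a sketch using only the rounded percentages from the interview (2 percent and 3 percent), so the results are approximate. The function name `risk_summary` and the derived "number needed to screen" are illustrative additions, not figures reported in the interview.

```python
def risk_summary(control_risk, screened_risk):
    """Return absolute risk reduction, relative risk reduction,
    and the number needed to screen to avert one death."""
    absolute_reduction = control_risk - screened_risk
    relative_reduction = absolute_reduction / control_risk
    number_needed_to_screen = 1 / absolute_reduction
    return absolute_reduction, relative_reduction, number_needed_to_screen


# Rounded thirty-year colon cancer death risks as quoted in the interview.
arr, rrr, nns = risk_summary(control_risk=0.03, screened_risk=0.02)

print(f"Absolute reduction: {arr * 100:.0f} percentage point")
print(f"Relative reduction: {rrr:.0%}")
print(f"Number needed to screen: about {nns:.0f}")
```

The same one-point difference reads as a 33 percent relative reduction, which corresponds to roughly 100 people screened annually for thirty years per colon cancer death averted. None of these figures says anything about all-cause mortality, which is the comparison the interview emphasizes.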
Because screening benefits are not large and there are harms, what are the reasons for the heavy promotion of screening?
The first is a true belief that early detection must help, as a solution to every bad disease. Money is another part, because screening is a great way to recruit new patients. It is good for Pharma, for test manufacturers, and increasingly good for our hospitals. It is a powerful idea to look for diseases early. If you could argue that everyone should do something, it is a huge market.
What about clinical breast examination and self-breast examination often advertised to women?
The data is clear: clinical breast exams and teaching women to self-examine their breasts do not seem to help. But if a woman becomes aware of a new breast lump, she should have it evaluated. Part of the attention to breast cancer has been good. Ironically, it is possible that screening mammography could be the best way to do the clinical breast exam, if the threshold were set at looking for things 1 cm or bigger. I think a lot of harm from mammography could be reduced if the thresholds for further investigation were much higher.
The general conundrum of screening is we have to involve a whole bunch of people to potentially help a very few. We have to take care not to disturb the rest of them.
How do you see the paper that claimed an increase in advanced cases of prostate cancer after the 2012 USPSTF recommendation against screening?
That report—an increased number of late stages of prostate cancer—was highly flawed. They were only talking about “counts”; they never had a denominator.
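Why a missing denominator matters can be shown with a small calculation. The case counts and population sizes below are hypothetical, picked only to make the point; they are not taken from the report under discussion or from any registry.

```python
def rate_per_100k(cases, population):
    """Incidence expressed as cases per 100,000 people."""
    return cases * 100_000 / population


# Hypothetical figures (illustrative assumptions only).
cases_earlier, population_earlier = 2000, 1_000_000
cases_later, population_later = 2200, 1_100_000

count_change = (cases_later - cases_earlier) / cases_earlier
rate_earlier = rate_per_100k(cases_earlier, population_earlier)
rate_later = rate_per_100k(cases_later, population_later)

print(f"Change in raw count: {count_change:+.0%}")
print(f"Rate earlier: {rate_earlier:.0f} per 100,000")
print(f"Rate later:   {rate_later:.0f} per 100,000")
```

In this example the raw count rises by 10 percent, yet the rate is 200 per 100,000 in both periods because the population at risk grew by the same proportion. A count on its own cannot distinguish a real increase in risk from a larger or older population.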
In the U.S. data so far (Figure 3), the incidence of metastatic prostate cancer at first presentation (the cancer was already metastatic at the moment of diagnosis) continues to stay stable. But I expect it will go up.
What you see is that the implementation of PSA screening really had an effect on that incidence; it almost cut it in half. This is a sign that the bad cancers are being found early. Now it’s been fairly stable, but I wouldn’t be surprised if it goes back up, because PSA screening is going down. But whether that changes death rates is a separate question, because for death rates to change, early treatment must matter.
Notice, in comparison, the incidence of metastatic breast cancer at first presentation never changes; it is pretty stable. Mammography screening has not been able to reduce the amount of breast cancer diagnosed at this very late stage. That’s not the mammographers’ fault; that’s the fault of the aggressive cancers (the birds in the barnyard analogy).
- Shaukat, A., S.J. Mongin, M.S. Geisser, et al. 2013. Long-term mortality after screening for colorectal cancer. New England Journal of Medicine 369(12): 1106–14. doi: 10.1056/NEJMoa1300720.
- Welch, H.G., and O.W. Brawley. 2018. Scrutiny-dependent cancer and self-fulfilling risk factors. Annals of Internal Medicine 168(2): 143–144. doi: 10.7326/M17-2792.
- Welch, H.G., D.H. Gorski, and P.C. Albertsen. 2015. Trends in metastatic breast and prostate cancer—lessons in cancer dynamics. New England Journal of Medicine 373: 1685–1687. doi: 10.1056/NEJMp1510443.