A simple explanation of a common concept in epidemiology

In our introduction to epidemiology we explain how most epidemiological studies compare one group with another.  But sometimes the starting point is not a systematic study designed to test a hypothesis in this way.  Instead, it is an observation of an unexpected group of cases of the same disease, all in the same place or all at the same time.  This is called a "cluster".

Clusters can occur in space, in time, or often both.  Several cases of a rare cancer in one village over just a few years, for instance, would set alarm bells ringing and trigger an investigation of this apparent cluster.  A cluster will probably attract even more attention if it is centred on an obvious candidate source of the disease.  Clusters have been reported round incinerators, radio masts, nuclear power stations, railways, power lines, and many other things besides.

When is a cluster not a cluster?

The human mind is very good at fitting observations into patterns.  When we see those cases apparently clustered together, we are bound to think we've seen a pattern.  Professional epidemiologists try to be more systematic.  They will ask the following questions:

Are the cases all defined the same way?

Are they all from the same geographical area, and was that area properly defined?

And crucially, just how unlikely is this clustering of cases?  Four cases of one rare type of cancer in a single village may look very much out of the ordinary.  But suppose there are four hundred cases a year of that cancer nationally.  There are also an awful lot of villages nationally.  Just by chance, many villages will have no cases at all, but, again just by chance, some will have four or even more.  Similarly, several adults on the same street getting cancer within a couple of years may strike the residents as a cluster.  But cancer is sadly common - one in three adults will get it at some point - so there are bound to be some streets where, purely by chance, several adults are diagnosed close together in time.  Epidemiologists use formal statistical techniques to work out just how unlikely any given cluster is, taking into account how many places it could have turned up in.
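To make the "just how unlikely" question concrete, here is a minimal Python sketch of one common approach: treating the number of cases in a village as a Poisson count. Only the figure of 400 cases a year comes from the text above; the national population, village size and length of the observation window are made-up numbers for illustration, the helper name `poisson_tail` is hypothetical, and pretending the whole country lives in equal-sized villages is a deliberate simplification. The sketch asks two questions: how likely is one particular village to see four or more cases by chance, and how many villages nationally would we expect to see that many?

```python
import math

def poisson_tail(k: int, lam: float) -> float:
    """Probability of seeing k or more cases when lam are expected (Poisson model)."""
    term = math.exp(-lam)      # P(X = 0)
    below_k = 0.0
    for i in range(k):
        below_k += term        # add P(X = i)
        term *= lam / (i + 1)  # step from P(X = i) to P(X = i + 1)
    return max(0.0, 1.0 - below_k)

# Hypothetical inputs: only the 400 cases a year is taken from the text.
national_cases_per_year = 400
national_population = 60_000_000   # assumed
village_population = 5_000         # assumed
years = 10                         # assumed observation window
observed_cases = 4

# Cases we would expect in one village if the national rate applied everywhere
expected = national_cases_per_year * years * village_population / national_population
p_one_village = poisson_tail(observed_cases, expected)

# Crude simplification: the whole population lives in equal-sized villages
n_villages = national_population // village_population
expected_villages = n_villages * p_one_village

print(f"Expected cases per village: {expected:.3f}")
print(f"P({observed_cases}+ cases in one given village): {p_one_village:.1e}")
print(f"Villages expected to reach {observed_cases}+ by chance: {expected_villages:.1f}")
```

With these particular made-up inputs, four cases in one given village looks very unlikely, yet a handful of villages somewhere in the country would be expected to reach that number by chance alone. Real analyses use actual population counts and more careful methods, but the two-step logic is similar.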

Pitfalls of cluster analysis

There is always a risk of fitting the definition to suit the observation.  That can happen both with defining the disease and with defining the extent of the cluster.

Suppose our first two cases were of childhood leukaemia.  We started off thinking we were looking at a cluster of leukaemia.  Then we heard about another case, this time a childhood brain tumour.  We might be tempted to change our definition and say that the cluster is now all childhood cancers.  Of course, the observation itself is still valid.  But the statistical weight it carries is lessened if we don't have a clear and consistent definition of what's included, decided before we looked at the cases.
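The cost of changing the case definition after the fact can be illustrated with a small simulation. This Python sketch assumes, purely for illustration, villages where no real cause is at work and where the expected numbers of childhood leukaemias and brain tumours over the study period are 0.3 and 0.2 (invented figures); "all childhood cancers" is simplified to just those two types, and the function names are hypothetical. One analyst fixes the definition (leukaemia) in advance; another looks at the data and picks whichever grouping looks most unusual. Both raise the alarm when the chance of seeing at least that many cases is below 5%.

```python
import math
import random

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term, below_k = math.exp(-lam), 0.0
    for i in range(k):
        below_k += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - below_k)

def poisson_draw(lam, rng):
    """Random Poisson count (Knuth's method; adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

# Hypothetical expected counts per village over the study period (no real cause at work)
EXPECTED = {"leukaemia": 0.3, "brain tumour": 0.2}
ALPHA = 0.05

def false_alarm_rates(trials=50_000, seed=1):
    rng = random.Random(seed)
    fixed_alarms = shopped_alarms = 0
    for _ in range(trials):
        counts = {name: poisson_draw(lam, rng) for name, lam in EXPECTED.items()}
        # Definition fixed in advance: leukaemia only
        p_fixed = poisson_tail(counts["leukaemia"], EXPECTED["leukaemia"])
        # Definition chosen after looking: whichever grouping looks most unusual
        p_shopped = min(
            p_fixed,
            poisson_tail(counts["brain tumour"], EXPECTED["brain tumour"]),
            poisson_tail(sum(counts.values()), sum(EXPECTED.values())),
        )
        fixed_alarms += p_fixed < ALPHA
        shopped_alarms += p_shopped < ALPHA
    return fixed_alarms / trials, shopped_alarms / trials

fixed_rate, shopped_rate = false_alarm_rates()
print(f"False alarms, definition fixed in advance: {fixed_rate:.3f}")
print(f"False alarms, definition chosen afterwards: {shopped_rate:.3f}")
```

Because the second analyst effectively gets several attempts at the same data, they raise false alarms more often than the first, even though every simulated village is, by construction, entirely normal.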

Similarly, it is only natural to draw an area on the map that encloses the cases we have observed, and to treat that boundary as the extent of the cluster.  But that is fitting the definition to maximise the apparent strength of the evidence.  The cluster will carry more weight if it fits a definition that existed before the cases were observed, such as an existing town or village boundary.
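Drawing the boundary round the cases can be illustrated the same way. This Python sketch assumes, for illustration only, a strip of 100 neighbouring small areas of equal population with 50 cases scattered among them completely at random, so there is no real cluster anywhere; the one-dimensional strip is a simplification of a real map, and the function name `boundary_counts` is hypothetical. A boundary fixed in advance (the first 10 areas, standing in for an existing town boundary) is compared with a boundary drawn afterwards round whichever 10 neighbouring areas happen to hold the most cases.

```python
import random

def boundary_counts(n_areas=100, n_cases=50, width=10, trials=2_000, seed=2):
    """Average case counts inside a pre-set boundary vs a boundary drawn after looking."""
    rng = random.Random(seed)
    fixed_total = drawn_total = 0
    for _ in range(trials):
        counts = [0] * n_areas
        for _ in range(n_cases):
            counts[rng.randrange(n_areas)] += 1   # every area equally likely: no real cluster
        # Cases inside every run of `width` neighbouring areas
        windows = [sum(counts[i:i + width]) for i in range(n_areas - width + 1)]
        fixed_total += windows[0]      # boundary chosen before the cases were seen
        drawn_total += max(windows)    # boundary drawn round the densest group of cases
    return fixed_total / trials, drawn_total / trials

expected = 50 * 10 / 100   # cases expected inside any 10-area boundary
fixed_avg, drawn_avg = boundary_counts()
print(f"Expected inside any boundary: {expected:.1f}")
print(f"Average inside pre-set boundary: {fixed_avg:.1f}")
print(f"Average inside boundary drawn afterwards: {drawn_avg:.1f}")
```

The pre-set boundary contains about the number of cases you would expect, while the boundary drawn afterwards consistently contains more, purely because it was drawn to fit them.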

Can we learn anything from clusters?

Many groupings of cases that strike the people who first notice them as a cluster don't stand up to rigorous scrutiny.  Either they depend on definitions of the cases or the area that were chosen to fit the evidence, or they turn out not to be statistically unusual on a national scale, however remarkable they may look in isolation.

But many clusters have proved genuine and have helped uncover causes of disease.  This is particularly true for occupational disease: scrotal cancer among chimney sweeps in 18th-century London, osteosarcoma among female radium watch dial painters in the 20th century, skin cancer in farmers, bladder cancer in dye workers exposed to aniline compounds, and leukaemia and lymphoma in chemical workers exposed to benzene are all examples of such clusters.