How to search for a needle in a haystack

At the Large Hadron Collider at CERN in Geneva, two beams of protons shoot in opposite directions around a 27km long ring, crossing paths at several points inside massive detectors. When they collide (with roughly 1 billion collisions per second), the energy inside the detector results in the production of other particles. The mountains of data collected by those detectors is evaluated by physicists working with the experiments at CERN, searching for signs of new physics (and making sure we understand the physics we’ve already seen).

Let’s briefly outline what happens inside the detector. Most particles in the Standard Model are unstable, which means they decay into other particles very quickly. Only a few stick around for long enough for us to measure them in the detectors; the electron, muon, and photon leave distinctive tracks, as do some simple bound states called hadrons. Neutrinos travel straight through a detector, exiting it without leaving any trace. Detectors are therefore set up to measure these more stable particles, and it is the job of the physicist to trace where the final state particles may have originated from. Let’s illustrate this with a simple Feynman diagram.

Feyn

These diagrams, invented by Richard Feynman, are the bread and butter of particle physics. They are read from left to right in time (the arrows indicate the flow of charge, rather than movement through space); in this diagram, two quarks (which come from the protons) interact at a point, and a Z boson or virtual photon is produced. A (very short) while later, the Z or photon decays into two leptons (let’s imagine it’s two electrons). This all happens in the empty space in the middle of the detector, and the electrons then travel outwards and can be detected. Using precise measurements of the electrons’ momenta, tracks, and angular properties, physicists can guess where the two electrons they measure have come from. Gathering many events where this process occurs and summing the energies of these leptons, we get a plot showing the mass of the Z (91.2 GeV).

Z

In order to search for a new particle, one should therefore decide what the final state of the search should be. That is, to what will this new particle decay? Hypothetically, one could then scan through the data, searching for every event where that final state was identified (say, a pair of leptons as above), add them all together. The problem is that there may be many processes with the same final state, which don’t have anything to do with the particle you’re looking for. These processes are called background, and must be well understood.

An example plan of attack

In order to convince the experimentalists to look for our models, theorists and model builders use simulation tools to replicate exactly what happens at the LHC, including the signal process, background process(es), and detector effects, to outline how a search could be constructed. For illustrative purposes, I’d like to use a paper we published last year where we used simulation to show how we could look for a new particle, called ‘a’, at the future collider (called, imaginatively, the Future Circular Collider, or FCC) which will hopefully be built soon (and perhaps at that point require a name change), which will collide electrons rather than protons.

In the first two sections of the paper, we outline the theory. Models should always have predictive power so that they can be testable. The trusty Feynman diagram appears in section 3, describing the process we’re interested in: two electrons collide, a Z or photon is produced, it radiates an ‘a’ (our target!) and decays to two fermions. In fact, the a will then also decay (in our model, to two taus). This all happens before anything crosses the detector. The final state to be searched for is then two fermions (we looked for electrons or muons) and two taus.

myfeyn

The background process looks like this, where two fermions are taus, and two are electrons or muons:

background

Notice the absence of the a, but that the final state looks identical. The task is to identify the background process and throw it away, while conserving signal. We tried this using only properties of the final states (the solid lines are variations on our signal, and the dashed red is the background), where different behaviour is observed:

mods

But in the end, we turned to machine learning, a topic which needs its own blog post, to help us sort the signal from the background. Results are often given in sigma, or standard deviation of a normal distribution, where 3 sigma is required as ‘evidence’ and 5 sigma is required for discovery. Here’s a nice article about it. The sigma represents the strength of a potential discovery; for instance, for a discovery quoted with a confidence level of 5 sigma, if the null hypothesis is assumed there’s approximately a 1 in 3.5 million chance of measuring the data as is.

At the end of the day, many particle physics searches are done simply to exclude regions of parameter space, or to make even more precise measurements of parameters of interest. The latter is the category of the recently announced result where experimentalists may have detected a hint of new physics. The results come with a confidence limit of 3 sigma, so for now we wait to see if they can make it to 5 (3 sigma results have come and gone in the past).

At its core, the experiments at the LHC and other colliders probe the fundamental laws of nature, producing enormous amounts of data which have to be sifted through to find the needle lying deep in the middle of a haystack. Except in this case, there are hundreds of other needles also in the haystack, and you’d better be sure you’ve found the right one.