Case studies:
Signal Identification in Particle Physics


Experiments in particle physics face the problem of identifying elementary particles produced at frontier energy colliders. Typical colliders have millions of channels of electronics, producing terabytes of data per second. These data are analyzed in real time and reduced to a few terabytes per day, which are stored for later analysis. Of the billion particle collisions occurring each second, only a few are of interest. Finding these interesting (but possibly unanticipated) collisions in such a massive data stream represents a challenging test of forefront technology and computational power.



  • The Noesis methodology follows an unconventional research direction. The goal is not to maximize the accuracy of a predictive model, but to provide enough evidence to believe that the signal event occurred multiple times during a photon-proton interaction. This is tantamount to searching for an event buried in tons of noisy data.


  • This study exploits physical constraints obtained from experts in particle physics. Our methodology uses these constraints to enhance the original set of features for classification.
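The text does not say which physical constraints were used, so the following is only a minimal sketch of the general idea of adding physics-derived features to the original ones. It assumes, purely for illustration, that each event carries the four-momenta of two particles, and it derives two standard kinematic quantities (invariant mass and summed transverse momentum). The function names and the event layout are hypothetical, not the study's actual feature set.

```python
import math

def invariant_mass(p1, p2):
    """Invariant mass of a two-particle system.

    Each argument is a four-momentum (E, px, py, pz) in natural units (c = 1).
    """
    e = p1[0] + p2[0]
    px = p1[1] + p2[1]
    py = p1[2] + p2[2]
    pz = p1[3] + p2[3]
    m_squared = e * e - (px * px + py * py + pz * pz)
    # Rounding can push m^2 slightly below zero for massless systems.
    return math.sqrt(max(m_squared, 0.0))

def transverse_momentum(p):
    """Momentum component perpendicular to the beam (z) axis."""
    return math.hypot(p[1], p[2])

def enhance_features(p1, p2):
    """Original kinematic features followed by two physics-derived ones."""
    return list(p1) + list(p2) + [
        invariant_mass(p1, p2),
        transverse_momentum(p1) + transverse_momentum(p2),
    ]

# Hypothetical event: two back-to-back massless particles along the beam axis.
features = enhance_features((1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, -1.0))
print(features)
```

The classifier then sees the eight raw components plus the derived quantities, which encode relationships that it would otherwise have to learn from the data alone.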



We used machine learning techniques for classification. We first trained a predictive model using simulated data (Monte Carlo simulations) to obtain a model in a controlled setting. We then tested the model on real data using several classification techniques. We found that one technique in particular, random forests, can detect signal events with 91% confidence.
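The train-on-simulation, apply-to-new-data workflow can be sketched in a few lines. This is not the study's implementation: it is a deliberately simplified random forest whose trees are depth-1 stumps, written with the standard library only, and the toy event generator, its distributions, and all names are assumptions made for illustration. A real analysis would more likely use a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random

def make_events(n, rng):
    """Toy events: label 1 = signal, 0 = background (hypothetical distributions)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        shift = 1.0 if label else -1.0
        X.append([
            rng.gauss(shift, 0.5),        # strongly discriminating feature
            rng.gauss(0.8 * shift, 0.5),  # weaker discriminating feature
            rng.gauss(0.0, 1.0),          # pure noise
        ])
        y.append(label)
    return X, y

def predict_stump(stump, row):
    feature, threshold, sign = stump
    return 1 if sign * (row[feature] - threshold) > 0 else 0

def fit_stump(X, y, features):
    """Depth-1 tree: best (feature, threshold, sign) by training error."""
    best_errors, best_stump = None, None
    for f in features:
        values = sorted(set(row[f] for row in X))
        # Candidate cuts: below the minimum, then midpoints between neighbours.
        cuts = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for cut in cuts:
            for sign in (1, -1):
                stump = (f, cut, sign)
                errors = sum(predict_stump(stump, r) != t for r, t in zip(X, y))
                if best_errors is None or errors < best_errors:
                    best_errors, best_stump = errors, stump
    return best_stump

def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Bagged stumps, each restricted to a random subset of the features."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap sample
        feats = rng.sample(range(n_features), max_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def signal_score(forest, row):
    """Fraction of trees voting 'signal'."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)

rng = random.Random(42)
X_sim, y_sim = make_events(120, rng)  # stands in for Monte Carlo simulation
X_new, y_new = make_events(300, rng)  # stands in for data the model has not seen
forest = fit_forest(X_sim, y_sim)
predictions = [1 if signal_score(forest, row) > 0.5 else 0 for row in X_new]
accuracy = sum(p == t for p, t in zip(predictions, y_new)) / len(y_new)
print(f"held-out accuracy: {accuracy:.2f}")
```

In this toy setting the held-out sample is labeled, so accuracy can be computed directly. Real collider data carry no labels, which is why the per-event signal score, rather than accuracy, is the quantity passed on to later analysis.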


Conclusions and Benefits

The Noesis methodology is instrumental in case studies where the goal is to find rare signal events buried in an enormous amount of background data. Our methodology successfully identifies those rare signal events by using predictive models induced from training data and physical constraints.


In addition, our approach removes the need to form decisions after a bump has been observed protruding above the background distribution; this is important because it avoids the risk of finding patterns that stem merely from random sampling. Instead, we use statistical tests to indicate when the presence of a signal is likely.
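The text does not specify which statistical tests were used. As one hedged illustration of the idea, the sketch below applies a one-sided binomial test: given a classifier whose false-positive rate on background has been estimated from simulation, it asks how probable it is that background alone would produce at least as many signal-tagged events as were observed. The numbers and names are hypothetical.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical inputs, fixed before looking at the data.
n_events = 500          # events passed to the classifier
background_rate = 0.02  # assumed false-positive rate, estimated from simulation
observed = 25           # events the classifier tagged as signal

# Probability of seeing this many tagged events if there were no signal at all.
p_value = binomial_tail(observed, n_events, background_rate)
print(f"p-value under background-only hypothesis: {p_value:.2e}")
```

Because the selection and the test are fixed in advance, a small p-value is evidence for repeated signal occurrences rather than the result of searching a distribution for whichever fluctuation happens to look like a bump.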


This technology can be extremely useful in scenarios where a particular event of interest is buried in tons of background data.