Elusive Black Swans – New Ways to Detect Fraud

In the 17th century, it was commonly believed in Europe that all swans were white. This belief changed when an explorer discovered a black swan, a seemingly impossible occurrence.* The term “Black Swan” is now used to describe an outlier event: one that occurs very rarely but has an extreme impact.

Why is the term Black Swan in the spotlight today, and how can we hunt for black swans?

People across the world are calling the coronavirus pandemic the “Black Swan of 2020”. The COVID-19 era is fuelling a rise in fraudulent activity across sectors, especially banking, financial services, and insurance.

Taking advantage of the latest technologies to execute phishing attacks, fraudsters are finding loopholes in organizations’ defences and pushing through payments that look (almost) genuine at face value.

The biggest question that remains unanswered for many is: how do you hunt for black swans? Imagine searching at night for a black swan in a flock of millions of ordinary swans. In the dark, it is indistinguishable from the rest of the flock. It isn’t as easy as we think, right?

We need to build fraud detection mechanisms, and in this post we look specifically at how “anomaly detection techniques” can be used effectively to detect and prevent such black swans.

Growing need for fraud detection in the Banking, Financial Services and Insurance (BFSI) sectors

A recent report from the Reserve Bank of India (RBI) shows a year-on-year growth of 74% in fraud cases in 2018-19. Comparing just the first half of fiscal year 2019-20, fraud value increased by over 50%.**

This is not an India-centric problem but a global one: frauds are on the increase everywhere. So, let us look at the reasons behind this trend:

Changing attributes of transactions in recent years

It has become quite convenient to transact large amounts from the comfort of your home
The volume and velocity of transactions have been growing
Sending money to offshore locations is quite easy nowadays

Ease of transactions
Today, individuals can transact with each other through multiple payment channels such as IMPS, UPI and mobile wallets. While this brings more people into the financial network, it also increases the opportunity for fraud. For example, apps that can read the SMS messages on your mobile phone increase the potential to commit fraud.

Change in the nature of frauds
As the nature of fraud changes and new variables become available for detecting it, the older models and rule engines used by organizations are fast becoming obsolete. It is imperative to revisit your models and add newer data sources to augment them.

COVID era fuelling a rise in frauds
It has generally been observed that frauds rise after an economic recession. A recession means slowing businesses, people changing jobs and reskilling, and then a burst of economic activity once recovery begins. People hit hard by these circumstances might find it easier to resort to unscrupulous methods to cheat the system.

How can machine learning algorithms be used to detect fraud?

Banking and financial services companies generate large amounts of real-time data across their processes. Because this data includes a large number of variables, we often have to deal with high-dimensional data.

Analysis of high-dimensional data often suffers from the curse of dimensionality and from complicated correlations among dimensions. Dimension reduction methods are often used to alleviate these problems. However, existing distance-based approaches built on dimension reduction usually just apply conventional outlier detection methods to the reduced data. This can degrade outlier detection performance, because the reduced data retains only part of the information in the original data.

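For distance-based methods, high-dimensional space is a problem: every point becomes almost equally sparse, with its nearest and farthest neighbours at similar distances, which renders distance a poor measure of how unusual a point is. Instead, we can use Isolation Forest (iForest), which does not rely on distances and has low processing time even on high-dimensional data.^[1]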
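The “equally sparse” effect can be checked numerically. The sketch below is an illustration under our own assumptions (points drawn uniformly at random from the unit hypercube, and “relative contrast” (farthest minus nearest distance, divided by nearest distance) chosen by us as the measure; neither comes from the cited work). As the number of dimensions grows, the contrast shrinks, so “near” and “far” stop meaning much.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max_dist - min_dist) / min_dist from one random query point to
    n_points points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Contrast is large in 2 dimensions and small in 1000 dimensions.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(500, dim):.3f}")
```

With a low contrast, a distance threshold that separates outliers from normal points becomes hard to set, which is one motivation for methods such as iForest that isolate points with random splits instead.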

In addition, deep learning techniques such as neural network autoencoders have proved effective at detecting outliers in high-dimensional data.^[2]

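iForest builds an ensemble of random trees. Each tree repeatedly picks a random feature and a random split value between that feature’s minimum and maximum, and the algorithm records the path length needed to isolate each point in a leaf node. Because anomalies are few and different, they tend to be isolated after fewer splits than normal points. The output of iForest is an anomaly score for each observation, computed as a function of its average path length across the trees. In the original formulation this score lies between 0 and 1, with values close to 1 indicating anomalies; implementations such as scikit-learn report a negated, shifted version in which a more negative score indicates a more anomalous point in the dataset.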
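To make the path-length idea concrete, here is a minimal from-scratch sketch in plain Python. It follows the scoring of the original iForest paper, s(x) = 2^(-E[h(x)] / c(ψ)), where E[h(x)] is the average path length and c(ψ) normalises by the expected path length for a subsample of size ψ. The class and function names (`IsolationForestSketch`, `build_tree`, `path_length`) are hypothetical, chosen for this illustration; it is not the implementation of any particular library. For production use, a maintained implementation such as scikit-learn’s `IsolationForest` is the safer choice.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree search over
    n points; used to normalise path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value in [min, max)."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    low, high = min(values), max(values)
    if low == high:  # a constant feature cannot be split; treat as a leaf
        return {"size": len(points)}
    split = rng.uniform(low, high)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([p for p in points if p[feature] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[feature] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Edges traversed to reach x's leaf, plus c(size) for points left unsplit."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


class IsolationForestSketch:
    def __init__(self, n_trees=100, sample_size=256, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.seed = seed

    def fit(self, data):
        if len(data) < 2:
            raise ValueError("need at least two points")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit from the paper
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x):
        """Anomaly score in (0, 1]; values close to 1 suggest an anomaly."""
        mean_path = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_path / c(self.psi))


if __name__ == "__main__":
    rng = random.Random(42)
    # 200 hypothetical "normal" 2-D records plus one planted outlier.
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + [(6.0, 6.0)]
    forest = IsolationForestSketch().fit(data)
    print("outlier:", round(forest.score((6.0, 6.0)), 3))
    print("inlier: ", round(forest.score((0.0, 0.0)), 3))
```

The planted outlier sits far from the cloud, so random splits cut it off after only a few levels and its score comes out noticeably higher than that of a central point.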

A neural network autoencoder is trained to reproduce its input at its output. The input usually passes through a narrow hidden layer (a bottleneck), so over many training iterations the network learns, through its non-linear computations, a compressed representation of the data. Because anomalies are very rare in the dataset, the network mostly learns how to reconstruct normal points. Hence, when we calculate the reconstruction error of each observation, anomalies show a higher reconstruction error than normal points.
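The reconstruction-error idea can be sketched without any deep learning framework. The example below is a deliberate simplification and an assumption on our part: a tiny linear autoencoder (2 inputs, a 1-unit bottleneck, 2 outputs, no activation function) trained with plain stochastic gradient descent on squared error. Real fraud models would use non-linear layers and many more features. The “normal” data (points near the line y = 2x), the anomaly, and all function names are hypothetical.

```python
import random


def make_normal_data(n, seed=1):
    """Hypothetical 'normal' records: points lying close to the line y = 2x."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        points.append((t, 2.0 * t + rng.gauss(0.0, 0.05)))
    return points


def train_linear_autoencoder(data, epochs=300, lr=0.01, seed=0):
    """Train a 2 -> 1 -> 2 linear autoencoder with SGD on squared error.
    Returns a function that maps a point to its reconstruction."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(2)]  # encoder weights
    b1 = 0.0                                        # encoder bias
    v = [rng.uniform(-0.5, 0.5) for _ in range(2)]  # decoder weights
    b2 = [0.0, 0.0]                                 # decoder biases
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            h = w[0] * x[0] + w[1] * x[1] + b1        # 1-unit bottleneck code
            y = [v[j] * h + b2[j] for j in range(2)]  # reconstruction
            err = [y[j] - x[j] for j in range(2)]
            # Gradient of sum_j err_j**2 w.r.t. h (computed before v changes).
            grad_h = 2.0 * sum(err[j] * v[j] for j in range(2))
            for j in range(2):
                v[j] -= lr * 2.0 * err[j] * h
                b2[j] -= lr * 2.0 * err[j]
            for i in range(2):
                w[i] -= lr * grad_h * x[i]
            b1 -= lr * grad_h

    def reconstruct(x):
        h = w[0] * x[0] + w[1] * x[1] + b1
        return [v[j] * h + b2[j] for j in range(2)]

    return reconstruct


def reconstruction_error(reconstruct, x):
    """Squared Euclidean distance between a point and its reconstruction."""
    y = reconstruct(x)
    return sum((y[j] - x[j]) ** 2 for j in range(2))


if __name__ == "__main__":
    normal = make_normal_data(200)
    model = train_linear_autoencoder(normal)
    worst_normal = max(reconstruction_error(model, p) for p in normal)
    print(f"worst normal error: {worst_normal:.4f}")
    print(f"anomaly error:      {reconstruction_error(model, (0.5, -1.0)):.4f}")
```

Because the bottleneck has a single unit, the network can only reconstruct points along the direction it learned from the normal data; the off-line anomaly (0.5, -1.0) is reconstructed poorly, so its error stands out. With a linear bottleneck this learns essentially the same subspace as principal component analysis; the non-linear layers of a real autoencoder let it capture curved structure as well.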

To put it in simpler words (although there is a lot of statistics behind it), one can view the isolation forest algorithm as multi-dimensional outlier detection. Detecting an outlier in a single variable, say age, is easy: any value greater than 110 might be labelled an outlier, and so might a spurious entry such as a value less than 0. But a fraud dataset might have hundreds of such features, and it would be humanly impossible to define outlier rules for so many dimensions and their combinations. This is where machine learning algorithms such as isolation forests and autoencoders come to our rescue. An autoencoder, for example, learns the most prominent features of your dataset; once it is trained, an anomalous input does not fit those features and is reconstructed poorly, which helps identify black swan events such as frauds.
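The age example can be written as a hand-coded range rule, and the sketch below shows both its appeal and its blind spot. The feature names and ranges in `RULES` (including `years_with_bank`) are hypothetical, picked only for illustration. Each rule checks one feature at a time, so a record whose individual values are all plausible but whose combination is not slips through.

```python
# Hypothetical per-feature validity ranges, in the spirit of the age example.
RULES = {
    "age": (0, 110),
    "years_with_bank": (0, 90),
}


def univariate_outliers(record: dict) -> list:
    """Return the names of features whose value falls outside its allowed range."""
    flagged = []
    for feature, (low, high) in RULES.items():
        value = record[feature]
        if value < low or value > high:
            flagged.append(feature)
    return flagged


print(univariate_outliers({"age": 130, "years_with_bank": 5}))  # ['age']
print(univariate_outliers({"age": -3, "years_with_bank": 2}))   # ['age']
# Every value is within range, yet the combination is implausible: the
# customer would have opened an account at age 4. No single-feature rule
# flags it, and writing rules for every pair of hundreds of features
# quickly becomes unmanageable.
print(univariate_outliers({"age": 19, "years_with_bank": 15}))  # []
```

Multi-dimensional methods such as isolation forests and autoencoders look at all features jointly, which is what lets them catch combinations like the last record without anyone writing a rule for it.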

Finding black swans is not a task well suited to present-day fraud detection solutions and their consortium models. While no solution can guarantee protection from fraudsters, you want to use the best tools and solutions available in the hunt. Look out for our upcoming follow-up post in this series to learn how TransOrg Analytics (www.transorg.com) helps you understand the abnormal behaviour of your swans so that you can spot the black swans in your flock.

You can reach out to us at info@transorg.com to know more about our Fraud Analytics solutions.

*Source: https://energyeducation.ca/encyclopedia/Black_swan_theory
**Source: https://economictimes.indiatimes.com/industry/banking/finance/banking/