Copula-based Approaches for Anomaly Detection: a Case-study in Financial Forensics
Tenconi, Vanessa; Tamburri, Damian Andrew; Ennio Quattrocchi, Giovanni
2024-01-01
Abstract
Fraud detection is a critical challenge in many domains and requires accurate, reliable methods to distinguish legitimate from fraudulent transactions. This work explores copula-based models for anomaly detection in financial forensics, focusing on how effectively they identify fraudulent activity in a highly imbalanced dataset. Copula models capture the dependencies between continuous variables and provide a flexible framework for modeling the joint distribution of features. For instance, the variables in a dataset can each follow a different marginal distribution, and the copula models how they behave jointly, particularly in extreme cases. We use a fraud dataset to compute the copula-based probability of fraud and the conditional Gaussian copula. We then derive the copula-based Generalized Linear Model (GLM) formula from the conditional copula; when the covariates are continuous, it is essentially a GLM with a probit link and transformed variables. Finally, we compare the performance of these copula-based models with standard methods for predicting a binary variable, such as a GLM with a probit link and logistic regression. Results indicate that the copula-based probability and conditional copula formulas offer promising results, particularly in handling complex dependencies, but at a high computational cost, while the copula-based GLM, when combined with over-sampling, also outperforms the traditional methods.
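The abstract's description of the copula-based GLM as "a GLM with probit link and transformed variables" can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the transformation is the usual Gaussian-copula normal-score map z = Φ⁻¹(F̂(x)), with F̂ estimated by ranks, and that the probit model is fitted by plain Fisher scoring. The function names `normal_scores`, `fit_probit`, and `predict_proba` are hypothetical, and over-sampling is not shown.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def _cdf(t):
    """Standard normal CDF applied elementwise."""
    return np.array([_STD_NORMAL.cdf(float(v)) for v in np.ravel(t)])


def _pdf(t):
    """Standard normal density applied elementwise."""
    t = np.asarray(t, dtype=float)
    return np.exp(-0.5 * t ** 2) / np.sqrt(2.0 * np.pi)


def normal_scores(x):
    """Assumed transform: z = Phi^{-1}(rank / (n + 1)) for a continuous sample."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1      # ranks 1..n (ties broken by position)
    u = ranks / (len(x) + 1.0)                 # pseudo-observations strictly in (0, 1)
    return np.array([_STD_NORMAL.inv_cdf(float(p)) for p in u])


def fit_probit(Z, y, n_iter=50, tol=1e-8):
    """Fit P(y = 1 | z) = Phi(b0 + z @ b) by Fisher scoring; returns [b0, b...]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), Z])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = np.clip(_cdf(eta), 1e-9, 1.0 - 1e-9)
        d = _pdf(eta)
        var = p * (1.0 - p)
        score = X.T @ (d * (y - p) / var)              # gradient of the log-likelihood
        info = X.T @ (X * (d ** 2 / var)[:, None])     # expected Fisher information
        info += 1e-8 * np.eye(X.shape[1])              # tiny ridge for numerical safety
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_proba(Z, beta):
    """Predicted probability of the positive (fraud) class."""
    X = np.column_stack([np.ones(len(Z)), Z])
    return _cdf(X @ beta)
```

In use, each continuous covariate would be passed through `normal_scores` separately, the resulting columns stacked into `Z`, and `fit_probit(Z, y)` called with `y` as the 0/1 fraud label. Applying the same transform to new observations requires reusing the training-set empirical CDF, which this sketch omits.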


