Copula-based Approaches for Anomaly Detection: a Case-study in Financial Forensics

Tenconi, Vanessa; Tamburri, Damian Andrew; Ennio Quattrocchi, Giovanni
2024-01-01

Abstract

Fraud detection is a critical challenge in many domains and requires accurate, reliable methods to distinguish legitimate from fraudulent transactions. This work explores the application of copula-based models to anomaly detection in financial forensics, focusing on their effectiveness in identifying fraudulent activity in a highly imbalanced dataset. Copula models capture the dependencies between continuous variables, providing a flexible framework for modeling the joint distribution of features: even when the variables in a dataset follow different marginal distributions, the copula models how they behave jointly, particularly in extreme cases. Using a fraud dataset, we compute the copula-based probability of fraud and the conditional Gaussian copula. We then derive the copula-based Generalized Linear Model (GLM) formula from the conditional copula; when the covariates are continuous, this is essentially a GLM with a probit link applied to transformed variables. Finally, we compare these copula-based models with standard methods for predicting a binary variable, such as a GLM with a probit link and logistic regression. The results indicate that the copula-based probability and conditional copula formulas are promising, particularly in handling complex dependencies, but come at a high computational cost, while the copula-based GLM, when combined with over-sampling, also outperforms the traditional methods.
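The abstract states that, for continuous covariates, the conditional Gaussian copula reduces to a probit GLM on transformed variables. The following is a minimal Python sketch of one standard way to see this, not the authors' implementation. It assumes a latent Gaussian variable Z_0 behind the binary fraud label, with Y = 1 when Z_0 exceeds Φ⁻¹(1 − p) for base rate p, and covariates mapped to normal scores z_j = Φ⁻¹(F_j(x_j)). Conditioning Z_0 on z gives P(Y = 1 | z) = Φ((Φ⁻¹(p) + rᵀR_xx⁻¹z) / √(1 − rᵀR_xx⁻¹r)), where r holds the copula correlations between Z_0 and the covariates and R_xx those among the covariates: a probit model in z. The function names (normal_scores, probit_coefficients, fraud_probability), the rank-based marginal transform, and the example numbers are hypothetical; estimating R itself (for example via polyserial correlations) is not shown.

```python
import numpy as np
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^{-1}


def normal_scores(column):
    """Map a continuous covariate to Gaussian scores z = Phi^{-1}(F_hat(x)).

    Assumption: F_hat is the rank-based empirical CDF rescaled by n + 1, so
    every value lies strictly inside (0, 1). Ties are broken by position.
    """
    column = np.asarray(column, dtype=float)
    n = column.size
    ranks = column.argsort(kind="stable").argsort() + 1  # ranks 1..n
    u = ranks / (n + 1)
    return np.array([PHI.inv_cdf(float(v)) for v in u])


def probit_coefficients(R, p):
    """Probit intercept and slopes implied by a Gaussian copula (hypothetical setup).

    R: correlation matrix of (Z_0, Z_1, ..., Z_d), where Z_0 is the latent
       Gaussian variable behind the binary label.
    p: base rate P(Y = 1), with Y = 1 when Z_0 > Phi^{-1}(1 - p).
    """
    R = np.asarray(R, dtype=float)
    r = R[1:, 0]                   # correlations between Z_0 and the covariates
    R_xx = R[1:, 1:]               # correlations among the covariates
    w = np.linalg.solve(R_xx, r)   # conditional mean of Z_0 given z is w . z
    cond_var = 1.0 - r @ w         # conditional variance of Z_0 given z
    if not cond_var > 0.0:
        raise ValueError("R must be positive definite")
    sigma = np.sqrt(cond_var)
    beta0 = PHI.inv_cdf(p) / sigma  # intercept on the probit scale
    beta = w / sigma                # slopes on the normal scores
    return beta0, beta


def fraud_probability(z, beta0, beta):
    """P(Y = 1 | Z = z) = Phi(beta0 + beta . z), i.e. a probit GLM in z."""
    return PHI.cdf(float(beta0 + np.dot(beta, z)))


if __name__ == "__main__":
    # Made-up example (not from the paper): one covariate, 5% base rate.
    R_example = [[1.0, 0.6], [0.6, 1.0]]
    b0, b = probit_coefficients(R_example, p=0.05)
    for z_value in (0.0, 2.0):
        print(z_value, fraud_probability([z_value], b0, b))
```

In practice the probit coefficients would be estimated directly, for instance by fitting a probit GLM on the normal scores, instead of plugging in a known R; the closed form only shows why the copula-based model takes the probit form.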
2024
Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
copula-based anomaly detection
fraud detection
generalized linear models
logistic regression
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1285650