RE.PUBLIC@POLIMI Research publications of the Politecnico di Milano

Copula-based Approaches for Anomaly Detection: a Case-study in Financial Forensics

Tenconi, Vanessa; Tamburri, Damian Andrew; Ennio Quattrocchi, Giovanni; Pellegrino, Corrado; Cascavilla, Giuseppe; Van den Heuvel, Willem-Jan

2024-01-01

Abstract

Fraud detection is a critical challenge in many domains and requires accurate, reliable methods to distinguish legitimate transactions from fraudulent ones. This work explores the application of copula-based models to anomaly detection in financial forensics, focusing on their effectiveness in identifying fraudulent activities in a highly imbalanced dataset. Copula models capture the dependence between continuous variables separately from their marginal distributions, providing a flexible framework for modeling the joint distribution of features: the variables in a dataset can follow different distributions, and the copula models how they behave jointly, particularly in extreme cases. Using a fraud dataset, we compute the copula-based probability of fraud and the conditional Gaussian copula. We then derive the copula-based Generalized Linear Model (GLM) formula from the conditional copula; when the covariates are continuous, it is essentially a GLM with a probit link applied to transformed variables. Finally, we compare the performance of these copula-based models with standard methods for predicting a binary variable, namely a GLM with a probit link and logistic regression. The results indicate that the copula-based probability and conditional copula formulas are promising, particularly in handling complex dependencies, but at a high computational cost, while the copula-based GLM, when combined with over-sampling, also outperforms traditional methods.
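The abstract states that, for continuous covariates, the copula-based GLM reduces to a probit GLM on transformed variables. The sketch below illustrates one standard way this can arise; it is not the authors' implementation. It assumes a Gaussian copula in which the binary fraud label Y is obtained by thresholding a latent standard-normal variable Z_0 at Phi^{-1}(1 - p), where p is the fraud base rate, and in which each covariate is mapped to a normal score z_j = Phi^{-1}(F_j(x_j)). Under these assumptions, conditioning Z_0 on the scores gives P(Y = 1 | z) = Phi(beta_0 + beta' z), with beta = R_xx^{-1} r / sigma, beta_0 = Phi^{-1}(p) / sigma and sigma = sqrt(1 - r' R_xx^{-1} r), where r holds the latent correlations between Z_0 and the covariates. The function names, the rank-based empirical CDF and the example correlation matrix are hypothetical, and estimating the latent correlations from data is not shown.

```python
import numpy as np
from statistics import NormalDist

# Standard normal N(0, 1): provides Phi (cdf) and Phi^{-1} (inv_cdf).
STD_NORMAL = NormalDist()


def normal_scores(x):
    """Transform a continuous covariate to normal scores z = Phi^{-1}(F_n(x)).

    Hypothetical helper: F_n is the empirical CDF rescaled to rank / (n + 1),
    so every value lies strictly inside (0, 1) and Phi^{-1} stays finite.
    Ties are broken by position.
    """
    x = np.asarray(x, dtype=float)
    ranks = x.argsort(kind="stable").argsort() + 1  # ranks 1..n
    u = ranks / (len(x) + 1.0)
    return np.array([STD_NORMAL.inv_cdf(float(p)) for p in u])


def copula_probit_coefficients(R, p):
    """Probit coefficients implied by a Gaussian copula with a latent label.

    R is the (d+1)x(d+1) latent correlation matrix of (Z_0, Z_1, ..., Z_d),
    where Y = 1 exactly when Z_0 > Phi^{-1}(1 - p), so that P(Y = 1) = p.
    Returns (beta0, beta) such that P(Y = 1 | Z = z) = Phi(beta0 + beta @ z).
    """
    R = np.asarray(R, dtype=float)
    r = R[1:, 0]                  # correlations between Z_0 and each Z_j
    R_xx = R[1:, 1:]              # correlations among the covariates
    w = np.linalg.solve(R_xx, r)  # R_xx^{-1} r: weights of E[Z_0 | Z = z]
    sigma = np.sqrt(1.0 - r @ w)  # conditional standard deviation of Z_0
    beta0 = STD_NORMAL.inv_cdf(p) / sigma
    beta = w / sigma
    return beta0, beta


def fraud_probability(z, beta0, beta):
    """P(Y = 1 | Z = z) = Phi(beta0 + beta @ z) for one vector of normal scores."""
    return STD_NORMAL.cdf(beta0 + float(np.dot(beta, z)))


if __name__ == "__main__":
    # Hypothetical latent correlations: label vs. two covariates (first row),
    # plus the correlation between the two covariates themselves.
    R = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])
    beta0, beta = copula_probit_coefficients(R, p=0.01)
    print("beta0:", beta0, "beta:", beta)
    print("P(fraud | z):", fraud_probability(np.array([2.5, 1.0]), beta0, beta))
```

With an identity matrix R (no dependence between the label and the covariates), beta vanishes and every transaction receives the base rate p; with positive latent correlations, larger normal scores raise the predicted probability. Because the result is linear in the normal scores inside Phi, the same form could also be fitted directly with any probit GLM routine on the transformed covariates, which is consistent with how the abstract characterizes the copula-based GLM.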

Year of publication
	2024

Book title
	Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024

Keywords
	copula-based anomaly detection
	fraud detection
	generalized linear models
	logistic regression

Appears in categories:
	04.1 Contribution to conference proceedings

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1285650
