Background: The healthcare sector is an interesting target for fraudsters. The availability of a great amount of data makes it possible to tackle this issue with the adoption of data mining techniques, making the auditing process more efficient and effective. This research has the objective of developing a novel data mining model devoted to fraud detection among hospitals using Hospital Discharge Charts (HDC) in Administrative Databases. In particular, it is focused on the DRG upcoding practice, i.e., the tendency of registering codes for provided services and inpatients health status so to make the hospitalization fall within a more remunerative DRG class. Methods: We propose a two-step algorithm: the first step entails kmeans clustering of providers to identify locally consistent and locally similar groups of hospitals, according to their characteristics and behavior treating a specific disease, in order to spot outliers within this groups of peers. An initial grid search for the best number of features to be selected (through Principal Feature Analysis) and the best number of local groups makes the algorithm extremely flexible. In the second step, we propose a human-decision support system that helps auditors cross-validating the identified outliers, analyzing them w.r.t. fraud-related variables, and the complexity of patients' casemix they treated. The proposed algorithm was tested on a database relative to HDC collected by Regione Lombardia (Italy) in a time period of three years (2013-2015), focusing on the treatment of Heart Failure. Results: The model identified 6 clusters of hospitals and 10 outliers among the 183 units. Out of those providers, we report the in depth the application of Step Two on three Hospitals (two private and one public). Cross-validating with the patients' population and the hospitals' characteristics, the public hospital seemed justified in its outlierness, while the two private providers were deemed interesting for a further investigation by auditors. Conclusions: The proposed model is promising in identifying anomalous DRG coding behavior and it is easily transferrable to all diseases and contexts of interest. Our proposal contributes to the limited literature regarding behavioral models for fraud detection, identifying the most 'cautious' fraudsters. The results of the first and the second Steps together represent a valuable set of information for auditors in their preliminary investigation.

Data mining application to healthcare fraud detection: a two-step unsupervised clustering method for outlier detection with administrative databases

Michela Carlotta Massi;Francesca Ieva;Emanuele Lettieri
2020-01-01

Abstract

Background: The healthcare sector is an interesting target for fraudsters. The availability of a great amount of data makes it possible to tackle this issue with the adoption of data mining techniques, making the auditing process more efficient and effective. This research has the objective of developing a novel data mining model devoted to fraud detection among hospitals using Hospital Discharge Charts (HDC) in Administrative Databases. In particular, it is focused on the DRG upcoding practice, i.e., the tendency of registering codes for provided services and inpatients health status so to make the hospitalization fall within a more remunerative DRG class. Methods: We propose a two-step algorithm: the first step entails kmeans clustering of providers to identify locally consistent and locally similar groups of hospitals, according to their characteristics and behavior treating a specific disease, in order to spot outliers within this groups of peers. An initial grid search for the best number of features to be selected (through Principal Feature Analysis) and the best number of local groups makes the algorithm extremely flexible. In the second step, we propose a human-decision support system that helps auditors cross-validating the identified outliers, analyzing them w.r.t. fraud-related variables, and the complexity of patients' casemix they treated. The proposed algorithm was tested on a database relative to HDC collected by Regione Lombardia (Italy) in a time period of three years (2013-2015), focusing on the treatment of Heart Failure. Results: The model identified 6 clusters of hospitals and 10 outliers among the 183 units. Out of those providers, we report the in depth the application of Step Two on three Hospitals (two private and one public). Cross-validating with the patients' population and the hospitals' characteristics, the public hospital seemed justified in its outlierness, while the two private providers were deemed interesting for a further investigation by auditors. Conclusions: The proposed model is promising in identifying anomalous DRG coding behavior and it is easily transferrable to all diseases and contexts of interest. Our proposal contributes to the limited literature regarding behavioral models for fraud detection, identifying the most 'cautious' fraudsters. The results of the first and the second Steps together represent a valuable set of information for auditors in their preliminary investigation.
2020
Administrative Databases
DRG
Data mining
Fraud
Upcoding
File in questo prodotto:
File Dimensione Formato  
s12911-020-01143-9 (1).pdf

accesso aperto

Descrizione: Articolo principale
: Publisher’s version
Dimensione 1.79 MB
Formato Adobe PDF
1.79 MB Adobe PDF Visualizza/Apri
12911_2020_1143_MOESM1_ESM.pdf

accesso aperto

Descrizione: Materiale Supplementare
: Publisher’s version
Dimensione 422.08 kB
Formato Adobe PDF
422.08 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1145606
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 19
  • ???jsp.display-item.citation.isi??? 10
social impact