RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

The health impacts of repeated exposure to distressing concepts such as child exploitation materials (CEM, aka ‘child pornography’) have become a major concern to law enforcement agencies and associated entities. Existing methods for ‘flagging’ materials largely rely upon prior knowledge, whilst predictive methods are unreliable, particularly when compared with equivalent tools used for detecting ‘lawful’ pornography. In this paper we detail the design and implementation of a deep-learning based CEM classifier, leveraging existing pornography detection methods to overcome infrastructure and corpora limitations in this field. Specifically, we further existing research through direct access to numerous contemporary, real-world, annotated cases taken from Australian Federal Police holdings, demonstrating the dangers of overfitting due to the influence of individual users’ proclivities. We quantify the performance of skin tone analysis in CEM cases, showing it to be of limited use. We assess the performance of our classifier and show it to be sufficient for use in forensic triage and ‘early warning’ of CEM, but of limited efficacy for categorising against existing scales for measuring child abuse severity. We identify limitations currently faced by researchers and practitioners in this field, whose restricted access to training material is exacerbated by inconsistent and unsuitable annotation schemas. Whilst adequate for their intended use, we show existing schemas to be unsuitable for training machine learning (ML) models, and introduce a new, flexible, objective, and tested annotation schema specifically designed for cross-jurisdictional collaborative use. This work, combined with a world-first ‘illicit data airlock’ project currently under construction, has the potential to bring a ‘ground truth’ dataset and processing facilities to researchers worldwide without compromising quality, safety, ethics and legality.

Laying foundations for effective machine learning in law enforcement. Majura – A labelling schema for child exploitation materials

Dalins J.;Tyshetskiy Y.;Wilson C.;Carman M. J.;Boudry D.

2018-01-01

Abstract

The health impacts of repeated exposure to distressing concepts such as child exploitation materials (CEM, aka ‘child pornography’) have become a major concern to law enforcement agencies and associated entities. Existing methods for ‘flagging’ materials largely rely upon prior knowledge, whilst predictive methods are unreliable, particularly when compared with equivalent tools used for detecting ‘lawful’ pornography. In this paper we detail the design and implementation of a deep-learning based CEM classifier, leveraging existing pornography detection methods to overcome infrastructure and corpora limitations in this field. Specifically, we further existing research through direct access to numerous contemporary, real-world, annotated cases taken from Australian Federal Police holdings, demonstrating the dangers of overfitting due to the influence of individual users’ proclivities. We quantify the performance of skin tone analysis in CEM cases, showing it to be of limited use. We assess the performance of our classifier and show it to be sufficient for use in forensic triage and ‘early warning’ of CEM, but of limited efficacy for categorising against existing scales for measuring child abuse severity. We identify limitations currently faced by researchers and practitioners in this field, whose restricted access to training material is exacerbated by inconsistent and unsuitable annotation schemas. Whilst adequate for their intended use, we show existing schemas to be unsuitable for training machine learning (ML) models, and introduce a new, flexible, objective, and tested annotation schema specifically designed for cross-jurisdictional collaborative use. This work, combined with a world-first ‘illicit data airlock’ project currently under construction, has the potential to bring a ‘ground truth’ dataset and processing facilities to researchers worldwide without compromising quality, safety, ethics and legality.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2018
			
	Titolo della rivista
	
				DIGITAL INVESTIGATION
			
	Parole chiave
	
				Annotation schema
Digital forensics
Forensic triage
Neural networks
			
	Appare nelle tipologie:
	
				01.1 Articolo in Rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1208864

Citazioni

ND

14

10

social impact