OCR on-the-go: Robust end-to-end systems for reading license plates & street signs

Carman M.
2019-01-01

Abstract

We work on the problem of automatically recognizing license plates and street signs in challenging conditions such as chaotic traffic. We leverage state-of-the-art text spotters to generate a large amount of noisy labeled training data. The data is filtered using a pattern derived from domain knowledge. We augment training and testing data with interpolated boxes and annotations, which makes our training and testing robust. We further use synthetic data during training to increase the coverage of the training data. We train two different models for recognition. Our baseline is a conventional Convolutional Neural Network (CNN) encoder followed by a Recurrent Neural Network (RNN) decoder. As our first contribution, we bypass the detection phase by augmenting the baseline with an attention mechanism in the RNN decoder. Next, we build in the capability of training the model end-to-end on scenes containing license plates by incorporating an Inception-based CNN encoder, which makes the model robust to multiple scales. We achieve improvements of 3.75% at the sequence level over the baseline model. We present the first results of using multi-headed attention models for text recognition in images and illustrate the advantages of using multiple heads over a single head. We observe gains as large as 7.18% by incorporating multi-headed attention. We also experiment with multi-headed attention models on the French Street Name Signs (FSNS) dataset and a new Indian Street dataset that we release for experiments. We observe that such models with multiple attention masks perform better than the model with single-headed attention on three different datasets of varying complexity. Our models outperform state-of-the-art methods on the FSNS and IIIT-ILST Devanagari datasets by 1.1% and 8.19%, respectively.
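
As a rough, illustrative sketch of the architecture summarized above (not the authors' released implementation), the PyTorch snippet below wires a small CNN encoder to a GRU decoder that attends to the flattened image feature map through multi-head attention. The class name AttnOCR, the layer sizes (d_model=256, four heads), the backbone, and the character vocabulary size are assumptions made purely for illustration.

    # Minimal sketch of a CNN encoder + attention-based RNN decoder for text
    # recognition; sizes and vocabulary are illustrative assumptions.
    import torch
    import torch.nn as nn

    class AttnOCR(nn.Module):
        def __init__(self, num_chars=64, d_model=256, n_heads=4):
            super().__init__()
            # Small convolutional encoder standing in for the Inception-style
            # backbone mentioned in the abstract.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(128, d_model, 3, stride=2, padding=1), nn.ReLU(),
            )
            # Multi-head attention: the decoder state queries the flattened
            # spatial feature map (keys/values) with several heads at once.
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.embed = nn.Embedding(num_chars, d_model)
            self.rnn = nn.GRUCell(2 * d_model, d_model)
            self.out = nn.Linear(d_model, num_chars)

        def forward(self, images, targets):
            # images: (B, 3, H, W); targets: (B, T) character indices.
            feats = self.encoder(images)                   # (B, C, H', W')
            memory = feats.flatten(2).transpose(1, 2)      # (B, H'*W', C)
            h = memory.mean(dim=1)                         # initial decoder state
            logits = []
            for t in range(targets.size(1)):
                query = h.unsqueeze(1)                     # (B, 1, C)
                ctx, _ = self.attn(query, memory, memory)  # attend over the image
                step_in = torch.cat([self.embed(targets[:, t]), ctx.squeeze(1)], -1)
                h = self.rnn(step_in, h)
                logits.append(self.out(h))
            return torch.stack(logits, dim=1)              # (B, T, num_chars)

    # Example: one teacher-forced decoding pass on a dummy batch.
    model = AttnOCR()
    imgs = torch.randn(2, 3, 64, 256)
    tgts = torch.randint(0, 64, (2, 10))
    print(model(imgs, tgts).shape)  # torch.Size([2, 10, 64])

With n_heads=1 the same module reduces to a conventional single-headed attention decoder, which makes it easy to compare the two settings discussed in the abstract.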
2019
Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
978-1-7281-3014-9
Automatic license plate recognition
Convolutional Neural Networks (CNN)
Multi-head attention
Pattern recognition
Recurrent Neural Network (RNN)
Scene text

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1171153
Citations
  • Scopus: 13