Pattern Classification Techniques for Lung Cancer Diagnosis by an Electronic Nose.

Blatt, Rossella; Bonarini, Andrea; Matteucci, Matteo

doi:10.1007/978-3-642-14464-6_18

Computational intelligence techniques can be used to analyze the olfactory signal perceived by an electronic nose and to extract information useful for diagnosing a variety of human diseases. Our research proposes the use of an electronic nose to diagnose lung cancer. An electronic nose acquires and recognizes the volatile organic compounds (VOCs) present in the analyzed substance: it is composed of an array of electronic chemical sensors and a pattern classification module based on computational intelligence techniques. Its operation has three main stages: acquisition, preprocessing and pattern analysis. In the lung cancer detection experiment, we analyzed 104 breath samples from 52 subjects: 22 healthy subjects and 30 patients with primary lung cancer at different stages. To find the best classification model for discriminating between the two classes (healthy and lung cancer subjects) and to reduce the dimensionality of the problem, we implemented a genetic algorithm (GA) that searches for the best combination of feature selection, feature projection and classifier algorithms. For feature projection, we considered Principal Component Analysis (PCA), Fisher Linear Discriminant Analysis (LDA) and Non Parametric Linear Discriminant Analysis (NPLDA). Classification was performed with several supervised pattern classification algorithms: different k-Nearest Neighbors (k-NN) approaches (classic, modified and fuzzy k-NN), linear and quadratic discriminant function classifiers, and a feed-forward Artificial Neural Network (ANN). The best solution found by the genetic algorithm was the projection of a subset of features onto a single component using Fisher Linear Discriminant Analysis, followed by classification with the k-Nearest Neighbors method.
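The selected pipeline (Fisher LDA projection onto a single component, then k-NN on the projected value) can be illustrated with a minimal sketch. It is not the authors' implementation: the two-feature toy data, the function names, and the choice of k = 3 are all assumptions made for illustration, and the feature-selection and genetic-algorithm steps are omitted.

```python
from collections import Counter

def mean_vec(rows):
    """Component-wise mean of a list of 2-D feature vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, m):
    """2x2 scatter matrix of one class around its mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    """Two-class Fisher direction w = Sw^-1 (m1 - m0) for 2 features."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, x):
    """Project a 2-D sample onto the single discriminant component."""
    return w[0] * x[0] + w[1] * x[1]

def knn_predict(train_z, train_y, z, k=3):
    """Classic k-NN majority vote on the 1-D projected values."""
    nearest = sorted(zip(train_z, train_y), key=lambda p: abs(p[0] - z))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical toy data: two sensor-derived features per breath sample.
healthy = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.8, 2.1)]
cancer = [(3.0, 3.5), (3.4, 3.0), (2.8, 3.9), (3.3, 3.6)]

w = fisher_direction(healthy, cancer)
train_z = [project(w, x) for x in healthy + cancer]
train_y = ["healthy"] * len(healthy) + ["cancer"] * len(cancer)

def classify(sample, k=3):
    """Project a new sample, then vote among its k nearest neighbours."""
    return knn_predict(train_z, train_y, project(w, sample), k)

print(classify((1.1, 2.0)), classify((3.1, 3.4)))
```

Projecting before classifying means the k-NN distances are computed along the one direction that best separates the two classes, which is one way to read the dimensionality reduction described above.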
The results, all validated using cross-validation, were excellent: an average accuracy of 96.2%, an average sensitivity of 93.3% and an average specificity of 100%, with very small confidence intervals. We also investigated the possibility of early diagnosis by building a model that discriminates samples from subjects with stage I primary lung cancer from those of healthy subjects. The results of this analysis were also very satisfactory, with an average accuracy of 92.85%, an average sensitivity of 75.5% and an average specificity of 97.72%. These results demonstrate that the electronic nose, combined with appropriate computational intelligence methodologies, is a promising alternative to current lung cancer diagnostic techniques: not only is the instrument completely non-invasive, but the obtained predictive errors are lower than those of present diagnostic methods, and the cost of the analysis in money, time and resources is lower. The introduction of this cutting-edge technology will lead to important social and business effects: its low price and small size allow large-scale distribution, making it possible to perform non-invasive, cheap and quick mass early diagnosis and screening.
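For reference, accuracy, sensitivity and specificity follow from confusion-matrix counts as sketched below. The counts used here are hypothetical and are not reported in the source, which gives only averages over cross-validation. They were chosen as one per-subject split of 30 patients and 22 healthy subjects that rounds to the first set of figures.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary diagnostic metrics from confusion-matrix counts.

    tp/fn: lung cancer cases classified correctly / incorrectly.
    tn/fp: healthy cases classified correctly / incorrectly.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical counts (assumption, not taken from the study).
metrics = diagnostic_metrics(tp=28, fn=2, tn=22, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```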