RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

In this paper we consider optimization with relaxation, an ample paradigm to make data-driven designs. This approach was previously considered by the same authors of this work in Garatti and Campi (2019), a study that revealed a deep-seated connection between two concepts: risk (probability of not satisfying a new, out-of-sample, constraint) and complexity (according to a definition introduced in paper Garatti and Campi, 2019). This connection was shown to have profound implications in applications because it implied that the risk can be estimated from the complexity, a quantity that can be measured from the data without any knowledge of the data-generation mechanism. In the present work we establish new results. First, we expand the scope of Garatti and Campi (2019) so as to embrace a more general setup that covers various algorithms in machine learning. Then, we study classical support vector methods – including SVM (Support Vector Machine), SVR (Support Vector Regression) and SVDD (Support Vector Data Description) – and derive new results for the ability of these methods to generalize. All results are valid for any finite size of the data set. When the sample size tends to infinity, we establish the unprecedented result that the risk approaches the ratio between the complexity and the cardinality of the data sample, regardless of the value of the complexity.

A theory of the risk for optimization with relaxation and its application to support vector machines

Campi M. C.;Garatti S.

2021-01-01

Abstract

In this paper we consider optimization with relaxation, an ample paradigm to make data-driven designs. This approach was previously considered by the same authors of this work in Garatti and Campi (2019), a study that revealed a deep-seated connection between two concepts: risk (probability of not satisfying a new, out-of-sample, constraint) and complexity (according to a definition introduced in paper Garatti and Campi, 2019). This connection was shown to have profound implications in applications because it implied that the risk can be estimated from the complexity, a quantity that can be measured from the data without any knowledge of the data-generation mechanism. In the present work we establish new results. First, we expand the scope of Garatti and Campi (2019) so as to embrace a more general setup that covers various algorithms in machine learning. Then, we study classical support vector methods – including SVM (Support Vector Machine), SVR (Support Vector Regression) and SVDD (Support Vector Data Description) – and derive new results for the ability of these methods to generalize. All results are valid for any finite size of the data set. When the sample size tends to infinity, we establish the unprecedented result that the risk approaches the ratio between the complexity and the cardinality of the data sample, regardless of the value of the complexity.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
			2021
		
	Titolo della rivista
	
			JOURNAL OF MACHINE LEARNING RESEARCH
		
	Parole chiave
	
			Generalization
Optimization
Optimization with relaxation
Risk quantification
Support vector machines
		
	Appare nelle tipologie:
	
			01.1 Articolo in Rivista

File in questo prodotto:

File	Dimensione	Formato
21-0641.pdf accesso aperto : Publisher’s version Dimensione 1.18 MB Formato Adobe PDF Visualizza/Apri	1.18 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1205220

Citazioni

ND

7

ND

social impact