RE.PUBLIC@POLIMI: research publications of the Politecnico di Milano

The advent of Large Language Models (LLMs) has changed the way we process information today and has unlocked new ways of delivering intelligence to the user. One of the ways of interfacing with AI is via smart assistants and chatbots that take multimodal inputs. However, the diversity of input tasks imply the possibility of both latency-critical and complex input instructions for AI assistants. Further, LLMs cannot be deployed on the edge for low-latency outputs, as that presents challenges due to their high computational demands and memory requirements. This work explores such trade-offs and contributes a smart LLM selection policy, called SELA, that leverages a suite of LLM models with disparate characteristics to optimize overall quality of service (QoS). SELA uses a time-criticality and complexity predictor at the edge to identify the optimal LLM choice for a given input instruction. Experiments on public instruction benchmarks demonstrate that SELA provides 9% to 62% higher QoS scores compared to the state-of-the-art selection policies.

SELA: Smart Edge LLM Agent to Optimize Response Trade-offs of AI Assistants

Tuli, Shreshth; Casale, Giuliano; Roveri, Manuel

2025-01-01

Abstract

The advent of Large Language Models (LLMs) has changed the way we process information and has unlocked new ways of delivering intelligence to the user. One way of interfacing with AI is through smart assistants and chatbots that take multimodal inputs. However, the diversity of input tasks means that AI assistants may receive both latency-critical and complex input instructions. Further, LLMs are difficult to deploy on the edge for low-latency outputs because of their high computational demands and memory requirements. This work explores these trade-offs and contributes a smart LLM selection policy, called SELA, that leverages a suite of LLMs with disparate characteristics to optimize the overall quality of service (QoS). SELA uses a time-criticality and complexity predictor at the edge to identify the optimal LLM for a given input instruction. Experiments on public instruction benchmarks demonstrate that SELA provides 9% to 62% higher QoS scores than state-of-the-art selection policies.
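As a rough illustration of the kind of decision the abstract describes, the sketch below routes an instruction to one of several LLMs given two predictor outputs: a time-criticality score and a complexity score, both assumed to lie in [0, 1]. Everything in it is hypothetical: the model names, the latency and quality figures, and the linear QoS score are placeholders chosen for illustration. It is not SELA's actual predictor, scoring function, or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMProfile:
    """Hypothetical profile of one candidate LLM (placeholder numbers)."""
    name: str
    latency_s: float  # assumed expected response time, in seconds
    quality: float    # assumed expected answer quality, in [0, 1]


# Hypothetical suite: a small and a medium edge model plus a large cloud model.
MODELS = [
    LLMProfile("edge-small", latency_s=0.3, quality=0.55),
    LLMProfile("edge-medium", latency_s=1.2, quality=0.70),
    LLMProfile("cloud-large", latency_s=4.0, quality=0.92),
]


def qos_score(model: LLMProfile, time_criticality: float, complexity: float,
              max_latency_s: float) -> float:
    """Toy QoS score (an assumption, not the paper's definition).

    Complex instructions weight answer quality more heavily; time-critical
    instructions weight the latency penalty more heavily.
    """
    normalized_latency = model.latency_s / max_latency_s
    return complexity * model.quality - time_criticality * normalized_latency


def select_llm(models: list[LLMProfile], time_criticality: float,
               complexity: float) -> LLMProfile:
    """Pick the model with the highest toy QoS score for one instruction."""
    max_latency_s = max(m.latency_s for m in models)
    return max(models, key=lambda m: qos_score(m, time_criticality,
                                               complexity, max_latency_s))


if __name__ == "__main__":
    # Urgent, simple request: the fast edge model wins.
    print(select_llm(MODELS, time_criticality=0.9, complexity=0.2).name)  # edge-small
    # Relaxed, complex request: the high-quality cloud model wins.
    print(select_llm(MODELS, time_criticality=0.1, complexity=0.9).name)  # cloud-large
    # Moderately urgent, fairly complex request: the middle option wins.
    print(select_llm(MODELS, time_criticality=0.4, complexity=0.8).name)  # edge-medium
```

The point of the toy score is only to show the trade-off the abstract names: as predicted time-criticality rises, slower models are penalized, and as predicted complexity rises, higher-quality models are favored. A real policy would learn or calibrate these quantities rather than hard-code them.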

Year of publication: 2025
Journal title: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Keywords: AI Assistants; Cloud Computing; LLMs; Wearables
Appears in type: 01.1 Journal article

Files in this item:

File	Size	Format
3749483.pdf (restricted access)	987.22 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1309037