Memristor-based circuit implementation and circuitry optimized algorithm for Mamba language network
Liangyu Chen
2025-01-01
Abstract
Language networks are crucial in artificial intelligence, with the novel Mamba architecture significantly reducing computation and energy consumption compared to traditional transformer networks. However, a full-circuit implementation of the Mamba network has not been proposed due to the complexity of its computations and data storage. Additionally, optimized hardware-aware parallel algorithms for Mamba inference in circuits remain undeveloped. This work addresses these challenges by presenting a memristor-based full-circuit implementation of the Mamba network and introducing a computing-in-memory parallel-aware algorithm tailored for circuit-level inference. The implementation includes: 1) standard 1T1M memristor crossbars and a depthwise separable convolution memristor crossbar for the different convolutions; 2) computing-in-memory implicit latent state circuits for the computation and transition of latent states; 3) functional circuits for SiLU activation, RMS normalization, and multi-layer multiply-accumulate operations; and 4) an optimized algorithm and circuit implementation for hardware-aware inference, achieving parallel scanning and hardware awareness in circuits. The proposed circuit operates on analog signals and eliminates redundant analog-to-digital conversions and intermediate storage. A basic single-sentence generation task was simulated in PSPICE, validating the circuit's correctness. Analyses of analog computation accuracy, circuit stability, and power consumption demonstrate the proposed circuit's advantages, highlighting its potential as a fundamental module for large-scale circuit integration and complex text generation tasks.
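The operations the abstract enumerates — SiLU activation, RMS normalization, crossbar multiply-accumulate, and the latent-state transition of a state-space block — can be illustrated behaviorally. The following Python sketch is not the paper's circuit or its hardware-aware parallel algorithm; the function names are illustrative assumptions, the latent-state scan is written sequentially for clarity, and the crossbar is modeled only as an ideal conductance matrix (Ohm's law per cell, Kirchhoff's current law per column).

```python
import math

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def rms_norm(vec, eps=1e-6):
    # RMS normalization: divide each element by the vector's root-mean-square
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v / rms for v in vec]

def crossbar_mac(G, volts):
    # Idealized memristor crossbar multiply-accumulate: applying row voltages
    # to a conductance matrix G yields column currents
    # i_j = sum_k G[k][j] * volts[k].
    cols = len(G[0])
    return [sum(G[k][j] * volts[k] for k in range(len(G)))
            for j in range(cols)]

def latent_state_scan(a_bar, b_bar, c, x_seq):
    # Discretized state-space recurrence of a Mamba-style block:
    #   h_t = a_bar * h_{t-1} + b_bar * x_t,   y_t = c . h_t
    # Sequential reference only; the paper's circuit evaluates this scan
    # in parallel, which is not modeled here.
    n = len(a_bar)
    h = [0.0] * n
    ys = []
    for x_t in x_seq:
        h = [a_bar[i] * h[i] + b_bar[i] * x_t for i in range(n)]
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys
```

Treating the crossbar as a plain matrix-vector product is what lets the analog implementation skip intermediate analog-to-digital conversions: the multiply and the accumulate both happen in the physics of the array.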
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


