Memristor-based circuit implementation and circuitry optimized algorithm for Mamba language network
Liangyu Chen
2025-01-01
Abstract
Language networks are crucial in artificial intelligence, with the novel Mamba architecture significantly reducing computation and energy consumption compared to traditional transformer networks. However, a full-circuit implementation of the Mamba network has not been proposed due to the complexity of its computations and data storage. Additionally, optimized hardware-aware parallel algorithms for Mamba inference in circuits remain undeveloped. This work addresses these challenges by presenting a memristor-based full-circuit implementation of the Mamba network and introducing a computing-in-memory parallel-aware algorithm tailored for circuit-level inference. The implementation includes: 1) standard 1T1M memristor crossbars and a depthwise separable convolution memristor crossbar for the different convolutions; 2) computing-in-memory implicit latent state circuits for the computation and transition of latent states; 3) functional circuits for SiLU activation, RMS normalization, and multi-layer multiply-accumulate operations; and 4) an optimized algorithm and circuit implementation for hardware-aware inference, achieving parallel scanning and hardware awareness in circuits. The proposed circuit operates on analog signals and eliminates redundant analog-to-digital conversions and intermediate storage. A basic single-sentence generation task was simulated in PSPICE, validating the circuit's correctness. Analyses of analog computation accuracy, circuit stability, and power consumption demonstrate the proposed circuit's advantages, highlighting its potential as a fundamental module for large-scale circuit integration and complex text generation tasks.
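The operations the abstract enumerates — SiLU activation, RMS normalization, crossbar multiply-accumulate, and the latent-state transition of a state-space block — can be illustrated behaviorally. The following Python sketch is not the paper's circuit or its hardware-aware parallel algorithm; the function names are illustrative assumptions, the latent-state scan is written sequentially for clarity, and the crossbar is modeled only as an ideal conductance matrix (Ohm's law per cell, Kirchhoff's current law per column).

```python
import math

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def rms_norm(vec, eps=1e-6):
    # RMS normalization: divide each element by the vector's root-mean-square
    rms = math.sqrt(sum(v * v for v in vec) / len(vec) + eps)
    return [v / rms for v in vec]

def crossbar_mac(G, volts):
    # Idealized memristor crossbar multiply-accumulate: applying row voltages
    # to a conductance matrix G yields column currents
    # i_j = sum_k G[k][j] * volts[k].
    cols = len(G[0])
    return [sum(G[k][j] * volts[k] for k in range(len(G)))
            for j in range(cols)]

def latent_state_scan(a_bar, b_bar, c, x_seq):
    # Discretized state-space recurrence of a Mamba-style block:
    #   h_t = a_bar * h_{t-1} + b_bar * x_t,   y_t = c . h_t
    # Sequential reference only; the paper's circuit evaluates this scan
    # in parallel, which is not modeled here.
    n = len(a_bar)
    h = [0.0] * n
    ys = []
    for x_t in x_seq:
        h = [a_bar[i] * h[i] + b_bar[i] * x_t for i in range(n)]
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys
```

Treating the crossbar as a plain matrix-vector product is what lets the analog implementation skip intermediate analog-to-digital conversions: the multiply and the accumulate both happen in the physics of the array.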
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


