How to Select an SBC (LattePanda/Raspberry Pi) for Local LLMs (LLaMA, LLaMA2, Phi-2, Mixtral-MOE, etc.)

DFRobot Mar 29 2024

When selecting a single-board computer (SBC) for a local large language model (LLM), several factors must be considered, including performance, resource requirements, hardware features, and budget. This article explains how to choose between LattePanda 3 Delta, LattePanda Sigma, Raspberry Pi 4, and Raspberry Pi 5 to meet your LLM application needs.

Hardware Comparison

LattePanda 3 Delta and LattePanda Sigma use Intel processors based on the x86 architecture and offer higher performance in this comparison. Raspberry Pi 4 and Raspberry Pi 5 use processors based on the ARM architecture and offer lower performance, which makes them better suited to lightweight LLM tasks.

1. LattePanda 3 Delta is a powerful single-board computer equipped with an x86 Intel processor:

Supports Windows 10 & 11, Linux, and Android x86
Compatible with 200+ sensors and actuators

2. LattePanda Sigma features a more robust hardware configuration capable of handling more complex tasks and applications:

Supports Windows 10 & 11 and Linux
Similar to LattePanda 3 Delta in its software and hardware ecosystem, with community and developer support and abundant tutorials and third-party software libraries.

3. Raspberry Pi 4B is a popular ARM architecture single-board computer:

Supports Raspberry Pi OS, Ubuntu
Offers a wide range of software and tools from the Raspberry Pi official software repository and other third-party software sources
Many sensors and hardware manufacturers have already adapted their products for its hardware.

4. Raspberry Pi 5:

Supports Raspberry Pi OS, Ubuntu
Compared to Raspberry Pi 4, the hardware ecosystem of Raspberry Pi 5 is less mature; however, manufacturers, communities, and developers are expected to provide longer-term software and hardware updates and support for Raspberry Pi 5.

Comparison Table of LattePanda and Raspberry Pi Single-board Computers

Product Name	LattePanda 3 Delta 864 - The Fastest Pocket-sized Windows/Linux Single Board Computer (8GB RAM/64GB eMMC)	LattePanda Sigma - x86 Windows / Linux Single Board Computer Server (32GB RAM, 500GB SSD, WiFi 6E)	Raspberry Pi 4 Model B - 8GB	Raspberry Pi 5 Single Board Computer - 8GB
SKU	DFR0981	DFR1091	DFR0697	DFR1119
Processor	Intel® Celeron® N5105	Intel® Core™ i5-1340P	Broadcom BCM2711	Broadcom BCM2712
Core	2.0-2.9GHz Quad-Core, Four-Thread	12-Core, 16-Thread, 12M Cache, up to 4.60 GHz (Performance-Core), 3.40 GHz (Efficient-Core)	Quad-Core Cortex-A72 (ARM v8) 64-bit @ 1.8 GHz	Quad-Core Cortex-A76 (ARM v8) 64-bit @ 2.4 GHz
Graphics	Intel® UHD Graphics (Frequency: 450 – 800MHz)	Intel® Iris® Xe Graphics, 80 Execution Units, up to 1.45 GHz	VideoCore VI @ 500 MHz; supports OpenGL ES 3.1, Vulkan 1.0	VideoCore VII @ 800 MHz; supports OpenGL ES 3.1, Vulkan 1.2
Memory	LPDDR4 8GB 2933MHz	Up to 32GB, Dual-Channel LPDDR5-6400MHz	LPDDR4-3200 SDRAM 1GB, 2GB, 4GB or 8GB	LPDDR4X-4267 SDRAM 4GB or 8GB
Storage	64GB eMMC	M.2 NVMe/SATA SSD (Separately Installed)	Micro SD	Micro SD (SDR104 Compatible); M.2 NVMe SSD support via HAT
Network / Wireless	802.11ax, 2.4G & 5G (160MHz), up to 2.4Gbps; Bluetooth 5.2	2 x 2.5GbE RJ45 Ports (Intel® i225-V); M.2 Wireless Module (Separately Installed)	Dual-Band 802.11ac; Bluetooth 5 / BLE; Gigabit Ethernet; PoE via PoE+ HAT	Dual-Band 802.11ac; Bluetooth 5 / BLE; Gigabit Ethernet; PoE via PoE+ HAT (incompatible with the old version)
Expansion Slots	1x M.2 M Key, PCIe 3.0 x2, supports NVMe SSD; 1x M.2 B Key, PCIe 3.0 x1, supports USB 2.0, USB 3.0, SATA, SIM	M.2 M Key: PCIe 3.0 x4; M.2 M Key: PCIe 4.0 x4; M.2 B Key: SATA III/PCIe 3.0 x1, USB 2.0, USB 3.0, SIM; M.2 E Key: PCIe 3.0 x1, USB 2.0, Intel CNVio; Micro SIM Card Slot	2-lane MIPI DSI Display Port; 2-lane MIPI CSI Camera Port; 4-Pole Stereo Audio and Composite Video Port	2 x 4-lane MIPI camera/display transceivers; PCIe 2.0 x1 interface; UART breakout; RTC clock power; 4-pin fan power
Price	$279	$579 (16GB), $629 (32GB)	$75	$80

Resource and Memory Constraints

LLMs usually need a large amount of memory to hold the model parameters and intermediate calculation results. A model with billions of parameters may need tens of gigabytes or more. When system memory is limited, consider model compression techniques such as quantization to shrink the model, and make sure the selected SBC has enough memory to load and run it.
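The memory needed for the weights alone can be estimated from the parameter count and the number of bits stored per weight. The sketch below is a back-of-the-envelope calculation, not a measurement: the 7-billion-parameter model and the two precisions are illustrative values, and the function name `weight_memory_gb` is hypothetical. It ignores runtime overhead such as the context cache and the operating system.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Illustrative 7-billion-parameter model at two precisions.
fp16_gb = weight_memory_gb(7e9, 16)  # 16-bit floating point weights
q4_gb = weight_memory_gb(7e9, 4)     # idealized 4-bit quantized weights

print(f"7B parameters at 16-bit: {fp16_gb:.1f} GB")
print(f"7B parameters at 4-bit:  {q4_gb:.1f} GB")
```

At 16 bits per weight the estimate exceeds the 8GB available on three of the four boards, while the 4-bit estimate leaves room for the rest of the system.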

A capable CPU is essential for LLM inference, and hardware accelerators such as GPUs can significantly speed up both training and inference. An LLM can run on an SBC's CPU alone, but its performance will generally fall short of a GPU or dedicated acceleration hardware. When choosing an SBC for an LLM, therefore, make sure its CPU is powerful enough for the computational load the model requires.

An SBC with less memory may be unable to hold the model parameters, conversation history, input data, and intermediate results at the same time during inference. After several rounds of dialogue, memory can be exhausted, which may crash the LLM program.
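To see why a long conversation costs memory, consider the key/value cache that a transformer keeps for every token in its context. The sketch below assumes a standard multi-head-attention model with 32 layers and a hidden size of 4096 (the published LLaMA-7B configuration) and 16-bit cache entries. The function name `kv_cache_bytes` is hypothetical, and some runtimes reserve this memory up front for the whole context window rather than growing it per turn.

```python
def kv_cache_bytes(n_tokens: int, n_layers: int = 32,
                   hidden_size: int = 4096, bytes_per_value: int = 2) -> int:
    """Key/value cache size for a multi-head-attention transformer.

    Every token stores one key vector and one value vector of length
    hidden_size in each layer, hence the leading factor of 2.
    """
    return 2 * n_layers * hidden_size * bytes_per_value * n_tokens


for n_tokens in (512, 2048, 4096):
    gib = kv_cache_bytes(n_tokens) / 2**30
    print(f"{n_tokens:5d} tokens of context: {gib:.2f} GiB of KV cache")
```

Under these assumptions each token of history costs 0.5 MiB, so a few thousand tokens add gigabytes on top of the model weights.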

Tight resource and memory limits also slow inference on long texts. A CPU paired with a small amount of memory runs into several performance bottlenecks at once when processing an LLM, and together they slow down token processing.
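One such bottleneck is memory bandwidth. As a rough rule of thumb, generating each token requires reading approximately all of the model weights from RAM, so bandwidth divided by model size gives an optimistic upper bound on tokens per second. The sketch below uses the LPDDR4X-4267 rate listed for Raspberry Pi 5 in the comparison table; the 32-bit bus width and the function names are assumptions, and real throughput also depends on compute, caches, and software.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return transfer_rate_mt_s * 1e6 * (bus_width_bits / 8) / 1e9


def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound if every token requires one full pass over the weights."""
    return bandwidth_gb_s / model_size_gb


# Raspberry Pi 5: LPDDR4X-4267 from the table, 32-bit bus width is an assumption.
pi5_bandwidth = peak_bandwidth_gb_s(4267, 32)
# LLaMA-7B-Q4 is listed as "< 4GB", so 4 GB is used as a conservative size.
pi5_bound = max_tokens_per_second(pi5_bandwidth, 4.0)

print(f"Peak bandwidth: {pi5_bandwidth:.1f} GB/s")
print(f"Upper bound:    {pi5_bound:.1f} tokens/s")
```

The 2.2 tokens/s measured for LLaMA-7B-Q4 on Raspberry Pi 5 later in this article sits below this bound, as expected for a theoretical peak.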

Model	File Size
phi-2-Q4	1.7GB
Alpaca-7B-Q4	< 4GB
LLaMA-7B-Q4	< 4GB
LLaMA2-7B-Q4	< 7GB
LLaMA-13B-Q4	< 8GB
mixtral_7bx2_moe_Q4	< 8GB
mamba-gpt-7b	<13GB
ChatGLM-6B-Q4	13GB

Given these memory and storage requirements, LattePanda Sigma, with its larger memory and storage capacity, is better suited to running LLMs. Raspberry Pi 4 and Raspberry Pi 5 have less memory and storage, so they are better matched to models with a small memory footprint, such as phi-2.

Deployment Differences

We use llama.cpp to run LLM inference on the CPU. Original model files in .pth format must first be converted to GGUF format before they can run on the CPU. Because SBC memory is limited, we then quantize the GGUF model files to a 4-bit (Q4) format. For phi-2, an original model of approximately 6GB shrinks to only about 1.6GB after Q4 quantization.
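The size reduction follows from storing each weight in 4 bits plus a small shared scale per block of weights. The sketch below is a toy block-wise quantizer written for illustration only: it is not llama.cpp's implementation, actual GGUF quantization types differ in detail, and the block size of 32 with a 16-bit scale is an assumption chosen to resemble common 4-bit schemes.

```python
import random

BLOCK_SIZE = 32  # weights per block (assumption for this sketch)


def quantize_block(block):
    """Map floats to signed 4-bit integers in [-8, 7] plus one shared scale."""
    amax = max(abs(x) for x in block)
    scale = amax / 7 if amax > 0 else 1.0
    quants = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from the 4-bit integers."""
    return [scale * q for q in quants]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(BLOCK_SIZE)]
scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Storage per block: 32 four-bit values plus one 16-bit scale.
bits_per_weight = (BLOCK_SIZE * 4 + 16) / BLOCK_SIZE

print(f"max round-trip error: {max_error:.5f} (scale {scale:.5f})")
print(f"bits per weight: {bits_per_weight} instead of 16")
```

At 4.5 bits per weight instead of 16, the file shrinks to a little over a quarter of its 16-bit size, which is roughly in line with the 6GB to 1.6GB reduction reported for phi-2.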

LattePanda 3 Delta, Raspberry Pi 4, and Raspberry Pi 5 do not have enough memory to quantize the original model themselves. Once you have selected a suitable LLM, download the original model on another Linux PC, quantize it there, and then copy the quantized model to the SBC for execution. On LattePanda Sigma, you can download the model and quantize it directly on the board.

LLM Token Speed

Real-time performance: this section assesses how well each SBC handles real-time language tasks, including response time and processing latency.

Benchmark for LattePanda 3 Delta, LattePanda Sigma, Raspberry Pi 4B, and Raspberry Pi 5

Model	File Size	LattePanda 3 Delta 8GB	LattePanda Sigma 32GB	Raspberry Pi 4B 8GB	Raspberry Pi 5 8GB
llama2-7b-Q4	<7GB	2.55 tokens/s	6 tokens/s	0.1 tokens/s	2.3 tokens/s

With the same model on every board, LattePanda Sigma is clearly the fastest at 6 tokens/s, more than twice the speed of LattePanda 3 Delta (2.55 tokens/s) and Raspberry Pi 5 (2.3 tokens/s), while Raspberry Pi 4B manages only about 0.1 tokens/s.
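To relate these speeds to response time, the sketch below converts tokens per second into the time needed to generate a reply. The speeds are the llama2-7b-Q4 figures from the benchmark table; the 100-token reply length and the function name `seconds_for_reply` are hypothetical, and the estimate covers generation only, not prompt processing.

```python
# Token speeds for llama2-7b-Q4, taken from the benchmark table.
TOKENS_PER_SECOND = {
    "LattePanda 3 Delta 8GB": 2.55,
    "LattePanda Sigma 32GB": 6.0,
    "Raspberry Pi 4B 8GB": 0.1,
    "Raspberry Pi 5 8GB": 2.3,
}

REPLY_TOKENS = 100  # hypothetical reply length


def seconds_for_reply(tokens_per_second: float, n_tokens: int = REPLY_TOKENS) -> float:
    """Generation time for a reply of n_tokens at a steady token speed."""
    return n_tokens / tokens_per_second


# Print the boards from fastest to slowest.
for board, speed in sorted(TOKENS_PER_SECOND.items(), key=lambda item: item[1], reverse=True):
    print(f"{board:24s} {seconds_for_reply(speed):7.1f} s per {REPLY_TOKENS}-token reply")
```

At 0.1 tokens/s a 100-token reply takes on the order of a quarter of an hour, which rules out interactive use, while the other three boards answer in well under a minute.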

Test for Raspberry Pi 5 (8GB) & LLM

For specific deployment steps, please refer to the following:

Deploy and run LLM on Raspberry Pi 5 vs Raspberry Pi 4B (LLaMA, LLaMA2, Phi-2, Mixtral-MOE, mamba-gp

Model	File Size	Compatibility	Out of Memory	Token Speed
phi-2-Q4	1.7GB	√		5.13 tokens/s
LLaMA-7B-Q4	< 4GB	√		2.2 tokens/s
LLaMA2-7B-Q4	< 7GB	√		2.3 tokens/s
LLaMA2-13B-Q4	< 4GB	√		2.02 tokens/s
mixtral_7bx2_moe_Q4	<8GB	√		<1 tokens/s (using llama.cpp)
mamba-gpt-7b	<13GB		√

Test for Raspberry Pi 4B (8GB) & LLM

For specific deployment steps, please refer to the following:

Deploy and run LLM on Raspberry Pi 4B (LLaMA, Alpaca, LLaMA2, ChatGLM)

Model	File Size	Compatibility	Out of Memory	Token Speed
LLaMA-7B-Q4	< 4GB	√		~0.1 tokens/s
Alpaca-7B-Q4	< 4 GB	√
LLaMA2-7B-chat-hf-Q4	< 7GB	√
LLaMA-13B-Q4	< 8GB		√
ChatGLM-6B-Q4	13GB		√

Test for Lattepanda 3 Delta 864 (8GB) & LLM

For specific deployment steps, please refer to the following:

Deploy and run LLM on Lattepanda 3 Delta 864 (LLaMA, LLaMA2, Phi-2, ChatGLM2)

Model	File Size	Compatibility	Out of Memory	Token Speed
phi-2-Q4	1.7GB	√		5.48 tokens/s
LLaMA-7B-chat-Q4	<4GB	√		2.55 tokens/s
LLaMA2-7B-chat-Q4	<7GB	√		2.56 tokens/s
LLaMA2-13B-Q4	~4GB	√		2.51 tokens/s
ChatGLM2-6B-Q4	<4GB	√		<1.5 tokens/s
mamba-gpt-7b	<13GB		√

Test for Lattepanda Sigma (32GB) CPU & LLM

For specific deployment steps, please refer to the following:

Deploy and run LLM on LattePanda Sigma (LLaMA, Alpaca, LLaMA2, ChatGLM)

Model	File Size	Compatibility	Token Speed
LLaMA-7B-chat-Q4	<4GB	√	5 tokens/s
Alpaca-7B-Q4	<4GB	√	5 tokens/s
LLaMA2-7B-chat-Q4	<7GB	√	6 tokens/s
LLaMA-13B-Q4	<8GB	√	2 tokens/s
ChatGLM-6B-Q4	13GB	√	1 token/s

Summary

In summary, several factors need consideration when selecting an SBC for local LLMs.

In terms of hardware configuration, LattePanda 3 Delta is equipped with an Intel Celeron N5105 processor, 8GB of memory, and 64GB of eMMC storage. LattePanda Sigma features a more powerful Intel Core i5-1340P processor and supports up to 32GB of LPDDR5 memory and M.2 NVMe/SATA SSD storage. Raspberry Pi 4B and Raspberry Pi 5 are equipped with Broadcom BCM2711 and BCM2712 processors and offer smaller memory and storage capacities.

Regarding resource and memory constraints, LLMs typically demand substantial memory to store model parameters and intermediate computation results, so the chosen SBC must have enough memory. With its larger memory and storage capacity, LattePanda Sigma supports LLM operations better than Raspberry Pi 4B and Raspberry Pi 5, which are better matched to models with smaller memory footprints.

Regarding deployment, LattePanda Sigma allows you to download and quantize LLM models directly on the board. For LattePanda 3 Delta, Raspberry Pi 4, and Raspberry Pi 5, you need to download the original model on another Linux PC and quantize it there before transferring the quantized model to the SBC for execution.

Finally, model inference speed is a key performance metric when assessing SBCs. In our tests, Raspberry Pi 5 is significantly faster than Raspberry Pi 4B (2.3 vs 0.1 tokens/s on llama2-7b-Q4). phi-2-Q4 performs particularly well on Raspberry Pi 5, with an evaluation speed of 5.13 tokens/s. Because of their limited RAM, however, both Raspberry Pi 5 and Raspberry Pi 4B can still run into constraints with larger LLMs. LattePanda 3 Delta is slightly faster than Raspberry Pi 5 (2.55 vs 2.3 tokens/s on llama2-7b-Q4) and far faster than Raspberry Pi 4B. LattePanda Sigma delivers the highest performance, reaching up to 6 tokens/s on llama2-7b-Q4, which suits more demanding LLM applications. It also costs more, so it fits projects with larger budgets.

Running LLMs puts a heavy load on the CPU and generates considerable heat, so the SBC needs adequate cooling to prevent overheating and keep the system stable. LattePanda 3 Delta includes an active cooling system with a small fan, although users should still leave enough ventilation space around the board. LattePanda Sigma comes with a cooling fan that keeps the processor at a proper temperature under load. Raspberry Pi 4B, built on the Broadcom BCM2711 SoC, has no built-in fan, but users can add official or third-party heat sinks, which is advisable for high-load applications or high-temperature environments.
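To check whether cooling is adequate during a long inference run, the temperature can be read from the Linux thermal interface. The sketch below assumes a Linux system that exposes a sensor at `/sys/class/thermal/thermal_zone0/temp` in thousandths of a degree Celsius, as Raspberry Pi OS does for the SoC; on other boards the zone number and the sensor it maps to may differ. The function names are hypothetical.

```python
from pathlib import Path

# Assumed sensor path; the zone index can differ between boards and kernels.
THERMAL_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs reading in millidegrees Celsius to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp_c(path: Path = THERMAL_PATH):
    """Return the temperature in degrees Celsius, or None if unavailable."""
    try:
        return parse_millidegrees(path.read_text())
    except (OSError, ValueError):
        return None


temperature = read_cpu_temp_c()
if temperature is None:
    print("No temperature sensor found at", THERMAL_PATH)
else:
    print(f"Current temperature: {temperature:.1f} °C")
```

Running this periodically while the model generates text shows whether the temperature keeps climbing or settles at a stable value.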