Computer Science > Machine Learning

arXiv:2406.03777 (cs)

[Submitted on 6 Jun 2024 (v1), last revised 13 Jun 2024 (this version, v2)]

Title:Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Authors:Ruiyang Qin, Dancheng Liu, Zheyu Yan, Zhaoxuan Tan, Zixuan Pan, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Jinjun Xiong, Yiyu Shi


Abstract:Scaling laws have become the de facto guidelines for designing large language models (LLMs), but they were studied under the assumption of unlimited computing resources for both training and inference. As LLMs are increasingly used as personalized intelligent assistants, their customization (i.e., learning through fine-tuning) and deployment onto resource-constrained edge devices will become increasingly prevalent. An urgent but open question is how a resource-constrained computing environment affects the design choices for a personalized LLM. We study this problem empirically in this work. In particular, we consider the tradeoffs among a number of key design factors and their intertwined impacts on learning efficiency and accuracy. The factors include the learning methods for LLM customization, the amount of personalized data used for learning customization, the types and sizes of LLMs, the compression methods of LLMs, the amount of time afforded to learn, and the difficulty levels of the target use cases. Through extensive experimentation and benchmarking, we draw a number of surprisingly insightful guidelines for deploying LLMs onto resource-constrained devices. For example, the optimal choice between parameter learning and RAG may vary depending on the difficulty of the downstream task, a longer fine-tuning time does not necessarily help the model, and a compressed LLM may be a better choice than an uncompressed LLM to learn from limited personalized data.

Comments:	Benchmarking paper
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.03777 [cs.LG]
	(or arXiv:2406.03777v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.03777

Submission history

From: Ruiyang Qin
[v1] Thu, 6 Jun 2024 06:41:53 UTC (6,125 KB)
[v2] Thu, 13 Jun 2024 17:00:47 UTC (6,125 KB)
