arXiv:2406.13399 (cs)

[Submitted on 19 Jun 2024]

Title: VELO: A Vector Database-Assisted Cloud-Edge Collaborative LLM QoS Optimization Framework

Authors: Zhi Yao, Zhiqing Tang, Jiong Lou, Ping Shen, Weijia Jia

Abstract: Large Language Models (LLMs) have gained significant popularity and are used extensively across various domains. Most LLM deployments run in cloud data centers, where they encounter substantial response delays and incur high costs, degrading the Quality of Service (QoS) at the network edge. Using vector database caching to store LLM request results at the edge can substantially reduce the response delay and cost of similar requests, an opportunity that previous research has overlooked. To address this gap, this paper introduces a novel Vector database-assisted cloud-Edge collaborative LLM QoS Optimization (VELO) framework. First, we propose the VELO framework, which employs a vector database to cache the results of some LLM requests at the edge and thereby reduce the response time of subsequent similar requests. Unlike direct optimization of the LLM, the VELO framework does not require altering the internal structure of the LLM and is broadly applicable to diverse LLMs. Second, building on the VELO framework, we formulate the QoS optimization problem as a Markov Decision Process (MDP) and devise an algorithm based on Multi-Agent Reinforcement Learning (MARL) to decide whether to request the LLM in the cloud or directly return the results from the vector database at the edge. Moreover, to enhance request feature extraction and expedite training, we refine the policy network of MARL and integrate expert demonstrations. Finally, we implement the proposed algorithm in a real edge system. Experimental results confirm that the VELO framework substantially enhances user satisfaction by simultaneously reducing delay and resource consumption for edge users of LLMs.
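The edge-caching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a fixed cosine-similarity threshold stands in for the MARL policy that decides between the edge cache and the cloud LLM, an in-memory list stands in for a real vector database, and request embeddings are assumed to be computed elsewhere and passed in. All names (`EdgeVectorCache`, `handle_request`, `cloud_llm`, `threshold`) are hypothetical.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class EdgeVectorCache:
    """Toy in-memory stand-in for a vector database at the edge (assumption)."""

    def __init__(self, threshold: float = 0.9) -> None:
        # Hypothetical similarity cutoff; VELO learns this decision with MARL.
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def add(self, vec: Vector, answer: str) -> None:
        self.entries.append((vec, answer))

    def nearest(self, query: Vector) -> Optional[Tuple[float, str]]:
        """Return (similarity, answer) of the closest entry, or None if empty."""
        best: Optional[Tuple[float, str]] = None
        for vec, answer in self.entries:
            score = cosine_similarity(query, vec)
            if best is None or score > best[0]:
                best = (score, answer)
        return best


def handle_request(
    query_vec: Vector,
    prompt: str,
    cache: EdgeVectorCache,
    cloud_llm: Callable[[str], str],
) -> Tuple[str, str]:
    """Serve from the edge cache if a similar request exists, else ask the cloud.

    Returns (answer, source), where source is "edge" or "cloud".
    """
    hit = cache.nearest(query_vec)
    if hit is not None and hit[0] >= cache.threshold:
        return hit[1], "edge"
    answer = cloud_llm(prompt)      # slow, costly path
    cache.add(query_vec, answer)    # cache the result for later similar requests
    return answer, "cloud"
```

In this sketch a request whose embedding is close to a cached one is answered at the edge without contacting the cloud; any other request is forwarded to the cloud and its result cached. A static threshold cannot adapt to load or to per-request QoS needs, which is the kind of decision the paper assigns to the learned policy.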

Comments:	to be published in IEEE ICWS 2024
Subjects:	Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.13399 [cs.AI]
	(or arXiv:2406.13399v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2406.13399

Submission history

From: Zhiqing Tang
[v1] Wed, 19 Jun 2024 09:41:37 UTC (5,542 KB)
