Please use this identifier to cite or link to this item:
http://localhost:8080/xmlui/handle/123456789/3350

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Panda, Sanjaya Kumar | - |
| dc.contributor.author | Dubey, Sankalp | - |
| dc.contributor.author | Mishra, Siba | - |
| dc.date.accessioned | 2025-10-13T04:27:36Z | - |
| dc.date.available | 2025-10-13T04:27:36Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.uri | http://localhost:8080/xmlui/handle/123456789/3350 | - |
| dc.description | NITW | en_US |
| dc.description.abstract | Large language models (LLMs) have gained enormous popularity for processing and generating text. They are a subset of generative artificial intelligence (GenAI) that requires high availability of graphical processing unit (GPU) resources for inference services. However, making GPU resources available in a centralized infrastructure is quite challenging. Therefore, recent works have focused on decentralized physical infrastructure networks (DePIN) to utilize idle GPU resources, enabling scalable LLM inference services across the decentralized network. These inference services may experience inherent latency (measured in time per output token (TPOT)) due to communication overhead between the GPU resources responsible for generating consecutive tokens. The task scheduling algorithm is crucial in decentralized LLM inference services to minimize TPOT and maximize GPU resource utilization, particularly when GPU resources are constrained by computational capacity. This paper introduces two task scheduling algorithms for decentralized LLM serving to achieve these objectives: the improved greedy heuristic shortest path algorithm (IGHSPA) and the dynamic programming-based task scheduling algorithm (DPTSA). Each task involves assigning a layer to a GPU resource, which IGHSPA and DPTSA accomplish using a greedy heuristic and dynamic programming, respectively. Both algorithms are extensively simulated and compared with a recent algorithm, the greedy heuristic shortest path algorithm (GHSPA), in terms of TPOT and execution time (ET). Our simulation results demonstrate that DPTSA improves TPOT by up to 47.50% and 35.50% and ET by up to 99.95% and 35.00%, compared to GHSPA and IGHSPA, respectively. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | IEEE Region 10 Conference 2025 (TENCON 2025) | en_US |
| dc.subject | Large Language Models | en_US |
| dc.subject | Generative Artificial Intelligence | en_US |
| dc.subject | Time Per Output Token | en_US |
| dc.subject | Greedy Heuristic | en_US |
| dc.subject | Dynamic Programming | en_US |
| dc.title | Efficient Task Scheduling Algorithms for Decentralized Large Language Model Serving | en_US |
| dc.type | Other | en_US |
| dcterms.publisher | IEEE Region 10 Conference 2025 (TENCON 2025) | en_US |
| Appears in Collections: | Computer Science & Engineering | |
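The dynamic-programming formulation described in the abstract, assigning each LLM layer to a GPU so that per-layer compute time plus inter-GPU communication latency is minimized, can be viewed as a shortest path through a layered graph. The sketch below is an illustrative assumption of that idea, not the paper's actual DPTSA implementation; the function names and cost model are hypothetical.

```python
import math

def dp_layer_schedule(num_layers, num_gpus, compute_cost, comm_cost):
    """Assign layers 0..num_layers-1 to GPUs minimizing total cost.

    compute_cost(i, g): time for GPU g to run layer i (assumed given).
    comm_cost(h, g): latency to pass activations from GPU h to GPU g.
    Returns (minimum total cost, list of chosen GPU per layer).
    This is a shortest path over a layered graph: one column per layer,
    one node per GPU, solved by dynamic programming.
    """
    # dp[g] = min cost of placing layers 0..i with layer i on GPU g
    dp = [compute_cost(0, g) for g in range(num_gpus)]
    back_pointers = []
    for i in range(1, num_layers):
        nxt, back = [], []
        for g in range(num_gpus):
            # Best predecessor GPU for layer i-1
            best, arg = math.inf, -1
            for h in range(num_gpus):
                c = dp[h] + comm_cost(h, g)
                if c < best:
                    best, arg = c, h
            nxt.append(best + compute_cost(i, g))
            back.append(arg)
        dp = nxt
        back_pointers.append(back)
    # Recover the assignment by walking the back-pointers
    g = min(range(num_gpus), key=lambda x: dp[x])
    path = [g]
    for back in reversed(back_pointers):
        g = back[g]
        path.append(g)
    path.reverse()
    return min(dp), path
```

With L layers and G GPUs this runs in O(L·G²) time, which is the usual payoff of dynamic programming over enumerating all G^L assignments; a greedy heuristic such as GHSPA/IGHSPA trades optimality for lower execution time.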
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 2025299379.pdf | | 302.38 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.