Please use this identifier to cite or link to this item:
http://localhost:8080/xmlui/handle/123456789/3350

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Panda, Sanjaya Kumar | - |
| dc.contributor.author | Dubey, Sankalp | - |
| dc.contributor.author | Mishra, Siba | - |
| dc.date.accessioned | 2025-10-13T04:27:36Z | - |
| dc.date.available | 2025-10-13T04:27:36Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.uri | http://localhost:8080/xmlui/handle/123456789/3350 | - |
| dc.description | NITW | en_US |
| dc.description.abstract | Large language models (LLMs) have gained enormous popularity for processing and generating text. They are a subset of generative artificial intelligence (GenAI) that requires high availability of graphical processing unit (GPU) resources for inference services. However, making GPU resources available in a centralized infrastructure is quite challenging. Therefore, recent works have focused on decentralized physical infrastructure networks (DePIN) to utilize idle GPU resources, enabling scalable LLM inference services across the decentralized network. These inference services may experience inherent latency (measured in time per output token (TPOT)) due to communication overhead between the GPU resources responsible for generating consecutive tokens. The task scheduling algorithm is crucial in decentralized LLM inference services to minimize TPOT and maximize GPU resource utilization, particularly when GPU resources are constrained by computational capacity. This paper introduces two task scheduling algorithms for decentralized LLM serving to achieve these objectives: the improved greedy heuristic shortest path algorithm (IGHSPA) and the dynamic programming-based task scheduling algorithm (DPTSA). Each task involves assigning a layer to a GPU resource, which IGHSPA and DPTSA accomplish using a greedy heuristic and dynamic programming, respectively. Both algorithms are extensively simulated and compared with a recent algorithm, the greedy heuristic shortest path algorithm (GHSPA), in terms of TPOT and execution time (ET). Our simulation results demonstrate that DPTSA improves TPOT by up to 47.50% and 35.50% and ET by up to 99.95% and 35.00%, compared to GHSPA and IGHSPA, respectively. | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | IEEE Region 10 Conference 2025 (TENCON 2025) | en_US |
| dc.subject | Large Language Models | en_US |
| dc.subject | Generative Artificial Intelligence | en_US |
| dc.subject | Time Per Output Token | en_US |
| dc.subject | Greedy Heuristic | en_US |
| dc.subject | Dynamic Programming | en_US |
| dc.title | Efficient Task Scheduling Algorithms for Decentralized Large Language Model Serving | en_US |
| dc.type | Other | en_US |
| dcterms.publisher | IEEE Region 10 Conference 2025 (TENCON 2025) | en_US |
| Appears in Collections: | Computer Science & Engineering | |
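The dynamic-programming formulation described in the abstract, assigning each LLM layer to a GPU so that per-layer compute time plus inter-GPU communication latency is minimized, can be viewed as a shortest path through a layered graph. The sketch below is an illustrative assumption of that idea, not the paper's actual DPTSA implementation; the function names and cost model are hypothetical.

```python
import math

def dp_layer_schedule(num_layers, num_gpus, compute_cost, comm_cost):
    """Assign layers 0..num_layers-1 to GPUs minimizing total cost.

    compute_cost(i, g): time for GPU g to run layer i (assumed given).
    comm_cost(h, g): latency to pass activations from GPU h to GPU g.
    Returns (minimum total cost, list of chosen GPU per layer).
    This is a shortest path over a layered graph: one column per layer,
    one node per GPU, solved by dynamic programming.
    """
    # dp[g] = min cost of placing layers 0..i with layer i on GPU g
    dp = [compute_cost(0, g) for g in range(num_gpus)]
    back_pointers = []
    for i in range(1, num_layers):
        nxt, back = [], []
        for g in range(num_gpus):
            # Best predecessor GPU for layer i-1
            best, arg = math.inf, -1
            for h in range(num_gpus):
                c = dp[h] + comm_cost(h, g)
                if c < best:
                    best, arg = c, h
            nxt.append(best + compute_cost(i, g))
            back.append(arg)
        dp = nxt
        back_pointers.append(back)
    # Recover the assignment by walking the back-pointers
    g = min(range(num_gpus), key=lambda x: dp[x])
    path = [g]
    for back in reversed(back_pointers):
        g = back[g]
        path.append(g)
    path.reverse()
    return min(dp), path
```

With L layers and G GPUs this runs in O(L·G²) time, which is the usual payoff of dynamic programming over enumerating all G^L assignments; a greedy heuristic such as GHSPA/IGHSPA trades optimality for lower execution time.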
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| 2025299379.pdf | | 302.38 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.