About me

I am a professor in Department of Computer Science at Tianjin University and a member of TANK Lab, led by Prof.Keqiu Li. I received my Ph.D. degree from Networked Systems Lab at University of Southern California, advised by Prof.Ramesh Govidan. I obtained my B.S. degree at Shanghai Jiao Tong University, advised by Prof. Xinbing Wang.

My research interests include agentic systems, large language model (LLM) systems, and AI for Science (AI4Science). In collaboration with research institutions like IBM Watson, Samsung Research and Microsoft Research, I have published tens of papers at the leading conferences/journals, including Nature, ASPLOS, EuroSys, SoCC, Ubicomp, INFOCOM, IWQoS, SIGCOMM, TPDS and TC. I have received honors such as U35 Outstanding Talent Award from Tianjin University and Best Paper Award from ACM SoCC'24. I have served as program committees for conferences such as INFOCOM and HPCA, and am an associate editor for IEEE Transactions on Computers.

I have developed Twen.ai, an university Q&A large language model. Empowered by RAG techniques, Twen addresses daily questions from students and faculties in areas such as daily life, scholarship selection, further studies, etc. Twen is officially released in April 2024, and serves thousands of requests each day since then. Recently, I am actively developing Xiaotian, the official AI mentor that is available to all students at Tianjin University since September 2025.

I am looking for self-motivated students interested in building systems for large language model and AI4Science. Feel free to drop me an email if you want to join us!

Selected Publications

[ICML 26] Mosaic: Unlocking Long-Context Inference for Diffusion LLMs via Global Memory Planning and Dynamic Peak Taming [paper][code]
[ICML 26] KernelBand: Boosting LLM-based Kernel Optimization with a Hierarchical and Hardware-aware Multi-armed Bandit [paper]
[EuroSys 26] PARD: Enhancing Goodput for Inference Pipeline via Proactive Request Dropping (CCF-A) [paper]
[ASPLOS 26] PAT: Accelerating LLM Decoding via Prefix-Aware Attention with Resource Efficient Multi-Tile Kernel (CCF-A) [paper][code][slides]
[TPDS 26] ViDA: Lossless VideoQA Acceleration via Selective Sparse Self-Speculation with Parallel Computational Load Management (CCF-A) [paper]
[TPDS 26] Accelerating ML Inference via Opportunistic Pre-Loading on Serverless Clusters (CCF-A) [paper]
[Nature 25] Delocalized Electrolyte Design Enables 600 Wh kg⁻¹ Lithium Metal Pouch Cells [paper]
[TC 25] TightLLM: Maximizing Throughput for LLM Inference via Adaptive Offloading Policy (CCF-A) [paper]
[TC 25] SLOpt: Serving Real-Time Inference Pipeline with Strict Latency Constraint (CCF-A) [paper]
[INFOCOM 25] Harpagon: Minimizing DNN Serving Cost via Efficient Dispatching, Scheduling and Splitting (CCF-A) [paper]
[INFOCOM 25] Lark: A Buffer-aware Building Block for Programmable Packet Scheduling in Datacenters (CCF-A) [paper]
[SoCC 24] Pre-Warming is Not Enough: Accelerating Serverless Inference With Opportunistic Pre-Loading (CCF-B, Best Paper Award) [paper]
[SIGCOMM 24] PPT: A Pragmatic Transport for Datacenters (CCF-A) [paper]
[ASPLOS 24] FUYAO: DPU-enabled Direct Data Transfer for Serverless Computing (CCF-A) [paper]
[IWQoS 23] High-throughput Sampling, Communicating and Training for Reinforcement Learning Systems (CCF-B) [paper]
[TPDS 23] Accelerating Data Delivery of Latency-Sensitive Applications in Container Overlay Network (CCF-A) [paper]
[SoCC 21] Scrooge: A Cost-Effective Deep Learning Inference System (CCF-B) [paper]
[IoTDI 21] Rim: Offloading Inference to the Edge [paper]
[Middleware 18] Olympian: Scheduling GPU Usage in a Deep Neural Network Model Serving System (CCF-B) [paper]
[Ubicomp 16] ALPS: Accurate Landmark Positioning at City Scales (CCF-A) [paper]
[INFOCOM 14] Critical Sensing Range for Mobile Heterogeneous Camera Sensor Networks (CCF-A) [paper]
[Preprint 26] RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems [paper][code][huggingface]
[Preprint 26] ServerlessLoRA: Minimizing Latency and Cost in Serverless Inference for LoRA-Based LLMs [paper]

Professional Activities

Associate Editor for IEEE Transactions on Computers
Program Committee for INFOCOM 2027, HPCA 2027, RecSys 2026, ICCCN 2023-2026, ICA3PP 2023-2025
Artifact Evaluation Committee for ASPLOS 2026, MLSys 2026
Journal Reviewer for IEEE ToN, IEEE TMC, IEEE TPDS

Honors and Awards

Xiaomi Young Scholar, Xiaomi Corporation, 2026
Outstanding Undergraduate Thesis Advisor, Tianjin University, 2025
U35 Outstanding Talent Award, Tianjin University, 2025
Best Paper Award, ACM SoCC, 2024
Outstanding Young Academic Talent Award, Tianjin University, 2024
Qiming Scholar, Tianjin University, 2023
Chun-Tsung Scholar (1st at SJTU), Shanghai Jiao Tong University, 2014
Valedictorian at SEIEE, Shanghai Jiao Tong University, 2014

Teaching

Computer Systems, TJU, 23Spring, 24Spring, 25Spring, 26Spring
Design and Analysis of Algorithms, TJU, 23Fall, 25Spring, 26Spring
Introduction to Internetworking, USC, 16Spring

Students

PhD Students

Zhixin Zhao (2022 - Now)¹
Guotao Yang (2023 - Now)¹
Liang Zheng (2024 - Now)²
Zhengchao Wang (2025 - Now)
Weiwei Sun (2025 - Now)¹
Xinpei Wang (2026 - Now)

Master Students

Jingyuan Xiao (2024 - Now)
Jinjun Yi (2024 - Now)
Tao Wang (2024 - Now)¹
Yongfeng Wang (2025 - Now)
Shi Chen (2025 - Now)
Kaining Hui (2025 - Now)
Siwei He (2025 - Now)
Jianing Ye (2025 - Now)
Bowen Shi (2025 - Now)
Hanyang Xie (2025 - Now)

Undergraduate Students

Rui Guo (2024 - Now)
Hao Chen (2024 - Now)
Ke Yan (2025 - Now)
Mingxi Zhao (2025 - Now)
Zhuxuan Chang (2025 - Now)
Yongzhi Shi (2025 - Now)
Fujiang Liu (2025 - Now)
Jiazheng Yu (2025 - Now)
Fengkai Zhu (2025 - Now)
Longfei Yin (2025 - Now)
Haopeng Li (2025 - Now)
Jingyang Pan (2025 - Now)
Xinqiang Yu (2025 - Now)
Weibo Xu (2025 - Now)
Tengyi Wang (2025 - Now)
Zhike Guo (2025 - Now)
Yipeng Wu (2025 - Now)

Alumni