About me

I am a tenure-track assistant professor in the Department of Computer Science at Tianjin University and a member of TANK Lab, led by Prof. Keqiu Li. I received my Ph.D. from the Networked Systems Lab at the University of Southern California, advised by Prof. Ramesh Govindan. I obtained my B.S. degree from Shanghai Jiao Tong University, advised by Prof. Xinbing Wang.

My research interests include large language model (LLM) systems, deep neural network (DNN) systems, performance analysis and optimization, and parallel and distributed computing. My recent work focuses on building inference systems that deploy LLM and DNN models in large-scale cloud clusters with high performance, efficiency and scalability, through techniques such as computational acceleration, parallel optimization, and resource orchestration. In collaboration with research institutions such as IBM Watson, Samsung Research and Microsoft Research, I have published dozens of papers at leading conferences and journals, including SoCC, UbiComp, INFOCOM, IWQoS, ASPLOS, SIGCOMM and TPDS. My research has been funded by NSFC and other agencies. I have received honors such as the Outstanding Young Academic Talent Award from Tianjin University and the Best Paper Award at SoCC'24.

Recently, I have been actively developing Twen.ai, the very first university Q&A large language model. Empowered by retrieval-augmented generation (RAG), Twen answers everyday questions from students and faculty on topics such as campus life, scholarship selection, and further studies. Twen was officially released in April 2024 and has served thousands of requests per day since then.

I am looking for self-motivated students interested in building systems for large language models and deep neural networks. Feel free to drop me an email if you want to join us!

Research

My research aims to build inference systems that deploy LLM and DNN models in large-scale cloud clusters with peak performance, efficiency and scalability.

  • Large Language Model Systems

    • Serving Classic LLMs: Serving LLM applications brings new challenges due to their huge memory consumption and unpredictable output lengths. We designed novel LLM inference systems (qLLM, tgLLM) to minimize job completion time across LLM requests and to maximize model throughput and resource utilization. We also built inference systems (InferRAG, InferMM) to manage computation resources in scenarios such as RAG and multi-modal serving.
    • Serving Specialized LLMs: Recent innovations in LLM architecture also bring new challenges. We designed specialized inference systems (SpecInfer, ParaMoE) to optimize the inference pipeline for speculative decoding and mixture-of-experts models. We have also investigated topics such as lookahead decoding, LoRA serving, and KV-cache optimization.
  • Deep Neural Network Systems

    • Latency-Sensitive Inference: To guarantee a good user experience, DNN-based applications are usually associated with a latency objective. We designed model orchestration systems (Harpagon, DeepLat, TopInfer) to minimize serving cost under latency objectives via techniques such as dynamic batching, request dispatching and configuration decoupling (see the sketch after this list). We also built resource scaling systems (SLOpt, DeepChain) to maximize system goodput under bursty workloads via techniques such as AoT compilation and model pre-warmup.
    • Complex Scenarios: Depending on the use case, DNN-based applications face diverse deployment requirements. We designed multi-stage inference systems (Scrooge, Rim, Olympian) to manage DNN models in edge/cloud GPU clusters via techniques such as model co-location and model promotion. We also built specialized systems (ALPS, HRL) to handle complex scenarios such as multi-modal input and heterogeneous hardware.
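
To make the dynamic-batching idea above concrete, here is a minimal, self-contained Python sketch of SLO-aware dynamic batching: the server keeps queuing requests as long as the oldest one can still meet its latency objective after an estimated batch execution time (or until the batch-size cap is hit), and dispatches otherwise. The class names, the linear batch-latency model, and all numbers are illustrative assumptions, not the actual design of Harpagon, DeepLat or TopInfer.

    # Illustrative sketch of SLO-aware dynamic batching; all names and
    # parameters are assumptions for exposition, not a published system.
    import time
    from collections import deque
    from dataclasses import dataclass


    @dataclass
    class Request:
        arrival: float  # time the request entered the queue (seconds)


    class DynamicBatcher:
        def __init__(self, slo_s: float, max_batch: int,
                     base_latency_s: float, per_item_latency_s: float):
            self.slo_s = slo_s              # end-to-end latency objective
            self.max_batch = max_batch      # hardware/memory batch cap
            self.base = base_latency_s      # fixed per-batch cost (assumed model)
            self.per_item = per_item_latency_s  # marginal cost per item (assumed model)
            self.queue: deque[Request] = deque()

        def _exec_time(self, batch_size: int) -> float:
            # Assumed linear batch-latency model: t(b) = base + per_item * b.
            return self.base + self.per_item * batch_size

        def submit(self, req: Request) -> None:
            self.queue.append(req)

        def maybe_dispatch(self, now: float) -> list[Request] | None:
            # Dispatch a batch only when waiting longer would risk the SLO
            # of the oldest queued request, or when the batch cap is reached.
            if not self.queue:
                return None
            oldest_wait = now - self.queue[0].arrival
            next_size = min(len(self.queue) + 1, self.max_batch)
            can_wait = oldest_wait + self._exec_time(next_size) < self.slo_s
            if len(self.queue) < self.max_batch and can_wait:
                return None
            return [self.queue.popleft()
                    for _ in range(min(len(self.queue), self.max_batch))]


    if __name__ == "__main__":
        batcher = DynamicBatcher(slo_s=0.100, max_batch=8,
                                 base_latency_s=0.010, per_item_latency_s=0.005)
        start = time.monotonic()
        for i in range(5):
            batcher.submit(Request(arrival=start + 0.01 * i))
        batch = batcher.maybe_dispatch(now=start + 0.09)
        print(f"dispatched batch of {len(batch) if batch else 0}")

In a real serving system the batch-latency model would be profiled per model and per GPU rather than assumed linear, and the dispatch check would run on every arrival and timer tick.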

Selected Publications

Honors and Awards

  • Best Paper Award, SoCC, 2024
  • Outstanding Young Academic Talent Award, Tianjin University, 2024
  • Qiming Scholar, Tianjin University, 2023
  • Chun-Tsung Scholar (1st at SJTU), Shanghai Jiao Tong University, 2014
  • Valedictorian at SEIEE, Shanghai Jiao Tong University, 2014

Teaching

  • Computer Systems, TJU, 23Spring, 24Spring
  • Design and Analysis of Algorithms, TJU, 23Fall
  • Introduction to Internetworking, USC, 16Spring

Students

  • Zhixin Zhao (PhD, 2022 - Now)
  • Guotao Yang (PhD, 2023 - Now)
  • Liang Zheng (PhD, 2024 - Now)
  • Jiaheng Gao (MS, 2022 - Now)
  • Linxuan Li (MS, 2022 - Now)
  • Ziqi Gong (MS, 2023 - Now)
  • Chen Shen (MS, 2023 - Now)
  • Jingyuan Xiao (MS, 2024 - Now)
  • Jinjun Yi (MS, 2024 - Now)
  • Zhengchao Wang (MS, 2024 - Now)
  • Tao Wang (MS, 2024 - Now)
  • Wenxin Zhu (BS, 2023 - Now)
  • Mingfang Ji (BS, 2023 - Now)
  • Kai Zeng (BS, 2023 - Now)
  • Zhenyi Zhong (BS, 2024 - Now)
  • Ke Wang (BS, 2024 - Now)
  • Junhao Li (BS, 2024 - Now)
  • Hao Ding (BS, 2024 - Now)

Alumni