Also available at Google Scholar, DBLP and ORCID.
[P8] Xianwei Zhang, John Kalamatianos and Bradford Beckmann
GPU Cache Management based on Lightweight Locality Type Detection, US11487671B2, 2022.
[P7] Sooraj Puthoor, Kishore Punniyamurthy, Onur Kayiran, Xianwei Zhang, Yasuko Eckert, Johnathan Alsop and Bradford Michael Beckmann
Memory Request Priority Assignment Techniques for Parallel Processors, US11507522B2, 2022.
[P6] Seyed Mohammad Seyedzadehdelcheh, Xianwei Zhang, Bradford Beckmann and Shomit N. Das
Data Compression System Using Base Values and Methods Thereof, US11740791B2, 2023.
[P5] Anthony T. Gutierrez, Sergey Blagodurov, Scott A. Moe, Xianwei Zhang, Jieming Yin and Matthew D. Sinclair
Selecting a Precision Level for Executing a Workload in an Electronic Device, US11150899B2, 2021.
[P4] Xianwei Zhang, Yuhao Gu, Zhiguang Chen and Yutong Lu
A Programmable Data Exchange System, ZL 2025 1 0077351.0
[P3] Zewei Mo, Xianwei Zhang, Tianao Ge and Yutong Lu
A Compiler-Based Automatic Multi-Stream Scheduling Method for Kernel Functions, ZL 2022 1 0172808.2
[P2] Zewei Mo, Xianwei Zhang and Tianao Ge
An Error-Controllable Automated Optimization Method for Mixed-Precision Operators, ZL 2021 1 1551663.9
[P1] Tianao Ge, Xianwei Zhang, Zewei Mo and Yutong Lu
A Loop-Folding-Based Binary Code Size Optimizer, ZL 2022 1 0154571.5
Note: (Co-)Supervised Student, Corresponding Author#
[J7] Wenxuan Pan, Zejia Lin, Jiangsu Du# and Xianwei Zhang#
HuntKTm: Hybrid Scheduling and Automatic Management for Efficient Kernel Execution on Modern GPUs (CCF-A)
ACM Transactions on Architecture and Code Optimization (TACO).
[C29] Hongxin Xu=, Tianyu Guo= and Xianwei Zhang# (=Equal Contribution)
DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism (CCF-A)
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), San Diego, CA, United States, December 2025.
[C28] Yuhao Gu, Haoquan Chen, Xianjie Chen, Jiangsu Du, Zhiguang Chen, Nong Xiao#, Xianwei Zhang# and Yutong Lu
coMtainer: Compilation-assisted HPC Container Images with Enhanced Adaptability (CCF-A)
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), St. Louis, MO, United States, November 2025.
[C27] Tianyu Guo, Xianwei Zhang#, Jiangsu Du, Zhiguang Chen#, Nong Xiao and Yutong Lu
gLLM: Global Balanced Pipeline Parallelism Systems for Distributed LLMs Serving with Token Throttling (CCF-A)
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), St. Louis, MO, United States, November 2025.
[C26] Han Huang, Jiabin Xie, Guangnan Feng, Xianwei Zhang, Dan Huang, Zhiguang Chen and Yutong Lu#
HStencil: Matrix-Vector Stencil Computation with Interleaved Outer Product and MLA (CCF-A)
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), St. Louis, MO, United States, November 2025.
[C25] Xuanteng Huang, Jiangsu Du, Nong Xiao and Xianwei Zhang#
PaSK: Cold Start Mitigation for Inference with Proactive and Selective Kernel Loading on GPUs (CCF-A)
The 62nd ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, United States, June 2025.
[C24] Kan Wu, Zejia Lin, Mengyue Xi, Zhongchun Zheng, Wenxuan Pan, Xianwei Zhang# and Yutong Lu#
GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving (CCF-A)
The 62nd ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, United States, June 2025.
[C23] Yuhao Gu, Chunyu Chen, Jiangsu Du, Xiaoxi Zhang and Xianwei Zhang#
ORFA: Exploring WebAssembly as a Turing Complete Query Language for Web APIs (CCF-A)
The ACM Web Conference (WWW), Sydney, NSW, Australia, April 2025.
[C22] Mengyue Xi, Jingyi He and Xianwei Zhang#
CacheC: LLM-based GPU Cache Management to Enhance Kernel Concurrency (CCF-B)
The 31st International European Conference on Parallel and Distributed Computing (Euro-Par), Dresden, Germany, August 2025.
[C21] Tianyu Guo, Hande Dong#, Yichong Leng, Feng Liu, Cheater Lin, Nong Xiao and Xianwei Zhang#
EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse (CCF-B)
The 31st International European Conference on Parallel and Distributed Computing (Euro-Par), Dresden, Germany, August 2025.
[C20] Mengyue Xi, Tianyu Guo, Xuanteng Huang, Zejia Lin and Xianwei Zhang#
Mpache: Interaction Aware Multi-level Cache Bypassing on GPUs (CCF-C)
The 30th Asia and South Pacific Design Automation Conference (ASP-DAC), Tokyo Odaiba Miraikan, Japan, January 2025.
[J6] Hengzhong Liang, Han Huang and Xianwei Zhang#
SuCL: Supply Unified Communication Layer to Improve SYCL-based Heterogeneous Computing (CCF-C)
CCF Transactions on High Performance Computing (THPC), 2025.
[J5] Pin Chen, Qing Mo, Zexin Xu, Xianwei Zhang and Yutong Lu#
Star-gen: An HPC-AI Framework for Constructing Large-scale Computational Materials Database (CCF-C)
CCF Transactions on High Performance Computing (THPC), 2025.
[C19] Tianyu Guo, Xuanteng Huang, Kan Wu, Xianwei Zhang# and Nong Xiao
SMILE: LLC-based Shared Memory Expansion to Improve GPU Thread Level Parallelism (CCF-A)
The 61st ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, United States, June 2024.
[C18] Yuanxin Wei, Jiangsu Du#, Jiazhi Jiang, Xiao Shi, Xianwei Zhang, Dan Huang#, Nong Xiao and Yutong Lu
APTMoE: Affinity-aware Pipeline Tuning for MoE Models on Bandwidth-constrained GPU Nodes (CCF-A)
The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), Atlanta, GA, United States, November 2024.
[C17] Zejia Lin, Aoyuan Sun, Xianwei Zhang# and Yutong Lu
MixPert: Optimizing Mixed-precision Floating-point Emulation on GPU Integer Tensor Cores (CCF-B)
The 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), Copenhagen, Denmark, June 2024.
[C16] Zhaowen Shan, Xuanteng Huang, Zheng Zhou and Xianwei Zhang#
openLG: A Tunable and Efficient Open-source LSTM on GPUs (CCF-C)
The International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, June 2024.
[C15] Zhongchun Zheng, Yuan Wu and Xianwei Zhang#
mLOOP: Optimize Loop Unrolling in Compilation with a ML-based Approach
The 17th International Conference on Networking, Architecture, and Storage (NAS), Guangzhou, China, November 2024.
[C14] Zejia Lin, Zewei Mo, Xuanteng Huang, Xianwei Zhang# and Yutong Lu
KeSCo: Compiler-based Kernel Scheduling for Multi-task GPU Applications (CCF-B)
The IEEE 41st International Conference on Computer Design (ICCD), Washington DC, United States, November 2023.
[J4] Xuanteng Huang, Xianwei Zhang#, Panfei Yang# and Nong Xiao
Benchmarking GPU Tensor Cores on General Matrix Multiplication Kernels through CUTLASS
Applied Sciences, December 2023.
[J3] Xi Zhang#, Xiaohu Guo, Yue Weng, Xianwei Zhang, Yutong Lu and Zhong Zhao
Hybrid MPI and CUDA Paralleled Finite Volume Unstructured CFD Simulations on a Multi-GPU System (CCF-C)
Future Generation Computer Systems (FGCS), 139 (2023), February 2023.
[W5] Lianghong Huang, Zejia Lin, Wei Liu# and Xianwei Zhang#
Hay: Enhancing GPU Sharing Performance With Two-Level Scheduling for Ray
The 29th IEEE International Conference on Parallel and Distributed Systems (ICPADS, short), Hainan, China, December 2023.
[C13] Tianao Ge, Zewei Mo, Kan Wu, Xianwei Zhang# and Yutong Lu
RollBin: Reducing Code-size via Loop Rerolling at Binary Level (CCF-B)
The 23rd ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), San Diego, California, United States, June 2022.
[C12] Zewei Mo, Zejia Lin, Xianwei Zhang# and Yutong Lu
moTuner: A Compiler-based Auto-tuning Approach for Mixed-precision Operators (CCF-C)
The 19th ACM International Conference on Computing Frontiers (CF), Turin, Piedmont, Italy, May 2022.
[C11] Yue Weng, Tianao Ge, Xianwei Zhang# and Yutong Lu
RAISE: Efficient GPU Resource Management via Hybrid Scheduling (CCF-C)
The 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Taormina (Messina), Italy, May 2022.
[J2] Yue Weng, Xi Zhang, Xiaohu Guo, Xianwei Zhang#, Yutong Lu and Yang Liu
Effects of Mesh Loop Modes on Performance of Unstructured Finite Volume GPU Simulations
Advances in Aerodynamics (AIA), 3(21), 2021.
[W4] Xianwei Zhang and Evgeny Shcherbakov
DELTA: Validate GPU Memory Profiling with Microbenchmarks
The International Symposium on Memory Systems (MemSys, short), Washington D.C., USA, October 2020.
[C10] Tuan Ta, Xianwei Zhang, Anthony Gutierrez and Brad Beckmann
Autonomous Data-Race-Free GPU Testing
IEEE International Symposium on Workload Characterization (IISWC), Orlando, Florida, USA, November 2019.
[C9] Xianwei Zhang, Rujia Wang, Youtao Zhang and Jun Yang
Boosting Chipkill Capability under Retention-error Induced Reliability Emergency (CCF-C)
The 24th Asia and South Pacific Design Automation Conference (ASP-DAC), Tokyo, Japan, January 2019.
[W3] John Alsop, Matt Sinclair, Srikant Bharadwaj, Anthony Gutierrez, Xianwei Zhang, Brad Beckmann, Alex Dutu, Onur Kayiran, Michael LeBeane, Brandon Potter, Sooraj Puthoor and Tsung Tai Yeh
Optimizing GPU Cache Policies for MI Workloads
IEEE International Symposium on Workload Characterization (IISWC, short), Orlando, Florida, USA, November 2019.
[C8] Anthony Gutierrez, Brad Beckmann, Alexandru Dutu, Joseph Gross, Michael LeBeane, John Kalamatianos, Onur Kayiran, Matthew Poremba, Brandon Potter, Sooraj Puthoor, Matt Sinclair, Mark Wyse, Jieming Yin, Xianwei Zhang, Akshay Jain and Tim Rogers
Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level (CCF-A)
The 24th IEEE International Symposium on High-Performance Computer Architecture (HPCA), Vienna, Austria, February 2018.
[C7] Xianwei Zhang, Youtao Zhang, Bruce R. Childers and Jun Yang
DrMP: Mixed Precision-aware DRAM for High Performance Approximate and Precise Computing (CCF-B)
The 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), Portland, Oregon, USA, September 2017.
[J1] Xianwei Zhang, Youtao Zhang, Bruce R. Childers and Jun Yang
On the Restore Time Variations of Future DRAM Memory (CCF-B)
ACM Trans. on Design Automation of Electronic Systems (TODAES), 22(2), February 2017.
[W2] Xianwei Zhang, Youtao Zhang, Bruce R. Childers and Jun Yang
AWARD: Approximation-aWAre Restore in Further Scaling DRAM
The International Symposium on Memory Systems (MemSys, extended abstract), Washington D.C., USA, October 2016.
[C6] Xianwei Zhang, Youtao Zhang, Bruce R. Childers and Jun Yang
Restore Truncation for Performance Improvement in Future DRAM Systems (CCF-A)
The 22nd IEEE Symposium on High Performance Computer Architecture (HPCA), Barcelona, Spain, March 2016.
[C5] Xianwei Zhang, Youtao Zhang, Bruce R. Childers and Jun Yang
Exploiting DRAM Restore Time Variations in Deep Sub-micron Scaling (CCF-B)
The IEEE Conference on Design, Automation and Test in Europe (DATE), Grenoble, France, March 2015.
[C4] Xianwei Zhang, Youtao Zhang and Jun Yang
DLB: Dynamic Lane Borrowing for Improving Bandwidth and Performance in Hybrid Memory Cube (CCF-B)
The 33rd IEEE International Conference on Computer Design (ICCD), New York City, USA, October 2015.
[C3] Xianwei Zhang, Youtao Zhang and Jun Yang
TriState-SET: Proactive SET for Improved Performance in MLC Phase Change Memories (CCF-B)
The 33rd IEEE International Conference on Computer Design (ICCD), New York City, USA, October 2015.
[C2] Xianwei Zhang, Lei Zhao, Youtao Zhang and Jun Yang
Exploit Common Source-Line to Construct Energy Efficient Domain Wall Memory based Caches (CCF-B)
The 33rd IEEE International Conference on Computer Design (ICCD), New York City, USA, October 2015.
[W1] Xianwei Zhang, Youtao Zhang and Jun Yang
Adaptive Lane Borrowing of Hybrid Memory Cube
The 52nd ACM/IEEE Design Automation Conference (DAC, wip), San Francisco, California, USA, June 2015.
[C1] Xianwei Zhang, Lei Jiang, Youtao Zhang, Chuanjun Zhang and Jun Yang
WoM-SET: Lowering Write Power of Proactive-SET based PCM Write Strategy Using WoM Code (CCF-C), (Best Paper Award)
The International Symposium on Low Power Electronics and Design (ISLPED), Beijing, China, September 2013.