Due to the high computational cost of DNN inference, there has been an intense effort to design hardware that performs the computation efficiently. On the commercial side, Google's TPUs [1] are available through Google Cloud, and NVIDIA GPUs have grown more specialized with the introduction of Tensor Cores in recent years. Many newer companies are also working in this space [2, 3, 4].
My research in this area centers on hardware/algorithm co-design. Specifically, I am interested in building sparsity, quantization, and adaptive computation directly into the hardware design. I mainly focus on architectures built around systolic arrays (similar to the TPU).
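To make the systolic-array idea concrete, here is a minimal Python sketch of the weight-stationary dataflow (the style the TPU popularized): each processing element (PE) holds one weight, activations stream in from the left with a one-cycle skew per row, and partial sums accumulate as they flow down each column. The code and its names are my own illustration of the general technique, not the design from any of the papers below.

```python
import numpy as np

def systolic_matmul(X, W):
    """Cycle-by-cycle sketch of a weight-stationary systolic array
    computing X @ W. PE (k, n) permanently holds weight W[k, n];
    activations stream rightward through row k, partial sums flow
    downward through column n, and row k's input stream is skewed
    by k cycles so matching operands meet in each PE."""
    M, K = X.shape
    K2, N = W.shape
    assert K == K2, "inner dimensions must match"
    a = np.zeros((K, N))   # activation register of each PE (moves right)
    p = np.zeros((K, N))   # partial-sum register of each PE (moves down)
    Y = np.zeros((M, N))
    for t in range(M + K + N):           # run until the pipeline drains
        a_new = np.zeros_like(a)
        p_new = np.zeros_like(p)
        for k in range(K):
            for n in range(N):
                if n == 0:
                    # Left edge: feed X[m, k] at cycle t = m + k (the skew).
                    a_in = X[t - k, k] if 0 <= t - k < M else 0.0
                else:
                    a_in = a[k, n - 1]   # from the left neighbor
                p_above = p[k - 1, n] if k > 0 else 0.0
                p_new[k, n] = p_above + W[k, n] * a_in   # one MAC per PE
                a_new[k, n] = a_in                       # pass activation on
        a, p = a_new, p_new
        # One finished output element exits the bottom of each column.
        for n in range(N):
            m = t - (K - 1) - n
            if 0 <= m < M:
                Y[m, n] = p[K - 1, n]
    return Y

X = np.random.randn(5, 4)
W = np.random.randn(4, 3)
assert np.allclose(systolic_matmul(X, W), X @ W)
```

The input skew is what lets every MAC in the grid fire on every cycle while each activation still meets the correct partial sum, which is why this structure maps so naturally to dense matrix multiplication.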
Term Quantization: Furthering Quantization at Run Time
H. T. Kung, B. McDanel, S. Zhang.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020.
paper
Maestro: A Memory-on-Logic Architecture for Coordinated Parallel Use of Many Systolic Arrays
H. T. Kung, B. McDanel, S. Zhang, X. Dong, C. Chen.
30th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2019.
paper
Full-Stack Optimization for Accelerating CNNs Using Powers-of-Two Weights with FPGA Validation
B. McDanel, S. Zhang, H. T. Kung, X. Dong.
33rd ACM International Conference on Supercomputing (ICS), 2019.
paper
Systolic Building Block for Logic-on-Logic 3D-IC Implementations of Convolutional Neural Networks
H. T. Kung, B. McDanel, S. Zhang, C. T. Wang, J. Cai, C. Y. Chen, V. Chang, M. F. Chen, J. Sun, and D. Yu.
IEEE International Symposium on Circuits and Systems (ISCAS), 2019.
paper
Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization
H. T. Kung, B. McDanel, and S. Zhang.
24th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2019.
paper | code
Adaptive Tiling: Applying Fixed-Size Systolic Arrays to Sparse Convolutional Neural Networks
H. T. Kung, B. McDanel, S. Zhang.
24th International Conference on Pattern Recognition (ICPR), 2018.
paper
Mapping Systolic Arrays onto 3D Circuit Structures: Accelerating Convolutional Neural Network Inference
H. T. Kung, B. McDanel, S. Zhang.
IEEE International Workshop on Signal Processing Systems (SiPS), 2018.
paper