My research spans the intersection of deep learning, hardware architecture, and computer networks, with a particular focus on efficient algorithms and systems for deploying artificial intelligence. I have extensive experience optimizing deep neural networks (DNNs) through approaches including systolic array implementations, sparse architectures, and quantization techniques. My work has contributed both theoretical frameworks and practical implementations for accelerating DNN inference on edge devices, especially in distributed network environments. Recently, I have expanded my research to address efficiency challenges in emerging AI systems, particularly Large Language Models (LLMs) and multimodal architectures, as evidenced by my work on speculative decoding and transformer optimization.
Ken Kawamura, Zeqian Li, Chit-Kwan Lin, Bradley McDanel
Joint Workshop of the 8th Financial Technology and Natural Language Processing (FinNLP) and the 1st Agent AI for Scenario Planning (AgentScen) at IJCAI, 2024
[paper]