Efficient MoE inference and training
We develop a \texttt{H}eterogeneous-aware \texttt{EX}pert \texttt{A}llocation framework, \textbf{\texttt{HEXA-MoE}}, with significantly enhanced computational efficiency.
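To make the notion of expert allocation concrete, below is a minimal PyTorch sketch of top-$k$ expert routing in a standard Mixture-of-Experts layer. It illustrates generic MoE dispatch only; the class and variable names are ours, and it does not implement HEXA-MoE's heterogeneity-aware allocation scheme.

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k routed MoE layer (illustrative, not HEXA-MoE)."""
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)    # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, dim]; each token goes to its top-k experts
        weights, idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize gate scores
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            tok, slot = (idx == e).nonzero(as_tuple=True)
            if tok.numel():                        # tokens routed to expert e
                out[tok] += weights[tok, slot].unsqueeze(-1) * expert(x[tok])
        return out
\end{verbatim}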
We propose Lightening-Transformer, the first light-empowered, high-performance, and energy-efficient photonic Transformer accelerator.
We design the SpAtten architecture and tape it out as a digital chip in TSMC 28nm technology.
This paper provides an overview of efficient deep learning methods, systems, and applications.