Selected Publications
Conference Publications
Zhao Yang, Yue Heng Yeo, Rui Jiang, Xiao Fu, Weiguang Chen, Wei Xi, Jizhong Zhao. Injecting Visual Features into Whisper for Parameter-Efficient Noise-Robust Audio-Visual Speech Recognition. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)
Zhao Yang, Rui Jiang, Xiao Fu, Wei Xi, Jizhong Zhao. Open-Modality Latent Modality Interaction Maximization for Audio-Visual Learning. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)
Zhao Yang, Dianwen Ng, Chong Zhang, Xiao Fu, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma, Jizhong Zhao. Dual Acoustic Linguistic Self-supervised Representation Learning for Cross-Domain Speech Recognition. INTERSPEECH 2023
Zhao Yang, Dianwen Ng, Chong Zhang, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Jizhong Zhao, Bin Ma, Eng Siong Chng. A Unified Recognition and Correction Model under Noisy and Accent Speech Conditions. INTERSPEECH 2023
Zhao Yang, Dianwen Ng, Xizhe Li, Chong Zhang, Rui Jiang, Wei Xi, Yukun Ma, Chongjia Ni, Jizhong Zhao, Bin Ma, Eng Siong Chng. Dual-Memory Multi-Modal Learning for Continual Spoken Keyword Spotting with Confidence Selection and Diversity Enhancement. INTERSPEECH 2023
Rui Jiang, Zhao Yang, Xiao Fu, Wei Xi, Jizhong Zhao. Chinese Speech Processing via Chinese Character Feature. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025)
Xiao Fu, Wei Xi, Zhao Yang, Rui Jiang, Dianwen Ng, Jie Yang, Jizhong Zhao. MRFER: Multi-channel Robust Feature Enhanced Fusion For Multi-modal Emotion Recognition. 2024 IEEE International Conference on Multimedia and Expo (ICME 2024)
Dianwen Ng, Ruixi Zhang, Jia Qi Yip, Zhao Yang, Jinjie Ni, Chong Zhang, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma. De'HuBERT: Disentangling Noise in a Self-Supervised Model for Robust Speech Recognition. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)
Dianwen Ng, Yang Xiao, Jia Qi Yip, Zhao Yang, Biao Tian, Qiang Fu, Eng Siong Chng, Bin Ma. Small Footprint Multi-channel Network for Keyword Spotting with Centroid Based Awareness. INTERSPEECH 2023
Dianwen Ng, Jia Qi Yip, Tanmay Surana, Zhao Yang, Chong Zhang, Yukun Ma, Chongjia Ni, Eng Siong Chng, Bin Ma. I2CR: Improving Noise Robustness on Keyword Spotting using Inter-Intra Contrastive Regularization. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2022)
Fan Yu, Wei Xi, Zhao Yang, Ziye Tong, Jingtong Sun. LRTD: A Low-rank Transformer with Dynamic Depth and Width for Speech Recognition. International Joint Conference on Neural Networks (IJCNN 2022)
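刘志林, 杨颜瑜, 杨钊, 赵鲲, 惠维. Alzheimer's Disease Audio Classification Based on Convolutional Neural Networks Using Single Features and Multi-Feature Fusion. The 16th National Conference on Man-Machine Speech Communication (NCMMSC 2021)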
Fan Yu, Jiawei Guo, Wei Xi, Zhao Yang, Rui Jiang, Chao Zhang. Audio DistilBERT: a distilled audio BERT for speech representation learning. 2021 International Joint Conference on Neural Networks (IJCNN 2021)
Yinhui Zhang, Wei Xi, Zhao Yang, Sitao Men, Rui Jiang, Yuxin Yang, Jizhong Zhao. Speech2Stroke: Generate Chinese Character Strokes Directly from Speech. International Conference on Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom 2020)
arXiv
Zhao Yang, Dianwen Ng, Xiao Fu, Liping Han, Wei Xi, Rui Wang, Rui Jiang, Jizhong Zhao. On the Effectiveness of Pinyin-Character Dual-Decoding for End-to-End Mandarin Chinese ASR. arXiv 2022