Audiovisual Speech Recognition Method Based on Connectionism

Authors

  • Na Che
  • Yiming Zhu
  • Femi Adetunji
  • Khanyisa Dlamini
  • Lijuan Shi
  • Xianwei Zeng

DOI:

https://doi.org/10.59782/iam.v1i2.225

Keywords:

computer application technology, audio-visual speech recognition, deep learning

Abstract

Audio-visual speech recognition combines visual and acoustic speech information and greatly improves on the performance of audio-only speech recognition, but it still faces problems such as heavy data requirements, alignment of audio and video data, and robustness to noise. Researchers have proposed many solutions to these problems. Among them, deep learning algorithms, the representative technology of connectionist artificial intelligence, generalize well and transfer readily to different tasks and domains, and they are becoming one of the mainstream approaches to audio-visual speech recognition. This paper surveys and analyzes the application of deep learning to audio-visual speech recognition, with particular attention to models built on end-to-end frameworks. It summarizes the relevant datasets and evaluation methods through comparative analysis of experiments, and finally identifies open issues that require further study.

How to Cite

Che, N., Zhu, Y., Adetunji, F., Dlamini, K., Shi, L., & Zeng, X. (2024). Audiovisual Speech Recognition Method Based on Connectionism. Insights of Automation in Manufacturing, 1(2), 43–54. https://doi.org/10.59782/iam.v1i2.225
