Multimodal speech and visual gesture control interface technique for small unmanned multirotor aircraft
This research investigated novel human-computer interaction (HCI) interfaces for the control of small multirotor unmanned aerial vehicles (UAVs). The main objective was to propose, design, and develop an alternative control interface for small multirotor UAVs that could perform better than the standard RC joystick (RCJ) controller, and to evaluate the performance of the proposed interface. A multimodal speech and visual gesture (mSVG) interface was proposed, designed, and developed, and then coupled to the RotorS ROS/Gazebo UAV simulator.

An experimental study was designed to determine how practical the proposed mSVG interface is for controlling small multirotor UAVs, first by establishing the limits of speech and gesture recognition at different ambient noise levels and under different background-lighting conditions, respectively, and second by comparing the mSVG interface with the RCJ controller on a simple navigational control task, in terms of performance (completion time and accuracy of navigational control) and from a human factors perspective (user satisfaction and cognitive workload). Thirty-seven participants were recruited.

From the results of the experiments conducted, the mSVG interface was found to be an effective alternative to the RCJ interface when operated within a constrained application environment. The noise level experiment showed that speech recognition accuracy falls as the noise level rises, with 75 dB marking the practical limit for aerial robot (aerobot) applications. In the gesture lighting experiment, gestures were successfully recognised from 10 lux upwards on distinct solid backgrounds, but the effect of varying both the lighting conditions and the environment background on the quality of gesture recognition was insignificant (< 0.5%), implying that the capture technology, the type of gesture captured, and the image processing technique used were more important. In the performance and cognitive workload comparison, the mSVG interface was found to perform better than the RCJ interface at higher nCA application levels: it was 1 minute faster and 25% more accurate, and the RCJ interface was found to be 1.4 times more cognitively demanding than the mSVG interface.

The main limitation of this research was the restricted lighting range of 10 lux to 1400 lux used during the gesture lighting experiment, which confines the established application limits to low-light indoor environments. Suggested further work includes developing more robust gesture and speech recognition algorithms and coupling the improved mSVG interface to a physical UAV.
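The abstract does not specify how the mSVG front end was coupled to the RotorS simulator, so the following is only a minimal sketch of one plausible coupling: a ROS bridge node that maps recognised speech command words to velocity setpoints. The topic names, the command vocabulary, and the use of geometry_msgs/Twist are illustrative assumptions, not the thesis's actual interface (RotorS vehicles are typically commanded through their own controller topics).

```python
#!/usr/bin/env python
# Hypothetical bridge node: maps recognised speech command words to
# velocity setpoints for a simulated multirotor. Topic names and the
# command vocabulary are illustrative assumptions only.
import rospy
from std_msgs.msg import String
from geometry_msgs.msg import Twist

# Assumed vocabulary: spoken word -> (vx, vy, vz, yaw_rate) in m/s, rad/s.
COMMANDS = {
    "forward":  (0.5, 0.0, 0.0, 0.0),
    "backward": (-0.5, 0.0, 0.0, 0.0),
    "left":     (0.0, 0.5, 0.0, 0.0),
    "right":    (0.0, -0.5, 0.0, 0.0),
    "up":       (0.0, 0.0, 0.5, 0.0),
    "down":     (0.0, 0.0, -0.5, 0.0),
    "stop":     (0.0, 0.0, 0.0, 0.0),
}

def command_to_twist(word):
    # Unknown or misrecognised words fail safe to hover ("stop").
    vx, vy, vz, wz = COMMANDS.get(word, COMMANDS["stop"])
    msg = Twist()
    msg.linear.x, msg.linear.y, msg.linear.z = vx, vy, vz
    msg.angular.z = wz
    return msg

def on_word(msg):
    # msg.data carries the single recognised command word.
    cmd_pub.publish(command_to_twist(msg.data.lower()))

if __name__ == "__main__":
    rospy.init_node("msvg_speech_bridge")
    # "/cmd_vel" and "/msvg/recognised_word" are assumed topic names.
    cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    rospy.Subscriber("/msvg/recognised_word", String, on_word)
    rospy.spin()
```

Defaulting to "stop" on an unrecognised word is one way such a bridge could degrade gracefully as speech recognition accuracy falls at higher noise levels, as observed in the 75 dB limit above.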
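Likewise, the finding that gestures were recognised from 10 lux upwards on distinct solid backgrounds can be illustrated with a lighting-aware segmentation sketch. The method and thresholds below are assumptions for illustration (simple Otsu thresholding with a brightness floor), not the image processing technique used in the thesis.

```python
#!/usr/bin/env python
# Hypothetical sketch of a lighting-sensitive gesture pipeline: segment a
# hand against a distinct solid background and refuse to classify when the
# scene is too dark. Thresholds and method are illustrative assumptions.
import cv2
import numpy as np

MIN_MEAN_BRIGHTNESS = 15  # assumed pixel-brightness floor for a usable scene

def segment_hand(frame_bgr):
    """Return a binary hand mask, or None if the scene is unusable."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    if gray.mean() < MIN_MEAN_BRIGHTNESS:
        return None  # below the assumed lighting floor
    # Otsu thresholding separates foreground cleanly when the background
    # is a distinct solid colour, the condition under which the
    # experiment reports successful recognition.
    _, mask = cv2.threshold(gray, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Keep only the largest connected blob as the hand candidate.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)
    out = np.zeros_like(mask)
    cv2.drawContours(out, [hand], -1, 255, thickness=cv2.FILLED)
    return out
```

A pipeline of this shape would be largely insensitive to moderate lighting changes once above its floor, consistent with the abstract's observation that lighting and background variation had an insignificant (< 0.5%) effect compared with the capture technology and image processing technique.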
https://eprints.soton.ac.uk/479472/
https://eprints.soton.ac.uk/479472/1/phd_thesis_draft_2v_PDFA.pdf