Mubin, O., Bartneck, C., & Feijs, L. (2009). Designing an Artificial Robotic Interaction Language. In T. Gross, J. Gulliksen, P. Kotzé, L. Oestreicher, P. Palanque, R. O. Prates & M. Winckler (Eds.), Human-Computer Interaction – INTERACT 2009 (Vol. LNCS 5727/2009, pp. 848-851). Berlin: Springer.
Abstract - The project described here focuses on the design and implementation of an "Artificial Robotic Interaction Language". The research goal is to find a balance between the effort the user must invest in learning a new language and the resulting benefit of optimized automatic speech recognition for a robot or a machine.
Keywords: Artificial Languages, Speech Interaction, Automatic Speech Recognition
Speech is a natural means of information exchange for humans. Therefore, improving speech interaction technology in Human Computer Interaction could lead to a more pleasant interaction. Speech interaction faces various issues, such as ambiguity in natural dialogue, speech recognition that is not robust, and a lack of synchronization between software and hardware. Recent attempts to improve the quality of automatic speech recognition for machines have not advanced far enough. The limitations of current speech recognition technology for natural language are a major obstacle to the general acceptance of speech interfaces. Existing speech recognition is not yet good enough to be deployed in natural environments, where ambient conditions also influence its performance.
Speech interfaces generally focus on natural language. Given its unsuitability for current recognition technology, it is time to find a different balance in the form of a new language. Recent research in speech interaction is already moving in this direction; as stated in , constraining language is a plausible method of improving recognition accuracy. In  the user experience of an artificially constrained language ("Speech Graffiti") was evaluated, and it was concluded that 74% of the users found it more satisfactory than natural language and also more efficient in terms of time. The field of handwriting recognition has followed a similar road map. The first recognition systems for handheld devices, such as Apple's Newton, were nearly unusable. Palm solved the problem by inventing a simplified alphabet called Graffiti, which was easy for users to learn and easy for the device to recognize. We therefore aim to construct an "Artificial Robotic Interaction Language", where an artificial language is a language deliberately invented or constructed, especially as a means of communication in computing. In linguistics, there are numerous artificial languages that address a user perspective by making communication between humans easier and/or universal; however, there has been little or no attempt to optimize a spoken artificial language for automatic speech recognition. The main goal of our research therefore rests on two sub-goals: first, the language should be learnable by the user; second, it should be optimized for efficient automatic speech recognition by a machine or a robot.
As a first step of our research, an overview of artificial languages has been carried out to ascertain what we could learn from existing artificial languages, especially with regard to what could be easier for humans to learn. The overview covered two aspects, namely morphology or grammar, and phonology. Various encyclopedias define the major properties of a language, of which morphology and phonology are two key aspects.
In summary, there are two major approaches to morphological design amongst artificial languages, shown in Table 1. Grammar I has very few grammatical markings (indicated by 'X' in the table), so interpretation is left to the speakers, the context or the word order. Grammar II has inflections, but its grammatical rules are consistent across all words within each category. The question that emerges is which grammar type would be easier to learn and which would be less ambiguous.
| Grammatical Category | Grammar I | Grammar II |
| Case | X | Basic levels, e.g. Possessive, Nominative |
| Numbering | First, Second, Third | Singular, Plural |
| Person References | X | First, Second, Third |
| Tense | X | Past, Present, Future |
| Polarity | Positive, Negative | Positive, Negative |
| Definiteness (Articles) | X | Definite (the), Indefinite (a) |
Phonology was determined to be another important dimension along which artificial languages could be surveyed. Deciding how to use phonemes in our phonological classification of artificial languages was an important decision. Following from our research goal of designing an interaction language that is easy for humans to learn, we extracted a set of the most common phonemes/segments present in the major languages of the world, where major languages were identified by their number of speakers as indicated in the Ethnologue. The overview utilized the UCLA Phonological Segment Inventory Database (UPSID), see  and . The database provides a large inventory of the phonemes of 451 different languages of the world and documents 919 phonemes in total. What we were seeking was a list restricted to the phonemes of the major languages of the world. This resulted in a net total of 23 phonemes. Interesting trends were observed: certain dental consonants were not found in any of the artificial languages that we surveyed. One possible reason is that most artificial languages stem from Germanic or Western languages, whereas these dental consonants are found in Indic or Asian languages only. Trends observed in natural languages with regard to the most common consonants (e.g. 'm', 'p') were to some extent replicated in artificial languages. This mirroring between natural and artificial languages extended to vowels as well: the vowels 'a', 'o' and 'u' were found in all the artificial languages that we classified, and the vowels 'i' and 'e' were absent in only one artificial language.
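The selection of common phonemes described above can be illustrated with a minimal sketch. The inventories below are invented toy data, not UPSID entries, and the `min_share` threshold is an assumption: the text does not state whether a phoneme had to occur in all of the major languages or only in most of them.

```python
from collections import Counter

# Hypothetical toy inventories (NOT taken from UPSID); real inventories
# are much larger and use a richer segment notation.
INVENTORIES = {
    "lang_a": {"m", "p", "k", "a", "i", "u"},
    "lang_b": {"m", "p", "t", "a", "o", "u"},
    "lang_c": {"m", "k", "t", "a", "i", "u"},
}


def common_phonemes(inventories, min_share=1.0):
    """Return, sorted, the phonemes found in at least `min_share`
    (a fraction between 0 and 1) of the given language inventories."""
    total = len(inventories)
    # Count in how many languages each phoneme occurs.
    counts = Counter(p for inv in inventories.values() for p in inv)
    return sorted(p for p, c in counts.items() if c / total >= min_share)


if __name__ == "__main__":
    print(common_phonemes(INVENTORIES))                 # shared by all three
    print(common_phonemes(INVENTORIES, min_share=0.6))  # shared by at least two
```

With the toy data, only 'a', 'm' and 'u' occur in all three inventories. Lowering `min_share` widens the candidate set, which is the kind of trade-off a designer would face when fixing the phoneme inventory of the new language.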
We have presented a morphological overview of artificial languages in which two primary grammar types were discussed. In the future, we aim to evaluate which of these grammar types will be easier to learn for our intended artificial language and which will be less ambiguous, using methods as advocated in . Moreover, our phonological overview has revealed a set of phonemes that might be desirable to include in the artificial language to make it easier for humans to learn. However, for both morphology and phonology, what also needs to be determined is how each could contribute to improving speech recognition. For example, phonemes that are rarely confused with one another would be easier to recognize. Similarly, selecting a particular grammar type could also influence the quality of speech recognition, and we aim to determine this in the future. Another interesting variable is word length, as shorter words tend to be confused with each other in automatic speech recognition. We therefore also aim to evaluate the role that word length could play in improving the quality of speech recognition.
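One rough way to reason about word confusability is to measure how many phoneme edits separate two words. The sketch below uses Levenshtein distance as a stand-in; this is an assumption made for illustration, since real recognizer confusions depend on acoustic similarity, which symbol-level edit distance ignores. Each character is treated as one phoneme, and the example words are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def confusable_pairs(words, min_distance=2):
    """List word pairs whose edit distance is below `min_distance`."""
    return [(a, b) for a, b in combinations(words, 2)
            if edit_distance(a, b) < min_distance]


if __name__ == "__main__":
    # Hypothetical vocabulary: the two shortest words differ by one phoneme.
    print(confusable_pairs(["pa", "ma", "pamu"]))
```

Short words leave little room for differences, so they are more likely to fall under any chosen distance threshold, which is consistent with the observation that shorter words tend to be confused with each other.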
As a first step in the design process, we aim to inherit the vocabulary set or word concepts of the simple artificial language Toki Pona. It has 118 word concepts and sufficiently caters for the needs of a simple language. We aim to adapt the pronunciation of the words of Toki Pona based on the requirements of word length and phonetic information. For example, because Toki Pona is a simple language, some of its words are very short, which makes them easier for humans to learn. However, to assist speech recognition, some of its words will need to be lengthened based on a specific methodology, which will attempt to improve the phonetic discernibility of words and also be scalable enough to allow the generation of new words. Additionally, we aim to start the design from Grammar II and gradually remove grammatical markings and rules to move towards Grammar I, an evolutionary trend that can also be noticed in natural languages.
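The methodology for lengthening words and generating new ones is not specified here. Purely as an illustration of what a scalable generation scheme might look like, the sketch below builds fixed-length words from consonant-vowel syllables and greedily keeps only those that differ from every previously kept word in at least `min_distance` positions. The phoneme subset, the syllable structure and the Hamming-distance criterion are all assumptions, not the authors' method.

```python
from itertools import product

# Hypothetical phoneme subset chosen for illustration only.
CONSONANTS = ["m", "p", "k"]
VOWELS = ["a", "u"]


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def generate_vocabulary(size, syllables=2, min_distance=2):
    """Greedily pick up to `size` words of `syllables` CV syllables each,
    keeping a word only if it differs from all kept words in at least
    `min_distance` positions. May return fewer words than requested."""
    syllable_set = [c + v for c in CONSONANTS for v in VOWELS]
    chosen = []
    for parts in product(syllable_set, repeat=syllables):
        word = "".join(parts)
        if all(hamming(word, kept) >= min_distance for kept in chosen):
            chosen.append(word)
            if len(chosen) == size:
                break
    return chosen


if __name__ == "__main__":
    print(generate_vocabulary(4))
```

Increasing `syllables` enlarges the space of candidate words, so a larger vocabulary can be generated at the same minimum distance, at the cost of longer words for the user to learn.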
In the future, we shall also investigate speech recognition engines to ascertain the exact criteria that make speech easy for machines to recognize. Building on these initial endeavors, we will then move towards designing the "Artificial Robotic Interaction Language". Our intention is to carry out future research in several cycles, following a spiral model. Each cycle would typically have four phases: requirements, design, implementation and evaluation.
Contribution to HCI. We intend to deploy our interaction language within the domain of robotics. However, the proposed language need not be restricted to robots; it could be applied to any behavioral product that employs speech interaction. Moreover, HCI is moving towards the domain of Ambient Intelligence, where technology is invisible in the background but there are still objects that mediate or interact, e.g. robots. Therefore the design of an Artificial Robotic Interaction Language would lie at the heart of next-generation HCI.
This is a pre-print version | last updated February 2, 2010