Socially interactive robots, especially those designed for entertainment and companionship, must be able to hold conversations with users that feel natural and engaging for humans. Two important components of such conversations are adherence to the topic of conversation and inclusion of affective expressions. Most previous approaches have concentrated on topic detection or sentiment analysis alone, and approaches that attempt to address both are limited by domain and by type of reply.
This thesis presents a new approach, implemented on a humanoid robot interface, that detects the topic and sentiment of a user’s utterances from text-transcribed speech. It also generates domain-independent, topically relevant verbal replies and appropriate positive and negative emotional expressions in real time.
The front end of the system is a smartphone app that functions as the robot’s face. It displays emotionally expressive eyes, transcribes verbal input as text, and synthesizes spoken replies. The back end of the system is implemented on the robot’s onboard computer. It connects with the app via Bluetooth, receives and processes the transcribed input, and returns verbal replies and sentiment scores.
The back end consists of a topic-detection subsystem and a sentiment-analysis subsystem. The topic-detection subsystem uses a Latent Semantic Indexing model of a conversation corpus, followed by a search in the online database ConceptNet 5, in order to generate a topically relevant reply. The sentiment-analysis subsystem disambiguates the input words, obtains their sentiment scores from SentiWordNet, and returns the averaged sum of the scores as the overall sentiment score.
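The averaging step of the sentiment-analysis subsystem can be sketched roughly as follows. This is a minimal illustration and not the thesis implementation: the function name `overall_sentiment`, the use of the net score pos - neg for each sense, and the fallback of 0.0 when no word is found are all assumptions. The two score pairs are the SentiWordNet examples quoted later in this text, and word-sense disambiguation is assumed to have already produced the synset names.

```python
from statistics import mean

# Illustrative (pos, neg) pairs for two synsets; a real system would look
# these up in SentiWordNet after word-sense disambiguation.
ILLUSTRATIVE_SCORES = {
    "love.n.01": (0.625, 0.0),
    "pain.n.01": (0.0, 0.75),
}


def overall_sentiment(synsets):
    """Average the net (pos - neg) score over the synsets found in the lexicon.

    Synsets missing from the lexicon are skipped. Returning 0.0 (neutral)
    when nothing is found is an assumption made for this sketch.
    """
    nets = [
        ILLUSTRATIVE_SCORES[name][0] - ILLUSTRATIVE_SCORES[name][1]
        for name in synsets
        if name in ILLUSTRATIVE_SCORES
    ]
    return mean(nets) if nets else 0.0


if __name__ == "__main__":
    print(overall_sentiment(["love.n.01"]))
    print(overall_sentiment(["love.n.01", "pain.n.01"]))
```

Under these assumptions, a positive result would map to a positive emotional expression and a negative result to a negative one; where the neutral band lies is a design choice this sketch leaves open.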
The system was hypothesized to engage users more with both subsystems working together than with either subsystem alone, and each subsystem alone was hypothesized to engage users more than a random control. In computational evaluations, each subsystem performed weakly but positively.
In user evaluations, users reported a higher level of topical relevance and emotional appropriateness in conversations in which the subsystems were working together, and they reported higher engagement especially in conversations in which the topic-detection subsystem was working. It is concluded that the system partially fulfills its goals, and suggestions for future work are presented.
Sentiment Analysis is a natural language processing sub-field that is concerned with extracting opinions or sentiments from text data, for example from consumer reviews of movies or products, from articles about politics, or even from microblogging posts on Twitter. Companies and political parties in particular are interested in aggregating sentiment data about their products and images in order to understand the effects of their branding and develop new strategies. In addition, consumers benefit from a high-level view of general opinion about products and services that they may be interested in.
The field of Human-Robot Interaction is concerned with improving the ability of robots to interact with humans in ways that humans find natural. In Fong et al.’s review, this problem is broken up into design approaches for embodiment, emotion, dialogue, personality, human-oriented perception, user modeling, socially situated learning, and intentionality.
OVERVIEW OF PROJECT RESOURCES
ConceptNet 5 is the latest version of a large, open-source database of common-sense knowledge, currently developed and maintained by Luminoso Technologies, Inc., in collaboration with the MIT Media Lab. The original ConceptNet was founded in the Media Lab as a crowd-sourced data gathering project called the Open Mind Common Sense project. Over the course of several versions, it has become a vast, multilingual project containing knowledge contributed by various databases, dictionaries, online games, and human users from around the world.
SentiWordNet 3.0 is the latest version of an opinion-mining resource based on WordNet 3.0, developed by a team at the Istituto di Scienza e Tecnologie dell’Informazione. This resource maps a large number of WordNet synsets to sentiment scores indicating their positivity, negativity, and objectivity. That is, if a WordNet synset is included in SentiWordNet, it is assigned a tuple (pos, neg, obj) such that pos indicates the synset’s level of positivity, neg indicates its negativity, and obj indicates its sentiment-neutrality, and such that pos + neg + obj = 1. For example, the score tuple for the synset “love.n.01” is (0.625, 0.0, 0.375), and the score tuple for the synset “pain.n.01” is (0.0, 0.75, 0.25).
The SMILE App:
The front end of the system is an Android app called the SMartphone Intuitive Likeness and Expression App, or the SMILE App. This app was developed in Java by students of the H.E.I.R. Lab as the emotional and conversational interface for their humanoid robot design, the MU-L8 robot. A smartphone running the app is positioned on the front of the robot’s head, and the app then acts as the robot’s face.
Back End System:
The system installed on the robot’s onboard computer is implemented in Python, and it consists of a text processing layer, the topic-detection subsystem, and the sentiment-analysis subsystem. These subsystems work together to analyze text-transcribed user speech input and to generate appropriate and relevant robot reactions.
The goal of this system is to maintain a topically relevant, emotionally appropriate, and engaging conversation with the user in real time. Thus, outputs from the system must be generated in minimal processing time, and any text processing of the user’s utterances should be usable by both subsystems.
In order to perform computational evaluations on both subsystems, a set of conversational utterances was required for use as a test set. For this purpose, the Fisher English Training Transcripts corpus was split into a Training Set and a Test Set of conversation documents. Approximately 5% of the documents were randomly selected as the Test Set; any of these documents that lacked a topic prompt label were returned to the Training Set in order to ease the evaluation process for the topic-detection subsystem.
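The split described above can be sketched as follows. This is a rough illustration under stated assumptions, not the procedure actually used: the function name `split_corpus`, the representation of each document as a dictionary with a `topic` key (None when the topic prompt label is missing), and the fixed random seed are all hypothetical.

```python
import random


def split_corpus(documents, test_fraction=0.05, seed=0):
    """Randomly pick about test_fraction of the documents as test candidates.

    Candidates without a topic label go back to the training set, so the
    final test set may be smaller than the number of candidates drawn.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    n_candidates = round(len(documents) * test_fraction)
    candidate_ids = set(rng.sample(range(len(documents)), n_candidates))

    train, test = [], []
    for index, doc in enumerate(documents):
        if index in candidate_ids and doc.get("topic") is not None:
            test.append(doc)
        else:
            train.append(doc)
    return train, test


if __name__ == "__main__":
    # Synthetic stand-in documents; every other one lacks a topic label.
    docs = [{"id": i, "topic": "pets" if i % 2 else None} for i in range(200)]
    train_set, test_set = split_corpus(docs)
    print(len(train_set), len(test_set))
```

Because unlabeled candidates are returned to the training set, the resulting Test Set is at most, rather than exactly, the sampled fraction of the corpus.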
The system was also tested in its target scenario, by having actual conversations with human users. There were three variables of interest in the perceptual evaluation:
- User perception of the topical relevance of the robot’s replies.
- User perception of the emotional appropriateness of the robot’s expressions.
- User engagement during conversations.
After conducting evaluations and receiving user feedback, it became clear that several improvements to the system are needed. First, the speech-to-text transcription module should be adapted to facilitate a smoother conversational flow. Second, the robot’s “Happy” and “Sad” expressions should be redesigned to denote happiness more actively and sadness less dramatically, in order to even out the perceived emotional distance between these two expressions and the “Neutral” expression.
This would constitute a useful user model in the case of future conversations with the same user; for example, the robot could introduce topics related to those known to be interesting to the user, and possibly infer the user’s opinions on analogous topics. The robot could even build a model of its own opinions by aggregating the opinions it has heard before and deciding whether to agree or disagree with the user based on the sentiments previously expressed. A robot with its own opinions to discover and relate to could make a much more engaging conversational partner than one without any self-knowledge.
This project attempted to improve user engagement in a conversational entertainment robot’s interface by implementing a back-end system consisting of basic topic-detection and sentiment-analysis subsystems. These subsystems used machine learning and lexical techniques to analyze the topical and sentiment content of user utterances, then to generate relevant verbal replies and appropriate emotional expressions as the robot’s reactions. These reactions were generated with the intent to express the robot’s personality and continue the conversation with the user in real time.
In evaluations, the system overall performed positively, although in some areas not as strongly as expected. The sentiment-analysis subsystem in particular needs improvement in order to be more effective at engaging users. General user feedback was collected and reviewed, and these comments were used to determine the direction that this research should take next. Further and more complex conversational systems are planned for the MU-L8 robot using this system as a foundation.
Source: Marquette University
Author: Elise Russell