Research
My PhD research aimed to improve grounded language learning for robotics - that is, enabling robots to connect natural language with knowledge, either stored as facts or obtained via sensory perception, in order to interact meaningfully with humans in natural language.
The work is under the umbrella of the Building-Wide Intelligence project at UT Austin.
Refereed Conference and Journal Publications
Dialog Policy Learning for Joint Clarification and Active Learning Queries
Aishwarya Padmakumar and Raymond J. Mooney.
To Appear in Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021).
[Abstract]
[PDF]
[Bibtex]
Intelligent systems need to be able to recover from mistakes, resolve uncertainty, and adapt to novel concepts not seen during training. Dialog interaction can enable this by the use of clarifications for correction and resolving uncertainty, and active learning queries to learn new concepts encountered during operation. Prior work on dialog systems has either focused on exclusively learning how to perform clarification/information seeking, or to perform active learning. In this work, we train a hierarchical dialog policy to jointly perform both clarification and active learning in the context of an interactive language-based image retrieval task motivated by an on-line shopping application, and demonstrate that jointly learning dialog policies for clarification and active learning is more effective than the use of static dialog policies for one or both of these functions.
@inproceedings{padmakumar:aaai21,
title={Dialog Policy Learning for Joint Clarification and Active Learning Queries},
author={Aishwarya Padmakumar and Raymond J. Mooney},
booktitle={Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI 2021)},
year={2021}
}
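The hierarchical structure described in this abstract can be pictured as a top-level policy that chooses the type of query and low-level scorers that choose the specific query. Below is a minimal, hypothetical Python sketch of that structure; the state features, scorers, and bandit-style update are illustrative stand-ins, not the paper's actual model:

import random
from collections import defaultdict

# Toy hierarchical dialog policy: a top-level Q-table picks the query type
# (clarification vs. active learning vs. guess) and simple low-level scorers
# pick the concrete query. All features, scores, and rewards are illustrative.

ACTIONS = ["clarify", "active_learn", "guess"]

class HierarchicalPolicy:
    def __init__(self, epsilon=0.1, alpha=0.5):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.epsilon = epsilon       # exploration rate
        self.alpha = alpha           # learning rate

    def top_level(self, state):
        # Epsilon-greedy choice among query types.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def low_level(self, action, candidate_scores, concept_uncertainty):
        # Clarifications target the current best retrieval candidate; active
        # learning queries target the concept the model is least sure about.
        if action == "active_learn":
            return max(concept_uncertainty, key=concept_uncertainty.get)
        return max(candidate_scores, key=candidate_scores.get)

    def update(self, state, action, reward):
        # One-step, bandit-style update; the paper trains a full RL policy.
        self.q[(state, action)] += self.alpha * (reward - self.q[(state, action)])

policy = HierarchicalPolicy()
state = "high_uncertainty"            # a discretized dialog-state feature
candidate_scores = {"img_3": 0.40, "img_7": 0.35, "img_9": 0.25}
concept_uncertainty = {"striped": 0.9, "red": 0.2}
action = policy.top_level(state)
query = policy.low_level(action, candidate_scores, concept_uncertainty)
policy.update(state, action, reward=-1.0)  # e.g., a per-turn cost
print(action, query)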
Dialog as a Vehicle for Lifelong Learning
Aishwarya Padmakumar and Raymond J. Mooney.
Position Paper Track at the SIGDIAL Special Session on Physically Situated Dialogue (RoboDial 2.0), July 2020.
[Abstract]
[PDF]
[Slides]
[Talk Video]
[Bibtex]
Dialog systems research has primarily been focused around two main types of applications – task-oriented dialog systems that learn to use clarification to aid in understanding a goal, and open-ended dialog systems that are expected to carry out unconstrained “chit chat” conversations. However, dialog interactions can also be used to obtain various types of knowledge that can be used to improve an underlying language understanding system, or other machine learning systems that the dialog acts over. In this position paper, we present the problem of designing dialog systems that enable lifelong learning as an important challenge problem, in particular for applications involving physically situated robots. We include examples of prior work in this direction, and discuss challenges that remain to be addressed.
@inproceedings{padmakumar:robodial20,
title={Dialog as a Vehicle for Lifelong Learning},
author={Aishwarya Padmakumar and Raymond J. Mooney},
booktitle={Position Paper Track at the SIGDIAL Special Session on Physically Situated Dialogue (RoboDial 2.0)},
month={July},
url="https://arxiv.org/abs/2006.14767",
year={2020}
}
Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog
Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, and Raymond J. Mooney.
To Appear in The Journal of Artificial Intelligence Research (JAIR), Vol. 67 (2020).
[Abstract]
[PDF]
[Bibtex]
Humans use natural language to articulate their thoughts and intentions to other people, making it a natural channel for human-robot communication. Natural language understanding in robots needs to be robust to a wide-range of both human speakers and environments. In this work, we present methods for parsing natural language to underlying meanings and using robotic sensors to create multi-modal models of perceptual concepts. Through dialog, robots should learn new language constructions and perceptual concepts as they are used in context. We develop an agent for jointly improving parsing and perception in simulation through human-robot dialog, and demonstrate this agent on a robotic platform. Dialog clarification questions are used both to understand commands and to generate additional parsing training data. The agent improves its perceptual concept models through questions about how words relate to objects. We evaluate this agent on Amazon Mechanical Turk. After training on induced data from conversations, the agent can reduce the number of clarification questions asked while receiving higher usability ratings. Additionally, we demonstrate the agent on a robotic platform, where it learns new concepts on the fly while completing a real-world task.
@article{thomason:jair20,
title={Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog},
author={Jesse Thomason and Aishwarya Padmakumar and Jivko Sinapov and Nick Walker and Yuqian Jiang and Harel Yedidsion and Justin Hart and Peter Stone and Raymond J. Mooney},
journal={Journal of Artificial Intelligence Research (JAIR)},
month={January},
volume={67},
year={2020}
}
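A recurring building block in this line of work is the perceptual concept model: one classifier per concept word, trained on multi-modal feature vectors of objects. A minimal sketch, using synthetic features and scikit-learn in place of the models actually used:

import numpy as np
from sklearn.linear_model import LogisticRegression

# One binary classifier per concept word, trained on multi-modal feature
# vectors of objects (e.g., visual and haptic features). The data here is
# synthetic and scikit-learn stands in for the models actually used.
rng = np.random.default_rng(0)

def make_concept_model(positive_feats, negative_feats):
    X = np.vstack([positive_feats, negative_feats])
    y = np.array([1] * len(positive_feats) + [0] * len(negative_feats))
    return LogisticRegression().fit(X, y)

# Feature vectors for objects a user labeled "red" (or not) during dialog.
red_objects = rng.normal(loc=1.0, size=(20, 8))
other_objects = rng.normal(loc=-1.0, size=(20, 8))
concept_models = {"red": make_concept_model(red_objects, other_objects)}

# Grounding a description scores each candidate object under each concept
# word it mentions; answers to questions like "Is this object red?" extend
# the training sets, which is how the dialogs improve perception over time.
new_object = rng.normal(loc=0.8, size=(1, 8))
print(concept_models["red"].predict_proba(new_object)[0, 1])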
Improving Grounded Natural Language Understanding through Human-Robot Dialog
Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, and Raymond J. Mooney.
International Conference on Robotics and Automation (ICRA), 2019.
Also presented at the SIGDIAL Special Session on Physically Situated Dialogue (RoboDIAL), 2018.
Also presented at the RSS Workshop on Models and Representations for Natural Human-Robot Communication (MRHRC), 2018.
[Abstract]
[PDF]
[Bibtex]
[RoboDial PDF]
[RoboDial Video]
[MRHRC PDF]
[MRHRC Poster]
[MRHRC Demo]
Natural language understanding for robotics can require substantial domain- and platform-specific engineering. For example, for mobile robots to pick and place objects in an environment to satisfy human commands, we can specify the language humans use to issue such commands, and connect concept words like "red can" to physical object properties. One way to alleviate this engineering for a new domain is to enable robots in human environments to adapt dynamically---continually learning new language constructions and perceptual concepts. In this work, we present an end-to-end pipeline for translating natural language commands to discrete robot actions, and use clarification dialogs to jointly improve language parsing and concept grounding. We train and evaluate this agent in a virtual setting on Amazon Mechanical Turk, and we transfer the learned agent to a physical robot platform to demonstrate it in the real world.
@inproceedings{thomason:icra19,
title={Improving Grounded Natural Language Understanding through Human-Robot Dialog},
author={Jesse Thomason and Aishwarya Padmakumar and Jivko Sinapov and Nick Walker and Yuqian Jiang and Harel Yedidsion and Justin Hart and Peter Stone and Raymond J. Mooney},
booktitle={International Conference on Robotics and Automation (ICRA)},
month={October},
url="https://arxiv.org/abs/1903.00122",
year={2019}
}
Learning a Policy for Opportunistic Active Learning
Aishwarya Padmakumar, Peter Stone, and Raymond J. Mooney.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-18), Brussels, Belgium, November 2018.
[Abstract]
[PDF]
[Bibtex]
[Code]
Active learning identifies data points to label that are expected to be the most useful in improving a supervised model. Opportunistic active learning incorporates active learning into interactive tasks that constrain possible queries during interactions. Prior work has shown that opportunistic active learning can be used to improve grounding of natural language descriptions in an interactive object retrieval task. In this work, we use reinforcement learning for such an object retrieval task, to learn a policy that effectively trades off task completion with model improvement that would benefit future tasks.
@inproceedings{padmakumar:emnlp18,
title={Learning a Policy for Opportunistic Active Learning},
author={Aishwarya Padmakumar and Peter Stone and Raymond J. Mooney},
booktitle={Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-18)},
month={November},
address={Brussels, Belgium},
year={2018}
}
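The trade-off the policy must learn shows up directly in the reward structure: queries cost dialog turns now, while their benefit (better concept models) only pays off in future episodes. A hypothetical sketch of such a per-turn reward; the constants and shaping are illustrative, not the paper's:

# Illustrative per-turn reward for an opportunistic-active-learning dialog.
# The constants are made up; the paper's actual reward structure differs.

TURN_COST = -1.0       # every query makes the current interaction longer
SUCCESS_REWARD = 20.0  # retrieving the object the user described
FAILURE_REWARD = -20.0 # guessing the wrong object

def turn_reward(action, guessed_correctly=None):
    if action in ("label_query", "example_query"):
        # Queries only cost time now; their payoff is better concept models,
        # and thus more task successes, in later episodes - which is what
        # makes this a sequential decision problem rather than a heuristic.
        return TURN_COST
    if action == "guess":
        return SUCCESS_REWARD if guessed_correctly else FAILURE_REWARD
    raise ValueError(f"unknown action: {action}")

print(turn_reward("label_query"))  # -1.0
print(turn_reward("guess", True))  # 20.0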
Opportunistic Active Learning for Grounding Natural Language Descriptions
Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Justin Hart, Peter Stone, and Raymond J. Mooney.
In Proceedings of the 1st Annual Conference on Robot Learning (CoRL-17), Mountain View, California, November 2017.
[Abstract]
[PDF]
[Bibtex]
[Poster]
[Demo]
[Code]
Active learning identifies data points from a pool of unlabeled examples whose labels, if made available, are most likely to improve the predictions of a supervised model. Most research on active learning assumes that an agent has access to the entire pool of unlabeled data and can ask for labels of any data points during an initial training phase. However, when incorporated in a larger task, an agent may only be able to query some subset of the unlabeled pool. An agent can also opportunistically query for labels that may be useful in the future, even if they are not immediately relevant. In this paper, we demonstrate that this type of opportunistic active learning can improve performance in grounding natural language descriptions of everyday objects---an important skill for home and office robots. We find, with a real robot in an object identification setting, that inquisitive behavior---asking users important questions about the meanings of words that may be off-topic for the current dialog---leads to identifying the correct object more often over time.
@inproceedings{thomason:corl17,
title={Opportunistic Active Learning for Grounding Natural Language Descriptions},
author={Jesse Thomason and Aishwarya Padmakumar and Jivko Sinapov and Justin Hart and Peter Stone and Raymond J. Mooney},
booktitle={Proceedings of the 1st Annual Conference on Robot Learning (CoRL-17)},
month={November},
editor={Sergey Levine and Vincent Vanhoucke and Ken Goldberg},
address={Mountain View, California},
publisher={PMLR},
pages={67--76},
pdf = {http://proceedings.mlr.press/v78/thomason17a/thomason17a.pdf},
url="http://proceedings.mlr.press/v78/thomason17a.html",
year={2017}
}
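One way to picture opportunistic querying: score every concept model by its uncertainty and its expected future usefulness, even when the concept is irrelevant to the current command. The scoring heuristic below is my illustration, not the paper's actual criterion:

# Toy opportunistic query selection: prefer concepts whose classifiers are
# uncertain AND that occur often across past dialogs, so an answer is likely
# to help future tasks. The scoring heuristic is mine, not the paper's.

def query_score(concept, uncertainty, usage_counts, total_dialogs):
    frequency = usage_counts.get(concept, 0) / max(total_dialogs, 1)
    return uncertainty[concept] * (0.5 + frequency)

uncertainty = {"red": 0.1, "heavy": 0.8, "cylindrical": 0.7}
usage_counts = {"red": 40, "heavy": 25, "cylindrical": 2}

best = max(uncertainty, key=lambda c: query_score(c, uncertainty, usage_counts, 50))
print(best)
# "heavy" wins: uncertain and frequently used, so it is worth an off-topic
# question like "Would you describe any object here as heavy?" even if the
# current command never mentions weight.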
Integrated Learning of Dialog Strategies and Semantic Parsing
Aishwarya Padmakumar, Jesse Thomason, and Raymond J. Mooney.
In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2017), pp. 547--557, Valencia, Spain, April 2017.
[Abstract]
[PDF]
[Bibtex]
Natural language understanding and dialog management are two integral components of interactive dialog systems. Previous research has used machine learning techniques to individually optimize these components, with different forms of direct and indirect supervision. We present an approach to integrate the learning of both a dialog strategy using reinforcement learning, and a semantic parser for robust natural language understanding, using only natural dialog interaction for supervision. Experimental results on a simulated task of robot instruction demonstrate that joint learning of both components improves dialog performance over learning either of these components alone.
@inproceedings{padmakumar:eacl17,
title={Integrated Learning of Dialog Strategies and Semantic Parsing},
author={Aishwarya Padmakumar and Jesse Thomason and Raymond J. Mooney},
booktitle={Proceedings of the 15th Conference of the European Chapter of the Association
for Computational Linguistics (EACL 2017)},
month={April},
address={Valencia, Spain},
pages={547--557},
url="http://www.cs.utexas.edu/users/ai-lab/pub-view.php?PubID=127615",
year={2017}
}
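The joint learning loop can be caricatured as: run a dialog, harvest confirmed interpretations as new parser training data, and adjust the dialog strategy based on the dialog's outcome. The sketch below is a heavily simplified, hypothetical version - the parser is a lookup table and the "policy" a fixed confidence threshold, where the paper learns both components:

# Heavily simplified joint learning loop: the "parser" is a lookup table
# grown from confirmed dialogs, and the "policy" is a fixed confidence
# threshold deciding between clarifying and executing. The paper learns
# both components (a semantic parser and an RL dialog policy) instead.

parser = {}              # utterance -> action, from confirmed interactions
CONFIRM_THRESHOLD = 0.9

def parse(utterance):
    # Confidence 1.0 for memorized utterances, 0.0 otherwise; a real
    # semantic parser generalizes and yields graded confidences.
    if utterance in parser:
        return parser[utterance], 1.0
    return None, 0.0

def run_dialog(utterance, confirmed_action):
    action, confidence = parse(utterance)
    if confidence < CONFIRM_THRESHOLD:
        # Clarify with the user; the confirmed pair becomes new parser
        # training data, so later dialogs need fewer questions.
        parser[utterance] = confirmed_action
        return -1.0  # turn cost of asking
    return 10.0 if action == confirmed_action else -10.0

print(run_dialog("bring coffee to alice", "bring(coffee,alice)"))  # clarifies
print(run_dialog("bring coffee to alice", "bring(coffee,alice)"))  # executes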
Automated Linguistic Personalization of Targeted Marketing Messages Mining User-Generated Text on Social Media
Rishiraj Saha Roy, Aishwarya Padmakumar, Guna Prasad Jeganathan, and Ponnurangam Kumaraguru.
In Proceedings of the 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2015), Cairo, Egypt, April 2015. (Best Paper Award)
[Abstract]
[PDF]
[Bibtex]
Personalizing marketing messages for specific audience segments is vital for increasing user engagement with advertisements, but it becomes very resource-intensive when the marketer has to deal with multiple segments, products or campaigns. In this research, we take the first steps towards automating message personalization by algorithmically inserting adjectives and adverbs that have been found to evoke positive sentiment in specific audience segments, into basic versions of ad messages. First, we build language models representative of linguistic styles from user-generated textual content on social media for each segment. Next, we mine product-specific adjectives and adverbs from content associated with positive sentiment. Finally, we insert extracted words into the basic version using the language models to enrich the message for each target segment, after statistically checking in-context readability. Decreased cross-entropy values from the basic to the transformed messages show that we are able to approach the linguistic style of the target segments. Crowdsourced experiments verify that our personalized messages are almost indistinguishable from similar human compositions. Social network data processed for this research has been made publicly available for community use.
@inproceedings{roy:cicling15,
title={Automated Linguistic Personalization of Targeted Marketing Messages Mining User-Generated Text on Social Media},
author={Rishiraj Saha Roy and Aishwarya Padmakumar and Guna Prasad Jeganathan and Ponnurangam Kumaraguru},
booktitle={Proceedings of the 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2015)},
month={April},
address={Cairo, Egypt},
url="https://link.springer.com/chapter/10.1007%2F978-3-319-18117-2_16",
year={2015}
}
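The readability check in this pipeline amounts to comparing cross-entropy under a segment's language model before and after a word is inserted. A minimal sketch with an add-one-smoothed bigram model; the corpus, smoothing choice, and acceptance rule are illustrative:

import math
from collections import Counter

# Add-one-smoothed bigram LM per audience segment; a candidate insertion is
# accepted only if it keeps the message's cross-entropy under the segment's
# LM close to that of the basic version. Corpus and rule are illustrative.

corpus = "this phone is amazing and the camera is really sharp".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)
vocab_size = len(unigrams)

def cross_entropy(tokens):
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += -math.log2(p)
    return total / (len(tokens) - 1)

basic = "the camera is sharp".split()
enriched = "the camera is really sharp".split()
print(cross_entropy(basic), cross_entropy(enriched))
# Accept "really" if the enriched message's cross-entropy stays close to
# (or drops below) the basic message's under this segment's model.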
Workshop Publications and Thesis Work
Dialog as a Vehicle for Lifelong Learning of Grounded Language Understanding Systems
Aishwarya Padmakumar
PhD Thesis, Department of Computer Science, The University of Texas at Austin. August 2020.
[Abstract]
[PDF]
[Slides]
[Bibtex]
The ability to understand and communicate in natural language can make robots much more accessible for naive users. Environments such as homes and offices contain many objects that humans describe in diverse language referencing perceptual properties. Robots operating in such environments need to be able to understand such descriptions. Different types of dialog interactions with humans can help robots clarify their understanding to reduce mistakes, and also improve their language understanding models, or adapt them to the specific domain of operation.
We present completed work on jointly learning a dialog policy that enables a robot to clarify partially understood natural language commands, while simultaneously using the dialogs to improve the underlying semantic parser for future commands. We introduce the setting of opportunistic active learning - a framework for interactive tasks that use supervised models. This framework allows a robot to ask diverse, potentially off-topic queries across interactions, requiring the robot to trade-off between task completion and knowledge acquisition for future tasks. We also attempt to learn a dialog policy in this framework using reinforcement learning.
We propose a novel distributional model for perceptual grounding, based on learning a joint space for vector representations from multiple modalities. We also propose a method for identifying more informative clarification questions that can scale well to a larger space of objects, and wish to learn a dialog policy that would make use of such clarifications.
@phdthesis{padmakumar:thesis20,
title={Dialog as a Vehicle for Lifelong Learning of Grounded Language Understanding Systems},
author={Aishwarya Padmakumar},
school={Department of Computer Science, The University of Texas at Austin},
month={August},
url="http://www.cs.utexas.edu/users/ml/papers/padmakumar.thesis20.pdf",
year={2020}
}
Improved Models and Queries for Grounded Human-Robot Dialog
Aishwarya Padmakumar
PhD Proposal, Department of Computer Science, The University of Texas at Austin. October 2018.
[Abstract]
[PDF]
[Slides]
[Bibtex]
The ability to understand and communicate in natural language can make robots much more accessible for naive users. Environments such as homes and offices contain many objects that humans describe in diverse language referencing perceptual properties. Robots operating in such environments need to be able to understand such descriptions. Different types of dialog interactions with humans can help robots clarify their understanding to reduce mistakes, and also improve their language understanding models, or adapt them to the specific domain of operation.
We present completed work on jointly learning a dialog policy that enables a robot to clarify partially understood natural language commands, while simultaneously using the dialogs to improve the underlying semantic parser for future commands. We introduce the setting of opportunistic active learning - a framework for interactive tasks that use supervised models. This framework allows a robot to ask diverse, potentially off-topic queries across interactions, requiring the robot to trade-off between task completion and knowledge acquisition for future tasks. We also attempt to learn a dialog policy in this framework using reinforcement learning.
We propose a novel distributional model for perceptual grounding, based on learning a joint space for vector representations from multiple modalities. We also propose a method for identifying more informative clarification questions that can scale well to a larger space of objects, and wish to learn a dialog policy that would make use of such clarifications.
@misc{padmakumar:proposal18,
title={Improved Models and Queries for Grounded Human-Robot Dialog},
author={Aishwarya Padmakumar},
month={October},
url="http://www.cs.utexas.edu/users/ml/papers/padmakumar.proposal18.pdf",
year={2018}
}
Interaction and Autonomy in RoboCup@Home and Building-Wide Intelligence
Justin Hart, Harel Yedidsion, Yuqian Jiang, Nick Walker, Rishi Shah, Jesse Thomason, Aishwarya Padmakumar, Rolando Fernandez, Jivko Sinapov, Raymond Mooney and Peter Stone.
In Artificial Intelligence (AI) for Human-Robot Interaction (HRI) symposium, AAAI Fall Symposium Series, Arlington, Virginia, October 2018.
[Abstract]
[PDF]
[Bibtex]
Efforts are underway at UT Austin to build autonomous robot systems that address the challenges of long-term deployments in office environments and of the more prescribed domestic service tasks of the RoboCup@Home competition. We discuss the contrasts and synergies of these efforts, highlighting how our work to build a RoboCup@Home Domestic Standard Platform League entry led us to identify an integrated software architecture that could support both projects. Further, naturalistic deployments of our office robot platform as part of the Building-Wide Intelligence project have led us to identify and research new problems in a traditional laboratory setting.
@inproceedings{hart:fss18,
title={Interaction and Autonomy in RoboCup@Home and Building-Wide Intelligence},
author={Justin Hart and Harel Yedidsion and Yuqian Jiang and Nick Walker and Rishi Shah and Jesse Thomason and Aishwarya Padmakumar and Rolando Fernandez and Jivko Sinapov and Raymond Mooney and Peter Stone},
booktitle={Artificial Intelligence (AI) for Human-Robot Interaction (HRI) symposium, AAAI Fall Symposium Series},
month={October},
address={Arlington, Virginia},
url="http://www.cs.utexas.edu/users/ai-lab/pub-view.php?PubID=127724",
year={2018}
}
Course Projects
Predicting the Year of Yearbook Photographs
As a part of the Deep Learning Seminar course, we trained a deep neural network to predict the year in which a yearbook photograph was taken. We fine-tuned VGGNet to do this both using a standard cross-entropy (classification) loss and a linear combination of cross-entropy and L1 loss. We also attempted to visualize the pixels most relevant for classification and the network's view of each class.
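A plausible way to combine the two losses (my interpretation - the project's exact formulation may differ): keep the cross-entropy over year classes, and add an L1 penalty between the true year and the expected year under the softmax.

import torch
import torch.nn.functional as F

# Cross-entropy over year classes plus an L1 penalty between the true year
# and the expected year under the softmax. The weighting and this exact
# formulation are assumptions, not necessarily what the project used.

def combined_loss(logits, target_class, year_values, l1_weight=0.1):
    # logits: (batch, n_years); target_class: (batch,) class indices;
    # year_values: (n_years,) tensor of the actual years, e.g. 1905..2013.
    ce = F.cross_entropy(logits, target_class)
    probs = F.softmax(logits, dim=1)
    expected_year = probs @ year_values       # (batch,)
    true_year = year_values[target_class]     # (batch,)
    return ce + l1_weight * F.l1_loss(expected_year, true_year)

years = torch.arange(1905.0, 2014.0)          # 109 year classes
logits = torch.randn(4, len(years), requires_grad=True)
targets = torch.tensor([0, 50, 80, 108])
loss = combined_loss(logits, targets, years)
loss.backward()
print(loss.item())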
Unsupervised Text Summarization Using Sentence Embeddings
[Report]
The huge improvements in the performance of various NLP tasks from word and, more recently, sentence embeddings prompt one to attack any problem that has not yet been shown to benefit from these methods. In our class project for NLP, we attempted to perform text summarization by clustering sentence vectors. We experimented with different choices of embeddings and different techniques for selecting a cluster representative. We outperformed a simple baseline but could not match state-of-the-art performance.
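A minimal sketch of the clustering approach, with TF-IDF vectors standing in for the sentence embeddings we experimented with: cluster the sentence vectors, then take the sentence nearest each centroid as that cluster's representative.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Extractive summarization by clustering sentence vectors and taking the
# sentence nearest each centroid as its cluster's representative. TF-IDF
# vectors stand in here for the sentence embeddings we experimented with.

sentences = [
    "The storm knocked out power across the city.",
    "Thousands of homes were left without electricity.",
    "Crews worked overnight to restore service.",
    "Officials expect full restoration by the weekend.",
    "Meanwhile, schools remained closed on Monday.",
]

X = TfidfVectorizer().fit_transform(sentences).toarray()
k = 2  # summary length in sentences
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

chosen = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
    chosen.append(members[np.argmin(dists)])

for i in sorted(chosen):  # keep original document order
    print(sentences[i])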
Visual Question Answering Using Natural Language Object Retrieval and Saliency Cues
[Report]
Visual Question Answering is the task where a machine is given an image and a natural language question about it, and is expected to answer the question in natural language. A simple neural baseline for this task uses a CNN to encode the image, an LSTM to encode the question, and uses weights over these vectors to perform a classification over the 1000 most common answers. We extended this baseline to take in two additional cues. One was a bounding box retrieved by passing the question as a query to a natural language object retrieval pipeline. The second was a region of the image that humans would typically find salient, and hence would be likely to ask questions about. We improved performance over the baseline but could not match the state of the art.
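The architecture can be sketched as separate encoders whose outputs are concatenated before the 1000-way answer classifier. A schematic PyTorch module; all dimensions and the concatenation-based fusion are illustrative assumptions, not the project's exact design:

import torch
import torch.nn as nn

# Schematic VQA model: image features (CNN), question encoding (LSTM), plus
# two extra cue vectors (object-retrieval box features, saliency region
# features), fused by concatenation into a 1000-way answer classifier.
# All dimensions and the fusion scheme are illustrative.

class VQAWithCues(nn.Module):
    def __init__(self, vocab=10000, emb=300, hid=512, img=2048, cue=512):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(img + hid + 2 * cue, 1024), nn.ReLU(),
            nn.Linear(1024, 1000),  # 1000 most common answers
        )

    def forward(self, img_feat, question_ids, box_feat, saliency_feat):
        _, (h, _) = self.lstm(self.embed(question_ids))
        fused = torch.cat([img_feat, h[-1], box_feat, saliency_feat], dim=1)
        return self.classifier(fused)

model = VQAWithCues()
logits = model(torch.randn(2, 2048),              # CNN image features
               torch.randint(0, 10000, (2, 12)),  # tokenized question
               torch.randn(2, 512),               # object-retrieval cue
               torch.randn(2, 512))               # saliency cue
print(logits.shape)  # torch.Size([2, 1000])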
Modelling Cooking Tutorials Using HMMs
[Report]
If we want robots to directly read a tutorial from the web and follow the instructions, they must be able to abstract away the nuances of language in each individual tutorial and store the instructions in a common format. We wished to discover whether it was possible to learn such a format in an unsupervised manner given only tutorials corresponding to the same task. To this end, we attempted to use Hidden Markov Models to model cooking recipes. We experimented with different possible language models to act as observation probabilities but could not identify a good model.
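A minimal sketch of the modelling idea, assuming the hmmlearn library is available and reducing each recipe step to a single symbol (its main verb); real recipes need a much richer observation model, which is exactly where our experiments ran into trouble:

import numpy as np
from hmmlearn import hmm  # assumes the hmmlearn package is installed

# Fit a discrete HMM over recipes, reducing each step to a single symbol
# (its main verb) so the hidden states can capture a shared stage structure
# like prep -> combine -> cook -> serve. The data here is illustrative.

verbs = ["chop", "mix", "bake", "serve"]
verb_id = {v: i for i, v in enumerate(verbs)}

recipes = [
    ["chop", "chop", "mix", "bake", "serve"],
    ["chop", "mix", "mix", "bake", "serve"],
    ["chop", "mix", "bake", "bake", "serve"],
]

X = np.concatenate([[verb_id[v] for v in r] for r in recipes]).reshape(-1, 1)
lengths = [len(r) for r in recipes]

# CategoricalHMM in recent hmmlearn releases; older ones call it MultinomialHMM.
model = hmm.CategoricalHMM(n_components=4, n_iter=100, random_state=0)
model.fit(X, lengths)

# Decode one recipe into hidden "stages".
steps = np.array([verb_id[v] for v in recipes[0]]).reshape(-1, 1)
print(model.predict(steps))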
Improving Aggregate Diversity in Recommender Systems
[Thesis]
Over the last few years, the focus of research in Recommender Systems has shifted from simply predicting ratings accurately to more holistic metrics that examine other aspects of the recommendation process. For my senior thesis under Dr. B. Ravindran, I attempted to improve the performance of recommender systems on aggregate diversity, which examines the overall number of items in the inventory that the system manages to recommend and the relative number of times different items get recommended. I first examined this problem as a course project for my data mining course, when I developed a heuristic solution that improved aggregate diversity by exploiting additional available context information such as user demographics. In my senior thesis, I identified the deficiencies of metrics that would intuitively be chosen to measure the aggregate diversity of a recommender system, attempted to identify a metric that overcame these defects, and developed algorithms to optimize it.
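Two common ways to quantify aggregate diversity - catalog coverage and the entropy of recommendation counts - are sketched below for illustration; these are standard measures, not necessarily the metric the thesis settled on.

import math
from collections import Counter

# Catalog coverage counts how many distinct items ever get recommended;
# entropy of the recommendation counts also reflects how evenly they are
# spread. Standard illustrations, not necessarily the thesis's metric.

def coverage(recommendations, catalog_size):
    return len(set(recommendations)) / catalog_size

def recommendation_entropy(recommendations):
    counts = Counter(recommendations)
    total = len(recommendations)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

recs = ["a", "a", "a", "a", "b", "b", "c", "a", "a", "b"]
print(coverage(recs, catalog_size=10))  # 0.3: only 3 of 10 items ever shown
print(recommendation_entropy(recs))     # low: recommendations concentrate on "a"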
Reinforcement Learning for Coreference Resolution
[Report]
For the project in our Reinforcement Learning course, we attempted to use reinforcement learning techniques to exploit structural information when performing coreference resolution, as opposed to modelling it as a classification task. The coreference chains identified so far can be seen as a partial clustering, so clustering evaluation measures can be used to generate a reward that a reinforcement learning algorithm can use to learn generalizable rules for coreference resolution; the problem can also be interpreted as a metric learning task. We compared the performance of a number of standard RL algorithms and were able to improve precision on the task over the baseline.
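The reward idea can be sketched concretely: score the partial clustering against the gold clustering before and after an action, and use the difference as the reward. B-cubed F1 below is an illustrative metric choice:

# Reward for a coreference step as the change in a clustering score of the
# partial clustering against gold. B-cubed F1 is an illustrative choice.

def b_cubed_f1(pred_clusters, gold_clusters):
    gold_of = {m: i for i, c in enumerate(gold_clusters) for m in c}
    pred_of = {m: i for i, c in enumerate(pred_clusters) for m in c}
    mentions = [m for c in pred_clusters for m in c]
    p = r = 0.0
    for m in mentions:
        pred_c = set(pred_clusters[pred_of[m]])
        gold_c = set(gold_clusters[gold_of[m]])
        overlap = len(pred_c & gold_c)
        p += overlap / len(pred_c)
        r += overlap / len(gold_c)
    p, r = p / len(mentions), r / len(mentions)
    return 2 * p * r / (p + r)

gold = [["he", "john"], ["mary", "she"]]
before = [["he"], ["john"], ["mary", "she"]]  # partial clustering
after = [["he", "john"], ["mary", "she"]]     # after merging "he" and "john"
reward = b_cubed_f1(after, gold) - b_cubed_f1(before, gold)
print(reward)  # positive: the merge improved the clustering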
Diversity in Text Summarization Using LSA
[Report]
As a part of the Natural Language Processing course, we tried to improve upon techniques that make use of Latent Semantic Analysis (LSA) for text summarization. We took the standard LSA relevance scores, which determine the importance of a sentence to the summary of the document, and discounted them by the similarity of the sentence in question to already selected sentences, much like Maximal Marginal Relevance does. We also looked at different vector spaces in which sentences could be represented, such as binary word vectors, TF-IDF word vectors, and term frequency word vectors weighted by unigram probabilities of words.
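A minimal sketch of the approach: compute LSA relevance from each sentence's weight in the top latent topics, then select sentences greedily with an MMR-style redundancy discount. The 0.7/0.3 trade-off is an illustrative choice:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# LSA-based extractive summarization with an MMR-style redundancy discount:
# relevance comes from the sentence's weight in the top latent topics, and
# each pick is penalized by similarity to already-selected sentences.

sentences = [
    "The league announced a new playoff format.",
    "The playoff format will add two wildcard teams.",
    "Fans reacted positively to the expanded field.",
    "Television ratings are expected to rise.",
    "Critics argue the season is already too long.",
]

X = TfidfVectorizer().fit_transform(sentences)
svd = TruncatedSVD(n_components=2, random_state=0)
topic_weights = svd.fit_transform(X)             # (n_sentences, n_topics)
relevance = np.linalg.norm(topic_weights, axis=1)
sim = cosine_similarity(X)

selected, k = [], 2
while len(selected) < k:
    best, best_score = None, -np.inf
    for i in range(len(sentences)):
        if i in selected:
            continue
        redundancy = max(sim[i][j] for j in selected) if selected else 0.0
        score = 0.7 * relevance[i] - 0.3 * redundancy
        if score > best_score:
            best, best_score = i, score
    selected.append(best)

for i in sorted(selected):
    print(sentences[i])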
Identifying Points for Active Learning in an Induced Hypergraph
[Report]
This was my project for the social network analysis course. We explored how a graphical representation of a dataset could be exploited to identify useful points for active learning. In active learning, the system identifies difficult data points, which it requests a human to label. We created an induced hypergraph from a normal relational database by grouping points into hyperedges based on attribute values. We hypothesized that influencers in this structure would make good choices for training points, since label propagation techniques could then be used to learn other labels from them, and we experimented with different techniques for identifying influencers and propagating labels.
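A minimal sketch of the hypergraph construction and an influence score; the records and the particular score are illustrative, not the project's exact method:

from collections import defaultdict

# Induce a hypergraph from a relational table: each (attribute, value) pair
# becomes a hyperedge containing every record with that value. Influencers
# are then proposed as labels to request. The influence score is illustrative.

records = [
    {"id": 1, "dept": "sales", "city": "austin"},
    {"id": 2, "dept": "sales", "city": "dallas"},
    {"id": 3, "dept": "eng", "city": "austin"},
    {"id": 4, "dept": "eng", "city": "austin"},
]

hyperedges = defaultdict(set)
for rec in records:
    for attr, val in rec.items():
        if attr != "id":
            hyperedges[(attr, val)].add(rec["id"])

def influence(record_id):
    # Sum of the sizes of hyperedges containing the record: a point sitting
    # in large groups can propagate its label widely once it is known.
    return sum(len(m) for m in hyperedges.values() if record_id in m)

best = max((r["id"] for r in records), key=influence)
print(best)  # ids 1, 3, and 4 tie at the top here; max returns the first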
Spell Checker Using Character Bigram Similarity
A different solution to a classic problem - we designed a spell checker that ranks corrections using a modified version of the Jaccard similarity of words' character bigrams, rather than edit distance. The system can also make use of additional context when surrounding words in the text are provided.
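A minimal sketch of the ranking idea with plain (unmodified) bigram Jaccard similarity and a tiny dictionary; the project's actual modification to Jaccard and its use of context are not reproduced here:

# Rank corrections by Jaccard similarity of character bigram sets. This is
# plain Jaccard with a tiny dictionary; the project's modification to the
# measure and its use of surrounding context are not reproduced here.

def bigrams(word):
    padded = f"^{word}$"  # mark word boundaries
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def corrections(word, dictionary, n=3):
    wb = bigrams(word)
    return sorted(dictionary, key=lambda d: jaccard(wb, bigrams(d)),
                  reverse=True)[:n]

dictionary = ["receive", "recipe", "believe", "relieve", "receipt"]
print(corrections("recieve", dictionary))
# ['relieve', 'recipe', 'receive'] - plain bigram Jaccard ranks "relieve"
# first; cases like this are what a modified measure and context could fix.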
Wait-Free Concurrent Binary Search Tree
Wait-freedom is the strongest non-blocking progress guarantee in a concurrent system. During our concurrent programming course, we tried to design a concurrent binary search tree that provides this guarantee. The main challenge was to identify all possible interleavings of operations from different threads that could result in a loss of correctness, and to prevent them with a mechanism that did not involve waiting. We ended up with a design that, as far as we could tell, was theoretically wait-free except for a periodic cleanup operation, but spawned a very large number of threads to create a monitoring mechanism.