I developed a joint inference approach for extracting semantic representations of events from text. It identifies not only event anchors but also the related entities (participants, times, and places) and their semantic roles. By making joint predictions about these elements across the whole document context, my approach significantly outperforms state-of-the-art event extraction approaches [NAACL2016].
I developed a novel Bayesian clustering model for event coreference resolution, both within a document and across multiple documents. It combines linguistic intuitions with advances in Bayesian statistics, yielding significant improvements over existing clustering approaches [TACL2015].
I developed a scalable neural network model for learning continuous representations of entities and relations in large knowledge bases such as Freebase. I showed that the learned representations lead to accurate prediction of unseen relations [LS2014]. Moreover, I demonstrated a novel use of the learned representations: mining Horn clauses (e.g., BornInCity(a,b) and CityInCountry(b,c) => Nationality(a,c)) from data [ICLR2015].
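To illustrate how learned relation representations can be used to mine Horn clauses, here is a minimal sketch. It assumes a diagonal bilinear (DistMult-style) model, in which each relation is a vector holding the diagonal of its relation matrix, so chaining two relations corresponds to an elementwise product. The relation vectors, the function names (`compose`, `cosine`, `mine_rules`), the use of cosine similarity as the rule score, and the 0.9 threshold are all hypothetical choices for illustration, not the exact procedure of [ICLR2015].

```python
import math
from itertools import permutations

# Hypothetical toy embeddings (values invented for illustration): each vector is the
# diagonal of a relation matrix in a diagonal bilinear (DistMult-style) model.
RELATIONS: dict[str, list[float]] = {
    "BornInCity": [1.0, 2.0, 0.5],
    "CityInCountry": [0.5, 1.0, 2.0],
    "Nationality": [0.5, 2.0, 1.0],
}


def compose(r1: list[float], r2: list[float]) -> list[float]:
    """Compose the path r1(a,b), r2(b,c): for diagonal matrices, M1 @ M2 is an elementwise product."""
    return [x * y for x, y in zip(r1, r2)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity, used here (an assumption) to compare a composed path with a head relation."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mine_rules(relations: dict[str, list[float]], threshold: float = 0.9):
    """Score candidate rules body1(a,b) and body2(b,c) => head(a,c) by how closely the
    composed body matches the head embedding; keep those at or above the threshold."""
    rules = []
    for body1, body2, head in permutations(relations, 3):
        score = cosine(compose(relations[body1], relations[body2]), relations[head])
        if score >= threshold:
            rules.append((score, body1, body2, head))
    return sorted(rules, reverse=True)


rules = mine_rules(RELATIONS)
for score, b1, b2, h in rules:
    print(f"{b1}(a,b) and {b2}(b,c) => {h}(a,c)  score={score:.3f}")
```

Because the elementwise product is commutative, this diagonal sketch scores both body orders of the Nationality rule identically and cannot tell them apart; a model with full relation matrices would make composition order-sensitive.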
I developed structured learning and inference approaches for extracting semantic representations of opinions from text. Unlike existing work, they jointly interpret opinion attributes (i.e., polarity and intensity), opinion holders (who expresses the opinion), and opinion targets (what the opinion is about), and they achieve state-of-the-art performance on fine-grained opinion extraction tasks [ACL2013, EMNLP2012, TACL2014].
I was the lead developer (from 2007 to 2010) of MobileMiner, a data mining system for analytics in mobile communications. It includes a wide range of data mining algorithms and supports applications such as social community analysis, churn prediction, and traffic pattern mining [SIGMOD2009, PAKDD2008]. I implemented distributed versions of the algorithms on Hadoop and deployed them on China Mobile's cloud computing platform to process multi-terabyte datasets.