
Jointly Learning Word Embeddings and Latent Topics

Published: 07 August 2017

Abstract
Word embedding models such as Skip-gram learn a vector-space representation for each word, based on the local word collocation patterns that are observed in a text corpus. Latent topic models, on the other hand, take a more global view, looking at the word distributions across the corpus to assign a topic to each word occurrence. These two paradigms are complementary in how they represent the meaning of word occurrences. While some previous works have already looked at using word embeddings for improving the quality of latent topics, and conversely, at using latent topics for improving word embeddings, such "two-step" methods cannot capture the mutual interaction between the two paradigms. In this paper, we propose STE, a framework which can learn word embeddings and latent topics in a unified manner. STE naturally obtains topic-specific word embeddings, and thus addresses the issue of polysemy. At the same time, it also learns the term distributions of the topics, and the topic distributions of the documents. Our experimental results demonstrate that the STE model can indeed generate useful topic-specific word embeddings and coherent latent topics in an effective and efficient way.
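To make the joint-learning idea concrete, the sketch below pairs topic-specific input embeddings with shared context embeddings and alternates an E-step (assigning topic responsibilities to each word-context pair, weighted by the document's current topic mixture) with an M-step (responsibility-weighted skip-gram gradient updates, followed by re-estimating the document-topic distributions). This is a minimal illustration, not the published STE implementation: the toy corpus, the two-topic setup, the full-softmax likelihood (at realistic scale one would use negative sampling or hierarchical softmax), and all variable names are assumptions made for this sketch.

```python
# Minimal sketch of jointly learning topic-specific word embeddings and
# document-topic mixtures, in the spirit of STE. NOT the authors' code:
# corpus, hyperparameters, and the EM/SGD schedule are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

docs = [["apple", "fruit", "pie"],        # "apple" as a fruit
        ["apple", "iphone", "mac"],       # "apple" as a company
        ["fruit", "pie", "iphone"]]
vocab = sorted({w for d in docs for w in d})
w2i = {w: i for i, w in enumerate(vocab)}
V, K, D, lr = len(vocab), 2, 8, 0.1       # vocab size, topics, dims, step size

U = rng.normal(0, 0.1, (K, V, D))         # topic-specific input embeddings
C = rng.normal(0, 0.1, (V, D))            # shared output (context) embeddings
theta = np.full((len(docs), K), 1.0 / K)  # per-document topic distributions

def context_probs(k, w):
    """Softmax distribution over context words, given word w under topic k."""
    s = C @ U[k, w]
    e = np.exp(s - s.max())
    return e / e.sum()

for epoch in range(50):
    for d, doc in enumerate(docs):
        ids = [w2i[w] for w in doc]
        resp_sum, n_pairs = np.zeros(K), 0
        for i, w in enumerate(ids):
            for j, c in enumerate(ids):
                if i == j:
                    continue
                # E-step: topic responsibilities for this (word, context) pair
                lik = np.array([context_probs(k, w)[c] for k in range(K)])
                r = theta[d] * lik
                r /= r.sum()
                resp_sum += r
                n_pairs += 1
                # M-step (SGD): responsibility-weighted skip-gram gradients
                for k in range(K):
                    p = context_probs(k, w)
                    onehot = np.zeros(V)
                    onehot[c] = 1.0
                    grad_u = C[c] - p @ C                # d log p(c|w,k) / dU
                    C += lr * r[k] * np.outer(onehot - p, U[k, w])
                    U[k, w] += lr * r[k] * grad_u
        # Re-estimate the document's topic mixture from the responsibilities
        theta[d] = resp_sum / n_pairs

# The same surface form now has one vector per topic, the polysemy effect
# described in the abstract: the two "apple" vectors typically drift apart.
a = w2i["apple"]
cos = U[0, a] @ U[1, a] / (np.linalg.norm(U[0, a]) * np.linalg.norm(U[1, a]))
print("apple/topic-0 vs apple/topic-1 cosine:", cos)
print("document-topic mixtures:\n", theta)
```

On this toy corpus the two "apple" vectors tend to separate, one toward the fruit/pie context and one toward the iphone/mac context, while theta recovers per-document topic proportions; both outputs correspond to what the abstract claims STE learns jointly rather than in two steps.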




    Published In

    SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
    August 2017
    1476 pages
    ISBN:9781450350228
    DOI:10.1145/3077136
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. document modeling
    2. topic model
    3. word embedding

    Qualifiers

    • Research-article

    Funding Sources

• Microsoft Research Asia Urban Informatics Grant
• ERC Starting Grant
• Research Grants Council of the Hong Kong Special Administrative Region, China

    Conference

    SIGIR '17

    Acceptance Rates

SIGIR '17 Paper Acceptance Rate: 78 of 362 submissions, 22%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%

Article Metrics

• Downloads (last 12 months): 16
• Downloads (last 6 weeks): 2

    Cited By

• (2024) CoTE: A Flexible Method for Joint Learning of Topic and Embedding Models. Web and Big Data, 406-421. DOI: 10.1007/978-981-97-2421-5_27. Online publication date: 12-May-2024.
• (2023) Class-Specific Word Sense Aware Topic Modeling via Soft Orthogonalized Topics. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1218-1227. DOI: 10.1145/3583780.3614809. Online publication date: 21-Oct-2023.
• (2023) Contextualized Word Embeddings via Generative Adversarial Learning of Syntagmatic and Paradigmatic Structure. 2023 6th International Conference on Software Engineering and Computer Science (CSECS), 1-8. DOI: 10.1109/CSECS60003.2023.10428465. Online publication date: 22-Dec-2023.
• (2023) A decision support system in precision medicine: contrastive multimodal learning for patient stratification. Annals of Operations Research. DOI: 10.1007/s10479-023-05545-6. Online publication date: 29-Aug-2023.
• (2023) Exclusive Topic Model. Research Papers in Statistical Inference for Time Series and Related Models, 83-109. DOI: 10.1007/978-981-99-0803-5_3. Online publication date: 1-Jun-2023.
• (2022) A supervised topic embedding model and its application. PLOS ONE 17(11), e0277104. DOI: 10.1371/journal.pone.0277104. Online publication date: 4-Nov-2022.
• (2022) Neural Embedding Allocation: Distributed Representations of Topic Models. Computational Linguistics 48(4), 1021-1052. DOI: 10.1162/coli_a_00457. Online publication date: 1-Dec-2022.
• (2022) Collaborative Filtering With Network Representation Learning for Citation Recommendation. IEEE Transactions on Big Data 8(5), 1233-1246. DOI: 10.1109/TBDATA.2020.3034976. Online publication date: 1-Oct-2022.
• (2022) Multi-scaled Topic Embedding for Text Classification. 2022 7th International Conference on Computational Intelligence and Applications (ICCIA), 52-58. DOI: 10.1109/ICCIA55271.2022.9828449. Online publication date: 24-Jun-2022.
• (2022) Topic Modeling for Short Texts Via Dual View Collaborate Optimization. 2022 7th IEEE International Conference on Data Science in Cyberspace (DSC), 160-166. DOI: 10.1109/DSC55868.2022.00028. Online publication date: Jul-2022.
