short-paper

Which Discriminator for Cooperative Text Generation?

Published: 07 July 2022
    Abstract

    Language models generate text by successively predicting a probability distribution over the next token given the past ones. A growing body of work seeks to leverage external information during decoding so that the generated texts have desired properties, such as being more natural, non-toxic, faithful, or written in a specific style. One solution is to apply a classifier at each generation step, yielding a cooperative setting in which the classifier guides the decoding of the language model distribution towards texts that are relevant to the task at hand. In this paper, we examine three families of (transformer-based) discriminators for this cooperative decoding task: bidirectional, left-to-right, and generative. We weigh the pros and cons of each type of discriminator for cooperative generation, exploring their accuracy on classification tasks along with their impact on the quality of the resulting samples and on computational performance. We also provide the code of a batched implementation of the powerful cooperative decoding strategy used in our experiments, Monte Carlo Tree Search (MCTS), working with each discriminator for natural language generation.
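    The classifier-guided decoding idea described in the abstract can be sketched in a few lines. This is a minimal toy illustration under stated assumptions, not the paper's implementation: `lm_next_token_probs` and `discriminator_score` are hypothetical stand-ins for a real language model and a trained discriminator.

    ```python
    # Sketch of one cooperative decoding step: the discriminator reweights
    # the language model's next-token distribution. Both functions below are
    # toy stand-ins (hypothetical names), not real models.

    def lm_next_token_probs(prefix):
        # Toy LM: a uniform distribution over a tiny vocabulary.
        vocab = ["good", "bad", "fine"]
        return {tok: 1.0 / len(vocab) for tok in vocab}

    def discriminator_score(text):
        # Toy discriminator: probability that `text` satisfies the desired
        # property (here, "sounds positive").
        return 0.9 if ("good" in text or "fine" in text) else 0.1

    def cooperative_step(prefix):
        """Multiply the LM distribution by the discriminator score of each
        candidate continuation, then renormalize."""
        probs = lm_next_token_probs(prefix)
        weighted = {tok: p * discriminator_score(prefix + " " + tok)
                    for tok, p in probs.items()}
        z = sum(weighted.values())
        return {tok: w / z for tok, w in weighted.items()}

    dist = cooperative_step("the movie was")
    # Continuations the discriminator approves of gain probability mass.
    ```

    In the paper's setting, this kind of discriminator reweighting is embedded in an MCTS search over candidate sequences rather than applied greedily at each step.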

    Supplementary Material

    MP4 File (SIGIR22-sp1823.mp4)
    A five-minute presentation video of the paper "Which Discriminator for Cooperative Text Generation?", briefly introducing cooperative decoding (including the state-of-the-art MCTS method) and the motivations of the study. The most important results are then presented, before concluding with key takeaways.


    Cited By

    • (2023) Sentence-level heuristic tree search for long text generation. Complex & Intelligent Systems 10(2), 3153-3167. https://doi.org/10.1007/s40747-023-01244-8. Online publication date: 29-Sep-2023.


    Published In
    SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2022
    3569 pages
    ISBN:9781450387323
    DOI:10.1145/3477495
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. attention
    2. cooperative
    3. discriminator
    4. empirical
    5. monte carlo tree search
    6. natural language generation
    7. performance

    Qualifiers

    • Short-paper

    Conference

    SIGIR '22

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
