Showing 1–50 of 62 results for author: Bendersky, M

  1. Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I.

    Authors: Harrie Oosterhuis, Rolf Jagerman, Zhen Qin, Xuanhui Wang, Michael Bendersky

    Abstract: The traditional evaluation of information retrieval (IR) systems is generally very costly as it requires manual relevance annotation from human experts. Recent advancements in generative artificial intelligence -- specifically large language models (LLMs) -- can generate relevance annotations at an enormous scale with relatively small computational costs. Potentially, this could alleviate the cost…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: KDD '24

  2. arXiv:2406.02886

    cs.CL cs.AI

    PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs

    Authors: Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Haorui Wang, Zhen Qin, Feng Han, Jialu Liu, Simon Baumgartner, Michael Bendersky, Chao Zhang

    Abstract: Large Language Models (LLMs) have exhibited impressive capabilities in various tasks, yet their vast parameter sizes restrict their applicability in resource-constrained settings. Knowledge distillation (KD) offers a viable solution by transferring expertise from large teacher models to compact student models. However, traditional KD techniques face specific challenges when applied to LLMs, includ…

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  3. arXiv:2405.02816

    cs.CL cs.IR cs.LG

    Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization

    Authors: Hamed Zamani, Michael Bendersky

    Abstract: This paper introduces Stochastic RAG--a novel approach for end-to-end optimization of retrieval-augmented generation (RAG) models that relaxes the simplifying assumptions of marginalization and document independence, made in most prior work. Stochastic RAG casts the retrieval process in RAG as a stochastic sampling without replacement process. Through this formulation, we employ straight-through G…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: To appear in the proceedings of SIGIR 2024

  4. arXiv:2404.11791

    cs.IR

    Consolidating Ranking and Relevance Predictions of Large Language Models through Post-Processing

    Authors: Le Yan, Zhen Qin, Honglei Zhuang, Rolf Jagerman, Xuanhui Wang, Michael Bendersky, Harrie Oosterhuis

    Abstract: The powerful generative abilities of large language models (LLMs) show potential in generating relevance labels for search applications. Previous work has found that directly asking about relevancy, such as "How relevant is document A to query Q?", results in sub-optimal ranking. Instead, the pairwise ranking prompting (PRP) approach produces promising ranking performance through asking about pai…

    Submitted 17 April, 2024; originally announced April 2024.

  5. arXiv:2402.13417

    cs.IR

    Unlocking the 'Why' of Buying: Introducing a New Dataset and Benchmark for Purchase Reason and Post-Purchase Experience

    Authors: Tao Chen, Siqi Zuo, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

    Abstract: Explanations are crucial for enhancing user trust and understanding within modern recommendation systems. To build truly explainable systems, we need high-quality datasets that elucidate why users make choices. While previous efforts have focused on extracting users' post-purchase sentiment in reviews, they ignore the reasons behind the decision to buy. In our work, we propose a novel purchase r…

    Submitted 20 February, 2024; originally announced February 2024.

  6. arXiv:2401.08189

    cs.AI cs.CL cs.LG

    PRewrite: Prompt Rewriting with Reinforcement Learning

    Authors: Weize Kong, Spurthi Amba Hombaiah, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

    Abstract: Prompt engineering is critical for the development of LLM-based applications. However, it is usually done manually in a "trial and error" fashion that can be time consuming, ineffective, and sub-optimal. Even for the prompts which seemingly work well, there is always a lingering question: can the prompts be made better with further modifications? To address these problems, we investigate automat…

    Submitted 10 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  7. arXiv:2401.06954

    cs.CL

    Bridging the Preference Gap between Retrievers and LLMs

    Authors: Zixuan Ke, Weize Kong, Cheng Li, Mingyang Zhang, Qiaozhu Mei, Michael Bendersky

    Abstract: Large Language Models (LLMs) have demonstrated superior results across a wide range of tasks, and Retrieval-augmented Generation (RAG) is an effective way to enhance the performance by locating relevant information and placing it into the context window of the LLM. However, the relationship between retrievers and LLMs in a RAG is still under-investigated. Most existing work treats the retriever an…

    Submitted 20 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

  8. arXiv:2311.17650

    cs.IR

    Creator Context for Tweet Recommendation

    Authors: Spurthi Amba Hombaiah, Tao Chen, Mingyang Zhang, Michael Bendersky, Marc Najork, Matt Colen, Sergey Levi, Vladimir Ofitserov, Tanvir Amin

    Abstract: When discussing a tweet, people usually not only refer to the content it delivers, but also to the person behind the tweet. In other words, grounding the interpretation of the tweet in the context of its creator plays an important role in deciphering the true intent and the importance of the tweet. In this paper, we attempt to answer the question of how creator context should be used to advance…

    Submitted 29 November, 2023; originally announced November 2023.

  9. arXiv:2311.09619

    cs.CL

    Take One Step at a Time to Know Incremental Utility of Demonstration: An Analysis on Reranking for Few-Shot In-Context Learning

    Authors: Kazuma Hashimoto, Karthik Raman, Michael Bendersky

    Abstract: In-Context Learning (ICL) is an emergent capability of Large Language Models (LLMs). Only a few demonstrations enable LLMs to be used as blackbox for new tasks. Previous studies have shown that using LLMs' outputs as labels is effective in training models to select demonstrations. Such a label is expected to estimate utility of a demonstration in ICL; however, it has not been well understood how d…

    Submitted 2 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted as a long paper at NAACL 2024

  10. Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?

    Authors: Minghan Li, Honglei Zhuang, Kai Hui, Zhen Qin, Jimmy Lin, Rolf Jagerman, Xuanhui Wang, Michael Bendersky

    Abstract: Query expansion has been widely used to improve the search results of first-stage retrievers, yet its influence on second-stage, cross-encoder rankers remains under-explored. A recent work of Weller et al. [44] shows that current expansion techniques benefit weaker models such as DPR and BM25 but harm stronger rankers such as MonoT5. In this paper, we re-examine this conclusion and raise the follo…

    Submitted 30 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  11. arXiv:2311.08390

    cs.CL

    Predicting Text Preference Via Structured Comparative Reasoning

    Authors: Jing Nathan Yan, Tianqi Liu, Justin T Chiu, Jiaming Shen, Zhen Qin, Yue Yu, Yao Zhao, Charu Lakshmanan, Yair Kurzion, Alexander M. Rush, Jialu Liu, Michael Bendersky

    Abstract: Comparative reasoning plays a crucial role in text preference prediction; however, large language models (LLMs) often demonstrate inconsistencies in their reasoning. While approaches like Chain-of-Thought improve accuracy in many other settings, they struggle to consistently distinguish the similarities and differences of complex texts. We introduce SC, a prompting approach that predicts text pref…

    Submitted 1 July, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  12. arXiv:2311.07930

    cs.CL

    It's All Relative! -- A Synthetic Query Generation Approach for Improving Zero-Shot Relevance Prediction

    Authors: Aditi Chaudhary, Karthik Raman, Michael Bendersky

    Abstract: Recent developments in large language models (LLMs) have shown promise in their ability to generate synthetic query-document pairs by prompting with as few as 8 demonstrations. This has enabled building better IR models, especially for tasks with no training data readily available. Typically, such synthetic query generation (QGen) approaches condition on an input context (e.g. a text document) and…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 18 pages

  13. arXiv:2311.07099

    cs.CL cs.AI

    Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning

    Authors: Yue Yu, Jiaming Shen, Tianqi Liu, Zhen Qin, Jing Nathan Yan, Jialu Liu, Chao Zhang, Michael Bendersky

    Abstract: Large language models (LLMs) have shown remarkable capabilities in various natural language understanding tasks. With only a few demonstration examples, these LLMs can quickly adapt to target tasks without expensive gradient updates. Common strategies to boost such 'in-context' learning ability are to ensemble multiple model decoded results and require the model to generate an explanation along wi…

    Submitted 13 November, 2023; originally announced November 2023.

  14. arXiv:2310.14122

    cs.IR

    Beyond Yes and No: Improving Zero-Shot LLM Rankers via Scoring Fine-Grained Relevance Labels

    Authors: Honglei Zhuang, Zhen Qin, Kai Hui, Junru Wu, Le Yan, Xuanhui Wang, Michael Bendersky

    Abstract: Zero-shot text rankers powered by recent LLMs achieve remarkable ranking performance by simply prompting. Existing prompts for pointwise LLM rankers mostly ask the model to choose from binary relevance labels like "Yes" and "No". However, the lack of intermediate relevance label options may cause the LLM to provide noisy or biased answers for documents that are partially relevant to the query. We…

    Submitted 1 April, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

    Comments: NAACL 2024; 13 pages

  15. arXiv:2310.12100

    cs.CL cs.AI cs.CV cs.LG cs.MM

    Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling

    Authors: Yaqing Wang, Jialin Wu, Tanmaya Dabral, Jiageng Zhang, Geoff Brown, Chun-Ta Lu, Frederick Liu, Yi Liang, Bo Pang, Michael Bendersky, Radu Soricut

    Abstract: Large language models (LLMs) and vision language models (VLMs) demonstrate excellent performance on a wide range of tasks by scaling up parameter counts from O(10^9) to O(10^{12}) levels and further beyond. These large scales make it impossible to adapt and deploy fully specialized models given a task of interest. Parameter-efficient fine-tuning (PEFT) emerges as a promising direction to tackle th…

    Submitted 18 October, 2023; originally announced October 2023.

  16. arXiv:2310.11593

    cs.CL cs.AI cs.LG

    Automated Evaluation of Personalized Text Generation using Large Language Models

    Authors: Yaqing Wang, Jiepu Jiang, Mingyang Zhang, Cheng Li, Yi Liang, Qiaozhu Mei, Michael Bendersky

    Abstract: Personalized text generation presents a specialized mechanism for delivering content that is specific to a user's personal context. While the research progress in this area has been rapid, evaluation still presents a challenge. Traditional automated metrics such as BLEU and ROUGE primarily measure lexical similarity to human-written references, and are not able to distinguish personalization from…

    Submitted 17 October, 2023; originally announced October 2023.

  17. arXiv:2310.05175

    cs.LG

    Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

    Authors: Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Gen Li, Ajay Jaiswal, Mykola Pechenizkiy, Yi Liang, Michael Bendersky, Zhangyang Wang, Shiwei Liu

    Abstract: Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size. In response to this challenge, efforts have been directed toward the application of traditional network pruning techniques to LLMs, uncovering a massive number of parameters that can be pruned in one-shot without…

    Submitted 6 May, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

  18. Learning to Rewrite Prompts for Personalized Text Generation

    Authors: Cheng Li, Mingyang Zhang, Qiaozhu Mei, Weize Kong, Michael Bendersky

    Abstract: Facilitated by large language models (LLMs), personalized text generation has become a rapidly growing research direction. Most existing studies focus on designing specialized models for a particular domain, or they require fine-tuning the LLMs to generate personalized text. We consider a typical scenario in which the large language model, which generates personalized output, is frozen and can onl…

    Submitted 8 February, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: In Proceedings of the ACM Web Conference 2024 (WWW '24)

  19. arXiv:2309.07900

    cs.CL cs.IR

    Ambiguity-Aware In-Context Learning with Large Language Models

    Authors: Lingyu Gao, Aditi Chaudhary, Krishna Srinivasan, Kazuma Hashimoto, Karthik Raman, Michael Bendersky

    Abstract: In-context learning (ICL), i.e., showing LLMs only a few task-specific demonstrations, has led to downstream gains with no task-specific fine-tuning required. However, LLMs are sensitive to the choice of prompts, and therefore a crucial research question is how to select good demonstrations for ICL. One effective strategy is leveraging semantic similarity between the ICL demonstrations and test input…

    Submitted 30 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: 15 pages in total

  20. arXiv:2308.07968

    cs.CL

    Teach LLMs to Personalize -- An Approach inspired by Writing Education

    Authors: Cheng Li, Mingyang Zhang, Qiaozhu Mei, Yaqing Wang, Spurthi Amba Hombaiah, Yi Liang, Michael Bendersky

    Abstract: Personalized text generation is an emerging research area that has attracted much attention in recent years. Most studies in this direction focus on a particular domain by designing bespoke features or models. In this work, we propose a general approach for personalized text generation using large language models (LLMs). Inspired by the practice of writing education, we develop a multistage and mu…

    Submitted 15 August, 2023; originally announced August 2023.

  21. SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding

    Authors: Vasilisa Bashlovkina, Riley Matthews, Zhaobin Kuang, Simon Baumgartner, Michael Bendersky

    Abstract: We study the ability of transformer-based language models (LMs) to understand social media language. Social media (SM) language is distinct from standard written language, yet existing benchmarks fall short of capturing LM performance in this socially, economically, and politically important domain. We quantify the degree to which social media language differs from conventional language and conclu…

    Submitted 30 June, 2023; originally announced July 2023.

  22. arXiv:2306.17563

    cs.IR cs.CL cs.LG

    Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting

    Authors: Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Le Yan, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, Michael Bendersky

    Abstract: Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem. However, researchers have found it difficult to outperform fine-tuned baseline rankers on benchmark datasets. We analyze pointwise and listwise ranking prompts used by existing methods and argue that off-the-shelf LLMs do not fully unde…

    Submitted 28 March, 2024; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted to NAACL 2024. Corrected results of RankT5 on TREC-DL19

  23. arXiv:2306.15811

    q-bio.NC cs.LG eess.IV

    Learning normal asymmetry representations for homologous brain structures

    Authors: Duilio Deangeli, Emmanuel Iarussi, Juan Pablo Princich, Mariana Bendersky, Ignacio Larrabide, José Ignacio Orlando

    Abstract: Although normal homologous brain structures are approximately symmetrical by definition, they also have shape differences due to e.g. natural ageing. On the other hand, neurodegenerative conditions induce their own changes in this asymmetry, making them more pronounced or altering their location. Identifying when these alterations are due to a pathological deterioration is still challenging. Curre…

    Submitted 27 June, 2023; originally announced June 2023.

    Journal ref: Published in MICCAI 2023

  24. arXiv:2306.08650

    cs.IR cs.LG

    Learning to Rank when Grades Matter

    Authors: Le Yan, Zhen Qin, Gil Shamir, Dong Lin, Xuanhui Wang, Mike Bendersky

    Abstract: Graded labels are ubiquitous in real-world learning-to-rank applications, especially in human rated relevance data. Traditional learning-to-rank techniques aim to optimize the ranked order of documents. They typically, however, ignore predicting actual grades. This prevents them from being adopted in applications where grades matter, such as filtering out "poor" documents. Achieving both good ra…

    Submitted 20 June, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  25. arXiv:2305.11944

    cs.IR cs.CL

    Exploring the Viability of Synthetic Query Generation for Relevance Prediction

    Authors: Aditi Chaudhary, Karthik Raman, Krishna Srinivasan, Kazuma Hashimoto, Mike Bendersky, Marc Najork

    Abstract: Query-document relevance prediction is a critical problem in Information Retrieval systems. This problem has increasingly been tackled using (pretrained) transformer-based models which are finetuned using large collections of labeled data. However, in specialized domains such as e-commerce and healthcare, the viability of this approach is limited by the dearth of large in-domain data. To address t…

    Submitted 16 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: In Proceedings of ACM SIGIR Workshop on eCommerce (SIGIR eCom 23)

  26. arXiv:2305.05010

    cs.LG cs.CL

    Do Not Blindly Imitate the Teacher: Using Perturbed Loss for Knowledge Distillation

    Authors: Rongzhi Zhang, Jiaming Shen, Tianqi Liu, Jialu Liu, Michael Bendersky, Marc Najork, Chao Zhang

    Abstract: Knowledge distillation is a popular technique to transfer knowledge from large teacher models to a small student model. Typically, the student learns to imitate the teacher by minimizing the KL divergence of its output distribution with the teacher's output distribution. In this work, we argue that such a learning objective is sub-optimal because there exists a discrepancy between the teacher's ou…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: 16 pages

  27. arXiv:2305.03653

    cs.IR

    Query Expansion by Prompting Large Language Models

    Authors: Rolf Jagerman, Honglei Zhuang, Zhen Qin, Xuanhui Wang, Michael Bendersky

    Abstract: Query expansion is a widely used technique to improve the recall of search systems. In this paper, we propose an approach to query expansion that leverages the generative abilities of Large Language Models (LLMs). Unlike traditional query expansion approaches such as Pseudo-Relevance Feedback (PRF) that rely on retrieving a good set of pseudo-relevant documents to expand queries, we rely on the…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: 7 pages, 2 figures

    ACM Class: H.3.3

  28. arXiv:2304.14522

    cs.IR cs.CL cs.LG

    Multivariate Representation Learning for Information Retrieval

    Authors: Hamed Zamani, Michael Bendersky

    Abstract: Dense retrieval models use bi-encoder network architectures for learning query and document representations. These representations are often in the form of a vector representation and their similarities are often computed using the dot product function. In this paper, we propose a new representation learning framework for dense retrieval. Instead of learning a vector for each query and document, o…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: Accepted for publication at SIGIR 2023

  29. arXiv:2304.11406

    cs.CL

    LaMP: When Large Language Models Meet Personalization

    Authors: Alireza Salemi, Sheshera Mysore, Michael Bendersky, Hamed Zamani

    Abstract: This paper highlights the importance of personalization in large language models and introduces the LaMP benchmark -- a novel benchmark for training and evaluating language models for producing personalized outputs. LaMP offers a comprehensive evaluation framework with diverse language tasks and multiple entries for each user profile. It consists of seven personalized tasks, spanning three text cl…

    Submitted 4 June, 2024; v1 submitted 22 April, 2023; originally announced April 2023.

  30. arXiv:2304.08062

    cs.IR

    Metric-agnostic Ranking Optimization

    Authors: Qingyao Ai, Xuanhui Wang, Michael Bendersky

    Abstract: Ranking is at the core of Information Retrieval. Classic ranking optimization studies often treat ranking as a sorting problem with the assumption that the best performance of ranking would be achieved if we rank items according to their individual utility. Accordingly, considerable ranking metrics have been developed and learning-to-rank algorithms that have been designed to optimize these si…

    Submitted 17 April, 2023; originally announced April 2023.

  31. arXiv:2302.05852

    cs.CL cs.AI cs.IR

    "Why is this misleading?": Detecting News Headline Hallucinations with Explanations

    Authors: Jiaming Shen, Jialu Liu, Dan Finnie, Negar Rahmati, Michael Bendersky, Marc Najork

    Abstract: Automatic headline generation enables users to comprehend ongoing news events promptly and has recently become an important task in web mining and natural language processing. With the growing need for news headline generation, we argue that the hallucination issue, namely the generated headlines being not supported by the original news stories, is a critical challenge for the deployment of this f…

    Submitted 11 February, 2023; originally announced February 2023.

    Comments: WWW 2023, 12 pages

  32. arXiv:2212.13937

    cs.IR cs.AI

    Towards Disentangling Relevance and Bias in Unbiased Learning to Rank

    Authors: Yunan Zhang, Le Yan, Zhen Qin, Honglei Zhuang, Jiaming Shen, Xuanhui Wang, Michael Bendersky, Marc Najork

    Abstract: Unbiased learning to rank (ULTR) studies the problem of mitigating various biases from implicit user feedback data such as clicks, and has been receiving considerable attention recently. A popular ULTR approach for real-world applications uses a two-tower architecture, where click modeling is factorized into a relevance tower with regular input features, and a bias tower with bias-relevant inputs…

    Submitted 4 June, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

  33. arXiv:2212.11311

    cs.CL cs.AI cs.LG cs.SI

    What do LLMs Know about Financial Markets? A Case Study on Reddit Market Sentiment Analysis

    Authors: Xiang Deng, Vasilisa Bashlovkina, Feng Han, Simon Baumgartner, Michael Bendersky

    Abstract: Market sentiment analysis on social media content requires knowledge of both financial markets and social media jargon, which makes it a challenging task for human raters. The resulting lack of high-quality labeled data stands in the way of conventional supervised learning methods. Instead, we approach this problem using semi-supervised learning with a large language model (LLM). Our pipeline gene…

    Submitted 21 December, 2022; originally announced December 2022.

  34. arXiv:2212.10764

    cs.IR cs.AI cs.CL cs.LG

    Learning List-Level Domain-Invariant Representations for Ranking

    Authors: Ruicheng Xian, Honglei Zhuang, Zhen Qin, Hamed Zamani, Jing Lu, Ji Ma, Kai Hui, Han Zhao, Xuanhui Wang, Michael Bendersky

    Abstract: Domain adaptation aims to transfer the knowledge learned on (data-rich) source domains to (low-resource) target domains, and a popular method is invariant representation learning, which matches and aligns the data distributions on the feature space. Although this method is studied extensively and applied on classification and regression problems, its adoption on ranking problems is sporadic, and t…

    Submitted 31 October, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: NeurIPS 2023. Comparison to v1: revised presentation and proof of Corollary 4.9

  35. arXiv:2211.01494

    cs.IR

    Regression Compatible Listwise Objectives for Calibrated Ranking with Binary Relevance

    Authors: Aijun Bai, Rolf Jagerman, Zhen Qin, Le Yan, Pratyush Kar, Bing-Rong Lin, Xuanhui Wang, Michael Bendersky, Marc Najork

    Abstract: As Learning-to-Rank (LTR) approaches primarily seek to improve ranking quality, their output scores are not scale-calibrated by design. This fundamentally limits LTR usage in score-sensitive applications. Though a simple multi-objective approach that combines a regression and a ranking objective can effectively learn scale-calibrated scores, we argue that the two objectives are not necessarily com…

    Submitted 21 August, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  36. arXiv:2210.15718

    cs.CL cs.IR

    QUILL: Query Intent with Large Language Models using Retrieval Augmentation and Multi-stage Distillation

    Authors: Krishna Srinivasan, Karthik Raman, Anupam Samanta, Lingrui Liao, Luca Bertelli, Mike Bendersky

    Abstract: Large Language Models (LLMs) have shown impressive results on a variety of text understanding tasks. Search queries, though, pose a unique challenge, given their short length and lack of nuance or context. Complicated feature engineering efforts do not always lead to downstream improvements as their performance benefits may be offset by increased complexity of knowledge distillation. Thus, in this p…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 Industry Track

  37. arXiv:2210.10634

    cs.IR cs.CL

    RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses

    Authors: Honglei Zhuang, Zhen Qin, Rolf Jagerman, Kai Hui, Ji Ma, Jing Lu, Jianmo Ni, Xuanhui Wang, Michael Bendersky

    Abstract: Recently, substantial progress has been made in text ranking based on pretrained language models such as BERT. However, there are limited studies on how to leverage more powerful sequence-to-sequence models such as T5. Existing attempts usually formulate text ranking as classification and rely on postprocessing to obtain a ranked list. In this paper, we propose RankT5 and study two T5-based rankin…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 13 pages

  38. arXiv:2210.05145

    cs.IR cs.CL

    Retrieval Augmentation for T5 Re-ranker using External Sources

    Authors: Kai Hui, Tao Chen, Zhen Qin, Honglei Zhuang, Fernando Diaz, Mike Bendersky, Don Metzler

    Abstract: Retrieval augmentation has shown promising improvements in different tasks. However, whether such augmentation can assist a large language model based re-ranker remains unclear. We investigate how to augment T5-based re-rankers using high-quality information retrieved from two external corpora -- a commercial web search engine and Wikipedia. We empirically demonstrate how retrieval augmentation ca…

    Submitted 11 October, 2022; originally announced October 2022.

  39. arXiv:2205.01230

    cs.LG cs.CL cs.IR

    Retrieval-Enhanced Machine Learning

    Authors: Hamed Zamani, Fernando Diaz, Mostafa Dehghani, Donald Metzler, Michael Bendersky

    Abstract: Although information access systems have long supported people in accomplishing a wide range of tasks, we propose broadening the scope of users of information access systems to include task-driven machines, such as machine learning models. In this way, the core principles of indexing, representation, retrieval, and ranking can be applied and extended to substantially improve model generalization,…

    Submitted 2 May, 2022; originally announced May 2022.

    Comments: To appear in proceedings of ACM SIGIR 2022

  40. Out-of-Domain Semantics to the Rescue! Zero-Shot Hybrid Retrieval Models

    Authors: Tao Chen, Mingyang Zhang, Jing Lu, Michael Bendersky, Marc Najork

    Abstract: The pre-trained language model (e.g., BERT) based deep retrieval models achieved superior performance over lexical retrieval models (e.g., BM25) in many passage retrieval tasks. However, limited work has been done to generalize a deep retrieval model to other tasks and domains. In this work, we carefully select five datasets, including two in-domain datasets and three out-of-domain datasets with diffe…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: Accepted at ECIR 2022 (full paper)

  41. arXiv:2112.09727

    cs.LG cs.AI cs.IR

    Rank4Class: A Ranking Formulation for Multiclass Classification

    Authors: Nan Wang, Zhen Qin, Le Yan, Honglei Zhuang, Xuanhui Wang, Michael Bendersky, Marc Najork

    Abstract: Multiclass classification (MCC) is a fundamental machine learning problem of classifying each instance into one of a predefined set of classes. In the deep learning era, extensive efforts have been spent on developing more powerful neural embedding models to better represent the instance for improving MCC performance. In this paper, we do not aim to propose new neural models for instance represent…

    Submitted 21 December, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

  42. arXiv:2109.15285

    cs.IR

    Improving Neural Ranking via Lossless Knowledge Distillation

    Authors: Zhen Qin, Le Yan, Yi Tay, Honglei Zhuang, Xuanhui Wang, Michael Bendersky, Marc Najork

    Abstract: We explore a novel perspective of knowledge distillation (KD) for learning to rank (LTR), and introduce Self-Distilled neural Rankers (SDR), where student rankers are parameterized identically to their teachers. Unlike the existing ranking distillation work which pursues a good trade-off between performance and efficiency, SDR is able to significantly improve ranking performance of students over t…

    Submitted 6 April, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

    Comments: 15 pages

  43. Dynamic Language Models for Continuously Evolving Content

    Authors: Spurthi Amba Hombaiah, Tao Chen, Mingyang Zhang, Michael Bendersky, Marc Najork

    Abstract: The content on the web is in a constant state of flux. New entities, issues, and ideas continuously emerge, while the semantics of the existing conversation topics gradually shift. In recent years, pre-trained language models like BERT greatly improved the state-of-the-art for a large spectrum of content understanding tasks. Therefore, in this paper, we aim to study how these language models can b…

    Submitted 11 June, 2021; originally announced June 2021.

    Journal ref: KDD 2021

  44. arXiv:2104.08405  [pdf, other]

    cs.CL cs.CV cs.IR

    LAMPRET: Layout-Aware Multimodal PreTraining for Document Understanding

    Authors: Te-Lin Wu, Cheng Li, Mingyang Zhang, Tao Chen, Spurthi Amba Hombaiah, Michael Bendersky

    Abstract: Document layout comprises both structural and visual (e.g., font sizes) information that is vital but often ignored by machine learning models. The few existing models which do use layout information only consider textual contents, and overlook the existence of contents in other modalities such as images. Additionally, spatial interactions of presented contents in a layout were never really fully ex…

    Submitted 16 April, 2021; originally announced April 2021.

  45. Natural Language Understanding with Privacy-Preserving BERT

    Authors: Chen Qu, Weize Kong, Liu Yang, Mingyang Zhang, Michael Bendersky, Marc Najork

    Abstract: Privacy preservation remains a key challenge in data mining and Natural Language Understanding (NLU). Previous research shows that the input text or even text embeddings can leak private information. This concern motivates our research on effective privacy preservation approaches for pretrained Language Models (LMs). We investigate the privacy and utility implications of applying dx-privacy, a var…

    Submitted 19 August, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted to CIKM 2021

  46. arXiv:2103.01913  [pdf, other]

    cs.CV cs.CL cs.IR

    WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning

    Authors: Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, Marc Najork

    Abstract: The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and Vision tasks. Multimodal modeling techniques aim to leverage large high-quality visio-linguistic datasets for learning complementary information (across image and text modalities). In this paper, we introduce the Wikipedia-based Imag…

    Submitted 3 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

  47. DiPair: Fast and Accurate Distillation for Trillion-Scale Text Matching and Pair Modeling

    Authors: Jiecao Chen, Liu Yang, Karthik Raman, Michael Bendersky, Jung-Jung Yeh, Yun Zhou, Marc Najork, Danyang Cai, Ehsan Emadzadeh

    Abstract: Pre-trained models like BERT (Devlin et al., 2018) have dominated NLP / IR applications such as single sentence classification, text pair classification, and question answering. However, deploying these models in real systems is highly non-trivial due to their exorbitant computational costs. A common remedy to this is knowledge distillation (Hinton et al., 2015), leading to faster inference. Howev…

    Submitted 6 October, 2020; originally announced October 2020.

    Comments: 13 pages. Accepted to Findings of EMNLP 2020

  48. arXiv:2010.01195  [pdf, other]

    cs.IR

    Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach

    Authors: Saar Kuzi, Mingyang Zhang, Cheng Li, Michael Bendersky, Marc Najork

    Abstract: Search engines often follow a two-phase paradigm where in the first stage (the retrieval stage) an initial set of documents is retrieved and in the second stage (the re-ranking stage) the documents are re-ranked to obtain the final result list. While deep neural networks were shown to improve the performance of the re-ranking stage in previous works, there is little literature about using deep neu…

    Submitted 2 October, 2020; originally announced October 2020.

  49. arXiv:2010.00200  [pdf, other]

    cs.IR cs.CL

    RRF102: Meeting the TREC-COVID Challenge with a 100+ Runs Ensemble

    Authors: Michael Bendersky, Honglei Zhuang, Ji Ma, Shuguang Han, Keith Hall, Ryan McDonald

    Abstract: In this paper, we report the results of our participation in the TREC-COVID challenge. To meet the challenge of building a search engine for a rapidly evolving biomedical collection, we propose a simple yet effective weighted hierarchical rank fusion approach that ensembles together 102 runs from (a) lexical and semantic retrieval systems, (b) pre-trained and fine-tuned BERT rankers, and (c) releva…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: 14 pages

  50. Receptivity of an AI Cognitive Assistant by the Radiology Community: A Report on Data Collected at RSNA

    Authors: Karina Kanjaria, Anup Pillai, Chaitanya Shivade, Marina Bendersky, Ashutosh Jadhav, Vandana Mukherjee, Tanveer Syeda-Mahmood

    Abstract: Due to advances in machine learning and artificial intelligence (AI), a new role is emerging for machines as intelligent assistants to radiologists in their clinical workflows. But what systematic clinical thought processes are these machines using? Are they similar enough to those of radiologists to be trusted as assistants? A live demonstration of such a technology was conducted at the 2016 Scie…

    Submitted 13 September, 2020; originally announced September 2020.

    Journal ref: Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5: HEALTHINF, ISBN 978-989-758-398-8, pages 178-186. 2020