arXiv:2401.07284 (cs)

[Submitted on 14 Jan 2024 (v1), last revised 18 Jan 2024 (this version, v2)]

Title: Improving Domain Adaptation through Extended-Text Reading Comprehension

Authors: Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang

Abstract: To enhance the domain-specific capabilities of large language models, continued pre-training on a domain-specific corpus is a prevalent method. Recent work demonstrates that adapting models using reading comprehension data formatted by regex-based patterns can significantly improve performance on domain-specific tasks. However, regex-based patterns are incapable of parsing raw corpora using domain-specific knowledge. Furthermore, question-answer pairs extracted directly from the corpus in predefined formats offer limited context. To address these limitations, we improve reading comprehension via an LLM and clustering. The LLM leverages domain knowledge within the corpus to refine the comprehension stage, while clustering supplies relevant knowledge by extending the context to enrich the reading stage. Additionally, our method incorporates parameter-efficient fine-tuning to improve the efficiency of domain adaptation. In comparison to AdaptLLM, our method achieves an improvement exceeding 5% in domain-specific tasks. Our code will be available at this https URL.

Comments:	Work in Progress
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2401.07284 [cs.CL]
	(or arXiv:2401.07284v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2401.07284

Submission history

From: Ting Jiang
[v1] Sun, 14 Jan 2024 13:11:31 UTC (734 KB)
[v2] Thu, 18 Jan 2024 11:29:37 UTC (734 KB)
