-
MedCalc-Bench: Evaluating Large Language Models for Medical Calculations
Authors:
Nikhil Khandekar,
Qiao Jin,
Guangzhi Xiong,
Soren Dunn,
Serina S Applebaum,
Zain Anwar,
Maame Sarfo-Gyamfi,
Conrad W Safranek,
Abid A Anwar,
Andrew Zhang,
Aidan Gilson,
Maxwell B Singer,
Amisha Dave,
Andrew Taylor,
Aidong Zhang,
Qingyu Chen,
Zhiyong Lu
Abstract:
As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative equations and rule-based reasoning paradigms for evidence-based decision support. To this end, we propose MedCalc-Bench, a first-of-its-kind dataset focused on evaluating the medical calculation capability of LLMs. MedCalc-Bench contains an evaluation set of over 1000 manually reviewed instances from 55 different medical calculation tasks. Each instance in MedCalc-Bench consists of a patient note, a question requesting to compute a specific medical value, a ground truth answer, and a step-by-step explanation showing how the answer is obtained. While our evaluation results show the potential of LLMs in this area, none of them are effective enough for clinical settings. Common issues include extracting the incorrect entities, not using the correct equation or rules for a calculation task, or incorrectly performing the arithmetic for the computation. We hope our study highlights the quantitative knowledge and reasoning gaps in LLMs within medical settings, encouraging future improvements of LLMs for various clinical calculation tasks.
Submitted 30 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
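
The instance layout described in the MedCalc-Bench abstract (patient note, question, ground truth answer, step-by-step explanation) can be pictured with a small record type. This is only a sketch: the class name `CalcInstance`, its field names, and the helper `bmi` are hypothetical and are not taken from the dataset, and body mass index is used purely as a familiar formula, not as a claim about which 55 tasks the benchmark covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CalcInstance:
    """Hypothetical container mirroring the four parts named in the abstract."""
    patient_note: str    # free-text clinical note
    question: str        # asks for one specific medical value
    ground_truth: float  # reference answer
    explanation: str     # step-by-step derivation of the answer


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by height squared."""
    return weight_kg / height_m ** 2


# Illustrative instance; the note and values are invented for this sketch.
example = CalcInstance(
    patient_note="Adult patient, weight 70 kg, height 1.75 m.",
    question="What is the patient's body mass index (BMI) in kg/m^2?",
    ground_truth=round(bmi(70.0, 1.75), 2),
    explanation="BMI = weight / height^2 = 70 / (1.75 * 1.75) = 22.86 kg/m^2.",
)
```

The three failure modes the abstract lists map onto this record: reading the wrong weight or height from `patient_note`, choosing the wrong formula, or making an arithmetic slip when evaluating it.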
-
The Canadian VirusSeq Data Portal & Duotang: open resources for SARS-CoV-2 viral sequences and genomic epidemiology
Authors:
Erin E. Gill,
Baofeng Jia,
Carmen Lia Murall,
Raphaël Poujol,
Muhammad Zohaib Anwar,
Nithu Sara John,
Justin Richardsson,
Ashley Hobb,
Abayomi S. Olabode,
Alexandru Lepsa,
Ana T. Duggan,
Andrea D. Tyler,
Arnaud N'Guessan,
Atul Kachru,
Brandon Chan,
Catherine Yoshida,
Christina K. Yung,
David Bujold,
Dusan Andric,
Edmund Su,
Emma J. Griffiths,
Gary Van Domselaar,
Gordon W. Jolly,
Heather K. E. Ward,
Henrich Feher
, et al. (45 additional authors not shown)
Abstract:
The COVID-19 pandemic led to a large global effort to sequence SARS-CoV-2 genomes from patient samples to track viral evolution and inform public health response. Millions of SARS-CoV-2 genome sequences have been deposited in global public repositories. The Canadian COVID-19 Genomics Network (CanCOGeN - VirusSeq), a consortium tasked with coordinating expanded sequencing of SARS-CoV-2 genomes across Canada early in the pandemic, created the Canadian VirusSeq Data Portal, with associated data pipelines and procedures, to support these efforts. The goal of VirusSeq was to allow open access to Canadian SARS-CoV-2 genomic sequences and enhanced, standardized contextual data that were unavailable in other repositories and that meet FAIR standards (Findable, Accessible, Interoperable and Reusable). The Portal data submission pipeline contains data quality checking procedures and appropriate acknowledgement of data generators that encourages collaboration. Here we also highlight Duotang, a web platform that presents genomic epidemiology and modeling analyses on circulating and emerging SARS-CoV-2 variants in Canada. Duotang presents dynamic changes in variant composition of SARS-CoV-2 in Canada and by province, estimates variant growth, and displays complementary interactive visualizations, with a text overview of the current situation. The VirusSeq Data Portal and Duotang resources, alongside additional analyses and resources computed from the Portal (COVID-MVP, CoVizu), are all open-source and freely available. Together, they provide an updated picture of SARS-CoV-2 evolution to spur scientific discussions, inform public discourse, and support communication with and within public health authorities. They also serve as a framework for other jurisdictions interested in open, collaborative sequence data sharing and analyses.
Submitted 7 May, 2024;
originally announced May 2024.
-
TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation
Authors:
Thomas Monninger,
Vandana Dokkadi,
Md Zafar Anwar,
Steffen Staab
Abstract:
Autonomous driving requires an accurate representation of the environment. A strategy toward high accuracy is to fuse data from several sensors. Learned Bird's-Eye View (BEV) encoders can achieve this by mapping data from individual sensors into one joint latent space. For cost-efficient camera-only systems, this provides an effective mechanism to fuse data from multiple cameras with different views. Accuracy can further be improved by aggregating sensor information over time. This is especially important in monocular camera systems to account for the lack of explicit depth and velocity measurements. Thereby, the effectiveness of developed BEV encoders crucially depends on the operators used to aggregate temporal information and on the used latent representation spaces. We analyze BEV encoders proposed in the literature and compare their effectiveness, quantifying the effects of aggregation operators and latent representations. While most existing approaches aggregate temporal information either in image or in BEV latent space, our analyses and performance comparisons suggest that these latent representations exhibit complementary strengths. Therefore, we develop a novel temporal BEV encoder, TempBEV, which integrates aggregated temporal information from both latent spaces. We consider subsequent image frames as stereo through time and leverage methods from optical flow estimation for temporal stereo encoding. Empirical evaluation on the NuScenes dataset shows a significant improvement by TempBEV over the baseline for 3D object detection and BEV segmentation. The ablation uncovers a strong synergy of joint temporal aggregation in the image and BEV latent space. These results indicate the overall effectiveness of our approach and make a strong case for aggregating temporal information in both image and BEV latent spaces.
Submitted 17 April, 2024;
originally announced April 2024.
-
Privacy Limits in Power-Law Bipartite Networks under Active Fingerprinting Attacks
Authors:
M. Shariatnasab,
F. Shirani,
Z. Anwar
Abstract:
This work considers the fundamental privacy limits under active fingerprinting attacks in power-law bipartite networks. The scenario arises naturally in social network analysis, tracking user mobility in wireless networks, and forensics applications, among others. A stochastic growing network generation model -- called the popularity-based model -- is investigated, where the bipartite network is generated iteratively, and in each iteration vertices attract new edges based on their assigned popularity values. It is shown that, with an appropriate choice of initial popularity values, the node degree distribution follows a power-law distribution with arbitrary parameter $α>2$, i.e., the fraction of nodes with degree $d$ is proportional to $d^{-α}$. An active fingerprinting deanonymization attack strategy called the augmented information threshold attack strategy (A-ITS) is proposed, which uses the attacker's knowledge of the node degree distribution along with the concept of information values for deanonymization. Sufficient conditions for the success of the A-ITS, based on network parameters, are derived. It is shown through simulations that the proposed attack significantly outperforms the state-of-the-art attack strategies.
Submitted 11 February, 2022;
originally announced February 2022.
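
The power-law statement in the abstract above (fraction of nodes with degree $d$ proportional to $d^{-α}$) can be made concrete by normalizing the weights $d^{-α}$ over a finite range of degrees. The sketch below does only that; it is not the paper's popularity-based generation model, and the function name `power_law_pmf` and the truncation point `d_max` are assumptions introduced for illustration.

```python
def power_law_pmf(alpha: float, d_max: int) -> list[float]:
    """Return P(degree = d) for d = 1..d_max, with P(d) proportional to d**(-alpha).

    Truncating at d_max is an assumption of this sketch so that the
    normalizing sum is finite and exact to floating-point precision.
    """
    weights = [d ** -alpha for d in range(1, d_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Example: alpha = 2.5 satisfies the alpha > 2 condition from the abstract.
pmf = power_law_pmf(alpha=2.5, d_max=1000)

# Under a power law, the ratio P(1) / P(2) equals 2**alpha regardless of d_max.
ratio = pmf[0] / pmf[1]
```

The ratio check shows the defining property: halving the degree multiplies the probability by $2^{α}$, independent of the normalizing constant.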
-
Leveraging Social-Network Infrastructure to Improve Peer-to-Peer Overlay Performance: Results from Orkut
Authors:
Zahid Anwar,
William Yurcik,
Vivek Pandey,
Asim Shankar,
Indranil Gupta,
Roy H. Campbell
Abstract:
Application-level peer-to-peer (P2P) network overlays are an emerging paradigm that facilitates decentralization and flexibility in the scalable deployment of applications such as group communication, content delivery, and data sharing. However, the construction of an overlay graph topology optimized for low latency, low link and node stress, and lookup performance is still an open problem. We present a design of an overlay constructed on top of a social network and show that it gives a sizable improvement in lookups, average round-trip delay, and scalability compared with other overlay topologies. We build our overlay on top of the topology of a popular real-world social network, namely Orkut. We show Orkut's suitability for our purposes by evaluating the clustering behavior of its graph structure and the socializing pattern of its members.
Submitted 28 September, 2005;
originally announced September 2005.