subscribe to arXiv mailings

Canonical Correlation Analysis for Analyzing Sequences of Medical Billing Codes

Authors: Corinne L. Jones, Sham M. Kakade, Lucas W. Thornblade, David R. Flum, Abraham D. Flaxman

Abstract: We propose using canonical correlation analysis (CCA) to generate features from sequences of medical billing codes. Applying this novel use of CCA to a database of medical billing codes for patients with diverticulitis, we first demonstrate that the CCA embeddings capture meaningful relationships among the codes. We then generate features from these embeddings and establish their usefulness in pre… ▽ More We propose using canonical correlation analysis (CCA) to generate features from sequences of medical billing codes. Applying this novel use of CCA to a database of medical billing codes for patients with diverticulitis, we first demonstrate that the CCA embeddings capture meaningful relationships among the codes. We then generate features from these embeddings and establish their usefulness in predicting future elective surgery for diverticulitis, an important marker in efforts for reducing costs in healthcare. △ Less

Submitted 6 January, 2017; v1 submitted 1 December, 2016; originally announced December 2016.

Comments: Accepted at NIPS 2016 Workshop on Machine Learning for Health

arXiv:1504.04317 [pdf, ps, other]

doi 10.1145/2746266.2746277

Towards a relation extraction framework for cyber-security concepts

Authors: Corinne L. Jones, Robert A. Bridges, Kelly Huffer, John Goodall

Abstract: In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised Natural Language Processing and implement a bootstrapping algorithm for extracting security en… ▽ More In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised Natural Language Processing and implement a bootstrapping algorithm for extracting security entities and their relationships from text. The algorithm requires little input data, specifically, a few relations or patterns (heuristics for identifying relations), and incorporates an active learning component which queries the user on the most important decisions to prevent drifting from the desired relations. Preliminary testing on a small corpus shows promising results, obtaining precision of .82. △ Less

Submitted 16 April, 2015; originally announced April 2015.

Comments: 4 pages in Cyber & Information Security Research Conference 2015, ACM

ACM Class: H.3.3

arXiv:1308.4941 [pdf, other]

Automatic Labeling for Entity Extraction in Cyber Security

Authors: Robert A. Bridges, Corinne L. Jones, Michael D. Iannacone, Kelly M. Testa, John R. Goodall

Abstract: Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications, such as detecting security related entities; moreover, manual annotation of corpora is very costly and often no… ▽ More Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications, such as detecting security related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a very precise method to automatically label text from several data sources by leveraging related, domain-specific, structured data and provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the average perceptron on a portion of our corpus ($\sim$750,000 words) and achieve near perfect precision, recall, and accuracy, with training times under 17 seconds. △ Less

Submitted 9 June, 2014; v1 submitted 22 August, 2013; originally announced August 2013.

Comments: 10 pages

arXiv:cs/0203015 [pdf]

Towards Experimental Nanosound Using Almost Disjoint Set Theory

Authors: Cameron L Jones

Abstract: Music composition using digital audio sequence editors is increasingly performed in a visual workspace where sound complexes are built from discrete sound objects, called gestures that are arranged in time and space to generate a continuous composition. The visual workspace, common to most industry standard audio loop sequencing software, is premised on the arrangement of gestures defined with g… ▽ More Music composition using digital audio sequence editors is increasingly performed in a visual workspace where sound complexes are built from discrete sound objects, called gestures that are arranged in time and space to generate a continuous composition. The visual workspace, common to most industry standard audio loop sequencing software, is premised on the arrangement of gestures defined with geometric shape properties. Here, one aspect of fractal set theory was validated using audio-frequency sets to evaluate self-affine scaling behavior when new sound complexes are built through union and intersection operations on discrete musical gestures. Results showed that intersection of two sets revealed lower complexity compared with the union operator, meaning that the intersection of two sound gestures is an almost disjoint set, and in accord with formal logic. These results are also discussed with reference to fuzzy sets, cellular automata, nanotechnology and self-organization to further explore the link between sequenced notation and complexity. △ Less

Submitted 12 March, 2002; originally announced March 2002.

Comments: 20 pages, 4 figures, 1 table

ACM Class: H.5.5

Showing 1–4 of 4 results for author: Jones, C L