-
Canonical Correlation Analysis for Analyzing Sequences of Medical Billing Codes
Authors:
Corinne L. Jones,
Sham M. Kakade,
Lucas W. Thornblade,
David R. Flum,
Abraham D. Flaxman
Abstract:
We propose using canonical correlation analysis (CCA) to generate features from sequences of medical billing codes. Applying this novel use of CCA to a database of medical billing codes for patients with diverticulitis, we first demonstrate that the CCA embeddings capture meaningful relationships among the codes. We then generate features from these embeddings and establish their usefulness in predicting future elective surgery for diverticulitis, an important marker in efforts to reduce healthcare costs.
Submitted 6 January, 2017; v1 submitted 1 December, 2016;
originally announced December 2016.
-
Towards a relation extraction framework for cyber-security concepts
Authors:
Corinne L. Jones,
Robert A. Bridges,
Kelly Huffer,
John Goodall
Abstract:
In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised Natural Language Processing and implement a bootstrapping algorithm for extracting security entities and their relationships from text. The algorithm requires little input data (specifically, a few relations or patterns, which are heuristics for identifying relations) and incorporates an active learning component that queries the user on the most important decisions to prevent drifting from the desired relations. Preliminary testing on a small corpus shows promising results, obtaining a precision of 0.82.
Submitted 16 April, 2015;
originally announced April 2015.
-
Automatic Labeling for Entity Extraction in Cyber Security
Authors:
Robert A. Bridges,
Corinne L. Jones,
Michael D. Iannacone,
Kelly M. Testa,
John R. Goodall
Abstract:
Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications, such as detecting security-related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a very precise method to automatically label text from several data sources by leveraging related, domain-specific, structured data and provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the average perceptron on a portion of our corpus ($\sim$750,000 words) and achieve near-perfect precision, recall, and accuracy, with training times under 17 seconds.
Submitted 9 June, 2014; v1 submitted 22 August, 2013;
originally announced August 2013.
-
Towards Experimental Nanosound Using Almost Disjoint Set Theory
Authors:
Cameron L Jones
Abstract:
Music composition using digital audio sequence editors is increasingly performed in a visual workspace where sound complexes are built from discrete sound objects, called gestures, that are arranged in time and space to generate a continuous composition. The visual workspace, common to most industry-standard audio loop sequencing software, is premised on the arrangement of gestures defined with geometric shape properties. Here, one aspect of fractal set theory was validated using audio-frequency sets to evaluate self-affine scaling behavior when new sound complexes are built through union and intersection operations on discrete musical gestures. Results showed that the intersection of two sets revealed lower complexity than the union, meaning that the intersection of two sound gestures is an almost disjoint set, in accord with formal logic. These results are also discussed with reference to fuzzy sets, cellular automata, nanotechnology and self-organization to further explore the link between sequenced notation and complexity.
Submitted 12 March, 2002;
originally announced March 2002.