Showing 1–47 of 47 results for author: Luo, A

  1. arXiv:2407.11333

    cs.RO cs.SD eess.AS

    Disentangled Acoustic Fields For Multimodal Physical Scene Understanding

    Authors: Jie Yin, Andrew Luo, Yilun Du, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan

    Abstract: We study the problem of multimodal physical scene understanding, where an embodied agent needs to find fallen objects by inferring object properties, direction, and distance of an impact sound source. Previous works adopt feed-forward neural networks to directly regress the variables from sound, leading to poor generalization and domain adaptation issues. In this paper, we illustrate that learning…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.08939

    cs.CV

    LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models

    Authors: Hai Jiang, Ao Luo, Xiaohong Liu, Songchen Han, Shuaicheng Liu

    Abstract: In this paper, we propose a diffusion-based unsupervised framework that incorporates physically explainable Retinex theory with diffusion models for low-light image enhancement, named LightenDiffusion. Specifically, we present a content-transfer decomposition network that performs Retinex decomposition within the latent space instead of image space as in previous approaches, enabling the encoded f…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  3. arXiv:2406.13735

    cs.CV cs.LG

    StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images

    Authors: Rushikesh Zawar, Shaurya Dewan, Andrew F. Luo, Margaret M. Henderson, Michael J. Tarr, Leila Wehbe

    Abstract: Understanding the semantics of visual scenes is a fundamental challenge in Computer Vision. A key aspect of this challenge is that objects sharing similar semantic meanings or functions can exhibit striking visual differences, making accurate identification and categorization difficult. Recent advancements in text-to-image frameworks have led to models that implicitly capture natural scene statist…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Dataset website: https://stablesemantics.github.io/StableSemantics

  4. arXiv:2406.10744

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  5. arXiv:2406.05191

    cs.CV

    DiffusionPID: Interpreting Diffusion via Partial Information Decomposition

    Authors: Shaurya Dewan, Rushikesh Zawar, Prakanshul Saxena, Yingshan Chang, Andrew Luo, Yonatan Bisk

    Abstract: Text-to-image diffusion models have made significant progress in generating naturalistic images from textual inputs, and demonstrate the capacity to learn and represent complex visual-semantic relationships. While these diffusion models have achieved remarkable success, the underlying mechanisms driving their performance are not yet fully accounted for, with many unanswered questions surrounding w…

    Submitted 12 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

  6. arXiv:2406.02659

    q-bio.NC cs.AI cs.CV

    Neural Representations of Dynamic Visual Stimuli

    Authors: Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr

    Abstract: Humans experience the world through constantly changing visual stimuli, where scenes can shift and move, change in appearance, and vary in distance. The dynamic nature of visual perception is a fundamental aspect of our daily lives, yet the large majority of research on object and scene processing, particularly using fMRI, has focused on static stimuli. While studies of static image perception are…

    Submitted 4 June, 2024; originally announced June 2024.

  7. arXiv:2405.19425

    cs.CL

    Adaptive In-conversation Team Building for Language Model Agents

    Authors: Linxin Song, Jiale Liu, Jieyu Zhang, Shaokun Zhang, Ao Luo, Shijian Wang, Qingyun Wu, Chi Wang

    Abstract: Leveraging multiple large language model (LLM) agents has been shown to be a promising approach for tackling complex tasks, while the effective design of multiple agents for a particular application remains an art. It is thus intriguing to answer a critical question: Given a task, how can we build a team of LLM agents to solve it effectively? Our new adaptive team-building paradigm offers a flexible so…

    Submitted 29 May, 2024; originally announced May 2024.

  8. arXiv:2405.10890

    astro-ph.IM astro-ph.GA cs.AI

    A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model

    Authors: Mingxiang Fu, Yu Song, Jiameng Lv, Liang Cao, Peng Jia, Nan Li, Xiangru Li, Jifeng Liu, A-Li Luo, Bo Qiu, Shiyin Shen, Liangping Tu, Lili Wang, Shoulin Wei, Haifeng Yang, Zhenping Yi, Zhiqiang Zou

    Abstract: The exponential growth of astronomical datasets provides an unprecedented opportunity for humans to gain insight into the Universe. However, effectively analyzing this vast amount of data poses a significant challenge. Astronomers are turning to deep learning techniques to address this, but the methods are limited by their specific training sets, leading to considerable duplicate workloads too. He…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 26 pages, 10 figures, to be published in Chinese Physics C

  9. arXiv:2404.08452

    cs.CV

    MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection

    Authors: Chenqi Kong, Anwei Luo, Peijun Bao, Yi Yu, Haoliang Li, Zengwei Zheng, Shiqi Wang, Alex C. Kot

    Abstract: Deepfakes have recently raised significant trust issues and security concerns among the public. Compared to CNN face forgery detectors, ViT-based methods take advantage of the expressivity of transformers, achieving superior detection performance. However, these approaches still exhibit the following limitations: (1) Fully fine-tuning ViT-based models from ImageNet weights demands substantial comp…

    Submitted 7 June, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  10. arXiv:2403.19164

    cs.CV

    RecDiffusion: Rectangling for Image Stitching with Diffusion Models

    Authors: Tianhao Zhou, Haipeng Li, Ziyi Wang, Ao Luo, Chen-Lin Zhang, Jiajun Li, Bing Zeng, Shuaicheng Liu

    Abstract: Image stitching from different captures often results in non-rectangular boundaries, which is often considered unappealing. To solve non-rectangular boundaries, current solutions involve cropping, which discards image content, inpainting, which can introduce unrelated content, or warping, which can distort non-linear features and introduce artifacts. To overcome these issues, we introduce a novel…

    Submitted 28 March, 2024; originally announced March 2024.

  11. arXiv:2401.09972

    cs.CL

    Better Explain Transformers by Illuminating Important Information

    Authors: Linxin Song, Yan Cui, Ao Luo, Freddy Lecue, Irene Li

    Abstract: Transformer-based models excel in various natural language processing (NLP) tasks, attracting countless efforts to explain their inner workings. Prior methods explain Transformers by focusing on the raw gradient and attention as token attribution scores, where non-relevant information is often considered during explanation computation, resulting in confusing results. In this work, we propose highl…

    Submitted 26 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  12. arXiv:2310.04420

    cs.LG q-bio.NC

    BrainSCUBA: Fine-Grained Natural Language Captions of Visual Cortex Selectivity

    Authors: Andrew F. Luo, Margaret M. Henderson, Michael J. Tarr, Leila Wehbe

    Abstract: Understanding the functional organization of higher visual cortex is a central focus in neuroscience. Past studies have primarily mapped the visual and semantic selectivity of neural populations using hand-selected stimuli, which may potentially bias results towards pre-existing hypotheses of visual cortex functionality. Moving beyond conventional approaches, we introduce a data-driven method that…

    Submitted 3 May, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Project page: https://www.cs.cmu.edu/~afluo/BrainSCUBA

  13. arXiv:2310.00234

    cs.CR cs.CV eess.IV

    Pixel-Inconsistency Modeling for Image Manipulation Localization

    Authors: Chenqi Kong, Anwei Luo, Shiqi Wang, Haoliang Li, Anderson Rocha, Alex C. Kot

    Abstract: Digital image forensics plays a crucial role in image authentication and manipulation localization. Despite the progress powered by deep neural networks, existing forgery localization methodologies exhibit limitations when deployed to unseen datasets and perturbed images (i.e., lack of generalization and robustness to real-world applications). To circumvent these problems and aid image integrity,…

    Submitted 29 September, 2023; originally announced October 2023.

  14. arXiv:2309.16217

    cs.CV

    GAFlow: Incorporating Gaussian Attention into Optical Flow

    Authors: Ao Luo, Fan Yang, Xin Li, Lang Nie, Chunyu Lin, Haoqiang Fan, Shuaicheng Liu

    Abstract: Optical flow, or the estimation of motion fields from image sequences, is one of the fundamental problems in computer vision. Unlike most pixel-wise tasks that aim at achieving consistent representations of the same category, optical flow raises extra demands for obtaining local discrimination and smoothness, which have not yet been fully explored by existing approaches. In this paper, we push Gaussian A…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: To appear in ICCV-2023

  15. arXiv:2309.11092

    cs.CV cs.MM

    Forgery-aware Adaptive Vision Transformer for Face Forgery Detection

    Authors: Anwei Luo, Rizhao Cai, Chenqi Kong, Xiangui Kang, Jiwu Huang, Alex C. Kot

    Abstract: With the advancement in face manipulation technologies, the importance of face forgery detection in protecting authentication integrity becomes increasingly evident. Previous Vision Transformer (ViT)-based detectors have demonstrated subpar performance in cross-database evaluations, primarily because fully fine-tuning with limited Deepfake data often leads to forgetting pre-trained knowledge and o…

    Submitted 20 September, 2023; originally announced September 2023.

  16. arXiv:2309.05968

    cs.LG cs.NE physics.bio-ph

    Neural Network Layer Matrix Decomposition reveals Latent Manifold Encoding and Memory Capacity

    Authors: Ng Shyh-Chang, A-Li Luo, Bo Qiu

    Abstract: We prove the converse of the universal approximation theorem, i.e. a neural network (NN) encoding theorem which shows that for every stably converged NN of continuous activation functions, its weight matrix actually encodes a continuous function that approximates its training dataset to within a finite margin of error over a bounded domain. We further show that using the Eckart-Young theorem for t…

    Submitted 12 September, 2023; originally announced September 2023.

  17. arXiv:2308.12535

    cs.CV eess.IV

    SCP: Spherical-Coordinate-based Learned Point Cloud Compression

    Authors: Ao Luo, Linxin Song, Keisuke Nonaka, Kyohei Unno, Heming Sun, Masayuki Goto, Jiro Katto

    Abstract: In recent years, the task of learned point cloud compression has gained prominence. An important type of point cloud, the spinning LiDAR point cloud, is generated by spinning LiDAR on vehicles. This process results in numerous circular shapes and azimuthal angle invariance features within the point clouds. However, these two features have been largely overlooked by previous methodologies. In this…

    Submitted 8 February, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

  18. arXiv:2306.03089

    cs.CV

    Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models

    Authors: Andrew F. Luo, Margaret M. Henderson, Leila Wehbe, Michael J. Tarr

    Abstract: A long-standing goal in neuroscience has been to elucidate the functional organization of the brain. Within higher visual cortex, functional accounts have remained relatively coarse, focusing on regions of interest (ROIs) and taking the form of selectivity for broad categories such as faces, places, bodies, food, or words. Because the identification of such ROIs has typically relied on manually as…

    Submitted 28 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (Oral). Project page: https://www.cs.cmu.edu/~afluo/BrainDiVE/

  19. arXiv:2306.00306

    cs.CV

    Low-Light Image Enhancement with Wavelet-based Diffusion Models

    Authors: Hai Jiang, Ao Luo, Songchen Han, Haoqiang Fan, Shuaicheng Liu

    Abstract: Diffusion models have achieved promising results in image restoration tasks, yet suffer from long inference times, excessive computational resource consumption, and unstable restoration. To address these issues, we propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL. Specifically, we present a wavelet-based conditional diffusion model (WCDM) that leverages…

    Submitted 25 September, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

    Comments: Accepted by SIGGRAPH Asia 2023 (ACM Transactions on Graphics)

  20. arXiv:2305.10217

    astro-ph.IM cs.CV

    Deep Learning Applications Based on WISE Infrared Data: Classification of Stars, Galaxies and Quasars

    Authors: Guiyu Zhao, Bo Qiu, A-Li Luo, Xiaoyu Guo, Lin Yao, Kun Wang, Yuanbo Liu

    Abstract: The Wide-field Infrared Survey Explorer (WISE) has detected hundreds of millions of sources over the entire sky. However, classifying them reliably is a great challenge due to degeneracies in WISE multicolor space and low detection levels in its two longest-wavelength bandpasses. In this paper, the deep learning classification network, IICnet (Infrared Image Classification network), is designed to…

    Submitted 17 May, 2023; originally announced May 2023.

  21. arXiv:2304.12489

    cs.CV cs.CR

    Beyond the Prior Forgery Knowledge: Mining Critical Clues for General Face Forgery Detection

    Authors: Anwei Luo, Chenqi Kong, Jiwu Huang, Yongjian Hu, Xiangui Kang, Alex C. Kot

    Abstract: Face forgery detection is essential in combating malicious digital face attacks. Previous methods mainly rely on prior expert knowledge to capture specific forgery clues, such as noise patterns, blending boundaries, and frequency artifacts. However, these methods tend to get trapped in local optima, resulting in limited robustness and generalization capability. To address these issues, we propose…

    Submitted 24 April, 2023; originally announced April 2023.

  22. arXiv:2303.11011

    cs.CV

    Learning Optical Flow from Event Camera with Rendered Dataset

    Authors: Xinglong Luo, Kunming Luo, Ao Luo, Zhengning Wang, Ping Tan, Shuaicheng Liu

    Abstract: We study the problem of estimating optical flow from event cameras. One important issue is how to build a high-quality event-flow dataset with accurate event values and flow labels. Previous datasets are created by either capturing real scenes by event cameras or synthesizing from images with pasted foreground objects. The former case can produce real event values but with calculated flow labels,…

    Submitted 20 March, 2023; originally announced March 2023.

  23. arXiv:2303.09603

    math.CO cs.SC

    Rigorous Analytic Combinatorics in Several Variables in SageMath

    Authors: Benjamin Hackl, Andrew Luo, Stephen Melczer, Jesse Selover, Elaine Wong

    Abstract: We introduce the new sage_acsv package for the SageMath computer algebra system, allowing users to rigorously compute asymptotics for a large variety of multivariate sequences with rational generating functions. Using Sage's support for exact computations over the algebraic number field, this package provides the first rigorous implementation of algorithms from the theory of analytic combinatorics…

    Submitted 31 August, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: 8 pages; Package: https://pypi.org/project/sage-acsv/

    Journal ref: Séminaire Lotharingien de Combinatoire 89B (2023): Proceedings of the 35th FPSAC Conference, Article #90, 12 pp.

  24. arXiv:2212.10081

    astro-ph.IM astro-ph.GA cs.LG

    Galaxy Image Classification using Hierarchical Data Learning with Weighted Sampling and Label Smoothing

    Authors: Xiaohua Ma, Xiangru Li, Ali Luo, Jinqu Zhang, Hui Li

    Abstract: With the development of a series of galaxy sky surveys in recent years, the number of observations has increased rapidly, making research on machine learning methods for galaxy image recognition a hot topic. Existing automatic galaxy image recognition research is plagued by the large differences in similarity between categories, the imbalance of data between different classes, and the discrepancy b…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: accepted by MNRAS

    Journal ref: Monthly Notices of the Royal Astronomical Society, 2023, 519(3): 4765-4779

  25. arXiv:2210.03137

    cs.LG math.OC

    Deep Inventory Management

    Authors: Dhruv Madeka, Kari Torkkola, Carson Eisenach, Anna Luo, Dean P. Foster, Sham M. Kakade

    Abstract: This work provides a Deep Reinforcement Learning approach to solving a periodic review inventory control system with stochastic vendor lead times, lost sales, correlated demand, and price matching. While this dynamic program has historically been considered intractable, our results show that several policy learning approaches are competitive with or outperform classical methods. In order to train…

    Submitted 28 November, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

  26. arXiv:2207.11075

    cs.CV

    RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos

    Authors: Yunhui Han, Kunming Luo, Ao Luo, Jiangyu Liu, Haoqiang Fan, Guiming Luo, Shuaicheng Liu

    Abstract: Obtaining ground-truth labels from a video is challenging since the manual annotation of pixel-wise flow labels is prohibitively expensive and laborious. Besides, existing approaches try to adapt models trained on synthetic datasets to authentic videos, which inevitably suffers from domain discrepancy and hinders the performance for real-world applications. To solve these problems, we propo…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: ECCV 2022 Oral

  27. arXiv:2206.00621

    cs.CL cs.AI cs.CV cs.LG

    Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

    Authors: Yan Zeng, Wangchunshu Zhou, Ao Luo, Ziming Cheng, Xinsong Zhang

    Abstract: In this paper, we introduce Cross-View Language Modeling, a simple and effective pre-training framework that unifies cross-lingual and cross-modal pre-training with shared architectures and objectives. Our approach is motivated by a key observation that cross-lingual and cross-modal pre-training share the same goal of aligning two different views of the same object into a common semantic space. To…

    Submitted 12 June, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: ACL 2023

  28. arXiv:2204.00628

    cs.SD cs.CV cs.LG cs.RO eess.AS

    Learning Neural Acoustic Fields

    Authors: Andrew Luo, Yilun Du, Michael J. Tarr, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

    Abstract: Our environment is filled with rich and dynamic acoustic information. When we walk into a cathedral, the reverberations as much as appearance inform us of the sanctuary's wide open space. Similarly, as an object moves around us, we expect the sound emitted to also exhibit this movement. While recent advances in learned implicit functions have led to increasingly higher quality representations of t…

    Submitted 14 January, 2023; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: NeurIPS 2022. Project page: https://www.andrew.cmu.edu/user/afluo/Neural_Acoustic_Fields/

  29. arXiv:2202.06877

    cs.CR

    A Review of zk-SNARKs

    Authors: Thomas Chen, Hui Lu, Teeramet Kunpittaya, Alan Luo

    Abstract: A zk-SNARK is a protocol that lets one party, the prover, prove to another party, the verifier, that a statement about some privately-held information is true without revealing the information itself. This paper describes technical foundations, current applications, and some novel applications of zk-SNARKs. Regarding technical foundations, we go over the Quadratic Arithmetic Program reduction and…

    Submitted 25 October, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

  30. arXiv:2202.03857

    cs.CV

    Learning Optical Flow with Adaptive Graph Reasoning

    Authors: Ao Luo, Fan Yang, Kunming Luo, Xin Li, Haoqiang Fan, Shuaicheng Liu

    Abstract: Estimating per-pixel motion between video frames, known as optical flow, is a long-standing problem in video understanding and analysis. Most contemporary optical flow techniques largely focus on addressing the cross-image matching with feature similarity, with few methods considering how to explicitly reason over the given scene for achieving a holistic motion understanding. In this work, taking…

    Submitted 8 February, 2022; originally announced February 2022.

    Comments: To appear in AAAI-22

  31. arXiv:2201.00112

    cs.CV

    SurfGen: Adversarial 3D Shape Synthesis with Explicit Surface Discriminators

    Authors: Andrew Luo, Tianqin Li, Wen-Hao Zhang, Tai Sing Lee

    Abstract: Recent advances in deep generative models have led to immense progress in 3D shape synthesis. While existing models are able to synthesize shapes represented as voxels, point-clouds, or implicit functions, these methods only indirectly enforce the plausibility of the final 3D shape surface. Here we present a 3D shape synthesis framework (SurfGen) that directly applies adversarial training to the o…

    Submitted 31 December, 2021; originally announced January 2022.

    Comments: ICCV 2021. Project page: https://github.com/aluo-x/NeuralRaycaster

  32. arXiv:2104.03560

    cs.CV

    ASFlow: Unsupervised Optical Flow Learning with Adaptive Pyramid Sampling

    Authors: Kunming Luo, Ao Luo, Chuan Wang, Haoqiang Fan, Shuaicheng Liu

    Abstract: We present an unsupervised optical flow estimation method by proposing an adaptive pyramid sampling in the deep pyramid network. Specifically, in the pyramid downsampling, we propose a Content Aware Pooling (CAP) module, which promotes local feature gathering by avoiding cross-region pooling, so that the learned features become more representative. In the pyramid upsampling, we propose an Adaptiv…

    Submitted 8 April, 2021; originally announced April 2021.

  33. Hybrid Power-Law Models of Network Traffic

    Authors: Pat Devlin, Jeremy Kepner, Ashley Luo, Erin Meger

    Abstract: The availability of large scale streaming network data has reinforced the ubiquity of power-law distributions in observations and enabled precision measurements of the distribution parameters. The increased accuracy of these measurements allows new underlying generative network models to be explored. The preferential attachment model is a natural starting point for these models. This work adds add…

    Submitted 29 March, 2021; originally announced March 2021.

    Comments: 8 pages, 4 figures. arXiv admin note: text overlap with arXiv:1904.04396

  34. arXiv:2101.09465

    cs.CL

    WebSRC: A Dataset for Web-Based Structural Reading Comprehension

    Authors: Xingyu Chen, Zihan Zhao, Lu Chen, Danyang Zhang, Jiabao Ji, Ao Luo, Yuxuan Xiong, Kai Yu

    Abstract: Web search is an essential way for humans to obtain information, but it's still a great challenge for machines to understand the contents of web pages. In this paper, we introduce the task of structural reading comprehension (SRC) on web. Given a web page and a question about it, the task is to find the answer from the web page. This task requires a system not only to understand the semantics of t…

    Submitted 8 November, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

    Comments: EMNLP 2021

  35. arXiv:2011.14535

    cs.HC cs.GR

    Beyond LunAR: An augmented reality UI for deep-space exploration missions

    Authors: Sarah Radway, Anthony Luo, Carmine Elvezio, Jenny Cha, Sophia Kolak, Elijah Zulu, Sad Adib

    Abstract: As space exploration efforts shift to deep space missions, new challenges emerge regarding astronaut communication and task completion. While the round trip propagation delay for lunar communications is 2.6 seconds, the time delay increases to nearly 22 minutes for Mars missions. This creates a need for astronaut independence from earth-based assistance, and places greater significance upon the li…

    Submitted 29 November, 2020; originally announced November 2020.

    Comments: 9 pages, 5 figures

    ACM Class: H.5; I.3

  36. arXiv:2008.03087

    cs.CV

    Cascade Graph Neural Networks for RGB-D Salient Object Detection

    Authors: Ao Luo, Xin Li, Fan Yang, Zhicheng Jiao, Hong Cheng, Siwei Lyu

    Abstract: In this paper, we study the problem of salient object detection (SOD) for RGB-D images using both color and depth information. A major technical challenge in performing salient object detection from RGB-D images is how to fully leverage the two complementary data sources. Current works either simply distill prior knowledge from the corresponding depth map for handling the RGB-image or blindly fuse c…

    Submitted 7 August, 2020; originally announced August 2020.

  37. arXiv:2007.11744

    cs.CV

    End-to-End Optimization of Scene Layout

    Authors: Andrew Luo, Zhoutong Zhang, Jiajun Wu, Joshua B. Tenenbaum

    Abstract: We propose an end-to-end variational generative model for scene layout synthesis conditioned on scene graphs. Unlike unconditional scene layout generation, we use scene graphs as an abstract but general representation to guide the synthesis of diverse scene layouts that satisfy relationships included in the scene graph. This gives rise to more flexible control over the synthesis process, allowing…

    Submitted 22 July, 2020; originally announced July 2020.

    Comments: CVPR 2020 (Oral). Project page: http://3dsln.csail.mit.edu/

  38. arXiv:2007.11257

    cs.CV cs.LG eess.IV

    Deep-VFX: Deep Action Recognition Driven VFX for Short Video

    Authors: Ao Luo, Ning Xie, Zhijia Tao, Feng Jiang

    Abstract: Human motion is a key means of communicating information. Short-form mobile video, such as TikTok, has become popular all over the world, and users would like to add more VFX to express creativity and personality. Many special effects are offered on short-video platforms, giving users more ways to show off their personality. The common and traditional way is to…

    Submitted 22 July, 2020; originally announced July 2020.

  39. arXiv:2007.10504

    cs.AI cs.LG stat.ML

    Battlesnake Challenge: A Multi-agent Reinforcement Learning Playground with Human-in-the-loop

    Authors: Jonathan Chung, Anna Luo, Xavier Raffin, Scott Perry

    Abstract: We present the Battlesnake Challenge, a framework for multi-agent reinforcement learning with Human-In-the-Loop Learning (HILL). It is developed upon Battlesnake, a multiplayer extension of the traditional Snake game in which two or more snakes compete to be the last one surviving. The Battlesnake Challenge consists of an offline module for model training and an online module for live competitions. We dev…

    Submitted 20 July, 2020; originally announced July 2020.

  40. arXiv:2007.05186

    cs.IR

    GLOW: Global Weighted Self-Attention Network for Web Search

    Authors: Xuan Shan, Chuanjie Liu, Yiqian Xia, Qi Chen, Yusi Zhang, Kaize Ding, Yaobo Liang, Angen Luo, Yuxiang Luo

    Abstract: Deep matching models aim to help search engines retrieve more relevant documents by mapping queries and documents into semantic vectors in the first-stage retrieval. When leveraging BERT as the deep matching model, the attention score between two words is built solely upon local contextualized word embeddings. It lacks prior global knowledge to distinguish the importance of different words…

    Submitted 23 May, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

    Comments: 8 pages, 2 figures

  41. arXiv:2007.01510

    cs.IR cs.CL cs.LG

    MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks

    Authors: Yusi Zhang, Chuanjie Liu, Angen Luo, Hui Xue, Xuan Shan, Yuxiang Luo, Yiqian Xia, Yuanchi Yan, Haidong Wang

    Abstract: We study the problem of deep recall models in industrial web search: given a user query, retrieve hundreds of the most relevant documents from billions of candidates. The common framework is to train two encoding models based on neural embedding which learn the distributed representations of queries and documents separately and match them in the latent semantic space. However, all the exitin…

    Submitted 3 July, 2020; originally announced July 2020.

  42. arXiv:2006.02416

    cs.CY

    Assessing Holistic Impacts of Major Events on the Bitcoin Blockchain Network

    Authors: Anthony Luo, Dianxiang Xu

    Abstract: As the pioneer of blockchain technology, Bitcoin is the most popular cryptocurrency to date. Given its dramatic price spikes (and crashes) along with the never-ending news from SEC regulations to security breaches, there seems to be a lack of understanding about the dynamics of cryptocurrencies. These dynamics are believed to be affected by various political, security, financial, and regulatory ev…

    Submitted 3 June, 2020; originally announced June 2020.

    Comments: 23 pages, 20 figures, 10 tables

    ACM Class: K.4.4

  43. arXiv:2004.14584

    cs.LG cs.CV stat.ML

    Out-of-the-box channel pruned networks

    Authors: Ragav Venkatesan, Gurumurthy Swaminathan, Xiong Zhou, Anna Luo

    Abstract: In the last decade, convolutional neural networks have become gargantuan. Pre-trained models, when used as initializers, are able to fine-tune ever larger networks on small datasets. Consequently, not all the convolutional features that these fine-tuned models detect are requisite for the end-task. Several works of channel pruning have been proposed to prune away compute and memory from models that…

    Submitted 30 April, 2020; originally announced April 2020.

    Comments: Under review at ECCV 2020

  44. arXiv:2002.00092

    cs.CV

    Hybrid Graph Neural Networks for Crowd Counting

    Authors: Ao Luo, Fan Yang, Xin Li, Dong Nie, Zhicheng Jiao, Shangchen Zhou, Hong Cheng

    Abstract: Crowd counting is an important yet challenging task due to the large scale and density variation. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is stil…

    Submitted 31 January, 2020; originally announced February 2020.

    Comments: To appear in AAAI 2020

  45. arXiv:1910.08695

    cs.CV

    Fast Portrait Segmentation with Highly Light-weight Network

    Authors: Yuezun Li, Ao Luo, Siwei Lyu

    Abstract: In this paper, we describe a fast and light-weight portrait segmentation method based on a new highly light-weight backbone (HLB) architecture. The core element of HLB is a bottleneck-based factorized block (BFB) that has much fewer parameters than existing alternatives while keeping good learning capacity. Consequently, the HLB-based portrait segmentation method can run faster than the existing m…

    Submitted 30 May, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

  46. arXiv:1901.02875

    cs.CV cs.AI cs.GR cs.LG

    Learning to Infer and Execute 3D Shape Programs

    Authors: Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: Human perception of 3D shapes goes beyond reconstructing them as a set of points or a composition of geometric primitives: we also effortlessly understand higher-level shape structure such as the repetition and reflective symmetry of object parts. In contrast, recent advances in 3D shape sensing focus more on low-level geometry but less on these higher-level relationships. In this paper, we propos…

    Submitted 9 August, 2019; v1 submitted 9 January, 2019; originally announced January 2019.

    Comments: ICLR 2019. Project page: http://shape2prog.csail.mit.edu

  47. arXiv:1504.02164

    astro-ph.SR astro-ph.IM cs.CV

    Linearly Supporting Feature Extraction For Automated Estimation Of Stellar Atmospheric Parameters

    Authors: Xiangru Li, Yu Lu, Georges Comte, Ali Luo, Yongheng Zhao, Yongjun Wang

    Abstract: We describe a scheme to extract linearly supporting (LSU) features from stellar spectra to automatically estimate the atmospheric parameters $T_{\rm eff}$, $\log g$, and [Fe/H]. "Linearly supporting" means that the atmospheric parameters can be accurately estimated from the extracted features through a linear model. The successive steps of the process are as follows: first, decompose the spectrum using…

    Submitted 9 April, 2015; v1 submitted 8 April, 2015; originally announced April 2015.

    Comments: 21 pages, 7 figures, 8 tables, The Astrophysical Journal Supplement Series (accepted for publication)

    Journal ref: ApJS, 2015, 218(1): 3