Showing 1–50 of 84 results for author: Sato, I

  1. arXiv:2406.12353  [pdf, other]

    stat.ML cs.LG

    Top-Down Bayesian Posterior Sampling for Sum-Product Networks

    Authors: Soma Yokoi, Issei Sato

    Abstract: Sum-product networks (SPNs) are probabilistic models characterized by exact and fast evaluation of fundamental probabilistic operations. Their superior computational tractability has led to applications in many fields, such as machine learning with time constraints or accuracy requirements and real-time systems. The structural constraints of SPNs supporting fast inference, however, lead to increased…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  2. arXiv:2405.16747  [pdf, other]

    cs.LG

    Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective

    Authors: Akiyoshi Tomihari, Issei Sato

    Abstract: The two-stage fine-tuning (FT) method, linear probing then fine-tuning (LP-FT), consistently outperforms linear probing (LP) and FT alone in terms of accuracy for both in-distribution (ID) and out-of-distribution (OOD) data. This success is largely attributed to the preservation of pre-trained features, achieved through a near-optimal linear head obtained during LP. However, despite the widespread…

    Submitted 26 May, 2024; originally announced May 2024.

  3. arXiv:2402.09050  [pdf, other]

    cs.LG

    End-to-End Training Induces Information Bottleneck through Layer-Role Differentiation: A Comparative Analysis with Layer-wise Training

    Authors: Keitaro Sakamoto, Issei Sato

    Abstract: End-to-end (E2E) training, optimizing the entire model through error backpropagation, fundamentally supports the advancements of deep learning. Despite its high performance, E2E training faces the problems of memory consumption, parallel computing, and discrepancy with the functionalities of the actual brain. Various alternative methods have been proposed to overcome these difficulties; however, n…

    Submitted 31 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: TMLR2024

  4. arXiv:2310.17951  [pdf, other]

    cs.CV cs.AI

    Understanding Parameter Saliency via Extreme Value Theory

    Authors: Shuo Wang, Issei Sato

    Abstract: Deep neural networks have been increasingly deployed throughout society in recent years. In diagnosing undesirable model behaviors, it is useful to identify which parameters trigger misclassification. The concept of parameter saliency is proposed and used to diagnose convolutional neural networks (CNNs) by ranking convolution filters that may have caused misclassification on the basis of paramet…

    Submitted 5 December, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

  5. arXiv:2310.06379  [pdf, other]

    cs.LG

    Initialization Bias of Fourier Neural Operator: Revisiting the Edge of Chaos

    Authors: Takeshi Koshizuka, Masahiro Fujisawa, Yusuke Tanaka, Issei Sato

    Abstract: This paper investigates the initialization bias of the Fourier neural operator (FNO). A mean-field theory for FNO is established, analyzing the behavior of the random FNO from an "edge of chaos" perspective. We uncover that the forward and backward propagation behaviors exhibit characteristics unique to FNO, induced by mode truncation, while also showcasing similarities to those of densely co…

    Submitted 15 February, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

  6. arXiv:2307.14023  [pdf, other]

    cs.LG

    Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?

    Authors: Tokio Kajitsuka, Issei Sato

    Abstract: Existing analyses of the expressive capacity of Transformer models have required excessively deep layers for data memorization, leading to a discrepancy with the Transformers actually used in practice. This is primarily due to the interpretation of the softmax function as an approximation of the hardmax function. By clarifying the connection between the softmax function and the Boltzmann operator,…

    Submitted 29 January, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: ICLR 2024

    MSC Class: 68T07 ACM Class: I.2.0

  7. arXiv:2305.19743  [pdf, other]

    cs.CV

    Towards Monocular Shape from Refraction

    Authors: Antonin Sulc, Imari Sato, Bastian Goldluecke, Tali Treibitz

    Abstract: Refraction is a common physical phenomenon and has long been researched in computer vision. Objects imaged through a refractive object appear distorted in the image as a function of the shape of the interface between the media. This hinders many computer vision applications, but can be utilized for obtaining the geometry of the refractive interface. Previous approaches for refractive surface recov…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 12 pages, 6 figures, The 32nd British Machine Vision Conference (BMVC)

    Journal ref: 32nd British Machine Vision Conference 2021, BMVA Press, 2021

  8. arXiv:2305.16573  [pdf, other]

    cs.LG

    Exploring Weight Balancing on Long-Tailed Recognition Problem

    Authors: Naoya Hasegawa, Issei Sato

    Abstract: Recognition problems in long-tailed data, in which the sample size per class is heavily skewed, have gained importance because the distribution of the sample size per class in a dataset is generally exponential unless the sample size is intentionally adjusted. Various methods have been devised to address these problems. Recently, weight balancing, which combines well-known classical regularization…

    Submitted 28 April, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: Paper accepted for publication at ICLR 2024

  9. arXiv:2212.10352  [pdf, other]

    cs.NE cs.LG

    Fixed-Weight Difference Target Propagation

    Authors: Tatsukichi Shibuya, Nakamasa Inoue, Rei Kawakami, Ikuro Sato

    Abstract: Target Propagation (TP) is a more biologically plausible algorithm than error backpropagation (BP) for training deep networks, and improving the practicality of TP is an open issue. TP methods require the feedforward and feedback networks to form layer-wise autoencoders for propagating the target values generated at the output layer. However, this causes certain drawbacks; e.g., careful hyperparameter…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23). 9 pages and 3 figures in main manuscript; 11 pages and 5 figures in supplementary material

  10. arXiv:2211.11492  [pdf, other]

    cs.CV

    ClipCrop: Conditioned Cropping Driven by Vision-Language Model

    Authors: Zhihang Zhong, Mingxi Cheng, Zhirong Wu, Yuhui Yuan, Yinqiang Zheng, Ji Li, Han Hu, Stephen Lin, Yoichi Sato, Imari Sato

    Abstract: Image cropping has progressed tremendously under the data-driven paradigm. However, current approaches do not account for the intentions of the user, which is an issue especially when the composition of the input image is complex. Moreover, labeling of cropping data is costly and hence the amount of data is limited, leading to poor generalization performance of current algorithms in the wild. In t…

    Submitted 21 November, 2022; originally announced November 2022.

  11. arXiv:2211.11423  [pdf, other]

    cs.CV

    Blur Interpolation Transformer for Real-World Motion from Blur

    Authors: Zhihang Zhong, Mingdeng Cao, Xiang Ji, Yinqiang Zheng, Imari Sato

    Abstract: This paper studies the challenging problem of recovering motion from blur, also known as joint deblurring and interpolation or blur temporal super-resolution. The challenges are twofold: 1) the current methods still leave considerable room for improvement in terms of visual quality even on the synthetic dataset, and 2) poor generalization to real-world data. To this end, we propose a blur interpol…

    Submitted 7 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: Accepted by CVPR2023

  12. Informative Sample-Aware Proxy for Deep Metric Learning

    Authors: Aoyu Li, Ikuro Sato, Kohta Ishikawa, Rei Kawakami, Rio Yokota

    Abstract: Among various supervised deep metric learning methods, proxy-based approaches have achieved high retrieval accuracies. Proxies, which are class-representative points in an embedding space, receive updates based on proxy-sample similarities in a similar manner to sample representations. In existing methods, a relatively small number of samples can produce large gradient magnitudes (i.e., hard samples)…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted at ACM Multimedia Asia (MMAsia) 2022

  13. arXiv:2211.08583  [pdf, other]

    cs.LG cs.AI

    Empirical Study on Optimizer Selection for Out-of-Distribution Generalization

    Authors: Hiroki Naganuma, Kartik Ahuja, Shiro Takagi, Tetsuya Motokawa, Rio Yokota, Kohta Ishikawa, Ikuro Sato, Ioannis Mitliagkas

    Abstract: Modern deep learning systems do not generalize well when the test data distribution is slightly different to the training data distribution. While much promising work has been accomplished to address this fragility, a systematic study of the role of optimizers and their out-of-distribution generalization performance has not been undertaken. In this study, we examine the performance of popular firs…

    Submitted 5 June, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted to TMLR

  14. arXiv:2207.10123  [pdf, other]

    cs.CV

    Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance

    Authors: Zhihang Zhong, Xiao Sun, Zhirong Wu, Yinqiang Zheng, Stephen Lin, Imari Sato

    Abstract: We study the challenging problem of recovering detailed motion from a single motion-blurred image. Existing solutions to this problem estimate a single image sequence without considering the motion ambiguity for each region. Therefore, the results tend to converge to the mean of the multi-modal possibilities. In this paper, we explicitly account for such motion ambiguity, allowing us to generate m…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: ECCV2022

  15. arXiv:2207.01847  [pdf, other]

    cs.LG

    PoF: Post-Training of Feature Extractor for Improving Generalization

    Authors: Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami

    Abstract: It has been intensively investigated that the local shape, especially flatness, of the loss landscape near a minimum plays an important role in the generalization of deep models. We developed a training algorithm called PoF: Post-Training of Feature Extractor that updates the feature extractor part of an already-trained deep model to search for a flatter minimum. The characteristics are two-fold: 1) Feat…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: Accepted to ICML2022. Contains a link to the code

  16. arXiv:2206.01606  [pdf, ps, other]

    stat.ML cs.LG

    Excess risk analysis for epistemic uncertainty with application to variational inference

    Authors: Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

    Abstract: Bayesian deep learning plays an important role, especially for its ability to evaluate epistemic uncertainty (EU). Due to computational complexity issues, approximation methods such as variational inference (VI) have been used in practice to obtain posterior distributions, and their generalization abilities have been analyzed extensively, for example, by PAC-Bayesian theory; however, little analysis…

    Submitted 11 October, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  17. arXiv:2206.00944  [pdf, other]

    cs.LG cs.CV stat.ML

    Feature Space Particle Inference for Neural Network Ensembles

    Authors: Shingo Yashima, Teppei Suzuki, Kohta Ishikawa, Ikuro Sato, Rei Kawakami

    Abstract: Ensembles of deep neural networks demonstrate improved performance over single models. For enhancing the diversity of ensemble members while keeping their performance, particle-based inference methods offer a promising approach from a Bayesian perspective. However, the best way to apply these methods to neural networks is still unclear: seeking samples from the weight-space posterior suffers from…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: ICML2022

  18. arXiv:2205.07320  [pdf, other]

    cs.LG stat.ML

    Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective

    Authors: Keitaro Sakamoto, Issei Sato

    Abstract: The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large le…

    Submitted 28 September, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022

  19. arXiv:2204.13849  [pdf, other]

    cs.CV cs.LG

    Goldilocks-curriculum Domain Randomization and Fractal Perlin Noise with Application to Sim2Real Pneumonia Lesion Detection

    Authors: Takahiro Suzuki, Shouhei Hanaoka, Issei Sato

    Abstract: A computer-aided detection (CAD) system based on machine learning is expected to assist radiologists in making a diagnosis. It is desirable to build CAD systems for the various types of diseases accumulating daily in a hospital. An obstacle in developing a CAD system for a disease is that the number of medical images is typically too small to improve the performance of the machine learning model.…

    Submitted 28 April, 2022; originally announced April 2022.

  20. arXiv:2204.08226  [pdf, other]

    cs.LG cs.CV

    Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey

    Authors: Kento Nozawa, Issei Sato

    Abstract: Representation learning enables us to automatically extract generic feature representations from a dataset to solve another machine learning task. Recently, feature representations extracted by a representation learning algorithm, combined with a simple predictor, have exhibited state-of-the-art performance on several machine learning tasks. Despite its remarkable progress, there exist various ways to evaluat…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: The extended version of "Kento Nozawa and Issei Sato. Evaluation Methods for Representation Learning: A Survey. In IJCAI-ECAI Survey Track, 2022."

  21. arXiv:2204.04853  [pdf, other]

    cs.LG q-bio.PE

    Neural Lagrangian Schrödinger Bridge: Diffusion Modeling for Population Dynamics

    Authors: Takeshi Koshizuka, Issei Sato

    Abstract: Population dynamics is the study of temporal and spatial variation in the size of populations of organisms and is a major part of population ecology. One of the main difficulties in analyzing population dynamics is that we can only obtain observation data with coarse time intervals from fixed-point observations due to experimental costs or measurement constraints. Recently, modeling population dyn…

    Submitted 26 February, 2023; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: Published at ICLR 2023 (notable top 25%)

  22. arXiv:2203.13694  [pdf, other]

    cs.CV

    Implicit Neural Representations for Variable Length Human Motion Generation

    Authors: Pablo Cervantes, Yusuke Sekikawa, Ikuro Sato, Koichi Shinoda

    Abstract: We propose an action-conditional human motion generation method using variational implicit neural representations (INR). The variational formalism enables action-conditional distributions of INRs, from which one can easily sample representations to generate novel human motion sequences. Our method offers variable-length sequence generation by construction because a part of INR is optimized for a w…

    Submitted 15 July, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: Accepted to ECCV 2022

  23. arXiv:2203.06451  [pdf, other]

    cs.CV

    Bringing Rolling Shutter Images Alive with Dual Reversed Distortion

    Authors: Zhihang Zhong, Mingdeng Cao, Xiao Sun, Zhirong Wu, Zhongyi Zhou, Yinqiang Zheng, Stephen Lin, Imari Sato

    Abstract: Rolling shutter (RS) distortion can be interpreted as the result of picking a row of pixels from instant global shutter (GS) frames over time during the exposure of the RS camera. This means that the information of each instant GS frame is partially, yet sequentially, embedded into the row-dependent distortion. Inspired by this fact, we address the challenging task of reversing this process, i.e.,…

    Submitted 20 July, 2022; v1 submitted 12 March, 2022; originally announced March 2022.

    Comments: ECCV2022 Oral

  24. arXiv:2110.05076  [pdf, other]

    cs.CV cs.LG

    A Closer Look at Prototype Classifier for Few-shot Image Classification

    Authors: Mingcheng Hou, Issei Sato

    Abstract: The prototypical network is a prototype classifier based on meta-learning and is widely used for few-shot learning because it classifies unseen examples by constructing class-specific prototypes without adjusting hyper-parameters during meta-testing. Interestingly, recent research has attracted a lot of attention, showing that training a new linear classifier, which does not use a meta-learning al…

    Submitted 15 September, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: 21 pages with 10 appendix sections. Accepted at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  25. arXiv:2108.13753  [pdf, other]

    stat.ML cs.LG

    Disentanglement Analysis with Partial Information Decomposition

    Authors: Seiya Tokui, Issei Sato

    Abstract: We propose a framework to analyze how multivariate representations disentangle ground-truth generative factors. A quantitative analysis of disentanglement has been based on metrics designed to compare how one variable explains each generative factor. Current metrics, however, may fail to detect entanglement that involves more than two variables, e.g., representations that duplicate and rotate gene…

    Submitted 9 February, 2022; v1 submitted 31 August, 2021; originally announced August 2021.

    Comments: ICLR 2022

  26. arXiv:2106.16028  [pdf, other]

    cs.CV

    Real-world Video Deblurring: A Benchmark Dataset and An Efficient Recurrent Neural Network

    Authors: Zhihang Zhong, Ye Gao, Yinqiang Zheng, Bo Zheng, Imari Sato

    Abstract: Real-world video deblurring in real time remains a challenging task due to the complexity of spatially and temporally varying blur itself and the requirement of low computational cost. To improve the network efficiency, we adopt residual dense blocks into RNN cells, so as to efficiently extract the spatial features of the current frame. Furthermore, a global spatio-temporal attention module…

    Submitted 15 October, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: Accepted by IJCV (extended version of ECCV2020)

  27. arXiv:2106.05010  [pdf, ps, other]

    stat.ML cs.LG

    Loss function based second-order Jensen inequality and its application to particle variational inference

    Authors: Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

    Abstract: Bayesian model averaging, obtained as the expectation of a likelihood function by a posterior distribution, has been widely used for prediction, evaluation of uncertainty, and model selection. Various approaches have been developed to efficiently capture the information in the posterior distribution; one such approach is the optimization of a set of models simultaneously with interaction to ensure…

    Submitted 9 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

  28. arXiv:2105.11599  [pdf, other]

    cs.CV cs.GR

    Multi-view 3D Reconstruction of a Texture-less Smooth Surface of Unknown Generic Reflectance

    Authors: Ziang Cheng, Hongdong Li, Yuta Asano, Yinqiang Zheng, Imari Sato

    Abstract: Recovering the 3D geometry of a purely texture-less object with generally unknown surface reflectance (e.g. non-Lambertian) is regarded as a challenging task in multi-view reconstruction. The major obstacle revolves around establishing cross-view correspondences where photometric constancy is violated. This paper proposes a simple and practical solution to overcome this challenge based on a co-loc…

    Submitted 24 May, 2021; originally announced May 2021.

    Comments: Accepted to CVPR2021

  29. arXiv:2104.05014  [pdf, other]

    cs.CV

    One Ring to Rule Them All: a simple solution to multi-view 3D-Reconstruction of shapes with unknown BRDF via a small Recurrent ResNet

    Authors: Ziang Cheng, Hongdong Li, Richard Hartley, Yinqiang Zheng, Imari Sato

    Abstract: This paper proposes a simple method which solves an open problem of multi-view 3D-Reconstruction for objects with unknown and generic surface materials, imaged by a freely moving camera and a freely moving point light source. The object can have arbitrary (e.g. non-Lambertian), spatially-varying (or everywhere different) surface reflectances (svBRDF). Our solution consists of two small-sized neural…

    Submitted 11 April, 2021; originally announced April 2021.

  30. arXiv:2104.01601  [pdf, other]

    cs.CV

    Towards Rolling Shutter Correction and Deblurring in Dynamic Scenes

    Authors: Zhihang Zhong, Yinqiang Zheng, Imari Sato

    Abstract: Joint rolling shutter correction and deblurring (RSCD) techniques are critical for the prevalent CMOS cameras. However, current approaches are still based on conventional energy optimization and are developed for static scenes. To enable learning-based approaches to address the real-world RSCD problem, we contribute the first dataset, BS-RSCD, which includes both ego-motion and object-motion in dynami…

    Submitted 4 April, 2021; originally announced April 2021.

    Comments: To be published in CVPR 2021

  31. arXiv:2103.09414  [pdf, other]

    cs.PL cs.LG

    Toward Neural-Network-Guided Program Synthesis and Verification

    Authors: Naoki Kobayashi, Taro Sekiyama, Issei Sato, Hiroshi Unno

    Abstract: We propose a novel framework of program and invariant synthesis called neural network-guided synthesis. We first show that, by suitably designing and training neural networks, we can extract logical formulas over integers from the weights and biases of the trained neural networks. Based on the idea, we have implemented a tool to synthesize formulas from positive/negative examples and implication c…

    Submitted 25 August, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

    Comments: A summary will appear in Proceedings of SAS 2021, Springer LNCS

  32. arXiv:2102.12232  [pdf, ps, other]

    cs.LG cs.NE

    Abelian Neural Networks

    Authors: Kenshin Abe, Takanori Maehara, Issei Sato

    Abstract: We study the problem of modeling a binary operation that satisfies some algebraic requirements. We first construct a neural network architecture for Abelian group operations and derive a universal approximation property. Then, we extend it to Abelian semigroup operations using the characterization of associative symmetric polynomials. Both models take advantage of the analytic invertibility of inv…

    Submitted 24 February, 2021; originally announced February 2021.

  33. arXiv:2102.06866  [pdf, other]

    cs.LG stat.ML

    Understanding Negative Samples in Instance Discriminative Self-supervised Representation Learning

    Authors: Kento Nozawa, Issei Sato

    Abstract: Instance discriminative self-supervised representation learning has attracted attention thanks to its unsupervised nature and informative feature representation for downstream tasks. In practice, it commonly uses a larger number of negative samples than the number of supervised classes. However, there is an inconsistency in the existing analysis; theoretically, a large number of negative samp…

    Submitted 14 January, 2022; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021. 26 pages, 6 figures, and 6 tables

  34. arXiv:2102.00678  [pdf, other]

    cs.LG stat.ML

    Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification

    Authors: Nan Lu, Shida Lei, Gang Niu, Issei Sato, Masashi Sugiyama

    Abstract: To cope with high annotation costs, training a classifier only from weakly supervised data has attracted a great deal of attention these days. Among various approaches, strengthening supervision from completely unsupervised classification is a promising direction, which typically employs class priors as the only supervision and trains a binary classifier from unlabeled (U) datasets. While existing…

    Submitted 11 June, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: ICML2021 camera-ready version

  35. arXiv:2011.11152  [pdf, other]

    cs.LG cs.AI

    On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective

    Authors: Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama

    Abstract: Weight decay is a simple yet powerful regularization technique that has been very widely used in the training of deep neural networks (DNNs). While weight decay has attracted much attention, previous studies have overlooked some pitfalls concerning the large gradient norms that result from weight decay. In this paper, we discover that weight decay can unfortunately lead to large gradient norms at the fina…

    Submitted 19 October, 2023; v1 submitted 22 November, 2020; originally announced November 2020.

    Comments: NeurIPS 2023, 21 pages, 20 figures. Keywords: Weight Decay, Regularization, Optimization, Deep Learning

  36. arXiv:2011.06220  [pdf, other]

    cs.LG

    Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting

    Authors: Zeke Xie, Fengxiang He, Shaopeng Fu, Issei Sato, Dacheng Tao, Masashi Sugiyama

    Abstract: Deep learning is often criticized for two serious issues which rarely exist in natural nervous systems: overfitting and catastrophic forgetting. It can even memorize randomly labelled data, which has little knowledge behind the instance-label pairs. When a deep network continually learns over time by accommodating new tasks, it usually quickly overwrites the knowledge learned from previous tasks. R…

    Submitted 10 May, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

    Comments: Accepted by Neural Computation, MIT Press; 20 pages; 13 figures; Key Words: Neural Variability, Neuroscience, Deep Learning, Label Noise, Catastrophic Forgetting

  37. arXiv:2008.00645  [pdf, other]

    cs.LG stat.ML

    Active Classification with Uncertainty Comparison Queries

    Authors: Zhenghang Cui, Issei Sato

    Abstract: Noisy pairwise comparison feedback has been incorporated to improve the overall query complexity of interactively learning binary classifiers. The \textit{positivity comparison oracle} is used to provide feedback on which is more likely to be positive given a pair of data points. Because it is impossible to infer accurate labels using this oracle alone \textit{without knowing the classification th…

    Submitted 28 October, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: Code and Dataset: https://github.com/zchenry/uncertainty-comparison

  38. arXiv:2007.01659  [pdf, other]

    stat.ML cs.LG

    Diagnostic Uncertainty Calibration: Towards Reliable Machine Predictions in Medical Domain

    Authors: Takahiro Mimori, Keiko Sasada, Hirotaka Matsui, Issei Sato

    Abstract: We propose an evaluation framework for class probability estimates (CPEs) in the presence of label uncertainty, which is commonly observed as diagnosis disagreement between experts in the medical domain. We also formalize evaluation metrics for higher-order statistics, including inter-rater disagreement, to assess predictions on label uncertainty. Moreover, we propose a novel post-hoc method calle…

    Submitted 22 March, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: 31 pages, 6 figures

  39. arXiv:2006.15815  [pdf, other]

    cs.LG stat.ML

    Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

    Authors: Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

    Abstract: Adaptive Moment Estimation (Adam), which combines Adaptive Learning Rate and Momentum, is arguably the most popular stochastic optimizer for accelerating the training of deep neural networks. However, it is empirically known that Adam often generalizes worse than Stochastic Gradient Descent (SGD). The purpose of this paper is to unveil the mystery of this behavior in the diffusion theoretical framewo…

    Submitted 14 June, 2022; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: ICML2022, Long Oral Presentation, 30 pages, 14 figures, Key Words: Deep Learning Theory, Optimization, Adam, Adaptive Inertia, Flat Minima

  40. arXiv:2006.08306  [pdf, other]

    cs.LG stat.ML

    LFD-ProtoNet: Prototypical Network Based on Local Fisher Discriminant Analysis for Few-shot Learning

    Authors: Kei Mukaiyama, Issei Sato, Masashi Sugiyama

    Abstract: The prototypical network (ProtoNet) is a few-shot learning framework that performs metric learning and classification using the distance to prototype representations of each class. It has attracted a great deal of attention recently since it is simple to implement, highly extensible, and performs well in experiments. However, it only takes into account the mean of the support vectors as prototypes…

    Submitted 25 September, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 20 pages

    MSC Class: 68T01 (Primary); 68T05 (Secondary)

  41. arXiv:2006.07571  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    $γ$-ABC: Outlier-Robust Approximate Bayesian Computation Based on a Robust Divergence Estimator

    Authors: Masahiro Fujisawa, Takeshi Teshima, Issei Sato, Masashi Sugiyama

    Abstract: Approximate Bayesian computation (ABC) is a likelihood-free inference method that has been employed in various applications. However, ABC can be sensitive to outliers if a data discrepancy measure is chosen inappropriately. In this paper, we propose to use a nearest-neighbor-based $γ$-divergence estimator as a data discrepancy measure. We show that our estimator possesses a suitable theoretical ro…

    Submitted 5 March, 2021; v1 submitted 13 June, 2020; originally announced June 2020.

    Comments: The 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021); 48 pages, 22 figures

  42. arXiv:2006.06207  [pdf, other]

    stat.ML cs.LG

    Pairwise Supervision Can Provably Elicit a Decision Boundary

    Authors: Han Bao, Takuya Shimada, Liyuan Xu, Issei Sato, Masashi Sugiyama

    Abstract: Similarity learning is a general problem to elicit useful representations by predicting the relationship between a pair of patterns. This problem is related to various important preprocessing tasks such as metric learning, kernel learning, and contrastive learning. A classifier built upon the representations is expected to perform well in downstream classification; however, little theory has been…

    Submitted 28 February, 2022; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: In Proceedings of AISTATS 2021

  43. arXiv:2005.04107  [pdf, other]

    cs.GR cs.HC cs.LG

    Sequential Gallery for Interactive Visual Design Optimization

    Authors: Yuki Koyama, Issei Sato, Masataka Goto

    Abstract: Visual design tasks often involve tuning many design parameters. For example, color grading of a photograph involves many parameters, some of which non-expert users might be unfamiliar with. We propose a novel user-in-the-loop optimization method that allows users to efficiently find an appropriate parameter set by exploring such a high-dimensional design space through much easier two-dimensional…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: To be published at ACM Trans. Graph. (Proc. SIGGRAPH 2020); Project page available at https://koyama.xyz/project/sequential_gallery/

    Journal ref: ACM Trans. Graph. 39, 4 (July 2020), pp.88:1-88:12

  44. arXiv:2003.04691  [pdf, other]

    stat.ML cs.LG

    Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time

    Authors: Hideaki Imamura, Nontawat Charoenphakdee, Futoshi Futami, Issei Sato, Junya Honda, Masashi Sugiyama

    Abstract: The Gaussian process bandit is a problem in which we want to find a maximizer of a black-box function with the minimum number of function evaluations. If the black-box function varies with time, then time-varying Bayesian optimization is a promising framework. However, a drawback with current methods is in the assumption that the evaluation time for every observation is constant, which can be unre…

    Submitted 10 March, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

  45. arXiv:2002.03497  [pdf, other]

    cs.LG stat.ML

    Few-shot Domain Adaptation by Causal Mechanism Transfer

    Authors: Takeshi Teshima, Issei Sato, Masashi Sugiyama

    Abstract: We study few-shot supervised domain adaptation (DA) for regression problems, where only a few labeled target domain data and many labeled source domain data are available. Many of the current DA methods base their transfer assumptions on either parametrized distribution shift or apparent distribution similarities, e.g., identical conditionals or small distributional discrepancies. However, these a…

    Submitted 18 August, 2020; v1 submitted 9 February, 2020; originally announced February 2020.

    Comments: 33 pages, 3 figures. Camera-ready version for Thirty-seventh International Conference on Machine Learning (ICML 2020)

  46. arXiv:2002.03495  [pdf, other]

    cs.LG stat.ML

    A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima

    Authors: Zeke Xie, Issei Sato, Masashi Sugiyama

    Abstract: Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training deep networks in practice. SGD is known to find a flat minimum that often generalizes well. However, it is mathematically unclear how deep learning can select a flat minimum among so many minima. To answer the question quantitatively, we develop a density diffusion theory (DDT) to reveal how minima selection qua…

    Submitted 15 January, 2021; v1 submitted 9 February, 2020; originally announced February 2020.

    Comments: ICLR 2021; 28 pages; 19 figures

  47. arXiv:2001.07847  [pdf, other]

    eess.IV cs.CV cs.LG

    A versatile anomaly detection method for medical images with a flow-based generative model in semi-supervision setting

    Authors: H. Shibata, S. Hanaoka, Y. Nomura, T. Nakao, I. Sato, D. Sato, N. Hayashi, O. Abe

    Abstract: Oversight in medical images is a crucial problem, and timely reporting of medical images is desired. Therefore, an all-purpose anomaly detection method that can detect virtually all types of lesions/diseases in a given image is strongly desired. However, few commercially available and versatile anomaly detection methods for medical images have been provided so far. Recently, anomaly detection meth…

    Submitted 20 October, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

  48. arXiv:1911.09011  [pdf, other]

    stat.ML cs.LG

    Bayesian interpretation of SGD as Ito process

    Authors: Soma Yokoi, Issei Sato

    Abstract: The current interpretation of stochastic gradient descent (SGD) as a stochastic process lacks generality in that its numerical scheme restricts continuous-time dynamics as well as the loss function and the distribution of gradient noise. We introduce a simplified scheme with milder conditions that flexibly interprets SGD as a discrete-time approximation of an Ito process. The scheme also works as…

    Submitted 20 November, 2019; originally announced November 2019.

  49. arXiv:1911.06181  [pdf, other]

    cs.CV cs.LG

    Adversarial Transformations for Semi-Supervised Learning

    Authors: Teppei Suzuki, Ikuro Sato

    Abstract: We propose a Regularization framework based on Adversarial Transformations (RAT) for semi-supervised learning. RAT is designed to enhance robustness of the output distribution of class prediction for a given data against input perturbation. RAT is an extension of Virtual Adversarial Training (VAT) in such a way that RAT adversarially transforms data along the underlying data distribution by a rich…

    Submitted 18 November, 2019; v1 submitted 13 November, 2019; originally announced November 2019.

    Comments: Accepted by AAAI 2020

  50. arXiv:1907.10225  [pdf, ps, other]

    cs.LG stat.ML

    Classification from Triplet Comparison Data

    Authors: Zhenghang Cui, Nontawat Charoenphakdee, Issei Sato, Masashi Sugiyama

    Abstract: Learning from triplet comparison data has been extensively studied in the context of metric learning, where we want to learn a distance metric between two instances, and ordinal embedding, where we want to learn an embedding in a Euclidean space of the given instances that preserves the comparison order as well as possible. Unlike fully-labeled data, triplet comparison data can be collected in a…

    Submitted 18 April, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

    Comments: Code: https://github.com/zchenry/triplet_classification