-
EdgePruner: Poisoned Edge Pruning in Graph Contrastive Learning
Authors:
Hiroya Kato,
Kento Hasegawa,
Seira Hidano,
Kazuhide Fukushima
Abstract:
Graph Contrastive Learning (GCL) is unsupervised graph representation learning that can obtain useful representation of unknown nodes. The node representation can be utilized as features of downstream tasks. However, GCL is vulnerable to poisoning attacks as with existing learning models. A state-of-the-art defense cannot sufficiently negate adverse effects by poisoned graphs although such a defense introduces adversarial training in the GCL. To achieve further improvement, pruning adversarial edges is important. To the best of our knowledge, the feasibility remains unexplored in the GCL domain. In this paper, we propose a simple defense for GCL, EdgePruner. We focus on the fact that the state-of-the-art poisoning attack on GCL tends to mainly add adversarial edges to create poisoned graphs, which means that pruning edges is important to sanitize the graphs. Thus, EdgePruner prunes edges that contribute to minimizing the contrastive loss based on the node representation obtained after training on poisoned graphs by GCL. Furthermore, we focus on the fact that nodes with distinct features are connected by adversarial edges in poisoned graphs. Thus, we introduce feature similarity between neighboring nodes to help more appropriately determine adversarial edges. This similarity is helpful in further eliminating adverse effects from poisoned graphs on various datasets. Finally, EdgePruner outputs a graph that yields the minimum contrastive loss as the sanitized graph. Our results demonstrate that pruning adversarial edges is feasible on six datasets. EdgePruner can improve the accuracy of node classification under the attack by up to 5.55% compared with that of the state-of-the-art defense. Moreover, we show that EdgePruner is immune to an adaptive attack.
Submitted 12 December, 2023;
originally announced December 2023.
-
'I am both here and there' Parallel Control of Multiple Robotic Avatars by Disabled Workers in a Café
Authors:
Giulia Barbareschi,
Midori Kawaguchi,
Hiroki Kato,
Masato Nagahiro,
Kazuaki Takehuchi,
Yoshifumi Shiiba,
Shunichi Kasahara,
Kai Kunze,
Kouta Minamizawa
Abstract:
Robotic avatars can help disabled people extend their reach in interacting with the world. Technological advances make it possible for individuals to embody multiple avatars simultaneously. However, existing studies have been limited to laboratory conditions and did not involve disabled participants. In this paper, we present a real-world implementation of a parallel control system allowing disabled workers in a café to embody multiple robotic avatars at the same time to carry out different tasks. Our data corpus comprises semi-structured interviews with workers, customer surveys, and videos of café operations. Results indicate that the system increases workers' agency, enabling them to better manage customer journeys. Parallel embodiment and transitions between avatars create multiple interaction loops where the links between disabled workers and customers remain consistent, but the intermediary avatar changes. Based on our observations, we theorize that disabled individuals possess specific competencies that increase their ability to manage multiple avatar bodies.
Submitted 24 March, 2023;
originally announced March 2023.
-
Interaction in Remote Peddling Using Avatar Robot by People with Disabilities
Authors:
Takashi Kanetsuna,
Kazuaki Takeuchi,
Hiroaki Kato,
Taichi Sono,
Hirotaka Osawa,
Kentaro Yoshifuji,
Yoichi Yamazaki
Abstract:
Telework "avatar work," in which people with disabilities can engage in physical work such as customer service, is being implemented in society. In order to enable avatar work in a variety of occupations, we propose a mobile sales system using a mobile frozen drink machine and an avatar robot "OriHime", focusing on mobile customer service like peddling. The effect of peddling with the system on customers is examined based on the results of video annotation.
Submitted 2 December, 2022;
originally announced December 2022.
-
Multi-View Neural Surface Reconstruction with Structured Light
Authors:
Chunyu Li,
Taisuke Hashimoto,
Eiichi Matsumoto,
Hiroharu Kato
Abstract:
Three-dimensional (3D) object reconstruction based on differentiable rendering (DR) is an active research topic in computer vision. DR-based methods minimize the difference between the rendered and target images by optimizing both the shape and appearance and realizing a high visual reproductivity. However, most approaches perform poorly for textureless objects because of the geometrical ambiguity, which means that multiple shapes can have the same rendered result in such objects. To overcome this problem, we introduce active sensing with structured light (SL) into multi-view 3D object reconstruction based on DR to learn the unknown geometry and appearance of arbitrary scenes and camera poses. More specifically, our framework leverages the correspondences between pixels in different views calculated by structured light as an additional constraint in the DR-based optimization of implicit surface, color representations, and camera poses. Because camera poses can be optimized simultaneously, our method realizes high reconstruction accuracy in the textureless region and reduces efforts for camera pose calibration, which is required for conventional SL-based methods. Experiment results on both synthetic and real data demonstrate that our system outperforms conventional DR- and SL-based methods in a high-quality surface reconstruction, particularly for challenging objects with textureless or shiny surfaces.
Submitted 21 November, 2022;
originally announced November 2022.
-
Specialized Re-Ranking: A Novel Retrieval-Verification Framework for Cloth Changing Person Re-Identification
Authors:
Renjie Zhang,
Yu Fang,
Huaxin Song,
Fangbin Wan,
Yanwei Fu,
Hirokazu Kato,
Yang Wu
Abstract:
Cloth-changing person re-identification (Re-ID) can work under more complicated scenarios with higher security than normal Re-ID and biometric techniques and is therefore extremely valuable in applications. Meanwhile, higher flexibility in appearance always leads to more similar-looking, confusing images, which is the weakness of the widely used retrieval methods. In this work, we shed light on how to handle these similar images. Specifically, we propose a novel retrieval-verification framework. Given an image, the retrieval module can search for similar images quickly. Our proposed verification network then compares the input image with the candidate images by contrasting their local details and gives a similarity score. An innovative ranking strategy is also introduced to strike a good balance between retrieval and verification results. Comprehensive experiments are conducted to show the effectiveness of our framework and its capability to remarkably improve the state-of-the-art methods on both synthetic and realistic datasets.
Submitted 7 October, 2022;
originally announced October 2022.
-
Meta Avatar Robot Cafe: Linking Physical and Virtual Cybernetic Avatars to Provide Physical Augmentation for People with Disabilities
Authors:
Yoichi Yamazaki,
Tsukuto Yamada,
Hiroki Nomura,
Nobuaki Hosoda,
Ryoma Kawamura,
Kazuaki Takeuchi,
Hiroaki Kato,
Ryuma Niiyama,
Kentaro Yoshifuji
Abstract:
Meta avatar robot cafe is a cafe that fuses cyberspace and physical space to create new encounters with people. We create a place where people with disabilities who have difficulty going out can freely switch between their physical bodies and virtual bodies, and communicate their presence and warmth to each other.
Submitted 18 July, 2022;
originally announced August 2022.
-
Simultaneous Contact-Rich Grasping and Locomotion via Distributed Optimization Enabling Free-Climbing for Multi-Limbed Robots
Authors:
Yuki Shirai,
Xuan Lin,
Alexander Schperberg,
Yusuke Tanaka,
Hayato Kato,
Varit Vichathorn,
Dennis Hong
Abstract:
While motion planning of locomotion for legged robots has shown great success, motion planning for legged robots with dexterous multi-finger grasping is not mature yet. We present an efficient motion planning framework for simultaneously solving locomotion (e.g., centroidal dynamics), grasping (e.g., patch contact), and contact (e.g., gait) problems. To accelerate the planning process, we propose distributed optimization frameworks based on Alternating Direction Methods of Multipliers (ADMM) to solve the original large-scale Mixed-Integer NonLinear Programming (MINLP). The resulting frameworks use Mixed-Integer Quadratic Programming (MIQP) to solve contact and NonLinear Programming (NLP) to solve nonlinear dynamics, which are more computationally tractable and less sensitive to parameters. Also, we explicitly enforce patch contact constraints from limit surfaces with micro-spine grippers. We demonstrate our proposed framework in the hardware experiments, showing that the multi-limbed robot is able to realize various motions including free-climbing at a slope angle 45° with a much shorter planning time.
Submitted 5 July, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
SCALER: A Tough Versatile Quadruped Free-Climber Robot
Authors:
Yusuke Tanaka,
Yuki Shirai,
Xuan Lin,
Alexander Schperberg,
Hayato Kato,
Alexander Swerdlow,
Naoya Kumagai,
Dennis Hong
Abstract:
This paper introduces SCALER, a quadrupedal robot that demonstrates climbing on bouldering walls, overhangs, and ceilings, as well as trotting on the ground. SCALER is one of the first high-degree-of-freedom four-limbed robots that can free-climb under the Earth's gravity and one of the most mechanically efficient quadrupeds on the ground. Where other state-of-the-art climbers specialize in climbing, SCALER promises practical free-climbing with payload and ground locomotion, which realizes true versatile mobility. A new climbing gait, SKATE gait, increases the payload by utilizing the SCALER body linkage mechanism. SCALER achieves a maximum normalized locomotion speed of 1.87 /s, or 0.56 m/s on the ground and 1.0 /min, or 0.35 m/min in bouldering wall climbing. Payload capacity reaches 233% of the SCALER weight on the ground and 35% on the vertical wall. Our GOAT gripper, a mechanically adaptable underactuated two-finger gripper, successfully grasps convex and non-convex objects and supports SCALER.
Submitted 30 July, 2022; v1 submitted 3 July, 2022;
originally announced July 2022.
-
Monocular Differentiable Rendering for Self-Supervised 3D Object Detection
Authors:
Deniz Beker,
Hiroharu Kato,
Mihai Adrian Morariu,
Takahiro Ando,
Toru Matsuoka,
Wadim Kehl,
Adrien Gaidon
Abstract:
3D object detection from monocular images is an ill-posed problem due to the projective entanglement of depth and scale. To overcome this ambiguity, we present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects with the help of strong shape priors and 2D instance masks. Our method predicts the 3D location and meshes of each object in an image using differentiable rendering and a self-supervised objective derived from a pretrained monocular depth estimation network. We use the KITTI 3D object detection dataset to evaluate the accuracy of the method. Experiments demonstrate that we can effectively use noisy monocular depth and differentiable rendering as an alternative to expensive 3D ground-truth labels or LiDAR information.
Submitted 30 September, 2020;
originally announced September 2020.
-
Differentiable Rendering: A Survey
Authors:
Hiroharu Kato,
Deniz Beker,
Mihai Morariu,
Takahiro Ando,
Toru Matsuoka,
Wadim Kehl,
Adrien Gaidon
Abstract:
Deep neural networks (DNNs) have shown remarkable performance improvements on vision-related tasks such as object detection or image segmentation. Despite their success, they generally lack the understanding of 3D objects which form the image, as it is not always possible to collect 3D information about the scene or to easily annotate it. Differentiable rendering is a novel field which allows the gradients of 3D objects to be calculated and propagated through images. It also reduces the requirement of 3D data collection and annotation, while enabling higher success rate in various applications. This paper reviews existing literature and discusses the current state of differentiable rendering, its applications and open research problems.
Submitted 30 July, 2020; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Self-supervised Learning of 3D Objects from Natural Images
Authors:
Hiroharu Kato,
Tatsuya Harada
Abstract:
We present a method to learn single-view reconstruction of the 3D shape, pose, and texture of objects from categorized natural images in a self-supervised manner. Since this is a severely ill-posed problem, carefully designing a training method and introducing constraints are essential. To avoid the difficulty of training all elements at the same time, we propose training category-specific base shapes with fixed pose distribution and simple textures first, and subsequently training poses and textures using the obtained shapes. Another difficulty is that shapes and backgrounds sometimes become excessively complicated to mistakenly reconstruct textures on object surfaces. To suppress it, we propose using strong regularization and constraints on object surfaces and background images. With these two techniques, we demonstrate that we can use natural image collections such as CIFAR-10 and PASCAL objects for training, which indicates the possibility to realize 3D object reconstruction on diverse object categories beyond synthetic datasets.
Submitted 20 November, 2019;
originally announced November 2019.
-
Programmable View Update Strategies on Relations
Authors:
Van-Dang Tran,
Hiroyuki Kato,
Zhenjiang Hu
Abstract:
View update is an important mechanism that allows updates on a view by translating them into the corresponding updates on the base relations. The existing literature has shown the ambiguity of translating view updates. To address this ambiguity, we propose a robust language-based approach for making view update strategies programmable and validatable. Specifically, we introduce a novel approach to use Datalog to describe these update strategies. We propose a validation algorithm to check the well-behavedness of the written Datalog programs. We present a fragment of the Datalog language for which our validation is both sound and complete. This fragment not only has good properties in theory but is also useful for solving practical view updates. Furthermore, we develop an algorithm for optimizing user-written programs to efficiently implement updatable views in relational database management systems. We have implemented our proposed approach. The experimental results show that our framework is feasible and efficient in practice.
Submitted 31 August, 2020; v1 submitted 13 November, 2019;
originally announced November 2019.
-
Toward Co-existing Database Schemas based on Bidirectional Transformation
Authors:
Jumpei Tanaka,
Van-Dang Tran,
Hiroyuki Kato,
Zhenjiang Hu
Abstract:
Driven by strong demand for rapid and reliable software delivery, database schema versions that co-exist with multiple application versions have become a reality. Current database management systems do not support co-existing schema versions in one database. Although a design of co-existing schemas based on updatable view tables was previously proposed, its flexibility is limited by several predefined restrictions on synchronizing data among schemas and on handling independent, unsynchronized data in each schema. In this preliminary report, we present a new approach for co-existing schemas based on bidirectional transformation. We explain the two properties required to realize co-existing schemas: bidirectionality and totality. We show that co-existing schemas can be implemented systematically by applying putback-based bidirectional transformation to satisfy both properties. While bidirectionality can be satisfied by applying bidirectional transformation, satisfying totality requires introducing extra functions; we present how to derive them.
Submitted 30 October, 2019; v1 submitted 24 October, 2019;
originally announced October 2019.
-
Learning View Priors for Single-view 3D Reconstruction
Authors:
Hiroharu Kato,
Tatsuya Harada
Abstract:
There is some ambiguity in the 3D shape of an object when the number of observed views is small. Because of this ambiguity, although a 3D object reconstructor can be trained using a single view or a few views per object, reconstructed shapes only fit the observed views and appear incorrect from the unobserved viewpoints. To reconstruct shapes that look reasonable from any viewpoint, we propose to train a discriminator that learns prior knowledge regarding possible views. The discriminator is trained to distinguish the reconstructed views of the observed viewpoints from those of the unobserved viewpoints. The reconstructor is trained to correct unobserved views by fooling the discriminator. Our method outperforms current state-of-the-art methods on both synthetic and natural image datasets; this validates the effectiveness of our method.
Submitted 29 March, 2019; v1 submitted 26 November, 2018;
originally announced November 2018.
-
Learning sparse optimal rule fit by safe screening
Authors:
Hiroki Kato,
Hiroyuki Hanada,
Ichiro Takeuchi
Abstract:
In this paper, we consider linear prediction models in the form of a sparse linear combination of rules, where a rule is an indicator function defined over a hyperrectangle in the input space. Since the number of all possible rules generated from the training dataset becomes extremely large, it has been difficult to consider all of them when fitting a sparse model. In this paper, we propose Safe Optimal Rule Fit (SORF) as an approach to resolve this problem, which is formulated as a convex optimization problem with sparse regularization. The proposed SORF method utilizes the fact that the set of all possible rules can be represented as a tree. By extending a recently popularized convex optimization technique called safe screening, we develop a novel method for pruning the tree such that pruned nodes are guaranteed to be irrelevant to the prediction model. This approach allows us to efficiently learn a prediction model constructed from an exponentially large number of all possible rules. We demonstrate the usefulness of the proposed method by numerical experiments using several benchmark datasets.
Submitted 3 October, 2018;
originally announced October 2018.
-
Making View Update Strategies Programmable - Toward Controlling and Sharing Distributed Data -
Authors:
Yasuhito Asano,
Soichiro Hidaka,
Zhenjiang Hu,
Yasunori Ishihara,
Hiroyuki Kato,
Hsiang-Shang Ko,
Keisuke Nakano,
Makoto Onizuka,
Yuya Sasaki,
Toshiyuki Shimizu,
Van-Dang Tran,
Kanae Tsushima,
Masatoshi Yoshikawa
Abstract:
Views are known mechanisms for controlling access of data and for sharing data of different schemas. Despite long and intensive research on views in both the database community and the programming language community, we are facing difficulties to use views in practice. The main reason is that we lack ways to directly describe view update strategies to deal with the inherent ambiguity of view updating. This paper aims to provide a new language-based approach to controlling and sharing distributed data based on views, and establish a software foundation for systematic construction of such data management systems. Our key observation is that a view should be defined through a view update strategy rather than a view definition. We show that Datalog can be used for specifying view update strategies whose unique view definition can be automatically derived, present a novel P2P-based programmable architecture for distributed data management where updatable views are fully utilized for controlling and sharing distributed data, and demonstrate its usefulness through the development of a privacy-preserving ride-sharing alliance system.
Submitted 27 September, 2018;
originally announced September 2018.
-
A View-based Programmable Architecture for Controlling and Integrating Decentralized Data
Authors:
Yasuhito Asano,
Soichiro Hidaka,
Zhenjiang Hu,
Yasunori Ishihara,
Hiroyuki Kato,
Hsiang-Shang Ko,
Keisuke Nakano,
Makoto Onizuka,
Yuya Sasaki,
Toshiyuki Shimizu,
Kanae Tsushima,
Masatoshi Yoshikawa
Abstract:
The view and the view update are known mechanisms for controlling access to data and for integrating data of different schemas. Despite long and intensive research on them in both the database community and the programming language community, we still face difficulties in using them in practice. The main reason is that we lack control over the view update strategy needed to deal with the inherent ambiguity of view updates for a given view.
This vision paper aims to provide a new language-based approach to controlling and integrating decentralized data based on the view, and establish a software foundation for systematic construction of such data management systems. Our key observation is that a view should be defined through a view update strategy rather than a query. In other words, the view definition should be extracted from the view update strategy, which is in sharp contrast to the traditional approaches where the view update strategy is derived from the view definition.
In this paper, we present the first programmable architecture with a declarative language for specifying update strategies over views, whose unique view definition can be automatically derived, and show how it can be effectively used to control data access, integrate data generally allowing coexistence of GAV (global as view) and LAV (local as view), and perform both analysis and updates on the integrated data. We demonstrate its usefulness through development of a privacy-preserving ride-sharing alliance system, discuss its application scope, and highlight future challenges.
Submitted 18 March, 2018;
originally announced March 2018.
-
Neural 3D Mesh Renderer
Authors:
Hiroharu Kato,
Yoshitaka Ushiku,
Tatsuya Harada
Abstract:
For modeling the 3D world behind 2D images, which 3D representation is most appropriate? A polygon mesh is a promising candidate for its compactness and geometric properties. However, it is not straightforward to model a polygon mesh from 2D images using neural networks because the conversion from a mesh to an image, or rendering, involves a discrete operation called rasterization, which prevents back-propagation. Therefore, in this work, we propose an approximate gradient for rasterization that enables the integration of rendering into neural networks. Using this renderer, we perform single-image 3D mesh reconstruction with silhouette image supervision and our system outperforms the existing voxel-based approach. Additionally, we perform gradient-based 3D mesh editing operations, such as 2D-to-3D style transfer and 3D DeepDream, with 2D supervision for the first time. These applications demonstrate the potential of the integration of a mesh renderer into neural networks and the effectiveness of our proposed renderer.
Submitted 20 November, 2017;
originally announced November 2017.
-
Melody Generation for Pop Music via Word Representation of Musical Properties
Authors:
Andrew Shin,
Leopold Crestel,
Hiroharu Kato,
Kuniaki Saito,
Katsunori Ohnishi,
Masataka Yamaguchi,
Masahiro Nakawaki,
Yoshitaka Ushiku,
Tatsuya Harada
Abstract:
Automatic melody generation for pop music has been a long-time aspiration for both AI researchers and musicians. However, learning to generate euphonious melody has turned out to be highly challenging due to a number of factors. Representation of multivariate property of notes has been one of the primary challenges. It is also difficult to remain in the permissible spectrum of musical variety, outside of which would be perceived as a plain random play without auditory pleasantness. Observing the conventional structure of pop music poses further challenges. In this paper, we propose to represent each note and its properties as a unique `word,' thus lessening the prospect of misalignments between the properties, as well as reducing the complexity of learning. We also enforce regularization policies on the range of notes, thus encouraging the generated melody to stay close to what humans would find easy to follow. Furthermore, we generate melody conditioned on song part information, thus replicating the overall structure of a full song. Experimental results demonstrate that our model can generate auditorily pleasant songs that are more indistinguishable from human-written ones than previous models.
Submitted 31 October, 2017;
originally announced October 2017.
-
Visual Language Modeling on CNN Image Representations
Authors:
Hiroharu Kato,
Tatsuya Harada
Abstract:
Measuring the naturalness of images is important to generate realistic images or to detect unnatural regions in images. Additionally, a method to measure naturalness can be complementary to Convolutional Neural Network (CNN) based features, which are known to be insensitive to the naturalness of images. However, most probabilistic image models have insufficient capability of modeling the complex and abstract naturalness that we feel because they are built directly on raw image pixels. In this work, we assume that naturalness can be measured by the predictability on high-level features during eye movement. Based on this assumption, we propose a novel method to evaluate the naturalness by building a variant of Recurrent Neural Network Language Models on pre-trained CNN representations. Our method is applied to two tasks, demonstrating that 1) using our method as a regularizer enables us to generate more understandable images from image features than existing approaches, and 2) unnaturalness maps produced by our method achieve state-of-the-art eye fixation prediction performance on two well-studied datasets.
Submitted 9 November, 2015;
originally announced November 2015.
-
Image Reconstruction from Bag-of-Visual-Words
Authors:
Hiroharu Kato,
Tatsuya Harada
Abstract:
The objective of this work is to reconstruct an original image from Bag-of-Visual-Words (BoVW). Image reconstruction from features can be a means of identifying the characteristics of features. Additionally, it enables us to generate novel images via features. Although BoVW is the de facto standard feature for image recognition and retrieval, successful image reconstruction from BoVW has not been reported yet. What complicates this task is that BoVW lacks spatial information about the visual words it contains. To estimate the original arrangement, we propose an evaluation function that incorporates the naturalness of local adjacency and the global position, with a method to obtain related parameters using an external image database. To evaluate the performance of our method, we reconstruct images of 101 kinds of objects. Additionally, we apply our method to analyze object classifiers and to generate novel images via BoVW.
Submitted 19 May, 2015;
originally announced May 2015.