This year, Lin Xiao, a Chinese scientist from Microsoft, won the Classic Paper Award (Test of Time Award) for his research.

Editor’s note: This article comes from the WeChat public account “Heart of the Machine” (ID: almosthuman2014), by Synced. The original title was: “13,000 attendees: NeurIPS 2019 award-winning papers announced, with a Chinese scholar from Microsoft winning the Classic Paper Award”.
The NeurIPS 2019 conference was recently held, and the organizers have announced the best papers and other awards. This year, Lin Xiao, a Chinese scientist from Microsoft, won the Classic Paper Award (Test of Time Award) for his research.

As a top international conference in the field of artificial intelligence, NeurIPS attracted as much attention this year as in previous years. Because there were far too many registrations, tickets to this year’s conference were allocated by lottery.

On the paper side, the number of submissions to this year’s conference also hit a record high, at one point bringing the NeurIPS server down. In the end, a total of 6,743 valid papers were submitted and 1,428 accepted, for an acceptance rate of 21.17%.

Some scholars and practitioners were also unable to attend the conference due to visa issues.

According to official conference statistics, the total number of participants this year exceeded 13,000, and the number of applicants in the ticket lottery reached 15,000. In 2018 there were fewer than 9,000 participants, so attendance grew by almost 50% in a single year.

From what Heart of the Machine reporters saw on site, this year’s conference was indeed unprecedented: at the host venue, the Vancouver Convention Centre, attendees lined up for badges early in the morning, with the queue stretching from floor B1 up to the second floor.


On the second floor, the queue wrapped a full circle around the hall.

So far, only the expo and some tutorials are under way at NeurIPS. Although the main program of the conference begins on December 10, local time, just a few hours ago the organizers announced the most closely watched honors: the NeurIPS Outstanding Paper Awards and other awards.

It is worth noting that in addition to the Outstanding Paper Award and the Test of Time Award, this year’s organizing committee added an “Outstanding New Direction Paper Award” to recognize researchers whose work opens an outstanding new direction for future research.


Outstanding Paper Award

Paper name: Distribution-Independent PAC Learning of Halfspaces with Massart Noise


Authors: Ilias Diakonikolas, Themis Gouleakis, Christos Tzamos

Institutions: University of Wisconsin–Madison, Max Planck Institute

Paper address: https://papers.nips.cc/paper/8722-distribution-independent-pac-learning-of-halfspaces-with-massart-noise

Abstract: The authors study the problem of distribution-independent PAC learning of halfspaces under Massart noise.

Specifically, given a set of labeled samples (x, y) drawn from a distribution D on R^(d+1), where the marginal distribution over the unlabeled points x is arbitrary and the labels y are generated by an unknown halfspace corrupted by Massart noise at rate η < 1/2, the goal is to find a hypothesis h that minimizes the misclassification error.
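The noise model above is easy to simulate. Below is a minimal numpy sketch of Massart-noisy halfspace data; all concrete values (the noise bound η = 0.1, the Gaussian marginal, the sample size) are illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eta = 1000, 5, 0.1          # samples, dimension, noise upper bound (illustrative)

w = rng.normal(size=d)
w /= np.linalg.norm(w)            # the unknown target halfspace
X = rng.normal(size=(n, d))       # the marginal over x may be arbitrary; Gaussian here
clean_y = np.sign(X @ w)

# Massart noise: an adversary may pick a DIFFERENT flip probability
# eta_x <= eta < 1/2 for every single point.
eta_x = rng.uniform(0.0, eta, size=n)
y = np.where(rng.random(n) < eta_x, -clean_y, clean_y)

def misclassification_error(v):
    """Error of the hypothesis h(x) = sign(v . x) on the noisy sample."""
    return np.mean(np.sign(X @ v) != y)
```

Note that even the target halfspace w itself incurs error equal to the realized flip rate, which is why η + ε (rather than ε) is the benchmark the paper’s algorithm achieves.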

For this problem, the authors propose a poly(d, 1/ε)-time algorithm with misclassification error η + ε. They also provide evidence that improving on this error guarantee may be computationally hard. The authors note that before their work, no efficient weak (distribution-independent) learner was known in this model, even for the class of disjunctions.

An efficient algorithm for halfspaces (or even disjunctions) in this model had been an open question raised in a range of works, from Sloan (1988) and Cohen (1997) to Avrim Blum’s FOCS 2003 tutorial, which highlighted the problem.

Comment: This paper studies binary classification with linear threshold functions when the training data carries unknown, bounded (Massart) label noise. It resolves a fundamental, long-standing open question in a core area of machine learning by giving an efficient algorithm for learning halfspaces under Massart noise, a major contribution.

For example, even weakly learning disjunctions (to error 49%) under 1% Massart noise was open. This paper shows how to efficiently achieve excess risk equal to the Massart noise rate plus ε, in time poly(1/ε). The algorithm is quite sophisticated and the result is technically demanding. The remaining open goal is to efficiently achieve excess risk of ε itself (again in time poly(1/ε)).

Outstanding New Direction Paper Award

Paper name: Uniform convergence may be unable to explain generalization in deep learning

Authors: Vaishnavh Nagarajan, J. Zico Kolter

Institutions: Carnegie Mellon University, Bosch Center for Artificial Intelligence


Abstract: To explain the surprisingly good generalization of over-parameterized deep networks, recent papers have developed a variety of generalization bounds for deep learning, all based on uniform convergence, a fundamental technique of learning theory.

As is well known, many of these existing bounds are numerically large. Through extensive experiments, the researchers reveal a more troubling aspect of the bounds: they can in fact increase as the training set grows.

Guided by these observations, they then present examples of over-parameterized linear classifiers and of neural networks trained with gradient descent (GD) for which uniform convergence provably cannot “explain generalization,” even when the implicit bias of gradient descent is taken into account as fully as possible.

More precisely, even when considering only the set of classifiers output by gradient descent whose test errors are below some small ε in the given setting, the researchers show that applying (two-sided) uniform convergence to this set of classifiers yields only a vacuous generalization guarantee larger than 1 − ε. Based on these findings, they cast doubt on the power of uniform-convergence-based generalization bounds to fully explain why over-parameterized deep networks generalize well.

Comment: This paper presents fundamentally negative results, showing that many existing (norm-based) bounds on the performance of deep learning algorithms cannot deliver what they claim. The authors further argue that this shortcoming persists no matter how the two-sided uniform convergence machinery is applied. Although the paper does not solve (nor pretend to solve) the generalization problem in deep neural networks, it is a starting point that signals the community is beginning to look at deep learning from a new perspective.

Outstanding Paper Award Honorable Mention

Paper name: Nonparametric Density Estimation & Convergence Rates for GANs under Besov IPM Losses

Authors: Ananya Uppal, Shashank Singh, Barnabás Póczos

Institution: Carnegie Mellon University

Paper address: https://papers.nips.cc/paper/9109-nonparametric-density-estimation-convergence-rates-for-gans-under-besov-ipm-losses

Abstract: The researchers study nonparametric density estimation under a large family of losses, the Besov integral probability metrics (IPMs), which includes the L^p distances, the total variation distance, and generalized versions of the Wasserstein and Kolmogorov–Smirnov distances. Under a variety of settings of the loss, they provide upper and lower bounds clarifying precisely how the choice of loss function and the assumptions on the data determine the minimax convergence rate.

The researchers also show that linear distribution estimates, such as the empirical distribution or kernel density estimators, often fail to achieve the optimal convergence rate. Their upper and lower bounds generalize, unify, or improve several classical results. Moreover, since IPMs can be used to formalize the statistical model of generative adversarial networks, the researchers show how their results imply statistical error bounds for GANs, for example that GANs strictly outperform the best linear estimator.
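An integral probability metric over a function class F is d_F(P, Q) = sup over f in F of |E_P f − E_Q f|; choosing F to be the indicators of half-lines recovers the Kolmogorov–Smirnov distance. A small numpy sketch of this special case (illustrative only, not the paper’s estimators; sample sizes and distributions are arbitrary):

```python
import numpy as np

def ks_ipm(xs, ys):
    """Empirical Kolmogorov-Smirnov distance, viewed as an IPM with
    F = {indicator functions 1[t <= s]}: the sup over s of the gap
    between the two empirical CDFs, attained at pooled sample points."""
    grid = np.sort(np.concatenate([xs, ys]))
    F_p = np.searchsorted(np.sort(xs), grid, side="right") / len(xs)
    F_q = np.searchsorted(np.sort(ys), grid, side="right") / len(ys)
    return np.max(np.abs(F_p - F_q))

rng = np.random.default_rng(1)
same = ks_ipm(rng.normal(size=2000), rng.normal(size=2000))          # near 0
shift = ks_ipm(rng.normal(size=2000), rng.normal(1.0, 1.0, size=2000))
```

Richer choices of F (e.g. balls in a Besov space, or the discriminators of a GAN) give other IPMs of the family studied in the paper.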

Comment: Through rigorous theoretical analysis, this paper shows that GANs hold an advantage over linear methods in density estimation (in terms of convergence rate). Drawing on earlier results on wavelet shrinkage, the paper provides new insight into the representational power of GANs. Specifically, the researchers derive minimax convergence rates for nonparametric density estimation under a large class of losses (the integral probability metrics) over large function classes (Besov spaces).

Reviewers believe this paper will have a profound impact on research in nonparametric estimation and GANs.

Paper name: Fast and Accurate Least-Mean-Squares Solvers

Authors: Alaa Maalouf, Ibrahim Jubran, Dan Feldman

Institution: University of Haifa, Israel

Paper address: https://papers.nips.cc/paper/9040-fast-and-accurate-least-mean-squares-solvers

Abstract: Least-mean-squares solvers lie at the core of machine learning, from linear regression to decision trees and matrix factorization. The researchers propose a new algorithm that takes as input a finite set of n d-dimensional real vectors and outputs a weighted subset of d + 1 of them whose weighted mean equals the mean of the input set. Carathéodory’s theorem (1907) guarantees that such a subset exists, but the classical construction takes O(n^2 d^2) time, which is impractical.
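For concreteness, here is a minimal numpy sketch of the classical Carathéodory construction described above, i.e. the slow O(n^2 d^2) routine that the paper’s O(nd) algorithm accelerates (a hand-written illustration, not the authors’ code):

```python
import numpy as np

def caratheodory(P, w, tol=1e-12):
    """Classical Caratheodory reduction: given n points P in R^d with
    positive weights w, return <= d+1 points and weights with the same
    weighted sum. Each pass finds a direction v with zero total weight
    and zero weighted point-sum, then shifts the weights along v until
    one of them hits zero, preserving the weighted mean throughout."""
    P, w = np.asarray(P, dtype=float), np.array(w, dtype=float)
    n, d = P.shape
    while np.count_nonzero(w > tol) > d + 1:
        idx = np.flatnonzero(w > tol)
        A = (P[idx[1:]] - P[idx[0]]).T           # d x (m-1); m-1 > d => null space
        v = np.empty(len(idx))
        v[1:] = np.linalg.svd(A)[2][-1]          # a null-space vector of A
        v[0] = -v[1:].sum()                      # so sum(v) = 0 and sum(v_i P_i) = 0
        if not (v > 0).any():
            v = -v
        pos = v > 0
        ratios = w[idx[pos]] / v[pos]
        alpha = ratios.min()
        w[idx] = w[idx] - alpha * v              # weighted sum is unchanged
        w[idx[pos][ratios.argmin()]] = 0.0       # one weight removed per pass
    keep = w > tol
    return P[keep], w[keep]
```

Each pass costs a null-space computation over the surviving points, which is where the quadratic cost comes from; the paper’s contribution is to avoid running this on all n points at once.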

The researchers’ algorithm computes such a subset in only O(nd) time, using O(log n) calls to the Carathéodory construction on smaller but “smarter” subsets. This is a new paradigm that mixes different data-summarization techniques, such as sketches and coresets. As an application, the researchers show how to boost the performance of existing LMS solvers, such as those in the scikit-learn library, with speedups of up to 100x. Experimental results and open-source code are also provided.

In overview, the FAST-CARATHEODORY-SET algorithm proceeds as follows: compute a balanced partition of the input into clusters; compute a sketch of each cluster; run the Carathéodory construction on the set of all sketches to obtain a coreset B; take the union C of the clusters corresponding to the previously computed B; and finally recurse on C until the coreset is sufficiently small.


Comment: From linear and Lasso regression to singular value decomposition and elastic nets, least-mean-squares solvers are at the core of many machine learning algorithms. This paper shows how to reduce the computational complexity of these solvers by one to two orders of magnitude without losing accuracy, while also improving numerical stability.

This method relies on Carathéodory’s theorem, which shows that a coreset of d^2 + 1 points (in dimension d) suffices to represent all n points of a convex set. The novelty of the paper is a divide-and-conquer algorithm that extracts such a coreset with acceptable complexity (O(nd + d^5 log n), where d << n).

Reviewers emphasized the importance of this method: it is practical, can be implemented quickly to improve the performance of existing algorithms, and can also be extended to other algorithms, because its recursive partitioning principle is highly general.

Outstanding New Direction Paper Award Honorable Mention

Paper name: Putting An End to End-to-End: Gradient-Isolated Learning of Representations

Authors: Sindy Löwe, Peter O’Connor, Bastiaan Veeling

Institution: University of Amsterdam

Paper address: https://papers.nips.cc/paper/8568-putting-an-end-to-end-to-end-gradient-isolated-learning-of-representations

Abstract: In this paper, the researchers propose a novel deep learning method for local, self-supervised representation learning that requires neither labels nor end-to-end backpropagation, and instead exploits the natural ordering in the data. Inspired by the observation that biological neural networks appear to learn without backpropagating a global error signal, the researchers split a deep neural network into gradient-isolated modules.

During training, each module is trained to maximally preserve the information of its inputs using the InfoNCE bound proposed by Oord et al. (2018). Despite this greedy training, the results show that each module improves upon the output of the previous one. In audio and vision experiments, the representations created by the top module achieve highly competitive results on downstream classification tasks.

The proposed method allows the modules to be optimized asynchronously, enabling very deep neural networks to be trained on unlabeled datasets in a large-scale distributed setting.
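The per-module InfoNCE objective can be sketched in a few lines. The log-bilinear score and all concrete shapes below are illustrative assumptions (in the paper, the encodings come from the module being trained and the score matrix W is learned):

```python
import numpy as np

def info_nce_loss(z_t, z_pos, z_negs, W):
    """InfoNCE (Oord et al., 2018) sketch: the loss of classifying the one
    'positive' future encoding against k negatives under the log-bilinear
    score f(a, b) = exp(b^T W a). Minimizing this loss maximizes a lower
    bound on the mutual information between z_t and z_pos."""
    candidates = np.vstack([z_pos, z_negs])   # row 0 is the positive
    scores = candidates @ W @ z_t
    log_softmax = scores - np.log(np.exp(scores).sum())
    return -log_softmax[0]

rng = np.random.default_rng(0)
dim, k = 32, 10
W = np.eye(dim)                               # learned in the real model
z_t = np.ones(dim)                            # fixed "present" encoding for the demo
z_pos = z_t + 0.1 * rng.normal(size=dim)      # correlated "future" encoding
z_negs = rng.normal(size=(k, dim))            # random negative samples
loss = info_nce_loss(z_t, z_pos, z_negs, W)
```

Because each module computes this loss on its own inputs and outputs, no gradient needs to cross module boundaries, which is what makes asynchronous, distributed training possible.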

Comment: Building on the unsupervised criterion proposed by Oord et al., this paper revisits the layer-wise construction of deep networks, in particular the mutual information between representations of the current input and of temporally adjacent inputs. The self-organization of this perceptual network may be of interest both algorithmically (it avoids end-to-end optimization, with its huge memory footprint and compute issues) and cognitively (it exploits the “slow features” property toward a biologically plausible learning process).

Paper name: Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

Authors: Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein

Institution: Stanford University

Paper address: https://papers.nips.cc/paper/8396-scene-representation-networks-continuous-3d-structure-aware-neural-scene-representations

Abstract: Unsupervised learning with generative models has the potential to discover rich representations of 3D scenes. Although geometric deep learning has explored 3D-structure-aware representations of scene geometry, these models typically require explicit 3D supervision. Emerging neural scene representations can be trained using only posed 2D images, but existing methods ignore the three-dimensional structure of the scene.

In this paper, the researchers propose Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance. An SRN represents a scene as a continuous function that maps world coordinates to a feature representation of local scene properties.

By formulating the image formation process as a differentiable ray-marching algorithm, SRNs can be trained end to end using only 2D images and their camera poses, with no access to depth or shape. The method generalizes naturally across scenes, learning powerful geometry and appearance priors along the way.
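The ray-marching idea can be illustrated with classical sphere tracing against a hand-written signed distance function. This is only a toy stand-in: the paper’s SRN learns the scene function from images and predicts step lengths with an LSTM, rather than using an analytic distance as below:

```python
import numpy as np

def ray_march(origin, direction, sdf, n_steps=50, eps=1e-3):
    """Sphere-tracing sketch of ray marching: step along a (unit-length)
    camera ray until an implicit surface is hit, where the step size is
    the distance to the nearest surface, so the ray can never overshoot."""
    t = 0.0
    for _ in range(n_steps):
        p = origin + t * direction
        d = sdf(p)                 # distance to the nearest surface
        if d < eps:
            return p               # ray-surface intersection found
        t += d
    return None                    # no hit within the step budget

# a unit sphere at the origin as a toy scene
sphere = lambda p: np.linalg.norm(p) - 1.0
hit = ray_march(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]), sphere)
```

Because every step of such a marcher is a differentiable function of the scene representation, gradients from a 2D reconstruction loss can flow back through the intersection search, which is what enables training from images and poses alone.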

Comment: This paper elegantly combines the two main approaches in computer vision: multi-view geometry and deep representations. Specifically, it makes three major contributions: 1) a per-voxel neural renderer, which enables resolution-free, 3D-aware rendering of scenes; 2) a differentiable ray-marching algorithm, which addresses the problem of finding ray-surface intersections along camera rays; 3) a latent scene representation, which uses autoencoders and hypernetworks to regress the parameters of the scene representation network.

Test of Time Award

Last year, the Classic Paper Award at the NeurIPS conference was given to researchers from NEC and Google. The criteria for this award are “significant contribution, lasting impact, and broad appeal.”