This paper presents a scalable approach for semi-supervised learning on graph-structured data using an efficient variant of convolutional neural networks that operates directly on graphs. The authors motivate their convolutional architecture using a localized first-order approximation of spectral graph convolutions. The paper reports linear scaling in the number of graph edges and hidden representations that encode both local graph structure and node features. Experiments on citation networks and a knowledge graph dataset show the approach outperforming related methods by a significant margin.
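The first-order propagation rule this summary refers to can be sketched in a few lines. This is a minimal dense-numpy illustration, not the paper's sparse implementation; the graph, features, and weights below are arbitrary toy values.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One layer of the first-order approximation:
    H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)                   # node degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(0.0, A_norm @ H @ W)  # ReLU activation

# Tiny 3-node path graph, 2 input features, 4 hidden units
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.random.randn(3, 2)
W = np.random.randn(2, 4)
out = gcn_layer(A, H, W)
```

With a sparse adjacency matrix, the product `A_norm @ H` touches each edge once, which is where the linear-in-edges scaling comes from.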
This brief summarizes a paper that trains image models from natural language supervision by predicting which caption matches which image at internet scale, then uses language to enable zero-shot transfer to many downstream vision tasks.
This paper analyzes why many machine learning models, including neural networks, misclassify adversarial examples created by small worst-case perturbations, and it presents a fast method to generate such examples for adversarial training.
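The fast method mentioned above is the fast gradient sign method: perturb each input coordinate by a small step in the sign of the loss gradient. A minimal sketch, using a hand-computed gradient for a toy linear loss rather than a real trained model:

```python
import numpy as np

def fgsm_perturb(x, grad_x, eps=0.1):
    """Fast gradient sign method: x_adv = x + eps * sign(dL/dx),
    moving each coordinate in the direction that increases the loss."""
    return x + eps * np.sign(grad_x)

# Toy example: linear loss L(x) = -w . x, so dL/dx = -w
x = np.array([0.5, -0.2, 0.1])
w = np.array([1.0, -1.0, 0.0])
x_adv = fgsm_perturb(x, -w, eps=0.1)
```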
This paper introduces conditional generative adversarial nets (cGANs) by feeding a conditioning variable y to both the generator and discriminator, and reports demonstrations on MNIST class-conditional digit generation plus preliminary examples for multimodal modeling and image tagging.
This paper proposes extending an encoder–decoder neural machine translation model by letting the model soft-search the source sentence for the parts most relevant to predicting each target word, addressing a conjectured bottleneck from encoding the entire source sentence into a single fixed-length vector.
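The soft-search can be sketched as attention over source annotations: score each source position against the current decoder state, softmax the scores into weights, and take the weighted sum as the context vector. The sketch below uses simple dot-product scoring as a stand-in for the paper's small feed-forward alignment model, with arbitrary toy dimensions:

```python
import numpy as np

def soft_attention(query, keys, values):
    """Score each source annotation against the decoder state (query),
    normalize with softmax, and return the weighted context vector."""
    scores = keys @ query                 # one score per source position
    w = np.exp(scores - scores.max())     # stable softmax
    w = w / w.sum()
    return w @ values, w

# 5 source positions with 4-dim annotations, a 4-dim decoder query
keys = np.random.randn(5, 4)
values = np.random.randn(5, 4)
query = np.random.randn(4)
context, weights = soft_attention(query, keys, values)
```

Because the context vector is recomputed for every target word, the decoder is no longer forced through a single fixed-length encoding of the source sentence.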
TensorFlow is presented as a machine learning system that operates at large scale and in heterogeneous environments. The paper describes TensorFlow as using dataflow graphs to represent computation, shared state, and the operations that mutate that state.
This paper studies transfer learning for NLP through a single text-to-text framework, comparing pre-training objectives, architectures, data, and transfer approaches across many tasks, and reporting state-of-the-art results on multiple benchmarks using scale and the Colossal Clean Crawled Corpus.
A brief summary of arXiv:1412.3555, which compares recurrent units in RNNs and reports that gated units such as LSTM and GRU outperform traditional tanh units on polyphonic music and speech signal modeling tasks, with GRU comparable to LSTM.
MobileNets introduces an efficient CNN family for mobile and embedded vision that uses depthwise separable convolutions and two global hyper-parameters to trade off latency and accuracy across tasks such as ImageNet classification and object detection.
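The efficiency gain of a depthwise separable convolution is easy to see by counting multiply-adds: a k×k depthwise pass per channel plus a 1×1 pointwise pass replaces one dense k×k convolution. A small arithmetic sketch with illustrative layer sizes (not taken from the paper):

```python
def conv_cost(k, c_in, c_out, h, w):
    """Multiply-adds of a standard k x k convolution."""
    return k * k * c_in * c_out * h * w

def separable_cost(k, c_in, c_out, h, w):
    """Depthwise (k x k per input channel) plus pointwise (1 x 1) convolution."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise

# 3x3 conv, 32 -> 64 channels, on a 14x14 feature map
std = conv_cost(3, 32, 64, 14, 14)
sep = separable_cost(3, 32, 64, 14, 14)
ratio = sep / std  # equals 1/c_out + 1/k^2
```

For a 3×3 kernel the ratio is roughly 1/9 plus 1/c_out, i.e. an 8–9× reduction in computation, which is the saving the width and resolution hyper-parameters then trade against accuracy.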
This brief summarizes the TensorFlow paper (arXiv:1603.04467), focusing on what it claims about expressing machine learning computations and executing them across heterogeneous devices from mobile hardware to distributed clusters.
YOLOv4 studies which CNN features and training techniques reliably improve object detection, and it reports combining selected components such as CSP, CmBN, SAT, Mish, Mosaic augmentation, DropBlock, and CIoU loss to reach state-of-the-art results.
This paper describes knowledge distillation as a way to compress the predictive behavior of an expensive ensemble into a single model that is easier to deploy, and it reports results on MNIST and an acoustic model used in a commercial system.
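The core mechanism is training the small model on the teacher's "soft targets": class probabilities produced by a softmax with a raised temperature, which exposes the teacher's relative probabilities over incorrect classes. A minimal sketch with made-up logits:

```python
import numpy as np

def soft_targets(logits, T=4.0):
    """Softmax with temperature T; higher T softens the distribution."""
    z = logits / T
    z = z - z.max()        # numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([6.0, 2.0, -1.0])
hard = soft_targets(teacher_logits, T=1.0)  # near one-hot
soft = soft_targets(teacher_logits, T=4.0)  # wrong classes get visible mass
```

The student is then trained to match `soft` (typically alongside the true labels), transferring the teacher's learned similarity structure between classes.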
This paper presents an end-to-end approach to mapping one sequence to another using multilayered LSTMs in an encoder–decoder setup, and it reports results on WMT’14 English-to-French translation, achieving a BLEU score of 34.8 on the full test set with the score penalized on out-of-vocabulary words.
This brief summarizes the PyTorch paper (arXiv:1912.01703), which argues that usability and speed can be compatible in a deep learning framework through an imperative, Pythonic design that remains efficient and supports accelerators like GPUs.
This paper proposes two new model architectures for learning continuous vector representations of words from very large datasets, and it reports improved accuracy on word similarity evaluations at substantially lower computational cost, including training high-quality vectors from 1.6 billion words in less than a day.
This brief summarizes arXiv:1310.4546, which extends the continuous Skip-gram model to improve vector quality and training speed, introduces subsampling of frequent words and negative sampling, and discusses phrase learning to address word-order and idiom limitations.
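The negative-sampling objective mentioned above can be sketched for a single (center, context) pair: pull the true context vector toward the center word and push a few sampled "noise" context vectors away. The vectors below are arbitrary toy values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_center, u_pos, u_negs):
    """Per-pair objective: log sigma(u_pos . v) + sum_k log sigma(-u_neg_k . v),
    returned as a negative log-likelihood to minimize."""
    pos = np.log(sigmoid(u_pos @ v_center))
    neg = sum(np.log(sigmoid(-u @ v_center)) for u in u_negs)
    return -(pos + neg)

v = np.array([1.0, 0.0])
good = neg_sampling_loss(v, u_pos=np.array([2.0, 0.0]),
                         u_negs=[np.array([-2.0, 0.0])])
bad = neg_sampling_loss(v, u_pos=np.array([-2.0, 0.0]),
                        u_negs=[np.array([2.0, 0.0])])
```

Replacing the full softmax over the vocabulary with a handful of such binary comparisons is what makes training fast at scale.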
Faster R-CNN (arXiv:1506.01497) introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, making region proposals nearly cost-free, and it reports merging the RPN and Fast R-CNN into a single unified network in which the RPN component tells the detector where to look.
This paper presents scikit-learn as a Python module that integrates a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. It states a focus on bringing machine learning to non-specialists using a general-purpose high-level language, with emphasis on ease of use, performance, documentation, and API consistency. It also reports minimal dependencies, simplified BSD licensing intended to encourage academic and commercial use, and public availability of code, binaries, and documentation via scikit-learn.org.
This paper introduces Vision Transformer (ViT), a “pure transformer” approach that treats an image as a sequence of patches and applies a Transformer directly for image classification. The authors report strong transfer results after large-scale pre-training and state that ViT can match or outperform state-of-the-art convolutional networks while using substantially fewer computational resources to train.
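Treating an image as a sequence of patches amounts to splitting it into non-overlapping p×p tiles and flattening each tile into a vector (which is then linearly projected and fed to a standard Transformer). A minimal numpy sketch of the patchify step, using a tiny made-up image:

```python
import numpy as np

def image_to_patches(img, p):
    """Split an H x W x C image into non-overlapping p x p patches,
    flattening each into a vector: the Transformer's input sequence."""
    h, w, c = img.shape
    patches = img.reshape(h // p, p, w // p, p, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * c)
    return patches

img = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
seq = image_to_patches(img, 2)  # 4 patches, each a 12-dim vector
```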
Batch Normalization introduces per-mini-batch normalization of layer inputs as part of the network architecture to reduce “internal covariate shift,” accelerate training, enable higher learning rates, relax initialization sensitivity, and sometimes reduce the need for Dropout.
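The per-mini-batch normalization step can be sketched directly: normalize each feature using the batch's mean and variance, then scale and shift with learned parameters gamma and beta. A minimal training-mode sketch with made-up data (inference-time running statistics are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then apply the
    learned scale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Batch of 4 examples, 2 features, with very different scales
x = np.array([[1.0, 10.0], [3.0, 14.0], [5.0, 18.0], [7.0, 22.0]])
y = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
```

After this step each feature has roughly zero mean and unit variance within the batch, which is what lets later layers tolerate higher learning rates and rougher initialization.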
This paper evaluates how increasing convolutional network depth affects accuracy in large-scale image recognition, using architectures built from very small 3×3 convolution filters; it reports significant improvements from pushing depth to 16–19 weight layers, describes the models behind an ImageNet Challenge 2014 submission, and notes that the representations generalize well to other datasets.