What does arXiv:1409.1556 report as its main contribution?

The paper states that its main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3×3) convolution filters. [S1]

Very Deep Convolutional Networks (VGG) paper brief — arXiv:1409.1556

What depth range does the paper associate with significant improvements over prior-art configurations?

The paper reports that a significant improvement on prior-art configurations can be achieved by pushing network depth to 16–19 weight layers. [S1]

This paper evaluates how increasing convolutional network depth affects accuracy in large-scale image recognition. The architecture is built from very small 3×3 convolution filters, and the paper reports significant improvements from pushing depth to 16–19 weight layers. It also describes results connected to an ImageNet Challenge 2014 submission and the transfer of the learned representations to other datasets.

What this paper is about

The paper investigates how convolutional network depth affects accuracy in a large-scale image recognition setting. [S1] Its main contribution is a thorough evaluation of networks of increasing depth, using an architecture with very small 3×3 convolution filters. [S1] The paper states that a significant improvement on prior-art configurations can be achieved by pushing the depth to 16–19 weight layers. [S1] It connects these findings to the authors’ ImageNet Challenge 2014 submission, where the team secured first place in the localisation track and second place in the classification track. [S1] The paper also reports that the learned representations generalise well to other datasets, where they achieve state-of-the-art results. [S1] Finally, the authors state that they have made their two best-performing ConvNet models publicly available to facilitate further research on deep visual representations in computer vision. [S1]
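To make the role of the very small 3×3 filters concrete, the following is a minimal arithmetic sketch. It assumes stride-1 convolutions with the same number of input and output channels, and it ignores biases. The function names and the channel width `C` are hypothetical, chosen only for illustration. Under those assumptions, a stack of 3×3 layers covers the same receptive field as a single larger filter while using fewer weights.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stride-1 convolutions sharing one kernel size."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weight count (biases ignored) for conv layers with `channels` inputs and outputs."""
    return num_layers * kernel * kernel * channels * channels


C = 256  # hypothetical channel width, used only for illustration

for n in (2, 3):
    rf = stacked_receptive_field(n)    # field covered by n stacked 3x3 layers
    stacked = conv_weights(3, C, n)    # weights in the stack of 3x3 layers
    single = conv_weights(rf, C)       # weights in one layer with the same field
    print(f"{n} stacked 3x3 layers: receptive field {rf}x{rf}, "
          f"{stacked:,} weights vs {single:,} for one {rf}x{rf} layer")
```

The comparison holds per unit of channel width squared, so the choice of `C` changes the absolute numbers but not the ratio.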

Core claims to remember

The paper’s main contribution is described as a thorough evaluation of networks of increasing depth. [S1] The evaluation uses an architecture built around very small 3×3 convolution filters. [S1] Pushing depth to 16–19 weight layers is reported to yield a significant improvement over prior-art configurations in the evaluated setting. [S1] The paper states that these findings were the basis of the authors’ ImageNet Challenge 2014 submission. [S1] The team is reported to have placed first in the localisation track and second in the classification track. [S1] The learned representations are reported to generalise well to other datasets. [S1] The paper reports that the representations achieve state-of-the-art results on those datasets. [S1] The two best-performing ConvNet models were made publicly available to facilitate further research on deep visual representations. [S1]
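The phrase "16–19 weight layers" counts only layers with learnable weights, that is, convolutional and fully connected layers, not pooling. The sketch below shows that count. The layer lists are an assumption: they follow the commonly cited 16- and 19-layer configurations and are not stated in this brief, so verify them against the paper's configuration table. The names `CONFIG_16`, `CONFIG_19`, and `count_weight_layers` are hypothetical.

```python
# Assumed layer lists (verify against the paper): integers are conv output
# channels, "M" marks a max-pooling layer, which carries no weights.
CONFIG_16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
CONFIG_19 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

NUM_FC_LAYERS = 3  # assumed number of fully connected layers after the conv stack


def count_weight_layers(config, num_fc=NUM_FC_LAYERS):
    """Count conv layers (every non-pooling entry) plus fully connected layers."""
    conv_layers = sum(1 for item in config if item != "M")
    return conv_layers + num_fc


for name, config in (("CONFIG_16", CONFIG_16), ("CONFIG_19", CONFIG_19)):
    print(name, "weight layers:", count_weight_layers(config))
```

Keeping the pooling positions fixed and varying only the number of convolutional layers per block is one way to read "increasing depth" within a single architecture family.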

Limitations and caveats

The paper’s reported evaluation is tied to an architecture that uses very small 3×3 convolution filters. [S1] The highlighted improvement is reported for a specific depth range, 16–19 weight layers. [S1] The primary accuracy investigation is set in large-scale image recognition. [S1] The competition results refer to the localisation and classification tracks of the ImageNet Challenge 2014, as described in the paper. [S1] The paper reports that the representations generalise to other datasets and achieve state-of-the-art results on them. [S1]

How to apply this in study or projects

Read the parts of the paper that “investigate the effect of the convolutional network depth on its accuracy” and note the specific accuracy comparisons the paper reports. [S1] Follow the “thorough evaluation of networks of increasing depth” and extract the depth values that the paper treats as increasing depth within the evaluated family. [S1] Track how the architecture uses “very small (3×3) convolution filters” and list the architectural choices that the paper keeps constant while depth increases. [S1] Locate where the paper reports that improvement is achieved “by pushing the depth to 16–19 weight layers,” and record the exact language used for “significant improvement” and “prior-art configurations.” [S1] Review the section that connects the findings to the authors’ “ImageNet Challenge 2014 submission,” and copy the reported placements for the localisation and classification tracks exactly as stated. [S1] Extract the passages where the paper states that “representations generalise well to other datasets” and where it reports “state-of-the-art results,” and list the datasets and evaluation settings as the paper presents them. [S1] Find the references in the paper to the “two best-performing ConvNet models” that were made “publicly available,” and note the access details and any usage statements included by the authors. [S1]

Paper brief: Very Deep Convolutional Networks for Large-Scale Image Recognition (arXiv:1409.1556)
