Awesome CVPR Paper

CVPR 2022/2021/2020/2019/2018/2017: latest paper downloads

CVPR2022 Accepted Paper List

CVPR2022 Baidu Cloud download: in preparation

Reply "CVPR2021" to the public account [Computer Vision Alliance] to receive the Baidu Cloud link

CVPR2021: the most complete collection, 1660 paper PDFs (4.3 GB)

Link: https://pan.baidu.com/s/1GWkqUOcO6KMOu-uLJrSpbA Extraction code: vwkx

Latest recommendations:

3D human pose estimation / pose estimation / Transformer / 3D human reconstruction

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Paper: https://arxiv.org/pdf/2111.12707.pdf

Code: https://github.com/Vegetebird/MHFormer

【1】CVPR2021 acceptance results announced

Reply "CVPR2021" to the public account [Computer Vision Alliance] to download the latest papers
CVPR 2021
All accepted CVPR 2021 papers

【2】CVPR2021 latest updated papers

Fully Convolutional Networks for Panoptic Segmentation

This paper aims to represent and predict objects (things) and their surrounding environment (stuff) in a unified, fully convolutional way, achieving accurate and efficient panoptic segmentation. Specifically, it proposes a kernel generator that encodes the semantic information of each object and each type of stuff into its own convolution kernel, and convolves these kernels with a high-resolution feature map to directly output the segmentation of every foreground instance and background region. This preserves instance-level distinctions for objects and semantic consistency for stuff. The method achieves state-of-the-art speed and accuracy on multiple panoptic segmentation datasets. Keywords: unified representation, dynamic convolution, panoptic segmentation arxiv: https://arxiv.org/abs/2012.00720 github: https://github.com/yanwei-li/PanopticFCN
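The "kernel generator plus high-resolution feature map" step can be sketched as a dynamic convolution. This is a minimal sketch, assuming each generated kernel is a 1x1 kernel (a C-dimensional vector) and that the kernels are already available; the function name and shapes are hypothetical, and the kernel generator itself is not modeled.

```python
import numpy as np

def dynamic_conv_masks(features: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Apply per-instance 1x1 kernels to a shared feature map (hypothetical helper).

    features: (C, H, W) high-resolution feature map.
    kernels:  (K, C) one generated kernel per object (thing) or stuff category.
    Returns:  (K, H, W) mask probabilities, one map per kernel.
    """
    # A 1x1 convolution with K output channels is a dot product over C at every pixel.
    logits = np.einsum("kc,chw->khw", kernels, features)
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid -> per-pixel mask probability

rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 32, 32))  # hypothetical C=64, H=W=32
kernels = rng.standard_normal((5, 64))     # hypothetical: 3 thing kernels + 2 stuff kernels
masks = dynamic_conv_masks(feats, kernels)
print(masks.shape)  # (5, 32, 32)
```

Because every kernel shares the same feature map, adding an instance only adds one K-row dot product per pixel rather than a separate mask head.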

oral paper

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

FFB6D proposes a full-flow bidirectional fusion network, an RGBD representation learning framework, and applies it to 6D pose estimation. We find that existing representation learning methods fail to make good use of two complementary data sources: appearance information in RGB images and geometric information in depth maps (point clouds).

To this end, we design a bidirectional dense fusion module and apply it to every encoding and decoding layer of the CNN and the point cloud network. This full-flow bidirectional fusion mechanism lets the two networks make full use of the local and global complementary information extracted by each other, yielding better representations for downstream prediction tasks. In addition, for the choice of output representation, we design a SIFT-FPS keypoint selection algorithm based on the texture and geometric information of the object, which makes keypoints easier for the network to locate and improves pose accuracy. Our method achieves significant improvements on multiple benchmarks. The RGBD representation learning backbone can also be applied to other vision tasks with RGBD input by cascading different prediction networks. Keywords: RGBD representation learning, 3D vision, 6D pose estimation PDF: https://arxiv.org/abs/2103.02242 code: https://github.com/ethnhe/FFB6D
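The FPS half of SIFT-FPS is farthest point sampling: greedily pick keypoints that are as far apart as possible. A minimal sketch, assuming the candidate 3D points (for example SIFT keypoints lifted to 3D with depth) are already given; the SIFT detection step and the function name are assumptions, not FFB6D's code.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int, start: int = 0) -> np.ndarray:
    """Greedily pick k indices; each new point is the one farthest from all chosen so far.

    points: (N, 3) candidate 3D points.
    """
    n = points.shape[0]
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(1, min(k, n)):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

pts = np.array([[0.0, 0, 0], [1, 0, 0], [0.1, 0, 0], [5, 0, 0]])
print(farthest_point_sampling(pts, 3))  # [0 3 1]
```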

RepVGG: Making VGG-style ConvNets Great Again

Science and technology always spiral upward. We have "revived" the plain, single-path, VGG-style minimalist convolutional architecture, built from nothing but 3x3 convolutions from start to finish. It reaches the SOTA level in both speed and accuracy, with over 80% top-1 accuracy on ImageNet.

To overcome the difficulty of training a VGG-style architecture, we use structural re-parameterization: during training, the model contains an identity branch and a 1x1 convolution branch alongside each 3x3 convolution, and after training these branches are equivalently merged into the 3x3 convolution, so the model contains only 3x3 convolutions at inference time. Because the architecture has no branches, it is highly parallel and very fast. And since its body consists of a single operator, "3x3-ReLU", it is particularly suitable for customized hardware. Keywords: structural re-parameterization, minimalist architecture, efficient model https://arxiv.org/abs/2101.03697
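The merge described above works because a 1x1 kernel is a 3x3 kernel that is zero except at the centre, and the identity is a 3x3 kernel with a 1 at the centre of each channel's own filter. A minimal sketch, assuming stride 1, equal input/output channels, and no BN or bias (RepVGG first folds each branch's BN into its kernel and bias, which is omitted here); the helper names are hypothetical.

```python
import numpy as np

def conv2d(x, w, pad):
    """Naive stride-1 cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = xp.shape[1] - k + 1, xp.shape[2] - k + 1
    y = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            y[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=([1, 2, 3], [0, 1, 2]))
    return y

def merge_repvgg_branches(w3, w1):
    """Fold a 1x1 branch and an identity branch into a single 3x3 kernel (BN omitted).

    w3: (C, C, 3, 3), w1: (C, C, 1, 1); the identity branch requires C_in == C_out.
    """
    c = w3.shape[0]
    merged = w3.copy()
    merged[:, :, 1, 1] += w1[:, :, 0, 0]             # 1x1 kernel sits at the 3x3 centre
    merged[np.arange(c), np.arange(c), 1, 1] += 1.0  # identity: centre tap 1 on the channel diagonal
    return merged

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))
w3 = rng.standard_normal((4, 4, 3, 3))
w1 = rng.standard_normal((4, 4, 1, 1))
y_train = conv2d(x, w3, pad=1) + conv2d(x, w1, pad=0) + x  # three training-time branches
y_infer = conv2d(x, merge_repvgg_branches(w3, w1), pad=1)  # one inference-time 3x3 conv
print(np.allclose(y_train, y_infer))  # True
```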

Dynamic Region-Aware Convolution

This paper proposes a new convolution operation, Dynamic Region-Aware Convolution (DRConv), which assigns customized convolution kernels to different spatial regions based on feature similarity. Compared with standard convolution, it greatly enhances the ability to model the diversity of semantic information in images. A standard convolutional layer can extract more visual elements by adding kernels, but at a higher computational cost. DRConv uses a learnable allocator to shift this growth in the number of kernels from the channel dimension to the spatial dimension, which improves the representation ability of convolution while maintaining computational cost and translation invariance.

DRConv is an effective and elegant way to handle the complex and varied distribution of semantic information. Thanks to its plug-and-play nature, it can replace standard convolutions in any existing network, and it brings significant performance improvements to lightweight networks in particular. This paper evaluates DRConv on various models (MobileNet series, ShuffleNetV2, etc.) and tasks (classification, face recognition, detection and segmentation). On ImageNet classification, DRConv-based ShuffleNetV2-0.5× achieves 67.1% accuracy at the 46M multiply-adds level, a 6.3% improvement over the baseline. https://arxiv.org/abs/2003.12243
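The region-aware idea can be illustrated with a hard region assignment: each pixel is assigned to one of m regions, and only that region's kernel filters it. This is a simplified, hypothetical sketch, assuming 1x1 kernels given as fixed arrays; in DRConv the guided mask comes from a learnable branch, the kernels are generated from the input, and the gradient through the argmax is handled specially, none of which is shown here.

```python
import numpy as np

def region_aware_conv1x1(x, guide_logits, region_kernels):
    """Region-aware 1x1 convolution (hypothetical, simplified).

    x:              (C_in, H, W) input features.
    guide_logits:   (m, H, W) scores deciding which of m regions each pixel belongs to.
    region_kernels: (m, C_out, C_in) one 1x1 kernel per region.
    """
    region = np.argmax(guide_logits, axis=0)  # (H, W) hard region index per pixel
    out = np.zeros((region_kernels.shape[1],) + region.shape)
    for r in range(region_kernels.shape[0]):
        mask = region == r
        # Pixels in region r are filtered only with region r's kernel.
        out[:, mask] = region_kernels[r] @ x[:, mask]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4, 4))
guide = np.zeros((2, 4, 4))
guide[0, :, :2] = 1.0  # left half -> region 0
guide[1, :, 2:] = 1.0  # right half -> region 1
kernels = rng.standard_normal((2, 5, 3))
y = region_aware_conv1x1(x, guide, kernels)
```

The cost per pixel stays that of a single 1x1 convolution, even though m different kernels exist in the layer.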

Diverse Branch Block: Building a Convolution as an Inception-like Unit

We propose a basic building block for convolutional networks, the Diverse Branch Block (DBB), which enriches the microstructure of a model during training without changing its macrostructure, thereby improving its performance. After training, the block can be equivalently converted into a single convolution through structural re-parameterization, so it introduces no extra inference overhead.

We summarize six structures that can be equivalently transformed, including sequential 1x1-KxK convolutions and average pooling, and use these six transformations to build a representative Inception-like DBB instance. Applied to a variety of architectures, it achieves significant performance improvements in all cases. Experiments confirm that "nonlinearity during training" (which becomes linear at inference, as with BN) and "diverse connections" (for example, 1x1+3x3 works better than 3x3+3x3) are the keys to DBB's effectiveness. Keywords: structural re-parameterization, no inference overhead, painless improvement
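One of the six transformations, a 1x1 convolution followed by a KxK convolution, folds into a single KxK kernel because the 1x1 step is a per-pixel linear map over channels. A minimal sketch, assuming zero biases and no padding (DBB's handling of padding and BN is not reproduced); the helper names are hypothetical.

```python
import numpy as np

def conv2d(x, w):
    """Naive stride-1, no-padding cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    h, wd = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            y[:, i, j] = np.tensordot(w, x[:, i:i + k, j:j + k], axes=([1, 2, 3], [0, 1, 2]))
    return y

def merge_1x1_kxk(w1, wk):
    """Fold w1: (M, C_in, 1, 1) followed by wk: (C_out, M, K, K) into one (C_out, C_in, K, K) kernel.

    merged[o, i] = sum_m wk[o, m] * w1[m, i]; biases are assumed to be zero.
    """
    return np.einsum("omkl,mi->oikl", wk, w1[:, :, 0, 0])

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 7, 7))
w1 = rng.standard_normal((6, 3, 1, 1))
wk = rng.standard_normal((4, 6, 3, 3))
two_step = conv2d(conv2d(x, w1), wk)     # training-time: two sequential convs
one_step = conv2d(x, merge_1x1_kxk(w1, wk))  # inference-time: one KxK conv
print(np.allclose(two_step, one_step))   # True
```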

Generalized Few-Shot Object Detection without Forgetting

Most prior work improves performance on novel (few-shot) classes at the expense of base (many-shot) classes. This paper proposes a few-shot object detector without forgetting, which achieves better novel-class performance without losing base-class performance. We find that pretrained detectors rarely produce false-positive predictions on unseen classes, and that the RPN is not an ideal class-agnostic component. Based on these two findings, we design two simple and effective structures, Re-detector and Bias-Balanced RPN, which achieve few-shot detection without forgetting while adding only a small number of parameters and little inference time. Keywords: few-shot learning, object detection

Distribution Alignment: A Unified Framework for Long-tail Visual Recognition

This paper proposes a unified framework for visual recognition tasks with long-tailed data distributions. We first analyze existing two-stage methods for long-tailed problems experimentally and identify their main performance bottlenecks. Based on this analysis, we propose a distribution alignment strategy that systematically addresses long-tailed vision tasks.

The framework follows a two-stage design. In the first stage, instance-balanced sampling is used for feature representation learning. In the second stage, we first design an input-aware alignment function to calibrate the scores of each input. At the same time, to introduce a prior on the dataset's class distribution, we design a generalized re-weighting scheme that handles a range of visual tasks such as image classification, semantic segmentation, object detection and instance segmentation. We evaluate the method on these four tasks and achieve significant performance improvements on each. Keywords: image classification, semantic segmentation, object detection, instance segmentation
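A generalized re-weighting scheme can be parameterized by an exponent that interpolates between no re-weighting and inverse-frequency weighting. This sketch assumes the form w_c ∝ (1 / r_c)^ρ, normalized to sum to 1, where r_c is the fraction of training samples in class c; that exact form and the function name are our assumptions, not confirmed by the text above.

```python
def generalized_reweighting(class_counts, rho):
    """Hypothetical form: w_c proportional to (1 / r_c) ** rho, normalized to sum to 1.

    class_counts: number of training samples per class.
    rho = 0 gives uniform weights; rho = 1 gives normalized inverse frequency.
    """
    total = sum(class_counts)
    raw = [(total / n) ** rho for n in class_counts]  # (1 / r_c) ** rho with r_c = n_c / total
    s = sum(raw)
    return [v / s for v in raw]

# Head, medium and tail classes: rarer classes get larger weights as rho grows.
print(generalized_reweighting([900, 90, 10], rho=1.0))
```

Intermediate values of ρ (between 0 and 1) soften the inverse-frequency correction, which is the kind of knob a "generalized" scheme exposes.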

End-to-End Object Detection with Fully Convolutional Network

For the first time, this paper removes NMS (non-maximum suppression) post-processing from a fully convolutional object detector and achieves end-to-end training. We analyze mainstream one-stage object detectors and find that the traditional one-to-many label assignment strategy is the reason they rely on NMS; we therefore propose a prediction-aware one-to-one label assignment strategy. In addition, to improve the performance of one-to-one label assignment, we propose a module that enhances feature representation and an auxiliary loss that accelerates convergence. Without NMS, our method achieves performance comparable to mainstream one-stage detectors. In dense scenes, its recall exceeds the theoretical upper bound of NMS-based detectors. Keywords: end-to-end detection, label assignment, fully convolutional network https://arxiv.org/abs/2012.03544
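Prediction-aware one-to-one assignment gives each ground-truth box exactly one prediction, chosen to maximize a matching quality that mixes classification confidence and localization quality. A minimal sketch, assuming quality = p^(1-α) · IoU^α with α = 0.8 and ignoring any spatial prior; the function name is hypothetical, and the brute-force search over permutations is for clarity only (in practice a Hungarian solver such as `scipy.optimize.linear_sum_assignment` would be used).

```python
from itertools import permutations

def one_to_one_assign(cls_prob, iou, alpha=0.8):
    """Return assignment[g] = anchor index for each GT g, maximizing total quality.

    cls_prob[g][a]: predicted probability of GT g's class at anchor a.
    iou[g][a]:      IoU between anchor a's predicted box and GT g.
    """
    n_gt, n_anchor = len(iou), len(iou[0])
    quality = [[cls_prob[g][a] ** (1 - alpha) * iou[g][a] ** alpha for a in range(n_anchor)]
               for g in range(n_gt)]
    best, best_score = None, float("-inf")
    for perm in permutations(range(n_anchor), n_gt):  # each GT gets a distinct anchor
        score = sum(quality[g][a] for g, a in enumerate(perm))
        if score > best_score:
            best, best_score = list(perm), score
    return best

iou = [[0.9, 0.8, 0.1], [0.85, 0.2, 0.3]]
cls_prob = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
print(one_to_one_assign(cls_prob, iou))  # [1, 0]
```

Greedy per-GT matching would send both GTs to anchor 0; the global one-to-one match instead gives anchor 0 to the second GT and anchor 1 to the first, so every GT has a single positive and no duplicate to suppress.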

OTA: Optimal Transport Assignment for Object Detection

We propose a label assignment (sample matching) strategy for object detection based on optimal transport theory, which uses global information to find the optimal matching. Compared with existing sample matching techniques, it has the following advantages: 1) High detection accuracy. Globally optimal matching helps the detector train stably and efficiently, and ultimately achieves state-of-the-art detection performance on the COCO dataset. 2) A wide range of applicable scenarios. Existing detection algorithms need redesigned strategies or retuned parameters for complex scenes such as dense objects or heavy occlusion. Because the optimal transport model finds the optimal solution as part of its global modeling, it achieves state-of-the-art performance in scenes with dense objects and heavy occlusion without any additional tuning, and has great application potential. Keywords: object detection, optimal transport, sample matching strategy
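In an optimal-transport view of label assignment, each ground-truth box "supplies" a number of positive labels, a background supplier provides the rest, and every anchor "demands" exactly one label. A common way to solve such problems approximately is entropy-regularized transport with Sinkhorn-Knopp iterations. The sketch below assumes a given cost matrix and a fixed k = 2 positives per GT; the cost values and function name are hypothetical, and numerically robust (log-domain) variants are not shown.

```python
import numpy as np

def sinkhorn(cost, supply, demand, eps=0.1, n_iters=100):
    """Entropy-regularized optimal transport via Sinkhorn-Knopp iterations.

    cost:   (S, D) cost of moving one unit from supplier s to demander d.
    supply: (S,) units offered by each supplier; demand: (D,) units required by each demander.
    Returns the (S, D) transport plan.
    """
    kernel = np.exp(-cost / eps)
    u = np.ones(len(supply))
    v = np.ones(len(demand))
    for _ in range(n_iters):
        u = supply / (kernel @ v)    # rescale rows to match supply
        v = demand / (kernel.T @ u)  # rescale columns to match demand
    return u[:, None] * kernel * v[None, :]

rng = np.random.default_rng(0)
cost = rng.uniform(0.0, 1.0, size=(3, 8))  # rows: GT 0, GT 1, background; columns: 8 anchors
supply = np.array([2.0, 2.0, 4.0])         # hypothetical k = 2 positives per GT, rest background
demand = np.ones(8)                        # every anchor receives exactly one label
plan = sinkhorn(cost, supply, demand, eps=0.5, n_iters=1000)
labels = plan.argmax(axis=0)               # 0 or 1 = matched GT, 2 = background
```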

IQDet: Instance-wise Quality Distribution Sampling for Object Detection

Because the label assignment of one-stage detectors is static and ignores the global information of each object box, we propose an object detector based on instance-wise quality distribution sampling. We propose a Quality Distribution Encoding (QDE) module and a Quality Distribution Sampling (QDS) module. By extracting region features of each target box and modeling the quality distribution of the predicted boxes with a Gaussian mixture model, the detector dynamically assigns positive and negative samples. The method only changes label assignment during training, and achieves state-of-the-art results on multiple datasets, including COCO. Keywords: label assignment

FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding

The FSCE method proposed in this paper tackles few-shot object detection from the perspective of optimizing feature representations. In few-shot detection, the number of target samples is limited, and correctly classifying them often has a large impact on final performance. FSCE uses contrastive learning to encode object proposals and optimize their feature representations, strengthening intra-class compactness and inter-class separation. The method brings clear improvements on the common COCO and PASCAL VOC benchmarks. Keywords: few-shot object detection, contrastive learning. Paper link: https://arxiv.org/abs/2103.05950
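Contrastive proposal encoding can be illustrated with a supervised contrastive loss over proposal embeddings: proposals of the same class are pulled together and others pushed apart. This is a minimal sketch in that spirit, assuming a temperature τ = 0.2 and a hard IoU threshold of 0.7 that simply drops poorly localized proposals; the paper's exact weighting function is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def contrastive_proposal_loss(z, labels, ious, tau=0.2, iou_thresh=0.7):
    """Supervised contrastive loss over proposal embeddings (hypothetical sketch).

    z: (N, D) proposal embeddings; labels: (N,) class ids; ious: (N,) IoU with the matched GT.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    sim = z @ z.T / tau
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Log-softmax of each row over all other proposals (k != i), computed stably.
    sim_max = np.max(np.where(not_self, sim, -np.inf), axis=1, keepdims=True)
    exp_sim = np.exp(sim - sim_max) * not_self
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))
    same = (labels[:, None] == labels[None, :]) & not_self
    losses = []
    for i in range(n):
        if ious[i] < iou_thresh or not same[i].any():
            continue  # skip poorly localized proposals and those without positives
        losses.append(-log_prob[i, same[i]].mean())
    return float(np.mean(losses)) if losses else 0.0

labels = np.array([0, 0, 1, 1])
ious = np.full(4, 0.9)
z_good = np.array([[1.0, 0], [1, 0], [0, 1], [0, 1]])  # classes well separated
z_bad = np.array([[1.0, 0], [0, 1], [1, 0], [0, 1]])   # classes mixed
print(contrastive_proposal_loss(z_good, labels, ious) < contrastive_proposal_loss(z_bad, labels, ious))  # True
```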

Neural Architecture Search with Random Labels

Existing mainstream NAS algorithms search for models based on the predictive performance of subnetworks on a validation set. However, under the weight-sharing mechanism, there is a large gap between validation performance and the model's true performance. We break, for the first time, the paradigm of evaluating models by predictive performance: we evaluate subnetworks from the perspective of convergence speed, under the hypothesis that the faster a model converges, the higher its predictive performance.

Within this model-convergence framework, we find that convergence is independent of the true image labels, and further propose a new NAS paradigm, RLNAS, which trains the supernet with random labels. RLNAS is validated on multiple datasets (NAS-Bench-201, ImageNet) and multiple search spaces (DARTS, MobileNet-like). The experiments show that architectures searched with random labels alone reach the SOTA level of existing NAS methods. RLNAS seems counter-intuitive at first, but its unexpectedly good results provide a stronger baseline for the NAS community and invite further thinking about the nature of NAS. Keywords: neural architecture search, model convergence hypothesis, random labels https://arxiv.org/abs/2101.11834
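Scoring subnetworks by convergence rather than validation accuracy needs a label-free convergence measure. A minimal sketch, assuming an angle-based metric: the angle between a subnetwork's flattened initial weights and its weights after supernet training, with a larger angle read as faster convergence. That choice of metric and the function name are our assumptions.

```python
import numpy as np

def angle_metric(w_init, w_trained):
    """Angle (radians) between flattened initial and trained weights of a subnetwork.

    w_init, w_trained: lists of weight arrays in the same order and shapes.
    """
    a = np.concatenate([w.ravel() for w in w_init])
    b = np.concatenate([w.ravel() for w in w_trained])
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards against rounding past +-1

print(angle_metric([np.array([1.0, 0.0])], [np.array([0.0, 2.0])]))  # about pi / 2
```

Because the metric never touches labels, it stays meaningful when the supernet was trained on random labels.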

Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation

Current human pose estimation algorithms obtain the final joint locations through heatmap regression. These methods typically build ground-truth heatmaps by placing a 2D Gaussian kernel with a fixed standard deviation on every skeleton keypoint, and use these heatmaps to supervise the model. Because the ground-truth heatmaps of different people are all built with the same Gaussian kernel, scale differences between people are ignored, which causes label ambiguity and hurts model performance.

This paper proposes scale-adaptive heatmap regression, which adaptively generates the standard deviation used to build the labels according to the size of each person, making the model more robust to people at different scales. It also proposes weight-adaptive regression to balance positive and negative samples, further improving scale-adaptive heatmap regression. The paper achieves state-of-the-art performance in bottom-up human pose estimation. Keywords: human pose estimation, bottom-up, adaptive heatmap regression https://arxiv.org/abs/2012.15175 https://github.com/greatlog/SWAHR-HumanPose
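The difference between fixed and scale-adaptive labels is just the standard deviation used per person. A minimal sketch, assuming the per-person scale factors are given (the paper predicts them with the network) and that overlapping Gaussians are combined with a pixel-wise maximum; the function names are hypothetical.

```python
import numpy as np

def gaussian_heatmap(h, w, center, sigma):
    """2D Gaussian with peak 1 at center = (x, y)."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def scale_adaptive_heatmap(h, w, keypoints, base_sigma, scales):
    """One joint-type channel for several people, with sigma_i = base_sigma * scales[i]."""
    hm = np.zeros((h, w))
    for (x, y), s in zip(keypoints, scales):
        hm = np.maximum(hm, gaussian_heatmap(h, w, (x, y), base_sigma * s))
    return hm

# A small person (scale 0.5) and a large person (scale 2.0) get Gaussians of different widths.
hm = scale_adaptive_heatmap(64, 64, [(10, 10), (40, 40)], base_sigma=2.0, scales=[0.5, 2.0])
```

With a fixed sigma, both people would receive the same blob size; here the small person's label is sharp and the large person's label is wide.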

General Instance Distillation for Object Detection

GID proposes a novel distillation method for detection tasks. It extracts general instances (GIs) from both the teacher and the student, and proposes a GISM module that adaptively selects the instances with the largest differences for feature-based, relation-based and response-based distillation. The method applies relational knowledge distillation to detection frameworks for the first time, and unifies the distillation target: instead of treating positive and negative samples separately, it distills the more essential GIs. The process does not rely on ground truth (GT) and reaches SOTA. Keywords: object detection, knowledge distillation https://arxiv.org/abs/2103.02340

Activate or Not: Learning Customized Activation

We propose a new activation function, ACON (activate or not), which adaptively learns whether to activate. ACON establishes a connection between ReLU and Swish: although the two look very different, we find that Swish is a smooth approximation of ReLU. Based on this finding, we further propose more variants, such as Meta-ACON, whose nearly cost-free improvement is about twice that of SENet. We verify the generalization of this simple and effective activation function on multiple tasks. Keywords: activation function, neural network https://arxiv.org/abs/2009.04759
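The ReLU/Swish connection can be seen in the ACON-C form f(x) = (p1 - p2)·x·σ(β(p1 - p2)x) + p2·x, where σ is the sigmoid. With p1 = 1, p2 = 0 this is x·σ(βx): β = 1 gives Swish, β → ∞ approaches ReLU, and β = 0 gives the linear x/2. In the paper p1, p2 and β are learned per channel (Meta-ACON generates β with a small network); here they are plain arguments.

```python
import numpy as np

def acon_c(x, p1=1.0, p2=0.0, beta=1.0):
    """ACON-C: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x."""
    dp = (p1 - p2) * x
    return dp / (1.0 + np.exp(-beta * dp)) + p2 * x

x = np.linspace(-3.0, 3.0, 61)
swish = acon_c(x, beta=1.0)         # x * sigmoid(x)
near_relu = acon_c(x, beta=50.0)    # close to max(x, 0)
linear = acon_c(x, beta=0.0)        # exactly x / 2: "not activated"
```

A learnable β is what lets each channel choose between activating (large β) and staying linear (β near 0).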

You Only Look One-level Feature

In this paper, we first analyze the role of FPN in the one-stage detector RetinaNet. Experiments show that FPN's divide-and-conquer strategy, assigning objects of different scales to different feature levels, has a large impact on detection results. From an optimization perspective, this strategy decomposes the detection optimization problem, making learning simpler and improving accuracy. However, the multi-level FPN design complicates the network structure, introduces extra computation, and slows detection. To avoid these problems, this paper proposes detecting objects of all scales on a single feature level. To address the difficulty of optimizing single-level detection, it introduces a dilated encoder and uniform matching.

Using only C5 features, the single-level detector YOLOF proposed in this paper matches the accuracy of the FPN-based RetinaNet while running 2.5 times faster. In addition, compared with DETR, which also uses only C5 features, YOLOF achieves comparable performance with 7x faster convergence. Keywords: one-stage object detection, single-scale features, speed-accuracy trade-off https://arxiv.org/abs/2103.09460 https://github.com/megvii-model/YOLOF
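Uniform matching addresses the imbalance that arises when all objects share one feature level: large objects would otherwise collect far more positive anchors than small ones. A minimal sketch, assuming each ground-truth box takes its k nearest anchors (by center distance) as positives; YOLOF's additional IoU-based ignore rules are omitted, and the function name is hypothetical.

```python
def uniform_matching(anchor_centers, gt_centers, k=4):
    """For each GT, return the indices of its k nearest anchors by center distance.

    Ties are broken by the lower anchor index.
    """
    matches = []
    for gx, gy in gt_centers:
        dists = [((ax - gx) ** 2 + (ay - gy) ** 2, i) for i, (ax, ay) in enumerate(anchor_centers)]
        matches.append([i for _, i in sorted(dists)[:k]])
    return matches

anchors = [(x, y) for y in range(4) for x in range(4)]  # 4x4 grid of anchor centers
print(uniform_matching(anchors, [(0.1, 0.1)], k=3))     # [[0, 1, 4]]
```

Every object, small or large, receives exactly k positives.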

Points as Queries: Weakly Semi-supervised Object Detection by Points

The goal of this work is to improve detector performance without increasing labeling cost. The paper trains the detector with a small number of bounding-box annotations and a large number of point annotations. Point annotations are chosen because they are informative, containing the location and category of each instance, while being cheap to annotate. The paper proposes Point DETR, which extends DETR with a point encoder. The overall pipeline is: train Point DETR on the bounding-box data; encode the point annotations as queries and predict pseudo boxes; then train a student model on the bounding-box and pseudo-box data. On COCO, using only 20% fully annotated data, the detector reaches 33.3 AP, 2.0 AP above the baseline. Keywords: object detection, semi-supervised, weakly supervised

Practical Wide-Angle Portraits Correction with Deep Structured Models

Wide-angle lenses are popular for their wide field of view, but they suffer from lens distortion and perspective distortion, which show up as curved background lines and faces that are stretched, squeezed or tilted. To address this, the paper builds a cascaded correction network consisting of a line-correction network, a face-correction network and a transition module, so that the background follows a perspective projection, face regions follow a stereographic projection, and the two transition smoothly. This removes the distortions while preserving the field of view (FOV). The method requires no camera parameters, runs in real time, and outperforms existing methods in both qualitative and quantitative evaluations. Keywords: wide-angle portrait distortion correction, deep cascaded network
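Why faces and background need different projections shows up in the radial mappings. For a ray at angle θ from the optical axis and focal length f, perspective projection places it at radius f·tan θ, and stereographic projection at 2f·tan(θ/2). Perspective keeps straight lines straight but stretches content toward the image edge; stereographic projection is conformal (locally shape-preserving), so faces keep their proportions. The helper names below are illustrative.

```python
import math

def perspective_radius(theta, f=1.0):
    """Image radius of a ray at angle theta (radians) under perspective projection."""
    return f * math.tan(theta)

def stereographic_radius(theta, f=1.0):
    """Image radius under stereographic projection (conformal: preserves local shape)."""
    return 2.0 * f * math.tan(theta / 2.0)

for deg in (10, 30, 50):
    t = math.radians(deg)
    print(deg, round(perspective_radius(t), 3), round(stereographic_radius(t), 3))
```

Near the center the two agree; toward the edge perspective radii grow much faster, which is the stretching seen on faces at the periphery of wide-angle photos.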

UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning

We propose UPFlow, a new unsupervised optical flow learning method. We find that current unsupervised optical flow methods have two problems in multi-scale pyramid processing: interpolation ambiguity during flow upsampling, and a lack of supervision for multi-scale flows. To address them, we propose a self-guided upsampling module that uses an interpolation flow and an interpolation map to change the upsampling interpolation mechanism, yielding more refined upsampling. We also propose using the network's final output as pseudo labels to supervise the learning of multi-scale flows. With these improvements, our method produces clearer and sharper optical flow. Experiments on several optical flow benchmarks, including Sintel, KITTI 2012 and KITTI 2015, show that UPFlow outperforms the current best unsupervised optical flow method by about 20%. Keywords: optical flow estimation, unsupervised learning https://arxiv.org/abs/2012.00212

NBNet: Noise Basis Learning for Image Denoising with Subspace Projection

NBNet is a framework for image denoising. It approaches the problem from a novel perspective: image-adaptive projection. Specifically, it learns a set of subspaces in feature space, and denoising is performed by selecting an appropriate signal subspace and projecting onto it. Compared with previous plain convolutional architectures, NBNet's projection extracts and uses the structural information in images more naturally and efficiently, especially in weakly textured regions, to help restore the image. With this simple method, NBNet achieves SOTA on the DND and SIDD benchmarks with less computation. Keywords: image denoising, subspace https://arxiv.org/abs/2012.15028
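Projecting onto a signal subspace spanned by K basis vectors V (N×K) is the orthogonal projection P = V (VᵀV)⁻¹ Vᵀ. A minimal sketch, assuming the basis is given (in NBNet it is generated from image features) and that features are flattened to N positions by C channels; the function name is hypothetical.

```python
import numpy as np

def project_onto_basis(x, basis):
    """Orthogonal projection of x onto span(basis).

    x:     (N, C) flattened features (N positions, C channels).
    basis: (N, K) K basis vectors, assumed linearly independent.
    Returns basis @ (basis^T basis)^{-1} basis^T @ x.
    """
    gram = basis.T @ basis
    coeffs = np.linalg.solve(gram, basis.T @ x)  # solve instead of forming an explicit inverse
    return basis @ coeffs

rng = np.random.default_rng(0)
basis = rng.standard_normal((10, 3))
x = rng.standard_normal((10, 4))
p = project_onto_basis(x, basis)
```

Components of x outside the subspace (ideally, the noise) are removed, and anything already in the subspace passes through unchanged.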

Dynamic Metric Learning: Towards a Scalable Metric Space to Accommodate Multiple Semantic Scales

This work introduces "dynamic range", an important property of measurement, into deep metric learning, resulting in a new task called "dynamic metric learning". We find that previous deep metrics actually contain only one scale, for example only distinguishing whether faces or pedestrians are similar or dissimilar. However accurate such measuring tools are, they are inflexible and of limited use in practice. Our everyday measuring tools, in fact, usually have a dynamic range: a ruler always carries multiple scales (such as 1 mm, 1 cm or even 10 cm) for measuring objects of different sizes. We believe the time has come for deep metric learning to introduce dynamic range, because visual concepts themselves come in different sizes: "animals" and "plants" correspond to large scales, while "elk" corresponds to a relatively small scale. At a small scale, two elk may look very different, but at a larger scale the same two elk should be considered very similar.

To this end, we propose this dynamic metric learning task, which requires learning a single metric space that can simultaneously provide similarity measures for visual concepts of different semantic sizes. Furthermore, we construct three multi-scale datasets and propose a simple baseline method. We believe that dynamic range will become an indispensable property of deep metric learning and bring new perspectives and new application scenarios to the entire field of deep metric learning.

3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management
Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies https://arxiv.org/abs/2012.04872
Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization https://arxiv.org/abs/2012.07947
3D CNNs with Adaptive Temporal Feature Resolutions https://arxiv.org/abs/2011.08652
KeepAugment: A Simple Information-Preserving Data Augmentation https://arxiv.org/pdf/2011.11778.pdf
Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs https://arxiv.org/pdf/2011.14107.pdf
D-NeRF: Neural Radiance Fields for Dynamic Scenes https://arxiv.org/abs/2011.13961
Instance Localization for Self-supervised Detection Pretraining https://arxiv.org/pdf/2102.08318.pdf https://github.com/limbo0000/InstanceLoc
Weakly-supervised Grounded Visual Question Answering using Capsules
4D Panoptic LiDAR Segmentation https://arxiv.org/abs/2102.12472
Dogfight: Detecting Drones from Drone Videos
Multiple Instance Active Learning for Object Detection https://github.com/yuantn/MIAL/raw/master/paper.pdf https://github.com/yuantn/MIAL
Reconsidering Representation Alignment for Multi-view Clustering
Self-supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map
Image-to-image Translation via Hierarchical Style Disentanglement Xinyang Li, Shengchuan Zhang, Jie Hu, Liujuan Cao, Xiaopeng Hong, Xudong Mao, Feiyue Huang, Yongjian Wu, Rongrong Ji https://arxiv.org/abs/2103.01456 https://github.com/imlixinyang/HiSD
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation https://arxiv.org/pdf/2012.08512.pdf https://tarun005.github.io/FLAVR/Code https://tarun005.github.io/FLAVR/
Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition Stephen Hausler, Sourav Garg, Ming Xu, Michael Milford, Tobias Fischer https://arxiv.org/abs/2103.01486
Depth from Camera Motion and Object Detection Brent A. Griffin, Jason J. Corso https://arxiv.org/abs/2103.01468
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers https://arxiv.org/pdf/2011.09094.pdf
Multi-Stage Progressive Image Restoration https://arxiv.org/abs/2102.02808 https://github.com/swz30/MPRNet
Weakly Supervised Learning of Rigid 3D Scene Flow https://arxiv.org/pdf/2102.08945.pdf https://3dsceneflow.github.io/
Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning Mamshad Nayeem Rizve, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah https://arxiv.org/abs/2103.01315
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels https://arxiv.org/abs/2101.05022 https://github.com/naver-ai/relabel_imagenet
Rethinking Channel Dimensions for Efficient Model Design https://arxiv.org/abs/2007.00992 https://github.com/clovaai/rexnet
Coarse-Fine Networks for Temporal Activity Detection in Videos Kumara Kahatapitiya, Michael S. Ryoo https://arxiv.org/abs/2103.01302
A Deep Emulator for Secondary Motion of 3D Characters Mianlun Zheng, Yi Zhou, Duygu Ceylan, Jernej Barbic https://arxiv.org/abs/2103.01261
Fair Attribute Classification through Latent Space De-biasing https://arxiv.org/abs/2012.01469 https://github.com/princetonvisualai/gan-debiasing https://princetonvisualai.github.io/gan-debiasing/
Auto-Exposure Fusion for Single-Image Shadow Removal Lan Fu, Changqing Zhou, Qing Guo, Felix Juefei-Xu, Hongkai Yu, Wei Feng, Yang Liu, Song Wang https://arxiv.org/abs/2103.01255
Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling https://arxiv.org/pdf/2102.06183.pdf https://github.com/jayleicn/ClipBERT
MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing Zhengjue Wang, Hao Zhang, Ziheng Cheng, Bo Chen, Xin Yuan https://arxiv.org/abs/2103.01786
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling https://arxiv.org/pdf/2011.09011.pdf
Diffusion Probabilistic Models for 3D Point Cloud Generation Shitong Luo, Wei Hu https://arxiv.org/abs/2103.01458
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge Francisco Rivera Valverde, Juana Valeria Hurtado, Abhinav Valada https://arxiv.org/abs/2103.01353 http://rl.uni-freiburg.de/research/multimodal-distill
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation https://arxiv.org/abs/2008.00951 https://github.com/eladrich/pixel2style2pixel https://eladrich.github.io/pixel2style2pixel/
Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph Xin Ye, Yezhou Yang https://arxiv.org/abs/2103.01350
RepVGG: Making VGG-style ConvNets Great Again https://arxiv.org/abs/2101.03697 https://github.com/megvii-model/RepVGG
Transformer Interpretability Beyond Attention Visualization https://arxiv.org/pdf/2012.09838.pdf https://github.com/hila-chefer/Transformer-Explainability
PREDATOR: Registration of 3D Point Clouds with Low Overlap https://arxiv.org/pdf/2011.13005.pdf https://github.com/ShengyuH/OverlapPredator https://overlappredator.github.io/
Multiresolution Knowledge Distillation for Anomaly Detection https://arxiv.org/abs/2011.11108
Positive-Unlabeled Data Purification in the Wild for Object Detection
Data-Free Knowledge Distillation For Image Super-Resolution
Manifold Regularized Dynamic Network Pruning
Pre-Trained Image Processing Transformer https://arxiv.org/pdf/2012.00364.pdf
ReNAS: Relativistic Evaluation of Neural Architecture Search https://arxiv.org/pdf/1910.01523.pdf
AdderSR: Towards Energy Efficient Image Super-Resolution https://arxiv.org/pdf/2009.08891.pdf https://github.com/huawei-noah/AdderNet
Learning Student Networks in the Wild https://arxiv.org/pdf/1904.01186.pdf https://github.com/huawei-noah/DAFL https://www.zhihu.com/question/446299297
HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens https://arxiv.org/pdf/2005.14446.pdf
Probabilistic Embeddings for Cross-Modal Retrieval https://arxiv.org/abs/2101.05068
PLOP: Learning without Forgetting for Continual Semantic Segmentation https://arxiv.org/abs/2011.11390
Rainbow Memory: Continual Learning with a Memory of Diverse Samples
Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

【3】CVPR2020, 2019, 2018, 2017 download links

CVPR 2020

Reply "CVPR2020" to the public account [Computer Vision Alliance] to download the latest papers
CVPR 2020

CVPR 2019

CVPR 2019 Paper List
CVPR 2019 Paper

CVPR 2018

CVPR 2018 Paper List
CVPR 2018 Paper

CVPR 2017

CVPR 2017 Paper List
CVPR 2017 Paper

CVPR 2020

GhostNet: More Features from Cheap Operations (an architecture that surpasses MobileNetV3) Paper link: https://arxiv.org/pdf/1911.11907 Model (remarkable performance on ARM CPUs): https://github.com/iamhankai/ghostnet

GhostNet outperforms other SOTA lightweight CNNs such as MobileNetV3 and FBNet.
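GhostNet's building block, the Ghost module, produces a few "intrinsic" feature maps with an ordinary convolution and derives the remaining "ghost" maps from them with cheap linear operations, then concatenates both. A minimal sketch, assuming a 1x1 primary convolution, 3x3 depthwise convolutions as the cheap operations, and no BN or ReLU; the helper names are hypothetical.

```python
import numpy as np

def depthwise_conv3x3(x, w):
    """Same-padding depthwise cross-correlation. x: (C, H, W); w: (C, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out

def ghost_module(x, w_primary, w_cheap_list):
    """x: (C_in, H, W); w_primary: (m, C_in) 1x1 conv; w_cheap_list: s - 1 arrays of shape (m, 3, 3)."""
    intrinsic = np.einsum("mc,chw->mhw", w_primary, x)                # ordinary conv, only m outputs
    ghosts = [depthwise_conv3x3(intrinsic, w) for w in w_cheap_list]  # cheap per-channel ops
    return np.concatenate([intrinsic] + ghosts, axis=0)               # s * m output channels

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))
w_primary = rng.standard_normal((8, 16))
y = ghost_module(x, w_primary, [rng.standard_normal((8, 3, 3))])  # s = 2: 8 intrinsic + 8 ghost maps
```

With s = 2, half of the 16 output channels come from a 1x1 convolution over 16 inputs and the other half from 9-tap depthwise filters, which is where the savings come from.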

AdderNet: Do We Really Need Multiplications in Deep Learning? (adder neural networks), which achieves very good performance on large-scale neural networks and datasets. Paper link: https://arxiv.org/pdf/1912.13200
Frequency Domain Compact 3D Convolutional Neural Networks (3D CNN compression) Paper link: https://arxiv.org/pdf/1909.04977 Open source code: https://github.com/huawei-noah/CARS
A Semi-Supervised Assessor of Neural Architectures (Neural Network Accuracy Predictor NAS)
Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection (NAS for detection) searches the backbone, neck and head together, hence "Trinity"
CARS: Continuous Evolution for Efficient Neural Architecture Search (NAS) is efficient, combines the advantages of differentiable and evolutionary search, and can output a Pareto front
On Positive-Unlabeled Classification in GAN (PU+GAN)
Learning multiview 3D point cloud registration (3D point cloud) Paper link: arxiv.org/abs/2001.05119
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition Paper link: arxiv.org/abs/2001.09691
Action Modifiers: Learning from Adverbs in Instructional Video Paper link: arxiv.org/abs/1912.06617
PolarMask: Single Shot Instance Segmentation with Polar Representation (instance segmentation modeling) Paper link: arxiv.org/abs/1909.13226 Paper interpretation: https://zhuanlan.zhihu.com/p/84890413 Open source code: https://github.com/xieenze/PolarMask
Rethinking Performance Estimation in Neural Architecture Search (NAS) Since the real time-consuming part of block wise neural architecture search is the performance estimation part, this article finds the optimal parameters for block wise NAS, which is faster and more relevant.
Distribution Aware Coordinate Representation for Human Pose Estimation Paper link: arxiv.org/abs/1910.06278 Github: https://github.com/ilovepose/DarkPose Author team homepage: https://ilovepose.github.io/ coco/
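The CARS entry in the list above says the search can output a Pareto front. As a rough illustration of that concept only (this is not the CARS algorithm), here is a minimal Python sketch. It assumes hypothetical candidate architectures scored on two objectives, accuracy (higher is better) and latency in milliseconds (lower is better). The candidate names and numbers are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical candidate: (name, accuracy, latency_ms).
# Higher accuracy and lower latency are both preferred.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    _, acc_a, lat_a = a
    _, acc_b, lat_b = b
    no_worse = acc_a >= acc_b and lat_a <= lat_b
    strictly_better = acc_a > acc_b or lat_a < lat_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates (input order preserved)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates if other is not c)
    ]


if __name__ == "__main__":
    # Illustrative numbers only, not results from the CARS paper.
    pool = [
        ("A", 0.75, 20.0),
        ("B", 0.72, 12.0),
        ("C", 0.70, 15.0),  # dominated by B: lower accuracy and higher latency
        ("D", 0.78, 35.0),
    ]
    for name, acc, lat in pareto_front(pool):
        print(name, acc, lat)
```

Each architecture on the front represents a different trade-off, so a user can pick one to match a latency budget rather than relying on a single "best" model.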

Update

Visual Commonsense R-CNN

https://arxiv.org/abs/2002.12204

Out-of-distribution image detection

https://arxiv.org/abs/2002.11297

Blurry Video Frame Interpolation

https://arxiv.org/abs/2002.12259

Meta-transfer learning for zero-shot super-resolution

https://arxiv.org/abs/2002.12213

3D indoor scene understanding

https://arxiv.org/abs/2002.12212

Unbiased scene graph generation from biased training

https://arxiv.org/abs/2002.11949

Auto-encoding twin-bottleneck hashing

https://arxiv.org/abs/2002.11930

A social spatiotemporal graph convolutional neural network for human trajectory prediction

https://arxiv.org/abs/2002.11927

Toward universal representation learning for deep face recognition

https://arxiv.org/abs/2002.11841

Visual representation generalization

https://arxiv.org/abs/1912.03330

Overcoming contextual bias

https://arxiv.org/abs/2002.11812

Unsupervised reinforcement learning for transferable meta-skills

https://arxiv.org/abs/1911.07450

Fast and accurate spatio-temporal video super-resolution

https://arxiv.org/abs/2002.11616

Object relational graph with teacher-recommended learning for video captioning

https://arxiv.org/abs/2002.11566

Rethinking the route towards weakly supervised object localization

https://arxiv.org/abs/2002.11359

Towards learning a generic agent for vision-and-language navigation via pre-training

https://arxiv.org/pdf/2002.10638.pdf

GhostNet lightweight neural network

https://arxiv.org/pdf/1911.11907.pdf

AdderNet: do we really need multiplications in deep learning?

https://arxiv.org/pdf/1912.13200.pdf

CARS: continuous evolution for efficient neural architecture search

https://arxiv.org/abs/1909.04977

Removing reflections from a single image via collaborative iterative cascaded refinement

https://arxiv.org/abs/1911.06634

Filter grafting for deep neural networks

https://arxiv.org/pdf/2001.05868.pdf

PolarMask: Unifying instance segmentation to FCN

https://arxiv.org/pdf/1909.13226.pdf

Semi-supervised semantic image segmentation

https://arxiv.org/pdf/1811.07073.pdf

Defending against universal attacks through selective feature regeneration

https://arxiv.org/pdf/1906.03444.pdf

Real-time fine-grained sketch-based image retrieval

https://arxiv.org/abs/2002.10310

Introspecting VQA models with sub-questions

https://arxiv.org/abs/1906.03444

Learning neural 3D texture spaces from 2D exemplars

https://geometry.cs.ucl.ac.uk/projects/2020/neuraltexture/

NestedVAE: Isolating common factors via weak supervision

https://arxiv.org/abs/2002.11576

Towards multi-future trajectory prediction

https://arxiv.org/pdf/1912.06445.pdf

Towards robust image classification using sequential attention models

https://arxiv.org/pdf/1912.02184

【4】CVPR’s best papers in recent years

2018 Taskonomy: Disentangling Task Transfer Learning. Amir R. Zamir, Stanford University; et al.

Alexander Sax, Stanford University
William Shen, Stanford University
Leonidas Guibas, Stanford University
Jitendra Malik, University of California Berkeley
Silvio Savarese, Stanford University

2017 Densely Connected Convolutional Networks. Zhuang Liu, Tsinghua University; et al.

Gao Huang, Cornell University
Laurens van der Maaten, Facebook AI Research
Kilian Q. Weinberger, Cornell University

Learning from Simulated and Unsupervised Images through Adversarial Training. Ashish Shrivastava, Apple Inc.; et al.

Tomas Pfister, Apple Inc.
Oncel Tuzel, Apple Inc.
Josh Susskind, Apple Inc.
Wenda Wang, Apple Inc.
Russ Webb, Apple Inc.

2016 Deep Residual Learning for Image Recognition. Kaiming He, Microsoft Research; et al.

Xiangyu Zhang, Microsoft Research
Shaoqing Ren, Microsoft Research
Jian Sun, Microsoft Research

2015 DynamicFusion: Reconstruction and Tracking of Non-Rigid Scenes in Real-Time. Richard A. Newcombe, University of Washington; et al.

Dieter Fox, University of Washington
Steven M. Seitz, University of Washington

2014 What Camera Motion Reveals About Shape with Unknown BRDF. Manmohan Chandraker, NEC Labs America

2013 Fast, Accurate Detection of 100,000 Object Classes on a Single Machine. Thomas Dean, Google; et al.

Mark A. Ruzon, Google
Mark Segal, Google
Jonathon Shlens, Google
Sudheendra Vijayanarasimhan, Google
Jay Yagnik, Google

2012 A Simple Prior-Free Method for Non-Rigid Structure-from-Motion Factorization. Yuchao Dai, Northwestern Polytechnical University; et al.

Hongdong Li, Australian National University
Mingyi He, Northwestern Polytechnical University

2011 Real-Time Human Pose Recognition in Parts from Single Depth Images. Jamie Shotton, Microsoft Research; et al.

Andrew Fitzgibbon, Microsoft Research
Mat Cook, Microsoft Research
Toby Sharp, Microsoft Research
Mark Finocchio, Microsoft Research
Richard Moore, Microsoft Research
Alex Kipman, Microsoft Research
Andrew Blake, Microsoft Research

2010 Efficient Computation of Robust Low-Rank Matrix Approximations in the Presence of Missing Data usi ... Anders Eriksson and Anton van den Hengel, University of Adelaide

2009 Single Image Haze Removal Using Dark Channel Prior. Kaiming He, The Chinese University of Hong Kong; et al.

Jian Sun, Microsoft Research
Xiaoou Tang, The Chinese University of Hong Kong

2008 Global Stereo Reconstruction under Second Order Smoothness Priors. Oliver Woodford, University of Oxford; et al.

Ian Reid, Oxford Brookes University
Philip Torr, University of Oxford
Andrew Fitzgibbon, Microsoft Research

Beyond Sliding Windows: Object Localization by Efficient Subwindow Search. Christoph H. Lampert, Max Planck Institut; et al.

Matthew B. Blaschko, Max Planck Institut
Thomas Hofmann, Google

2007 Dynamic 3D Scene Analysis from a Moving Vehicle. Bastian Leibe, ETH Zurich; et al.

Nico Cornelis, Katholieke Universiteit Leuven
Kurt Cornelis, Katholieke Universiteit Leuven
Luc Van Gool, ETH Zurich

2006 Putting Objects in Perspective. Derek Hoiem, Carnegie Mellon University; et al.

Alexei Efros, Carnegie Mellon University
Martial Hebert, Carnegie Mellon University

2005 Real-Time Non-Rigid Surface Detection. Julien Pilet, École Polytechnique Fédérale de Lausanne; et al.

Vincent Lepetit, École Polytechnique Fédérale de Lausanne
Pascal Fua, École Polytechnique Fédérale de Lausanne

2004 Programmable Imaging Using a Digital Micromirror Array. Shree K. Nayar, Columbia University; et al.

Vlad Branzoi, Columbia University
Terry E. Boult, University of Colorado

2003 Object Class Recognition by Unsupervised Scale-Invariant Learning. Rob Fergus, University of Oxford; et al.

Pietro Perona, California Institute of Technology
Andrew Zisserman, University of Oxford

2001 Morphable 3D Models from Video. Matthew Brand, Mitsubishi Electric Research Laboratories

2000 Real-Time Tracking of Non-Rigid Objects Using Mean Shift. Dorin Comaniciu, Siemens Corporate Research; et al.

Visvanathan Ramesh, Siemens Corporate Research
Peter Meer, Rutgers University


Additional Information