Link: https://pan.baidu.com/s/1GWkqUOcO6KMOu-uLJrSpbA Extraction code: vwkx
Update (2022/03/02): updated some paper interpretations
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Paper: https://arxiv.org/pdf/2111.12707.pdf
Code: https://github.com/Vegetebird/MHFormer
This paper aims to represent and predict objects (things) and background regions (stuff) uniformly in a fully convolutional form, thereby achieving accurate and efficient panoptic segmentation. Specifically, it proposes a convolution kernel generator that encodes the semantic information of each object and each background category into a separate convolution kernel, then convolves these kernels with high-resolution feature maps to directly output the segmentation result for each foreground object and background region. In this way, the instance-level differences between objects and the semantic consistency of background regions are both preserved. The method achieves state-of-the-art speed and accuracy on multiple panoptic segmentation datasets.
Keywords: unified representation, dynamic convolution, panoptic segmentation
Paper: https://arxiv.org/abs/2012.00720
Code: https://github.com/yanwei-li/PanopticFCN
oral paper
FFB6D proposes a full-flow bidirectional fusion framework for RGBD representation learning and applies it to 6D pose estimation. We found that existing representation learning methods fail to make good use of the two complementary data sources: appearance information in RGB images and geometric information in depth maps (point clouds).
To this end, we design a bidirectional dense fusion module and apply it to every encoding and decoding layer of the CNN and the point cloud network. This full-flow bidirectional fusion lets the two networks make full use of the local and global complementary information extracted by each other, yielding better representations for downstream prediction tasks. In addition, for the output representation, we design a SIFT-FPS keypoint selection algorithm based on the texture and geometric information of the object, which makes keypoints easier for the network to locate and improves pose accuracy. Our method achieves significant improvements on multiple benchmarks. Moreover, this RGBD representation learning backbone can be applied to other vision tasks that take RGBD input by cascading different prediction networks after it.
Keywords: RGBD representation learning, 3D vision, 6D pose estimation
Paper: https://arxiv.org/abs/2103.02242
Code: https://github.com/ethnhe/FFB6D
Science and technology always advance in a spiral. We have "revived" the VGG-style, single-path, minimalist convolutional neural network architecture, which uses nothing but 3x3 convolutions from start to finish. It reaches SOTA level in both speed and accuracy, with over 80% top-1 accuracy on ImageNet.
To overcome the difficulty of training a VGG-style architecture, we use structural re-parameterization: during training the model contains identity-mapping and 1x1 convolution branches, which are equivalently merged into the 3x3 convolutions after training, so the model contains only 3x3 convolutions at inference. The resulting architecture has no branches, so it is highly parallel and very fast. And since the main body uses a single operator, "3x3 conv-ReLU", it is particularly well suited to customized hardware.
Keywords: structural re-parameterization, minimalist architecture, efficient model
Paper: https://arxiv.org/abs/2101.03697
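The branch merging described above can be illustrated with a minimal single-channel sketch. It is not the authors' implementation: it leaves out batch normalization and multi-channel kernels, and the names `conv2d_same`, `merge_branches` and `three_branch_forward` are hypothetical. It only shows why a 1x1 branch and an identity branch can be folded into the centre of a 3x3 kernel.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output keeps the input size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def three_branch_forward(image, k3, k1_weight):
    """Training-time form: 3x3 branch + 1x1 branch + identity branch."""
    y3 = conv2d_same(image, k3)
    y1 = conv2d_same(image, [[k1_weight]])
    h, w = len(image), len(image[0])
    return [[y3[i][j] + y1[i][j] + image[i][j] for j in range(w)] for i in range(h)]


def merge_branches(k3, k1_weight):
    """Inference-time form: fold the 1x1 and identity branches into one 3x3 kernel."""
    merged = [row[:] for row in k3]   # copy so the training kernel is left untouched
    merged[1][1] += k1_weight         # a 1x1 kernel equals a 3x3 kernel that is zero off-centre
    merged[1][1] += 1.0               # identity equals a 1x1 kernel with weight 1
    return merged


# Toy data (assumed values, for illustration only).
image = [[1.0, 2.0, 0.0], [0.5, 3.0, 1.0], [4.0, -2.0, 0.0]]
k3 = [[0.5, -1.0, 0.25], [2.0, 0.75, -0.5], [1.0, 0.0, -2.0]]
branched = three_branch_forward(image, k3, 1.5)
merged = conv2d_same(image, merge_branches(k3, 1.5))
print(all(abs(branched[i][j] - merged[i][j]) < 1e-9 for i in range(3) for j in range(3)))
```

Because convolution is linear, summing branch outputs equals convolving once with the summed kernels, which is what makes the single-path inference model exactly equivalent to the multi-branch training model in this simplified setting.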
This paper proposes a new convolution operation, Dynamic Region-Aware Convolution (DRConv), which assigns customized convolution kernels to different spatial regions based on feature similarity. Compared with standard convolution, it greatly improves the ability to model the diversity of semantic information in images. A standard convolutional layer can add more kernels to extract more visual elements, but at a higher computational cost. DRConv instead uses a learnable allocator to move the growing number of kernels into the spatial dimensions, which improves the representational power of convolution while preserving computational cost and translation invariance.
DRConv is an effective and elegant way to handle the complex and varied distribution of semantic information. Being plug-and-play, it can replace the standard convolutions in any existing network, and it brings significant performance improvements to lightweight networks in particular. The paper evaluates DRConv on a variety of models (the MobileNet series, ShuffleNetV2, etc.) and tasks (classification, face recognition, detection and segmentation). On ImageNet classification, DRConv-based ShuffleNetV2-0.5× achieves 67.1% accuracy at the 46M multiply-adds level, a 6.3% improvement over the baseline.
Paper: https://arxiv.org/abs/2003.12243
We propose a basic building block for convolutional networks, the Diverse Branch Block (DBB), which enriches the microstructure of a model during training without changing its macrostructure, thereby improving its performance. After training, the block can be equivalently converted into a single convolution through structural re-parameterization, so it introduces no additional inference overhead.
We summarize six structures that can be equivalently transformed, including a 1x1-KxK convolution sequence, average pooling, and others. Using these six transformations we build a representative, Inception-like DBB instance, which achieves significant performance improvements on a variety of architectures. Our experiments confirm that "non-linearity during training" (from components that are linear at inference, such as BN) and "diverse connections" (for example, 1x1+3x3 is better than 3x3+3x3) are the keys to the effectiveness of DBB.
Keywords: structural re-parameterization, no inference overhead, painless improvement
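One transformation of this kind, folding a batch-normalization layer into the preceding convolution, can be sketched as follows. This is a minimal illustration for a single output position and a single output channel, not the authors' code; `fuse_conv_bn`, `conv_at` and `batch_norm` are hypothetical names, and the BN statistics are assumed to be frozen, as they are at inference.

```python
import math


def conv_at(patch, weights, bias):
    """Convolution output at one position: dot product of the kernel with the input patch."""
    return sum(w * x for w, x in zip(weights, patch)) + bias


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time BN with fixed statistics; an affine (hence linear) map of y."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fuse_conv_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Return kernel weights and bias of one convolution equivalent to conv followed by BN."""
    scale = gamma / math.sqrt(var + eps)
    fused_weights = [w * scale for w in weights]
    fused_bias = (bias - mean) * scale + beta
    return fused_weights, fused_bias


# Toy 3x3 patch and kernel, flattened (assumed values, for illustration only).
patch = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0, 4.0]
weights = [0.1, 0.2, -0.3, 0.4, 0.5, -0.6, 0.7, 0.8, -0.9]
fused_w, fused_b = fuse_conv_bn(weights, 0.3, gamma=1.2, beta=-0.4, mean=0.7, var=2.5)
two_step = batch_norm(conv_at(patch, weights, 0.3), 1.2, -0.4, 0.7, 2.5)
print(abs(conv_at(patch, fused_w, fused_b) - two_step) < 1e-9)
```

During training BN normalizes with batch statistics, so the block behaves non-linearly; once the statistics are fixed, the same layer is affine and can be absorbed into the kernel without changing the output.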
Most previous work focused on the performance of few-shot (novel) classes at the expense of the performance of data-rich (base) classes. This paper proposes a few-shot object detector without forgetting, which achieves better performance on few-shot classes without losing performance on base classes. We find that pretrained detectors rarely produce false positive predictions on unseen classes, and that the RPN is not an ideal class-agnostic component. Based on these two findings, we design two simple and effective structures, Re-detector and Bias-Balanced RPN, which achieve few-shot object detection without forgetting while adding only a small number of parameters and little inference time.
Keywords: few-shot learning, object detection
This paper proposes a unified framework for visual recognition tasks with long-tailed data distributions. We first analyze existing two-stage methods for long-tail problems experimentally and identify their main performance bottlenecks. Based on this analysis, we propose a distribution alignment strategy that addresses long-tail vision tasks systematically.
The framework follows a two-stage design. In the first stage, an instance-balanced sampling strategy is used for feature representation learning. In the second stage, we first design an input-aware alignment function to correct the scores of the input data. Then, to introduce a prior on the dataset distribution, we design a generalized re-weighting scheme that handles a range of visual tasks such as image classification, semantic segmentation, object detection and instance segmentation. We verify the method on these four tasks and obtain significant performance improvements on each.
Keywords: image classification, semantic segmentation, object detection, instance segmentation
For the first time, this paper removes NMS (non-maximum suppression) post-processing from a fully convolutional object detector and achieves end-to-end training. We analyze mainstream one-stage object detection methods and find that the traditional one-to-many label assignment strategy is the key reason these methods depend on NMS, which leads us to propose a prediction-aware one-to-one label assignment strategy. In addition, to improve the performance of one-to-one label assignment, we propose modules that strengthen feature representation and auxiliary loss functions that accelerate convergence. Without NMS, our method achieves performance comparable to mainstream one-stage detectors. In dense scenes, its recall exceeds the theoretical upper bound of detectors that rely on NMS.
Keywords: end-to-end detection, label assignment, fully convolutional network
Paper: https://arxiv.org/abs/2012.03544
We propose a sample matching strategy for object detection based on optimal transport theory, which uses global information to find the optimal matching between samples. Compared with existing sample matching techniques, it has two advantages. 1) High detection accuracy: the globally optimal matching lets the detector train stably and efficiently, and it ultimately achieves the best detection performance on the COCO dataset. 2) Wide applicability: existing object detection algorithms need redesigned strategies or re-tuned parameters when they meet complex scenes such as dense objects or severe occlusion, whereas the optimal transport model finds the optimal solution as part of its global modeling. Without any additional adjustment, it achieves state-of-the-art performance in scenes with dense objects and severe occlusion, and has great application potential.
Keywords: object detection, optimal transport, sample matching strategy
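To make the optimal transport view of sample matching concrete, here is a minimal sketch that solves a toy entropy-regularized transport problem with Sinkhorn iterations. Whether this matches the paper's solver, cost definition or supply values is an assumption; the cost matrix, the treatment of background as an extra supplier, and the names `sinkhorn` and `assign` are all hypothetical.

```python
import math


def sinkhorn(cost, supply, demand, eps=0.5, iters=100):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost[i][j] is the cost of shipping one label from supplier i to anchor j;
    sum(supply) must equal sum(demand). Returns the transport plan.
    """
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [supply[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [demand[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def assign(plan):
    """Give each anchor (column) to the supplier (row) that ships it the most mass."""
    rows = range(len(plan))
    return [max(rows, key=lambda i: plan[i][j]) for j in range(len(plan[0]))]


# Rows: two ground-truth objects and a background supplier; columns: four anchors.
# Costs are made-up stand-ins for a classification + localization loss.
cost = [
    [0.1, 0.9, 0.9, 0.9],
    [0.9, 0.1, 0.9, 0.9],
    [0.8, 0.8, 0.2, 0.2],
]
plan = sinkhorn(cost, supply=[1.0, 1.0, 2.0], demand=[1.0, 1.0, 1.0, 1.0])
print(assign(plan))
```

The point of the formulation is that every anchor's label is decided jointly: the plan must satisfy all supply and demand constraints at once, rather than thresholding each anchor independently.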
Since label assignment in one-stage detectors is static and ignores the global information of the object box, we propose an object detector based on sampling from the object's quality distribution. We introduce a quality distribution encoding module (QDE) and a quality distribution sampling module (QDS). By extracting the regional features of the target box and modeling the quality distribution of the predicted boxes with a Gaussian mixture model, we can dynamically assign the predicted boxes to positive and negative samples. The method only affects label assignment during training, and achieves the current best results on multiple datasets such as COCO.
Keywords: label assignment
The FSCE method proposed in this paper addresses few-shot object detection from the perspective of optimizing feature representations. In few-shot object detection, the number of target samples is limited, and whether they are classified correctly often has a large effect on final performance. FSCE uses contrastive learning to encode the relevant candidate boxes and optimize their feature representations, strengthening intra-class compactness and inter-class separation. The resulting method brings clear improvements on the widely used COCO and Pascal VOC datasets.
Keywords: few-shot object detection, contrastive learning
Paper: https://arxiv.org/abs/2103.05950
Existing mainstream NAS algorithms search for models according to the prediction performance of subnetworks on the validation set. Under the parameter-sharing mechanism, however, there is a large gap between the performance measured on the validation set and the true performance of the model. We break with the paradigm of evaluating models by prediction performance for the first time: we evaluate subnetworks by their convergence speed, hypothesizing that the faster a model converges, the higher its prediction performance will be.
Based on this convergence-based framework, we find that model convergence is independent of the true labels of the images, and we further propose RLNAS, a new NAS paradigm that trains the supernet with random labels. RLNAS is validated on multiple datasets (NAS-Bench-201, ImageNet) and multiple search spaces (DARTS, MobileNet-like). The experiments show that architectures searched with random labels alone reach the SOTA level of existing NAS methods. RLNAS may seem counter-intuitive at first, but its unexpectedly good results provide a stronger baseline for the NAS community and prompt further thinking about the nature of NAS.
Keywords: neural architecture search, model convergence hypothesis, random labels
Paper: https://arxiv.org/abs/2101.11834
Current human pose estimation algorithms obtain the final joint locations by heat map regression. These methods typically build the ground-truth heat map by covering every skeleton keypoint with a 2D Gaussian kernel of fixed standard deviation, and use it to supervise the model. Because the ground-truth heat maps for the joints of different people are built with the same Gaussian kernel, this approach ignores scale differences between people, which causes label ambiguity and hurts model performance.
This paper proposes scale-adaptive heat map regression, which adaptively generates the standard deviation used to construct the labels according to the size of the human body, making the model more robust to bodies of different scales. It also proposes weight-adaptive regression to balance positive and negative samples, which further brings out the benefit of scale-adaptive heat map regression. The method achieves state-of-the-art performance in bottom-up human pose estimation.
Keywords: human pose estimation, bottom-up, adaptive heat map regression
Paper: https://arxiv.org/abs/2012.15175
Code: https://github.com/greatlog/SWAHR-HumanPose
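The difference between a fixed and a scale-dependent Gaussian label can be sketched as below. This is only an illustration of the idea: the linear rule in `scaled_sigma` and the hand-picked scale factors are assumptions, whereas the paper generates the standard deviation adaptively.

```python
import math


def gaussian_heatmap(height, width, center, sigma):
    """Ground-truth heat map for one keypoint: an unnormalized 2D Gaussian peaking at 1."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]


def scaled_sigma(base_sigma, scale):
    """Hypothetical rule: the standard deviation grows linearly with a per-person scale factor."""
    return base_sigma * scale


# Same keypoint location, labelled for a small person and for a large person.
small_person = gaussian_heatmap(9, 9, center=(4, 4), sigma=scaled_sigma(2.0, 0.5))
large_person = gaussian_heatmap(9, 9, center=(4, 4), sigma=scaled_sigma(2.0, 2.0))

# Two pixels away from the joint, the label tolerates more error on the larger person.
print(small_person[4][6] < large_person[4][6])
```

With a fixed sigma both people would receive identical labels, so a two-pixel error would be penalized equally regardless of how large the person appears in the image.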
GID proposes a novel distillation method for detection tasks. It extracts general instances (GI) from the teacher and the student, and its GISM module adaptively selects the instances on which the two differ most for feature-based, relation-based and response-based distillation. The method applies relational knowledge distillation to a detection framework for the first time, and replaces the separate treatment of positive and negative samples with distillation on the more fundamental GIs. The process does not rely on ground truth (GT) and reaches SOTA.
Keywords: object detection, knowledge distillation
Paper: https://arxiv.org/abs/2103.02340
We propose a new activation function, ACON (activate or not), which adaptively learns whether to activate. ACON establishes a connection between ReLU and Swish: although the two look very different, we found that Swish is a smooth form of ReLU. Building on this finding, we further propose more variants, such as meta-ACON, which achieves a cost-free gain twice that of SENet. We verify the generalization performance of this concise and effective activation function on multiple tasks.
Keywords: activation function, neural network
Paper: https://arxiv.org/abs/2009.04759
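The link between ReLU and Swish can be checked numerically with a smooth maximum. The sketch below assumes the ACON-C form, (p1 - p2)·x·σ(β(p1 - p2)x) + p2·x, with scalar parameters fixed by hand; in a trained network these would be learned per channel, and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smooth_max(a, b, beta):
    """Smooth maximum of two values; tends to max(a, b) as beta grows, to their mean at beta = 0."""
    return (a - b) * sigmoid(beta * (a - b)) + b


def swish(x, beta=1.0):
    return x * sigmoid(beta * x)


def relu(x):
    return max(x, 0.0)


def acon_c(x, p1=1.0, p2=0.0, beta=1.0):
    """Smooth maximum of the two linear functions p1 * x and p2 * x."""
    return smooth_max(p1 * x, p2 * x, beta)


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # With p1 = 1 and p2 = 0, acon_c coincides with Swish; a large beta approaches ReLU.
    print(x, acon_c(x, beta=1.0), swish(x, 1.0), acon_c(x, beta=50.0), relu(x))
```

Since ReLU is max(x, 0), replacing the hard maximum with a smooth one yields Swish, and β controls how sharply the unit switches between the linear ("not activated") and non-linear ("activated") regimes.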
In this paper, we first analyze the role of FPN in the single-stage detector RetinaNet. Experiments show that FPN's divide-and-conquer strategy, which assigns objects of different scales to different feature levels for detection, has a large effect on the detection results. From an optimization perspective, this strategy decomposes the optimization problem in detection, making it easier to learn and improving detection accuracy. However, FPN's multi-level feature design complicates the network structure, introduces extra computation, and slows detection. To avoid these problems, this paper proposes detecting objects of all scales on a single feature level. To address the optimization difficulty that single-level detection brings, it proposes a dilated encoder and uniform matching.
YOLOF, the single-level-feature detector proposed here, matches the detection accuracy of FPN-based RetinaNet while using only C5 features, and runs 2.5 times as fast. Compared with DETR, which also uses only C5 features, YOLOF reaches comparable performance while converging 7x faster.
Keywords: single-stage object detection, single-scale features, balance between detection speed and accuracy
Paper: https://arxiv.org/abs/2103.09460
Code: https://github.com/megvii-model/YOLOF
The goal of this study is to improve detector performance without increasing annotation cost. The paper trains the detector with a small number of bounding box annotations and a large number of point annotations. Point annotations are chosen because they are informative, carrying both the location and the category of an instance, yet cheap to produce. The paper proposes Point DETR, which extends DETR with a point encoder. The overall pipeline is: train Point DETR on the bounding box data; encode point annotations into queries and predict pseudo boxes; train the student model on the bounding box and pseudo box data. On the COCO dataset, using only 20% fully annotated data, our detector achieves 33.3 AP, exceeding the baseline by 2.0 AP.
Keywords: object detection, semi-supervised learning, weak supervision
Wide-angle lenses are popular for their wide field of view, but they suffer from lens distortion and perspective distortion, which show up as curved background lines and stretched, squeezed or tilted faces. To address this, the paper builds a cascaded distortion correction network consisting of a line correction network, a face correction network and a transition module. The background follows a perspective projection, the face regions follow a stereographic projection, and the two regions are blended smoothly, so that the various distortions are eliminated while the FOV is maintained. The method requires no camera parameters, runs in real time, and surpasses existing methods in both qualitative and quantitative evaluations.
Keywords: wide-angle portrait distortion correction, deep cascaded network
We propose UPFlow, a new unsupervised optical flow learning method. We find that current unsupervised optical flow methods have two problems in multi-scale pyramid processing: interpolation ambiguity during flow upsampling, and a lack of supervision for the multi-scale flows. For the first problem, we propose a self-guided upsampling module that uses an interpolation flow and an interpolation map to change the upsampling interpolation mechanism, achieving finer upsampling. For the second, we use the final output of the network as pseudo labels to supervise the learning of the multi-scale flows. With these improvements, our method obtains clearer and sharper optical flow results. We run experiments on multiple optical flow benchmarks, including Sintel, KITTI 2012 and KITTI 2015. UPFlow outperforms the current best unsupervised optical flow algorithm by about 20%.
Keywords: optical flow estimation, unsupervised learning
Paper: https://arxiv.org/abs/2012.00212
NBNet is a framework for image denoising. We approach the problem from a novel perspective: image-adaptive projection. Specifically, we learn a set of subspaces in the feature space, and denoise an image by selecting an appropriate signal subspace and projecting onto it. Compared with previous purely convolutional network structures, NBNet uses projection to extract and exploit the structural information in an image more naturally and efficiently, especially in weakly textured regions, which helps restore the image. With this simple method, NBNet achieves SOTA on the DND and SIDD benchmarks with less computation.
Keywords: image denoising, subspace
Paper: https://arxiv.org/abs/2012.15028
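Projection onto a signal subspace can be sketched in a few lines. In the sketch below the basis vectors are fixed by hand, whereas NBNet generates them from the image; the helper names and the use of Gram-Schmidt orthonormalization are illustrative assumptions rather than the paper's implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(basis, tol=1e-12):
    """Gram-Schmidt: turn spanning vectors into an orthonormal basis of the same subspace."""
    ortho = []
    for vec in basis:
        w = list(vec)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:                      # skip vectors that add no new direction
            ortho.append([wi / norm for wi in w])
    return ortho


def project(signal, basis):
    """Orthogonal projection of `signal` onto the subspace spanned by `basis`."""
    out = [0.0] * len(signal)
    for q in orthonormalize(basis):
        c = dot(signal, q)
        out = [o + c * qi for o, qi in zip(out, q)]
    return out


# Toy example: the "signal subspace" is the plane of the first two coordinates,
# so the third component of the noisy vector is treated as noise and removed.
basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
noisy = [3.0, 4.0, 5.0]
print(project(noisy, basis))
```

Everything that lies inside the chosen subspace is kept unchanged and everything orthogonal to it is discarded, so the quality of the result depends entirely on how well the subspace captures the clean signal.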
This work introduces "dynamic range", an important property of measurement, into deep metric learning, resulting in a new task called "dynamic metric learning". We find that previous deep metrics in fact contain only one scale, for example only judging whether faces or pedestrians are similar or dissimilar. However accurate such a measuring tool is, it is inflexible and of limited use in practice. Everyday measuring tools, in contrast, usually have a dynamic range: a ruler always has several scales (such as 1 mm, 1 cm or even 10 cm) for measuring objects of different sizes. We believe the time has come for deep metric learning to introduce dynamic range, because visual concepts themselves come in different sizes. "Animals" and "plants" correspond to large scales, while "elk" corresponds to a relatively small scale. At a small scale, two elk may look very different, but at a larger scale the same two elk should be considered very similar.
To this end, we propose the dynamic metric learning task, which requires learning a single metric space that simultaneously provides similarity measures for visual concepts of different semantic sizes. Furthermore, we construct three multi-scale datasets and propose a simple baseline method. We believe that dynamic range will become an indispensable property of deep metric learning and will bring new perspectives and new application scenarios to the whole field.
3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management
Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies https://arxiv.org/abs/2012.04872
Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization https://arxiv.org/abs/2012.07947
3D CNNs with Adaptive Temporal Feature Resolutions https://arxiv.org/abs/2011.08652
KeepAugment: A Simple Information-Preserving Data Augmentation https://arxiv.org/pdf/2011.11778.pdf
Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs https://arxiv.org/pdf/2011.14107.pdf
D-NeRF: Neural Radiance Fields for Dynamic Scenes https://arxiv.org/abs/2011.13961
Instance Localization for Self-supervised Detection Pretraining https://arxiv.org/pdf/2102.08318.pdf https://github.com/limbo0000/InstanceLoc
Weakly-supervised Grounded Visual Question Answering using Capsules
4D Panoptic LiDAR Segmentation https://arxiv.org/abs/2102.12472
Dogfight: Detecting Drones from Drone Videos
Multiple Instance Active Learning for Object Detection https://github.com/yuantn/MIAL/raw/master/paper.pdf https://github.com/yuantn/MIAL
Reconsidering Representation Alignment for Multi-view Clustering
Self-supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map
Image-to-image Translation via Hierarchical Style Disentanglement Xinyang Li, Shengchuan Zhang, Jie Hu, Liujuan Cao, Xiaopeng Hong, Xudong Mao, Feiyue Huang, Yongjian Wu, Rongrong Ji https://arxiv.org/abs/2103.01456 https://github.com/imlixinyang/HiSD
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation https://arxiv.org/pdf/2012.08512.pdf https://tarun005.github.io/FLAVR/
Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition Stephen Hausler, Sourav Garg, Ming Xu, Michael Milford, Tobias Fischer https://arxiv.org/abs/2103.01486
Depth from Camera Motion and Object Detection Brent A. Griffin, Jason J. Corso https://arxiv.org/abs/2103.01468
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers https://arxiv.org/pdf/2011.09094.pdf
Multi-Stage Progressive Image Restoration https://arxiv.org/abs/2102.02808 https://github.com/swz30/MPRNet
Weakly Supervised Learning of Rigid 3D Scene Flow https://arxiv.org/pdf/2102.08945.pdf https://3dsceneflow.github.io/
Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning Mamshad Nayeem Rizve, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah https://arxiv.org/abs/2103.01315
Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels https://arxiv.org/abs/2101.05022 https://github.com/naver-ai/relabel_imagenet
Rethinking Channel Dimensions for Efficient Model Design https://arxiv.org/abs/2007.00992 https://github.com/clovaai/rexnet
Coarse-Fine Networks for Temporal Activity Detection in Videos Kumara Kahatapitiya, Michael S. Ryoo https://arxiv.org/abs/2103.01302
A Deep Emulator for Secondary Motion of 3D Characters Mianlun Zheng, Yi Zhou, Duygu Ceylan, Jernej Barbic https://arxiv.org/abs/2103.01261
Fair Attribute Classification through Latent Space De-biasing https://arxiv.org/abs/2012.01469 https://github.com/princetonvisualai/gan-debiasing https://princetonvisualai.github.io/gan-debiasing/
Auto-Exposure Fusion for Single-Image Shadow Removal Lan Fu, Changqing Zhou, Qing Guo, Felix Juefei-Xu, Hongkai Yu, Wei Feng, Yang Liu, Song Wang https://arxiv.org/abs/2103.01255
Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling https://arxiv.org/pdf/2102.06183.pdf https://github.com/jayleicn/ClipBERT
MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing Zhengjue Wang, Hao Zhang, Ziheng Cheng, Bo Chen, Xin Yuan https://arxiv.org/abs/2103.01786
AttentiveNAS: Improving Neural Architecture Search via Attentive https://arxiv.org/pdf/2011.09011.pdf
Diffusion Probabilistic Models for 3D Point Cloud Generation Shitong Luo, Wei Hu https://arxiv.org/abs/2103.01458
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge Francisco Rivera Valverde, Juana Valeria Hurtado, Abhinav Valada https://arxiv.org/abs/2103.01353 http://rl.uni-freiburg.de/research/multimodal-distill
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation https://arxiv.org/abs/2008.00951 https://github.com/eladrich/pixel2style2pixel https://eladrich.github.io/pixel2style2pixel/
Hierarchical and Partially Observable Goal-driven Policy Learning with Goals Relational Graph Xin Ye, Yezhou Yang https://arxiv.org/abs/2103.01350
RepVGG: Making VGG-style ConvNets Great Again https://arxiv.org/abs/2101.03697 https://github.com/megvii-model/RepVGG
Transformer Interpretability Beyond Attention Visualization https://arxiv.org/pdf/2012.09838.pdf https://github.com/hila-chefer/Transformer-Explainability
PREDATOR: Registration of 3D Point Clouds with Low Overlap https://arxiv.org/pdf/2011.13005.pdf https://github.com/ShengyuH/OverlapPredator https://overlappredator.github.io/
Multiresolution Knowledge Distillation for Anomaly Detection https://arxiv.org/abs/2011.11108
Positive-Unlabeled Data Purification in the Wild for Object Detection
Data-Free Knowledge Distillation For Image Super-Resolution
Manifold Regularized Dynamic Network Pruning
Pre-Trained Image Processing Transformer https://arxiv.org/pdf/2012.00364.pdf
ReNAS: Relativistic Evaluation of Neural Architecture Search https://arxiv.org/pdf/1910.01523.pdf
AdderSR: Towards Energy Efficient Image Super-Resolution https://arxiv.org/pdf/2009.08891.pdf https://github.com/huawei-noah/AdderNet
Learning Student Networks in the Wild https://arxiv.org/pdf/1904.01186.pdf https://github.com/huawei-noah/DAFL https://www.zhihu.com/question/446299297
HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens https://arxiv.org/pdf/2005.14446.pdf
Probabilistic Embeddings for Cross-Modal Retrieval https://arxiv.org/abs/2101.05068
PLOP: Learning without Forgetting for Continual Semantic Segmentation https://arxiv.org/abs/2011.11390
Rainbow Memory: Continual Learning with a Memory of Diverse Samples
Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
1. GhostNet: More Features from Cheap Operations (an architecture beyond MobileNetV3) Paper link: https://arxiv.org/pdf/1911.11907 Model (amazing performance on ARM CPU): https://github.com/iamhankai/ghostnet
We beat other SOTA lightweight CNNs such as MobileNetV3 and FBNet.
AdderNet: Do We Really Need Multiplications in Deep Learning? (additive neural network) Achieves very good performance on large-scale neural networks and datasets. Paper link: https://arxiv.org/pdf/1912.13200
Frequency Domain Compact 3D Convolutional Neural Networks (3D CNN compression) Paper link: https://arxiv.org/pdf/1909.04977 Open source code: https://github.com/huawei-noah/CARS
A Semi-Supervised Assessor of Neural Architectures (Neural Network Accuracy Predictor NAS)
Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection (NAS for detection) Searches the backbone, neck and head together as a trinity.
CARS: Continuous Evolution for Efficient Neural Architecture Search (NAS) Efficient, combines the advantages of differentiable and evolutionary search, and can output the Pareto front.
On Positive-Unlabeled Classification in GAN (PU+GAN)
Learning multiview 3D point cloud registration (3D point cloud) Paper link: https://arxiv.org/abs/2001.05119
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition Paper link: https://arxiv.org/abs/2001.09691
Action Modifiers: Learning from Adverbs in Instructional Video Paper link: https://arxiv.org/abs/1912.06617
PolarMask: Single Shot Instance Segmentation with Polar Representation (instance segmentation modeling) Paper link: https://arxiv.org/abs/1909.13226 Paper interpretation: https://zhuanlan.zhihu.com/p/84890413 Open source code: https://github.com/xieenze/PolarMask
Rethinking Performance Estimation in Neural Architecture Search (NAS) Since the truly time-consuming part of block-wise neural architecture search is performance estimation, this paper finds the optimal parameters for block-wise NAS, making the estimation faster and more strongly correlated with final performance.
Distribution Aware Coordinate Representation for Human Pose Estimation Paper link: https://arxiv.org/abs/1910.06278 Github: https://github.com/ilovepose/DarkPose Author team homepage: https://ilovepose.github.io/coco/
https://arxiv.org/abs/2002.12204
https://arxiv.org/abs/2002.11297
https://arxiv.org/abs/2002.12259
https://arxiv.org/abs/2002.12213
https://arxiv.org/abs/2002.12212
6. Unbiased Scene Graph Generation from Biased Training
https://arxiv.org/abs/2002.11949
https://arxiv.org/abs/2002.11930
https://arxiv.org/abs/2002.11927
https://arxiv.org/abs/2002.11841
https://arxiv.org/abs/1912.03330
https://arxiv.org/abs/2002.11812
https://arxiv.org/abs/1911.07450
https://arxiv.org/abs/2002.11616
https://arxiv.org/abs/2002.11566
https://arxiv.org/abs/2002.11359
https://arxiv.org/pdf/2002.10638.pdf
https://arxiv.org/pdf/1911.11907.pdf
https://arxiv.org/pdf/1912.13200.pdf
https://arxiv.org/abs/1909.04977
https://arxiv.org/abs/1911.06634
https://arxiv.org/pdf/2001.05868.pdf
https://arxiv.org/pdf/1909.13226.pdf
https://arxiv.org/pdf/1811.07073.pdf
https://arxiv.org/pdf/1906.03444.pdf
https://arxiv.org/abs/2002.10310
https://arxiv.org/abs/1906.03444
https://geometry.cs.ucl.ac.uk/projects/2020/neuraltexture/
https://arxiv.org/abs/2002.11576
https://arxiv.org/pdf/1912.06445.pdf
https://arxiv.org/pdf/1912.02184