Learning to See Through Obstructions

See-Through Captions: Real-Time Captioning on Transparent Display for Deaf and Hard-of-Hearing People

Estimation of continuous valence and arousal levels from faces in naturalistic conditions

Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset

C-Space Tunnel Discovery for Puzzle Path Planning

Filter Style Transfer between Photos

Dynamic facial asset and rig generation from a single scan

SpeedNet: Learning the Speediness in Videos

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

Monocular Real-Time Volumetric Performance Capture

Controlling Style and Semantics in Weakly-Supervised Image Generation

Learned Motion Matching

Neural Light Transport for Relighting and View Synthesis

Full-Body Awareness from Partial Observations

Complementary Dynamics

Non-Local Musical Statistics as Guides for Audio-to-Score Piano Transcription

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

Multimodal Humor Dataset: Predicting Laughter tracks for Sitcoms

BARF: Bundle-Adjusting Neural Radiance Fields

Animating Pictures with Eulerian Motion Fields

End-to-End Object Detection with Transformers (DETR)

DDPM - Diffusion Models Beat GANs on Image Synthesis

DDPM - Denoising Diffusion Probabilistic Models

XCiT: Cross-Covariance Image Transformers

Involution: Inverting the Inherence of Convolution for Visual Recognition

Alias-Free Generative Adversarial Networks

MakeItTalk: Speaker-Aware Talking-Head Animation

TeethTap: Recognizing Discrete Teeth Gestures Using Motion and Acoustic Sensing on an Earpiece

Transferring Dense Pose to Proximal Animal Classes

A Simple Framework for Contrastive Learning of Visual Representations (self-supervised learning)

Whole-Body Human Pose Estimation in the Wild

Zero-Shot Text-to-Image Generation (DALL·E)

Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation

Fake It Till You Make It: Face analysis in the wild using synthetic data alone

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

LIFE: Lighting Invariant Flow Estimation

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Generating Visual Explanations

High-Fidelity Neural Human Motion Transfer from Monocular Video

ADOP: Approximate Differentiable One-Pixel Point Rendering

Vectorization of Raster Manga by Deep Reinforcement Learning

Where's Swimmy?: Mining unique color features buried in galaxies by deep anomaly detection using Subaru Hyper Suprime-Cam data

Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation

CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose

An Empirical Study of Training Self-Supervised Vision Transformers

VinVL: Revisiting Visual Representations in Vision-Language Models

FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

AST: Audio Spectrogram Transformer

SSAST: Self-Supervised Audio Spectrogram Transformer

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

Learning in High Dimension Always Amounts to Extrapolation

Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction

mixup: Beyond Empirical Risk Minimization

Deep Hough-Transform Line Priors

Self-Supervised Monocular Depth Estimation with Internal Feature Fusion

Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation

Deep Neural Networks as Gaussian Processes