You Only Look Once: Unified, Real-Time Object Detection (YOLO)

YOLO9000: Better, Faster, Stronger (YOLOv2)

YOLOv3: An Incremental Improvement

YOLOv4: Optimal Speed and Accuracy of Object Detection


Semantically Multi-modal Image Synthesis

Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis

StyleRig: Rigging StyleGAN for 3D Control over Portrait Images

AnimeGAN: A Novel Lightweight GAN for Photo Animation

Portrait Shadow Manipulation

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

Background Matting: The World is Your Green Screen

FuturePong: Real-time Table Tennis Trajectory Forecasting using Pose Prediction Network

Optical Non-Line-of-Sight Physics-based 3D Human Pose Estimation

You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions

Rewriting a Deep Generative Model

RAFT: Recurrent All-Pairs Field Transforms for Optical Flow (ECCV 2020 Best Paper)

SinGAN: Learning a Generative Model from a Single Natural Image (ICCV 2019 Best Paper)

Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild (CVPR 2020 Best Paper)

Seeing Voices and Hearing Faces: Cross-modal biometric matching

3D Photography using Context-aware Layered Depth Inpainting

Image Inpainting for Irregular Holes Using Partial Convolutions

Dance Revolution: Long Sequence Dance Generation with Music via Curriculum Learning

Reference-Based Sketch Image Colorization using Augmented-Self Reference and Dense Semantic Correspondence

In-Home Daily-Life Captioning Using Radio Signals

4D Visualization of Dynamic Events from Unconstrained Multi-View Videos

RoomShift: Room-scale Dynamic Haptics for VR with Furniture-moving Swarm Robots

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

Computational Design and Optimization of Non-Circular Gears

VIBE: Video Inference for Human Body Pose and Shape Estimation

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Recapture as You Want

Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting

OralCam: Enabling Self-Examination and Awareness of Oral Health Using a Smartphone Camera

XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera

Fabriccio: Touchless Gestural Input on Interactive Fabrics

FitByte: Automatic Diet Monitoring in Unconstrained Situations Using Multimodal Sensing on Eyeglasses

Vid2Player: Controllable Video Sprites that Behave and Appear like Professional Tennis Players

Learning Agile Robotic Locomotion Skills by Imitating Animals

Consistent Video Depth Estimation

Breaking the cycle - Colleagues are all you need

Efficient Neural Audio Synthesis

Ear2Face: Deep Biometric Modality Mapping

Learning to Shadow Hand-drawn Sketches

A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild

High-resolution Piano Transcription with Pedals by Regressing Onsets and Offsets Times

SynSin: End-to-end View Synthesis from a Single Image

Talking Head Anime from a Single Image

High-Fidelity Synthesis with Disentangled Representation

Beyond English-Centric Multilingual Machine Translation

OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Real-time 2D Multi-Person Pose Estimation on CPU: Lightweight OpenPose

Multiple Human Tracking with Alternately Updating Trajectories and Multi-Frame Action Features

AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization

Learning Deformable Tetrahedral Meshes for 3D Reconstruction

Combining detection and tracking for human pose estimation in videos

Blind Video Temporal Consistency via Deep Video Prior