You Only Look Once: Unified, Real-Time Object Detection (YOLO)

YOLO9000: Better, Faster, Stronger (YOLOv2)

YOLOv3: An Incremental Improvement

YOLOv4: Optimal Speed and Accuracy of Object Detection


Semantically Multi-modal Image Synthesis

Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis

StyleRig: Rigging StyleGAN for 3D Control over Portrait Images

AnimeGAN: A Novel Lightweight GAN for Photo Animation

Portrait Shadow Manipulation

PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer

Background Matting: The World is Your Green Screen

FuturePong: Real-time Table Tennis Trajectory Forecasting using Pose Prediction Network

Optical Non-Line-of-Sight Physics-based 3D Human Pose Estimation

You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions

Rewriting a Deep Generative Model

RAFT: Recurrent All-Pairs Field Transforms for Optical Flow (ECCV 2020 Best Paper)

SinGAN: Learning a Generative Model from a Single Natural Image (ICCV 2019 Best Paper)

Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild (CVPR 2020 Best Paper)

Seeing Voices and Hearing Faces: Cross-modal biometric matching

3D Photography using Context-aware Layered Depth Inpainting

Image Inpainting for Irregular Holes Using Partial Convolutions

Dance Revolution: Long Sequence Dance Generation with Music via Curriculum Learning

Reference-Based Sketch Image Colorization using Augmented-Self Reference and Dense Semantic Correspondence

In-Home Daily-Life Captioning Using Radio Signals

4D Visualization of Dynamic Events from Unconstrained Multi-View Videos

RoomShift: Room-scale Dynamic Haptics for VR with Furniture-moving Swarm Robots

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

Computational Design and Optimization of Non-Circular Gears

VIBE: Video Inference for Human Body Pose and Shape Estimation

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Recapture as You Want

Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting

OralCam: Enabling Self-Examination and Awareness of Oral Health Using a Smartphone Camera

XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera

Fabriccio: Touchless Gestural Input on Interactive Fabrics

FitByte: Automatic Diet Monitoring in Unconstrained Situations Using Multimodal Sensing on Eyeglasses

Vid2Player: Controllable Video Sprites that Behave and Appear like Professional Tennis Players

Learning Agile Robotic Locomotion Skills by Imitating Animals

Consistent Video Depth Estimation

Breaking the cycle - Colleagues are all you need

Efficient Neural Audio Synthesis

Ear2Face: Deep Biometric Modality Mapping

Learning to Shadow Hand-drawn Sketches

A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild

High-resolution Piano Transcription with Pedals by Regressing Onsets and Offsets Times

SynSin: End-to-end View Synthesis from a Single Image

Talking Head Anime from a Single Image

High-Fidelity Synthesis with Disentangled Representation

Beyond English-Centric Multilingual Machine Translation