AI-related (2)
Learning to See Through Obstructions
See-Through Captions: Real-Time Captioning on Transparent Display for Deaf and Hard-of-Hearing People
Estimation of continuous valence and arousal levels from faces in naturalistic conditions
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
C-Space Tunnel Discovery for Puzzle Path Planning
Filter Style Transfer between Photos
Dynamic facial asset and rig generation from a single scan
SpeedNet: Learning the Speediness in Videos
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
Monocular Real-Time Volumetric Performance Capture
Controlling Style and Semantics in Weakly-Supervised Image Generation
Learned Motion Matching
Neural Light Transport for Relighting and View Synthesis
Full-Body Awareness from Partial Observations
Complementary Dynamics
Non-Local Musical Statistics as Guides for Audio-to-Score Piano Transcription
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
Multimodal Humor Dataset: Predicting Laughter tracks for Sitcoms
BARF: Bundle-Adjusting Neural Radiance Fields
Animating Pictures with Eulerian Motion Fields
End-to-End Object Detection with Transformers (DETR)
DDPM - Diffusion Models Beat GANs on Image Synthesis
DDPM - Denoising Diffusion Probabilistic Models
XCiT: Cross-Covariance Image Transformers
Involution: Inverting the Inherence of Convolution for Visual Recognition
Alias-Free Generative Adversarial Networks
MakeItTalk: Speaker-Aware Talking-Head Animation
TeethTap: Recognizing Discrete Teeth Gestures Using Motion and Acoustic Sensing on an Earpiece
Transferring Dense Pose to Proximal Animal Classes
A Simple Framework for Contrastive Learning of Visual Representations (self-supervised learning)
Whole-Body Human Pose Estimation in the Wild
Zero-Shot Text-to-Image Generation (DALL·E)
Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation
Fake It Till You Make It: Face analysis in the wild using synthetic data alone
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing
LIFE: Lighting Invariant Flow Estimation
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Generating Visual Explanations
High-Fidelity Neural Human Motion Transfer from Monocular Video
ADOP: Approximate Differentiable One-Pixel Point Rendering
Vectorization of Raster Manga by Deep Reinforcement Learning
Where's Swimmy?: Mining unique color features buried in galaxies by deep anomaly detection using Subaru Hyper Suprime-Cam data
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
An Empirical Study of Training Self-Supervised Vision Transformers
VinVL: Revisiting Visual Representations in Vision-Language Models
FlexConv: Continuous Kernel Convolutions with Differentiable Kernel Sizes
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
AST: Audio Spectrogram Transformer
SSAST: Self-Supervised Audio Spectrogram Transformer
Unifying Multimodal Transformer for Bi-directional Image and Text Generation
Learning in High Dimension Always Amounts to Extrapolation
Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction
mixup: Beyond Empirical Risk Minimization
Deep Hough-Transform Line Priors
Self-Supervised Monocular Depth Estimation with Internal Feature Fusion
Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation
Deep Neural Networks as Gaussian Processes