Segment Anything

Track Anything: Segment Anything Meets Videos

ImageBind: One Embedding Space To Bind Them All

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion

Human Motion Diffusion Model

FLAME: Free-form Language-based Motion Synthesis & Editing

Adding Conditional Control to Text-to-Image Diffusion Models

HuMoR: 3D Human Motion Model for Robust Pose Estimation

3D Human Mesh Estimation from Virtual Markers

LoRA: Low-Rank Adaptation of Large Language Models

QLoRA: Efficient Finetuning of Quantized LLMs

Gorilla: Large Language Model Connected with Massive APIs

High-resolution image reconstruction with latent diffusion models from human brain activity

Vision GNN: An Image is Worth Graph of Nodes

Segment Anything in High Quality

FasterViT: Fast Vision Transformers with Hierarchical Attention

LLMZip: Lossless Text Compression using Large Language Models

Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting

Time-varying Signals Recovery via Graph Neural Networks

Segment Anything Meets Point Tracking

Color Diffusion: Colorizing Black and White Images with Diffusion Models

Distinguishing ChatGPT(-3.5, -4)-generated and human-written papers through Japanese stylometric analysis

FacTool: Factuality Detection in Generative AI

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer (MiDaS)

Sketching the Future (STF): Applying Conditional Control Techniques to Text-to-Video Models

SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Depthformer: Multiscale Vision Transformer for Monocular Depth Estimation with Local Global Information Fusion

U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

ProPainter: Improving Propagation and Transformer for Video Inpainting

Invariant Feature Regularization for Fair Face Recognition

D3GA - Drivable 3D Gaussian Avatars

3D Gaussian Splatting for Real-Time Radiance Field Rendering

A Self-Supervised Algorithm for Denoising Photoplethysmography Signals for Heart Rate Estimation from Wearables

EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM

Faster Segment Anything: Towards Lightweight SAM for Mobile Applications