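Edited by: CVMart (极市平台)

CVPR 2023 decisions are out: 2,360 papers were accepted this year, an acceptance rate of 25.78%. Ahead of the main conference, CVMart is tracking the latest CVPR 2023 papers to help readers reach new computer vision research sooner. The coverage includes papers and code organized by research area, plus live technical talks on selected papers.

The by-topic CVPR 2023 paper collection is updated continuously in the CVMart community and has reached 381 papers so far. Project page: https://www.cvmart.net/community/detail/7422

Below are the most recently added CVPR 2023 papers. They cover detection, segmentation, faces, video processing, medical imaging, neural network architecture, multimodal learning, few-shot learning, and more.

Download: https://www.cvmart.net/community/detail/7454

Contents
- Detection
- Segmentation
- Video Processing
- Estimation
- Image Processing
- Face
- Object Tracking
- Image & Video Retrieval / Video Understanding
- Medical Imaging
- GAN / Generative / Adversarial
- Image Generation / Image Synthesis
- 3D Vision
- Neural Network Architecture Design
- Data Processing
- Model Training / Generalization
- Image Feature Extraction and Matching
- Visual Representation Learning
- Model Evaluation
- Multimodal Learning
- Vision-based Prediction
- Datasets
- Few-shot Learning / Zero-shot Learning
- Continual Learning
- Transfer Learning / Domain Adaptation
- Scene Graph
- Visual Localization / Pose Estimation
- Visual Reasoning / VQA
- Contrastive Learning
- Reinforcement Learning
- Robotics
- Semi-supervised / Weakly Supervised / Unsupervised / Self-supervised Learning
- Other

Detection
2D Object Detection
[1] Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
paper: https://arxiv.org/abs/2303.05892

3D Object Detection
[1] Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection
paper: https://arxiv.org/abs/2303.05886
[2] PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
paper: https://arxiv.org/abs/2303.08129
code: https://github.com/blvlab/pimae
[3] MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
paper: https://arxiv.org/abs/2303.08316
[4] CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
paper: https://arxiv.org/abs/2303.10209
code: https://github.com/PaddlePaddle/Paddle3D
[5] Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency
paper: https://arxiv.org/abs/2303.08686
[6] AeDet: Azimuth-invariant Multi-view 3D Object Detection
paper: https://arxiv.org/abs/2211.12501
code: https://github.com/fcjian/AeDet

Anomaly Detection
[1] DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection
paper: https://arxiv.org/abs/2211.11317

Segmentation
Panoptic Segmentation
[1] UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration
paper: https://arxiv.org/abs/2206.15083

Semantic Segmentation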
[1] MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving
paper: https://arxiv.org/abs/2303.08600
code: https://github.com/jialeli1/lidarseg3d
[2] Side Adapter Network for Open-Vocabulary Semantic Segmentation
paper: https://arxiv.org/abs/2302.12242
code: https://github.com/mendelxu/san
[3] Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes
paper: https://arxiv.org/abs/2211.10206

Instance Segmentation
[1] FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
paper: https://arxiv.org/abs/2303.08594
[2] SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation
paper: https://arxiv.org/abs/2303.08578
code: https://github.com/lslrh/sim
[3] DynaMask: Dynamic Mask Selection for Instance Segmentation
paper: https://arxiv.org/abs/2303.07868
code: https://github.com/lslrh/dynamask

Video Object Segmentation
[1] MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
paper: https://arxiv.org/abs/2303.07815
[2] InstMove: Instance Motion for Object-centric Video Segmentation
paper: https://arxiv.org/abs/2303.08132
code: https://github.com/wjf5203/vnext
[3] Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
paper: https://arxiv.org/abs/2303.10100

Video Processing
[1] MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
paper: https://arxiv.org/abs/2303.07815
[2] InstMove: Instance Motion for Object-centric Video Segmentation
paper: https://arxiv.org/abs/2303.08132
code: https://github.com/wjf5203/vnext
[3] Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior
paper: https://arxiv.org/abs/2303.09757
code: https://github.com/jiaqixuac/map-net
[4] Blind Video Deflickering by Neural Filtering with a Flawed Atlas
paper: https://arxiv.org/abs/2303.08120
code: https://github.com/chenyanglei/all-in-one-deflicker

Video Generation / Video Synthesis
[1] 3D Cinemagraphy from a Single Image
paper: https://arxiv.org/abs/2303.05724
[2] VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
paper: https://arxiv.org/abs/2303.08320
code: https://github.com/modelscope/modelscope

Video Super-Resolution
[1] Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting
paper: https://arxiv.org/abs/2303.08331

Estimation
Optical Flow / Motion Estimation
[1] Rethinking Optical Flow from Geometric Matching Consistent Perspective
paper: https://arxiv.org/abs/2303.08384
code: https://github.com/dqiaole/matchflow

Depth Estimation
[1] Fully Self-Supervised Depth Estimation from Defocus Clue
paper: https://arxiv.org/abs/2303.10752
code: https://github.com/ehzoahis/dered

Human Parsing / Human Pose Estimation
[1] Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video
paper: https://arxiv.org/abs/2303.08475
[2] Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer
paper: https://arxiv.org/abs/2302.14338

Gesture Estimation
[1] CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment
paper: https://arxiv.org/abs/2303.05725

Image Processing
[1] DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
paper: https://arxiv.org/abs/2303.06285
code: https://github.com/yueming6568/deltaedit

Image Restoration / Image Enhancement / Image Reconstruction
[1] Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank
paper: https://arxiv.org/abs/2303.09101
code: https://github.com/huang-shirui/semi-uir
[2] ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction
paper: https://arxiv.org/abs/2303.05938
code: https://github.com/zhengdiyu/arbitrary-hands-3d-reconstruction

Style Transfer
[1] StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields
paper: https://arxiv.org/abs/2303.10598
[2] Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN
paper: https://arxiv.org/abs/2204.14079
code: https://github.com/LeeDongYeun/FixNoise

Face
Face Recognition / Detection
[1] Local Region Perception and Relationship Learning Combined with Feature Fusion for Facial Action Unit Detection
paper: https://arxiv.org/abs/2303.08545
[2] Multi Modal Facial Expression Recognition with Transformer-Based Fusion Networks and Dynamic Sampling
paper: https://arxiv.org/abs/2303.08419

Face Generation / Synthesis / Reconstruction / Editing
[1] Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation
paper: https://arxiv.org/abs/2106.09614
code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA

Object Tracking
[1] MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking
paper: https://arxiv.org/abs/2303.10404
[2] Visual Prompt Multi-Modal Tracking
paper: https://arxiv.org/abs/2303.10826
code: https://github.com/jiawen-zhu/vipt

Image & Video Retrieval / Video Understanding
[1] Data-Free Sketch-Based Image Retrieval
paper: https://arxiv.org/abs/2303.07775
[2] DAA: A Delta Age AdaIN operation for age estimation via binary code transformer
paper: https://arxiv.org/abs/2303.07929
[3] Dual-path Adaptation from Image to Video Transformers
paper: https://arxiv.org/abs/2303.09857
code: https://github.com/park-jungin/dualpath

Image / Video Captioning
[1] Dual-Stream Transformer for Generic Event Boundary Captioning
paper: https://arxiv.org/abs/2207.03038
code: https://github.com/gx77/dual-stream-transformer-for-generic-event-boundary-captioning

Action / Activity Recognition, Detection, Segmentation, and Localization
[1] Video Test-Time Adaptation for Action Recognition
paper: https://arxiv.org/abs/2211.15393

Person Re-Identification / Detection
[1] TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
paper: https://arxiv.org/abs/2303.06819
code: https://github.com/kali-hac/transg

Medical Imaging
[1] Neuron Structure Modeling for Generalizable Remote Physiological Measurement
paper: https://arxiv.org/abs/2303.05955
code: https://github.com/lupaopao/nest
[2] Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses
paper: https://arxiv.org/abs/2303.08364
code: https://github.com/junbongjang/contour-tracking
[3] Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification
paper: https://arxiv.org/abs/2303.08446

GAN / Generative / Adversarial
[1] Graph Transformer GANs for Graph-Constrained House Generation
paper: https://arxiv.org/abs/2303.08225
[2] Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models
paper: https://arxiv.org/abs/2303.10774

Image Generation / Image Synthesis
[1] 3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
paper: https://arxiv.org/abs/2303.10406
code: https://github.com/colorful-liyu/3dqd
[2] A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
paper: https://arxiv.org/abs/2303.09875
code: https://github.com/megvii-research/CVPR2023-DMVFN
[3] Regularized Vector Quantization for Tokenized Image Synthesis
paper: https://arxiv.org/abs/2303.06424

3D Vision
Point Cloud
[1] Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
paper: https://arxiv.org/abs/2303.07938
[2] Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
paper: https://arxiv.org/abs/2303.08134
code: https://github.com/zrrskywalker/point-nn
[3] Rotation-Invariant Transformer for Point Cloud Matching
paper: https://arxiv.org/abs/2303.08231
[4] Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration
paper: https://arxiv.org/abs/2303.09950
code: https://github.com/qinzheng93/graphscnet

3D Reconstruction
[1] Masked Wavelet Representation for Compact Neural Radiance Fields
paper: https://arxiv.org/abs/2212.09069
[2] Decoupling Human and Camera Motion from Videos in the Wild
paper: https://arxiv.org/abs/2302.12827
[3] Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
paper: https://arxiv.org/abs/2303.05937
[4] NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images
paper: https://arxiv.org/abs/2303.07653
[5] PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision
paper: https://arxiv.org/abs/2303.09554
[6] SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
paper: https://arxiv.org/abs/2212.04493
code: https://github.com/yccyenchicheng/SDFusion

Scene Reconstruction / View Synthesis / Novel View Synthesis
[1] Robust Dynamic Radiance Fields
paper: https://arxiv.org/abs/2301.02239
[2] I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs
paper: https://arxiv.org/abs/2303.07634
[3] MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
paper: https://arxiv.org/abs/2208.00277
code: https://github.com/google-research/jax3d

Neural Network Architecture Design
[1] LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
paper: https://arxiv.org/abs/2206.10555
code: https://github.com/dvlab-research/largekernel3d

CNN
[1] Randomized Adversarial Training via Taylor Expansion
paper: https://arxiv.org/abs/2303.10653
code: https://github.com/alexkael/randomized-adversarial-training
[2] Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations
paper: https://arxiv.org/abs/2303.08085
code: https://github.com/hmichaeli/alias_free_convnets

Transformer
[1] BiFormer: Vision Transformer with Bi-Level Routing Attention
paper: https://arxiv.org/abs/2303.08810
code: https://github.com/rayleizhu/biformer
[2] Making Vision Transformers Efficient from A Token Sparsification View
paper: https://arxiv.org/abs/2303.08685

Graph Neural Networks (GNN)
[1] Turning Strengths into Weaknesses: A Certified Robustness Inspired Attack Framework against Graph Neural Networks
paper: https://arxiv.org/abs/2303.06199

Data Processing
[1] TINC: Tree-structured Implicit Neural Compression
paper: https://arxiv.org/abs/2211.06689
code: https://github.com/richealyoung/tinc

Image Clustering
[1] On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering
paper: https://arxiv.org/abs/2303.09877
code: https://github.com/danieltrosten/deepmvc

Model Training / Generalization
[1] HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
paper: https://arxiv.org/abs/2303.05675
[2] Universal Instance Perception as Object Discovery and Retrieval
paper: https://arxiv.org/abs/2303.06674
code: https://github.com/MasterBin-IIAU/UNINEXT
[3] Sharpness-Aware Gradient Matching for Domain Generalization
paper: https://arxiv.org/abs/2303.10353
code: https://github.com/wang-pengfei/sagm

Image Feature Extraction and Matching
[1] Iterative Geometry Encoding Volume for Stereo Matching
paper: https://arxiv.org/abs/2303.06615
code: https://github.com/gangweix/igev
[2] Referring Image Matting
paper: https://arxiv.org/abs/2206.05149
code: https://github.com/jizhizili/rim

Visual Representation Learning
[1] MARLIN: Masked Autoencoder for facial video Representation LearnINg
paper: https://arxiv.org/abs/2211.06627
code: https://github.com/ControlNet/MARLIN

Model Evaluation
[1] TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
paper: https://arxiv.org/abs/2303.05762
code: https://github.com/chenweixin107/trojdiff

Multimodal Learning
[1] Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos
paper: https://arxiv.org/abs/2303.10421
code: https://github.com/xkwangcn/abaw-5th-rt-iai
[2] Emotional Reaction Intensity Estimation Based on Multimodal Data
paper: https://arxiv.org/abs/2303.09167
[3] Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers
paper: https://arxiv.org/abs/2303.09164
[4] Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
paper: https://arxiv.org/abs/2303.05952

Audio-Visual Learning
[1] Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
paper: https://arxiv.org/abs/2303.08536
code: https://github.com/joannahong/av-relscore
[2] CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
paper: https://arxiv.org/abs/2303.06357

Vision-Language
[1] Lana: A Language-Capable Navigator for Instruction Following and Generation
paper: https://arxiv.org/abs/2303.08409
code: https://github.com/wxh1996/lana-vln

Vision-based Prediction
[1] TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving
paper: https://arxiv.org/abs/2303.09998

Datasets
[1] A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
paper: https://arxiv.org/abs/2212.04825
code: https://github.com/facebookresearch/Whac-A-Mole
[2] MVImgNet: A Large-scale Dataset of Multi-view Images
paper: https://arxiv.org/abs/2303.06042
[3] SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments
paper: https://arxiv.org/abs/2303.09095
code: https://github.com/climbingdaily/SLOPER4D

Few-shot Learning / Zero-shot Learning
[1] DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
paper: https://arxiv.org/abs/2303.09674
code: https://github.com/phoenix-v/digeo
[2] Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings
paper: https://arxiv.org/abs/2303.09352
code: https://github.com/uitml/nohub
[3] Bi-directional Distribution Alignment for Transductive Zero-Shot Learning
paper: https://arxiv.org/abs/2303.08698
code: https://github.com/zhicaiwww/bi-vaegan

Continual Learning / Lifelong Learning
[1] Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
paper: https://arxiv.org/abs/2303.09483
code: https://github.com/kim-sanghwan/ancl

Transfer Learning / Domain Adaptation
[1] Trainable Projected Gradient Method for Robust Fine-tuning
paper: https://arxiv.org/abs/2303.10720
[2] DA-DETR: Domain Adaptive Detection Transformer with Information Fusion
paper: https://arxiv.org/abs/2103.17084
[3] Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection
paper: https://arxiv.org/abs/2203.15793
code: https://github.com/vibashan/irg-sfda

Scene Graph
Scene Graph Understanding
[1] PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
paper: https://arxiv.org/abs/2211.16312
code: https://github.com/cvmi-lab/pla

Visual Localization / Pose Estimation
[1] PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers
paper: https://arxiv.org/abs/2303.09187
[2] StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition
paper: https://arxiv.org/abs/2212.00937

Visual Reasoning / VQA
[1] Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
paper: https://arxiv.org/abs/2303.10482
code: https://github.com/szzexpoi/poem
[2] Generative Bias for Robust Visual Question Answering
paper: https://arxiv.org/abs/2208.00690

Contrastive Learning
[1] Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
paper: https://arxiv.org/abs/2303.10323
code: https://github.com/mlii0117/dcl

Reinforcement Learning
[1] EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
paper: https://arxiv.org/abs/2303.10876
code: https://github.com/mediabrain-sjtu/eqmotion

Robotics
[1] Efficient Map Sparsification Based on 2D and 3D Discretized Grids
paper: https://arxiv.org/abs/2303.10882

Semi-supervised / Weakly Supervised / Unsupervised / Self-supervised Learning
[1] Extracting Class Activation Maps from Non-Discriminative Features as well
paper: https://arxiv.org/abs/2303.10334
code: https://github.com/zhaozhengchen/lpcam
[2] TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation
paper: https://arxiv.org/abs/2303.09870
code: https://github.com/devavrattomar/tesla
[3] LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
paper: https://arxiv.org/abs/2303.09665
[4] MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection
paper: https://arxiv.org/abs/2303.09061
code: https://github.com/lliuz/mixteacher
[5] Semi-supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination
paper: https://arxiv.org/abs/2303.06380
[6] Non-Contrastive Unsupervised Learning of Physiological Signals from Video
paper: https://arxiv.org/abs/2303.07944

Other
[1] Facial Affective Analysis based on MAE and Multi-modal Information for 5th ABAW Competition
paper: https://arxiv.org/abs/2303.10849
[2] Partial Network Cloning
paper: https://arxiv.org/abs/2303.10597
code: https://github.com/jngwenye/pncloning
[3] Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection
paper: https://arxiv.org/abs/2303.10449
code: https://github.com/lufan31/et-ood
[4] Adversarial Counterfactual Visual Explanations
paper: https://arxiv.org/abs/2303.09962
code: https://github.com/guillaumejs2403/ace
[5] A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
paper: https://arxiv.org/abs/2303.09165
code: https://github.com/huitangtang/on_the_utility_of_synthetic_data
[6] Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
paper: https://arxiv.org/abs/2303.09119
code: https://github.com/advocate99/diffgesture
[7] Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry
paper: https://arxiv.org/abs/2303.08658
code: https://github.com/kebii/r2et
[8] Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations
paper: https://arxiv.org/abs/2202.04235
code: https://github.com/twweeb/composite-adv
[9] Backdoor Defense via Deconfounded Representation Learning
paper: https://arxiv.org/abs/2303.06818
code: https://github.com/zaixizhang/cbd
[10] Label Information Bottleneck for Label Enhancement
paper: https://arxiv.org/abs/2303.06836
[11] LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
paper: https://arxiv.org/abs/2303.08137
code: https://github.com/CyberAgentAILab/layout-dm
[12] Diversity-Aware Meta Visual Prompting
paper: https://arxiv.org/abs/2303.08138
code: https://github.com/shikiw/dam-vp