DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

1Tencent, 2Xiamen University

[Paper]      [Code]      [Dataset]

TL;DR: DiffuMatting is a versatile model capable of generating any object together with a high-precision matting-level annotation, making it a valuable tool for a wide range of downstream tasks.

Abstract

Because obtaining highly accurate matting annotations is difficult and labor-intensive, only a limited number of such labels are publicly available. To tackle this challenge, we propose DiffuMatting, which inherits the strong "generate everything" ability of diffusion models and endows it with the power of "matting anything". DiffuMatting can (1) act as an anything-matting factory with highly accurate annotations and (2) work well with community LoRAs and various conditional-control approaches, enabling community-friendly art design and controllable generation. Specifically, inspired by green-screen matting, we teach the diffusion model to paint on a fixed green-screen canvas. To this end, a large-scale green-screen dataset (Green100K) is collected as the training data for DiffuMatting. A green-background control loss is then proposed to keep the drawing board a pure green color so that foreground and background can be distinguished. To ensure that synthesized objects have richer edge details, a detailed-enhancement loss on the transition boundary is proposed as a guideline for generating objects with more complicated edge structures. To generate the object and its matting annotation simultaneously, we build a matting head that performs green-color removal in the latent space of the VAE decoder. DiffuMatting shows several potential applications (e.g., matting-data generation, community-friendly art design, and controllable generation). As a matting-data generator, DiffuMatting synthesizes general-object and portrait matting sets, reducing the relative MSE error by 15.4% on General Object Matting and 11.4% on Portrait Matting tasks.
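
To make the two losses above concrete, here is a minimal, illustrative PyTorch sketch of one plausible formulation: a green-background control term that penalizes deviation from pure green in background regions, and a transition-boundary term that re-weights matte errors inside a band around object edges. The pure-green target, the alpha-based background masking, and the morphological band approximation are assumptions for illustration; the paper's exact formulations may differ.

```python
import torch
import torch.nn.functional as F

# Hypothetical pure-green target in normalized RGB; the paper's exact color
# space and loss weighting are not specified here.
GREEN = torch.tensor([0.0, 1.0, 0.0]).view(1, 3, 1, 1)

def green_background_control_loss(pred_rgb, alpha_gt):
    """Penalize any deviation from pure green outside the object.

    pred_rgb: (B, 3, H, W) decoded image in [0, 1]
    alpha_gt: (B, 1, H, W) ground-truth matte, 1 = foreground, 0 = background
    """
    bg_mask = (alpha_gt < 0.01).float()                 # background pixels
    diff = (pred_rgb - GREEN.to(pred_rgb)) ** 2
    return (diff * bg_mask).sum() / (3.0 * bg_mask.sum() + 1e-6)

def transition_boundary_loss(pred_alpha, gt_alpha, kernel_size=15):
    """Re-weight matte errors inside the soft transition band around edges.

    The band is approximated by dilating and eroding the ground-truth matte
    with max-pooling and keeping pixels that are neither pure foreground
    nor pure background.
    """
    pad = kernel_size // 2
    dilated = F.max_pool2d(gt_alpha, kernel_size, stride=1, padding=pad)
    eroded = -F.max_pool2d(-gt_alpha, kernel_size, stride=1, padding=pad)
    band = ((dilated > 0.05) & (eroded < 0.95)).float()
    err = F.l1_loss(pred_alpha, gt_alpha, reduction='none')
    return (err * band).sum() / (band.sum() + 1e-6)
```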

Method

An overview of our DiffuMatting network. DiffuMatting mainly consists of Green100K data collection and captioning; green-screen detailed-object synthesis, assisted by the green-background control loss and the detailed-enhancement loss on the transition boundary; and matting-level annotation refinement via a matting head in the VAE latent space, constrained by the latent VAE loss and GreenPost. For more details, please refer to our paper.
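
As a rough illustration of the matting head mentioned above, the sketch below shows how a small convolutional head could map VAE decoder features to a single-channel alpha matte. The feature dimension, depth, and layer choices are placeholders, not the released architecture.

```python
import torch
import torch.nn as nn

class MattingHead(nn.Module):
    """Illustrative matting head on top of VAE decoder features.

    DiffuMatting's head operates in the VAE decoder's latent space to remove
    the green color; the channel sizes and layers here are assumptions.
    """

    def __init__(self, in_channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(128, 32, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, 1, kernel_size=1),
            nn.Sigmoid(),  # alpha matte in [0, 1]
        )

    def forward(self, decoder_features):
        # decoder_features: (B, C, H, W) intermediate features from the VAE decoder
        return self.net(decoder_features)
```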

Comparisons

Visual comparison of our DiffuMatting on green-screen object generation against the SOTA Midjourney and SD-XL models, which have difficulty consistently generating objects on a pure green screen.

Comparison with other methods in three aspects: the purity of the generated green-screen background, the level of detail of the generated objects, and the accuracy of the generated annotations. As a recent work contemporaneous with ours, LayerDiffuse [Lvmin Zhang] shows powerful functionality. Extremely hard cases (e.g., a net) are shown in the figure, indicating the superiority of our DiffuMatting on objects with delicate and complex structures.

Applications

Image composition: generate, copy, and paste; foreground-conditioned generation; complex-prompt generation; same ID with different positions and poses; controllable text-to-image generation combined with ControlNet; application to community LoRAs. Art design: blue-and-white porcelain texture style with an alpha channel, generated by DiffuMatting.
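
The generate-copy-and-paste composition above boils down to standard alpha compositing with the matte that DiffuMatting produces alongside each object. A minimal sketch (the function name and array conventions are our own, not part of the released code):

```python
import numpy as np

def composite(foreground, alpha, background):
    """Standard alpha compositing: out = alpha * fg + (1 - alpha) * bg.

    foreground, background: (H, W, 3) float arrays in [0, 1]
    alpha: (H, W) matte in [0, 1], e.g. the annotation produced by DiffuMatting
    """
    a = alpha[..., None]                      # broadcast matte over RGB channels
    return a * foreground + (1.0 - a) * background
```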

Blue-screen objects with annotations generated by our DiffuMatting, overcoming the failure cases of green-screen generation when the foreground itself is green (e.g., a girl with a green shirt).

Downstream Matting Task.

Dataset

BibTeX

@article{hu2024diffumatting,
  title={DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation},
  author={Hu, Xiaobin and Peng, Xu and Luo, Donghao and Ji, Xiaozhong and Peng, Jinlong and Jiang, Zhengkai and Zhang, Jiangning and Jin, Taisong and Wang, Chengjie and Ji, Rongrong},
  journal={arXiv preprint arXiv:2403.06168},
  year={2024}
}