EditGarment: An Instruction-Based Garment Editing Dataset Constructed with Automated MLLM Synthesis and Semantic-Aware Evaluation

Deqiang Yin1*, Junyi Guo1*, Huanda Lu2, Fangyu Wu1, Dongming Lu3
1Xi’an Jiaotong-Liverpool University 2NingboTech University 3Zhejiang University
ACM MM 2025

*Equal Contribution Corresponding Author
Example images and edit instructions from EditGarment

Abstract

Instruction-based garment editing enables precise image modifications via natural language, with broad applications in fashion design and customization. Unlike general editing tasks, it requires understanding garment-specific semantics and attribute dependencies. However, progress is limited by the scarcity of high-quality instruction–image pairs, as manual annotation is costly and hard to scale. While MLLMs have shown promise in automated data synthesis, their application to garment editing is constrained by imprecise instruction modeling and a lack of fashion-specific supervisory signals. To address these challenges, we present an automated pipeline for constructing a garment editing dataset. We first define six editing instruction categories aligned with real-world fashion workflows to guide the generation of balanced and diverse instruction–image triplets. Second, we introduce Fashion Edit Score, a semantic-aware evaluation metric that captures semantic dependencies between garment attributes and provides reliable supervision during construction. Using this pipeline, we construct a total of 52,257 candidate triplets and retain 20,596 high-quality triplets to build EditGarment, the first instruction-based dataset tailored to standalone garment editing.

EditGarment Construction Pipeline

Pipeline Overview. Our dataset construction pipeline has two parts. In part (a), we apply Qwen-VL to generate editing triplets for each of our predefined editing categories. In part (b), we use a dependency graph structure to quantitatively evaluate the quality of the generated data; the semantic dependency graph shown in part (b) visualizes this structure. Data that score above a predefined threshold are collected into our EditGarment dataset.
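The filtering step in part (b) can be illustrated with a minimal Python sketch. It is not the authors' Fashion Edit Score: the `Check` class, the `dependency_score` rule (a check counts only if it passes and all of its prerequisites count), the example attribute names, and the 0.5 threshold are all assumptions made for illustration. The graph is assumed to be acyclic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Check:
    """One node of a hypothetical dependency graph: a yes/no check on an edit."""
    name: str
    passed: bool
    depends_on: list[str] = field(default_factory=list)


def dependency_score(checks: list[Check]) -> float:
    """Fraction of checks that pass AND whose prerequisites all count as passed.

    Illustrative scoring rule only; assumes the dependency graph has no cycles.
    """
    if not checks:
        return 0.0
    by_name = {c.name: c for c in checks}
    memo: dict[str, bool] = {}

    def effective(name: str) -> bool:
        # A check counts only if it passed and every prerequisite counts too.
        if name not in memo:
            c = by_name[name]
            memo[name] = c.passed and all(effective(p) for p in c.depends_on)
        return memo[name]

    return sum(effective(c.name) for c in checks) / len(checks)


def filter_triplets(candidates, threshold: float):
    """Keep only candidates whose score is strictly above the threshold."""
    return [t for t, checks in candidates if dependency_score(checks) > threshold]


# Hypothetical example: "sleeve_is_long" is meaningless if no sleeve was detected.
good = [
    Check("garment_present", True),
    Check("sleeve_present", True, ["garment_present"]),
    Check("sleeve_is_long", True, ["sleeve_present"]),
]
bad = [
    Check("garment_present", True),
    Check("sleeve_present", False, ["garment_present"]),
    Check("sleeve_is_long", True, ["sleeve_present"]),  # discounted: parent failed
]
kept = filter_triplets([("triplet_a", good), ("triplet_b", bad)], threshold=0.5)
```

In this sketch, a check that nominally passes is discounted when the attribute it depends on failed, which is one simple way to make a score sensitive to dependencies between garment attributes.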

Qualitative Comparison

Qualitative comparison between the pre-trained IP2P and IP2P fine-tuned on our dataset (both the complete dataset and the version without evaluation) across six editing categories. Green boxes indicate sub-optimal editing results, and red boxes indicate accurate editing results.

BibTeX

@misc{yin2025editgarmentinstructionbasedgarmentediting,
      title={EditGarment: An Instruction-Based Garment Editing Dataset Constructed with Automated MLLM Synthesis and Semantic-Aware Evaluation}, 
      author={Deqiang Yin and Junyi Guo and Huanda Lu and Fangyu Wu and Dongming Lu},
      year={2025},
      eprint={2508.03497},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.03497}, 
}