EditGarment: An Instruction-Based Garment Editing Dataset Constructed with Automated MLLM Synthesis and Semantic-Aware Evaluation

Deqiang Yin¹*, Junyi Guo¹*, Huanda Lu², Fangyu Wu¹†, Dongming Lu³

¹Xi’an Jiaotong-Liverpool University, ²NingboTech University, ³Zhejiang University
ACM MM 2025
*Equal Contribution, †Corresponding Author

Abstract

Instruction-based garment editing enables precise image modifications via natural language, with broad applications in fashion design and customization. Unlike general editing tasks, it requires understanding garment-specific semantics and attribute dependencies. However, progress is limited by the scarcity of high-quality instruction–image pairs, as manual annotation is costly and hard to scale. While MLLMs have shown promise in automated data synthesis, their application to garment editing is constrained by imprecise instruction modeling and a lack of fashion-specific supervisory signals. To address these challenges, we present an automated pipeline for constructing a garment editing dataset. We first define six editing instruction categories aligned with real-world fashion workflows to guide the generation of balanced and diverse instruction–image triplets. Second, we introduce Fashion Edit Score, a semantic-aware evaluation metric that captures semantic dependencies between garment attributes and provides reliable supervision during construction. Using this pipeline, we construct a total of 52,257 candidate triplets and retain 20,596 high-quality triplets to build EditGarment, the first instruction-based dataset tailored to standalone garment editing.

EditGarment Construction Pipeline

Pipeline Overview. Our dataset construction pipeline has two parts. In part (a), we apply Qwen-VL to generate editing triplets for each of our predefined editing categories. In part (b), we use a dependency graph structure to quantitatively evaluate the quality of the generated data; the semantic dependency graph shown in part (b) visualizes this structure. Triplets that score above a predefined threshold are collected into the EditGarment dataset.
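To make the score-then-filter step concrete, the following is a minimal sketch of dependency-aware scoring followed by threshold filtering. It is not the authors' Fashion Edit Score, whose exact definition is not given on this page. The `Check` structure, the rule that a check only counts when all of its parent checks also count, the check names, and the threshold value are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Check:
    """One yes/no question about an edited garment (hypothetical structure)."""
    name: str
    passed: bool  # verdict from some evaluator; assumed to be given
    parents: list[str] = field(default_factory=list)  # checks this one depends on


def dependency_score(checks: list[Check]) -> float:
    """Fraction of checks that pass and whose ancestors all pass.

    Assumes the dependency graph is acyclic and every parent name exists.
    """
    by_name = {c.name: c for c in checks}
    memo: dict[str, bool] = {}

    def effective(name: str) -> bool:
        # A check counts only if it passed and every parent counts too.
        if name not in memo:
            c = by_name[name]
            memo[name] = c.passed and all(effective(p) for p in c.parents)
        return memo[name]

    if not checks:
        return 0.0
    return sum(effective(c.name) for c in checks) / len(checks)


def filter_triplets(scored: list[tuple[str, float]], threshold: float) -> list[str]:
    """Keep the triplet ids whose score is strictly above the threshold."""
    return [triplet_id for triplet_id, score in scored if score > threshold]


# Hypothetical checks for an instruction such as "make the sleeves long and red".
checks = [
    Check("garment_preserved", True),
    Check("sleeve_present", True, ["garment_preserved"]),
    Check("sleeve_is_long", False, ["sleeve_present"]),
    Check("sleeve_is_red", True, ["sleeve_is_long"]),  # discounted: parent failed
]
print(dependency_score(checks))  # 0.5

# Hypothetical scores and threshold; only "t1" is retained.
print(filter_triplets([("t1", 0.9), ("t2", 0.5)], threshold=0.7))  # ['t1']
```

In this sketch the graph matters because a dependent attribute (the sleeve colour) is not credited when the attribute it depends on (the sleeve length) is wrong, so a flat average of the four checks would overstate the edit quality.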

Qualitative Comparison

Qualitative comparison between pre-trained InstructPix2Pix (IP2P) and IP2P fine-tuned on our dataset (on both the complete dataset and the version without evaluation) across the six editing categories. Green boxes indicate sub-optimal editing results, and red boxes indicate accurate editing results.

BibTeX

@misc{yin2025editgarmentinstructionbasedgarmentediting,
  title={EditGarment: An Instruction-Based Garment Editing Dataset Constructed with Automated MLLM Synthesis and Semantic-Aware Evaluation},
  author={Deqiang Yin and Junyi Guo and Huanda Lu and Fangyu Wu and Dongming Lu},
  year={2025},
  eprint={2508.03497},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.03497},
}