This paper proposes Diffusion-Edit (DiffEdit), an automated diffusion-based framework for reference-driven semantic image editing. It simplifies background removal and object replacement by automatically generating shape masks and using reference images as guidance. Traditional editing methods often require tedious manual adjustment or multiple specialized models; in contrast, DiffEdit leverages a single pre-trained diffusion model to produce high-fidelity, context-aware edits. The framework automatically removes target regions (e.g., backgrounds or objects) by deriving a precise shape mask from diffusion-driven discrepancy maps, requiring no user annotation. It then integrates reference images into the masked regions through cross-attention and context-aware conditioning, ensuring that the inserted content is semantically and stylistically consistent with the original scene. Across our evaluation sets, DiffEdit attains the highest QS (editing quality), the highest CLIP score (reference alignment), and the lowest FID (realism) among the compared methods, indicating consistent gains in realism, coherence, and controllability for object replacement. We designed the framework to extend to background correction and texture editing; we leave systematic validation of these tasks to future work. The method not only improves the efficiency of semantic image editing but also enables adaptive, user-guided generative modeling.
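To make the mask-generation step concrete, the sketch below shows one minimal way a diffusion-driven discrepancy map could be computed: the input latents are forward-diffused, the denoiser's noise predictions under a source conditioning and a reference conditioning are compared, and the averaged disagreement is thresholded into a binary shape mask. The interface `eps_model(x_t, t, cond)` and the `scheduler` methods are hypothetical stand-ins for whatever denoiser and noise schedule are in use, not the paper's actual API; the default noise level and threshold are illustrative.

```python
import torch

@torch.no_grad()
def estimate_edit_mask(x0, eps_model, cond_src, cond_ref, scheduler,
                       t_frac=0.5, n_samples=8, threshold=0.5):
    # Noise level at which the two conditionings are compared (mid-schedule by default).
    t = torch.tensor(int(t_frac * scheduler.num_train_timesteps))
    diff = torch.zeros_like(x0)
    for _ in range(n_samples):
        noise = torch.randn_like(x0)
        x_t = scheduler.add_noise(x0, noise, t)          # forward-diffuse the input latents
        eps_src = eps_model(x_t, t, cond_src)            # noise prediction, source condition
        eps_ref = eps_model(x_t, t, cond_ref)            # noise prediction, reference condition
        diff += (eps_src - eps_ref).abs()                # accumulate per-pixel disagreement
    diff = diff.mean(dim=1, keepdim=True) / n_samples    # average over channels and samples
    diff = (diff - diff.min()) / (diff.max() - diff.min() + 1e-8)  # normalize to [0, 1]
    return (diff > threshold).float()                    # binary shape mask
```

Averaging over several noise samples smooths out spurious disagreement, so the threshold selects regions where the two conditionings consistently diverge.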
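The reference-guided infilling step can likewise be sketched as masked denoising: the sampler runs conditioned on reference features (the embeddings fed to the model's cross-attention layers), and at each step the region outside the mask is overwritten with the original latents noised to the current timestep, so the unedited scene is preserved. Again, `eps_model`, `ref_embed`, and the scheduler interface are assumptions for illustration, not the paper's implementation.

```python
import torch

@torch.no_grad()
def masked_reference_edit(x0, mask, eps_model, ref_embed, scheduler, num_steps=50):
    scheduler.set_timesteps(num_steps)
    x_t = torch.randn_like(x0)                           # start the masked region from noise
    for t in scheduler.timesteps:
        eps = eps_model(x_t, t, ref_embed)               # reference-conditioned prediction
        x_t = scheduler.step(eps, t, x_t).prev_sample    # one reverse-diffusion step
        noise = torch.randn_like(x0)
        x_known = scheduler.add_noise(x0, noise, t)      # original content at this noise level
        x_t = mask * x_t + (1 - mask) * x_known          # edit only inside the mask
    return mask * x_t + (1 - mask) * x0                  # restore exact background at the end
```

Blending with latents noised to the current step is the standard approximation used in masked-diffusion inpainting; the final line restores the untouched background exactly.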