
Leveraging Open Source for Automatic Background Removal in Photo Editing: A Step-by-Step Look at the Technical Approach

By Abhishek Kathuria
Updated April 29, 2024

[Figure: before and after images]

Imagine capturing a perfect family moment, only to find photobombers in the background. For photographers and content creators, unwanted objects in the background can spoil an otherwise perfect image. Traditionally, removing these objects has been a manual process, requiring meticulous selection tools and a steady hand. While tools like Google Pixel's Magic Eraser allow manual removal of unwanted objects, we look at open-source solutions that automate this process entirely.

In this article, we explore how to leverage open-source libraries for automatic background removal. The approach is powerful and cost-effective, and the technical methodology breaks down step by step as follows:

  1. Input Image:  The process begins with the image requiring background removal.
  2. Person Detection (Mask R-CNN):  At the heart of this step lies Mask R-CNN, an instance segmentation model. Rather than producing only bounding boxes, Mask R-CNN also predicts a pixel-wise mask for each detected object. This means it precisely outlines each person, providing a clear distinction between the people and the surrounding clutter.
  3. Depth Estimation (MiDaS):  Understanding the relative depth of objects within a scene is crucial for differentiating the foreground from the background. This is where MiDaS, a pre-trained monocular depth estimation model, comes into play. From a single image, MiDaS predicts a relative depth value for every pixel, producing a depth map. Tuning the threshold applied to this depth map helps separate foreground elements from background elements.
  4. Masking and Removal of Background People:  Armed with the precise segmentation masks from step 2 and the thresholded depth information from step 3, the pipeline can now distinguish between foreground and background elements. The background elements are then masked, creating a digital blueprint that isolates the foreground people from the rest of the image. Subsequently, the masked background objects are removed from the image.
  5. Inpainting Model (Samsung's LaMa):  The inpainting model is LaMa (Large Mask Inpainting), developed at Samsung AI Center. LaMa is a convolutional network built on fast Fourier convolutions, which give it an image-wide receptive field even in its early layers. It reconstructs the gaps left by object removal by analyzing the surrounding textures and patterns, then generates new pixels that blend with the adjacent areas. This wide context helps the reconstructed background look natural, as if the removed objects were never there.
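
The way steps 2 to 4 fit together can be sketched in a few lines of NumPy. This is a minimal illustration, not the exact implementation behind this article. It assumes the per-person masks and the depth map have already been produced by the models above, and that the depth map follows the MiDaS convention of relative inverse depth (larger values are closer to the camera). The function names and the rel_threshold parameter are hypothetical.

```python
import numpy as np


def select_background_people(person_masks, depth_map, rel_threshold=0.6):
    """Return a boolean mask covering people judged to be in the background.

    person_masks:  list of HxW boolean arrays, one per detected person.
    depth_map:     HxW float array of relative inverse depth
                   (assumed convention: larger value = closer to camera).
    rel_threshold: hypothetical tuning knob. A person whose median depth
                   value is below rel_threshold * (closest person's median)
                   is treated as background.
    """
    removal = np.zeros(depth_map.shape, dtype=bool)
    masks = [m for m in person_masks if m.any()]  # ignore empty detections
    if not masks:
        return removal
    # The median is less sensitive than the mean to mask pixels that
    # spill onto the scenery behind a person.
    medians = [float(np.median(depth_map[m])) for m in masks]
    closest = max(medians)
    for mask, med in zip(masks, medians):
        if med < rel_threshold * closest:
            removal |= mask
    return removal


def dilate(mask, radius=1):
    """Grow a boolean mask by `radius` pixels (4-neighbourhood) so the
    inpainting step also covers soft edges around each removed person."""
    out = mask.copy()
    for _ in range(radius):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]   # spread downwards
        grown[:-1, :] |= out[1:, :]   # spread upwards
        grown[:, 1:] |= out[:, :-1]   # spread right
        grown[:, :-1] |= out[:, 1:]   # spread left
        out = grown
    return out


# Tiny synthetic example: one near person (left) and one far person (right).
depth = np.array([[9.0, 9.0, 2.0, 2.0],
                  [9.0, 9.0, 2.0, 2.0],
                  [1.0, 1.0, 1.0, 1.0]])
near = np.zeros((3, 4), dtype=bool)
near[:2, :2] = True
far = np.zeros((3, 4), dtype=bool)
far[:2, 2:] = True

removal_mask = dilate(select_background_people([near, far], depth))
```

The resulting removal_mask is what would be handed to the inpainting model together with the original image. Comparing each person against the closest person, rather than against a fixed depth value, is one possible design choice: relative depth maps have no absolute scale, so a fixed cutoff would not carry over from one photo to the next.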

Open Source Advantages and Challenges

Leveraging open-source libraries for automatic background removal offers several advantages:

Cost-Effectiveness: By utilizing pre-trained models, this approach eliminates the need for expensive proprietary software and for training models from scratch.

Accessibility: Open-source libraries make this technology more accessible to a wider range of users, from hobbyist photographers to professional content creators.

Customization: The open-source nature allows for potential customization and adaptation to specific needs. Developers can fine-tune the models for various use cases.

While leveraging pre-trained, open-source models offers a cost-effective approach, it comes with its own set of considerations:

Computational Demands: The pipeline chains several deep learning models, which demands significant processing power. Utilizing a GPU can significantly reduce inference time, but resource limitations can be a challenge, especially for large image datasets.
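
To judge whether a given machine is fast enough, it helps to measure per-image latency before committing to a large batch. The helper below is a small sketch using only the Python standard library. The name time_inference and its parameters are hypothetical, and run_once stands for whatever callable wraps the pipeline.

```python
import statistics
import time


def time_inference(run_once, images, warmup=1):
    """Return (mean, worst) seconds per image for a callable pipeline step.

    run_once: callable taking one image. With GPU frameworks that execute
              asynchronously, it should block until the result is ready,
              otherwise the timings will be too optimistic.
    images:   non-empty sequence of inputs.
    warmup:   number of untimed calls made first, so one-off costs such as
              model loading do not distort the measurement.
    """
    if not images:
        raise ValueError("need at least one image to time")
    for img in images[:warmup]:
        run_once(img)
    timings = []
    for img in images:
        start = time.perf_counter()
        run_once(img)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), max(timings)
```

Multiplying the mean by the number of images in a dataset gives a rough estimate of total processing time on that hardware.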

Evaluation Metrics: Establishing a robust evaluation metric for this task remains an open challenge. Traditional image-quality metrics, such as those used for image compression tasks, may not translate effectively to assessing the quality of background removal and the seamless integration of the inpainted regions. Manual visual inspection remains the primary means of assessing the pipeline's success.
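
To make the difficulty concrete, here is a sketch of one traditional metric, PSNR, restricted to the inpainted region. PSNR is our choice of example; the function name and parameters are hypothetical.

```python
import numpy as np


def masked_psnr(reference, result, mask, max_value=255.0):
    """PSNR in dB, computed only over pixels where `mask` is True.

    reference: ground-truth image of the scene WITHOUT the removed objects.
    result:    the inpainted image, same shape as reference.
    mask:      HxW boolean array marking the inpainted region.
    """
    ref = np.asarray(reference, dtype=np.float64)[mask]
    out = np.asarray(result, dtype=np.float64)[mask]
    if ref.size == 0:
        raise ValueError("mask selects no pixels")
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical pixels inside the mask
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

The catch is the reference argument: for a real photo there is usually no version of the scene without the photobombers, so there is nothing to compare against. Even when a reference exists, a fill that looks perfectly plausible but differs pixel by pixel from it will score poorly, which is why such metrics fit this task badly.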

Beyond Open Source: Exploring the Potential of Google Cloud Platform

While open-source solutions provide valuable foundations for automated background removal, they often require extensive computational power, posing a challenge for individuals with limited resources. This is where the robust infrastructure of Google Cloud Platform (GCP) comes into play.

GCP offers a comprehensive range of computing capabilities that can significantly enhance and scale the deployment of open-source models. With its powerful compute engines and scalable virtual machines, GCP enables users to process large image datasets more efficiently. Moreover, GCP's global network ensures low latency and high throughput, facilitating faster processing times and more responsive model training and inference.

Additionally, GCP integrates seamlessly with tools like PyTorch and other machine learning frameworks, which are often used in the development of these open-source models. This integration not only simplifies the workflow but also optimizes the performance of the models by utilizing GCP's advanced GPU and TPU offerings. Such capabilities make it feasible for a wider audience to adopt and customize sophisticated background removal technologies, ultimately broadening their applicability and impact in various fields.

Conclusion

The journey toward fully automated object removal from photos is making promising strides. Open-source tools lay a strong groundwork, while advanced platforms like Vertex AI extend these capabilities, ensuring that even users with limited local resources can achieve professional-quality edits effortlessly. As these technologies evolve, we can anticipate even more sophisticated, accurate, and user-friendly tools, making high-quality photo editing increasingly accessible to all.

If you're developing a machine learning solution and are looking to harness the power of Google Cloud Platform's (GCP) advanced machine learning capabilities, including Vertex AI and robust computing resources, contact our experts at Bitstrapped. Book a free 30-minute discovery call to discuss your initiatives and discover how GCP can significantly enhance your machine learning deployment. We're ready to answer any questions you might have about machine learning technologies and show you how GCP can elevate your project's efficiency and scalability.
