Generating person segmentation with the Vision Framework

Learn how to use the Vision framework to detect people in images and segment them out with a mask.

One task that the machine learning capabilities of the Vision framework make easy to address is generating a segmentation mask for the people in an image, separating them from the background.

This kind of operation is particularly useful in scenarios where isolating people from the background is essential, such as:

  • Creating virtual backgrounds for video conferencing apps;
  • Enhancing photos with creative dynamic editing features in image editing apps;
  • Building accessibility tools that highlight individuals for navigation assistance or interaction in complex settings.

Let's explore GeneratePersonSegmentationRequest, the Vision framework request responsible for creating a matte mask for the people detected in the input image.

Generating person segmentation of an image

To start using this class, you need to import Vision.

import Vision

Then, create a function that receives the image to be analyzed as a parameter and returns a PixelBufferObservation object.

func generatePersonSegmentation(image: UIImage) async throws -> PixelBufferObservation? {
    // 1. The image to process
    guard let image = CIImage(image: image) else {
        return nil
    }
    
    do {
        // 2. The request
        var request = GeneratePersonSegmentationRequest()
        // 3. Settings
        request.qualityLevel = .balanced
        // 4. The result of performing the request
        let result = try await request.perform(on: image)
        return result
        
    } catch {
        print("Encountered an error when performing the request: \(error.localizedDescription)")
    }
    return nil
}

The function generatePersonSegmentation(image:) works as follows:

  1. Declares a constant called image to store the image to be processed as a CIImage
  2. Creates the GeneratePersonSegmentationRequest object
  3. Sets the qualityLevel of the request, a value that tells the request how to balance accuracy and performance, with three options (see the sketch after this list):
    1. accurate - prioritizes mask quality over performance
    2. balanced - balances mask quality and performance
    3. fast - prioritizes performance over mask quality
  4. Stores the result of the request performed with the perform(on:) method
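
For instance, a one-shot photo edit can usually afford the accurate quality level, while a live camera feed typically calls for fast. A minimal sketch of the trade-off:

import Vision

var request = GeneratePersonSegmentationRequest()

// One-shot photo edit: prefer mask quality over speed
request.qualityLevel = .accurate

// For a live camera feed, prefer throughput over mask precision:
// request.qualityLevel = .fast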

If the request succeeds, the returned object is of type PixelBufferObservation, an observation wrapping the pixel buffer produced by the image-analysis request.

From this object it is possible to access the cgImage property, a CGImage object that stores the Core Graphics image created from the pixel buffer observation, which allows rendering the matte mask for the people detected in the input image.
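
For example, to quickly inspect the raw matte you could wrap the request in a small helper like the following sketch, where maskPreview(for:) is a hypothetical function reusing generatePersonSegmentation(image:) from above:

import UIKit
import Vision

/// Renders the raw segmentation matte as a UIImage for quick inspection.
func maskPreview(for photo: UIImage) async throws -> UIImage? {
    guard let observation = try await generatePersonSegmentation(image: photo),
          let maskCGImage = try? observation.cgImage else { return nil }
    // The matte is grayscale: near-white pixels mark the detected people,
    // near-black pixels mark the background.
    return UIImage(cgImage: maskCGImage)
}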

Changing the background color

import CoreImage.CIFilterBuiltins

func applyMauveBackground(originalImage: UIImage, observation: PixelBufferObservation) -> CGImage? {
    
    // 1. The CGImage from the observation and the CIImage of the original image
    guard let maskCGImage = try? observation.cgImage,
          let ciOriginalImage = CIImage(image: originalImage) else { return nil }
    // 2. The CIImage from the mask
    var ciMaskImage = CIImage(cgImage: maskCGImage)
    
    // 3. Ensure the mask and the original image have the same size
    // a. Retrieve the dimensions of the original image
    let originalExtent = ciOriginalImage.extent
    // b. Resize the mask to match them
    ciMaskImage = ciMaskImage.transformed(by: CGAffineTransform(
        scaleX: originalExtent.width / CGFloat(maskCGImage.width),
        y: originalExtent.height / CGFloat(maskCGImage.height)
    ))
    
    // 4. Create a mauve background
    let mauveBackground = CIImage(color: CIColor(red: 1.0, green: 0.5, blue: 1.0))
        // a. and crop it to the original image's extent
        .cropped(to: originalExtent)
    
    // 5. Composite the original image over the mauve background using the mask
    let blendFilter = CIFilter.blendWithMask()
    blendFilter.inputImage = ciOriginalImage
    blendFilter.backgroundImage = mauveBackground
    blendFilter.maskImage = ciMaskImage
    
    // 6. Render the composite image
    let context = CIContext()
    guard let outputImage = blendFilter.outputImage,
          let cgImage = context.createCGImage(outputImage, from: outputImage.extent) else { return nil }
    
    return cgImage
}

The applyMauveBackground(originalImage:observation:) function takes a UIImage and a PixelBufferObservation as parameters and returns a CGImage object:

  1. Accesses the cgImage property of the observation and converts the original image into a CIImage for Core Image processing
  2. Converts the segmentation mask from a CGImage to a CIImage, making it compatible with Core Image filters
  3. Ensures the mask and the image have the same size
    1. Retrieves the dimensions of the original image by accessing the extent property
    2. Resizes the mask using a CGAffineTransform to scale it to match the size of the original image, ensuring alignment for blending
  4. Creates a mauve background
    1. Crops it to match the size of the original image
  5. Composites the original image over the mauve background, using the CIFilter.blendWithMask() filter to combine the three inputs:
    1. inputImage - the original image (the person to keep visible)
    2. backgroundImage - the solid mauve background
    3. maskImage - the resized segmentation mask that defines where the person is visible
  6. Renders the composite image with a CIContext
  7. Returns the rendered CGImage
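
The same pipeline works with any background, not just a solid color. For instance, swapping the mauve color for a blurred copy of the original image produces the virtual-background effect mentioned at the beginning. The following is a minimal sketch, where applyBlurredBackground(originalImage:observation:) is a hypothetical variant of the function above:

import UIKit
import Vision
import CoreImage.CIFilterBuiltins

func applyBlurredBackground(originalImage: UIImage, observation: PixelBufferObservation) -> CGImage? {
    guard let maskCGImage = try? observation.cgImage,
          let ciOriginalImage = CIImage(image: originalImage) else { return nil }
    
    // Resize the mask to match the original image, as before
    let originalExtent = ciOriginalImage.extent
    let ciMaskImage = CIImage(cgImage: maskCGImage)
        .transformed(by: CGAffineTransform(
            scaleX: originalExtent.width / CGFloat(maskCGImage.width),
            y: originalExtent.height / CGFloat(maskCGImage.height)
        ))
    
    // Blur the original image to use as the background; clamping avoids
    // the transparent edges the blur kernel would otherwise introduce
    let blurFilter = CIFilter.gaussianBlur()
    blurFilter.inputImage = ciOriginalImage.clampedToExtent()
    blurFilter.radius = 20
    guard let blurredBackground = blurFilter.outputImage?.cropped(to: originalExtent) else { return nil }
    
    // Composite the person over the blurred background using the mask
    let blendFilter = CIFilter.blendWithMask()
    blendFilter.inputImage = ciOriginalImage
    blendFilter.backgroundImage = blurredBackground
    blendFilter.maskImage = ciMaskImage
    
    let context = CIContext()
    guard let outputImage = blendFilter.outputImage,
          let cgImage = context.createCGImage(outputImage, from: outputImage.extent) else { return nil }
    return cgImage
}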

Integration in a SwiftUI view

import SwiftUI
import Vision
import CoreImage.CIFilterBuiltins

struct ContentView: View {

    @State var image: CGImage?
    
    var body: some View {
        VStack {
            if let image = image {
                Image(uiImage: UIImage(cgImage: image))
                    .resizable()
                    .scaledToFit()
            } else {
                // Placeholder image of your choice
                Image("placeholder")
                    .resizable()
                    .scaledToFit()
            }
            
            Button(action: {
                self.getSegmentation()
            }, label: {
                Text("Change background color")
            })
        }
        .padding()
    }
    
    private func getSegmentation() {
        Task {
            do {
                guard let uiImage = UIImage(named: "picture"),
                      let observation = try await generatePersonSegmentation(image: uiImage) else { return }
                      
                self.image = applyMauveBackground(
                    originalImage: uiImage,
                    observation: observation
                )
            } catch {
                print("Error generating persons: \(error)")
            }
        }
    }

    private func applyMauveBackground(originalImage: UIImage, observation: PixelBufferObservation) -> CGImage? {
        
        guard let maskCGImage = try? observation.cgImage,
              let ciOriginalImage = CIImage(image: originalImage) else { return nil }
        
        var ciMaskImage = CIImage(cgImage: maskCGImage)
        
        let originalExtent = ciOriginalImage.extent
        ciMaskImage = ciMaskImage.transformed(by: CGAffineTransform(
            scaleX: originalExtent.width / CGFloat(maskCGImage.width),
            y: originalExtent.height / CGFloat(maskCGImage.height)
        ))
        
        let mauveBackground = CIImage(color: CIColor(red: 1.0, green: 0.5, blue: 1.0))
            .cropped(to: ciOriginalImage.extent)
        
        let blendFilter = CIFilter.blendWithMask()
        blendFilter.inputImage = ciOriginalImage
        blendFilter.backgroundImage = mauveBackground
        blendFilter.maskImage = ciMaskImage
        
        let context = CIContext()
        guard let outputImage = blendFilter.outputImage,
              let cgImage = context.createCGImage(outputImage, from: outputImage.extent) else { return nil }
        
        return cgImage
    }
    
    private func generatePersonSegmentation(image: UIImage) async throws -> PixelBufferObservation? {
        guard let image = CIImage(image: image) else { return nil }
        
        do {
            var request = GeneratePersonSegmentationRequest()
            request.qualityLevel = .balanced
            let result = try await request.perform(on: image)
            return result
            
        } catch {
            print("Encountered an error when performing the request: \(error.localizedDescription)")
        }
        return nil
    }
}

In this example, tapping the button triggers the segmentation process and displays the composited result, highlighting the detected people against a mauve background.

The GeneratePersonSegmentationRequest in the Vision framework makes it straightforward to implement person segmentation in native iOS applications, letting you incorporate advanced, machine learning-powered image processing features into the user experience with just a few lines of code.