Generating person segmentation with the Vision Framework
Learn how to use the Vision framework to detect people in images and segment them out with a mask.
One task that the machine learning capabilities of the Vision framework make straightforward is generating a segmentation mask for people in images, separating them from the background.
This kind of operation is particularly useful in scenarios where isolating people from the background is essential, such as:
- Creating virtual backgrounds for video conferencing apps;
- Enhancing photos with creative dynamic editing features in image editing apps;
- Accessibility tools that highlight individuals for navigation assistance or interaction in complex settings.
Let's explore the GeneratePersonSegmentationRequest, the Vision framework request responsible for creating a matte mask for the person detected in the input image.
Generating person segmentation of an image
To start using this class, you need to import Vision.
import Vision
Then, create a function that receives the image to be analyzed as a parameter and returns an optional PixelBufferObservation object.
func generatePersonSegmentation(image: UIImage) async throws -> PixelBufferObservation? {
    // 1. The image to process
    guard let image = CIImage(image: image) else {
        return nil
    }

    do {
        // 2. The request
        var request = GeneratePersonSegmentationRequest()
        // 3. Settings
        request.qualityLevel = .balanced
        // 4. The result of performing the request
        let result = try await request.perform(on: image)
        return result
    } catch {
        print("Encountered an error when performing the request: \(error.localizedDescription)")
    }

    return nil
}
The generatePersonSegmentation(image:) function works as follows:
- Declares a constant called image to store the image to be processed as a CIImage
- Creates the GeneratePersonSegmentationRequest object
- Sets the qualityLevel of the request, a value that tells the request how to balance accuracy and performance, with three options: fast, balanced, and accurate (see the sketch after this list)
- Stores the result of the performed request using the perform(on:) method
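Regarding the quality level, here is a minimal sketch of how one might choose between the three options, using a hypothetical makeSegmentationRequest(forRealTimeUse:) helper that is not part of the Vision framework:
import Vision

// Hypothetical helper: configures the request based on the use case.
// .fast favors speed (for example, live video), .accurate favors mask detail
// (for example, photo editing), and .balanced sits in between.
func makeSegmentationRequest(forRealTimeUse realTime: Bool) -> GeneratePersonSegmentationRequest {
    var request = GeneratePersonSegmentationRequest()
    request.qualityLevel = realTime ? .fast : .accurate
    return request
}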
If a result is provided, the returned object is of type PixelBufferObservation, which represents an image produced by the image-analysis request. From this object, it is possible to access the cgImage property, a CGImage object storing the Core Graphics image created from the pixel buffer observation, which allows rendering the matte mask for the person detected in the input image.
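For example, a minimal sketch of how the mask could be rendered as a UIImage for display, using a hypothetical maskImage(from:) helper:
import UIKit
import Vision

// Hypothetical helper: renders the segmentation mask as a UIImage.
func maskImage(from observation: PixelBufferObservation) -> UIImage? {
    // cgImage renders the observation's pixel buffer as a Core Graphics image
    guard let cgImage = try? observation.cgImage else { return nil }
    return UIImage(cgImage: cgImage)
}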
Changing the background color
Once the segmentation mask is available, it can be combined with Core Image to replace the background of the original image, as in the following function.
func applyMauveBackground(originalImage: UIImage, observation: PixelBufferObservation) -> CGImage? {
    // 1. The CGImage from the observation and the CIImage of the original image
    guard let maskCGImage = try? observation.cgImage,
          let ciOriginalImage = CIImage(image: originalImage) else { return nil }

    // 2. The CIImage from the mask
    var ciMaskImage = CIImage(cgImage: maskCGImage)

    // 3. Ensure the mask and the original image have the same size
    // a. Retrieve the dimensions of the original image
    let originalExtent = ciOriginalImage.extent
    // b. Resize the mask
    ciMaskImage = ciMaskImage.transformed(by: CGAffineTransform(
        scaleX: originalExtent.width / CGFloat(maskCGImage.width),
        y: originalExtent.height / CGFloat(maskCGImage.height)
    ))

    // 4. Create a mauve background
    let mauveBackground = CIImage(color: CIColor(red: 1.0, green: 0.5, blue: 1.0))
        // a. Crop it to the size of the original image
        .cropped(to: ciOriginalImage.extent)

    // 5. Composite the original image over the mauve background using the mask
    let blendFilter = CIFilter.blendWithMask()
    blendFilter.inputImage = ciOriginalImage
    blendFilter.backgroundImage = mauveBackground
    blendFilter.maskImage = ciMaskImage

    // 6. Render the composite image
    let context = CIContext()
    guard let outputImage = blendFilter.outputImage,
          let cgImage = context.createCGImage(outputImage, from: outputImage.extent) else { return nil }

    return cgImage
}
The applyMauveBackground(originalImage:observation:) function takes a UIImage and a PixelBufferObservation as parameters and returns a CGImage object. It works as follows:
- Accesses the cgImage property of the observation and converts the original image into a CIImage for Core Image processing
- Converts the segmentation mask from a CGImage to a CIImage, making it compatible with Core Image filters
- Ensures the mask and the image have the same size
  - Retrieves the dimensions of the original image by accessing its extent property
  - Resizes the mask using a CGAffineTransform to scale it to match the size of the original image, ensuring alignment for blending
- Creates a mauve background
  - Crops it to match the size of the original image
- Composites the original image over the mauve background using the CIFilter.blendWithMask() filter, which combines everything:
  - inputImage - the original image (the person to keep visible)
  - backgroundImage - the solid mauve background
  - maskImage - the resized segmentation mask, defining where the person is visible
- Renders the composite image
- Returns the rendered CGImage
Integration in a SwiftUI view
import SwiftUI
import Vision
import CoreImage.CIFilterBuiltins

struct ContentView: View {
    @State private var image: CGImage?

    var body: some View {
        VStack {
            if let image = image {
                Image(uiImage: UIImage(cgImage: image))
                    .resizable()
                    .scaledToFit()
            } else {
                // Placeholder image of your choice
                Image("placeholder")
                    .resizable()
                    .scaledToFit()
            }

            Button(action: {
                self.getSegmentation()
            }, label: {
                Text("Change background color")
            })
        }
        .padding()
    }

    private func getSegmentation() {
        Task {
            do {
                guard let uiImage = UIImage(named: "picture"),
                      let observation = try await generatePersonSegmentation(image: uiImage) else { return }

                self.image = applyMauveBackground(
                    originalImage: uiImage,
                    observation: observation
                )
            } catch {
                print("Error generating the person segmentation: \(error)")
            }
        }
    }

    private func applyMauveBackground(originalImage: UIImage, observation: PixelBufferObservation) -> CGImage? {
        guard let maskCGImage = try? observation.cgImage,
              let ciOriginalImage = CIImage(image: originalImage) else { return nil }

        var ciMaskImage = CIImage(cgImage: maskCGImage)
        let originalExtent = ciOriginalImage.extent
        ciMaskImage = ciMaskImage.transformed(by: CGAffineTransform(
            scaleX: originalExtent.width / CGFloat(maskCGImage.width),
            y: originalExtent.height / CGFloat(maskCGImage.height)
        ))

        let mauveBackground = CIImage(color: CIColor(red: 1.0, green: 0.5, blue: 1.0))
            .cropped(to: ciOriginalImage.extent)

        let blendFilter = CIFilter.blendWithMask()
        blendFilter.inputImage = ciOriginalImage
        blendFilter.backgroundImage = mauveBackground
        blendFilter.maskImage = ciMaskImage

        let context = CIContext()
        guard let outputImage = blendFilter.outputImage,
              let cgImage = context.createCGImage(outputImage, from: outputImage.extent) else { return nil }

        return cgImage
    }

    private func generatePersonSegmentation(image: UIImage) async throws -> PixelBufferObservation? {
        guard let image = CIImage(image: image) else { return nil }

        do {
            var request = GeneratePersonSegmentationRequest()
            request.qualityLevel = .balanced
            let result = try await request.perform(on: image)
            return result
        } catch {
            print("Encountered an error when performing the request: \(error.localizedDescription)")
        }

        return nil
    }
}
In this example, a button triggers the segmentation process and displays the result, highlighting the detected person against a mauve background.
The GeneratePersonSegmentationRequest in the Vision framework makes it straightforward to implement person segmentation in native iOS applications, letting you leverage the power of machine learning to add advanced image-processing features and enhance the user experience.