
Detecting documents in an image with the Vision framework
Learn how to use the Vision framework to detect documents in images.
When it comes to understanding which part of an image contains a document, we can leverage the machine learning capabilities of the Vision framework. The DetectDocumentSegmentationRequest type provides a simple way to accomplish that.
private func detectDocument() async throws -> DetectedDocumentObservation? {
    // 1. Set up the request
    let request = DetectDocumentSegmentationRequest()
    // 2. The image to perform the detection on
    guard let uiImage = UIImage(named: "document-sample"),
          let image = CIImage(image: uiImage) else { return nil }
    // 3. Perform the request
    guard let observation = try await request.perform(on: image) else { return nil }
    // 4. The result
    return observation
}
Performing the document segmentation request works as follows:
- Start by creating an instance of the request.
- Adjust its settings as needed, and set the object to perform the request on.
- Perform the request using one of the perform methods, like perform(on:orientation:) (see the sketch after this list).
- Return the resulting observation.
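For instance, when the orientation of the captured image is known, it can be passed along with the image itself. Here is a minimal sketch, loading the image from a file URL; the URL parameter and the .up orientation are placeholder values:

import Vision
import CoreImage
import ImageIO

// A sketch: perform the request on an image loaded from disk,
// passing its known orientation (here .up, a placeholder).
func detectDocument(at url: URL) async throws -> DetectedDocumentObservation? {
    guard let image = CIImage(contentsOf: url) else { return nil }
    let request = DetectDocumentSegmentationRequest()
    return try await request.perform(on: image, orientation: .up)
}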
The resulting value is of type DetectedDocumentObservation, an observation object that stores:

- The confidence of the observation - a float value stating the observation’s accuracy, from 0 to 1;
- The four corners of the region containing the document in the analyzed image, as NormalizedPoint values;
- The boundingBox - the bounding box of the object, with coordinates normalized to the dimensions of the processed image and the origin at the lower-left corner of the picture;
- The globalSegmentationMask - a PixelBufferObservation representing a segmentation mask for the detected document.
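Here is a minimal sketch of reading these properties, assuming an observation returned by the request. The toImageCoordinates(_:origin:) helper on NormalizedRect maps the normalized bounding box into the pixel space of the processed image:

import Vision

// A sketch: inspect the properties of a DetectedDocumentObservation.
func inspect(_ observation: DetectedDocumentObservation, imageSize: CGSize) {
    // The accuracy of the detection, from 0 to 1
    print("Confidence: \(observation.confidence)")

    // The bounding box, converted from normalized coordinates
    // (origin at the lower-left corner) to image coordinates
    let region = observation.boundingBox.toImageCoordinates(imageSize, origin: .lowerLeft)
    print("Document region: \(region)")
}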
This request is useful for tasks like highlighting or extracting the region containing the document directly, as in the example below:
import SwiftUI
import Vision
import CoreImage.CIFilterBuiltins

struct ContentView: View {
    @State var image: CGImage?

    var body: some View {
        VStack {
            if let image = image {
                Image(uiImage: UIImage(cgImage: image))
                    .resizable()
                    .scaledToFit()
            } else {
                Image("document")
                    .resizable()
                    .scaledToFit()
            }

            Button(action: {
                self.highlightDocument()
            }, label: {
                Text("Highlight Document")
            })
        }
        .padding()
    }
    private func highlightDocument() {
        Task {
            guard let uiImage = UIImage(named: "document"),
                  let observation = try await detectDocument(image: uiImage) else { return }
            guard let highlightedDocument = try await applyFilter(startImage: uiImage, observation: observation) else { return }
            self.image = highlightedDocument
        }
    }
    // Detect the document segmentation
    private func detectDocument(image: UIImage) async throws -> DetectedDocumentObservation? {
        // The image to perform the detection on
        guard let image = CIImage(image: image) else { return nil }
        do {
            // Set up the request
            let request = DetectDocumentSegmentationRequest()
            // Perform the request
            guard let observation = try await request.perform(on: image) else { return nil }
            return observation
        } catch {
            print("Encountered an error when performing the request: \(error.localizedDescription)")
        }
        return nil
    }
    // Apply a mauve color to the parts of the image that are not included in the detected document
    private func applyFilter(startImage: UIImage, observation: DetectedDocumentObservation) async throws -> CGImage? {
        // 1. The CIImage of the original image and the CGImage from the observation
        guard let image = CIImage(image: startImage) else { return nil }
        let maskCGImage = try observation.globalSegmentationMask.cgImage

        // 2. The CIImage from the mask, scaled so that the mask and the original image have the same size
        let originalExtent = image.extent
        let ciMaskImage = CIImage(cgImage: maskCGImage).transformed(by: CGAffineTransform(
            scaleX: originalExtent.width / CGFloat(maskCGImage.width),
            y: originalExtent.height / CGFloat(maskCGImage.height)
        ))

        // 3. Create a mauve background, cropped to match the size of the original image
        let mauveBackground = CIImage(color: CIColor(red: 1.0, green: 0.5, blue: 1.0))
            .cropped(to: image.extent)

        // 4. Composite the original image over the mauve background using the mask
        let blendFilter = CIFilter.blendWithMask()
        blendFilter.inputImage = image
        blendFilter.backgroundImage = mauveBackground
        blendFilter.maskImage = ciMaskImage

        // 5. Render the composite image
        let context = CIContext()
        guard let outputImage = blendFilter.outputImage,
              let cgImage = context.createCGImage(outputImage, from: outputImage.extent) else { return nil }
        return cgImage
    }
}
In this SwiftUI app, a button triggers the detection and filtering process, allowing users to differentiate the document from its background for enhanced visual readability.
It first extracts the document’s segmentation mask and then applies a mauve background to non-document areas using Core Image filters. The processed image replaces the original in the UI.
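Besides highlighting, the same observation can drive the extraction of the document itself. Here is a minimal sketch that crops and straightens the detected region with Core Image’s built-in perspective correction filter, converting the four NormalizedPoint corners into image coordinates manually; the extractDocument name is a hypothetical helper:

import Vision
import CoreImage.CIFilterBuiltins

// A sketch: crop and straighten the detected document
// using the four corner points of the observation.
func extractDocument(from image: CIImage, observation: DetectedDocumentObservation) -> CIImage? {
    let size = image.extent.size

    // Convert a NormalizedPoint (lower-left origin, like Core Image) to image coordinates
    func imagePoint(_ point: NormalizedPoint) -> CGPoint {
        CGPoint(x: point.x * size.width, y: point.y * size.height)
    }

    let filter = CIFilter.perspectiveCorrection()
    filter.inputImage = image
    filter.topLeft = imagePoint(observation.topLeft)
    filter.topRight = imagePoint(observation.topRight)
    filter.bottomLeft = imagePoint(observation.bottomLeft)
    filter.bottomRight = imagePoint(observation.bottomRight)
    return filter.outputImage
}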