Calculating the semantic distance between words with the Natural Language framework

Use the Natural Language framework to find synonyms by calculating the semantic distance between words.

Pieces of natural language text, like words and sentences, can be semantically similar to one another. Many suggestion features are built on this concept, which the NLEmbedding class of the Natural Language framework makes available.

Natural Language Processing allows computers to understand and interact with human language. It helps to process and analyze text data, extracting meaningful information and making sense of the relationships between words and phrases. The NLEmbedding class provides the ability to map and analyze these relationships in a structured manner.

This capability is crucial for understanding and processing text. It powers a range of features across Apple's ecosystem:

  • Helps Siri suggest relevant apps and actions;
  • Improves Spotlight search by refining results;
  • Enables Notes to suggest related content;
  • Allows Mail to offer relevant replies;
  • Enhances keyboard predictions by understanding the context of your conversation in Messages.

In this reference article, we are going to explore the NLEmbedding class to understand how to take advantage of it.

Calculating semantic distance

Given a string, NLEmbedding creates a neighborhood populated by other pieces of text that share semantic similarities.

Some of these neighbors lie farther from the string, others closer. The distance reflects how semantically similar two strings are: the smaller the distance, the greater the similarity. NLEmbedding represents this neighborhood, a map of strings, as a vector space in which each string is a vector.
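To make the idea of distance between vectors concrete, here is a minimal sketch of the textbook cosine distance (1 minus the cosine of the angle between two vectors). The function name cosineDistance and the sample vectors are made up for illustration, and the framework's own computation may differ in its details.

```swift
import Foundation

// Hypothetical helper, not part of the Natural Language framework.
// Returns 1 - cos(angle) for two equal-length, non-zero vectors, or nil otherwise.
func cosineDistance(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count, !a.isEmpty else { return nil }

    // Dot product and Euclidean norms of the two vectors
    let dot = zip(a, b).reduce(0.0) { $0 + $1.0 * $1.1 }
    let normA = sqrt(a.reduce(0.0) { $0 + $1 * $1 })
    let normB = sqrt(b.reduce(0.0) { $0 + $1 * $1 })
    guard normA > 0, normB > 0 else { return nil }

    return 1.0 - dot / (normA * normB)
}

// Vectors pointing the same way are at distance 0,
// perpendicular vectors at 1, and opposite vectors at 2.
let same = cosineDistance([1, 0], [1, 0])
let perpendicular = cosineDistance([1, 0], [0, 1])
let opposite = cosineDistance([1, 0], [-1, 0])
```

With this metric, a smaller value means the two vectors point in a more similar direction, which matches the "smaller distance, greater similarity" rule described above.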

import NaturalLanguage

func getSemanticDistance(for word: String, in language: NLLanguage) {
		
    // 1. Create the embedding for the language the word belongs to
    if let embedding = NLEmbedding.wordEmbedding(for: language) {
        
        // 2. Find the neighbors for the word
        embedding.enumerateNeighbors(for: word, maximumCount: 10) { neighbor, distance in
            
            // 3. Access the neighbor's distance from the word
            print("\(neighbor): \(distance)")
            
            // Returning true continues the enumeration
            return true
        }
    }
}

To implement semantic distance features with NLEmbedding, import the NaturalLanguage framework, then:

  1. Call the class method NLEmbedding.wordEmbedding(for:), passing the language of the string to process as a parameter. If a word embedding is available for that language, it returns an NLEmbedding object; otherwise it returns nil.
  2. Call the enumerateNeighbors(for:maximumCount:distanceType:using:) method, passing the string whose neighbors you want and the maximum number of neighbors to return. The closure is called once for each neighbor found and receives the neighbor together with its distance from the processed word; return true to continue the enumeration or false to stop it.
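The closure's return value can be used to end the enumeration early. The following is a minimal sketch of that idea; the helper name firstNeighbors and its limit parameter are assumptions made for illustration and are not part of the framework.

```swift
import NaturalLanguage

// Hypothetical helper: collects at most `limit` neighbors of `word`,
// stopping the enumeration by returning false once enough were collected.
func firstNeighbors(of word: String, in language: NLLanguage, limit: Int) -> [String] {
    guard limit > 0, let embedding = NLEmbedding.wordEmbedding(for: language) else {
        return []
    }

    var neighbors: [String] = []
    embedding.enumerateNeighbors(for: word, maximumCount: 10, distanceType: .cosine) { neighbor, _ in
        neighbors.append(neighbor)
        // true keeps enumerating, false stops
        return neighbors.count < limit
    }
    return neighbors
}
```

Calling firstNeighbors(of: "cheese", in: .english, limit: 3) would return up to three neighbors, or an empty array if no English embedding is available on the device.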

The NLEmbedding object provides access to several useful pieces of information, such as:

  • the number of words in the vocabulary, through the vocabularySize property;
  • the language of the text in the word embedding, through the language property;
  • whether a term is in the vocabulary, through the contains(_:) method, and, if it is, its vector as an array of doubles, through the vector(for:) method.
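As a rough sketch of how these members fit together, the function below gathers them into a few printable lines. The function name describeEmbedding and the output wording are illustrative assumptions; the sketch also reads the dimension property, which is not covered above and gives the number of components in each vector.

```swift
import NaturalLanguage

// Hypothetical helper: summarizes an embedding and probes it for one word.
func describeEmbedding(for language: NLLanguage, probing word: String) -> [String] {
    guard let embedding = NLEmbedding.wordEmbedding(for: language) else {
        return ["No word embedding available for \(language.rawValue)"]
    }

    var lines = [
        "Language: \(embedding.language?.rawValue ?? "unknown")",
        "Vocabulary size: \(embedding.vocabularySize)",
        "Vector dimension: \(embedding.dimension)"
    ]

    // vector(for:) returns nil for terms outside the vocabulary
    if embedding.contains(word), let vector = embedding.vector(for: word) {
        lines.append("\"\(word)\" has a vector with \(vector.count) components")
    } else {
        lines.append("\"\(word)\" is not in the vocabulary")
    }
    return lines
}
```

Printing each element of describeEmbedding(for: .english, probing: "cheese") is a quick way to check what an embedding offers before building a feature on top of it.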

The NLEmbedding class also provides a method to get the distance between one word and another.

func semanticDistanceBetween(firstWord: String, and secondWord: String, for language: NLLanguage) {
    // 1. Create the embedding for the language the words belong to
    if let embedding = NLEmbedding.wordEmbedding(for: language) {
        // 2. Calculate the distance between words
        let distance = embedding.distance(between: firstWord, and: secondWord)
        
        print("The distance between \(firstWord) and \(secondWord) is \(distance).")
    }
}

The distance(between:and:distanceType:) method calculates the distance between two strings in the vector space. It takes as parameters:

  • The two strings you want to calculate the distance between;
  • A distanceType, the distance metric to use when determining similarity. The example above omits it and relies on the default metric, cosine distance.
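One possible use of this method is picking, from a list of candidates, the word closest to a reference word. The sketch below assumes a hypothetical helper named closestWord and skips candidates that are not in the vocabulary, so every distance it compares refers to a known word.

```swift
import NaturalLanguage

// Hypothetical helper: returns the candidate with the smallest cosine
// distance from `word`, or nil if nothing can be compared.
func closestWord(to word: String, among candidates: [String], in language: NLLanguage) -> String? {
    guard let embedding = NLEmbedding.wordEmbedding(for: language),
          embedding.contains(word) else {
        return nil
    }

    return candidates
        // Ignore candidates the embedding does not know
        .filter { embedding.contains($0) }
        // Keep the candidate with the smallest distance from the word
        .min(by: { first, second in
            embedding.distance(between: word, and: first, distanceType: .cosine)
                < embedding.distance(between: word, and: second, distanceType: .cosine)
        })
}
```

For instance, closestWord(to: "cheese", among: ["butter", "car"], in: .english) returns whichever of the two candidates the embedding places nearer to "cheese", or nil when the embedding or the words are unavailable.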

Example with a SwiftUI view

The following is an example of a SwiftUI view to find synonyms of a word using the Natural Language framework.

import SwiftUI
import NaturalLanguage

struct SynonymsView: View {
    
    @State var text: String = ""
    @State var results: [String] = []
    @State var errorMessage: String? = nil
    
    var body: some View {
        
        List {
            Section {
                TextEditor(text: $text)
                    .frame(height: 100)
                    .textFieldStyle(.plain)
            }
            
            Section {
                Button("Find synonyms") {
                    self.findSynonyms()
                }
            }
            
            if results.isEmpty {
                ContentUnavailableView(
                    errorMessage ?? "Text wasn't analyzed",
                    systemImage: "doc.text.magnifyingglass"
                )
            } else {
                Section {
                    ForEach(results, id: \.self) { item in
                        Text(item)
                    }
                }
            }
        }
    }
    
    private func findSynonyms() {
        // Clear any message left over from a previous search
        errorMessage = nil
        do {
            try self.getSynonyms(word: text)
        } catch {
            errorMessage = "Error: \(error)"
        }
    }
    
    // Detecting the languages the word most likely belongs to, most probable first
    private func getLanguages(of text: String) -> [Dictionary<NLLanguage,Double>.Element]? {
        let languageRecognizer = NLLanguageRecognizer()
        languageRecognizer.processString(text)
        
        let hypothesis = languageRecognizer
            .languageHypotheses(withMaximum: 5)
            .sorted(by: { $0.value > $1.value })
        
        return hypothesis
    }
    
    // Looking for synonyms
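    private func getSynonyms(word: String) throws {
        
        self.results.removeAll()
        
        // Normalize the input before looking it up
        let term = word.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
        
        guard let languages = getLanguages(of: term), !languages.isEmpty else {
            throw ErrorMessage.languageNotRecognized
        }
        
        var embeddingFound = false
        var wordFound = false
        var synonyms: [String] = []
        
        for language in languages {
            // Skip candidate languages that have no word embedding
            guard let embedding = NLEmbedding.wordEmbedding(for: language.key) else { continue }
            embeddingFound = true
            
            // Skip embeddings whose vocabulary doesn't include the word
            guard embedding.contains(term) else { continue }
            wordFound = true
            
            embedding.enumerateNeighbors(for: term, maximumCount: 5) { neighbor, _ in
                synonyms.append(neighbor)
                return true
            }
        }
        
        // Report the most specific reason for an empty result
        guard embeddingFound else { throw ErrorMessage.vocabularyNotAvailable }
        guard wordFound else { throw ErrorMessage.wordNotIncludedInEmbedding }
        guard !synonyms.isEmpty else { throw ErrorMessage.noSynonymFound }
        
        self.results = synonyms
    }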
    
    // Handling Errors
    enum ErrorMessage: Error, CustomStringConvertible {
        
        case noSynonymFound
        case wordNotIncludedInEmbedding
        case vocabularyNotAvailable
        case languageNotRecognized
        
        var description: String {
            switch self {
            case .noSynonymFound:
                "No synonym found"
            case .wordNotIncludedInEmbedding:
                "Word not included in the vocabulary"
            case .vocabularyNotAvailable:
                "Vocabulary not available"
            case .languageNotRecognized:
                "Language not recognized"
            }
        }
    }
    
}

In the example above, the user enters a word and, when the button is pressed, the view looks for its synonyms using the Natural Language framework.

The "Find synonyms" button triggers detection of the language the word belongs to and, if one is found, checks whether the word exists in that language's vocabulary. It then retrieves a collection of similar words and displays them in a list.

If no synonyms are found or the word isn't recognized, it shows an error message.