Calculating the semantic distance between words with the Natural Language framework
Use the Natural Language framework to find synonyms by calculating the semantic distance of words.
Pieces of natural language text, like words and sentences, can share semantic similarities. Most suggestion features are based on this concept, which the NLEmbedding class of the Natural Language framework makes available.
Natural language processing allows computers to understand and interact with human language. It helps to process and analyze text data, extracting meaningful information and making sense of the relationships between words and phrases. The NLEmbedding class provides the ability to map and analyze these relationships in a structured manner.
This capability is crucial for understanding and processing text. It powers a range of features across Apple's ecosystem:
- Helps Siri suggest relevant apps and actions;
- Improves Spotlight search by refining results;
- Enables Notes to suggest related content;
- Allows Mail to offer relevant replies;
- Enhances keyboard predictions by understanding the context of your conversation in Messages.
In this reference article, we are going to explore the NLEmbedding class to understand how to take advantage of it.
Calculating semantic distance
Given a string, NLEmbedding creates a neighborhood populated by other pieces of text that share semantic similarities with it.
Some will be located farther from that string, while others will be nearby. The distance is determined by how semantically similar the compared strings are: the smaller the distance, the greater the similarity. NLEmbedding represents this neighborhood, a map of strings, as a vector space in which each string is a vector.
import NaturalLanguage

func getSemanticDistance(for word: String, in language: NLLanguage) {
    // 1. Create the embedding for the language the word belongs to
    if let embedding = NLEmbedding.wordEmbedding(for: language) {
        // 2. Find the neighbors for the word
        embedding.enumerateNeighbors(for: word, maximumCount: 10) { neighbor, distance in
            // 3. Access the neighbor's distance from the word
            print("\(neighbor): \(distance)")
            // Return true to keep enumerating
            return true
        }
    }
}
To start implementing semantic distance features with NLEmbedding, import the NaturalLanguage framework, then:
- Create an NLEmbedding instance by calling the class method wordEmbedding(for:), passing the language of the string to process as a parameter. If an embedding is available for that language, it returns an NLEmbedding object; otherwise it returns nil.
- Call the enumerateNeighbors(for:maximumCount:distanceType:using:) method, passing the string you want the neighbors of and the maximum number of neighbors to return. For each neighbor found, the closure receives the neighbor and its distance from the processed word; return true to continue the enumeration or false to stop it.
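The steps above can also be wrapped in a function that returns the neighbors instead of printing them. The following is a minimal sketch, not the only way to do it: the function name nearestNeighbors and its maximumDistance cutoff are hypothetical and only serve the illustration. It passes the distanceType parameter explicitly and filters neighbors inside the closure.

```swift
import NaturalLanguage

// Hypothetical helper: collects the neighbors of `word` whose distance
// does not exceed `maximumDistance` (an illustrative cutoff, not a framework value).
func nearestNeighbors(
    of word: String,
    in language: NLLanguage,
    maximumCount: Int = 10,
    maximumDistance: NLDistance = 1.0
) -> [(word: String, distance: NLDistance)] {
    // wordEmbedding(for:) returns nil when no embedding is available for the language
    guard let embedding = NLEmbedding.wordEmbedding(for: language) else { return [] }

    var neighbors: [(word: String, distance: NLDistance)] = []
    embedding.enumerateNeighbors(
        for: word.lowercased(),
        maximumCount: maximumCount,
        distanceType: .cosine
    ) { neighbor, distance in
        if distance <= maximumDistance {
            neighbors.append((word: neighbor, distance: distance))
        }
        // Returning true continues the enumeration; returning false would stop it
        return true
    }
    return neighbors
}

for neighbor in nearestNeighbors(of: "cheese", in: .english) {
    print("\(neighbor.word): \(neighbor.distance)")
}
```

Returning an array keeps the lookup separate from the presentation, which makes the function easier to reuse from a view.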
The NLEmbedding object provides access to several useful pieces of information, such as:
- the number of words in the vocabulary, through the vocabularySize property;
- the language of the text in the word embedding, through the language property;
- whether a term is in the vocabulary, through the contains(_:) method, and, if it is, its vector as an array of doubles, through the vector(for:) method.
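As a rough illustration of the members listed above, the sketch below prints them for one embedding and one term. The helper name describeEmbedding is hypothetical. It also reads dimension, the length of each vector, which the list above does not mention.

```swift
import NaturalLanguage

// Hypothetical helper that prints basic facts about an embedding and one term
func describeEmbedding(for language: NLLanguage, term: String) {
    guard let embedding = NLEmbedding.wordEmbedding(for: language) else {
        print("No word embedding available for \(language.rawValue)")
        return
    }
    // `language` is optional on NLEmbedding, so fall back to a placeholder
    print("Language: \(embedding.language?.rawValue ?? "unknown")")
    print("Vocabulary size: \(embedding.vocabularySize)")
    print("Vector dimension: \(embedding.dimension)")

    // vector(for:) returns nil for terms outside the vocabulary
    guard embedding.contains(term), let vector = embedding.vector(for: term) else {
        print("\"\(term)\" is not in the vocabulary")
        return
    }
    print("First components of \"\(term)\": \(Array(vector.prefix(3)))")
}

describeEmbedding(for: .english, term: "cheese")
```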
The NLEmbedding class also provides a method to get the distance between one word and another.
func semanticDistanceBetween(firstWord: String, and secondWord: String, for language: NLLanguage) {
    // 1. Create the embedding for the language the words belong to
    if let embedding = NLEmbedding.wordEmbedding(for: language) {
        // 2. Calculate the distance between the words
        let distance = embedding.distance(between: firstWord, and: secondWord)
        print("The distance between \(firstWord) and \(secondWord) is \(distance).")
    }
}
The distance(between:and:distanceType:) method calculates the distance between two strings in the vector space. It takes as parameters:
- The two strings you want to calculate the distance between;
- A distanceType, the type of distance metric to use when determining similarity. It defaults to cosine distance when omitted, as in the example above.
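To give an intuition for what a cosine metric measures, here is a sketch of the textbook cosine distance between two plain arrays of doubles, such as those returned by vector(for:). The function name cosineDistance is hypothetical, and this is the generic formula (one minus the cosine of the angle between the vectors), not necessarily how the framework computes or scales its own values.

```swift
// Textbook cosine distance between two equal-length vectors: 1 - cos(angle).
// Returns nil for mismatched lengths, empty input, or zero-magnitude vectors.
func cosineDistance(_ a: [Double], _ b: [Double]) -> Double? {
    guard a.count == b.count, !a.isEmpty else { return nil }

    // Dot product of the two vectors
    let dot = zip(a, b).reduce(0.0) { sum, pair in sum + pair.0 * pair.1 }
    // Euclidean magnitudes of each vector
    let magnitudeA = a.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    let magnitudeB = b.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    guard magnitudeA > 0, magnitudeB > 0 else { return nil }

    return 1 - dot / (magnitudeA * magnitudeB)
}

// Same direction, perpendicular, and opposite vectors
print(cosineDistance([1, 0], [1, 0]) ?? "undefined")   // 0.0
print(cosineDistance([1, 0], [0, 1]) ?? "undefined")   // 1.0
print(cosineDistance([1, 0], [-1, 0]) ?? "undefined")  // 2.0
```

With this definition, vectors pointing in the same direction are at distance 0 and vectors pointing in opposite directions are at distance 2, which matches the idea that a smaller distance means a greater similarity.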
Example with a SwiftUI view
The following is an example of a SwiftUI view to find synonyms of a word using the Natural Language framework.
import NaturalLanguage
import SwiftUI

struct SynonymsView: View {
    @State var text: String = ""
    @State var results: [String] = []
    @State var errorMessage: String? = nil

    var body: some View {
        List {
            Section {
                TextEditor(text: $text)
                    .frame(height: 100)
                    .textFieldStyle(.plain)
            }
            Section {
                Button("Find synonyms") {
                    self.findSynonyms()
                }
            }
            if results.isEmpty {
                ContentUnavailableView(
                    errorMessage ?? "Text wasn't analyzed",
                    systemImage: "doc.text.magnifyingglass"
                )
            } else {
                Section {
                    ForEach(results, id: \.self) { item in
                        Text(item)
                    }
                }
            }
        }
    }

    private func findSynonyms() {
        // Clear the message left by a previous search
        errorMessage = nil
        do {
            try self.getSynonyms(word: text)
        } catch {
            errorMessage = "Error: \(error)"
        }
    }

    // Detecting the hypothetical languages the word belongs to, most probable first
    private func getLanguages(of text: String) -> [Dictionary<NLLanguage, Double>.Element]? {
        let languageRecognizer = NLLanguageRecognizer()
        languageRecognizer.processString(text)
        let hypotheses = languageRecognizer
            .languageHypotheses(withMaximum: 5)
            .sorted(by: { $0.value > $1.value })
        // Return nil when no language could be hypothesized
        return hypotheses.isEmpty ? nil : hypotheses
    }

    // Looking for synonyms
    private func getSynonyms(word: String) throws {
        self.results.removeAll()
        // Normalize the input before looking it up
        let term = word.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
        guard let languages = getLanguages(of: term) else {
            throw ErrorMessage.languageNotRecognized
        }
        var embeddingAvailable = false
        var wordInVocabulary = false
        var found: [String] = []
        for language in languages {
            // Skip languages that have no word embedding
            guard let embedding = NLEmbedding.wordEmbedding(for: language.key) else { continue }
            embeddingAvailable = true
            // Skip embeddings whose vocabulary does not contain the word
            guard embedding.contains(term) else { continue }
            wordInVocabulary = true
            embedding.enumerateNeighbors(for: term, maximumCount: 5) { neighbor, _ in
                // Avoid duplicates, since ForEach identifies rows by the string itself
                if !found.contains(neighbor) {
                    found.append(neighbor)
                }
                return true
            }
        }
        guard embeddingAvailable else { throw ErrorMessage.vocabularyNotAvailable }
        guard wordInVocabulary else { throw ErrorMessage.wordNotIncludedInEmbedding }
        guard !found.isEmpty else { throw ErrorMessage.noSynonymFound }
        self.results = found
    }

    // Handling Errors
    enum ErrorMessage: Error, CustomStringConvertible {
        case noSynonymFound
        case wordNotIncludedInEmbedding
        case vocabularyNotAvailable
        case languageNotRecognized

        var description: String {
            switch self {
            case .noSynonymFound:
                "No synonym found"
            case .wordNotIncludedInEmbedding:
                "Word not included in the vocabulary"
            case .vocabularyNotAvailable:
                "Vocabulary not available"
            case .languageNotRecognized:
                "Language not recognized"
            }
        }
    }
}
In the example above, the user enters a word and, when the button is pressed, the view looks for its synonyms using the Natural Language framework.
The "Find synonyms" button triggers the detection of the language the word belongs to and, if one is found, checks whether the word exists in that language's vocabulary. Then, it retrieves a collection of similar words and displays them in a list.
If no synonyms are found or the word isn't recognized, it shows an error message.