Identifying the Language in a Text using the Natural Language Framework

By the end of this article, you will be able to identify the dominant language in a piece of text using the Natural Language framework.

Apple’s machine learning frameworks make it easier to add powerful features to your applications. Besides letting you create your own machine learning models, Apple provides frameworks powered by Core ML that each focus on a specific application of the technology. You can import them into your applications to take advantage of a rich set of machine learning capabilities.

One of these frameworks is the Natural Language framework, which provides a variety of natural language processing features and supports many different languages. In this article, we will explore how to use the Natural Language framework to detect the language of a piece of text.

Language Identification

To identify the language of a body of text, we use an NLLanguageRecognizer. It automatically detects the language of a piece of text, and it can report either the single most likely language or a set of candidate languages.

NLLanguageRecognizer class documentation (Apple Developer Documentation)

We will use the following sample text in all the examples.

import NaturalLanguage
let sampleText = "Anunciada em 2014, a linguagem de programação Swift tornou-se rapidamente uma das linguagens que mais cresce na história. Swift facilita a escrita de software incrivelmente rápido e seguro por design. Nossos objetivos para a linguagem Swift são ambiciosos: queremos tornar a programação simples, e as coisas difíceis possíveis."

Sample text used in the examples.

In the Natural Language framework, we can identify the dominant language of the text in two ways.

The first way is to call the NLLanguageRecognizer type method dominantLanguage(for:), which returns the most likely language for the text as an NLLanguage value, or nil if no language can be determined.

func language(of text: String) -> String {
	if let language = NLLanguageRecognizer.dominantLanguage(for: text) {
		return language.rawValue
	} else {
		return "Could not identify dominant language"
	}
}

print("Identified language: \(language(of: sampleText))")
// Identified language: pt

Identify the dominant language by using the type method dominantLanguage(for:)

The second way is to instantiate an NLLanguageRecognizer object and pass the text you want to analyze to its processString(_:) method. Afterward, you can read the dominant language of that text from the dominantLanguage property, which is nil if no language could be determined.

func language(of text: String) -> String {
	let languageRecognizer = NLLanguageRecognizer()
	languageRecognizer.processString(text)

	if let language = languageRecognizer.dominantLanguage {
		return language.rawValue
	} else {
		return "Could not identify the language."
	}
}

print("Identified language: \(language(of: sampleText))")
// Identified language: pt

Identify the dominant language by processing the string with a language recognizer object.

If you are interested in how likely it is that the text is written in a given language, use the method languageHypotheses(withMaximum:) on a recognizer that has already processed the text. It returns a dictionary that maps each candidate language to the probability that the text is in that language, limited to the maximum number of languages you specify.
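As a minimal sketch of how the hypotheses could be read, the example below processes a string and prints the candidate languages from most to least likely. The helper name languageCandidates(of:maximum:), its default of three candidates, and the shortened sample sentence are assumptions made for illustration; only NLLanguageRecognizer, processString(_:) and languageHypotheses(withMaximum:) come from the framework. The exact probabilities depend on the text and the OS version, so no output is shown.

```swift
import NaturalLanguage

// Hypothetical helper (not part of the framework): returns up to `maximum`
// candidate languages for `text`, sorted from most to least likely.
func languageCandidates(of text: String, maximum: Int = 3) -> [(language: String, probability: Double)] {
    let languageRecognizer = NLLanguageRecognizer()
    languageRecognizer.processString(text)

    // languageHypotheses(withMaximum:) returns a [NLLanguage: Double] dictionary.
    // Dictionaries are unordered, so sort by probability before returning.
    return languageRecognizer.languageHypotheses(withMaximum: maximum)
        .sorted { $0.value > $1.value }
        .map { (language: $0.key.rawValue, probability: $0.value) }
}

// Shortened sample sentence, used here only to keep the example self-contained.
let shortSample = "Swift facilita a escrita de software incrivelmente rápido e seguro por design."

for candidate in languageCandidates(of: shortSample) {
    print("\(candidate.language): \(candidate.probability)")
}
```

Sorting the dictionary is a design choice for readability: it puts the same language that dominantLanguage would report at the top of the list, followed by the less likely alternatives.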

Of course, identifying the most probable language of a piece of text or its dominant language is just one of the capabilities of the Natural Language framework.

In future articles we will explore how and when to use text tokenization, which breaks a piece of text into tokens, and named entity recognition, which categorizes those tokens.

Natural Language also allows you to use custom models to classify or tag natural language, so you can train models tailored to exactly what your applications need.

For more about the Natural Language framework, check Apple's official documentation.

Natural Language framework documentation (Apple Developer Documentation)