Natural language detection for Go.
- Supports 84 languages
- 100% written in Go
- No external dependencies
- Fast
- Recognizes not only a language, but also a script (Latin, Cyrillic, etc.)
Installation:
    go get -u github.com/abadojack/whatlanggo
Simple usage example:
package main
import (
	"fmt"
	"github.com/abadojack/whatlanggo"
)
func main() {
	info := whatlanggo.Detect("Foje funkcias kaj foje ne funkcias")
	fmt.Println("Language:", info.Lang.String(), " Script:", whatlanggo.Scripts[info.Script], " Confidence: ", info.Confidence)
}
Blacklisting and whitelisting example:
package main
import (
	"fmt"
	"github.com/abadojack/whatlanggo"
)
func main() {
	// Blacklist: exclude Yiddish from the candidate languages.
	options := whatlanggo.Options{
		Blacklist: map[whatlanggo.Lang]bool{
			whatlanggo.Ydd: true,
		},
	}
	info := whatlanggo.DetectWithOptions("האקדמיה ללשון העברית", options)
	fmt.Println("Language:", info.Lang.String(), "Script:", whatlanggo.Scripts[info.Script])
	// Whitelist: consider only Esperanto and Ukrainian.
	options1 := whatlanggo.Options{
		Whitelist: map[whatlanggo.Lang]bool{
			whatlanggo.Epo: true,
			whatlanggo.Ukr: true,
		},
	}
	info = whatlanggo.DetectWithOptions("Mi ne scias", options1)
	fmt.Println("Language:", info.Lang.String(), " Script:", whatlanggo.Scripts[info.Script])
}
For more details, please check the documentation.
Requirements: Go 1.8 or higher.
The algorithm is based on trigram language models, which are a particular case of n-grams. To understand the idea, please check the original paper by Cavnar and Trenkle (1994), "N-Gram-Based Text Categorization".
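To make the trigram idea concrete, here is a minimal sketch in the spirit of Cavnar and Trenkle's rank-order ("out-of-place") comparison. It is not whatlanggo's internal code: the helper names `trigramCounts`, `rankedProfile` and `outOfPlace`, the cutoff of 300 trigrams, and the tiny sample texts are all assumptions made for illustration. Real language profiles are built from large corpora.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// trigramCounts lowercases the text, keeps only letters (words are rejoined
// with single spaces and padded with one space on each side), and counts
// every overlapping window of three runes.
func trigramCounts(text string) map[string]int {
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	runes := []rune(" " + strings.Join(words, " ") + " ")
	counts := make(map[string]int)
	for i := 0; i+3 <= len(runes); i++ {
		counts[string(runes[i:i+3])]++
	}
	return counts
}

// rankedProfile returns the trigrams ordered from most to least frequent
// (ties broken alphabetically), truncated to at most limit entries.
func rankedProfile(counts map[string]int, limit int) []string {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if counts[keys[i]] != counts[keys[j]] {
			return counts[keys[i]] > counts[keys[j]]
		}
		return keys[i] < keys[j]
	})
	if len(keys) > limit {
		keys = keys[:limit]
	}
	return keys
}

// outOfPlace sums, for every trigram of the text profile, how far its rank
// is from the rank of the same trigram in the language profile. A trigram
// missing from the language profile costs len(langProfile).
func outOfPlace(textProfile, langProfile []string) int {
	rank := make(map[string]int, len(langProfile))
	for i, t := range langProfile {
		rank[t] = i
	}
	maxPenalty := len(langProfile)
	dist := 0
	for i, t := range textProfile {
		j, ok := rank[t]
		switch {
		case !ok:
			dist += maxPenalty
		case i > j:
			dist += i - j
		default:
			dist += j - i
		}
	}
	return dist
}

func main() {
	const limit = 300 // arbitrary cutoff chosen for this sketch

	// Toy "training" samples standing in for real corpus-based profiles.
	samples := map[string]string{
		"english": "the quick brown fox jumps over the lazy dog and then the dog sleeps",
		"spanish": "el rápido zorro marrón salta sobre el perro perezoso y luego el perro duerme",
	}

	input := rankedProfile(trigramCounts("the dog jumps"), limit)
	for name, sample := range samples {
		profile := rankedProfile(trigramCounts(sample), limit)
		// The language with the smallest distance is the best match.
		fmt.Println(name, outOfPlace(input, profile))
	}
}
```

The language whose profile yields the smallest distance is taken as the best guess; the gap to the runner-up is what a reliability check can then build on.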
Whether a detection result is considered reliable depends on the following factors:
- How many unique trigrams are in the given text.
- How big the difference is between the first and the second (not returned) detected languages. This metric is called rate in the code base.
Therefore, reliability can be presented as a 2D space with a threshold function that splits it into "Reliable" and "Not reliable" areas. This function is a hyperbola.
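The two factors above can be sketched as a hyperbolic threshold of the form a/n + b, where n is the number of unique trigrams. This is only an illustration under stated assumptions: the function names `confidentRate`, `rate` and `isReliable`, the way the rate is normalized, and the constants `a` and `b` are hypothetical and are not taken from the whatlanggo code base.

```go
package main

import "fmt"

// confidentRate is the hyperbolic threshold a/n + b. It drops as the number
// of unique trigrams n grows and approaches the floor b.
// The constants a and b are illustrative assumptions.
func confidentRate(uniqueTrigrams int, a, b float64) float64 {
	return a/float64(uniqueTrigrams) + b
}

// rate is the relative gap between the best and the second-best score,
// where a higher score means a better match (an assumption of this sketch).
func rate(best, second float64) float64 {
	return (best - second) / second
}

// isReliable reports whether the point (uniqueTrigrams, rate) lies above the
// hyperbola. Degenerate inputs are treated as not reliable in this sketch
// to avoid dividing by zero.
func isReliable(uniqueTrigrams int, best, second, a, b float64) bool {
	if uniqueTrigrams <= 0 || second <= 0 {
		return false
	}
	return rate(best, second) > confidentRate(uniqueTrigrams, a, b)
}

func main() {
	const a, b = 8.0, 0.25 // hypothetical constants

	// Same score gap, different text lengths: a short text needs a larger
	// gap between the top two languages to count as reliable.
	for _, n := range []int{4, 16, 64} {
		fmt.Println(n, confidentRate(n, a, b), isReliable(n, 200, 100, a, b))
	}
}
```

The design point is that a short input yields few unique trigrams, so the threshold is high and only a clear lead over the runner-up counts as reliable; longer inputs are allowed a smaller lead.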
For more details, please check the blog article "Introduction to Rust Whatlang Library and Natural Language Identification Algorithms".
whatlanggo is a derivative of Franc (JavaScript, MIT) by Titus Wormer.
Thanks to greyblake (Potapov Sergey) for creating whatlang-rs, from which I got the idea and algorithms.
