NUSS - Mixed N-Grams and Unigram Sequence Segmentation
Segments short text sequences, such as hashtags, into sequences of separate words using a dictionary, which may be built from a custom text corpus. A unigram dictionary is used to find the most probable sequence, and an n-gram approach is used to determine possible segmentations given the text corpus.
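To illustrate the unigram part of the description, here is a minimal Python sketch of dictionary-based segmentation using dynamic programming. It is not the package's implementation or API: the function name `segment`, the `max_word_len` parameter, and the toy counts in `UNIGRAM_COUNTS` are assumptions made for illustration, and the n-gram step is not covered.

```python
import math

# Hypothetical toy unigram counts; a real dictionary would be built from a corpus.
UNIGRAM_COUNTS = {
    "this": 50, "is": 80, "a": 100, "hash": 10, "tag": 12, "hashtag": 20,
}

def segment(text, counts, max_word_len=20):
    """Return the most probable split of `text` into dictionary words, or None."""
    total = sum(counts.values())
    n = len(text)
    # best[i] = (log-probability, start of last word) for the best split of text[:i]
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            # Skip unknown words and prefixes that cannot be segmented.
            if word not in counts or best[start][0] == -math.inf:
                continue
            score = best[start][0] + math.log(counts[word] / total)
            if score > best[end][0]:
                best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None  # no segmentation uses only dictionary words
    # Walk the stored split points backwards to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]

print(segment("thisisahashtag", UNIGRAM_COUNTS))  # ['this', 'is', 'a', 'hashtag']
```

With these toy counts, "hashtag" is preferred over "hash" + "tag" because the product of two word probabilities is smaller than the probability of the single longer word.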
Last updated 4 months ago
cpp
3.00 score, 8 scripts, 148 downloads