Package: NUSS 0.1.0

Oskar Kosch

NUSS: Mixed N-Grams and Unigram Sequence Segmentation

Segments short text sequences, such as hashtags, into sequences of separate words using a dictionary, which may be built from a custom text corpus. A unigram dictionary is used to find the most probable segmentation, and an n-grams approach is used to determine the possible segmentations given the text corpus.
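To illustrate the unigram idea described above, here is a minimal base-R sketch of dictionary-based segmentation by dynamic programming. It is not the package's implementation and does not call any NUSS function; the names `toy_counts` and `segment_unigram` and the word counts are hypothetical, chosen only for this example. The sketch scores a segmentation by the sum of the log unigram probabilities of its words and keeps the highest-scoring one.

```r
# Toy unigram counts (hypothetical values, for illustration only).
toy_counts <- c(this = 50, is = 80, a = 100, test = 20,
                hash = 5, tag = 8, hashtag = 12)

# Split `text` into dictionary words so that the sum of log unigram
# probabilities is maximal. Returns NA_character_ if the text cannot be
# covered completely by dictionary words.
segment_unigram <- function(text, counts,
                            max_len = max(nchar(names(counts)))) {
  log_p <- log(counts / sum(counts))
  n <- nchar(text)
  best <- c(0, rep(-Inf, n))        # best[i + 1]: best score for first i chars
  back <- rep(NA_integer_, n + 1)   # back[i + 1]: start offset of the last word
  for (end in seq_len(n)) {
    for (start in max(0, end - max_len):(end - 1)) {
      word <- substr(text, start + 1, end)
      if (is.finite(best[start + 1]) && word %in% names(log_p)) {
        score <- best[start + 1] + log_p[[word]]
        if (score > best[end + 1]) {
          best[end + 1] <- score
          back[end + 1] <- start
        }
      }
    }
  }
  if (!is.finite(best[n + 1])) return(NA_character_)
  # Walk the back-pointers from the end of the text to recover the words.
  words <- character(0)
  end <- n
  while (end > 0) {
    start <- back[end + 1]
    words <- c(substr(text, start + 1, end), words)
    end <- start
  }
  paste(words, collapse = " ")
}

segment_unigram("thisisatest", toy_counts)  # "this is a test"
segment_unigram("hashtag", toy_counts)      # "hashtag"
```

With these toy counts, "hashtag" stays whole because one moderately frequent word scores higher than the two rarer words "hash" and "tag" combined. A dictionary built from a different corpus could reverse that preference.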

Authors: Oskar Kosch [aut, cre]

NUSS_0.1.0.tar.gz
NUSS_0.1.0.zip (r-4.7), NUSS_0.1.0.zip (r-4.6), NUSS_0.1.0.zip (r-4.5)
NUSS_0.1.0.tgz (r-4.6-x86_64), NUSS_0.1.0.tgz (r-4.6-arm64), NUSS_0.1.0.tgz (r-4.5-x86_64), NUSS_0.1.0.tgz (r-4.5-arm64)
NUSS_0.1.0.tar.gz (r-4.7-arm64), NUSS_0.1.0.tar.gz (r-4.7-x86_64), NUSS_0.1.0.tar.gz (r-4.6-arm64), NUSS_0.1.0.tar.gz (r-4.6-x86_64)
NUSS_0.1.0.tgz (r-4.6-emscripten)

# Install 'NUSS' in R:
install.packages('NUSS', repos = c('https://theogrost.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/theogrost/nuss/issues

Uses libs:
  • c++: GNU Standard C++ Library v3

cpp

2.70 score, 8 scripts, 145 downloads, 6 exports, 45 dependencies

Last updated from: 2e104423fa. Checks: 13 OK. Indexed: yes.

Target                 Result  Time
linux-devel-arm64      OK      170
linux-devel-x86_64     OK      150
source / vignettes     OK      196
linux-release-arm64    OK      158
linux-release-x86_64   OK      152
macos-release-arm64    OK      164
macos-release-x86_64   OK      405
macos-oldrel-arm64     OK      161
macos-oldrel-x86_64    OK      263
windows-devel          OK      141
windows-release        OK      166
windows-oldrel         OK      129
wasm-release           OK      128

Exports: igrepl, ngrams_dictionary, ngrams_segmentation, nuss, unigram_dictionary, unigram_sequence_segmentation

Dependencies: BH, cli, cpp11, data.table, digest, dplyr, dtt, english, float, generics, glue, lattice, lexicon, lgr, lifecycle, magrittr, Matrix, MatrixExtra, mgsub, mlapi, NLP, pillar, pkgconfig, purrr, qdapRegex, R6, Rcpp, RcppArmadillo, RhpcBLASctl, rlang, rsparse, slam, stringi, stringr, syuzhet, text2vec, textclean, textshape, tibble, tidyr, tidyselect, utf8, vctrs, withr, zoo