Package: NUSS 0.1.0

Oskar Kosch
NUSS: Mixed N-Grams and Unigram Sequence Segmentation
Segments short text sequences, such as hashtags, into sequences of separate words using a dictionary, which may be built from a custom corpus of texts. A unigram dictionary is used to find the most probable word sequence, and an n-grams approach is used to determine possible segmentations given the text corpus.
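To make the unigram idea concrete, here is a minimal base-R sketch of dictionary-based segmentation by dynamic programming. It is a generic illustration, not the package's implementation: the function `segment_unigram`, the vector `toy_counts`, and the word counts in it are all hypothetical, and NUSS may score candidate segmentations differently.

```r
# Toy unigram dictionary: word -> count (made-up numbers, for illustration only).
toy_counts <- c(this = 50, is = 80, science = 20, his = 10, the = 100)

# Hypothetical helper: find the highest-probability split of `sequence`
# into dictionary words, scoring a split by the sum of log unigram probabilities.
segment_unigram <- function(sequence, counts) {
  log_prob <- log(counts / sum(counts))    # named vector of log-probabilities
  n <- nchar(sequence)
  max_len <- max(nchar(names(counts)))     # longest dictionary word
  best <- c(0, rep(-Inf, n))               # best[i + 1]: best score for the first i characters
  back <- integer(n + 1)                   # back[i + 1]: start offset of the last word in that split
  for (end in seq_len(n)) {
    for (start in max(0, end - max_len):(end - 1)) {
      word <- substr(sequence, start + 1, end)
      if (word %in% names(log_prob) && is.finite(best[start + 1])) {
        score <- best[start + 1] + log_prob[[word]]
        if (score > best[end + 1]) {
          best[end + 1] <- score
          back[end + 1] <- start
        }
      }
    }
  }
  if (!is.finite(best[n + 1])) return(NA_character_)  # no full segmentation exists
  # Walk the back-pointers from the end to recover the words.
  words <- character(0)
  end <- n
  while (end > 0) {
    start <- back[end + 1]
    words <- c(substr(sequence, start + 1, end), words)
    end <- start
  }
  paste(words, collapse = " ")
}

segment_unigram("thisisscience", toy_counts)
# [1] "this is science"
```

With this toy dictionary only one full segmentation exists, so the scores do not have to break a tie. With a larger dictionary the log-probability sum is what would pick between competing splits.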
NUSS_0.1.0.tar.gz
NUSS_0.1.0.zip (r-4.7), NUSS_0.1.0.zip (r-4.6), NUSS_0.1.0.zip (r-4.5)
NUSS_0.1.0.tgz (r-4.6-x86_64), NUSS_0.1.0.tgz (r-4.6-arm64), NUSS_0.1.0.tgz (r-4.5-x86_64), NUSS_0.1.0.tgz (r-4.5-arm64)
NUSS_0.1.0.tar.gz (r-4.7-arm64), NUSS_0.1.0.tar.gz (r-4.7-x86_64), NUSS_0.1.0.tar.gz (r-4.6-arm64), NUSS_0.1.0.tar.gz (r-4.6-x86_64)
NUSS_0.1.0.tgz (r-4.6-emscripten)
Install 'NUSS' in R:
install.packages('NUSS', repos = c('https://theogrost.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/theogrost/nuss/issues
- base_dictionary - Base dictionary with unigrams
Last updated from: 2e104423fa. Checks: 13 OK. Indexed: yes.
| Target | Result | Time |
|---|---|---|
| linux-devel-arm64 | OK | 170 |
| linux-devel-x86_64 | OK | 150 |
| source / vignettes | OK | 196 |
| linux-release-arm64 | OK | 158 |
| linux-release-x86_64 | OK | 152 |
| macos-release-arm64 | OK | 164 |
| macos-release-x86_64 | OK | 405 |
| macos-oldrel-arm64 | OK | 161 |
| macos-oldrel-x86_64 | OK | 263 |
| windows-devel | OK | 141 |
| windows-release | OK | 166 |
| windows-oldrel | OK | 129 |
| wasm-release | OK | 128 |
Exports: igrepl, ngrams_dictionary, ngrams_segmentation, nuss, unigram_dictionary, unigram_sequence_segmentation
Dependencies: BH, cli, cpp11, data.table, digest, dplyr, dtt, english, float, generics, glue, lattice, lexicon, lgr, lifecycle, magrittr, Matrix, MatrixExtra, mgsub, mlapi, NLP, pillar, pkgconfig, purrr, qdapRegex, R6, Rcpp, RcppArmadillo, RhpcBLASctl, rlang, rsparse, slam, stringi, stringr, syuzhet, text2vec, textclean, textshape, tibble, tidyr, tidyselect, utf8, vctrs, withr, zoo
Readme and manuals
Help Manual
| Help page | Topics |
|---|---|
| Base dictionary with unigrams | base_dictionary |
| Perform inverse regex search (C++) | igrepl |
| Create n-grams dictionary | ngrams_dictionary |
| Segmenting sequences with n-grams. | ngrams_segmentation |
| Mixed N-Grams and Unigram Sequence Segmentation (NUSS) function | nuss |
| Create unigram dictionary | unigram_dictionary |
| Segmenting sequences with unigrams | unigram_sequence_segmentation |
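
The "inverse regex search" entry can be pictured with a small base-R sketch. It assumes that "inverse" means testing many patterns against a single string, the reverse of `grepl`, which tests one pattern against many strings. The helper `inverse_grepl` is hypothetical and is not the package's C++ `igrepl`, whose arguments and return value may differ.

```r
# Hypothetical helper (not NUSS::igrepl): check each pattern against one string.
inverse_grepl <- function(patterns, x, fixed = FALSE) {
  vapply(
    patterns,
    function(p) grepl(p, x, fixed = fixed),  # base grepl: one pattern, one string
    logical(1),
    USE.NAMES = FALSE
  )
}

inverse_grepl(c("^sci", "math", "ence$"), "science")
# [1]  TRUE FALSE  TRUE
```

A lookup in this direction is handy in segmentation because it shows which dictionary entries occur anywhere inside a given sequence.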