Package: NUSS 0.1.0

Oskar Kosch

NUSS: Mixed N-Grams and Unigram Sequence Segmentation

Segments short text sequences, such as hashtags, into sequences of separate words using a dictionary, which may be built from a custom text corpus. A unigram dictionary is used to find the most probable sequence, and an n-grams approach is used to determine possible segmentations given the text corpus.
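
The unigram step described above can be illustrated with a minimal, language-independent sketch. This is not the package's implementation: the toy dictionary, the function names `all_segmentations` and `best_segmentation`, and the scoring rule (total points divided by the squared number of words, so that splits into fewer words are preferred) are all assumptions made for illustration.

```python
# Hypothetical toy unigram dictionary: word -> points (e.g. corpus frequency).
TOY_DICTIONARY = {
    "this": 5, "is": 8, "a": 10,
    "hash": 3, "tag": 4, "hashtag": 6, "his": 2,
}


def all_segmentations(sequence, dictionary):
    """Return every way to split `sequence` into words found in `dictionary`."""
    if not sequence:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(sequence) + 1):
        head = sequence[:end]
        if head in dictionary:
            for tail in all_segmentations(sequence[end:], dictionary):
                results.append([head] + tail)
    return results


def best_segmentation(sequence, dictionary):
    """Return the highest-scoring segmentation, or None if none exists.

    Assumed scoring rule (illustrative only): sum of word points divided
    by the squared number of words.
    """
    sequence = sequence.lower()
    if not sequence:
        return None
    candidates = all_segmentations(sequence, dictionary)
    if not candidates:
        return None

    def score(words):
        return sum(dictionary[w] for w in words) / len(words) ** 2

    return max(candidates, key=score)


print(best_segmentation("ThisIsAHashtag", TOY_DICTIONARY))
```

With this toy dictionary, both "this is a hash tag" and "this is a hashtag" are valid splits; the assumed scoring rule picks the latter because it uses fewer words. Exhaustive enumeration is only practical for short sequences such as hashtags.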

Authors: Oskar Kosch [aut, cre]

NUSS_0.1.0.tar.gz
NUSS_0.1.0.zip (r-4.7), NUSS_0.1.0.zip (r-4.6), NUSS_0.1.0.zip (r-4.5)
NUSS_0.1.0.tgz (r-4.6-x86_64), NUSS_0.1.0.tgz (r-4.6-arm64), NUSS_0.1.0.tgz (r-4.5-x86_64), NUSS_0.1.0.tgz (r-4.5-arm64)
NUSS_0.1.0.tar.gz (r-4.7-arm64), NUSS_0.1.0.tar.gz (r-4.7-x86_64), NUSS_0.1.0.tar.gz (r-4.6-arm64), NUSS_0.1.0.tar.gz (r-4.6-x86_64)
NUSS_0.1.0.tgz (r-4.6-emscripten)
manual.pdf | manual.html
card.svg | card.png
NUSS/json (API)

# Install 'NUSS' in R:

install.packages('NUSS', repos = c('https://theogrost.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/theogrost/nuss/issues

Uses libs:

c++ – GNU Standard C++ Library v3

Datasets:

base_dictionary - Base dictionary with unigrams

On CRAN:

cpp

2.70 score | 8 scripts | 145 downloads | 6 exports | 45 dependencies

Last updated from: 2e104423fa. Checks: 13 OK. Indexed: yes.

Target	Result	Time
linux-devel-arm64	OK	170
linux-devel-x86_64	OK	150
source / vignettes	OK	196
linux-release-arm64	OK	158
linux-release-x86_64	OK	152
macos-release-arm64	OK	164
macos-release-x86_64	OK	405
macos-oldrel-arm64	OK	161
macos-oldrel-x86_64	OK	263
windows-devel	OK	141
windows-release	OK	166
windows-oldrel	OK	129
wasm-release	OK	128

Exports: igrepl, ngrams_dictionary, ngrams_segmentation, nuss, unigram_dictionary, unigram_sequence_segmentation

Dependencies: BH, cli, cpp11, data.table, digest, dplyr, dtt, english, float, generics, glue, lattice, lexicon, lgr, lifecycle, magrittr, Matrix, MatrixExtra, mgsub, mlapi, NLP, pillar, pkgconfig, purrr, qdapRegex, R6, Rcpp, RcppArmadillo, RhpcBLASctl, rlang, rsparse, slam, stringi, stringr, syuzhet, text2vec, textclean, textshape, tibble, tidyr, tidyselect, utf8, vctrs, withr, zoo

Readme and manuals

Help Manual

Help page	Topics
Base dictionary with unigrams	base_dictionary
Perform inverse regex search (C++)	igrepl
Create n-grams dictionary	ngrams_dictionary
Segmenting sequences with n-grams.	ngrams_segmentation
Mixed N-Grams and Unigram Sequence Segmentation (NUSS) function	nuss
Create unigram dictionary	unigram_dictionary
Segmenting sequences with unigrams	unigram_sequence_segmentation
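
The n-grams side of the approach (building an n-grams dictionary from a corpus, then searching many n-gram patterns against a single sequence, which is the "inverse" of the usual one-pattern-many-strings search) might be sketched as follows. This is an illustrative sketch only, not the package's code: the function names `build_ngram_counts` and `matching_ngrams`, the `max_n` parameter, and the whitespace tokenization are all assumptions.

```python
from collections import Counter


def build_ngram_counts(texts, max_n=3):
    """Count every word n-gram (1 <= n <= max_n) across the corpus texts.

    Keys are tuples of lowercase words; values are occurrence counts.
    """
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def matching_ngrams(sequence, ngram_counts):
    """Inverse search: test many n-gram patterns against one sequence.

    An n-gram matches when its words, joined without spaces, occur as a
    substring of the sequence. Returns {spaced n-gram: count}.
    """
    sequence = sequence.lower()
    return {
        " ".join(gram): count
        for gram, count in ngram_counts.items()
        if "".join(gram) in sequence
    }


corpus = ["this is a hashtag", "a hashtag is short"]
counts = build_ngram_counts(corpus, max_n=2)
print(matching_ngrams("AHashtag", counts))
```

Each matching n-gram is a candidate piece of a segmentation that is attested in the corpus; how candidates are combined and scored is left out of this sketch.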