Controls for processing text data in the blocking functions. The settings are passed to tokenizers::tokenize_character_shingles.

Usage

controls_txt(
  n_shingles = 2L,
  n_chunks = 10L,
  lowercase = TRUE,
  strip_non_alphanum = TRUE
)

Arguments

n_shingles

length of the character shingles, in characters (default 2L),

n_chunks

number of chunks (default 10L),

lowercase

should the characters be made lowercase? (default TRUE),

strip_non_alphanum

should punctuation and white space be stripped? (default TRUE).
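
To illustrate how n_shingles, lowercase and strip_non_alphanum interact, here is a minimal base-R sketch of character shingling. The helper char_shingles is hypothetical and written only for this illustration; it is not the tokenizers implementation, handles a single string, and its handling of inputs shorter than n_shingles is an assumption of the sketch.

```r
# Hypothetical helper: a base-R sketch of character shingling.
# Not the tokenizers implementation; handles one string at a time.
char_shingles <- function(x, n_shingles = 2L, lowercase = TRUE,
                          strip_non_alphanum = TRUE) {
  # Optionally fold the text to lowercase.
  if (lowercase) x <- tolower(x)
  # Optionally drop punctuation and white space.
  if (strip_non_alphanum) x <- gsub("[[:punct:][:space:]]", "", x)
  n_chars <- nchar(x)
  # Assumption of this sketch: too-short input yields no shingles.
  if (n_chars < n_shingles) return(character(0))
  # Slide a window of width n_shingles over the cleaned string.
  starts <- seq_len(n_chars - n_shingles + 1L)
  substring(x, starts, starts + n_shingles - 1L)
}

char_shingles("Monty Python")
char_shingles("Monty Python", n_shingles = 3L, lowercase = FALSE)
```

With the defaults, the input is reduced to "montypython" before the two-character windows are taken, so shingles can span what was originally a word boundary.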

Value

Returns a list of the control parameters.
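
A short usage sketch, assuming the blocking package is installed and that controls_txt is called with the arguments shown under Usage. The structure printed by str() is not asserted here, since the element names of the returned list are not stated on this page.

```r
# Assumes the 'blocking' package is installed; the block is skipped otherwise.
if (requireNamespace("blocking", quietly = TRUE)) {
  # Override two defaults; the remaining arguments keep their default values.
  ctrl <- blocking::controls_txt(n_shingles = 3L, lowercase = FALSE)
  # Inspect the returned list of control parameters.
  str(ctrl)
}
```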

Author

Maciej Beręsewicz