Controls for text data used in the blocking functions; the values are
passed to tokenizers::tokenize_character_shingles.
Usage
controls_txt(
  n_shingles = 2L,
  n_chunks = 10L,
  lowercase = TRUE,
  strip_non_alphanum = TRUE
)
Arguments
- n_shingles
length of shingles (default 2L),
- n_chunks
passed to (default 10L),
- lowercase
should the characters be made lowercase? (default TRUE),
- strip_non_alphanum
should punctuation and white space be stripped? (default TRUE).
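To make the effect of n_shingles, lowercase and strip_non_alphanum more concrete, the following is a rough base-R sketch of character shingling. The helper char_shingles is hypothetical and written only for illustration; it is not the tokenizers implementation and may differ from it in edge cases.

```r
# Hypothetical helper for illustration only (base R, single input string).
# It mimics the idea of character shingles, not the tokenizers internals.
char_shingles <- function(x, n = 2L, lowercase = TRUE,
                          strip_non_alphanum = TRUE) {
  if (lowercase) x <- tolower(x)
  # Drop everything that is not a letter or a digit
  # (punctuation and white space included).
  if (strip_non_alphanum) x <- gsub("[^[:alnum:]]", "", x)
  n_chars <- nchar(x)
  # A string shorter than the shingle length yields no shingles.
  if (n_chars < n) return(character(0))
  # Slide a window of width n over the string, one character at a time.
  starts <- seq_len(n_chars - n + 1L)
  substring(x, starts, starts + n - 1L)
}

char_shingles("Kowalski, J.")
# "ko" "ow" "wa" "al" "ls" "sk" "ki" "ij"

char_shingles("Kowalski, J.", n = 3L)
```

With the defaults, the input is first reduced to "kowalskij", so a shingle length of 2 gives eight overlapping two-character pieces; a larger n_shingles gives fewer, longer pieces.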
Value
Returns a list with parameters.
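A minimal usage sketch, assuming controls_txt() is exported by an installed package named blocking (the package name is an assumption and is not stated on this page). The call is guarded so the snippet does nothing when the package is unavailable, and the result is inspected with str() instead of relying on particular element names.

```r
# Assumption: controls_txt() comes from an installed package called "blocking".
if (requireNamespace("blocking", quietly = TRUE)) {
  # Override the shingle length; the other arguments keep their
  # documented defaults.
  ctrl <- blocking::controls_txt(n_shingles = 3L)
  # Look at the structure of the returned list.
  str(ctrl)
}
```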