I am attempting to train a naive Bayes model on text data. I have predetermined the number of folds (so that the results can be compared with other models) and am using adaptive resampling for hyperparameter tuning. However, this error appears:
```
Error in if (tmps < .Machine$double.eps^0.5) 0 else tmpm/tmps :
  missing value where TRUE/FALSE needed
```
I know there are other methods, such as those provided by the quanteda package. However, I want to stay with caret so that I can compare other models on the same data.
Any help would be much appreciated.
My code is below:
```r
library(tidyverse)
library(quanteda)
library(quanteda.textmodels)
library(caret)

corp <- data_corpus_moviereviews
set.seed(300)
id_train <- sample(docnames(corp), size = 1500, replace = FALSE)

# get training set
training_dfm <- corpus_subset(corp, docnames(corp) %in% id_train) %>%
  dfm(stem = TRUE, tolower = TRUE, remove = stopwords("en"), remove_symbols = TRUE)

# get test set (documents not in id_train, make features equal)
test_dfm <- corpus_subset(corp, !docnames(corp) %in% id_train) %>%
  dfm(stem = TRUE, tolower = TRUE, remove_symbols = TRUE, remove = stopwords("en")) %>%
  dfm_select(pattern = training_dfm, selection = "keep")

training_m <- convert(training_dfm, to = "matrix")
test_m <- convert(test_dfm, to = "matrix")

myFolds <- createFolds(training_m, k = 5)

myControl <- trainControl(
  method = "adaptive_cv",
  repeats = 2,
  summaryFunction = twoClassSummary,
  classProbs = TRUE,
  verboseIter = TRUE,
  index = myFolds,
  adaptive = list(min = 2, alpha = 0.05, method = "gls", complete = TRUE),
  search = "random"
)

nb_caret <- train(
  x = training_m,
  y = as.factor(docvars(training_dfm, "sentiment")),
  method = "naive_bayes",
  trControl = myControl,
  tuneLength = 3,
  verbose = TRUE,
  metric = "ROC"
)
```
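For comparison, here is a minimal sketch of how predefined folds are usually built for `trainControl(index = ...)`. It assumes (not verified against the data above) that the fold construction is a likely contributor to the error: `createFolds()` expects the outcome vector rather than the feature matrix, and by default it returns the held-out rows, whereas `index` expects the rows used for *training* in each resample. The objects `y`, `x`, `bad_folds`, `good_folds`, and `ctrl` are hypothetical, and `y`/`x` are simulated stand-ins for `docvars(training_dfm, "sentiment")` and `training_m`.

```r
library(caret)

set.seed(300)
# Hypothetical stand-in for the 1500 sentiment labels
y <- factor(sample(c("neg", "pos"), 1500, replace = TRUE))
# Hypothetical stand-in for the 1500 x p document-feature matrix (p = 20 here)
x <- matrix(rpois(1500 * 20, 1), nrow = 1500)

# Passing the matrix makes createFolds() split all nrow * ncol cells,
# so the fold "row" indices run far beyond nrow(x).
bad_folds <- createFolds(x, k = 5)
max(unlist(bad_folds))  # equals length(x), not nrow(x)

# Build folds from the outcome and return the training rows of each fold,
# which is what trainControl(index = ...) expects.
good_folds <- createFolds(y, k = 5, returnTrain = TRUE)

# Same adaptive settings as in the question; `repeats` is left out because
# the resamples are supplied explicitly through `index`.
ctrl <- trainControl(
  method = "adaptive_cv",
  index = good_folds,
  adaptive = list(min = 2, alpha = 0.05, method = "gls", complete = TRUE),
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  search = "random"
)
```

Because the same `good_folds` list can be passed to every model's `trainControl()`, all models would still be evaluated on identical resamples.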