mboecker

This change puts quotation marks only around values in columns of type "string".

@jakob-r

@mboecker
Author

We could also do something like this, if it is preferred:

  if (quote == "all") {
    # Quote everything.
    quote = TRUE
  } else if (quote == "fast") {
    # Quote character columns only; factors are currently left unquoted.
    quote = which(sapply(colnames(x), function(colname) { is.character(x[[colname]]) }))
  } else if (quote == "slow") {
    # Quote string and factor columns that contain whitespace (as per the specification).
    quote = which(sapply(colnames(x), function(colname) { any(grepl(pattern = "\\s", as.character(x[[colname]]))) }))
  } else {
    stop("writeARFF: invalid argument quote. Please specify 'all', 'fast' or 'slow'.")
  }
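
For anyone following along, here is a minimal sketch of what a numeric `quote` vector does once it reaches base R's `write.table()`. It assumes the writer ultimately passes `quote` through to `write.table()`, which this thread does not confirm; the data frame and its column names are made up for illustration.

```r
# Toy data frame; the column names are illustrative only.
x <- data.frame(
  id    = c(1, 2),
  name  = c("a b", "c"),
  label = factor(c("0", "1")),
  stringsAsFactors = FALSE
)

# Indices of character columns, as in the "fast" option.
quote_cols <- which(vapply(x, is.character, logical(1)))

# write.table() accepts a numeric vector of column indices for `quote`;
# only those columns are wrapped in double quotes.
out <- capture.output(
  write.table(x, sep = ",", quote = quote_cols,
              row.names = FALSE, col.names = FALSE)
)
print(out)
```

Here only the `name` column should come out quoted, while the factor column `label` is written bare.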

@jakob-r
Member

jakob-r commented Jan 21, 2019

Why so complicated?

vapply(data, is.character, logical(1))

But actually I would like something like this:

vapply(data, function(x) is.factor(x) && all(!is.na(as.numeric(levels(x)))), logical(1))

Background info for the others:
Suppose we have a column with values such as 0 and 1 that serves as our label. The MOA framework did not classify correctly when these values were quoted as '0' and '1'. This may be a bug on their side, but many datasets don't seem to store factors quoted, so I wanted this option.
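
To make the check concrete, here is a small sketch of the numeric-levels test on a toy data frame. The helper name `has_numeric_levels` and the example columns are hypothetical; `suppressWarnings()` is added only because `as.numeric()` warns when a level cannot be parsed.

```r
# Hypothetical helper: TRUE if `col` is a factor whose levels all parse as numbers.
has_numeric_levels <- function(col) {
  is.factor(col) && !anyNA(suppressWarnings(as.numeric(levels(col))))
}

# Toy data; the column names are illustrative only.
data <- data.frame(
  label = factor(c("0", "1", "0")),
  color = factor(c("red", "blue", "red")),
  text  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# One logical per column: only `label` has purely numeric levels.
numeric_factor <- vapply(data, has_numeric_levels, logical(1))
print(numeric_factor)
```

Columns flagged `TRUE` here would be the ones written without quotes under this option.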

@mboecker
Author

Hey @jakob-r
I adapted your code and added is.numeric as a further case in which no quotes are used.
This works nicely: strings are quoted, and so are factors that contain non-numeric levels.
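
As a rough sketch of the rule described above (not the code in the PR itself), the per-column decision could look like the following. The helper name `needs_quotes` and the sample data are assumptions made for illustration.

```r
# Hypothetical helper, not the merged code: decide per column whether to quote.
needs_quotes <- function(col) {
  # Numeric columns are never quoted.
  if (is.numeric(col)) return(FALSE)
  # Factors are quoted only if at least one level is not a number.
  if (is.factor(col)) {
    return(anyNA(suppressWarnings(as.numeric(levels(col)))))
  }
  # Plain strings are always quoted.
  is.character(col)
}

# Toy data; the column names are illustrative only.
data <- data.frame(
  value = c(1.5, 2.5),
  text  = c("a b", "c"),
  label = factor(c("0", "1")),
  color = factor(c("red", "blue")),
  stringsAsFactors = FALSE
)

# Column indices that would be quoted under this rule.
quote_cols <- which(vapply(data, needs_quotes, logical(1)))
print(quote_cols)
```

With this data, `text` and `color` are selected for quoting, while `value` and the 0/1 `label` factor are left bare.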
