Variable names in concrete data:
Figure: Concrete data summary
Figure: Mixtures data summary
Figure: Mixtures data feature plot
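The concrete and mixtures data frames ship with the AppliedPredictiveModeling package. Here is a minimal sketch for loading them and printing the variable names and summaries. It assumes that package is installed, and the guard simply skips the block if it is not:

```r
# Load the data sets only if the AppliedPredictiveModeling package is available
if (requireNamespace("AppliedPredictiveModeling", quietly = TRUE)) {
  # data(concrete) creates two data frames in the workspace: concrete and mixtures
  data(concrete, package = "AppliedPredictiveModeling")
  print(names(concrete))    # variable names in the concrete data
  print(summary(concrete))  # concrete data summary
  print(summary(mixtures))  # mixtures data summary
}
```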
We could list out all the variable names as shown below:
featurePlot(x = mixtures[, c("Cement", "BlastFurnaceSlag", "FlyAsh", "Water", "Superplasticizer", "CoarseAggregate", "FineAggregate", "Age")],
            y = mixtures$CompressiveStrength, plot = "pairs")
Or simplify a bit by taking every column name except the last one (the outcome, CompressiveStrength):
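# All column names except the last one, which is the outcome (CompressiveStrength)
predictors <- colnames(mixtures)
predictors <- predictors[-length(predictors)]
featurePlot(x = mixtures[, predictors], y = mixtures$CompressiveStrength, plot = "pairs")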
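Dropping the last column only works if the outcome really is the last column. A slightly more robust option is to drop the outcome by name with base R's setdiff(). In this sketch, predictor_names is a hypothetical helper and the toy data frame uses made-up values standing in for mixtures:

```r
# Hypothetical helper: every column name except the named outcome,
# independent of where the outcome sits in the column order
predictor_names <- function(df, outcome) {
  setdiff(colnames(df), outcome)
}

# Toy stand-in for mixtures (made-up values, for illustration only)
toy <- data.frame(Cement = c(0.10, 0.12), Water = c(0.07, 0.08),
                  Age = c(28, 7), CompressiveStrength = c(30.5, 20.1))
predictor_names(toy, "CompressiveStrength")

# With caret loaded, the same helper could feed featurePlot():
# featurePlot(x = mixtures[, predictor_names(mixtures, "CompressiveStrength")],
#             y = mixtures$CompressiveStrength, plot = "pairs")
```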
As of 4/19/15, the caret package has the following dependencies:
- package ‘colorspace’
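The dependency list changes between releases, so it can be more reliable to ask R for it directly. This minimal sketch uses tools::package_dependencies() against the locally installed packages, so no repository access is needed. The name direct_deps is a hypothetical helper:

```r
# Hypothetical helper: direct dependencies (Depends, Imports, LinkingTo) of
# installed packages, read from the local library via installed.packages()
direct_deps <- function(pkgs) {
  tools::package_dependencies(pkgs, db = installed.packages())
}

direct_deps("caret")
```

Setting recursive = TRUE in package_dependencies() also follows the dependencies of those dependencies.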
Verify it installed correctly; try loading the library:
If you also see “also installing the dependency ‘CORElearn’” in the console, it is because the AppliedPredictiveModeling package requires CORElearn, which should install automatically. If CORElearn is already installed, you won’t see that message.
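Base R's requireNamespace() offers a quick way to confirm that caret, AppliedPredictiveModeling and CORElearn are all present without attaching them to the search path. Here is a small sketch, where is_installed is a hypothetical helper name:

```r
# Hypothetical helper: TRUE/FALSE for each package; requireNamespace() loads
# the namespace if it can but does not attach the package
is_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

is_installed(c("caret", "AppliedPredictiveModeling", "CORElearn"))
```

Any FALSE entry points to a package that still needs install.packages().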
I came across a data set that included two identification fields. The first was the data set’s own identifier and was truly unique. The second was a cross-reference to a separate data set and, unfortunately, wasn’t unique. I wrote some R code to blank out every instance of an identifier that appears more than once.
The data set was loaded from a .csv file into an R data frame, mydata. “File Number” is the column where the duplicated values reside.
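# Remove duplicates, all instances: blank out every "File Number" that occurs more than once.
# Assumes "File Number" is a character column; assigning "" to a factor column produces NA.
ids <- mydata[["File Number"]]
dup_ids <- unique(ids[duplicated(ids)])   # values that appear two or more times
mydata[ids %in% dup_ids, "File Number"] <- ""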
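The same result can be reached in one pass with duplicated(..., fromLast = TRUE). The forward scan flags the second and later occurrences, the backward scan flags all but the last, and OR-ing the two flags every row whose value occurs more than once. Here is a sketch on a small data frame standing in for mydata; its values are made up:

```r
# Toy stand-in for mydata (hypothetical values); check.names = FALSE keeps the
# space in "File Number", stringsAsFactors = FALSE keeps the column character
mydata <- data.frame(ID = 1:5,
                     `File Number` = c("A1", "B2", "A1", "C3", "B2"),
                     check.names = FALSE, stringsAsFactors = FALSE)

ids <- mydata[["File Number"]]
# TRUE for every row whose File Number appears more than once
all_dups <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
mydata[all_dups, "File Number"] <- ""
mydata[["File Number"]]
# -> "" "" "" "C3" ""
```

If the file is read with read.csv(), note that the default check.names = TRUE renames “File Number” to “File.Number”; pass check.names = FALSE to keep the original name.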