I came across a data set that included two identification fields. The first was the data set's own identifier and was truly unique. The second was a cross reference to a separate data set and unfortunately wasn’t unique. I wrote some R code to blank out every instance of any cross-reference identifier that appeared more than once.
The data set was loaded from a .csv file into an R data frame, mydata. “File Number” is the name of the column that holds the duplicated values.
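# Blank out all instances of any duplicated File Number
# duplicated() flags the second and later occurrences of each value
duplicates <- duplicated(mydata[["File Number"]])
# Distinct values that occur more than once
remove <- unique(mydata[["File Number"]][duplicates])
# Overwrite every occurrence (including the first) with an empty string
mydata[mydata[["File Number"]] %in% remove, "File Number"] <- ""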
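To see the effect on a small case, here is a minimal sketch that runs the same steps on made-up data. The ID column, the sample values, and the csv_text variable are assumptions for illustration only; they are not from the original data set. It also assumes the file is read with check.names = FALSE, since read.csv would otherwise rename the column to File.Number.

```r
# Hypothetical sample data standing in for the real .csv file
csv_text <- "ID,File Number
1,A1
2,B2
3,A1
4,C3
5,B2
6,D4"

# check.names = FALSE keeps the space in the "File Number" column name
mydata <- read.csv(text = csv_text, check.names = FALSE,
                   stringsAsFactors = FALSE)

# TRUE for the second and later occurrences of each value
duplicates <- duplicated(mydata[["File Number"]])

# Distinct values that occur more than once
remove <- unique(mydata[["File Number"]][duplicates])

# Blank out every occurrence of those values, including the first
mydata[mydata[["File Number"]] %in% remove, "File Number"] <- ""

print(mydata)
```

In this sample, A1 and B2 each appear twice, so all four of those cells end up blank while C3 and D4 are left alone. Note that the rows themselves are kept; only the cross-reference value is cleared, so the unique identifier in the first column is untouched.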