I have a data frame as shown below. I want to create a dummy (0/1) column for each unique string in the genres column.
       tconst                   genres
 1: tt0000001        Documentary,Short
 2: tt0000002          Animation,Short
 3: tt0000003 Animation,Comedy,Romance
 4: tt0000004          Animation,Short
 5: tt0000005             Comedy,Short
 6: tt0000006                    Short
 7: tt0000007              Short,Sport
 8: tt0000008        Documentary,Short
 9: tt0000009            Romance,Short
10: tt0000010        Documentary,Short
11: tt0000011        Documentary,Short
12: tt0000012        Documentary,Short
13: tt0000013        Documentary,Short
14: tt0000014             Comedy,Short
15: tt0000015          Animation,Short
I have tried the code below. Apart from being inefficient, it returns incorrect output.
uniqueGenre <- MovieRating_test %>% separate_rows(genres) %>% pull() %>% unique()

for(i in 1:nrow(MovieRating_test)){
  for(j in uniqueGenre){
    MovieRating_test[i,j] <- ifelse(j %in% strsplit(as.character(MovieRating_test[,"genres"][i]),","), 1, 0)
  }
}
Dataset:

MovieRating_test <- structure(list(
  tconst = c("tt0000001", "tt0000002", "tt0000003", "tt0000004", "tt0000005",
             "tt0000006", "tt0000007", "tt0000008", "tt0000009", "tt0000010",
             "tt0000011", "tt0000012", "tt0000013", "tt0000014", "tt0000015"),
  genres = c("Documentary,Short", "Animation,Short", "Animation,Comedy,Romance",
             "Animation,Short", "Comedy,Short", "Short", "Short,Sport",
             "Documentary,Short", "Romance,Short", "Documentary,Short",
             "Documentary,Short", "Documentary,Short", "Documentary,Short",
             "Comedy,Short", "Animation,Short")),
  row.names = c(NA, -15L), class = c("data.table", "data.frame"))
A data.table solution would be preferred, but any solution is welcome.
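To make the goal concrete, here is a minimal sketch of one possible data.table approach, not a definitive solution. It assumes genres are separated by plain commas with no surrounding spaces and that `tconst` is unique per row. The names `dt`, `genre_list`, `long`, `dummies`, and `result` are hypothetical and used only for illustration; `dt` is a small stand-in for `MovieRating_test`.

```r
library(data.table)

# Small stand-in for MovieRating_test (a few rows taken from the data above)
dt <- data.table(
  tconst = c("tt0000001", "tt0000002", "tt0000003", "tt0000006"),
  genres = c("Documentary,Short", "Animation,Short",
             "Animation,Comedy,Romance", "Short")
)

# Split each genres string on literal commas; one list element per row
genre_list <- strsplit(dt$genres, ",", fixed = TRUE)

# Long format: one row per (tconst, genre) pair
long <- data.table(
  tconst = rep(dt$tconst, lengths(genre_list)),
  genre  = unlist(genre_list)
)

# Wide format: one column per genre, counting occurrences per tconst
# (0 or 1 as long as a genre is not repeated within a row)
dummies <- dcast(long, tconst ~ genre,
                 fun.aggregate = length, value.var = "genre")

# Attach the dummy columns back to the original table
result <- merge(dt, dummies, by = "tconst")
print(result)
```

The idea is to reshape to long format once and let `dcast()` build every genre column in a single pass, rather than assigning cell by cell inside nested loops.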