We may use dummy_cols from fastDummies, with split = "," so that each comma-separated genre gets its own indicator column:
library(fastDummies)
# split = "," treats each comma-separated value in genres as a separate category
dummy_cols(MovieRating_test, 'genres', split = ",")
-output
       tconst                   genres genres_Animation genres_Comedy genres_Romance genres_Short genres_Documentary
       <char>                   <char>            <int>         <int>          <int>        <int>              <int>
 1: tt0000001        Documentary,Short                0             0              0            1                  1
 2: tt0000002          Animation,Short                1             0              0            1                  0
 3: tt0000003 Animation,Comedy,Romance                1             1              1            0                  0
 4: tt0000004          Animation,Short                1             0              0            1                  0
 5: tt0000005             Comedy,Short                0             1              0            1                  0
 6: tt0000006                    Short                0             0              0            1                  0
 7: tt0000007              Short,Sport                0             0              0            1                  0
 8: tt0000008        Documentary,Short                0             0              0            1                  1
 9: tt0000009            Romance,Short                0             0              1            1                  0
10: tt0000010        Documentary,Short                0             0              0            1                  1
11: tt0000011        Documentary,Short                0             0              0            1                  1
12: tt0000012        Documentary,Short                0             0              0            1                  1
13: tt0000013        Documentary,Short                0             0              0            1                  1
14: tt0000014             Comedy,Short                0             1              0            1                  0
15: tt0000015          Animation,Short                1             0              0            1                  0
    genres_Sport
           <int>
 1:            0
 2:            0
 3:            0
 4:            0
 5:            0
 6:            0
 7:            1
 8:            0
 9:            0
10:            0
11:            0
12:            0
13:            0
14:            0
15:            0
Another option is mtabulate from qdapTools, used together with data.table:
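library(data.table)
library(qdapTools)
# tabulate the genres per row, then turn the counts into 0/1 indicators
m1 <- MovieRating_test[, +(mtabulate(strsplit(genres, ",")) > 0)]
# add the indicator columns to the data.table by reference
MovieRating_test[, colnames(m1) := as.data.frame(m1)]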
-output
> MovieRating_test
       tconst                   genres Animation Comedy Documentary Romance Short Sport
       <char>                   <char>     <int>  <int>       <int>   <int> <int> <int>
 1: tt0000001        Documentary,Short         0      0           1       0     1     0
 2: tt0000002          Animation,Short         1      0           0       0     1     0
 3: tt0000003 Animation,Comedy,Romance         1      1           0       1     0     0
 4: tt0000004          Animation,Short         1      0           0       0     1     0
 5: tt0000005             Comedy,Short         0      1           0       0     1     0
 6: tt0000006                    Short         0      0           0       0     1     0
 7: tt0000007              Short,Sport         0      0           0       0     1     1
 8: tt0000008        Documentary,Short         0      0           1       0     1     0
 9: tt0000009            Romance,Short         0      0           0       1     1     0
10: tt0000010        Documentary,Short         0      0           1       0     1     0
11: tt0000011        Documentary,Short         0      0           1       0     1     0
12: tt0000012        Documentary,Short         0      0           1       0     1     0
13: tt0000013        Documentary,Short         0      0           1       0     1     0
14: tt0000014             Comedy,Short         0      1           0       0     1     0
15: tt0000015          Animation,Short         1      0           0       0     1     0
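
If it helps to see what these calls compute, here is a minimal base R sketch of the same idea (split on the comma, collect the distinct genres, build one 0/1 column per genre). The data frame movies is a hypothetical three-row stand-in built from the first rows of the output above, not the original MovieRating_test, and the variable names are only for illustration.

```r
# Hypothetical stand-in for MovieRating_test (first three rows of the output)
movies <- data.frame(
  tconst = c("tt0000001", "tt0000002", "tt0000003"),
  genres = c("Documentary,Short", "Animation,Short", "Animation,Comedy,Romance"),
  stringsAsFactors = FALSE
)

# Split each genres string into a character vector of genre names
genre_list <- strsplit(movies$genres, ",", fixed = TRUE)

# Sorted set of all genres that occur in any row
all_genres <- sort(unique(unlist(genre_list)))

# One row per movie, one integer 0/1 column per genre
indicators <- do.call(
  rbind,
  lapply(genre_list, function(g) as.integer(all_genres %in% g))
)
colnames(indicators) <- all_genres

# Bind the indicator columns onto the original data
result <- cbind(movies, indicators)
print(result)
```

Because the genres are sorted here, the columns come out in alphabetical order, as in the mtabulate output; with real data the two packages above are likely the more convenient choice.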