r - Computing difference between rows in a data frame
I have a data frame. I want to compute how "far" each row is from a given row. Let us consider the 1st row. Let the data frame be as follows:
> sampledf
  x1 x2 x3
1  1  5  5
2  4  2  2
3  2  9  1
4  7  7  3
What I wish to do is the following:
- Compute the difference between the 1st row and the others:
sampledf[1,]-sampledf[2,]
- Consider the absolute value:
abs(sampledf[1,]-sampledf[2,])
- Compute the sum of the newly formed data frame of differences:
rowSums(newdf)
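As a minimal sketch, the three steps above can be run end to end for rows 1 and 2. The sample values are assumed to match the data frame printed earlier, and `row_total` is a name introduced here only for illustration.

```r
# Sample data, assumed to match the values printed above
sampledf <- data.frame(x1 = c(1, 4, 2, 7),
                       x2 = c(5, 2, 9, 7),
                       x3 = c(5, 2, 1, 3))

# Steps 1 and 2: element-wise difference, then absolute value
newdf <- abs(sampledf[1, ] - sampledf[2, ])

# Step 3: sum across the columns of the one-row data frame
row_total <- rowSums(newdf)
print(row_total)
```

Here `newdf` is still an ordinary one-row data frame with numeric columns, so `rowSums` accepts it.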
Now I want to do this for the whole data frame.
newdf <- sapply(2:4,function(x) { return (abs(sampledf[1,]-sampledf[x,]));})
This creates a problem in that the result is a transposed list. Hence,
newdf <- as.data.frame(t(sapply(2:4,function(x) { return (abs(sampledf[1,]-sampledf[x,]));})))
But a problem arises while computing rowSums:
> class(newdf)
[1] "data.frame"
> rowSums(newdf)
Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
  'x' must be numeric
> newdf
  x1 x2 x3
1  3  3  3
2  1  4  4
3  6  2  2
>
Puzzle 1: Why do I get this error? I did notice that newdf[1,1] is a list and not a number. Is it because of that? How can I ensure that the result of the sapply and transpose is a simple data frame of numbers?
So I proceed to create a global data frame and modify it within the function:
sapply(2:4,function(x) { newdf <<- as.data.frame(rbind(newdf,abs(sampledf[1,]-sampledf[x,])));})
> newdf
  x1 x2 x3
2  3  3  3
3  1  4  4
4  6  2  2
> rowSums(newdf)
 2  3  4
 9  9 10
>
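This is as expected.
Puzzle 2: Is there a cleaner way to achieve this? How can I do this for every row in the data frame (what is shown above is only the "distance" from row 1; I need to do this for the other rows as well)? Is running a loop the only option?
To put it in words, you are trying to compute the Manhattan distance: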
dist(sampledf, method = "manhattan")
#    1  2  3
# 2  9
# 3  9 10
# 4 10  9  9
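If only the distances from one particular row are needed, one option is to convert the `dist` object to a square matrix and index into it. This is a sketch, assuming the sample values shown in the question; `d` and `from_row1` are names introduced here for illustration.

```r
# Sample data, assumed to match the values printed in the question
sampledf <- data.frame(x1 = c(1, 4, 2, 7),
                       x2 = c(5, 2, 9, 7),
                       x3 = c(5, 2, 1, 3))

# Full pairwise Manhattan distances as a symmetric 4 x 4 matrix
d <- as.matrix(dist(sampledf, method = "manhattan"))

# Distances from row 1 to every other row (drop the zero self-distance)
from_row1 <- d[1, -1]
print(from_row1)
```

Any other reference row works the same way, for example `d[3, -3]` for row 3, so no explicit loop is needed.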
Regarding your implementation, I think the problem is that your inner function is returning a data.frame when it should return a numeric vector. Doing return(unlist(abs(sampledf[1,]-sampledf[x,]))) should fix it.
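A sketch of that fix applied to the original sapply call, assuming the sample values from the question. The likely cause of the error is that a one-row data frame is a list, so sapply collects the results into a list-mode matrix; `diffs` and `totals` are names introduced here for illustration.

```r
# Sample data, assumed to match the values printed in the question
sampledf <- data.frame(x1 = c(1, 4, 2, 7),
                       x2 = c(5, 2, 9, 7),
                       x3 = c(5, 2, 1, 3))

# unlist() turns each one-row data frame into a plain numeric vector,
# so sapply() builds a numeric matrix with one column per compared row
diffs <- sapply(2:4, function(x) unlist(abs(sampledf[1, ] - sampledf[x, ])))

# Transpose so that each compared row becomes a row again
newdf <- as.data.frame(t(diffs))

# rowSums() now works because every column is numeric
totals <- rowSums(newdf)
print(totals)
```

With numeric columns in place, the global assignment with `<<-` is no longer needed.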