python - How should I go about subsampling from a scipy.sparse.csr.csr_matrix and a list


I have a scipy.sparse.csr.csr_matrix that represents the words in documents, and a list of lists in which each index holds the categories for the corresponding row of the matrix.

The problem I am having is that I need to randomly select n rows of the data.

So if my matrix looks like this

[1:3 2:3 4:4] [1:5 2:5 5:4] 

and my list of lists looks like this

((20,40) (80,50))   

and I needed to sample 1 value, I would end up with this

[1:3 2:3 4:4] ((20,40)) 

I have searched the scipy documentation but cannot find a way to generate a new csr matrix using a list of indexes.

You can index a csr matrix using a list of indices. First create a matrix, and look at it:

>>> from scipy.sparse import csr_matrix
>>> m = csr_matrix([[0,0,1,0], [4,3,0,0], [3,0,0,8]])
>>> m
<3x4 sparse matrix of type '<type 'numpy.int64'>'
    with 5 stored elements in Compressed Sparse Row format>
>>> print(m.toarray())
[[0 0 1 0]
 [4 3 0 0]
 [3 0 0 8]]

Of course, you can look at just the first row:

>>> m[0]
<1x4 sparse matrix of type '<type 'numpy.int64'>'
    with 1 stored elements in Compressed Sparse Row format>
>>> print(m[0].toarray())
[[0 0 1 0]]

But you can also look at the first and third rows at once by using the list [0,2] as an index:

>>> m[[0,2]]
<2x4 sparse matrix of type '<type 'numpy.int64'>'
    with 3 stored elements in Compressed Sparse Row format>
>>> print(m[[0,2]].toarray())
[[0 0 1 0]
 [3 0 0 8]]
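As a script rather than an interpreter session, the same row selection might look like the sketch below. The names `rows` and `sub` are only illustrative, and the exact text of the matrix repr depends on your scipy version, so the example checks the shape and contents instead.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Same 3x4 example matrix as in the session above.
m = csr_matrix(np.array([[0, 0, 1, 0], [4, 3, 0, 0], [3, 0, 0, 8]]))

# A plain Python list of row numbers works as a row index.
rows = [0, 2]
sub = m[rows]

print(sub.shape)      # shape of the selected block: 2 rows, 4 columns
print(sub.toarray())  # dense view of the first and third rows
```

The result is still a sparse matrix, so nothing is converted to a dense array unless you call toarray() yourself.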

Now you can generate n random indices with no repeats (without replacement) using numpy's choice:

import numpy as np
i = np.random.choice(np.arange(m.shape[0]), n, replace=False)
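If you want the sample to be reproducible, one option is numpy's Generator interface, which also has a choice method. This is a sketch, not part of the approach above: the seed value 0 and the names `rng` and `num_rows` are arbitrary assumptions for illustration.

```python
import numpy as np

num_rows = 3  # stands in for m.shape[0]
n = 2         # how many rows to sample

# A seeded generator gives the same indices on every run.
rng = np.random.default_rng(0)

# replace=False guarantees that no index appears twice.
i = rng.choice(num_rows, size=n, replace=False)

print(i)
```

Note that with replace=False, asking for more indices than there are rows raises a ValueError, so n must not exceed the number of rows.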

Then you can use those indices to grab the rows from the original matrix m:

sub_m = m[i] 

To grab them from the categories list of lists, you must first make it an array, then you can index it with the list i:

sub_c = np.asarray(categories)[i] 

If you want to have a list of lists back, just use:

sub_c.tolist() 

Or, if you have/want a tuple of tuples, I think you have to do it manually:

tuple(map(tuple, sub_c)) 
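Putting the steps together, a minimal sketch could look like this. The function name `subsample`, its `seed` parameter and the sample `categories` data are hypothetical. It also swaps np.asarray for a list comprehension when picking the categories, on the assumption that the inner lists may have different lengths and so may not convert cleanly to a regular array.

```python
import numpy as np
from scipy.sparse import csr_matrix


def subsample(matrix, categories, n, seed=None):
    """Pick n random rows of matrix together with their categories."""
    if matrix.shape[0] != len(categories):
        raise ValueError("matrix rows and categories must have the same length")
    rng = np.random.default_rng(seed)
    # Sample row numbers without replacement, so no row is picked twice.
    idx = rng.choice(matrix.shape[0], size=n, replace=False)
    sub_matrix = matrix[idx]
    # Index the Python list directly; each entry keeps its original type.
    sub_categories = [categories[j] for j in idx]
    return sub_matrix, sub_categories, idx


m = csr_matrix(np.array([[0, 0, 1, 0], [4, 3, 0, 0], [3, 0, 0, 8]]))
categories = [(20, 40), (80, 50), (10,)]  # made-up data, one entry per row

sub_m, sub_c, idx = subsample(m, categories, 2, seed=0)
print(idx)
print(sub_m.toarray())
print(sub_c)
```

Returning the indices alongside the data makes it easy to check afterwards which rows were drawn.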
