string - Unexpected output in for loop - Python
I have a list:

    t=[['universitario de deportes'],['lancaster'],['universitario de'],['juan aurich'],['muni'],['juan']]

I want to reorder the list according to Jaccard distance. If I reorder t, the expected output should be:

    [['universitario de deportes'],['universitario de'],['lancaster'],['juan aurich'],['juan'],['muni']]

The Jaccard distance code works fine, but the rest of the code doesn't give the expected output. The code is below:
    def jack(a,b):
        x=a.split()
        y=b.split()
        k=float(len(set(x)&set(y)))/float(len((set(x) | set(y))))
        return k

    t=[['universitario de deportes'],['lancaster'],['universitario de'],['juan aurich'],['muni'],['juan']]
    import copy as cp
    b=cp.deepcopy(t)
    c=[]
    while (len(b)>0):
        c.append(b[0][0])
        d=b[0][0]
        del b[0]
        for m in range (0 , len(b)+1):
            if m > len(b):
                break
                if jack(d,b[m][0])>0.3:
                    c.append(b[m][0])
                    del b[m]
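Strictly speaking, jack returns the Jaccard similarity of the two word sets (shared words divided by all distinct words), so a higher value means more alike; the Jaccard distance is one minus that value. A minimal sketch of both, assuming names are compared as whitespace-separated word sets exactly as in jack; the names jaccard_similarity and jaccard_distance are hypothetical, and the guard for two empty strings (where jack would raise ZeroDivisionError) is an added assumption:

```python
def jaccard_similarity(a, b):
    """Word-level Jaccard similarity: |A & B| / |A | B| (same formula as jack)."""
    x, y = set(a.split()), set(b.split())
    union = x | y
    if not union:  # assumption: two blank strings count as identical
        return 1.0
    return len(x & y) / float(len(union))


def jaccard_distance(a, b):
    """Jaccard distance is the complement of the similarity."""
    return 1.0 - jaccard_similarity(a, b)


print(jaccard_similarity('universitario de deportes', 'universitario de'))  # 2 shared / 3 distinct words, about 0.667
print(jaccard_similarity('juan aurich', 'juan'))  # 0.5
print(jaccard_distance('lancaster', 'muni'))  # 1.0
```

With the 0.3 threshold used in the question, 'juan' and 'juan aurich' (0.5) count as close, while any pair with no shared word scores 0.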
Unfortunately, the output is unexpected: it is the same list as the input:

    print c
    ['universitario de deportes', 'lancaster', 'universitario de', 'juan aurich', 'muni', 'juan']
Edit:

I tried to correct the code. It still doesn't work, but I got a little closer to the expected output:
    t=[['universitario de deportes'],['lancaster'],['universitario de'],['juan aurich'],['muni'],['juan']]
    import copy as cp
    b=cp.deepcopy(t)
    c=[]
    while (len(b)>0):
        c.append(b[0][0])
        d=b[0][0]
        del b[0]
        for m in range(0,len(b)-1):
            if jack(d,b[m][0])>0.3:
                c.append(b[m][0])
                del b[m]
The "close" output is:

    ['universitario de deportes', 'universitario de', 'lancaster', 'juan aurich', 'muni', 'juan']
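One reason 'juan' is not pulled next to 'juan aurich' in the edited loop: range(0, len(b)-1) stops one short of the last index of b, so the last remaining item is never compared. When 'juan aurich' is picked, b is ['muni', 'juan'] and only index 0 is visited. A small sketch that lists the indices that loop header visits; the helper visited_indices is hypothetical:

```python
def visited_indices(b):
    """Indices visited by the edited loop header: for m in range(0, len(b)-1)."""
    return list(range(0, len(b) - 1))


remaining = ['muni', 'juan']  # contents of b right after 'juan aurich' is taken
print(visited_indices(remaining))  # [0]: 'juan' at index 1 is never checked
print(list(range(len(remaining))))  # [0, 1]: what a full scan would visit
```

Deleting b[m] inside the same loop has a related effect: every later item shifts left by one, so the item right after a deleted one is skipped.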
Second edit:

Finally, I came up with a solution that is quite fast computationally. Currently, I'll use this code to order 60 thousand names. The code is below:
    t=['universitario de deportes','lancaster','lancaste','juan aurich','lancaster','juan','universitario','juan franco']
    import copy as cp
    b=cp.deepcopy(t)
    c=[]
    while (len(b)>0):
        c.append(b[0])
        e=b[0]
        del b[0]
        for val in b:
            if jack(e,val)>0.3:
                c.append(val)
                b.remove(val)
    print c
    ['universitario de deportes', 'universitario', 'lancaster', 'lancaster', 'lancaste', 'juan aurich', 'juan', 'juan franco']
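In the second edit, b.remove(val) runs while for val in b is still iterating over b. Each removal shifts the following items left, so the item right after a removed one is never compared against the current seed. With the names in that edit the result happens to come out the same, but in general it can split a group. Below is a sketch of the same greedy idea that builds a fresh list of leftovers on each pass instead of mutating the list being iterated; group_similar, threshold and the toy names 'x y', 'q', 'x', 'x z' are illustrative, not from the question:

```python
def jack(a, b):
    # Word-level Jaccard similarity, same formula as in the question.
    # Assumes non-empty names (two empty strings would divide by zero).
    x, y = set(a.split()), set(b.split())
    return len(x & y) / float(len(x | y))


def group_similar(names, threshold=0.3):
    """Greedy grouping: take the first remaining name, then pull every
    remaining name whose similarity to it exceeds threshold."""
    remaining = list(names)  # shallow copy is enough: strings are immutable
    ordered = []
    while remaining:
        seed = remaining[0]
        ordered.append(seed)
        keep = []
        # Scan the rest once, splitting it into matches and leftovers,
        # instead of calling remove() on the list being iterated.
        for val in remaining[1:]:
            if jack(seed, val) > threshold:
                ordered.append(val)
            else:
                keep.append(val)
        remaining = keep
    return ordered


names = ['x y', 'q', 'x', 'x z']
print(group_similar(names))  # ['x y', 'x', 'x z', 'q']
```

On ['x y', 'q', 'x', 'x z'], the loop from the edit returns ['x y', 'x', 'q', 'x z'], because 'x z' is skipped right after 'x' is removed; the sketch compares it too (similarity 1/3) and keeps it in the 'x y' group. Both versions still do up to n(n-1)/2 comparisons in the worst case, which adds up at 60 thousand names.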
Firstly, I'm not sure why you've got the names in single-item lists; I suggest flattening them out first:

    t = [l[0] for l in t]

This gets rid of the [0] indices everywhere, and means you only need shallow copies (as strings are immutable).
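A short illustration of the flattening step and of why a shallow copy is enough once the items are plain strings; the variable names flat and b are only for the example:

```python
t = [['universitario de deportes'], ['lancaster'], ['universitario de'],
     ['juan aurich'], ['muni'], ['juan']]

flat = [l[0] for l in t]  # unwrap each single-item list into its string
b = list(flat)  # shallow copy: a new list holding the same string objects

b.remove('muni')  # modifying the copy...
print(flat)  # ...leaves the flattened original untouched
print(b[0] is flat[0])  # True: the strings are shared, which is safe because they are immutable
```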
Secondly, the last 3 lines of your code never run, because they are indented under the break:

    if m > len(b):
        break
        # nothing after this will happen
        if jack(d,b[m][0])>0.3:
            c.append(b[m][0])
            del b[m]
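Dedenting those lines would not be enough on its own. In the original loop nothing shrinks b (the del is unreachable), so range(0, len(b)+1) ends at m == len(b): the condition m > len(b) never holds, so the break never fires, and b[m] at that last step is one past the final valid index. A quick check of both facts, using a made-up three-item b:

```python
b = [['lancaster'], ['universitario de'], ['juan aurich']]

indices = list(range(0, len(b) + 1))
print(indices)  # [0, 1, 2, 3]
print(any(m > len(b) for m in indices))  # False: the break condition never holds

try:
    b[indices[-1]]  # m == len(b) is out of range
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: list index out of range
```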
I think what you want is:

    out = [] # sorted list
    for index, val1 in enumerate(t): # work through each item in the original list
        if val1 not in out: # if we haven't already put this item in the new list
            out.append(val1) # put this item in the new list
            for val2 in t[index+1:]: # search the rest of the list
                if (val2 not in out and # if we haven't already put this item in the new list
                        jack(val1, val2) > 0.3): # and the new item is close to the current item
                    out.append(val2) # add the new item
This gives me:

    out == ['universitario de deportes', 'universitario de', 'lancaster', 'juan aurich', 'juan', 'muni']
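For the 60 thousand names mentioned in the question, two costs in that loop grow quickly: val not in out scans the whole output list on every check, and jack re-splits both strings on every comparison. A sketch with the same ordering logic, assuming the input is a flat list of strings; order_by_similarity and the seen set are illustrative additions, not part of the answer:

```python
def order_by_similarity(names, threshold=0.3):
    """Same greedy ordering as the answer's loop, with two speed-ups:
    word sets are computed once per name, and membership in the output
    is tracked with a set instead of scanning the output list."""
    tokens = [set(name.split()) for name in names]  # split each name once
    out = []
    seen = set()  # values already placed in out
    for i, val1 in enumerate(names):
        if val1 in seen:
            continue
        out.append(val1)
        seen.add(val1)
        for j in range(i + 1, len(names)):  # search the rest of the list
            val2 = names[j]
            if val2 in seen:
                continue
            x, y = tokens[i], tokens[j]
            union = x | y
            # Skip the ratio when both names are blank (empty union).
            if union and len(x & y) / float(len(union)) > threshold:
                out.append(val2)
                seen.add(val2)
    return out


t = ['universitario de deportes', 'lancaster', 'universitario de',
     'juan aurich', 'muni', 'juan']
print(order_by_similarity(t))
```

Like the answer's loop, this keeps each distinct name only once, so a repeated 'lancaster' appears a single time (the second edit's version keeps duplicates). The number of comparisons is still quadratic in the worst case; only the per-comparison overhead drops.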
I'd also recommend using better variable names than a, b, c, etc.