regex - Key confusion in Python
Hey friends, I have seen some weird code. I am new to Python programming. The code is:
    import re, collections

    mylist = ['probes', 'gene.symbol', 'gene.title', 'go1', 'go2', 'go3',
              'adx_kd_06.ip', 'adx_kd_24.ip', 'adx_lg_06.ip', 'adx_lg_24.ip',
              'adx_lv_06.ip', 'adx_lv_24.ip', 'adx_sp_06.ip', 'adx_sp_24.ip',
              'adx_ln_06.id', 'alm_ln_06.id', 'alm_lv_06.ip', 'alm_sp_06.ip',
              'k3spg_lv_06.ip', 'k3spg_sp_06.ip', 'kkk_ln_06.id', 'kkk_lv_06.ip',
              'kkk_sp_06.ip', 'endcn_lv_06.in', 'endcn_sp_06.in', 'bcd_lv_06.ip',
              'bcd_sp_06.ip', 'adx_lv_06.id', 'adx_sp_06.id', 'alm_lv_06.id',
              'alm_sp_06.id', 'd35_ln_06.id', 'k3spg_ln_06.id', 'k3_lv_06.id',
              'k3_sp_06.id', 'bcd_ln_06.id', 'd35_lv_06.id', 'd35_sp_06.id',
              'k3spg_lv_06.id', 'k3spg_sp_06.id', 'bcd_lv_06.id', 'bcd_sp_06.id',
              'endcn_kd_06.in', 'endcn_lg_06.in',
              'probes', 'gene.symbol',
              'adx_kd_06.ip', 'adx_kd_24.ip', 'adx_lg_06.ip', 'adx_lg_24.ip',
              'adx_lv_06.ip', 'adx_lv_24.ip', 'adx_sp_06.ip', 'adx_sp_24.ip',
              'adx_ln_06.id', 'alm_ln_06.id', 'alm_lv_06.ip', 'alm_sp_06.ip',
              'k3spg_lv_06.ip', 'k3spg_sp_06.ip', 'kkk_ln_06.id', 'kkk_lv_06.ip',
              'kkk_sp_06.ip', 'endcn_lv_06.in', 'endcn_sp_06.in', 'bcd_lv_06.ip',
              'bcd_sp_06.ip', 'adx_lv_06.id', 'adx_sp_06.id', 'alm_lv_06.id',
              'alm_sp_06.id', 'd35_ln_06.id', 'k3spg_ln_06.id', 'k3_lv_06.id',
              'k3_sp_06.id', 'bcd_ln_06.id', 'd35_lv_06.id', 'd35_sp_06.id',
              'k3spg_lv_06.id', 'k3spg_sp_06.id', 'bcd_lv_06.id', 'bcd_sp_06.id',
              'endcn_kd_06.in', 'endcn_lg_06.in']

    regex = re.compile(r'([\w\d]+)_(\w\w)_(\d\d)\.(\w\w)')
    first_part_dict = collections.defaultdict(list)
    second_part_dict = collections.defaultdict(list)

    # Find the second instance of 'probes', to separate the first and second parts
    cutoff_index = mylist.index('probes', 1)

    for i, string in enumerate(mylist):
        matched = regex.match(string)
        if not matched:
            continue  # header items such as 'probes' do not match
        rg1, rg2, rg3, rg4 = matched.groups()
        key = rg1 + rg3
        if i < cutoff_index:
            first_part_dict[key].append(i)
        else:
            second_part_dict[key].append(i)

We can see that the list above is separated into 2 parts, delimited by 'probes', 'gene.symbol', 'gene.title', 'go1', 'go2', 'go3' and by 'probes', 'gene.symbol'.
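The call mylist.index('probes', 1) is what finds the boundary between the two parts: the second argument of list.index is the position at which the search starts, so the 'probes' at position 0 is skipped. A minimal sketch, using a hypothetical shortened list named sample (not from the original code):

```python
# Hypothetical, shortened stand-in for mylist: two header runs with one data item each.
sample = ['probes', 'gene.symbol', 'adx_kd_06.ip',
          'probes', 'gene.symbol', 'adx_kd_06.ip']

first = sample.index('probes')      # search starts at position 0
second = sample.index('probes', 1)  # search starts at position 1, so position 0 is skipped

print(first, second)
```

Every index below the second value belongs to the first part, and everything from it onward belongs to the second part, which is what the i < cutoff_index test checks.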
The regex components of the first and second parts are:

    ([\w\d]+)_(\w\w)_(\d\d)\.(\w\w)
       rg1     rg2    rg3     rg4

which should match a string like adx_sp_06.ip or k3spg_ln_06.id.
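To see what each group captures, the pattern can be tried on the two example strings and on one header item. This is only an illustrative sketch; the loop variable s and the results dictionary are names introduced here, not part of the original code:

```python
import re

regex = re.compile(r'([\w\d]+)_(\w\w)_(\d\d)\.(\w\w)')

results = {}
for s in ['adx_sp_06.ip', 'k3spg_ln_06.id', 'gene.symbol']:
    matched = regex.match(s)
    # groups() returns the four captured pieces as a tuple; None means no match.
    results[s] = matched.groups() if matched else None
    print(s, '->', results[s])

# The key used by the original loop is the first group joined to the third.
rg1, rg2, rg3, rg4 = results['adx_sp_06.ip']
print(rg1 + rg3)
```

Note that key is a string such as adx06, not a number: rg3 is the text '06', and + joins the two strings.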
My question: I didn't understand the use of first_part_dict[key].append(i) in the code. I know i is the index here. I am not good at regex, but I think key is the matched portion plus the number. So key acts as the key and first_part_dict is the dictionary. Is the value of the index stored in the dictionary first_part_dict?

I am confused. Please help me in understanding this. Any help is appreciated, and sorry for the long question.
The dictionary being used is a dictionary with a text/string as key and a list as value.

What first_part_dict[key].append(i) is doing is appending (or adding) the value of i to the list corresponding to the key key of the dictionary first_part_dict.

If the key is adx06, the dictionary will go from {'adx06': []} to {'adx06': [1]}, should the value of i be 1.
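The reason no empty list has to be created by hand is collections.defaultdict(list): looking up a missing key calls list() and stores the new empty list under that key. A small sketch of that behaviour, with the values 1 and 5 chosen only for illustration, next to a plain-dict equivalent:

```python
import collections

first_part_dict = collections.defaultdict(list)
key = 'adx06'

was_present = key in first_part_dict  # False: nothing stored yet

# The lookup creates an empty list for the missing key; append then adds to it.
first_part_dict[key].append(1)
first_part_dict[key].append(5)
print(dict(first_part_dict))

# The same effect with an ordinary dict needs setdefault (or an explicit check).
plain = {}
plain.setdefault(key, []).append(1)
plain.setdefault(key, []).append(5)
print(plain)
```

With an ordinary dict, plain[key].append(1) on a missing key would raise KeyError, which is what the defaultdict avoids.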
I'll put a walkthrough to illustrate:

    # Only the first nine items of the list are shown here;
    # regex, cutoff_index and the two dictionaries are the ones from the question.
    mylist = ['probes', 'gene.symbol', 'gene.title', 'go1', 'go2', 'go3',
              'adx_kd_06.ip', 'adx_kd_24.ip', 'adx_lg_06.ip']

    for i, string in enumerate(mylist):
        matched = regex.match(string)
        if not matched:
            continue
        rg1, rg2, rg3, rg4 = matched.groups()
        key = rg1 + rg3
        if i < cutoff_index:
            first_part_dict[key].append(i)
        else:
            second_part_dict[key].append(i)

When we pass through the loop the first time, i = 0 and string = 'probes'. Since probes doesn't match the regex, the loop skips to the next item through continue.
This time, i = 1 and string = 'gene.symbol'. Once again, the string doesn't match the regex, so we skip to the next item. This goes on until the 7th item: adx_kd_06.ip. Here, we have i = 6 and string = 'adx_kd_06.ip', which matches the regex.

From that, rg1 = adx, rg2 = kd, rg3 = 06 and rg4 = ip. key becomes adx06, and since i is below cutoff_index, first_part_dict[key].append(i) is executed.

This will create the key adx06 in the dictionary first_part_dict and append 6 to its value list. Right now, we have a dict holding {'adx06': [6]}. The loop continues on to the next item.
This time, we have i = 7 and string = 'adx_kd_24.ip'. It matches the regex and, a couple of lines later, we have first_part_dict[key].append(i) executing.

This will create the key adx24 in the dictionary first_part_dict and append 7 to its value list. Right now, we have a dict holding {'adx06': [6], 'adx24': [7]}. The loop continues on to the next item.

This time, we have i = 8 and string = 'adx_lg_06.ip'. It matches the regex and, a couple of lines later, we have first_part_dict[key].append(i) executing again.

This will create the key adx06 in the dictionary... wait! The key already exists, so instead it will append 8 to the existing value list. Right now, we have a dict holding {'adx06': [6, 8], 'adx24': [7]}.

This goes on and on until all the items in the list have been treated.
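The steps of the walkthrough can be reproduced with a short script. This is a sketch under one assumption: because the shortened list contains 'probes' only once, the cutoff_index test is left out and every match goes into first_part_dict. The snapshots list is a name added here just to record the dictionary after each append:

```python
import re
import collections

regex = re.compile(r'([\w\d]+)_(\w\w)_(\d\d)\.(\w\w)')

# Shortened list from the walkthrough (first nine items only).
mylist = ['probes', 'gene.symbol', 'gene.title', 'go1', 'go2', 'go3',
          'adx_kd_06.ip', 'adx_kd_24.ip', 'adx_lg_06.ip']

first_part_dict = collections.defaultdict(list)
snapshots = []  # copy of the dictionary after each append, for inspection

for i, string in enumerate(mylist):
    matched = regex.match(string)
    if not matched:
        continue  # items 0 to 5 are headers and are skipped
    rg1, rg2, rg3, rg4 = matched.groups()
    key = rg1 + rg3
    first_part_dict[key].append(i)
    snapshots.append({k: list(v) for k, v in first_part_dict.items()})
    print(i, string, '->', snapshots[-1])
```

The three printed lines correspond to i = 6, 7 and 8, and the last one shows both indices 6 and 8 collected under the same key adx06.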