regex - Key confusion in Python
Hey friends, I have seen some weird code. I am new to Python programming. The code is:
    import re, collections

    mylist = ['probes', 'gene.symbol', 'gene.title', 'go1', 'go2', 'go3',
              'adx_kd_06.ip', 'adx_kd_24.ip', 'adx_lg_06.ip', 'adx_lg_24.ip',
              'adx_lv_06.ip', 'adx_lv_24.ip', 'adx_sp_06.ip', 'adx_sp_24.ip',
              'adx_ln_06.id', 'alm_ln_06.id', 'alm_lv_06.ip', 'alm_sp_06.ip',
              'k3spg_lv_06.ip', 'k3spg_sp_06.ip', 'kkk_ln_06.id', 'kkk_lv_06.ip',
              'kkk_sp_06.ip', 'endcn_lv_06.in', 'endcn_sp_06.in', 'bcd_lv_06.ip',
              'bcd_sp_06.ip', 'adx_lv_06.id', 'adx_sp_06.id', 'alm_lv_06.id',
              'alm_sp_06.id', 'd35_ln_06.id', 'k3spg_ln_06.id', 'k3_lv_06.id',
              'k3_sp_06.id', 'bcd_ln_06.id', 'd35_lv_06.id', 'd35_sp_06.id',
              'k3spg_lv_06.id', 'k3spg_sp_06.id', 'bcd_lv_06.id', 'bcd_sp_06.id',
              'endcn_kd_06.in', 'endcn_lg_06.in',
              'probes', 'gene.symbol',
              'adx_kd_06.ip', 'adx_kd_24.ip', 'adx_lg_06.ip', 'adx_lg_24.ip',
              'adx_lv_06.ip', 'adx_lv_24.ip', 'adx_sp_06.ip', 'adx_sp_24.ip',
              'adx_ln_06.id', 'alm_ln_06.id', 'alm_lv_06.ip', 'alm_sp_06.ip',
              'k3spg_lv_06.ip', 'k3spg_sp_06.ip', 'kkk_ln_06.id', 'kkk_lv_06.ip',
              'kkk_sp_06.ip', 'endcn_lv_06.in', 'endcn_sp_06.in', 'bcd_lv_06.ip',
              'bcd_sp_06.ip', 'adx_lv_06.id', 'adx_sp_06.id', 'alm_lv_06.id',
              'alm_sp_06.id', 'd35_ln_06.id', 'k3spg_ln_06.id', 'k3_lv_06.id',
              'k3_sp_06.id', 'bcd_ln_06.id', 'd35_lv_06.id', 'd35_sp_06.id',
              'k3spg_lv_06.id', 'k3spg_sp_06.id', 'bcd_lv_06.id', 'bcd_sp_06.id',
              'endcn_kd_06.in', 'endcn_lg_06.in']

    regex = re.compile(r'([\w\d]+)_(\w\w)_(\d\d)\.(\w\w)')
    first_part_dict = collections.defaultdict(list)
    second_part_dict = collections.defaultdict(list)

    # Find the second instance of 'probes', to separate the first and second parts
    cutoff_index = mylist.index('probes', 1)

    for i, string in enumerate(mylist):
        matched = regex.match(string)
        if not matched:
            continue
        rg1, rg2, rg3, rg4 = matched.groups()
        key = rg1 + rg3
        if i < cutoff_index:
            first_part_dict[key].append(i)
        else:
            second_part_dict[key].append(i)

As we can see, the list above is separated into 2 parts: the first is introduced by 'probes', 'gene.symbol', 'gene.title', 'go1', 'go2', 'go3', and the second by 'probes', 'gene.symbol'.
The regex components for the first and second parts are:

    ([\w\d]+)_(\w\w)_(\d\d)\.(\w\w)
       rg1     rg2    rg3     rg4

which should match a string like adx_sp_06.ip or k3spg_ln_06.id.
My question is: I didn't understand the use of first_part_dict[key].append(i) in this code. I know i is the index given here. I am not good at regex, but I think key is the matched portion plus the number. So does key act like a number, with first_part_dict being a dictionary? Is the value of the index stored in the dictionary first_part_dict?
I am confused. Please help me in understanding this. Any help is appreciated, and sorry for the long question.
The dictionary being used is a dictionary with a text/string key and a list value.
What first_part_dict[key].append(i) is doing is appending (or adding) the value of i to the list corresponding to the key key of the dictionary first_part_dict.
If key is adx06, the dictionary would go from {'adx06': []} to {'adx06': [1]}, should the value of i be 1.
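To see that step in isolation, here is a minimal sketch. It assumes the same defaultdict(list) setup as in the question; the values 'adx06' and 1 are only the illustrative ones used above, not something taken from a real run.

```python
import collections

# With defaultdict(list), looking up a missing key creates an empty list
# for that key instead of raising KeyError.
first_part_dict = collections.defaultdict(list)

key = 'adx06'
i = 1  # illustrative index value only

print(first_part_dict[key])      # the lookup creates the entry: []
first_part_dict[key].append(i)   # add i to the list stored under key
print(dict(first_part_dict))     # {'adx06': [1]}
```

With a plain dict you would have to create the empty list yourself first, for example with first_part_dict.setdefault(key, []).append(i).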
I'll put up a walkthrough to illustrate:

    # First nine items of the list only; regex, cutoff_index and the two
    # dictionaries are defined as in the question.
    mylist = ['probes', 'gene.symbol', 'gene.title', 'go1', 'go2', 'go3',
              'adx_kd_06.ip', 'adx_kd_24.ip', 'adx_lg_06.ip']

    for i, string in enumerate(mylist):
        matched = regex.match(string)
        if not matched:
            continue
        rg1, rg2, rg3, rg4 = matched.groups()
        key = rg1 + rg3
        if i < cutoff_index:
            first_part_dict[key].append(i)
        else:
            second_part_dict[key].append(i)

When we pass through the loop the first time, i = 0 and string = 'probes'. Since probes doesn't match the regex, the loop skips to the next item through continue.
This time, i = 1 and string = 'gene.symbol'. Once again, the string doesn't match the regex, so we skip to the next item. This goes on until the 7th item: adx_kd_06.ip. Here, we have i = 6 and string = 'adx_kd_06.ip', which matches the regex.
From that, rg1 = adx, rg2 = kd, rg3 = 06 and rg4 = ip. key becomes adx06 (rg1 + rg3), and since i is below cutoff_index, first_part_dict[key].append(i) gets executed.
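You can check the groups and the resulting key yourself with a small sketch like this one. It only uses the regex from the question and the standard re module; the variable no_match is a name I made up for the demonstration.

```python
import re

regex = re.compile(r'([\w\d]+)_(\w\w)_(\d\d)\.(\w\w)')

# A string that matches: the four groups are unpacked in order.
matched = regex.match('adx_kd_06.ip')
rg1, rg2, rg3, rg4 = matched.groups()
key = rg1 + rg3  # plain string concatenation

print(rg1, rg2, rg3, rg4)  # adx kd 06 ip
print(key)                 # adx06

# A string that does not match: match() returns None, so the loop hits continue.
no_match = regex.match('probes')
print(no_match)            # None
```

Note that key is a string built from two pieces of the name, not a number.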
This will create the key adx06 in the dictionary first_part_dict and append 6 to its value list. Right now, we have the dict as {'adx06': [6]}. The loop continues on to the next item.
This time, we have i = 7 and string = 'adx_kd_24.ip'. It matches the regex and, a couple of lines later, we have first_part_dict[key].append(i) executing.
This will create the key adx24 in the dictionary first_part_dict and append 7 to its value list. Right now, we have the dict as {'adx06': [6], 'adx24': [7]}. The loop continues on to the next item.
This time, we have i = 8 and string = 'adx_lg_06.ip'. It matches the regex and, a couple of lines later, we have first_part_dict[key].append(i) executing again.
This will create the key adx06 in the dictionary... wait! The key already exists, so instead it will append 8 to the existing value list. Right now, we have the dict as {'adx06': [6, 8], 'adx24': [7]}.
This goes on and on until all the items in the list have been treated.
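If you want to watch this happen, here is a runnable sketch of the whole loop. To keep the output short it uses a shortened list: the nine items from the walkthrough plus a three-item second part that I added for illustration, so the indices in second_part_dict are not those of the full list in the question.

```python
import collections
import re

regex = re.compile(r'([\w\d]+)_(\w\w)_(\d\d)\.(\w\w)')

# Shortened list (an assumption for this demo): first part as in the
# walkthrough, then a tiny second part starting at the second 'probes'.
mylist = ['probes', 'gene.symbol', 'gene.title', 'go1', 'go2', 'go3',
          'adx_kd_06.ip', 'adx_kd_24.ip', 'adx_lg_06.ip',
          'probes', 'gene.symbol', 'adx_kd_06.ip']

first_part_dict = collections.defaultdict(list)
second_part_dict = collections.defaultdict(list)

# Search for 'probes' starting at index 1, i.e. skip the first occurrence.
cutoff_index = mylist.index('probes', 1)

for i, string in enumerate(mylist):
    matched = regex.match(string)
    if not matched:
        continue  # headers such as 'probes' or 'go1' are skipped
    rg1, rg2, rg3, rg4 = matched.groups()
    key = rg1 + rg3
    if i < cutoff_index:
        first_part_dict[key].append(i)
    else:
        second_part_dict[key].append(i)
    # Show the state of both dictionaries after each stored index.
    print(i, string, dict(first_part_dict), dict(second_part_dict))
```

At the end, first_part_dict holds {'adx06': [6, 8], 'adx24': [7]}, exactly as in the walkthrough, and second_part_dict holds {'adx06': [11]} for the one matching item after the second 'probes'.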