python - Read entire group in an HDF5 file using a pandas.HDFStore
I have an HDF5 file that looks like this:
>>> dataset.store
... <class 'pandas.io.pytables.HDFStore'>
... File path: ../data/data_experiments_01-02-03.h5
... /exp01/user01    frame_table  (typ->appendable,nrows->221,ncols->124,indexers->[index])
... /exp01/user02    frame_table  (typ->appendable,nrows->163,ncols->124,indexers->[index])
... /exp01/user03    frame_table  (typ->appendable,nrows->145,ncols->124,indexers->[index])
... /exp02/user01    frame_table  (typ->appendable,nrows->194,ncols->124,indexers->[index])
... /exp02/user02    frame_table  (typ->appendable,nrows->145,ncols->124,indexers->[index])
... /exp03/user03    frame_table  (typ->appendable,nrows->348,ncols->124,indexers->[index])
... /exp03/user01    frame_table  (typ->appendable,nrows->240,ncols->124,indexers->[index])
From it, I want to retrieve all the users (userXY) of one of the experiments (exp0Z) and append them into a single big DataFrame. I have tried store.get('exp03'), obtaining the following error:
>>> store.get('exp03')
...
... ---------------------------------------------------------------------------
... TypeError                                 Traceback (most recent call last)
... <ipython-input-109-0a2e29e9e0a4> in <module>()
... ----> 1 dataset.store.get('/exp03')
...
... /Library/Python/2.7/site-packages/pandas/io/pytables.pyc in get(self, key)
...     613         if group is None:
...     614             raise KeyError('No object named %s in the file' % key)
... --> 615         return self._read_group(group)
...     616
...     617     def select(self, key, where=None, start=None, stop=None, columns=None,
...
... /Library/Python/2.7/site-packages/pandas/io/pytables.pyc in _read_group(self, group, **kwargs)
...    1277
...    1278     def _read_group(self, group, **kwargs):
... -> 1279         s = self._create_storer(group)
...    1280         s.infer_axes()
...    1281         return s.read(**kwargs)
...
... /Library/Python/2.7/site-packages/pandas/io/pytables.pyc in _create_storer(self, group, format, value, append, **kwargs)
...    1160                 else:
...    1161                     raise TypeError(
... -> 1162                         "cannot create a storer if the object is not existing "
...    1163                         "nor a value are passed")
...    1164             else:
...
... TypeError: cannot create a storer if the object is not existing nor a value are passed
I can retrieve a single user by calling store.get('exp03/user01'), and I guess it is possible to iterate over store.keys() and append the retrieved DataFrames manually, but I wonder whether it is possible to do this in a single call to store.get() or some other similar method.
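For reference, the manual approach mentioned above (iterating over store.keys() and concatenating the results) could look roughly like the sketch below. The helper name read_group_by_keys, the demo file name, and the column names "a" and "b" are made up for illustration; only HDFStore.keys(), HDFStore.select(), HDFStore.append() and pd.concat() are actual pandas calls. It assumes PyTables is installed and that every key under the group is a table with the same columns.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def read_group_by_keys(store, group):
    """Concatenate every object stored under `group` (hypothetical helper)."""
    prefix = "/" + group.strip("/") + "/"
    # store.keys() returns full paths such as '/exp03/user01'.
    keys = [key for key in store.keys() if key.startswith(prefix)]
    frames = {key[len(prefix):]: store.select(key) for key in keys}
    # The dict keys become an outer index level, so each row keeps its user.
    return pd.concat(frames, names=["user"])


# Small throwaway store that mimics the layout in the question.
path = os.path.join(tempfile.mkdtemp(), "demo_keys.h5")
with pd.HDFStore(path, mode="w") as store:
    store.append("exp03/user01", pd.DataFrame(np.random.randn(4, 2), columns=["a", "b"]))
    store.append("exp03/user03", pd.DataFrame(np.random.randn(3, 2), columns=["a", "b"]))
    store.append("exp02/user01", pd.DataFrame(np.random.randn(5, 2), columns=["a", "b"]))
    combined = read_group_by_keys(store, "exp03")

print(combined)
```

Passing a dict to pd.concat is a design choice here: it keeps track of which user each row came from, which a plain list of frames would lose.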
Edit: note that dataset is a class that contains a pandas.HDFStore.
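This is not implemented, though it would be a nice feature. (FYI, I would not make it the default behavior of .get(...), because that is not explicit enough (should it read all the tables? It would be guessing); it would need an argument to control which sub-tables are read, I suppose.) If you are interested in seeing this implemented, please put it up on GitHub.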
You can use the internal functions to make this pretty easy, though (and you can pass a where to each of the selects):
# assumes: import numpy as np; import pandas as pd; from pandas import DataFrame

In [13]: store = pd.HDFStore('test.h5', mode='w')

In [14]: store.append('df/foo1', DataFrame(np.random.randn(10, 2)))

In [15]: store.append('df/foo2', DataFrame(np.random.randn(10, 2)))

In [16]: pd.concat([store.select(node._v_pathname) for node in store.get_node('df')])
Out[16]:
          0         1
0 -0.495847 -1.449251
1 -0.494721  1.572560
2  1.219985  0.280878
3 -0.419651  1.975562
4 -0.489689 -2.712342
5 -0.022466 -0.238129
6 -1.195269 -0.028390
7 -0.192648  1.220730
8  1.331892  0.950508
9 -0.790354 -0.743006
0 -0.761820  0.847983
1 -0.126829  1.304889
2  0.667949 -1.481652
3  0.030162 -0.111911
4 -0.433762 -0.596412
5 -1.110968  0.411241
6 -0.428930  0.086527
7 -0.866701 -1.286884
8 -0.649420  0.227999
9 -0.100669 -0.205232

[20 rows x 2 columns]

In [17]: store.close()
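The same idea can be wrapped in a small function that also forwards a where clause to each select. This is only a sketch: read_group is a hypothetical helper (not part of pandas), the file name and column names are invented, and it assumes every child of the group is a table-format frame, since where only works on tables.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def read_group(store, group, where=None):
    """Read and concatenate all tables directly under `group`.

    Hypothetical helper, not a pandas function.
    """
    node = store.get_node(group)
    if node is None:
        raise KeyError("no group named %s in the file" % group)
    # Iterating a PyTables group yields its child nodes; _v_pathname is the
    # full key of each child, e.g. '/df/foo1'.
    frames = [store.select(child._v_pathname, where=where) for child in node]
    return pd.concat(frames)


path = os.path.join(tempfile.mkdtemp(), "demo_group.h5")
with pd.HDFStore(path, mode="w") as store:
    store.append("df/foo1", pd.DataFrame(np.random.randn(5, 2), columns=["a", "b"]))
    store.append("df/foo2", pd.DataFrame(np.random.randn(5, 2), columns=["a", "b"]))
    everything = read_group(store, "df")
    # The same where clause is applied to every table in the group.
    first_rows = read_group(store, "df", where="index < 3")

print(everything.shape, first_rows.shape)
```

Note that _v_pathname is a PyTables attribute rather than public pandas API, so this relies on how pandas lays out its groups in the file.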
Keep in mind, though, that if you are doing this, there is little reason to have separate nodes when the data is all the same; it is more efficient to keep it in a single table with a field that indicates the name or id or whatever.
I almost always use different nodes for heterogeneous data (not necessarily different dtypes, but different 'types' of data).
That said, you can organize it however you like!
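A minimal sketch of the single-table layout suggested above, assuming an integer user_id column is an acceptable identifier. The key "exp03", the column names, and the row counts are illustrative only. Declaring user_id in data_columns is what makes it usable in a where clause.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "demo_single_table.h5")
with pd.HDFStore(path, mode="w") as store:
    for user_id, nrows in [(1, 4), (3, 3)]:
        frame = pd.DataFrame(np.random.randn(nrows, 2), columns=["a", "b"])
        # Tag every row with the user it belongs to.
        frame["user_id"] = user_id
        # data_columns makes user_id queryable on disk.
        store.append("exp03", frame, data_columns=["user_id"])

    # One call returns the whole experiment...
    whole_experiment = store.select("exp03")
    # ...and a where clause pulls out a single user.
    one_user = store.select("exp03", where="user_id == 3")

print(whole_experiment.shape, one_user.shape)
```

With this layout the original question goes away: reading a whole experiment is a single select, and per-user access becomes a query instead of a separate key.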