database - Efficient persistence strategy for many-to-many relationship -


tl;dr: Should I use a SQL join table or Redis sets to store large amounts of many-to-many relationships?

I have an in-memory object graph structure where I have a "many-to-many" index represented as a bidirectional mapping between ordered sets:

group_by_user |  user_by_group
--------------+------------------
louis: [1,2]  |   1: [louis]
john: [2,3]   |   2: [john, louis]
              |   3: [john]

The basic operations I need to be able to perform are atomic "insert at" and "delete" operations on the individual sets. I also need to be able to do efficient key lookups (e.g. look up all the groups a user is a member of, or look up all the users who are members of one group). I am looking at a 70/30 read/write use case.
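For reference, the in-memory structure described above might look roughly like the following minimal sketch. The class name `ManyToManyIndex` and its methods are hypothetical, and plain Python lists with a membership check stand in for the ordered sets; it is not thread-safe, so atomicity here would still need a lock.

```python
class ManyToManyIndex:
    """Bidirectional user <-> group mapping; both sides are kept in sync."""

    def __init__(self):
        self.groups_by_user = {}  # user -> ordered list of unique group ids
        self.users_by_group = {}  # group id -> ordered list of unique users

    def insert_at(self, user, group, position):
        """Link user and group, placing the group at `position` in the user's list."""
        groups = self.groups_by_user.setdefault(user, [])
        if group in groups:
            return False  # already linked, keep set semantics
        groups.insert(position, group)
        self.users_by_group.setdefault(group, []).append(user)
        return True

    def delete(self, user, group):
        """Remove the link from both directions."""
        groups = self.groups_by_user.get(user, [])
        if group not in groups:
            return False
        groups.remove(group)
        self.users_by_group[group].remove(user)
        return True


index = ManyToManyIndex()
index.insert_at("john", 2, 0)
index.insert_at("john", 3, 1)
index.insert_at("louis", 1, 0)
index.insert_at("louis", 2, 1)
```

Every write has to touch two containers, which is exactly the bookkeeping a persistence layer would have to preserve.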

My question is: what is my best bet for persisting this kind of data structure? Should I be looking at building my own optimized on-disk storage system? Otherwise, is there a particular database that would excel at storing this kind of structure?

Before you read any further: stop being afraid of joins. This is a classic case for using a genuine relational database such as Postgres.

There are a few reasons for this:

  • This is what a real RDBMS is optimized for
  • The database can take care of your integrity constraints as a matter of course
  • This is what a real RDBMS is optimized for
  • Otherwise you will have to push the "join" logic into your own code
  • This is what a real RDBMS is optimized for
  • Otherwise you will have to deal with integrity concerns in your own code
  • This is what a real RDBMS is optimized for
  • Otherwise you will wind up reinventing database features in your own code
  • This is what a real RDBMS is optimized for

Yes, I am being a little silly, because I'm trying to drive home a point.

I am beating on this drum so hard because this is a classic case that has a readily available, extremely optimized, and profoundly stable tool custom designed for it.
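As a rough illustration of the join-table approach, here is a minimal sketch. It uses Python's built-in `sqlite3` module only so that it runs anywhere; the answer recommends Postgres, where the same DDL idea applies. The table names `app_users`, `app_groups` and `membership` and the helper `try_insert` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE app_users  (name TEXT PRIMARY KEY);
    CREATE TABLE app_groups (id INTEGER PRIMARY KEY);
    -- The join table: one row per (user, group) link.
    CREATE TABLE membership (
        user_name TEXT    NOT NULL REFERENCES app_users(name)  ON DELETE CASCADE,
        group_id  INTEGER NOT NULL REFERENCES app_groups(id)   ON DELETE CASCADE,
        PRIMARY KEY (user_name, group_id)   -- serves "groups of a user" lookups
    );
    -- Second index serves the reverse direction, "users of a group".
    CREATE INDEX membership_by_group ON membership (group_id, user_name);
""")
conn.executemany("INSERT INTO app_users VALUES (?)", [("louis",), ("john",)])
conn.executemany("INSERT INTO app_groups VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO membership VALUES (?, ?)",
    [("louis", 1), ("louis", 2), ("john", 2), ("john", 3)],
)
conn.commit()


def try_insert(user_name, group_id):
    """Return True if the link was stored, False if a constraint rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO membership VALUES (?, ?)", (user_name, group_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, a duplicate link such as `try_insert("louis", 1)` or a link to a nonexistent user such as `try_insert("ghost", 1)` is refused by the database itself, with no checking code in the application.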

When I say you wind up reinventing database features, I mean that you start having to make basic data management decisions in your own code. For example, you have to choose when to actually write the data to disk, when to pull it, how to keep track of the highest-frequency use data and cache it in memory (and how to manage that cache), etc. Making performance assumptions in your code can give your whole codebase cancer early on without you noticing it -- and if those assumptions prove false later, changing them can require a major rewrite.

If you store the data on either end of the many-to-many relationship in one store and the many-to-many map in another store, you have to:

  1. Locate the initial data on one side of the mapping
  2. Extract the key(s)
  3. Query for the key(s) in the many-to-many handler
  4. Receive the response set(s)
  5. Query whatever is relevant from your other storage based on the result
  6. Build your answer for use within your system
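The six steps above can be sketched as follows. The three dictionaries are hypothetical stand-ins for separate stores (for example a document store plus a key-value store), and the `email` and `title` fields are invented for the illustration.

```python
# Hypothetical stand-ins for three separate stores.
user_store = {
    "louis": {"email": "louis@example.com"},
    "john": {"email": "john@example.com"},
}
membership_store = {"louis": [1, 2], "john": [2, 3]}  # the many-to-many map
group_store = {
    1: {"title": "admins"},
    2: {"title": "editors"},
    3: {"title": "guests"},
}


def group_titles_for(email):
    # 1. Locate the initial data on one side of the mapping.
    # 2. Extract the key(s).
    user_keys = [name for name, rec in user_store.items() if rec["email"] == email]
    # 3. Query for the key(s) in the many-to-many handler.
    # 4. Receive the response set(s).
    group_ids = [gid for name in user_keys for gid in membership_store.get(name, [])]
    # 5. Query the other storage based on the result.
    # 6. Build the answer, skipping ids that vanished between steps.
    return [group_store[gid]["title"] for gid in group_ids if gid in group_store]
```

Nothing ties these lookups together: any of the stores can change between steps, so the `if gid in group_store` guard is the application doing integrity work by hand.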

If you structure your data within an RDBMS, your code begins to look more like:

  1. Run a pre-built query, indexed on whatever your search criteria is
  2. Build an answer from the response
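A minimal sketch of that two-step version, again using the standard-library `sqlite3` module as a stand-in for a full RDBMS. The schema, the sample rows, and the names `GROUPS_FOR_EMAIL` and `group_titles_for` are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_users  (name TEXT PRIMARY KEY, email TEXT UNIQUE);
    CREATE TABLE app_groups (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE membership (
        user_name TEXT    NOT NULL REFERENCES app_users(name),
        group_id  INTEGER NOT NULL REFERENCES app_groups(id),
        PRIMARY KEY (user_name, group_id)
    );
    INSERT INTO app_users  VALUES ('louis', 'louis@example.com'),
                                  ('john',  'john@example.com');
    INSERT INTO app_groups VALUES (1, 'admins'), (2, 'editors'), (3, 'guests');
    INSERT INTO membership VALUES ('louis', 1), ('louis', 2),
                                  ('john', 2),  ('john', 3);
""")

# 1. The pre-built query: both joins and the lookup are one statement.
GROUPS_FOR_EMAIL = """
    SELECT g.title
    FROM app_users  AS u
    JOIN membership AS m ON m.user_name = u.name
    JOIN app_groups AS g ON g.id = m.group_id
    WHERE u.email = ?
    ORDER BY g.id
"""


def group_titles_for(email):
    # 2. Build the answer from the response.
    return [row[0] for row in conn.execute(GROUPS_FOR_EMAIL, (email,))]
```

The UNIQUE constraint on `email` and the primary keys give the database indexes to drive the lookup, and the whole read sees one consistent snapshot.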

Joins are a lot less scary than doing this yourself -- especially in a concurrent system where other things may be changing in the course of your ad hoc locate-extract-query-receive-query-build procedure (which can be managed, of course, but why manage it when an RDBMS is already designed to manage it?).
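The atomic "insert at" from the question is one place where letting the database manage concurrency pays off. The sketch below is one possible approach, not something the answer prescribes: it assumes an extra `position` column to record ordering, and wraps the shift and the insert in a single transaction so other readers never see a half-applied change. The function names are hypothetical, and `sqlite3` again stands in for a server database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE membership (
        user_name TEXT    NOT NULL,
        group_id  INTEGER NOT NULL,
        position  INTEGER NOT NULL,  -- assumed column: order within the user's set
        PRIMARY KEY (user_name, group_id)
    )
""")


def insert_at(user_name, group_id, position):
    """Shift later entries and insert the new one as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if anything raises
            conn.execute(
                "UPDATE membership SET position = position + 1 "
                "WHERE user_name = ? AND position >= ?",
                (user_name, position),
            )
            conn.execute(
                "INSERT INTO membership VALUES (?, ?, ?)",
                (user_name, group_id, position),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate link; the shift above was rolled back as well


def groups_of(user_name):
    rows = conn.execute(
        "SELECT group_id FROM membership WHERE user_name = ? ORDER BY position",
        (user_name,),
    )
    return [row[0] for row in rows]
```

If the insert is rejected as a duplicate, the position shift is undone with it, which is the behaviour an ad hoc multi-store procedure would have to reimplement.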

JOIN isn't a slow operation in decent databases. I have some business applications that join 20 tables on large tables (several millions of rows) and the database zips right through them. It is highly optimized for this sort of thing, which is why I use it. Oracle is good at this (but I can't afford it), DB2 is awesome (can't afford that, either), and SQL Server has come a long way (can't afford the good version of that one either!). MySQL, on the other hand, was designed with the key-value store use-case in mind and matured in the "performance above all else" world of web applications -- and so it has some problems with integrity constraints and joins (but it has handled replication very well for a very long time). So not all RDBMSes are created equal, but without knowing anything else about your problem, they are the kind of datastore that will serve you best.

Even slightly non-trivial data can make your code explode in complexity -- hence the popularity of database systems. They aren't (supposed to be) religions; they are tools that let you separate a generic data-handling task from your own program's logic so you don't have to reinvent the wheel every project (but we tend to anyway).

But...

Q: When would you not want to do this?

A: When you are really building a graph and not a set of many-to-many relations.

There is another type of database designed to handle that case. You need to keep in mind, though, what your actual requirements are. Is this data ephemeral? Does it have to be correct? Do you care if you lose it? Does it need to be replicated? Etc. Most of the time the requirements are relatively trivial and the answer is "no" to these sort of higher-flying questions -- but if you have some special operational needs, you may need to take them into account when making your architectural decision.

If you are storing things that are documents (instead of structured records) on the one hand, and need to track a graph of relationships among them on the other, then a combination of back-ends may be a good idea. A document database + a graphing database glued together by some custom code could be the right thing.

Think hard about which kind of situation you are actually facing instead of assuming you have case X because it is what you are familiar with.

