June 25, 2015, at 10:46 AM by-
Changed lines 24-25 from:
There are several directions that we have to take into account:
* An auto-discovery feature would be interesting, but will increase the complexity of OpenSIPS: we do not want that, rather we'd use a database to provision all nodes
* We need to find a mechanism for other nodes to find out which nodes are down and which are up: for that we could use a keep-alive system, along with some MI commands
* We need to find a mechanism to indicate which node should handle certain resources. One idea is to use partition-like sets.
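The directions above can be illustrated with a minimal sketch, assuming the nodes are provisioned up front (as they would be from a database table) and that each node records the time of the last keep-alive it received from every peer. All names here (`Node`, `ClusterView`, `set_id`, the 5-second timeout) are hypothetical and do not correspond to any existing OpenSIPS API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: int
    address: str            # hypothetical "ip:port" used for inter-node traffic
    set_id: int             # hypothetical partition-like set this node serves
    last_seen: Optional[float] = None  # time of the last keep-alive, if any


class ClusterView:
    """Local view of the provisioned nodes and their up/down state."""

    def __init__(self, nodes, timeout=5.0):
        self.nodes = {n.node_id: n for n in nodes}
        self.timeout = timeout  # seconds without a keep-alive before "down"

    def keepalive(self, node_id, now):
        # Record that a keep-alive from node_id arrived at time `now`.
        self.nodes[node_id].last_seen = now

    def is_up(self, node_id, now):
        # A node is up only if it has been heard from within the timeout.
        last = self.nodes[node_id].last_seen
        return last is not None and now - last <= self.timeout

    def up_nodes_in_set(self, set_id, now):
        # Live members of one partition-like set, in a stable order.
        return sorted(
            n.node_id for n in self.nodes.values()
            if n.set_id == set_id and self.is_up(n.node_id, now)
        )
```

A module that needs to know who can handle a resource would ask for the live members of the relevant set. How the keep-alives are transported, and how the node list is reloaded, is left open here.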
Changed lines 33-265 from:
18:00 ---| bogdan_vs has changed the topic to: OpenSIPS monthly meeting
18:01 < razvanc>| here's the meeting's page: http://www.opensips.org/Community/IRCmeeting20150624
18:02 < razvanc>| so today we'll be talking about Clustering
18:02 < razvanc>| we'll continue the last meeting's topic :)
18:03 < lirakis>| is the general "thought" at least from the opensips project's perspective - to more or less come up with a "cluster" module - that would implement some ... discovery / exchange mechanism?
18:04 <@ bogdan_vs>| first, to be sure everybody is on the same page - let's put in 2 words why we need clustering feature
18:04 <@ bogdan_vs>| in 2.2, we already have 2 features allowing multiple OpenSIPS servers to exchange/share data directly
18:04 <@ bogdan_vs>| and not via DB (SQL or noSQL) as we had so far
18:04 < lirakis>| this is ... binary replication and ?
18:05 <@ bogdan_vs>| in 2.2 we can do distributed dialog profiling and call rating based on BIN
18:05 <@ bogdan_vs>| yes binary replication directly between the OpenSIPS instances
18:06 <@ bogdan_vs>| and during last meeting we touched in the similar way the topic of distributed USRLOC
18:06 <@ bogdan_vs>| and distributing via BIN (versus noSQL) may be an option
18:07 < lirakis>| is binary replication TCP ?
18:07 <@ bogdan_vs>| so we already have 3 functionalities which need to know all the OpenSIPS peers involved in the sharing....basically the cluster
18:07 < liviuc>| of course, we are talking about a TCP-based replication mechanism
18:07 < lirakis>| just want to make sure ;)
18:07 <@ bogdan_vs>| lirakis: right now on UDP only, but liviuc promised to add TCP
18:07 < lirakis>| oh - ok
18:08 < lirakis>| yea so in that case binary replication can not really be trusted to be consistent in any but the most local clusters
18:08 < lirakis>| at any rate - yes we have 3 functionalities which need to know all the opensips peers involved
18:09 <@ bogdan_vs>| indeed
18:09 <@ bogdan_vs>| and we are looking at the option to have a "cluster" module
18:09 < Hydrosine>| So what kind of input are you looking for from us?
18:09 <@ bogdan_vs>| to know the peers, and eventually their state
18:09 < lirakis>| i think you could have a "cluster" module ... as described that overlays these things is a good idea ... in that .. it becomes the discovery layer
18:10 < lirakis>| and then other modules that do some type of distributed state can hook into the cluster module to get the list of proxies
18:10 < lirakis>| so the discovery is decoupled from the state exchange
18:10 <@ bogdan_vs>| Hydrosine: 2 things - 1) if it makes sense and 2) if yes, what other things it can be used for
18:10 <@ bogdan_vs>| (aside dialog, ratelimit, usrloc)
18:11 < lirakis>| i think that it COULD be used also as a state exchange piece too ... like it has some generic "state" object that is more or less a hash, which it could propagate to all known nodes
18:11 < Hydrosine>| LoadBalancing, Sharing the load of destinations with other load balancers
18:11 <@ bogdan_vs>| Hydrosine> indeed
18:12 < liviuc>| @lirakis - doesn't that sound like a heartbeat mechanism? or did I misunderstood completely?
18:12 < lirakis>| no not a heart beat at all
18:12 < liviuc>| s/ood/and/
18:12 < lirakis>| so im basically talking about some thing similar to how cassandra does discovery and "gossip"
18:13 < lirakis>| each "node" has an ip of at least 1 other node in the cluster, and when it comes up it queries that node for the "State" of the cluster
18:13 < carrar>| bogdan_vs: Could this clustering work in an anycast type setup, where all the public SIP IPs are the same, and then a unique IP on each server for the communications between the servers?
18:13 < lirakis>| this state includes information about other nodes in the ring, as well as what state they have
18:14 < lirakis>| the "other state" they have could be all sorts of things ... number of dialogs, pike events .. who knows
18:14 < lirakis>| its an opaque object
18:14 < Hydrosine>| addon on lirakis, Galera cluster also works with this principle. you have to know only one node in the cluster to get the information about all of them.
18:16 <@ bogdan_vs>| carrar: what you say touches a different aspect - how to access the cluster from outside
18:17 <@ bogdan_vs>| right now, we more look into building the cluster itself...in terms of inter node communication
18:17 < carrar>| bogdan_vs: sip connections to customers would use the anycast IP
18:17 < carrar>| which is the same across all servers
18:17 <@ bogdan_vs>| or some DNS balancing
18:17 < carrar>| they get routed to the closest server
18:17 < lirakis>| carrar: we arent talking about load balancing requests
18:17 < carrar>| nor am I
18:18 < lirakis>| we are talking about node discovery, and data exchange between nodes
18:18 < liviuc>| also possibly smooth and transparent addition of extra nodes
18:19 < lirakis>| right - that would happen with the periodic "gossip" between nodes
18:19 < lirakis>| https://wiki.apache.org/cassandra/ArchitectureGossip
18:20 < Hydrosine>| will every node know about every transaction happening within the cluster? because most modules are built upon transaction information or dialog info (ie Load_balancer). So if all opensipses knew about all dialogs, they also know the load?
18:20 < lirakis>| so .. this is some thing that i think does not belong in the "cluster" module
18:20 < lirakis>| realtime data
18:20 < lirakis>| should be replicated some other way
18:20 < lirakis>| IMO
18:20 <@ bogdan_vs>| <Hydrosine> : sharing dialog info may be too much
18:20 <@ bogdan_vs>| the idea is to replicate only relevant data
18:21 <@ bogdan_vs>| like for shared dialog profiles, we share the value of local profile counters
18:21 < lirakis>| node down ... or too many dialogs ... etc. more "state" information
18:21 <@ bogdan_vs>| no need to share the whole dialog info
18:21 < Hydrosine>| ok
18:21 < jarrod>| hmm
18:23 < lirakis>| in that sense - you can do some thing like indicate which "nat ping flag" a proxy is responsible for, and if that proxy goes down, another node could take over ownership of the flag and start pinging
18:23 < lirakis>| i mean there are complexities about race/glare conditions for changing that ownership state
18:24 < lirakis>| but .. the framework would be there
18:25 < Hydrosine>| i don't think a clustering module is needed. we have all the events to work with the 'relevant data', we can push those events wherever we want, mysql,couchbase,webapi's. Why develop a cluster module?
18:25 < Hydrosine>| but some modules do need to share some more information, but i think not on the opensips lvl
18:25 < lirakis>| Hydrosine: how do other proxies know about each other?
18:25 < lirakis>| how do they exchange the data full mesh?
18:25 < lirakis>| the idea is to exchange information WITHOUT using a DB backend
18:27 < Hydrosine>| for some modules they need to know each other yes, the nat pinging. or uac_registrant!!
18:28 < lirakis>| yeah i agree not every module needs it - but just trying to figure out if there is a general framework for "cluster" that would be useful to hook into
18:28 < jarrod>| i guess im not understanding the type of information that would be relevant for the cluster module
18:28 < Hydrosine>| ^
18:28 < razvanc>| following the distributed usrloc problem last time
18:28 < jarrod>| now THAT i need/want
18:28 < lirakis>| jarrod: not sure if you recall the nat pinging issue that was discussed.
18:29 < lirakis>| proxy X is responsible for pinging clients with bflag foo set
18:29 < lirakis>| proxy X goes down
18:29 < lirakis>| how does proxy Y know to take over pinging clients with bflag foo set
18:29 < jarrod>| well
18:29 < jarrod>| thats a good example
18:29 < lirakis>| this could all easily be done in a distributed key store (cassandra, mongo, redis) etc
18:29 < razvanc>| exactly, I think the idea is that some modules need to know the entire topology to be able to take decisions
18:30 < lirakis>| just trying to figure out if there are other use cases too
18:30 < jarrod>| so instead of building it into the individual modules, create a more general module for them to exchange information
18:30 < lirakis>| and .. if there is some generalized mechanism that would be useful
18:30 < lirakis>| jarrod: yea thats kinda of the idea
18:30 < razvanc>| lirakis: I think there is
18:30 < razvanc>| imagine the dialog replication
18:30 < lirakis>| jarrod: node discovery, and basic state exchange
18:30 < Hydrosine>| uac_registrant is the same. share records among a cluster. but if one goes down the registers that node served have to be taken over.
18:31 < razvanc>| let's say you have a platform with 3 nodes, but you only want to replicate the dialogs to a single more instance
18:31 < jarrod>| yea, i use elasticsearch in this way
18:32 < razvanc>| I mean I see this useful for many scenarios where you want to group two or more instances to do the same thing
18:33 < jarrod>| a clustering module that does discovery with state and provides general hooks for exchanging data between X nodes to individual modules
18:33 < jarrod>| that sounds like a great idea
18:33 < Hydrosine>| gtg, i read up tomorrow ;)
18:33 < razvanc>| Hydrosine: sure, thanks for attending
18:34 < razvanc>| I'll publish the logs
18:34 < lirakis>| jarrod: yea - that basically sounds like what im thinking of
18:34 < lirakis>| ALMOST like a cachedb backend
18:34 < lirakis>| with some extra sauce
18:35 < lirakis>| hehe
18:35 < jarrod>| sauce is always good
18:35 < liviuc>| also, the module should be seen as a mere performance optimizer, with the "distributed modules" easily being able to use NoSQL backends as well
18:36 < jarrod>| this may be too specific, but i wonder what happens on network partition
18:36 < jarrod>| i guess they store revisions and are brought up to speed by other nodes?
18:36 < lirakis>| so .. i am suggesting a cassandra gossip like exchange
18:37 < lirakis>| which automatically recovers from a network partition
18:37 < lirakis>| based on gossip digests, timestamps and built in heartbeat sequence
18:38 < lirakis>| got to drop for 5 min. back in a few
18:39 < jarrod>| yea, i guess so many database/key stores support great clustering, replicating, now even write anywhere environments, and this is just going to get better and better
18:40 < jarrod>| im always leery of reinventing something
18:40 < razvanc>| indeed, there are some mechanisms that solve this problem
18:40 < liviuc>| ^
18:40 < razvanc>| I'm going to look into some of them to find the best solution
18:41 < razvanc>| but for start, we were thinking to specify all nodes in a database
18:41 < jarrod>| cassandra, while it does some things pretty well, is just a heavy layer
18:42 < razvanc>| indeed, the next step would be to make the nodes auto-discoverable
18:42 < jarrod>| yea, discovery could be added later... i do like how C* has the concept of datacenters / racks
18:42 < jarrod>| for more geodistributed environments
18:42 < razvanc>| C*?
18:42 < jarrod>| C* == cassandra
18:42 < jarrod>| its just such a bulky project (and java eeek)
18:43 < razvanc>| oh, yeah, I know
18:43 < razvanc>| :)
18:44 < razvanc>| so, getting back, we were thinking we could specify the nodes in DB
18:45 < razvanc>| each instance queries the table
18:45 < razvanc>| and finds out all the other nodes
18:46 < liviuc>| this way, you must do a rolling restart when adding a new node, right?
18:47 < liviuc>| or MI command - never mind :)
18:47 < jarrod>| or mi
18:47 < jarrod>| yea
18:47 < razvanc>| yes, but again, that's the initial solution
18:47 < razvanc>| the next step, the servers could communicate between them
18:47 < lirakis>| ok back
18:48 < razvanc>| and using a heartbeat mechanism disable the servers that are down
18:48 < razvanc>| because let's be honest, servers do not pop up that often :)
18:48 < razvanc>| when a new server appears, you can do a rolling mi command
18:48 < razvanc>| :)
18:49 < liviuc>| the initial list _has_ to be file-system persistent - whether if it's a DB (like in MongoDB) or config files (like Percona Cluster)
18:49 < liviuc>| actually Mongo uses files+own collections - my bad
18:49 < lirakis>| so ... the idea is to put all the nodes in a DB?
18:50 < lirakis>| so we are still reliant on some distributed data store
18:50 < jarrod>| well, i think initially it would be each node each in their local db?
18:50 < razvanc>| lirakis: yes, for the beginning
18:51 < jarrod>| oh
18:51 < jarrod>| hmm
18:51 < lirakis>| yeah i mean the whole idea of a cluster module (for me at least) would be to not have to rely on, or set up a heavy weight distributed data store
18:51 < lirakis>| ESPECIALLY for some thing that is "simple" like node discovery
18:52 < liviuc>| @lirakis: I like your p2p-oriented thinking :)
18:52 < razvanc>| lirakis: I agree
18:52 < lirakis>| i think for node discovery the gossip style thing on startup is really light weight, and not complicated
18:52 < liviuc>| so, lirakis is suggesting simply having to specify _1_ neighbour node on each OpenSIPS cluster node
18:53 < liviuc>| correct me if I'm wrong
18:53 < lirakis>| that is correct
18:53 < razvanc>| node discovery is just one issue
18:53 < lirakis>| right i understand
18:53 < lirakis>| but im saying .. that the node discovery part is not reinventing the wheel
18:53 < lirakis>| unlike distributing some shared state
18:53 < lirakis>| that IS reinventing the wheel
18:54 < liviuc>| heartbeats and split-brain problems are ... :(
18:54 < razvanc>| and I think we have it covered by 2 means: either DB provisioning (not very nice) or auto-learning (nice but more complex)
18:54 < razvanc>| I am now trying to address other issues
18:54 < lirakis>| ok
18:55 < liviuc>| maybe we should assess the performance gain of localizing the distributed data storage
18:55 < lirakis>| well ... i think if we are going to internalize/localize the distributed data element within opensips, it HAS to be light weight
18:55 < lirakis>| if its not, then there is no point - just go setup NOSQLDB
18:56 < lirakis>| and i know its a complicated problem
18:56 < liviuc>| lightweight/heavyweight has nothing to do with that
18:56 < lirakis>| so ... is there really a way to do it clean
18:56 < liviuc>| it's just the matter of dealing with all the associated issues - exactly
18:57 < lirakis>| to quote bogdan_vs, i dont want to make opensips a "Frankenstein" of a proxy + some weird distributed data store project
18:57 < liviuc>| ^
18:57 < jarrod>| ^
18:58 < lirakis>| so ... im not certain if we can get some type of modular distributed data store that .... isnt going to make it such a thing
18:58 < lirakis>| jarrod: back to magic database ;)
18:58 < jarrod>| thats my thinking... the databases that exist kinda accomplish this and more and more are going to come out
18:58 < jarrod>| and features are going to be added... the world is going distributed
18:59 < lirakis>| so ... are we just going to say ... its not worth it, and wait for a light weight distributed key store to be developed?
18:59 < jarrod>| i can see the need though, for communicating internal datastructures and information between proxies
18:59 < jarrod>| but similar to the binary replication?
18:59 < liviuc>| @lirakis: all it needs is to support async ops, and we're golden
19:00 < lirakis>| i can too ... im just not sure if the whole CAP theorem thing is not a simple problem to solve
19:00 < lirakis>| arg... magic database you torment me!
19:01 *| lirakis had a dream about writing a light weight distributed key value store thus deemed "magic database"
19:01 < razvanc>| :)
19:01 *| jarrod volunteered to help
19:01 < razvanc>| ok guys
19:01 < razvanc>| let's wrap up
19:01 *| jarrod waves his wand
19:01 *| liviuc hops on his unicorn
19:01 < lirakis>| lol
19:02 < lirakis>| so ... conclusions - razvanc ?
19:03 < razvanc>| so I think we still need this module for a couple of reasons
19:03 < jarrod>| was the thought about distributed usrloc postponed hoping to be accomplished by a cluster module?
19:03 < lirakis>| i think they are complementary
19:03 < razvanc>| jarrod: no, not at all
19:04 < razvanc>| lirakis: not really
19:04 < razvanc>| I mean if everybody pings everybody, then yes, they are complementary
19:04 < razvanc>| but if you want to assign specific clients to specific registrars
19:05 < razvanc>| then you need a mechanism to know the topology
19:05 < lirakis>| if you use an edge ... and the registrars do the pinging then you are good
19:05 < razvanc>| and basically that's what I was trying to find out today
19:05 < razvanc>| even if it uses an edge
19:05 < razvanc>| what happens if one registrar goes down?
19:06 < lirakis>| right - so each registrar uses a different natbflag ... and says in its state what edge its behind, and what bflag its using
19:06 < lirakis>| so if it goes down - and is detected via this gossip mechanism
19:06 < razvanc>| yup, that's the idea
19:06 < razvanc>| I mean even if it is not a gossip mechanism
19:06 < lirakis>| a registrar behind the same edge will take over that natbflag as well as its own
19:06 < liviuc>| @razvanc: can't they make use of the atomicity and availability of the NoSQL cluster to publish / retrieve state information?
19:06 < razvanc>| there's no way to control that now
19:06 < lirakis>| sure this is all "Theoretical"
19:07 < jarrod>| lunch, brb
19:07 < razvanc>| liviuc yes, it could
19:07 < razvanc>| but that's not there now
19:07 < razvanc>| and even if you have to take that info from SQL, NoSQL or gossiping, this mechanism has to be implemented
19:08 *| liviuc writes down a new task for Vlad
19:08 < lirakis>| heh
19:08 < razvanc>| :)
19:10 < razvanc>| ok guys, thank you for attending this meeting
19:10 < razvanc>| I will write down the conclusions for today
19:10 < razvanc>| and keep you updated
19:11 < lirakis>| great
19:11 < lirakis>| thanks again for having the meeting!
19:11 < liviuc>| thank you for attending and sharing valuable ideas :)
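The gossip-style discovery proposed in the log (each node knows at least one other node, asks it for the cluster state, and carries an opaque state object of its own) can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the exchange is an in-process function call instead of a network message, and each entry carries a plain version counter in place of Cassandra's gossip digests and timestamps. `GossipNode`, `merge`, `beat` and `gossip_with` are hypothetical names.

```python
def merge(local, remote):
    """Merge a remote view into a local one.

    Both views map node_id -> (version, state). For each node the entry
    with the higher version wins; on a tie the local entry is kept.
    """
    merged = dict(local)
    for node_id, (version, state) in remote.items():
        if node_id not in merged or version > merged[node_id][0]:
            merged[node_id] = (version, state)
    return merged


class GossipNode:
    def __init__(self, node_id, state=None):
        self.node_id = node_id
        # A node starts out knowing only itself, at version 1.
        self.view = {node_id: (1, state or {})}

    def beat(self, **state):
        # Bump our own version and update our opaque state object.
        version, old = self.view[self.node_id]
        self.view[self.node_id] = (version + 1, {**old, **state})

    def gossip_with(self, peer):
        # One gossip round: both sides end up with the merged view.
        merged = merge(self.view, peer.view)
        self.view = dict(merged)
        peer.view = dict(merged)
```

With three nodes A, B and C, where C is configured with A as its only known neighbour, one round between C and A is enough for C to learn about B as well. Detecting dead nodes and resolving the race conditions mentioned in the log are not covered by this sketch.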
June 23, 2015, at 04:37 PM by-
Changed lines 13-14 from:
When using a distributed platform, with multiple nodes, we often need a method to group several OpenSIPS instances that collaborate for the same purpose in different clusters. An example is the scenario we discussed during the [[IRCmeeting20150527|last public meeting]], where multiple nodes were designated to handle NAT pinging for a set of subscribers, while others for a different set.
When designing such a mechanism, there are a few problems that need to be addressed:
* how does a node discover the other nodes? Is there a static database that everybody queries to find out details about the topology, or is it done completely automatically?
* what happens if a node goes down? Who should handle the remaining work?
* how do the internal modules use the clustering information to provide a uniform platform?
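For the second question, one possible answer is a deterministic takeover rule: if every node sees the same set of live nodes, every node can compute the same new owner without further negotiation. The sketch below assumes integer node ids and a hypothetical mapping from a resource (for example a NAT ping flag) to its provisioned owner; the function names are illustrative, and the rule (the next live node in id order, wrapping around) is just one simple choice.

```python
def takeover_owner(owner, up_nodes):
    """Return the node that should handle a resource owned by `owner`.

    The provisioned owner keeps the resource while it is up. Otherwise
    the next live node in id order takes over, wrapping around to the
    lowest id. Returns None when no node is up at all.
    """
    if owner in up_nodes:
        return owner
    alive = sorted(up_nodes)
    if not alive:
        return None
    for node_id in alive:
        if node_id > owner:
            return node_id
    return alive[0]


def assign_resources(owners, up_nodes):
    # owners: resource name -> provisioned owner node id
    return {res: takeover_owner(owner, up_nodes) for res, owner in owners.items()}
```

The rule only yields a consistent result when all nodes agree on which nodes are up. During a network partition two nodes could both claim the same resource, which is the kind of split-brain issue a real design would have to handle separately.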
Changed line 30 from:
June 23, 2015, at 04:13 PM by-
Added lines 1-26:
!!!!! Community -> [[Community/PublicMeetings | PublicMeetings]] -> 24th of June 2015
* Wednesday, 24th of June 2015, at 15:00 GMT (link [[http://www.timeanddate.com/worldclock/fixedtime.html?msg=OpenSIPS+Public+Meeting&iso=20150624T18&p1=49&ah=1|here]])
* IRC, #opensips channel on [[http://freenode.net|Freenode]]
!!!! IRC Logs
June 23, 2015, at 03:42 PM by-