
Community: IRCmeeting20150624




OpenSIPS Clustering

On a distributed platform with multiple nodes, we often need a way to group OpenSIPS instances that collaborate for the same purpose into separate clusters. One example is the scenario discussed during the last public meeting, where some nodes were designated to handle NAT pinging for one set of subscribers, while other nodes handled a different set.
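As a rough illustration of the grouping idea, the sketch below models nodes that belong to a cluster and hands a failed node's NAT pinging duties to a surviving peer in the same cluster. It is only a sketch: the `Node` class, the `ping_sets` field, and the "lowest node id takes over" rule are assumptions made for the example, not OpenSIPS data structures or behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of one OpenSIPS instance in a cluster."""
    node_id: str
    cluster_id: int
    alive: bool = True
    # Subscriber sets this node is responsible for NAT pinging (assumed names).
    ping_sets: set = field(default_factory=set)


def peers_of(nodes, node):
    """Return the other nodes that share a cluster with `node`."""
    return [n for n in nodes
            if n.cluster_id == node.cluster_id and n.node_id != node.node_id]


def reassign_ping_sets(nodes, failed_id):
    """Mark a node as down and move its ping sets to a live peer.

    The heir is the live peer with the lowest node_id, so every node that
    runs this rule on the same topology picks the same heir.
    Returns the heir, or None if the cluster has no live peer left.
    """
    failed = next(n for n in nodes if n.node_id == failed_id)
    failed.alive = False
    candidates = sorted((p for p in peers_of(nodes, failed) if p.alive),
                        key=lambda n: n.node_id)
    if not candidates:
        return None
    heir = candidates[0]
    heir.ping_sets |= failed.ping_sets
    failed.ping_sets = set()
    return heir


# Two nodes in cluster 1, one node in cluster 2.
topology = [
    Node("a", 1, ping_sets={"set-A"}),
    Node("b", 1, ping_sets={"set-B"}),
    Node("c", 2, ping_sets={"set-C"}),
]
new_owner = reassign_ping_sets(topology, "a")  # "b" now pings set-A and set-B
```

A deterministic rule is used here so that no extra coordination is needed to agree on the new owner; it does not address the race conditions that arise when nodes disagree about who is down.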

When designing such a mechanism, there are a few problems that need to be addressed:


There are several directions that we have to take into account:

IRC Logs

18:00            ---| bogdan_vs has changed the topic to: OpenSIPS monthly meeting
18:01 <     razvanc>| here's the meeting's page:
18:02 <     razvanc>| so today we'll be talking about Clustering
18:02 <     razvanc>| we'll continue the last meeting's topic :)
18:03 <     lirakis>| is the general "thought" at least from the opensips projects perspective - to more or less come up with a "cluster" module - that would implement some ... discovery / exchange mechanism?
18:04 <@  bogdan_vs>| first, to be sure everybody is on the same page - let's put in 2 words why we need clustering feature
18:04 <@  bogdan_vs>| in 2.2, we already have 2 features allowing multiple OpenSIPS servers to exchange/share data directly
18:04 <@  bogdan_vs>| and not via DB (SQL or noSQL) as we had so far
18:04 <     lirakis>| this is ... binary replication and ?
18:05 <@  bogdan_vs>| in 2.2 we can do distributed dialog profiling and call rating based on BIN
18:05 <@  bogdan_vs>| yes binary replication directly between the OpenSIPS instances
18:06 <@  bogdan_vs>| and during last meeting we touched in the similar way the topic of distributed USRLOC
18:06 <@  bogdan_vs>| and distributing via BIN (versus noSQL) may be an option
18:07 <     lirakis>| is binary replication TCP ?
18:07 <@  bogdan_vs>| so we already have 3 functionalities which need to know all the OpenSIPS peers involved in the sharing....basically the cluster
18:07 <      liviuc>| of course, we are talking about a TCP-based replication mechanism
18:07 <     lirakis>| just want to make sure ;)
18:07 <@  bogdan_vs>| lirakis: right now on UDP only, but liviuc promised to add TCP
18:07 <     lirakis>| oh - ok
18:08 <     lirakis>| yea so in that case binary replication can not really be trusted to be consistent in any but the most local clusters
18:08 <     lirakis>| at any rate - yes we have 3 functionalities which need to know all the opensips peers involved
18:09 <@  bogdan_vs>| indeed
18:09 <@  bogdan_vs>| and we are looking at the option to have a "cluster" module
18:09 <   Hydrosine>| So what kind of input are you looking for from us?
18:09 <@  bogdan_vs>| to know the peers, and eventually their state
18:09 <     lirakis>| i think you could have a "cluster" module ... as described that overlays these things is a good idea ... in that .. it becomes the discovery layer
18:10 <     lirakis>| and then other modules that do some type of distributed state can hook into the cluster module to get the list of proxies
18:10 <     lirakis>| so the discovery is decoupled from the state exchange
18:10 <@  bogdan_vs>| Hydrosine: 2 things - 1) if it make sense and 2) if yes, what other things it can be used for
18:10 <@  bogdan_vs>| (aside dialog, ratelimit, usrloc)
18:11 <     lirakis>| i think that it COULD be used also as a state exchange piece too ... like it has some generic "state" object that is more or less a hash, which it could propagate to all known nodes
18:11 <   Hydrosine>| LoadBalancing, Sharing the load of destinations with other load balancers
18:11 <@  bogdan_vs>| Hydrosine> indeed
18:12 <      liviuc>| @lirakis - doesn't that sound like a heartbeat mechanism? or did I misunderstood completely?
18:12 <     lirakis>| no not a heart beat at all
18:12 <      liviuc>| s/ood/and/
18:12 <     lirakis>| so im basically talking about some thing similar to how cassandra does discovery and "gossip"
18:13 <     lirakis>| each "node" has an ip of at least 1 other node in the cluster, and when it comes up it queries that node for the "State" of the cluster
18:13 <      carrar>| bogdan_vs: Could this clustering work in an anycast type setup, where all the public SIP IPs are the same, and then a unique IP on each server for the communications between the servers?
18:13 <     lirakis>| this state includes information about other nodes in the ring, as well as what state they have
18:14 <     lirakis>| the "other state" they have could be all sorts of things ... number of dialogs, pike events .. who knows
18:14 <     lirakis>| its an opaque object
18:14 <   Hydrosine>| adding on to lirakis, Galera cluster also works on this principle. you have to know only one node in the cluster to get the information about all of them.
18:16 <@  bogdan_vs>| carrar: what you say touches a different aspect - how to access the cluster from outside
18:17 <@  bogdan_vs>| right now, we are looking more into building the cluster in terms of inter node communication
18:17 <      carrar>| bogdan_vs: sip connections to customers would use the anycast IP
18:17 <      carrar>| which is the same across all servers
18:17 <@  bogdan_vs>| or some DNS balancing
18:17 <      carrar>| they get routed to the closest server
18:17 <     lirakis>| carrar: we arent talking about load balancing requests
18:17 <      carrar>| nor am I
18:18 <     lirakis>| we are talking about node discovery, and data exchange between nodes
18:18 <      liviuc>| also possibly smooth and transparent addition of extra nodes
18:19 <     lirakis>| right - that would happen with the periodic "gossip" between nodes
18:19 <     lirakis>|
18:20 <   Hydrosine>| will every node know about every transaction happening within the cluster? because most modules are built upon transaction information or dialog info (ie Load_balancer). So if all opensipses knew about all dialogs, they also know the load?
18:20 <     lirakis>| so .. this is some thing that i think does not belong in the "cluster" module
18:20 <     lirakis>| realtime data
18:20 <     lirakis>| should be replicated some other way
18:20 <     lirakis>| IMO
18:20 <@  bogdan_vs>| <Hydrosine> : sharing dialog info may be too much
18:20 <@  bogdan_vs>| the idea is to replicate only relevant data
18:21 <@  bogdan_vs>| like for shared dialog profiles, we share the value of local profile counters
18:21 <     lirakis>| node down ... or too many dialogs ... etc. more "state" information
18:21 <@  bogdan_vs>| no need to share the whole dialog info
18:21 <   Hydrosine>| ok
18:21 <      jarrod>| hmm
18:23 <     lirakis>| in that sense - you can do some thing like indicate which "nat ping flag" a proxy is responsible for, and if that proxy goes down, another node could take over ownership of the flag and start pinging
18:23 <     lirakis>| i mean there are complexities about race/glare conditions for changing that ownership state
18:24 <     lirakis>| but .. the framework would be there
18:25 <   Hydrosine>| i don't think a clustering module is needed. we have all the events to work with the 'relevant data', we can put those events wherever we want, mysql,couchbase,webapi's. Why develop a cluster module?
18:25 <   Hydrosine>| but some modules do need to share some more information, but i think not on the opensips lvl
18:25 <     lirakis>| Hydrosine: how do other proxies know about each other?
18:25 <     lirakis>| how do they exchange the data full mesh?
18:25 <     lirakis>| the idea is to exchange information WITHOUT using a DB backend
18:27 <   Hydrosine>| for some modules they need to know each other yes, the nat pinging. or uac_registrant!!
18:28 <     lirakis>| yeah i agree not every module needs it - but just trying to figure out if there is a general framework for "cluster" that would be useful to hook into
18:28 <      jarrod>| i guess im not understanding the type of information that would be relevant for the cluster module
18:28 <   Hydrosine>| ^
18:28 <     razvanc>| following the distributed usrloc problem last time
18:28 <      jarrod>| now THAT i need/want
18:28 <     lirakis>| jarrod: not sure if you recall the nat pinging issue that was discussed.
18:29 <     lirakis>| proxy X is responsible for pinging clients with bflag foo set
18:29 <     lirakis>| proxy X goes down
18:29 <     lirakis>| how does proxy Y know to take over pinging clients with bflag foo set
18:29 <      jarrod>| well
18:29 <      jarrod>| thats a good example
18:29 <     lirakis>| this could all easily be done in a distributed key store (cassandra, mongo, redis) etc
18:29 <     razvanc>| exactly, I think the idea is that some modules need to know the entire topology to be able to take decisions
18:30 <     lirakis>| just trying to figure out if there are other use cases too
18:30 <      jarrod>| so instead of building it into the individual modules, create a more general module for them to exchange information
18:30 <     lirakis>| and .. if there is some generalized mechanism that would be useful
18:30 <     lirakis>| jarrod: yea thats kinda of the idea
18:30 <     razvanc>| lirakis: I think there is
18:30 <     razvanc>| imagine the dialog replication
18:30 <     lirakis>| jarrod: node discovery, and basic state exchange
18:30 <   Hydrosine>| uac_registrant is the same. share records among a cluster. but if one goes down the registers that node served have to be taken over.
18:31 <     razvanc>| let's say you have a platform with 3 nodes, but you only want to replicate the dialogs to a single more instance
18:31 <      jarrod>| yea, i use elasticsearch in this way
18:32 <     razvanc>| I mean I see this useful for many scenarios where you want to group two or more instances to do the same thing
18:33 <      jarrod>| a clustering module that does discovery with state and provides general hooks for exchanging data between X nodes to individual modules
18:33 <      jarrod>| that sounds like a great idea
18:33 <   Hydrosine>| gtg, i read up tomorrow ;)
18:33 <     razvanc>| Hydrosine: sure, thanks for attending
18:34 <     razvanc>| I'll publish the logs
18:34 <     lirakis>| jarrod: yea - that basically sounds like what im thinking of
18:34 <     lirakis>| ALMOST like a cachedb backend
18:34 <     lirakis>| with some extra sauce
18:35 <     lirakis>| hehe
18:35 <      jarrod>| sauce is always good
18:35 <      liviuc>| also, the module should be seen as a mere performance optimizer, with the "distributed modules" easily being able to use NoSQL backends as well
18:36 <      jarrod>| this may be too specific, but i wonder what happens on network partition
18:36 <      jarrod>| i guess they store revisions and are brought up to speed by other nodes?
18:36 <     lirakis>| so .. i am suggesting a cassandra gossip like exchange
18:37 <     lirakis>| which automatically recovers from a network partition
18:37 <     lirakis>| based on gossip digests, timestamps and built in heartbeat sequence
18:38 <     lirakis>| got to drop for 5 min. back in a few
18:39 <      jarrod>| yea, i guess so many database/key stores support great clustering, replicating, now even write anywhere environments, and this is just going to get better and better
18:40 <      jarrod>| im always leery of reinventing something
18:40 <     razvanc>| indeed, there are some mechanisms that solve this problem
18:40 <      liviuc>| ^
18:40 <     razvanc>| I'm going to look into some of them to find the best solution
18:41 <     razvanc>| but for start, we were thinking to specify all nodes in a database
18:41 <      jarrod>| cassandra, while it does some things pretty well, is just a heavy layer
18:42 <     razvanc>| indeed, the next step would be to make the nodes auto-discoverable
18:42 <      jarrod>| yea, discovery could be added later... i do like how C* has the concept of datacenters / racks
18:42 <      jarrod>| for more geodistributed environments
18:42 <     razvanc>| C*?
18:42 <      jarrod>| C* == cassandra
18:42 <      jarrod>| its just such a bulky project (and java eeek)
18:43 <     razvanc>| oh, yeah, I know
18:43 <     razvanc>| :)
18:44 <     razvanc>| so, getting back, we were thinking we could specify the nodes in DB
18:45 <     razvanc>| each instance queries the table
18:45 <     razvanc>| and finds out all the other nodes
18:46 <      liviuc>| this way, you must do a rolling restart when adding a new node, right?
18:47 <      liviuc>| or MI command - never mind :)
18:47 <      jarrod>| or mi
18:47 <      jarrod>| yea
18:47 <     razvanc>| yes, but again, that's the initial solution
18:47 <     razvanc>| the next step, the servers could communicate between them
18:47 <     lirakis>| ok back
18:48 <     razvanc>| and using a heartbeat mechanism disable the servers that are down
18:48 <     razvanc>| because let's be honest, servers do not pop up that often :)
18:48 <     razvanc>| when a new server appears, you can do a rolling mi command
18:48 <     razvanc>| :)
18:49 <      liviuc>| the initial list _has_ to be file-system persistent - whether if it's a DB (like in MongoDB) or config files (like Percona Cluster)
18:49 <      liviuc>| actually Mongo uses files+own collections - my bad
18:49 <     lirakis>| so ... the idea is to put all the nodes in a DB?
18:50 <     lirakis>| so we are still reliant on some distributed data store
18:50 <      jarrod>| well, i think initially it would be each node each in their local db?
18:50 <     razvanc>| lirakis: yes, for the beginning
18:51 <      jarrod>| oh
18:51 <      jarrod>| hmm
18:51 <     lirakis>| yeah i mean the whole idea of a cluster module (for me at least) would be to not have to rely on, or set up a heavy weight distributed data store
18:51 <     lirakis>| ESPECIALLY for some thing that is "simple" like node discovery
18:52 <      liviuc>| @lirakis: I like your p2p-oriented thinking :)
18:52 <     razvanc>| lirakis: I agree
18:52 <     lirakis>| i think for node discovery the gossip style thing on startup is really light weight, and not complicated
18:52 <      liviuc>| so, lirakis is suggesting simply having to specify _1_ neighbour node on each OpenSIPS cluster node
18:53 <      liviuc>| correct me if I'm wrong
18:53 <     lirakis>| that is correct
18:53 <     razvanc>| node discovery is just one issue
18:53 <     lirakis>| right i understand
18:53 <     lirakis>| but im saying .. that the node discovery part is not reinventing the wheel
18:53 <     lirakis>| unlike distributing some shared state
18:53 <     lirakis>| that IS reinventing the wheel
18:54 <      liviuc>| heartbeats and split-brain problems are ... :(
18:54 <     razvanc>| and I think we have it covered by 2 means: either DB provisioning (not very nice) and auto-learning (nice but more complex)
18:54 <     razvanc>| I am now trying to address other issues
18:54 <     lirakis>| ok
18:55 <      liviuc>| maybe we should assess the performance gain of localizing the distributed data storage
18:55 <     lirakis>| well ... i think if we are going to internalize/localize the distributed data element within opensips, it HAS to be light weight
18:55 <     lirakis>| if its not, then there is no point - just go setup NOSQLDB
18:56 <     lirakis>| and i know its a complicated problem
18:56 <      liviuc>| lightweight/heavyweight has nothing to do with that
18:56 <     lirakis>| so ... is there really a way to do it clean
18:56 <      liviuc>| it's just the matter of dealing with all the associated issues - exactly
18:57 <     lirakis>| to quote bogdan_vs, i dont want to make opensips a "Frankenstein" of a proxy + some weird distributed data store project
18:57 <      liviuc>| ^
18:57 <      jarrod>| ^
18:58 <     lirakis>| so ... im not certain if we can get some type of modular distributed data store that .... isnt going to make it such a thing
18:58 <     lirakis>| jarrod: back to magic database ;)
18:58 <      jarrod>| thats my thinking... the databases that exist kinda accomplish this and more and more are going to come out
18:58 <      jarrod>| and features are going to be added... the world is going distributed
18:59 <     lirakis>| so ... are we just going to say ... its not worth it, and wait for a light weight distributed key store to be developed?
18:59 <      jarrod>| i can see the need though, for communicating internal datastructures and information between proxies
18:59 <      jarrod>| but similar to the binary replication?
18:59 <      liviuc>| @lirakis: all it needs is to support async ops, and we're golden
19:00 <     lirakis>| i can too ... im just not sure if the whole CAP theorem thing is not a simple problem to solve
19:00 <     lirakis>| arg... magic database you torment me!
19:01              *| lirakis had a dream about writing a light weight distributed key value store thus deemed "magic database"
19:01 <     razvanc>| :)
19:01              *| jarrod volunteered to help 
19:01 <     razvanc>| ok guys
19:01 <     razvanc>| let's wrap up
19:01              *| jarrod waves his wand
19:01              *| liviuc hops on his unicorn
19:01 <     lirakis>| lol
19:02 <     lirakis>| so ... conclusions - razvanc ?
19:03 <     razvanc>| so I think we still need this module for a couple of reasons
19:03 <      jarrod>| was the thought about distributed usrloc postponed hoping to be accomplished by a cluster module?
19:03 <     lirakis>| i think they are complementary
19:03 <     razvanc>| jarrod: no, not at all
19:04 <     razvanc>| lirakis: not really
19:04 <     razvanc>| I mean if everybody pings everybody, then yes, they are complementary
19:04 <     razvanc>| but if you want to assign specific clients to specific registrars
19:05 <     razvanc>| then you need a mechanism to know the topology
19:05 <     lirakis>| if you use an edge ... and the registrars do the pinging then you are good
19:05 <     razvanc>| and basically that's what I was trying to find out today
19:05 <     razvanc>| even if it uses an edge
19:05 <     razvanc>| what happens if one registrar goes down?
19:06 <     lirakis>| right - so each registrar uses a different natbflag ... and says in its state what edge its behind, and what bflag its using
19:06 <     lirakis>| so if it goes down - and is detected via this gossip mechanism
19:06 <     razvanc>| yup, that's the idea
19:06 <     razvanc>| I mean even if it is not a gossip mechanism
19:06 <     lirakis>| a registrar behind the same edge will take over that natbflag as well as its own
19:06 <      liviuc>| @razvanc: can't they make use of the atomicity and availability of the NoSQL cluster to publish / retrieve state information?
19:06 <     razvanc>| there's no way to control that now
19:06 <     lirakis>| sure this is all "Theoretical"
19:07 <      jarrod>| lunch, brb
19:07 <     razvanc>| liviuc yes, it could
19:07 <     razvanc>| but that's not there now
19:07 <     razvanc>| and even if you have to take that info from SQL, NoSQL or gossiping, this mechanism has to be implemented
19:08              *| liviuc writes down a new task for Vlad
19:08 <     lirakis>| heh
19:08 <     razvanc>| :)
19:10 <     razvanc>| ok guys, thank you for attending this meeting
19:10 <     razvanc>| I will write down the conclusions for today
19:10 <     razvanc>| and keep you updated
19:11 <     lirakis>| great
19:11 <     lirakis>| thanks again for having the meeting!
19:11 <      liviuc>| thank you for attending and sharing valuable ideas :)
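
The gossip-style discovery proposed in the log (each node starts with the address of at least one other node, asks it for the cluster state, and keeps exchanging that state periodically) can be sketched roughly as follows. This is a simplified illustration, not Cassandra's actual protocol and not an OpenSIPS API: the view layout (`node id -> (heartbeat counter, opaque state)`), the function names, and the "higher heartbeat wins" rule are all assumptions made for the example.

```python
def merge_views(local, remote):
    """Merge two cluster views, keeping the freshest entry per node.

    A view maps node_id -> (heartbeat, state), where `heartbeat` is a
    counter the owning node increments and `state` is an opaque dict.
    """
    merged = dict(local)
    for node_id, (heartbeat, state) in remote.items():
        if node_id not in merged or heartbeat > merged[node_id][0]:
            merged[node_id] = (heartbeat, state)
    return merged


def gossip_round(views, exchanges):
    """Run a sequence of pairwise exchanges; both sides end up with the merge."""
    for a, b in exchanges:
        merged = merge_views(views[a], views[b])
        views[a] = dict(merged)
        views[b] = dict(merged)


# n1 and n2 already know each other; n3 has just started and only knows itself.
views = {
    "n1": {"n1": (3, {"dialogs": 10}), "n2": (5, {"dialogs": 4})},
    "n2": {"n1": (2, {"dialogs": 7}), "n2": (6, {"dialogs": 5})},
    "n3": {"n3": (1, {"dialogs": 0})},
}

# n3 contacts its single seed node n1, then n1 gossips with n2 as usual.
gossip_round(views, [("n3", "n1"), ("n1", "n2")])
# After these two exchanges every node has an entry for n1, n2 and n3.
```

In this sketch a new node becomes known to the whole cluster after a few exchanges without any central list, which is the property discussed above. Failure detection, split-brain handling, and the transport between nodes are deliberately left out.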
Page last modified on June 25, 2015, at 10:46 AM