openSIPS | Community / IRCmeeting20150527

Community -> PublicMeetings -> 27th of May 2015

When

Wednesday, 27th of May 2015, at 15:00 GMT (link here)
IRC, #opensips channel on Freenode

Topics

Distributed User Location

During this meeting we will discuss about implementing a distributed location service. As a starting point, we present two development directions. Some of them are more generic/abstract than others. Some are tailored for large platforms, and some for smaller deployments (faster to implement, also less nodes required).

When building a distributed user location server, there are a couple of issues that have to be taken into account:

Location: what server does a client register to? Is it a "hardcoded" server, is it DNS based, geo-location based?
Failover and high availability: what should a client do in case the "initial" server it registered to is unreachable? Should it detect that and re-register?
Security: who should the client accept traffic from? Any server behind an IP-range, based on domain, reversed-DNS entries?
NAT: in case the client is behind NAT, who does the pinging, who can penetrate the router's pin-hole, how can one server reach the client?
Storage: where do we store the information about which server is "responsible" for a client?
Data: what kind of data should be stored in the storage?
Scalability: how easy can the solution be scaled?

Note: when talking about this topic, although high-availability for 100% of components is possible through the use of virtual IPs, only UDP-based platforms eliminate the need for SIP user agents to re-REGISTER in case of an edge platform node failure. However, for stream-based connections (such as TCP and TLS) other mechanisms exist, such as TCP timeout or keep-alive messages.

Possible directions

Three-layer solution (SBC layer -> Registrar layer -> Shared cache)

Purpose of layers

SBC layer: Handles the NAT, location issues (can distribute REGISTERS among different registrars), security (it's the only one that talks directly to the client) and high-availability (can failover to a different registrar node if one is down) and scalability
Registrar layer: Registers clients, queries the shared cache to see the clients distribution (i.e. where each client is located)
Shared cache: a Cachedb engine (such as MongoDB, Cassandra, Redis) that stores the location (IP and port) of each client

PROs

The client talks to a single entity, so NAT issues are not a problem
All the complex logic (high availability, routing decisions) is handled inside the platform so it is easier to maintain and control
Support for all transport protocols (TLS, TCP, UDP, WS)

CONs

It relies on the SBC, and if it goes down, the entire platform is gone
The solution doesn't scale too well, since the SBC is a single point of entry

Details

This is the more generic solution, since it solves the problem in each use case (TCP/UDP endpoints, public or NAT'ed). The shared cache may be an SQL/NoSQL database or, together with the Registrar layer, it may even be part of an OpenSIPS cluster, where all nodes would share the same location information, in order to provide high availability. Note that in case of UDP-only platforms, this solution can be tuned to offer 100% high-availability by doubling the number of SBC together with virtual IPs. This way, any component of the SIP platform may catch fire, and none of the clients will even have to re-REGISTER!

In order for lookups to be properly routed back through the exact initial SBC (which was used at REGISTER time), each SBC will force routing back through itself with the help of the SIP "Path: " header. This will eliminate all NAT issues, regardless of transport protocol.

If user location data is too big to be fully replicated amongst all registrars, the SBC may incorporate sharding logic, evenly distributing REGISTERs across shards of the OpenSIPS cluster (data is then only replicated within the respective shard's nodes).

Two-layer solution (Registrar layer -> Shared cache)

Purpose of layers

Registrar layer: handles NAT mappings, Registers clients, stores information and queries the shared cache to see the clients distribution (i.e. where each client is located). Load balancing is done by the client (through geo-location or DNS based) as well as failover.
Shared cache: a Cachedb engine (such as MongoDB, Cassandra, Redis) that stores the server (AOR + SIP URI) which knows where the client is

PROs

This solution can scale very easy, by simply adding another registrar server
Can offer support for all transport protocols (TLS, TCP, UDP, WS), but no high availability for them (except UDP using a floating IP)
Can easily benefit of geo-location mecanisms

CONs

If one registrar server goes down, the clients behind stream-based connections (such as TLS, TCP, WS) have to explicitly re-REGISTER to a new server

Details

By storing location information (AOR-registrar mappings) within a shared cache (SQL/NoSQL DB), users are able to dial any of the platform's registrar servers and still initiate calls.

Although the SBC and Registrar layers have been merged, we are still able to implement similar fail-over mechanisms as above (anything but registrar node itself going down if TCP is used). However, the platform's nodes must perform more processing (edge proxy + registrar). There is now a need to consult a distributed cache service in order to fetch the location information when performing a lookup registrar operation.

If UDP is used, high-availability between two nodes can be achieved with the use of virtual IPs, together with either SIP replication, or a common database, or through the recently added binary replication feature.

Conclusions

The system has to be organized on three layers:
- Edge proxy - which handles stream connections (such as TCP, TLS, WS)
- Registrar - core servers that handle registrations
- Sharing system - such as databases or no-SQL storage used to share the registration info among all the registrars
Each of the layers above can be merged together for simplicity, but it is a better idea to have them decoupled
When a client registers to an Edge proxy, it has to add itself into the path. This way, every request going to the client, will pass through the edge proxy that already has a connection with the client.
Pinging is initiated by the registrar and is forwarded through the Edge Proxy
A federated scheme should be used to manage the registrar instances
If one instance is down, the others should be able to "inherit" its contacts
The usrloc module should be able to ping only a specific set of contacts, not all of them
We need to create a method to run custom queries, back-end dependent, when querying a no-SQL database. Since each back-end is different, we might need to use different approaches for each implementation.
Servers should be organized in stacks, where each stack contains a set of edge proxies and a set or registrars
HA mechanisms have to be take into account

Future Questions

How should a federation mechanism work?
Who should inherit the customers when a registrar instance fails?
How do we change the usrloc module to query the no-sql engine regardless the back-end implementation?

Future questions are discussed on the mailing list.

IRC Logs

18:00 <     razvanc>| hi all
18:00            --- | bogdan_vs has changed the topic to: OpenSIPS monthly meeting
18:00 <     razvanc>| and welcome to the new Public meeting discussion :D
18:00 <     lirakis>| :)
18:00 <@  bogdan_vs>| !!!!! meeting in progress !!!!!!
18:00 <      jarrod>| yipee
18:00 <      liviuc>| hello all
18:01 <     lirakis>| so - going off the public meeting notes: http://www.opensips.org/Community/IRCmeeting20150527
18:01 <     lirakis>| there are 2 proposed schemes for distributed userlocation
18:01 <     lirakis>| i would like to propose a 3rd
18:01 <@  bogdan_vs>| well, they are some starting points
18:01 <     lirakis>| sure
18:01 <@  bogdan_vs>| we do not want to limit to
18:01 <     lirakis>| basically following the guidelines of RFC5626 https://tools.ietf.org/html/rfc5626
18:02 <    brettnem>| Hey all!
18:02 <     lirakis>| using and edge proxy that forwards registration requests to a registrar/core processing server
18:02 <@  bogdan_vs>| as is time consuming to present solution -> we did the compile some things in advance
18:02 <@  bogdan_vs>| Hi Brett
18:02 <     lirakis>| diagrams: http://imgur.com/a/5hnmU
18:03 <     lirakis>| This system actually coveres the pros of both the 2 proposed solutions in the notes, while avoiding the cons of the SBC solution
18:03 <@  bogdan_vs>| lirakis: this is option 1 (from our side)
18:04 <     lirakis>| but - its not an SBC, it is an edge proxy (opensips)
18:04 <     lirakis>| and there are multiple edge proxies
18:04 <@  bogdan_vs>| if you do not have any network constrains (thanks to an sbc or edge proxy), is jut a a matter of sharing
18:04 <@  bogdan_vs>| sharing data
18:05 <@  bogdan_vs>| right ?
18:05 <     lirakis>| i would say yes
18:05 <     lirakis>| however
18:05 <     lirakis>| the difference between an SBC and an edge proxy is VERY big
18:06 <     lirakis>| an SBC has no geographic scaling capability while edge proxies can simply be added to expand
18:07 <@  bogdan_vs>| sure, but I do not to get into that....my point is that, from registrar perspective they are more or less the same
18:07 <     lirakis>| more or less
18:07 <     lirakis>| yes
18:07 <@  bogdan_vs>| :)
18:08 <@  bogdan_vs>| the challenge I see when comes to distributing the USRLOC comes from a kind of conflict
18:08 <     lirakis>| conflict between ?
18:08 <@  bogdan_vs>| you talk about "distributed" registrations, but you have to deal with "local" constrains
18:08 <@  bogdan_vs>| like the networks ones
18:08 <     lirakis>| NAT etc.?
18:09 <      jarrod>| the ones he addresses with path headers??
18:09 <@  bogdan_vs>| I can share the registration (contact), but it using that info will be limited by network (TCP, NAT, etc)
18:09 <     lirakis>| not if you use path headers
18:09 <     lirakis>| edge proxy appends a path header and any registrar MUST save with path support
18:09 <@  bogdan_vs>| jarrod: yes, indeed, but the question : do you require to have a SBC/edge all the time between registrar and client ?
18:10 <     lirakis>| yes
18:10 <      jarrod>| absolutely
18:10 <     lirakis>| either that or you make the registrar the edge - but that is simply an architectural decision
18:10 <@  bogdan_vs>| what if you have a cluster arch where the registrar + main proxy faces directly the client ?
18:10 <     lirakis>| the edge can also be the registrar and spiral registration requests
18:10 <     lirakis>| bogdan_vs, ^
18:10 <     lirakis>| this is what we do currently
18:11 <     lirakis>| our "core" proxy servers are also our regisrar
18:11 <     lirakis>| we recieve a REGISTER request, if the src_ip!=myself  append_path() forward(myself);
18:11 <@  bogdan_vs>| yes, but I guess you have the edge proxies, decoupling the registrar info from the net layer
18:11 <     lirakis>| I think that is smart to decouple them yes
18:12 <@  bogdan_vs>| and who is doing the pinging ? the registrar ?
18:12 <     lirakis>| that way edge proxies become the ... edge... handling connections (TCP/WS) and simply forwarding requests
18:12 <     lirakis>| bogdan_vs, yes the registrar sends a ping THROUGH the edge
18:12 <@  bogdan_vs>| (if you have an SBC, you could have it doing)
18:12 <     lirakis>| see the second and third images in http://imgur.com/a/5hnmU
18:13 <     lirakis>| opensips acting as the registrar with NAT helper will simply function as it would if it were the client directly REGISTERing
18:13 <@  bogdan_vs>| yes, that was more a fake question for the next step :)
18:13 <@  bogdan_vs>| and the registrar servers, how do they share the contacts ?
18:13 <     lirakis>| how do you know who is a local user to ping?
18:13 <     lirakis>| bogdan_vs, right - so how do they share contacts .... we currently use a distributed KV store
18:13 <     lirakis>| (cassandra)
18:13 <@  bogdan_vs>| I suppose you have several registrar servers ...
18:14 <     lirakis>| we do
18:14 <     lirakis>| we have ... 8 i think
18:14 <@  bogdan_vs>| and each one pings the contacts they personally received ?
18:14 <     lirakis>| yes
18:14 <@  bogdan_vs>| as all do have access to all contacts (via nosql)
18:14 <     lirakis>| each functions as an independant server, but also is "federated" with all the other servers
18:14 <     lirakis>| bogdan_vs, yes
18:14 <@  bogdan_vs>| and if a registrar goes down....is another one taking over the pinging ?
18:15 <     lirakis>| not currently
18:15 <@  bogdan_vs>| but ideally, it should :)
18:15 <     lirakis>| yes ideally
18:16 <     lirakis>| well - techncally sepeaking our registrar is also our "core processing" server
18:16 <@  bogdan_vs>| so what you have is a "kind" of usrloc using a DB ONLY mode with a distributed DB backend
18:16 <@  bogdan_vs>| (nosql backend, of course)
18:17 <     lirakis>| short answer - yes,  long answer - ... not exactly
18:17 <     lirakis>| but .. for the sake of this discussion - yes
18:17 <      jarrod>| im also accomplishing this simply with postgresql + bdr
18:17 <      jarrod>| (not nosql)
18:17 <      jarrod>| and using the standard usrloc module
18:18 <      jarrod>| i think the point is to enable opensips to rely on backends that support this type of geodistributed write anywhere
18:18 <     lirakis>| there is a minor issue currently where nathelper will try to ping ALL contacts - not just ones that it has a local socket for
18:18 <     lirakis>| from a "what does opensips need to support" to make this work perspective ....
18:19 <      jarrod>| like ive said, thats simple with custom nathelper flag per registrar
18:19 <     lirakis>| right
18:19 <     razvanc>| lirakis: its not that simple to figure out who has to ping the client
18:19 <     razvanc>| :)
18:19 <      jarrod>| but could easily have a better solution
18:19 <@  bogdan_vs>| jarrod: yes, the ugly raw query
18:19 <      jarrod>| bogdan_vs: that works with sql backends...
18:19 <      jarrod>| BUT
18:19 <      jarrod>| for usrloc specifically, there is ONLY ONE QUERY that uses that raw query
18:19 <     razvanc>| I mean if a registrar is down, who should inherit those customers?
18:20 <@  bogdan_vs>| so , with such a solutions there are issues:
18:20 <      jarrod>| for db_cachedb/cachedb_cassandra i was just assuming what it was wanting and handling the query myself
18:20 <    lirakis>| razvanc, it could be anyone so long as they send the ping's to the appropriate edge proxy
18:20 <@  bogdan_vs>| 1) assigning the pinging on a contact
18:20 <    lirakis>| razvanc, the REAL problem is if the edge proxy goes down
18:20 <@  bogdan_vs>| 2) the shared DB backend
18:20 <      jarrod>| lirakis: i actually prefer the client initiating the pings where applicable
18:20 <     lirakis>| jarrod, thats fine, and I also turn on client pings, but you cant rely on clients to do things right
18:20 <@  bogdan_vs>| lirakis: well, you need to have it redundant ?
18:21 <     lirakis>| bogdan_vs, so that gets into rfc5626
18:21 <      jarrod>| lirakis: basically if they use our provisioning server i dont set the natflag
18:21 <     lirakis>| there can be multiple edge's that a client has connections to, or has registered with but then it also needs to support instance id etc
18:21 <      jarrod>| because i make sure to enable it in the provisioned configurations
18:22 <     razvanc>| lirakis: indeed, if the edge proxy goes down, that's a different story
18:22 <     razvanc>| but let's focus on the registrar side
18:22 <     lirakis>| right - and i think that ... while it is important to have HA etc.   is a different discussion
18:22 <@  bogdan_vs>| now, without an intention of breaking the discussion idea.....
18:22 <     lirakis>| as i said - that gets into sip outbound
18:22 <@  bogdan_vs>| what about the second options ?
18:22 <@  bogdan_vs>| (as presented on the web page)
18:23 <@  bogdan_vs>| is anyone see it as something they may use/consider in a real deployment ?
18:23 <     lirakis>| i think the second options is basically the same as my "3rd option"   it simply does not consider seperating the connection managment
18:23 <      jarrod>| isnt the CON stated overcome with simple HA per registrar?
18:23 <@  bogdan_vs>| just to know hoe much time to allocate to each option :)
18:23 <     lirakis>| yeah i actually have never looked into HA stuff with opensips
18:23 <     lirakis>| it might be that you could simply HA the registrar's
18:24 <     lirakis>| but i have never looked into that
18:24 <     lirakis>| I think getting proper shared DB backed stuff working is the core issue at this point
18:24 <      jarrod>| right, thats a different topic, but that seems easy to overcome with each local instance
18:24 <     lirakis>| it doesnt exist
18:24 <      jarrod>| agreed
18:24 <      jarrod>| ^
18:24 <     lirakis>| we have a custom module written that makes cassandra work
18:24 <@  bogdan_vs>| so options 2 is actually condensing the edge proxy and registrar on the same instance ?
18:25 <     lirakis>| bogdan_vs, yes
18:25 <      jarrod>| i thikn for usrloc specifically, it should be able to accomodate any backend (i.e. raw_query)
18:25 <@  bogdan_vs>| so, to address all the possibilities, opensips should have functions for edge and others for distributed usrloc
18:26 <     razvanc>| jarrod: ideally the entire distributed functionality should be wrapped in simple opensips functions
18:26 <      jarrod>| ooh that sounds better
18:26 <@  bogdan_vs>| and the you can combine them as you want getting a pure Edge proxy, a pure distributed registrar or a mixed edge + registrar ??
18:26 <     razvanc>| that's what we are trying to figure out, what these functions should do behind :)
18:26 <      jarrod>| but its simple to write the simple translation functions per backend
18:26 <     lirakis>| bogdan_vs, yes the conceptual components are as you just described
18:26 <     lirakis>| how they are actually deployed is not really relevant
18:26 <     lirakis>| but those are the "elements" of the system
18:27 <@  bogdan_vs>| ok, we are getting somewhere :)
18:27 <     lirakis>| the edge manages connections, keeps NAT open via pings, and forwards requests to a registrar or core processing proxy
18:27 <     lirakis>| all of these may or may not be on the same machine
18:28 <     lirakis>| and may indeed be the same opensips instance
18:28 <@  bogdan_vs>| but I see some needs to improve USRLOC for : better noSQL support, better pinging integration
18:28 <     lirakis>| but they will have logically different segments of processing in the script (which is what we do)
18:28 <     razvanc>| the idea is that having all of them on a single machine, you have more information and thus you can do more stuff
18:28 <     lirakis>| bogdan_vs, i think that would be a very good start
18:29 <@  bogdan_vs>| as you want to be able to use a noSQL backend (not an SQL one) and to decide which contacts to ping
18:29 <     lirakis>| yes - that would be ideal
18:29 <@  bogdan_vs>| and on the other side, to better handle registrations themselves - like sip.instance and reg-id
18:29 <      jarrod>| but are you still thinking of funneling "nosql backends" through db_cachedb/cachedb_ modules?
18:30 <@  bogdan_vs>| to avoid all contact forking with mobile devices
18:30 <@  bogdan_vs>| jarrod: not necessarily
18:30 <     lirakis>| bogdan_vs, that is only a consideration when you start getting into clients that are even capable of doing stuff like SIP outbound
18:30 <     lirakis>| I know of no such client that currently exists
18:30 <@  bogdan_vs>| we can look at a native noSQL support in USRLOC
18:30 <      jarrod>| hmm
18:31 <@  bogdan_vs>| lirakis: like no client implements the 5626 ??
18:31 <      jarrod>| but so you are thinking of implementing "native noSQL" ONLY in usrloc?
18:31 <@  bogdan_vs>| jarrod : if it brings PROS
18:31 <      jarrod>| or making another generic module that can extend the functionality to other modules
18:31 <     lirakis>| bogdan_vs, i do not know of any
18:32 <@  bogdan_vs>| well, with the noSQL things are not so simple as with SQL - as there is no standard
18:32 <      jarrod>| right
18:32 <@  bogdan_vs>| some are simple KV, others with counters, others with lists, other with documents (json)
18:32 <     lirakis>| so i was thinking about adding some kind of "capability" for the DB api that would map usrloc type stuff
18:32 <      jarrod>| which is why i liked the db_cachedb integration where i can impement it PER backend
18:32 <     lirakis>| like instead of doing a raw query for get_all_u_contacts
18:32 <@  bogdan_vs>| and USRLOC is not simple when comes to data structures
18:33 <     lirakis>| the db module must have a capability for that call
18:33 <      jarrod>| raw query n my mind is the only thing that needs to change
18:33 <      jarrod>| (for quick implementation)
18:33 <     razvanc>| lirakis: you'd like a method to overwrite the raw queries the usrloc module does when querying the db?
18:34 <     lirakis>| basically change the usrloc raw queries to some API / interface
18:34 <     lirakis>| that the db module must implement
18:34 <@  bogdan_vs>| jarrod: even before getting to that raw query, your DB must support kind of lists
18:34 <@  bogdan_vs>| as usrloc is about AOR with a list of contacts
18:34 <@  bogdan_vs>| that needs to be consistently shared
18:35 <     lirakis>| right but with k/v stores ... how those "lists" are stored could be very different based on the backend
18:35 <     lirakis>| cassandra vs. redis. etc
18:35 <@  bogdan_vs>| right
18:35 <     lirakis>| we simply store a row key with the AOR as the key, and all the columns are the contacts stored as u_contact structs
18:35 <      jarrod>| yea, for the db_cachedb translation queries, i implement the logic PER module (i.e. usrloc has different logic than version)
18:36 <     lirakis>| making that an API/interface makes it so the DB module can choose how to implement the call to the backend
18:36 <@  bogdan_vs>| lirakis: so the value encodes the entire list of contacts?
18:36 <     lirakis>| bogdan_vs, if we call get_all_u_contacts() it returns a linked list (i think) of u_contact structs
18:37 <      jarrod>| what i do is perform the query and then built the db_res_t
18:37 <     lirakis>| but in the cassandra db - yes
18:37 <     lirakis>| ill show an example ... one sec :)
18:37 <     razvanc>| lirakis: you're usin the lists support in cassandra?
18:37 <     razvanc>| because otherwise I don't see how you can do this atomically
18:37 <    lirakis>| razvanc, its basic cassandra
18:38 <     razvanc>| and is there any mechanism you're using to ensure atomicity?
18:38 <     razvanc>| I mean if a client registers simultaneously on two registrars
18:38 <     razvanc>| and both of them add a record
18:38 <     razvanc>| in the same time
18:38 <     lirakis>| so this is why i said ... long answer - no
18:38 <     lirakis>| heh
18:38 <     razvanc>| oh :)
18:38 <@  bogdan_vs>| :D
18:39 <     lirakis>| so a client can register simultaneously on two registrars
18:39 <     lirakis>| and we rely on cassandras consistency mechanisms to sync the shared backend
18:39 <     lirakis>| you can specify how many servers are required for a "quorum"
18:39 <     lirakis>| we allow things to be "eventually" consistent
18:40 <@  bogdan_vs>| lirakis: ok, at timestamp 0, the key is empty (no registration)
18:40 <     lirakis>| and accept that within .5 seconds ... both phones may not ring
18:40 <     lirakis>| but within 1 second .. they will
18:40 <     razvanc>| yes, but if two servers insert two data in the same time, who merges them?
18:40 <@  bogdan_vs>| and then you have 2 registrations in the same time, on different servers
18:40 <@  bogdan_vs>| both servers will no value for the key and insert 2 different values (each server its own contact)
18:41 <     lirakis>| so there will be 2 registrations for the same key (aor)
18:41 <      jarrod>| right
18:41 <      jarrod>| with different contact (Source ip/port) and callid ?
18:41 <     lirakis>| yes
18:41 <     lirakis>| https://gist.githubusercontent.com/etamme/2a6aad375c616ce336a2/raw/8fd70a99a6fd528673c6548a0394ecdeac6541d6/gistfile1.txt
18:41 <     razvanc>| yup
18:41 <      jarrod>| which is what eric uses as a key
18:41 <     razvanc>| but same aor
18:41 <     lirakis>| see that gist
18:41 <      jarrod>| lirakis' solution solves that
18:42 <     lirakis>| location['AOR']=['callid-contact1','callid-contact2',...,'callid-contactN']
18:42 <     razvanc>| so with cassandra if you insert two records with the same AOR, they are merged into the same record?
18:43 <     lirakis>| a row (key) has many columns (values)
18:43 <     lirakis>| a key is an aor, a value is a SPECIFIC registration
18:43 <     razvanc>| oh right, I see now
18:43 <     lirakis>| if you try to write 2 registrations for the same AOR for the same CALL-ID-CONTACT
18:43 <     lirakis>| it will only end up with 1
18:43 <     lirakis>| cassandra uses timestamps
18:43 <     lirakis>| to know which is "newer"
18:44 <      jarrod>| and dont foget to mention the auto ttl
18:44 <      jarrod>| which addresses the expires query parameter of the raw query
18:44 <     lirakis>| yes - cassandra has automatic TTL so the registrations automatically expire
18:44 <@  bogdan_vs>| ok, but you see....this is the tricky part
18:44 <     lirakis>| but yea - i dont want to make this about "lets use cassandra"
18:45 <@  bogdan_vs>| you made the USRLOC yo work with cassandra
18:45 <     lirakis>| ...cassandra is just one possible tool
18:45 <@  bogdan_vs>| not with nosql
18:45 <@  bogdan_vs>| :)
18:45 <     lirakis>| i know that
18:45 <     lirakis>| we literally wrote a db_cassandra_usrloc module
18:45 <@  bogdan_vs>| so getting a generic nosql support may be a hard one
18:45 <     lirakis>| it can only be used with usrloc
18:45 <      jarrod>| yea, i wouldnt target generic nosql support
18:45 <@  bogdan_vs>| actually a db_cassandra_usrloc_dbonly module :)
18:45 <      jarrod>| trying to make this work with every backend
18:45 <     lirakis>| and .. i dont know .. if thats an answer
18:46 <     lirakis>| bogdan_vs, well it actually has a mysql connection inside it too .... ;)
18:46 <      jarrod>| just making it possible to integrate it with backends that support this type of geo distributed model
18:46 <@  bogdan_vs>| dude, you rock !! :)
18:46 <     razvanc>| lirakis: lol :D
18:46 <     lirakis>| i mean .. is that a possible answer .... custom db_userloc module per backend
18:46 <     lirakis>| ?
18:47 <     razvanc>| lirakis: it's not the nice answer I think
18:47 <@  bogdan_vs>| well.....if it solved the problem and there is no other better solution....
18:47 <     lirakis>| id agree its nota  nice answer
18:47 <     lirakis>| im just not certain how "generic" a solution we can come up with
18:47 <     lirakis>| i just wanted to throw out there what we did
18:48 <     lirakis>| we did it that way to minimize having to change anything else within opensips
18:48 <@  bogdan_vs>| aha
18:48 <@  bogdan_vs>| ok, that's the DB side....
18:48 <@  bogdan_vs>| what about federating...:)
18:49 <      jarrod>| it seems more advantageous to rely on the DB to accomplish the guts
18:49 <@  bogdan_vs>| do we need a mechanism to configure all the opensips nodes that need to share ?
18:49 <      jarrod>| i guarantee more and more of these type of backends with immerge
18:49 <     lirakis>| thats really all accomplished with the PATH headers.  I think the nat ping stuff could use some attention
18:49 <@  bogdan_vs>| lirakis: yes, that is noted and we will check
18:50 <     lirakis>| bogdan_vs, i think that as far as federating is concerned ... its really up to the implementor to determine who can pass traffic
18:50 <     lirakis>| on onsip ... we will accept a call from any place on the internet
18:50 <     lirakis>| and we will send calls to any sip address
18:50 <     lirakis>| (we challenge the latter)
18:50 <     lirakis>| but we operate in "stacks" ...
18:51 <     lirakis>| so there is a stack of a edge, a registrar, and a core
18:51 <     lirakis>| edge only signals to registar or core
18:51 <     lirakis>| and only cores signal between each other
18:51 <     razvanc>| I think what bogdan_vs wants to address is whether opensips needs to know all instances inside the platform
18:51 <     lirakis>| no
18:51 <     razvanc>| and how do they manage to find themselves
18:51 <     lirakis>| they dont need to
18:52 <     razvanc>| they do need in case of registrars failure
18:52 <     razvanc>| when contacts have to be inherited
18:52 <     razvanc>| by someone else
18:52 <     lirakis>| if edge1 gets a request for bob@biloxi.com, it sends it to its core ... that core does a usrlocation lookup, gets a contact with a PATH ... sets the $du and its done
18:52 <     lirakis>| ok - so I think that "HA" registrars is some thing worth discussing
18:53 <     lirakis>| or how to "inherit" contacts
18:53 <     razvanc>| actually I was thinking about pinging
18:53 <     lirakis>| right
18:53 <     lirakis>| so was i
18:53 <     razvanc>| who should ping the "lost" clients
18:53 <     lirakis>| stack1  has  edge1, registrar1, core1
18:53 <     lirakis>| for some reason registrar1 dies
18:53 <     lirakis>| ... who sends pings
18:53 <     razvanc>| yup
18:53 <     lirakis>| i mean ... the nathelper module can be anywhere
18:53 <     lirakis>| really
18:53 <     lirakis>| and it just needs to see the "view" of what it "owns"
18:54 <     lirakis>| just so long as the pings traverse the appropriate edge
18:54 <     lirakis>| so its really ... HA nathelper
18:54 <     lirakis>| not HA registrar
18:55 <     razvanc>| I was thinking of something else
18:56 <     razvanc>| stack has edge1, registrar1, registrar2, core1
18:56 <     lirakis>| right
18:56 <     razvanc>| let's add registrar3 too :)
18:56 <     lirakis>| sure
18:56 <     razvanc>| each registrar has its own set of clients
18:56 <     lirakis>| but .. it doesnt
18:56 <     razvanc>| and each of them ping its own set of clients (there's no mechanism right now for this)
18:56 <     lirakis>| ok .. keep going with your scenario
18:57 <     razvanc>| in case registrar3 fails
18:57 <     razvanc>| registrar1 and 2 should take its clients
18:57 <     lirakis>| ok
18:57 <     lirakis>| so ... there is no "ownership" of contacts per registrar
18:57 <     lirakis>| they simply save them, and look them up
18:57 <     razvanc>| yeah, but that means everybody pings everybody
18:57 <     lirakis>| no no
18:57 <     lirakis>| so .. nathelper
18:58 <     lirakis>| or whatever is doing the pings
18:58 <     lirakis>| is responsible for lookingup users associated with a given "edge" and pinging them
18:58 <     lirakis>| edge "owns" the clients
18:58 <     lirakis>| registrars own nothing
18:58 <     razvanc>| so the edge is somehow corelated to the registrar
18:58 <     lirakis>| there could be any number of registrars behind an edge
18:59 <     lirakis>| or there could just be one
18:59 <     lirakis>| but in OUR implementations there is currently a 1:1 relationship
18:59 <     lirakis>| we scale in "stacks"
18:59 <     razvanc>| so each registrar sends to a couple of edges
19:00 <     lirakis>| other way around
19:00 <     lirakis>| edge:registrar 1:M
19:00 <@  bogdan_vs>| and if the registrar goes down, how is pinging its edges ??
19:00 <     razvanc>| so each client will receive M pingings?
19:00 <    lirakis>| razvanc, no - so im saying that the registrars (M) are not really responsible for sending pings ...
19:01 <     lirakis>| you could literally have a seperate server
19:01 <     lirakis>| nat-helper1
19:01 <     lirakis>| that simply loads up nathelper, looks at the database for contatcts with flag "stack1" set
19:01 <     lirakis>| and pings them
19:01 <@  bogdan_vs>| ok, still the same problem
19:02 <@  bogdan_vs>| is the pinger configured for "stack1" goes down
19:02 <     lirakis>| yes
19:02 <@  bogdan_vs>| which one will ping the "stack1" ?
19:02 <     lirakis>| i dont know - but like i said ... this is a HA nathelper problem
19:02 <     lirakis>| not an HA registrar problem
19:02 <@  bogdan_vs>| ok, so you do not cover the HA part
19:02 <     lirakis>| right
19:02 <     lirakis>| i have no idea how to do HA in opensips
19:02 <     lirakis>| lol
19:03 <     lirakis>| is there a heartbeat mechanism
19:03 <@  bogdan_vs>| :) .....we were thinking to get a solution to cover both distribution and HA
19:03 <     lirakis>| if so ... then each pinger box
19:03 <     lirakis>| would need to know about others
19:03 <@  bogdan_vs>| or at least to come with all the ingredients to do it
19:03 <     lirakis>| so that it could take over
19:04 <     lirakis>| bogdan_vs, right - i was just saying they are not directly connected problems
19:04 <@  bogdan_vs>| yes, that what I meant by "a federation" support
19:04 <     lirakis>| right i got you know
19:04 <     lirakis>| *now
19:04 <@  bogdan_vs>| so be able to configure all the nodes part of the federation, so they can both share and backup
19:04 <@  bogdan_vs>| we are looking also in other scenarios where such support will be needed
19:04 <@  bogdan_vs>| not usrlocs related
19:05 <     lirakis>| sure - so ... when you startup nathelper .... is there a db table some where that has all the other pingers with state?
19:05 <@  bogdan_vs>| brettnem knows ;)
19:05 <@  bogdan_vs>| lirakis: kind of
19:05 <     lirakis>| so you want some generic mechanism for service discovery
19:05 <     lirakis>| they all have naptr records with SRV's  for pinger.stack1.domain.com
19:05 <     lirakis>| lol :P
19:05 <@  bogdan_vs>| going further....yes
19:06 <     lirakis>| so this is sort of getting into the binary replication land IMO ....
19:06 <     lirakis>| i can tell you how cassandra does it ...
19:06 <@  bogdan_vs>| well, this is already exceeding the topic of today....but something like that will be useful at some point
19:06 <     lirakis>| each node has a schema, and one or more seed nodes.   When they come online they try to contact a seed node to get updated info about the "ring"
19:07 <@  bogdan_vs>| lirakis: not going so far...as binary repl....just to share state and assume roles in the cluster
19:07 <@  bogdan_vs>| yes....kind of
19:07 <     lirakis>| so .. a cluster module could be very interesting
19:07 <     lirakis>| that would enable support for some thing like that
19:07 <     lirakis>| it could expose an api that other modules could use to share / update information about a cluster
19:08 <     lirakis>| that would just be backed by a db backend though
19:08 <@  bogdan_vs>| right
19:08 <     lirakis>| i dont think we want to rewrite a distributed keystore as an opensips module
19:08 <     lirakis>| heh
19:09 <@  bogdan_vs>| ok, trying so wrapup the usrloc discussion
19:09 <     lirakis>| but ... it could just have some functions that are keystore like .... get_nodes() ... get_value(key)  ...etc
19:09 <@  bogdan_vs>| 1) functionality focus on (a) edge and (b) shared registrar
19:10 <@  bogdan_vs>| and you can combine or use them separately
19:10 <@  bogdan_vs>| 2)  see options in enabling generic nosql support in usrloc
19:10 <@  bogdan_vs>| 3) solving the natpinging issue (distributed and HA)
19:11 <     lirakis>| If those 3 can be covered ... i think we'll have a solid distributed userlocation solution in opensips
19:11 <     lirakis>| :D
19:11 <@  bogdan_vs>| hopefully :)
19:12 <@  bogdan_vs>| ok, I guess we can continue the discussion for each topic on the mailing list
19:12 <@  bogdan_vs>| (after digesting a bit the discussion)
19:12 <@  bogdan_vs>| :)
19:12 <     lirakis>| OK - sounds good (and in here too)
19:12 <@  bogdan_vs>| and the clustering topic sounds promising ;)
19:12 <     lirakis>| it does sound interesting :)
19:13 <     lirakis>| thanks again guys for taking the time to have these monthly meetings
19:13 <@  bogdan_vs>| ok, anything else to add for the usrloc topic ?
19:13 <     lirakis>| very much appreciated
19:13 <@  bogdan_vs>| if not, we can call it done for today !
19:14 <@  bogdan_vs>| thank you all for being here
19:14 <     lirakis>| thanks!
19:14 <@  bogdan_vs>| and especially for contributing
19:14 <@  bogdan_vs>| ;)
19:14 <@  bogdan_vs>| Razvan will do the update on the web page, as a short summary
19:15 <@  bogdan_vs>| ok, thank you everybody !
19:15 <     lirakis>| excellent - thanks again
19:15 <@  bogdan_vs>| !!!!!! Meeting ended  !!!!!!
19:16 <     razvanc>| thank you guys!

From openSIPS

Community: IRCmeeting20150527