Development.Design-Distributed-User-Location History


February 17, 2018, at 07:04 PM by liviu -
Changed lines 93-94 from:

This solution is an appropriate choice for a single site with a medium-sized subscriber population (order of millions), which could all fit into a single OpenSIPS box (all cluster boxes are mirrored). The NAT bindings are tied to the SBC layer, with the cluster nodes routing out both call and ping traffic through this layer. With the help of the cluster layer, who is able to signal when a node joins/leaves the network, each node is able to determine its very own "pinging slice", by performing an AOR hash modulo current_no_of_cluster_nodes.

to:

This solution is an appropriate choice for a single site with a medium-sized subscriber population (order of millions), which could all fit into a single OpenSIPS box (all cluster boxes are mirrored). The NAT bindings are tied to the SBC layer, with the cluster nodes routing out both call and ping traffic through this layer. With the help of the cluster layer, which is able to signal when a node joins/leaves the network, each node is able to determine its very own "pinging slice", by performing an AOR hash modulo current_no_of_cluster_nodes.
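
To illustrate the "pinging slice" idea, here is a minimal Python sketch (the MD5 hash and the node indices are assumptions made for the example, not the actual OpenSIPS implementation): each node hashes every AOR and pings only the bindings whose hash modulo the current node count matches its own index.

    import hashlib

    def aor_slice(aor: str, cluster_size: int) -> int:
        """Map an AOR to a node index via a stable hash modulo the current node count."""
        return int(hashlib.md5(aor.encode()).hexdigest(), 16) % cluster_size

    def my_ping_targets(all_aors, my_index: int, cluster_size: int):
        """Keep only the AORs whose slice matches this node's index."""
        return [aor for aor in all_aors if aor_slice(aor, cluster_size) == my_index]

    # 3-node cluster: node 0 pings only its own slice; when the cluster layer
    # signals a node joining/leaving, cluster_size changes and each node
    # simply recomputes its slice.
    aors = ["alice@example.com", "bob@example.com", "carol@example.com"]
    print(my_ping_targets(aors, my_index=0, cluster_size=3))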

Changed line 119 from:

Similar to the "basic" solution, the NAT bindings are tied to the SBC layer, with the cluster nodes routing out both call and ping traffic through this layer. With the help of the cluster layer, who is able to signal when a node joins/leaves the network, each node is able to determine its very own "pinging slice", by applying an AOR hash modulo current_no_of_cluster_nodes filter to the DB cluster query.

to:

Similar to the "basic" solution, the NAT bindings are tied to the SBC layer, with the cluster nodes routing out both call and ping traffic through this layer. With the help of the cluster layer, which is able to signal when a node joins/leaves the network, each node is able to determine its very own "pinging slice", by applying an AOR hash modulo current_no_of_cluster_nodes filter to the DB cluster query.

November 22, 2017, at 03:53 PM by liviu -
Changed line 55 from:

However, the difference is that we are now using the OpenSIPS clusterer layer for all inter-node communication. Immediately, this reduces the number of messages sent ("alice" is reachable here, rather than Alice's contact "deskphone" is now present here), the size of the messages (metadata only, rather than full-blown SIP) and their nature (instantly parse-able binary packets). Furthermore, by using the cluster-based communication, the platform now becomes resilient to the loss of some of its cross-location data links. As long as the "platform graph" stays connected, the cluster-based distributed location service will remain unaffected.

to:

However, the difference is that we are now using the OpenSIPS clusterer layer for all inter-node communication. Immediately, this reduces the number of messages sent ("alice" is reachable here, rather than Alice's contact "deskphone" is now present here), the size of the messages (metadata only, rather than full-blown SIP) and the parsing overhead (binary data vs. SIP syntax). Furthermore, by using the cluster-based communication, the platform now becomes resilient to the loss of some of its cross-location data links. As long as the "platform graph" stays connected, the cluster-based distributed location service will remain unaffected.
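
To make the difference between full SIP replication and metadata-only cluster replication concrete, here is a rough Python sketch (the packet reuses the {"aor", "dst_cluster_node_id"} fields shown in the development notes further down this page; the SIP message and node names are purely illustrative):

    import json

    # A full SIP REGISTER, as replicated by the "SIP driven" model (illustrative only).
    sip_register = (
        "REGISTER sip:example.com SIP/2.0\r\n"
        "From: <sip:alice@example.com>;tag=49583\r\n"
        "To: <sip:alice@example.com>\r\n"
        "Contact: <sip:deskphone@198.51.100.10:5060>;expires=3600\r\n"
        "Path: <sip:node1.example.com;lr>\r\n\r\n"
    )

    # The cluster-layer equivalent: "alice is reachable via node 1", nothing more.
    cluster_packet = json.dumps({"aor": "alice", "dst_cluster_node_id": 1})

    print(len(sip_register), "bytes of SIP vs", len(cluster_packet), "bytes of metadata")

    # On lookup(), a node without a local binding forks the call towards the
    # home node(s) learned from such metadata packets.
    aor_map = {"alice": {1}, "bob": {2, 3}}
    def fork_targets(aor: str):
        return [f"cluster-node-{node_id}" for node_id in sorted(aor_map.get(aor, ()))]

    print(fork_targets("alice"))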

November 22, 2017, at 03:51 PM by liviu -
Changed line 12 from:

This page aims at offering high-level information regarding the development of several distributed user location models which are to be included in the OpenSIPS 2.4 release. By putting together several community discussions (2013 "users" mailing list, 2015 public meeting) along with our own experience with regards to this topic, we have put together two models which simultaneously address needs such as horizontal scalability, geo distribution, high availability and NAT traversal.

to:

This page aims at offering high-level information regarding the development of several distributed user location models which are to be included in the OpenSIPS 2.4 release. By putting together several community discussions (2013 "users" mailing list, 2015 public meeting) along with our own experience with regards to this topic, we present two models which simultaneously address needs such as horizontal scalability, geo distribution, high availability and NAT traversal.

October 24, 2017, at 08:21 PM by liviu -
Changed line 138 from:
  • [TODO 5][optional]: assess performance and implement "ping cache" if the "read" queries due to pinging prove to be a bottleneck
to:
  • [TODO 5][optional]: assess performance and implement a "ping cache" if the read queries required by NAT pinging prove to be a bottleneck
October 24, 2017, at 08:19 PM by liviu -
Changed line 136 from:
  • [TODO 3]: extend the cacheDB API and extend its capability set to match the above
to:
  • [TODO 3]: adapt the cacheDB API and expand its capability set to match the above
October 24, 2017, at 08:12 PM by liviu -
Changed lines 135-136 from:
  • [TODO 2]: solve NAT pinging problem (DB query filter)
  • [TODO 3][optional]: assess performance and implement "ping cache" if the "read" queries due to pinging prove to be a bottleneck
to:
  • [TODO 2]: research all NoSQL backends capable of handling the usrloc data format requirements
  • [TODO 3]: extend the cacheDB API and extend its capability set to match the above
  • [TODO 4]: solve NAT pinging problem (DB query filter)
  • [TODO 5][optional]: assess performance and implement "ping cache" if the "read" queries due to pinging prove to be a bottleneck
October 24, 2017, at 08:08 PM by liviu -
Changed line 115 from:

This solution is to be employed by single sites with high population numbers (order of tenths/hundredths of millions). At these magnitudes of data, we cannot rely on OpenSIPS to manage the user location data anymore (unless we kickstart "OpenSIPS DB") - we would rather pass this on to a specialized, cluster-oriented NoSQL database which offers data partitioning and redundancy.

to:

This solution is to be employed by single sites with high population numbers (order of tens/hundreds of millions). At these magnitudes of data, we cannot rely on OpenSIPS to manage the user location data anymore (unless we kickstart "OpenSIPS DB") - we would rather pass this on to a specialized, cluster-oriented NoSQL database which offers data partitioning and redundancy.

October 24, 2017, at 06:38 PM by liviu -
Changed lines 97-98 from:
  • easy to configure and manage
to:
  • node redundancy (each node holds a full dataset copy)
  • easy to configure and scale up/down
Changed lines 123-124 from:
  • easy to configure and manage
to:
  • OpenSIPS node redundancy (any node is capable of saving / looking up contacts)
  • easy to configure and scale up/down
October 24, 2017, at 06:29 PM by liviu -
Changed line 53 from:

This solution is a heavily optimized version of the previous one, from three perspectives: performance, network link redundancy and scripting difficulty. Similar to the above, the end results, as seen from outside the platform, are similar: global reachability, NAT traversal and pinging.

to:

This solution is a heavily optimized version of the previous one, from three perspectives: performance, network link redundancy and scripting difficulty. Similar to the above, the end results, as seen from outside the platform, stay the same: global reachability, NAT traversal and pinging.

October 24, 2017, at 06:28 PM by liviu -
Changed line 36 from:
  • possible with OpenSIPS 2.X and glue scripting
to:
  • possible with some OpenSIPS 2.X scripting
October 24, 2017, at 06:11 PM by liviu -
Changed line 114 from:

This solution is to be employed by single sites with high population numbers (order of tenths/hundredths of millions). At these magnitudes of data, we cannot rely on OpenSIPS to manage the user location data anymore (albeit we kickstart "OpenSIPS DB") - we would rather pass this on to a specialized, cluster-oriented NoSQL database which offers data partitioning and redundancy.

to:

This solution is to be employed by single sites with high population numbers (order of tenths/hundredths of millions). At these magnitudes of data, we cannot rely on OpenSIPS to manage the user location data anymore (unless we kickstart "OpenSIPS DB") - we would rather pass this on to a specialized, cluster-oriented NoSQL database which offers data partitioning and redundancy.

October 24, 2017, at 06:00 PM by liviu -
Changed lines 28-29 from:

SIP driven "user facing" topology

to:

"SIP driven" user facing topology

Changed line 49 from:

Cluster driven "user facing" topology

to:

"Cluster driven" user facing topology

October 24, 2017, at 05:59 PM by liviu -
October 24, 2017, at 05:57 PM by liviu -
Changed line 134 from:
  • [TODO 3][optional]: assess performance and implement "ping cache" if pinging is a bottleneck
to:
  • [TODO 3][optional]: assess performance and implement "ping cache" if the "read" queries due to pinging prove to be a bottleneck
October 24, 2017, at 05:52 PM by liviu -
Changed lines 47-48 from:
  • [TODO 1]: implement a script mechanism to simplify the forking of REGISTERs to all cluster nodes
to:
  • [TODO 1][optional]: implement a script mechanism to simplify the forking of REGISTERs to all cluster nodes
Changed lines 98-99 from:
  • built-in solution, no external database required
to:
  • trivial OpenSIPS scripting
  • no external database required
Changed lines 108-109 from:
to:
  • [TODO 2]: solve NAT pinging problem (in-memory filter)
Added lines 113-134:

This solution is to be employed by single sites with high population numbers (order of tenths/hundredths of millions). At these magnitudes of data, we cannot rely on OpenSIPS to manage the user location data anymore (albeit we kickstart "OpenSIPS DB") - we would rather pass this on to a specialized, cluster-oriented NoSQL database which offers data partitioning and redundancy.
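
As a sketch of what "full usrloc data handling via CacheDB" might look like in such a NoSQL backend, here is a hypothetical key/value layout in Python (the field names mirror the classic location table, and the aor_hash field anticipates the DB-side pinging filter described below; none of this is a committed schema):

    import hashlib
    import json

    # Hypothetical value stored under a key such as "usrloc:alice@example.com".
    aor = "alice@example.com"
    binding = {
        "aor": aor,
        "contacts": [
            {
                "contact": "sip:deskphone@10.0.0.15:5060",
                "received": "sip:203.0.113.7:1024",  # NATed source address
                "path": "<sip:sbc1.example.com;lr>", # calls and pings go out via the SBC
                "expires": 1700003600,
                "q": 1.0,
            }
        ],
        # Precomputed hash, so the "AOR hash modulo node count" pinging filter
        # can be pushed into the DB cluster query instead of applied in memory.
        "aor_hash": int(hashlib.md5(aor.encode()).hexdigest(), 16) % 2**31,
    }
    print(json.dumps(binding, indent=2))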


Similar to the "basic" solution, the NAT bindings are tied to the SBC layer, with the cluster nodes routing out both call and ping traffic through this layer. With the help of the cluster layer, who is able to signal when a node joins/leaves the network, each node is able to determine its very own "pinging slice", by applying an AOR hash modulo current_no_of_cluster_nodes filter to the DB cluster query.

PROs:

  • easy to configure and manage
  • high-end solution, capable of accommodating large subscriber pools
  • trivial OpenSIPS scripting

CONs:

  • compatible with only a limited number of NoSQL backends (those able to meet the usrloc data format requirements)

Development:

  • [TODO 1]: implement full usrloc data handling via CacheDB
  • [TODO 2]: solve NAT pinging problem (DB query filter)
  • [TODO 3][optional]: assess performance and implement "ping cache" if pinging is a bottleneck
October 24, 2017, at 05:13 PM by liviu -
Changed lines 93-94 from:
to:

This solution is an appropriate choice for a single site with a medium-sized subscriber population (order of millions), which could all fit into a single OpenSIPS box (all cluster boxes are mirrored). The NAT bindings are tied to the SBC layer, with the cluster nodes routing out both call and ping traffic through this layer. With the help of the cluster layer, who is able to signal when a node joins/leaves the network, each node is able to determine its very own "pinging slice", by performing an AOR hash modulo current_no_of_cluster_nodes.

PROs:

  • easy to configure and manage
  • built-in solution, no external database required

CONs:

  • can only scale up to the maximum number of location contacts handled by a single instance

Development:

  • [TODO 1]: implement "sync on startup" feature
Added lines 109-110:

http://www.opensips.org/pub/images/distributed-usrloc-2-2.jpg

October 24, 2017, at 04:42 PM by liviu -
Changed line 79 from:

http://www.opensips.org/pub/images/distributed-usrloc-2-1.jpg

to:

http://www.opensips.org/pub/images/distributed-usrloc-2-1.jpg

October 24, 2017, at 04:41 PM by liviu -
Changed lines 18-20 from:

Below is a list of requirements addressed by this model:

  • geographical distribution - the overall platform may span over several physical locations, with any of its users being reachable from any of these locations.
to:

Below is a set of features specific to this model:

  • geographically distributed - the overall platform may span across several physical locations, with any of its users being reachable from any of these locations.
Changed line 23 from:
  • horizontally scalable - locations may be scaled according to needs, by adding additional OpenSIPS cluster nodes
to:
  • horizontally scalable - locations may be scaled according to needs, by adding additional OpenSIPS cluster nodes and balancing traffic to them (DNS SRV, SIP load balancers, UA policies, etc.)
Changed lines 77-95 from:

"Homogenous cluster" topology

to:

"Homogeneous cluster" topology

http://www.opensips.org/pub/images/distributed-usrloc-2-1.jpg

The homogeneous cluster solves the following problems:

  • geographical distribution - the overall platform may span across several physical locations, with any of its users being reachable from any of these locations.
  • NAT traversal - outgoing requests are directed through the SBC layer, which maintains the NAT bindings
  • NAT pinging - no extraneous pinging; the cluster self-manages pinging responsibilities according to the current node count
  • horizontal scalability - the service may be scaled up or down by dynamically adding/removing OpenSIPS cluster nodes
  • high availability - by default, since data is either duplicated on multiple OpenSIPS nodes or replicated according to the redundancy scheme of the chosen DB cluster engine

We present two solutions for achieving this setup: a "basic" solution and an "advanced" one.

"Basic" OpenSIPS homogeneous cluster topology

"Advanced" OpenSIPS homogeneous cluster topology

October 24, 2017, at 03:53 PM by liviu -
Changed lines 74-75 from:
  • [TODO 5]: "node self-recovery" MI command (re-use TODO 4 mechanism)
to:
  • [TODO 5]: implement a "node sync" mechanism on startup
  • [TODO 6]: "node self-recovery" MI command (re-use TODO 4 mechanism)
October 24, 2017, at 03:50 PM by liviu -
Changed line 51 from:

http://www.opensips.org/pub/images/distributed-usrloc-1-1.jpg

to:

http://www.opensips.org/pub/images/distributed-usrloc-1-2.jpg

October 24, 2017, at 03:49 PM by liviu -
Changed lines 47-48 from:
  • [TODO 1] implement a script mechanism to allow forking of REGISTERs to all cluster nodes
to:
  • [TODO 1]: implement a script mechanism to simplify the forking of REGISTERs to all cluster nodes
Added lines 51-74:

http://www.opensips.org/pub/images/distributed-usrloc-1-1.jpg

This solution is a heavily optimized version of the previous one, from three perspectives: performance, network link redundancy and scripting difficulty. Similar to the above, the end results, as seen from outside the platform, are similar: global reachability, NAT traversal and pinging.

However, the difference is that we are now using the OpenSIPS clusterer layer for all inter-node communication. Immediately, this reduces the number of messages sent ("alice" is reachable here, rather than Alice's contact "deskphone" is now present here), the size of the messages (metadata only, rather than full-blown SIP) and their nature (instantly parse-able binary packets). Furthermore, by using the cluster-based communication, the platform now becomes resilient to the loss of some of its cross-location data links. As long as the "platform graph" stays connected, the cluster-based distributed location service will remain unaffected.

PROs

  • heavily optimized communication between nodes (number of messages / size of messages / parsing complexity)
  • service can survive broken links between locations
  • trivial OpenSIPS scripting (save() and lookup() almost stay the same)
  • built-in solution, no external database required

CONs

  • cluster communication packets are cumbersome to troubleshoot (contributing a Wireshark dissector could help)

Development:

  • [TODO 1]: Address-of-Record metadata replication, e.g. {"aor": "liviu", "dst_cluster_node_id": 1}
  • [TODO 2]: enhance lookup() to support parallel forking across multiple locations (using metadata info)
  • [TODO 3]: enhance save() to broadcast data throughout cluster
  • [TODO 4]: solve "node restart" and "node down" corner-cases (broadcast "I am empty" packet, broadcast all my metadata to all cluster nodes)
  • [TODO 5]: "node self-recovery" MI command (re-use TODO 4 mechanism)
October 24, 2017, at 03:13 PM by liviu -
Changed line 32 from:

This solution is ideal for SMBs or as a proof of concept. With the SIP driven solution, after saving an incoming registration, the registrar node records itself using a Path header, after which it replicates the REGISTER to all cluster nodes across all locations. This allows the user to be globally reachable, while also making sure it only receives calls through its "home box" (a mandatory NAT requirement in most cases). NAT pinging is only done from the "home box".

to:

This solution is ideal for SMBs or as a proof of concept. With the SIP driven solution, after saving an incoming registration, the registrar node records itself using a Path header, after which it replicates the REGISTER to all cluster nodes across all locations. This allows the user to be globally reachable, while also making sure it only receives calls through its "home box" (a mandatory NAT requirement in most cases). NAT pinging is only performed by the "home box".
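
A small Python sketch of this flow, assuming a three-node cluster with hypothetical hostnames (conceptual pseudologic only, not the OpenSIPS registrar API): the registrar prepends its own address as a Path entry, stores the binding, then forwards a copy of the REGISTER to every other node.

    # Hypothetical cluster layout; in the real platform these would be OpenSIPS boxes.
    CLUSTER_NODES = ["node1.example.com", "node2.example.com", "node3.example.com"]
    MY_NODE = "node1.example.com"
    location = {}  # stand-in for the local usrloc table

    def handle_register(msg: dict) -> list:
        # 1. Record ourselves via Path, so calls to this contact return through
        #    us (the "home box"), which is also the only node pinging the device.
        msg.setdefault("path", []).insert(0, f"<sip:{MY_NODE};lr>")

        # 2. Save the binding locally (stand-in for save("location")).
        location[msg["aor"]] = msg

        # 3. Replicate the full REGISTER to all other cluster nodes, in all locations.
        return [node for node in CLUSTER_NODES if node != MY_NODE]

    print(handle_register({"aor": "alice", "contact": "sip:deskphone@198.51.100.10"}))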

October 24, 2017, at 03:12 PM by liviu -
Changed line 32 from:

This solution is ideal for SMBs or as a proof of concept. With the SIP driven solution, after saving an incoming registration, the registrar node records itself using a Path header, after which it replicates the REGISTER to all cluster nodes across all locations. This allows the user to be globally reachable. NAT pinging is only done from the "home box".

to:

This solution is ideal for SMBs or as a proof of concept. With the SIP driven solution, after saving an incoming registration, the registrar node records itself using a Path header, after which it replicates the REGISTER to all cluster nodes across all locations. This allows the user to be globally reachable, while also making sure it only receives calls through its "home box" (a mandatory NAT requirement in most cases). NAT pinging is only done from the "home box".

October 24, 2017, at 03:11 PM by liviu -
Changed lines 14-15 from:

"UA-facing" topology

to:

"User facing" topology

Added lines 25-50:

We present two solutions for achieving this setup: a "SIP driven" solution and a "cluster driven" one.

SIP driven "user facing" topology

http://www.opensips.org/pub/images/distributed-usrloc-1-1.jpg

This solution is ideal for SMBs or as a proof of concept. With the SIP driven solution, after saving an incoming registration, the registrar node records itself using a Path header, after which it replicates the REGISTER to all cluster nodes across all locations. This allows the user to be globally reachable. NAT pinging is only done from the "home box".

PROs:

  • possible with OpenSIPS 2.X and glue scripting
  • can easily inspect/troubleshoot network traffic between locations
  • built-in solution, no external database required

CONs:

  • complex OpenSIPS script, since it handles the distribution logic, rather than masking it behind existing primitives (save(), lookup())
  • each OpenSIPS node holds the global (across locations) user location dataset
  • SIP replication is expensive (network and parsing overhead)

Development:

  • [TODO 1] implement a script mechanism to allow forking of REGISTERs to all cluster nodes

Cluster driven "user facing" topology

October 24, 2017, at 02:44 PM by liviu -
Changed lines 21-22 from:
  • NAT traversal - with regards to the above, call to a given user are properly routed out through its "home box" (the box it registered with)
  • NAT pinging - no extraneous pinging; only the "home box" of a user is responsible for pinging
to:
  • NAT traversal - with regards to the above, calls to a given user are properly routed out through the user's "home box" (the box it registered with)
  • NAT pinging - no extraneous pinging; only the "home box" of a user is responsible for pinging the user's device
October 24, 2017, at 02:43 PM by liviu -
Added lines 3-4:

This page has been visited 4268 times.
(:toc-float Table of Content:)

Changed line 9 from:
OpenSIPS 2.4 Manual
to:
Distributed User Location Design
Added lines 11-26:

This page aims at offering high-level information regarding the development of several distributed user location models which are to be included in the OpenSIPS 2.4 release. By putting together several community discussions (2013 "users" mailing list, 2015 public meeting) along with our own experience with regards to this topic, we have put together two models which simultaneously address needs such as horizontal scalability, geo distribution, high availability and NAT traversal.

"UA-facing" topology

http://www.opensips.org/pub/images/distributed-usrloc-1.jpg

Below is a list of requirements addressed by this model:

  • geographical distribution - the overall platform may span over several physical locations, with any of its users being reachable from any of these locations.
  • NAT traversal - with regards to the above, call to a given user are properly routed out through its "home box" (the box it registered with)
  • NAT pinging - no extraneous pinging; only the "home box" of a user is responsible for pinging
  • horizontally scalable - locations may be scaled according to needs, by adding additional OpenSIPS cluster nodes
  • highly available - optional support for pairing up any of the nodes with a "hot backup" box

"Homogenous cluster" topology

October 24, 2017, at 12:23 PM by liviu -
Changed line 1 from:
to:
October 24, 2017, at 12:23 PM by liviu -
Added lines 1-8:
Development -> Design-Distributed-User-Location?

(:title Distributed User Location Design:)



OpenSIPS 2.4 Manual

