CLUSTERER Module

Chapter 1. Admin Guide

1.1. Overview

The clusterer module is used to organize multiple OpenSIPS instances into groups that can communicate with each other in order to replicate, share information or perform distributed tasks. The module itself only stores information about the nodes in a group/cluster and provides an interface to check or tune their state and parameters. The distributed logic is performed by other modules that use this interface (e.g. the dialog module can replicate profiles, the ratelimit module can share pipes across multiple instances, etc.). Nodes within a cluster are provisioned through the database but, for efficiency, the node-related information is cached in OpenSIPS memory. This information can be checked or updated by sending commands over the MI interface.

The clusterer module can also detect node availability, using certain parameters provisioned in the database. When a node is not reachable, it is put in a probing state: it is periodically pinged until a maximum number of failed attempts is reached, at which point it is marked as temporarily disabled. It stays in this state for a period equal to the duration parameter, after which the number of retries is reset to 0 and the node is considered up again.
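The probing cycle described above can be sketched as a small state machine. This is only an illustration, not the module's code: the struct, the function names and the numeric value of the temporarily disabled state are assumptions, and so is the rule that a successful ping brings a probing node straight back up. The field names mirror the database columns.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical names. Values 0, 1 and 2 follow the "state" column described
 * in this document; the value of NODE_TEMP_DISABLED is an assumption. */
enum node_state {
    NODE_DOWN = 0,
    NODE_UP = 1,
    NODE_PROBING = 2,
    NODE_TEMP_DISABLED = 3
};

struct node {
    enum node_state state;
    int failed_attempts;  /* maximum allowed number of failed attempts */
    int no_tries;         /* failed attempts so far */
    int duration;         /* seconds spent in the temporarily disabled state */
    time_t last_attempt;  /* UNIX time of the last failed attempt */
};

/* Record the outcome of one send or ping towards the node. */
void node_report(struct node *n, int reachable, time_t now)
{
    assert(n != NULL);
    if (n->state == NODE_DOWN || n->state == NODE_TEMP_DISABLED)
        return; /* nothing is sent to such nodes */
    if (reachable) {
        /* assumption: one successful ping ends the probing state */
        n->state = NODE_UP;
        n->no_tries = 0;
        return;
    }
    n->state = NODE_PROBING;
    n->no_tries++;
    n->last_attempt = now;
    if (n->no_tries >= n->failed_attempts)
        n->state = NODE_TEMP_DISABLED;
}

/* Periodic timer: re-enable the node once `duration` seconds have passed. */
void node_timer(struct node *n, time_t now)
{
    assert(n != NULL);
    if (n->state == NODE_TEMP_DISABLED
            && now - n->last_attempt >= n->duration) {
        n->no_tries = 0;
        n->state = NODE_UP;
    }
}
```

With failed_attempts set to 3 and duration set to 30, three consecutive failures move the node to the temporarily disabled state, and the first timer run at least 30 seconds after the last failure marks it up again.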

Modules (like dialog or ratelimit) can use nodes within the cluster to replicate information. They also register a specific timeout to invalidate data from specific nodes, in case no updates have been received within that interval. The clusterer notifies the module if the timeout is reached and puts the node in a temporarily disabled state. If a packet arrives for a temporarily disabled server, the packet is dropped and a temporarily disabled notification is sent to the registered module. After the disabled period (2 * timeout) has passed, the server is considered up again.

By default, the state information of the nodes is not persistent. To make it persistent via a database, one must set the persistent_mode parameter.

1.2. Dependencies

1.2.1. OpenSIPS Modules

The following modules must be loaded before this module:

a database module.

1.2.2. External Libraries or Applications

The following libraries or applications must be installed before running OpenSIPS with this module loaded:

None.

1.3. Exported Parameters

1.3.1. `db_url`

The database url.

Default value is “NULL”.

Example 1.1. Set db_url parameter

...
modparam("clusterer", "db_url",
	"mysql://opensips:opensipsrw@localhost/opensips")
...

1.3.2. `db_table`

The name of the table storing the clustering information.

Default value is “clusterer”.

Example 1.2. Set db_table parameter

...
modparam("clusterer", "db_table", "clusterer")
...

1.3.3. `server_id`

Specifies the server_id of the current instance. This value should correspond to one of the machine_id fields in the database.

Default value is 0.

Example 1.3. Set server_id parameter

...
modparam("clusterer", "server_id", 2)
...

1.3.4. `persistent_mode`

If persistent mode is enabled, a timer synchronizes the information used by the clusterer module and the information stored in the database.

Default value is 0 (disabled).

Example 1.4. Set persistent_mode parameter

...
modparam("clusterer", "persistent_mode", 1)
...

1.3.5. `cluster_id_col`

The name of the column in the db_table where the cluster_id is stored.

Default value is “cluster_id”.

Example 1.5. Set cluster_id_col parameter

...
modparam("clusterer", "cluster_id_col", "cluster_id")
...

1.3.6. `machine_id_col`

The name of the column in the db_table where the machine_id is stored.

Default value is “machine_id”.

Example 1.6. Set machine_id_col parameter

...
modparam("clusterer", "machine_id_col", "machine_id")
...

1.3.7. `clusterer_id_col`

The name of the column in the db_table where the unique identifier of each clusterer entry (row) is stored.

Default value is “id”.

Example 1.7. Set clusterer_id_col parameter

...
modparam("clusterer", "clusterer_id_col", "clusterer_id")
...

1.3.8. `state_col`

The name of the column in the db_table where the state is stored.

Default value is “state”.

Example 1.8. Set state_col parameter

...
modparam("clusterer", "state_col", "state")
...

1.3.9. `url_col`

The name of the column in the db_table where the url is stored.

Default value is “url”.

Example 1.9. Set url_col parameter

...
modparam("clusterer", "url_col", "url")
...

1.3.10. `description_col`

The name of the column in the db_table where the machine's description is stored.

Default value is “description”.

Example 1.10. Set description_col parameter

...
modparam("clusterer", "description_col", "description")
...

1.3.11. `failed_attempts_col`

The name of the column in the db_table where the maximum allowed number of failed attempts is stored.

Default value is “failed_attempts”.

Example 1.11. Set failed_attempts_col parameter

...
modparam("clusterer", "failed_attempts_col", "failed_attempts")
...

1.3.12. `last_attempt_col`

The name of the column in the db_table where the UNIX time of the last failed attempt is stored.

Default value is “last_attempt”.

Example 1.12. Set last_attempt_col parameter

...
modparam("clusterer", "last_attempt_col", "last_attempt")
...

1.3.13. `duration_col`

The name of the column in the db_table where the duration of the temporarily disabled state of a machine belonging to a certain cluster is stored.

Default value is “duration”.

Example 1.13. Set duration_col parameter

...
modparam("clusterer", "duration_col", "duration")
...

1.3.14. `no_tries_col`

The name of the column in the db_table where the number of failed connection attempts is stored.

Default value is “no_tries”.

Example 1.14. Set no_tries_col parameter

...
modparam("clusterer", "no_tries_col", "no_tries")
...

1.4. Exported Functions

None.

1.5. Exported MI Functions

1.5.1. `clusterer_reload`

Reloads data from the clusterer database. If persistent mode is disabled, the changes made to the locally stored data are lost.

Name: clusterer_reload

Parameters: none

MI FIFO Command Format:

		:clusterer_reload
		_empty_line_

1.5.2. `clusterer_list`

Lists, in a table format, all the data stored in the OpenSIPS cache.

Name: clusterer_list

Parameters: none

MI FIFO Command Format:

		:clusterer_list
		_empty_line_

1.5.3. `clusterer_set_status`

Sets the status (UP, DOWN) of a machine belonging to a certain cluster.

Name: clusterer_set_status

Parameters:

cluster_id - indicates the id of the cluster.
machine_id - indicates the id of the machine.
status - indicates the new status (0 - permanently down, 1 - up).
protocol - indicates the protocol.

MI FIFO Command Format:

		:clusterer_set_status:
		1
		2
		0
		bin
		_empty_line_

1.6. Usage Example

This section provides a usage example for replicating ratelimit pipes between two OpenSIPS instances. It uses the clusterer module to manage the replicating nodes, and the proto_bin module to send the replicated information.

The setup topology is simple: we have two OpenSIPS nodes running on two separate machines (although they could run on the same machine as well): Node A has IP 192.168.0.5 and Node B has IP 192.168.0.6. Both have, besides the traffic listeners (UDP, TCP, etc.), bin listeners bound on port 5566. These listeners will be used by the ratelimit module to replicate the pipes. Therefore, we have to provision them in the clusterer table.

Example 1.15. Example database content - clusterer table

+----+------------+------------+----------------------+-------+--------------+-----------------+----------+----------+-------------+
| id | cluster_id | machine_id | url                  | state | last_attempt | failed_attempts | no_tries | duration | description |
+----+------------+------------+----------------------+-------+--------------+-----------------+----------+----------+-------------+
|  1 |          1 |          1 | bin:192.168.0.5:5566 |     1 |            0 |               0 |        0 |       30 | Node A      |
|  2 |          1 |          2 | bin:192.168.0.6:5566 |     1 |            0 |               0 |        0 |       30 | Node B      |
+----+------------+------------+----------------------+-------+--------------+-----------------+----------+----------+-------------+

“cluster_id” - this column represents the identifier of the cluster. All nodes within a group/cluster should have the same id (in our example, both nodes have ID 1)
“machine_id” - this represents the identifier of the machine/node, and each instance within a cluster should have a different ID. In our example, Node A will have ID 1, and Node B ID 2
“url” - this indicates the URL where the instance will receive the replication information. In our example, each node will receive the data over the bin protocol
“state” - this is the state of the machine: 1 means on, 0 means off, and 2 means it is in the probing state. Note that if you want the node to be active right away, you have to set it to state 1
“last_attempt”, “failed_attempts” and “no_tries” - are fields used for the probing mechanisms, and should be set to 0 by default. They are automatically updated by the clusterer module if the persistent_mode parameter is set to 1
“duration” - is used to specify the period a node stays in the temporarily disabled state. In our example, if the node does not respond, it is disabled for 30 seconds before data is sent to it again
“description” - is an opaque value used to identify the node
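The values in the “url” column of the example follow a proto:host:port layout. The sketch below splits such a value into its three parts. It is an illustration only: the struct and function names are hypothetical, the module's own parsing is not described in this document, and the format string does not handle IPv6 literals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for the pieces of a "url" column value. */
struct node_url {
    char proto[8];
    char host[64];
    int port;
};

/* Returns 0 on success, -1 if the value is not of the form proto:host:port. */
int parse_node_url(const char *url, struct node_url *out)
{
    assert(url != NULL && out != NULL);
    /* %7[^:] and %63[^:] read up to the next ':' without overflowing */
    if (sscanf(url, "%7[^:]:%63[^:]:%d",
               out->proto, out->host, &out->port) != 3)
        return -1;
    if (out->port <= 0 || out->port > 65535)
        return -1;
    return 0;
}
```

For the Node A row, parse_node_url("bin:192.168.0.5:5566", &u) fills u with the protocol bin, the host 192.168.0.5 and the port 5566.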

After provisioning the two nodes in the database, we have to configure the two instances of OpenSIPS. First, we configure Node A:

Example 1.16. Node A configuration

...
listen = bin:192.168.0.5:5566 # bin listener for Node A

loadmodule "proto_bin.so"

loadmodule "clusterer.so"
modparam("clusterer", "db_url", "mysql://opensips@192.168.0.7/opensips")
modparam("clusterer", "server_id", 1) # machine_id for Node A

loadmodule "ratelimit.so"
# replicate pipes to cluster id 1
modparam("ratelimit", "replicate_pipes_to", 1)
# accept replicated data from nodes within cluster 1
modparam("ratelimit", "accept_pipes_from", 1)
# if a node does not reply within a 5-second interval,
# the information from that node is invalidated
modparam("ratelimit", "accept_pipes_timeout", 5)
...

Similarly, the configuration for Node B is as follows:

Example 1.17. Node B configuration

...
listen = bin:192.168.0.6:5566 # bin listener for Node B

loadmodule "proto_bin.so"

loadmodule "clusterer.so"
# ideally, use the same database for both nodes
modparam("clusterer", "db_url", "mysql://opensips@192.168.0.7/opensips")
modparam("clusterer", "server_id", 2) # machine_id for Node B

loadmodule "ratelimit.so"
# replicate pipes to cluster id 1
modparam("ratelimit", "replicate_pipes_to", 1)
# accept replicated data from nodes within cluster 1
modparam("ratelimit", "accept_pipes_from", 1)
# if a node does not reply within a 5-second interval,
# the information from that node is invalidated
modparam("ratelimit", "accept_pipes_timeout", 5)
...

Note that the server_id parameter for Node B is 2. Starting the two OpenSIPS instances with the above configurations gives your platform the ability to use shared ratelimit pipes in a very efficient and scalable way.

Chapter 2. Developer Guide

2.1. Available Functions

2.1.1. `get_nodes(cluster_id, proto)`

The function returns a copy of all the needed information (machine_id, state, description, sock address) for the nodes stored in shm whose state is up (1) and which have a certain cluster_id and protocol.

This function is usually used for replication purposes.

This function returns NULL on error.

Meaning of the parameters is as follows:

int cluster_id - the cluster id
int proto - the protocol

Example 2.1. get_nodes usage

...
get_nodes(cluster_id, proto);
...

2.1.2. `free_nodes(nodes)`

This function will free the allocated data for the copy.

Meaning of the parameters is as follows:

clusterer_node_t *nodes - the data returned by the get_nodes function

Example 2.2. free_nodes usage

...
free_nodes(nodes);
...
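The get_nodes and free_nodes pair follows a copy-then-release pattern: the caller receives a private copy of the matching nodes, walks it, then frees it. The sketch below shows that pattern with stand-ins. The mock_ types and functions, the table contents and the MOCK_PROTO_BIN value are all hypothetical and do not reproduce the real clusterer_node_t layout or the module's implementation.

```c
#include <stddef.h>
#include <stdlib.h>

#define MOCK_PROTO_BIN 1 /* placeholder, not the real PROTO_BIN value */

/* Simplified, hypothetical stand-in for clusterer_node_t. */
typedef struct mock_node {
    int cluster_id;
    int machine_id;
    int state; /* 1 = up, as in the "state" column */
    int proto;
    struct mock_node *next;
} mock_node_t;

/* Stand-in for the node information cached in shared memory. */
static const mock_node_t mock_cache[] = {
    { 1, 1, 1, MOCK_PROTO_BIN, NULL },
    { 1, 2, 0, MOCK_PROTO_BIN, NULL },
    { 1, 3, 1, MOCK_PROTO_BIN, NULL },
    { 2, 4, 1, MOCK_PROTO_BIN, NULL },
};

/* Release a list returned by mock_get_nodes(). */
void mock_free_nodes(mock_node_t *nodes)
{
    while (nodes) {
        mock_node_t *next = nodes->next;
        free(nodes);
        nodes = next;
    }
}

/* Copy the nodes of one cluster and protocol that are up.
 * In this sketch NULL means either "no match" or "out of memory". */
mock_node_t *mock_get_nodes(int cluster_id, int proto)
{
    mock_node_t *head = NULL, **tail = &head;
    size_t i;

    for (i = 0; i < sizeof(mock_cache) / sizeof(mock_cache[0]); i++) {
        mock_node_t *copy;

        if (mock_cache[i].cluster_id != cluster_id
                || mock_cache[i].proto != proto
                || mock_cache[i].state != 1)
            continue;
        copy = malloc(sizeof(*copy));
        if (!copy) {
            mock_free_nodes(head);
            return NULL;
        }
        *copy = mock_cache[i];
        copy->next = NULL;
        *tail = copy;
        tail = &copy->next;
    }
    return head;
}

/* Typical caller: walk the copy, then release it. */
int count_up_nodes(int cluster_id, int proto)
{
    int count = 0;
    mock_node_t *nodes = mock_get_nodes(cluster_id, proto);
    mock_node_t *n;

    for (n = nodes; n; n = n->next)
        count++;
    mock_free_nodes(nodes);
    return count;
}
```

Working on a copy means the caller can iterate without holding a lock on the shared data, at the cost of having to release the copy once it is done.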

2.1.3. `set_state(cluster_id, machine_id, state, proto)`

The function sets the state of a machine that belongs to a certain cluster and uses the specified protocol.

This function is usually used for replication purposes.

Meaning of the parameters is as follows:

int cluster_id - cluster_id
int machine_id - machine_id
int state - the server state
int proto - protocol

Example 2.3. set_state usage

...
set_state(1,1,2,PROTO_BIN);
...

2.1.4. `check(cluster_id, sockaddr, server_id, proto)`

This function is used to check if the source of a received packet is known.

It returns 1 if the source is known, else it returns 0.

Meaning of the parameters is as follows:

int cluster_id - cluster id
union sockaddr_union* sockaddr - incoming connection socket address
int server_id - incoming connection server_id
int proto - protocol

Example 2.4. check usage

...
check(1, sockaddr, 2, PROTO_BIN)
...
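A source check of this kind can be pictured as a lookup of the sender in the list of known nodes. The sketch below is an assumption about how such a lookup might work, not the module's implementation: it uses plain IPv4 sockaddr_in instead of union sockaddr_union, compares only the IP address (not the port), and all mock_ names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of a provisioned node. */
struct mock_known_node {
    int cluster_id;
    int machine_id;
    struct sockaddr_in addr;
};

/* Build an IPv4 socket address; the address stays 0.0.0.0 if `ip` is invalid. */
struct sockaddr_in mock_addr(const char *ip, unsigned short port)
{
    struct sockaddr_in a;

    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &a.sin_addr) != 1)
        a.sin_addr.s_addr = htonl(INADDR_ANY);
    return a;
}

/* Returns 1 if (cluster_id, server_id, source IP) matches a known node, else 0. */
int mock_check(const struct mock_known_node *nodes, size_t count,
               int cluster_id, const struct sockaddr_in *src, int server_id)
{
    size_t i;

    assert(nodes != NULL && src != NULL);
    for (i = 0; i < count; i++) {
        if (nodes[i].cluster_id == cluster_id
                && nodes[i].machine_id == server_id
                && nodes[i].addr.sin_addr.s_addr == src->sin_addr.s_addr)
            return 1;
    }
    return 0;
}
```

The port is left out of the comparison here because the source port of an outgoing connection usually differs from the port the sender listens on. Whether the real function behaves this way is not stated in this document.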

2.1.5. `get_my_id()`

This function returns the server_id of the current instance.

Example 2.5. get_my_id usage

...
get_my_id()
...

2.1.6. `send_to(cluster_id, protocol)`

This function replicates information to the nodes that belong to the given cluster and use a specific protocol.

Meaning of the parameters is as follows:

int cluster_id - cluster_id
int protocol - protocol

Example 2.6. send_to usage

...
send_to(cluster_id, protocol)
...

2.1.7. `register_module(module_name, protocol, callback_function, timeout, auth_check, cluster_id)`

This function registers a module to a certain protocol. It acts like an intermediary: when a valid packet arrives, if the auth_check parameter is set, the packet is checked for authenticity. After that, the timestamps are updated and the callback function of the registered module is called. For every registered module, the clusterer module checks whether the time elapsed since the last received packet is greater than the timeout specified by the module. If it is, the servers are temporarily disabled for a period of timeout * 2. If any packets are received for the temporarily disabled servers, the registered module is notified.

Meaning of the parameters is as follows:

char *module_name - module name
int protocol - protocol
void (*callback_function)(int, struct receive_info *, int) - the registered module callback function
int timeout - timeout
int auth_check - 0 if the authentication check is disabled, 1 if the authentication check is enabled
int cluster_id - cluster_id

Example 2.7. register_module usage

...
register_module("dialog", PROTO_BIN, cb, timeout, auth_check, cluster_id)
...
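The timeout handling described for register_module can be sketched as follows. This is a simplified model under stated assumptions, not the module's code: the callback type is reduced to (event, server_id) instead of the real (int, struct receive_info *, int) signature, all names are hypothetical, and restarting the timeout window when a peer is re-enabled is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

enum mock_event { EV_PACKET, EV_TIMEOUT, EV_TEMP_DISABLED };

/* Simplified, hypothetical callback type. */
typedef void (*mock_cb)(enum mock_event ev, int server_id);

struct mock_registration {
    mock_cb cb;
    int timeout; /* seconds without packets before a peer is disabled */
};

struct mock_peer {
    int server_id;
    time_t last_seen;      /* time of the last accepted packet */
    time_t disabled_until; /* 0 while the peer is considered up */
};

/* Sample callback that remembers the last event it was given. */
enum mock_event last_event;
void record_event(enum mock_event ev, int server_id)
{
    (void)server_id;
    last_event = ev;
}

/* A packet arrived for this peer. */
void on_packet(const struct mock_registration *r, struct mock_peer *p,
               time_t now)
{
    assert(r != NULL && p != NULL);
    if (p->disabled_until != 0 && now < p->disabled_until) {
        r->cb(EV_TEMP_DISABLED, p->server_id); /* the packet is dropped */
        return;
    }
    p->disabled_until = 0;
    p->last_seen = now;
    r->cb(EV_PACKET, p->server_id);
}

/* Periodic check of the registered timeout. */
void on_timer(const struct mock_registration *r, struct mock_peer *p,
              time_t now)
{
    assert(r != NULL && p != NULL);
    if (p->disabled_until != 0) {
        if (now >= p->disabled_until) {
            p->disabled_until = 0;
            p->last_seen = now; /* assumption: start a fresh timeout window */
        }
        return;
    }
    if (now - p->last_seen > r->timeout) {
        p->disabled_until = now + 2 * (time_t)r->timeout;
        r->cb(EV_TIMEOUT, p->server_id);
    }
}
```

With a timeout of 5 seconds, a peer last heard from at t=100 is disabled by the first timer run after t=105 and stays disabled for 10 seconds. Packets arriving during that window only trigger the temporarily disabled notification.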

Chapter 3. Contributors

3.1. By Commit Statistics

Table 3.1. Top contributors by DevScore⁽¹⁾, authored commits⁽²⁾ and lines added/removed⁽³⁾

	Name	DevScore	Commits	Lines ++	Lines --
1.	Eseanu Marius Cristian (@eseanucristian)	45	10	3142	534
2.	Razvan Crainea (@razvancrainea)	13	7	314	124
3.	Liviu Chircu (@liviuchircu)	11	5	154	217
4.	Bogdan-Andrei Iancu (@bogdan-iancu)	6	4	47	47
5.	Vlad Patrascu (@rvlad-patrascu)	3	1	26	25

(1) DevScore = author_commits + author_lines_added / (project_lines_added / project_commits) + author_lines_deleted / (project_lines_deleted / project_commits)

(2) including any documentation-related commits, excluding merge commits. Regarding imported patches/code, we do our best to count the work on behalf of the proper owner, as per the "fix_authors" and "mod_renames" arrays in opensips/doc/build-contrib.sh. If you identify any patches/commits which do not get properly attributed to you, please submit a pull request which extends "fix_authors" and/or "mod_renames".

(3) ignoring whitespace edits, renamed files and auto-generated files

3.2. By Commit Activity

Table 3.2. Most recently active contributors⁽¹⁾ to this module

	Name	Commit Activity
1.	Liviu Chircu (@liviuchircu)	Mar 2016 - Aug 2018
2.	Bogdan-Andrei Iancu (@bogdan-iancu)	Apr 2016 - Jun 2018
3.	Razvan Crainea (@razvancrainea)	Nov 2015 - Jun 2017
4.	Vlad Patrascu (@rvlad-patrascu)	Jun 2017 - Jun 2017
5.	Eseanu Marius Cristian (@eseanucristian)	Jul 2015 - Sep 2015

(1) including any documentation-related commits, excluding merge commits

Chapter 4. Documentation

4.1. Contributors

Last edited by: Bogdan-Andrei Iancu (@bogdan-iancu), Liviu Chircu (@liviuchircu), Razvan Crainea (@razvancrainea), Eseanu Marius Cristian (@eseanucristian).
