openSIPS | About / PerformanceTests-3-4

Table of Contents

1. Purpose
2. Overview
   2.1 Setup Description
   2.2 Hardware
3. Raw Results
4. Conclusions

A collection of performance tests and measurements performed on OpenSIPS 3.4, covering various subsystems: database, transactions, dialogs, etc. These tests should give you a broad idea of what you could achieve on your own OpenSIPS setup using similar hardware!

1. Purpose

The objective of the stress tests was to re-assess the performance of various OpenSIPS subsystems, ahead of the upcoming 3.4 beta release. Apart from putting updated maximum capacity numbers on these modules, the tests also pinpointed various performance bottlenecks in each scenario, thanks to code profiling.

2. Overview

the stress-tests were broken down into three categories: calling tests, B2B tests and TCP engine tests
within each category, we gradually increased the amount of features (code) exercised by each test
the upper limit of each test was determined by various metrics: either max CPU usage on the OpenSIPS box, various error logs at the capacity limit or an accumulating UDP/TCP Recv-Queue
once the CPS limit was discovered, we performed profiling, analyzed the CPU usage map and tried to spot bottlenecks
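
The ramp-up procedure described above can be sketched roughly as follows. This is only an illustration, not the tooling used for these tests: `find_cps_limit`, `fake_probe`, the step size and the 10000 CPS saturation point are all hypothetical. In a real run, the probe would check the metrics listed above (CPU usage, error logs, Recv-Queue growth).

```python
from typing import Callable


def find_cps_limit(is_overloaded: Callable[[int], bool],
                   start_cps: int, step: int, max_cps: int) -> int:
    """Return the highest tested CPS that did not trip any limit metric."""
    last_good = 0
    cps = start_cps
    while cps <= max_cps:
        if is_overloaded(cps):
            # First rate at which a limit metric was hit: stop ramping.
            break
        last_good = cps
        cps += step
    return last_good


def fake_probe(cps: int) -> bool:
    # Hypothetical stand-in for a real measurement: pretends the box
    # saturates (CPU maxed, errors logged or Recv-Queue growing)
    # above 10000 CPS.
    return cps > 10000


limit = find_cps_limit(fake_probe, start_cps=1000, step=500, max_cps=20000)
print(limit)
```

Once the limit is known, the traffic is held at that rate while profiling is performed.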

2.1 Setup Description

all tests used the F_MALLOC memory allocator (the default in all public builds). A performance comparison between F_MALLOC, Q_MALLOC and HP_MALLOC can be found in a separate set of tests below
the CPU-bound tests (1-6) used a maximum of 8 UDP workers (typically 4), in order to minimize context-switching (since the OpenSIPS system was a quad-core -- 1:1 worker/CPU mapping)
starting with test #7, the SIP workers were bumped to 8, to cope with the added I/O operations (8 workers were enough to satisfy the required ~6k CPS)
in tests 1-6, the proxy was pushed to the maximum possible CPU load, while on tests 7-14 the traffic was kept constant at 6000 CPS and we instead monitored the CPU load penalty as we progressed through the tests
average call duration: 30 seconds
UDP was used as transport protocol for the majority of tests, unless stated otherwise
latest git revision the tests were run on: b0068befd (May 9th, master branch)
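
To put the CPS figures and the 30-second average call duration into perspective, the steady-state number of calls in progress can be estimated with Little's law (calls in progress = arrival rate x average duration). The sketch below assumes every call is answered and lasts exactly the average duration; the function name is illustrative.

```python
def concurrent_calls(cps: int, avg_call_duration_s: int) -> int:
    """Little's law: steady-state calls in progress = rate * duration."""
    return cps * avg_call_duration_s


# CPS values of a few tests, taken from the results tables on this page.
for test_id, cps in [("1", 13000), ("3", 10000), ("7", 6000)]:
    print(test_id, concurrent_calls(cps, 30))
```

For example, 10000 CPS with 30-second calls corresponds to roughly 300000 calls in progress at any given moment.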

For all SIP traffic generation purposes, sipp was the main tool which got the job done. Being a single-threaded application, both the sipp UAC and UAS were found to reach their capacity limit at around 2500 - 3000 CPS. So we simply scaled them horizontally, by launching more clients and servers!
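
As a rough sizing aid for this horizontal scaling, the number of sipp instances needed per side can be estimated from the target rate. The sketch assumes the conservative end of the observed range (2500 CPS per instance); the function name is illustrative.

```python
import math


def sipp_instances_needed(target_cps: int, per_instance_cps: int = 2500) -> int:
    """Number of single-threaded sipp processes needed to reach target_cps."""
    return math.ceil(target_cps / per_instance_cps)


print(sipp_instances_needed(13000))  # UAC count for a 13000 CPS test
print(sipp_instances_needed(6000))   # UAC count for a 6000 CPS test
```

The same estimate applies to the UAS side, since both were found to saturate at a similar rate.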

2.2 Hardware

Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (4 cores, 8 threads, launch date: Q1'17)
16 GB DDR4 (Kingston)
SSD 850 EVO 250GB (Samsung)

3. Raw Results

The following table shows the raw CPS data used in each scenario. Notes:

the Avg. CPU column represents the average CPU usage of the SIP worker processes, as shown by top
the Load-1m column represents the 1-minute average load of the SIP worker processes only, extracted from the load: statistic
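
To illustrate how a load percentage differs from CPU usage, here is a minimal sketch. It assumes load is the fraction of wall-clock time a worker spends doing anything other than waiting for a new SIP job. The function and the interval representation are hypothetical and are not how OpenSIPS computes the statistic internally.

```python
def load_percent(busy_intervals, window_start, window_end):
    """Percentage of the window during which the worker was busy.

    busy_intervals: non-overlapping (start, end) pairs, in seconds.
    """
    busy = 0.0
    for start, end in busy_intervals:
        # Only count the part of the interval that falls inside the window.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            busy += overlap
    return 100.0 * busy / (window_end - window_start)


# Worker busy during 0-30s and 45-60s of a 60-second window.
print(load_percent([(0, 30), (45, 60)], 0, 60))

# Worker blocked (e.g. in a sleep or a slow DB query) for the whole window:
# it burns almost no CPU, yet counts as fully busy.
print(load_percent([(0, 60)], 0, 60))
```

This is why the Avg. CPU and Load-1m columns can diverge noticeably in the I/O-heavy tests.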

3.1 Basic Calling Scenarios (transactions, dialogs)

Unauthenticated Calls

Test ID	Description	CPS	Avg. CPU	Load-1m	Avg. IN/OUT Traffic	Profiling
1	TM	13000	77%	80%	43 MB/s	PDF
2	1 + RR	12500	83%	84%	42 MB/s	PDF
3	2 + DIALOG	10000	95%	94%	36 MB/s	PDF
4	DEF. Script	10500	82%	64%	36 MB/s	PDF
5.1	4 + DIALOG	10000	86%	73%	36 MB/s	PDF
5.2	5.1 + TH(Call-ID)	6250	91%	88%	20 MB/s	PDF

Authenticated Calls

Test ID	Description	CPS	Avg. CPU	Load-1m	Avg. IN/OUT Traffic	Profiling	Notes
6	5.1 + AUTH 1k	6000	54%	65%	26 MB/s	PDF	MySQL 60%+ CPU usage
7	5.1 + AUTH 10k	6000	59%	65%	26 MB/s	PDF	MySQL 65%+ CPU usage
8	7 + Auth-Caching	6000	65%	57%	26 MB/s	PDF	MySQL 0% CPU usage
9	7 + CDR	6000	55%	73%	26 MB/s	PDF	MySQL 110%+ CPU usage
10	9 + Auth-Caching	6000	58%	71%	26 MB/s	PDF	MySQL 70%+ CPU usage
11	7 + CDR-flat	6000	58%	67%	26 MB/s	PDF	MySQL 70%+ CPU usage
12	11 + Auth-Caching	6000	65%	55%	26 MB/s	PDF	MySQL 0% CPU usage

3.2 Complex Calling Scenarios (B2B)

Test ID	Description	CPS	Avg. CPU	Load-1m	Avg. IN/OUT Traffic	Profiling
13.1	B2B - TH	1200	64%	60%	8 MB/s	PDF
13.2	B2B - REFER	1000	66%	61%	6 MB/s	PDF
13.3	B2B - Marketing	900	68%	63%	5 MB/s	PDF

3.3 TCP Test Scenarios

Test ID	Description	CPS	Avg. CPU	Load-1m	Avg. IN/OUT Traffic	Profiling	Notes
14.1	TM-Con-1-Read-0	12500	66%	58%	42 MB/s	PDF	Test start: conn balancing
14.2	TM-Con-1-Read-1	-	-	-	-	PDF	Note: conn READ bug at high volumes, WIP
14.3	TM-Con-1-Read-2	-	-	-	-	PDF	Note: conn READ bug at high volumes, WIP
14.4	TM-Con-N-Read-0	4000	52%	20%	12 MB/s	PDF
14.5	TM-Con-N-Read-1	-	-	-	-	PDF	Note: conn READ bug at high volumes, WIP
14.6	TM-Con-N-Read-2	-	-	-	-	PDF	Note: conn READ bug at high volumes, WIP

4. Conclusions

the newly introduced load: statistic is critical for monitoring the behavior and performance of your OpenSIPS instance. It can help you spot which workers are busy and which are not, as well as when you need extra capacity on your instance, due to being either CPU-bound or I/O-bound.
- recap: this statistic monitors the "idleness" of your OpenSIPS workers. If they are doing anything other than waiting for a new SIP job, then they are "busy". Otherwise, they are "idle". For example, if an OpenSIPS worker is running a sleep(1000) in your opensips.cfg, its load: value will be 100% (fully busy).
- a low CPU usage on your OpenSIPS instance does not necessarily mean it is not loaded. The workers could be stuck in I/O operations, in which case the instance needs more SIP workers.
when adding DB query caching to your OpenSIPS instance, do not be surprised if its CPU usage goes up. The database will be at 0% CPU usage afterwards, resulting in an overall net gain of CPU resources, as well as dramatically reduced I/O wait time (again, watch the load: statistic).
the B2B modules currently have a lower CPS performance, due to the internal complexity of the code. We are still evaluating whether there is room for optimization in the current shape of the codebase.
the new OpenSIPS TCP connection balancing is based on the load: statistic, so when doing TCP engine stress-testing in single-connection mode (on the clients' side), make sure to start the UACs gradually, one by one. This gives the load: statistic a bit of time to update, such that the new high-throughput connections do not all end up in the same TCP worker!
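
The effect of starting the UACs too quickly can be shown with a toy model. It assumes each new connection is handed to the worker with the lowest reported load, which is a simplification and not the actual OpenSIPS balancing code. All names and the 20% per-connection load are hypothetical.

```python
def assign_connections(num_workers, num_conns, conn_load,
                       refresh_between_arrivals):
    """Assign each new connection to the worker with the lowest reported load.

    Returns the list of chosen worker indexes, one per connection.
    """
    actual = [0.0] * num_workers    # real load carried by each worker
    reported = [0.0] * num_workers  # what the load statistic currently shows
    placement = []
    for _ in range(num_conns):
        worker = min(range(num_workers), key=lambda w: reported[w])
        actual[worker] += conn_load
        placement.append(worker)
        if refresh_between_arrivals:
            # UACs started gradually: the statistic catches up in between.
            reported = list(actual)
    return placement


# All UACs started at once: the statistic is stale, so every connection
# lands on the same worker.
print(assign_connections(4, 4, 20.0, refresh_between_arrivals=False))

# UACs started one by one: connections spread across the workers.
print(assign_connections(4, 4, 20.0, refresh_between_arrivals=True))
```

In this model, the only difference between the two runs is whether the reported load is refreshed before the next connection arrives.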

Page last modified on May 21, 2023, at 07:05 PM