New Design Requirements
1. Fault Tolerance and Redundancy
- Improve/simplify fault tolerance and redundancy. Improve documentation on this as well.
- Automatically preserve all calls of a failed node on the other peer(s). Call contexts should be either synchronized between nodes or stored in global memory (i.e. memcached).
- Generic "IPC" to allow one OpenSIPS process to "talk" to another process. This may be used to inform another process of new/deleted dialogs or of other routing logic. For instance, if a server uses dialog profiling to manage simultaneous call counts, one instance needs a way to tell the other instances that a new dialog must be added to the profile. This should work over IP (possibly even supporting multicast status broadcasting).
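The dialog-profile case above can be sketched as follows. This is a minimal illustration under assumptions, not an OpenSIPS API: the JSON event format, the `ProfileReplica` class and the multicast group 239.0.0.1:5599 are all hypothetical. Each instance keeps the set of dialog IDs per profile value, so a duplicated or replayed event does not inflate the count, and it ignores its own events when multicast loops them back.

```python
import json
import socket
from collections import defaultdict

MCAST_GROUP = ("239.0.0.1", 5599)  # hypothetical multicast group and port


class ProfileReplica:
    """One instance's view of a dialog profile (e.g. active calls per caller)."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        # profile value -> set of dialog ids; sets make re-applied events harmless
        self.dialogs = defaultdict(set)

    def count(self, value: str) -> int:
        return len(self.dialogs[value])

    def _event(self, op: str, value: str, dialog_id: str) -> bytes:
        return json.dumps({"node": self.node_id, "op": op,
                           "value": value, "dlg": dialog_id}).encode()

    def add_dialog(self, value: str, dialog_id: str) -> bytes:
        """Record a local dialog and return the event to send to the peers."""
        self.dialogs[value].add(dialog_id)
        return self._event("add", value, dialog_id)

    def remove_dialog(self, value: str, dialog_id: str) -> bytes:
        self.dialogs[value].discard(dialog_id)
        return self._event("del", value, dialog_id)

    def apply(self, payload: bytes) -> None:
        """Apply an event received from another instance."""
        msg = json.loads(payload)
        if msg["node"] == self.node_id:
            return  # our own event, looped back by multicast
        if msg["op"] == "add":
            self.dialogs[msg["value"]].add(msg["dlg"])
        elif msg["op"] == "del":
            self.dialogs[msg["value"]].discard(msg["dlg"])


def broadcast(payload: bytes, group=MCAST_GROUP) -> None:
    """Send one event to all instances listening on the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the LAN
    try:
        sock.sendto(payload, group)
    finally:
        sock.close()
```

A receiving instance would bind a UDP socket to the group port, join the group with `IP_ADD_MEMBERSHIP` and pass each datagram to `apply()`; that side is omitted here.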
2. Topology Hiding
- Implement a way to completely hide the source and destination of the traffic on both sides of the proxy, in every possible SIP and media header, so that clients cannot see the end carriers' IPs and vice versa.
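A simplified sketch of the header rewriting involved, assuming the message has already been parsed into a list of (name, value) header pairs plus an SDP string. The function name, the set of hidden headers and the single replacement Via are illustrative assumptions; compact header forms, IPv6 and Contacts without a user part are not handled. The stripped headers are returned so that a real implementation could keep them in transaction state and rebuild responses for the upstream side, and media would have to be relayed through `relay_ip` for the rewritten SDP to work.

```python
import re
from typing import List, Tuple

Header = Tuple[str, str]

# Headers that reveal upstream hops; compact forms such as "v" are not handled.
HIDDEN = {"via", "record-route"}


def hide_topology(headers: List[Header], sdp: str, proxy_host: str,
                  relay_ip: str, branch: str):
    """Return (new_headers, new_sdp, saved) with upstream addressing removed."""
    saved = [(n, v) for n, v in headers if n.lower() in HIDDEN]
    # a single Via for the proxy replaces the whole upstream Via stack
    out: List[Header] = [("Via", f"SIP/2.0/UDP {proxy_host};branch={branch}")]
    for name, value in headers:
        lname = name.lower()
        if lname in HIDDEN:
            continue
        if lname == "contact":
            # keep the user part, point host (and port) at the proxy
            value = re.sub(r"@[^;>]+", "@" + proxy_host, value)
        out.append((name, value))
    # rewrite the IPv4 connection (c=) and origin (o=) addresses in the SDP
    new_sdp = re.sub(r"^(c=IN IP4 )\S+", r"\g<1>" + relay_ip, sdp, flags=re.M)
    new_sdp = re.sub(r"^(o=.* IN IP4 )\S+", r"\g<1>" + relay_ip, new_sdp, flags=re.M)
    return out, new_sdp, saved
```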
3. Scripting
- replace the custom scripting language with existing high-level programming languages (Perl, PHP, Python, Java, etc.)
- allow more than one language to be used
- allow more than one script per OpenSIPS instance (for different purposes); a script may be a kind of application
- script reloading will be possible without restarting the core
- no special scripting skills will be required (since commonly known languages are used)
- by using existing languages, scripting becomes more powerful, as it has access to all their libraries for DB access, variables, arrays, string operations, etc.
- Enable curl-type integration to control the handling of SIP
- Replace the current scripting with XML
- Provide a standard library of scripts for typical usage scenarios and configuration
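Here is a sketch of how several named scripts could be reloaded without restarting the core, assuming each script is Python source that defines a `route(msg)` function receiving a parsed message as a dict. `ScriptRegistry`, `route()` and the message shape are hypothetical. A new version is compiled and executed in a fresh namespace and swapped in only if it loads cleanly, so a broken script leaves the previous version running.

```python
from typing import Callable, Dict


class ScriptRegistry:
    """Holds several named routing scripts; each can be reloaded at runtime."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], str]] = {}

    def load(self, name: str, source: str) -> None:
        namespace: dict = {}
        # any error raised before the final assignment keeps the old version
        code = compile(source, f"<script {name}>", "exec")
        exec(code, namespace)
        handler = namespace.get("route")
        if not callable(handler):
            raise ValueError(f"script {name!r} defines no route() function")
        self._handlers[name] = handler  # atomic swap to the new version

    def dispatch(self, name: str, msg: dict) -> str:
        return self._handlers[name](msg)
```

Several registries entries ("routing", "acct", ...) can coexist in one instance, which matches the idea of a script acting as an application.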
4. Application Layer
- separation between low-level functionality (transport, registration, SIP, NAT, rr+loose_route, TM, dialog, presence, etc.) and high-level functionality (routing, authentication, accounting, checks, DB, LDAP, SNMP, etc.)
- this will allow high-level functionality to be created easily, without getting into low-level details: if I write a routing function, why should I care about NAT traversal? This reduces script complexity, as scripts will focus mainly on service creation rather than on making SIP work.
- the OpenSIPS base will be the current core (simplified, for maximum efficiency) plus the low-level modules (TM, dialog, NAT, etc.)
- applications can be modules or external entities (programs, processes)
- as applications can be external, they can run on different machines, so the system will scale better: currently the bottleneck is at the application level, not at the core level
- An interface to LVS (the Linux Virtual Server project), which could handle load balancing and redundancy, might be of interest. The setup could use LVS for the SIP proxies, for the external applications, or for both.
- A single, powerful interface to the external applications. Applications should be able to register an (IP, port) pair and a set of rules telling the proxy which SIP messages, with which parameters, they are interested in. Needless to say, the exchange of information from the proxy to the application should be asynchronous. Applications should be able to generate messages that open new dialogs, like UAC requests, and to use the functionality within OpenSIPS, such as authentication/authorization schemes, accounting, DB interaction and so on.
- Application composition is a powerful concept present in the SIP Servlet specification. It allows several applications to be triggered serially upon the reception of a message, each application adding a particular piece of functionality (prepaid, PBX service, web control, custom accounting based on a non-standard system, etc.). Applications are completely independent and not coupled to each other at all. The order of composition is determined by an entity called the Application Router, which chooses the right applications based on criteria extracted from the message and the environment. Application composition is explained in chapter 15 of the SIP Servlet Specification 1.1 (http://jcp.org/aboutJava/communityprocess/final/jsr289/index.html)
- Ability to load AVPs based on username and uuid, not only on caller and callee, in avp_load_radius.
- A hung application should not interrupt the operation of the core. It should be possible to register a new application as an extension to the running core. Memory leaks, crashes or locks in an application should not affect the core or other applications.
- Check the Apache 2 architecture and its implementation of Multi-Processing Modules (MPMs). Consider a hybrid multi-process, multi-threaded server model (http://httpd.apache.org/docs/2.2/mod/worker.html) to provide high performance and stability.
- If possible, an Interactive Voice Response (IVR) system will be integrated into future OpenSIPS versions.
- It would be extremely useful to be able to do the following: for each dialog inserted into the "dialog" table, send an OPTIONS message to each endpoint and EXPECT a response. If no response is received within X milliseconds, drop the dialog by sending a BYE to each endpoint. This could be enabled and configured from the script.
- Currently many functions support only static values. They should also accept script variables, AVP variables and flags.
- Make OpenSIPS work as well as possible as an IMS (IP Multimedia Subsystem) Application Server. Implement the Diameter Sh interface (which can at least be used to determine whether a user is registered).
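The registration rules and the Application Router described above can be sketched together: each application registers its (IP, port), the SIP methods it cares about, optional per-header regular expressions and a priority, and the router returns the ordered chain of applications to invoke serially for a message. This is a simplified in-process analogue written for illustration, not the JSR 289 Application Router API; the class names, fields and priority ordering are assumptions, and the asynchronous delivery to each (IP, port) is left out.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AppRegistration:
    name: str
    addr: Tuple[str, int]        # (ip, port) the application listens on
    methods: Set[str]            # SIP methods it wants to see
    header_rules: Dict[str, str] = field(default_factory=dict)  # header -> regex
    priority: int = 100          # lower value = earlier in the chain

    def matches(self, msg: dict) -> bool:
        if msg["method"] not in self.methods:
            return False
        # every header rule must match (a missing header matches as "")
        return all(re.search(pattern, msg["headers"].get(header, ""))
                   for header, pattern in self.header_rules.items())


class ApplicationRouter:
    def __init__(self) -> None:
        self.apps: List[AppRegistration] = []

    def register(self, app: AppRegistration) -> None:
        self.apps.append(app)

    def chain_for(self, msg: dict) -> List[AppRegistration]:
        """Applications to invoke serially for msg; ties keep registration order."""
        matching = [a for a in self.apps if a.matches(msg)]
        return sorted(matching, key=lambda a: a.priority)
```

Because each application only declares what it is interested in, prepaid, PBX and accounting logic stay uncoupled; only the router knows the order.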
5. Message processing
- drop the lumps and apply changes to the message in real time; this will simplify the way you operate on a SIP message (you will be able to see your changes right after applying them).
- Session admission control limiting the number of calls (incoming, outgoing, overall, by route)
- using threads instead of processes will improve communication and data exchange between threads; it will also be a big boost for TCP/TLS, as connection sharing will become trivial.
- threads are highly efficient and inexpensive (e.g. to create), which opens the possibility of dynamically scaling capacity by creating threads on demand.
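Session admission control by scope can be sketched as a set of counters checked and updated under one lock, the kind of shared structure a threaded design makes cheap to access. The scope names ("in", "out", "total", "route:<name>") and the `AdmissionControl` class are hypothetical. A call is admitted only if every scope it touches is below its limit, and `release()` must be called when the call ends.

```python
import threading
from collections import Counter
from typing import Dict, List, Optional


class AdmissionControl:
    """Counts active calls per scope and refuses new ones over the limit."""

    def __init__(self, limits: Dict[str, int]):
        # illustrative scopes: "in", "out", "total", "route:<name>"
        self.limits = limits
        self.active = Counter()
        self.lock = threading.Lock()  # all worker threads share one table

    def _scopes(self, direction: str, route: Optional[str]) -> List[str]:
        scopes = [direction, "total"]
        if route is not None:
            scopes.append("route:" + route)
        return scopes

    def admit(self, direction: str, route: Optional[str] = None) -> bool:
        scopes = self._scopes(direction, route)
        with self.lock:
            # scopes without a configured limit are unlimited
            if any(self.active[s] >= self.limits.get(s, float("inf")) for s in scopes):
                return False
            for s in scopes:
                self.active[s] += 1
            return True

    def release(self, direction: str, route: Optional[str] = None) -> None:
        with self.lock:
            for s in self._scopes(direction, route):
                self.active[s] -= 1
```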
6. Asynchronous processing
- all processing in the core must be based on an asynchronous reactor, so there will be no dead/idle time because of I/O operations. Processing capacity will increase dramatically.
- no blocking operations (DNS, DB, TCP, etc.) in the core: everything must be built around contexts, to allow switching from one context to another.
More details on this: the whole idea is to concentrate all waits in a single point (as you do with select() while waiting on multiple TCP connections). Instead of serial waits, which lead to idle time because of the serialization, this is a parallel approach: you wait at the same time for the completion of any I/O (reading a message from the network, a finished DNS lookup, a DB response).
To give an example, let's assume we have one process:
1) current design:
T   - a message is read from the network
T+1 - a DB query is invoked
---- idle ----
T+3 - reply from the DB
T+4 - done with the message
T+5 - read the next message
2) new design:
T   - a message is read from the network (the reactor gets an indication from the network socket)
T+1 - a DB query is invoked (the context is suspended and a new socket is added to the reactor to wait for the DB reply)
T+2 - a second message is read from the network (the reactor again gets an indication from the network socket)
T+3 - the second message is done
T+4 - reply from the DB (the reactor gets an indication from the DB socket); the context of the first message is restored
T+5 - done with the first message
So, in the same amount of time, all the time is spent computing instead of idling on I/O, and more traffic can be processed.
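The two-message timeline of the new design can be reproduced with Python's asyncio, whose event loop plays the role of the reactor (on Linux it is built on a selector, typically epoll). This is an analogy, not the proposed core: `db_query()` is a hypothetical stand-in for a non-blocking DB client and uses a sleep in place of waiting on the DB socket.

```python
import asyncio


async def db_query(delay: float) -> str:
    # stand-in for a non-blocking DB client: the wait is handed to the loop
    await asyncio.sleep(delay)
    return "db-row"


async def handle(msg_id: int, needs_db: bool, log: list) -> None:
    log.append(f"read {msg_id}")
    if needs_db:
        log.append(f"suspend {msg_id}")  # context suspended, loop moves on
        await db_query(0.05)
        log.append(f"resume {msg_id}")   # context restored on the DB reply
    log.append(f"done {msg_id}")


async def reactor() -> list:
    log: list = []
    first = asyncio.create_task(handle(1, True, log))
    await asyncio.sleep(0)               # let message 1 start and suspend
    second = asyncio.create_task(handle(2, False, log))
    await asyncio.gather(first, second)
    return log


if __name__ == "__main__":
    print(asyncio.run(reactor()))
```

The returned log follows the T..T+5 sequence above: read 1, suspend 1, read 2, done 2, resume 1, done 1. The second message is fully processed while the first waits for the DB.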
7. Load Balancing and Memory Design
- Rework the shared memory and pkg memory design so that OpenSIPS can be used in a load-balanced environment.
- Most SIP load balancers are transaction stateful.
- Some load balancers are actually transaction stateful proxies.
- Introduce a new memory concept called "Global Memory" which can span physical nodes. Any modules which need to cache data or build internal lists would use Global Memory. Global Memory would be backed by something like memcached.
- Memcached has built-in object expiry, which most of the stored SIP data could use implicitly. In the Registrar, Dialog or Presence modules, using the dialog expiry as the memcached expiry time would remove the need for ANY cleanup routines that scan the data.
- Memcached can be scaled to N nodes automatically.
- Memcached uses a key/value API, so all modules would need to be rewritten to support it.
- Resulting memory hierarchy:
  - pkg_mem: used for needs within a single process, such as DB results
  - shared_memory: used for single-node processing, such as transaction memory
  - global_memory: used across all nodes and for caching
- If no load balancing is used, then shared_memory could be used instead of global_memory in the backend
- If possible, provide a solution that does not require restarting OpenSIPS when two nodes share an IP and node A fails over to node B.
- Consider, instead of "global_memory", more interaction with a shared DB for load-balanced environments. The work required to rewrite many of the modules and rearrange the core seems unnecessary when, in current load-balanced scenarios, we already have models that share the same backend DB. I suggest we look closer at keeping more information in the DB in real time, so that when load-balanced/failover scenarios are required, ALL the required data is readily available there.
- Combining the above comments on global_memory versus a shared DB: memcache is typically considered a mechanism to improve the performance of a lookup mechanism (a DB) that is otherwise inefficient. Please don't back anything with memcache; instead, check memcache first, then the shared DB, then push to memcache. The idea is that memcache is an unreliable memory cache: if it were cleared, it would not be catastrophic. This is important when considering external memcache engines. Consider what the net effect would be if the memcache store were wiped out by mistake. This is a *likely* possibility at runtime, and even more so with a distributed memcache backend (as multiple memcache servers add exposure). Please see: http://code.google.com/p/memcached/wiki/FAQ#How_is_memcached_redundant?
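The read path suggested in the last point (memcache first, then the shared DB, then push the result back to memcache), combined with expiry-driven cleanup, can be sketched as below. `TTLCache` is an in-process stand-in for memcached, not a memcached client, and `lookup()` and the dict-backed DB are hypothetical. The DB remains the source of truth, so flushing the cache only costs extra DB reads.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Stand-in for memcached: entries silently vanish after their TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.items: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> Optional[object]:
        entry = self.items.get(key)
        if entry is None or entry[0] <= self.clock():
            self.items.pop(key, None)  # expired: no cleanup routine needed
            return None
        return entry[1]

    def set(self, key: str, value: object, ttl: float) -> None:
        self.items[key] = (self.clock() + ttl, value)

    def flush(self) -> None:
        self.items.clear()  # simulates the cache being wiped by mistake


def lookup(key: str, cache: TTLCache, db: Dict[str, object], ttl: float):
    """Cache-aside read; returns (value, where it came from)."""
    value = cache.get(key)
    if value is not None:
        return value, "cache"
    value = db.get(key)  # the shared DB is the source of truth
    if value is not None:
        cache.set(key, value, ttl)  # e.g. ttl = remaining dialog/registration lifetime
    return value, "db"
```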
8. Documentation
- Provide better, up-to-date module documentation with example scripts
- Something like an updated sipwise.com wizard would be terribly helpful
9. Dynamic Multi-domain
- The usrloc module allocates memory each time a new domain is added, but there is no way to reclaim this memory, for example when that domain is removed from the subscriber table (at least, that's how it looks to me). So memory will eventually be exhausted. To allow truly dynamic control of domains, usrloc should at least have a way to remove domains, or should not allocate memory for each domain at all.
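A sketch of the requested behaviour, assuming per-domain tables that are allocated lazily on the first registration and can be released again, either explicitly or when their last contact goes away. Python dicts stand in for usrloc's shared-memory hash tables; `UsrLoc`, `save()`, `delete_contact()` and `remove_domain()` here are hypothetical names, not the module's real API.

```python
from typing import Dict, List


class UsrLoc:
    """Per-domain location tables that are created lazily and can be dropped."""

    def __init__(self) -> None:
        # domain -> AoR -> list of contacts
        self.domains: Dict[str, Dict[str, List[str]]] = {}

    def save(self, domain: str, aor: str, contact: str) -> None:
        # allocate the domain table only when its first contact arrives
        self.domains.setdefault(domain, {}).setdefault(aor, []).append(contact)

    def delete_contact(self, domain: str, aor: str, contact: str) -> None:
        table = self.domains.get(domain)
        if table is None or aor not in table:
            return
        table[aor] = [c for c in table[aor] if c != contact]
        if not table[aor]:
            del table[aor]
        if not table:
            del self.domains[domain]  # reclaim the now-empty domain table

    def remove_domain(self, domain: str) -> int:
        """Drop a domain (e.g. after it is deleted from the subscriber table).

        Returns the number of contacts that were discarded with it.
        """
        table = self.domains.pop(domain, {})
        return sum(len(contacts) for contacts in table.values())
```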