Presence Interdomain Scaling Analysis for SIP/SIMPLE

Authors' addresses:

   IBM, Science Park, Rehovot, Israel
   avshalom@il.ibm.com

   AOL LLC, 401 Ellis St., Mountain View, CA 94043, USA
   aoki@aol.net

   Microsoft Corporation, One Microsoft Way, Redmond, WA 98052, USA
   Sriram.Parameswar@microsoft.com

   Microsoft Corporation, One Microsoft Way, Redmond, WA 98052, USA
   timrang@microsoft.com

   Columbia University, Department of Computer Science,
   450 Computer Science Building, New York, NY 10027, US
   vs2140@cs.columbia.edu
   http://www.cs.columbia.edu/~vs2140

   Columbia University, Department of Computer Science,
   450 Computer Science Building, New York, NY 10027, US
   +1 212 939 7004
   hgs+ecrit@cs.columbia.edu
   http://www.cs.columbia.edu/~hgs

Area: Real Time
Working group: SIMPLE WG
Keywords: I-D, Internet-Draft, SIMPLE, problem statement

This document analyzes the traffic that is generated due to presence
subscriptions between domains. It is shown that the amount of traffic can be
extremely large. In addition to the very large traffic, the document also
analyzes the effects of a large presence system on the memory footprint and
the CPU load.
Currently approved and in-progress optimizations to the SIP protocol are
analyzed with respect to their possible impact on the load. Separate
documents contain the requirements for optimizations and suggestions for new
optimizations.

This document analyzes the SIP protocol for presence (also known as SIMPLE,
although SIMPLE is not a protocol different from SIP but the name of the
working group). It analyzes the traffic that is generated due to presence
subscriptions between domains.
It is shown that the number of messages and the amount of data can be
extremely large. In addition to the very large traffic, the document also
analyzes the effects of a large presence system on the memory footprint
and the CPU load. Currently approved and in-progress optimizations to the
SIP protocol are analyzed with respect to their possible impact on the
load. Another document provides requirements for optimizations, while other
documents contain suggestions for new optimizations.

This document is intended to drive work on possible solutions that will
make the deployment of a SIP-based presence server a less challenging
task. Deployment of highly scalable presence systems is challenging by its
nature, and the developers of each protocol design their own techniques for
optimizing their protocol. This document does not try to compare
protocols; such a comparison is beyond the scope of this document.

The document discusses the following areas. In each area we try to show the
complexity and the load that the presence server has to handle in order to
provide its service.

o  Message load - By computing the number of messages required for
   connecting presence systems, the document shows that the number of
   messages is very large and that some optimizations are clearly needed.
   In addition, we show that the required bandwidth is also very large.

o  State management - Due to the nature of the service that it provides,
   the presence server has to manage a relatively large and complex state;
   some computations are provided in the document.

o  Processing complexities - The presence server maintains many small
   objects and has to perform frequent operations on these objects. We show
   that these operations, and especially the optimizations that are intended
   to reduce the amount of data sent between watchers and presence servers,
   are not simple and may create a very heavy processing load on the
   presence server.

o  Groups - Resource List Servers reduce the number of sessions that are
   created between the watchers and the presence server. On the other hand,
   this optimization may lead to an exponentially large number of
   subscriptions due to the unbearable ease of subscribing to large groups.

The term presence domain or presence system appears in the document several
times. By this term we refer to a SIP-based presence server that provides
presence subscription and notification services to its users. The system can
be deployed in a small enterprise or in a very large consumer network.

Some optimizations are approved or are being defined for the SIP presence
protocol, but even with these optimizations a very large number of messages
and a large bandwidth are needed in order to establish federation between
presence systems of large communities. Further thinking is needed in order to
make large deployments of presence systems less resource demanding.

Note that even though this document talks about inter-domain traffic, the
introduction of resource list servers (RLSs) introduces very similar traffic
patterns in intra-domain and in inter-domain scenarios. See the detailed
discussion on resource lists in .

The current optimizations that are approved or have been adopted as working group items
in the SIMPLE working group can be divided into two categories:

o  Dialog-saving optimizations - Here we refer to optimizations such as the
   resource list RFC or the URI list subscriptions draft. These documents
   define ways to reduce the number of dialogs that are required between the
   subscriber and the presence system.

Note that dialog optimization or RLS usage, as the terms are used in this
document, refers to the usage of a URI that represents a list, or of a URI
list, between domains and not within the same domain. An example is a user
Alice in domain example.org who subscribes to a URI such as
external-reps-list at example.com, or who uses a URI list to subscribe to the
presentities on her watch list in example.com.
Note also that when calculating the traffic that is due to an RLS within a
domain, the traffic between the RLS and the presence agents should also be
taken into account. However, since this document mostly deals with
inter-domain traffic, the traffic between the RLS and the presence agents was
not taken into account.

o  Notification optimizations - Here we refer to the optimizations that are
   suggested in the subnot-etags draft. This draft suggests ways to suppress
   the sending of unnecessary NOTIFY messages when, for example, a
   subscription is refreshed. There are other drafts that reduce the size of
   messages, such as partial notifications or filtering, but this document
   mostly cares about the number of messages and the bandwidth, so the
   partial notification optimizations can help somewhat with the bandwidth
   but will not help with the number of messages.

In addition to the above optimizations another optimization could have been
considered, but it is not taken into account in the computations in this
document. This optimization is the ability to receive some of the presence
information not via the SIP protocol but by offline means, such as
downloading some persistent presence information directly from a web site.
The calculations here are based on the assumption that all data is carried
in-band in the protocol, and no optimizations that enable getting the
presence information via out-of-band means are taken into account. These
optimizations may reduce the number of messages and the number of bytes
significantly, but they are out of scope for this document.

In the document several assumptions are used regarding the size of messages,
the rate of presence change and more. It should be noted that these
assumptions are not directly based on rigorous statistics gathered from
actual SIP-based deployments of presence systems, but rather on experience
with other types of presence-based systems.

The following numbers are given as examples from real deployments and they are not intended to be
complete.

In a large consumer network we have seen the following patterns:

o  Approximately 110 users in the watch list on average.

o  There are approximately 12 billion status changes a day (139k/second)
   across the network. When a proprietary binary protocol is used to convey
   the status changes, the average message size is about 188 bytes. When SIP
   NOTIFY is used, the average message size is about 1228 bytes.

o  The average rate of logins/logouts in the system is about 2000 logins per
   second and about 4000 logouts per second. When something happens (a
   promotion, a contest, or a network hiccup) that causes many users to log
   in and log out simultaneously, there are about 20,000 logins per second.

o  The peak rate of instant messages sent is about 50,000 messages per
   second.

In an enterprise deployment we have seen the following patterns:

o  The average watch list size was 200 users.

o  About half of the registered users were online at peak time.

o  The status change rate was 2 changes per hour.

o  The average rate of logins/logouts in the system was about 5 logins per second with
an additional 15 logins/logouts during the start-of-day and end-of-day rush
hours.

Even though the assumptions in this document are not based on rigorous
statistical data, the goal here is not to analyze a specific system but to
show that even with VERY moderate assumptions (which are even lower than
the observations mentioned above), the number of messages, the network
bandwidth, the required state management and the load on the CPU are very
high. Real-life systems will face much bigger scalability challenges.
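
As a rough cross-check of the consumer-network observations quoted above, the
following Python sketch converts the daily status-change count into a
per-second rate and into the bandwidth implied by the two average message
sizes. Only the three input figures come from the observations; the function
names are illustrative, and the sketch assumes that status changes are spread
evenly over a 24-hour day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Observed figures quoted above for a large consumer network.
STATUS_CHANGES_PER_DAY = 12_000_000_000
BINARY_MESSAGE_BYTES = 188   # proprietary binary protocol, average size
SIP_NOTIFY_BYTES = 1228      # SIP NOTIFY, average size


def rate_per_second(events_per_day: float) -> float:
    """Average event rate, assuming an even spread over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


def bandwidth_bytes_per_second(events_per_day: float, message_bytes: int) -> float:
    """Average bandwidth if every event is carried by exactly one message."""
    return rate_per_second(events_per_day) * message_bytes


if __name__ == "__main__":
    rate = rate_per_second(STATUS_CHANGES_PER_DAY)
    print(f"status changes per second: {rate:,.0f}")
    for label, size in (("binary", BINARY_MESSAGE_BYTES),
                        ("SIP NOTIFY", SIP_NOTIFY_BYTES)):
        mbytes = bandwidth_bytes_per_second(STATUS_CHANGES_PER_DAY, size) / 1e6
        print(f"{label}: about {mbytes:,.1f} MB/s")
```

Under these assumptions the computed rate is consistent with the 139k/second
figure, and the SIP NOTIFY encoding needs roughly 6.5 times the bandwidth of
the binary encoding (1228/188), before counting the 200 OK responses.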

The presence state change rate that we assumed (one presence state change
per hour), for example, is perhaps one of the most moderate assumptions that
we have made. Experience from consumer networks shows that the frequency is
much higher, especially among the younger generation, who use more presence
attributes such as mood. In an environment where a user may have several
devices and other sources of presence information, such as geographical
location and calendar, the frequency of presence state changes will be much
higher.

It is very hard to measure presence load since it depends very much on the
behavior of users, and the behavior of users differs a lot.
Some users will have a very small number of presentities in their watch
list while others may have hundreds if not thousands. Some users will
change their state a lot and have many sources of presence information
while others may have a very small number of changes during the day. In
addition, the "rush hour" calculations for when the day starts and ends are
not yet included in this document. Rush hour differs between enterprises
and differs again in consumer presence systems. It is very hard, if not
impossible, to capture all the possible combinations in a static document.

Throughout the calculations a certain number of users is assumed for the
different models. This does not mean that in actual deployments all the
users of the domain actually subscribe to presence documents and/or
publish their presence document. Observation of actual deployments shows
that in the consumer market the number of users that use presence services
may be 10 percent or less of the registered users. In the enterprise market
the numbers tend to be around 50 percent of the registered enterprise users.

The same holds for the number of watched presentities per
watcher. If only some percentage of the domain users are online at a given
time, then this number should be reduced to that percentage. However, adding
this assumption to the calculations would make them more complex than they
already are, since the effect of the watched presentities that are not
online would need to be taken into account: an empty NOTIFY should be sent
for those when the subscription is created, and there are no updates on
them. In order to make the computations less complex (they are complex
enough as they are), the number of watched presentities used in the
calculations is the number of federated presentities from the watcher's
list that are online.

The basic SIP subscription dialog involves the following message transfer:

o  SUBSCRIBE/200
o  Initial NOTIFY/200
o  (j) NOTIFY/200, where "j" is the number of presence changes seen by the
   watcher
o  (k) SUBSCRIBE/200, where "k" is the number of subscription dialog refresh
   periods
o  SUBSCRIBE/200 with Expires = 0 to terminate the dialog
o  NOTIFY/200 ending the dialog

An individual watcher will generate X SIP subscription dialogs,
corresponding to the number of presentities it chooses to watch. The amount
of traffic generated is significantly affected by several factors:

o  Number of watchers connected to the system
o  Number of presentities connected to the system
o  Frequency of changes to presence information

This document contains several calculations that show the expected message
rate and bandwidth between presence domains. The following sections explain
the assumptions and methods behind the calculations.

The following are a number of "constants" that we use in the calculations. Some of the
constants are used throughout the calculations, while others change between
use cases.

(C01) Subscription lifetime (hours) - The assumed lifetime of a subscription
in hours. We assume 8 hours for all calculations.

(C02) Presence state changes / hour - The average number of times that a
presentity changes its status in one hour. We assumed 3 times per hour for
most calculations. Note that for some users in consumer messaging systems,
the actual number of changes is likely to be much higher.

(C03) Subscription refresh interval / hour - The duration of the SUBSCRIBE
session after which it needs to be refreshed. We assumed that the duration is
one hour.

(C04) Total federated presentities per watcher - The number of presentities
that the watcher is watching. This number changes in this document according
to the type of the specific deployment.

(C05) Number of dialogs to maintain per watcher - The number of SUBSCRIBE
dialogs that are maintained per watcher. If dialog optimization is not
assumed, this number is equal to C04; otherwise it is 1.

(C06) Total number of watchers in the federated presence domains - The number
of all watchers in all the federated domains.

(C07) SUBSCRIBE message size in bytes - We assume 450 bytes in all
calculations. The size is based on a typical SUBSCRIBE taken from RFCs.

(C08) 200 OK for SUBSCRIBE message size in bytes - We assume 370 bytes in all
calculations. The size is based on a typical 200 OK taken from RFCs.

(C09) NOTIFY message size not including the presence document - The size of
this message for a single presentity is assumed to be 500 bytes for the
NOTIFY message itself (based on sizes from examples in RFCs).

(C10) 200 OK for NOTIFY message size in bytes - We assume 370 bytes in all
calculations. The size is based on a typical 200 OK taken from RFCs.

(C11) Size of an average presence document. In the previous version of this
document we used only a size of 3000 bytes for a presence document. This
number was calculated based on examples of rich presence documents in RFCs.
Following discussion on the SIMPLE list, where it was claimed that this may
be too big, and because we are talking here about federation between
communities where the rich presence document may be of less use, we have
done all the calculations with two presence document sizes. One is the
minimal size of the PIDF document, which was taken to be 350 bytes based on
examples from RFCs, and the other is 3000 bytes for a rich presence document.
It should be noted that assuming 3000 bytes for a presence document is
relatively modest if we take into account multiple devices and location
information.

(C12) The size of NOTIFY when partial notification is being done. We have
taken this size to be 200 bytes. The size is much smaller than the example
that is given in , but the example given there assumes multiple changes in
the presence document, and here we assume a single change.

When dialog optimization is used, an RLMI document is sent, and that
document contains the presence documents for the users that are in the watch
list. The previous version of this document omitted the overhead of the RLMI
document. This "bug" was found by Victoria Beltran-Martinez and is fixed in
this document by adding the constants C13, C14 and C15 to the calculations.

(C13) Item size per contact in the RLMI document, 160 bytes.

(C14) The size of the multipart boundary in RLMI notifications, 144 bytes.

(C15) The size of the XML root node in the RLMI document (once per
notification), 144 bytes.

The following are the calculations for the messages in the initial phase of
the establishment of the subscriptions. The calculations cover both the
number of messages and the number of bytes.

(I01) Number of initial SUBSCRIBE messages per watcher = C05.

(I02) Number of initial 200 OK messages for SUBSCRIBE messages per watcher =
C05.

(I03) Number of initial NOTIFY messages per watcher = C05.

(I04) Number of initial 200 OK messages for NOTIFY messages per watcher =
C05.

(I05) Total number and bytes of initial SUBSCRIBE messages for all watchers:
Number = I01*C06, Bytes = I01*C06*C07.

(I06) Total number and bytes of initial 200 OK for SUBSCRIBE messages for all
watchers: Number = I01*C06, Bytes = I01*C06*C08.

(I07) Total number and bytes of initial NOTIFY messages for all watchers:
Number = I01*C06. The calculation of the number of bytes depends on whether
dialog optimization is used. When dialog optimization is not applied, the
number of bytes is (I01*C06*C09)+(I01*C06*C11); when dialog optimization is
applied, the number of bytes is
(I01*C06*(C09+C14+C15))+(I01*C06*C04*(C11+C13+C14)).

(I08) Total number and bytes of initial 200 OK for NOTIFY messages for all
watchers: Number = I04*C06, Bytes = I04*C06*C10.

(I09) Total number and bytes of initial messages per day: Number = sum of the
numbers in I05+I06+I07+I08, Bytes = sum of the sizes in I05+I06+I07+I08.

Here we describe the calculations for the steady state messages. Steady state
is the time between the initial subscription and the teardown of the
subscription. It contains the NOTIFY messages due to state changes and the
subscription refreshes.

(S01) NOTIFY messages due to state change per watched presentity per day
(less 2, since the NOTIFY messages for the initial and terminating states are
counted in the initial and terminating calculations) = (C02*C01-2).

(S02) 200 OK messages (for NOTIFY due to state change) per watched presentity
per day (less 2, since the NOTIFY messages for the initial and terminating
states are counted in the initial and terminating calculations) =
(C02*C01-2).

(S03) Total number and size of messages due to state change per day: Number =
(S01+S02)*C06*C04. The calculation of the number of bytes depends on whether
dialog optimization is used. When dialog optimization is not applied, the
number of bytes is (C06*C04)*((S01*(C09+C11))+(S02*C10)); when dialog
optimization is applied, the number of bytes is
(C06*C04)*((S01*(C09+C11+C13+C14+C15+C14))+(S02*C10)). This includes the
multipart boundary of the resource list. Note that for dialog optimization it
is assumed that only a single presentity is changed and partial state
notification is used.

(S04) Number of SUBSCRIBE messages for refreshes per watcher per day =
((C01/C03)-1)*C05. One is subtracted since the termination is calculated
separately. For example, if there are 8 hours in the day and a refresh should
occur every hour, there are 7 refreshes during the day and not 8.

(S05) Number of 200 OK messages for SUBSCRIBE refreshes per watcher per day =
((C01/C03)-1)*C05.

(S06) Number of NOTIFY messages for refreshes per watcher per day =
((C01/C03)-1)*C05. When NOTIFY optimization is used there is no need to send
a NOTIFY for refreshes, so S06 is zero in that case.

(S07) Number of 200 OK messages for NOTIFY messages for refreshes per watcher
per day = ((C01/C03)-1)*C05. When NOTIFY optimization is used there is no
need to send a NOTIFY for refreshes, so S07 is zero in that case.

(S08) Total number and size of messages due to SUBSCRIBE refreshes per day:
Number = (S04+S05+S06+S07)*C06. The number of bytes is calculated by adding
the SUBSCRIBE bytes (S04*C06*C07), the 200 OK for SUBSCRIBE bytes
(S05*C06*C08), the NOTIFY bytes C06*(S06*(C09+C11)) and the 200 OK for NOTIFY
bytes (S07*C06*C10). Note that this formula for the NOTIFY bytes applies when
dialog optimization is not used; when it is used, the formula is
C06*(S06*((C09+C14+C15)+(C04*(C11+C13+C14)))). Note that the full state
should be given in SUBSCRIBE refreshes in resource lists. See section 5.2
in . The fact that the full state needs to be returned in a NOTIFY in
response to a refresh makes the NOTIFY optimization more efficient in
conjunction with the dialog optimization.

(S09) Total number and bytes of steady state messages per day: Number = sum
of the numbers in S03+S08, Bytes = sum of the sizes in S03+S08.

The following are the calculations for the messages in the termination phase of
the subscriptions. The calculations cover both the number of messages and
the number of bytes.

(T01) Number of terminating SUBSCRIBE messages per watcher = C05.

(T02) Number of terminating 200 OK messages for SUBSCRIBE messages per
watcher = C05.

(T03) Number of terminating NOTIFY messages per watcher = C05. When NOTIFY
optimization is used there is no need to send a NOTIFY for terminations, so
T03 is zero in that case.

(T04) Number of terminating 200 OK messages for NOTIFY messages per watcher =
C05. When NOTIFY optimization is used there is no need to send a NOTIFY for
terminations, so T04 is zero in that case.

(T05) Total number and bytes of terminating SUBSCRIBE messages for all
watchers: Number = T01*C06, Bytes = T01*C06*C07.

(T06) Total number and bytes of terminating 200 OK for SUBSCRIBE messages for
all watchers: Number = T01*C06, Bytes = T01*C06*C08.

(T07) Total number and bytes of terminating NOTIFY messages for all watchers:
Number = T03*C06. The number of bytes is (T03*C06*(C09+C11)) when dialog
optimization is not used and
(T03*C06*(C09+C14+C15))+(T03*C06*C04*(C11+C13+C14)) when dialog optimization
is used. Note that a full state should be given in SUBSCRIBE refreshes in
resource lists. See section 5.2 in .

(T08) Total number and bytes of terminating 200 OK for NOTIFY messages for
all watchers: Number = T04*C06, Bytes = T04*C06*C10.

(T09) Total number and bytes of terminating messages per day: Number = sum of
the numbers in T05+T06+T07+T08, Bytes = sum of the sizes in T05+T06+T07+T08.

The following are the calculations of several totals that are based on the
above calculations.

(B01) Total number of messages and bytes during the day: Messages = number of
messages in I09+S09+T09, Bytes = number of bytes in I09+S09+T09.

(B02) Total number of messages and bytes per second: Messages = number of
messages in B01/(C01*3600), Bytes = number of bytes in B01/(C01*3600).

(B03) Total number of messages and bytes per user per day: Messages = number
of messages in B01/C06, Bytes = number of bytes in B01/C06.

With the way that the calculations are built, it is relatively easy to see
the effect of rush hours at the beginning and the end of the day. For the
beginning of the day we should look at the numbers in "(I09) Total number and
bytes of initial messages per day", and for the end of the day we should look
at the numbers in "(T09) Total number and bytes of terminating messages per
day". Taking these numbers together with some assumed percentage of the users
that log in during the same hour should give a good indication of the rush
hour load.

The following table uses some common presence characteristics to demonstrate
the effect these factors have on state and message rate within a presence
domain using base SIP protocols without any proposed optimizations. In this
example, there are two presence domains with a total of 40,000 federating
users with an average of 4 contacts in the peer domain. Note that the main
calculation is done for a presence document size of 350 bytes, which is the
base PIDF document size, but the bottom-line calculation is also given for a
rich presence document size, which is assumed to be 3000 bytes based on the
examples given in the RFCs. This twofold calculation is done for every use
case in this document.

The same analysis provided above is repeated here with the assumption
that the dialog optimization is applied. Note that while the sign-in
(ramp-up) and sign-out message flows are positively affected, the steady
state rates are not.

The initial analysis is repeated here with the assumption that the NOTIFY
optimization is applied. The optimization removes the need for a NOTIFY upon
refreshing a SUBSCRIBE if there was no change since the last NOTIFY. It is
assumed here that there will be no NOTIFY message for SUBSCRIBE refreshes and
terminations. As should be expected, this optimization affects the steady and
termination states and does not affect the initial state.

Here both optimizations are combined. In all the subsequent use cases we will
show only the analysis with no optimizations and with both optimizations
combined.

While scalability issues exist in any large deployment, certain
characteristics make a deployment conducive to the existing
optimizations, while others do not. Following is a list of federation
scenarios that have varying usage characteristics. For each, a message
rate and bandwidth table is provided reflecting the typical changes in
message rates. Those characteristics can alter the overall effectiveness
of existing optimizations.

Note that the number of users used is not the number of users in the
domains but the number of users actually logged in. As was mentioned
before, not all the domain users will use the presence service at the same
time. The numbers used for watchers and watched presentities are for
online users.

In some environments presence federation may be very common, perhaps
even more common than intra-domain presence. An example of this type of
environment is a small ISV or public server. Users in that small ISV
are not likely to subscribe to the presence of other users on their
server, since they do not necessarily have any relationship with each
other aside from receiving service from the same provider. They are
much more likely to be subscribed to the presence of users in one of the
federated domains (whether consumer domains, academic domains, other
ISVs, etc.). Common characteristics of this deployment are:

o  Federated subscriptions are the majority of subscription traffic.
o  Individual users are likely to subscribe to multiple users in any one
   domain.
o  The intersection of users in the deployment watching the same
   presentities is quite small (i.e., the probability that watchers in the
   domain subscribe to the same presentity is low).

To account for the extraordinarily high percentage of federation
traffic, the number of federated presentities is increased to 20. The
number of watchers in the domain could also be adjusted to account for
an expected larger community of users being peered with; it is omitted
here for simplicity.

The first table below provides the calculations without optimizations;
the second table provides the calculations with optimizations.

In this type of environment, the domain is a collection of associated
users such as an enterprise. Here, federation is once again very
common. However, there is also a strong association between some users
in the deployment. These associations make it somewhat more likely that
users in that domain will be watchers of the same presentity. This can
occur because of business relationships (e.g., two co-workers on a project
federating with a partner company).

Common characteristics of this deployment are:

o  Federated subscriptions are a large minority or small majority of
   subscription traffic.
o  Individual users are likely to subscribe to multiple users in any one
   domain, especially their own.
o  The intersection of users in the deployment watching the same
   presentities increases.

This federation type has traffic rates similar to the previous examples
but with different levels of association of the users.

In this environment, two or more very large networks create a peering
relationship allowing their users to subscribe to presence in the other
domains. Whereas the number of users in other deployment types ranges
from hundreds to several hundred thousand, these large networks host up
to hundreds of millions of users. Examples of these networks are large
wireless carriers and consumer IM networks.

Common characteristics of this deployment are:

o  As users become accustomed to network boundaries disappearing,
   federated subscriptions become as common as subscriptions within the
   same domain.
o  Individual users are highly likely to want to see the presence of
   multiple presentities in the peer network.
o  The intersection of users in the deployment watching the same
   presentities is very high (i.e., two or more users in network A are
   extremely likely to be watching the same user in network B).
o  Status changes increase greatly due to typical observed consumer
   behavior.

The first table below provides the calculations without optimizations;
the second table provides the calculations with optimizations. Even
though the optimizations help a lot (cutting the number of messages almost
by half), the numbers are still very high. Note also that the bandwidth
required is very high.

Within a particular domain, multiple presence infrastructures are
deployed with users split between the two. This scenario is unique in
that federated messages do not pass outside the administrative domain's
network. The two infrastructures peer directly inside the domain. A
common example of this is an enterprise IT system with multiple
independent vendor presence solutions deployed (e.g., a presence solution
for desktop messaging deployed alongside a presence solution for IP
telephony).

Common characteristics of this deployment are:

o  The difference between subscriptions to presentities in one system vs.
   the other is completely arbitrary. Any one presentity is as likely to
   be homed on one infrastructure as on the other.
o  Active users are almost guaranteed to subscribe to many users in the
   peer infrastructure.
o  The level of intersection of presentities is extremely high.

The first table below provides the calculations without optimizations;
the second table provides the calculations with optimizations. Even
though relatively conservative numbers are used, the number of messages
is still very high, although optimization may cut the traffic by more
than half.

The partial notification draft defines a way for the
watcher to request only what has changed in the presence document. The
following is a calculation of the bandwidth that is saved in the very large
peering network case when we add the partial notification optimization to the
dialog and NOTIFY optimizations. It is assumed that except for the initial
NOTIFY, all other NOTIFY messages will be partial. It is also assumed that
only a single attribute in the presence document will change each time; thus
the size of the partial presence document is assumed to be 200 bytes.

SIP is a network-agnostic protocol; therefore, the protocol carries
additional messages such as 200 OK that would have been redundant in a
protocol that is based on TCP only.

The following calculation assumes an imaginary TCP-only version of SIP that
optimizes the following:

o  There is no 200 OK for each message. Since only TCP has to be supported,
   there is no need to compensate for network issues.
o  There is no refresh for subscriptions.
o  There is no NOTIFY upon termination of a subscription.
o  The size of each message is smaller, since there is no need for the
   various headers that SIP uses for routing etc. So we need to assume
   smaller message sizes, while keeping the size of the presence document
   the same.

As noted above the calculations in this document do not assume offline
means of getting parts of the presence information. Therefore, in addition
to the above optimizations, the other optimizations that were assumed in
the document are assumed here as well. These include partial
notifications and the dialog optimization. The NOTIFY optimization is not
relevant here since there are no refreshes of subscriptions.

The following is a calculation for the very large networks peering scenario
assuming the imaginary TCP-only SIP. It is very interesting to note that
the dialog optimization does not reduce the number of bytes when the partial
notification optimization is applied (on the contrary), due to the RLMI
overhead.

In previous sections we have discussed the large number of messages that need to
be sent to and from a presence server. In this section, the state that needs
to be maintained by a presence server is analyzed and shown to be far from
trivial.

The presence server has two parallel tasks:

- Maintain the state of the presentities to which watchers subscribe.
- Maintain the state of the subscriptions of watchers and provide timely
  updates to the watchers.

For a single subscription from a single watcher to a presentity, the presence
server has to maintain the following state:

- Subscription state, including all the parameters needed to maintain the
  subscription, such as timers.
- Optional filtering information that was requested by the watcher. This
  includes enough information to perform the filtering. Additional
  information has to be maintained if partial notification is supported for
  the subscription.
- Optional rate management information, such as throttling.
- Watcher information that results from the subscription, in order to enable
  watched presentities to see who is watching them.

For each presentity that has been subscribed to in the presence server, the
presence server has to maintain the following state:

- A list of the subscriptions for the presentity. Note that, for the size
  calculation, this is already covered by the subscription state above.
- Authorization information for the presentity.

For each presentity for which there was any publication and whose state
differs from the default value, the presence server has to maintain the
current value of the presentity.

Let us assume the following sizes:

- Subscription size - 2K bytes. This includes the watcher information that
  needs to be created by the presence server for each subscription. It
  applies to each subscription made by each watcher to each presentity that
  the watcher is watching. So if we have 10K subscriptions, we have 10K of
  these.
- Subscribed-to resource - 1K bytes (for privacy information and other
  management information). This applies to each presentity that is being
  watched, no matter how many watchers are watching it. The subscriptions
  themselves are already counted in the previous bullet.
- Resource with a state - 6K bytes. This is a moderate assumption if we
  take into account the amount of data that is put in a presence
  document, such as multiple devices, calendar and geographical information.
  This applies to each presentity that has a state other than the default
  empty state, whether or not it is being watched.

10K subscriptions = 19M bytes.
5K subscribed-to presentities = 5M bytes.
10K presentities with state = 58M bytes.
Total is 82M bytes.

100K subscriptions = 195M bytes.
50K subscribed-to presentities = 49M bytes.
100K presentities with state = 586M bytes.
Total is 830M bytes.

6M subscriptions = 11,718M bytes.
3M subscribed-to presentities = 2,929M bytes.
4M presentities with state = 23,437M bytes.
Total is 38G bytes.

150M subscriptions = 292,969M bytes.
75M subscribed-to presentities = 73,242M bytes.
100M presentities with state = 585,937M bytes.
Total is 952G bytes, which is a very large number for the very dynamic
storage that the presence server needs.

Although the numbers above may seem moderate enough for the sizes that the presence server is
handling, we should consider the following:

- Dynamic state - Although the state may not seem large for a database, even
  for the very large system, this state is very dynamic. Subscriptions come
  and go all the time, the status of presentities is updated, and so forth.
  This means that the presence server has to manage its state in a medium
  that supports very frequent changes, and for such large sizes this task is
  not trivial.
- Interlinked state - The subscriptions and the subscribed-to presentities
  depend on each other. There needs to be a link from the presentity to the
  subscriptions and vice versa. See also the discussion of the interlinkage
  that is created due to resource lists.
- Moderate assumptions - The size assumptions made above are quite moderate.
  As presence becomes more of a core middleware functionality that holds a
  lot of data about the user, the real-life numbers may be even higher, and
  the presence server can have additional overhead such as managing the SIP
  sessions, networking and more.

Although the calculations above do not show a real issue with state
management of presence in medium or even large systems, since it should be
possible to divide the state between different machines, the state size is
still very large. A bigger issue arises when resource lists are involved and
create an interlinked state between many servers. In that case, dividing a
very large state among multiple servers becomes less trivial.

The basic presence paradigm consists of a watcher and a presentity to which the
watcher subscribes. It sounds simple enough, but there are many additions and
extensions that the presence server has to manage, and they make the
processing in the presence server very complex.

In this section we show that, in addition to the large number of messages
and the large state that the presence server has to handle, it also has to
handle quite intensive processing for aggregation, partial notify and
publish, filtering and privacy. This adds CPU load to the network and memory
loads that were described before.

A presence document may contain multiple resources. These resources can be
devices of the presentity, or information received from external providers
of presence information for the presentity, such as geographical and
calendar information.

The presence server needs to be able to get the updates from all the
resources and aggregate them correctly into a single presence document.
Although this is just an "XML processing" task, the number of updates that
the presence server may get, the need to keep the presence document aligned
with its schema, and the need to notify the users as soon as possible create
a significant processing burden on the presence server.

The partial notify and partial publish drafts define a way for the watcher
to request only what has changed in the presence document, and for the
publisher of presence information to publish only what has changed in the
presence document since the last publish. Although these optimizations help
reduce the amount of data that is sent to and from the presence server, they
create an additional processing burden on the presence server.

When a partial publish arrives at the presence server, the presence server
has to process it and change only what is indicated in the partial publish,
while keeping the presence document well formed according to the schema.

In partial notify the processing is even more complex, since each watcher
needs to get the partial update based on the last update that was received
by that watcher. Therefore, the partial notification specification defines a
versioning mechanism that enables the watcher to get updates based on the
previous state that it has seen. This versioning mechanism has to be
maintained by the presence server for each watcher that is subscribed to a
presentity and requires partial notify.

Filtering, as defined in the filtering RFCs, enables a watcher to request to be notified only when the
presence document fulfills certain conditions. Although this is a very
convenient feature for watchers, the burden that it puts on the presence
server is considerable. For each change in the presence document, the
presence server needs to evaluate the filtering expressions, which can be
very complex, and decide whether to notify the watchers that have requested
filtering and what to send them.

The presence authorization rules draft defines rules that presentities can
use to define who can see what in their presence documents. The processing
that the presence server has to do here is very similar to filtering. When
there is a change to any presence document that has privacy rules defined
for it, the presence server needs to create different notifications for
different watchers according to the authorization rules.

The resource list RFC defines a way to subscribe to a single URI that
actually represents a list of resources, so that all of them are subscribed
to with a single subscription. Although this is quite a useful mechanism and
it significantly reduces the number of sessions between the watcher and the
presence server (as we show in the message calculations), this feature has
the potential to make the scalability issue of presence systems harder and
more complex.

The reasons that resource lists may make the scalability problem of the
presence server even more complex are:

- Subscriptions and state - The resource list may contain references to many
  other presence servers in many other domains. This requires the RLS to
  create subscriptions to other presence servers and buffer the state of all
  presentities in order to be able to provide the full state of the
  presentities in the list when needed. So in the overall system, the
  subscriptions that were saved between the watcher and the presence server
  are moved to the back-end system, while state is duplicated between the
  presence servers that serve the various presentities and the RLSs. This
  issue could be mitigated if there were a way for the RLS to retrieve the
  presence information once for many watchers while still adhering to
  privacy when sending the actual notifications to the watchers.
- Interlinkage - The resource list subscription reaches one RLS, which
  expands it and sends subscriptions to many presence servers and to other
  RLSs (if there is a subgroup inside the list). This creates a complex
  linkage between the state of many components, which makes state management
  and other maintenance of a presence system quite complex.
- Big lists are easy - Two types of groups may be used with this feature:
  private groups that are defined by or for each watcher, and public groups
  that are defined in the system and can be used by any watcher. Although we
  should expect IT administrators to take care when creating public groups,
  this may not be the case in real life. The connection between the size of
  a public group and the load on the presence system may not be apparent to
  everyone. Furthermore, many public groups that are used in presence
  systems may have been created for other purposes, such as email systems
  (where the size of the lists was not so important), and are taken as they
  are into presence systems. So, for example, we may very easily find that a
  public group that covers all the users in the enterprise is used by many
  users in the enterprise, thus creating an unbearable load on the presence
  server. Note that this is not a protocol or design issue but rather a
  usage issue that may have a real impact on the presence system.
- Stopping notifications - A watcher may accidentally subscribe to a very
  big list and be overwhelmed by the number of notifies that it receives
  from the presence server. There is currently no way to stop this stream of
  notifies, and even canceling the subscription may take time to become
  effective.

The issues mentioned above are one example of an optimization that helps in
one part of the system but creates even bigger problems in the overall system.
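
As a rough illustration of the fan-out described above, the following Python
sketch counts the back-end subscriptions that an RLS would open for a set of
resource lists, with and without sharing one back-end subscription among all
occurrences of the same presentity. The function name, the data model and
the example URIs are assumptions made for this sketch and are not taken from
any specification; the numbers are illustrative only. Sharing a back-end
subscription also presupposes some way to preserve per-watcher privacy,
which the sketch does not model.

```python
def backend_subscriptions(resource_lists, consolidate=False):
    """Count back-end subscriptions an RLS opens for the given lists.

    resource_lists maps a watcher URI to the presentity URIs in the
    resource list that the watcher subscribed to (hypothetical model).
    """
    if consolidate:
        # One shared back-end subscription per distinct presentity.
        distinct = set()
        for members in resource_lists.values():
            distinct.update(members)
        return len(distinct)
    # One back-end subscription per (watcher, presentity) pair.
    return sum(len(set(members)) for members in resource_lists.values())


# Illustrative only: 1,000 users who all subscribe to the same public
# group that lists every user in the enterprise.
users = [f"sip:user{i}@example.com" for i in range(1000)]
lists = {watcher: users for watcher in users}

print(backend_subscriptions(lists))                    # 1000000
print(backend_subscriptions(lists, consolidate=True))  # 1000
```

With these illustrative inputs, a single enterprise-wide public group used
by every user multiplies the number of back-end subscriptions by the size of
the group, which is the "big lists are easy" effect listed above.
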
There is a need to think about the problems listed above, but more than that,
there is a need to make sure that when an optimization is introduced it does
not create issues in other places.

This section lists and discusses several optimizations that are either
already part of the SIP protocol or have been suggested in various drafts.
Several other optimizations that have been suggested but have not yet been
discussed in any working group are summarized in separate documents. Note
that trials with the batched notifies optimization
showed an improvement
of 117% in the overall throughput of presence traffic.

- Subnot-etags - This draft suggests ways to suppress the sending of
  unnecessary notifies, for example when a subscription is refreshed. This
  suggestion seems to be an efficient optimization since it saves both on
  the number of messages sent and on the processing time of the presence
  server.
- Resource List Service - Enables creating a single subscription session
  between the watcher and the presence server for subscribing to a list of
  users. This reduces the number of sessions that are created between
  watchers and presence servers. On the other hand, this mechanism enables
  relatively few clients to create a very large number of subscriptions
  between presence servers and RLSs, especially if large public groups are
  used. It seems that in order to really optimize in this area, the usage of
  large public groups should not be considered a BCP, and there should be a
  way for an RLS to create a single subscription for multiple occurrences of
  the same resource in resource lists. See consolidated subscriptions below.
- Partial notify/publish - The partial notify and partial publish drafts
  define a way for the subscriber to request only what has changed in the
  presence document, and for the publisher of presence information to
  publish only what has changed in the presence document since the last
  publish. Although these optimizations help reduce the amount of data that
  is sent to and from the presence server, they create an additional
  processing burden on the presence server, as was discussed above.
- Filtering, as defined in the filtering RFCs, enables a watcher to request to be notified only when the
presence document fulfills certain conditions. Although this optimization
  reduces the number of messages that are sent from the presence server to
  the watcher, it puts more burden on the processing time of the presence
  server, as was discussed above.
- Throttling - Defines a mechanism in which a watcher asks to be updated
  only at certain intervals. Although this mechanism may add some load to
  the processing time of the presence server, that load is negligible, and
  the reduction in the number of messages sent from the presence server to
  the watchers is significant. This optimization is even more important with
  resource lists, which can contain many resources: if the traffic of
  updates on a resource list is not regulated, the watcher may get a very
  large number of notifications.
- Presence-specific SigComp dictionary - Defines a SigComp dictionary for
  presence. This optimization reduces the number of bytes that are
  transferred in presence systems by compressing the textual SIP messages;
  with the specialized presence dictionary, the compression may be more
  significant than with plain SigComp. Note that the number of actual
  messages remains the same. A calculation of the number of bytes that would
  be saved may be useful here.
- Content Indirection - Enables sending only the URI of the presence
  document to the watcher, thus relieving the presence server of sending the
  presence document itself. This optimization may be useful in some cases,
  especially where a large number of users get the same presence document.

Following is a summary of the various calculations. This is repeated here in
order to ease the understanding of the conclusions that are listed below.

The following table summarizes the various constants that are used in all
calculations.

The following table summarizes the results of various optimization factors
for the basic use case.

The following table summarizes the results of various optimization factors
for the widely distributed inter-domain use case.

The following table summarizes the results of various optimization factors
for the intra-domain peering use case.

The following table summarizes the results of various optimization factors
for the very large scale peering networks use case.

The following conclusions can be drawn from the above numbers:

- Due to the overhead of RLMI, the dialog optimization does not help in
  reducing either the number of bytes or the number of messages. It seems to
  be more important for the convenience of the user, since it enables the
  user to manage his or her watch list on, for example, a web page.
- The NOTIFY optimization reduces both the number of messages and the number
  of bytes.
- Partial notification saves a lot in the number of bytes, especially when
  the presence document is a rich presence document, which is relatively
  big.
- Comparing with a very optimized SIP protocol (the imaginary TCP-only SIP)
  shows that the number of messages is reduced by about half. The number of
  bytes is also reduced by about half.
- When looking at the numbers from the perspective of the number of bytes
  that a user "consumes" per day, the numbers may not look so big.
  Nevertheless, we should remember that the overall effect on the network
  may be quite big, since the network will have to carry dozens of gigabytes
  per day of presence traffic alone for the modest use cases that are
  described in this document. Recalling that presence is only an enabler for
  other media, these numbers are not so easy to handle.

The document analyzes the scalability of presence systems, and of SIP-based ones
in particular. It is apparent that the scalability of these systems is far
from trivial from several perspectives: number of messages, network
bandwidth, state management and CPU load.

As part of the analysis we have analyzed several optimizations and shown the
effect of these optimizations on the number of messages and the number of
bytes that are sent between the federating domains.

We have also computed the number of messages and bytes for a very large scale
peering network while assuming a protocol that has much less overhead than
SIP. Even with that protocol we got relatively high numbers.

It is very possible that the issues described in this document are inherent
to presence systems in general and not specific to the SIMPLE protocol.
Organizations need to be prepared to invest a lot in network and hardware in
order to create really big systems. However, it is apparent that not all the
possible optimizations have been made yet, and further work is needed in the
IETF in order to provide better scalability.

Nevertheless, we should remember that SIP was originally designed for
end-to-end session creation, and the number and size of messages are of
secondary importance for end-to-end session negotiation. For large scale and
especially for very large scale presence, the number of messages that are
needed and the size of each message are of extreme importance. It seems that
we need to think about the problem in a different way: we need to think
about scalability as part of the protocol design. The IETF tends not to
think about actual deployments when designing a protocol, but in this case
it seems that if we do not consider scalability during protocol design, it
will be very hard to scale.

We should also consider whether using the same protocol between clients and
servers and between servers is a good choice for this problem. It may be
that in interdomain or even between servers in the same domain (such as
between RLSs and presence servers) there is a need for a different protocol
that is highly optimized for the load and can make some assumptions about
the network (e.g., use only TCP rather than an unreliable protocol such as
UDP).

When a server connects to another server using the current protocol, there
will be an extreme number of redundant messages, due to the overhead of
supporting UDP and to the need to send multiple presence documents for the
same watched user because of privacy. A server-to-server protocol will have
to address these issues. Some initial work to address these issues is
described in separate drafts.

Another issue that concerns protocol design more is whether NOTIFY messages
should be considered media, as audio, video and even text messaging are.
SUBSCRIBE could be extended to do a three-way handshake similar to INVITE
and negotiate where the notify messages should go, the rate and other
parameters. This way the load can be offloaded to specialized NOTIFY
"relays", thus not loading the control path of SIP. One possible idea (from
Marc Willekens) is to use the SIP stack for the client/server NOTIFY but use
a more optimized and controllable protocol for the server-to-server
interface. Another possibility is to use the MSRP protocol for the notifies.

This document discusses scalability issues with the existing SIP/SIMPLE
presence protocol and model. Therefore, there are no security considerations
for this document. However, many of the possible
optimizations that should emerge as a result of this document will have
security implications that will need to be addressed.

This document has no actions for IANA.

- Fixed mistakes in calculations that were found by Victoria
  Beltran-Martinez; both relate to dialog optimizations. One mistake was not
  including the multipart boundary of the resource list itself in S03 when
  dialog optimizations were used. The other was assuming in the T07
  calculation that only a single presentity is returned on termination.
- Fixed nits that were pointed out by Robert Sparks.
- Fixed a mistake in the formulas of I07 and S08 (RLMI was not included).
  The effect on the total number of bytes was very small.
- Fixed a mistake in the text of the calculation of the number of bytes for
  S08 without dialog optimization. There is no actual change in the number
  of bytes since the Excel file calculations were done correctly.
- Removed general references throughout the text to "other protocols".
  This was done in order to avoid the impression that the document tries
  to compare the SIP protocol with any other presence protocol.
- Several other editorial and clarification changes.
- Added some input from real-life deployments and input on a test with
  batched notifies.
- Added calculations of messages and bytes per user.
- Calculations are now done both for a minimal-size presence document and
  for an average-size rich presence document.
- Comparison with another protocol is now done using small, tiny and rich
  presence document sizes.
- Removed dialog optimization with partial notification since it is not
  relevant.
- Fixed a few issues in calculations that were found by Victoria
  Beltran-Martinez.
- Added overhead for RLMI for dialog optimizations (list subscription). This
  calculation fix actually shows that dialog optimization is not a real
  optimization from the point of view of bytes and number of messages.
- When NOTIFY optimizations are applied, there is no need for a final
  NOTIFY.
- The usage of RLS between domains was clarified.
- Significantly enhanced the conclusions section.
- Several typo fixes.
- Fixed a bug in the calculations. Thanks to Marc Willekens for finding the
  bug.
- Clarifications and corrections of the computation model and the
  computations.
- Added several more computations to show the influence of different
  optimizations.
- The requirements were moved to a separate document.
- The new suggestions for optimizations were moved to a separate document.

We would like to thank Jonathan Rosenberg, Ben Campbell, Robert Sparks,
Markus Isomaki, Piotr Boni, David Viamonte, Aki Niemi and Peter Saint-Andre
for ideas and input. Special thanks to Marc Willekens and Victoria
Beltran-Martinez for finding several issues in the calculations.