The Session Initiation Protocol (SIP) is used widely for the setup, teardown and management of VOIP calls. Much of its functionality is related to the setup of calls, as its name implies. Part of this setup involves the delivery of the caller's identity so that the called party can decide how to treat the call -- what is, essentially, Internet caller ID.
The basic mechanism for caller ID in the core SIP specification (RFC 3261) works much as it does in e-mail. The caller's information is carried in a From header field, which includes the caller's address. That mechanism worked well enough in an Internet that was largely free of malicious users, but it quickly became clear that the technique could be abused, as it has been in e-mail. Because the caller fills in the From header itself, it is possible to spoof the From header of a VOIP call and hide the sender's true identity.
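To make the weakness concrete, here is a minimal Python sketch of how a caller assembles the start of an INVITE. The function name build_invite and the addresses are hypothetical, and the request is trimmed (a real INVITE also carries Via, Max-Forwards and Contact headers, among others). The point is only that the From value is plain text chosen by the sender, and nothing in the message itself proves it is true.

```python
def build_invite(from_uri, to_uri, call_id):
    """Build a trimmed, illustrative SIP INVITE as text.

    The From header is whatever string the caller passes in;
    nothing here checks that the caller owns that address.
    """
    lines = [
        f"INVITE {to_uri} SIP/2.0",
        f"From: <{from_uri}>;tag=1928301774",
        f"To: <{to_uri}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


# An honest caller and a spoofer use exactly the same code path.
# All addresses below are made up for illustration.
honest = build_invite("sip:email@example.com", "sip:callee@example.net", "a84b4c76e66710")
spoofed = build_invite("sip:ceo@bank.example", "sip:callee@example.net", "a84b4c76e66711")
```

The called party sees two well-formed requests and has no basis in the message alone for telling which From value is genuine.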
These problems were remedied by a specification known as P-Asserted-ID (RFC 3325, which defines the P-Asserted-Identity header), published in November 2002 by the IETF. With P-Asserted-ID, a single network or a small federation of networks can provide network-verified caller ID services.
P-Asserted-ID was a big step forward, and it has seen widespread use in SIP networks. However, even at the time of publication it was known to be a stopgap solution. The primary problem is that it works only for single-provider networks or for small federations of tightly coupled providers enjoying strong mutual trust. To date, this is exactly the kind of VOIP network that has been deployed. Most VOIP networks don't connect with each other over IP and instead rely on the public switched telephone network.
However, it is becoming apparent to many providers that IP is a better form of network interconnection. IP can cost less; enable voice, video and multimedia; provide high-value services such as presence and instant messaging; and enable high-quality wideband speech.
P-Asserted-ID falls apart in larger IP-interconnected environments because its assertions of identity are not cryptographic. There is no way to securely verify that the domain of the caller is the one that asserted the identity present in the message. Thus, in a large interconnected group of networks, the value of P-Asserted-ID is only as good as the trustworthiness of the least trustworthy network in the group.
Fortunately, specifications have just been completed for a technique known as SIP Identity. These specifications (RFC 4474) were published in August 2006 and provide a giant leap forward in terms of secure caller ID.
The basic mechanism is shown in the above graphic. The caller, Joe, has a SIP uniform resource identifier of sip:email@example.com, which Joe's phone places into the From header field of its SIP messages. When Joe makes a call, Joe's phone emits a SIP INVITE (step 1) and sends this to the server for example.com. This server challenges the message, asking Joe's phone to provide credentials (step 2). Joe's phone obliges, retrying the INVITE with appropriate credentials (step 3).
These credentials verify that the caller is indeed Joe and that the From field is accurate. The example.com server computes a cryptographic signature over portions of the message and inserts that signature, along with an HTTP URL for getting its certificate, into the SIP message (step 4). The called party retrieves this certificate (step 5) and checks the signature. If the signature validates, the called party has strong assurance that the caller really is in the domain example.com.
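The "portions of the message" covered by the signature can be sketched in a few lines of Python. As best I recall RFC 4474 (section 9), the signed string joins the From address, To address, Call-ID, CSeq, Date, Contact address and message body with "|" separators; check the RFC before relying on the exact layout. The function names and all field values below are hypothetical. The sketch stops at the SHA-1 hash: the actual RSA signing with the domain's private key, and the verification against the fetched certificate, need a cryptography library outside the Python standard library and are left out.

```python
import hashlib


def identity_digest_string(from_uri, to_uri, call_id, cseq, date, contact_uri, body):
    """Join the signed fields with '|' in the order assumed from RFC 4474."""
    return "|".join([from_uri, to_uri, call_id, cseq, date, contact_uri, body])


def sha1_hex(text):
    """Hash the canonical string; a real server would sign this with RSA."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical field values for Joe's INVITE.
fields = {
    "from_uri": "sip:email@example.com",
    "to_uri": "sip:callee@example.net",
    "call_id": "a84b4c76e66710",
    "cseq": "1 INVITE",
    "date": "Thu, 21 Feb 2002 13:02:03 GMT",
    "contact_uri": "sip:email@pc33.example.com",
    "body": "",
}

original = identity_digest_string(**fields)

# An attacker who rewrites From in transit changes the signed string,
# so a signature computed over the original no longer matches.
tampered = identity_digest_string(**{**fields, "from_uri": "sip:ceo@bank.example"})

hashes_differ = sha1_hex(original) != sha1_hex(tampered)
```

Because the From address is part of what gets signed, any change to it after the example.com server has signed the message causes the called party's signature check to fail.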
SIP Identity is also the cornerstone of many of the techniques that can be applied to prevent VOIP spam, also known as spam over Internet telephony, or SPIT. Because of its importance for interconnections and for blocking spam, SIP Identity will play an increasingly important role in future VOIP networks.
Rosenberg is a Cisco Fellow. He can be reached at firstname.lastname@example.org.