rfc6189.txt   rfc6189bis.txt 
Internet Engineering Task Force (IETF) P. Zimmermann Internet Engineering Task Force (IETF) P. Zimmermann
Request for Comments: 6189 Zfone Project Request for Comments: 6189bis Zfone Project
Category: Informational A. Johnston, Ed. Category: Informational A. Johnston, Ed.
ISSN: 2070-1721 Avaya ISSN: 2070-1721 Avaya
J. Callas J. Callas
Apple, Inc. Apple, Inc.
April 2011 November 2012
ZRTP: Media Path Key Agreement for Unicast Secure RTP ZRTP: Media Path Key Agreement for Unicast Secure RTP
Abstract Abstract
This document defines ZRTP, a protocol for media path Diffie-Hellman This document defines ZRTP, a protocol for media path Diffie-Hellman
exchange to agree on a session key and parameters for establishing exchange to agree on a session key and parameters for establishing
unicast Secure Real-time Transport Protocol (SRTP) sessions for Voice unicast Secure Real-time Transport Protocol (SRTP) sessions for Voice
over IP (VoIP) applications. The ZRTP protocol is media path keying over IP (VoIP) applications. The ZRTP protocol is media path keying
because it is multiplexed on the same port as RTP and does not because it is multiplexed on the same port as RTP and does not
skipping to change at page 1, line 47 skipping to change at page 1, line 47
This document is a product of the Internet Engineering Task Force This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has (IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Not all documents Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are a candidate for any level of Internet approved by the IESG are a candidate for any level of Internet
Standard; see Section 2 of RFC 5741. Standard; see Section 2 of RFC 5741.
Information about the current status of this document, any errata, Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc6189. http://www.rfc-editor.org/info/rfc6189bis.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 3, line 8 skipping to change at page 3, line 8
4.4.3. Multistream Mode . . . . . . . . . . . . . . . . . . 28 4.4.3. Multistream Mode . . . . . . . . . . . . . . . . . . 28
4.4.3.1. Commitment in Multistream Mode . . . . . . . . . 29 4.4.3.1. Commitment in Multistream Mode . . . . . . . . . 29
4.4.3.2. Shared Secret Calculation for Multistream Mode . 29 4.4.3.2. Shared Secret Calculation for Multistream Mode . 29
4.5. Key Derivations . . . . . . . . . . . . . . . . . . . . . 30 4.5. Key Derivations . . . . . . . . . . . . . . . . . . . . . 30
4.5.1. The ZRTP Key Derivation Function . . . . . . . . . . 31 4.5.1. The ZRTP Key Derivation Function . . . . . . . . . . 31
4.5.2. Deriving ZRTPSess Key and SAS in DH or Preshared 4.5.2. Deriving ZRTPSess Key and SAS in DH or Preshared
Modes . . . . . . . . . . . . . . . . . . . . . . . . 32 Modes . . . . . . . . . . . . . . . . . . . . . . . . 32
4.5.3. Deriving the Rest of the Keys from s0 . . . . . . . . 33 4.5.3. Deriving the Rest of the Keys from s0 . . . . . . . . 33
4.6. Confirmation . . . . . . . . . . . . . . . . . . . . . . 35 4.6. Confirmation . . . . . . . . . . . . . . . . . . . . . . 35
4.6.1. Updating the Cache of Shared Secrets . . . . . . . . 35 4.6.1. Updating the Cache of Shared Secrets . . . . . . . . 35
4.6.1.1. Cache Update Following a Cache Mismatch . . . . . 36 4.6.1.1. Cache Update Following a Cache Mismatch . . . . . 37
4.7. Termination . . . . . . . . . . . . . . . . . . . . . . . 37 4.6.1.2. Cache Update for a PBX Following a Cache
4.7.1. Termination via Error Message . . . . . . . . . . . . 37 Mismatch . . . . . . . . . . . . . . . . . . . . 38
4.7.2. Termination via GoClear Message . . . . . . . . . . . 37 4.7. Termination . . . . . . . . . . . . . . . . . . . . . . . 38
4.7.2.1. Key Destruction for GoClear Message . . . . . . . 39 4.7.1. Termination via Error Message . . . . . . . . . . . . 39
4.7.3. Key Destruction at Termination . . . . . . . . . . . 40 4.7.2. Termination via GoClear Message . . . . . . . . . . . 39
4.8. Random Number Generation . . . . . . . . . . . . . . . . 40 4.7.2.1. Key Destruction for GoClear Message . . . . . . . 40
4.9. ZID and Cache Operation . . . . . . . . . . . . . . . . . 40 4.7.3. Key Destruction at Termination . . . . . . . . . . . 41
4.9.1. Cacheless Implementations . . . . . . . . . . . . . . 42 4.8. Random Number Generation . . . . . . . . . . . . . . . . 41
5. ZRTP Messages . . . . . . . . . . . . . . . . . . . . . . . . 42 4.9. ZID and Cache Operation . . . . . . . . . . . . . . . . . 42
5.1. ZRTP Message Formats . . . . . . . . . . . . . . . . . . 44 4.9.1. Cacheless Implementations . . . . . . . . . . . . . . 43
5.1.1. Message Type Block . . . . . . . . . . . . . . . . . 44 5. ZRTP Messages . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1.2. Hash Type Block . . . . . . . . . . . . . . . . . . . 45 5.1. ZRTP Message Formats . . . . . . . . . . . . . . . . . . 45
5.1.2.1. Negotiated Hash and MAC Algorithm . . . . . . . . 46 5.1.1. Message Type Block . . . . . . . . . . . . . . . . . 46
5.1.2.2. Implicit Hash and MAC Algorithm . . . . . . . . . 47 5.1.2. Hash Type Block . . . . . . . . . . . . . . . . . . . 47
5.1.3. Cipher Type Block . . . . . . . . . . . . . . . . . . 47 5.1.2.1. Negotiated Hash and MAC Algorithm . . . . . . . . 48
5.1.4. Auth Tag Type Block . . . . . . . . . . . . . . . . . 48 5.1.2.2. Implicit Hash and MAC Algorithm . . . . . . . . . 49
5.1.5. Key Agreement Type Block . . . . . . . . . . . . . . 49 5.1.3. Cipher Type Block . . . . . . . . . . . . . . . . . . 49
5.1.6. SAS Type Block . . . . . . . . . . . . . . . . . . . 51 5.1.4. Auth Tag Type Block . . . . . . . . . . . . . . . . . 50
5.1.7. Signature Type Block . . . . . . . . . . . . . . . . 52 5.1.5. Key Agreement Type Block . . . . . . . . . . . . . . 51
5.2. Hello Message . . . . . . . . . . . . . . . . . . . . . . 53 5.1.6. SAS Type Block . . . . . . . . . . . . . . . . . . . 53
5.3. HelloACK Message . . . . . . . . . . . . . . . . . . . . 55 5.1.7. Signature Type Block . . . . . . . . . . . . . . . . 54
5.4. Commit Message . . . . . . . . . . . . . . . . . . . . . 56 5.2. Hello Message . . . . . . . . . . . . . . . . . . . . . . 55
5.5. DHPart1 Message . . . . . . . . . . . . . . . . . . . . . 59 5.3. HelloACK Message . . . . . . . . . . . . . . . . . . . . 57
5.6. DHPart2 Message . . . . . . . . . . . . . . . . . . . . . 61 5.4. Commit Message . . . . . . . . . . . . . . . . . . . . . 58
5.7. Confirm1 and Confirm2 Messages . . . . . . . . . . . . . 63 5.5. DHPart1 Message . . . . . . . . . . . . . . . . . . . . . 61
5.8. Conf2ACK Message . . . . . . . . . . . . . . . . . . . . 65 5.6. DHPart2 Message . . . . . . . . . . . . . . . . . . . . . 63
5.9. Error Message . . . . . . . . . . . . . . . . . . . . . . 66 5.7. Confirm1 and Confirm2 Messages . . . . . . . . . . . . . 65
5.10. ErrorACK Message . . . . . . . . . . . . . . . . . . . . 68 5.8. Conf2ACK Message . . . . . . . . . . . . . . . . . . . . 67
5.11. GoClear Message . . . . . . . . . . . . . . . . . . . . . 68 5.9. Error Message . . . . . . . . . . . . . . . . . . . . . . 68
5.12. ClearACK Message . . . . . . . . . . . . . . . . . . . . 68 5.10. ErrorACK Message . . . . . . . . . . . . . . . . . . . . 70
5.13. SASrelay Message . . . . . . . . . . . . . . . . . . . . 69 5.11. GoClear Message . . . . . . . . . . . . . . . . . . . . . 70
5.14. RelayACK Message . . . . . . . . . . . . . . . . . . . . 71 5.12. ClearACK Message . . . . . . . . . . . . . . . . . . . . 70
5.15. Ping Message . . . . . . . . . . . . . . . . . . . . . . 72 5.13. SASrelay Message . . . . . . . . . . . . . . . . . . . . 71
5.16. PingACK Message . . . . . . . . . . . . . . . . . . . . . 73 5.14. RelayACK Message . . . . . . . . . . . . . . . . . . . . 73
6. Retransmissions . . . . . . . . . . . . . . . . . . . . . . . 74 5.15. Ping Message . . . . . . . . . . . . . . . . . . . . . . 74
7. Short Authentication String . . . . . . . . . . . . . . . . . 77 5.15.1. Rationale for Ping messages . . . . . . . . . . . . . 75
7.1. SAS Verified Flag . . . . . . . . . . . . . . . . . . . . 78 5.16. PingACK Message . . . . . . . . . . . . . . . . . . . . . 75
7.2. Signing the SAS . . . . . . . . . . . . . . . . . . . . . 79 6. Retransmissions . . . . . . . . . . . . . . . . . . . . . . . 77
7.2.1. OpenPGP Signatures . . . . . . . . . . . . . . . . . 81 7. Short Authentication String . . . . . . . . . . . . . . . . . 80
7.2.2. ECDSA Signatures with X.509v3 Certs . . . . . . . . . 82 7.1. SAS Verified Flag . . . . . . . . . . . . . . . . . . . . 80
7.2.3. Signing the SAS without a PKI . . . . . . . . . . . . 83 7.2. Signing the SAS . . . . . . . . . . . . . . . . . . . . . 82
7.3. Relaying the SAS through a PBX . . . . . . . . . . . . . 84 7.2.1. OpenPGP Signatures . . . . . . . . . . . . . . . . . 84
7.3.1. PBX Enrollment and the PBX Enrollment Flag . . . . . 87 7.2.2. ECDSA Signatures with X.509v3 Certs . . . . . . . . . 85
7.2.3. Signing the SAS without a PKI . . . . . . . . . . . . 86
8. Signaling Interactions . . . . . . . . . . . . . . . . . . . 88 7.3. Relaying the SAS through a PBX . . . . . . . . . . . . . 87
7.3.1. PBX Enrollment and the PBX Enrollment Flag . . . . . 90
7.4. Automated Methods of Authenticating the DH Exchange . . . 92
8. Signaling Interactions . . . . . . . . . . . . . . . . . . . 93
8.1. Binding the Media Stream to the Signaling Layer via 8.1. Binding the Media Stream to the Signaling Layer via
the Hello Hash . . . . . . . . . . . . . . . . . . . . . 90 the Hello Hash . . . . . . . . . . . . . . . . . . . . . 95
8.1.1. Integrity-Protected Signaling Enables 8.1.1. Integrity-Protected Signaling Enables
Integrity-Protected DH Exchange . . . . . . . . . . . 91 Integrity-Protected DH Exchange . . . . . . . . . . . 96
8.2. Deriving the SRTP Secret (srtps) from the Signaling 8.2. Combining ZRTP With SDES . . . . . . . . . . . . . . . . 98
Layer . . . . . . . . . . . . . . . . . . . . . . . . . . 93 8.2.1. Deriving auxsecret from SDES Key Material . . . . . . 99
8.3. Codec Selection for Secure Media . . . . . . . . . . . . 94 8.3. Codec Selection for Secure Media . . . . . . . . . . . . 101
9. False ZRTP Packet Rejection . . . . . . . . . . . . . . . . . 94 9. False ZRTP Packet Rejection . . . . . . . . . . . . . . . . . 101
10. Intermediary ZRTP Devices . . . . . . . . . . . . . . . . . . 96 10. Intermediary ZRTP Devices . . . . . . . . . . . . . . . . . . 103
11. The ZRTP Disclosure Flag . . . . . . . . . . . . . . . . . . 98 10.1. On Reducing PBX MiTM Behavior . . . . . . . . . . . . . . 105
11. The ZRTP Disclosure Flag . . . . . . . . . . . . . . . . . . 107
11.1. Guidelines on Proper Implementation of the Disclosure 11.1. Guidelines on Proper Implementation of the Disclosure
Flag . . . . . . . . . . . . . . . . . . . . . . . . . . 99 Flag . . . . . . . . . . . . . . . . . . . . . . . . . . 108
12. Mapping between ZID and AOR (SIP URI) . . . . . . . . . . . . 100 12. Mapping between ZID and AOR (SIP URI) . . . . . . . . . . . . 109
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 101 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 110
14. Media Security Requirements . . . . . . . . . . . . . . . . . 102 14. Media Security Requirements . . . . . . . . . . . . . . . . . 111
15. Security Considerations . . . . . . . . . . . . . . . . . . . 103 15. Security Considerations . . . . . . . . . . . . . . . . . . . 113
15.1. Self-Healing Key Continuity Feature . . . . . . . . . . . 106 15.1. Self-Healing Key Continuity Feature . . . . . . . . . . . 116
16. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 108 16. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 117
17. References . . . . . . . . . . . . . . . . . . . . . . . . . 108 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 117
17.1. Normative References . . . . . . . . . . . . . . . . . . 108 17.1. Normative References . . . . . . . . . . . . . . . . . . 117
17.2. Informative References . . . . . . . . . . . . . . . . . 111 17.2. Informative References . . . . . . . . . . . . . . . . . 120
1. Introduction 1. Introduction
ZRTP is a key agreement protocol that performs a Diffie-Hellman key ZRTP is a key agreement protocol that performs a Diffie-Hellman key
exchange during call setup in the media path and is transported over exchange during call setup in the media path and is transported over
the same port as the Real-time Transport Protocol (RTP) [RFC3550] the same port as the Real-time Transport Protocol (RTP) [RFC3550]
media stream which has been established using a signaling protocol media stream which has been established using a signaling protocol
such as Session Initiation Protocol (SIP) [RFC3261]. This generates such as Session Initiation Protocol (SIP) [RFC3261]. This generates
a shared secret, which is then used to generate keys and salt for a a shared secret, which is then used to generate keys and salt for a
Secure RTP (SRTP) [RFC3711] session. ZRTP borrows ideas from Secure RTP (SRTP) [RFC3711] session. ZRTP borrows ideas from
skipping to change at page 10, line 15 skipping to change at page 10, line 15
When Multistream mode is indicated in the Commit message, a call flow When Multistream mode is indicated in the Commit message, a call flow
similar to Figure 1 is used, but no DH calculation is performed by similar to Figure 1 is used, but no DH calculation is performed by
either endpoint and the DHPart1 and DHPart2 messages are omitted. either endpoint and the DHPart1 and DHPart2 messages are omitted.
The Confirm1, Confirm2, and Conf2ACK messages are still sent. Since The Confirm1, Confirm2, and Conf2ACK messages are still sent. Since
the cache is not affected during this mode, multiple Multistream ZRTP the cache is not affected during this mode, multiple Multistream ZRTP
exchanges can be performed in parallel between two endpoints. exchanges can be performed in parallel between two endpoints.
When adding additional media streams to an existing call, only When adding additional media streams to an existing call, only
Multistream mode is used. Only one DH operation is performed, just Multistream mode is used. Only one DH operation is performed, just
for the first media stream. for the first media stream. Consequently, all the media streams in
the session share the same SAS (Section 7).
4. Protocol Description 4. Protocol Description
This section begins the normative description of the protocol. This section begins the normative description of the protocol.
ZRTP MUST be multiplexed on the same ports as the RTP media packets. ZRTP MUST be multiplexed on the same ports as the RTP media packets.
To support best effort encryption from the Media Security To support best effort encryption from the Media Security
Requirements [RFC5479], ZRTP uses normal RTP/AVP profile (AVP) media Requirements [RFC5479], ZRTP uses normal RTP/AVP profile (AVP) media
lines in the initial offer/answer exchange. The ZRTP SDP attribute lines in the initial offer/answer exchange. The ZRTP SDP attribute
skipping to change at page 16, line 16 skipping to change at page 16, line 16
The use of this shared secret cache is described in Section 4.9. The use of this shared secret cache is described in Section 4.9.
If no secret of a given type is available, a random value is If no secret of a given type is available, a random value is
generated and used for that secret to ensure a mismatch in the hash generated and used for that secret to ensure a mismatch in the hash
comparisons in the DHPart1 and DHPart2 messages. This prevents an comparisons in the DHPart1 and DHPart2 messages. This prevents an
eavesdropper from knowing which types of shared secrets are available eavesdropper from knowing which types of shared secrets are available
between the endpoints. between the endpoints.
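The substitution described above can be sketched in Python as follows. This is only an illustration: the dictionary-based cache, the function name secret_or_random, and the 32-byte length are assumptions made here and are not taken from the specification.

```python
import secrets

# Assumed length in bytes for a stand-in secret (hypothetical choice).
SECRET_LEN = 32


def secret_or_random(cache, name):
    """Return the cached secret called `name`, or a fresh random value.

    `cache` is a hypothetical dict mapping secret names ("rs1", "rs2",
    "auxsecret", "pbxsecret") to bytes for one remote endpoint.
    """
    value = cache.get(name)
    if value is None:
        # No secret of this type is available: use a random stand-in so the
        # hash comparison fails without revealing that the secret is absent.
        value = secrets.token_bytes(SECRET_LEN)
    return value
```

Because the stand-in comes from a cryptographically strong source, an eavesdropper should not be able to tell a real secret from a missing one by looking at the values exchanged.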
Section 4.3.1 refers to the auxiliary shared secret auxsecret. The Section 4.3.1 refers to the auxiliary shared secret auxsecret. The
auxsecret shared secret may be defined by the VoIP user agent out-of- auxsecret shared secret may be defined by the VoIP user agent out-of-
band from the ZRTP protocol. In some cases, it may be provided by band from the ZRTP protocol. It may be manually provisioned in
the signaling layer as srtps, which is defined in Section 8.2. If it application-specific ways, such as computed from a hashed pass phrase
is not provided by the signaling layer, the auxsecret shared secret by prior agreement between the two parties or supplied by a hardware
may be manually provisioned in other application-specific ways that token. Or, it may be a family key used by an institution to which
are out of band, such as computed from a hashed pass phrase by prior the two parties both belong. It is a generalized mechanism for
agreement between the two parties or supplied by a hardware token. providing a shared secret that is agreed to between the two parties
Or, it may be a family key used by an institution to which the two out of scope of the ZRTP protocol. It is expected that most typical
parties both belong. It is a generalized mechanism for providing a ZRTP endpoints will rarely use auxsecret.
shared secret that is agreed to between the two parties out of scope
of the ZRTP protocol. It is expected that most typical ZRTP
endpoints will rarely use auxsecret.
For both the initiator and the responder, the shared secrets s1, s2, For both the initiator and the responder, the shared secrets s1, s2,
and s3 will be calculated so that they can all be used later to and s3 will be calculated so that they can all be used later to
calculate s0 in Section 4.4.1.4. Here is how s1, s2, and s3 are calculate s0 in Section 4.4.1.4. Here is how s1, s2, and s3 are
calculated by both parties. calculated by both parties.
The shared secret s1 will be either the initiator's rs1 or the The shared secret s1 will be either the initiator's rs1 or the
initiator's rs2, depending on which of them can be found in the initiator's rs2, depending on which of them can be found in the
responder's cache. If the initiator's rs1 matches the responder's responder's cache. If the initiator's rs1 matches the responder's
rs1 or rs2, then s1 MUST be set to the initiator's rs1. If and only rs1 or rs2, then s1 MUST be set to the initiator's rs1. If and only
skipping to change at page 36, line 18 skipping to change at page 36, line 18
Section 4.9. Section 4.9.
(3) The responder MUST receive the initiator's Confirm2 message (3) The responder MUST receive the initiator's Confirm2 message
before updating the responder's cache. before updating the responder's cache.
(4) The initiator MUST receive either the responder's Conf2ACK (4) The initiator MUST receive either the responder's Conf2ACK
message or the responder's SRTP media (with a valid SRTP auth message or the responder's SRTP media (with a valid SRTP auth
tag) before updating the initiator's cache. tag) before updating the initiator's cache.
The cache update may also be affected by a cache mismatch, according The cache update may also be affected by a cache mismatch, according
to Section 4.6.1.1. to Section 4.6.1.1 or Section 4.6.1.2.
For DH mode only, before updating the retained shared secret rs1 in For DH mode only, before updating the retained shared secret rs1 in
the cache, each party first discards their old rs2 and copies their the cache, each party first discards their old rs2 and copies their
old rs1 to rs2. The old rs1 is saved to rs2 because of the risk of old rs1 to rs2. The old rs1 is saved to rs2 because of the risk of
session interruption after one party has updated his own rs1 but session interruption after one party has updated his own rs1 but
before the other party has enough information to update her own rs1. before the other party has enough information to update her own rs1.
If that happens, they may regain cache sync in the next session by If that happens, they may regain cache sync in the next session by
using rs2 (per Section 4.3). This mitigates the well-known Two using rs2 (per Section 4.3). This mitigates the well-known Two
Generals' Problem [Byzantine]. The old rs1 value is not saved in Generals' Problem [Byzantine]. The old rs1 value is not saved in
Preshared mode. Preshared mode.
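The rotation of retained secrets described above might be sketched like this in Python. The CacheEntry class and the function name are hypothetical. Leaving rs2 untouched in Preshared mode is one reading of "not saved" and should be treated as an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical per-peer cache record holding the retained secrets."""
    rs1: Optional[bytes] = None
    rs2: Optional[bytes] = None


def update_retained_secrets(entry: CacheEntry, new_rs1: bytes, dh_mode: bool) -> None:
    """Install new_rs1, keeping the old rs1 as rs2 in DH mode only."""
    if dh_mode:
        # The old rs2 is discarded and the old rs1 becomes the fallback, so
        # the peers can regain cache sync if only one side finished updating.
        entry.rs2 = entry.rs1
    # In Preshared mode the old rs1 is not copied anywhere (assumption: rs2
    # is simply left as it was).
    entry.rs1 = new_rs1
```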
skipping to change at page 36, line 41 skipping to change at page 36, line 41
from s0 via the ZRTP key derivation function (Section 4.5.1): from s0 via the ZRTP key derivation function (Section 4.5.1):
rs1 = KDF(s0, "retained secret", KDF_Context, 256) rs1 = KDF(s0, "retained secret", KDF_Context, 256)
Note that KDF_Context is unique for each media stream, but only the Note that KDF_Context is unique for each media stream, but only the
first media stream is permitted to update rs1. first media stream is permitted to update rs1.
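As an illustration of the rs1 derivation above, here is a minimal Python sketch of the key derivation function. It assumes the single-iteration HMAC construction of Section 4.5.1 (not reproduced in this excerpt), assumes HMAC-SHA-256 as the negotiated MAC, and restricts L to whole bytes. The function name zrtp_kdf and the placeholder inputs are hypothetical.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    """Sketch of KDF(KI, Label, Context, L), assuming HMAC-SHA-256."""
    if length_bits <= 0 or length_bits % 8 != 0 or length_bits > 256:
        raise ValueError("this sketch supports 8..256 bits in whole bytes")
    # Assumed input layout: counter i = 1, Label, a zero byte, Context, L,
    # with i and L encoded as 32-bit big-endian integers.
    data = (
        struct.pack(">I", 1)
        + label
        + b"\x00"
        + context
        + struct.pack(">I", length_bits)
    )
    digest = hmac.new(ki, data, hashlib.sha256).digest()
    # Keep only the leftmost L bits of the MAC output.
    return digest[: length_bits // 8]


# Placeholder values for illustration only; real inputs come from the exchange.
s0 = b"\x00" * 32
kdf_context = b"\x11" * 12 + b"\x22" * 12 + b"\x33" * 32
rs1 = zrtp_kdf(s0, b"retained secret", kdf_context, 256)
```

Because KDF_Context differs for each media stream, the same s0 and label would give a different output for each stream under this construction.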
Each media stream has its own s0. At this point in the protocol for Each media stream has its own s0. At this point in the protocol for
each media stream, the corresponding s0 MUST be erased. each media stream, the corresponding s0 MUST be erased.
If a cache update is appropriate, subject to the above conditions and
not delayed by a cache mismatch, it should be done as follows. Both
ZRTP endpoints SHOULD commit the new rs1 to nonvolatile storage
immediately upon receiving the remote party's Confirm message. The
initiator should write the new rs1 before sending the Confirm2
message, and the responder should write the new rs1 before sending
any SRTP media. This means no SRTP media will be sent by either
party until the new rs1 is saved by both parties. After receiving
evidence that the remote party has committed the new rs1 to
nonvolatile storage, rs2 (the old value of rs1) SHOULD be discarded.
Receiving a few packets of properly formed SRTP media after the
Confirm message would be evidence that the remote party has remained
functioning long enough to commit the new rs1 to nonvolatile storage.
A brief interval (about one second of encrypted media) should be
sufficient for rs1 to be properly saved across a cluster of
distributed load-sharing PBXs that share a common cache. A good
strategy is to hold back from committing rs2 to nonvolatile storage
for this brief interval, and commit it to nonvolatile storage only if
the connection is lost during that interval, or if encrypted media
fails to appear within a reasonable time. Since this would be a rare
event, in most cases rs2 would not be saved. If rs2 is saved
unconditionally, it would have the undesirable effect of lengthening
the window of vulnerability for a MiTM attack if the cache is
captured by an attacker, as described in Section 15.1.
4.6.1.1. Cache Update Following a Cache Mismatch 4.6.1.1. Cache Update Following a Cache Mismatch
If a shared secret cache mismatch (as defined in Section 4.3.2) is If a shared secret cache mismatch (as defined in Section 4.3.2) is
detected in the current session, it indicates a possible MiTM attack. detected in the current session, it indicates a possible MiTM attack.
However, there may be evidence to the contrary, if either one of the However, there may be evidence to the contrary, if either one of the
following conditions is met: following conditions is met:
o Successful use of the mechanism described in Section 8.1.1, but o Successful use of the mechanism described in Section 8.1.1, but
only if fully supported by end-to-end integrity-protected delivery only if fully supported by end-to-end integrity-protected delivery
of the a=zrtp-hash in the signaling via SIP Identity [RFC4474] or of the a=zrtp-hash in the signaling via SIP Identity [RFC4474] or
skipping to change at page 37, line 16 skipping to change at page 37, line 41
o A good signature is received and verified using the digital o A good signature is received and verified using the digital
signature feature on the SAS hash, as described in Section 7.2, if signature feature on the SAS hash, as described in Section 7.2, if
this feature is supported. this feature is supported.
If there is a cache mismatch in the absence of the aforementioned If there is a cache mismatch in the absence of the aforementioned
mitigating evidence, the cache update MUST be delayed in the current mitigating evidence, the cache update MUST be delayed in the current
session until the user verbally compares the SAS with his partner session until the user verbally compares the SAS with his partner
during the call and confirms a successful SAS verify via his user during the call and confirms a successful SAS verify via his user
interface as described in Section 7.1. If the session ends before interface as described in Section 7.1. If the session ends before
that happens, the cache update is not performed, leaving the rs1/rs2 that happens, the cache update is not performed, leaving the rs1/rs2
values unmodified in the cache. Regardless of whether a cache values unmodified in the cache. The local SAS Verified (V) flag is
mismatch occurs, s0 must still be erased. also left unmodified in this case.
This means the caches will continue to be mismatched on subsequent
calls, and the user will thus be alerted of this security condition
on every call until the SAS is verified. Or, if the cache mismatches
are caused by an actual MiTM attack instead of a cache mishap, the
alerts will continue on every call until the caches match again
because the MiTM attacker ceased his attacks. In that case, the
cache entries and related (V) flags are unscathed by the MiTM
attacker when the attacks cease. The MiTM attacker is thus foiled
from even having a denial-of-service effect on the caches.
If the user verbally compares the SAS with his partner during the
call and confirms a successful SAS verify via his user interface, the
local cache is then updated. Note that in this case rs2 (the old
value of rs1) must also be saved, to mitigate the possibility of the
remote user failing to update.
Regardless of whether a cache mismatch occurs, s0 must still be
erased.
If no cache entry exists, as is the case in the initial call, the If no cache entry exists, as is the case in the initial call, the
cache update is handled in the normal fashion. cache update is handled in the normal fashion.
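The endpoint rule in this section can be summarized as a small decision function. The Python sketch below is an illustration under stated assumptions: the function and parameter names are hypothetical, and the other preconditions of Section 4.6.1 are taken to be checked elsewhere.

```python
def may_update_cache(cache_mismatch: bool,
                     has_mitigating_evidence: bool,
                     sas_verified_by_user: bool) -> bool:
    """Decide whether an endpoint may update rs1/rs2 in this session.

    has_mitigating_evidence stands for either condition listed above: an
    integrity-protected a=zrtp-hash, or a verified signature on the SAS hash.
    """
    if not cache_mismatch:
        # Covers the initial call too: with no cache entry, the update is
        # handled in the normal fashion.
        return True
    if has_mitigating_evidence:
        return True
    # Otherwise the update is delayed until the user confirms the SAS. If the
    # session ends first, rs1/rs2 stay unmodified.
    return sas_verified_by_user
```

Whatever this function returns, s0 is still erased.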
4.6.1.2. Cache Update for a PBX Following a Cache Mismatch
In the event of a cache mismatch, a PBX MUST NOT update the cache if
there is a pbxsecret defined on the PBX, but it does not match the
pbxsecret of the remote endpoint. Otherwise, the PBX MUST update the
cache, notwithstanding Section 4.6.1.1.
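The PBX rule above reduces to a single condition. The Python sketch below uses hypothetical names and assumes the caller has already determined, by means not shown here, whether the remote endpoint's pbxsecret matches.

```python
def pbx_updates_cache_after_mismatch(pbxsecret_defined: bool,
                                     pbxsecret_matches: bool) -> bool:
    """Return True if a PBX updates its cache following a cache mismatch."""
    if pbxsecret_defined and not pbxsecret_matches:
        # A pbxsecret is defined on the PBX but does not match the remote
        # endpoint's pbxsecret: the PBX MUST NOT update the cache.
        return False
    # In every other case the PBX MUST update the cache.
    return True
```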
Rationale: If a ZRTP endpoint is enrolled with a PBX, it is desirable
that the PBX's cache is not easily disrupted by an attempted MiTM
attack. The enrolled phone should also not update the cache per
Section 4.6.1.1. A PBX has no human to verify the SAS, so the PBX
assumes the cache should be updated unless a pbxsecret mismatch
suggests otherwise. Note that unenrolled phones will lose cache sync
after an attempted MiTM attack, because the PBX will update the cache
during the attack.
However, this loss of cache sync for an unenrolled phone may be
easily remedied by calling an enrolled phone behind the PBX (with the
PBX acting as a MiTM) and re-verifying the SAS with a human. That
would update the cache on both the unenrolled phone and the PBX, re-
establishing cache sync.
The PBX's lack of human-assisted SAS verification following a cache
mismatch is one more reason to reduce the PBX's MiTM role whenever
possible, as explained in Section 10.1.
4.7. Termination 4.7. Termination
A ZRTP session is normally terminated at the end of a call, but it A ZRTP session is normally terminated at the end of a call, but it
may be terminated early by either the Error message or the GoClear may be terminated early by either the Error message or the GoClear
message. message.
4.7.1. Termination via Error Message 4.7.1. Termination via Error Message
The Error message (Section 5.9) is used to terminate an in-progress The Error message (Section 5.9) is used to terminate an in-progress
ZRTP exchange due to an error. The Error message contains an integer ZRTP exchange due to an error. The Error message contains an integer
skipping to change at page 42, line 12 skipping to change at page 43, line 36
note it as an unexpected security event when the next key negotiation note it as an unexpected security event when the next key negotiation
occurs between the same two parties. This means there need not be occurs between the same two parties. This means there need not be
perfectly synchronized deletion of expired secrets from the two perfectly synchronized deletion of expired secrets from the two
caches, and makes it easy to avoid a race condition that might caches, and makes it easy to avoid a race condition that might
otherwise be caused by clock skew. otherwise be caused by clock skew.
If the expiration interval is not properly agreed to by both If the expiration interval is not properly agreed to by both
endpoints, it may later result in false alarms of MiTM attacks, due endpoints, it may later result in false alarms of MiTM attacks, due
to apparent cache mismatches (Section 4.3.2). to apparent cache mismatches (Section 4.3.2).
It is essential that each cache entry have some form of human-
readable name associated with it. If cache entries are stored
without human-readable names, a MiTM attack is possible for an
attacker who has previously established cache entries with both
parties, as explained in Section 12. Users would have to do a verbal
SAS compare for every call, greatly diminishing the value of caching.
The relationship between a ZID and a SIP AOR is explained in The relationship between a ZID and a SIP AOR is explained in
Section 12. Section 12.
4.9.1. Cacheless Implementations 4.9.1. Cacheless Implementations
It is possible to implement a simplified but nonetheless useful (and It is possible to implement a simplified but nonetheless useful (and
still compliant) profile of the ZRTP protocol that does not support still compliant) profile of the ZRTP protocol that does not support
any caching of shared secrets. In this case, the users would have to any caching of shared secrets. In this case, the users would have to
rely exclusively on the verbal SAS comparison for every call. That rely exclusively on the verbal SAS comparison for every call. That
is, unless MiTM protection is provided by the mechanisms in Section is, unless MiTM protection is provided by the mechanisms in Section
skipping to change at page 47, line 48 skipping to change at page 49, line 48
The block cipher algorithm is negotiated via the Cipher Type Block The block cipher algorithm is negotiated via the Cipher Type Block
found in the Hello message (Section 5.2) and the Commit message found in the Hello message (Section 5.2) and the Commit message
(Section 5.4). (Section 5.4).
All ZRTP endpoints MUST support AES-128 (AES1) and MAY support AES- All ZRTP endpoints MUST support AES-128 (AES1) and MAY support AES-
192 (AES2), AES-256 (AES3), or other Cipher Types. The Advanced 192 (AES2), AES-256 (AES3), or other Cipher Types. The Advanced
Encryption Standard is defined in [FIPS-197]. Encryption Standard is defined in [FIPS-197].
The use of AES-128 in SRTP is defined by [RFC3711]. The use of AES- The use of AES-128 in SRTP is defined by [RFC3711]. The use of AES-
192 and AES-256 in SRTP is defined by [RFC6188]. The choice of the 192 and AES-256 in SRTP is defined by [RFC6188]. All ZRTP endpoints
AES key length is coupled to the Key Agreement Type, as explained in must support AES in counter mode for SRTP. The choice of the AES key
length is coupled to the Key Agreement Type, as explained in
Section 5.1.5. Section 5.1.5.
Other block ciphers may be supported that have the same block size Other block ciphers may be supported that have the same block size
and key sizes as AES. If implemented, they may be used anywhere in and key sizes as AES. If implemented, they may be used anywhere in
ZRTP or SRTP in place of the AES, in the same modes of operation and ZRTP or SRTP in place of the AES, in the same modes of operation and
key size. Notably, in counter mode to replace AES-CM in [RFC3711] key size. Notably, in counter mode to replace AES-CM in [RFC3711]
and [RFC6188], as well as in CFB mode to encrypt a portion of the and [RFC6188], as well as in CFB mode to encrypt a portion of the
Confirm message (Figure 10) and SASrelay message (Figure 16). ZRTP Confirm message (Figure 10) and SASrelay message (Figure 16). ZRTP
endpoints MAY support the TwoFish [TwoFish] block cipher. endpoints MAY support the TwoFish [TwoFish] block cipher.
skipping to change at page 49, line 5 skipping to change at page 51, line 5
short SRTP payloads. short SRTP payloads.
The Skein MAC key is computed by the SRTP key derivation function, The Skein MAC key is computed by the SRTP key derivation function,
which is also referred to as the AES-CM PRF, or pseudorandom which is also referred to as the AES-CM PRF, or pseudorandom
function. This is defined either in [RFC3711] or in [RFC6188], function. This is defined either in [RFC3711] or in [RFC6188],
depending on the selected SRTP AES key length. To compute a Skein depending on the selected SRTP AES key length. To compute a Skein
MAC key, the SRTP PRF output for the authentication key is left MAC key, the SRTP PRF output for the authentication key is left
untruncated at 256 bits, instead of the usual truncated length of 160 untruncated at 256 bits, instead of the usual truncated length of 160
bits (the key length used by HMAC-SHA1). bits (the key length used by HMAC-SHA1).
In [RFC3711], Section 9.5 prohibits the use of 32-bit auth tags for
SRTCP, regardless of the SRTP auth tag length. Accordingly, if Skein
is used for SRTP auth tags, SRTCP MUST use Skein 64-bit auth tags,
regardless of the negotiated SRTP auth tag length.
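The SRTCP rule above can be sketched as a small selection function.
This is only an illustration under stated assumptions: the function
name srtcp_auth_tag is hypothetical, "SK64" is assumed to be the
identifier of the 64-bit Skein Auth Tag Type, and the HMAC-SHA1 branch
assumes that SRTCP falls back to the 80-bit tag because 32-bit tags
are prohibited for SRTCP.

```python
def srtcp_auth_tag(srtp_auth_tag: str) -> str:
    """Pick the SRTCP auth tag type for a negotiated SRTP Auth Tag Type."""
    if srtp_auth_tag in ("SK32", "SK64"):
        # Skein negotiated for SRTP: SRTCP uses the 64-bit Skein tag,
        # regardless of the negotiated SRTP auth tag length.
        return "SK64"
    if srtp_auth_tag in ("HS32", "HS80"):
        # HMAC-SHA1 negotiated for SRTP: 32-bit tags are not allowed
        # for SRTCP, so the 80-bit tag is assumed here.
        return "HS80"
    raise ValueError(f"unknown Auth Tag Type Block: {srtp_auth_tag!r}")
```

With this sketch, a session that negotiates "SK32" for SRTP would
still authenticate its SRTCP packets with 64-bit Skein tags.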
Auth Tag Type Block | Meaning Auth Tag Type Block | Meaning
---------------------------------------------------------- ----------------------------------------------------------
"HS32" | 32-bit authentication tag based on "HS32" | 32-bit authentication tag based on
| HMAC-SHA1 as defined in RFC 3711. | HMAC-SHA1 as defined in RFC 3711.
---------------------------------------------------------- ----------------------------------------------------------
"HS80" | 80-bit authentication tag based on "HS80" | 80-bit authentication tag based on
| HMAC-SHA1 as defined in RFC 3711. | HMAC-SHA1 as defined in RFC 3711.
---------------------------------------------------------- ----------------------------------------------------------
"SK32" | 32-bit authentication tag based on "SK32" | 32-bit authentication tag based on
| Skein-512-MAC as defined in [Skein], | Skein-512-MAC as defined in [Skein],
skipping to change at page 70, line 16 skipping to change at page 72, line 16
The next 8 bits are used for flags. Undefined flags are set to zero The next 8 bits are used for flags. Undefined flags are set to zero
and ignored. Three flags are currently defined. The Disclosure Flag and ignored. Three flags are currently defined. The Disclosure Flag
(D) is a Boolean bit defined in Section 11. The Allow Clear flag (A) (D) is a Boolean bit defined in Section 11. The Allow Clear flag (A)
is a Boolean bit defined in Section 4.7.2. The SAS Verified flag (V) is a Boolean bit defined in Section 4.7.2. The SAS Verified flag (V)
is a Boolean bit defined in Section 7.1. These flags are updated is a Boolean bit defined in Section 7.1. These flags are updated
values to the same flags provided earlier in the Confirm message, but values to the same flags provided earlier in the Confirm message, but
they are updated to reflect the new flag information relayed by the they are updated to reflect the new flag information relayed by the
PBX from the other party. PBX from the other party.
The relayed V flag comes from the ZRTP endpoint on the other side of
the PBX. If this relayed V flag is zero, the local ZRTP user agent
should render a conspicuous display of the SAS to prompt the human to
verbally verify it. However, a relayed V flag should not affect the
local V flag, unlike the V flag received in the Confirm message.
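One possible reading of the relayed V flag handling is sketched below.
The names SasUiState and apply_relayed_v_flag are hypothetical and do
not appear in this specification; the sketch only shows that the
relayed flag drives the SAS prompt while the local V flag is left
untouched.

```python
from dataclasses import dataclass


@dataclass
class SasUiState:
    local_v_flag: bool        # our own SAS Verified flag (unchanged here)
    prompt_sas_compare: bool  # whether to display the SAS conspicuously


def apply_relayed_v_flag(local_v_flag: bool, relayed_v_flag: bool) -> SasUiState:
    """Handle the V flag carried in a SASrelay message."""
    # A zero relayed V flag means the endpoint on the other side of the
    # PBX has not verified the SAS, so the user should be prompted.
    prompt = not relayed_v_flag
    # The relayed flag never modifies the local V flag.
    return SasUiState(local_v_flag=local_v_flag, prompt_sas_compare=prompt)
```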
The next 32-bit word contains the SAS rendering scheme for the The next 32-bit word contains the SAS rendering scheme for the
relayed sashash, which will be the same rendering scheme used by the relayed sashash, which will be the same rendering scheme used by the
other party on the other side of the trusted MiTM. Section 7.3 other party on the other side of the trusted MiTM. Section 7.3
describes how the PBX determines whether the ZRTP client regards the describes how the PBX determines whether the ZRTP client regards the
PBX as a trusted MiTM. If the PBX determines that the ZRTP client PBX as a trusted MiTM. If the PBX determines that the ZRTP client
trusts the PBX, the next 8 words contain the sashash relayed from the trusts the PBX, the next 8 words contain the sashash relayed from the
other party. The first 32-bit word of the sashash contains the other party. The first 32-bit word of the sashash contains the
sasvalue, which may be rendered to the user using the specified SAS sasvalue, which may be rendered to the user using the specified SAS
rendering scheme. If this SASrelay message is being sent to a ZRTP rendering scheme. If this SASrelay message is being sent to a ZRTP
client that does not trust this MiTM, the sashash will be ignored by client that does not trust this MiTM, the sashash will be ignored by
skipping to change at page 73, line 21 skipping to change at page 75, line 21
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version="1.10" (1 word) | | version="1.10" (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EndpointHash (2 words) | | EndpointHash (2 words) |
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 18: Ping Message Format Figure 18: Ping Message Format
5.15.1. Rationale for Ping messages
Ping messages are useful for implementing ZRTP proxies. A ZRTP proxy
(Section 10) is a "bump-in-the-wire" that sits between a (usually
non-ZRTP-enabled) VoIP client and the Internet. It attempts to
secure the VoIP call by examining the RTP media streams, detecting
the call, and intervening to encrypt the call "on the fly".
This is not always easy to do, as it may have to be done without help
from the signaling layer. The VoIP client may make internal
decisions on how to do NAT traversal, which are not readily apparent
to the proxy. The proxy has to reverse engineer this knowledge by
inspecting all the RTP streams. The RTP stream from Alice to Bob
might not follow the same path, through the same ports, as the RTP
stream from Bob to Alice. One stream may go directly peer to peer,
while the reverse stream may take a detour through a media relay.
The two parties may have both audio and video streams between them,
and may also be simultaneously talking to others in a conference
call, and some of those parties may be behind the same PBX. All of
these RTP streams have to be sorted out and associated with the
correct ZRTP endpoints. Related audio and video streams have to be
matched up between two parties, and not confused with other streams
to nearby parties behind the same PBX. Ping and PingACK messages
make this possible.
5.16. PingACK Message 5.16. PingACK Message
A PingACK message is sent only in response to a Ping. A ZRTP A PingACK message is sent only in response to a Ping. A ZRTP
endpoint MUST respond to a Ping with a PingACK message. The version endpoint MUST respond to a Ping with a PingACK message. The version
of PingACK requested is contained in the Ping message. If that of PingACK requested is contained in the Ping message. If that
version number is supported, a PingACK with a format that matches version number is supported, a PingACK with a format that matches
that version MUST be sent. Otherwise, if the version number of the that version MUST be sent. Otherwise, if the version number of the
Ping is not supported, a PingACK SHOULD be sent in the format of the Ping is not supported, a PingACK SHOULD be sent in the format of the
highest supported version known to the Ping responder. Only version highest supported version known to the Ping responder. Only version
"1.10" is supported in this specification. "1.10" is supported in this specification.
skipping to change at page 78, line 29 skipping to change at page 81, line 14
is available to the client software, it allows for the possibility is available to the client software, it allows for the possibility
that the client software could render to the user that the SAS verify that the client software could render to the user that the SAS verify
procedure was carried out in a previous session. procedure was carried out in a previous session.
Regardless of whether there is a user interface element to allow the Regardless of whether there is a user interface element to allow the
user to set the SAS Verified flag, it is worth caching a shared user to set the SAS Verified flag, it is worth caching a shared
secret, because doing so reduces opportunities for an attacker in the secret, because doing so reduces opportunities for an attacker in the
next call. next call.
If at any time the users carry out the SAS comparison procedure, and If at any time the users carry out the SAS comparison procedure, and
it actually fails to match, then this means there is a very it actually fails to match, then this indicates a very resourceful
resourceful MiTM. If this is the first call, the MiTM was there on MiTM. If the SAS comparison fails on the very first call, that would
the first call, which is impressive enough. If it happens in a later indicate an attacker who had some foresight, agility, and fortuitous
call, it also means the MiTM must also know the cached shared secret, positioning, but he is still caught by the SAS comparison. If the
because you could not have carried out any voice traffic at all MiTM misses the first call and attacks later, this will trigger a
unless the session key was correctly computed and is also known to cache mismatch alarm. If the SAS fails to match without a cache
the attacker. This implies the MiTM must have been present in all mismatch alarm, it means the MiTM knows the cached shared secret.
the previous sessions, since the initial establishment of the first This either implies the MiTM attacker has somehow stolen the cached
shared secret. This is indeed a resourceful attacker. It also means shared secret from one of the two parties, or it implies the MiTM
that if at any time he ceases his participation as a MiTM on one of must have been present in all the previous sessions, since the
your calls, the protocol will detect that the cached shared secret is initial establishment of the first shared secret. This is indeed a
no longer valid -- because it was really two different shared secrets resourceful attacker. It also means that if at any time he ceases
all along, one of them between Alice and the attacker, and the other his participation as a MiTM on one of the calls, the protocol will
between the attacker and Bob. The continuity of the cached shared detect that the cached shared secret is no longer valid -- because it
secrets makes it possible for us to detect the MiTM when he inserts was really two different shared secrets all along, one of them
himself into the ongoing relationship, as well as when he leaves. between Alice and the attacker, and the other between the attacker
Also, if the attacker tries to stay with a long lineage of calls, but and Bob. The continuity of the cached shared secrets makes it
fails to execute a DH MiTM attack for even one missed call, he is possible to detect the MiTM when he inserts himself into the ongoing
permanently excluded. He can no longer resynchronize with the chain relationship, as well as when he leaves. Also, if the attacker tries
of cached shared secrets. to stay with a long lineage of calls, but fails to execute a DH MiTM
attack for even one missed call, he is permanently excluded. He can
no longer resynchronize with the chain of cached shared secrets.
This is discussed further in Section 15.1.
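The reasoning in the revised text about a failed SAS comparison can be
summarized as a small decision function. This is a non-normative
sketch; the function name and the returned strings are hypothetical
and exist only to make the three cases explicit.

```python
def interpret_sas_mismatch(first_call: bool, cache_mismatch_alarm: bool) -> str:
    """Interpret a verbal SAS comparison that failed to match."""
    if first_call:
        # No cached secret existed yet; the attacker was present from
        # the start but is still caught by the SAS comparison.
        return "MiTM present on the first call"
    if cache_mismatch_alarm:
        # The attacker missed earlier calls and lacks the cached secret.
        return "MiTM joined later without the cached shared secret"
    # SAS mismatch with no cache mismatch alarm: the attacker holds the
    # cached secret, either stolen or built up over every prior session.
    return "MiTM knows the cached shared secret"
```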
A user interface element (i.e., a checkbox or button) is needed to A user interface element (i.e., a checkbox or button) is needed to
allow the user to tell the software the SAS verify was successful, allow the user to tell the software the SAS verify was successful,
causing the software to set the SAS Verified flag (V), which causing the software to set the SAS Verified flag (V), which
(together with our cached shared secret) obviates the need to perform (together with our cached shared secret) obviates the need to perform
the SAS procedure in the next call. An additional user interface the SAS procedure in the next call. An additional user interface
element can be provided to let the user tell the software he detected element can be provided to let the user tell the software he detected
an actual SAS mismatch, which indicates a MiTM attack. The software an actual SAS mismatch, which indicates a MiTM attack. The software
can then take appropriate action, clearing the SAS Verified flag, and can then take appropriate action, clearing the SAS Verified flag, and
erase the cached shared secret from this session. It is up to the erase the cached shared secret from this session. It is up to the
skipping to change at page 80, line 20 skipping to change at page 83, line 8
is independent of the hash used in the sashash. The sashash is is independent of the hash used in the sashash. The sashash is
determined by the negotiated Hash Type (Section 5.1.2), while the determined by the negotiated Hash Type (Section 5.1.2), while the
hash used by the digital signature is separately defined by the hash used by the digital signature is separately defined by the
digital signature algorithm. For example, the sashash may be based digital signature algorithm. For example, the sashash may be based
on SHA-256, while the digital signature might use SHA-384, if an on SHA-256, while the digital signature might use SHA-384, if an
ECDSA P-384 key is used. ECDSA P-384 key is used.
If the sashash (which is always truncated to 256 bits) is shorter If the sashash (which is always truncated to 256 bits) is shorter
than the signature hash, the security is not weakened because the than the signature hash, the security is not weakened because the
hash commitment precludes the attacker from searching for sashash hash commitment precludes the attacker from searching for sashash
collisions. collisions, as explained in Section 4.4.1.1.
ECDSA algorithms may be used with either OpenPGP-formatted keys, or ECDSA algorithms may be used with either OpenPGP-formatted keys, or
X.509v3 certificates. If the ZRTP key exchange is ECDH, and the SAS X.509v3 certificates. If the ZRTP key exchange is ECDH, and the SAS
is signed, then the signature SHOULD be ECDSA, and SHOULD use the is signed, then the signature SHOULD be ECDSA, and SHOULD use the
same size curve as the ECDH exchange if an ECDSA key of that size is same size curve as the ECDH exchange if an ECDSA key of that size is
available. available.
If a ZRTP endpoint supports incoming signatures (evidenced by setting If a ZRTP endpoint supports incoming signatures (evidenced by setting
the (S) flag in the Hello message), it SHOULD be able to parse the (S) flag in the Hello message), it SHOULD be able to parse
signatures from the other endpoint in OpenPGP format and MUST be able signatures from the other endpoint in OpenPGP format and MUST be able
skipping to change at page 81, line 25 skipping to change at page 84, line 13
are over prime fields, drawn from Appendix D.1.2 of [FIPS-186-3]. are over prime fields, drawn from Appendix D.1.2 of [FIPS-186-3].
7.2.1. OpenPGP Signatures 7.2.1. OpenPGP Signatures
If the SAS Signature Type (Section 5.1.7) specifies an OpenPGP If the SAS Signature Type (Section 5.1.7) specifies an OpenPGP
signature ("PGP "), the signature-related fields are arranged as signature ("PGP "), the signature-related fields are arranged as
follows. follows.
The first field after the 4-octet Signature Type Block is the OpenPGP The first field after the 4-octet Signature Type Block is the OpenPGP
signature. The format of this signature and the algorithms that signature. The format of this signature and the algorithms that
create it are specified by [RFC4880]. The signature is comprised of create it are specified by [RFC4880] and [RFC6637]. The signature is
a complete OpenPGP version 4 signature in binary form (not Radix-64), comprised of a complete OpenPGP version 4 signature in binary form
as specified in RFC 4880, Section 5.2.3, enclosed in the full OpenPGP (not Radix-64), as specified in RFC 4880, Section 5.2.3, enclosed in
packet syntax. The length of the OpenPGP signature is parseable from the full OpenPGP packet syntax. The length of the OpenPGP signature
the signature, and depends on the type and length of the signing key. is parseable from the signature, and depends on the type and length
of the signing key.
If OpenPGP signatures are supported, an implementation SHOULD NOT If OpenPGP signatures are supported, an implementation SHOULD NOT
generate signatures using any other signature algorithm except DSA or generate signatures using any other signature algorithm except DSA or
ECDSA (ECDSA is a reserved algorithm type in RFC 4880), but MAY ECDSA (ECDSA in OpenPGP is defined in [RFC6637]), but MAY accept
accept other signature types from the other party. DSA signatures other signature types from the other party. DSA signatures with keys
with keys shorter than 2048 bits or longer than 3072 bits MUST NOT be shorter than 2048 bits or longer than 3072 bits MUST NOT be
generated. generated.
Implementers should be aware that ECDSA signatures for OpenPGP are Any use of ECDSA signatures in ZRTP SHOULD NOT generate signatures
expected to become available when the work in progress [ECC-OpenPGP] using ECDSA key sizes other than P-224, P-256, and P-384, as defined
becomes an RFC. Any use of ECDSA signatures in ZRTP SHOULD NOT in [FIPS-186-3].
generate signatures using ECDSA key sizes other than P-224, P-256,
and P-384, as defined in [FIPS-186-3].
RFC 4880, Section 5.2.3.18, specifies a way to embed, in an OpenPGP RFC 4880, Section 5.2.3.18, specifies a way to embed, in an OpenPGP
signature, a URI of the preferred key server. The URI should be signature, a URI of the preferred key server. The URI should be
fully specified to obtain the public key of the signing key that fully specified to obtain the public key of the signing key that
created the signature. This URI MUST be present. It is up to the created the signature. This URI MUST be present. It is up to the
recipient of the signature to obtain the public key of the signing recipient of the signature to obtain the public key of the signing
key and determine its validity status using the OpenPGP trust model key and determine its validity status using the OpenPGP trust model
discussed in [RFC4880]. discussed in [RFC4880].
The contents of Figure 20 lie inside the encrypted region of the The contents of Figure 20 lie inside the encrypted region of the
skipping to change at page 86, line 25 skipping to change at page 89, line 16
a relayed SAS from an untrusted MiTM, because it may be relayed by a a relayed SAS from an untrusted MiTM, because it may be relayed by a
MiTM attacker. See the SASrelay message definition (Figure 16) for MiTM attacker. See the SASrelay message definition (Figure 16) for
further details. further details.
To ensure that both Alice and Bob will use the same SAS rendering To ensure that both Alice and Bob will use the same SAS rendering
scheme after the keys are negotiated, the PBX also sends the SASrelay scheme after the keys are negotiated, the PBX also sends the SASrelay
message to the unenrolled party (which does not regard this PBX as a message to the unenrolled party (which does not regard this PBX as a
trusted MiTM), conveying the SAS rendering scheme, but not the trusted MiTM), conveying the SAS rendering scheme, but not the
sashash, which it sets to zero. The unenrolled party will ignore the sashash, which it sets to zero. The unenrolled party will ignore the
relayed SAS field, but will use the specified SAS rendering scheme. relayed SAS field, but will use the specified SAS rendering scheme.
If both endpoints are enrolled, one of them will still receive an
"empty" SASrelay message. If and only if a PBX relays an SAS to one
endpoint, it MUST also send an "empty" SASrelay to the other
endpoint, containing a null sashash.
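The "if and only if" rule above might be sketched as follows. The
function name plan_sasrelay and the endpoint labels are hypothetical;
the null sashash is assumed to be 8 zero-filled 32-bit words (32
octets), matching the 8-word sashash field of the SASrelay message.

```python
from typing import Dict, Optional

NULL_SASHASH = bytes(32)  # assumed: 8 words of zeros


def plan_sasrelay(relay_to: Optional[str], other: str,
                  sashash: bytes) -> Dict[str, bytes]:
    """Return the sashash to place in the SASrelay sent to each endpoint."""
    if relay_to is None:
        # No SAS is relayed, so no "empty" SASrelay is sent either.
        return {}
    # One endpoint gets the relayed sashash; the other MUST get an
    # "empty" SASrelay containing a null sashash.
    return {relay_to: sashash, other: NULL_SASHASH}
```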
It is possible to route a call through two ZRTP-enabled PBXs using It is possible to route a call through two ZRTP-enabled PBXs using
this scheme. Assume Alice is a ZRTP endpoint who trusts her local this scheme. Assume Alice is a ZRTP endpoint who trusts her local
PBX in Atlanta, and Bob is a ZRTP endpoint who trusts his local PBX PBX in Atlanta, and Bob is a ZRTP endpoint who trusts his local PBX
in Biloxi. The call is routed from Alice to the Atlanta PBX to the in Biloxi. The call is routed from Alice to the Atlanta PBX to the
Biloxi PBX to Bob. Atlanta would relay the Atlanta-Biloxi SAS to Biloxi PBX to Bob. Atlanta would relay the Atlanta-Biloxi SAS to
Alice because Alice is enrolled with Atlanta, and Biloxi would relay Alice because Alice is enrolled with Atlanta, and Biloxi would relay
the Atlanta-Biloxi SAS to Bob because Bob is enrolled with Biloxi. the Atlanta-Biloxi SAS to Bob because Bob is enrolled with Biloxi.
The two PBXs are not assumed to be enrolled with each other in this The two PBXs are not assumed to be enrolled with each other in this
example. Both Alice and Bob would view and verbally compare the same example. Both Alice and Bob would view and verbally compare the same
relayed SAS, the Atlanta-Biloxi SAS. No more than two trusted MiTM relayed SAS, the Atlanta-Biloxi SAS. No more than two trusted MiTM
nodes can be traversed with this relaying scheme. This behavior is nodes can be traversed with this relaying scheme. This behavior is
extended to two PBXs that are enrolled with each other, via this extended to two PBXs that are enrolled with each other, via this
rule: In the case of a PBX sharing trusted MiTM keys with both rule: In the case of a PBX sharing trusted MiTM keys with both
endpoints (i.e., both enrolled with this PBX), one of which is endpoints (i.e., both enrolled with this PBX), one of which is
another PBX (evidenced by the M-flag) and one of which is a non-PBX, another PBX (evidenced by the M-flag) and one of which is a non-PBX,
the MiTM PBX must always relay the PBX-to-PBX SAS to the non-PBX the MiTM PBX MUST always relay the PBX-to-PBX SAS to the non-PBX
endpoint. endpoint.
A ZRTP endpoint phone that trusts a PBX to act as a trusted MiTM is A ZRTP endpoint phone that trusts a PBX to act as a trusted MiTM is
effectively delegating its own policy decisions of algorithm effectively delegating its own policy decisions of algorithm
negotiation to the PBX. negotiation to the PBX.
When a PBX is between two ZRTP endpoints and is terminating their When a PBX is between two ZRTP endpoints and is terminating their
media streams at the PBX, the PBX presents its own ZID to the two media streams at the PBX, the PBX presents its own ZID to the two
parties, eclipsing the ZIDs of the two parties from each other. For parties, eclipsing the ZIDs of the two parties from each other. For
example, if several different calls are routed through such a PBX to example, if several different calls are routed through such a PBX to
several different ZRTP-enabled phones behind the PBX, only a single several different ZRTP-enabled phones behind the PBX, only a single
ZID is presented to the calling party in every case -- the ZID of the ZID is presented to the calling party in every case -- the ZID of the
PBX itself. PBX itself.
This SAS relay mechanism imposes a cognitive burden on the user, and
the number of intermediaries does not scale up beyond two PBXs
trusted by their respective local users. The ZRTP ecosystem becomes
more elegant if all PBXs and other media intermediaries avoid the
MiTM role whenever possible, as explained in Section 10.1.
The next section describes the initial enrollment procedure that The next section describes the initial enrollment procedure that
establishes a special shared secret, a trusted MiTM key, between a establishes a special shared secret, a trusted MiTM key, between a
PBX and a phone, so that the phone will learn to recognize the PBX as PBX and a phone, so that the phone will learn to recognize the PBX as
a trusted MiTM. a trusted MiTM.
7.3.1. PBX Enrollment and the PBX Enrollment Flag 7.3.1. PBX Enrollment and the PBX Enrollment Flag
Both the PBX and the endpoint need to know when enrollment is taking Both the PBX and the endpoint need to know when enrollment is taking
place. One way of doing this is to set up an enrollment extension on place. One way of doing this is to set up an enrollment extension on
the PBX that a newly configured endpoint would call and establish a the PBX that a newly configured endpoint would call and establish a
skipping to change at page 88, line 47 skipping to change at page 91, line 48
that this puts the PBX in a position to wiretap the calls. that this puts the PBX in a position to wiretap the calls.
It is recommended that a ZRTP client not proceed with the PBX It is recommended that a ZRTP client not proceed with the PBX
enrollment procedure without evidence that a MiTM attack is not enrollment procedure without evidence that a MiTM attack is not
taking place during the enrollment session. It would be especially taking place during the enrollment session. It would be especially
damaging if a MiTM tricks the client into enrolling with the wrong damaging if a MiTM tricks the client into enrolling with the wrong
PBX. That would enable the malevolent MiTM to wiretap all future PBX. That would enable the malevolent MiTM to wiretap all future
calls without arousing suspicion, because he would appear to be calls without arousing suspicion, because he would appear to be
trusted. trusted.
To this end, the client ZRTP endpoint should not proceed with PBX
enrollment unless at least one of the following conditions applies:
o An automated mechanism is used, from Section 7.4. TLS-protected
signaling may be especially well-suited in this special case, for
reasons explained in Section 8.1.1.
o The SAS is verified with a live human on the PBX side during the
enrollment session.
o It is the judgement of the administrator supervising the
enrollment that the threat model and the circumstances indicate a
low probability of a MiTM being present, perhaps because this is
the first call to the PBX, or because the enrollment is conducted
over a relatively safe network. For example, a mobile smart phone
can be enrolled through a protected WiFi local network near the
PBX, before issuing it to an employee for international travel.
This leap of faith is usually justified in benign environments.
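The enrollment conditions listed above amount to a simple gate, which
could be sketched as below. The function and parameter names are
hypothetical; how each input is actually established (automated
mechanism, live SAS comparison, or administrator judgement) is outside
the scope of this sketch.

```python
def may_proceed_with_pbx_enrollment(
    automated_auth_succeeded: bool,
    sas_verified_with_human: bool,
    admin_accepts_leap_of_faith: bool,
) -> bool:
    """Return True if at least one enrollment condition holds."""
    return any((
        automated_auth_succeeded,     # a mechanism from Section 7.4
        sas_verified_with_human,      # SAS compared with a live human
        admin_accepts_leap_of_faith,  # low MiTM risk per administrator
    ))
```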
7.4. Automated Methods of Authenticating the DH Exchange
Alternate methods of authenticating the DH exchange may be used when
interacting with an automated remote system, when no human is
available at the remote endpoint to verbally compare the SAS. Usage
scenarios include leaving or retrieving voicemail, interacting with a
conference bridge, or the PBX security enrollment procedure
(Section 7.3.1).
Here are the automated ways to have ZRTP authenticate the DH
exchange:
o Successful use of the mechanism described in Section 8.1.1, but
only if fully supported by end-to-end integrity-protected delivery
of the a=zrtp-hash in the signaling. This might be achieved via
[RFC4474] or better still, Dan Wing's SIP Identity using Media
Path [SIP-IDENTITY]. This allows authentication of the DH
exchange without human assistance. However, in most usage
scenarios that access an automated system, the entire end-to-end
path consists of only one hop, so TLS provides sufficient
integrity protection in this special case. This is explained in
detail in Section 8.1.1.
o The SAS was previously verified with the remote system in an
earlier session, evidenced by the SAS verified flag (V)
(Section 7.1) at both ends and a matching cache entry. If
circumstances permit this method, it has the advantage of not
requiring a PKI.
o A good signature is received and verified using the digital
signature feature on the SAS hash, as described in Section 7.2, if
this feature is supported. Note that for PBX enrollment, only the
PBX endpoint needs to supply the signature, because the trust
decision is made on the client side only.
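The three automated methods listed above can be combined into a single
check, sketched here under stated assumptions. The AutoAuthEvidence
type and its field names are hypothetical; each field stands for the
outcome of a verification step that a real implementation would have
to perform elsewhere.

```python
from dataclasses import dataclass


@dataclass
class AutoAuthEvidence:
    zrtp_hash_integrity_protected: bool = False  # Section 8.1.1 mechanism
    local_v_flag: bool = False                   # SAS Verified flag, our end
    remote_v_flag: bool = False                  # SAS Verified flag, far end
    cache_entry_matches: bool = False            # cached shared secret matched
    sas_signature_valid: bool = False            # Section 7.2 signature checked


def dh_exchange_authenticated(e: AutoAuthEvidence) -> bool:
    """Return True if any automated authentication method succeeded."""
    # Prior verification needs the V flag at both ends and a matching cache.
    previously_verified = (
        e.local_v_flag and e.remote_v_flag and e.cache_entry_matches
    )
    return (
        e.zrtp_hash_integrity_protected
        or previously_verified
        or e.sas_signature_valid
    )
```

Note that a V flag at only one end, or a V flag without a matching
cache entry, does not satisfy the second method in this sketch.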
In any PKI-backed scheme, there is the disadvantage of having to
decide what to do if the connection fails to authenticate because of
a certificate problem. Warning messages may not be effective because
users become habituated to security warnings [Sunshine] about PKI
certificates. Implementors should carefully weigh the cognitive
burden on the user before they invoke such a heavyweight mechanism.
ZRTP is intended to be a lightweight protocol with a low activation
energy and minimal cognitive burden.
When calling an automated system for the first time, the threat model
and circumstances should be examined to decide if a PKI is the only
way to protect against a MiTM. A reasonable alternative to a PKI
would be to rely on the leap of faith that a MiTM attack is less
likely in the initial session, an assumption that seems to work well
enough for SSH. After the first session, cached shared secrets
should suffice.
8. Signaling Interactions 8. Signaling Interactions
This section discusses how ZRTP, SIP, and SDP work together. This section discusses how ZRTP, SIP, and SDP work together.
Note that ZRTP may be implemented without coupling with the SIP Note that ZRTP may be implemented without coupling with the SIP
signaling. For example, ZRTP can be implemented as a "bump in the signaling. For example, ZRTP can be implemented as a "bump in the
wire" or as a "bump in the stack" in which RTP sent by the SIP User wire" or as a "bump in the stack" in which RTP sent by the SIP User
Agent (UA) is converted to ZRTP. In these cases, the SIP UA will Agent (UA) is converted to ZRTP. In these cases, the SIP UA will
have no knowledge of ZRTP. As a result, the signaling path discovery have no knowledge of ZRTP. As a result, the signaling path discovery
mechanisms introduced in this section should not be definitive -- mechanisms introduced in this section should not be definitive --
skipping to change at page 89, line 27 skipping to change at page 94, line 4
to Section 8.1. to Section 8.1.
Aside from the advantages described in Section 8.1, there are a Aside from the advantages described in Section 8.1, there are a
number of potential uses for this attribute. It is useful when number of potential uses for this attribute. It is useful when
signaling elements would like to know when ZRTP may be utilized by signaling elements would like to know when ZRTP may be utilized by
endpoints. It is also useful if endpoints support multiple methods endpoints. It is also useful if endpoints support multiple methods
of SRTP key management. The ZRTP attribute can be used to ensure of SRTP key management. The ZRTP attribute can be used to ensure
that these key management approaches work together instead of against that these key management approaches work together instead of against
each other. For example, if only one endpoint supports ZRTP, but each other. For example, if only one endpoint supports ZRTP, but
both support another method to key SRTP, then the other method will both support another method to key SRTP, then the other method will
be used instead. When used in parallel, an SRTP secret carried in an be used instead. When the a=crypto [RFC4568] attribute and the
a=keymgt [RFC4567] or a=crypto [RFC4568] attribute can be used as a a=zrtp-hash attribute are both used in parallel, the media can
shared secret for the srtps computation defined in Section 8.2. The transition from SDES-keyed SRTP to ZRTP-keyed SRTP, as described in
ZRTP attribute is also used to signal to an intermediary ZRTP device Section 8.2. The ZRTP attribute is also used to signal to an
not to act as a ZRTP endpoint, as discussed in Section 10. intermediary ZRTP device not to act as a ZRTP endpoint, as discussed
in Section 10 and Section 10.1.
The a=zrtp-hash attribute can only be included in the SDP at the The a=zrtp-hash attribute can only be included in the SDP at the
media level since Hello messages sent in different media streams will media level since Hello messages sent in different media streams will
have unique hashes. have unique hashes. A separate a=zrtp-hash attribute should be
included for each media stream. Both ZRTP endpoints should provide
a=zrtp-hash attributes in their SDP.
The ABNF for the ZRTP attribute is as follows: The ABNF for the ZRTP attribute is as follows:
zrtp-attribute = "a=zrtp-hash:" zrtp-version zrtp-hash-value zrtp-attribute = "a=zrtp-hash:" zrtp-version zrtp-hash-value
zrtp-version = token zrtp-version = token
zrtp-hash-value = 1*(HEXDIG) zrtp-hash-value = 1*(HEXDIG)
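As an illustration only, the ABNF above can be checked with a short parser. This is a minimal sketch, not part of the specification: the function name parse_zrtp_hash is hypothetical, the version token is matched loosely as any run of non-space characters, and it assumes that whitespace separates the version from the hash value, which the ABNF as printed does not state.

```python
import re

# Loose match for: "a=zrtp-hash:" zrtp-version zrtp-hash-value
# Assumption: at least one space separates the version token from the hex
# digits; the ABNF as printed does not show a separator.
_ZRTP_HASH_RE = re.compile(
    r"^a=zrtp-hash:(?P<version>\S+) +(?P<hash>[0-9A-Fa-f]+)$"
)


def parse_zrtp_hash(sdp_line):
    """Return (version, hash_hex) for an a=zrtp-hash line, else None."""
    match = _ZRTP_HASH_RE.match(sdp_line.strip())
    if match is None:
        return None
    return match.group("version"), match.group("hash").lower()


# Hypothetical input; the version token and hash value are placeholders.
print(parse_zrtp_hash("a=zrtp-hash:1.10 0A1B2C3D"))
```

A media-level caller would apply this to each attribute line of each m= section, since one a=zrtp-hash is expected per media stream.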
Here's an example of the ZRTP attribute in an initial SDP offer or Here's an example of the ZRTP attribute in an initial SDP offer or
skipping to change at page 93, line 13 skipping to change at page 97, line 40
integrity becomes more problematic if E.164 numbers [RFC3824] are integrity becomes more problematic if E.164 numbers [RFC3824] are
used in SIP. Thus, real-world implementations of ZRTP endpoints will used in SIP. Thus, real-world implementations of ZRTP endpoints will
continue to depend on SAS authentication for quite some time. Even continue to depend on SAS authentication for quite some time. Even
after there is widespread availability of SIP user agents that offer after there is widespread availability of SIP user agents that offer
integrity protected delivery of SDP attributes, many users will still integrity protected delivery of SDP attributes, many users will still
be faced with the fact that the signaling path may be controlled by be faced with the fact that the signaling path may be controlled by
institutions that do not have the best interests of the end user in institutions that do not have the best interests of the end user in
mind. In those cases, SAS authentication will remain the gold mind. In those cases, SAS authentication will remain the gold
standard for the prudent user. standard for the prudent user.
Even without SIP integrity protection, the Media Security The SIP layer can obtain hop-wise integrity protection simply by
using TLS [RFC5246], but this does not achieve full end-to-end
integrity protection of the a=zrtp-hash attribute in the multi-hop
general case. However, if the entire end-to-end signaling path
consists of only one hop, TLS is good enough, provided the
associated PKI complexity can be contained. This usually covers the
use cases where a client is traversing one TLS hop to access the
automated remote services of its own PBX, where no human is available
to verbally compare the SAS. Examples include leaving or retrieving
voicemail, interacting with an IVR or conference bridge, or
performing the PBX security enrollment procedure (Section 7.3.1).
Note that the risk of trusting the SIP server or PBX becomes moot
when the PBX itself is the intended ZRTP endpoint. Thus, TLS-
protected signaling is recommended and preferred for these special
use cases. TLS-protected signaling is usually justified for its own
separate reasons, to mitigate exposure to traffic analysis, which
means the signaling layer already would have borne the additional
cost of TLS.
Even without SIP end-to-end integrity protection, the Media Security
Requirements [RFC5479] R-ACT-ACT requirement can be met by ZRTP's SAS Requirements [RFC5479] R-ACT-ACT requirement can be met by ZRTP's SAS
mechanism. Although ZRTP may benefit from an integrity-protected SIP mechanism. Although ZRTP may benefit from an integrity-protected SIP
layer, it is fortunate that ZRTP's self-contained MiTM defenses do layer, it is fortunate that ZRTP's self-contained MiTM defenses do
not actually require an integrity-protected SIP layer. ZRTP can not actually require an integrity-protected SIP layer. ZRTP can
bypass the delays and problems that SIP integrity faces, such as bypass the delays and problems that SIP integrity faces, such as
E.164 number usage, and the complexity of building and maintaining a E.164 number usage, and the complexity of building and maintaining a
PKI. PKI.
In contrast, DTLS-SRTP [RFC5764] appears to depend heavily on end-to- In contrast, DTLS-SRTP [RFC5764] appears to depend heavily on end-to-
end integrity protection in the SIP layer. Further, DTLS-SRTP must end integrity protection in the SIP layer. Further, DTLS-SRTP must
bear the additional cost of a signature calculation of its own, in bear the additional cost of a signature calculation of its own, in
addition to the signature calculation the SIP layer uses to achieve addition to the signature calculation the SIP layer uses to achieve
its integrity protection. ZRTP needs no signature calculation of its its integrity protection. ZRTP needs no signature calculation of its
own to leverage the signature calculation carried out in the SIP own to leverage the signature calculation carried out in the SIP
layer. layer.
8.2. Deriving the SRTP Secret (srtps) from the Signaling Layer 8.2. Combining ZRTP With SDES
The signaling layer may negotiate its own SRTP master key and salt,
using the SDP Security Descriptions (SDES [RFC4568]) or [RFC4567].
This section describes how ZRTP may be used in combination with SDES.
Most ZRTP endpoints are expected to use TLS [RFC5246] to protect the
signaling layer, just because it's a good idea to hide the signaling
from eavesdroppers who want to see who you are calling. If TLS is
used for the signaling, SDES incurs no additional cost in packets or
computation.
However, SDES has significant security vulnerabilities if used alone.
Because the SDES keying material is known to the SIP server, SDES is
vulnerable to any SIP server controlled by a wiretapper. For that
reason, SDES must be regarded as a "wiretap-friendly" protocol. ZRTP
does not reveal key material to the signaling layer. Further, most
TLS cipher suites found in the wild lack Perfect Forward Secrecy
(PFS), so SDES would inherit that deficiency. Conversely, ZRTP's
a=zrtp-hash attribute, which is also communicated in the signaling,
does not depend on PFS as this value is already known to the
attacker. Despite these deficiencies of SDES, it is useful against
other threat models, and can complement ZRTP's strengths.
The advantages of combining SDES with ZRTP:
o Protects media in the RTP session that precedes a ZRTP exchange.
For example, the first few packets of video may expose sensitive
information and may be transmitted before a ZRTP exchange
completes.
o If ZRTP fails for any reason (e.g. an opponent blocks it in the
media layer), the media remains protected by SDES-keyed SRTP,
which may provide better confidentiality than having no media
encryption at all.
If and only if SDES is chosen in the SDP answer and both the SDP
offer and answer for the media session contain the a=zrtp-hash
attribute, the SRTP stack MUST, upon completion of the ZRTP exchange,
replace its keying from SDES-provided key material to ZRTP-provided
key material. In this case, both ZRTP endpoints MUST clear the Allow
Clear flag (A) in their respective Confirm messages (Figure 10),
which disables the GoClear mechanism (Section 4.7.2). Also in this
case, ZRTP MAY include imported SDES key material via auxsecret, as
described in Section 8.2.1.
If either endpoint fails to explicitly provide the a=zrtp-hash
attribute via SDP, the SRTP stack MUST NOT be rekeyed by the ZRTP
exchange. Instead, the plaintext media MUST continue to be encrypted
with the keys negotiated via SDES. This SDES-keyed ciphertext media
MUST then be treated as though it were plaintext RTP and enciphered
with a second, independent SRTP context keyed by ZRTP. The result is
that the media will pass through two layers of SRTP encryption, with
the inner layer keyed by SDES, and the outer layer keyed by ZRTP.
This relatively inefficient scenario is expected to be rare, and
applies mainly to "bump-in-the-wire" ZRTP proxies (Section 10) that
have no access to the signaling layer, such as [Zfone]. Note that
this paragraph breaks backward compatibility with RFC 6189 for any
ZRTP devices which negotiate SDES via SDP but fail to send the
a=zrtp-hash attribute in their SDP.
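The two rules above amount to a small decision procedure, sketched below for illustration. The names KeyingPlan and plan_keying are hypothetical. The ZRTP_ONLY branch, for sessions where SDES was not chosen at all, is an assumption that goes beyond the text of this section.

```python
from enum import Enum


class KeyingPlan(Enum):
    ZRTP_ONLY = "zrtp-only"                # assumed case: no SDES in the answer
    REKEY_SDES_TO_ZRTP = "rekey"           # replace SDES keys when ZRTP completes
    LAYER_ZRTP_OVER_SDES = "double-layer"  # inner SRTP by SDES, outer by ZRTP


def plan_keying(sdes_in_answer, offer_has_zrtp_hash, answer_has_zrtp_hash):
    """Choose how SRTP is keyed once the ZRTP exchange completes."""
    if not sdes_in_answer:
        # Assumption: without SDES there is nothing to replace or to wrap.
        return KeyingPlan.ZRTP_ONLY
    if offer_has_zrtp_hash and answer_has_zrtp_hash:
        # Both sides signaled a=zrtp-hash: rekey, and both endpoints clear
        # the Allow Clear flag in their Confirm messages.
        return KeyingPlan.REKEY_SDES_TO_ZRTP
    # Either side omitted a=zrtp-hash: the SRTP stack must not be rekeyed,
    # so the SDES-keyed ciphertext is wrapped in a second SRTP context.
    return KeyingPlan.LAYER_ZRTP_OVER_SDES


print(plan_keying(True, True, False))
```

Only the REKEY_SDES_TO_ZRTP outcome permits importing SDES key material via auxsecret.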
8.2.1. Deriving auxsecret from SDES Key Material
The shared secret calculations defined in Section 4.3 make use of the The shared secret calculations defined in Section 4.3 make use of the
SRTP secret (srtps), if it is provided by the signaling layer. auxsecret, which may be optionally provided by various out-of-band
sources. In this section, we show how auxsecret may be derived from
SDES [RFC4568] keying information that may be present in the
signaling layer.
It is desirable for only one SRTP key negotiation protocol to be If only one SRTP key negotiation protocol is to be used, that
used, and that protocol should be ZRTP. But in the event the protocol should be ZRTP. But in the event the signaling layer
signaling layer negotiates its own SRTP master key and salt, using negotiates its own SRTP master key and salt, using the SDP Security
the SDP Security Descriptions (SDES [RFC4568]) or [RFC4567], it can Descriptions (SDES [RFC4568]) or [RFC4567], it can be passed from the
be passed from the signaling to the ZRTP layer and mixed into ZRTP's signaling to the ZRTP layer and mixed into ZRTP's own shared secret
own shared secret calculations, without compromising security by calculations, without compromising security by creating a dependency
creating a dependency on the signaling for media encryption. on the signaling for media encryption. ZRTP endpoints may make use
of SDES parameters from any signaling protocol that provides it.
ZRTP computes srtps from the SRTP master key and salt parameters If SDES is used in the signaling layer, there are two separate SRTP
provided by the signaling layer in this manner, truncating the result master keys and salts provided by SDES, one for each direction of
to 256 bits: media flow. These two keys and salts are combined here into a single
shared secret, auxsecret, to feed into the mix of ZRTP shared secret
calculations.
srtps = KDF(SRTP master key, "SRTP Secret", auxsecret = KDF(hash(len(srtpmki) || srtpmki ||
(ZIDi || ZIDr || SRTP master salt), 256) len(srtpmkr) || srtpmkr),
"SRTP Secret",
(ZIDi || ZIDr ||
srtpmsi || srtpmsr),
negotiated hash length)
It is expected that the srtps parameter will be rarely computed or In the above formula, the parameters srtpmki and srtpmsi are
used in typical ZRTP endpoints, because it is likely and desirable extracted from the SDES transmitted in the signaling by the SIP
that ZRTP will be the sole means of negotiating SRTP keys, needing no initiator, while srtpmkr and srtpmsr are extracted from the SDES
help from [RFC4568] or [RFC4567]. If srtps is computed, it will be transmitted in the signaling by the SIP responder. These keys and
stored in the auxiliary shared secret auxsecret, defined in salts are in binary form, not the base64 representation used by SDES.
Section 4.3 and used in Section 4.3.1. The explicit length fields, len(), in the above hash are 32-bit big-
endian integers, giving the length in octets of the field that
follows. The length in octets of srtpmki or srtpmkr can only be 16,
24, or 32 if AES is used. srtpmki is the SIP initiator's SRTP
master key, srtpmkr is the SIP responder's SRTP master key, srtpmsi
is the SIP initiator's SRTP master salt, and srtpmsr is the SIP
responder's SRTP master salt. The length of each SRTP master salt
is defined as 112 bits in [RFC3711]. ZIDi is the ZRTP initiator's
ZID, and ZIDr is the ZRTP responder's ZID.
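The auxsecret computation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not a reference implementation. It assumes that the ZRTP KDF is the one defined elsewhere in the specification (an HMAC keyed with KI over a 32-bit counter of 1, the label, a zero octet, the context, and the 32-bit output length in bits), and that SHA-256 is the negotiated hash. The function names and sample inputs are hypothetical.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki, label, context, length_bits, hash_name="sha256"):
    """Assumed ZRTP KDF: HMAC(KI, i || Label || 0x00 || Context || L), i = 1."""
    msg = (struct.pack(">I", 1) + label + b"\x00" + context
           + struct.pack(">I", length_bits))
    digest = hmac.new(ki, msg, hash_name).digest()
    return digest[: length_bits // 8]  # keep the leftmost length_bits bits


def len_prefixed(field):
    """Prefix a field with its length in octets as a 32-bit big-endian integer."""
    return struct.pack(">I", len(field)) + field


def derive_auxsecret(srtpmki, srtpmsi, srtpmkr, srtpmsr, zid_i, zid_r,
                     hash_name="sha256"):
    """Combine both directions' SDES keys and salts (binary, not base64)."""
    h = hashlib.new(hash_name)
    h.update(len_prefixed(srtpmki) + len_prefixed(srtpmkr))
    context = zid_i + zid_r + srtpmsi + srtpmsr
    # Output length is the negotiated hash length, expressed in bits.
    return zrtp_kdf(h.digest(), b"SRTP Secret", context,
                    h.digest_size * 8, hash_name)


# Hypothetical inputs: 16-octet AES-128 keys, 14-octet salts, 12-octet ZIDs.
aux = derive_auxsecret(bytes(16), bytes(14), bytes(range(16)), bytes(14),
                       b"A" * 12, b"B" * 12)
print(len(aux))
```

Note that the key and salt roles follow the SIP initiator and responder, while ZIDi and ZIDr follow the ZRTP roles, so a caller must track the two role assignments separately.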
This mechanism only provides a way to import the associated SDES
keying material from the first media stream in a ZRTP exchange. Any
additional media stream would be keyed by ZRTP's Multistream mode
(Section 4.4.3), and thus would not import any additional SDES keying
material associated with the additional media stream.
The inclusion of SDES keying material is optional for a ZRTP
endpoint. Even if only one endpoint computes auxsecret from the SDES
material, ZRTP protocol completion is still possible if security
policy permits a non-matching auxsecret, as can be seen in
Section 4.3. SDES key material MUST NOT be imported into ZRTP except
in circumstances defined in Section 8.2, when the a=zrtp-hash
attribute is also present in the signaling.
Importing SDES material into ZRTP confers no security enhancement
that is not already conferred by using the
a=zrtp-hash attribute. Both enhance security only if the SIP server
is trustworthy. For this reason, this section may be deprecated in
future versions of this specification.
8.3. Codec Selection for Secure Media 8.3. Codec Selection for Secure Media
Codec selection is negotiated in the signaling layer. If the Codec selection is negotiated in the signaling layer. If the
signaling layer determines that ZRTP is supported by both endpoints, signaling layer determines that ZRTP is supported by both endpoints,
this should provide guidance in codec selection to avoid variable this should provide guidance in codec selection to avoid variable
bitrate (VBR) codecs that leak information. bitrate (VBR) codecs that leak information.
When voice is compressed with a VBR codec, the packet lengths vary When voice is compressed with a VBR codec, the packet lengths vary
depending on the types of sounds being compressed. This leaks a lot depending on the types of sounds being compressed. This leaks a lot
skipping to change at page 94, line 33 skipping to change at page 101, line 35
bitrate depending on the type of sound being compressed. bitrate depending on the type of sound being compressed.
It also appears that voice activity detection (VAD) leaks information It also appears that voice activity detection (VAD) leaks information
about the content of the conversation, but to a lesser extent than about the content of the conversation, but to a lesser extent than
VBR. This effect can be mitigated by lengthening the VAD hangover VBR. This effect can be mitigated by lengthening the VAD hangover
time by a random amount between 1 and 2 seconds, if this is feasible time by a random amount between 1 and 2 seconds, if this is feasible
in your application. Only short bursts of speech would benefit from in your application. Only short bursts of speech would benefit from
lengthening the VAD hangover time. lengthening the VAD hangover time.
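The hangover mitigation above could be sketched as follows. This is only an illustration: the function name extra_vad_hangover_seconds is hypothetical, and the choice of an operating-system randomness source is an assumption, not something this document requires.

```python
import secrets


def extra_vad_hangover_seconds(rng=None):
    """Return a random extra hangover time between 1 and 2 seconds."""
    # Assumption: an OS-backed generator is used so that the added delay is
    # hard for an eavesdropper to predict.
    rng = rng or secrets.SystemRandom()
    return rng.uniform(1.0, 2.0)


print(extra_vad_hangover_seconds())
```

An application would add the returned value to its configured hangover time each time a burst of speech ends.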
The security problems of VBR and VAD are addressed in detail by the The security problems of VBR and VAD are addressed in detail by the
guidelines in [VBR-AUDIO]. It is RECOMMENDED that ZRTP endpoints guidelines in [RFC6562]. It is RECOMMENDED that ZRTP endpoints
follow these guidelines. follow these guidelines.
9. False ZRTP Packet Rejection 9. False ZRTP Packet Rejection
An attacker who is not in the media path may attempt to inject false An attacker who is not in the media path may attempt to inject false
ZRTP protocol packets, possibly to effect a denial-of-service attack ZRTP protocol packets, possibly to effect a denial-of-service attack
or to inject his own media stream into the call. VoIP, by its or to inject his own media stream into the call. VoIP, by its
nature, invites various forms of denial-of-service attacks and nature, invites various forms of denial-of-service attacks and
requires protocol features to reject such attacks. While bogus SRTP requires protocol features to reject such attacks. While bogus SRTP
packets may be easily rejected via the SRTP auth tag field, that can packets may be easily rejected via the SRTP auth tag field, that can
skipping to change at page 98, line 5 skipping to change at page 105, line 5
(IVR), voicemail system, or speech recognition system. The display (IVR), voicemail system, or speech recognition system. The display
of SAS strings to users should be disabled in these cases. of SAS strings to users should be disabled in these cases.
It is possible that an intermediary device acting as a ZRTP endpoint It is possible that an intermediary device acting as a ZRTP endpoint
might still receive ZRTP Hello and other messages from the inside might still receive ZRTP Hello and other messages from the inside
endpoint. This could occur if there is another inline ZRTP device endpoint. This could occur if there is another inline ZRTP device
that does not include the ZRTP SDP attribute flag. An intermediary that does not include the ZRTP SDP attribute flag. An intermediary
acting as a ZRTP endpoint receiving ZRTP Hello and other messages acting as a ZRTP endpoint receiving ZRTP Hello and other messages
from the inside endpoint MUST NOT pass these ZRTP messages. from the inside endpoint MUST NOT pass these ZRTP messages.
10.1. On Reducing PBX MiTM Behavior
ZRTP is designed to negotiate session keys directly between two
users, and to detect a man-in-the-middle (MiTM) attack. A PBX often
tries to be a MiTM, as part of its natural functionality. This
creates a conflict between the objectives of a ZRTP client and the
objectives of a PBX. This conflict may be resolved by using the
trusted MiTM mechanism (Section 7.3), but this adds complexity and
only works well between users of a single trusted PBX. It can be
stretched further to handle calls between two PBXs trusted by their
respective local users, but breaks down if more intermediaries are
involved. It also imposes a cognitive burden on the user, who may
not be aware of the security properties or trustworthiness of all the
intermediaries.
The client usually prefers to negotiate ZRTP end-to-end with the
other client, without exposing the keys or plaintext to the PBX, and
use the PBX as a trusted MiTM only when necessary. A PBX should
allow this whenever possible, even if the clients trust the PBX.
The PBX may avoid acting as a MiTM either by allowing the media to
completely bypass the PBX, with the two clients routing their media
peer-to-peer, or by acting as a media relay in a manner similar to a
TURN server. The advantages of the latter approach are mainly to
facilitate NAT traversal. If only one of the two parties is a ZRTP
endpoint, and the PBX is capable of serving as a ZRTP endpoint, the
PBX MUST attempt to negotiate a ZRTP session with the client that
supports ZRTP, so that at least one leg of the call is secure. This
is a far better choice than directly connecting the media streams
between a ZRTP client and a non-ZRTP client, and having the ZRTP
negotiation fail completely.
The PBX SHOULD make best efforts to not act as a MiTM if the PBX has
evidence that both VoIP clients support ZRTP. Evidence of ZRTP
support is best indicated by the presence of the optional a=zrtp-hash
attribute (Section 8) in the signaling layer of both the caller and
callee. Evidence of ZRTP support or non-support in the clients may
also be available to the PBX in the form of configuration information
stored in the PBX.
If the client sends the a=zrtp-hash attribute, and the PBX acts as a
MiTM nonetheless, the client SHOULD alert the user to the fact that
the security level is less than expected. The client can readily
detect this condition by receiving an SASrelay message (Figure 16)
from the PBX. The severity of the alert is left to the application,
which would be relying on the trusted MiTM mechanism.
A PBX should not act as a MiTM unless there is a compelling reason to
do so. Transcoding is fundamentally incompatible with end-to-end
secure media. It should be done only when there is no alternative,
when the two ZRTP endpoints do not share a common codec. ZRTP
clients should implement a repertoire of codecs sufficient to
minimize the need for PBX transcoding. Transcoding between two ZRTP
clients forces a PBX to act as a MiTM. If only one media stream
needs transcoding in a multimedia session, all of the media streams
in that session must be handled in MiTM mode.
If there is more than one media stream in a session between two ZRTP
endpoints, a PBX MUST either act as a MiTM for all of them, or for
none of them. This is because all the media streams between two ZRTP
endpoints must share the same SAS (Section 7), due to the use of
Multistream mode (Section 3.1.3). This includes the related RTCP/
SRTCP streams.
A PBX may forgo end-to-end security and choose MiTM mode for policy
reasons. An institution may choose to present a single ZRTP endpoint
to the outside world, through its locally trusted PBX. Or, a client
application may explicitly request a PBX to act as a MiTM for a
particular call, for example via a special dial prefix.
It's especially harmful if a PBX that lacks its own ZRTP stack
performs unnecessary transcoding between two ZRTP endpoints, ruling
out the possibility of any secure connection at all. Not even the
trusted MiTM mechanism is available, because this PBX is incapable of
acting as a back-to-back ZRTP MiTM. Even if the PBX avoids
transcoding, it might terminate the media streams for other reasons,
reasons that are likely to be less important than the clients' need
for a secure call. If this kind of PBX sees the a=zrtp-hash
attribute in the caller's signaling, and the two clients share at
least one common codec, the PBX should at least attempt to do no
harm, and get out of the way of ZRTP. Let the users speak Navajo
with each other if they want.
A common usage scenario for a ZRTP-enabled PBX is for a VoIP client
to call a PBX trusted by the client, in order to bridge to a PSTN
gateway in or near the PBX. In such a case, the PBX SHOULD act as a
ZRTP endpoint so that the VoIP leg of the call is secured. The call
should be regarded as not secure past the ZRTP endpoint closest to
the PSTN gateway. If the PSTN gateway is distant from the PBX, the
PBX should provide a secure connection to the PSTN gateway, perhaps
through a VPN connection. Even then, the call becomes vulnerable
when it enters the PSTN. Nonetheless, this would be appropriate for
a caller who originates his ZRTP session from a hostile environment,
but is less concerned about the wiretap threat near the PSTN gateway.
11. The ZRTP Disclosure Flag 11. The ZRTP Disclosure Flag
There are no back doors defined in the ZRTP protocol specification. There are no back doors defined in the ZRTP protocol specification.
The designers of ZRTP would like to discourage back doors in ZRTP- The designers of ZRTP would like to discourage back doors in ZRTP-
enabled products. However, despite the lack of back doors in the enabled products. However, despite the lack of back doors in the
actual ZRTP protocol, it must be recognized that a ZRTP implementer actual ZRTP protocol, it must be recognized that a ZRTP implementer
might still deliberately create a rogue ZRTP-enabled product that might still deliberately create a rogue ZRTP-enabled product that
implements a back door outside the scope of the ZRTP protocol. For implements a back door outside the scope of the ZRTP protocol. For
example, they could create a product that discloses the SRTP session example, they could create a product that discloses the SRTP session
key generated using ZRTP out-of-band to a third party. They may even key generated using ZRTP out-of-band to a third party. They may even
skipping to change at page 100, line 50 skipping to change at page 109, line 50
several ZIDs, and a single ZID may be associated with several SIP several ZIDs, and a single ZID may be associated with several SIP
URIs on the same client. URIs on the same client.
Not only that, but ZRTP is independent of which signaling protocol is Not only that, but ZRTP is independent of which signaling protocol is
used. It works equally well with SIP, Jingle, H.323, or any used. It works equally well with SIP, Jingle, H.323, or any
proprietary signaling protocol. Thus, a ZRTP ZID has little to do proprietary signaling protocol. Thus, a ZRTP ZID has little to do
with SIP, per se, which means it has little to do with a SIP URI. with SIP, per se, which means it has little to do with a SIP URI.
Even though a ZID is associated with a device, not a human, it is Even though a ZID is associated with a device, not a human, it is
often the case that a ZRTP endpoint is controlled mainly by a often the case that a ZRTP endpoint is controlled mainly by a
particular human. For example, it may be a mobile phone. To get the particular human. For example, it may be a mobile phone. For the
full benefit of the key continuity features, a local cache entry (and key continuity features (Section 15.1) to be effective, a local cache
thus a ZID) should be associated with some sort of name of the remote entry (and thus a ZID) should be associated with some sort of name of
party. That name could be a human name, or it could be made more the remote party. That name could be a human name, or it could be
precise by specifying which ZRTP endpoint he's using. For example made more precise by specifying which ZRTP endpoint he's using. For
"Jon Callas", or "Jon Callas on his iPhone", or "Jon on his iPad", or example "Jon Callas", or "Jon Callas on his iPhone", or "Jon on his
"Alice on her office phone". These name strings can be stored in the iPad", or "Alice on her office phone". These name strings can be
local cache, indexed by ZID, and may have been initially provided by stored in the local cache, indexed by ZID, and may have been
the local user by hand. Or the local cache entry may contain a initially provided by the local user by hand. Or the local cache
pointer to an entry in the local address book. When a secure session entry may contain a pointer to an entry in the local address book.
is established, if a prior session has established a cache entry, and When a secure session is established, if a prior session has
the new session has a matching cache entry indexed by the same ZID, established a cache entry, and the new session has a matching cache
and the SAS has been previously verified, the person's name stored in entry indexed by the same ZID, and the SAS has been previously
that cache entry should be displayed. verified, the person's name stored in that cache entry should be
displayed.
It is absolutely essential to have these human-readable names
associated with cache entries. If the cache is implemented without
them, it opens the door to a simple form of MiTM attack. An attacker
who has previously established a cache entry with both parties (or
simply captures a phone that has) can later act as a MiTM between
those two parties without triggering a cache mismatch, which means
the users will not be alerted to do an SAS compare. This MiTM attack
would be easily detected if the name stored with the cache entry is
displayed for the user, so that the user can readily see that he is
not connected to the remote party he expected.
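The display rule described above can be sketched as a cache lookup. The cache layout and the names CacheEntry and name_to_display are hypothetical. The sketch covers only the name lookup and assumes that the cached shared secrets have already been matched by the ZRTP exchange.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    name: str            # human-readable name, e.g. entered by the local user
    sas_verified: bool   # True once the SAS was verified in a prior session


def name_to_display(cache: Dict[bytes, CacheEntry],
                    remote_zid: bytes) -> Optional[str]:
    """Return the stored name only for a known ZID with a verified SAS."""
    entry = cache.get(remote_zid)
    if entry is None or not entry.sas_verified:
        return None  # nothing trustworthy to show; an SAS compare is needed
    return entry.name


# Hypothetical cache contents, indexed by 12-octet ZIDs.
cache = {b"A" * 12: CacheEntry("Alice on her office phone", True)}
print(name_to_display(cache, b"A" * 12))
```

Showing the stored name, rather than merely reporting a cache match, is what lets the user notice being connected to a different cached party than the one expected.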
If the remote ZID originates from a PBX, the displayed name would be If the remote ZID originates from a PBX, the displayed name would be
the name of that PBX, which might be the name of the company who owns the name of that PBX, which might be the name of the company who owns
that PBX. that PBX.
If it is desirable to associate some key material with a particular If it is desirable to associate some key material with a particular
AOR, digital signatures (Section 7.2) may be used, with public key AOR, digital signatures (Section 7.2) may be used, with public key
certificates that associate the signature key with an AOR. If more certificates that associate the signature key with an AOR. If more
than one ZRTP endpoint shares the same AOR, they may all use the same than one ZRTP endpoint shares the same AOR, they may all use the same
signature key and provide the same public key certificate with their signature key and provide the same public key certificate with their
skipping to change at page 109, line 33 skipping to change at page 118, line 48
"Requirements and Analysis of Media Security Management "Requirements and Analysis of Media Security Management
Protocols", RFC 5479, April 2009. Protocols", RFC 5479, April 2009.
[RFC5759] Solinas, J. and L. Zieglar, "Suite B Certificate and [RFC5759] Solinas, J. and L. Zieglar, "Suite B Certificate and
Certificate Revocation List (CRL) Profile", RFC 5759, Certificate Revocation List (CRL) Profile", RFC 5759,
January 2010. January 2010.
[RFC6188] McGrew, D., "The Use of AES-192 and AES-256 in Secure [RFC6188] McGrew, D., "The Use of AES-192 and AES-256 in Secure
RTP", RFC 6188, March 2011. RTP", RFC 6188, March 2011.
[RFC6637] Jivsov, A., "Elliptic Curve Cryptography (ECC) in
OpenPGP", RFC 6637, June 2012.
[FIPS-140-2-Annex-A] [FIPS-140-2-Annex-A]
"Annex A: Approved Security Functions for FIPS PUB 140-2", "Annex A: Approved Security Functions for FIPS PUB 140-2",
NIST FIPS PUB 140-2 Annex A, January 2011. NIST FIPS PUB 140-2 Annex A, January 2011.
[FIPS-140-2-Annex-D] [FIPS-140-2-Annex-D]
"Annex D: Approved Key Establishment Techniques for FIPS "Annex D: Approved Key Establishment Techniques for FIPS
PUB 140-2", NIST FIPS PUB 140-2 Annex D, January 2011. PUB 140-2", NIST FIPS PUB 140-2 Annex D, January 2011.
[FIPS-180-3] [FIPS-180-3]
"Secure Hash Standard (SHS)", NIST FIPS PUB 180-3, October "Secure Hash Standard (SHS)", NIST FIPS PUB 180-3, October
skipping to change at page 112, line 11 skipping to change at page 121, line 30
BCP 119, RFC 4579, August 2006. BCP 119, RFC 4579, August 2006.
[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, [RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
January 2008. January 2008.
[RFC5245] Rosenberg, J., "Interactive Connectivity Establishment [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment
(ICE): A Protocol for Network Address Translator (NAT) (ICE): A Protocol for Network Address Translator (NAT)
Traversal for Offer/Answer Protocols", RFC 5245, Traversal for Offer/Answer Protocols", RFC 5245,
April 2010. April 2010.
[RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security
(TLS) Protocol Version 1.2", RFC 5246, August 2008.
[RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer
Security (DTLS) Extension to Establish Keys for the Secure Security (DTLS) Extension to Establish Keys for the Secure
Real-time Transport Protocol (SRTP)", RFC 5764, May 2010. Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.
[RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand [RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand
Key Derivation Function (HKDF)", RFC 5869, May 2010. Key Derivation Function (HKDF)", RFC 5869, May 2010.
[RFC6090] McGrew, D., Igoe, K., and M. Salter, "Fundamental Elliptic [RFC6090] McGrew, D., Igoe, K., and M. Salter, "Fundamental Elliptic
Curve Cryptography Algorithms", RFC 6090, February 2011. Curve Cryptography Algorithms", RFC 6090, February 2011.
[RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of
Variable Bit Rate Audio with Secure RTP", RFC 6562,
March 2012.
[SRTP-AES-GCM] [SRTP-AES-GCM]
McGrew, D., "AES-GCM and AES-CCM Authenticated Encryption McGrew, D., "AES-GCM and AES-CCM Authenticated Encryption
in Secure RTP (SRTP)", Work in Progress, January 2011. in Secure RTP (SRTP)", Work in Progress, January 2011.
[ECC-OpenPGP]
Jivsov, A., "ECC in OpenPGP", Work in Progress,
March 2011.
[VBR-AUDIO]
Perkins, C. and J. Valin, "Guidelines for the use of
Variable Bit Rate Audio with Secure RTP", Work
in Progress, December 2010.
[SIP-IDENTITY] [SIP-IDENTITY]
Wing, D. and H. Kaplan, "SIP Identity using Media Path", Wing, D. and H. Kaplan, "SIP Identity using Media Path",
Work in Progress, February 2008. Work in Progress, February 2008.
[NIST-SP800-57-Part1] [NIST-SP800-57-Part1]
Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid, Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid,
"Recommendation for Key Management - Part 1: General "Recommendation for Key Management - Part 1: General
(Revised)", NIST Special Publication 800-57 - Part (Revised)", NIST Special Publication 800-57 - Part
1 Revised March 2007. 1 Revised March 2007.
 End of changes. 44 change blocks. 
171 lines changed or deleted; 599 lines changed or added.
