# Protocol Buffers in Swift

## Objective

This document describes the user-facing API and internal implementation of
proto2 and proto3 messages in Apple's Swift programming language.

One of the key goals of protobufs is to provide idiomatic APIs for each
language. In that vein, **interoperability with Objective-C is a non-goal of
this proposal.** Protobuf users who need to pass messages between Objective-C
and Swift code in the same application should use the existing Objective-C proto
library. The goal of the effort described here is to provide an API for protobuf
messages that uses features specific to Swift (optional types, algebraic
enumerated types, value types, and so forth) in a natural way that will delight,
rather than surprise, users of the language.

## Naming

*   By convention, both typical protobuf message names and Swift structs/classes
    are `UpperCamelCase`, so for most messages, the name of a message can be the
    same as the name of its generated type. (However, see the discussion below
    about prefixes under [Packages](#packages).)

*   Enum cases in protobufs typically are `UPPERCASE_WITH_UNDERSCORES`, whereas
    in Swift they are `lowerCamelCase` (as of the Swift 3 API design
    guidelines). We will transform the names to match Swift convention, using
    a whitelist similar to the Objective-C compiler plugin to handle commonly
    used acronyms.

*   Typical fields in proto messages are `lowercase_with_underscores`, while in
    Swift they are `lowerCamelCase`. We will transform the names to match
    Swift convention by removing the underscores and uppercasing the subsequent
    letter.

## Swift reserved words

Swift has a large set of reserved words: some always reserved and some
contextually reserved (that is, they can be used as identifiers in contexts
where they would not be confused).
As of Swift 2.2, the set of always-reserved
words is:

```
_, #available, #column, #else, #elseif, #endif, #file, #function, #if, #line,
#selector, as, associatedtype, break, case, catch, class, continue, default,
defer, deinit, do, dynamicType, else, enum, extension, fallthrough, false, for,
func, guard, if, import, in, init, inout, internal, is, let, nil, operator,
private, protocol, public, repeat, rethrows, return, self, Self, static,
struct, subscript, super, switch, throw, throws, true, try, typealias, var,
where, while
```

The set of contextually reserved words is:

```
associativity, convenience, dynamic, didSet, final, get, infix, indirect,
lazy, left, mutating, none, nonmutating, optional, override, postfix,
precedence, prefix, Protocol, required, right, set, Type, unowned, weak,
willSet
```

It is possible to use any reserved word as an identifier by escaping it with
backticks (for example, ``let `class` = 5``). Other name-mangling schemes would
require us to transform the names themselves (for example, by appending an
underscore), which requires us to then ensure that the new name does not collide
with something else in the same namespace.

While the backtick feature may not be widely known by all Swift developers, a
small amount of user education can address this and it seems like the best
approach. We can unconditionally surround all property names with backticks to
simplify generation.

Some remapping will still be required, though, to avoid collisions between
generated properties and the names of methods and properties defined in the base
protocol/implementation of messages.

# Features of Protocol Buffers

This section describes how the features of the protocol buffer syntaxes (proto2
and proto3) map to features in Swift: what the code generated from a proto will
look like, and how it will be implemented in the underlying library.

## Packages

Modules are the main form of namespacing in Swift, but they are not declared
using syntactic constructs like namespaces in C++ or packages in Java. Instead,
they are tied to build targets in Xcode (or, in the future with open-source
Swift, declarations in a Swift Package Manager manifest). They also do not
easily support nesting submodules (Clang module maps support this, but pure
Swift does not yet provide a way to define submodules).

We will generate types with fully-qualified underscore-delimited names. For
example, a message `Baz` in package `foo.bar` would generate a struct named
`Foo_Bar_Baz`. For each fully-qualified proto message, there will be exactly one
unique type symbol emitted in the generated binary.

Users are likely to balk at the ugliness of underscore-delimited names for every
generated type. To improve upon this situation, we will add a new string
file-level option, `swift_package_typealias`, that can be added to `.proto`
files. When present, this will cause `typealias`es to be added to the generated
Swift messages that replace the package name prefix with the provided string.
For example, the following `.proto` file:

```protobuf
option swift_package_typealias = "FBP";
package foo.bar;

message Baz {
  // Message fields
}
```

would generate the following Swift source:

```swift
public struct Foo_Bar_Baz {
  // Message fields and other methods
}

// Declared public so that code importing the module can refer to the alias.
public typealias FBPBaz = Foo_Bar_Baz
```

It should be noted that this type alias is recorded in the generated
`.swiftmodule` so that code importing the module can refer to it, but it does
not cause a new symbol to be generated in the compiled binary (i.e., we do not
risk compiled size bloat by adding `typealias`es for every type).
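
The naming scheme above can be sketched as a pair of small helpers. This is only an illustration in current Swift syntax (not Swift 2.2): the function names `generatedTypeName` and `aliasName` are hypothetical, and the capitalization rule (uppercase the first letter of each package component) is an assumption inferred from the `foo.bar` example.

```swift
// Hypothetical helpers illustrating the naming scheme; not part of any real generator.

/// Builds the fully-qualified, underscore-delimited type name,
/// e.g. package "foo.bar" + message "Baz" gives "Foo_Bar_Baz".
func generatedTypeName(package: String, message: String) -> String {
    let components = package.split(separator: ".").map { component -> String in
        // Assumption: uppercase only the first letter of each package component.
        String(component.prefix(1)).uppercased() + String(component.dropFirst())
    }
    return (components + [message]).joined(separator: "_")
}

/// Builds the short alias implied by `swift_package_typealias`,
/// e.g. prefix "FBP" + message "Baz" gives "FBPBaz".
func aliasName(prefix: String, message: String) -> String {
    return prefix + message
}

print(generatedTypeName(package: "foo.bar", message: "Baz"))  // Foo_Bar_Baz
print(aliasName(prefix: "FBP", message: "Baz"))               // FBPBaz
```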

Other strategies to handle packages that were considered and rejected can be
found in [Appendix A](#appendix-a-rejected-strategies-to-handle-packages).

## Messages

Proto messages are natural value types and we will generate messages as structs
instead of classes. Users will benefit from Swift's built-in behavior with
regard to mutability. We will define a `ProtoMessage` protocol that defines the
common methods and properties for all messages (such as serialization) and also
lets users treat messages polymorphically. Any shared method implementations
that do not differ between individual messages can be implemented in a protocol
extension.

The backing storage itself for fields of a message will be managed by a
`ProtoFieldStorage` type that uses an internal dictionary keyed by field number,
and whose values are the value of the field with that number (up-cast to Swift's
`Any` type). This class will provide type-safe getters and setters so that
generated messages can manipulate this storage, and core serialization logic
will live here as well. Furthermore, factoring the storage out into a separate
type, rather than inlining the fields as stored properties in the message
itself, lets us implement copy-on-write efficiently to support passing around
large messages. (Furthermore, because the messages themselves are value types,
inlining fields is not possible if the fields are submessages of the same type,
or a type that eventually includes a submessage of the same type.)

### Required fields (proto2 only)

Required fields in proto2 messages seem like they could be naturally represented
by non-optional properties in Swift, but this presents some problems/concerns.

Serialization APIs permit partial serialization, which allows required fields to
remain unset.
Furthermore, other language APIs still provide `has*` and `clear*`
methods for required fields, and knowing whether a property has a value when the
message is in memory is still useful.

For example, an e-mail draft message may have the "to" address required on the
wire, but when the user constructs it in memory, it doesn't make sense to force
a value until they provide one. We only want to force a value to be present when
the message is serialized to the wire. Using non-optional properties prevents
this use case, and makes client usage awkward because the user would be forced
to select a sentinel or placeholder value for any required fields at the time
the message was created.

### Default values

In proto2, fields can have a default value specified that may be a value other
than the default value for its corresponding language type (for example, a
default value of 5 instead of 0 for an integer). When reading a field that is
not explicitly set, the user expects to get that value. This makes Swift
optionals (i.e., `Foo?`) unsuitable for fields in general. Unfortunately, we
cannot implement our own enhanced optional type without severely complicating
usage (Swift's use of type inference and its lack of implicit conversions would
require manual unwrapping of every property value).

Instead, we can use **implicitly unwrapped optionals.** For example, a property
generated for a field of type `int32` would have Swift type `Int32!`. These
properties would behave with the following characteristics, which mirror the
nil-resettable properties used elsewhere in Apple's SDKs (for example,
`UIView.tintColor`):

*   Assigning a non-nil value to a property sets the field to that value.
*   Assigning nil to a property clears the field (its internal representation is
    nilled out).
*   Reading the value of a property returns its value if it is set, or returns
    its default value if it is not set. Reading a property never returns nil.

The final point in the list above implies that the optional cannot be checked to
determine if the field is set to a value other than its default: it will never
be nil. Instead, we must provide `has*` methods for each field to allow the user
to check this. These methods will be public in proto2. In proto3, these methods
will be private (if generated at all), since the user can test the returned
value against the zero value for that type.

### Autocreation of nested messages

For convenience, dotting into an unset field representing a nested message will
return an instance of that message with default values. As in the Objective-C
implementation, this does not actually cause the field to be set until the
returned message is mutated. Fortunately, thanks to the way mutability of value
types is implemented in Swift, the language automatically handles the
reassignment-on-mutation for us. A static singleton instance containing default
values can be associated with each message that can be returned when reading, so
copies are only made by the Swift runtime when mutation occurs. For example,
given the following proto:

```protobuf
message Node {
  Node child = 1;
  string value = 2 [default = "foo"];
}
```

The following Swift code would act as commented, where setting deeply nested
properties causes the copies and mutations to occur as the assignment statement
is unwound:

```swift
var node = Node()

let s = node.child.child.value
// 1. node.child returns the "default Node".
// 2. Reading .child on the result of (1) returns the same default Node.
// 3. Reading .value on the result of (2) returns the default value "foo".

node.child.child.value = "bar"
// 4. Setting .value on the default Node causes a copy to be made and sets
//    the property on that copy. Subsequently, the language updates the
//    value of "node.child.child" to point to that copy.
// 5. Updating "node.child.child" in (4) requires another copy, because
//    "node.child" was also the instance of the default node. The copy is
//    assigned back to "node.child".
// 6. Setting "node.child" in (5) is a simple value reassignment, since
//    "node" is a mutable var.
```

In other words, the generated messages do not internally have to manage parental
relationships to backfill the appropriate properties on mutation. Swift provides
this for free.

## Scalar value fields

Proto scalar value fields will map to Swift types in the following way:

.proto Type | Swift Type
----------- | -------------------
`double`    | `Double`
`float`     | `Float`
`int32`     | `Int32`
`int64`     | `Int64`
`uint32`    | `UInt32`
`uint64`    | `UInt64`
`sint32`    | `Int32`
`sint64`    | `Int64`
`fixed32`   | `UInt32`
`fixed64`   | `UInt64`
`sfixed32`  | `Int32`
`sfixed64`  | `Int64`
`bool`      | `Bool`
`string`    | `String`
`bytes`     | `Foundation.NSData`

The proto spec defines a number of integral types that map to the same Swift
type; for example, `intXX`, `sintXX`, and `sfixedXX` are all signed integers,
and `uintXX` and `fixedXX` are both unsigned integers. No other language
implementation distinguishes these further, so we do not do so either. The
rationale is that the various types only serve to distinguish how the value is
**encoded on the wire**; once loaded in memory, the user is not concerned about
these variations.
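
To illustrate why these types can share one in-memory representation, the sketch below encodes the same `Int32` value the way the protobuf wire format specifies for `int32` (sign-extended varint) and for `sint32` (ZigZag followed by varint). The helper names are hypothetical and the code uses current Swift syntax; it is not part of the library proposed here.

```swift
// Illustrative helpers only; the names are hypothetical.

/// Encodes an unsigned value as a base-128 varint (least significant group first).
func varintBytes(_ value: UInt64) -> [UInt8] {
    var remaining = value
    var bytes: [UInt8] = []
    while remaining >= 0x80 {
        bytes.append(UInt8(remaining & 0x7F) | 0x80)  // set the continuation bit
        remaining >>= 7
    }
    bytes.append(UInt8(remaining))
    return bytes
}

/// ZigZag maps signed values to unsigned ones so that small magnitudes stay small.
func zigZagEncode(_ value: Int32) -> UInt32 {
    return UInt32(bitPattern: (value << 1) ^ (value >> 31))
}

let value: Int32 = -1

// `int32`: the value is sign-extended to 64 bits before varint encoding.
let asInt32 = varintBytes(UInt64(bitPattern: Int64(value)))

// `sint32`: the value is ZigZag-encoded first, then varint-encoded.
let asSInt32 = varintBytes(UInt64(zigZagEncode(value)))

print(asInt32.count)  // 10
print(asSInt32)       // [1]
```

In both cases the user-visible value is the same `Int32`; only the bytes on the wire differ.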

Swift's lack of implicit conversions among types will make it slightly annoying
to use these types in a context expecting an `Int`, or vice-versa, but since
this is a data-interchange format with explicitly-sized fields, we should not
hide that information from the user. Users will have to explicitly write
`Int(message.myField)`, for example.

## Embedded message fields

Embedded message fields can be represented using an optional variable of the
generated message type. Thus, the message

```protobuf
message Foo {
  Bar bar = 1;
}
```

would be represented in Swift as

```swift
public struct Foo: ProtoMessage {
  public var bar: Bar! {
    get { ... }
    set { ... }
  }
}
```

If the user explicitly sets `bar` to nil, or if it was never set when read from
the wire, retrieving the value of `bar` would return a default, statically
allocated instance of `Bar` containing default values for its fields. This
achieves the desired behavior for default values in the same way that scalar
fields are designed, and also allows users to deep-drill into complex object
graphs to get or set fields without checking for nil at each step.

## Enum fields

The design and implementation of enum fields will differ somewhat drastically
depending on whether the message being generated is a proto2 or proto3 message.

### proto2 enums

For proto2, we do not need to be concerned about unknown enum values, so we can
use the simple raw-value enum syntax provided by Swift.
So the following enum in
proto2:

```protobuf
enum ContentType {
  TEXT = 0;
  IMAGE = 1;
}
```

would become this Swift enum:

```swift
public enum ContentType: Int32, NilLiteralConvertible {
  case text = 0
  case image = 1

  public init(nilLiteral: ()) {
    self = .text
  }
}
```

See below for the discussion about `NilLiteralConvertible`.

### proto3 enums

For proto3, we need to be able to preserve unknown enum values that may come
across the wire so that they can be written back if unmodified. We can
accomplish this in Swift by using a case with an associated value for unknowns.
So the following enum in proto3:

```protobuf
enum ContentType {
  TEXT = 0;
  IMAGE = 1;
}
```

would become this Swift enum:

```swift
public enum ContentType: RawRepresentable, NilLiteralConvertible {
  case text
  case image
  case UNKNOWN_VALUE(Int32)

  public typealias RawValue = Int32

  public init(nilLiteral: ()) {
    self = .text
  }

  public init(rawValue: RawValue) {
    switch rawValue {
      case 0: self = .text
      case 1: self = .image
      default: self = .UNKNOWN_VALUE(rawValue)
    }
  }

  public var rawValue: RawValue {
    switch self {
      case .text: return 0
      case .image: return 1
      case .UNKNOWN_VALUE(let value): return value
    }
  }
}
```

Note that the use of a parameterized case prevents us from inheriting from the
raw `Int32` type; Swift does not allow an enum with a raw type to have cases
with arguments. Instead, we must implement the raw value initializer and
computed property manually. The `UNKNOWN_VALUE` case is explicitly chosen to be
"ugly" so that it stands out and does not conflict with other possible case
names.
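
As a rough, self-contained sketch of how an unknown value survives a round trip under this design, the following uses current Swift syntax and omits the `NilLiteralConvertible` conformance for brevity. It is an illustration under those assumptions, not the generated code itself.

```swift
enum ContentType: RawRepresentable {
    case text
    case image
    case UNKNOWN_VALUE(Int32)

    typealias RawValue = Int32

    // Non-failable: every Int32 maps to some case, so no value is dropped.
    init(rawValue: Int32) {
        switch rawValue {
        case 0: self = .text
        case 1: self = .image
        default: self = .UNKNOWN_VALUE(rawValue)
        }
    }

    var rawValue: Int32 {
        switch self {
        case .text: return 0
        case .image: return 1
        case .UNKNOWN_VALUE(let value): return value
        }
    }
}

// A number this client does not know about (for example, added by a newer schema).
let received = ContentType(rawValue: 7)

switch received {
case .text, .image:
    print("known case")
case .UNKNOWN_VALUE(let raw):
    print("unknown case with raw value \(raw)")
}

// Writing the value back preserves the original number.
print(received.rawValue)  // 7
```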

Using this approach, proto3 consumers must always have a default case or handle
the `.UNKNOWN_VALUE` case to satisfy case exhaustion in a switch statement; the
Swift compiler considers it an error if switch statements are not exhaustive.

### NilLiteralConvertible conformance

This is required to clean up the usage of enum-typed properties in switch
statements. Unlike other field types, enum properties cannot be
implicitly-unwrapped optionals without requiring that uses in switch statements
be explicitly unwrapped. For example, if we consider a message with the enum
above, this usage will fail to compile:

```swift
// Without NilLiteralConvertible conformance on ContentType
public struct SomeMessage: ProtoMessage {
  public var contentType: ContentType! { ... }
}

// ERROR: no case named text or image
switch someMessage.contentType {
  case .text: { ... }
  case .image: { ... }
}
```

Even though our implementation guarantees that `contentType` will never be nil,
if it is an optional type, its cases would be `some` and `none`, not the cases
of the underlying enum type. In order to use it in this context, the user must
write `someMessage.contentType!` in their switch statement.

Making the enum itself `NilLiteralConvertible` permits us to make the property
non-optional, so the user can still set it to nil to clear it (i.e., reset it to
its default value), while eliminating the need to explicitly unwrap it in a
switch statement.

```swift
// With NilLiteralConvertible conformance on ContentType
public struct SomeMessage: ProtoMessage {
  // Note that the property type is no longer optional
  public var contentType: ContentType { ... }
}

// OK: Compiles and runs as expected
switch someMessage.contentType {
  case .text: { ... }
  case .image: { ... }
}

// The enum can be reset to its default value this way
someMessage.contentType = nil
```

One minor oddity with this approach is that nil will be auto-converted to the
default value of the enum in any context, not just field assignment. In other
words, this is valid:

```swift
func foo(contentType: ContentType) { ... }
foo(nil) // Inside foo, contentType == .text
```

That being said, the advantage of being able to simultaneously support
nil-resettability and switch-without-unwrapping outweighs this side effect,
especially if appropriately documented. It is our hope that a new form of
resettable properties will be added to Swift that eliminates this inconsistency.
Some community members have already drafted or sent proposals for review that
would benefit our designs:

*   [SE-0030: Property Behaviors](https://github.com/apple/swift-evolution/blob/master/proposals/0030-property-behavior-decls.md)
*   [Drafted: Resettable Properties](https://github.com/patters/swift-evolution/blob/master/proposals/0000-resettable-properties.md)

### Enum aliases

The `allow_alias` option in protobuf slightly complicates the use of Swift enums
to represent that type, because raw values of cases in an enum must be unique.
Swift lets us define static variables in an enum that alias actual cases. For
example, the following protobuf enum:

```protobuf
enum Foo {
  option allow_alias = true;
  BAR = 0;
  BAZ = 0;
}
```

will be represented in Swift as:

```swift
public enum Foo: Int32, NilLiteralConvertible {
  case bar = 0
  static public let baz = bar

  // ... etc.
}

// Can still use .baz shorthand to reference the alias in contexts
// where the type is inferred
```

That is, we use the first name as the actual case and use static variables for
the other aliases.
One drawback to this approach is that the static aliases
cannot be used as cases in a switch statement (the compiler emits the error
*Enum case 'baz' not found in type 'Foo'*). However, in our own code bases,
there are only a few places where enum aliases are not mere renamings of an
older value, but they also don't appear to be the type of value that one would
expect to switch on (for example, a group of named constants representing
metrics rather than a set of options), so this restriction is not significant.

This strategy also implies that changing the name of an enum and adding the old
name as an alias below the new name will be a breaking change in the generated
Swift code.

## Oneof types

The `oneof` feature represents a variant/union data type that maps nicely to
Swift enums with associated values (algebraic types). These fields can also be
accessed independently though, and, specifically in the case of proto2, it's
reasonable to expect access to default values when accessing a field that is not
explicitly set.

Taking all this into account, we can represent a `oneof` in Swift with two sets
of constructs:

*   Properties in the message that correspond to the `oneof` fields.
*   A nested enum named after the `oneof` and which provides the corresponding
    field values as case arguments.

This approach fulfills the needs of proto consumers by providing a
Swift-idiomatic way of simultaneously checking which field is set and accessing
its value, providing individual properties to access the default values
(important for proto2), and safely allowing a field to be moved into a `oneof`
without breaking clients.

Consider the following proto:

```protobuf
message MyMessage {
  oneof record {
    string name = 1 [default = "unnamed"];
    int32 id_number = 2 [default = 0];
  }
}
```

In Swift, we would generate an enum, a property for that enum, and properties
for the fields themselves:

```swift
public struct MyMessage: ProtoMessage {
  public enum Record: NilLiteralConvertible {
    case name(String)
    case idNumber(Int32)
    case NOT_SET

    public init(nilLiteral: ()) { self = .NOT_SET }
  }

  // This is the "Swifty" way of accessing the value
  public var record: Record { ... }

  // Direct access to the underlying fields
  public var name: String! { ... }
  public var idNumber: Int32! { ... }
}
```

This makes both usage patterns possible:

```swift
// Usage 1: Case-based dispatch
switch message.record {
  case .name(let name):
    // Do something with name if it was explicitly set
  case .idNumber(let id):
    // Do something with id_number if it was explicitly set
  case .NOT_SET:
    // Do something if it's not set
}

// Usage 2: Direct access for default value fallback
// Sets the label text to the name if it was explicitly set, or to
// "unnamed" (the default value for the field) if id_number was set
// instead
let myLabel = UILabel()
myLabel.text = message.name
```

As with proto enums, the generated `oneof` enum conforms to
`NilLiteralConvertible` to avoid switch statement issues. Setting the property
to nil will clear it (i.e., reset it to `NOT_SET`).

## Unknown Fields (proto2 only)

To be written.

## Extensions (proto2 only)

To be written.

## Reflection and Descriptors

We will not include reflection or descriptors in the first version of the Swift
library.
The use cases for reflection on mobile are not as strong and the static
data to represent the descriptors would add bloat when we wish to keep the code
size small.

In the future, we will investigate whether they can be included as extensions
which might be able to be excluded from a build and/or automatically dead
stripped by the compiler if they are not used.

## Appendix A: Rejected strategies to handle packages

### Each package is its own Swift module

Each proto package could be declared as its own Swift module, replacing dots
with underscores (e.g., package `foo.bar` becomes module `Foo_Bar`). Then, users
would simply import modules containing whatever proto modules they want to use
and refer to the generated types by their short names.

**This solution is simply not possible, however.** Swift modules cannot
circularly reference each other, but there is no restriction against proto
packages doing so. Circular imports are forbidden (e.g., `foo.proto` importing
`bar.proto` importing `foo.proto`), but nothing prevents package `foo` from
using a type in package `bar` which uses a different type in package `foo`, as
long as there is no import cycle. If these packages were generated as Swift
modules, then `Foo` would contain an `import Bar` statement and `Bar` would
contain an `import Foo` statement, and there is no way to compile this.

### Ad hoc namespacing with structs

We can fake namespaces in Swift by declaring empty structs with private
initializers. Since modules are constructed based on compiler arguments, not by
syntactic constructs, and because there is no pure Swift way to define
submodules (even though Clang module maps support this), there is no
source-driven way to group generated code into namespaces aside from this
approach.

Types can be added to those intermediate package structs using Swift extensions.
For example, a message `Baz` in package `foo.bar` could be represented in Swift
as follows:

```swift
public struct Foo {
  private init() {}
}

public extension Foo {
  public struct Bar {
    private init() {}
  }
}

public extension Foo.Bar {
  public struct Baz {
    // Message fields and other methods
  }
}

let baz = Foo.Bar.Baz()
```

Each of these constructs would actually be defined in a separate file; Swift
lets us keep them separate and add multiple structs to a single namespace
through extensions.

Unfortunately, these intermediate structs generate symbols of their own
(metatype information in the data segment). This becomes problematic if multiple
build targets contain Swift sources generated from different messages in the
same package. At link time, these symbols would collide, resulting in multiple
definition errors.

This approach also has the disadvantage that there is no automatic "short" way
to refer to the generated messages at the deepest nesting levels; since this use
of structs is a hack around the lack of namespaces, there is no equivalent to
`import` (Java) or `using` (C++) to simplify this. Users would have to declare
type aliases to make this cleaner, or we would have to generate them for users.
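
For completeness, the type-alias workaround mentioned above might look like the following sketch. The alias name `FBPBaz` is borrowed from the `swift_package_typealias` example, the `name` property is a placeholder, and the code uses current Swift syntax. It only shortens references; it does not address the symbol-collision problem described earlier.

```swift
// Namespace-like structs, as in the rejected approach above.
public struct Foo {
    private init() {}
}

extension Foo {
    public struct Bar {
        private init() {}
    }
}

extension Foo.Bar {
    public struct Baz {
        public var name = ""  // placeholder field for illustration
        public init() {}
    }
}

// A hand-written alias shortens references to the deeply nested type.
public typealias FBPBaz = Foo.Bar.Baz

var baz = FBPBaz()
baz.name = "example"
print(baz.name)                         // example
print(FBPBaz.self == Foo.Bar.Baz.self)  // true
```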