==============
File Time Type
==============

.. contents::
   :local:

.. _file-time-type-motivation:

Motivation
==========

The filesystem library provides interfaces for getting and setting the last
write time of a file or directory. The interfaces use the ``file_time_type``
type, which is a specialization of ``chrono::time_point`` for the
"filesystem clock". According to [fs.filesystem.syn]:

  trivial-clock is an implementation-defined type that satisfies the
  Cpp17TrivialClock requirements ([time.clock.req]) and that is capable of
  representing and measuring file time values. Implementations should ensure
  that the resolution and range of file_time_type reflect the operating
  system dependent resolution and range of file time values.


On POSIX systems, file times are represented using the ``timespec`` struct,
which is defined as follows:

.. code-block:: cpp

  struct timespec {
    time_t tv_sec;
    long   tv_nsec;
  };

To represent the range and resolution of ``timespec``, we need to (A) have
nanosecond resolution, and (B) use more than 64 bits (assuming a 64 bit ``time_t``).

As the standard requires us to use the ``chrono`` interface, we have to define
our own filesystem clock which specifies the period and representation of
the time points and duration it provides. It will look like this:

.. code-block:: cpp

  struct _FilesystemClock {
    using period = nano;
    using rep = TBD; // What is this?

    using duration = chrono::duration<rep, period>;
    using time_point = chrono::time_point<_FilesystemClock>;

    // ... //
  };

  using file_time_type = _FilesystemClock::time_point;


To get nanosecond resolution, we simply define ``period`` to be ``std::nano``.
But what type can we use as the arithmetic representation that is capable
of representing the range of the ``timespec`` struct?
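
To make the size of the problem concrete, the following is a minimal sketch
that estimates how many years a signed 64-bit tick count can cover on one side
of the epoch for a given tick period. The helper name ``years_representable``
and the 365.2425-day year are assumptions made for illustration; neither is
part of libc++.

```cpp
#include <cstdint>
#include <limits>
#include <ratio>

// Hypothetical helper (not part of libc++): approximate number of years that
// a signed 64-bit tick count can span on one side of the epoch, for a tick
// period expressed as a std::ratio of seconds.
template <class Period>
constexpr double years_representable() {
  // Assumption: an average Gregorian year of 365.2425 days.
  constexpr double seconds_per_year = 365.2425 * 24 * 60 * 60;
  constexpr double max_ticks =
      static_cast<double>(std::numeric_limits<std::int64_t>::max());
  return max_ticks * Period::num / Period::den / seconds_per_year;
}

// Nanosecond ticks in a 64-bit rep run out after roughly 292 years.
static_assert(years_representable<std::nano>() > 292.0 &&
              years_representable<std::nano>() < 293.0,
              "64-bit nanoseconds cover about 292 years");
```

Instantiating the helper with ``std::nano``, ``std::micro`` and
``std::ratio<1>`` yields roughly 292 years, 292 thousand years and 292 billion
years respectively, so a 64-bit representation can match either the resolution
or the range of ``timespec``, but not both.
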

Problems To Consider
====================

Before considering solutions, let's consider the problems they should solve,
and how important it is to solve each of them:


Having a Smaller Range than ``timespec``
----------------------------------------

One solution to the range problem is to simply reduce the resolution of
``file_time_type`` to something coarser than nanoseconds. This is what libc++'s
initial implementation of ``file_time_type`` did; it's also what
``std::chrono::system_clock`` does. As a result, with microsecond ticks it can
represent time points about 292 thousand years on either side of the epoch, as
opposed to only 292 years at nanosecond resolution.

``timespec`` can represent time points +/- 292 billion years from the epoch
(just in case you needed a time point 200 billion years before the big bang,
and with nanosecond resolution).

To come close to having the same range, we would need to drop our resolution
to that of seconds.

This raises the question: is the range problem "really a problem"? Sane usages
of file time stamps shouldn't exceed +/- 300 years, so should we care to support it?

I believe the answer is yes. We're not designing the filesystem time API, we're
providing glorified C++ wrappers for it. If the underlying API supports
a value, then we should too. Our wrappers should not place artificial restrictions
on users that are not present in the underlying filesystem.

Having a smaller range than the underlying filesystem forces the
implementation to report ``value_too_large`` errors when it encounters a time
point that it can't represent. This can cause the call to ``last_write_time``
to throw in cases where the user was confident the call should succeed, as the
following example shows.


.. code-block:: cpp

  #include <cassert>
  #include <filesystem>
  #include <limits>
  #include <fcntl.h>    // AT_FDCWD
  #include <sys/stat.h> // utimensat
  using namespace std::filesystem;

  // Set the times using the system interface.
  void set_file_times(const char* path, struct timespec ts) {
    timespec both_times[2];
    both_times[0] = ts;
    both_times[1] = ts;
    int result = ::utimensat(AT_FDCWD, path, both_times, 0);
    assert(result != -1);
  }

  // Called elsewhere to set the file time to something insane, and way
  // out of the 300 year range we might expect.
  void some_bad_persons_code() {
    struct timespec new_times;
    new_times.tv_sec = std::numeric_limits<time_t>::max();
    new_times.tv_nsec = 0;
    set_file_times("/tmp/foo", new_times); // OK, supported by most FSes
  }

  int main() {
    path p = "/tmp/foo";
    file_status st = status(p);
    if (!exists(st) || !is_regular_file(st))
      return 1;
    if ((st.permissions() & perms::others_read) == perms::none)
      return 1;
    // It seems reasonable to assume this call should succeed.
    file_time_type tp = last_write_time(p); // BAD! Throws value_too_large.
  }


Having a Smaller Resolution than ``timespec``
---------------------------------------------

As mentioned in the previous section, one way to solve the range problem
is by reducing the resolution. But matching the range of ``timespec`` using a
64 bit representation requires limiting the resolution to seconds.

So we might ask: Do users "need" nanosecond precision? Is seconds not good enough?
I limit my consideration of the point to this: Why was it not good enough for
the underlying system interfaces? If it wasn't good enough for them, then it
isn't good enough for us. Our job is to match the filesystem's range and
representation, not to design it.


Having a Larger Range than ``timespec``
----------------------------------------

We should also consider the opposite problem of having a ``file_time_type``
that is able to represent a larger range than ``timespec``. At least in
this case ``last_write_time`` can be used to get and set all possible values
supported by the underlying filesystem, meaning ``last_write_time(p)`` will
never throw an overflow error when retrieving a value.

However, this introduces a new problem, where users are allowed to attempt to
create a time point beyond what the filesystem can represent. Two particular
values which cause this are ``file_time_type::min()`` and
``file_time_type::max()``. As a result, the following code would throw:

.. code-block:: cpp

  void test() {
    last_write_time("/tmp/foo", file_time_type::max()); // Throws.
    last_write_time("/tmp/foo", file_time_type::min()); // Throws.
  }

Apart from cases explicitly using ``min`` and ``max``, I don't see users taking
a valid time point, adding a couple hundred billion years in error,
and then trying to update a file's write time to that value very often.

Compared to having a smaller range, this problem seems preferable. At least
now we can represent any time point the filesystem can, so users won't be forced
to revert to system interfaces to avoid limitations in the C++ standard library.

I posit that we should only consider this concern *after* we have something
with at least the same range and resolution as the underlying filesystem.
Matching the range and the resolution are much more important problems to solve.

Potential Solutions And Their Complications
===========================================

Source Code Portability Across Implementations
-----------------------------------------------

As we've discussed, ``file_time_type`` needs a representation that uses more
than 64 bits. The possible solutions include using ``__int128_t``, emulating a
128 bit integer using a class, or potentially defining a ``timespec``-like
arithmetic type.
All three will allow us to, at minimum, match the range
and resolution, and the last one might even allow us to match them exactly.

But when considering these potential solutions we need to consider more than
just the values they can represent. We need to consider the effects they will
have on users and their code. For example, each of them breaks the following
code in some way:

.. code-block:: cpp

  // Bug caused by an unexpected 'rep' type returned by count.
  void print_time(path p) {
    // __int128_t doesn't have streaming operators, and neither would our
    // custom arithmetic types.
    cout << last_write_time(p).time_since_epoch().count() << endl;
  }

  // Overflow during creation bug.
  file_time_type timespec_to_file_time_type(struct timespec ts) {
    // Whoops! chrono::seconds and chrono::nanoseconds use a 64 bit representation,
    // so this may overflow before it's converted to a file_time_type.
    auto dur = seconds(ts.tv_sec) + nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  file_time_type correct_timespec_to_file_time_type(struct timespec ts) {
    // This is the correct version of the above example, where we
    // avoid using the chrono typedefs as they're not sufficient.
    // Can we expect users to avoid this bug?
    using fs_seconds = chrono::duration<file_time_type::rep>;
    using fs_nanoseconds = chrono::duration<file_time_type::rep, nano>;
    auto dur = fs_seconds(ts.tv_sec) + fs_nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  // Implicit truncation during conversion bug.
  intmax_t get_time_in_seconds(path p) {
    using fs_seconds = duration<file_time_type::rep, ratio<1, 1> >;
    auto tp = last_write_time(p);

    // This works with truncation for __int128_t, but what does it do for
    // our custom arithmetic types?
    return duration_cast<fs_seconds>(tp.time_since_epoch()).count();
  }


Each of the above examples would require a user to adjust their filesystem code
to the particular eccentricities of the representation, hopefully only in such
a way that the code is still portable across implementations.

At least some of the above issues are unavoidable, no matter what
representation we choose. But some representations may be quirkier than others,
and, as I'll argue later, using an actual arithmetic type (``__int128_t``)
provides the least aberrant behavior.


Chrono and ``timespec`` Emulation
----------------------------------

One of the options we've considered is using something akin to ``timespec``
to represent the ``file_time_type``. It only seems natural seeing as that's
what the underlying system uses, and because it might allow us to match
the range and resolution exactly. But would it work with chrono? And could
it still act at all like a ``timespec`` struct?

For ease of consideration, let's look at what the implementation might
look like.

.. code-block:: cpp

  struct fs_timespec_rep {
    fs_timespec_rep(long long v)
      : tv_sec(v / nano::den), tv_nsec(v % nano::den)
    { }
  private:
    time_t tv_sec;
    long tv_nsec;
  };
  bool operator==(fs_timespec_rep, fs_timespec_rep);
  fs_timespec_rep operator+(fs_timespec_rep, fs_timespec_rep);
  // ... arithmetic operators ... //

The first thing to notice is that we can't construct ``fs_timespec_rep`` like
a ``timespec`` by passing ``{secs, nsecs}``. Instead we're limited to
constructing it from a single 64 bit integer.

We also can't allow the user to inspect the ``tv_sec`` or ``tv_nsec`` values
directly. A ``chrono::duration`` represents its value as a tick period and a
number of ticks stored using ``rep``.
The representation is unaware of the
tick period it is being used to represent, but ``timespec`` is set up to assume
a nanosecond tick period, which is the only case where the names ``tv_sec``
and ``tv_nsec`` match the values they store.

When we convert a nanosecond duration to seconds, ``fs_timespec_rep`` will
use ``tv_sec`` to represent the number of gigaseconds, and ``tv_nsec`` the
remaining seconds. Let's consider how this might cause a bug were users allowed
to manipulate the fields directly.

.. code-block:: cpp

  template <class Period>
  timespec convert_to_timespec(duration<fs_timespec_rep, Period> dur) {
    fs_timespec_rep rep = dur.count();
    return {rep.tv_sec, rep.tv_nsec}; // Oops! Period may not be nanoseconds.
  }

  template <class Duration>
  Duration convert_to_duration(timespec ts) {
    Duration dur({ts.tv_sec, ts.tv_nsec}); // Oops! Period may not be nanoseconds.
    return dur;
  }

  time_t extract_seconds(file_time_type tp) {
    // Converting to seconds is a silly bug, but I could see it happening.
    using SecsT = chrono::duration<file_time_type::rep, ratio<1, 1>>;
    auto secs = duration_cast<SecsT>(tp.time_since_epoch());
    // tv_sec is now representing gigaseconds.
    return secs.count().tv_sec; // Oops!
  }

Despite ``fs_timespec_rep`` not being usable in any manner resembling
``timespec``, it still might buy us our goal of matching its range exactly,
right?

Sort of. Chrono provides a specialization point which specifies the minimum
and maximum values for a custom representation. It looks like this:

.. code-block:: cpp

  template <>
  struct duration_values<fs_timespec_rep> {
    static fs_timespec_rep zero();
    static fs_timespec_rep min();
    static fs_timespec_rep max() { // assume friendship.
      fs_timespec_rep val(0);
      val.tv_sec = numeric_limits<time_t>::max();
      val.tv_nsec = nano::den - 1;
      return val;
    }
  };

Notice that ``duration_values`` doesn't tell the representation what tick
period it's actually representing. This would indeed correctly limit the range
of ``duration<fs_timespec_rep, nano>`` to exactly that of ``timespec``. But
nanoseconds isn't the only tick period it will be used to represent. For
example:

.. code-block:: cpp

  void test() {
    using rep = file_time_type::rep;
    using fs_nsec = duration<rep, nano>;
    using fs_sec = duration<rep>;
    fs_nsec nsecs(fs_sec::max()); // Truncates
  }

Though the above example may appear silly, I think it follows from the incorrect
notion that using a ``timespec`` rep in chrono actually makes it act as if it
were an actual ``timespec``.

Interactions with 32 bit ``time_t``
-----------------------------------

Up until now we've only been considering cases where ``time_t`` is 64 bits, but what
about 32 bit systems/builds where ``time_t`` is 32 bits? (This is the common case
for 32 bit builds.)

When ``time_t`` is 32 bits, we can implement ``file_time_type`` simply using 64-bit
``long long``. There is no need to get either ``__int128_t`` or ``timespec`` emulation
involved. Nor should we, as it would suffer from the numerous complications
described in this document.

Obviously our implementation for 32-bit builds should act as similarly to the
64-bit build as possible. Code which compiles in one should compile in the other.
This consideration is important when choosing between ``__int128_t`` and
emulating ``timespec``. The solution which provides the most uniformity with
the least eccentricity is the preferable one.

Summary
=======

The ``file_time_type`` time point is used to represent the write times for files.
Its job is to act as part of a C++ wrapper for less ideal system interfaces. The
underlying filesystem uses the ``timespec`` struct for the same purpose.

However, the initial implementation of ``file_time_type`` could not represent
either the range or resolution of ``timespec``, making it unsuitable. Fixing
this requires an implementation which uses more than 64 bits to store the
time point.

We primarily considered two solutions: using ``__int128_t`` and using an
arithmetic emulation of ``timespec``. Each has its pros and cons, and both
come with more than one complication.

The Potential Solutions
-----------------------

``long long`` - The Status Quo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* As a type ``long long`` plays the nicest with others:

  * It works with streaming operators and other library entities which support
    builtin integer types, but don't support ``__int128_t``.
  * It's the representation used by chrono's ``nanoseconds`` and ``seconds`` typedefs.

Cons:

* It cannot provide the same resolution as ``timespec`` unless we limit it
  to a range of +/- 300 years from the epoch.
* It cannot provide the same range as ``timespec`` unless we limit its resolution
  to seconds.
* ``last_write_time`` has to report an error when the time reported by the filesystem
  is unrepresentable.

``__int128_t``
~~~~~~~~~~~~~~

Pros:

* It is an integer type.
* It makes the implementation simple and efficient.
* Acts exactly like other arithmetic types.
* Can be implicitly converted to a builtin integer type by the user.

  * This is important for doing things like:

    .. code-block:: cpp

      void c_interface_using_time_t(const char* p, time_t);

      void foo(path p) {
        file_time_type tp = last_write_time(p);
        time_t secs = duration_cast<seconds>(tp.time_since_epoch()).count();
        c_interface_using_time_t(p.c_str(), secs);
      }

Cons:

* It isn't always available (but on 64 bit machines, it normally is).
* It causes ``file_time_type`` to have a larger range than ``timespec``.
* It doesn't always act the same as other builtin integer types, for example
  with ``cout`` or ``to_string``.
* Allows implicit truncation to 64 bit integers.
* It can be implicitly converted to a builtin integer type by the user,
  truncating its value.

Arithmetic ``timespec`` Emulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* It has the exact same range and resolution as ``timespec`` when representing
  a nanosecond tick period.
* It's always available, unlike ``__int128_t``.

Cons:

* It has a larger range when representing any period longer than a nanosecond.
* Doesn't actually allow users to use it like a ``timespec``.
* The required representation of using ``tv_sec`` to store the giga tick count
  and ``tv_nsec`` to store the remainder adds nothing over a 128 bit integer,
  but complicates a lot.
* It isn't a builtin integer type, and can't be used anything like one.
* Chrono can be made to work with it, but not nicely.
* Emulated arithmetic classes come with their own host of problems regarding
  overload resolution (each operator needs three SFINAE constrained versions of
  it in order to act like builtin integer types).
* It offers little over simply using ``__int128_t``.
* It behaves the most differently from implementations using an actual integer type,
  which has a high chance of breaking source compatibility.
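
One of the cons listed for ``__int128_t`` is that it does not work with
``cout`` or ``to_string``. As a rough illustration of what a user would have to
write to print a tick count, here is a minimal sketch of a decimal formatter.
The name ``int128_to_string`` is hypothetical and not provided by libc++, and
the sketch assumes a compiler that supports ``__int128_t`` and
``__uint128_t`` (as GCC and Clang do on most 64-bit targets).

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of libc++): formats a 128-bit signed integer
// in base 10, since there is no operator<< or std::to_string overload for it.
std::string int128_to_string(__int128_t value) {
  if (value == 0)
    return "0";
  const bool negative = value < 0;
  // Negate in the unsigned domain so the most negative value doesn't overflow.
  __uint128_t magnitude = negative ? -static_cast<__uint128_t>(value)
                                   : static_cast<__uint128_t>(value);
  std::string digits;
  while (magnitude != 0) {
    // Collect the least significant digit first; the string is reversed below.
    digits.push_back(static_cast<char>('0' + static_cast<int>(magnitude % 10)));
    magnitude /= 10;
  }
  if (negative)
    digits.push_back('-');
  std::reverse(digits.begin(), digits.end());
  return digits;
}
```

With such a helper, the ``print_time`` example could stream
``int128_to_string(last_write_time(p).time_since_epoch().count())`` instead of
the raw count, at the cost of code that is specific to one representation.
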


Selected Solution - Using ``__int128_t``
=========================================

The solution I selected for libc++ is using ``__int128_t`` when available,
and otherwise falling back to using ``long long`` with nanosecond precision.

When ``__int128_t`` is available, or when ``time_t`` is 32 bits, the implementation
provides the same resolution and a greater range than ``timespec``. Otherwise
it still provides the same resolution, but is limited to a range of +/- 300
years. This final case should be rather rare, as ``__int128_t``
is normally available in 64-bit builds, and ``time_t`` is normally 32 bits
in 32-bit builds.

Although falling back to ``long long`` and nanosecond precision is less than
ideal, it also happens to be the implementation provided by both libstdc++
and MSVC. (So that makes it better, right?)

Although the ``timespec`` emulation solution is feasible and would largely
do what we want, it comes with too many complications, potential problems
and discrepancies when compared to "normal" chrono time points and durations.

An emulation of a builtin arithmetic type using a class is never going to act
exactly the same, and the difference will be felt by users. It's not reasonable
to expect them to tolerate and work around these differences. And once
we commit to an ABI it will be too late to change. Committing to this seems
risky.

Therefore, ``__int128_t`` seems like the better solution.
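
As a rough sketch of the selected approach (not the actual libc++ source), the
representation can be chosen with the ``__SIZEOF_INT128__`` macro that GCC and
Clang predefine when 128-bit integers are supported. The names ``fs_rep``,
``fs_seconds``, ``fs_nanoseconds`` and ``timespec_to_fs_duration`` are
hypothetical and exist only for this illustration.

```cpp
#include <chrono>
#include <ratio>
#include <time.h> // timespec

// Assumption: __SIZEOF_INT128__ is defined exactly when __int128_t is usable.
#if defined(__SIZEOF_INT128__)
using fs_rep = __int128_t; // same resolution as timespec, greater range
#else
using fs_rep = long long;  // fallback: nanoseconds, about +/- 292 years
#endif

// Hypothetical duration types that share the filesystem representation.
using fs_seconds = std::chrono::duration<fs_rep>;
using fs_nanoseconds = std::chrono::duration<fs_rep, std::nano>;

// Converts a timespec to a nanosecond duration. Both operands already use
// fs_rep, so the seconds-to-nanoseconds scaling happens in the wide type
// rather than overflowing in a 64-bit intermediate.
fs_nanoseconds timespec_to_fs_duration(const timespec& ts) {
  return fs_seconds(ts.tv_sec) + fs_nanoseconds(ts.tv_nsec);
}
```

When the fallback branch is taken, the same function still compiles, but
values more than roughly 292 years from the epoch overflow, which is why the
implementation has to detect and report that case.
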