==============
File Time Type
==============

.. contents::
   :local:

.. _file-time-type-motivation:

Motivation
==========

The filesystem library provides interfaces for getting and setting the last
write time of a file or directory. The interfaces use the ``file_time_type``
type, which is a specialization of ``chrono::time_point`` for the
"filesystem clock". According to [fs.filesystem.syn]:

  trivial-clock is an implementation-defined type that satisfies the
  Cpp17TrivialClock requirements ([time.clock.req]) and that is capable of
  representing and measuring file time values. Implementations should ensure
  that the resolution and range of file_time_type reflect the operating
  system dependent resolution and range of file time values.


On POSIX systems, file times are represented using the ``timespec`` struct,
which is defined as follows:

.. code-block:: cpp

  struct timespec {
    time_t tv_sec;
    long   tv_nsec;
  };

To represent the range and resolution of ``timespec``, we need to (A) have
nanosecond resolution, and (B) use more than 64 bits (assuming a 64-bit ``time_t``).
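
As a rough check of claim (B), the following sketch counts the bits needed to
store the largest ``timespec`` value as a single nanosecond count. It assumes a
64-bit ``time_t`` and a compiler that provides the ``__int128`` extension (GCC
and Clang do on most 64-bit targets); ``max_timespec_ns`` and ``bits_needed``
are hypothetical helpers written for this illustration.

.. code-block:: cpp

  #include <cstdint>
  #include <limits>

  // Hypothetical helper: the largest timespec value expressed as a count of
  // nanoseconds, assuming a 64-bit time_t. Needs a 128-bit type for the result.
  constexpr __int128 max_timespec_ns() {
    return static_cast<__int128>(std::numeric_limits<std::int64_t>::max()) *
               1000000000 +
           999999999;
  }

  // Hypothetical helper: number of value bits needed for a non-negative value.
  constexpr int bits_needed(__int128 v) {
    int bits = 0;
    while (v > 0) {
      v >>= 1;
      ++bits;
    }
    return bits;
  }

  // 63 bits of seconds scaled by roughly 2^30 nanoseconds per second.
  static_assert(bits_needed(max_timespec_ns()) == 93, "more than 64 bits");

The result is 93 value bits plus a sign bit, so a 64-bit ``rep`` cannot cover
the full range at nanosecond resolution.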

As the standard requires us to use the ``chrono`` interface, we have to define
our own filesystem clock which specifies the period and representation of
the time points and durations it provides. It will look like this:

.. code-block:: cpp

  struct _FilesystemClock {
    using period = nano;
    using rep = TBD; // What is this?

    using duration = chrono::duration<rep, period>;
    using time_point = chrono::time_point<_FilesystemClock>;

    // ... //
  };

  using file_time_type = _FilesystemClock::time_point;


To get nanosecond resolution, we simply define ``period`` to be ``std::nano``.
But what type can we use as the arithmetic representation that is capable
of representing the range of the ``timespec`` struct?

Problems To Consider
====================

Before considering solutions, let's consider the problems they should solve,
and how important it is to solve each of them:


Having a Smaller Range than ``timespec``
----------------------------------------

One solution to the range problem is to simply reduce the resolution of
``file_time_type`` to be less than that of nanoseconds. This is what libc++'s
initial implementation of ``file_time_type`` did; it's also what
``chrono::system_clock`` does. As a result, it can represent time points about
292 thousand years on either side of the epoch (a 64-bit count of
microseconds), as opposed to only 292 years at nanosecond resolution.

``timespec`` can represent time points +/- 292 billion years from the epoch
(just in case you needed a time point 200 billion years before the big bang,
and with nanosecond resolution).

To come close to having the same range with a 64-bit representation, we would
need to drop our resolution to that of seconds.
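
These figures can be reproduced with ``chrono`` itself. The sketch below is
illustrative only: ``max_years_64bit`` is a hypothetical helper, and it assumes
an average Gregorian year of 365.2425 days (31,556,952 seconds).

.. code-block:: cpp

  #include <chrono>
  #include <ratio>

  // Hypothetical helper: approximate number of years a signed 64-bit tick
  // count can cover on one side of the epoch, for a given tick period.
  template <class Period>
  constexpr double max_years_64bit() {
    using ticks = std::chrono::duration<long long, Period>;
    // Assumption: one year is 31,556,952 seconds (365.2425 days).
    using years = std::chrono::duration<double, std::ratio<31556952>>;
    return std::chrono::duration_cast<years>(ticks::max()).count();
  }

Under these assumptions, ``max_years_64bit<std::nano>()`` is roughly 292,
``max_years_64bit<std::micro>()`` roughly 292 thousand, and
``max_years_64bit<std::ratio<1>>()`` roughly 292 billion.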

This raises the question: is the range problem "really a problem"? Sane usages
of file time stamps shouldn't exceed +/- 300 years, so should we care to support it?

I believe the answer is yes. We're not designing the filesystem time API, we're
providing glorified C++ wrappers for it. If the underlying API supports
a value, then we should too. Our wrappers should not place artificial restrictions
on users that are not present in the underlying filesystem.

Having a smaller range than the underlying filesystem forces the
implementation to report ``value_too_large`` errors when it encounters a time
point that it can't represent. This can cause the call to ``last_write_time``
to throw in cases where the user was confident the call should succeed, as in
the following example:


.. code-block:: cpp

  #include <cassert>
  #include <ctime>
  #include <filesystem>
  #include <limits>
  #include <fcntl.h>    // AT_FDCWD
  #include <sys/stat.h> // utimensat
  using namespace std::filesystem;

  // Set the times using the system interface.
  void set_file_times(const char* path, struct timespec ts) {
    timespec both_times[2];
    both_times[0] = ts;
    both_times[1] = ts;
    int result = ::utimensat(AT_FDCWD, path, both_times, 0);
    assert(result != -1);
  }

  // Called elsewhere to set the file time to something insane, and way
  // out of the 300 year range we might expect.
  void some_bad_persons_code() {
    struct timespec new_times;
    new_times.tv_sec = std::numeric_limits<time_t>::max();
    new_times.tv_nsec = 0;
    set_file_times("/tmp/foo", new_times); // OK, supported by most FSes
  }

  int main() {
    path p = "/tmp/foo";
    file_status st = status(p);
    if (!exists(st) || !is_regular_file(st))
      return 1;
    if ((st.permissions() & perms::others_read) == perms::none)
      return 1;
    // It seems reasonable to assume this call should succeed.
    file_time_type tp = last_write_time(p); // BAD! Throws value_too_large.
  }
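
To see where the ``value_too_large`` error comes from, consider the range check
an implementation with a 64-bit nanosecond ``rep`` has to perform when reading
a ``timespec``. The following is a minimal sketch, not libc++'s actual code;
``to_ns64`` is a hypothetical helper and requires C++17 for ``std::optional``.

.. code-block:: cpp

  #include <cstdint>
  #include <limits>
  #include <optional>

  // Hypothetical helper: convert {seconds, nanoseconds} to a 64-bit nanosecond
  // count, or return nullopt when the result would not fit. Assumes
  // 0 <= nsec < 1000000000, as POSIX requires for timespec::tv_nsec.
  inline std::optional<std::int64_t> to_ns64(std::int64_t sec, long nsec) {
    constexpr std::int64_t ns_per_sec = 1000000000;
    constexpr std::int64_t max = std::numeric_limits<std::int64_t>::max();
    constexpr std::int64_t min = std::numeric_limits<std::int64_t>::min();
    // Reject seconds whose product with ns_per_sec would overflow. This is
    // slightly conservative within one second of the negative limit.
    if (sec > max / ns_per_sec || sec < min / ns_per_sec)
      return std::nullopt;
    const std::int64_t ns = sec * ns_per_sec;
    // The product is in range; check that adding the nanoseconds still fits.
    if (ns > max - nsec)
      return std::nullopt;
    return ns + nsec;
  }

An implementation would map the empty result to ``errc::value_too_large``. With
``tv_sec`` set to ``numeric_limits<time_t>::max()``, as in the example above,
``to_ns64`` returns ``std::nullopt``.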


Having a Smaller Resolution than ``timespec``
---------------------------------------------

As mentioned in the previous section, one way to solve the range problem
is by reducing the resolution. But matching the range of ``timespec`` using a
64-bit representation requires limiting the resolution to seconds.

So we might ask: Do users "need" nanosecond precision? Is seconds not good enough?
I limit my consideration of the point to this: Why was it not good enough for
the underlying system interfaces? If it wasn't good enough for them, then it
isn't good enough for us. Our job is to match the filesystem's range and
representation, not design it.


Having a Larger Range than ``timespec``
----------------------------------------

We should also consider the opposite problem of having a ``file_time_type``
that is able to represent a larger range than ``timespec``. At least in
this case ``last_write_time`` can be used to get and set all possible values
supported by the underlying filesystem, meaning ``last_write_time(p)`` will
never throw an overflow error when retrieving a value.

However, this introduces a new problem, where users are allowed to attempt to
create a time point beyond what the filesystem can represent. Two particular
values which cause this are ``file_time_type::min()`` and
``file_time_type::max()``. As a result, the following code would throw:

.. code-block:: cpp

  void test() {
    last_write_time("/tmp/foo", file_time_type::max()); // Throws
    last_write_time("/tmp/foo", file_time_type::min()); // Throws
  }
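
With a 128-bit ``rep`` the check moves to the setting side: the implementation
has to reject time points that ``timespec`` cannot hold. A minimal sketch
follows, assuming the ``__int128`` extension and C++17; ``to_timespec`` is a
hypothetical helper, not libc++'s implementation.

.. code-block:: cpp

  #include <ctime>
  #include <limits>
  #include <optional>

  // Hypothetical helper: split a 128-bit nanosecond count into a timespec, or
  // return nullopt when the seconds part does not fit in time_t.
  inline std::optional<std::timespec> to_timespec(__int128 ns) {
    constexpr __int128 ns_per_sec = 1000000000;
    __int128 sec = ns / ns_per_sec;
    __int128 rem = ns % ns_per_sec;
    // Integer division truncates toward zero, but timespec wants
    // 0 <= tv_nsec, so borrow one second when the remainder is negative.
    if (rem < 0) {
      sec -= 1;
      rem += ns_per_sec;
    }
    if (sec > std::numeric_limits<std::time_t>::max() ||
        sec < std::numeric_limits<std::time_t>::min())
      return std::nullopt;
    std::timespec ts{};
    ts.tv_sec = static_cast<std::time_t>(sec);
    ts.tv_nsec = static_cast<decltype(ts.tv_nsec)>(rem);
    return ts;
  }

A ``last_write_time`` setter built this way would report an error for an empty
result, which is what happens for ``file_time_type::min()`` and
``file_time_type::max()`` when ``rep`` is wider than ``timespec``.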

Apart from cases explicitly using ``min`` and ``max``, I don't see users taking
a valid time point, adding a couple hundred billion years in error,
and then trying to update a file's write time to that value very often.

Compared to having a smaller range, this problem seems preferable. At least
now we can represent any time point the filesystem can, so users won't be forced
to fall back to system interfaces to avoid limitations in the C++ STL.

I posit that we should only consider this concern *after* we have something
with at least the same range and resolution as the underlying filesystem. The
latter two problems are much more important to solve.

Potential Solutions And Their Complications
===========================================

Source Code Portability Across Implementations
-----------------------------------------------

As we've discussed, ``file_time_type`` needs a representation that uses more
than 64 bits. The possible solutions include using ``__int128_t``, emulating a
128-bit integer using a class, or potentially defining a ``timespec``-like
arithmetic type. All three will allow us to, at minimum, match the range
and resolution, and the last one might even allow us to match them exactly.

But when considering these potential solutions we need to consider more than
just the values they can represent. We need to consider the effects they will
have on users and their code. For example, each of them breaks the following
code in some way:

.. code-block:: cpp

  // Fragments only: assume <chrono>, <filesystem>, <iostream> and the
  // matching using-directives.

  // Bug caused by an unexpected 'rep' type returned by count.
  void print_time(path p) {
    // __int128_t doesn't have streaming operators, and neither would our
    // custom arithmetic types.
    cout << last_write_time(p).time_since_epoch().count() << endl;
  }

  // Overflow during creation bug.
  file_time_type timespec_to_file_time_type(struct timespec ts) {
    // Whoops! chrono::seconds and chrono::nanoseconds use a 64-bit
    // representation, so this may overflow before it's converted to a
    // file_time_type.
    auto dur = seconds(ts.tv_sec) + nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  file_time_type correct_timespec_to_file_time_type(struct timespec ts) {
    // This is the correct version of the above example, where we
    // avoid using the chrono typedefs as they're not sufficient.
    // Can we expect users to avoid this bug?
    using fs_seconds = chrono::duration<file_time_type::rep>;
    using fs_nanoseconds = chrono::duration<file_time_type::rep, nano>;
    auto dur = fs_seconds(ts.tv_sec) + fs_nanoseconds(ts.tv_nsec);
    return file_time_type(dur);
  }

  // Implicit truncation during conversion bug.
  intmax_t get_time_in_seconds(path p) {
    using fs_seconds = duration<file_time_type::rep, ratio<1, 1> >;
    auto tp = last_write_time(p);

    // This works with truncation for __int128_t, but what does it do for
    // our custom arithmetic types?
    return duration_cast<fs_seconds>(tp.time_since_epoch()).count();
  }


Each of the above examples would require a user to adjust their filesystem code
to the particular eccentricities of the representation, hopefully only in such
a way that the code is still portable across implementations.

At least some of the above issues are unavoidable, no matter what
representation we choose. But some representations may be quirkier than others,
and, as I'll argue later, using an actual arithmetic type (``__int128_t``)
provides the least aberrant behavior.
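
As an example of the kind of adjustment involved, the streaming problem in
``print_time`` can be worked around with a small formatting helper. The sketch
below is one possible approach, assuming the ``__int128`` extension is
available; ``int128_to_string`` is a hypothetical name. It also accepts a
``long long`` count, since that converts implicitly to ``__int128``.

.. code-block:: cpp

  #include <algorithm>
  #include <string>

  // Hypothetical helper: format a 128-bit signed integer in decimal.
  inline std::string int128_to_string(__int128 v) {
    if (v == 0)
      return "0";
    const bool negative = v < 0;
    std::string digits;
    while (v != 0) {
      // v % 10 carries the sign of v; use the magnitude of each digit so the
      // most negative value needs no special case.
      const int d = static_cast<int>(v % 10);
      digits.push_back(static_cast<char>('0' + (d < 0 ? -d : d)));
      v /= 10;
    }
    if (negative)
      digits.push_back('-');
    std::reverse(digits.begin(), digits.end());
    return digits;
  }

A caller would then write ``cout << int128_to_string(dur.count())`` instead of
streaming the count directly.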


Chrono and ``timespec`` Emulation
---------------------------------

One of the options we've considered is using something akin to ``timespec``
to represent the ``file_time_type``. It only seems natural seeing as that's
what the underlying system uses, and because it might allow us to match
the range and resolution exactly. But would it work with chrono? And could
it still act at all like a ``timespec`` struct?

To make this concrete, let's consider what the implementation might
look like.

.. code-block:: cpp

  struct fs_timespec_rep {
    fs_timespec_rep(long long v)
      : tv_sec(v / nano::den), tv_nsec(v % nano::den)
    { }
  private:
    time_t tv_sec;
    long tv_nsec;
  };
  bool operator==(fs_timespec_rep, fs_timespec_rep);
  fs_timespec_rep operator+(fs_timespec_rep, fs_timespec_rep);
  // ... arithmetic operators ... //

The first thing to notice is that we can't construct ``fs_timespec_rep`` like
a ``timespec`` by passing ``{secs, nsecs}``. Instead we're limited to
constructing it from a single 64-bit integer.

We also can't allow the user to inspect the ``tv_sec`` or ``tv_nsec`` values
directly. A ``chrono::duration`` represents its value as a tick period and a
number of ticks stored using ``rep``. The representation is unaware of the
tick period it is being used to represent, but ``timespec`` is set up to assume
a nanosecond tick period, which is the only case where the names ``tv_sec``
and ``tv_nsec`` match the values they store.

When we convert a nanosecond duration to seconds, ``fs_timespec_rep`` will
use ``tv_sec`` to represent the number of gigaseconds, and ``tv_nsec`` the
remaining seconds. Let's consider how this might cause a bug were users allowed
to manipulate the fields directly.

.. code-block:: cpp

  template <class Period>
  timespec convert_to_timespec(duration<fs_timespec_rep, Period> dur) {
    fs_timespec_rep rep = dur.count();
    return {rep.tv_sec, rep.tv_nsec}; // Oops! Period may not be nanoseconds.
  }

  template <class Duration>
  Duration convert_to_duration(timespec ts) {
    Duration dur({ts.tv_sec, ts.tv_nsec}); // Oops! Period may not be nanoseconds.
    return dur;
  }

  time_t extract_seconds(file_time_type tp) {
    // Converting to seconds is a silly bug, but I could see it happening.
    using SecsT = chrono::duration<file_time_type::rep, ratio<1, 1>>;
    auto secs = duration_cast<SecsT>(tp.time_since_epoch());
    // tv_sec is now representing gigaseconds.
    return secs.count().tv_sec; // Oops!
  }
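
The gigasecond quirk is easy to reproduce in isolation. The sketch below uses a
hypothetical stand-in, ``tick_pair``, with public fields and the same split as
the ``fs_timespec_rep`` constructor shown earlier; it is an illustration, not a
proposed type.

.. code-block:: cpp

  // Hypothetical stand-in for fs_timespec_rep with public fields. The rep
  // only ever sees a tick count, never the period those ticks are measured in.
  struct tick_pair {
    long long tv_sec; // ticks / 1000000000: seconds only if ticks are ns
    long tv_nsec;     // ticks % 1000000000: nanoseconds only if ticks are ns
  };

  // Same arithmetic as the fs_timespec_rep(long long) constructor.
  constexpr tick_pair split_ticks(long long ticks) {
    return tick_pair{ticks / 1000000000,
                     static_cast<long>(ticks % 1000000000)};
  }

``split_ticks(5000000000)`` (five seconds counted in nanosecond ticks) yields
``{5, 0}``, while ``split_ticks(5)`` (five seconds counted in second ticks)
yields ``{0, 5}``: the same duration, but the field named ``tv_nsec`` now holds
seconds.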

Despite ``fs_timespec_rep`` not being usable in any manner resembling
``timespec``, it still might buy us our goal of matching its range exactly,
right?

Sort of. Chrono provides a specialization point which specifies the minimum
and maximum values for a custom representation. It looks like this:

.. code-block:: cpp

  template <>
  struct duration_values<fs_timespec_rep> {
    static fs_timespec_rep zero();
    static fs_timespec_rep min();
    static fs_timespec_rep max() { // assume friendship.
      fs_timespec_rep val(0); // the type has no default constructor
      val.tv_sec = numeric_limits<time_t>::max();
      val.tv_nsec = nano::den - 1;
      return val;
    }
  };

Notice that ``duration_values`` doesn't tell the representation what tick
period it's actually representing. This would indeed correctly limit the range
of ``duration<fs_timespec_rep, nano>`` to exactly that of ``timespec``. But
nanoseconds isn't the only tick period it will be used to represent. For
example:

.. code-block:: cpp

  void test() {
    using rep = file_time_type::rep;
    using fs_nsec = duration<rep, nano>;
    using fs_sec = duration<rep>;
    fs_nsec nsecs(fs_sec::max()); // Truncates
  }

Though the above example may appear silly, I think it follows from the incorrect
notion that using a ``timespec`` rep in chrono actually makes it act as if it
were an actual ``timespec``.
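
The period-blindness of ``duration_values`` can be demonstrated with a toy
representation. In the sketch below, ``capped_rep`` and its limit of 1000 ticks
are invented for illustration; only ``std::chrono::duration`` and
``std::chrono::duration_values`` are real library names.

.. code-block:: cpp

  #include <chrono>
  #include <ratio>

  // Hypothetical toy representation with a deliberately small maximum.
  struct capped_rep {
    long long value;
  };

  namespace std {
  namespace chrono {
  // duration::min/max/zero consult this customization point. It is keyed on
  // the rep alone, so no tick period is visible here.
  template <>
  struct duration_values<capped_rep> {
    static constexpr capped_rep zero() { return {0}; }
    static constexpr capped_rep min() { return {-1000}; }
    static constexpr capped_rep max() { return {1000}; }
  };
  } // namespace chrono
  } // namespace std

  using capped_ns = std::chrono::duration<capped_rep, std::nano>;
  using capped_s = std::chrono::duration<capped_rep>;

Both ``capped_ns::max().count().value`` and ``capped_s::max().count().value``
are 1000. The cap applies to the tick count, so the range it corresponds to in
real time depends on the period.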

Interactions with 32 bit ``time_t``
-----------------------------------

Up until now we've only been considering cases where ``time_t`` is 64 bits, but what
about 32-bit systems/builds where ``time_t`` is 32 bits? (This is the common case
for 32-bit builds.)

When ``time_t`` is 32 bits, we can implement ``file_time_type`` simply using 64-bit
``long long``. There is no need to get either ``__int128_t`` or ``timespec`` emulation
involved. Nor should we, as it would suffer from the numerous complications
described by this document.

Obviously our implementation for 32-bit builds should act as similarly to the
64-bit build as possible. Code which compiles in one should compile in the other.
This consideration is important when choosing between ``__int128_t`` and
emulating ``timespec``. The solution which provides the most uniformity with
the least eccentricity is the preferable one.

Summary
=======

The ``file_time_type`` time point is used to represent the write times for files.
Its job is to act as part of a C++ wrapper for less ideal system interfaces. The
underlying filesystem uses the ``timespec`` struct for the same purpose.

However, the initial implementation of ``file_time_type`` could not represent
either the range or resolution of ``timespec``, making it unsuitable. Fixing
this requires an implementation which uses more than 64 bits to store the
time point.

We primarily considered two solutions: using ``__int128_t`` and using an
arithmetic emulation of ``timespec``. Each has its pros and cons, and both
come with more than one complication.

The Potential Solutions
-----------------------

``long long`` - The Status Quo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* As a type ``long long`` plays the nicest with others:

  * It works with streaming operators and other library entities which support
    builtin integer types, but don't support ``__int128_t``.
  * It's the representation used by chrono's ``nanoseconds`` and ``seconds`` typedefs.

Cons:

* It cannot provide the same resolution as ``timespec`` unless we limit it
  to a range of +/- 300 years from the epoch.
* It cannot provide the same range as ``timespec`` unless we limit its resolution
  to seconds.
* ``last_write_time`` has to report an error when the time reported by the filesystem
  is unrepresentable.

__int128_t
~~~~~~~~~~~

Pros:

* It is an integer type.
* It makes the implementation simple and efficient.
* It acts exactly like other arithmetic types.
* It can be implicitly converted to a builtin integer type by the user.

  * This is important for doing things like:

    .. code-block:: cpp

      void c_interface_using_time_t(const char* p, time_t);

      void foo(path p) {
        file_time_type tp = last_write_time(p);
        time_t secs = duration_cast<seconds>(tp.time_since_epoch()).count();
        c_interface_using_time_t(p.c_str(), secs);
      }

Cons:

* It isn't always available (but on 64-bit machines, it normally is).
* It causes ``file_time_type`` to have a larger range than ``timespec``.
* It doesn't always act the same as other builtin integer types, for example
  with ``cout`` or ``to_string``.
* It allows implicit truncation to 64-bit integers.
* It can be implicitly converted to a builtin integer type by the user,
  truncating its value.

Arithmetic ``timespec`` Emulation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pros:

* It has the exact same range and resolution as ``timespec`` when representing
  a nanosecond tick period.
* It's always available, unlike ``__int128_t``.

Cons:

* It has a larger range when representing any period longer than a nanosecond.
* It doesn't actually allow users to use it like a ``timespec``.
* The required representation of using ``tv_sec`` to store the giga tick count
  and ``tv_nsec`` to store the remainder adds nothing over a 128-bit integer,
  but complicates a lot.
* It isn't a builtin integer type, and can't be used anything like one.
* Chrono can be made to work with it, but not nicely.
* Emulated arithmetic classes come with their own host of problems regarding
  overload resolution (each operator needs three SFINAE-constrained versions of
  it in order to act like builtin integer types).
* It offers little over simply using ``__int128_t``.
* It behaves the most differently from implementations using an actual integer type,
  which has a high chance of breaking source compatibility.


Selected Solution - Using ``__int128_t``
=========================================

The solution I selected for libc++ is using ``__int128_t`` when available,
and otherwise falling back to using ``long long`` with nanosecond precision.

When ``__int128_t`` is available, or when ``time_t`` is 32 bits, the implementation
provides the same resolution and a greater range than ``timespec``. Otherwise
it still provides the same resolution, but is limited to a range of +/- 300
years. This final case should be rather rare, as ``__int128_t``
is normally available in 64-bit builds, and ``time_t`` is normally 32 bits
during 32-bit builds.
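
The selection rule can be sketched as follows. This is an illustration under
stated assumptions rather than libc++'s actual source: ``fs_rep``,
``has_int128`` and ``covers_timespec_range`` are hypothetical names, and the
check relies on the ``__SIZEOF_INT128__`` macro that GCC and Clang predefine
when the 128-bit extension is available.

.. code-block:: cpp

  #include <ctime>

  // Hypothetical names, for illustration only.
  #if defined(__SIZEOF_INT128__)
  using fs_rep = __int128;  // preferred: the 128-bit extension is available
  constexpr bool has_int128 = true;
  #else
  using fs_rep = long long; // fallback: 64 bits at nanosecond precision
  constexpr bool has_int128 = false;
  #endif

  // A 64-bit nanosecond count already covers every value of a 32-bit time_t,
  // so only the combination of 64-bit time_t and no __int128 loses range.
  constexpr bool covers_timespec_range =
      has_int128 || sizeof(std::time_t) <= 4;

When ``covers_timespec_range`` is false, the implementation is in the rare
fallback case described above and is limited to roughly +/- 300 years.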

Although falling back to ``long long`` and nanosecond precision is less than
ideal, it also happens to be the implementation provided by both libstdc++
and MSVC. (So that makes it better, right?)

Although the ``timespec`` emulation solution is feasible and would largely
do what we want, it comes with too many complications, potential problems
and discrepancies when compared to "normal" chrono time points and durations.

An emulation of a builtin arithmetic type using a class is never going to act
exactly the same, and the difference will be felt by users. It's not reasonable
to expect them to tolerate and work around these differences. And once
we commit to an ABI it will be too late to change. Committing to this seems
risky.

Therefore, ``__int128_t`` seems like the better solution.