==============================
Moving LLVM Projects to GitHub
==============================

.. contents:: Table of Contents
  :depth: 4
  :local:

Introduction
============

This is a proposal to move our current revision control system from our own
hosted Subversion to GitHub. Below are the financial and technical arguments as
to why we are proposing such a move and how people (and validation
infrastructure) will continue to work with a Git-based LLVM.

There will be a survey pointing at this document which we'll use to gauge the
community's reaction and, if we collectively decide to move, the time-frame. Be
sure to make your view count.

Additionally, we will discuss this during a BoF at the next US LLVM Developer
meeting (http://llvm.org/devmtg/2016-11/).

What This Proposal is *Not* About
=================================

Changing the development policy.

This proposal relates only to moving the hosting of our source-code repository
from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
using GitHub's issue tracker, pull-requests, or code-review.

Contributors will continue to earn commit access on demand under the Developer
Policy, except that a GitHub account will be required instead of an SVN
username/password-hash.

Why Git, and Why GitHub?
========================

Why Move At All?
----------------

This discussion began because we currently host our own Subversion server
and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
provides limited support, but there is only so much it can do.

The volunteers are not sysadmins themselves, but compiler engineers who happen
to know a thing or two about hosting servers. We also don't have 24/7 support,
and we sometimes wake up to see that continuous integration is broken because
the SVN server is either down or unresponsive.

We should take advantage of one of the services out there (GitHub, GitLab,
and BitBucket, among others) that offer better service (24/7 stability, disk
space, Git server, code browsing, forking facilities, etc.) for free.

Why Git?
--------

Many new coders nowadays start with Git, and a lot of people have never used
SVN, CVS, or anything else. Websites like GitHub have changed the landscape
of open source contributions, reducing the cost of first contribution and
fostering collaboration.

Git is also the version control system many LLVM developers use. Despite the
sources being stored in an SVN server, these developers are already using Git
through the Git-SVN integration.

Git allows you to:

* Commit, squash, merge, and fork locally without touching the remote server.
* Maintain local branches, enabling multiple threads of development.
* Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
* Inspect the repository history (blame, log, bisect) without Internet access.
* Maintain remote forks and branches on Git hosting services and
  integrate back to the main repository.

In addition, because Git seems to be replacing many OSS projects' version
control systems, there are many tools that are built over Git.
Future tooling may support Git first (if not only).

Why GitHub?
-----------

GitHub, like GitLab and BitBucket, provides free code hosting for open source
projects. Any of these could replace the code-hosting infrastructure that we
have today.

These services also have a dedicated team to monitor, migrate, improve and
distribute the contents of the repositories depending on region and load.

GitHub has one important advantage over GitLab and
BitBucket: it offers read-write **SVN** access to the repository
(https://github.com/blog/626-announcing-svn-support).
This would enable people to continue working post-migration as though our code
were still canonically in an SVN repository.

In addition, there are already multiple LLVM mirrors on GitHub, indicating that
part of our community has already settled there.

On Managing Revision Numbers with Git
-------------------------------------

The current SVN repository hosts all the LLVM sub-projects alongside each other.
A single revision number (e.g. r123456) thus identifies a consistent version of
all LLVM sub-projects.

Git does not use a sequential integer revision number but instead uses a hash
to identify each commit. (Linus mentioned that the lack of such a revision
number is "the only real design mistake" in Git [TorvaldRevNum]_.)

The loss of a sequential integer revision number has been a sticking point in
past discussions about Git:

- "The 'branch' I most care about is mainline, and losing the ability to say
  'fixed in r1234' (with some sort of monotonically increasing number) would
  be a tragic loss." [LattnerRevNum]_
- "I like those results sorted by time and the chronology should be obvious, but
  timestamps are incredibly cumbersome and make it difficult to verify that a
  given checkout matches a given set of results." [TrickRevNum]_
- "There is still the major regression with unreadable version numbers.
  Given the amount of Bugzilla traffic with 'Fixed in...', that's a
  non-trivial issue." [JSonnRevNum]_
- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_.

However, Git can emulate this increasing revision number with
``git rev-list --count <commit-hash>``, which counts the commits reachable from
the given commit. This identifier is unique only within a single branch, but
this means the tuple `(num, branch-name)` uniquely identifies a commit.

We can thus use this revision number to ensure that e.g. `clang -v` reports a
user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
the objections raised above with respect to this aspect of Git.
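
As a rough sketch of the scheme above (not part of the proposal itself), the
following script builds a throwaway repository and derives a `master-<count>`
identifier from `git rev-list --count`. The temporary repository, the variable
names and the commit messages are illustrative assumptions. The reverse lookup
at the end is only meaningful when the branch history is linear.

```shell
#!/bin/sh
# Sketch only: derive a "<branch>-<count>" identifier in a scratch repository.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make the branch name explicit
for i in 1 2 3; do
  git commit -q --allow-empty -m "commit $i"
done

# Number of commits reachable from the branch tip.
count=$(git rev-list --count master)
rev_id="master-$count"
echo "$rev_id"

# Reverse lookup (linear history only): the Nth commit is the Nth line
# of the oldest-first listing.
second=$(git rev-list --reverse master | sed -n '2p')
second_subject=$(git log -1 --format=%s "$second")
echo "$second_subject"
```

Because the count depends on which commits are reachable, the same number on
two different branches can name two different commits, which is why the branch
name has to be part of the identifier.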

What About Branches and Merges?
-------------------------------

In contrast to SVN, Git makes branching easy. Git's commit history is
represented as a DAG, a departure from SVN's linear history. However, we propose
to prohibit merge commits in our canonical Git repository.

Unfortunately, GitHub does not support server-side hooks to enforce such a
policy.  We must rely on the community to avoid pushing merge commits.

GitHub offers a feature called `Status Checks`: a branch protected by
`status checks` requires commits to be whitelisted before the push can happen.
We could supply a pre-push hook on the client side that would run and check the
history, before whitelisting the commit being pushed [statuschecks]_.
However, this solution would be somewhat fragile (how do you update a script
installed on every developer machine?) and would prevent SVN access to the
repository.
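
A minimal sketch of the kind of history check such a client-side hook could
run, assuming it is handed the base and tip of the range being pushed.
`count_merges` is a hypothetical helper, not an existing tool; a real pre-push
hook would read the ranges from its standard input instead, and the
whitelisting step is not shown.

```shell
#!/bin/sh
# Sketch only: detect merge commits in a range before pushing.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: print the number of merge commits in <base>..<tip>.
count_merges() {
  git rev-list --merges --count "$1..$2"
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "base"
base=$(git rev-parse HEAD)

# Diverge: one commit on a feature branch, one on master.
git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
git checkout -q master
git commit -q --allow-empty -m "mainline work"
linear=$(count_merges "$base" master)

# Merging the feature branch introduces a merge commit.
git merge -q --no-ff -m "merge feature" feature
merged=$(count_merges "$base" master)

echo "merge commits before: $linear, after: $merged"
```

A hook built on this check would refuse the push whenever the count is
non-zero and ask the developer to rebase instead.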

What About Commit Emails?
-------------------------

We will need a new bot to send emails for each commit. This proposal leaves the
email format unchanged besides the commit URL.

Straw Man Migration Plan
========================

Step #1: Before The Move
------------------------

1. Update docs to mention the move, so people are aware of what is going on.
2. Set up a read-only version of the GitHub project, mirroring our current SVN
   repository.
3. Add the required bots to implement the commit emails, as well as the
   umbrella repository update (if the multirepo is selected) or the read-only
   Git views for the sub-projects (if the monorepo is selected).

Step #2: Git Move
-----------------

4. Update the buildbots to pick up updates and commits from the GitHub
   repository. Not all bots have to migrate at this point, but it'll help
   provide infrastructure testing.
5. Update Phabricator to pick up commits from the GitHub repository.
6. LNT and llvmlab have to be updated: they rely on a unique, monotonically
   increasing integer across branches [MatthewsRevNum]_.
7. Instruct downstream integrators to pick up commits from the GitHub
   repository.
8. Review and prepare an update for the LLVM documentation.

Until this point nothing has changed for developers; it will just
boil down to a lot of work for buildbot and other infrastructure
owners.

The migration will pause here until all dependencies have cleared, and all
problems have been solved.

Step #3: Write Access Move
--------------------------

9. Collect developers' GitHub account information, and add them to the project.
10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
11. Update the documentation.
12. Mirror Git to SVN.

Step #4: Post Move
------------------

13. Archive the SVN repository.
14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
    point to GitHub instead.

One or Multiple Repositories?
=============================

There are two major variants for how to structure our Git repository: the
"multirepo" and the "monorepo".

Multirepo Variant
-----------------

This variant recommends moving each LLVM sub-project to a separate Git
repository. This mimics the existing official read-only Git repositories
(e.g., http://llvm.org/git/compiler-rt.git), and creates new canonical
repositories for each sub-project.

This will allow the individual sub-projects to remain distinct: a
developer interested only in compiler-rt can check out only this repository,
build it, and work in isolation from the other sub-projects.

A key need is to be able to check out multiple projects (e.g. lldb+clang+llvm
or clang+llvm+libcxx) at a specific revision.

A tuple of revisions (one entry per repository) accurately describes the state
across the sub-projects.
For example, a given version of clang would be
*<LLVM-12345, clang-5432, libcxx-123, etc.>*.

Umbrella Repository
^^^^^^^^^^^^^^^^^^^

To make this more convenient, a separate *umbrella* repository will be
provided. This repository will be used for the sole purpose of understanding
the sequence in which commits were pushed to the different repositories and to
provide a single revision number.

This umbrella repository will be read-only and continuously updated
to record the above tuple. The proposed form to record this is to use Git
[submodules]_, possibly along with a set of scripts to help check out a
specific revision of the LLVM distribution.

A regular LLVM developer does not need to interact with the umbrella repository
-- the individual repositories can be checked out independently -- but you would
need to use the umbrella repository to bisect multiple sub-projects at the same
time, or to check out old revisions of LLVM with another sub-project at a
consistent state.

This umbrella repository will be updated automatically by a bot (running on
notice from a webhook on every push, and periodically) on a per-commit basis: a
single commit in the umbrella repository would match a single commit in a
sub-project.

Living Downstream
^^^^^^^^^^^^^^^^^

Downstream SVN users can use the read/write SVN bridges with the following
caveats:

 * Be prepared for a one-time change to the upstream revision numbers.
 * The upstream sub-project revision numbers will no longer be in sync.

Downstream Git users can continue without any major changes, with the minor
change of upstreaming using `git push` instead of `git svn dcommit`.

Git users also have the option of adopting an umbrella repository downstream.
The tooling for the upstream umbrella can easily be reused for downstream needs,
incorporating extra sub-projects and branching in parallel with sub-project
branches.

Multirepo Preview
^^^^^^^^^^^^^^^^^

As a preview (disclaimer: this is a rough prototype, not polished and not
representative of the final solution), you can look at the following:

  * Repository: https://github.com/llvm-beanz/llvm-submodules
  * Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/

Concerns
^^^^^^^^

 * Because GitHub does not allow server-side hooks, and because there is no
   "push timestamp" in Git, the umbrella repository sequence isn't totally
   exact: commits from different repositories pushed around the same time can
   appear in different orders. However, we don't expect it to be the common case
   or to cause serious issues in practice.
 * You can't have a single cross-project commit that would update both LLVM and
   other sub-projects (something that can be achieved now). It would be possible
   to establish a protocol whereby users add a special token to their commit
   messages that causes the umbrella repo's updater bot to group all of them
   into a single revision.
 * Another option is to group commits that were pushed closely enough together
   in the umbrella repository. This has the advantage of allowing cross-project
   commits, and is less sensitive to mis-ordering commits. However, this has the
   potential to group unrelated commits together, especially if the bot goes
   down and needs to catch up.
 * This variant relies on heavier tooling. But the current prototype shows that
   it is not out of reach.
 * Submodules don't have a good reputation and complicate the command line.
   However, in the proposed setup, a regular developer will seldom interact with
   submodules directly, and certainly never update them.
 * Refactoring across projects is not friendly: taking some functions from clang
   to make them part of a utility in libSupport wouldn't carry the history of
   the code in the llvm repo, preventing recursively applying `git blame` for
   instance. However, this is not very different from how most people are
   interacting with the repository today, by splitting such a change into
   multiple commits.
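
To make the time-window grouping option from the list above more concrete,
here is a small sketch. The input format (`<epoch-seconds> <repository>
<hash>`, one push per line), the 60-second window and the `group_pushes` name
are all illustrative assumptions; the proposal does not specify how the bot
would obtain push times.

```shell
#!/bin/sh
# Sketch only: assign a group number to each push, starting a new group
# whenever the gap to the previous push exceeds WINDOW seconds.
WINDOW=60

group_pushes() {
  sort -n | awk -v window="$WINDOW" '
    NR == 1 || $1 - prev > window { group++ }   # gap too large: new group
    { print group, $2, $3; prev = $1 }
  '
}

groups=$(printf '%s\n' \
  "1000 llvm aaa111" \
  "1030 clang bbb222" \
  "1500 libcxx ccc333" | group_pushes)
echo "$groups"
```

The first two pushes land in the same group only because they are close in
time, which is exactly how unrelated commits could end up sharing a revision.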

Workflows
^^^^^^^^^

 * :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
 * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-multicheckout-nocommit>`.
 * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-multicheckout-multicommit>`.
 * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
 * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-multi-branching>`.
 * :ref:`Bisecting <workflow-multi-bisecting>`.

Monorepo Variant
----------------

This variant recommends moving all LLVM sub-projects to a single Git repository,
similar to https://github.com/llvm-project/llvm-project.
This would mimic an export of the current SVN repository, with each sub-project
having its own top-level directory.
Not all sub-projects are used for building toolchains. In practice, www/
and test-suite/ will probably stay out of the monorepo.

Putting all sub-projects in a single checkout makes cross-project refactoring
naturally simple:

 * New sub-projects can be trivially split out for better reuse and/or layering
   (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
   dependency on LLVM).
 * Changing an API in LLVM and upgrading the sub-projects will always be done in
   a single commit, designing away a common source of temporary build breakage.
 * Moving code across sub-projects (during refactoring for instance) in a single
   commit enables accurate `git blame` when tracking code change history.
 * Tooling based on `git grep` works natively across sub-projects, making it
   easier to find refactoring opportunities across projects (for example reusing
   a data structure initially in LLDB by moving it into libSupport).
 * Having all the sources present encourages maintaining the other sub-projects
   when changing an API.

Finally, the monorepo maintains the property of the existing SVN repository that
the sub-projects move synchronously, and a single revision number (or commit
hash) identifies the state of the development across all projects.

.. _build_single_project:

Building a single sub-project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Nobody will be forced to build unnecessary projects.  The exact structure
is TBD, but making it trivial to configure builds for a single sub-project
(or a subset of sub-projects) is a hard requirement.

As an example, it could look like the following::

  mkdir build && cd build
  # Configure only LLVM (default)
  cmake path/to/monorepo
  # Configure LLVM and lld
  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
  # Configure LLVM and clang
  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang

.. _git-svn-mirror:

Read/write sub-project mirrors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With the Monorepo, the existing single-subproject mirrors (e.g.
http://llvm.org/git/compiler-rt.git) with git-svn read-write access would
continue to be maintained: developers would continue to be able to use the
existing single-subproject git repositories as they do today, with *no changes
to workflow*. Everything (git fetch, git svn dcommit, etc.) could continue to
work identically to how it works today. The monorepo can be set up such that the
SVN revision number matches the SVN revision in the GitHub SVN bridge.

Living Downstream
^^^^^^^^^^^^^^^^^

Downstream SVN users can use the read/write SVN bridge. The SVN revision
number can be preserved in the monorepo, minimizing the impact.

Downstream Git users can continue without any major changes, by using the
git-svn mirrors on top of the SVN bridge.

Git users can also work upstream with the monorepo even if their downstream
fork has split repositories.  They can apply patches in the appropriate
subdirectories of the monorepo using, e.g., `git am --directory=...`, or
plain `diff` and `patch`.
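
The `git am --directory=...` route can be sketched end to end with two scratch
repositories. The repository names, the file `tool.c` and the directory layout
are made up for the illustration; only the `git format-patch` and
`git am --directory` calls reflect the workflow described above.

```shell
#!/bin/sh
# Sketch only: carry a commit from a split "clang" repository into the
# clang/ subdirectory of a toy monorepo.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
mkdir -p "$work/patches"

# A stand-alone repository, as a downstream fork might have.
git init -q "$work/clang-split"
cd "$work/clang-split"
git commit -q --allow-empty -m "initial"
echo "int main(void) { return 0; }" > tool.c
git add tool.c
git commit -q -m "Add tool.c"
git format-patch -1 -o "$work/patches" HEAD >/dev/null

# A toy monorepo with one top-level directory per sub-project.
git init -q "$work/monorepo"
cd "$work/monorepo"
mkdir clang llvm
echo clang > clang/README
echo llvm > llvm/README
git add .
git commit -q -m "Import sub-projects"

# Apply the patch with every path re-rooted under clang/.
git am -q --directory=clang "$work"/patches/*.patch
applied_subject=$(git log -1 --format=%s)
echo "$applied_subject"
```

The commit message and authorship travel with the patch, while the commit hash
in the monorepo differs from the one in the split repository.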

Alternatively, Git users can migrate their own fork to the monorepo.  As a
demonstration, we've migrated the "CHERI" fork to the monorepo in two ways:

 * Using a script that rewrites history (including merges) so that it looks
   like the fork always lived in the monorepo [LebarCHERI]_.  The upside of
   this is when you check out an old revision, you get a copy of all llvm
   sub-projects at a consistent revision.  (For instance, if it's a clang
   fork, when you check out an old revision you'll get a consistent version
   of llvm proper.)  The downside is that this changes the fork's commit
   hashes.

 * Merging the fork into the monorepo [AminiCHERI]_.  This preserves the
   fork's commit hashes, but when you check out an old commit you only get
   the one sub-project.

Monorepo Preview
^^^^^^^^^^^^^^^^

As a preview (disclaimer: this is a rough prototype, not polished and not
representative of the final solution), you can look at the following:

  * Full Repository: https://github.com/joker-eph/llvm-project
  * Single sub-project view with *SVN write access* to the full repo:
    https://github.com/joker-eph/compiler-rt

Concerns
^^^^^^^^

 * Using the monolithic repository may add overhead for those contributing to a
   standalone sub-project, particularly on runtimes like libcxx and compiler-rt
   that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
   1GB for the monorepo), and the commit rate of LLVM may cause more frequent
   `git push` collisions when upstreaming. Affected contributors can continue to
   use the SVN bridge or the single-subproject Git mirrors with git-svn for
   read-write.
 * Using the monolithic repository may add overhead for those *integrating* a
   standalone sub-project, even if they aren't contributing to it, due to the
   same disk space concern as the point above. The availability of the
   sub-project Git mirror addresses this, even without SVN access.
 * Preservation of the existing read/write SVN-based workflows relies on the
   GitHub SVN bridge, which is an extra dependency.  Maintaining this locks us
   into GitHub and could restrict future workflow changes.

Workflows
^^^^^^^^^

 * :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
 * :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-monocheckout-nocommit>`.
 * :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
 * :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
 * :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
 * :ref:`Bisecting <workflow-mono-bisecting>`.

Multi/Mono Hybrid Variant
-------------------------

This variant recommends moving only the LLVM sub-projects that are *rev-locked*
to LLVM into a monorepo (clang, lld, lldb, ...), following the multirepo
proposal for the rest.  While neither variant recommends combining sub-projects
like www/ and test-suite/ (which are completely standalone), this goes further
and keeps sub-projects like libcxx and compiler-rt in their own distinct
repositories.

Concerns
^^^^^^^^

 * This has most disadvantages of multirepo and monorepo, without bringing many
   of the advantages.
 * Downstream users have to upgrade to the monorepo structure, but only
   partially. So they will keep the infrastructure to integrate the other
   separate sub-projects.
 * All projects that use LIT for testing are effectively rev-locked to LLVM.
   Furthermore, some runtimes (like compiler-rt) are rev-locked with Clang.
   It's not clear where to draw the lines.


Workflow Before/After
=====================

This section goes through a few examples of workflows, intended to illustrate
how end-users or developers would interact with the repository for
various use-cases.

.. _workflow-checkout-commit:

Checkout/Clone a Single Project, without Commit Access
------------------------------------------------------

Except for the URL, nothing changes. The possibilities today are::

  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
  # or with Git
  git clone http://llvm.org/git/llvm.git

After the move to GitHub, you would do either::

  git clone https://github.com/llvm-project/llvm.git
  # or using the GitHub svn native bridge
  svn co https://github.com/llvm-project/llvm/trunk

The above works for both the monorepo and the multirepo, as we'll maintain the
existing read-only views of the individual sub-projects.

Checkout/Clone a Single Project, with Commit Access
---------------------------------------------------

Currently
^^^^^^^^^

::

  # direct SVN checkout
  svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
  # or using the read-only Git view, with git-svn
  git clone http://llvm.org/git/llvm.git
  cd llvm
  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l  # -l avoids fetching ahead of the git mirror.

Commits are performed using `svn commit` or with the sequence `git commit` and
`git svn dcommit`.

.. _workflow-multicheckout-nocommit:

Multirepo Variant
^^^^^^^^^^^^^^^^^

With the multirepo variant, nothing changes but the URL, and commits can be
performed using `svn commit` or `git commit` and `git push`::

  git clone https://github.com/llvm/llvm.git llvm
  # or using the GitHub svn native bridge
  svn co https://github.com/llvm/llvm/trunk/ llvm

.. _workflow-monocheckout-nocommit:

Monorepo Variant
^^^^^^^^^^^^^^^^

With the monorepo variant, there are a few options, depending on your
constraints. First, you could just clone the full repository::

  git clone https://github.com/llvm/llvm-projects.git llvm
  # or using the GitHub svn native bridge
  svn co https://github.com/llvm/llvm-projects/trunk/ llvm

At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
:ref:`doesn't imply you have to build all of them <build_single_project>`. You
can still build only compiler-rt for instance. In this way it's not different
from someone who would check out all the projects with SVN today.

You can commit as normal using `git commit` and `git push` or `svn commit`, and
read the history for a single project (`git log libcxx` for example).

Secondly, there are a few options to avoid checking out all the sources.

**Using the GitHub SVN bridge**

The GitHub SVN native bridge allows you to check out a subdirectory directly::

  svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=...

This checks out only compiler-rt and provides commit access using `svn commit`,
in the same way as it would do today.

**Using a Subproject Git Mirror**

You can use *git-svn* and one of the sub-project mirrors::

  # Clone from the single read-only Git repo
  git clone http://llvm.org/git/llvm.git
  cd llvm
  # Configure the SVN remote and initialize the svn metadata
  git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=...
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l

In this case the repository contains only a single sub-project, and commits can
be made using `git svn dcommit`, again exactly as we do today.

**Using Sparse Checkouts**

You can hide the other directories using a Git sparse checkout::

  git config core.sparseCheckout true
  echo /compiler-rt > .git/info/sparse-checkout
  git read-tree -mu HEAD

The data for all sub-projects is still in your `.git` directory, but in your
checkout, you only see `compiler-rt`.
Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
usual.

Note that when you fetch you'll likely pull in changes to sub-projects you don't
care about. If you are using sparse checkout, the files from other projects
won't appear on your disk. The only effect is that your commit hash changes.

You can check whether the changes in the last fetch are relevant to your commit
by running::

  git log origin/master@{1}..origin/master -- libcxx

This command can be hidden in a script so that `git llvmpush` would perform all
these steps, fail only if such a dependent change exists, and show immediately
the change that prevented the push. An immediate repeat of the command would
(almost) certainly result in a successful push.
Note that today with SVN or git-svn, this step is not possible since the
"rebase" implicitly happens while committing (unless a conflict occurs).
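
The relevance check at the heart of such a script can be sketched as follows.
`git llvmpush` does not exist yet, and `changes_touching` is a hypothetical
helper; the scratch repository stands in for the state before and after a
fetch, where a real script would pass `origin/master@{1}` and `origin/master`
and then rebase and push.

```shell
#!/bin/sh
# Sketch only: list incoming commits that touch a given sub-project.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: subjects of commits in <old>..<new> touching <path>.
changes_touching() {
  git log --format=%s "$2..$3" -- "$1"
}

repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir libcxx llvm
echo a > libcxx/a
echo a > llvm/a
git add .
git commit -q -m "initial import"
old=$(git rev-parse HEAD)          # stands in for origin/master@{1}

echo b > llvm/a
git commit -q -am "llvm change"
echo b > libcxx/a
git commit -q -am "libcxx change"
new=$(git rev-parse HEAD)          # stands in for origin/master

libcxx_hits=$(changes_touching libcxx "$old" "$new")
rt_hits=$(changes_touching compiler-rt "$old" "$new")
echo "libcxx: ${libcxx_hits:-none}"
echo "compiler-rt: ${rt_hits:-none}"
```

An empty result for your sub-project means the fetched commits are unrelated
to your change, so the wrapper could rebase and push without asking you to
re-test.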

Checkout/Clone Multiple Projects, with Commit Access
----------------------------------------------------

Let's look at how to assemble llvm+clang+libcxx at a given revision.

Currently
^^^^^^^^^
    615 
    616 ::
    617 
    618   svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
    619   cd llvm/tools
    620   svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
    621   cd ../projects
    622   svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION

Or using git-svn::

  git clone http://llvm.org/git/llvm.git
  cd llvm/
  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`
  cd tools
  git clone http://llvm.org/git/clang.git
  cd clang/
  git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`
  cd ../../projects/
  git clone http://llvm.org/git/libcxx.git
  cd libcxx
  git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`

Note that the list would be longer with more sub-projects.

.. _workflow-multicheckout-multicommit:

Multirepo Variant
^^^^^^^^^^^^^^^^^

With the multirepo variant, the umbrella repository will be used. This is
where the mapping from a single revision number to the individual repositories'
revisions is stored::

  git clone https://github.com/llvm-beanz/llvm-submodules
  cd llvm-submodules
  git checkout $REVISION
  git submodule init
  git submodule update clang llvm libcxx
  # The list of sub-projects is optional; `git submodule update` alone would get them all.

At this point the clang, llvm, and libcxx individual repositories are cloned
and stored alongside each other. There are CMake flags to describe the directory
structure; alternatively, you can just symlink `clang` to `llvm/tools/clang`,
etc.
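
The symlink approach could be scripted along these lines. This is a sketch
only: the function name `link_subprojects` is hypothetical, and it assumes the
in-tree locations used in the SVN example above (`llvm/tools/clang` and
`llvm/projects/libcxx`) and that it runs from the umbrella checkout.

```shell
# Sketch: link sibling checkouts into the in-tree locations used by the
# SVN layout. Run from the umbrella checkout, where llvm/, clang/ and
# libcxx/ sit next to each other.
link_subprojects() {
  # Relative targets keep the links valid if the umbrella directory moves.
  # Existing entries are left untouched so the function can be re-run.
  [ -e llvm/tools/clang ] || ln -s ../../clang llvm/tools/clang || return 1
  [ -e llvm/projects/libcxx ] || ln -s ../../libcxx llvm/projects/libcxx || return 1
}
```

Other sub-projects would need one more line each, pointing at their own
location in the llvm tree.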

Another option is to check out repositories based on the commit timestamp::

  git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
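
Because the command above acts on one repository at a time, it has to be
repeated in each clone. A minimal sketch of that loop follows; the function
name `checkout_before` is hypothetical, and each argument after the timestamp
is assumed to be the path of an existing clone with a `master` branch.

```shell
# Sketch: check out every listed repository at its last `master` commit
# made before a given timestamp.
# Usage: checkout_before "2009-07-27 13:37" llvm clang libcxx
checkout_before() {
  timestamp=$1
  shift
  for repo in "$@"; do
    rev=$(git -C "$repo" rev-list -n 1 --before="$timestamp" master) || return 1
    if [ -z "$rev" ]; then
      echo "$repo: no commit on master before $timestamp" >&2
      return 1
    fi
    git -C "$repo" checkout "$rev" || return 1
  done
}
```

Note that `--before` filters on the commit date recorded in each repository, so
the result is only as consistent as those dates are across repositories.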

.. _workflow-monocheckout-multicommit:

Monorepo Variant
^^^^^^^^^^^^^^^^

The repository natively contains the source for every sub-project at the right
revision, which makes this straightforward::

  git clone https://github.com/llvm/llvm-projects.git llvm-projects
  cd llvm-projects
  git checkout $REVISION

As before, at this point clang, llvm, and libcxx are stored in directories
alongside each other.

.. _workflow-cross-repo-commit:

Commit an API Change in LLVM and Update the Sub-projects
--------------------------------------------------------

Today this is possible, though uncommon (or at least undocumented), for both
Subversion and git-svn users. For example, few Git users try to update
LLD or Clang in the same commit as they change an LLVM API.

The multirepo variant does not address this: one would have to commit and push
separately in every individual repository. It would be possible to establish a
protocol whereby users add a special token to their commit messages that causes
the umbrella repo's updater bot to group all of them into a single revision.
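
To make the idea concrete, the sketch below shows how an updater bot might read
such a token. Everything here is hypothetical: this proposal does not define a
token format, and the `Umbrella-Group:` trailer and the `umbrella_group`
function are invented for illustration.

```shell
# Sketch: print the value of a hypothetical "Umbrella-Group:" trailer found
# in a commit message read from standard input. Prints nothing when the
# token is absent; if it appears more than once, the last one wins.
umbrella_group() {
  sed -n 's/^Umbrella-Group:[[:space:]]*//p' | tail -n 1
}
```

A bot could feed it one commit message at a time, for example with
`git log -1 --format=%B <commit> | umbrella_group`, and fold commits that share
a value into a single umbrella revision.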

The monorepo variant handles this natively.

Branching/Stashing/Updating for Local Development or Experiments
----------------------------------------------------------------

Currently
^^^^^^^^^

SVN does not allow this use case, but developers who currently use
git-svn can do it. Let's look at what it means in practice when dealing with
multiple sub-projects.

To update the repository to tip of trunk::

  git pull
  cd tools/clang
  git pull
  cd ../../projects/libcxx
  git pull

To create a new branch::

  git checkout -b MyBranch
  cd tools/clang
  git checkout -b MyBranch
  cd ../../projects/libcxx
  git checkout -b MyBranch

To switch branches::

  git checkout AnotherBranch
  cd tools/clang
  git checkout AnotherBranch
  cd ../../projects/libcxx
  git checkout AnotherBranch
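
The repetition in the three examples above can be wrapped in a small helper.
The sketch below is an illustration rather than an existing script: the name
`each_checkout` is hypothetical, and the directory list assumes the
llvm+clang+libcxx layout used in this section.

```shell
# Sketch: run one Git command in llvm and in each nested checkout.
# Run from the top of the llvm checkout.
# Usage: each_checkout pull
#        each_checkout checkout -b MyBranch
each_checkout() {
  for dir in . tools/clang projects/libcxx; do
    git -C "$dir" "$@" || return 1
  done
}
```

The helper stops at the first failing repository, which can leave the checkouts
on different branches until the command is repeated.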

.. _workflow-multi-branching:

Multirepo Variant
^^^^^^^^^^^^^^^^^

The multirepo variant works the same as the current Git workflow: every command
needs to be applied to each of the individual repositories.
However, the umbrella repository makes this easy using `git submodule foreach`
to replicate a command on all the individual repositories (or submodules
in this case).

To create a new branch::

  git submodule foreach git checkout -b MyBranch

To switch branches::

  git submodule foreach git checkout AnotherBranch

.. _workflow-mono-branching:

Monorepo Variant
^^^^^^^^^^^^^^^^

Regular Git commands are sufficient, because everything is in a single
repository:

To update the repository to tip of trunk::

  git pull

To create a new branch::

  git checkout -b MyBranch

To switch branches::

  git checkout AnotherBranch

Bisecting
---------

Assume a developer is looking for a bug in clang (or lld, or lldb, ...).

Currently
^^^^^^^^^

SVN does not have built-in bisection support, but the single revision number
shared across sub-projects makes it possible to script one.

Using the existing Git read-only view of the repositories, it is possible to use
the native Git bisection script over the llvm repository, and use some scripting
to synchronize the clang repository to match the llvm revision.

.. _workflow-multi-bisecting:

Multirepo Variant
^^^^^^^^^^^^^^^^^

With the multi-repositories variant, the cross-repository synchronization is
achieved using the umbrella repository. This repository contains only
submodules for the other sub-projects. The native Git bisection can be used on
the umbrella repository directly. A subtlety is that the bisect script itself
needs to make sure the submodules are updated accordingly.

For example, to find which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, one should be able to simply do::

  git bisect start release_39 release_38
  git bisect run ./bisect_script.sh

With the `bisect_script.sh` script being::

  #!/bin/sh
  cd "$UMBRELLA_DIRECTORY"
  git submodule update llvm clang libcxx #....
  cd "$BUILD_DIR"

  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
                            # to "skip" the current commit

  ./bin/clang some_crash_test.cpp

When the `git bisect run` command returns, the umbrella repository is set to
the state where the regression is introduced. The commit diff in the umbrella
indicates which submodule was updated, and the last commit in this sub-project
is the one that the bisect found.
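
One way to inspect that result is sketched below. The function name
`show_culprit` is hypothetical; the sketch relies on `refs/bisect/bad`, the ref
in which `git bisect` records the first bad commit until `git bisect reset` is
run, and on the `--submodule=log` diff option, which lists the commits between
the old and new submodule revisions.

```shell
# Sketch: after `git bisect run` finishes in the umbrella repository, list
# the sub-project commits pulled in by the first bad umbrella commit.
show_culprit() {
  git show --submodule=log refs/bisect/bad
}
```

If the umbrella commit advanced a submodule by more than one commit, the output
lists all of them, and the culprit is somewhere in that range.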

.. _workflow-mono-bisecting:

Monorepo Variant
^^^^^^^^^^^^^^^^

Bisecting on the monorepo is straightforward, and very similar to the above,
except that the bisection script does not need to include the
`git submodule update` step.

The same example, finding which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, will look like::

  git bisect start release_39 release_38
  git bisect run ./bisect_script.sh

With the `bisect_script.sh` script being::

  #!/bin/sh
  cd "$BUILD_DIR"

  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
                            # to "skip" the current commit

  ./bin/clang some_crash_test.cpp

Also, since the monorepo handles commits that update multiple projects at once,
you're less likely to encounter a build failure where one commit changes an API
in LLVM and a later one "fixes" the build in clang.
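
One detail applies to both `bisect_script.sh` examples: `git bisect run` treats
an exit status of 0 as good, 1 to 127 (except 125) as bad, and aborts the
bisection on anything else. A command killed by a signal is reported by the
shell with a status above 128, so a crashing test may stop the bisection rather
than mark the commit as bad. The sketch below shows one way to normalize the
status; the function name `run_for_bisect` is hypothetical.

```shell
# Sketch: run a test command and translate its exit status into one that
# `git bisect run` interprets as good (0) or bad (1).
run_for_bisect() {
  "$@"
  status=$?
  if [ "$status" -eq 0 ]; then
    return 0   # good: the command succeeded
  fi
  # Any failure, including death by signal (status above 128), becomes 1,
  # because `git bisect run` aborts on statuses above 127.
  return 1
}
```

In the scripts above, the last line would then read
`run_for_bisect ./bin/clang some_crash_test.cpp`.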


References
==========

.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
.. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
.. [TorvaldRevNum] Linus Torvalds, http://git.661346.n2.nabble.com/Git-commit-generation-numbers-td6584414.html
.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
.. [submodules] Git submodules, https://git-scm.com/book/en/v2/Git-Tools-Submodules
.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/
.. [LebarCHERI] Port *CHERI* to a single repository rewriting history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html
.. [AminiCHERI] Port *CHERI* to a single repository preserving history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102804.html