==============================
Moving LLVM Projects to GitHub
==============================

.. contents:: Table of Contents
  :depth: 4
  :local:

Introduction
============

This is a proposal to move our current revision control system from our own
hosted Subversion to GitHub. Below are the financial and technical arguments as
to why we are proposing such a move and how people (and validation
infrastructure) will continue to work with a Git-based LLVM.

There will be a survey pointing at this document which we'll use to gauge the
community's reaction and, if we collectively decide to move, the time-frame. Be
sure to make your view count.

Additionally, we will discuss this during a BoF at the next US LLVM Developer
meeting (http://llvm.org/devmtg/2016-11/).

What This Proposal is *Not* About
=================================

Changing the development policy.

This proposal relates only to moving the hosting of our source-code repository
from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
using GitHub's issue tracker, pull-requests, or code-review.

Contributors will continue to earn commit access on demand under the Developer
Policy, except that a GitHub account will be required instead of an SVN
username/password-hash.

Why Git, and Why GitHub?
========================

Why Move At All?
----------------

This discussion began because we currently host our own Subversion server
and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
provides limited support, but there is only so much it can do.

Volunteers are not sysadmins themselves, but compiler engineers that happen
to know a thing or two about hosting servers. We also don't have 24/7 support,
and we sometimes wake up to see that continuous integration is broken because
the SVN server is either down or unresponsive.

We should take advantage of one of the services out there (GitHub, GitLab,
and BitBucket, among others) that offer better service (24/7 stability, disk
space, Git server, code browsing, forking facilities, etc.) for free.

Why Git?
--------

Many new coders nowadays start with Git, and a lot of people have never used
SVN, CVS, or anything else. Websites like GitHub have changed the landscape
of open source contributions, reducing the cost of first contribution and
fostering collaboration.

Git is also the version control many LLVM developers use. Despite the
sources being stored in a SVN server, these developers are already using Git
through the Git-SVN integration.

Git allows you to:

* Commit, squash, merge, and fork locally without touching the remote server.
* Maintain local branches, enabling multiple threads of development.
* Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
* Inspect the repository history (blame, log, bisect) without Internet access.
* Maintain remote forks and branches on Git hosting services and
  integrate back to the main repository.

In addition, because Git seems to be replacing many OSS projects' version
control systems, there are many tools that are built over Git.
Future tooling may support Git first (if not only).

Why GitHub?
-----------

GitHub, like GitLab and BitBucket, provides free code hosting for open source
projects. Any of these could replace the code-hosting infrastructure that we
have today.

These services also have a dedicated team to monitor, migrate, improve and
distribute the contents of the repositories depending on region and load.

GitHub has one important advantage over GitLab and
BitBucket: it offers read-write **SVN** access to the repository
(https://github.com/blog/626-announcing-svn-support).
This would enable people to continue working post-migration as though our code
were still canonically in an SVN repository.

In addition, there are already multiple LLVM mirrors on GitHub, indicating that
part of our community has already settled there.

On Managing Revision Numbers with Git
-------------------------------------

The current SVN repository hosts all the LLVM sub-projects alongside each other.
A single revision number (e.g. r123456) thus identifies a consistent version of
all LLVM sub-projects.

Git does not use a sequential integer revision number but instead uses a hash to
identify each commit. (Linus mentioned that the lack of such a revision number
is "the only real design mistake" in Git [TorvaldRevNum]_.)

The loss of a sequential integer revision number has been a sticking point in
past discussions about Git:

- "The 'branch' I most care about is mainline, and losing the ability to say
  'fixed in r1234' (with some sort of monotonically increasing number) would
  be a tragic loss." [LattnerRevNum]_
- "I like those results sorted by time and the chronology should be obvious, but
  timestamps are incredibly cumbersome and make it difficult to verify that a
  given checkout matches a given set of results." [TrickRevNum]_
- "There is still the major regression with unreadable version numbers.
  Given the amount of Bugzilla traffic with 'Fixed in...', that's a
  non-trivial issue." [JSonnRevNum]_
- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_.

However, Git can emulate this increasing revision number:
``git rev-list --count <commit-hash>``. This identifier is unique only
within a single branch, but this means the tuple `(num, branch-name)` uniquely
identifies a commit.

We can thus use this revision number to ensure that e.g. `clang -v` reports a
user-friendly revision number (e.g.
`master-12345` or `4.0-5321`), addressing
the objections raised above with respect to this aspect of Git.

What About Branches and Merges?
-------------------------------

In contrast to SVN, Git makes branching easy. Git's commit history is
represented as a DAG, a departure from SVN's linear history. However, we propose
to forbid merge commits in our canonical Git repository.

Unfortunately, GitHub does not support server side hooks to enforce such a
policy. We must rely on the community to avoid pushing merge commits.

GitHub offers a feature called `Status Checks`: a branch protected by
`status checks` requires commits to be whitelisted before the push can happen.
We could supply a pre-push hook on the client side that would run and check the
history, before whitelisting the commit being pushed [statuschecks]_.
However, this solution would be somewhat fragile (how do you update a script
installed on every developer machine?) and prevents SVN access to the
repository.

What About Commit Emails?
-------------------------

We will need a new bot to send emails for each commit. This proposal leaves the
email format unchanged besides the commit URL.

Straw Man Migration Plan
========================

Step #1 : Before The Move
-------------------------

1. Update docs to mention the move, so people are aware of what is going on.
2. Set up a read-only version of the GitHub project, mirroring our current SVN
   repository.
3. Add the required bots to implement the commit emails, as well as the
   umbrella repository update (if the multirepo is selected) or the read-only
   Git views for the sub-projects (if the monorepo is selected).

Step #2 : Git Move
------------------

4. Update the buildbots to pick up updates and commits from the GitHub
   repository. Not all bots have to migrate at this point, but it'll help
   provide infrastructure testing.
5. Update Phabricator to pick up commits from the GitHub repository.
6. LNT and llvmlab have to be updated: they rely on a unique, monotonically
   increasing integer across branches [MatthewsRevNum]_.
7. Instruct downstream integrators to pick up commits from the GitHub
   repository.
8. Review and prepare an update for the LLVM documentation.

Until this point nothing has changed for developers; it will just
boil down to a lot of work for buildbot and other infrastructure
owners.

The migration will pause here until all dependencies have cleared, and all
problems have been solved.

Step #3: Write Access Move
--------------------------

9. Collect developers' GitHub account information, and add them to the project.
10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
11. Update the documentation.
12. Mirror Git to SVN.

Step #4 : Post Move
-------------------

13. Archive the SVN repository.
14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
    point to GitHub instead.

One or Multiple Repositories?
=============================

There are two major variants for how to structure our Git repository: The
"multirepo" and the "monorepo".

Multirepo Variant
-----------------

This variant recommends moving each LLVM sub-project to a separate Git
repository. This mimics the existing official read-only Git repositories
(e.g., http://llvm.org/git/compiler-rt.git), and creates new canonical
repositories for each sub-project.

This will allow the individual sub-projects to remain distinct: a
developer interested only in compiler-rt can checkout only this repository,
build it, and work in isolation of the other sub-projects.

A key need is to be able to check out multiple projects (i.e. lldb+clang+llvm or
clang+llvm+libcxx for example) at a specific revision.

A tuple of revisions (one entry per repository) accurately describes the state
across the sub-projects.
For example, a given version of clang would be
*<LLVM-12345, clang-5432, libcxx-123, etc.>*.

Umbrella Repository
^^^^^^^^^^^^^^^^^^^

To make this more convenient, a separate *umbrella* repository will be
provided. This repository will be used for the sole purpose of understanding
the sequence in which commits were pushed to the different repositories and to
provide a single revision number.

This umbrella repository will be read-only and continuously updated
to record the above tuple. The proposed form to record this is to use Git
[submodules]_, possibly along with a set of scripts to help check out a
specific revision of the LLVM distribution.

A regular LLVM developer does not need to interact with the umbrella repository
-- the individual repositories can be checked out independently -- but you would
need to use the umbrella repository to bisect multiple sub-projects at the same
time, or to check-out old revisions of LLVM with another sub-project at a
consistent state.

This umbrella repository will be updated automatically by a bot (running on
notice from a webhook on every push, and periodically) on a per commit basis: a
single commit in the umbrella repository would match a single commit in a
sub-project.

Living Downstream
^^^^^^^^^^^^^^^^^

Downstream SVN users can use the read/write SVN bridges with the following
caveats:

* Be prepared for a one-time change to the upstream revision numbers.
* The upstream sub-project revision numbers will no longer be in sync.

Downstream Git users can continue without any major changes, with the minor
change of upstreaming using `git push` instead of `git svn dcommit`.

Git users also have the option of adopting an umbrella repository downstream.
The tooling for the upstream umbrella can easily be reused for downstream needs,
incorporating extra sub-projects and branching in parallel with sub-project
branches.

Multirepo Preview
^^^^^^^^^^^^^^^^^

As a preview (disclaimer: this is a rough prototype, not polished and not
representative of the final solution), you can look at the following:

* Repository: https://github.com/llvm-beanz/llvm-submodules
* Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/

Concerns
^^^^^^^^

* Because GitHub does not allow server-side hooks, and because there is no
  "push timestamp" in Git, the umbrella repository sequence isn't totally
  exact: commits from different repositories pushed around the same time can
  appear in different orders. However, we don't expect it to be the common case
  or to cause serious issues in practice.
* You can't have a single cross-project commit that would update both LLVM and
  other sub-projects (something that can be achieved now). It would be possible
  to establish a protocol whereby users add a special token to their commit
  messages that causes the umbrella repo's updater bot to group all of them
  into a single revision.
* Another option is to group commits that were pushed closely enough together
  in the umbrella repository. This has the advantage of allowing cross-project
  commits, and is less sensitive to mis-ordering commits. However, this has the
  potential to group unrelated commits together, especially if the bot goes
  down and needs to catch up.
* This variant relies on heavier tooling. But the current prototype shows that
  it is not out-of-reach.
* Submodules don't have a good reputation and complicate the command line.
  However, in the proposed setup, a regular developer will seldom interact with
  submodules directly, and certainly never update them.
* Refactoring across projects is not friendly: taking some functions from clang
  to make it part of a utility in libSupport wouldn't carry the history of the
  code in the llvm repo, preventing recursively applying `git blame` for
  instance. However, this is not very different from how most people are
  interacting with the repository today, by splitting such a change into
  multiple commits.

Workflows
^^^^^^^^^

* :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
* :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-multicheckout-nocommit>`.
* :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-multicheckout-multicommit>`.
* :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
* :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-multi-branching>`.
* :ref:`Bisecting <workflow-multi-bisecting>`.

Monorepo Variant
----------------

This variant recommends moving all LLVM sub-projects to a single Git repository,
similar to https://github.com/llvm-project/llvm-project.
This would mimic an export of the current SVN repository, with each sub-project
having its own top-level directory.
Not all sub-projects are used for building toolchains. In practice, www/
and test-suite/ will probably stay out of the monorepo.

Putting all sub-projects in a single checkout makes cross-project refactoring
naturally simple:

* New sub-projects can be trivially split out for better reuse and/or layering
  (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
  dependency on LLVM).
* Changing an API in LLVM and upgrading the sub-projects will always be done in
  a single commit, designing away a common source of temporary build breakage.
* Moving code across sub-projects (during refactoring for instance) in a single
  commit enables accurate `git blame` when tracking code change history.
* Tooling based on `git grep` works natively across sub-projects, making it
  easier to find refactoring opportunities across projects (for example reusing
  a data structure initially in LLDB by moving it into libSupport).
* Having all the sources present encourages maintaining the other sub-projects
  when changing an API.

Finally, the monorepo maintains the property of the existing SVN repository that
the sub-projects move synchronously, and a single revision number (or commit
hash) identifies the state of the development across all projects.

.. _build_single_project:

Building a single sub-project
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Nobody will be forced to build unnecessary projects. The exact structure
is TBD, but making it trivial to configure builds for a single sub-project
(or a subset of sub-projects) is a hard requirement.

As an example, it could look like the following::

  mkdir build && cd build
  # Configure only LLVM (default)
  cmake path/to/monorepo
  # Configure LLVM and lld
  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
  # Configure LLVM and clang
  cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang

.. _git-svn-mirror:

Read/write sub-project mirrors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

With the Monorepo, the existing single-subproject mirrors (e.g.
http://llvm.org/git/compiler-rt.git) with git-svn read-write access would
continue to be maintained: developers would continue to be able to use the
existing single-subproject git repositories as they do today, with *no changes
to workflow*.
Everything (git fetch, git svn dcommit, etc.) could continue to
work identically to how it works today. The monorepo can be set up such that the
SVN revision number matches the SVN revision in the GitHub SVN-bridge.

Living Downstream
^^^^^^^^^^^^^^^^^

Downstream SVN users can use the read/write SVN bridge. The SVN revision
number can be preserved in the monorepo, minimizing the impact.

Downstream Git users can continue without any major changes, by using the
git-svn mirrors on top of the SVN bridge.

Git users can also work upstream with monorepo even if their downstream
fork has split repositories. They can apply patches in the appropriate
subdirectories of the monorepo using, e.g., `git am --directory=...`, or
plain `diff` and `patch`.

Alternatively, Git users can migrate their own fork to the monorepo. As a
demonstration, we've migrated the "CHERI" fork to the monorepo in two ways:

* Using a script that rewrites history (including merges) so that it looks
  like the fork always lived in the monorepo [LebarCHERI]_. The upside of
  this is when you check out an old revision, you get a copy of all llvm
  sub-projects at a consistent revision. (For instance, if it's a clang
  fork, when you check out an old revision you'll get a consistent version
  of llvm proper.) The downside is that this changes the fork's commit
  hashes.

* Merging the fork into the monorepo [AminiCHERI]_. This preserves the
  fork's commit hashes, but when you check out an old commit you only get
  the one sub-project.
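
The patch-based route mentioned above (`git am --directory=...`) can be tried
out locally. The following is only a minimal sketch under stated assumptions:
the repository names (`mono`, `clang-fork`), the file contents, and the commit
messages are all hypothetical stand-ins created in a temporary directory, not
the real LLVM repositories.

```shell
set -e
work=$(mktemp -d)
# Throwaway identity so the sketch does not depend on the global Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical monorepo: clang lives in a top-level clang/ directory.
git init -q "$work/mono"
mkdir "$work/mono/clang"
echo base > "$work/mono/clang/README"
git -C "$work/mono" add .
git -C "$work/mono" commit -q -m "monorepo: import clang"

# Hypothetical downstream split repository holding only clang,
# with one local change on top of the same base content.
git init -q "$work/clang-fork"
echo base > "$work/clang-fork/README"
git -C "$work/clang-fork" add .
git -C "$work/clang-fork" commit -q -m "clang: import"
echo "downstream change" >> "$work/clang-fork/README"
git -C "$work/clang-fork" commit -q -am "clang: downstream change"

# Export the downstream commit as a patch and apply it in the monorepo,
# prefixing every path in the patch with clang/.
git -C "$work/clang-fork" format-patch -1 --stdout > "$work/change.patch"
git -C "$work/mono" am --directory=clang "$work/change.patch"
```

The commit keeps its author and message, but it gets a new hash in the
monorepo because its parent and paths differ.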

Monorepo Preview
^^^^^^^^^^^^^^^^

As a preview (disclaimer: this is a rough prototype, not polished and not
representative of the final solution), you can look at the following:

* Full Repository: https://github.com/joker-eph/llvm-project
* Single sub-project view with *SVN write access* to the full repo:
  https://github.com/joker-eph/compiler-rt

Concerns
^^^^^^^^

* Using the monolithic repository may add overhead for those contributing to a
  standalone sub-project, particularly on runtimes like libcxx and compiler-rt
  that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs.
  1GB for the monorepo), and the commit rate of LLVM may cause more frequent
  `git push` collisions when upstreaming. Affected contributors can continue to
  use the SVN bridge or the single-subproject Git mirrors with git-svn for
  read-write.
* Using the monolithic repository may add overhead for those *integrating* a
  standalone sub-project, even if they aren't contributing to it, due to the
  same disk space concern as the point above. The availability of the
  sub-project Git mirror addresses this, even without SVN access.
* Preservation of the existing read/write SVN-based workflows relies on the
  GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
  into GitHub and could restrict future workflow changes.

Workflows
^^^^^^^^^

* :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
* :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-monocheckout-nocommit>`.
* :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
* :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
* :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
* :ref:`Bisecting <workflow-mono-bisecting>`.

Multi/Mono Hybrid Variant
-------------------------

This variant recommends moving only the LLVM sub-projects that are *rev-locked*
to LLVM into a monorepo (clang, lld, lldb, ...), following the multirepo
proposal for the rest. While neither variant recommends combining sub-projects
like www/ and test-suite/ (which are completely standalone), this goes further
and keeps sub-projects like libcxx and compiler-rt in their own distinct
repositories.

Concerns
^^^^^^^^

* This has most disadvantages of multirepo and monorepo, without bringing many
  of the advantages.
* Downstream users have to upgrade to the monorepo structure, but only
  partially. So they will keep the infrastructure to integrate the other
  separate sub-projects.
* All projects that use LIT for testing are effectively rev-locked to LLVM.
  Furthermore, some runtimes (like compiler-rt) are rev-locked with Clang.
  It's not clear where to draw the lines.


Workflow Before/After
=====================

This section goes through a few examples of workflows, intended to illustrate
how end-users or developers would interact with the repository for
various use-cases.

.. _workflow-checkout-commit:

Checkout/Clone a Single Project, without Commit Access
------------------------------------------------------

Except for the URL, nothing changes.
The possibilities today are::

  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
  # or with Git
  git clone http://llvm.org/git/llvm.git

After the move to GitHub, you would do either::

  git clone https://github.com/llvm-project/llvm.git
  # or using the GitHub svn native bridge
  svn co https://github.com/llvm-project/llvm/trunk

The above works for both the monorepo and the multirepo, as we'll maintain the
existing read-only views of the individual sub-projects.

Checkout/Clone a Single Project, with Commit Access
---------------------------------------------------

Currently
^^^^^^^^^

::

  # direct SVN checkout
  svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
  # or using the read-only Git view, with git-svn
  git clone http://llvm.org/git/llvm.git
  cd llvm
  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l  # -l avoids fetching ahead of the git mirror.

Commits are performed using `svn commit` or with the sequence `git commit` and
`git svn dcommit`.

.. _workflow-multicheckout-nocommit:

Multirepo Variant
^^^^^^^^^^^^^^^^^

With the multirepo variant, nothing changes but the URL, and commits can be
performed using `svn commit` or `git commit` and `git push`::

  git clone https://github.com/llvm/llvm.git llvm
  # or using the GitHub svn native bridge
  svn co https://github.com/llvm/llvm/trunk/ llvm

.. _workflow-monocheckout-nocommit:

Monorepo Variant
^^^^^^^^^^^^^^^^

With the monorepo variant, there are a few options, depending on your
constraints.
First, you could just clone the full repository::

  git clone https://github.com/llvm/llvm-projects.git llvm
  # or using the GitHub svn native bridge
  svn co https://github.com/llvm/llvm-projects/trunk/ llvm

At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
:ref:`doesn't imply you have to build all of them <build_single_project>`. You
can still build only compiler-rt for instance. In this way it's not different
from someone who would check out all the projects with SVN today.

You can commit as normal using `git commit` and `git push` or `svn commit`, and
read the history for a single project (`git log libcxx` for example).

Secondly, there are a few options to avoid checking out all the sources.

**Using the GitHub SVN bridge**

The GitHub SVN native bridge allows you to check out a subdirectory directly::

  svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=...

This checks out only compiler-rt and provides commit access using "svn commit",
in the same way as it would do today.

**Using a Subproject Git Mirror**

You can use *git-svn* and one of the sub-project mirrors::

  # Clone from the single read-only Git repo
  git clone http://llvm.org/git/llvm.git
  cd llvm
  # Configure the SVN remote and initialize the svn metadata
  git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=...
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l

In this case the repository contains only a single sub-project, and commits can
be made using `git svn dcommit`, again exactly as we do today.

**Using Sparse Checkouts**

You can hide the other directories using a Git sparse checkout::

  git config core.sparseCheckout true
  echo /compiler-rt > .git/info/sparse-checkout
  git read-tree -mu HEAD

The data for all sub-projects is still in your `.git` directory, but in your
checkout, you only see `compiler-rt`.
Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
usual.

Note that when you fetch you'll likely pull in changes to sub-projects you don't
care about. If you are using sparse checkout, the files from other projects
won't appear on your disk. The only effect is that your commit hash changes.

You can check whether the changes in the last fetch are relevant to your commit
by running::

  git log origin/master@{1}..origin/master -- libcxx

This command can be hidden in a script so that `git llvmpush` would perform all
these steps, fail only if such a dependent change exists, and show immediately
the change that prevented the push. An immediate repeat of the command would
(almost) certainly result in a successful push.
Note that today with SVN or git-svn, this step is not possible since the
"rebase" implicitly happens while committing (unless a conflict occurs).

Checkout/Clone Multiple Projects, with Commit Access
----------------------------------------------------

Let's look at how to assemble llvm+clang+libcxx at a given revision.

Currently
^^^^^^^^^

::

  svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
  cd llvm/tools
  svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
  cd ../projects
  svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION

Or using git-svn::

  git clone http://llvm.org/git/llvm.git
  cd llvm/
  git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`
  cd tools
  git clone http://llvm.org/git/clang.git
  cd clang/
  git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`
  cd ../../projects/
  git clone http://llvm.org/git/libcxx.git
  cd libcxx
  git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l
  git checkout `git svn find-rev -B r258109`

Note that the list would be longer with more sub-projects.

.. _workflow-multicheckout-multicommit:

Multirepo Variant
^^^^^^^^^^^^^^^^^

With the multirepo variant, the umbrella repository will be used. This is
where the mapping from a single revision number to the individual repositories'
revisions is stored::

  git clone https://github.com/llvm-beanz/llvm-submodules
  cd llvm-submodules
  git checkout $REVISION
  git submodule init
  git submodule update clang llvm libcxx
  # the list of sub-projects is optional, `git submodule update` would get them all.

At this point the clang, llvm, and libcxx individual repositories are cloned
and stored alongside each other.
There are CMake flags to describe the directory
structure; alternatively, you can just symlink `clang` to `llvm/tools/clang`,
etc.

Another option is to checkout repositories based on the commit timestamp::

  git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`

.. _workflow-monocheckout-multicommit:

Monorepo Variant
^^^^^^^^^^^^^^^^

The repository natively contains the source for every sub-project at the right
revision, which makes this straightforward::

  git clone https://github.com/llvm/llvm-projects.git llvm-projects
  cd llvm-projects
  git checkout $REVISION

As before, at this point clang, llvm, and libcxx are stored in directories
alongside each other.

.. _workflow-cross-repo-commit:

Commit an API Change in LLVM and Update the Sub-projects
--------------------------------------------------------

Today this is possible, even though not common (at least not documented) for
subversion users and for git-svn users. For example, few Git users try to update
LLD or Clang in the same commit as they change an LLVM API.

The multirepo variant does not address this: one would have to commit and push
separately in every individual repository. It would be possible to establish a
protocol whereby users add a special token to their commit messages that causes
the umbrella repo's updater bot to group all of them into a single revision.

The monorepo variant handles this natively.

Branching/Stashing/Updating for Local Development or Experiments
----------------------------------------------------------------

Currently
^^^^^^^^^

SVN does not allow this use case, but developers that are currently using
git-svn can do it. Let's look at what this means in practice when dealing with
multiple sub-projects.

To update the repository to tip of trunk::

  git pull
  cd tools/clang
  git pull
  cd ../../projects/libcxx
  git pull

To create a new branch::

  git checkout -b MyBranch
  cd tools/clang
  git checkout -b MyBranch
  cd ../../projects/libcxx
  git checkout -b MyBranch

To switch branches::

  git checkout AnotherBranch
  cd tools/clang
  git checkout AnotherBranch
  cd ../../projects/libcxx
  git checkout AnotherBranch

.. _workflow-multi-branching:

Multirepo Variant
^^^^^^^^^^^^^^^^^

The multirepo works the same as the current Git workflow: every command needs
to be applied to each of the individual repositories.
However, the umbrella repository makes this easy using `git submodule foreach`
to replicate a command on all the individual repositories (or submodules
in this case):

To create a new branch::

  git submodule foreach git checkout -b MyBranch

To switch branches::

  git submodule foreach git checkout AnotherBranch

.. _workflow-mono-branching:

Monorepo Variant
^^^^^^^^^^^^^^^^

Regular Git commands are sufficient, because everything is in a single
repository:

To update the repository to tip of trunk::

  git pull

To create a new branch::

  git checkout -b MyBranch

To switch branches::

  git checkout AnotherBranch

Bisecting
---------

Assume a developer is looking for a bug in clang (or lld, or lldb, ...).

Currently
^^^^^^^^^

SVN does not have builtin bisection support, but the single revision across
sub-projects makes it possible to script around.

Using the existing Git read-only view of the repositories, it is possible to use
the native Git bisection script over the llvm repository, and use some scripting
to synchronize the clang repository to match the llvm revision.

.. _workflow-multi-bisecting:

Multirepo Variant
^^^^^^^^^^^^^^^^^

With the multi-repositories variant, the cross-repository synchronization is
achieved using the umbrella repository. This repository contains only
submodules for the other sub-projects. The native Git bisection can be used on
the umbrella repository directly. A subtlety is that the bisect script itself
needs to make sure the submodules are updated accordingly.

For example, to find which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, one should be able to simply do::

  git bisect start release_39 release_38
  git bisect run ./bisect_script.sh

With the `bisect_script.sh` script being::

  #!/bin/sh
  cd $UMBRELLA_DIRECTORY
  git submodule update llvm clang libcxx #....
  cd $BUILD_DIR

  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
                            # to "skip" the current commit

  ./bin/clang some_crash_test.cpp

When the `git bisect run` command returns, the umbrella repository is set to
the state where the regression is introduced. The commit diff in the umbrella
indicates which submodule was updated, and the last commit in this sub-project
is the one that the bisect found.

.. _workflow-mono-bisecting:

Monorepo Variant
^^^^^^^^^^^^^^^^

Bisecting on the monorepo is straightforward, and very similar to the above,
except that the bisection script does not need to include the
`git submodule update` step.

The same example, finding which commit introduces a regression where clang-3.9
crashes but clang-3.8 passes, will look like::

  git bisect start release_39 release_38
  git bisect run ./bisect_script.sh

With the `bisect_script.sh` script being::

  #!/bin/sh
  cd "$BUILD_DIR"

  ninja clang || exit 125   # an exit code of 125 asks "git bisect"
                            # to "skip" the current commit

  ./bin/clang some_crash_test.cpp || exit 1   # exit codes above 127 (e.g. a
                                              # crash signal) abort the bisect

Also, since the monorepo handles commits that update multiple projects at once,
you're less likely to encounter a build failure where one commit changes an API
in LLVM and a later one "fixes" the build in clang.


References
==========

.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
.. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
.. [TorvaldRevNum] Linus Torvalds, http://git.661346.n2.nabble.com/Git-commit-generation-numbers-td6584414.html
.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
.. [submodules] Git submodules, https://git-scm.com/book/en/v2/Git-Tools-Submodules
.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/
.. [LebarCHERI] Port *CHERI* to a single repository rewriting history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html
.. [AminiCHERI] Port *CHERI* to a single repository preserving history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102804.html