SkiaLab
=======

Overview
--------

Skia's buildbots are hosted in three places:

* Google Compute Engine. This is the preferred location for bots which don't
  need to run on physical hardware, i.e., anything that doesn't require a GPU,
  stable performance numbers, or a specific hardware configuration. Most of our
  compile bots live here, along with some non-GPU test bots on Linux and
  Windows.
* Chrome Golo. This is the preferred location for bots which require specific
  hardware or OS configurations that are not supported by GCE. We have several
  Mac, Linux, and Windows bots in the Golo.
* The local SkiaLab in Chapel Hill. Anything we can't get in GCE or the Golo
  lives here. This includes newer or uncommon GPUs and all Android, ChromeOS,
  and iOS devices.

This page covers the local SkiaLab in Chapel Hill.


Layout
------

The SkiaLab consists of three wireframe racks which hold machines connected to
two KVM switches. Each KVM switch has a monitor, mouse, and keyboard and is the
primary mode of access to the lab machines. In general, the machines are on the
same rack as the KVM switch used to access them. The switch nearest the door
(labeled "DOOR") is connected to machines on its own rack as well as a smaller
rack closer to the door.

Each machine is labeled with its hostname and the number or letter used to
access it on the KVM switch. Android devices are located on the rack nearest
the interior of the office (the KVM switch is labeled "OFFICE"). They are
labeled with their serial number and the name of the buildslave they are
associated with. Each device connects to a host machine, either directly or
by way of a powered USB hub.

**Disclaimer: Please ONLY make changes on a lab machine as a last resort, as it
is disruptive to the running bots and can leave the machines in a dirty state.
If you must make changes, such as cloning a copy of Skia to run tests and debug
failures, be sure to clean up after yourself. If a permanent change needs to be
made on the machine (such as a driver update), please contact an infra team
member.**


Common Tasks
------------

### Locating the host machine for a failing bot

Some failures can only be reproduced on a particular hardware configuration.
In these cases, you may need to log into the host machine where the failing
bot is running in order to debug the failure.

From the [Status](https://status.skia.org/) page:

1. Click on the box associated with a failed build.
2. A popup will appear with some information about the build, including the
   builder and buildslave. Click the "Lookup" link next to "Host machine". This
   will bring you to the [SkiaLab Hosts](https://status.skia.org/hosts) page,
   which contains information about the machines in the lab, pre-filtered to
   select the machine which runs the buildslave in question.
3. The information box will display the hostname of the machine as well as the
   KVM switch and number used to access the machine, if the machine is in the
   SkiaLab.
4. Walk over to the lab. While standing at the KVM switch indicated by the host
   information page, double tap \<ctrl\> and then press the number or letter from
   the information page. It may be necessary to move or click the mouse to wake
   the machine up.
5. Log in to the machine if necessary. The password is stored in
   [Valentine](https://valentine/) as "Chapel Hill buildbot slave password".

### Rebooting a problematic Android device

Follow the same process as above, with some slight changes:

1. On the [Status](https://status.skia.org/) page, click the box for the failed
   build.
2. Click the "Lookup" link for the host machine. Remember the name of the
   buildslave which ran the build.
3. The hosts page will display the information used to access the host machine
   for the device as well as the serial number for the device next to the name
   of its buildslave.
4. Walk over to the lab and find the Android device with the serial number from
   the hosts page. Hold the power and volume-up buttons until the device
   reboots.
5. Access the host machine for the device, per the above instructions. Use the
   `which_devices.py` script to verify that the device has re-attached. From
   the home directory:

        $ python buildbot/scripts/which_devices.py

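A rebooted device can take a while to re-enumerate over USB, so one check right after the reboot may miss it. The sketch below is not `which_devices.py` (its internals are not shown on this page); it is a minimal illustration of polling the standard `adb devices` command until a given serial number reports the ready `device` state. It assumes `adb` is on the host's PATH, and the function names, the retry count, and the interval are all made up for the example.

```python
import subprocess
import time


def attached_serials(adb_output):
    """Return serials that `adb devices` reports in the ready 'device' state."""
    serials = []
    for line in adb_output.splitlines():
        fields = line.split()
        # Device rows look like "<serial>\tdevice". The header line and the
        # "* daemon ..." status lines never have "device" as their second field.
        if len(fields) >= 2 and fields[1] == "device":
            serials.append(fields[0])
    return serials


def run_adb_devices():
    """Run `adb devices`. Assumption: adb is installed and on the PATH."""
    return subprocess.check_output(["adb", "devices"]).decode("utf-8", "replace")


def wait_for_device(serial, get_output=run_adb_devices, attempts=24,
                    interval_s=5, sleep=time.sleep):
    """Poll until `serial` re-attaches. Returns True on success, else False."""
    for _ in range(attempts):
        if serial in attached_serials(get_output()):
            return True
        sleep(interval_s)
    return False
```

Calling `wait_for_device("<serial from the hosts page>")` with the defaults would poll for roughly two minutes before giving up. The `get_output` and `sleep` parameters exist only so the loop can be exercised without a real device.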

Maintenance Tasks
-----------------

### Bringing up a new buildbot host machine

This assumes that we're just adding a host machine for a new buildbot slave,
and doesn't cover how to make changes to the buildbot code to change the
behavior of the builder itself.

1. Obtain the machine itself and place it on the racks in the lab. Connect
   power, ethernet, and KVM cables.
2. If we already have a disk image appropriate for this machine, follow the
   instructions for flashing a disk image to a machine below. Otherwise, follow
   the instructions for bringing up a new machine from scratch.
3. Power on the machine. Be sure to kill any buildbot processes that start up,
   e.g., `killall python` on Linux and Mac. On Windows, just close any cmd
   instances which pop up.
4. Set the hostname for the machine.
5. Ensure that the machine is labeled with its hostname and KVM number.
6. Add the new slave to the slaves.cfg file on the appropriate master, e.g.,
   https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.client.skia/slaves.cfg,
   and upload the change for code review.
7. Add an entry for the new host machine to the slave_hosts_cfg.py file in the
   Skia infra repo: https://skia.googlesource.com/buildbot/+/master/site_config/slave_hosts_cfg.py,
   and upload it for review.
8. Commit the change to add the slave to the master. Once it lands, commit the
   slave_hosts_cfg.py change immediately afterward.
9. Restart the build master. Either ask borenet@ to do this or file a
   [ticket](https://code.google.com/p/chromium/issues/entry?template=Build%20Infrastructure&labels=Infra-Labs,Restrict-View-Google,Infra-Troopers&summary=Restart%20request%20for%20[%20name%20]&comment=Please%20provide%20the%20reason%20for%20restart.%0A%0ASet%20to%20Pri-0%20if%20immediate%20restarted%20is%20required,%20otherwise%20please%20set%20to%20Pri-1%20and%20the%20restart%20will%20happen%20when%20the%20trooper%20gets%20a%20free%20moment.) for a trooper to do it.
10. Reboot the machine and monitor the build master to ensure that it connects.
    This can take some time, since the bot needs to sync Chrome.


### Bringing up a new Android bot

1. Locate or add a host machine. We generally want to keep the number of
   devices attached to each host below 5 or so. If a new host machine is
   required, follow the above instructions for bringing up a new buildbot
   host machine, with the exception that the slave corresponds to the Android
   device, not the host machine itself.
2. Ensure that the buildslave is not yet running:

        $ killall python

3. Disable MTP and PTP on the device. Some devices require one or the other to
   be enabled; in that case, select PTP and choose to 'do nothing' when
   attaching to the host machine.
4. Connect the device to the host machine, either through a powered USB hub or
   directly to the machine.
5. Make sure that the device is in developer mode and that USB debugging is
   enabled.
6. Authorize the device for USB debugging on the host machine by checking the
   "always allow" box in the dialog box which appears on the Android device
   after plugging it into the host.
7. Ensure that the device appears as "connected" when you run the
   `which_devices.py` script:

        $ python buildbot/scripts/which_devices.py

8. Reboot the machine to start the buildslave.

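When a newly attached device does not show up as connected, the state that plain `adb devices` prints for it usually points at which of the steps above was missed. The sketch below is only an illustration and is unrelated to how `which_devices.py` works: it parses the tab-separated text printed by `adb devices` and turns each non-ready state into a pointer back to a step in the list above. The function names, the hint wording, and the serial numbers in the usage note are hypothetical.

```python
# Hints map adb's reported state to the bring-up step most likely skipped.
# "unauthorized" and "offline" are states adb itself reports; "missing" is a
# label invented here for serials that adb does not list at all.
HINTS = {
    "unauthorized": "accept the USB debugging prompt on the device (step 6)",
    "missing": "check the cable, hub power, and USB debugging (steps 4-5)",
}


def device_states(adb_output):
    """Map serial -> state from the text printed by plain `adb devices`."""
    states = {}
    for line in adb_output.splitlines():
        # Device rows are "<serial>\t<state>"; the header line and daemon
        # status lines contain no tab and are skipped.
        if "\t" not in line:
            continue
        serial, state = line.split("\t", 1)
        states[serial.strip()] = state.strip()
    return states


def check_devices(expected_serials, adb_output):
    """Return one message per expected device that is not ready."""
    states = device_states(adb_output)
    problems = []
    for serial in expected_serials:
        state = states.get(serial, "missing")
        if state != "device":
            hint = HINTS.get(state, "no specific hint for this state")
            problems.append("%s: %s (%s)" % (serial, state, hint))
    return problems
```

For example, passing the serials printed on the device labels together with the captured output of `adb devices` returns an empty list when every device is ready, and one line per device otherwise.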

### Bringing up a new machine from scratch

TODO(borenet): Migrate from Google Docs.

OS-specific instructions are available in a
[Google Doc](https://docs.google.com/document/d/1X7Hvsj33AlBmj-KEWfFbmdCArUJJAICLkB7ipDcxRV8/edit).


### Flashing a disk image to a machine

1. Find the USB key labeled "Clonezilla" in the SkiaLab and insert it into the
   machine.
2. Turn on the machine and load the boot menu. For Shuttle machines, press
   \<del\> or \<esc\>. Mac machines require that you plug in the Mac keyboard and
   press the \<option\> key at boot. Boot from the USB key. It's typically UEFI
   and named something like "FlashBlu" or "Kanguru".
3. At the Clonezilla menu, choose the "to RAM" option.
4. Choose your preferred language.
5. "Don't touch keymap".
6. "Start Clonezilla".
7. "device-image".
8. "local_dev".
9. Unplug the flash drive and plug in the external hard drive labeled "Disk
   images". Wait for the "Attached Enclosure device" message to appear, then
   hit \<enter\>.
10. Select the external drive to use for /home/partimag, something like
    "1000GB_ntfs_My_Passport".
11. Select the bot_img directory.
12. Hit \<enter\> to continue.
13. "Beginner".
14. "restoredisk".
15. Select the image to use. Make sure that it's compatible with this machine.
16. Choose the hard drive in the machine. It should be the only option.
17. "y" and "y".
18. Choose "reboot" after flashing the image to the machine.
19. Set the hostname of the machine so that it doesn't conflict with any
    existing machines.

### Capturing a disk image

1. Make sure that the machine is in a clean state: no pre-existing buildslave
   checkouts, extra software, etc.
2. Find the USB key labeled "Clonezilla" in the SkiaLab and insert it into the
   machine.
3. Turn on the machine and load the boot menu. For Shuttle machines, press
   \<del\> or \<esc\>. Mac machines require that you plug in the Mac keyboard and
   press the \<option\> key at boot. Boot from the USB key. It's typically UEFI
   and named something like "FlashBlu" or "Kanguru".
4. At the Clonezilla menu, choose the "to RAM" option.
5. Choose your preferred language.
6. "Don't touch keymap".
7. "Start Clonezilla".
8. "device-image".
9. "local_dev".
10. Unplug the flash drive and plug in the external hard drive labeled "Disk
    images". Wait for the "Attached Enclosure device" message to appear, then
    hit \<enter\>.
11. Select the external drive to use for /home/partimag, something like
    "1000GB_ntfs_My_Passport".
12. Select the bot_img directory.
13. "Beginner".
14. "savedisk".
15. Choose a name for the disk image. The convention is:
    `skiabot-<hardware type>-<OS>-<disk image revision #>`
16. Choose the hard drive in the machine. It should be the only option.
17. "y".
18. Choose "reboot" or "shut down" when finished.
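
The naming convention in step 15 is easy to get slightly wrong when typing at the Clonezilla prompt, so it can help to work out the name beforehand. The following is a minimal sketch of building and parsing names of the form `skiabot-<hardware type>-<OS>-<disk image revision #>`. It assumes that none of the fields contains a hyphen and that revisions are plain integers starting at 1; neither is stated above. The helper names and the example field values are hypothetical.

```python
import re

# Assumption: hardware type and OS contain no hyphens, so the name splits
# unambiguously, and the revision is a non-negative integer.
IMAGE_NAME_RE = re.compile(r"^skiabot-([^-]+)-([^-]+)-(\d+)$")


def image_name(hardware, os_name, revision):
    """Build a disk image name following the convention."""
    name = "skiabot-%s-%s-%d" % (hardware, os_name, revision)
    if not IMAGE_NAME_RE.match(name):
        raise ValueError("fields must be non-empty and hyphen-free: %r" % name)
    return name


def parse_image_name(name):
    """Return (hardware, os_name, revision), or None if `name` does not fit."""
    match = IMAGE_NAME_RE.match(name)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


def next_revision(existing_names, hardware, os_name):
    """Pick the next unused revision among names already in the image directory."""
    revisions = [parsed[2] for parsed in map(parse_image_name, existing_names)
                 if parsed is not None and parsed[:2] == (hardware, os_name)]
    return max(revisions) + 1 if revisions else 1
```

Given a listing of the bot_img directory, `image_name(hw, os_name, next_revision(listing, hw, os_name))` yields a name that does not collide with an existing image for the same hardware and OS.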