# -*- coding: utf-8 -*-
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Additional help about object versioning."""

from __future__ import absolute_import

from gslib.help_provider import HelpProvider

_DETAILED_HELP_TEXT = ("""
<B>OVERVIEW</B>
  Versioning-enabled buckets maintain an archive of objects, providing a way to
  un-delete data that you accidentally deleted, or to retrieve older versions of
  your data. You can turn versioning on or off for a bucket at any time. Turning
  versioning off leaves existing object versions in place, and simply causes the
  bucket to stop accumulating new object versions. In this case, if you upload
  to an existing object, the current version is overwritten instead of a new
  version being created.

  Regardless of whether you have enabled versioning on a bucket, every object
  has two associated positive integer fields:

  - the generation, which is updated when the content of an object is
    overwritten.
  - the metageneration, which identifies the metadata generation. It starts
    at 1; is updated every time the metadata (e.g., ACL or Content-Type) for a
    given content generation is updated; and gets reset when the generation
    number changes.

  Of these two integers, only the generation is used when working with versioned
  data. Both generation and metageneration can be used with concurrency control
  (discussed in a later section).

  To work with object versioning in gsutil, you can use a flavor of storage URIs
  that embed the object generation, which we refer to as version-specific
  URIs. For example, the version-less object URI:

    gs://bucket/object

  might have two versions, with these version-specific URIs:

    gs://bucket/object#1360383693690000
    gs://bucket/object#1360383802725000

  The following sections discuss how to work with versioning and concurrency
  control.
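
  As an illustration of the version-specific URI format described above, the
  following minimal Python sketch splits a URI into its version-less part and
  its generation. The helper name split_version_uri is hypothetical (it is not
  part of gsutil), and the sketch assumes the generation is the decimal integer
  that follows the last '#' in the URI.

```python
def split_version_uri(uri):
    # Hypothetical helper, not part of gsutil: returns a tuple
    # (versionless_uri, generation), where generation is None when the
    # URI carries no '#<generation>' suffix.
    base, sep, suffix = uri.rpartition('#')
    if sep and suffix.isdigit():
        return base, int(suffix)
    return uri, None


print(split_version_uri('gs://bucket/object#1360383693690000'))
print(split_version_uri('gs://bucket/object'))
```

  A script could use the returned generation when building the precondition
  headers discussed under CONCURRENCY CONTROL.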


<B>OBJECT VERSIONING</B>
  You can view, enable, and disable object versioning on a bucket using
  the 'versioning get' and 'versioning set' commands. For example:

    gsutil versioning set on gs://bucket

  will enable versioning for the named bucket. See 'gsutil help versioning'
  for additional details.

  To see all object versions in a versioning-enabled bucket along with
  their generation numbers, use gsutil ls -a:

    gsutil ls -a gs://bucket

  You can also specify particular objects for which you want to find the
  version-specific URI(s), or you can use wildcards:

    gsutil ls -a gs://bucket/object1 gs://bucket/images/*.jpg

  The generation values form a monotonically increasing sequence as you create
  additional object versions. Because of this, the latest object version is
  always the last one listed in the gsutil ls output for a particular object.
  For example, if a bucket contains these three versions of gs://bucket/object:

    gs://bucket/object#1360035307075000
    gs://bucket/object#1360101007329000
    gs://bucket/object#1360102216114000

  then gs://bucket/object#1360102216114000 is the latest version and
  gs://bucket/object#1360035307075000 is the oldest available version.

  If you specify version-less URIs with gsutil, you will operate on the
  latest not-deleted version of an object, for example:

    gsutil cp gs://bucket/object ./dir

  or:

    gsutil rm gs://bucket/object

  To operate on a specific object version, use a version-specific URI.
  For example, suppose the output of the above gsutil ls -a command is:

    gs://bucket/object#1360035307075000
    gs://bucket/object#1360101007329000

  In this case, the command:

    gsutil cp gs://bucket/object#1360035307075000 ./dir

  will retrieve the second most recent version of the object.

  Note that version-specific URIs cannot be the target of the gsutil cp
  command (trying to do so will result in an error), because writing to a
  versioned object always creates a new version.

  If an object has been deleted, it will not show up in a normal gsutil ls
  listing (i.e., ls without the -a option). You can restore a deleted object by
  running gsutil ls -a to find the available versions, and then copying one of
  the version-specific URIs to the version-less URI, for example:

    gsutil cp gs://bucket/object#1360101007329000 gs://bucket/object

  Note that when you do this it creates a new object version, which will incur
  additional charges. You can get rid of the extra copy by deleting the older
  version-specific object:

    gsutil rm gs://bucket/object#1360101007329000

  Or you can combine the two steps by using the gsutil mv command:

    gsutil mv gs://bucket/object#1360101007329000 gs://bucket/object

  If you want to remove all versions of an object, use the gsutil rm -a option:

    gsutil rm -a gs://bucket/object

  Note that there is no limit to the number of older versions of an object you
  will create if you continue to upload to the same object in a
  versioning-enabled bucket. It is your responsibility to delete versions
  beyond the ones you want to retain.
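
  Because generation values increase monotonically, a script can pick the
  latest version of each object from gsutil ls -a output by comparing
  generations. The following is a minimal Python sketch of that idea; the
  helper name latest_versions is hypothetical (not part of gsutil), and the
  input is assumed to be a list of version-specific URIs, one per line, as in
  the listings above.

```python
def latest_versions(ls_a_lines):
    # Hypothetical helper, not part of gsutil. Maps each version-less URI
    # to its highest generation, given lines such as 'gs://bucket/object#123'.
    latest = {}
    for line in ls_a_lines:
        base, sep, suffix = line.strip().rpartition('#')
        if not sep or not suffix.isdigit():
            continue  # Skip lines that are not version-specific URIs.
        generation = int(suffix)
        # Generations are positive integers, so 0 is a safe starting value.
        if generation > latest.get(base, 0):
            latest[base] = generation
    return latest


listing = [
    'gs://bucket/object#1360035307075000',
    'gs://bucket/object#1360101007329000',
    'gs://bucket/object#1360102216114000',
]
for base, generation in sorted(latest_versions(listing).items()):
    print('%s#%d' % (base, generation))
```

  The same mapping can help decide which older versions to delete once you
  know which generation you want to retain.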


<B>COPYING VERSIONED BUCKETS</B>
  You can copy data between two versioned buckets, using a command like:

    gsutil cp -r -A gs://bucket1/* gs://bucket2

  When run using versioned buckets, this command will cause every object version
  to be copied. The copies made in gs://bucket2 will have different generation
  numbers (since a new generation is assigned when the object copy is made),
  but the object sort order will remain consistent. For example, gs://bucket1
  might contain:

    % gsutil ls -la gs://bucket1
    53  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257574000  metageneration=1
    12  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257615000  metageneration=1
    97  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257665000  metageneration=1

  and after the copy, gs://bucket2 might contain:

    % gsutil ls -la gs://bucket2
    53  2013-06-06T02:33:11Z  gs://bucket2/file#1370485991580000  metageneration=1
    12  2013-06-06T02:33:14Z  gs://bucket2/file#1370485994328000  metageneration=1
    97  2013-06-06T02:33:17Z  gs://bucket2/file#1370485997376000  metageneration=1

  Note that the object versions are in the same order (as can be seen by the
  same sequence of sizes in both listings), but the generation numbers (and
  timestamps) are newer in gs://bucket2.
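
  The ordering check described above (same sequence of sizes, newer
  generations) can be scripted. The following minimal Python sketch parses
  rows shaped like the gsutil ls -la listings above and compares the size
  sequences. The helper names are hypothetical (not part of gsutil), and the
  four-column row layout is an assumption taken from the example listings.

```python
def parse_long_listing(lines):
    # Hypothetical helper, not part of gsutil. Parses rows shaped like
    # '<size>  <timestamp>  <uri>#<generation>  metageneration=<n>'.
    rows = []
    for line in lines:
        fields = line.split()
        if (len(fields) != 4 or not fields[0].isdigit()
                or '#' not in fields[2]
                or not fields[3].startswith('metageneration=')):
            continue  # Skip prompts, totals, and other non-object lines.
        uri, _, generation = fields[2].rpartition('#')
        rows.append({
            'size': int(fields[0]),
            'uri': uri,
            'generation': int(generation),
            'metageneration': int(fields[3].partition('=')[2]),
        })
    return rows


def sizes_in_version_order(rows):
    # Sizes ordered from the oldest to the newest generation.
    return [row['size'] for row in sorted(rows, key=lambda r: r['generation'])]


bucket1 = [
    '% gsutil ls -la gs://bucket1',
    '53  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257574000  metageneration=1',
    '12  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257615000  metageneration=1',
    '97  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257665000  metageneration=1',
]
bucket2 = [
    '% gsutil ls -la gs://bucket2',
    '53  2013-06-06T02:33:11Z  gs://bucket2/file#1370485991580000  metageneration=1',
    '12  2013-06-06T02:33:14Z  gs://bucket2/file#1370485994328000  metageneration=1',
    '97  2013-06-06T02:33:17Z  gs://bucket2/file#1370485997376000  metageneration=1',
]
same_order = (sizes_in_version_order(parse_long_listing(bucket1)) ==
              sizes_in_version_order(parse_long_listing(bucket2)))
print(same_order)
```

  Comparing sizes is only a rough consistency check; it would not detect two
  versions of equal size being swapped.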


<B>CONCURRENCY CONTROL</B>
  If you are building an application using Google Cloud Storage, you may need to
  be careful about concurrency control. Normally gsutil itself isn't used for
  this purpose, but it's possible to write scripts around gsutil that perform
  concurrency control.

  For example, suppose you want to implement a "rolling update" system using
  gsutil, where a periodic job computes some data and uploads it to the cloud.
  On each run, the job starts with the data that it computed from last run, and
  computes a new value. To make this system robust, you need to have multiple
  machines on which the job can run, which raises the possibility that two
  simultaneous runs could attempt to update an object at the same time. This
  leads to the following potential race condition:

  - job 1 computes the new value to be written
  - job 2 computes the new value to be written
  - job 2 writes the new value
  - job 1 writes the new value

  In this case, the value that job 1 read is no longer current by the time
  it goes to write the updated object, and writing at this point would result
  in stale (or, depending on the application, corrupt) data.

  To prevent this, you can find the version-specific name of the object that was
  created, and then use the information contained in that URI to specify an
  x-goog-if-generation-match header on a subsequent gsutil cp command. You can
  do this in two steps. First, use the gsutil cp -v option at upload time to get
  the version-specific name of the object that was created, for example:

    gsutil cp -v file gs://bucket/object

  might output:

    Created: gs://bucket/object#1360432179236000

  You can extract the generation value from this object and then construct a
  subsequent gsutil command like this:

    gsutil -h x-goog-if-generation-match:1360432179236000 cp newfile \\
        gs://bucket/object

  This command requests Google Cloud Storage to attempt to upload newfile
  but to fail the request if the generation of gs://bucket/object that is live
  at the time of the upload does not match that specified.

  If the command you use updates object metadata, you will need to find the
  current metageneration for an object. To do this, use the gsutil ls -a and
  -l options. For example, the command:

    gsutil ls -l -a gs://bucket/object

  will output something like:

      64  2013-02-12T19:59:13Z  gs://bucket/object#1360699153986000  metageneration=3
    1521  2013-02-13T02:04:08Z  gs://bucket/object#1360721048778000  metageneration=2

  Given this information, you could use the following command to request setting
  the ACL on the older version of the object, such that the command will fail
  unless that is the current version of the data+metadata:

    gsutil -h x-goog-if-generation-match:1360699153986000 -h \\
      x-goog-if-metageneration-match:3 acl set public-read \\
      gs://bucket/object#1360699153986000

  Without adding these headers, the update would simply overwrite the existing
  ACL. Note that in contrast, the "gsutil acl ch" command uses these headers
  automatically, because it performs a read-modify-write cycle in order to edit
  ACLs.

  If you want to experiment with how generations and metagenerations work, try
  the following. First, upload an object; then use gsutil ls -l -a to list all
  versions of the object, along with each version's metageneration; then
  re-upload the object and repeat the gsutil ls -l -a. You should see two object
  versions, each with metageneration=1. Now try setting the ACL, and rerun the
  gsutil ls -l -a. You should see the most recent object generation now has
  metageneration=2.
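
  A script that wraps the conditional upload described in this section might
  look like the following minimal Python sketch. The helper names are
  hypothetical (not part of gsutil); the sketch assumes gsutil is on the PATH
  and only reuses the -h x-goog-if-generation-match header shown above. It
  treats any non-zero exit status as a failed update and does not attempt to
  distinguish a failed precondition from other errors.

```python
import subprocess


def conditional_cp_args(src, dst_uri, expected_generation):
    # Hypothetical helper, not part of gsutil. Builds the argument list for
    # an upload that should fail unless dst_uri is at expected_generation.
    header = 'x-goog-if-generation-match:%d' % expected_generation
    return ['gsutil', '-h', header, 'cp', src, dst_uri]


def conditional_upload(src, dst_uri, expected_generation):
    # Runs the command and returns True only if gsutil exited with status 0.
    args = conditional_cp_args(src, dst_uri, expected_generation)
    return subprocess.call(args) == 0


# Show the command line without running it.
print(' '.join(conditional_cp_args(
    'newfile', 'gs://bucket/object', 1360432179236000)))
```

  In the rolling update scenario, a job whose conditional_upload call returns
  False would typically reread the current object, recompute its value, and
  try again with the newer generation.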


<B>FOR MORE INFORMATION</B>
  For more details on how to use versioning and preconditions, see
  https://developers.google.com/storage/docs/object-versioning
""")


class CommandOptions(HelpProvider):
  """Additional help about object versioning."""

  # Help specification. See help_provider.py for documentation.
  help_spec = HelpProvider.HelpSpec(
      help_name='versions',
      help_name_aliases=['concurrency', 'concurrency control'],
      help_type='additional_help',
      help_one_line_summary='Object Versioning and Concurrency Control',
      help_text=_DETAILED_HELP_TEXT,
      subcommand_help_text={},
  )