# -*- coding: utf-8 -*-
# Copyright 2012 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Additional help about object versioning."""

from __future__ import absolute_import

from gslib.help_provider import HelpProvider

_DETAILED_HELP_TEXT = ("""
<B>OVERVIEW</B>
  Versioning-enabled buckets maintain an archive of objects, providing a way to
  un-delete data that you accidentally deleted, or to retrieve older versions of
  your data. You can turn versioning on or off for a bucket at any time. Turning
  versioning off leaves existing object versions in place, and simply causes the
  bucket to stop accumulating new object versions. In this case, if you upload
  to an existing object the current version is overwritten instead of creating
  a new version.

  Regardless of whether you have enabled versioning on a bucket, every object
  has two associated positive integer fields:

    - the generation, which is updated when the content of an object is
      overwritten.
    - the metageneration, which identifies the metadata generation. It starts
      at 1; is updated every time the metadata (e.g., ACL or Content-Type) for a
      given content generation is updated; and gets reset when the generation
      number changes.

  Of these two integers, only the generation is used when working with versioned
  data.
  Both generation and metageneration can be used with concurrency control
  (discussed in a later section).

  To work with object versioning in gsutil, you can use a flavor of storage URIs
  that embed the object generation, which we refer to as version-specific
  URIs. For example, the version-less object URI:

    gs://bucket/object

  might have two versions, with these version-specific URIs:

    gs://bucket/object#1360383693690000
    gs://bucket/object#1360383802725000

  The following sections discuss how to work with versioning and concurrency
  control.


<B>OBJECT VERSIONING</B>
  You can view, enable, and disable object versioning on a bucket using
  the 'versioning get' and 'versioning set' commands. For example:

    gsutil versioning set on gs://bucket

  will enable versioning for the named bucket. See 'gsutil help versioning'
  for additional details.

  To see all object versions in a versioning-enabled bucket along with
  their generation information, use gsutil ls -a:

    gsutil ls -a gs://bucket

  You can also specify particular objects for which you want to find the
  version-specific URI(s), or you can use wildcards:

    gsutil ls -a gs://bucket/object1 gs://bucket/images/*.jpg

  The generation values form a monotonically increasing sequence as you create
  additional object versions. Because of this, the latest object version is
  always the last one listed in the gsutil ls output for a particular object.
  For example, if a bucket contains these three versions of gs://bucket/object:

    gs://bucket/object#1360035307075000
    gs://bucket/object#1360101007329000
    gs://bucket/object#1360102216114000

  then gs://bucket/object#1360102216114000 is the latest version and
  gs://bucket/object#1360035307075000 is the oldest available version.
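
  As an illustration of the ordering described above, the following is a
  minimal Python sketch (not part of gsutil) that picks the latest version of a
  single object from a list of version-specific URIs, such as the lines printed
  by gsutil ls -a. The helper names split_version_uri and latest_version are
  hypothetical, and the sketch assumes every URI ends in '#' followed by an
  integer generation:

```python
# Hypothetical helpers for illustration only; they are not part of gsutil.


def split_version_uri(uri):
    # Split 'gs://bucket/object#123' into ('gs://bucket/object', 123).
    # rpartition uses the last '#', so object names containing '#' still work.
    name, sep, generation = uri.rpartition('#')
    if not sep or not generation.isdigit():
        raise ValueError('not a version-specific URI: ' + uri)
    return name, int(generation)


def latest_version(uris):
    # Generations increase over time, so the largest one is the newest version.
    # Assumes all URIs refer to the same object.
    return max(uris, key=lambda uri: split_version_uri(uri)[1])


listing = [
    'gs://bucket/object#1360035307075000',
    'gs://bucket/object#1360101007329000',
    'gs://bucket/object#1360102216114000',
]
print(latest_version(listing))
```

  The last line prints gs://bucket/object#1360102216114000, the same URI
  identified as the latest version in the example listing.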

  If you specify version-less URIs with gsutil, you will operate on the
  latest not-deleted version of an object, for example:

    gsutil cp gs://bucket/object ./dir

  or:

    gsutil rm gs://bucket/object

  To operate on a specific object version, use a version-specific URI.
  For example, suppose the output of the above gsutil ls -a command is:

    gs://bucket/object#1360035307075000
    gs://bucket/object#1360101007329000

  In this case, the command:

    gsutil cp gs://bucket/object#1360035307075000 ./dir

  will retrieve the older of these two versions of the object.

  Note that version-specific URIs cannot be the target of the gsutil cp
  command (trying to do so will result in an error), because writing to a
  versioned object always creates a new version.

  If an object has been deleted, it will not show up in a normal gsutil ls
  listing (i.e., ls without the -a option). You can restore a deleted object by
  running gsutil ls -a to find the available versions, and then copying one of
  the version-specific URIs to the version-less URI, for example:

    gsutil cp gs://bucket/object#1360101007329000 gs://bucket/object

  Note that when you do this it creates a new object version, which will incur
  additional charges. You can get rid of the extra copy by deleting the older
  version-specific object:

    gsutil rm gs://bucket/object#1360101007329000

  Or you can combine the two steps by using the gsutil mv command:

    gsutil mv gs://bucket/object#1360101007329000 gs://bucket/object

  If you want to remove all versions of an object use the gsutil rm -a option:

    gsutil rm -a gs://bucket/object

  Note that there is no limit to the number of older versions of an object you
  will create if you continue to upload to the same object in a
  versioning-enabled bucket. It is your responsibility to delete versions beyond
  the ones you want to retain.
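
  One way to decide which versions to delete is sketched below in Python. This
  is only an illustration under stated assumptions, not a gsutil feature: the
  function name versions_to_delete and its keep parameter are hypothetical, and
  the input is assumed to be the version-specific URIs of a single object, each
  ending in '#' followed by an integer generation:

```python
# Hypothetical helper for illustration only; it is not part of gsutil.


def versions_to_delete(uris, keep=2):
    # Return the version-specific URIs that fall outside the newest `keep`.
    if keep < 0:
        raise ValueError('keep must be non-negative')
    # Sort oldest-first by the integer generation after the final '#'.
    ordered = sorted(uris, key=lambda uri: int(uri.rpartition('#')[2]))
    # Everything before the newest `keep` entries is a deletion candidate.
    return ordered[:max(len(ordered) - keep, 0)]


listing = [
    'gs://bucket/object#1360035307075000',
    'gs://bucket/object#1360101007329000',
    'gs://bucket/object#1360102216114000',
]
for uri in versions_to_delete(listing, keep=2):
    print(uri)
```

  Each URI that is printed could then be passed to gsutil rm, in the same way
  as the version-specific gsutil rm example above.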


<B>COPYING VERSIONED BUCKETS</B>
  You can copy data between two versioned buckets, using a command like:

    gsutil cp -r -A gs://bucket1/* gs://bucket2

  When run using versioned buckets, this command will cause every object version
  to be copied. The copies made in gs://bucket2 will have different generation
  numbers (since a new generation is assigned when the object copy is made),
  but the object sort order will remain consistent. For example, gs://bucket1
  might contain:

    % gsutil ls -la gs://bucket1
    53  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257574000  metageneration=1
    12  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257615000  metageneration=1
    97  2013-02-02T22:30:57Z  gs://bucket1/file#1359844257665000  metageneration=1

  and after the copy, gs://bucket2 might contain:

    % gsutil ls -la gs://bucket2
    53  2013-06-06T02:33:11Z  gs://bucket2/file#1370485991580000  metageneration=1
    12  2013-06-06T02:33:14Z  gs://bucket2/file#1370485994328000  metageneration=1
    97  2013-06-06T02:33:17Z  gs://bucket2/file#1370485997376000  metageneration=1

  Note that the object versions are in the same order (as can be seen by the
  same sequence of sizes in both listings), but the generation numbers (and
  timestamps) are newer in gs://bucket2.


<B>CONCURRENCY CONTROL</B>
  If you are building an application using Google Cloud Storage, you may need to
  be careful about concurrency control. Normally gsutil itself isn't used for
  this purpose, but it's possible to write scripts around gsutil that perform
  concurrency control.

  For example, suppose you want to implement a "rolling update" system using
  gsutil, where a periodic job computes some data and uploads it to the cloud.
  On each run, the job starts with the data that it computed in the last run
  and computes a new value.
  To make this system robust, you need to have multiple
  machines on which the job can run, which raises the possibility that two
  simultaneous runs could attempt to update an object at the same time. This
  leads to the following potential race condition:

    - job 1 computes the new value to be written
    - job 2 computes the new value to be written
    - job 2 writes the new value
    - job 1 writes the new value

  In this case, the value that job 1 read is no longer current by the time
  it goes to write the updated object, and writing at this point would result
  in stale (or, depending on the application, corrupt) data.

  To prevent this, you can find the version-specific name of the object that was
  created, and then use the information contained in that URI to specify an
  x-goog-if-generation-match header on a subsequent gsutil cp command. You can
  do this in two steps. First, use the gsutil cp -v option at upload time to get
  the version-specific name of the object that was created, for example:

    gsutil cp -v file gs://bucket/object

  might output:

    Created: gs://bucket/object#1360432179236000

  Second, extract the generation value from this URI and construct a
  subsequent gsutil command like this:

    gsutil -h x-goog-if-generation-match:1360432179236000 cp newfile \\
        gs://bucket/object

  This command requests Google Cloud Storage to attempt to upload newfile
  but to fail the request if the generation of gs://bucket/object that is live
  at the time of the upload does not match the one specified.

  If the command you use updates object metadata, you will need to find the
  current metageneration for an object. To do this, use the gsutil ls -a and
  -l options.
  For example, the command:

    gsutil ls -l -a gs://bucket/object

  will output something like:

      64  2013-02-12T19:59:13Z  gs://bucket/object#1360699153986000  metageneration=3
    1521  2013-02-13T02:04:08Z  gs://bucket/object#1360721048778000  metageneration=2

  Given this information, you could use the following command to request setting
  the ACL on the older version of the object, such that the command will fail
  unless that is the current version of the data+metadata:

    gsutil -h x-goog-if-generation-match:1360699153986000 -h \\
      x-goog-if-metageneration-match:3 acl set public-read \\
      gs://bucket/object#1360699153986000

  Without adding these headers, the update would simply overwrite the existing
  ACL. Note that in contrast, the "gsutil acl ch" command uses these headers
  automatically, because it performs a read-modify-write cycle in order to edit
  ACLs.

  If you want to experiment with how generations and metagenerations work, try
  the following. First, upload an object; then use gsutil ls -l -a to list all
  versions of the object, along with each version's metageneration; then
  re-upload the object and repeat the gsutil ls -l -a. You should see two object
  versions, each with metageneration=1. Now try setting the ACL, and rerun the
  gsutil ls -l -a. You should see the most recent object generation now has
  metageneration=2.


<B>FOR MORE INFORMATION</B>
  For more details on how to use versioning and preconditions, see
  https://developers.google.com/storage/docs/object-versioning
""")


class CommandOptions(HelpProvider):
  """Additional help about object versioning."""

  # Help specification. See help_provider.py for documentation.
  help_spec = HelpProvider.HelpSpec(
      help_name='versions',
      help_name_aliases=['concurrency', 'concurrency control'],
      help_type='additional_help',
      help_one_line_summary='Object Versioning and Concurrency Control',
      help_text=_DETAILED_HELP_TEXT,
      subcommand_help_text={},
  )