Thursday, September 19, 2013

How to Get an OAuth 2.0 Token's Scopes

Occasionally, you may have an OAuth 2.0 refresh token for a Google account and need to know which scopes it's valid for.

What you'll need:
  • A refresh token or a session token (what the OAuth 2.0 spec calls an access token).
  • The client ID.
  • The client secret.

First, if we have a refresh token instead of a session token, we'll need to convert that refresh token into a valid session token (set CLIENT_ID, CLIENT_SECRET, and REFRESH_TOKEN environment variables appropriately):

$ curl -s -S -X POST -H "Host: accounts.google.com" -H "Content-Type: application/x-www-form-urlencoded" -d "client_id=${CLIENT_ID}&client_secret=${CLIENT_SECRET}&refresh_token=${REFRESH_TOKEN}&grant_type=refresh_token" "https://accounts.google.com/o/oauth2/token" | jq -r .access_token

Bam, session token.

Second, we need to get a list of scopes for which the session token is valid (set the access_token variable to the bare token value from the previous step):

$ curl -s -S "https://www.googleapis.com/oauth2/v1/tokeninfo?access_token=${access_token}" | jq -r .scope

Bam, list of scopes!

Here's the whole thing as a simple bash script:

#!/bin/bash

# Print the OAuth 2.0 scopes that a Google refresh token is valid for.
if [ $# -ne 3 ]; then
  echo
  echo "Requires 3 arguments: client_id client_secret refresh_token"
  echo
  exit 65
fi

echo "$1, $2, $3"

CLIENT_ID=$1
CLIENT_SECRET=$2
REFRESH_TOKEN=$3

#set -x

# Exchange the refresh token for a session token. jq -r prints the raw string;
# without it the JSON quotes would end up inside the tokeninfo URL below.
access_token=$(curl -s -S -X POST -H "Host: accounts.google.com" -H "Content-Type: application/x-www-form-urlencoded" -d "client_id=${CLIENT_ID}&client_secret=${CLIENT_SECRET}&refresh_token=${REFRESH_TOKEN}&grant_type=refresh_token" "https://accounts.google.com/o/oauth2/token" | jq -r .access_token)

# Ask tokeninfo which scopes the session token is valid for.
curl -s -S "https://www.googleapis.com/oauth2/v1/tokeninfo?access_token=${access_token}" | jq -r .scope

And here's how we'd invoke it:
$ get_scopes 23482398423892389.googleapps.totallyfake 328238-234asdfasdssdf 1/2323xz8s8se7327
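
For anyone who would rather not shell out to curl and jq, here is a rough Python sketch of the same two steps. It uses the two URLs from the commands above; the function names are made up for illustration, and it assumes the tokeninfo response carries its scopes as a single space-separated string in the "scope" field.

```python
import json
import urllib.parse
import urllib.request

# Same endpoints as the curl commands in this post.
TOKEN_URL = "https://accounts.google.com/o/oauth2/token"
TOKENINFO_URL = "https://www.googleapis.com/oauth2/v1/tokeninfo"


def refresh_request_body(client_id, client_secret, refresh_token):
    """Form-encode the same fields that curl sends with -d."""
    return urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")


def parse_scopes(tokeninfo_json):
    """Split the space-separated "scope" field into a list (assumed format)."""
    return json.loads(tokeninfo_json).get("scope", "").split()


def get_scopes(client_id, client_secret, refresh_token):
    """Exchange the refresh token, then ask tokeninfo for its scopes."""
    body = refresh_request_body(client_id, client_secret, refresh_token)
    # Passing data= makes urlopen send a form-encoded POST.
    with urllib.request.urlopen(TOKEN_URL, data=body) as resp:
        access_token = json.load(resp)["access_token"]
    query = urllib.parse.urlencode({"access_token": access_token})
    with urllib.request.urlopen(TOKENINFO_URL + "?" + query) as resp:
        return parse_scopes(resp.read().decode("utf-8"))
```

Calling get_scopes(client_id, client_secret, refresh_token) with real credentials should return the scopes as a Python list instead of one long string.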

Tuesday, September 17, 2013

Calculating an MD5 Hash for Google Cloud Storage

When you upload an object to cloud storage, it's a good idea to make sure that the object that you have in the cloud is the same object that you have locally. The simplest way to test this would be to download the object and compare it to yours directly, but that's annoying for big objects. Fortunately, Google Cloud Storage automatically calculates an MD5 hash of every object that you upload.

MD5 hashes are small numbers that are calculated by examining all of the bytes of a file. If you make any changes to the file, the MD5 hash changes. For purposes of error checking, the odds of accidental corruption leaving the MD5 hash unchanged are vanishingly small. (MD5 won't protect you from someone deliberately crafting a colliding file, but that's not what we're checking for here.) So if you have an object with an MD5 hash of def51393b98548cf7f9471d2820a0347, and the object in the cloud has the same hash, you can be pretty darn sure that they are the same object.

An MD5 hash is a 128-bit number. There are several ways to represent it. The popular command md5sum spits out MD5s as a string of 32 characters (0-9 and a-f) representing a hexadecimal number.

$ md5sum importantFile.txt
def51393b98548cf7f9471d2820a0347  importantFile.txt

Another way to represent an MD5 hash takes the MD5 hash (as a binary number) and encodes that binary value into a base64 representation. This is the way that both HTTP's "Content-MD5" header and Google Cloud Storage usually think about MD5s.

Here is how we generate this style of MD5:

$ openssl dgst -md5 -binary importantFile.txt | openssl enc -base64
3vUTk7mFSM9/lHHSggoDRw==

Just to restate, def51393b98548cf7f9471d2820a0347 and 3vUTk7mFSM9/lHHSggoDRw== are two different ways to encode the same MD5 value.
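
To make that concrete, here is a small sketch that converts between the two encodings using the values from this post. The helper names are made up for illustration; the only real work is decoding one text form back to the 16 raw bytes and re-encoding them the other way.

```python
import base64

hex_md5 = "def51393b98548cf7f9471d2820a0347"
base64_md5 = "3vUTk7mFSM9/lHHSggoDRw=="


def hex_to_base64(hex_digest):
    """Hex string -> 16 raw bytes -> base64 string."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


def base64_to_hex(b64_digest):
    """Base64 string -> 16 raw bytes -> hex string."""
    return base64.b64decode(b64_digest).hex()


print(hex_to_base64(hex_md5))     # 3vUTk7mFSM9/lHHSggoDRw==
print(base64_to_hex(base64_md5))  # def51393b98548cf7f9471d2820a0347
```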

Here is how we would generate the first style of MD5 in Python:

import hashlib
# file_name is the path to the local file
with open(file_name, 'rb') as f:
    hex_hash = hashlib.md5(f.read()).hexdigest()
# 'def51393b98548cf7f9471d2820a0347'

And here's how we would generate the second style (the one used for Content-MD5 or Google Cloud Storage hashes):

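import hashlib
import base64
with open(file_name, 'rb') as f:
    binary_hash = hashlib.md5(f.read()).digest()
# b64encode returns bytes on Python 3, so decode it to get a plain string
base64_hash = base64.b64encode(binary_hash).decode('ascii')
# "3vUTk7mFSM9/lHHSggoDRw=="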
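
Both snippets read the whole file into memory, which gets uncomfortable for the big objects that made downloading and comparing annoying in the first place. Here is one possible sketch that feeds the file to MD5 in chunks and returns both encodings. The function name and the 1 MiB default chunk size are arbitrary choices, not anything Google Cloud Storage requires.

```python
import base64
import hashlib
import io


def md5_encodings(stream, chunk_size=1024 * 1024):
    """Return (hex, base64) MD5 strings for a binary file object."""
    md5 = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
    digest = md5.digest()
    return digest.hex(), base64.b64encode(digest).decode("ascii")


# In-memory stand-in for open(file_name, "rb"), so the sketch runs anywhere.
hex_hash, base64_hash = md5_encodings(io.BytesIO(b"example bytes"), chunk_size=4)
```

With a real file, the call would look like md5_encodings(f) inside a with open(file_name, 'rb') as f: block.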

Note that one common mistake people make is to take the hexadecimal string value, "def51393b98548cf7f9471d2820a0347", and base64 it. This produces something that looks like the right kind of base64'd data, but it's wrong.
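
Here is a quick sketch of that mistake next to the right approach, using the hash from this post. The is_md5_base64 helper is a made-up sanity check: a correctly encoded MD5 always decodes to exactly 16 bytes, so a value that decodes to 32 bytes is a strong hint that the hex string got encoded instead.

```python
import base64

hex_md5 = "def51393b98548cf7f9471d2820a0347"

# Wrong: base64 of the 32 ASCII characters that spell out the hex string.
wrong = base64.b64encode(hex_md5.encode("ascii")).decode("ascii")

# Right: base64 of the 16 raw bytes that the hex string represents.
right = base64.b64encode(bytes.fromhex(hex_md5)).decode("ascii")


def is_md5_base64(value):
    """True if value decodes to exactly 16 bytes, the size of an MD5."""
    return len(base64.b64decode(value)) == 16


print(len(wrong), is_md5_base64(wrong))  # 44 False
print(len(right), is_md5_base64(right))  # 24 True
```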

So now we have an MD5 hash. Great! We can now upload the object to cloud storage and then verify that its hash is correct. Still, it's kind of a pain that we have to first upload it and then check its metadata to make sure that it's right.

Well, actually, we don't! We can tell Google Cloud Storage exactly what hash the object we're uploading is going to have. Using the XML interface, we can shove the value into the Content-MD5 header, and using the JSON interface, we can set the md5Hash property of the metadata. If the data that shows up on the remote server doesn't have exactly that MD5 value, the upload request will fail, and we can just try to upload it again.
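
As a rough sketch of what that looks like, here is how we might build the Content-MD5 header and the md5Hash metadata property for a blob of bytes. It stops short of sending anything: the function names and object_name are hypothetical, and the actual upload request (URL, authorization, and so on) is left out on purpose.

```python
import base64
import hashlib


def md5_base64(data):
    """Base64 of the raw 16-byte MD5 digest of data."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def xml_upload_headers(data):
    """Headers to send along with an XML-interface upload of data."""
    return {
        "Content-MD5": md5_base64(data),
        "Content-Length": str(len(data)),
    }


def json_upload_metadata(object_name, data):
    """Object metadata to send along with a JSON-interface upload of data."""
    return {"name": object_name, "md5Hash": md5_base64(data)}
```

If the bytes get mangled in transit, the server's own MD5 won't match the value we declared, and that mismatch is what makes the upload fail.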