Text-to-Columns in Google Docs Spreadsheets (with Python)

Despite being an Excel power user, I’m currently without Excel on my computer for the time being so for some data analysis I had to do I had to make do with Google Docs’ spreadsheets. While decent, and impressive for a web application, unfortunately it’s not great for dealing with large amounts of data. Additionally, it lacks some glaring functionality, particularly a “Text to Columns” utility, which is useful for converting in-line a csv-like delimited string of data into individual cells. While there are a couple of third-party-written add-ins for this, they don’t work very well.

My Fix

The best way I found to deal with this involved a couple of manual steps, but saved me a lot of time overall. The process was effectively the following:

  1. Copy cell with data from spreadsheet
  2. Paste into Python
  3. Process with Python, returning string ready to be copy/pasted back into Google Docs
  4. Paste into Google docs in appropriate row

The biggest challenge here was preparing a string to return that Google Docs would automatically paste into the corresponding columns as appropriate rather than a single cell. I figured out the right formatting by analyzing exactly what the contents of my clipboard were when I copied multiple columns from the spreadsheet. It turned out that the big secret was that the data was simply tab-delimited, so all I had to do was generate tab-delimited strings to paste back into Google Docs.

I used Python specifically in order to take advantage of the excellent csv built-in module, which is great at parsing all kinds of oddly formatted and delimited files.

Code you can use would be somewhat as follows:

import csv, StringIO

def text_to_columns(text):
    '''
    A function that takes in a single row of delimited data from a spreadsheet, and returns a tab-delimited string response meant to be copied and pasted back into the spreadsheet to be automatically resolved into the appropriate columns.  Note that you will likely have to `print` the return string in order to copy the tabs correctly.
    '''

    # We need to use StringIO here because csv.reader only reads buffers
    reader = csv.reader(StringIO.StringIO(text))
    print '\t'.join(reader.next())

I copied the appropriate cell from the spreadsheet, pasted into the console like so:

>>> t = '''<paste>'''<enter>

Or for multiple lines, for convenience, like so:

>>> t = ['''<paste>''',
         '''<paste>''',
         ...]

And then I ran them all through the function:

>>> for item in t:
        text_to_columns(item)

And then I copied the printed result.

Note that I used print rather than simply returning the result because I needed output that correctly rendered the tabs rather than rendering \t, which is Python’s in-language way of displaying tabs. Google Docs would only correctly place the data into separate columns if I had copied actual tabs.

Extending this solution

The true best way of extending this would be to find something as good as Python’s csv.reader in Javascript and building a Google Apps Script with it.

Otherwise, there is likely a way to have Python pull the contents of the clipboard and then place the results of the function in the clipboard. If you know of how to do this, I’d be interested to hear how.

Update

I ran across this excellent blog post that describes some useful strategies and libraries for trying to access the clipboard from Python. I haven’t tried it yet, but it’s worth looking into. Note that if you use pyobjc it takes a very long time to compile if you install using pip.