We’re using Foursquare as a data logger for one of our assignments in the Telling Stories with Sensors, Data and Humans class at ITP. To start understanding the relationships between the venues in our tracks, it helps to munge the KML into CSV so it can be plotted and played with in a spreadsheet, Illustrator, R, Processing or whatever…
Below is a short Python script that parses a Foursquare KML file into a simple CSV file. It outputs each check-in’s name, description, timestamp and location (as lat, lon). The Foursquare KML feed is available from the Feeds page on their site. The script relies on BeautifulSoup and, of course, Python. Give the script the filename of the KML file you want to parse and it will write out.csv to the current directory, e.g.:
$ python parse_kml.py MyFoursquareFeed.kml
Here’s the file: parse_kml
# parse each file from the photo collection and export data into CSV.
# will need: os.listdir(path)

import sys
import os
import codecs
import csv
from BeautifulSoup import BeautifulStoneSoup

# get the file list:
if len(sys.argv) > 1:
    dir = sys.argv[1]
else:
    dir = os.getcwd()

file = dir

# create the output dictionary
outputData = []

# sanity checking, only work on kml files
if file.endswith('.kml') == 0:
    sys.exit(-1)

print "Reading file: "+file

fh = codecs.open(file,'r',"utf-8")
html = fh.read()
fh.close()

soup = BeautifulStoneSoup(html)
#print soup.prettify()

# create a new dictionary for the current image's data
imageData = dict();

# get the image data:
dataTable = soup.findAll('placemark')

for i in dataTable:
    row = i.contents

    # add the current data to the dict
    imageData = {}
    imageData['Name'] = row[0].contents[0].string.encode("ascii","ignore")
    imageData['Description'] = row[1].contents[0].string.encode("ascii","ignore")
    imageData['Time'] = row[3].contents[0].string.encode("ascii","ignore")
    coord = row[5].coordinates.contents[0].string.encode("ascii","ignore")
    imageData['Lon'] = coord.split(',')[0]
    imageData['Lat'] = coord.split(',')[1]

    # add this image's data to the list
    outputData.append(imageData)

#print outputData

# create the output file
out = codecs.open(os.getcwd() + "/out.csv", 'w',"utf-8")
firstRun = 1

print "Writing output file: "+ out.name

try:
    fieldnames = sorted(outputData[0].keys())
    fieldnames.reverse()

    writer = csv.DictWriter(out,dialect='excel', fieldnames=fieldnames, extrasaction='ignore', quoting=csv.QUOTE_NONNUMERIC)

    headers = dict( (n,n) for n in fieldnames )
    writer.writerow(headers)

    for row in outputData:
        writer.writerow(row)
finally:
    out.close()
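The script above is Python 2 (it uses the print statement) and imports the older BeautifulSoup 3 package. If you only have Python 3 around, here is a rough sketch of the same idea using just the standard library. It is not a drop-in replacement for the script above. The sample KML and the `published` tag name for the timestamp are assumptions for illustration, since the real feed may name its elements differently, and `local_name`, `placemark_rows` and `to_csv` are hypothetical helper names. Unlike the original, it looks fields up by tag name rather than by position, and it picks its own column order.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real Foursquare feed may use different tags.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2"><Folder>
<Placemark>
  <name>ITP</name>
  <description>@ITP</description>
  <published>Tue, 01 Mar 11 18:00:00 +0000</published>
  <Point><coordinates>-73.9937,40.7294</coordinates></Point>
</Placemark>
</Folder></kml>"""

FIELDNAMES = ["Name", "Description", "Time", "Lat", "Lon"]


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def placemark_rows(kml_text, time_tag="published"):
    """Yield one dict per Placemark. time_tag is an assumed element name."""
    root = ET.fromstring(kml_text)
    for placemark in root.iter():
        if local_name(placemark.tag) != "Placemark":
            continue
        # Map each descendant's tag name to its text; first occurrence wins.
        fields = {}
        for element in placemark.iter():
            text = (element.text or "").strip()
            fields.setdefault(local_name(element.tag), text)
        # KML stores coordinates as "lon,lat[,alt]".
        lon, lat = fields.get("coordinates", ",").split(",")[:2]
        yield {
            "Name": fields.get("name", ""),
            "Description": fields.get("description", ""),
            "Time": fields.get(time_tag, ""),
            "Lat": lat,
            "Lon": lon,
        }


def to_csv(rows):
    """Render the row dicts as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = list(placemark_rows(SAMPLE_KML))
print(to_csv(rows))
```

To run it against a real feed, read the file's contents into a string, pass that to `placemark_rows`, and write the result of `to_csv` to out.csv. Check the timestamp tag name against your own file first.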