Parsing Foursquare KML files

We’re using Foursquare as a data logger for one of our assignments in the Telling Stories with Sensors, Data and Humans class at ITP. As an aid to begin understanding the relationships between venues for our tracks, it’s helpful to munge the KML into CSV so it can be plotted and played with in a spreadsheet, Illustrator, R, Processing or whatever…

Below is a short Python script to parse a Foursquare KML file into a simple CSV file. It outputs the check-in name, description, timestamp and location (as lat, lon). The Foursquare KML feed is available at the Feeds page on their site. This script relies on BeautifulSoup (version 3, which provides BeautifulStoneSoup) and, of course, Python 2. Give the script the filename of the KML file you want to parse and it will write out.csv in the current directory, e.g.

$ python parse_kml.py MyFoursquareFeed.kml

Here’s the file: parse_kml

# parse a Foursquare KML file and export the check-ins to CSV.
import sys
import os
import codecs
import csv
from BeautifulSoup import BeautifulStoneSoup

# get the KML filename from the command line:
if len(sys.argv) > 1:
    dir = sys.argv[1]
else:
    dir = os.getcwd()
    
file = dir

# create the output list (one dict per check-in)
outputData = []

# sanity checking, only work on kml files
if not file.endswith('.kml'):
    sys.exit("Expected a .kml file, got: " + file)

print "Reading file: "+file

fh = codecs.open(file,'r',"utf-8")
kml = fh.read()
fh.close()

soup = BeautifulStoneSoup(kml)
#print soup.prettify()

# find every placemark (one per check-in):
dataTable = soup.findAll('placemark')
for i in dataTable:
    row = i.contents

    # collect this check-in's fields in a new dict
    imageData = {}
    imageData['Name'] = row[0].contents[0].string.encode("ascii","ignore")
    imageData['Description'] = row[1].contents[0].string.encode("ascii","ignore")
    imageData['Time'] = row[3].contents[0].string.encode("ascii","ignore")
    # KML coordinates are ordered lon,lat
    coord = row[5].coordinates.contents[0].string.encode("ascii","ignore")
    imageData['Lon'] = coord.split(',')[0]
    imageData['Lat'] = coord.split(',')[1]

    # add this check-in's data to the list
    outputData.append(imageData)

#print outputData

# create the output file
out = codecs.open(os.path.join(os.getcwd(), "out.csv"), 'w',"utf-8")

print "Writing output file: "+ out.name
try:
    fieldnames = sorted(outputData[0].keys())
    fieldnames.reverse()
    writer = csv.DictWriter(out,dialect='excel', fieldnames=fieldnames, extrasaction='ignore', quoting=csv.QUOTE_NONNUMERIC)
    headers = dict( (n,n) for n in fieldnames )
    writer.writerow(headers)

    for row in outputData:
        writer.writerow(row)

finally:
    out.close()
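
The script above targets Python 2 and BeautifulSoup 3. For anyone on Python 3, here is a rough sketch of the same conversion using only the standard library. It is not the original script: it looks fields up by tag name rather than by position, and the function names (local_name, find_text, rows_from_root, write_csv) are made up for this example. The time_tag default of 'published' is an assumption about which element holds the check-in timestamp, so check it against your own feed.

```python
import csv
import sys
import xml.etree.ElementTree as ET

# Same column order the original script ends up with.
FIELDS = ['Time', 'Name', 'Lon', 'Lat', 'Description']


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit('}', 1)[-1]


def find_text(element, name):
    """Return the text of the first descendant called `name`, or ''."""
    for child in element.iter():
        if local_name(child.tag) == name:
            return (child.text or '').strip()
    return ''


def rows_from_root(root, time_tag='published'):
    """Build one dict per <Placemark>. `time_tag` is an assumed tag name."""
    rows = []
    for element in root.iter():
        if local_name(element.tag) != 'Placemark':
            continue
        # KML coordinates are ordered lon,lat[,alt]
        parts = find_text(element, 'coordinates').split(',')
        rows.append({
            'Time': find_text(element, time_tag),
            'Name': find_text(element, 'name'),
            'Lon': parts[0].strip(),
            'Lat': parts[1].strip() if len(parts) > 1 else '',
            'Description': find_text(element, 'description'),
        })
    return rows


def write_csv(rows, handle):
    """Write the rows, with a header line, to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS,
                            quoting=csv.QUOTE_NONNUMERIC)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == '__main__' and len(sys.argv) > 1:
    check_ins = rows_from_root(ET.parse(sys.argv[1]).getroot())
    # newline='' lets the csv module control line endings itself
    with open('out.csv', 'w', newline='', encoding='utf-8') as out_file:
        write_csv(check_ins, out_file)
```

Saved under a name of your choosing, it takes the KML filename as its only argument, just like the script above, and writes out.csv in the current directory. Unlike the original, it keeps non-ASCII characters in venue names instead of dropping them.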

