Using twitter4j with Scala to access streaming tweets

Topics: twitter, twitter4j, sbt

Introduction

My previous post provided a walk-through for using the Twitter streaming API from the command line, but tweets can be obtained and processed more flexibly with a library for accessing Twitter from your programming language of choice. In this tutorial, I walk through basic setup and some simple uses of the twitter4j library with Scala. Much of what I show here should be useful for those using other JVM languages like Clojure and Java. If you haven’t gone through the previous tutorial, have a look now before going on, as this tutorial covers much of the same material but uses twitter4j rather than raw HTTP requests.

I’ll introduce code, bit by bit, for accessing the Twitter data in different ways. If you lose track of what should go where, all of the code necessary to run the commands is available in this GitHub gist, so you can compare against it as you move through the tutorial.

Update: The tutorial is set up to take you from nothing to being able to obtain tweets in various ways, but you can also get all the relevant code by looking at the twitter4j-tutorial repository. For this tutorial, the tag is v0.1.0, and you can also download a tarball of that version.

Getting set up

An easy way to use the twitter4j library for a tutorial like this is to set up a new SBT project, declare twitter4j as a dependency, and then compile and run code within SBT. (See my tutorial on using Jerkson for processing JSON with Scala for another example of this.) This takes care of obtaining the external libraries and setting up the classpath so that they are available. Follow the instructions in this section to do so.

[sourcecode language=”bash”]
$ mkdir ~/twitter4j-tutorial
$ cd ~/twitter4j-tutorial/
$ wget http://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.12.2/sbt-launch.jar
[/sourcecode]

Now, save the following as the file ~/twitter4j-tutorial/build.sbt. Be aware that it is important to keep the empty lines between each of the declarations.

[sourcecode language=”scala”]
name := "twitter4j-tutorial"

version := "0.1.0 "

scalaVersion := "2.10.0"

libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
[/sourcecode]

Then save the following as the file ~/twitter4j-tutorial/build.

[sourcecode language=”bash”]
java -Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=384M -jar `dirname $0`/sbt-launch.jar "$@"
[/sourcecode]

Make that file executable and run it, which will show SBT doing a bunch of work and then leave you with the SBT prompt. At the SBT prompt, invoke the update command.

[sourcecode language=”bash”]
$ cd ~/twitter4j-tutorial
$ chmod a+x build
$ ./build
[info] Set current project to twitter4j-tutorial (in build file:/Users/jbaldrid/twitter4j-tutorial/)
> update
[info] Updating {file:/Users/jbaldrid/twitter4j-tutorial/}default-570731…
[info] Resolving org.twitter4j#twitter4j-core;3.0.3 …
[info] Done updating.
[success] Total time: 1 s, completed Feb 8, 2013 12:55:41 PM
[/sourcecode]

To test whether you have access to twitter4j now, go to the SBT console and import the classes from the main twitter4j package.

[sourcecode language=”scala”]
> console
[info] Starting scala interpreter…
[info]
Welcome to Scala version 2.10.0 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_37).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import twitter4j._
import twitter4j._
[/sourcecode]

If nothing further is output, then you are all set (exit the console using CTRL-D). If things are amiss (or if you are running in the default Scala REPL), you’ll instead see something like the following.

[sourcecode language=”scala”]
scala> import twitter4j._
<console>:7: error: not found: value twitter4j
import twitter4j._
^
[/sourcecode]

If this is what you got, go back through the instructions and check that your setup matches them exactly (in particular, check the versions).

If you just want to see some examples of using twitter4j as an API and are happy adding its jars by hand to your classpath or are using an IDE like Eclipse, then it is unnecessary to do the SBT setup — just read on and adapt the examples as necessary.

Write, compile and run a simple main method

To set the stage for how we’ll run programs in this tutorial, let’s create a simple main method and ensure it can be run in SBT. Do the following:

[sourcecode language=”bash”]
$ mkdir -p ~/twitter4j-tutorial/src/main/scala/
[/sourcecode]

Next, save the following code as ~/twitter4j-tutorial/src/main/scala/TwitterStream.scala.

[sourcecode language=”scala”]
package bcomposes.twitter

import twitter4j._

object StatusStreamer {
  def main(args: Array[String]) {
    println("hi")
  }
}
[/sourcecode]

Next, at the SBT prompt for the twitter4j-tutorial project, use the run-main command as follows.

[sourcecode language=”scala”]
> run-main bcomposes.twitter.StatusStreamer
[info] Compiling 1 Scala source to /Users/jbaldrid/twitter4j-tutorial/target/scala-2.10/classes…
[info] Running bcomposes.twitter.StatusStreamer
hi
[success] Total time: 2 s, completed Feb 8, 2013 1:36:32 PM
[/sourcecode]

SBT compiles the code and then runs it. This is a handy way of running code with all the dependencies available, without having to manage the classpath explicitly.

In what comes below, we’ll flesh out that main method so that it does more interesting work.

Setting up authorization

When using the Twitter streaming API to access tweets via HTTP requests, you must supply your Twitter username and password. To use twitter4j, you also must provide authentication details; however, for this you need to set up OAuth authentication. This is straightforward:

  1. Go to https://dev.twitter.com/apps and click on the button that says “Create a new application” (of course, you’ll need to log in with your Twitter username and password in order to do this)
  2. Fill in the name, description and website fields. Don’t worry too much about this: put in whatever you like for the name and description (e.g. “My example application” and “Tutorial app for me”). For the website, give the URL of your Twitter account if you don’t have anything better to use.
  3. A new screen will come up for your application. Click on the button at the bottom that says “Create my access token”.
  4. Click on the “OAuth tool” tab and you’ll see four fields for authentication which you need in order to use twitter4j to access tweets and other information from Twitter: Consumer key, Consumer secret, Access token, and Access token secret.

Based on these authorization details, you now need to create a twitter4j.conf.Configuration object that will allow twitter4j to access the Twitter API on your behalf. This can be done in a number of different ways, including environment variables, properties files, and in code. To keep things as simple as possible for this tutorial, we’ll go with the last of these and put the details directly in code.

Add the following object after the definition of StatusStreamer, providing your details rather than the descriptions given below.

[sourcecode language=”scala”]
object Util {
  val config = new twitter4j.conf.ConfigurationBuilder()
    .setOAuthConsumerKey("[your consumer key here]")
    .setOAuthConsumerSecret("[your consumer secret here]")
    .setOAuthAccessToken("[your access token here]")
    .setOAuthAccessTokenSecret("[your access token secret here]")
    .build
}
[/sourcecode]

You should of course be careful not to let your details be known to others, so make sure that this code stays on your machine. When you start developing for real, you’ll use other means to get the authorization information injected into your application.
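
If you prefer not to have your credentials in source code at all, one simple alternative is to read them from environment variables. The object below is just a sketch of my own, not part of the tutorial code: the variable names (TWITTER_CONSUMER_KEY and so on) are ones I made up, and sys.env will throw an exception if any of them is unset, so export all four in your shell before starting SBT.

[sourcecode language=”scala”]
object UtilFromEnv {
  // Builds the same kind of Configuration as Util.config, but pulls the
  // four OAuth credentials from environment variables instead of literals.
  val config = new twitter4j.conf.ConfigurationBuilder()
    .setOAuthConsumerKey(sys.env("TWITTER_CONSUMER_KEY"))
    .setOAuthConsumerSecret(sys.env("TWITTER_CONSUMER_SECRET"))
    .setOAuthAccessToken(sys.env("TWITTER_ACCESS_TOKEN"))
    .setOAuthAccessTokenSecret(sys.env("TWITTER_ACCESS_TOKEN_SECRET"))
    .build
}
[/sourcecode]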

Pulling tweets from the sample stream

In the previous tutorial, the most basic sort of access was to get a random sample of tweets from https://stream.twitter.com/1/statuses/sample.json, so let’s use twitter4j to do the same.

To do this, we are going to create a TwitterStream instance that gives us an authorized connection to the Twitter API. To see all the methods associated with the TwitterStream class, see the API documentation for TwitterStream.  A TwitterStream instance is able to get tweets (and other information) and then provide them to any listeners that have registered with it. So, in order to do something useful with the tweets, you need to implement the StatusListener interface and connect it to the TwitterStream.

Before showing the code for creating and using the stream, let’s create a StatusListener that will perform a simple action based on tweets streaming in. Add the following code to the Util object created earlier.

[sourcecode language=”scala”]
def simpleStatusListener = new StatusListener() {
  def onStatus(status: Status) { println(status.getText) }
  def onDeletionNotice(statusDeletionNotice: StatusDeletionNotice) {}
  def onTrackLimitationNotice(numberOfLimitedStatuses: Int) {}
  def onException(ex: Exception) { ex.printStackTrace }
  def onScrubGeo(arg0: Long, arg1: Long) {}
  def onStallWarning(warning: StallWarning) {}
}
[/sourcecode]

This method creates objects that implement StatusListener, though only the onStatus method does anything useful; the other events are simply ignored. What it does is take a Twitter status (all of the information associated with a tweet, including author, retweets, geographic coordinates, etc.) and output the text of the status, i.e., what we usually think of as a “tweet”.
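
If you want to see more of that information than just the text, you can swap in a richer listener. The one below is only a sketch of my own (the name verboseStatusListener is not from the tutorial code); it uses standard Status accessors such as getUser, getCreatedAt, and getGeoLocation, the last of which is null for the vast majority of tweets.

[sourcecode language=”scala”]
def verboseStatusListener = new StatusListener() {
  def onStatus(status: Status) {
    // Geo information is only attached to a small fraction of tweets.
    val location = Option(status.getGeoLocation)
      .map(geo => " [" + geo.getLatitude + "," + geo.getLongitude + "]")
      .getOrElse("")
    println(status.getUser.getScreenName + " (" + status.getCreatedAt + ")"
      + location + ": " + status.getText)
  }
  def onDeletionNotice(statusDeletionNotice: StatusDeletionNotice) {}
  def onTrackLimitationNotice(numberOfLimitedStatuses: Int) {}
  def onException(ex: Exception) { ex.printStackTrace }
  def onScrubGeo(arg0: Long, arg1: Long) {}
  def onStallWarning(warning: StallWarning) {}
}
[/sourcecode]

To try it, add it to the Util object and register it in place of simpleStatusListener in the code that follows.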

The following code puts it all together. We create a TwitterStream object by using the TwitterStreamFactory and the configuration, add a simpleStatusListener to the stream, and then call the sample method of TwitterStream to start receiving tweets. If that were the last line of the program, it would just keep receiving tweets until the process was killed. Here, I’ve added a two-second sleep so that we can see some tweets, after which we clean up the connection and shut the stream down. (We could let it run indefinitely, but then to kill the process we would need to use CTRL-C, which would kill not only that process but also the process running SBT.)

[sourcecode language=”scala”]
object StatusStreamer {
  def main(args: Array[String]) {
    val twitterStream = new TwitterStreamFactory(Util.config).getInstance
    twitterStream.addListener(Util.simpleStatusListener)
    twitterStream.sample
    Thread.sleep(2000)
    twitterStream.cleanUp
    twitterStream.shutdown
  }
}
[/sourcecode]

To run this code, simply put in the same run-main command in SBT as before.

[sourcecode language=”scala”]
> run-main bcomposes.twitter.StatusStreamer
[/sourcecode]

You should see tweets stream by for a couple of seconds and then you’ll be returned to the SBT prompt.

Pulling tweets with specific properties

As with the HTTP streaming, it’s easy to use twitter4j to follow a particular set of users, particular search terms, or tweets produced within certain geographic regions. All that is required is creating appropriate FilterQuery objects and then using the filter method of TwitterStream rather than the sample method.

FilterQuery has several constructors; one of them takes an Array of Long values, each of which is the id of a Twitter user to be followed by the stream. (See the previous tutorial for one easy way to get the id of a user from their username.)

[sourcecode language=”scala”]
object FollowIdsStreamer {
  def main(args: Array[String]) {
    val twitterStream = new TwitterStreamFactory(Util.config).getInstance
    twitterStream.addListener(Util.simpleStatusListener)
    twitterStream.filter(new FilterQuery(Array(1344951,5988062,807095,3108351)))
    Thread.sleep(10000)
    twitterStream.cleanUp
    twitterStream.shutdown
  }
}
[/sourcecode]

These are the IDs for Wired Magazine (@wired), The Economist (@theeconomist), the New York Times (@nytimes), and the Wall Street Journal (@wsj). Add the code to TwitterStream.scala and then run it in SBT. Note that I’ve made the program sleep for 10 seconds in order to give more time for tweets to arrive (since these are just four accounts and will have varying activity). If you are not seeing anything show up, increase the sleep time.

[sourcecode language=”scala”]
> run-main bcomposes.twitter.FollowIdsStreamer
[/sourcecode]
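
As an aside, if you’d rather look up user ids from Scala instead of with curl as in the previous tutorial, twitter4j’s REST client can do it. The object below is only a sketch of my own (the name LookupIds is not part of the tutorial code); it reuses Util.config and calls showUser, a standard method on the twitter4j Twitter interface, once per screen name, so it is subject to Twitter’s rate limits.

[sourcecode language=”scala”]
object LookupIds {
  def main(args: Array[String]) {
    // Build a REST client from the same configuration used for streaming.
    val twitter = new TwitterFactory(Util.config).getInstance
    // Print the numeric id for each screen name given on the command line.
    for (screenName <- args)
      println(screenName + ": " + twitter.showUser(screenName).getId)
  }
}
[/sourcecode]

For example: run-main bcomposes.twitter.LookupIds wired nytimes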

To track tweets that contain particular terms, create a FilterQuery with the default constructor and then call the track method with an Array of strings that contains the query terms you are interested in. The object below does this, and uses the args Array as the container for the query terms.

[sourcecode language=”scala”]
object SearchStreamer {
  def main(args: Array[String]) {
    val twitterStream = new TwitterStreamFactory(Util.config).getInstance
    twitterStream.addListener(Util.simpleStatusListener)
    twitterStream.filter(new FilterQuery().track(args))
    Thread.sleep(10000)
    twitterStream.cleanUp
    twitterStream.shutdown
  }
}
[/sourcecode]

With things set up this way, you can track arbitrary queries by specifying them on the command line.

[sourcecode language=”scala”]
> run-main bcomposes.twitter.SearchStreamer scala
> run-main bcomposes.twitter.SearchStreamer scala python java
> run-main bcomposes.twitter.SearchStreamer "sentiment analysis" "machine learning" "text analytics"
[/sourcecode]

If the search terms are not particularly common, you’ll need to increase the sleep time.
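
By this point every streamer repeats the same boilerplate: create the stream, add the listener, start it, sleep, clean up, and shut down. If you like, you can factor that pattern into the Util object. The helper below is only a sketch of my own (runForAWhile is not part of twitter4j or the tutorial code); the remaining examples keep the explicit version so that each one stands on its own.

[sourcecode language=”scala”]
// Add to the Util object: create a stream, attach the simple listener,
// run the given action (sample or a filter), wait, then shut down.
def runForAWhile(millis: Long)(action: TwitterStream => Unit) {
  val twitterStream = new TwitterStreamFactory(config).getInstance
  twitterStream.addListener(simpleStatusListener)
  action(twitterStream)
  Thread.sleep(millis)
  twitterStream.cleanUp
  twitterStream.shutdown
}
[/sourcecode]

With that in place, the body of SearchStreamer’s main method reduces to a single line: Util.runForAWhile(10000)(_.filter(new FilterQuery().track(args))).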

To filter by location, again create a FilterQuery with the default constructor, but then use the locations method, which takes an Array[Array[Double]] argument: an Array of two-element Arrays, each containing the longitude and latitude of a corner of a bounding box (the southwest corner first, then the northeast corner). Here’s an example that creates a bounding box for Austin and uses it.

[sourcecode language=”scala”]
object AustinStreamer {
  def main(args: Array[String]) {
    val twitterStream = new TwitterStreamFactory(Util.config).getInstance
    twitterStream.addListener(Util.simpleStatusListener)
    val austinBox = Array(Array(-97.8,30.25),Array(-97.65,30.35))
    twitterStream.filter(new FilterQuery().locations(austinBox))
    Thread.sleep(10000)
    twitterStream.cleanUp
    twitterStream.shutdown
  }
}
[/sourcecode]

To make things more flexible, we can take the bounding box information on the command line, convert the Strings into Doubles and pair them up.

[sourcecode language=”scala”]
object LocationStreamer {
  def main(args: Array[String]) {
    val boundingBoxes = args.map(_.toDouble).grouped(2).toArray
    val twitterStream = new TwitterStreamFactory(Util.config).getInstance
    twitterStream.addListener(Util.simpleStatusListener)
    twitterStream.filter(new FilterQuery().locations(boundingBoxes))
    Thread.sleep(10000)
    twitterStream.cleanUp
    twitterStream.shutdown
  }
}
[/sourcecode]

We can call LocationStreamer with multiple bounding boxes, e.g. as follows for Austin, San Francisco, and New York City.

[sourcecode language=”scala”]
> run-main bcomposes.twitter.LocationStreamer -97.8 30.25 -97.65 30.35 -122.75 36.8 -121.75 37.8 -74 40 -73 41
[/sourcecode]

Conclusion

This shows the start of how you can use twitter4j with Scala for streaming. It also supports programmatic access to the actions that any Twitter user can take, including posting messages, retweeting, following, and more. I’ll cover that in a later tutorial. Also, some examples of using twitter4j will start showing up soon in the tshrldu project.

A walk-through for the Twitter streaming API

Topics: Twitter, streaming API

Introduction

Analyzing tweets is all the rage, and if you are new to the game you want to know how to get them programmatically. There are many ways to do this, but a great start is to use the Twitter streaming API, an HTTP-based service that allows you to pull tweets in real time based on criteria you specify. For most people, this will mean having access to the spritzer, which provides only a very small percentage of all the tweets going through Twitter at any given moment. For access to more, you need to have a special relationship with Twitter or pay Twitter or an affiliate like Gnip.

This post provides a basic walk-through for using the Twitter streaming API. You can get all of this based on the documentation provided by Twitter, but this will be slightly easier going for those new to such services. (This post is mainly geared for the first phase of the course project for students in my Applied Natural Language Processing class this semester.)

You need to have a Twitter account to do this walk-through, so obtain one now if you don’t have one already.

Accessing a random sample of tweets

First, try pulling a random sample of tweets in your browser by going to the following link.

  • https://stream.twitter.com/1/statuses/sample.json

You should see a growing, unwieldy list of raw tweets flowing by. It should look something like the following image.

[Image: tweets_sample, a screenshot of raw tweets streaming by in the browser]

Here’s an example of a “raw” tweet (which comes in JSON, or JavaScript Object Notation):

[sourcecode language=”json”]
{"text":"#LetsGoMavs til the end RT @dallasmavs: Are You ALL IN?","truncated":false,"retweeted":false,"geo":null,"retweet_count":0,"source":"web","in_reply_to_status_id_str":null,"created_at":"Wed Apr 25 15:47:39 +0000 2012","in_reply_to_user_id_str":null,"id_str":"195177260792299521","coordinates":null,"in_reply_to_user_id":null,"favorited":false,"entities":{"hashtags":[{"text":"LetsGoMavs","indices":[0,11]}],"urls":[],"user_mentions":[{"indices":[27,38],"screen_name":"dallasmavs","id_str":"22185437","name":"Dallas Mavericks","id":22185437}]},"contributors":null,"user":{"show_all_inline_media":true,"statuses_count":3101,"following":null,"profile_background_image_url_https":"https://si0.twimg.com/profile_background_images/285480449/AAC_med500.jpg","profile_sidebar_border_color":"eeeeee","screen_name":"flyingcape","follow_request_sent":null,"verified":false,"listed_count":2,"profile_use_background_image":true,"time_zone":"Mountain Time (US &amp; Canada)","description":"HUGE ROCKETS &amp; MAVS fan. Lets take down the Lakers &amp; beat up on the East. Inaugural member of the FC Dallas – Fort Worth fan club.","profile_text_color":"333333","default_profile":false,"profile_background_image_url":"http://a0.twimg.com/profile_background_images/285480449/AAC_med500.jpg","created_at":"Thu Oct 21 15:40:21 +0000 2010","is_translator":false,"profile_link_color":"1212cc","followers_count":35,"url":null,"profile_image_url_https":"https://si0.twimg.com/profile_images/1658982184/204970_10100514487859080_7909803_68807593_5366704_o_normal.jpg","profile_image_url":"http://a0.twimg.com/profile_images/1658982184/204970_10100514487859080_7909803_68807593_5366704_o_normal.jpg","id_str":"205774740","protected":false,"contributors_enabled":false,"geo_enabled":true,"notifications":null,"profile_background_color":"0a2afa","name":"Mandy","default_profile_image":false,"lang":"en","profile_background_tile":true,"friends_count":48,"location":"ATX / FDub. From Galveston !","id":205774740,"utc_offset":-25200,"favourites_count":231,"profile_sidebar_fill_color":"efefef"},"id":195177260792299521,"place":{"bounding_box":{"type":"Polygon","coordinates":[[[-97.938383,30.098659],[-97.56842,30.098659],[-97.56842,30.49685],[-97.938383,30.49685]]]},"country":"United States","url":"http://api.twitter.com/1/geo/id/c3f37afa9efcf94b.json","attributes":{},"full_name":"Austin, TX","country_code":"US","name":"Austin","place_type":"city","id":"c3f37afa9efcf94b"},"in_reply_to_screen_name":null,"in_reply_to_status_id":null}
[/sourcecode]

There is a lot of information in there beyond the tweet text itself, which is simply “#LetsGoMavs til the end RT @dallasmavs: Are You ALL IN?” It is basically a map from attributes to values (and values may themselves be such a map, e.g. for the “user” attribute above). You can see how many times the tweet has been retweeted (zero here, since the tweet has just been published), what time it was created, the unique tweet id, the geo-coordinates (if available), and more. If an attribute does not have a value for the tweet, it is ‘null’.

I will return to JSON processing of tweets in a later tutorial, but you can get a head start by seeing my tutorial on using Scala to process JSON in general.

Command line access to tweets

Assuming you were able to view tweets in the browser, we can now move to the command line. For this, it will be convenient to first set environment variables for your Twitter username and password.

[sourcecode language=”bash”]
$ export TWUSER=foo
$ export TWPWD=bar
[/sourcecode]

Obviously, you need to provide your Twitter account details instead of foo and bar…

Next, we’ll use the program curl to interact with the API. Try it out by downloading this blog post.

[sourcecode language=”bash”]
$ curl http://bcomposes.wordpress.com/2013/01/25/a-walk-through-for-the-twitter-streaming-api/ > bcomposes-twitter-api.html
$ less bcomposes-twitter-api.html
[/sourcecode]

Given that you pulled tweets from the API using your web browser, and that curl can access web pages in this way, it is simple to use curl to get tweets and direct them straight to a file.

[sourcecode language=”bash”]
$ curl https://stream.twitter.com/1/statuses/sample.json -u$TWUSER:$TWPWD > tweets.json
[/sourcecode]

That’s it: you now have an ever-growing file with randomly sampled tweets. Have a look and try not to lose your faith in humanity. 😉
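
If you’d like to see just the tweet text rather than the full JSON records, and you have a Scala setup handy (e.g. the SBT project from the twitter4j tutorial above), here is a minimal sketch. The object name is mine, it uses only the JSON parser in the Scala 2.10 standard library, and it is no substitute for the more thorough approaches in the JSON tutorial linked above: it reads a file like tweets.json, one JSON object per line, and prints the “text” field when there is one.

[sourcecode language=”scala”]
import scala.io.Source
import scala.util.parsing.json.JSON

object TweetTextExtractor {
  def main(args: Array[String]) {
    // Each line of the input file (e.g. tweets.json) is one JSON object.
    for (line <- Source.fromFile(args(0)).getLines)
      JSON.parseFull(line) match {
        // Status objects have a "text" field; other objects (e.g. delete
        // notices) do not and are silently skipped.
        case Some(tweet: Map[_, _]) =>
          tweet.asInstanceOf[Map[String, Any]].get("text").foreach(println)
        case _ =>
      }
  }
}
[/sourcecode]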

Pulling tweets with specific properties

You might want to get the tweets from specific users rather than a random sample. This requires user ids rather than the user names we usually see. The id for a user can be obtained from the Twitter API by looking at the /users/show endpoint. For example, the following gives my information:

  • https://api.twitter.com/1/users/show.xml?screen_name=jasonbaldridge

Which gives:

[sourcecode language=”xml”]

<user>
<id>119837224</id>
<name>Jason Baldridge</name>
<screen_name>jasonbaldridge</screen_name>
<location>Austin, Texas</location>
<description>
Assoc. Prof., Computational Linguistics, UT Austin. Senior Data Scientist, Converseon. OpenNLP developer. Scala, Java, R, and Python programmer.
</description>
…MORE…

[/sourcecode]

So, to follow @jasonbaldridge via the Twitter API, you need user id 119837224. You can pull my tweets via the API using the “follow” query parameter.

[sourcecode language=”bash”]
$ curl -d follow=119837224 https://stream.twitter.com/1/statuses/filter.json -u$TWUSER:$TWPWD
[/sourcecode]

There is a good chance I’m not tweeting right now, so you’ll probably not see anything. Let’s follow more users, which we can do by adding more ids separated by commas.

[sourcecode language=”bash”]
$ curl -d follow=1344951,5988062,807095,3108351 https://stream.twitter.com/1/statuses/filter.json -u$TWUSER:$TWPWD
[/sourcecode]

This will follow Wired Magazine (@wired), The Economist (@theeconomist), the New York Times (@nytimes), and the Wall Street Journal (@wsj).

You can also write those ids to a file and read them from the file. For example:

[sourcecode language=”bash”]
$ echo "follow=1344951,5988062,807095,3108351" > following
$ curl -d @following https://stream.twitter.com/1/statuses/filter.json -u$TWUSER:$TWPWD
[/sourcecode]

You can of course edit the file “following” rather than using echo to create it. Also, the file can be named whatever you like (the name “following” is not important here).

You can search for a particular term in tweets, such as “Scala”, using the “track” query parameter.

[sourcecode language=”bash”]
$ curl -d track=scala https://stream.twitter.com/1/statuses/filter.json -u$TWUSER:$TWPWD
[/sourcecode]

And, no surprise, you can search for multiple items by using commas to separate them.

[sourcecode language=”bash”]
$ curl -d track=scala,python,java https://stream.twitter.com/1/statuses/filter.json -u$TWUSER:$TWPWD
[/sourcecode]

However, this only requires that a tweet match at least one of these terms. If you want to ensure that multiple terms match, you’ll need to write them to a file and then refer to that file. For example, to get tweets that have both “sentiment” and “analysis” OR both “machine” and “learning” OR both “text” and “analytics”, you could do the following:

[sourcecode language=”bash”]
$ echo "track=sentiment analysis,machine learning,text analytics" > tracking
$ curl -d @tracking https://stream.twitter.com/1/statuses/filter.json -u$TWUSER:$TWPWD
[/sourcecode]

You can pull tweets from a specific rectangular area (bounding box) on the Earth’s surface. For example, the following pulls geotagged tweets from Austin, Texas.

[sourcecode language=”bash”]
$ curl -d locations=-97.8,30.25,-97.65,30.35 https://stream.twitter.com/1/statuses/filter.json -u$TWUSER:$TWPWD
[/sourcecode]

The bounding box is given as longitude (bottom-left corner), latitude (bottom-left corner), longitude (top-right corner), latitude (top-right corner). You can add further bounding boxes to capture more locations. For example, the following captures tweets from Austin, San Francisco, and New York City.

[sourcecode language=”bash”]
$ curl -d locations=-97.8,30.25,-97.65,30.35,-122.75,36.8,-121.75,37.8,-74,40,-73,41 https://stream.twitter.com/1/statuses/filter.json -u$TWUSER:$TWPWD
[/sourcecode]

Conclusion

It’s all pretty straightforward, and quite handy for many kinds of tweet-gathering needs. One of the problems is that Twitter will drop the connection at times, and you’ll end up missing tweets until you start a new process. If you need constant monitoring, see UT Austin’s Twools (Twitter tools), which provide a steady stream of tweets that picks up whenever Twitter drops your connection.

In a later post, I’ll detail how to use an API like twitter4j to pull tweets and interact with Twitter at a more fundamental level.

Slides and video for geolocation talk at CLSP

I gave a talk on “Text Geolocation and (Text) Dating” last week at Johns Hopkins University’s Center for Language and Speech Processing. It covers some of our recent work at UT Austin on automatically geolocating and time-stamping documents. It also includes some initial thoughts on how this can help us move toward quasi-grounded models of word meaning, which is a research area that Katrin Erk and I have recently begun to work on. Here’s the abstract:

It used to be that computational linguists had to collaborate with roboticists in order to work on grounding language in the real world. However, since the advent of the internet, and particularly in the last decade, the world has been brought within digital scope. People’s social and business interactions are increasingly mediated through a medium that is dominated by text. They tweet from places, express their opinions openly, give descriptions of photos, and generally reveal a great deal about themselves in doing so, including their location, gender, age, social status, relationships and more. In this talk, I’ll discuss work on geolocation and dating of texts, that is, identifying sets of latitude-longitude pairs and time periods that a document is about or related to. These applications and the models developed for them set the stage for deeper investigations into computational models of word meaning that go beyond standard word vectors and into augmented multi-component representations that include dimensions connected to the real world via geographic and temporal values and beyond.

CLSP records all their seminars, and the video for my talk has been posted.

Jason Baldridge – “Text Geolocation and Dating: Light-Weight Language Grounding” from Chris Callison-Burch on Vimeo.

Unfortunately, the slides are not visible in the talk, so I’ve posted them here: slides for Text Geolocation and Dating.

The code that has been used for most of the research discussed is open source, and available on Github: Textgrounder. You can see related publications on my papers page.

Errata: I believe I say at one point that Alessandro Moschitti had created the TweetToLife application that I used to get temporal distributions on Twitter over the day, but that was actually Marco Baroni I had in mind (joint work with Amaç Herdağdelen). Getting my Italians mixed-up! (I had just read a nice paper by Moschitti, so my prior was off…)