Arrowsmith by Sinclair Lewis

Sinclair Lewis’s novel Arrowsmith is a fascinating look into the personal and societal questions surrounding scientific research and the lives of those who pursue it.

I just finished Sinclair Lewis’s fascinating novel Arrowsmith, published in 1925 (plain text here). If you are a scientist or are very interested in science, you should consider reading it. Arrowsmith plots the trajectory from youth to middle age of Martin Arrowsmith, a medical doctor turned researcher, and it touches on many of the daily topics a researcher encounters, as well as the personal and societal impact of their work and the questions it raises. In no small part, the force of this book comes from the fact that it was actually co-authored by Paul de Kruif, who had worked both as a professor at the University of Michigan and as a researcher at the Rockefeller Institute. In addition to the tremendous attention to scientific detail, it is likely that many of the characters and situations arose from his personal experience.

The book influenced many people, especially in increasing their appreciation of carefully controlled clinical trials and their skepticism of quack remedies and “scientific” therapies that were rushed to market before they’d been properly vetted. The book goes into great depth about the pressures put on a scientist (Arrowsmith) to rush to publication and to rush remedies into use. Lewis touches on these topics both in the context of a university (in the fictional mid-western state of Winnemac) and in that of a top-flight research laboratory in New York City. These topics and many other aspects of the book remain relevant today.
 
The plotting is a bit slow at times, but the writing is delightful. The brief and wonderful descriptions of incidental characters especially stand out. Here’s a great example:

Watters’s house was new, and furnished in a highly built-in and leaded-glass manner. He had in three years of practice already become didactic and incredibly married; he had put on weight and infallibility; and he had learned many new things about which to be dull. Having been graduated a year earlier than Martin and having married an almost rich wife, he was kind and hospitable with an emphasis which aroused a desire to do homicide.

As someone who was raised in Michigan and did my undergraduate degree in Ohio, I love the critiques of the boosterism and conformity of small to mid-sized towns in the context of the bland sameness across the mid-west. As an atheist, I also appreciate the way Lewis brought up the explicit religious overtones of the mid-west and navigated his essentially agnostic/atheist main characters through that without hammering on it too much (which probably would have been unthinkable for a novel in the 1920s anyway). It was interesting for me to discover that de Kruif was born in Zeeland, Michigan and died in Holland, Michigan. Zeeland is less than an hour’s drive from my hometown of Rockford, and when I was growing up, we used to joke that God had his address in Zeeland (because it was such a religious town). In general, West Michigan is overwhelmingly and stridently Christian of the you-will-go-to-hell-if-you-don’t-believe-in-our-version-of-Jesus variety. This became annoying and tiresome for me growing up as a non-Christian in that area, so I personally appreciated the well-placed satirical points on organized religion in Arrowsmith.

As a scientist, I love the description of the joys of research and the tensions between doing research, earning a living, and having time/headspace for the other things in life. Here’s a great passage about Arrowsmith’s burning passion for research at a time when he is working as an intern en route to becoming a doctor:

But on night duty, alone, he had to face the self he had been afraid to uncover, and he was homesick for the laboratory, for the thrill of uncharted discoveries, the quest below the surface and beyond the moment, the search for fundamental laws which the scientist (however blasphemously and colloquially he may describe it) exalts above temporary healing as the religious exalts the nature and terrible glory of God above pleasant daily virtues. With this sadness there was envy that he should be left out of things, that others should go ahead of him, ever surer in technique, more widely aware of the phenomena of biological chemistry, more deeply daring to explain laws at which the pioneers had but fumbled and hinted.

I have often felt this fear of missing out while devoting myself to other things (teaching, family, etc.), and this passage captures that brilliantly. Arrowsmith eventually returns to research, after a circuitous path through being a physician and public health worker. Though I have chosen quite different priorities than Arrowsmith does, I’ve experienced the same scientific thrills and motivations, and I know plenty of people who tend more toward the pained but satisfied scientific asceticism that Arrowsmith ultimately reaches. A review of Arrowsmith by Noortje Jacobs puts it well: “the novel in many ways also presents its readers with a bleak vision on the possibility of having a scientific life while remaining a sociable human being.” I think it is fair to say that pretty much everybody engaged in serious scientific research navigates this tension: when research is going really well, it is an amazing experience of flow that begs for more and provides further rewards if you give it; however, we are also social animals who must nurture the relationships we choose to (or must) keep. Arrowsmith provides a detailed window into a person who chooses to live for his pure research, and it highlights the costs of that choice for others, without getting sentimental about it.

So much has changed, but so much has stayed the same. While I was reading the book, I found myself already working out the storyline for a modern-day Arrowsmith, with an emphasis on artificial intelligence rather than biology and clinical medicine. We have lots of ethical issues to sort out on this front, and as an atheist mid-westerner who has worked in academia as a research professor and in industry as a consultant and as co-founder and chief scientist of a startup, and who cares a lot about what we do with machine learning, I’m perhaps particularly well-suited to write it someday.

References for my IZEAfest talk

I gave a talk at IZEAfest about the science of sharing. I wove together a narrative based on recent research in social network analysis and some work we’ve done at People Pattern. It was fun preparing and delivering it!

The livestream of the talk is here [NOTE: the video is currently unavailable and I’m working on finding a new link], and here are the slides.

 

In preparing for the talk, I looked at and referred to a lot of papers. Here’s a list of the papers referenced in the talk! Each paper is followed by a link to a PDF or a site where you can get the PDF. A huge shout-out to the fantastic work by these scholars—I was really impressed by the social network analysis work and the questions and methods these researchers have been working on.


Chen et al. (2015). “Making Use of Derived Personality: The Case of Social Media Ad Targeting.” – http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10508

Cheng et al. (2015). “Antisocial Behavior in Online Discussion Communities.” – http://arxiv.org/abs/1504.00680

Friggeri et al. (2014). “Rumor Cascades.” – http://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8122

Goel et al. (2015). “The Structural Virality of Online Diffusion.” – https://5harad.com/papers/twiral.pdf

Gomez-Rodriguez et al. (2014). “Quantifying Information Overload in Social Media and its Impact on Social Contagions.” – http://arxiv.org/abs/1403.6838

Gomez Rodriguez et al. (2014). “Uncovering the structure and temporal dynamics of information propagation.” –  http://www.mpi-sws.org/~manuelgr/pubs/S2050124214000034a.pdf

Iacobelli et al. (2011). “Large Scale Personality Classification of Bloggers.” – http://www.research.ed.ac.uk/portal/files/12949424/Iacobelli_Gill_et_al_2011_Large_scale_personality_classification_of_bloggers.pdf

Kang and Lerman (2015). “User Effort and Network Structure Mediate Access to Information in Networks.” – http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10483

Kooti et al. (2015). “Evolution of Conversations in the Age of Email Overload.” – http://arxiv.org/abs/1504.00704

Kulshrestha et al. (2015). “Characterizing Information Diets of Social Media Users.” – https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/viewFile/10595/10505

Lerman et al. (2015). “The Majority Illusion in Social Networks.” – http://arxiv.org/abs/1506.03022

Leskovec et al. (2009). “Meme-tracking and the Dynamics of the News Cycle.” – http://www.memetracker.org/quotes-kdd09.pdf

Niculae et al. (2015). “QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns.” – http://snap.stanford.edu/quotus/

Weng et al. (2014). “Predicting Successful Memes using Network and Community Structure.” – http://arxiv.org/abs/1403.6199

Yarkoni (2010). “Personality in 100,000 Words: A large-scale analysis of personality and word use among bloggers.” – http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2885844/

It’s okay for academic software to suck

Bozhidar Bozhanov wrote a blog post titled “The Low Quality of Academic Code”, in which he observed that most academic software is poorly written. He makes plenty of fair points, e.g.:

… there are too many freshman mistakes – not considering thread-safety, cryptic, ugly and/or stringly-typed APIs, lack of type-safety, poorly named variables and methods, choosing bad/slow serialization formats, writing debug messages to System.err (or out), lack of documentation, lack of tests.

But, here’s the thing — I would argue that this lack of engineering quality in academic software is a feature, not a bug. For academics, there is basically little to no incentive to produce high quality software, and that is how it should be. Our currency is ideas and publications based on them, and those are obtained not by creating wonderful software, but by having great results. We have limited time, and that time is best put into thinking about interesting models and careful evaluation and analysis. The code is there to support that, and is fine as long as it is correct.

The truly important metric for me is whether the code supports replication of the results in the paper it accompanies. The code can be as ugly as you can possibly imagine as long as it does this. Unfortunately, a lot of academic software doesn’t make replication easy. Nonetheless, having the code open sourced makes it at least possible to hack with it to try to replicate previous results. In the last few years, I’ve personally put a lot of effort into making my work and my students’ work easy to replicate. I’m particularly proud of how I put code, data and documentation together for a paper I did on topic model evaluation with James Scott for AISTATS in 2013, “A recursive estimate for the predictive likelihood in a topic model.” That was a lot of work, but I’ve already benefited from it myself (in terms of being able to get the data and run my own code). Check out the “code” links in some of my other papers for some other examples that my students have done for their research.
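
To make that concrete: the packaging doesn’t need to be fancy. Here is a minimal sketch, in the spirit of what I mean rather than the actual code from that paper (the file names, the run_experiment stub, and the metric are all made up), of a single entry-point script that regenerates a results table with one command:

```python
"""replicate.py -- hypothetical single entry point for regenerating a paper's results.

Everything here (paths, the experiment stub, the metric) is illustrative; the
point is that one command takes a reader from the released data to the numbers
reported in the paper.
"""
import csv
from pathlib import Path

DATA_DIR = Path("data")        # ship the data, or a download script, alongside the code
RESULTS_DIR = Path("results")


def check_data() -> None:
    """Fail loudly and early if the data the paper used is not in place."""
    if not DATA_DIR.exists():
        raise SystemExit("Expected the paper's data in ./data -- see the README for where to get it.")


def run_experiment(seed: int) -> dict:
    """Stand-in for the actual training/evaluation; returns the reported metrics."""
    return {"seed": seed, "predictive_log_likelihood": -7.42}  # dummy value


def main() -> None:
    check_data()
    RESULTS_DIR.mkdir(exist_ok=True)
    rows = [run_experiment(seed) for seed in (13, 42, 2013)]
    with open(RESULTS_DIR / "table2.csv", "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    print("Wrote", RESULTS_DIR / "table2.csv")


if __name__ == "__main__":
    main()
```

Ugly is fine; what matters is that the path from data to reported numbers is explicit, pinned down, and runnable by someone other than the author.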

Having said the above, I think it is really interesting to see how people who have made their code easy to use (though not always well-engineered) have benefited from doing so in the academic realm. A good example is word2vec: the software released for it generated tons of interest in industry as well as academia and probably led to much wider dissemination of that work, and to more follow-on work. Academia itself doesn’t reward that directly, nor should it. That’s one reason you see it coming out of companies like Google, but it might be worth it to some researchers in some cases, especially PhD students who seek industry jobs after they defend their dissertations.

I read a blog post last year in which the author encouraged people to open source their code and not worry about how crappy it was. (I wish I could remember the link, so if you have it, please add it in a comment. Update: here is the post, “It’s okay for your open source library to be a bit shitty.”) I think this is a really important point. We should be careful not to get overly critical about code that people have made available to the world for free—not because we don’t want to damage their fragile egos, but because we want to make sure that people generally feel comfortable open sourcing. This is especially important for academic code, which is often the best recipe, no matter how flawed it might be, that future researchers can use to replicate results and produce new work that meaningfully builds on or compares to that work.

Update: Adam Lopez pointed out this very nice related article by John Regehr “Producing good software from academia“.

Addendum: When I was a graduate student at the University of Edinburgh, I wrote a software package called OpenNLP Maxent (now part of the OpenNLP toolkit, which I also started then and which is still used widely today). While I was still a student, a couple of companies paid me to improve aspects of the code and documentation, which really helped me make ends meet at the time and made the code much better. I highly encourage this model — if there is an academic open source package that you think your company could benefit from, consider hiring the grad student author to make it better for the things that matter for your needs! (Or do it yourself and do a pull request, which is much easier today with Github than it was in 2000 with Sourceforge.)

Update: Thanks to the commenters below for providing the link to the post I couldn’t remember: “It’s okay for your open source library to be a bit shitty.” As a further note, the author surprisingly connects this topic to feminism in a cool way.

A hidden gem in Manning and Schutze: what to call 4+-grams?

I’m a longtime fan of Chris Manning and Hinrich Schutze’s “Foundations of Statistical Natural Language Processing” — I’ve learned from it, I’ve taught from it, and I still find myself thumbing through it from time to time. Last week, I wrote a blog post on SXSW titles that involved looking at n-grams of different lengths, including unigrams, bigrams, trigrams and … well, what do we call the next one up? Manning and Schutze devoted an entire paragraph to it on page 193, which I absolutely love and thought would be fun to share for those who haven’t seen it.

Before continuing with model-building, let us pause for a brief interlude on naming. The cases of n-gram language models that people usually use are for n=2,3,4, and these alternatives are usually referred to as a bigram, a trigram, and a four-gram model, respectively. Revealing this will surely be enough to cause any Classicists who are reading this book to stop, and leave the field to uneducated engineering sorts: “gram” is a Greek root and so should be put together with Greek number prefixes. Shannon actually did use the term “digram”, but with the declining levels of education in recent decades, this usage has not survived. As non-prescriptive linguists, however, we think that the curious mix of English, Greek, and Latin that our colleagues actually use is quite fun. So we will not try to stamp it out. (1)

And footnote (1) follows this up with a note on four-grams.

1. Rather than “four-gram”, some people do make an attempt at appearing educated by saying “quadgram”, but this is not really correct use of a Latin number prefix (which would be “quadrigram”, cf. “quadrilateral”), let alone correct use of a Greek number prefix, which would give us “a tetragram model.”

In part to be cheeky, I went with “quadrigram” in my post, which was obviously a good choice as it has led to the term being the favorite word of the week for Ken Cho, my People Pattern cofounder, and the office in general. (“Hey Jason, got any good quadrigrams in our models?”)

If you want to try out some n-gram analysis, check out my followup blog post on using Unix, Mallet, and BerkeleyLM for analyzing SXSW titles. You can call 4+-grams whatever you like.
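
If you want a rough feel for what that kind of analysis looks like before diving into those tools, here is a tiny self-contained sketch (plain Python rather than the Unix/Mallet/BerkeleyLM pipeline from that post, and the titles are invented) that counts everything from unigrams up to quadrigrams:

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Invented stand-ins for conference session titles.
titles = [
    "the future of artificial intelligence in marketing",
    "the future of social media analytics",
    "how to grow your startup with social media",
]

# Count unigrams through quadrigrams across all titles.
counts = {n: Counter() for n in (1, 2, 3, 4)}
for title in titles:
    tokens = title.lower().split()
    for n in counts:
        counts[n].update(ngrams(tokens, n))

for n, label in [(1, "unigrams"), (2, "bigrams"), (3, "trigrams"), (4, "quadrigrams")]:
    print(label, counts[n].most_common(3))
```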

Emotional Contagion: Contextualizing the Controversy

Note: This is a repost of a blog post about the Facebook emotional contagion experiment that I wrote on People Pattern’s blog.


 

This is the first in a series of posts responding to the controversial Facebook study on Emotional Contagion.

The past two weeks have seen a great deal of discussion around the recent computational social science study of Kramer, Guillory and Hancock (2014), “Experimental evidence of massive-scale emotional contagion through social networks.” I encourage you to read the published paper before getting caught up in the maelstrom of commentary. The wider issues are critical to address, and I have summarized the often conflicting but thoughtful perspectives below. These issues strike close to home, given our company’s expertise in computational linguistics and reliance on social media.

In this post, I provide a brief description of the original paper itself along with a synopsis of the many perspectives that have been put forth in the past two weeks. This post sets the stage for two posts to follow tomorrow and Tuesday next week that provide our take on the study plus our own Facebook-external opt-in version of the experiment, which anyone currently using Facebook can participate in.

Summary of the study

Kramer, Guillory and Hancock’s paper provides evidence that emotional states as expressed in social media posts are contagious, in the sense that they affect whether readers of those posts reflect similar positive or negative emotional states in their own later posts. The evidence is based on an experiment involving about 700,000 Facebook users over a one-week period in January 2012. These users were split into four groups: a group that had a reduction in positive messages in their Facebook feed, another that had a reduction in negative messages, a control group that had an overall 5% reduction in posts, and a second control group that had a 2% reduction. Positivity and negativity were determined by using the LIWC word lists. LIWC, which was created and is maintained by my University of Texas at Austin colleague James Pennebaker, is a standard resource for psychological studies of emotional expression in language. Over the past two decades, it has been applied to language from varying sources, including speech, essays, and social media.

The study found a small but statistically significant difference in emotional expression between the positive suppression group and its control, and between the negative suppression group and its control. Basically, users who had positive posts suppressed produced slightly lower rates of positive word usage and slightly higher rates of negative word usage, and the mirror image of this was found for the negative suppression group (check out the plot for these). (This description of the study is short — see Nitin Madnani’s description for more detail and analysis.)
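
To make the measurement itself concrete, here is a toy sketch of the kind of word-rate calculation involved. The word lists below are tiny stand-ins I made up for illustration; the actual study used the LIWC lexicon and Facebook’s own infrastructure, not anything like this snippet:

```python
# Toy version of scoring a post by its rates of positive and negative words.
# These word sets are made-up stand-ins, not the LIWC categories.
POSITIVE = {"happy", "love", "great", "wonderful", "good"}
NEGATIVE = {"sad", "hate", "terrible", "awful", "bad"}


def emotion_rates(post):
    """Return (positive rate, negative rate) as fractions of the post's words."""
    words = post.lower().split()
    if not words:
        return 0.0, 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos / len(words), neg / len(words)


print(emotion_rates("what a wonderful day i love this city"))   # mostly positive hits
print(emotion_rates("terrible commute again i hate mondays"))   # mostly negative hits
```

Roughly speaking, the study’s outcome measure was rates like these, aggregated over each user’s posts and compared across conditions.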

The study was published in PNAS, and then the shit hit the fan.

Objections to the study

Objections to the study and the infrastructure that made it possible have come from many sources. The two major complaints have to do with ethical considerations and research flaws.

The first major criticism is that the study was unethical. The key problem is that there was no informed consent: Facebook users had no idea that they were part of this study and had no opportunity to opt out of it. An important aspect of this is that the study conforms to the Facebook terms of service: Facebook has the right to experiment with feed filtering algorithms as part of improving its service. However, because Jeff Hancock is a Cornell University professor, many have stated that it should have passed Cornell’s IRB process. Furthermore, many feel that Facebook should obtain consent from users when running such experiments, whether for eventual publication or for in-company studies to improve the service. The editors of PNAS have issued an editorial expression of concern over the lack of informed consent and opt-out for subjects of the study. We agree this is an issue, so in our third post, we’ll introduce a way this can be achieved through an opt-in version of the study.

The second type of criticism is that the research is flawed or otherwise unconvincing. The most obvious issue is that the effect sizes are small. A subtler problem familiar to anyone who has done anything with sentiment analysis is that counting positive and negative words is a highly imperfect means for judging the positivity/negativity of a text (e.g. it does the wrong thing with negations and sarcasm — see Pang and Lee’s overview). Furthermore, the finding that reducing positive words seen leads to fewer positive words produced does not mean that the user’s actual mood was affected. We will return to this last point in tomorrow’s post.
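
A tiny example shows why the word-counting approach is so imperfect: a naive counter gives both sentences below one “positive” hit, even though they express opposite sentiments, because it has no notion of the negation.

```python
# Naive positive-word counting misreads negation: both sentences get one hit.
POSITIVE = {"happy", "great", "good"}

for sentence in ["I am happy with the results", "I am not happy with the results"]:
    hits = sum(w in POSITIVE for w in sentence.lower().split())
    print(sentence, "->", hits, "positive word(s)")
```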

Support for the study

In response, several authors have joined the discussion to support the study and others similar to it, or to refute some aspects of the criticism leveled at it.

Several commentators have made unequivocal statements that the study would never have obtained IRB approval. This is in fact a misperception: Michelle Meyer provides a great overview of many aspects of IRB approval and concludes that this particular study could have legitimately passed the IRB process. A key point for her is that, had an IRB approved the study, that would probably have been the right decision. She concludes: “We can certainly have a conversation about the appropriateness of Facebook-like manipulations, data mining, and other 21st-century practices. But so long as we allow private entities freely to engage in these practices, we ought not unduly restrain academics trying to determine their effects.”

Another defense is that many concerns expressed about the study are misplaced. Tal Yarkoni argues in “In Defense of Facebook” that many critics have inappropriately framed the experimental procedure as injecting positive or negative content into feeds, when in fact it was removal of content. Secondly, he notes that Facebook already manipulates users’ feeds, and this study is essentially business-as-usual in this respect. Yarkoni also notes that it is a good thing that Facebook publishes such research: “by far the most likely outcome of the backlash Facebook is currently experiencing is that, in future, its leadership will be less likely to allow its data scientists to publish their findings in the scientific literature.” They will do the work regardless, but the public will have less visibility into the kinds of questions Facebook can ask and the capabilities they can build based on the answers they find.

Duncan Watts takes this to another level, saying that companies like Facebook actually have a moral obligation to conduct such research. He writes in the Guardian that the existence of social networks like Facebook gives us an amazing new platform for social science research, akin to the advent of the microscope. He argues that companies like Facebook, as the gatekeepers of such networks, must perform and disseminate research into questions such as how users are affected by the content they see.

Finally, such collaborations between industry and academia should be encouraged. Kate Niederhoffer and James Pennebaker argue that both industry and academia are best served through such collaborations and that the discussion around this study provides an excellent case study. In particular, the backlash against the study highlights the need for more rigor, awareness and openness about research methods and for more explicit informed consent from clients or customers.

Wider issues raised by the study and the backlash against it

The backlash and the above responses have furthermore provided fertile ground for other observations and arguments based on subtler issues and questions that the study and the response to it have revealed.

One of my favorites is the observation that IRBs do not perform ethical oversight. danah boyd argues that the IRB review process itself is mistakenly viewed by many as a mechanism for ensuring research is ethical. She makes an insightful, non-obvious argument: that the main function of an IRB is to ensure a university is not liable for the activities of a given research project, and that focusing on questions of IRB approval for the Facebook study is beside the point. Furthermore, the real source of the backlash for her is public misunderstanding of, and growing negative sentiment toward, the practice of collecting and analyzing data about people using the tools of big data.

Another point is that the ethical boundaries and considerations between industry and academia are difficult to reconcile. Ed Felten writes that though the study conforms to Facebook’s terms of service, it clearly is inconsistent with the research community’s ethical standards. On one hand, this gap could lead to fewer collaborations between companies and university researchers, while on the other hand it could enable some university researchers to side-step IRB requirements by working with companies. Note that the opportunity for these sorts of collaborations arises naturally and reasonably frequently; for example, it often happens when a professor’s student graduates and joins such a company, and they continue working together.

Zeynep Tufekci escalates the discussion to a much higher level—she argues that companies like Facebook are effectively engineering the public. According to Tufekci, this study isn’t the problem so much as it is symptomatic of the wider issue of how a corporate entity like Facebook has the power to target, model and manipulate users in very subtle ways. In a similar, though less polemical vein, Tarleton Gillespie notes the disconnect between Facebook’s promise to deliver a better experience to its users and how users perceive the role and ability of such algorithms. He notes that this leads to “a deeper discomfort about an information environment where the content is ours but the selection is theirs.”

In a follow-up post responding to criticism of his “In Defense of Facebook” post, Tal Yarkoni points out that the real problem is the lack of regulations and frameworks for what can be done with online data, especially data collected by private entities like Facebook. He suggests the best thing is to reserve judgment on questions of ethics for this particular paper, but that the incident certainly highlights the need for “a new set of regulations that provide a unitary code for dealing with consumer data across the board–i.e., in both research and non-research contexts.”

Perhaps the most striking thing about the Kramer, Guillory and Hancock paper is how the ensuing discussion has highlighted many deep and important aspects of the ethics of research in computational social science from both industry and university perspectives, and the subtleties that lie therein.

Summing up

A standard blithe rejoinder to users of services like Facebook who express concern, or even horror, about studies like this is to say “Don’t you see that when you use a service you don’t pay for, you are not the customer, you are the product?” This is certainly true in many ways, and it merits repeating again and again. However, it of course doesn’t absolve corporations from the responsibility to treat their users with respect and regard for their well-being.

I don’t think the researchers or Facebook itself were grossly negligent with respect to this study, but the study is nonetheless in an ethical gray zone. Our second post will touch on other activities, such as A/B testing of ad placement and content, that are arguably in that same gray zone but have not created a public outcry even after years of being practiced. It will also say more about how the linguistic framing of the study itself primed the extreme backlash that was observed, and how the study is in many ways more innocuous than its own wording would suggest.

Our third post will introduce our own opt-in version of the study, which we think is a reasonable way to explore the questions posed in the study. We’d love to get plenty of folks to try it out, and we’ll even let participants guess whether they were in the positive or negative group. Stay tuned!

Happy Mother’s Day to Academic Moms (We need more of you)

Happy Mother’s Day to all my female colleagues around the world, who produce amazing research and do great teaching while being moms!

Being an academic means a lot of hard (and rewarding) work, and being a parent on top of it brings an extensive set of challenges — especially as one effectively competes with others who don’t have kids! Compared to men, women face an additional set of challenges as academic parents, due to a wide variety of factors, including fixed biological ones (e.g. only they can actually bear children) and societal expectations that change ever so slowly (though thankfully generally for the better). It is important to have your perspectives as colleagues, teachers, and researchers, and I don’t think that academia does enough to allow you all to more easily balance the needs of work and family — much to our detriment. There are also still pay gaps between men and women, especially at more senior levels of academia. All of this means that many women who might have provided fundamental insights into science sadly never go into academic work, based on a very rational assessment of the likely costs and benefits such a career brings. Many of my female colleagues feel they must wait until relatively late in their reproductive life to have children, often after tenure or after tenure is pretty much assured. This brings with it additional risks and challenges that women should not feel forced to take.

As it is we still have too few academic women, and even fewer academic moms. I believe the latter are an important group to support, since they are the ones who provide examples and can be role models for young women who are considering academic careers but know they want children. Carlota Smith, a colleague in the UT Austin Linguistics department who sadly died five years ago, was a trailblazer: a single mom academic in the 1970s who I know directly inspired many of the female graduate students in our department. We need more Carlotas.

The less attractive it is to be an academic mom, the fewer women we’ll have in our midst, again to our detriment — this is especially true in fields like computer science. This has big effects on academic women who choose not to have children as it reduces the pool of potential female colleagues they could have. Even in our linguistics department, there are too few female graduate students who study computational linguistics, despite an otherwise reasonably balanced population of male and female graduate students.

So, knowing all the challenges you face on top of the usual ones — thanks again, and keep on being amazing. You all have my respect!