Tutorial Series: Limits of Social Data

References and Resources

Below is a brief reading list for each session. For more references, please check the links in the slides.

Companion Survey

Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Emre Kıcıman. SSRN Pre-print. 2016.

General Issues

Big questions for social media big data: Representativeness, validity and other methodological pitfalls. Zeynep Tufekci. In ICWSM, 2014.
Social media for large studies of behavior. Derek Ruths and Jürgen Pfeffer. Science, 346(6213):1063–1064, 2014.
Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. danah boyd and Kate Crawford. Information, Communication and Society, 2012.
Big data's disparate impact. Solon Barocas and Andrew D. Selbst. Social Science Research Network Working Paper Series, 2014.
Machine Learning that Matters. K. L. Wagstaff. In ICML, 2012.

Data Biases and Representativeness

Understanding the Demographics of Twitter Users. Alan Mislove, Sune Lehmann, Yong-Yeol Ahn, Jukka-Pekka Onnela, and J. Niels Rosenquist. In ICWSM, 2011.
Classifying Political Orientation on Twitter: It’s Not Easy! Raviv Cohen and Derek Ruths. In ICWSM, 2013.
"Blissfully happy" or "ready to fight": Varying Interpretations of Emoji. Hannah Miller, Jacob Thebault-Spieker, Shuo Chang, Isaac Johnson, Loren Terveen, and Brent Hecht. In ICWSM, 2016.
What’s in a @name? How Name Value Biases Judgment of Microblog Authors. Aditya Pal and Scott Counts. In ICWSM, 2011.
The Tweets They are a-Changin’: Evolution of Twitter Users and Behavior. Yabing Liu, Chloe Kliman-Silver and Alan Mislove. In ICWSM, 2014.

Methodological Pitfalls and Evaluation

Forecasting elections with non-representative polls. Wei Wang, David Rothschild, Sharad Goel, and Andrew Gelman. In International Journal of Forecasting, 2015.
Sentiment Analysis on Evolving Social Streams: How Self-Report Imbalances Can Help. Pedro Calais Guerra, Wagner Meira Jr, and Claire Cardie. In WSDM, 2014.
Robust Text Classification in the Presence of Confounding Bias. Virgile Landeiro and Aron Culotta. In AAAI, 2016.
CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises. Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Sarah Vieweg. In ICWSM, 2014.
Replicability is not Reproducibility: Nor is it Good Science. Chris Drummond. In ICML Workshops, 2009.

Ethics of Handling Social Data

The Belmont Report. Ryan et al. The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979.
"But the data is already public": on the ethics of research in Facebook. Michael Zimmer. In Ethics and Information Technology, 2010.
Social Privacy in Networked Publics: Teens' Attitudes, Practices, and Strategies. danah boyd and Alice E. Marwick. In A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, 2011.
Fairness Through Awareness. Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Rich Zemel. In 3rd Innovations in Theoretical Computer Science Conference, 2012.
"I Didn't Sign Up for This!" Informed Consent in Social Network Research. Luke Hutton and Tristan Henderson. In ICWSM, 2015.