References and Resources
Below is a brief reading list for each session. For more references, please check the links in the slides.
Companion Survey
Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Emre Kıcıman. SSRN preprint, 2016.
General Issues
Big questions for social media big data: Representativeness, validity and other methodological pitfalls. Zeynep Tufekci. In ICWSM, 2014.
Social media for large studies of behavior. Derek Ruths and Jürgen Pfeffer. Science, 346(6213):1063–1064, 2014.
Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. danah boyd and Kate Crawford. Information, Communication & Society, 2012.
Big data's disparate impact. Solon Barocas and Andrew D. Selbst. Social Science Research Network Working Paper Series, 2014.
Machine Learning that Matters. K. L. Wagstaff. In ICML, 2012.
Data Biases and Representativeness
Understanding the Demographics of Twitter Users. Alan Mislove, Sune Lehmann, Yong-Yeol Ahn, Jukka-Pekka Onnela, and J. Niels Rosenquist. In ICWSM, 2011.
Classifying Political Orientation on Twitter: It’s Not Easy! Raviv Cohen and Derek Ruths. In ICWSM, 2013.
"Blissfully happy" or "ready to fight": Varying Interpretations of Emoji. Hannah Miller, Jacob Thebault-Spieker, Shuo Chang, Isaac Johnson, Loren Terveen, and Brent Hecht. In ICWSM, 2016.
What’s in a @name? How Name Value Biases Judgment of Microblog Authors. Aditya Pal and Scott Counts. In ICWSM, 2011.
The Tweets They are a-Changin’: Evolution of Twitter Users and Behavior. Yabing Liu, Chloe Kliman-Silver, and Alan Mislove. In ICWSM, 2014.
Methodological Pitfalls and Evaluation
Forecasting elections with non-representative polls. Wei Wang, David Rothschild, Sharad Goel, and Andrew Gelman. In International Journal of Forecasting, 2015.
Sentiment Analysis on Evolving Social Streams: How Self-Report Imbalances Can Help. Pedro Calais Guerra, Wagner Meira Jr., and Claire Cardie. In WSDM, 2014.
Robust Text Classification in the Presence of Confounding Bias. Virgile Landeiro and Aron Culotta. In AAAI, 2016.
CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises. Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Sarah Vieweg. In ICWSM, 2014.
Replicability is not Reproducibility: Nor is it Good Science. Chris Drummond. In ICML Workshops, 2009.
Ethics of Handling Social Data
The Belmont Report. Ryan et al. The National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, 1979.
"But the data is already public": on the ethics of research in Facebook. Michael Zimmer. In Ethics and Information Technology, 2010.
Social Privacy in Networked Publics: Teens' Attitudes, Practices, and Strategies. danah boyd and Alice E. Marwick. In A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, 2011.
Fairness Through Awareness. Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Rich Zemel. In 3rd Innovations in Theoretical Computer Science Conference, 2012.
"I Didn't Sign Up for This!" Informed Consent in Social Network Research. Luke Hutton and Tristan Henderson. In ICWSM, 2015.