Tutorial Series: Limits of Social Data

Tutorial Overview and Goals

Online social data such as user-generated content, expressed or implicit relationships between people, and behavioral traces are at the core of many popular web applications and platforms, and drive the agenda of many practitioners and researchers in both academia and industry. The promises of social data are many, including understanding “what the world thinks” about a social issue, brand, product, celebrity, or other entity, and enabling better decision-making in a variety of fields including public policy, healthcare, economics, and many social good applications. However, many academics and practitioners increasingly warn against the naive use of social data. They highlight that biases and inaccuracies occur at the source of the data and are also introduced along the data processing pipeline; there are methodological limitations and pitfalls, as well as ethical boundaries and unexpected consequences, that are often overlooked. Overlooking these issues can lead to wrong or inappropriate results that can be consequential for many applications, and that have sometimes resulted in dramatic media headlines.

Organization

Figure: Issues along the data analysis pipeline

Recognizing that the rigor with which these issues are addressed varies widely, in these seminars I will introduce a framework for identifying a broad variety of threats in the research and practices around social data use. The main goals are:

1. To present a variety of challenges and pitfalls that can occur at different stages in the social data processing pipeline.

2. To recognize, understand, or quantify some major classes of limitations around the use of online social data and some of their consequences.

3. To give us food for thought by looking critically at our work as a community.

Requirements and Target Audience

The content of the seminars is designed to be accessible to a broad audience. Thus, while the tutorial may be more directly relevant for those working on (social) data analysis and analytics, neither experience with social data nor specific technical skills are required. The tutorial is intended to teach researchers and practitioners a taxonomy of, and methods for identifying, a broad range of biases, methodological issues, and pitfalls common in social data-based research. It is intended for researchers and practitioners who want to examine their own work, or that of others, through the lens of these issues.

Companion Survey

Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Alexandra Olteanu, Carlos Castillo, Fernando Diaz, and Emre Kıcıman. SSRN Pre-print. 2016.


Email us!

How to stay updated?

Check out the SDM'18 Tutorial Schedule close to the conference date in early May 2018.

Check out the WSDM'18 Tutorial Schedule close to the conference date in early February 2018.

Follow us on Twitter!