Tutorial Series: Limits of Social Data

Tutorial Structure

Context and Applications

To set the context for the issues we review, we first discuss why social data is used (purposes and applications) and how it is used (a prototypical data processing and analysis pipeline), and we provide examples of typical limitations, tradeoffs, and mistakes.

Data Biases and Representativeness

This session will cover issues related to the properties of the working datasets, such as representativeness, validity, population and sampling biases, completeness, and temporal variation.

» Data Biases (e.g. functional, normative, population, behavioral, and collection biases)

» Other Data Quality Issues (e.g. data decay, non-humans, redundancy, default values, data sparsity)

» Debunking the "Bigger Is Better" Assumption

Methodological Pitfalls and Evaluation

This session will cover issues related to the design, evaluation, or generalizability of analyses and methods for collecting or leveraging social datasets.

» Data Processing Pipeline (e.g. data collection and management, black boxes and opportunistic approaches)

» Evaluation Pipeline (e.g. evaluation metrics, tool auditing)

» Accounting for Biased and Noisy Data (e.g. exploiting or correcting biases, causal inference)

» Standards and Disclaimers (e.g. reproducibility, data and tool sharing, negative results)

Ethics of Handling Social Data

Finally, we cover ethical concerns that arise when working with social data, such as the algorithmic reinforcement of discriminatory treatment and existing prejudice, and the risk of privacy breaches.

» Introduction and Framework (e.g. current frameworks and practices)

» Privacy and Data Protection (e.g. public vs. personal data, privacy)

» Algorithmic Discrimination (e.g. algorithmic fairness, legal issues)

» Ethical Experimentation (e.g. user consent, current practices)