Tutorial Structure
Context and Applications
To set the context for the issues we review, we first discuss why social data is used (purposes and applications) and how it is used (a prototypical data processing and analysis pipeline), and give examples of typical limitations, tradeoffs, and mistakes.
Data Biases and Representativeness
This session will cover issues related to the properties of working datasets, such as representativeness, validity, population and sampling biases, completeness, and temporal variations.
» Data Biases (e.g. functional, normative, population, behavioral, and collection biases)
» Other Data Quality Issues (e.g. data decay, non-humans, redundancy, default values, data sparsity)
» Debunking the "The Bigger, the Better" Assumption
Methodological Pitfalls and Evaluation
This session will cover issues related to the design, evaluation, or generalizability of analyses and methods for collecting or leveraging social datasets.
» Data Processing Pipeline (e.g. data collection and management, black boxes and opportunistic approaches)
» Evaluation Pipeline (e.g. evaluation metrics, tool auditing)
» Accounting for Biased and Noisy Data (e.g. exploiting or correcting biases, causal inference)
» Standards and Disclaimers (e.g. reproducibility, data and tool sharing, negative results)
Ethics of Handling Social Data
Finally, we cover ethical caveats that arise when working with social data, such as the algorithmic reinforcement of discriminatory treatment and existing prejudice, and the risk of privacy breaches.