As people who work with code on a daily basis, it’s perhaps inevitable that data scientists sometimes default to binary thinking. Ones and zeros. Signal and noise. Statistical significance—or its absence.
As Jessica Dai writes in a recent post on algorithms and fairness, we don’t have to stick to this either/or framework in every conversation, especially when what’s on the line is building models that don’t perpetuate bias. Looking at the full development lifecycle, Jessica identifies potential points of intervention where data scientists can act as guardrails against bias without sacrificing accuracy. Above all, she argues, “ML practitioners must work together with stakeholders such as business leaders, humanities experts, compliance, and legal teams and formulate a program for how to best treat your population.”
Striking the right balance between real-world needs and ethical practice is at the center of many other lively conversations in data science. In the past week, TDS editor
rounded up several eye-opening posts that explain how federated learning can mitigate privacy and safety concerns when collecting massive amounts of data; on the TDS Podcast,
chatted with Andy Jones about the implications of scale for AI—how the aforementioned massive datasets open up opportunities we couldn’t have imagined just a few years ago, but also raise new risks.
While challenges like these often sound theoretical, they already affect and shape the work that machine learning engineers and researchers produce.
looks at a practical application of this conundrum when she explains the visual representation of bias and variance in bulls-eye diagrams. Taking a few steps back,
and Dirk Hovy’s article identifies the most pressing issues the authors and their colleagues face in the field of natural language processing (NLP): “the speed with which models are published and then used in applications can exceed the discovery of their risks and limitations. And as their size grows, it becomes harder to reproduce these models to discover those aspects.”
Federico and Dirk’s post stops short of offering concrete solutions—no single article could—but it underscores the importance of learning, asking the right (and often most difficult) questions, and refusing to accept an untenable status quo. If what inspires you to take action is expanding your knowledge and growing your skill set, we have some great options for you to choose from this week, too.
is back with his always-anticipated monthly collection of deep learning papers you don’t want to miss—the lineup for June covers some exciting ground, from self-supervised learning to class selectivity in deep neural networks.
- Can machine learning help the global effort against climate change? If you’re skeptical, read
’s post. She walks us through her project on wind energy in Ireland, and shows how she and her team worked their way to choosing the model that could produce the greatest efficiency in the country’s electric grid.
and his coauthors set out to address one of the most crucial problems in any business: customer churn. While data scientists have been producing churn-prediction pipelines for quite some time, this post zooms in on predicting when customers might decide to ditch a product.
- Still in the world of business decision-making,
examines every marketer’s favorite analytical tool—the A/B test!—and shows how infusing it with a pinch of “Bayesian magic” will lead to more accurate results.
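The core of the Bayesian approach can be sketched in a few lines. This is a minimal, hypothetical illustration (the conversion numbers, the uniform prior, and the variable names are assumptions for the sketch, not details from the post): instead of a single p-value, each variant’s conversion rate gets a full Beta posterior, and we estimate the probability that one variant truly beats the other by sampling from both.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical observed data: (conversions, visitors) per variant.
conv_a, n_a = 120, 1000
conv_b, n_b = 145, 1000

# With a uniform Beta(1, 1) prior and a binomial likelihood, the
# posterior over each conversion rate is
# Beta(1 + conversions, 1 + non-conversions).
samples_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
samples_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

# Probability that B's true conversion rate exceeds A's,
# estimated by Monte Carlo over the two posteriors.
prob_b_beats_a = (samples_b > samples_a).mean()
print(f"P(B > A) = {prob_b_beats_a:.3f}")
```

The payoff of framing the test this way is a directly interpretable quantity ("B beats A with probability X") rather than a significance threshold, which is one reading of the "Bayesian magic" the post refers to.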
Thank you for joining us this week, supporting the work we publish, and trusting us to deliver more signal than noise. Here’s to many more lively conversations to come.
Until the next Variable,