Welcome back to The AI Shift, our weekly newsletter on AI and the labour market.
This week we’re diving into the implications of generative and agentic artificial intelligence for market research and polling. What happens to an industry that increasingly relies on the ability to easily and cheaply reach and interview people online when those people may not actually be people? And could there in fact be cases where surveying an AI instead of a person tells us something useful about human beliefs and behaviours?
John writes
Survey researchers have a problem. Thanks to advances in autonomous large language models, it is now straightforward for AIs to bypass the defences the industry has built up over recent years to filter out bots, risking a poisoning of the well that businesses, political campaigns and the broader public use to track public perceptions and preferences.
In an increasingly familiar pattern, this is a story of problems created by the internet being made much worse by AI. Pre-internet, poll respondents were typically recruited by sending physical mail to a random spread of addresses, with surveys then carried out by telephone. The internet changed all of this, allowing firms to draw their samples from the rapidly growing online population at much lower cost, then have people fill out web surveys that require far less labour to administer. But the increased convenience and reduced cost come with trade-offs. When you painstakingly recruit and maintain a sample in the real world, you know who you're talking to. By contrast, as the American cartoonist Peter Steiner famously put it in his 1993 New Yorker cartoon, "On the Internet, nobody knows you're a dog".
The problem of online survey respondents not always being who they claim to be, or giving false answers, is not new. A 2022 experiment by the veteran US pollster Pew Research Center showed that cheap online samples of people who opt in to be surveyed are particularly susceptible to "bogus respondents": people who race through surveys with minimal effort or sincerity in order to earn rewards. Even after applying weights to make the reported demographics of respondents match the target population, 12 per cent of young adults in the sample declared they had a licence to operate a nuclear submarine.
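For the technically minded, here is a rough sketch of how that kind of demographic weighting works, and why it cannot screen out nonsense answers. The age bands and shares are illustrative inventions, not Pew's actual figures:

```python
# Minimal sketch of post-stratification weighting (illustrative figures,
# not from the Pew study). Each respondent in an over- or under-represented
# demographic cell gets weight = population share / sample share.

# Hypothetical shares of each age group in the target population and sample.
population_share = {"18-29": 0.20, "30-49": 0.33, "50-64": 0.25, "65+": 0.22}
sample_share     = {"18-29": 0.30, "30-49": 0.35, "50-64": 0.20, "65+": 0.15}

weights = {cell: population_share[cell] / sample_share[cell]
           for cell in population_share}

for cell, w in weights.items():
    print(f"{cell}: weight {w:.2f}")

# Weighting fixes the demographic mix, but a bogus respondent who *claims*
# to be a 25-year-old submarine commander is simply reweighted along with
# everyone else: the nonsense answers survive.
```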
In response, research firms have erected a series of defences, including checks on whether someone is filling in answers too quickly, skipping instructions, selecting the same option every time, or falling for obvious trick questions. In the era of bots and now LLMs, the arms race has continued to escalate, with surveys now screened by Captcha puzzles and "reverse shibboleths": trap questions that would be easy for an LLM to answer but difficult, if not impossible, for humans.
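To make those defences concrete, here is a minimal sketch of three of the classic quality checks. The thresholds and field names are illustrative assumptions, not any pollster's real screening code:

```python
# Minimal sketch of three common survey quality checks. Thresholds and
# field names are illustrative assumptions, not real screening rules.

def flag_bogus(response: dict) -> list[str]:
    flags = []

    # 1. Speeding: finishing far faster than a plausible reading speed.
    if response["seconds_taken"] < 0.3 * response["median_seconds"]:
        flags.append("speeder")

    # 2. Straight-lining: picking the same option for every grid item.
    answers = response["likert_answers"]
    if len(answers) >= 5 and len(set(answers)) == 1:
        flags.append("straight-liner")

    # 3. Attention trap: an instruction buried in the question text,
    #    e.g. "select 'Strongly disagree' to show you are reading this".
    if response["trap_answer"] != "Strongly disagree":
        flags.append("failed-attention-check")

    return flags

example = {
    "seconds_taken": 45,
    "median_seconds": 420,
    "likert_answers": [3, 3, 3, 3, 3, 3],
    "trap_answer": "Agree",
}
print(flag_bogus(example))
# ['speeder', 'straight-liner', 'failed-attention-check']
```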
Until recently, we can presume, the two sides were relatively evenly matched: polling errors in elections have not been systematically larger in the online era than they were in the good old days. But a new study published last week suggests that may be about to change.
Sean J Westwood, a political scientist at Dartmouth College in the US, demonstrated that with autonomous AI agents it is now straightforward to have bots complete online surveys without giving any sign of their true identity. Westwood's bots generated realistic responses in the persona of people with particular beliefs, evading 99.8 per cent of 6,000 checks aimed at weeding out non-human or otherwise bogus respondents, including breezing past reverse shibboleths and solving Captcha puzzles. The clue should really have been in the name: Captcha stands for Completely Automated Public Turing test to tell Computers and Humans Apart, and LLMs have been able to pass the Turing Test for well over a year. The findings demonstrate that bogus respondents can now operate at scale, and raise the possibility of bad actors systematically nudging apparent public opinion in a particular direction to create a false sense of consensus.
Some online pollsters are turning to identity verification as the arms race continues, but this could raise privacy concerns that compromise responses to sensitive questions. Ultimately the days of cheap online surveys may be numbered. As we saw with LLMs and education, the only truly watertight defence here is to go back to the older (read: slower and more expensive) ways of doing things: use physical mail and careful vetting to build a trusted panel of known respondents.
All of that being said, it would be wrong to suggest AI is solely bad news for survey-based research. Sarah, I believe you’ve been looking into a use case that some in the industry are quite excited about?
Sarah writes
Yes John, are you ready to go through the looking glass? Welcome to the world of “synthetic samples”: using real data from real people to generate AI-powered proxies, which then simulate what people’s responses might be to novel questions. Elisabeth Costa of the UK’s Behavioural Insights Team, which has been doing some careful experiments with this new method to probe its strengths and weaknesses, described it to me this way: “You’re prompting an LLM to role-play an individual with defined demographics and characteristics [based on real data], then you’re asking it to answer survey questions, or state how it would behave in a particular scenario.”
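To make that concrete, here is a minimal sketch of what prompting a synthetic respondent might look like. The persona fields and the ask_llm helper are illustrative placeholders, not the Behavioural Insights Team's actual pipeline:

```python
# Minimal sketch of a "synthetic sample" respondent. The persona fields
# and ask_llm() are illustrative placeholders, not BIT's actual method.

def build_persona_prompt(persona: dict) -> str:
    """Instruct the model to role-play an individual with defined characteristics."""
    return (
        f"Role-play a survey respondent: age {persona['age']}, "
        f"occupation {persona['occupation']}, household income "
        f"{persona['income_band']}, living in {persona['region']}. "
        "Answer the next question as this person plausibly would, "
        "not as an AI assistant."
    )

def ask_llm(system_prompt: str, question: str) -> str:
    # Placeholder for a chat-completion API call; a canned answer keeps
    # the sketch self-contained and runnable.
    return "Probably not much. I only switch the unit on during heatwaves."

persona = {"age": 42, "occupation": "teacher",
           "income_band": "GBP 30,000-40,000", "region": "Manchester"}
question = ("Would a 10 per cent rise in electricity prices change "
            "how much you use air conditioning?")

print(ask_llm(build_persona_prompt(persona), question))
# Repeat over hundreds of personas drawn from real survey data and you
# have a "synthetic sample".
```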
You can see the appeal here to busy researchers, governments and marketers, given the time and cost of surveying actual humans. But can “synthetic samples” accurately mirror and predict what people think and do? It’s early days, and the results are mixed so far.
Costa told me about an experiment her team conducted alongside the UAE Behavioural Science Group to see if simulated participants could predict behavioural outcomes. They created a synthetic sample from real, representative data, then compared it with a sample of actual people. One experiment asked people how they would respond to different interventions intended to persuade them to use less air conditioning. The synthetic sample did a good job of predicting which interventions would be most and least effective. But it did a terrible job of predicting how large the effects would be: in response to the most effective intervention, about a third of the human respondents said they would turn up their air-conditioning temperature, whereas the synthetic sample put that figure closer to 80 per cent.
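The gap between getting the ranking right and getting the magnitudes right is easy to see in miniature. In the sketch below, only the roughly-one-third versus 80 per cent gap on the most effective intervention comes from the experiment Costa described; the other figures are made up for illustration:

```python
# Illustrative only: hypothetical shares of respondents complying with
# three interventions, A-C. Only the ~one-third v ~80 per cent gap on A
# comes from the experiment above; B and C are invented to show how
# ranking and magnitude can come apart.

human     = {"A": 0.33, "B": 0.20, "C": 0.10}
synthetic = {"A": 0.80, "B": 0.55, "C": 0.30}

# Ranking: both samples agree A > B > C, so the synthetic sample picks
# the most and least effective interventions correctly.
rank_human = sorted(human, key=human.get, reverse=True)
rank_synth = sorted(synthetic, key=synthetic.get, reverse=True)
print(rank_human == rank_synth)  # True

# Magnitude: the synthetic sample overstates every effect, here by
# 20 to 47 percentage points.
for k in human:
    print(k, f"{synthetic[k] - human[k]:+.2f}")
```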
Costa told me synthetic samples seemed best suited for getting a rapid read-out or preliminary steer about a business or policy idea. But she warned that “in my mind one of the joys of human interaction and qualitative research is that people can surprise you — give you new insights on mental models or hidden barriers, and I think you’re unlikely to be surprised by a synthetic participant.”
Another danger is that the decisions you make about how to set up the synthetic sample can really shape the results. Recent work by Jamie Cummins at the University of Bern suggests that, as he has summarised in a post on Bluesky, “depending on the analytic decisions made, you can basically get these samples to show any effect you want.”
There are more straightforward use cases for AI in qualitative research. LLMs are very well-suited, for example, to analysing vast reams of interview transcripts or consultation responses to identify key themes and areas for further investigation. In the pre-AI days, researchers would have to read and carefully categorise this qualitative data manually. Mallory Durran, director of applied research and methods at Nesta, a UK innovation agency, told me that an analysis that would have taken “weeks, or even months in some cases, can now be done in a matter of hours.” That seems like a clear boon to researchers, not to mention civil servants faced with thousands of public consultation responses. (I do wonder, though, whether AI will still feel like a boon to civil servants if the public begins to use LLMs to flood the zone with even more consultation responses. But maybe our deep-dive into the wasteful AI arms race in recruitment has made me too cynical.)
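For a flavour of how this looks in practice, here is a minimal sketch of LLM-assisted theme coding. The theme list and the placeholder ask_llm function are assumptions for illustration, not Nesta's actual tooling:

```python
# Minimal sketch of LLM-assisted theme coding for open-ended responses.
# The theme list and ask_llm() are illustrative, not Nesta's pipeline.
from collections import Counter

THEMES = ["cost of living", "public transport", "green space",
          "housing", "other"]

def ask_llm(prompt: str) -> str:
    # Placeholder for a chat-completion API call; a canned answer keeps
    # the sketch self-contained and runnable.
    return "public transport"

def classify(response_text: str) -> str:
    """Ask the model to assign a response to exactly one theme."""
    prompt = (
        "Assign this consultation response to exactly one of these themes: "
        + ", ".join(THEMES)
        + f"\n\nResponse: {response_text}\nTheme:"
    )
    return ask_llm(prompt)

responses = [
    "Bus fares have doubled and the service got worse.",
    "We need more parks within walking distance of every estate.",
]

tallies = Counter(classify(r) for r in responses)
print(tallies)  # theme -> count, ready for a researcher to spot-check
```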
So what have we learned?
John
I think we’re now at a point where online opt-in panels — and for that matter any poll that doesn’t disclose how its sample was recruited and whether respondents’ identities were verified — should be treated with suspicion. For me, synthetic sampling warrants strong scepticism at this stage, but the less glamorous uses like classifying open-ended responses or having LLM moderators carry out wide-ranging interviews instead of limiting people to multiple choice questions represent real steps forward.
Sarah
If you’re right, John, that the only way to stop bots from corrupting online polls is to revert to more expensive methods, that might increase the incentives to use synthetic samples instead. But I can imagine this technique leading businesses and policymakers down some blind alleys too, if not used with extreme care.
Recommended reading
- The latest MIT/FT collaboration on privacy and chatbots was very interesting (Sarah)
- Writer and software developer James Somers has a fascinating piece in the New Yorker on the increasing acceptance by leading neuroscientists that LLMs do think in a similar way to humans (John)