Jun 14, 2026 · 10 min · Data, Measurement, Public Sector, Artificial Intelligence
The State that cannot measure itself
Portugal is among the countries that deliver the most digital services, yet it rarely knows how to measure whether they work. AI on top of those services can make the problem worse. If we use it to measure first, however, it can start to solve it.
When a project in the public sector ends, there is a question that almost never has an answer: did it improve anything? We know how many people entered the service, how many forms were submitted, how many authentications were made. We do not know whether the person who needed it left with their problem solved. We measure usage, but almost never outcomes.
The private sector, which tends to move first, shows the trap clearly. A company puts AI on its complaints intake and, a few months later, proudly announces that complaints fell 30%. The question is whether they fell because people had their problem solved or because they gave up along the way. It happened to me with my own bank. I had a concrete problem, the chatbot did not understand it and there was no option to reach a person. I hit a dead end and ended up sending an email about something I needed resolved at that moment, and I was left frustrated and with my problem unsolved. On the bank's side, it is quite likely this counted in the statistics as an interaction resolved by the automation, or as a complaint that never existed.
Portugal is, in several rankings, among the countries that best put public services online, and in the OECD's 2025 digital government index it sits third in the world. In the same exercise, in the part that assesses the openness and reuse of data, it sits much further down. We are good at building the service and getting it running but bad at knowing, afterwards, whether it is doing what it was created for.
Why measurement always gets left for later
The most common explanation is that systems are missing, data sits in silos, each body has its own format. It is true, but it mostly serves to move the subject out of the way, because the problem starts earlier. When a process is designed in public administration, it is rare for anyone to decide, from the outset, what will need to be known about it once it is running. You are left with the records the system needs in order to operate, and without the ones that would show where the process jams, how much time each step consumes, at what point people give up.
There is also a failure people avoid naming: many public processes have no clear decision points. They are inherited sequences, full of "it depends", where the same situation can take different paths depending on who handles it. A process like that can barely be measured, because no two cases are truly comparable, and the usual reaction is to patch the hole with one more field, one more validation, one more exception. Measurement should not be an audit done at the end, nor an annual report commissioned when someone asks for numbers. It should be a decision taken when the process is designed - what counts as a case, when it is resolved, what counts as having gone wrong, at which points something is decided. Designed that way, the process carries its own measurement and costs little to maintain. Designed as it usually is, measurement becomes a future project that competes with everything else and always loses, because there is never time, budget or someone free for it. "If the service is already built and running, why keep working on it?"
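As a minimal sketch of what taking those decisions at design time could look like, the record below fixes what counts as a case, when it is closed, what counts as having gone wrong and at which points something is decided. The names (`Case`, `Decision`, `Outcome`) and the three outcome values are illustrative assumptions, not an existing public-administration schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    # Hypothetical outcome categories, agreed when the process is designed.
    RESOLVED = "resolved"    # the person left with the problem solved
    ABANDONED = "abandoned"  # the person gave up along the way
    FAILED = "failed"        # the process ended without solving the problem


@dataclass
class Decision:
    step: str             # name of the decision point in the process
    result: str           # what was decided at that point
    decided_at: datetime  # when it was decided


@dataclass
class Case:
    case_id: str
    opened_at: datetime
    decisions: List[Decision] = field(default_factory=list)
    closed_at: Optional[datetime] = None
    outcome: Optional[Outcome] = None

    def decide(self, step: str, result: str, when: datetime) -> None:
        """Record that something was decided at a named decision point."""
        self.decisions.append(Decision(step, result, when))

    def close(self, outcome: Outcome, when: datetime) -> None:
        """Close the case with an explicit outcome, never an implicit one."""
        self.outcome = outcome
        self.closed_at = when

    def duration_days(self) -> Optional[float]:
        """Elapsed days from opening to closing, or None while still open."""
        if self.closed_at is None:
            return None
        return (self.closed_at - self.opened_at).total_seconds() / 86400
```

A process that writes records like these carries its own measurement: the time per step, the point of abandonment and the share of resolved cases can be read from the data instead of being reconstructed later.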
AI has stopped being a cheap assistant
When these tools appeared they looked like an almost free assistant. Today an agent capable of doing serious work costs about as much as a senior employee, with the advantage of having no ego and never needing to rest, but with a cost that fluctuates and has to be managed closely as usage grows. Managing AI consumption and cost has become a skill in its own right. And public administration, which already struggles to hire or retain a database administrator, will hardly compete for AI profiles.
The National AI Agenda, approved in January 2026, dedicates an entire axis to infrastructure and data and foresees hundreds of millions of euros through 2030. The will exists and the money is starting to appear, but what is almost always missing is the foundation: without data on how the process runs today, there is no way to show that the version with AI runs better, and saying it "became more efficient" is only an act of faith.
This is also why measurement comes first: it is what tells you whether a case actually needs AI. Much of what is broken in public administration does not need one of these models. It needs a simplified process, automated with technology that has existed for years and whose behaviour is understood. Using AI, a box few people understand from the inside, with everything that implies for security and data protection, to clear bureaucratic work that was never rethought is almost always the most expensive and riskiest option.
Use AI to measure before putting it to automate
Where AI is hard to replace is upstream: reading a process through its histories, its exceptions and the cases nobody normalised, doing what no team has time to do by hand, and understanding how the process actually works. It is that reading that lets you simplify with real knowledge and decide, only afterwards, what is worth automating and with what technology, which often will not be AI.
First, point AI at the past that already exists but is illegible. The poorly normalised histories, the officers' notes, the emails where the difficult cases were decided, the loose files where what the official system did not capture ended up recorded. It is the kind of disorganised material AI reads well, and from there you build the baseline that was never collected: how many cases, of what kinds, with what times, with what outcomes. It will not be perfect, but it will be the first one to exist.
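Once the old material has been read into structured records, whether by AI or by hand, the baseline itself is simple arithmetic. The sketch below assumes each record is a dictionary with the hypothetical keys `kind`, `opened`, `closed` and `outcome`; the extraction step that produces those records is not shown.

```python
from collections import Counter, defaultdict
from statistics import median


def build_baseline(records):
    """Summarise reconstructed cases by kind: volume, median duration, outcomes.

    Each record is assumed to be a dict with the keys 'kind', 'opened',
    'closed' (a date, or None when the case was never closed) and 'outcome'.
    """
    by_kind = defaultdict(list)
    for record in records:
        by_kind[record["kind"]].append(record)

    baseline = {}
    for kind, rows in by_kind.items():
        # Only closed cases have a duration; open ones still count as cases.
        durations = [
            (row["closed"] - row["opened"]).days
            for row in rows
            if row["closed"] is not None
        ]
        baseline[kind] = {
            "cases": len(rows),
            "median_days": median(durations) if durations else None,
            "outcomes": dict(Counter(row["outcome"] for row in rows)),
        }
    return baseline
```

The median is used instead of the mean so that a handful of very old cases does not dominate the figure. Any error in the extracted records passes straight into the baseline, so a sample of records is worth checking by hand.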
Second, design the automation to produce measurement as a by-product, not as a future project. The automation itself that resolves a request, whether with AI or with simpler technology, records what it resolved, what it could not, what it handed back to a human and what happened next. Instead of automating now and measuring in a project that never comes, you automate in a way where measurement is part of the flow itself.
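One way to make measurement a by-product is to wrap whatever resolves the request, AI or not, so that every call leaves a record. This is a sketch under stated assumptions: `MeasuredAutomation`, the status labels and the convention that a handler returns `None` when it hands the case to a human are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, List, Optional


@dataclass
class Event:
    request_id: str
    status: str   # "resolved", "unresolved", "handed_to_human" or "follow_up"
    at: datetime
    note: str = ""


class MeasuredAutomation:
    """Wraps a request handler so that every outcome is logged as it happens."""

    def __init__(self, handler: Callable[[Any], Optional[Any]]):
        # Assumed convention: the handler returns an answer, returns None to
        # hand the request to a human, or raises when it cannot proceed.
        self.handler = handler
        self.log: List[Event] = []

    def process(self, request_id: str, payload: Any) -> Optional[Any]:
        try:
            answer = self.handler(payload)
        except Exception as exc:
            self._record(request_id, "unresolved", str(exc))
            return None
        if answer is None:
            self._record(request_id, "handed_to_human")
            return None
        self._record(request_id, "resolved")
        return answer

    def follow_up(self, request_id: str, note: str) -> None:
        """Record what happened next, for example after a human took over."""
        self._record(request_id, "follow_up", note)

    def summary(self) -> dict:
        """Count events by status; this is the measurement the flow produces."""
        return dict(Counter(event.status for event in self.log))

    def _record(self, request_id: str, status: str, note: str = "") -> None:
        self.log.append(Event(request_id, status, datetime.now(timezone.utc), note))
```

Because the log is written in the same call that handles the request, there is no separate measurement project to fund later. Note that "resolved" here only means the automation produced an answer; whether the person's problem was actually solved still has to come from the follow-up records.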
Third, keep it small. I am not arguing for a general obligation to measure everything, which quickly turns into a layer of indicators manufactured to comply, and we have seen that film with standards that become checklists. I am arguing for picking one service, running the full cycle on it, from baseline to automation to a real before-and-after comparison, and only then deciding what is worth extending. One concrete case with a before and an after teaches more, and convinces more, than a dashboard with fifty metrics nobody opens. It sounds like common sense, yet not long ago, when I designed an interoperability flow between two public bodies, one of them said: "but why only this and not x, y and z too? It seems such a small measure, with so little impact for users." That is not untrue, but it is a constant in the Portuguese public administration: we always want to do everything and it always has to be big. Then either we create gigantic projects that take ages to implement and to show the first results, or we do nothing because "if that is all it does, it is not even worth it." We undervalue what can be done in two weeks and overvalue what can be done in one month.
We have to be careful, because the moment a metric starts to justify decisions, appraisals or payments, someone learns to dodge it or to dress it up. So it is better to measure few things, give each service an owner who answers for them, and count the hard cases instead of hiding them because they spoil the statistics.
These solutions come with no guaranteed results. AI can rebuild a baseline from bad records and inherit those records' errors. It can end up measuring what is easy instead of what matters. We can set it all up and, in the end, nobody looks at the numbers. But the alternative is the one we have today: to keep automating services we know almost nothing about, trusting that they are improving because they look more modern.
If I could choose one first use of artificial intelligence in the State to put ahead of the others, it would be this: using it so that we can, at last, see what we are doing. It is not flashy, and it does not let our politicians appear on newspaper front pages saying we implemented AI and therefore automated X services. But it does let us push back against the current habit of accelerating first and only then seeing whether it was worth it, and truly focus on the end user.
Without that, after so many projects, we still cannot answer the simplest question of all: so, did it improve?