Jun 14, 2026·10 min·Data, Measurement, Public Sector, Artificial Intelligence

The State that cannot measure itself

Portugal is among the countries that deliver the most digital services, and yet it rarely knows how to measure whether they work. AI on top of those services can make the problem worse. On the other hand, if we use it to measure first, it can start to solve it.

When a project in the public sector ends, there is a question that almost never has an answer: did it improve anything? We know how many people entered the service, how many forms were submitted, how many authentications were made. We do not know whether the person who needed it left with their problem solved. We measure usage, but almost never outcomes.

The private sector, which tends to move first, shows the trap clearly. A company puts AI on its complaints intake and, a few months later, proudly announces that complaints fell 30%. The question is whether they fell because people had their problem solved or because they gave up along the way. It happened to me with my own bank. I had a concrete problem, the chatbot did not understand it and there was no option to reach a person. I hit a dead end and ended up sending an email about something I needed resolved at that moment, and I was left frustrated and without my problem solved. On the bank's side, it is quite likely this counted in the statistics as an interaction resolved by the automation, or as a complaint that never existed.

Portugal is, in several rankings, among the countries that best put public services online, and in the OECD's 2025 digital government index it sits third in the world. In the same exercise, in the part that assesses the openness and reuse of data, it sits much further down. We are good at building the service and getting it running but bad at knowing, afterwards, whether it is doing what it was created for.

Why measurement always gets left for later

The most common explanation is that systems are missing, data sits in silos, each body has its own format. It is true, but it mostly serves to move the subject out of the way, because the problem starts earlier. When a process is designed in public administration, it is rare for anyone to decide, from the outset, what will need to be known about it once it is running. You are left with the records the system needs in order to operate, and without the ones that would show where the process jams, how much time each step consumes, at what point people give up.

There is also a failure people avoid naming: many public processes have no clear decision points. They are inherited sequences, full of "it depends", where the same situation can take different paths depending on who handles it. A process like that can barely be measured, because no two cases are truly comparable, and the usual reaction is to patch the hole with one more field, one more validation, one more exception.

Measurement should not be an audit done at the end, nor an annual report commissioned when someone asks for numbers. It should be a decision taken when the process is designed: what counts as a case, when it is resolved, what counts as having gone wrong, at which points something is decided. Designed that way, the process carries its own measurement and costs little to maintain. Designed as it usually is, measurement becomes a future project that competes with everything else and always loses, because there is never time, budget or someone free for it. The objection is always the same: "If the service is already built and running, why keep working on it?"
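To make that design-time decision concrete, here is a minimal sketch in Python of what "the process carries its own measurement" could look like. Every name in it (`Outcome`, `Step`, `Case`, `time_per_step`, `abandonment_point`, the step names) is hypothetical and chosen for illustration, not taken from any real public system. It assumes each case is stored as an ordered list of steps with timestamps and that the outcome is recorded explicitly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """Decided at design time: what counts as resolved or as gone wrong."""
    RESOLVED = "resolved"      # the person left with the problem solved
    ABANDONED = "abandoned"    # the person gave up along the way
    FAILED = "failed"          # the process ended without solving it


@dataclass
class Step:
    name: str                          # hypothetical, e.g. "eligibility_check"
    started_at: datetime
    ended_at: datetime
    is_decision_point: bool = False    # True where something is decided


@dataclass
class Case:
    case_id: str
    steps: list[Step] = field(default_factory=list)
    outcome: Optional[Outcome] = None  # None while the case is still open


def time_per_step(case: Case) -> dict[str, timedelta]:
    """How much time each step consumed in one case."""
    totals: dict[str, timedelta] = {}
    for step in case.steps:
        elapsed = step.ended_at - step.started_at
        totals[step.name] = totals.get(step.name, timedelta()) + elapsed
    return totals


def abandonment_point(case: Case) -> Optional[str]:
    """Last step reached before the person gave up, if they did."""
    if case.outcome is Outcome.ABANDONED and case.steps:
        return case.steps[-1].name
    return None
```

The point of the sketch is only that the three questions (where the process jams, how long each step takes, where people give up) become answerable once the records are defined this way from the outset. A real process would need more outcomes and a far richer model.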

AI has stopped being a cheap assistant

When these tools appeared they looked like an almost free assistant. Today an agent capable of doing serious work costs as much as a senior employee, with the advantage of having no ego and never needing to rest, but with a cost that fluctuates and has to be managed closely as usage grows. Managing AI consumption and cost has become a skill in its own right. And public administration, which already struggles to hire or retain a database administrator, will hardly compete for AI profiles.

The National AI Agenda, approved in January 2026, dedicates an entire axis to infrastructure and data and foresees hundreds of millions of euros through 2030. The will exists and the money is starting to appear, but what is almost always missing is the foundation: without data on how the process runs today, there is no way to show that the version with AI runs better, and saying it "became more efficient" is only an act of faith.

This is also why measurement comes first: it is what tells you whether a case actually needs AI. Much of what is broken in public administration does not need one of these models. It needs a simplified process, automated with technology that has existed for years and whose behaviour is understood. Putting AI to work clearing bureaucratic tasks that were never rethought is almost always the most expensive and riskiest option: it is a box few people understand from the inside, with everything that implies for security and data protection.

Use AI to measure before putting it to automate

Where AI is hard to replace is upstream: reading a process through its histories, its exceptions and the cases nobody normalised, and doing what no team has time to do by hand, which is understanding how the process actually works. It is that reading that lets you simplify with real knowledge and decide, only afterwards, what is worth automating and with what technology, which often will not be AI.

First, point AI at the past that already exists but is illegible: the poorly normalised histories, the officers' notes, the emails where the difficult cases were decided, the loose files that recorded what the official system did not capture. It is the kind of disorganised material AI reads well, and from there you build the baseline that was never collected: how many cases, of what kinds, with what times, with what outcomes. It will not be perfect, but it will be the first one to exist.

Second, design the automation to produce measurement as a by-product, not as a future project. The same automation that resolves a request, whether with AI or with simpler technology, records what it resolved, what it could not, what it handed back to a human and what happened next. Instead of automating now and measuring in a project that never comes, you automate in a way that makes measurement part of the flow itself.

Third, keep it small. I am not arguing for a general obligation to measure everything, which quickly turns into a layer of indicators manufactured to comply, and we have seen that film with standards that become checklists. I am arguing for picking one service, running the full cycle on it, from baseline to automation to a real before-and-after comparison, and only then deciding what is worth extending. One concrete case with a before and an after teaches more, and convinces more, than a dashboard with fifty metrics nobody opens. It sounds like common sense, and yet not long ago, when I designed an interoperability flow between two public bodies, one of them said: "but why only this and not x, y and z too? It seems such a small measure, with so little impact for users." That is not untrue, but it is a constant in the Portuguese public administration: we always want to do everything and it always has to be big. Then either we create gigantic projects that take ages to implement and to show the first results, or we do nothing because "if that is all it does, it is not even worth it." We undervalue what can be done in two weeks and overvalue what can be done in one month.
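The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: `CaseResult`, `summarise`, `MeasuredAutomation` and `compare` are hypothetical names, the outcome labels are invented, and the sketch assumes the historical records have already been turned into structured rows (the AI reading step itself is not shown). It also leaves out the "what happened next" part, which would need a follow-up record per case.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class CaseResult:
    kind: str             # type of request (hypothetical label)
    outcome: str          # "resolved", "unresolved" or "handed_to_human"
    days_to_close: float


def summarise(results: list[CaseResult]) -> dict:
    """Baseline figures: how many cases, of what kinds, times and outcomes."""
    if not results:
        raise ValueError("no cases to summarise")
    outcomes = Counter(r.outcome for r in results)
    return {
        "cases": len(results),
        "by_kind": dict(Counter(r.kind for r in results)),
        "by_outcome": dict(outcomes),
        "resolved_share": outcomes["resolved"] / len(results),
        "median_days": median(r.days_to_close for r in results),
    }


class MeasuredAutomation:
    """Wraps a handler so every request leaves a CaseResult behind."""

    def __init__(self, handler):
        # handler is any callable: request dict -> (outcome, days_to_close).
        # It can be an AI model or simpler, well-understood technology.
        self.handler = handler
        self.log: list[CaseResult] = []

    def handle(self, request: dict) -> str:
        outcome, days = self.handler(request)
        # Measurement is a by-product of the flow, not a later project.
        self.log.append(CaseResult(request["kind"], outcome, days))
        return outcome


def compare(before: list[CaseResult], after: list[CaseResult]) -> dict:
    """Before-and-after comparison for one service."""
    b, a = summarise(before), summarise(after)
    return {
        "resolved_share_change": a["resolved_share"] - b["resolved_share"],
        "median_days_change": a["median_days"] - b["median_days"],
    }
```

In use, `before` would be the baseline rebuilt from the old records and `after` would simply be `automation.log` once the service has run for a while. Hard cases stay in the log as "unresolved" or "handed_to_human" rather than disappearing from the count.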

We have to be careful, because the moment a metric starts to justify decisions, appraisals or payments, someone learns to dodge it or to dress it up. So it is better to measure few things, give each service an owner who answers for them, and count the hard cases instead of hiding them because they spoil the statistics.

These solutions come with no guaranteed results. AI can rebuild a baseline from bad records and inherit those records' errors. It can end up measuring what is easy instead of what matters. We can set it all up and, in the end, nobody looks at the numbers. But the alternative is the one we have today: to keep automating services we know almost nothing about, trusting that they are improving because they look more modern.

If I could choose one first use of artificial intelligence in the State to put ahead of the others, it would be this: using it so that we can, at last, see what we are doing. It is not flashy, and it does not let our politicians appear on newspaper front pages saying we implemented AI and therefore automated X services. But it does let us push back against the current habit of accelerating first and only then seeing whether it was worth it, and truly focus on the end user.

Without that, after so many projects, we still cannot answer the simplest question of all: so, did it improve?
