Wednesday, June 18, 2025

Maybe don't think of MCP servers as an alternative to your API

Yesterday, I posted on LinkedIn about how I created an MCP server to return details from an experimental XAML analysis tool.

I've since been experimenting with it a bit more.

Initially, without my tool, Copilot told me that the file I was using for testing was great. (Sadly, I didn't get a screenshot of that.) But after running my tool a few times and then asking Copilot the same question without my MCP server, it started to suggest possible improvements.

There is some duplication in animation and style usage, which could be refactored for maintainability. The file is well-structured for a modern WinUI 3 application.

Overall, the file is functional and adheres to good practices, but refactoring and optimization could improve maintainability and performance. Let me know if you'd like to implement any of these improvements!

With a bit more pushing, it even began suggesting things comparable to what my tool was providing:

Duplications to Reduce in HomePage.xaml:

1. Animation Sets (ShowTransitions and HideTransitions):
   • Both animation sets use similar structures with OffsetAnimation and OpacityAnimation. The only difference is the direction and duration values.
   • Improvement: Create a parameterized or reusable animation resource to avoid duplication.

2. Repeated Styles for Controls:
   • SelectorBarItem: Both instances use Style="{StaticResource TokenViewSelectorBarItemStyle}".
   • GridView: All three GridView controls share the following properties:
     • ItemTemplate="{StaticResource ControlItemTemplate}"
     • IsItemClickEnabled="True"
     • ItemClick="OnItemGridViewItemClick"
     • ContainerContentChanging="OnItemGridViewContainerContentChanging"
     • IsSwipeEnabled="False"
     • Loaded="OnItemGridViewLoaded"
     • SelectionMode="None"
   • Improvement: Define a default style or template for GridView and SelectorBarItem in the resource dictionary.
The formatting it applied was even better than my simple string output.

But then I started asking about other files and it went back to giving generic feedback.


This whole experience has made me think about MCP servers in a new way.

If I have a tool or API that performs some specific task, then I'll call that directly. Why further complicate things by getting an agent/AI/LLM involved?
If I have an API or tool that provides data or information an agent might usefully draw on, then exposing it as an MCP server (or another tool an agent/AI can use) might be appropriate.

If I'm using an agent, it might be appropriate to say, "run this tool and make changes based on what it returns." In doing this, the actions of the agent may not always be the same. Such is the nature of a non-deterministic system like an LLM-based AI. Adjusting to the change from highly deterministic systems to those that include a level of random variation may just be the hardest part of understanding "AI-based" computing.

If I know I want consistent results that are always presented/formatted the same way and don't have any random variations then I'll use a specific tool directly. If I want to make additional information available to the agent, or allow it to trigger external tasks, then an MCP server is highly appropriate.
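
To make that concrete, below is a minimal sketch of what wrapping a tool as an MCP server can look like in C#, following the stdio-server pattern from the official ModelContextProtocol C# SDK. The XamlAnalysisTools class, the tool description, and the XamlAnalyzer stub are hypothetical stand-ins for my experimental tool, not its actual code.

    // Program.cs - a minimal stdio MCP server exposing a single tool.
    // Sketch only: assumes the ModelContextProtocol and
    // Microsoft.Extensions.Hosting NuGet packages.
    using System.ComponentModel;
    using Microsoft.Extensions.DependencyInjection;
    using Microsoft.Extensions.Hosting;
    using ModelContextProtocol.Server;

    var builder = Host.CreateApplicationBuilder(args);
    builder.Services
        .AddMcpServer()
        .WithStdioServerTransport()
        .WithToolsFromAssembly(); // discovers the attributed tool type below

    await builder.Build().RunAsync();

    [McpServerToolType]
    public static class XamlAnalysisTools
    {
        // The agent decides if and when to call this; the same analysis
        // can still be run directly whenever deterministic output matters.
        [McpServerTool, Description("Analyse a XAML file and report duplication and other possible improvements.")]
        public static string AnalyseXamlFile(
            [Description("Full path to the .xaml file")] string filePath)
            => XamlAnalyzer.Run(filePath);
    }

    public static class XamlAnalyzer
    {
        // Stand-in for the real analysis, which returns a plain-text report.
        public static string Run(string filePath) =>
            $"(analysis of {filePath} would go here)";
    }

Called directly, the analyser produces the same formatted output every time. Exposed like this, it becomes additional information the agent can choose to draw on, with all the variability that implies.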

I suspect this has some parallels with businesses that are threatened with being made redundant by an agent using an MCP wrapper over their API. If all you're doing is providing data, then where's the business? If you're creating/collating/gathering the data, then that could be valuable. If your value comes from analysing or formatting data, then AI could become a threat once it can do that analysis or formatting itself....



Monday, June 16, 2025

Sometimes we need people to run ahead

With so much seemingly changing at any one time, it can feel hard to keep up.

But, if all we ever do is try to keep up, how can we ever get ahead? Or even just prepare for what's coming?

sign posts on the road ahead

I've recently been considering the benefits of thinking about the future in ways that some people consider extreme or unnecessary. But I've found that thinking deeply about what is, or could be, a long way off actually helps with the short term too.

If someone runs miles ahead, they can have an excellent idea of whether the next few metres are in the right direction.

Knowing what is, or could be, a way down the road also helps you know if you'd benefit from extra planning or preparation before you get there.
Are there lots of hills ahead? Better build up the muscles to make climbing them easier.
Is the road ahead dangerous? Better pick up some safety equipment before you get there.
Does the surface change? Do we need some different tyres, or even a different mode of transport for the next part of the journey?


Draw the analogies as you see fit. ;)

Wednesday, June 04, 2025

Have LLMs made code-coverage a meaningless statistic?

TLDR: If AI can easily generate code to increase test code coverage, has it become a meaningless metric?

Example code coverage report output

I used to like code coverage (the percentage of the code executed while testing) as a metric.

I was interested in whether it was very high or very low.

Either of these was a flag for further investigation.

Very low would indicate a lack of testing.

Very high would be either suspicious or, if the code was written following TDD, encouraging.

Neither was a deal breaker, as neither was an indication of the quality or value of the tests.


Now tests are easy. Anyone can ask an AI tool to create tests for a codebase.


This means very low code coverage indicates a lack of use of AI as a coding tool, which probably also suggests a lack of other productivity tools and time-saving techniques.

Now, very high code coverage can mean nothing. There may well be lots of tests covering a lot of the code, but they are very likely to be only unit tests, and very likely to be low-value ones.


There are two types of tests, each asking a different question:

  1. Are there inputs or options that cause the code to break in unexpected or unintended ways?
  2. Does the code do what it's supposed to? (What the person/user/business wants?)


Type 1 tests are easy, and they're the type AI can produce, as they can be written just by looking at the code. These are tests like: "What if this function is passed an empty string?"
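
For illustration, here's roughly what that looks like in C# with xUnit. The SlugGenerator class is a made-up stand-in for whatever the code under test happens to be:

    using System;
    using Xunit;

    // Made-up code under test: turns a title into a URL slug.
    public static class SlugGenerator
    {
        public static string Generate(string title) =>
            string.Join("-",
                title.ToLowerInvariant()
                     .Split(' ', StringSplitOptions.RemoveEmptyEntries));
    }

    public class SlugGeneratorType1Tests
    {
        // Type 1: written just by looking at the signature.
        [Fact]
        public void Generate_EmptyString_ReturnsEmptyString() =>
            Assert.Equal(string.Empty, SlugGenerator.Generate(string.Empty));

        // Type 1: whitespace-only input is another case an AI will
        // generate mechanically, with no outside knowledge required.
        [Fact]
        public void Generate_WhitespaceOnly_ReturnsEmptyString() =>
            Assert.Equal(string.Empty, SlugGenerator.Generate("   "));
    }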

Type 2 tests verify that the code behaves as intended. These are the kind that can't be written without knowledge that exists outside the codebase. These are tests like: "Are all the business rules met?"
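
By contrast, a type 2 test encodes a rule the code alone can't tell you. A sketch, assuming a made-up business rule that orders of ten or more items get a 5% discount:

    using Xunit;

    // Made-up code under test, implementing a made-up business rule:
    // orders of 10 or more items get a 5% discount.
    public static class Pricing
    {
        public static decimal Total(int quantity, decimal unitPrice)
        {
            var total = quantity * unitPrice;
            return quantity >= 10 ? total * 0.95m : total;
        }
    }

    public class PricingType2Tests
    {
        // Type 2: nothing in the code's signature says 10 is the threshold
        // or 5% is the discount; that knowledge comes from the business.
        [Fact]
        public void TenItemsAtTenEach_GetsTheAgreedDiscount() =>
            Assert.Equal(95m, Pricing.Total(10, 10m));
    }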


Type 1 tests are about the reliability of the code. Type 2 tests are about whether you have the right code.

Type 1 tests are useful and necessary. Type 2 tests require understanding the business, the app, and the people who will be using it.

Type 1 tests are generic. Type 2 tests will vary for each piece of software.

Type 1 tests are boring. Type 2 tests are where a lot of the challenge of software development lives. That's the fun bit.


Them: "We've got loads of tests."

Me: "But are they useful?"

Them: "Umm..."


I've recently started experimenting with keeping AI-generated tests separate from the ones I write myself. I'm hoping this will help me identify where the value is created by AI and where it comes from me.




Tuesday, June 03, 2025

The problem with multi-word terms (including "vibe coding")

TLDR: I think it's worth being clear about the meaning of the words we use. Maybe compound terms need extra care, because people assume their meaning from the words that make them up.

Not wanting to sound too pessimistic, but I think it's fair to say that we are lazier than we realise and not as smart as we think.

We hear a term composed of multiple words we recognise, and we assume a meaning for the overall term based on our understanding of the individual words.
confused speech emojis
Let me give you three examples.

1. "Vibe coding"

Originally, it was defined to describe people "going with the vibe" and letting the AI/LLM do all the work: you just tell the AI what you want and keep going until it has produced all the code and deployed the resultant software, without you caring, or knowing, how it works.
But some developers heard the term and presumably thought, "I know what coding is and I know what good vibes are, so if I put them together, that must mean 'using AI to produce code that gives me good vibes.'"
The result: there are lots of different understandings of the meaning, and so whenever the term is used, it's necessary to clarify what's meant. Yes, there can be lots of different meanings, and I'm not going to argue that one is more valid than the others.

2. "Agile development" 

The original manifesto had some flexibility and left some things open to interpretation, or to implementation appropriate to specific circumstances. However, I suspect there were a lot of people who thought, "I know what development is and I know what it means to be agile, so I'll just combine the two."
The result: everyone has their own understanding of what it means to "do agile development". Some of those variations are small and some are massive. I've yet to meet two teams "doing agile development" who do things exactly the same way. Does that matter? Probably not. It's just important to clarify what people mean when they use the term.


3. "Minimal viable product" (MVP)

Yes, you may know what all the words mean individually. You may even have an idea about the term as a whole, but the internet is bursting with explanations of what it actually means. My experience also tells me that if you have a development background, your understanding is highly likely to be very different from that of someone in product or marketing.
Does it matter? It depends on whether all the people using the term are in agreement. It might be fine if you're using it as an alternative term for "beta", or if you mean it must have a particular set of features, or a certain level of visual polish. I think being able to prove it's viable based on customer actions is more important. But, again, if all the people on your project can agree on the meaning, I trust you'll work it out. (Confession: I left one job because the three people in charge--an issue for another time--all had a different understanding of what MVP meant, but refused to give their definitions or to acknowledge that they differed. It made the work impossible.)



Some people (or maybe all people, sometimes--I have done this myself) will hear a word, assume a meaning, and not ask any questions.

I've observed a similar thing with headlines. People make assumptions based on headlines or TLDRs, and so don't get to appreciate the nuance. Or maybe they don't even appreciate that there might be more to it than a simple explanation.

Nuance matters. It's the detail where the devil hides. It's the 80% of edge cases accompanying the 20% of the obvious in a 'simple' scenario.

Words matter. I probably spend far too much of my time thinking about words because they're a foundation of communication.

Yes, for many people, words don't matter.

But, going back to thinking about "vibe coding", words are how we communicate with machines. While the trend has always been towards "higher-level" languages, we didn't previously go all the way to our spoken languages because of:
A) technical limitations
B) the lack of precision in our spoken/written languages

AI/LLMs overcome some of the technical limitations and can make some reasonable guesses to work around the lack of precision.

Relying solely on natural language, in only a few sentences or even paragraphs, to express all the subtle details and specifics that software requires doesn't seem appropriate.

Some people think "nuance doesn't matter" until the software doesn't do exactly what they expect in an edge-case scenario.

Producing software that isn't as good as I want/expect may just be part of the enshittification of life.

I think many people believe (or think they can get away with) acting like "Close enough" is the new "Good enough".

Magpies are very vocal. And maybe they're right. Perhaps we should just focus on the new and shiny.

Or maybe, if using AI/LLMs saves money and cuts costs, that's all that matters. Well, all that matters to some. I definitely don't think it's all that's important.



Then I wonder about choosing names for things.
If there are such potential problems when combining existing words, maybe there's a case for using made-up words, or words with no direct correlation to the thing being named....




Now, I'll just wait for the comments that tell me I don't understand the above terms correctly...