Design talk: AI, LLMs, LRMs

Designing with AI: From Illusion to Intention

by Honeylyn Tutor
Senior Product Designer, UX at Harver

When I first shared this talk at one of Harver’s Weekly Design Talks for our Product and Engineering teams, I wanted to make sense of what it means to design with AI. What started as curiosity about how Large Language Models (LLMs) work eventually became a reflection on how we think with them.

I’ve removed some internal information and made a few adjustments to improve the flow, but overall it still reflects the original structure and intent of my talk.

12 June 2025
Introduction

The Illusion of Thinking

Apple recently released a study titled "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity".

This study reveals that “reasoning-enabled” Large Reasoning Models (LRMs) offer only an illusion of deeper thinking. Once tasks become more complex, these models start making more mistakes.

Apple’s research seems to push the industry conversation away from “How big can models get?” to “How deeply can they really reason?”, a necessary shift as we learn to work with these systems and use them more thoughtfully.

Research paper

The Placebo Effect in AI

Another study that caught my attention while researching the effects of AI use was from CHI 2024 (the ACM Conference on Human Factors in Computing Systems), titled “‘AI enhances our performance, I have no doubt this one will do the same’: The Placebo Effect is Robust to Negative Descriptions of AI.” It explored what the researchers call the placebo effect in AI.

The observation: participants consistently believed that AI made them faster and more accurate, even when told the AI might harm their performance. The key finding is that belief in AI’s capability improves outcomes more than the system’s actual performance does, highlighting the profound influence of expectations on human-AI interaction.

Human-AI Interaction
It is a reminder that expectation shapes interaction. Our confidence in AI tools can amplify productivity, creativity and even trust, regardless of their objective accuracy.
inspiration

Curiosity about AI and Machine Learning

My exploration of AI wasn’t planned. It started when my husband, Ademar, began his Master’s in Artificial Intelligence at the University of Waikato.

I helped him rebuild his academic website and found myself immersed in his reading lists and, slowly, in this world of AI. He walked me through the basics of machine learning and how LLMs work, and even pointed me to lectures by Andrej Karpathy, whose breakdown of how language models process context changed the way I viewed my daily interactions with tools like ChatGPT or Copilot.

Video Tutorials from Andrej Karpathy
I watched a few long but engaging video tutorials from Andrej Karpathy, where he broke down the three main steps in developing and deploying large language models. He explained that the chat interface we interact with is really a context window, where each word is turned into tokens the model interprets.

That simple idea changed how I see the chat box. Now I write prompts more intentionally, knowing every word shapes how the model understands and responds.
Context Window
Right now, our main interaction with large language models happens through a chat interface, though new forms of interaction will likely emerge in the future. When OpenAI released ChatGPT in 2022, it became an instant success because chatting felt natural. People were suddenly willing to pay to chat, which marked a brilliant productization of an existing, openly published technology first introduced by Google: the Transformer architecture.
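To make that concrete, here is a toy sketch of how a chat thread collapses into a single context window before the model ever sees it. The role tags and join format are invented for illustration; real chat formats vary by model:

```python
# A toy sketch: a chat is flattened into one string (the context window)
# before tokenization. The <|role|> tags are illustrative, not a real format.
messages = [
    ("system",    "You are a helpful assistant."),
    ("user",      "What is a context window?"),
    ("assistant", "It is the text the model can see at once."),
    ("user",      "So my whole chat history is inside it?"),
]

context_window = "\n".join(f"<|{role}|> {text}" for role, text in messages)
print(context_window)  # this single string is what gets tokenized next
```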

Now that we know this chat is essentially a context window, it’s worth asking: what does that actually look like behind the scenes?
Tokenization in Tiktokenizer: ChatGPT breaks down words or phrases into “tokens”
So let's take a look here. Every word we input into ChatGPT is broken down into tokens that look like this, and each token is represented by a numeric value.

Looking at these values, we might not understand what they mean because they seem random. But are they really? Maybe there’s structure hidden within the math.
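Before we get to that structure, you can reproduce token values yourself with OpenAI’s open-source tiktoken library, which exposes the same kind of encoding that Tiktokenizer visualizes. A minimal sketch; the sample sentence and encoding choice are my own:

```python
# A minimal sketch of tokenization using OpenAI's open-source tiktoken library
# (pip install tiktoken). The sentence and encoding are illustrative choices.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

text = "Designing with AI, from illusion to intention"
token_ids = enc.encode(text)                   # a list of integers, one per token
pieces = [enc.decode([t]) for t in token_ids]  # the text fragment behind each value

print(token_ids)
print(pieces)
```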
Token values · plotting the vector values · 3D diagram showing the vectors
The Math behind this
Now, when we plot those values, they start to form patterns. It’s essentially a mathematical equation that finds the vectors of closely related information to understand the context behind the words we add. The “answer” from an LLM is basically the result of an algorithm that calculates the nearest vector, the closest point of meaning based on everything in that context window. In a 2D diagram, it might look like the second image; in reality, it’s far more complex, unfolding in three dimensions or more.

This brings us back to what Apple researchers called the illusion of deeper thinking. It’s not true cognition; it’s math. The model plots and searches for vector points to return the closest related information to our input. And because of the placebo effect in AI, we feel the magic in every ChatGPT response. Even when it hallucinates, we still feel empowered using these tools. Perhaps that’s why everyone wants to add “AI” to everything... because the effect feels positive, whether it’s fully accurate or not.
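To make “calculates the nearest vector” concrete, here is a toy sketch in plain NumPy. The three words, their 3D vectors, and the query are all invented; real models work with learned embeddings of thousands of dimensions, and the search is implicit in the network rather than an explicit lookup like this:

```python
# A toy illustration of "finding the nearest vector" with cosine similarity.
# All vectors here are made up purely to show the mechanics.
import numpy as np

embeddings = {
    "cat":   np.array([0.9, 0.1, 0.0]),
    "dog":   np.array([0.8, 0.2, 0.1]),
    "stone": np.array([0.0, 0.1, 0.9]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 means the vectors point the same way, i.e. closely related meaning."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.78, 0.22, 0.12])  # pretend this encodes our input
nearest = max(embeddings, key=lambda word: cosine_similarity(embeddings[word], query))
print(nearest)  # "dog": the closest point of meaning to the query
```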
The real question

Who is thinking?

So the real question is... Is AI controlling our thinking? Or are we learning to think through it?

Knowing that LLMs are, at their core, mathematical systems built on numerical representations reframes this question: the power lies not in the model but, maybe, in how we use it.

Application

Design Better with AI

Once I understood that interaction happens within a bounded context window, I began asking: how can I design prompts that lead to more meaningful outcomes? Could I prompt it in such a way that I get the closest information to what I’m looking for?

That question led me to explore prompt engineering... the no-code, no-math way of finding the right “vector.” What’s nice is that many prompting-technique resources are already shared online.

prompt more effectively

Prompt Engineering

Prompt engineering is the no-code way of finding the nearest vector: designing how we communicate with AI systems without the mathematical calculation. It’s not just influencer advice; there’s solid research behind what makes prompts effective.

Prompt engineering is interaction design for intelligent systems. You’re designing not pixels but reasoning patterns, tone, and the collaboration between human and machine.

How to apply this?
Prompt engineering example
I experimented with some prompting techniques in my recent usability testing (a sketch of how they combine follows the list):

Meta-prompting - teaching the AI how to think before it answers
Function encapsulation - structuring tasks as functions you can call
Output templating - designing the format of responses for clarity
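Here is a minimal sketch of how those three techniques can combine into a single prompt. The persona, the analyze_transcript name, and the output fields are hypothetical choices of mine, not a fixed recipe:

```python
# A sketch combining the three techniques into one prompt string. The persona,
# the function name, and the output fields are illustrative, not a recipe.

TRANSCRIPT = "..."  # paste a synthetic or company-approved transcript here

# 1. Meta-prompting: tell the model how to think before it answers.
meta = (
    "You are a senior UX researcher. Before answering, list the assumptions "
    "you are making about the participant and the task."
)

# 2. Function encapsulation: frame the task as a callable with clear input/output.
task = (
    "analyze_transcript(transcript) -> findings\n"
    "Identify every usability problem the participant encountered."
)

# 3. Output templating: fix the response format so findings are comparable.
template = (
    "Report each finding exactly as:\n"
    "- Observation:\n"
    "- Severity (low/medium/high):\n"
    "- Possible cause:\n"
    "- Suggested follow-up question:"
)

prompt = f"{meta}\n\n{task}\n\n{template}\n\nTranscript:\n{TRANSCRIPT}"
print(prompt)  # paste into ChatGPT, Copilot, or Claude, or send via an API client
```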

With these techniques, I used LLMs like ChatGPT, Copilot, and Claude to analyze usability transcripts.

P.S. I only used company-approved LLMs for analyzing actual usability transcripts. For concept testing, I used a synthetic user transcript.
The result

Detailed analysis from LLMs

You can use tools like Copilot, ChatGPT or Claude to analyze your transcripts depending on what your company officially uses. However, prompt engineering remains essential, no matter which tool you use.

Ishikawa Diagram
From Data to Design Insight
The Ishikawa diagram, also called the “Fishbone Diagram” or the “Cause and Effect Diagram,” was developed at the University of Tokyo in 1968. It is designed to identify, explore, and display all possible causes of a specific problem, making it a classic method for mapping cause and effect in usability problems.

Prompt engineering became not just a technical skill but a design tool: a framework to guide the AI’s attention, helping me surface insights that might otherwise remain buried.
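As a small sketch of that data-to-insight step: once the LLM returns findings in the template above, they can be grouped into fishbone-style branches. The category names and findings below are invented examples, not output from a real study:

```python
# A minimal sketch of grouping templated findings into Ishikawa-style branches.
# Categories and findings are invented examples.
from collections import defaultdict

# (category, observation) pairs, e.g. parsed from the templated LLM output
findings = [
    ("Interface", "Participant missed the primary call to action"),
    ("Content",   "Error message did not explain the next step"),
    ("Interface", "Form labels were ambiguous"),
    ("Process",   "Task required switching between two screens"),
]

fishbone = defaultdict(list)
for category, observation in findings:
    fishbone[category].append(observation)

for category, observations in fishbone.items():
    print(f"{category}:")
    for observation in observations:
        print(f"  - {observation}")
```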
The takeaway

It’s not magic... it’s math

Is AI really asking us to take a side? After doing this research, I realized it isn’t. AI is asking us to “design” better questions. Understanding that it’s math (not mind) lets us use it more intentionally.

AI may not truly think, but it helps us reflect, and that is where real design happens in this new age of AI: through mindful prompting techniques.

Knowing now that every word and phrase carries weight and value reminds us that this isn’t about machines overtaking our collective consciousness... it is about learning how to think with it.

References
Apple Machine Learning Research. (2025). The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity. arXiv preprint arXiv:2506.06941. https://arxiv.org/abs/2506.06941

Kloft, A. M., Welsch, R., Kosch, T., & Villa, S. (2024). “AI enhances our performance, I have no doubt this one will do the same”: The placebo effect is robust to negative descriptions of AI. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24). Association for Computing Machinery.