Thinking is Flow. Beyond prompting.
Going beyond snapshots to actual exploration of latent space.
We become what we behold. We shape our tools and then our tools shape us.
The way we interact with large language models today feels oddly constrained. We craft the perfect prompt, submit it, and receive a response. Rinse and repeat. Each interaction is a discrete moment, a snapshot of thought frozen in time. But thinking doesn't work like that.
We humans don't operate in these isolated snapshots. Our thoughts flow seamlessly from one concept to another, maintaining context and building understanding over time. We branch into tangents, merge insights from different domains, and backtrack when we hit dead ends. Thinking isn't about capturing perfect moments - it's about navigating a continuous space of ideas.
What's ironic is that the underlying transformer architecture powering these models is inherently sequential (for now) and flow-based. These models process tokens in sequence with attention mechanisms that create connections across the stream - much closer to how humans think than to how databases retrieve information. Yet our interfaces force them into a retrieval-based paradigm that undermines their natural strengths.
Background
These ideas have emerged from challenges we have faced at Starwatcher.io, a platform designed to match startups with investors. From a language model standpoint, this involves gathering rich contextual information from both parties – understanding startup visions and investor preferences – and then mapping them together in the same conceptual space.
The traditional approach would treat this as a retrieval problem: index information, search for matches, return results. But the essence of good matchmaking is more fluid. It's about gathering nuanced context over time and exploring potential connections. Companies change, and investor perspectives evolve.
The Chat Interface: Breakthrough and Limitation
The conversational interface for AI was a genuine breakthrough in accessibility. Chat is intuitive - we ask questions and get answers, just as we would with a human expert or a search engine. This familiar pattern brought millions of people to AI who would never have typed a formal query or written a prompt specification.
But this chat-based UI/UX comes with hidden limitations. It enforces an Input/Output mentality - isolated exchanges rather than continuous exploration. While we might maintain a chat history, we're still treating the interaction as a series of discrete questions and answers. Most of these chats are short-lived bursts: type a question, get an answer, forget the chat.
Context windows are misused as information dumps rather than coherent states of understanding. We stuff them with previous exchanges, relevant documents, and instructions - treating the model more like a search engine with a memory limit than a thinking entity with evolving comprehension. Yet the context window is a sequential representation: change the order of its contents and you might get a very different output. Just as in stories, reorder the chapters and you get a different ending.
This mismatch between interface (chat) and architecture (sequential processing with attention) creates fundamental limitations that no amount of context window expansion can fully solve. Regardless of whether the limit is 4K, 32K, or 128K tokens, we're still operating in the wrong paradigm.
Chain of Thought: The First Step Beyond Prompting
One early innovation that gestured toward a better paradigm was Chain of Thought (CoT) prompting. By instructing models to "think step by step," researchers discovered significant performance improvements across complex reasoning tasks.
What makes CoT interesting isn't just the performance gain but what it reveals about these models. CoT works because it better aligns with how transformer architectures naturally process information - as a flow of interdependent ideas rather than isolated facts. It guides the model through a trajectory in its latent space, helping it navigate toward better answers. Think in steps - build your thinking towards the goal by laying out your plan step by step.
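To make the contrast concrete, here is a minimal sketch comparing a direct prompt with a CoT prompt, using the OpenAI Python client; the model name and the question are placeholders, and the same pattern works with any instruction-following model.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = (
    "A startup raised $2M at a $10M post-money valuation. "
    "What share of the company did the investors get?"
)

# Direct prompt: a single snapshot - question in, answer out.
direct = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": question}],
)

# Chain of Thought: ask for the trajectory, not just the destination.
cot = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": question + "\nThink step by step before giving the final answer.",
    }],
)

print(direct.choices[0].message.content)
print(cot.choices[0].message.content)
```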
Our current approaches amount to poking at the latent space with individual prompts - one insertion point at a time - rather than continuous navigation. We need a fundamentally different approach to unlock the full potential of these architectures.
Navigating the Vectorspace
What if we reimagined conversations with AI not as sequences of prompts and responses, but as journeys through conceptual space?
Imagine each point in a conversation as a state vector - a high-dimensional fingerprint capturing the complete understanding at that moment. Instead of being limited by token counts, we could (a sketch of the bookkeeping follows this list):
Branch into explorations - create divergent paths through conceptual space from any point
Bookmark states - save particularly important understanding fingerprints for later return
Merge insights - combine learnings from different conversational branches
Backtrack effectively - return to earlier states without losing the insights gained since
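Nothing like this is exposed by today's APIs, but the bookkeeping it implies is straightforward. The following is a hypothetical Python sketch: conversation states form a tree rather than a linear log, and branching, bookmarking, backtracking, and merging become operations on that tree. The state vectors and the naive averaging merge are placeholders for whatever a model would actually expose.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class ThoughtState:
    """One point in the conceptual journey: a state vector plus how we got there."""
    vector: list[float]                   # fingerprint of understanding at this moment
    parent: "ThoughtState | None" = None
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

class Journey:
    """A tree of states instead of a linear chat history."""

    def __init__(self, root: ThoughtState):
        self.states = {root.id: root}
        self.bookmarks: dict[str, str] = {}
        self.current = root

    def branch(self, vector: list[float]) -> ThoughtState:
        """Create a divergent path through conceptual space from the current state."""
        child = ThoughtState(vector=vector, parent=self.current)
        self.states[child.id] = child
        self.current = child
        return child

    def bookmark(self, name: str) -> None:
        """Save the current understanding fingerprint for later return."""
        self.bookmarks[name] = self.current.id

    def backtrack(self, name: str) -> ThoughtState:
        """Return to an earlier state without discarding what was explored since."""
        self.current = self.states[self.bookmarks[name]]
        return self.current

    def merge(self, a: ThoughtState, b: ThoughtState) -> ThoughtState:
        """Combine two branches - here naively, by averaging their vectors."""
        merged = [(x + y) / 2 for x, y in zip(a.vector, b.vector)]
        return self.branch(merged)
```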
This approach solves the context limitation problem through clever state management rather than token counting. The model would maintain coherence across much longer conceptual journeys than current approaches allow.
Instead of treating the context window as a finite container, we'd treat it as a sliding viewport into a continuous conceptual landscape - keeping the model's attention on the relevant territory while maintaining awareness of the broader journey. This aligns perfectly with how transformer architectures naturally process information - as a flow of tokens with attention across the stream.
We would have to change our thinking from prompt optimisation to context window programming.
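As a first approximation of what context window programming could mean in practice, here is a hedged sketch of a viewport assembler. It assumes segments of the journey have already been given relevance scores and that a token counter is available - both placeholders here.

```python
def assemble_viewport(segments, token_budget, count_tokens):
    """Fill the context window like a viewport: the most relevant territory
    stays in view, within the model's token budget.

    segments: dicts with 'position', 'relevance', and 'text' keys.
    count_tokens: a tokenizer-specific counter, e.g. len(enc.encode(text)).
    """
    chosen, used = [], 0
    # Greedily keep the most relevant segments that still fit.
    for seg in sorted(segments, key=lambda s: s["relevance"], reverse=True):
        cost = count_tokens(seg["text"])
        if used + cost <= token_budget:
            chosen.append(seg)
            used += cost
    # Present the selected territory in its original order so the journey reads coherently.
    chosen.sort(key=lambda s: s["position"])
    return "\n\n".join(seg["text"] for seg in chosen)
```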
Implementation Pathways
Stanford's DSPy framework offers a glimpse of how we might program such systems. Rather than crafting prompts, DSPy lets developers define modules with clear input/output signatures, giving them more control over the data and over how the LLM generates its response. These signatures (e.g. landing page content as input, company name and description as output) can then be optimised for better performance.
The library doesn't offer context window programming out of the box, but its module-based architecture provides the freedom and tooling to build it.
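For illustration, here is a small sketch of the landing-page example above expressed as a DSPy signature; the field names and model string are placeholders, not Starwatcher's actual pipeline.

```python
import dspy

# Placeholder model; any LM supported by DSPy would do.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class LandingPageInfo(dspy.Signature):
    """Extract company details from landing page content."""
    page_content: str = dspy.InputField(desc="raw text of the company's landing page")
    company_name: str = dspy.OutputField(desc="name of the company")
    description: str = dspy.OutputField(desc="short description of what the company does")

# A module wraps the signature; DSPy optimisers can later tune how it is prompted.
extract = dspy.Predict(LandingPageInfo)
result = extract(page_content="Acme helps early-stage founders meet the right investors ...")
print(result.company_name, result.description)
```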
To fully realize vectorspace navigation, we would need:
APIs that expose and allow manipulation of internal model states
Efficient storage and indexing of these high-dimensional vectors
Navigation interfaces that make traversing this space intuitive
Compression techniques that preserve essential understanding while reducing dimensionality
The technical challenges are significant but not insurmountable. Several research threads are already converging on these solutions, from prompt caching systems that store attention states to continuous thought models that operate directly in latent space. While this article was being written, a new paper on a closely related topic was published - Soft Thinking.
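To make the prompt-caching thread concrete, here is a rough sketch using Hugging Face transformers: the key/value cache computed for a shared prefix acts as a bookmarked state that two divergent branches can continue from. The prompts are placeholders, and deep-copying the cache is a blunt stand-in for real state management.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Process a shared prefix once and keep its attention (key/value) state.
prefix = tok("Our startup matches early-stage founders with investors.", return_tensors="pt")
with torch.no_grad():
    out = model(**prefix, use_cache=True)
bookmark = out.past_key_values  # the saved "fingerprint" of this point in the journey

# Branch A: continue from the bookmarked state with one question.
branch_a = tok(" What do the investors care about?", return_tensors="pt")
with torch.no_grad():
    out_a = model(branch_a.input_ids, past_key_values=copy.deepcopy(bookmark), use_cache=True)

# Branch B: return to the same state and explore a different direction.
branch_b = tok(" How has the company changed since last year?", return_tensors="pt")
with torch.no_grad():
    out_b = model(branch_b.input_ids, past_key_values=copy.deepcopy(bookmark), use_cache=True)
```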
What's particularly promising is that this approach doesn't require fundamentally new architectures - just new interfaces to existing ones. The models already create and manipulate these state vectors internally. We simply need to expose those capabilities at the API level and create interfaces that let us navigate them effectively.
Beyond Prompting
Prompt engineering has served us well as a first interface to these powerful models. But it is a transitional technology - not the destination. The future belongs to interfaces that embrace the flow of thought rather than treating it as discrete moments.
By treating context windows not as finite containers but as sliding viewports into a continuous landscape of understanding, we can unlock the true potential of these systems as extensions of human cognition.
The chat interface was revolutionary in making AI accessible to millions. Now we need a similar revolution in how we navigate and explore with these systems - one that aligns with both how humans think and how transformer architectures process information.
Thinking is not about crafting the perfect snapshot. Thinking is flow.