It's all about that Context - Awesome MLSS Newsletter

6th Edition

“Do this TASK in this CONTEXT.”
“No, not like that — use this METHOD, and remember to include A, B, and C.”
“Okay, now imagine you're a PERSONA, carrying out the TASK within the CONTEXT, following this METHOD exactly.”

Sound familiar? We’ve all been there. Whether you call it prompt engineering or just being extremely specific, the goal is the same: getting an LLM to give you the right response.

But prompt engineering is evolving. It's no longer just about wording a prompt carefully — it’s about crafting the right environment for intelligent systems to understand and act.

Enter Context Engineering: the science of structuring information so an agentic system can process, interpret, and respond accurately. This includes real-time information, memory systems, and knowledge bases, among other elements.

We’ll dive into the details of context engineering right after a few quick updates.

Upcoming Summer School Announcements

Make sure to apply to them before the application deadline!

For the complete list, please visit our website

What’s happening in AI?

Right off the bat — NO, context engineering is not just glorified prompt engineering.

I won’t lie: when my teammate first asked me to look into it, I sighed. Then I skimmed a few blogs. Most of them boiled down to: “just add more context” — whether from RAG outputs, the current date, chat history, or past user memory.

But then I came across a paper.

At 166 pages, A Survey of Context Engineering for Large Language Models wasn’t exactly light reading. But it did offer a solid foundation in the mathematical principles behind context engineering, backed by a survey of over 1,400 papers.

What really clicked for me, though, was David Kimai’s Context-Engineering GitHub repo. His work emphasized a fundamental challenge: we’re going to keep feeding LLMs more and more context — but how are we deciding what’s relevant, what’s noise, and where the theoretical and practical limits lie?

Language Isn’t Rocket Science. It’s Harder To Understand

You see, language is inherently inexact. If you’ve studied computer science, you’ve probably come across context-free grammars — formal systems that define the syntax of programming languages. In these systems, a given program parses the same way every time (assuming an unambiguous grammar).

Natural language isn’t like that. Give the same instruction to a hundred people, and you might get a hundred slightly different interpretations.

That’s where context engineering becomes critical. We need to:

  • Provide the right context

  • Avoid overstuffing, which can degrade performance or trigger errors

  • Manage token budgets smartly
Context isn’t just extra data. It’s a constraint system. It’s also a form of alignment. And as we move into more agentic systems, getting it right becomes not just helpful — but essential.

While there is a lot to cover, in this newsletter we will help you understand at least the fundamentals of the framework and discuss its limitations. If you’d like to dive deeper, here is David Kimai’s repository.

Mathematical Foundations

First things first: why bother with formulae?

LLMs and agentic systems are probabilistic. To maximize accuracy while minimizing token and retrieval costs, we need something measurable — because you can’t improve what you can’t measure.

Also, this isn’t new. We've long measured accuracy and context relevance. This framework just makes it more structured and explainable.

Context Formalisation

We discussed making our prompting language more exact and quantifiable. How can we achieve that?

The first step towards this is formalisation, a formulaic mechanism for creating the context. The basic formula for this is:

C = A(c₁, c₂, c₃, c₄, c₅, c₆)

Where:

C  = Final assembled context (what the AI receives)

A  = Assembly function (how we combine components)

c₁ = Instructions (system prompts, role definitions)

c₂ = Knowledge (external information, facts, data)

c₃ = Tools (available functions, APIs, capabilities)

c₄ = Memory (conversation history, learned patterns)

c₅ = State (current situation, user context, environment)

c₆ = Query (immediate user request, specific question)

Together, all of the above gives us the most complete context for a given task. But the key question is: what is the Assembly Function?

The Assembly Function defines how we combine all available information into a single, coherent input — ensuring the context is as strong as possible.

This is where strategy matters.

A simple approach is the Linear Strategy Template — a basic assembly prompt that stitches the components together in sequence.
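A linear assembly strategy can be sketched in a few lines. The function below is an illustrative toy, not the template from the repository: the section labels and the fixed ordering are assumptions, chosen only to mirror the six components c₁–c₆ defined above.

```python
# Hypothetical sketch of a linear assembly strategy. The parameter names
# mirror c1..c6 from the formula above; labels and ordering are illustrative.

def linear_assemble(instructions, knowledge, tools, memory, state, query):
    """Stitch the six context components together in a fixed sequence,
    skipping any that are empty."""
    sections = [
        ("# Instructions", instructions),
        ("# Knowledge", knowledge),
        ("# Tools", tools),
        ("# Memory", memory),
        ("# State", state),
        ("# Query", query),
    ]
    return "\n\n".join(f"{label}\n{body}" for label, body in sections if body)

context = linear_assemble(
    instructions="You are a helpful assistant.",
    knowledge="",
    tools="search(query) -> results",
    memory="User prefers concise answers.",
    state="",
    query="Summarise context engineering in one sentence.",
)
```

Note that even this trivial assembly function makes a decision: empty components are dropped rather than padded, which is one small way of managing the token budget.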

But in more complex cases, you may need a dynamic strategy — one that adapts over time. For that, there’s the Adaptive Assembly Evolution Protocol, which evaluates past executions and refines the assembly function accordingly. One example is the Adaptive Context Assembly Protocol, with more protocols detailed in the document.

The functions you can use to systematically analyse the quality of your outputs and of your context formalisation system are provided here

Optimisation

Now that we have discussed building a system for the best context, how do we estimate the quality of outputs? First, we express it as a constrained optimisation problem:

Maximize: Context_Quality(A, c₁, c₂, ..., c₆)
Subject to: Token_limits, Quality_thresholds, Computational_constraints

Then we define our optimisation problem:

F* = arg max F(A, c₁, c₂, ..., c₆)

     A∈𝒜

Where:

F* = Optimal assembly function

F(·) = Objective function measuring context quality

A = Assembly function we're optimizing

𝒜 = Set of all possible assembly functions

cᵢ = Context components

There are several metrics through which we can measure the quality of our component outputs, including Relevance, Completeness, and Consistency, among others. A more complete list can be found here
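The arg-max search above can be made concrete with a toy example. Everything below is invented for illustration: the scoring weights, the crude word-count "tokeniser", and the two candidate assembly functions are assumptions, standing in for real relevance and completeness metrics.

```python
# Illustrative sketch of F* = arg max over candidate assembly functions,
# subject to a token limit. Metrics and weights are invented for the example.

def score(context, token_limit=50):
    """Toy objective: weighted relevance + completeness, -inf if over budget."""
    tokens = context.split()
    if len(tokens) > token_limit:
        return float("-inf")            # violates the token constraint
    relevance = 1.0 if "query" in context.lower() else 0.0
    completeness = len(set(tokens)) / max(len(tokens), 1)
    return 0.7 * relevance + 0.3 * completeness

# Two candidate assembly functions over the same components.
candidates = {
    "query_only": lambda parts: parts["query"],
    "query_plus_memory": lambda parts: parts["memory"] + " " + parts["query"],
}

parts = {"memory": "User asked about entropy before.",
         "query": "Explain the user query about surprisal."}

# The arg max picks whichever assembly function scores highest.
best_name = max(candidates, key=lambda name: score(candidates[name](parts)))
```

In practice the candidate set 𝒜 is far richer and the objective F is learned or evaluated empirically, but the structure of the search is the same.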

Information Theory

We want to pass the highest-quality information. For this, we need to quantify what information content means.

Information Content is essentially how much a piece of information ‘surprises’ you, or how low the likelihood of the event is. For instance, if I told you the sun rose today, you wouldn’t really care. However, if I told you the sun did NOT rise today, you would be shocked. The former is a high probability event, the latter a low probability one. 

The higher the surprisal, the more likely the shared information is of high quality, as long as it is relevant to the context. Without going into detail, this can be measured by the entropy of information, a primer on which is available here
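The sunrise example maps directly onto the standard definitions. Below is a minimal sketch of surprisal and Shannon entropy; the probabilities are made up for illustration.

```python
import math

def surprisal(p):
    """Information content in bits: -log2(p). Rare events carry more information."""
    return -math.log2(p)

def entropy(dist):
    """Expected surprisal of a distribution (Shannon entropy, in bits)."""
    return sum(p * surprisal(p) for p in dist if p > 0)

sun_rose = surprisal(0.999)       # near zero: no surprise, little information
sun_absent = surprisal(0.001)     # ~10 bits: highly surprising
fair_coin = entropy([0.5, 0.5])   # 1 bit: maximal uncertainty for a coin
```

A high-entropy context component is one whose contents the model could not have predicted, which is exactly the information worth spending tokens on.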

Bayesian Theory for Probabilistic Context Adaptation

As we all know, the common response to any symptom online is something drastic. Headache? Could be a tumour. However, if we add some context, for instance that you might be having trouble sleeping, suddenly it could be a simple tension headache, solved with a nap!

The same goes for Context Engineering: our beliefs must be updated as new evidence arrives. Or, formulaically:

P(Hypothesis|Evidence) = P(Evidence|Hypothesis) × P(Hypothesis) / P(Evidence)

Or in context engineering terms:

P(Context_Strategy|User_Feedback) = 

    P(User_Feedback|Context_Strategy) × P(Context_Strategy) / P(User_Feedback)

Where:

- P(Context_Strategy|User_Feedback) = Posterior belief (updated strategy)

- P(User_Feedback|Context_Strategy) = Likelihood (how well strategy predicts feedback)

- P(Context_Strategy) = Prior belief (initial strategy confidence)

- P(User_Feedback) = Evidence probability (normalizing constant)
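The update rule above can be run on toy numbers. In this sketch, the two strategy names, the priors, and the likelihoods are all invented for illustration; the only real content is Bayes' rule itself.

```python
# Numeric sketch of Bayes' rule applied to competing context strategies.
# Strategy names, priors, and likelihoods are invented for illustration.

def posterior(prior, likelihood, evidence):
    """P(Strategy|Feedback) = P(Feedback|Strategy) * P(Strategy) / P(Feedback)."""
    return likelihood * prior / evidence

# Initial confidence in each strategy, before seeing any feedback.
priors = {"rag_heavy": 0.5, "memory_heavy": 0.5}

# Probability each strategy would have produced the positive feedback observed.
likelihoods = {"rag_heavy": 0.8, "memory_heavy": 0.4}

# P(Feedback) marginalises over strategies: the normalising constant.
evidence = sum(likelihoods[s] * priors[s] for s in priors)

posteriors = {s: posterior(priors[s], likelihoods[s], evidence) for s in priors}
```

After one round of feedback, belief shifts toward the strategy that better explains it, and the posteriors become the priors for the next round.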

With the above core mathematical foundations in place, you should have a solid intuition for how we can address the context engineering problem at its core. Of course, there are several other issues to consider, such as making RAG or memory systems work smoothly alongside context engineering, or working with multi-agent systems, none of which quite fit into this newsletter.

However, it is imperative to discuss some of the core limitations we will definitely run into, which only make context engineering all the more important.  

Limitations

Context Window Constraints

LLMs are still imperfect. Even with the incredible million-token context windows now available, there is only so much context we can stuff in. More context increases token cost, I/O bandwidth usage, and energy consumption, and throttles parallelism. It can also cause more errors owing to infrastructure limits, leading to wasted API calls or GPU time on local systems. Context engineering matters here because it aims for the minimal context that delivers the highest accuracy at the lowest cost.
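A common practical response to window constraints is greedy, relevance-ranked trimming. The sketch below is a simplified illustration: "tokens" are approximated by whitespace words, and the relevance scores are assumed to come from elsewhere (a retriever or reranker), both assumptions made for the example.

```python
# Hedged sketch of budget-aware context trimming: keep the most relevant
# snippets first, until the token budget is exhausted. Word counts stand in
# for real tokeniser counts; scores are assumed to come from a retriever.

def trim_to_budget(snippets, budget):
    """snippets: list of (relevance_score, text). Returns texts kept, best-first."""
    kept, used = [], 0
    for score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = len(text.split())          # crude proxy for token count
        if used + cost <= budget:
            kept.append(text)
            used += cost
    return kept

snippets = [
    (0.9, "Context engineering structures information for LLMs."),
    (0.2, "Unrelated trivia that wastes tokens."),
    (0.7, "Token budgets constrain how much context fits."),
]
kept = trim_to_budget(snippets, budget=14)
```

Greedy selection is not optimal in general (it is a knapsack-style problem), but it captures the core trade-off: every low-relevance snippet admitted is budget taken from a higher-relevance one.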

Cognitive Limitations

Attention mechanisms in larger models are often approximations of full attention, which means that when a lot of context is stuffed in, important information may be overlooked or dropped entirely. Heavy context also limits how complex the executed task can be, owing to both processing depth and information overload. These factors depend heavily on input type, task complexity, model training, and the attention mechanism itself.

While there are still other issues to discuss, these two are the first obstacles to tackle in terms of not just context engineering, but the practical reality of LLMs themselves.

Awesome Machine Learning Summer Schools is a non-profit organisation that keeps you updated on ML Summer Schools and their deadlines. Simple as that.

Have any questions or doubts? Drop us an email! We would be more than happy to talk to you.

With love, Awesome MLSS
