Peter Norvig and Andrew Ng predicted that 2024 would be the year AI agents truly delivered on their potential—and they were spot on. Startups surged with new agent frameworks and services, while enterprises raced to implement ambitious deployments. At the heart of this wave are large language models (LLMs), which have quickly become the foundation for agentic systems. Watching these agents wield tools, draft strategies, and optimize complex processes has been extraordinary.
At Hightouch, however, we’ve envisioned a slightly different path for AI agents. We believe the true power of agents lies in their ability to adapt, learn, and optimize toward specific (marketing) goals over time. Instead of centering on LLMs, we’ve built a composite agentic system that places reinforcement learning (RL) models at its core while using LLMs for tasks that capitalize on their generative strengths.
This combined approach enables us to use the most effective technology for each aspect of one of marketing’s most complex challenges: 1:1 personalization.
In this post, we’ll explore AI agents and how they differ when based on LLMs and RL models. We’ll also explain how agents that continuously learn from interactions provide a compelling lever for creating personalized customer experiences.
What makes an AI agent?
Before examining how LLM-based and RL-based agents differ, we need to establish a baseline concept of an AI agent.
AI agents are software systems that autonomously perform tasks using AI.
As we explored in a recent post, agents aren't a new idea in software. They've existed for decades, have an ISO definition, and even had a hype cycle in the 1990s. The concept of an agent has persisted even as the driving technology has evolved from rule-based to various ML models to current LLMs.
Contemporary AI agents are settling into a fundamental set of features. They can decide on a series of tasks or actions to achieve an end goal through planning or reasoning. They have memory: they know about previous interactions and events and use that knowledge in their plans and actions. And they use tools: they can search the web, call APIs, and operate other applications (including LLMs).
Each feature exists on a spectrum, determining how agentic a system is. Agents can have minimal ability to plan but still be able to act autonomously (more similar to rule-based systems). Agents with different purposes may access various types and amounts of memory. Some agents may only be able to use a single, designated API, while others have a full suite of complex tools.
These features—planning, memory, and tool use—are a structure or pattern for understanding these AI systems. We’ll use this framework to work through how LLM and RL-based agents carry out tasks, but first, we need to dig into the types of marketing personalization we’re trying to solve with ML/AI. We can only understand and evaluate these approaches if we first understand the constraints of marketing personalization.
Marketing personalization as a problem space for ML/AI
Here's the fundamental problem we’re solving: personalized marketing is more effective at driving desired outcomes than non-personalized or semi-personalized (segment-based) marketing. But sending the right message to the right user or customer at the right time with the right offer is complex, time-consuming, and impossible for companies at any significant scale to do manually.
To solve this with an ML/AI solution, we need to break down the problem into the types of inputs and outputs that ML/AI solutions require.
First, the data that can be used as inputs for a model are constrained. It's your customer data in your CDP or data warehouse, the content and other mutable aspects of marketing interactions, and potentially data on the effectiveness of prior campaigns or messaging. Customer data likely includes an email, a username, unique IDs, and some demographic information, and can include event and product data (e.g., website pages visited, items added to cart, workouts completed). It could also be data about similar customers, as well as data about other messages and their performance.
Next, the goals are limited, which in turn limits the available actions and outputs for an ML model. Goals or outcomes must be measurable for a model to be optimized against them. Operationalized this way, marketing has outcomes like increased customer lifetime value (LTV), higher conversion rates, or decreased churn likelihood.
Marketing actions are, then, limited. Companies have only so many channels and have bought into a limited number of engagement tools. A company focused on marketing through its website, emails, and search ads has a restricted range of possible marketing actions that ML/AI models can influence or control.
For each action, though, models can determine a range of variables. Emails can have different subject lines, tones, and product recommendations. They can be shorter or longer, sent more or less often, at various times, or on different days. They can be heavily designed or primarily text. Each of these variables could be part of the output of a model deciding on an email message for a customer or user.
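To make this concrete, here is an illustrative sketch of how the decision variables for a single email multiply into an action space a model must choose from. The variant values and variable names are made up for illustration, not taken from any real campaign.

```python
from dataclasses import dataclass
from itertools import product

# Each email decision is a combination of variable values.
@dataclass
class EmailAction:
    subject_line: str
    tone: str
    send_hour: int

# Hypothetical variants a marketer might configure.
subject_lines = ["Your week in review", "Don't miss out", "New for you"]
tones = ["friendly", "urgent"]
send_hours = [8, 12, 18]

# Every combination of variables is a distinct action the model can pick.
action_space = [
    EmailAction(s, t, h)
    for s, t, h in product(subject_lines, tones, send_hours)
]
print(len(action_space))  # 3 * 2 * 3 = 18 possible actions per customer
```

Even this toy example yields 18 distinct actions per customer per send; real campaigns with more variables and variants grow combinatorially, which is why per-customer decisioning isn't feasible manually.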
Understanding these constraints on inputs, outputs, and outcomes enables us to consider how any given ML/AI system and model(s) could be implemented to solve specific problems in marketing personalization.
We’ll do just that, looking first at LLMs before turning to reinforcement learning.
How LLM-based agents work
Let's work through a task with an LLM-based agent to understand how they solve problems, with attention to planning, memory, and tool use. LLMs are generative, so we'll focus on a use case that leverages their generative capacities for personalizing marketing content.
Suppose you have an AI agent built and fine-tuned for marketing, and give it this prompt: "Create a series of LinkedIn posts to explain and promote the new product features in the provided product brief. Suggest when to post each post over a week, and create a Notion database with the messages for review." You provide a product brief and a marketing positioning and messaging brief with persona information. We’ll assume that the engineering team building the agent system has written a function to create databases in Notion as a tool and includes secure API credentials access.
At a high level, the marketing agent works in a few steps. It:
- Breaks down the problem and context and plans a series of tasks.
- Generates and saves a sequence of content assets with suggested posting dates and times based on its knowledge and the provided product and marketing documents (memory).
- Uses the Notion tool function to create a new database with the generated social posts (tool use).
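The steps above can be sketched as a simple loop. This is a minimal sketch, not a real implementation: `call_llm` and `create_notion_database` are hypothetical stand-ins for an actual LLM API call and the engineering team's Notion tool function.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"[generated content for: {prompt[:40]}...]"

def create_notion_database(title: str, rows: list) -> dict:
    """Placeholder tool function wrapping the Notion API."""
    return {"title": title, "row_count": len(rows)}

def run_marketing_agent(directive: str, briefs: str, n_posts: int = 5) -> dict:
    # 1. Planning: break the directive into subtasks.
    plan = call_llm(f"Break this directive into steps: {directive}")

    # 2. Generation: draft each post, using the briefs as context (memory).
    posts = [
        call_llm(f"Using these briefs: {briefs}\n"
                 f"Write LinkedIn post {i + 1} of {n_posts}")
        for i in range(n_posts)
    ]

    # 3. Tool use: push the drafts into a Notion database for review.
    return create_notion_database("LinkedIn launch posts", posts)

result = run_marketing_agent(
    "Promote the new product features", "product + messaging briefs"
)
print(result["row_count"])  # 5
```

In a production system, each of these placeholder functions carries real engineering weight: prompt design, output validation, API credentials, and error handling.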
An LLM-based AI agent for marketing
The LLM is the engine of this workflow. It uses context and knowledge embedded through its training to break down a directive (a prompt) into tasks, generate content, save it for future use, and use tools to push this content into a team’s existing workflows. It's incredible that we have systems that can already do this.
However, this system has risks and failure points. The risk of hallucinations is well known at this point. While agent frameworks increasingly include evaluation systems (often using other LLMs and agents), the LLM agent can still generate content that doesn't precisely follow messaging guidelines or misses the nuances of your audience. This agent also depends on a supplied tool function for Notion, an additional maintenance cost for an engineering team. The workflow will fail if the Notion API introduces a breaking change that isn't accounted for in an updated function.
There are ways to mitigate each concern, but the infrastructure effort and maintenance costs build quickly and can offset the impressive gains from an LLM-based agentic system.
There's another approach to building AI agents that learns directly from outcomes rather than relying on language models—reinforcement learning.
Building agents around reinforcement learning
We’ll consider an AI decisioning use case to understand how a reinforcement learning-based agent operates. Let's say your business runs a fitness app with a monthly subscription. Your marketing goal is to drive re-engagement for customers who have decreased use of the app through a mix of email and mobile push notifications. You're hoping to optimize your communications at an individual customer level.
Within Hightouch's AI Decisioning product, you set up a flow with a win-back goal of twice-weekly workouts targeting an audience of lapsing subscribers. AI Decisioning operates within the guardrails you create for it, including aspects like message frequency and the content of the messages. In your engagement platform—such as your ESP—you build templates for the messages you want to send, which the agent will use. In Hightouch, you also provide a set of variants—such as email subject lines, greetings, and offers—and a catalog of the workouts available through your app.
Here's what the agent does:
- In the first run, the agent decides on a delivery time, channel, and message for each customer in the audience. The message entails a decision for each variable field: subject line, greeting, and workout recommendation. The agent delegates the workout recommendation to another ML model built for product recommendations based on the customer's past workouts in the app and other customer data.
- The agent constructs messages using the template pulled from the engagement platform and schedules them for sending via integration or API (tool use). An LLM generates semantic tags for messages, which enable the agent to learn across messages.
- For subsequent messages, the agent predicts rewards for potential actions and makes an exploit-explore decision. Based on the RL model used, the model chooses to use a variant or message that has historically had a positive outcome (exploit) or experiments with a new variant (explore). With a contextual multi-armed bandit, like one of our current models, the agent uses customer data in the feature matrix to make decisions, enabling true personalization. The agent sends a particular message at a specific time to each user based on their customer data in the context of other users.
- The RL model learns from the results of sent messages (memory); that is, the model updates based on observed outcomes.
- The agent repeats the cycle, making an exploit-explore decision for each user, constructing messages and schedules, sending messages, and then learning from the results.
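The explore-exploit loop in the steps above can be sketched with a toy epsilon-greedy linear contextual bandit. This is a generic illustration, not Hightouch's production model: the feature names, the feature scaling, the 10% explore rate, and the simple online update rule are all assumptions, and production systems typically use more sophisticated strategies (e.g., Thompson sampling or upper confidence bounds) than a fixed epsilon.

```python
import numpy as np

rng = np.random.default_rng(0)

class ContextualBandit:
    """Toy epsilon-greedy contextual bandit: one linear reward model per arm."""

    def __init__(self, n_arms: int, n_features: int, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.weights = np.zeros((n_arms, n_features))  # per-arm reward models
        self.counts = np.zeros(n_arms)                 # pulls per arm

    def choose(self, context: np.ndarray) -> int:
        if rng.random() < self.epsilon:
            return int(rng.integers(len(self.weights)))    # explore
        return int(np.argmax(self.weights @ context))      # exploit

    def update(self, arm: int, context: np.ndarray, reward: float) -> None:
        # Online update toward the observed reward (the model's "memory").
        self.counts[arm] += 1
        lr = 1.0 / self.counts[arm]
        error = reward - self.weights[arm] @ context
        self.weights[arm] += lr * error * context

# Context features: e.g., [days_since_last_workout, opens_last_30d, is_mobile]
bandit = ContextualBandit(n_arms=3, n_features=3)
context = np.array([14.0, 2.0, 1.0]) / 14.0  # scaled customer features
arm = bandit.choose(context)                  # pick a subject-line variant
bandit.update(arm, context, reward=1.0)       # customer re-engaged
```

Because the context vector enters every decision, two customers with different behavior histories can receive different variants even when the arms' overall performance is identical, which is what makes the personalization 1:1 rather than segment-based.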
A reinforcement learning-based AI agent for marketing
While the system has multiple ML models, we currently use a contextual bandit as its engine. It is the learning agent that makes decisions across content variables, channels, and timing using customer data as the context, creating personalized marketing experiences.
Fitting the model to the problem: Differences between LLM and reinforcement learning-based agents
There are at least two crucial differences between how LLM-based and RL-based agents work, and they point to the different types of personalization problems each can solve.
LLM and RL systems handle planning and decision-making differently. LLMs are built to predict the next token in a sequence and have emergently demonstrated reasoning. Given a problem and prompted to carry out a task, current LLMs can generate a series of tasks based on the knowledge embedded in the model and provided context, then carry out those tasks depending on their training and available functions.
The RL model at the core of AI Decisioning isn't responsible for breaking down a problem and planning tasks. Instead, the system is built to carry out a constrained set of actions. It decides on messages, their timing, and their channel and optimizes those decisions according to a user-provided goal. Its decision-making is built on the foundation of the tight feedback loop at its core, enabling the model to adjust to customer behaviors while optimizing its decisions over time.
AI Decisioning's agents are also built to execute a single large task over a long duration, while the tasks typically given to LLM-based agents are shorter and more limited. Because the system uses a reinforcement learning model, it improves over time to achieve its goal and can—to a certain degree—be set and left to run.
LLMs don’t inherently learn, though. They can improve through better prompts, more or different contextual data, or fine-tuning on new data. To carry out longer tasks requiring increasing accuracy or optimization based on new data, engineers must build feedback loops, context improvements, and repeated fine-tuning into the overall AI system in which the LLM is embedded. Though expensive, this engineering work is possible and may be compelling for use cases requiring flexible reasoning capacity or the interactive interfaces common to LLM-based agentic systems.
Building around RL models is the most compelling option for tasks where improvement over longer durations is needed and where the action space is constrained.
Decisioning agents for personalized marketing
The fast emergence of LLM-based agents has catalyzed conversations about AI systems that can act to achieve marketing goals. These agents excel at tasks that require reasoning and generative capacities, such as creating personalized content, creating derivatives from high-value content assets, and planning marketing campaigns. As we’ve explored, though, there are compelling reasons to look beyond LLMs when building agents for marketing personalization.
Reinforcement learning offers a different approach to agency built on continuous learning and optimization. While LLM-based agents adeptly handle complex, creative tasks, RL-based agents excel in scenarios where sustained optimization and adaptation to customer behaviors are vital. They’re well-suited to marketing personalization tasks with measurable outcomes, constrained action spaces, and the need for ongoing improvements.
The future of marketing automation lies in composite systems that leverage both LLMs and RL, taking advantage of each where it shines. At Hightouch, we’re building AI Decisioning to deliver a truly personalized marketing experience through continuous experimentation and learning. When we talk about AI agents in marketing, it’s not just because the term is currently fashionable. We’re describing systems that autonomously adapt to individual customer behaviors over time.
This is the vision for personalized marketing we're building: marketers should see each crafted experience and be surprised at how well it fits the customer; customers should feel like their experience is just for them.