AI Publisher Licensing Deals: How They Actually Shape Brand Visibility in AI Search

The Reddit deal is public. The mechanism behind it is not.
What actually matters for your brand strategy is a pair of more specific questions:
- What does a licensing contract do to the outputs your customers see?
- What does that mean for brands that are not party to those agreements?
These questions are harder to answer than the headline numbers suggest, and most commentary on this topic never gets there.
OpenAI struck a deal with Reddit reportedly worth around $70 million per year, giving its models real-time access to Reddit's Data API. Google signed a separate Reddit agreement at around $60 million per year. The News Corp partnership with OpenAI was structured at more than $250 million over five years, covering The Wall Street Journal, The Times, and Barron's, among others.
Every major AI lab is now paying, in some form, for the content that trains its models or feeds its retrieval systems.
Key takeaways:
- AI publisher deals affect brand visibility in two distinct ways, and most brands are managing the wrong one
- Foundational model knowledge and retrieval-augmented generation are not the same thing, and your generative engine optimisation (GEO) strategy needs to treat them differently
- Between 40% and 60% of AI-cited sources change month to month, meaning AI visibility is not a ranking you earn once
- A five-step audit gives you a measurable baseline instead of a framework to guess from
What Are AI Publisher Licensing Deals?
AI publisher licensing deals are commercial contracts between AI companies and content owners, granting AI labs the right to use published content for model training or real-time retrieval.
These are not the same as RSS feeds or standard crawl permissions. A crawl permission lets a system index content for search. A licensing deal goes further: it gives the AI lab a contractual right to use that content as training material, as a live retrieval source, or both. When the content is used for training, it becomes part of the foundation from which the model learns about the world.
The pace of these deals has accelerated since 2023, driven by two pressures:
- AI labs needed to expand training datasets beyond what open web crawls could provide.
- Publishers threatened legal action after discovering their content had been used without permission or compensation.
The result is a commercial layer that now sits between published content and the AI systems your customers use to make purchase decisions.
What this table does not show is what the money actually buys. That is exactly where most brand strategy goes wrong.
What Do Those Contracts Actually Buy?
The answer depends entirely on whether the deal is for training data or retrieval access. These are fundamentally different, and they affect your brand visibility in completely different ways.
A training data deal shapes what the model has learned before it answers any question. Call it long-term memory. Brands that appear consistently in that training material become part of the model's baseline understanding of the world — in plain terms, what it will say about your category without looking anything up.
A retrieval deal works differently. It means content is accessed in real time while the model is answering a specific query. This is what the industry calls retrieval-augmented generation, or RAG: the model searches for current sources to support its answer. The content does not change what the model knows permanently. It only shapes that one response.
The Reddit deal is primarily a retrieval deal. OpenAI gained access to Reddit's Data API, letting its models pull current Reddit discussions during queries. That is a very different thing from Reddit content being permanently encoded in the model's knowledge. The News Corp deal is structured differently. Some of that content has gone into training data, meaning the model has built lasting associations with those publications and the topics they cover.
For brand visibility, the practical difference is this:
- Training data shapes the model's unprompted recommendations.
- Retrieval data shapes the model's sourced answers.
Your strategy needs to address both. Most current frameworks only address one.
How Do AI Publisher Deals Affect Your Brand Visibility?
Now you know the two types of deal. Here is what that means for your brand.
Brand visibility in AI search comes from three intersecting sources:
- What the model already knows (foundational knowledge)
- What it retrieves during a query (RAG)
- How it interprets the context of the user's intent
If your brand has a sustained presence in high-authority publications included in AI training data, the model associates you with your category before it retrieves anything. If your content appears on platforms AI systems use for live retrieval, such as Reddit, YouTube, and licensed news outlets, you show up in cited answers. If neither applies, your brand exists in the model's world only to the extent that your website or secondary mentions provide enough signal to surface you at all.
Most brands are absent from both.
The competitive implication is direct. Brands with strong foundational presence get recommended in responses where the model never searches for anything. As AI assistants get better at answering from their own knowledge, foundational presence becomes the more valuable asset. Brands that only exist in retrieval results will become invisible for the queries that matter most.
What Is the Difference Between Training Data and RAG?
Training data is what the model carries into every conversation. RAG is what it goes looking for during one.
The practical difference shows up the moment you ask an AI a category question. Ask ChatGPT: "What's the best project management software for a remote team?" In many responses, Asana or Notion appears without a single cited source. That is foundational knowledge at work. Those brands have been in enough high-authority content, over enough time, that the model associates them with that category by default. It does not need to search. It already knows.
Now ask: "What project management tools launched in 2025?" The model will typically search and return results with citations from recent articles. That is RAG. The model went looking because its foundational knowledge has a cut-off date.
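The retrieval step can be pictured with a toy sketch. Everything in it is an assumption made for illustration: `LIVE_SOURCES`, `retrieve`, and `build_prompt` are hypothetical names, the URLs are placeholders, and word overlap stands in for whatever relevance scoring a real assistant uses. The point it shows is narrow: retrieved text is attached to one prompt and is not stored in the model.

```python
import re

# Hypothetical mini-corpus standing in for sources reachable at query time.
LIVE_SOURCES = [
    {"url": "https://example.com/pm-launches-2025",
     "text": "Roundup of project management tools launched in 2025"},
    {"url": "https://example.com/crm-guide",
     "text": "A buyer's guide to CRM platforms"},
]


def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, sources, top_k=1):
    """Rank sources by shared words with the query (a toy relevance score)."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(s["text"])), s) for s in sources]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matching sources
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


def build_prompt(query, retrieved):
    """Attach retrieved text to this one prompt; nothing is written back to the model."""
    if not retrieved:
        # No sources found: the model would answer from foundational knowledge alone.
        return f"Question: {query}"
    context = "\n".join(
        f"[{i}] {s['text']} ({s['url']})" for i, s in enumerate(retrieved, start=1)
    )
    return f"Sources:\n{context}\n\nQuestion: {query}"


question = "What project management tools launched in 2025?"
hits = retrieve(question, LIVE_SOURCES)
prompt = build_prompt(question, hits)
```

A query with no matching source falls through to the bare question, which mirrors the unsourced, foundational answer described earlier.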
The difference matters for strategy because the two types of visibility require different actions:
- Foundational visibility comes from sustained presence in high-authority content over months and years. It builds slowly and compounds.
- Retrieval visibility comes from being on the right platforms with the right content structure. It can be improved quickly but changes constantly.
Most brands chase retrieval visibility because it is faster to act on and easier to measure. Foundational visibility is the harder, slower work. It is also the work that builds a position competitors cannot copy in a quarter.
Real-time optimisation strategies cannot fully compensate for a weak training data footprint. Evertune reports that this finding holds when base model responses, queried through direct model API access, are tested separately from search-augmented ones.
What Are the Four Signals That Drive AI Citations?
The signals that determine whether your brand appears in AI-generated answers fall into four categories. Most GEO frameworks cover three. The fourth determines long-term, unprompted visibility.
Entity clarity
AI models organise their knowledge around entities: named companies, people, products, and concepts with consistent, verifiable attributes. A brand that appears across sources with a consistent name, description, and category is easier for the model to represent accurately.
Entity clarity comes from structured data (schema markup on your site), Wikipedia presence, consistent information across business directories, and persistent brand language across authoritative publications. Without it, the model may hold fragmented knowledge of your brand, or in some cases, quietly merge you with a competitor in the same category.
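As a minimal sketch of the schema markup piece, the snippet below builds a schema.org `Organization` object as JSON-LD. The helper name `organization_jsonld` and all the values passed to it are hypothetical placeholders; which properties matter most for any given AI system is an assumption, not something the deals or the labs document.

```python
import json


def organization_jsonld(name, url, description, same_as):
    """Build a schema.org Organization description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,                # use the exact brand name used everywhere else
        "url": url,
        "description": description,  # keep this wording consistent across profiles
        "sameAs": list(same_as),     # links to the brand's other official profiles
    }
    return json.dumps(data, indent=2)


# Placeholder values for an imaginary brand.
markup = organization_jsonld(
    name="Example Co",
    url="https://www.example.com",
    description="Example Co makes project management software for remote teams.",
    same_as=["https://www.linkedin.com/company/example-co"],
)
```

The resulting string would typically be embedded in a page inside a `<script type="application/ld+json">` element.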
Content extractability
AI systems favour content that leads with the answer. A paragraph that buries its key claim in the fourth sentence is harder for the model to incorporate into a synthesised response than one that opens with the point and elaborates afterwards.
This is not about bullet-pointing everything. It is about removing the narrative build-up that works for human readers but adds noise for AI systems trying to extract a usable answer.
Platform authority
AI models pull from specific sources: Reddit threads, YouTube transcripts, LinkedIn articles, peer-reviewed publications, and licensed news content. A brand that participates genuinely in these environments builds a presence across the platforms AI systems trust.
Reddit matters here not only because OpenAI paid for API access, but because real user discussions about your brand are exactly the kind of third-party signal AI models are designed to surface.
Foundational model footprint
This is the signal most GEO frameworks leave out. It builds through sustained presence in high-authority, training-eligible content over time: coverage in publications AI labs license, academic references, media mentions, and consistent brand association across years of authoritative text.
Building foundational model knowledge is a 12- to 18-month project, not a campaign. Brands that start now earn a compounding advantage over those that treat GEO as something to revisit next quarter.
Why Does Traditional SEO Fall Short in AI Search?
Traditional SEO is built on ranking logic. A piece of content earns a position in a search index through signals an algorithm can evaluate: backlinks, relevance, click-through rate, site health.
AI search has no ranking index. It has a generation process. The model does not sort your content against competitors and return the best match. It synthesises a response from what it already knows and what it retrieves. The competition is not for rank position. It is for inclusion in the material the model draws on. The data makes this concrete: 62% of pages cited in AI Overviews do not rank in Google's top 10 for the same query, a finding we document in our competitor gap analysis for AI search.
The measurement gap is just as sharp. Traditional SEO gives you rankings, traffic, and click data. AI visibility gives you none of those by default. A brand can appear in thousands of AI responses per month and record zero corresponding traffic, because the user got a complete answer without clicking anywhere.
Then there is the volatility problem. Between 40% and 60% of AI-cited sources rotate out month to month. AI visibility is not a position you defend. It is a share of voice you actively manage. You can only manage what you measure, and most brands are currently measuring neither their AI visibility nor how quickly it changes.
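One way to put a figure on that rotation is to compare the sets of sources cited in two consecutive months. The sketch below is an assumption about how such a figure could be computed, not the method behind the 40% to 60% range; `citation_turnover` is a hypothetical helper and the domains are invented sample data.

```python
def citation_turnover(previous, current):
    """Share of last month's cited sources that no longer appear this month."""
    previous, current = set(previous), set(current)
    if not previous:
        return 0.0  # nothing was cited last month, so nothing could rotate out
    return len(previous - current) / len(previous)


# Invented sample data: domains cited for one tracked prompt set.
march = {"reddit.com", "youtube.com", "example-news.com",
         "example-blog.com", "example-review.com"}
april = {"reddit.com", "youtube.com", "example-news.com",
         "example-forum.com", "example-wiki.com"}

turnover = citation_turnover(march, april)  # two of five March sources dropped out
```

Tracking this figure per prompt set, rather than across all prompts at once, makes it easier to see which topics are churning.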
Can Brands Without Publisher Deals Still Win AI Visibility?
Yes. The licensing deals shape the playing field. They do not close it to everyone else.
Publisher deals primarily determine which content sources AI models treat as high-authority training material and trusted retrieval targets. Brands that appear consistently in those sources benefit. But the system is not sealed. AI models still pull from the open web, surface community discussions, and carry knowledge from sources that predate any licensing arrangement.
The Samsung example makes this concrete. Future Publishing ran a GEO campaign for Samsung using its own AI visibility tooling, and reported 28% growth in AI citations from Future-sourced content over three months. Samsung did not sign a deal with OpenAI or Google. The result came from structured content, platform leverage, and consistent presence in the right places.
A smaller-scale version of the same logic works for mid-sized brands. A B2B software company with a well-maintained LinkedIn presence, genuine participation in relevant Reddit communities, and a website structured for content extraction can build meaningful AI visibility without a publishing deal or a seven-figure content budget. It takes longer than a campaign. It compounds in a way that a campaign cannot.
The path runs through the four signals above: entity clarity, content extractability, genuine platform presence, and consistent media coverage in publications that appear in AI training and retrieval datasets. None of those require a licensing contract.
How Do You Audit Your AI Visibility?
Run a five-step quarterly audit to establish where your brand actually stands, rather than optimising from an assumed baseline.
Most GEO content describes optimisation tactics without first measuring the gap those tactics are meant to close. Auditing before optimising changes the work from guesswork to measurement.
The total time investment for a quarterly audit is four to six hours. What it gives you is something most brands currently do not have: a number.
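As an illustration of what "a number" could look like, the sketch below computes a mention rate and a simple share of voice from a batch of collected AI responses. It does not reproduce the audit steps; `mention_rate` and `share_of_voice` are hypothetical helpers, the responses are invented, and whole-word matching is a simplifying assumption.

```python
import re


def mentions(response, brand):
    """True if the brand name appears as a whole word, ignoring case."""
    return re.search(rf"\b{re.escape(brand)}\b", response, re.IGNORECASE) is not None


def mention_rate(responses, brand):
    """Fraction of responses that mention the brand at least once."""
    if not responses:
        return 0.0
    return sum(mentions(r, brand) for r in responses) / len(responses)


def share_of_voice(responses, brands):
    """Each brand's mentioning responses as a share of all tracked-brand mentions."""
    counts = {b: sum(mentions(r, b) for r in responses) for b in brands}
    total = sum(counts.values())
    return {b: (c / total if total else 0.0) for b, c in counts.items()}


# Invented responses to the same category prompt, collected by hand or by tooling.
responses = [
    "For remote teams, Asana and Notion are common choices.",
    "Many teams start with Notion.",
    "Consider Example Co or Asana.",
    "It depends on your workflow.",
]
rate = mention_rate(responses, "Asana")
sov = share_of_voice(responses, ["Asana", "Notion", "Example Co"])
```

Re-running the same prompts each quarter and recording these figures gives a baseline that later optimisation can be compared against.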
What Does the Shifting Licensing Landscape Mean for Your Strategy?
The licensing landscape is not static, and brands that treat it as fixed will be managing yesterday's competitive environment.
The EU AI Act now requires providers of general-purpose AI models to publish a summary of the content used for training and to maintain a policy for respecting copyright opt-outs, while separate transparency provisions cover the labelling of AI-generated content. Failure to publish the required training data summary can result in fines of up to 3% of annual worldwide turnover. This creates a compliance incentive for AI labs to formalise their data relationships, which means the published deal landscape will become more transparent over the next two years, not less.
The global market for AI training dataset licensing was valued at $4.8 billion in 2025 and is projected to reach $22.6 billion by 2034, growing at 18.8% per year. That growth reflects both increased AI development activity and the formalisation of data supply chains that were previously informal or legally ambiguous.
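The three figures in that projection can be checked against each other with a short compound-growth calculation. This sketch assumes 2025 is the base year and that growth compounds annually over nine years; `compound` and `implied_cagr` are hypothetical helper names.

```python
def compound(value, annual_rate, years):
    """Grow a value at a fixed annual rate for a number of years."""
    return value * (1 + annual_rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes start to end over the given number of years."""
    return (end / start) ** (1 / years) - 1


years = 2034 - 2025                     # nine compounding periods (assumed)
projected = compound(4.8, 0.188, years)  # in billions of dollars
rate = implied_cagr(4.8, 22.6, years)

print(f"Projected 2034 value: ${projected:.1f}bn")
print(f"Implied annual growth: {rate:.1%}")
```

Under those assumptions the quoted start value, end value, and growth rate are consistent with one another.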
Deals will multiply, and the content inside them will turn over as new publications are added and existing agreements renegotiated. A brand that builds visibility through a single publication or platform may find that visibility shift as the licensing structure changes beneath it.
The brands best insulated from that volatility have two things. The first is strong foundational model knowledge, which is already embedded in the model's learned understanding rather than subject to ongoing deal changes. The second is broad platform presence across multiple retrieval sources. Breadth across many sources is a more defensible position than depth in a single one.
The Reddit deal made headlines. What rarely makes headlines is the mechanism by which that deal, and dozens like it, determines which brands your customers encounter when they ask an AI what to buy. Getting that right requires understanding the difference between what AI models know and what they look up, building a strategy that addresses both, and measuring where you actually stand before optimising anything. Brands that start on this now will not be scrambling to catch up in 18 months.