
Not all indexation problems are technical.
Many URLs are crawled correctly, return a 200 status, and still never rank - or quietly disappear from the index. When this happens, teams often chase crawl budget, sitemaps, or algorithm changes.
In reality, the issue is frequently index quality.
Search engines actively evaluate whether a page deserves to exist in the index at all. When a page looks like an error, placeholder, or low-value variant - even if it technically “works” - it may be treated as a soft 404 or thin content and excluded.
This guide explains:
- what soft 404s really are
- how thin pages are identified
- why valid URLs get ignored
- and how to fix index quality problems without creating new ones
If you’re deciding between noindex, canonicals, and robots rules as “fixes”, read “noindex vs canonical vs robots” first - these controls are often used to hide the symptom rather than solve the cause.
What a soft 404 actually is
A soft 404 is not a status code.
It is a classification decision made by a search engine.
A page is treated as a soft 404 when:
- it returns a 200 (or sometimes 302)
- but behaves like a missing or useless page
- and provides little or no value to users
From the engine’s perspective, the page exists - but it shouldn’t.
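The three conditions above can be approximated in a crawl audit. The following is a minimal sketch, not how any search engine actually classifies pages: the phrase list, the 200-character threshold, and the function names are all assumptions chosen for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical empty-state phrases; replace with the site's own wording.
EMPTY_STATE_PHRASES = ("no products found", "no results", "page not found")
MIN_VISIBLE_CHARS = 200  # arbitrary illustrative threshold, not a known cutoff


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the page's visible text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def looks_like_soft_404(status: int, html: str) -> bool:
    """Flag 'successful' responses whose body reads like an error or placeholder."""
    if status not in (200, 302):
        return False  # a real error code is not a *soft* 404
    text = visible_text(html).lower()
    has_empty_phrase = any(phrase in text for phrase in EMPTY_STATE_PHRASES)
    return has_empty_phrase or len(text) < MIN_VISIBLE_CHARS
```

A flag from a check like this is a prompt for manual review, not a verdict. A short page can be perfectly useful, so treat the length rule as a way to find candidates only.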
Common soft 404 patterns
Empty or near-empty pages
Examples:
- empty category pages
- tag pages with one or zero items
- search results with no matches
These often show:
- a heading
- minimal boilerplate
- no meaningful content
They look like errors without being errors.
“No results” states
Internal search or filter pages that say:
“No products found”
But still return a 200.
These pages technically exist, but serve no purpose in search results.
Auto-generated location or service pages
Examples:
- city pages with swapped names
- service pages with identical templates
- minimal unique content
At scale, these often cross the line from “templated” to “empty”.
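One rough way to see how close a set of templated pages sits to that line is to compare their visible text pairwise. The sketch below uses Python's standard `difflib.SequenceMatcher` on word lists. The 0.85 threshold and the example URLs are assumptions for illustration, and pairwise comparison only suits small samples, not a full crawl.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Word-level similarity ratio between two page texts, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def near_duplicates(pages: dict, threshold: float = 0.85) -> list:
    """Return (url_a, url_b, ratio) for page pairs whose text is nearly identical.

    `pages` maps URL -> extracted visible text. The threshold is illustrative.
    """
    flagged = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        ratio = similarity(text_a, text_b)
        if ratio >= threshold:
            flagged.append((url_a, url_b, round(ratio, 2)))
    return flagged


# Hypothetical sample: two city pages with swapped names, one genuine guide.
sample = {
    "/plumber-leeds": "Need a plumber in Leeds? Our Leeds team offers fast repairs and free quotes today.",
    "/plumber-york": "Need a plumber in York? Our York team offers fast repairs and free quotes today.",
    "/guide": "How to bleed a radiator step by step with the tools you already own at home.",
}
pairs = near_duplicates(sample)
```

Here only the two city pages are paired, because they differ in just two words. High similarity alone does not prove a page is worthless, but a large cluster of near-identical pages is a strong hint that the template is doing all the work.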
Expired or unavailable items
Examples:
- products that no longer exist
- listings that are permanently gone
- events that ended long ago
If such a page keeps returning a 200 without redirecting to a relevant alternative or explaining what happened, it is often classified as a soft 404.
Thin pages vs soft 404s
The difference is subtle but important.
- Thin pages: low-value, low-utility content that technically exists
- Soft 404s: pages interpreted as errors or placeholders
Thin pages may still be indexed.
Soft 404s are often excluded entirely.
Many pages move from thin → soft 404 as quality declines.
Why search engines care about index quality
Indexing everything is expensive.
Search engines optimise for:
- usefulness
- reliability
- result satisfaction
Pages that:
- fail to satisfy intent
- add no unique value
- duplicate existing results
…are candidates for exclusion, even if they are valid URLs.
This is not a penalty. It is selection.
Why soft 404s are dangerous
Soft 404s cause silent failure.
- No ranking drop warnings
- No crawl errors
- No broken pages for users
Instead:
- pages never index
- coverage slowly shrinks
- important URLs lose trust by association
On large sites, soft 404s can quietly dominate coverage reports.
How soft 404s are detected
Search engines do not publish their exact criteria, but they appear to weigh combinations of signals, including:
- content volume and uniqueness
- similarity to known error pages
- user engagement patterns
- internal link context
- historical behaviour of similar URLs
No single factor triggers a soft 404. It is pattern-based.
Thin content: not about word count
Thin content is often misunderstood as “short content”.
Length is not the issue.
Thin content is content that:
- fails to satisfy intent
- adds no new information
- exists only to fill a template
- duplicates other pages with minor variation
A 300-word page can be strong.
A 3,000-word page can be thin.
Common thin page types
Faceted category variants
Filters can generate pages that:
- repeat the same products
- add no contextual explanation
- exist only as combinations
These often look distinct to users but redundant to engines.
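Redundancy of this kind can be estimated by comparing the set of products each filtered URL lists against its parent category. This is a minimal sketch using Jaccard overlap. The function names, the 0.9 threshold, and the example URLs are hypothetical, and it assumes product IDs have already been extracted for each URL.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def redundant_variants(parent_ids, variants: dict, threshold: float = 0.9) -> list:
    """Return variant URLs whose product set is nearly the same as the parent's.

    `variants` maps URL -> iterable of product IDs shown on that URL.
    """
    parent = set(parent_ids)
    return sorted(
        url for url, ids in variants.items()
        if jaccard(parent, set(ids)) >= threshold
    )


# Hypothetical category with ten products and two filtered variants.
category = list(range(1, 11))
filtered = {
    "/shoes?sort=price": list(reversed(category)),  # same products, new order
    "/shoes?colour=red": [2, 5],                    # a genuinely narrower set
}
flagged = redundant_variants(category, filtered)
```

The sort variant is flagged because it only reorders the parent's products, while the colour filter is not. A low overlap does not make a variant worth indexing on its own. It only rules out the most obvious duplicates.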
Tag and archive pages
Especially when:
- tags are auto-generated
- archives are shallow
- pagination quickly empties out
These frequently degrade into low-value index clutter.
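Pagination that "empties out" is one case where the fix can be expressed as a simple rule: a page number beyond the last populated page should not answer with a 200. The sketch below assumes a hypothetical archive with a fixed page size, and the function name and defaults are illustrative.

```python
import math


def pagination_status(total_items: int, page: int, per_page: int = 20) -> int:
    """Return the HTTP status a paginated archive URL should answer with."""
    if page < 1 or per_page < 1:
        return 404  # malformed page numbers are not real pages
    # Page 1 always exists, even for an empty archive.
    last_page = max(1, math.ceil(total_items / per_page))
    return 200 if page <= last_page else 404
```

With 45 items and 20 per page, pages 1 to 3 return 200 and page 4 returns 404. Note that this sketch still serves page 1 of an empty archive with a 200, and whether that archive should exist at all is a separate decision.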
Programmatic SEO pages (done badly)
Large sets of pages generated without:
- unique insight
- real differentiation
- demand validation
Scale amplifies quality problems.
Fixing soft 404s the right way
Option 1: Return a real 404 or 410
Use when:
- the content should not exist
- there is no replacement
- the page has no long-term value
410 (Gone) is appropriate for permanently removed content.
Option 2: Redirect meaningfully
Use when:
- a clear alternative exists
- the intent can still be satisfied
- the redirect helps users
Avoid redirecting everything to the homepage. That often creates more soft 404s.
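Options 1 and 2 can be combined into a single decision for retired content. The following is a sketch under stated assumptions: the `RetiredItem` type and `response_for` function are hypothetical, and the choice of a 301 for the redirect is an assumption, since the guidance above only says the redirect should be meaningful.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RetiredItem:
    url: str
    replacement_url: Optional[str] = None  # a genuinely equivalent page, if any
    permanently_gone: bool = False


def response_for(item: RetiredItem) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_target) for a page whose content was removed."""
    # Redirect only to a real alternative, never to the homepage as a catch-all.
    if item.replacement_url and item.replacement_url != "/":
        return 301, item.replacement_url
    if item.permanently_gone:
        return 410, None  # Gone: removed on purpose, no replacement
    return 404, None
```

For example, a discontinued product with a direct successor yields `(301, "/products/widget-v2")`, a deleted listing yields `(410, None)`, and an item whose only "replacement" is the homepage falls through to a 404 or 410 rather than a redirect.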
Option 3: Improve the page
Use when:
- the page represents real intent
- demand exists
- the page is structurally sound
Improvements must be substantive, not cosmetic.
Option 4: noindex
Use when:
- the page is useful to users
- but should not appear in search results
- removal is intentional
This is often appropriate for internal search and empty filter states.
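One way to apply noindex to these states without touching the page template is the `X-Robots-Tag` HTTP response header. The sketch below shows only the decision logic. The function names and the rule itself (noindex for all internal search pages and for any zero-result state) are assumptions for illustration, and wiring it into a particular framework is left out.

```python
from typing import Optional


def x_robots_tag(is_internal_search: bool, result_count: int) -> Optional[str]:
    """Return an X-Robots-Tag value, or None when the page may be indexed."""
    if is_internal_search or result_count == 0:
        return "noindex"
    return None


def with_robots_header(headers: dict, is_internal_search: bool, result_count: int) -> dict:
    """Return a copy of the response headers with X-Robots-Tag added if needed."""
    out = dict(headers)
    value = x_robots_tag(is_internal_search, result_count)
    if value is not None:
        out["X-Robots-Tag"] = value
    return out
```

The page still returns a 200 and remains usable for visitors. The URL must stay crawlable for this to work, because a crawler that is blocked from fetching the page never sees the header.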
What not to do
- Do not block soft 404s in robots.txt (a blocked URL cannot be recrawled, so the engine never sees a corrected status code or a noindex)
- Do not canonicalise thin pages blindly
- Do not inflate content with filler
- Do not hide the problem behind noindex everywhere
These actions treat symptoms, not causes.
Index quality at scale
On large sites, index quality problems usually come from:
- uncontrolled URL generation
- lack of ownership over page creation
- assumptions that “Google will figure it out”
- measuring coverage, not usefulness
The fix is rarely a single rule. It is a system decision.
A simple quality test
Ask one question for each page type:
“If this page ranked, would a user be satisfied?”
If the honest answer is no, the page does not belong in the index.
Everything else flows from that.
How soft 404s relate to the rest of the system
Soft 404s interact with:
- crawl budget (wasted attention)
- parameters (duplicate low-value pages)
- pagination (empty deep pages)
- internal linking (amplified weakness)
Fixing index quality improves all of them.
Summary
Soft 404s and thin pages are not bugs. They are feedback.
They indicate:
- excess URLs
- weak intent alignment
- overproduction without governance
In 2026, successful sites are not the ones that index the most pages - they are the ones that index the right pages.
Index quality beats index quantity every time.

Kiril Ivanov
Managing Director & Performance Lead
Kiril leads strategy and execution at TwoSquares, combining technical engineering backgrounds with advanced performance marketing. Specialising in programmatic SEO, Google Ads scripting (API), and full-funnel paid media architecture, he builds systems that turn search visibility into measurable revenue for UK brands.