
Crawled but not indexed: real diagnosis and solutions that work

Posicionament-Web editorial team · 04 May 2026

You open Google Search Console, go to Indexation > Pages and find URLs with the status "Crawled, but not indexed". Googlebot has visited the page, read it, and decided not to include it in the index. Without indexation there is no ranking, and the problem can affect anything from a key landing page to hundreds of product sheets. What you'll find here is the process I follow to diagnose and resolve this status on real websites: from a restaurant in Gràcia with duplicate menu pages to an e-commerce in Sabadell with 600 URLs generated by filters.

What this status really means

Google works in two phases: crawling (Googlebot visits the URL and reads the content) and indexation (Google stores the page and can show it in results). "Crawled but not indexed" confirms that the first phase has been completed, but Google has decided not to execute the second.

There is a distinction here that changes the entire approach: this is not a technical error that prevents crawling. It is a judgment. Googlebot has said: "I read this and decided it doesn't deserve to be in the index." The cause could be content quality, contradictory technical signals, or a crawl budget problem. Identifying which of the three you're dealing with is the first real step.

Status in Search Console: Crawled but not indexed
Process affected: Indexation (crawling works)
Most common cause: Low-value content or incorrect canonical
Direct impact: The page does not appear in any Google result
Typical resolution time: 2–6 weeks once the real problem is fixed

The 4 most common causes

1. Duplicate or near-duplicate content

In a fashion e-commerce in Sabadell that I audited, they had over 600 URLs generated by category filters: color, size, season, price. The content was virtually identical across all of them. Google indexed one and discarded the rest. The solution was not technical in the first instance: it was necessary to decide which URLs had real value for the user and configure rel="canonical" pointing to the main version for all others. In three months, the strategic pages recovered visibility.
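As an illustration, here is a minimal Python sketch of the canonicalization rule applied in that case: every filter variant collapses onto the clean category URL, which is the value its rel="canonical" tag should carry. The domain and parameter names are hypothetical.

```python
from urllib.parse import urlparse, urlunparse

def canonical_target(url: str) -> str:
    """Return the clean category URL a filter variant should canonicalize to."""
    parts = urlparse(url)
    # Drop the query string entirely: color, size, season and price
    # filters all collapse onto the same base category page.
    return urlunparse(parts._replace(query="", fragment=""))

filter_urls = [
    "https://example-shop.com/dresses?color=red&size=m",
    "https://example-shop.com/dresses?season=summer",
    "https://example-shop.com/dresses?price=20-50",
]

for url in filter_urls:
    print(f"{url} -> canonical: {canonical_target(url)}")
```

If some filters do deserve their own indexed page (say, a high-demand color), the rule would keep those parameters instead of dropping everything; that selection is exactly the strategic decision described above.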

2. Thin content: pages that don't answer any real intent

A physiotherapy clinic in Tarragona had 15 service pages with between 80 and 120 words each, with no differentiation between them. Google visited them and systematically discarded them. The problem was not the code: it was that none of those pages answered a real question that a patient might ask Google. We merged them into 5 well-structured pages, with clinical cases, frequently asked questions, and practical information. In six weeks, all were indexed and two were already ranking on the first page.

3. Incorrectly configured canonical tag

A rel="canonical" pointing to another URL is an explicit signal: "Don't index this one, index that one." I see this often in poorly executed migrations or WordPress themes that add incorrect automatic canonicals. The quickest way to detect it: in Search Console, use the URL inspection tool and look at the "Page indexation" section. You'll see the canonical URL that Google has detected. If it doesn't match the URL you're inspecting, you've found the problem.
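To audit this at scale instead of URL by URL, a small script can fetch each page and compare the canonical it declares against the URL itself. A minimal sketch, assuming the requests and beautifulsoup4 packages are installed; keep in mind it only reads the declared canonical, while Google may still choose a different one, which is what the inspection tool reports.

```python
import requests
from bs4 import BeautifulSoup

def check_canonical(url: str) -> None:
    """Compare the rel="canonical" a page declares with the URL itself."""
    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    tag = soup.find("link", attrs={"rel": "canonical"})
    if tag is None or not tag.has_attr("href"):
        print(f"{url}: no canonical tag found")
    elif tag["href"].rstrip("/") != url.rstrip("/"):
        print(f"{url}: MISMATCH, canonical points to {tag['href']}")
    else:
        print(f"{url}: canonical matches")

# Hypothetical URL; loop this over your exported list.
check_canonical("https://example.com/services/physiotherapy")
```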

4. Poorly managed crawl budget

On large websites, Google assigns a limited crawl budget per domain. If your website generates unnecessary URLs—session parameters, endless pagination, filters without canonical—Googlebot wastes the budget on irrelevant pages. On a local services portal in Girona, we detected that 60% of the crawl budget was going to internal search URLs. Blocking them in the robots.txt file freed up capacity for strategic pages in less than a month.
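Before deploying a robots.txt change like that one, you can verify which URLs the rule actually blocks with Python's standard library. A minimal sketch with hypothetical paths; note that the stdlib parser only understands prefix rules, while Googlebot also supports * and $ wildcards.

```python
from urllib.robotparser import RobotFileParser

# The rule applied on that portal, in essence: block internal search
# results so Googlebot stops spending crawl budget on them.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /search",
])

for url in [
    "https://example.com/search?q=physiotherapy",
    "https://example.com/services/physiotherapy",
]:
    verdict = "crawlable" if rp.can_fetch("Googlebot", url) else "blocked"
    print(f"{url}: {verdict}")
```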

What makes the difference: Most tutorials look for technical errors when they see this status. What you need to do first is ask yourself whether the content deserves to be indexed. In our experience, more than 60% of cases are resolved by improving content, not touching the code.

How to diagnose it in Search Console

Follow this order before touching anything. Making changes without prior diagnosis is the main reason why many businesses go months without results:

  1. Export the complete list. Search Console → Indexation → Pages → "Crawled but not indexed" → Export. Sort the URLs by type: products, categories, blog posts, landing pages, archive pages. This will let you identify patterns, for example whether all affected URLs contain a specific parameter (a sketch that automates this grouping appears after this list).
  2. Individual inspection of priority URLs. For each strategic page, use the URL inspection tool. Pay attention to three things: (a) the canonical URL detected by Google, which must match the URL you're inspecting; (b) whether there is a noindex tag in the source code; (c) the date of the last crawl: if the page hasn't been visited for weeks, the problem could be crawl budget or a lack of internal links.
  3. Check the source code directly. On the affected page, press Ctrl+U and search for noindex and canonical. A hidden noindex in a meta tag or HTTP header is easy to miss if you only look at the CMS. I've seen it in staging deployments where someone forgot to change the configuration.
  4. Evaluate content quality. Ask yourself: does this page answer a specific search intent? Does it have content that differentiates it from other URLs on the same website? Does it have at least 300 words with clear structure? If the answer to any of these is no, the problem is content, not code.
  5. Review server logs if you have access. Confirm that Googlebot visits the page regularly. If it hasn't visited it for weeks despite having it in the sitemap, the problem is crawl budget or insufficient internal links to that URL.
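Steps 1 to 3 can be partly automated. Here is a sketch, assuming the requests and beautifulsoup4 packages: it groups the exported URLs by pattern and then checks, for a priority URL, the two places a noindex can hide. The CSV filename, column name, and URLs are assumptions to adapt to your own export.

```python
import csv
from collections import Counter
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

# Step 1: group exported URLs by first path segment and by whether
# they carry query parameters, to surface patterns.
patterns = Counter()
with open("crawled-not-indexed.csv", newline="") as f:
    for row in csv.DictReader(f):
        parsed = urlparse(row["URL"])
        segments = [s for s in parsed.path.split("/") if s]
        key = "/" + (segments[0] if segments else "")
        if parsed.query:
            key += " (+ parameters)"
        patterns[key] += 1

for pattern, count in patterns.most_common():
    print(f"{count:4d}  {pattern}")

# Steps 2 and 3: check both places a noindex can hide on a priority URL:
# the X-Robots-Tag HTTP header and the robots meta tag in the source.
def robots_signals(url: str) -> None:
    resp = requests.get(url, timeout=10)
    header = resp.headers.get("X-Robots-Tag", "(not set)")
    soup = BeautifulSoup(resp.text, "html.parser")
    meta = soup.find("meta", attrs={"name": "robots"})
    content = meta["content"] if meta and meta.has_attr("content") else "(not set)"
    print(f"{url}\n  X-Robots-Tag: {header}\n  meta robots:  {content}")

robots_signals("https://example.com/priority-landing")
```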
Step 1: Export and sort URLs by patterns
Step 2: Technical inspection: canonical and noindex
Step 3: Evaluate content quality before touching code

How to solve it by priority order

The order matters a lot. I've seen businesses spend six months trying to solve this problem because they did the steps in the wrong order:

  1. Fix obvious technical errors first. Accidental noindex and incorrect canonicals are quick to resolve and the impact is immediate once Google crawls again. Also check HTTP headers with tools like httpstatus.io: sometimes the noindex is sent by header and doesn't appear in visible source code.
  2. Improve or merge low-value content. If a page doesn't reach 300 words or doesn't answer any clear intent, expand it with useful and specific information, or merge it into a stronger URL via a 301 redirect (see the verification sketch after this list). Don't delete without redirecting: you lose any accumulated authority and generate 404 errors that damage user experience.
  3. Clean up unnecessary URLs. Block internal search URLs, session parameters, and archive pages you don't want indexed in robots.txt. Add noindex deliberately to thank you pages, user panels, and empty categories. Every URL Google doesn't have to process is crawl budget that can go to the pages you really want to rank.
  4. Strengthen internal linking. A page with no internal links has very little signal of importance to Google. Add at least 2–3 links from pages with authority on your website to the URLs you want indexed. In many cases, this step alone is enough to unlock pages that have been unindexed for months.
  5. Request indexation manually in Search Console once you've done all of the above. Do it URL by URL for priority pages. For large volumes, update the XML sitemap, resubmit it, and wait. Don't request the same URL every day: Google ignores it and may interpret it negatively.
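For step 2, once you've merged thin pages, it's worth confirming that every old URL actually returns a 301 to its intended target rather than a 404 or a 302. A minimal sketch using the requests package; the URL pairs are hypothetical examples.

```python
import requests

# Map of merged (old) URLs to the page each should redirect to.
redirects = {
    "https://example.com/services/back-pain-old": "https://example.com/services/back-pain",
    "https://example.com/services/neck-pain-old": "https://example.com/services/back-pain",
}

for old, expected in redirects.items():
    resp = requests.get(old, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "")
    ok = resp.status_code == 301 and location == expected
    status = "OK" if ok else "CHECK"
    print(f"{old}: {resp.status_code} -> {location or '(none)'} [{status}]")
```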

Errors that prevent resolution

  • Requesting indexation without fixing the underlying problem. This is the most common error. Google will discard the page exactly the same way as before. The manual request doesn't override a quality judgment.
  • Deleting pages without redirecting them. A 404 doesn't transfer authority or signals. Always use 301 redirect to the thematically closest page or to the main category.
  • Confusing the two Search Console statuses. "Crawled but not indexed" and "Discovered but not crawled" have completely different causes and solutions. The latter indicates that Google hasn't even visited the page: the problem is usually crawl budget or lack of internal links, not content quality.
  • Generating too many new URLs without strategy. Each new filter in an e-commerce can create hundreds of URLs. A clothing store in Hospitalet tripled the number of pages in a year without any canonical policy. Result: 70% of product sheets were left out of the index. Define from the start which URLs you want indexed and block the rest.

If you have a website in Barcelona, Girona, Lleida, or any other Catalan city and want to know exactly which pages are losing visibility and why, ask us for a free SEO audit. We'll deliver you a report with the affected URLs, the probable cause, and the actions by priority order, with no commitment.

Frequently asked questions

How long does it take Google to index a page once the problem is fixed?

On websites with authority and frequent crawling, it can be a matter of days. On new or poorly linked websites, between 2 and 6 weeks. Requesting indexation manually in Search Console can speed up the process, but doesn't guarantee it if the underlying problem hasn't been resolved first. I've seen cases where the page was indexed in 48 hours and cases where it took two months because the content was still insufficient.

Can this status affect the ranking of pages that are indexed?

Yes, indirectly. A high volume of low-quality pages consumes crawl budget and can cause Google to crawl important pages less frequently. Additionally, duplicate content dilutes the thematic authority signals of the domain. It's not an immediate or dramatic effect, but on large websites the accumulation is noticeable.

What's the difference between "crawled but not indexed" and "discovered but not crawled"?

"Discovered but not crawled" means Google knows the page exists—through sitemap or a link—but hasn't visited it yet. It's usually a crawl budget or priority problem. "Crawled but not indexed" means it has already visited it and actively decided not to include it. The diagnosis and solution are completely different: in the first case you need to improve internal linking and the sitemap; in the second, you need to review quality and technical signals.

Should I worry if the affected pages are date archives or blog tags?

No, generally not. Date archive pages, tags with little content, or categories with a single entry rarely deserve to be in the index. What I recommend is adding noindex deliberately to avoid Google processing them unnecessarily every time it crawls your website and to concentrate crawl budget on strategic pages.

Does a well-configured XML sitemap help solve the problem?

The sitemap helps Google discover pages and understand which ones you consider important, but doesn't guarantee indexation. If a page is in the sitemap and still isn't indexed, the problem is content quality or technical signals—canonical, noindex, duplicate content—not discovery. Solve the underlying problem first; updating the sitemap is the last step, not the first.
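If you do end up regenerating the sitemap as that last step, it only needs the pages you actually want indexed. A minimal sketch with Python's standard library, following the sitemaps.org protocol; the URL list is a placeholder.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Build a minimal urlset with one <url> entry per indexable page.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
urlset = ET.Element("urlset", xmlns=NS)
for loc in [
    "https://example.com/",
    "https://example.com/services/physiotherapy",
]:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = date.today().isoformat()

ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```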


