301 Redirect creates extra urls in sitemap

I have implemented a 301 Redirect to send all variations of mysite.com/index.php to mysite.com/.

This should prevent "duplicate" urls and, theoretically, help my Google ranking.
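
As a rough illustration of what such a redirect rule decides, here is a minimal Python sketch. It is not the actual rule used on the site; the function name `redirect_target` and the default `index_name` are assumptions made for the example. It returns a target only when the path ends in index.php, so the canonical address itself is never redirected again.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def redirect_target(url: str, index_name: str = "index.php") -> Optional[str]:
    """Return the URL a 301 should point to, or None if no redirect is needed.

    Hypothetical helper: it only models the decision, it does not send headers.
    """
    parts = urlsplit(url)
    # Split "/blog/index.php" into directory "/blog" and filename "index.php".
    directory, _, filename = parts.path.rpartition("/")
    if filename != index_name:
        # Already canonical. Redirecting here as well would create a loop.
        return None
    # Keep scheme, host, query string and fragment; drop only the filename.
    return urlunsplit(
        (parts.scheme, parts.netloc, directory + "/", parts.query, parts.fragment)
    )
```

For example, `redirect_target("http://mysite.com/index.php")` gives `http://mysite.com/`, while `redirect_target("http://mysite.com/")` gives `None`.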

Now, however, Sitemap Generator creates a sitemap with both the pre-redirect addresses and the new, redirected addresses, which sort of defeats the purpose. And Google’s sitemap manager does not like all these extra urls.

So, should I go back and change all my internal links to reflect the redirect? I’d rather not, as I thought the redirect was intended to handle that job.

Or, is there a way to filter the old urls (mysite.com/index.php) out of the sitemap?
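
One way to filter the old urls after the sitemap has been generated is to post-process the XML file. The sketch below is an assumption-laden example, not a feature of Sitemap Generator: it assumes the file follows the standard sitemaps.org 0.9 format, and the function name `drop_urls_containing` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def drop_urls_containing(sitemap_xml: str, needle: str = "index.php") -> str:
    """Return the sitemap XML without any <url> entry whose <loc> contains needle."""
    # Serialize with a default namespace instead of an "ns0:" prefix.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy of the list, because entries are removed while looping.
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and loc.text and needle in loc.text:
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```

This only hides the old urls from the sitemap. Any page that still links to index.php will keep sending crawlers through the redirect.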

Comments

  1. AMPC says:

    Hello Slobjones,

    First, thanks for the question, as it led me to learn that my sitemap generator will create a sitemap with the 301s included. The purpose of showing the 301s is to make the user aware of the redirect and where the traffic is being moved to. However, I didn’t intend for the 301s to be written to the sitemap file and will correct this next time around.

    The purpose of a 301 is to tell the bots that you would like any traffic headed towards page1 to be redirected to page2 (or site2) permanently.

    You do this in the header of your file or in your .htaccess. You then remove any links to page1 and point them to page2. The 301 is not intended to be used as a shortcut for changing a large number of files; rather, it’s there to tell the bots that the page has permanently moved and to update their records.

    So, yes, you should change all your internal links to the new pages from the old pages.

    You can filter the old files from the sitemap, but, like my sitemap generator, the bots will still see a ton of links to old pages, try those old pages, and then get redirected to the new pages.

    Be careful when redirecting index.php to the site address, as this could end up in an endless loop (which will chew up server resources and time out). The way to fix the mysite.com/index.php issue is to modify all your links that refer to that file so they go to mysite.com instead. Both addresses serve the same page.

    I’m not sure how your links are structured, but if you have your navigation links in an include file, you should only need to update one file.

    I hope that helps!

    Best regards,

    Jim.

  2. slobjones says:

    I removed all references to index.php from my site, and that’s why I’m surprised to see that the sitemap generator again produced the index.php urls, along with the new, redirected urls.

    Any idea why that happened?

    I removed the index.php urls with a row filter and uploaded the new, slimmer sitemap to Google.

    But why did the generator produce those index.php urls, when they no longer exist on my site? I’m mystified.

    Where necessary, I replaced index.php with a forward slash (/). Everything seems to be functioning normally on my site.

    I’m doing exactly that with my navigation. PHP includes are very handy tools.

  3. slobjones says:

    The more I look at this, the more it seems I’ve gotten away from my original purpose, which was a very simple one: redirect links from mysite.com/index.php to mysite.com.

    That’s all I want to do. Rewriting all internal links requires getting deep into the structure of my blog — not just the database content — and rooting out every reference to index.php. I’m not going to do that.

    So, here’s what I have: a 301 redirect presently sends visitors from mysite.com/index.php to mysite.com.

    The sitemap generator now creates two sets of links. For consistency’s sake, I’d like to filter out the new ones and leave the index.php set. Unfortunately, those urls are all titled "301 Moved." Bad form!

    What should I do?

    Thanks.

  4. slobjones says:

    On consideration, I realized that I’d better finish the job and remove all references to index.php from my site.

    And so I did.

    The sitemap generator, however, still generates 301 redirects and urls with index.php!

    I found it’s easy enough to isolate the extra urls with the row filter and delete them. The extra urls, however, effectively double the time it takes to generate a sitemap.

    That’s my report!

  5. AMPC says:

    If the sitemap generator is finding the index.php, then something is referencing it by name – there is no other way it can happen.

    In the sitemap generator under Sitemap, look for the link with index.php, follow it to column In-l and click the + in that column; this expands and shows you all the urls (webpages) that have a link with index.php in it. From there, it will be easy to fix.

    Also, although you would not want this as the solution (it would mean the index.php reference is still out there), you can use the exclude section to ignore all index.php references.

    Best regards,

    Jim.
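
The in-link check described in the last comment can also be approximated outside the sitemap generator. The following is a minimal sketch using only the Python standard library. The names `HrefCollector` and `pages_linking_to` are hypothetical, and it assumes you already have each page's HTML as a string (for example, saved from the browser or read from disk).

```python
from html.parser import HTMLParser
from typing import Dict, List


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pages_linking_to(pages: Dict[str, str], needle: str = "index.php") -> Dict[str, List[str]]:
    """Map each page address to the links on that page that contain needle.

    `pages` maps a page address to its HTML source. Pages with no matching
    links are left out of the result.
    """
    report: Dict[str, List[str]] = {}
    for page_url, html in pages.items():
        collector = HrefCollector()
        collector.feed(html)
        hits = [href for href in collector.hrefs if needle in href]
        if hits:
            report[page_url] = hits
    return report
```

Each key in the result is a page that still references index.php by name, which is where the link needs to be changed.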
