links not followed

Hi,

I have stripped the index.html from all links on my site, because Google reported that I have duplicate titles. Google no longer considers the link with and without the index.html to be the same.

Now I have the problem that your sitemap generator doesn’t follow links without a filename extension. What now?

Jos

Comments

  1. AMPC says:

    That is incorrect – the sitemap generator does follow links to files without extensions. I have a large number of sites that do not use extensions on the filenames and sitemaps are created correctly and without problem.

    You’ll need to provide a url if you want me to explain where you went wrong – I’m confident that something else on your site is wrong and if you don’t fix it, it could hurt your ranking.

    You can send me a private message if you would like to keep the url to yourself.

    Regards,

    Jim.

  2. bolognese says:

    Hi Jim,

    Thanks for your reply.

    About a year ago I noticed that the sitemap creator did not follow links without a filename, so I added the filename to the menu links on some entry pages.

    Now Google seems to have changed its crawl policy, and has started complaining about pairs of urls that have the same description, title etc. Google meant, for example, vanderburgt.eu/croatia/ and the same link + index.shtml, which is what I had to add to make the sitemap creator follow all links. I was not really careful in how I used links, but from now on I will be. I have found some tips on rewriting the url containing the filename to the link without it.

    With the sitemap creator I get no more than 2050 links, while Google has increased the number of indexed links to 2253 over the past couple of weeks (it was only 1300).

    I am looking forward to your answer.

    Jos

  3. AMPC says:

    Send me the url of the site and I can take a look. I use this on my sites that don’t have filenames without any problems at all.

    You can send me a private message (click on my ID) if you want it private.

    Best regards,

    Jim.

  4. bolognese says:

    Strange, I did mention the url of my site in my previous message, but now it’s unreadable. You asked me to give you the url, so that’s what I did. Of course I don’t care about mentioning my site’s name here. It is of course vanderburgt.eu.

    Jos

  5. bolognese says:

    You still haven’t solved the problem, Jim. As you cannot read the link of my homepage from my previous messages: http://vanderburgt.eu

    Jos

  6. AMPC says:

    Hello Jos,

    Both the posts above have your url in them and I was able to follow them. I ran the sitemap generator on your site and it found far more than 2050 links. I didn’t filter anything, so perhaps that may have inflated the amount (images, style sheets, etc).

    Everything looks fine to me. What exactly is the issue you are concerned with?

    Regards,

    Jim.

  7. bolognese says:

    Jim,

    I just added an album to my website and started to make a new sitemap. I only needed the url of the new html links.

    In the url field I enter http://vanderburgt.eu

    Filter is *html.

    Exclude images is checked.

    I get 2351 urls in the sitemap, but none of the albums on the carnaval page, which is accessible from the link "/carnaval/" in the top menu. That link is the only referrer to the carnaval page. Many other pages are found through backlinks, because the homepage has some links that include a filename. The exception is carnaval/index.html, which only has "/carnaval/" as referrer anywhere on my site.

    Jos

  8. bolognese says:

    Hi Jim,

    I created a sitemap with and without index.html added to the link /carnaval/ on my homepage, and both give me the same number of found pages. So it’s true that for your sitemap generator it doesn’t matter whether or not the link ends in index.html.

    But what causes this link not to be followed while every other link is?

    Jos

  9. bolognese says:

    Is there anybody else who can shed some light on this?

    Jos

  10. AMPC says:

    Hi Jos,

    I just ran your site (not full, just enough to monitor the spidering process) and used an exclude of .html and I’m seeing entries like these:

    /carnaval/klooneconcours2008/
    /carnaval/groteoptochtkerkradecentrum2008/
    http://vanderburgt.eu/carnaval/mijncarnavalinkerkradedoordejarenheen/

    This is what you want, no?

    Perhaps I’m missing something, but the sitemap generator is seeing your carnival directories. Is this not what you want?

    Regards,

    Jim.

  11. bolognese says:

    Now I see what my problem is. With an include filter of *html, no link is followed that consists only of the domain + directory, that is, one without a filename ending in html.

    One more question: I still get image links (ending in .jpg) in the results even if I tick the "exclude images" box.

    Thanks for your help.

    Jos

  12. AMPC says:

    Hi Jos,

    That question deserves a thread of its own, so follow this link on excluding images from your sitemap.

    Regards,

    Jim.
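
The cause Jos identifies in comment 11 can be illustrated with a small sketch. It does not reproduce the sitemap generator's own code, which is not shown in this thread. It only assumes that the include filter behaves like a shell-style wildcard applied to the path of each url; the helper name `passes_include_filter` and the extra pattern `*/` are hypothetical, and whether the generator accepts more than one include pattern is not confirmed here.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def passes_include_filter(url, patterns):
    """Return True if the url's path matches at least one wildcard pattern.

    Hypothetical helper: it models an include filter as shell-style
    wildcard matching, which is an assumption about the generator.
    """
    path = urlsplit(url).path
    return any(fnmatchcase(path, pattern) for pattern in patterns)


urls = [
    "http://vanderburgt.eu/carnaval/",            # directory link, no filename
    "http://vanderburgt.eu/carnaval/index.html",  # same page with filename
    "http://vanderburgt.eu/croatia/index.shtml",  # also ends in "html"
]

# With only "*html", the bare directory link is filtered out.
only_html = [u for u in urls if passes_include_filter(u, ["*html"])]

# Adding a (hypothetical) "*/" pattern lets directory links through as well.
html_or_dir = [u for u in urls if passes_include_filter(u, ["*html", "*/"])]

print(only_html)
print(html_or_dir)
```

Under that assumption, a link such as /carnaval/ never matches *html, so a crawler that applies the include filter before following links would not reach the albums behind it, while the same link with index.html appended would pass.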
