


Webmaster Tool - Sitemap Generator Questions about building a sitemap, including the XML Sitemap, HTML sitemap, Google and Yahoo! Sitemaps.


Sitemap terminates

  #1 (permalink)  
Old 01-30-2007,
Junior Member
 
Join Date: Jan 2007
Posts: 6
Default Sitemap terminates

Hello,

I tried to generate a sitemap for Google several times, but it always terminates after approx. 18,000 queued locations. I have changed the Java settings to 512
  #2 (permalink)  
Old 01-30-2007,
AMPC's Avatar
Administrator
 
Join Date: Jan 2007
Posts: 1,415
Default XML Sitemap

The fix is easy; I just need to know which version of the sitemap generator you are using. If you have increased the memory to 500+, you should be able to crawl a very large site without any problems.

What OS are you running, and what was the exact parameter you used to increase the memory?

If you like, I have a new web master tool that builds XML sitemaps and will soon replace our current sitemap generator. I'll PM you the link if you're interested.

Regards,

Jim.
  #3 (permalink)  
Old 01-30-2007,
Junior Member
 
Join Date: Jan 2007
Posts: 6
Default

I use Windows XP Professional SP2 with 2 GB of memory.

Used the sitegenerator from Google Sitemap Generator - Free Site Map Builder, Create EASY

I tried to generate on my brandnew laptop (XP) now as well. Got a lot further.
Had 17,000 processed and 18,000 queued (we have a large site). Then it terminated, but differently: the program just vanished. On the 3rd try, the status just shows terminated.

I have set Java to 512 on both machines. I used Xmx512m and followed the instructions on the site. Does it need to be more? I was unsure because the instruction was
Quote:
You can take it a step further when building a sitemap (if you're still having problems) and enter '-Xmx512m'.
Quote:
-Xmx512m
as a command caused Java error messages, so I used
Quote:
Xmx512m
Update: Tried 756. Did not help. Terminated again, now at 8,405 processed and 19,784 queued. Did a complete restart before trying.

Please send me a link to your new release.

By the way: our site Over Dieren, de informatiebron voor dierenbezitters

Thx

Last edited by dutchblues; 01-30-2007 at .
  #4 (permalink)  
Old 01-31-2007,
AMPC's Avatar
Administrator
 
Join Date: Jan 2007
Posts: 1,415
Default -Xmx256m Xmx512m Xmx768m Xmx1024m

The tool you are using is the old one. I'll PM you the new webmaster tool details and you can start from there.

When you run the new tool, the first thing you should do is visit the system information tab and record the Free, Total and Max Memory. Max memory should be something like 508m if you entered -Xmx512m.

If you don't see that, then try -Xmx256m. If that still isn't working, try -Xmx768m.

Each time you make changes, you will have to close your browser and the java icon on your screen should disappear; if it doesn't, then reboot.

Each time you make changes, record the memory settings.
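
The check described above (compare the Max memory reading with the -Xmx value you entered) could be sketched roughly like this in Python. This is only an illustration, not part of the webmaster tool: `parse_xmx` and `max_memory_matches` are made-up helper names, only the m and g suffixes are handled, and the 8 MB tolerance is an assumption based on the "508m for -Xmx512m" figure mentioned above.

```python
import re


def parse_xmx(parameter: str) -> int:
    """Return the requested heap size in MB from an option such as '-Xmx512m'."""
    match = re.fullmatch(r"-Xmx(\d+)([mMgG])", parameter.strip())
    if match is None:
        # JVM options start with a dash, so 'Xmx512m' is rejected here.
        raise ValueError(f"not a valid -Xmx option: {parameter!r}")
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * 1024 if unit == "g" else size


def max_memory_matches(parameter: str, reported_max_mb: float,
                       tolerance_mb: float = 8) -> bool:
    """True if the reported Max memory is at most a few MB below the request."""
    requested_mb = parse_xmx(parameter)
    return 0 <= requested_mb - reported_max_mb <= tolerance_mb
```

If the reading does not match the value you entered, the setting was most likely not picked up, for example because the leading dash was left out.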

Let me know the results and we'll go from there.

Regards,

Jim.
  #5 (permalink)  
Old 02-01-2007,
Junior Member
 
Join Date: Jan 2007
Posts: 6
Thumbs up Lots of java problems

Hi Jim,

Your new tool is awesome!

But... I have enormous Java problems. If I used -Xmx512M I got error messages. I tried everything but could not solve the problem until I found this post on the Java forum: Several Java Virtual Machines running in the same process caused an error

Quote:
OK, get this one....we tried EVERYTHING...uninstalled/installed at least four times. The last time after seeing the same error message I had the user adjust his -Xmx setting down from 512M to 256M. THAT DID IT!

For some reason, on some machines running IE on Windows XP SP2, the -Xmx512M setting isn't very well liked!
That solved my problems too!
  #6 (permalink)  
Old 02-02-2007,
Junior Member
 
Join Date: Jan 2007
Posts: 6
Default

Here is an update:

Java problems were solved with -Xmx256M.

But .... The crawler still didn't make it till the end. After 34421 processed locations of 41321 queued locations it terminates due to shortage of memory.

I know our website is very large. It is about pets, and we have lots of photo albums, articles, a forum and lots of other info.

I tried -Xmx512M, -Xmx768M and -Xmx1024M, but Java wouldn't work with these settings. I tried them on 4 different PCs, all with XP SP2. My PC has 2 GB RAM. Any suggestions?
  #7 (permalink)  
Old 02-02-2007,
AMPC's Avatar
Administrator
 
Join Date: Jan 2007
Posts: 1,415
Default Sitemap stops working.

A few questions:

What is your browser and version?

What version of Sun's Java are you using?

What are the first three memory settings in the system window of my webmaster tool? Free, Total and Max.

I'll run your site on my end and see if anything is strange.

Regards,

Jim.
  #8 (permalink)  
Old 02-02-2007,
Junior Member
 
Join Date: Jan 2007
Posts: 6
Default

What is your browser and version? IE 6.

What version of Sun's Java are you using? 1.5.0

What are the first three memory settings in the system window of my webmaster tool? Free, Total and Max.
Memory usage
Free memory: 10,9 M
Total memory: 27,2 M
Max memory: 254 M

I'll run your site on my end and see if anything is strange.
My provider is complaining about a very high server load.

Regards,
  #9 (permalink)  
Old 02-02-2007,
AMPC's Avatar
Administrator
 
Join Date: Jan 2007
Posts: 1,415
Lightbulb Web Tool Blocked by ISP

I ran the webmaster tool on your site and finally figured out the problem! It was staring me right in the face and I just didn’t see it.

I have spidered massive sites with a 250 max memory setting without any problems, but this time, your site stopped around 12,000 pages.

Let me tell you about an incident that caused me to drop from the search engines. It was only with the help of Google that I found the cause, and it is directly related to your situation.

I had a site that dropped from the search engines (at least Google) and thought maybe I did something wrong or perhaps there were errors I wasn’t aware of. Long story short, I did a lot of testing and finally used Google sitemaps on the site.

The statistics section of Google sitemaps showed me the problem – my ISP was blocking the GoogleBot (it’s used to find your web pages and add them to the search engine).

Another long story short, the ISP admitted that they did this and said that it was a feature! Some feature, right? Their argument was they were trying to prevent the server from crashing because of too many requests. They don’t do that anymore….

In your case, your ISP, or something else in between, blocked me after about 10,000 page requests. This block caused my webmaster tool to report that pages were not loading, not to mention a serious decrease in speed (page after page errored out).

I verified this by opening up your site in a browser and trying to navigate (visit pages). I couldn’t bring up a thing. I then used another computer with a different IP address and could browse your site.

I am not saying that your ISP is blocking the Googlebot, but it is blocking, no question. Some ISPs do this to prevent DoS (Denial of Service) attacks. I’m guessing your ISP saw my IP spidering your site too fast, so they blocked it. However, I would make sure they are not blocking the Googlebot by using the Google sitemap tool to look at your crawl statistics. Sign up here: https://www.google.com/webmasters/to.../en/about.html

My tool has a number of features to prevent this. You can set the request delay by clicking on the crawler tab. It is set to 0 by default; you can set it higher so that there is a time delay between requests. Note: the higher the number, the longer it will take to spider an entire site.
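
To illustrate the idea of a request delay (not how the webmaster tool itself is written), here is a minimal Python sketch. The names `crawl_with_delay`, `fetch` and `delay_seconds` are made up for this example; `fetch` stands for whatever function downloads a page.

```python
import time
from typing import Callable, Iterable, List


def crawl_with_delay(urls: Iterable[str],
                     fetch: Callable[[str], str],
                     delay_seconds: float = 0.0,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch each URL in turn, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        # No pause before the first request; a delay of 0 means no pause at all.
        if index > 0 and delay_seconds > 0:
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

With a delay of 0 the requests go out back to back, which is the pattern a server or ISP is most likely to throttle or block.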

My webmaster tool also allows you to change the User Agent. The user agent is what browsers share with servers. These show up in the log files as Internet Explorer, Firefox, Opera, etc. When Google visits, it shows up as the Googlebot and looks like this in your log files:
Googlebot/2.X (+http://www.googlebot.com/bot.html)

My webmaster tool shows up as ‘AuditMyPC Webmaster Tool’. You can change this with the drop down box, or enter your own text! Yup, you could even have it say that you are the googlebot!
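
For anyone curious what changing the user agent amounts to at the HTTP level, here is a small Python sketch using the standard library. It only builds the request without sending it; `build_request` is a made-up helper name and this is not the tool's own code.

```python
import urllib.request

# The default name the tool reports, as mentioned above.
TOOL_AGENT = "AuditMyPC Webmaster Tool"


def build_request(url: str, user_agent: str = TOOL_AGENT) -> urllib.request.Request:
    """Create a request whose User-Agent header carries the given text."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

The server writes whatever string is sent in that header to its log files, which is why the user agent alone proves nothing about who is really crawling.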

Why would you fake that you’re the Googlebot? If your ISP is blocking your attempt to spider your own site, it may be looking at the user agent and not the IP. If your ISP is good, it will NOT block the Googlebot, and by saying that you’re the googlebot, you would not be blocked. Actually, if it’s really good, it will compare the user agent with the IP address and verify that it is or is not the Googlebot, Yahoo or MSN.
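
The "compare the user agent with the IP address" check could look something like the Python sketch below: a reverse DNS lookup on the visitor's IP, then a forward lookup on the resulting hostname to confirm it maps back to the same IP. The function name `is_verified_googlebot` and the two hostname suffixes are assumptions for this example; check Google's current documentation for the domains its crawler actually uses.

```python
import socket


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if reverse and forward DNS agree the IP is Google's."""
    # The lookups are injectable so the logic can be tried without network access.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)
        # Assumed suffixes; anything else is treated as an impostor.
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # The hostname must resolve back to the very same IP address.
        return ip in forward_lookup(hostname)
    except OSError:
        # DNS failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

A server doing this would not be fooled by a crawler that merely claims to be the Googlebot in its user agent.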


OK – that was a long post!

So, I believe your ISP is blocking your attempt to spider your site. Test it as mentioned above, then change the request rate, or leave it at 0 and enter Googlebot/2.X (+http://www.googlebot.com/bot.html) as the user agent. Note: if you are already blocked, it’s probably a timed block and you’ll have to wait an hour or so, use another IP, or change your own IP address. If you need help doing this, create another post just for this.

Also, before you run the tool, check the exclude images option AND enter this exclude filter:
*/images/*
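
As a rough idea of what a wildcard exclude filter like this does, here is a Python sketch using shell-style pattern matching. The tool's exact matching rules are not documented in this thread, so treat the behaviour below as an assumption; `is_excluded` and `filter_urls` are made-up names.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def is_excluded(url: str, patterns: Iterable[str]) -> bool:
    """True if the URL matches any exclude pattern ('*' matches any characters)."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


def filter_urls(urls: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep only the URLs that no exclude pattern matches."""
    patterns = list(patterns)
    return [url for url in urls if not is_excluded(url, patterns)]
```

Under this interpretation, any URL with /images/ somewhere in its path is skipped, which keeps photo albums from inflating the crawl queue.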
  #10 (permalink)  
Old 02-02-2007,
AMPC's Avatar
Administrator
 
Join Date: Jan 2007
Posts: 1,415
Default IE and Java Memory Xmx512m setting bug

Oh, I forgot to mention that IE is picky and sometimes flakes out with the memory settings on Java. If you want more memory (254 should be fine), then install Firefox and run the tool through that browser.

Let me know the results; I want to see a successful solution for your site.

Regards,

Jim.