Webmaster Tool - Sitemap Generator: Questions about building a sitemap, including the XML Sitemap, HTML sitemap, Google and Yahoo! Sitemaps.
I use Windows XP Professional SP2 with 2 GB of memory.

I used the site generator from "Google Sitemap Generator - Free Site Map Builder, Create EASY". I have now tried to generate on my brand-new laptop (XP) as well, and got a lot further: 17000 processed and 18000 queued (we have a large site). Then it terminated, but differently: the program just vanished. On the 3rd try, the status simply shows terminated.

I have set Java to 512 on both machines. I used Xmx512m and followed the instructions on the site. Does it need to be more? I was unsure because the instruction was:

Quote:

Please send me a link to your new release.

By the way, our site is Over Dieren, de informatiebron voor dierenbezitters (About Animals, the information source for pet owners).

Thx

Last edited by dutchblues; 01-30-2007.

The tool you are using is the old one. I'll PM you the new webmaster tool details and you can start from there.

When you run the new tool, the first thing you should do is visit the system information tab and record the Free, Total and Max Memory. Max memory should be something like 508m if you entered -Xmx512m. If you don't see that, then try -Xmx256m. If that still isn't working, try -Xmx768m.

Each time you make changes, you will have to close your browser, and the Java icon on your screen should disappear; if it doesn't, then reboot. Each time you make changes, record the memory settings.

Let me know the results and we'll go from there.

Regards, Jim.

Hi Jim,

Your new tool is awesome! But... I have enormous Java problems. If I used -Xmx512M, I got error messages. I tried everything but could not solve the problem until I found this post on the Java forum: "Several Java Virtual Machines running in the same process caused an error"

Quote:

Here is an update:

The Java problems were solved with -Xmx256M. But... the crawler still didn't make it to the end. After 34421 processed locations out of 41321 queued locations, it terminates due to a shortage of memory. I know our website is very large; it is about pets and we have lots of photo albums, articles, a forum and lots of other info.

I tried -Xmx512M, -Xmx768M and -Xmx1024M, but Java wouldn't work with these settings. I tried them on 4 different PCs, all with XP SP2. My PC has 2 GB of RAM.

Any suggestions?

A few questions:

What is your browser and version?
What version of Sun's Java are you using?
What are the first three memory settings in the system window of my webmaster tool? Free, Total and Max.

I'll run your site on my end and see if anything is strange.

Regards, Jim.

What is your browser and version?
IE 6.

What version of Sun's Java are you using?
1.5.0

What are the first three memory settings in the system window of my webmaster tool? Free, Total and Max.
Memory usage:
Free memory 10,9 M
Total memory 27,2 M
Max memory 254 M

I'll run your site on my end and see if anything is strange.
My provider is complaining about a very high server load.

Regards,

I ran the webmaster tool on your site and finally figured out the problem! It was staring me right in the face and I just didn't see it.

I have spidered massive sites with a 250 max memory setting without any problems, but this time your site stopped at around 12,000 pages.

Let me tell you about an incident that caused me to drop from the search engines. It was only with the help of Google that I found the cause, and it is directly related to your situation. I had a site that dropped from the search engines (at least Google) and thought maybe I did something wrong, or perhaps there were errors I wasn't aware of. Long story short, I did a lot of testing and finally used Google Sitemaps on the site. The statistics section of Google Sitemaps showed me the problem: my ISP was blocking the Googlebot (it's used to find your web pages and add them to the search engine). Another long story short, the ISP admitted that they did this and said that it was a feature! Some feature, right? Their argument was that they were trying to prevent the server from crashing because of too many requests. They don't do that anymore.

In your case, your ISP, or something, blocked me after about 10,000 page requests. This block caused my webmaster tool to report that pages were not loading, not to mention a serious decrease in speed (page after page errored out). I verified this by opening your site in a browser and trying to navigate (visit pages). I couldn't bring up a thing. I then used another computer with a different IP address and could browse your site.

I am not saying that your ISP is blocking the Googlebot, but it's blocking, no question. Some ISPs do this to prevent DoS (Denial of Service) attacks. I'm guessing your ISP saw my IP spidering your site too fast, so they blocked it; however, I would make sure they are not blocking the Googlebot by using the Google sitemap tool to look at your crawl statistics. Sign up here: https://www.google.com/webmasters/to.../en/about.html

My tool has a number of features to prevent this. You can set the request delay by clicking on the crawler tab. This is set at 0; you can set it higher so that there is a time delay between requests. Note: the higher the number, the longer it will take to spider an entire site.

My webmaster tool also allows you to change the User Agent. The user agent is what browsers share with servers. These show up in the log files as Internet Explorer, Firefox, Opera, etc. When Google visits, it shows up as the Googlebot and looks like this in your log files:

Googlebot/2.X (+http://www.googlebot.com/bot.html)

My webmaster tool shows up as 'AuditMyPC Webmaster Tool'. You can change this with the drop-down box, or enter your own text! Yup, you could even have it say that you are the Googlebot!

Why would you fake that you're the Googlebot? If your ISP is blocking your attempt to spider your own site, it may be looking at the user agent and not the IP. If your ISP is good, it will NOT block the Googlebot, and by saying that you're the Googlebot, you would not be blocked. Actually, if it's really good, it will compare the user agent with the IP address and verify whether it really is the Googlebot, Yahoo or MSN.

OK, that was a long post! So, I believe your ISP is blocking your attempt to spider your site. Test it as mentioned above, then change the request rate, or leave it at 0 and enter Googlebot/2.X (+http://www.googlebot.com/bot.html) as the user agent.

Note: if you are already blocked, the block is probably time-limited and you'll have to wait an hour or so, or use another IP, or change your own IP address. If you need help doing this, create another post just for this.

Also, check the exclude images option AND enter this exclude filter before you run the tool: */images/*

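For readers who want to see how the three settings in this post fit together (request delay, user agent, and exclude filter), here is a minimal Python sketch. It is not the webmaster tool's code: the function names, the one-second delay, and the assumption that the exclude filter behaves like a shell-style glob are all made up for illustration. Only the user agent string and the `*/images/*` filter come from the post.

```python
import time
import urllib.request
from fnmatch import fnmatchcase

# Illustrative values only; the names below are hypothetical, not the tool's settings.
USER_AGENT = "AuditMyPC Webmaster Tool"  # user agent string quoted in the post
REQUEST_DELAY = 1.0                      # seconds between requests (the post says the tool defaults to 0)
EXCLUDE_FILTERS = ["*/images/*"]         # exclude filter suggested in the post


def is_excluded(url, patterns=EXCLUDE_FILTERS):
    """Return True if the URL matches any exclude pattern (assumed glob-style)."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


def build_request(url, user_agent=USER_AGENT):
    """Build a request that identifies itself with the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def crawl(urls, fetch, delay=REQUEST_DELAY, sleep=time.sleep):
    """Fetch every non-excluded URL, pausing `delay` seconds between requests.

    `fetch` receives a urllib.request.Request and returns whatever the caller
    wants stored for that URL (for example, the page body).
    """
    results = {}
    first = True
    for url in urls:
        if is_excluded(url):
            continue  # skip anything matching an exclude filter
        if not first:
            sleep(delay)  # throttle so the server is not flooded with requests
        first = False
        results[url] = fetch(build_request(url))
    return results
```

To fetch real pages, something like `crawl(urls, fetch=lambda req: urllib.request.urlopen(req).read())` should work; a larger `delay` makes the crawl slower but is less likely to trigger a block.
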
Oh, I forgot to mention that IE is picky and sometimes flakes out with the memory settings on Java. If you want more memory (254 should be fine), then install Firefox and run the tool through that browser.

Let me know the results; I want to see a successful solution for your site.

Regards, Jim.