


Webmaster Tool - Sitemap Generator Questions about building a sitemap, including the XML Sitemap, HTML sitemap, Google and Yahoo! Sitemaps.


Stripping ?... parameters such as session id

  #1 (permalink)  
Old 03-23-2007,
Junior Member
 
Join Date: Mar 2007
Posts: 6
Stripping ?... parameters such as session id

I'm getting multiple results for some pages, with different select, sort, and/or session id parameters.

Example:

2550 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=date
2551 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=state
2552 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=location
2553 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=course
2554 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=sponsor
2555 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=cost

Is there a way to get the tool to ignore anything after a question mark, and thus combine entries like the six above into a single one?

Thanks

Jim
  #2 (permalink)  
Old 03-24-2007,
Junior Member
 
Join Date: Mar 2007
Posts: 6
Solved using exclusion rules

Ok, I found a way to do this with exclusions. It would be nice if this was an option in the future though - a checkbox on the Settings Page perhaps.

I used a few different ? exclusions:

*php?*
*/?PHP*
*/?s*

This seems to work ok. There is still one version of the pages in question without any parameters. I was afraid it might eliminate the page entirely but it didn't seem to.

As far as I can tell it didn't eliminate anything else that I didn't intend to eliminate. It's a little hard to tell with so many pages to skim through, but it looks ok.

Passing parameters after a ? seems very common, and it's been an issue with other website tools I use too, which is why a checkbox option would be nice to have.
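
For anyone who wants to check rules like these against a URL list before running the tool, here is a minimal Python sketch. It assumes that * is the only wildcard and that ? is matched literally, which is a guess about how the exclusion rules behave, not something the tool's documentation confirms. The names pattern_to_regex and is_excluded are made up for illustration.

```python
import re

# The three exclusion rules from the post above.
EXCLUSIONS = ["*php?*", "*/?PHP*", "*/?s*"]


def pattern_to_regex(pattern):
    """Turn an exclusion rule into a regex.

    Assumption: "*" matches any run of characters and every other
    character, including "?", is literal.
    """
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile("^" + ".*".join(pieces) + "$")


_COMPILED = [pattern_to_regex(p) for p in EXCLUSIONS]


def is_excluded(url):
    """Return True if the URL matches any exclusion rule."""
    return any(rx.match(url) for rx in _COMPILED)


urls = [
    "http://www.avalanche-center.org/Education/Courses/NewMexico.php",
    "http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=date",
    "http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=state",
]

# Only the parameter-free version of the page survives the filter.
kept = [u for u in urls if not is_excluded(u)]
```

Under these assumptions the bare page stays in the list because it has no ? after php, which matches what I saw in the results.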
  #3 (permalink)  
Old 03-24-2007,
AMPC
Administrator
 
Join Date: Jan 2007
Posts: 1,415
Sitemap Exclusions

Hello Jim,

I am building a list of improvements for the next version, so I would like to learn more about your request for sitemap exclusions.

You mentioned that an exclusions checkbox would be nice. Can you explain that in more detail? What exactly would you like to see in the next release?

Regards,

Jim.
  #4 (permalink)  
Old 03-25-2007,
Junior Member
 
Join Date: Mar 2007
Posts: 6

I'm thinking that if a checkbox was checked then anything after a ? in a url would be ignored. So all the ones listed above would then be the same and listed as one page.

I don't think a ? should appear in a url unless it's before parameters such as sort keys, etc. Although I'm not 100% certain.

Right now I'm excluding the urls with the ? in them and counting on at least one version of the page showing up without the ?. I think it would be more reliably thorough if these pages were not excluded, but condensed into a single page when all before the ? is the same. This way it shouldn't miss a page that only shows up with various parameters, and for some reason doesn't show up at all without a ?.

Hopefully that makes sense.
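
To make the idea concrete, here is a rough Python sketch of the "condense instead of exclude" behavior. It is only an illustration of the request, not how the tool works. The function names strip_query and collapse are hypothetical, and the sketch assumes that dropping everything after the ? (including any #fragment) is acceptable for every page.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def collapse(urls):
    """Condense URLs that are identical before the "?" into one entry.

    First-seen order is preserved. A page that is only ever linked with
    parameters still ends up in the result, as its parameter-free form.
    """
    seen = set()
    result = []
    for url in urls:
        base = strip_query(url)
        if base not in seen:
            seen.add(base)
            result.append(base)
    return result


base = "http://www.avalanche-center.org/Education/Courses/NewMexico.php"
crawled = [
    base + "?tfm_order=ASC&tfm_orderby=" + key
    for key in ("date", "state", "location", "course", "sponsor", "cost")
]

# All six variations condense into the single parameter-free page.
condensed = collapse(crawled)
```

Note that the crawled list in this example never contains the bare page, yet it still shows up once in the result, which is the case the exclusion approach could miss.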

I've seen similar behavior (listing multiple variations) in an old version of linkscan I use (great for checking links, doesn't do sitemaps and isn't free so my version is very old now).

The urls above are the result of php code on the page to select and sort info from a database listing. The other exclusion rules come from scripts for reciprocal links and for something else; these are larger, more complicated scripts I installed but didn't write.

Jim
  #5 (permalink)  
Old 10-29-2007,
Junior Member
 
Join Date: Oct 2007
Posts: 4

An editable excluded-parameter list would solve the problem quite simply.

The standard start of a parameter list is ? and the standard join is &, with each parameter name followed by = and then its value.

So with that in mind, you could add an editable parameter list where you enter the name that appears between ? and =, and any URL containing it would be excluded from the sitemap.

The problem with the above solution would be ecommerce and CMS websites, as these use a parameter such as ?page=newproduct (whichever page it is). All these pages, which make up the majority of the pages on these sites, would be excluded. Now I am not sure if you need them in your sitemap, but if you do then that would be a big problem, and there may be similar problems on other types of sites.
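
One way around that problem, sketched in Python below, is a variant of the suggestion: rather than excluding the whole URL, drop only the listed parameter names and keep everything else, so ?page=newproduct survives. This is an illustration under assumptions, not the tool's behavior. The list contents, the function name drop_params, and the shop.example address are made up for the example (PHPSESSID is just a typical session id parameter name).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical exclusion list: the names that appear before "=".
EXCLUDED_PARAMS = {"tfm_order", "tfm_orderby", "PHPSESSID"}


def drop_params(url, excluded=EXCLUDED_PARAMS):
    """Remove the listed parameters from a URL and keep the rest."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    # urlencode([]) is "", so a URL left with no parameters loses its "?".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Sort parameters are stripped, leaving the plain page.
sorted_page = drop_params(
    "http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=date"
)

# The content-selecting "page" parameter is kept; only the session id goes.
product_page = drop_params("http://shop.example/index.php?page=newproduct&PHPSESSID=abc123")
```

After this step the remaining duplicates can be removed with an ordinary "seen" set, since variations that differ only in excluded parameters now come out as the same string.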

hope that helps
  #6 (permalink)  
Old 10-30-2007,
AMPC
Administrator
 
Join Date: Jan 2007
Posts: 1,415
Suggestions on Top.

Thanks for the suggestions, I'll stick this on top!