I’m getting multiple results for some pages, with different select, sort, and/or session id parameters.
Example:
2550 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=date
2551 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=state
2552 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=location
2553 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=course
2554 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=sponsor
2555 http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=cost
Is there a way to get the tool to ignore anything after a question mark, and thus combine entries like the six above into a single one?
Thanks
Jim

Ok, I found a way to do this with exclusions. It would be nice if this was an option in the future though – a checkbox on the Settings Page perhaps.
I used a few different ? exclusions:
*php?*
*/?PHP*
*/?s*
This seems to work ok. There is still one version of the pages in question without any parameters. I was afraid it might eliminate the page entirely but it didn’t seem to.
As far as I can tell it didn’t eliminate anything else that I didn’t intend to eliminate. It’s a little hard to tell with so many pages to skim through, but it looks ok.
Passing parameters after a ? seems very common, and it’s been an issue with other website tools I use too, which is why a checkbox option would be nice to have.
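For anyone wondering what those three exclusion patterns end up matching, here is a rough Python sketch. It assumes that `*` means "any run of characters", that `?` is taken literally, and that matching is case-sensitive; the sitemap tool's real matching rules may differ. The names `wildcard_to_regex` and `is_excluded` are made up for this illustration.

```python
import re

# The three exclusion patterns from the post above.
EXCLUSIONS = ["*php?*", "*/?PHP*", "*/?s*"]

def wildcard_to_regex(pattern):
    # Assumption: "*" is the only wildcard; every other character,
    # including "?", must appear literally in the URL.
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(pieces), re.DOTALL)

COMPILED = [wildcard_to_regex(p) for p in EXCLUSIONS]

def is_excluded(url):
    # A URL is dropped if any pattern matches the whole URL.
    return any(rx.fullmatch(url) for rx in COMPILED)

urls = [
    "http://www.avalanche-center.org/Education/Courses/NewMexico.php",
    "http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=date",
]
kept = [u for u in urls if not is_excluded(u)]
print(kept)
```

Under these assumptions the bare `NewMexico.php` page survives because it contains no `php?`, which matches what was observed above.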
Hello Jim,
I am building a list of improvements for the next version, so I would like to learn more about your request for sitemap exclusions.
You mentioned that an exclusions checkbox would be nice. Could you explain that in more detail? What exactly would you like to see in the next release?
Regards,
Jim.
I’m thinking that if a checkbox was checked then anything after a ? in a url would be ignored. So all the ones listed above would then be the same and listed as one page.
I don’t think a ? should appear in a url unless it’s before parameters such as sort keys, etc. Although I’m not 100% certain.
Right now I’m excluding the urls with the ? in them and counting on at least one version of the page showing up without the ?. I think it would be more reliable if these pages were not excluded, but condensed into a single page whenever everything before the ? is the same. This way it shouldn’t miss a page that only shows up with various parameters and, for some reason, doesn’t show up at all without a ?.
Hopefully that makes sense.
I’ve seen similar behavior (listing multiple variations) in an old version of linkscan I use (great for checking links, but it doesn’t do sitemaps and isn’t free, so my version is very old now).
The urls above are the result of php code on the page to select and sort info from a database listing. The other exclusion rules come from scripts for reciprocal links and for something else; these are larger, more complicated scripts I installed but didn’t write.
Jim
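The "condense instead of exclude" idea described above could look something like the following Python sketch. It is only an illustration of the requested behavior, not how the sitemap tool works; `strip_query` and `condense` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    # Keep scheme, host and path; drop everything from the "?" onward
    # (the fragment after "#" is dropped too).
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def condense(urls):
    # Collapse URLs that are identical before the "?" into one entry,
    # preserving the order in which each page was first seen.
    seen = {}
    for url in urls:
        seen.setdefault(strip_query(url), url)
    return list(seen)

base = "http://www.avalanche-center.org/Education/Courses/NewMexico.php"
crawled = [
    base + "?tfm_order=ASC&tfm_orderby=" + key
    for key in ["date", "state", "location", "course", "sponsor", "cost"]
]
print(condense(crawled))
```

Because the page is kept under its parameter-free address rather than thrown away, a page that is only ever linked with parameters would still end up in the sitemap.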
An editable list of excluded parameters would solve the problem quite simply.
The standard start of the parameters is ? and the standard join is &, with each parameter name ending in = and the value after that.
So with that in mind, you could add an editable parameter list where you enter the name that sits between ? and =, and it will be excluded from the sitemap.
The problem with the above solution would be ecommerce and CMS websites, as these use a parameter like ?page=newproduct (whichever page it is). All these pages, which make up the majority of the pages on these sites, would be excluded. Now I am not sure if you need them in your sitemap, but if you do then that would be a big problem, and there may be similar problems on other types of sites.
hope that helps
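One possible reading of the parameter-list idea is to drop only the listed parameter names and leave everything else, such as `page`, in place. The Python sketch below works under that assumption; the `EXCLUDED_PARAMS` list, the function name and the `example.com` URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical user-edited list of parameter names to ignore.
EXCLUDED_PARAMS = {"tfm_order", "tfm_orderby"}

def drop_excluded_params(url, excluded=EXCLUDED_PARAMS):
    parts = urlsplit(url)
    # Split the query on "&" and "=", keeping pairs whose name is not listed.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in excluded
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

print(drop_excluded_params(
    "http://www.avalanche-center.org/Education/Courses/NewMexico.php?tfm_order=ASC&tfm_orderby=date"
))
print(drop_excluded_params(
    "http://example.com/index.php?page=newproduct&tfm_order=ASC"
))
```

With this approach the sort variations collapse to one address, while a `?page=newproduct` style URL stays distinct because `page` is not on the list.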
Thanks for the suggestions, I’ll stick this on top!