Is there a way for a program to know that two URLs point to the same page if the URLs are slightly different?
I have an application that pulls in two RSS feeds and eliminates the duplicate entries. Many duplicate RSS entries have different text but point to the same landing page. The application compares the URLs of the landing pages and, if they are identical, removes one.
Some landing pages add text to the end of the URL to signify the source (e.g., …src=site1 or …source=site2).
A real-world example would be these two links:
(Yahoo shortens the links so you might have to click on them to see the end part of the urls)
URL 1: http://www.computerjobs.com/job_display.aspx?jobid=2506539
URL 2: http://www.computerjobs.com/job_display.aspx?jobid=2506539&utm_source=job_site&utm_medium=organic&utm_campaign=job_site
Is there a way I can program my application to know that these two links point to the same page? It has to work for other sites as well, not just this one, which appends "&utm_source=job_site&utm_medium=organic&utm_campaign=job_site" to its URL to identify the source.
Thanks for the help!

April 16th, 2009 at 1:47 am
Blogging Workshop
How specific does "same page" need to be? Is it just the job_display.aspx page, or does it include the job ID (the ?jobid=2506539 part)?
If only the page, then parse the URL up to the question mark and do your compare on that.
If you want it to include the job ID, then parse the URL up to the first &, if one exists. Everything after that & is the additional arguments passed to the page on top of the job ID.
Does any of this help?
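A minimal sketch of the two cuts described in this reply, in Python. The function names `page_only` and `page_with_first_arg` are made up for illustration, and the second one assumes the identifying argument (here jobid) always comes first in the query string, which may not hold on other sites.

```python
def page_only(url):
    # Keep everything before the first '?', i.e. the page without any arguments.
    return url.split("?", 1)[0]


def page_with_first_arg(url):
    # Keep everything before the first '&', i.e. the page plus its first argument.
    # Assumption: the first argument is the one that identifies the content.
    return url.split("&", 1)[0]


url1 = "http://www.computerjobs.com/job_display.aspx?jobid=2506539"
url2 = ("http://www.computerjobs.com/job_display.aspx?jobid=2506539"
        "&utm_source=job_site&utm_medium=organic&utm_campaign=job_site")

# The two example links compare equal once the extra arguments are cut off.
same_job = page_with_first_arg(url1) == page_with_first_arg(url2)
```

Note that `page_only` would treat every job on that site as the same page, so for the RSS de-duplication case the second cut is probably the one you want.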
April 19th, 2009 at 1:07 am
Article spinner
Essentially, no. All a plain string comparison can do is check the characters, so two URLs that differ by even one character look like different pages to it. A different URL can also redirect to the same page, but that's not detectable from the URL text alone.
Sorry.
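Within that limit, the comparison can still be made more forgiving by dropping arguments that only identify the source before comparing. Here is a rough sketch in Python using the standard urllib.parse module. The list of names to ignore (src, source, and anything starting with utm_) is taken from the examples in the question and is only an assumption. Other sites may use different names, so it cannot be guaranteed to work everywhere. The function name `normalize` is also just illustrative.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these argument names carry only source-tracking information.
TRACKING_NAMES = {"src", "source"}
TRACKING_PREFIXES = ("utm_",)


def normalize(url):
    """Return the URL with tracking arguments removed and the rest sorted."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in TRACKING_NAMES
        and not name.lower().startswith(TRACKING_PREFIXES)
    ]
    kept.sort()  # so argument order does not affect the comparison
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def same_page(url_a, url_b):
    # Still only a character comparison, just on the cleaned-up URLs.
    return normalize(url_a) == normalize(url_b)
```

This still cannot tell that two completely different URLs lead to the same page. It only ignores the differences you have told it to ignore.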