eCommerce Marketing & Optimization

Froogle Remnants & Google Product Search/Directory Duplicate Content Issues

I have been researching duplicate content lately for a friend and colleague of mine. While looking for examples to show him, I stumbled across some interesting cases of content being duplicated on Google’s own domain and subdomains. I understand that a site as massive as Google’s, with so many domains, subdomains, and services, must be tough to manage, and that keeping track of duplicate pages at that scale could quickly overwhelm anyone. Since they don’t seem to be aware of the issues I happened across (and there may be more), I decided to write this post to help them out with a couple of the things I found. If they listen, I will consider posting about issues with their other services.

Google Directory & Froogle Directory

In addition to the duplicate content issues I found, I also want to bring to light the many Froogle pages that still remain in Google’s index, even though Froogle was replaced by Google Product Search back in April 2007. Right now, if you type froogle.com or froogle.google.com into your address bar, you’ll be redirected to the new product service at www.google.com/products. This is common when changing domains or file locations. Using 301 (permanent) redirects for such moves is good practice, as Google stated about a year ago on the Webmaster Central blog.
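For readers who have not set one up, here is a minimal sketch of what a 301 redirect looks like at the HTTP level, written as a Python WSGI app. The target URL comes from the post. Everything else is an assumption for illustration: the `redirect_app` name, dropping the old path, and carrying over only the query string. It is not how Google actually configured froogle.com.

```python
# Minimal WSGI app that answers every request with a 301 redirect.
# NEW_BASE is taken from the post; the rest is an illustrative assumption,
# not Google's actual configuration.
NEW_BASE = "http://www.google.com/products"


def redirect_app(environ, start_response):
    """Permanently redirect any request to the new product search URL."""
    # Carry the query string over so old search links still land on results.
    # (The old path is deliberately dropped in this sketch.)
    query = environ.get("QUERY_STRING", "")
    location = NEW_BASE + ("?" + query if query else "")
    start_response(
        "301 Moved Permanently",
        [("Location", location), ("Content-Type", "text/plain")],
    )
    # WSGI bodies are iterables of bytes; query strings arrive latin-1 decoded.
    return [("Moved permanently to " + location).encode("latin-1")]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever()` will serve it, and any request should come back with a 301 status and a `Location` header. For search engines, the status code is what matters: a 301 tells crawlers the move is permanent, whereas a 302 signals a temporary one.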

Froogle Is Now Google Product Search

Many of you are already aware that Froogle was the product search site powered and operated by Google and that it has since been replaced by Product Search. However, Google appears to be having a tough time getting the older pages on the Froogle subdomain (froogle.google.com) out of its index (or updated). These include the Shopping List pages that users created back when Product Search was still Froogle.

Froogle.google.com Search Results

The screenshot above shows the number of English results still in Google’s index for the query site:froogle.google.com. Keep in mind that these are only the English results. There are also foreign-language pages on the Froogle subdomain, but I left them out of my example because Google may be using them in a way that I am not aware of.

As I sifted through the results, I noticed quite a few interesting things. These include duplicates of Google Directory pages under the froogle.google.com subdomain and many Google Shopping List pages still residing on froogle.google.com. I can understand why Google has yet to update the Shopping List pages, since it’s not a very popular service. In my opinion, though, they should, because it’s up to them to practice what they preach and set the bar for the rest of us. Rather than keeping the froogle.google.com/shoppinglist/ location, I would 301 redirect those pages to www.google.com/products/shoppinglist. I would also settle on a single version of each URL: either www. or non-www., and either with or without the trailing slash (an easy fix).
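The redirect suggested for the Shopping List pages boils down to a simple path-prefix rewrite. This sketch assumes the old pages live under `froogle.google.com/shoppinglist` and the new ones under `www.google.com/products/shoppinglist`, as described above. The function name and the `/view?id=123` sub-path in the usage note are hypothetical. The sketch settles on the www., no-trailing-slash form only as an example of picking one version and sticking with it.

```python
from urllib.parse import urlsplit, urlunsplit

# Locations taken from the post; the choice of www. + no trailing slash
# is an example of "pick one version", not a confirmed Google setting.
OLD_HOST = "froogle.google.com"
OLD_PREFIX = "/shoppinglist"
NEW_HOST = "www.google.com"
NEW_PREFIX = "/products/shoppinglist"


def shopping_list_redirect(url):
    """Return the 301 target for an old Froogle shopping-list URL, or None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != OLD_HOST:
        return None
    path = parts.path
    # Match "/shoppinglist" itself or anything below "/shoppinglist/",
    # but not look-alikes such as "/shoppinglistx".
    if path != OLD_PREFIX and not path.startswith(OLD_PREFIX + "/"):
        return None
    # Keep the part after the old prefix and drop any trailing slash so
    # every variant collapses onto one target URL.
    rest = path[len(OLD_PREFIX):].rstrip("/")
    return urlunsplit(("http", NEW_HOST, NEW_PREFIX + rest, parts.query, ""))
```

For example, `shopping_list_redirect("http://froogle.google.com/shoppinglist/")` returns `"http://www.google.com/products/shoppinglist"`, and `"http://froogle.google.com/shoppinglist/view?id=123"` maps to `"http://www.google.com/products/shoppinglist/view?id=123"`. Anything outside the old prefix returns `None`, so a server can fall through to its normal handling.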

But what about the Google Directory results on the froogle.google.com subdomain? I can’t understand why these pages even exist, as Froogle was never really part of the Google Directory (powered by DMOZ). I was amazed to see that they are still indexed. At first I didn’t think too much of it, since those pages may simply not have been recrawled recently. However, when I clicked on one, I was not redirected. Instead, I was taken to a Google Directory results page on the froogle.google.com subdomain. Weird. Take a peek below.

  1. First, go to http://froogle.google.com/Top/Shopping/
  2. Next, take away the froogle part and go to http://www.google.com/Top/Shopping/

What you see is duplicate content: the exact same page, served from two different locations. You typically see this with the www. and non-www. versions of a domain rather than with subdomains. Google handles the www. case correctly here; try http://google.com/Top/Shopping/. The old froogle pages should be redirected to the existing Google Directory pages instead of both versions existing. On top of that, many of the froogle.google.com pages have been crawled recently, which tells me that this has most likely been overlooked by engineers (whoops).
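To confirm that two URLs like the ones above really serve the same page, one simple approach is to fingerprint each page body and group URLs with matching fingerprints. This is a minimal sketch. It assumes you have already fetched the HTML yourself (for instance with the standard library's `urllib.request.urlopen`), and the function names are hypothetical. Hashing only catches pages that are identical after whitespace is collapsed. It does not detect near-duplicates that differ in, say, a timestamp or ad block.

```python
import hashlib


def body_fingerprint(html):
    """SHA-256 of the page body after collapsing all runs of whitespace."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages):
    """Group URLs whose bodies are identical after whitespace normalization.

    pages: dict mapping URL -> already-fetched HTML string.
    Returns a list of sorted URL lists, one per group with 2+ members.
    """
    groups = {}
    for url, html in pages.items():
        groups.setdefault(body_fingerprint(html), []).append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

For example, if the bodies fetched from http://froogle.google.com/Top/Shopping/ and http://www.google.com/Top/Shopping/ differ only in whitespace, `group_duplicates` returns one group containing both URLs. A page with different content is left out of the result.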

As I dug deeper, I was stunned to see how many froogle.google.com pages are still in Google’s index, and that half of them duplicate the Google Directory, the Open Directory, and anyone else who uses DMOZ data. On top of that, there are also duplicates of the MAIN Google Product Search page (www.google.com/products) still indexed. Take a peek below.

  1. First, go to http://froogle.google.com/products
  2. Next, go to http://froogle.google.com/index.html
  3. After that, take away the froogle part and go to http://www.google.com/products

These are all the same page, and the first two count as duplicate content. As with the Google Directory example, this is most common with the www. and non-www. versions of a domain. This time Google doesn’t get it right: try http://google.com/products and also http://google.com/products/ (note the trailing slash on the last one). All of these versions should be 301 redirected to the main version at www.google.com/products (with no trailing slash, or whichever form they choose and stick with).
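The rule described above, where every variant 301s to one preferred URL, can be expressed as a canonicalization function. This sketch assumes the host and path aliases are exactly the variants listed in this post and that the preferred form is `http://www.google.com/products` with no trailing slash. The alias tables and function names are hypothetical. The sketch is meant for the Product Search URLs only, since applying the same trailing-slash rule to the directory URLs would need its own decision.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.google.com"
# Hypothetical alias tables built from the variants listed in the post.
HOST_ALIASES = {"www.google.com", "google.com", "froogle.google.com"}
# froogle.google.com/index.html served the Product Search home page.
PATH_ALIASES = {("froogle.google.com", "/index.html"): "/products"}


def canonical_product_url(url):
    """Return the single preferred form of a Product Search URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in HOST_ALIASES:
        return url  # not one of the hosts we manage; leave it alone
    path = PATH_ALIASES.get((host, parts.path), parts.path)
    # One rule, applied everywhere: no trailing slash (the root stays "/").
    path = path.rstrip("/") or "/"
    return urlunsplit(("http", CANONICAL_HOST, path, parts.query, ""))


def needs_redirect(url):
    """True if the URL is a variant that should 301 to its canonical form."""
    return canonical_product_url(url) != url
```

A server would call `needs_redirect()` on each incoming URL. When it returns `True`, the server answers with a 301 pointing at `canonical_product_url()`. Because the canonical URL maps to itself, it never ends up in a redirect loop.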

I can’t understand why they would let this go or overlook it. It seems like they are preaching a bunch of things they don’t actually practice across all of their services. Hopefully this post gets picked up by the right people at Google and they actually act on some of this feedback. It’s not right to penalize sites for duplicate content when Google has the same issues itself. If you search for “products”, you can see that Google holds both the number one and number two results. The first is the main page, and the second is a duplicate with a session variable appended to the end of the URL.
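The second result’s problem, a session variable in the URL, is usually handled by stripping such parameters before a URL is linked or indexed. The post doesn’t say which parameter Google appended, so the names in `SESSION_PARAMS` below are hypothetical placeholders. This is a sketch of the general technique, not Google’s fix.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names: the post does not say which session
# variable was appended, so adjust this set for a real site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_params(url, session_params=SESSION_PARAMS):
    """Remove session-tracking query parameters so every visitor sees one URL."""
    parts = urlsplit(url)
    # Keep every other parameter, in its original order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in session_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

For example, `strip_session_params("http://www.google.com/products?q=tv&sid=abc")` returns `"http://www.google.com/products?q=tv"`. Note that `urlencode` re-encodes the remaining parameters, so unusual escaping in the original query may come back in a normalized form.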

Article Information

By eCopt on December 3, 2007, last modified December 3rd, 2007
Read Related Articles In: Comparison Shopping, Search Engines, eCommerce Marketing



4 Reader Comments & Links


December 4, 2007 @ 10:00 am

Wow. Really cool post, eCopt.

I’m curious to see how Google responds. Lately, they’ve repeatedly been caught saying “do as I say and not as I do.”

Please report back on this.

Posted by Tim McGuiness

December 4, 2007 @ 5:28 pm

The Froogle product pages that are already serving a redirect are probably all Supplemental Results. Once a URL changes from serving content to being a redirect, it gets tagged as Supplemental and may hang around in the index for a year or more. Those URLs are not an issue if they are redirects.

The directory has been available at http://www.google.com/Top/ and at http://directory.google.com/Top/ for a long time, but I wasn’t aware of the additional copies that you have highlighted.

The ODP itself has recently sorted out their own domain canonicalization issues. Previously, the ODP site was available through more than 27 domain name and direct IP address variants under their own control.

Posted by g1smd

December 4, 2007 @ 5:38 pm

@ g1smd – Thanks for commenting on this. I forgot to mention the directory.google.com pages, so thanks for adding that in. It gets to me because they handle this correctly on nearly all versions of nearly all of their URLs and subdomains. I just can’t understand why they take the time to do it on some and not all (for example, fixing trailing slashes and www vs. non-www versions but not certain subdomains, or vice versa).

By Froogle pages, do you mean the Shopping List pages or the ones that duplicate the Google Directory/Product Search? There are no more results labeled as “supplemental”; Google stopped doing that a while ago. Either way, the pages are indexed and fall under Google’s own definition of duplicate content.

Also, wouldn’t you think they would use their own URL removal feature (within Webmaster Central) for these pages, since they are not the preferred versions? Like I said, I understand it can be really hard to find and manage all duplicate content, but if you wrote the algo for it, it should be easier!! They do handle many of their pages well; however, it is obvious that some get overlooked.

Posted by eCopt

December 8, 2007 @ 2:30 pm

@ Tim McGuiness – Tell me about it, Tim. Sometimes just thinking about it makes me cringe, but I do like being able to say, “Hey, you’re not practicing what you preach.”

I spoke with members of the Webmaster Central team at PubCon. They told me they had seen the blog post and that others who had seen it had also made them aware of the same issue. So I guess the message is getting across, although they haven’t fixed any of it yet.

I will continue to update the post as the issues get resolved, or as soon as I notice any action (or lack of action) being taken. Thanks for the continued support and comments!! Saw your MBL avatar finally got updated, nice!

Posted by eCopt

