After several months of hard work, we’re finally ready to launch our new search engine. It still has a few bugs, but all in all, we’re pretty happy with it. You can take the beta for a test drive by visiting GregBoser.com.

OK, we didn’t really build that engine. We’re just borrowing it to make a point.

In the Gokart thread, Gray Wolf asked what a webmaster could do to protect their site from DNS shenanigans. I was working on an answer and trying to find a good example of a site whose server was set up to deflect domains/host names that it does not control. I figured Google would be the best example, so I pointed gregboser.com at a Google IP so I could show everyone how Google denies the request.

Unfortunately, that didn’t really work out, so let’s take a look at how Yahoo handles it instead.

If you point a host name like yahoo-test.webguerrilla.com at Yahoo’s IP, or even if you try to access the site by the IP address, you don’t get Yahoo’s home page. Instead, you get what looks like a custom 404 page that contains a noindex,follow robots meta tag and a 60-second meta refresh that sends you to www.yahoo.com.

The links embedded in the page also use absolute URLs, so if a bot originally requests the wrong domain or host name, it will still be able to extract good URLs to follow. For the most part, it’s a pretty good setup. The only potential downside I see is the fact that they serve an HTTP status code of 200 with the custom 404. This is obviously done so that the appropriate links can be followed, but I’m not sure if a 200 would prevent Google from consolidating the backlinks of two separate domains.
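To make the behaviour described above concrete, here is a minimal Python sketch of a handler that serves that kind of fallback page for any host name it doesn't recognise. This is not Yahoo's code; the function name `respond`, the `KNOWN_HOSTS` set, the placeholder page text and the `status_for_unknown` parameter are all assumptions made up for illustration.

```python
from html import escape

CANONICAL_URL = "http://www.yahoo.com/"  # the refresh target named in the post
KNOWN_HOSTS = {"www.yahoo.com"}  # assumption: host names the server answers for


def respond(host, status_for_unknown=200):
    """Return (status, body) for a request carrying the given Host header."""
    # Drop an optional port and normalise case before comparing.
    name = host.split(":")[0].lower()
    if name in KNOWN_HOSTS:
        return 200, "<html><body>home page</body></html>"

    # Unknown host name or bare IP: serve the fallback page, not the site.
    body = (
        "<html><head>"
        '<meta name="robots" content="noindex,follow">'
        f'<meta http-equiv="refresh" content="60; url={escape(CANONICAL_URL)}">'
        "</head><body>"
        "<p>Sorry, the page you requested was not found.</p>"
        # Absolute link, so a bot on the wrong host still finds a good URL.
        f'<a href="{escape(CANONICAL_URL)}">Yahoo!</a>'
        "</body></html>"
    )
    return status_for_unknown, body
```

Calling `respond("yahoo-test.webguerrilla.com")` returns the fallback page with a 200, which is the case questioned above. Passing `status_for_unknown=404` shows the alternative of serving the same page with a real 404.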

Here is how we handle it on this site:

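RewriteEngine On

# Domains we control (.net): 301 to the same path on the main site
RewriteCond %{HTTP_HOST} !webguerrilla\.com [NC]
RewriteCond %{HTTP_HOST} webguerrilla\.net [NC]
RewriteRule (.*) http://www.webguerrilla.com/$1 [R=301,L]

# Wildcard: any other host name, or the bare IP, is sent away from the site
RewriteCond %{HTTP_HOST} !webguerrilla\.com [NC]
RewriteCond %{HTTP_HOST} !webguerrilla\.net [NC]
RewriteRule (.*) http://www.fark.com/ [R=301,L]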

Any domains we control will be redirected via 301 to www.webguerrilla.com. The last entry is a wildcard: any request for a host we have not defined, including a request for the bare IP, is redirected away from the site.

Here is an example of a domain that is pointing to the IP, but is handled by the wildcard entry:

www.webguerrilla.org
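If you want to check which bucket a given host name falls into before touching the server, the decision made by the two rules above can be modelled in a few lines of Python. This is only a sketch: the function name `route` and the two constants are made up for illustration, the host is lowercased as an assumption (host names are case-insensitive), and `path` is assumed to arrive without its leading slash, as it does when the rules live in an .htaccess file.

```python
import re

CANONICAL = "http://www.webguerrilla.com/"
ELSEWHERE = "http://www.fark.com/"


def route(host, path=""):
    """Return the 301 target for a request, or None if it is served as-is."""
    host = host.lower()  # assumption: compare host names case-insensitively
    # Same unanchored patterns as the RewriteCond lines.
    is_com = re.search(r"webguerrilla\.com", host) is not None
    is_net = re.search(r"webguerrilla\.net", host) is not None

    # Rule 1: a domain we control (.net) goes to the same path on the main site.
    if not is_com and is_net:
        return CANONICAL + path
    # Rule 2: anything else that is not ours (other domains, the bare IP).
    if not is_com and not is_net:
        return ELSEWHERE
    # Host contains webguerrilla.com: no redirect.
    return None
```

For example, `route("www.webguerrilla.net", "about/")` gives `http://www.webguerrilla.com/about/`, while `route("www.webguerrilla.org")` gives `http://www.fark.com/`. Note that the patterns are unanchored, so a host such as `webguerrilla.com.example.org` would count as ours; anchoring the patterns is one possible tightening.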

Now, we could send any wildcards to www.webguerrilla.com using a 301, but doing that won’t prevent the anchor text problem that we covered in the Gokart thread. That’s because Google also merges backlinks through 301s.

There are many variations of this approach that will work, and none of them are extremely complicated. IMO, it’s definitely worth investing a little bit of time to set it up.

(Maybe I should send some code over to the plex??)

Comments

3 Responses to “ We’ve Launched Our Own Search Engine ”

  1. phantombookman on August 26th, 2006 2:51 am

    I finally registered !

    Maybe I should send some code over to the plex??

    Greg,
    Send a transcript over to Bruce Clay as well no doubt they need something for today’s blog entry!

    Posts like this sort the wheat from the chaff in terms of SEO type blogs

  2. pageoneresults on August 27th, 2006 9:42 am

    Genius at work.

  3. notsleepy on August 27th, 2006 5:32 pm

    Great additions to my .htaccess template. Thanks!
