Elliott Back's Blog uses DISQUS to manage its comments.

DOS vs. Index Retrieval

Started by elliottback · 8 months ago

  • Elliott,

    You make some interesting observations, and I wanted to respond.

    First of all, my bad for using the word "hacker". The word has various connotations, both positive and negative, none of which is really appropriate for this situation. After all, some of my best friends are hackers. Thanks for pointing this out. I updated my post accordingly.

    On the question of people trying to mine our databases, it's really a question of our Terms of Service. Our service is intended for non-commercial use by individuals. Like most public search engines, our service is advertising-sponsored. When someone uses our public-facing service in some other way, they are violating our Terms of Service. I'm quite sure that any other public search engine would view this issue very similarly.

    As for being open, we really try to be. We have a full-featured API that hundreds of developers have used to build some very useful applications. We have given snapshots of our data to academic and corporate researchers, in order to produce greater value from it. Use of our API and data is free for non-commercial use. We are open to commercial relationships as well, but those need to be negotiated upfront. Simply put, if someone wants our data, they should just ask us.

    On the question of our ability to handle query volume, let me say that we work hard every day, not to mention spending a lot of money every month, to ensure we have adequate capacity to deliver good performance to our users.

    In this case, it was not the query volume that gave us trouble. Rather, these data-mining programs have the effect of subverting our caches. Like most high-performance, high-volume services, we build caches based on expected service usage. These programs fall well outside expected usage patterns, and that's what gave us trouble.

    I hope this helps you understand our position a little better. Thanks for taking the time to write about us. As always, we really appreciate the input.

    Adam Hertz
    Vice President of Engineering
    Technorati, Inc.
  • Thanks for the response, Adam. It clarifies a lot about what you were thinking when you wrote that post, and the point about cache subversion is interesting. I wonder what usage pattern caused it; my guess is very high-speed mirroring. It's one thing to be crawled slowly and quite another to be mirrored.
