Latent Semantic Indexing can improve your WordPress search results — Elliott C. Back

Started by elliottback · 8 months ago

6 comments

  • Try:

    http://www.semiologic.com/projects/search-reloa...

    when it is available to increase the relevance of your search results
  • Good info on LSI. Thanks.

    I'm an aspiring search engine architect. I've been building a Presale Marketplace engine for my part of the world (E. Africa). It will be a search engine leading prospects to quality products, services, places, etc. It will be the first here. I want to give it some artificial intelligence.

    I've worked extensively with Linux|Apache|PHP|MySQL and hope to launch my solution on this platform. I recently stumbled on LSI/LSA and became very interested. I have gone through a lot of sites and documents on LSI.

    The problem is, I really can't find a straight path from where I am now to adding LSI to my platform of choice, PHP|MySQL. Where do I go from here? Please help.
  • LSI is a patented technique, and is difficult to implement properly. Don't even try it unless you read the appropriate academic papers first. I suggest http://scholar.google.com as a first resource. Even Google doesn't use LSI, at least not yet, because of its computational complexity.
  • I studied LSI in my Masters and I, too, am looking for a commercial implementation of it.

    One problem with LSI (based on the proposed implementation in the textbook) is that the indexing is not continuous. You can add an entry in your database and have it indexed just by itself. You have to re-index the entire database, which is a pain in the butt. And it might also cause the search results to change significantly from one build to another. So, Edward, if you can find a solution to this (making the indexing continuous), then you will be the next Bill Gates.

    For some applications this may not be a problem. But in general, this is bad.

    Another problem with LSI is the algorithm that finds the "nearest neighbors". I don't think people have a good solution to that yet. But in terms of searching, the time and space complexity is not so much of a concern.

    For Google, however, the problem is the immense number of pages and keywords they have to index. The complexity goes up at a rate that is at least a second-order polynomial. We may not have enough atoms in the universe to store all that information.
  • Typo in my previous message: I meant you "cannot" index one entry by itself. =)
  • There are actually continuous models. You can recompute an approximate SVD of the term-document matrix as new stuff comes in without too much work; it just won't be as accurate...
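
For readers following the re-indexing discussion above, here is a minimal sketch of one common approximate approach, usually called "folding-in": a new document is projected into the existing latent space instead of recomputing the whole SVD. It is written in Python with NumPy rather than the PHP|MySQL stack mentioned in the thread, and it is not necessarily the method any commenter had in mind. The function names and the tiny term-document matrix are hypothetical, made up only for illustration.

```python
import numpy as np


def build_lsi(term_doc, k):
    """Build a rank-k LSI model from a (terms x documents) count matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    doc_vecs = Vt[:k, :].T  # one k-dimensional row per indexed document
    return U_k, s_k, doc_vecs


def fold_in(U_k, s_k, term_counts):
    """Project a new document or query into the existing latent space.

    Computes d_hat = d^T U_k S_k^-1 without recomputing the SVD. The
    basis U_k is not updated, so accuracy degrades as more documents
    are folded in.
    """
    return (np.asarray(term_counts, dtype=float) @ U_k) / s_k


def cosine_similarities(vec, doc_vecs):
    """Cosine similarity between one latent vector and every document row."""
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(vec)
    denom = np.where(denom == 0, 1.0, denom)  # guard against zero vectors
    return (doc_vecs @ vec) / denom


# Hypothetical toy corpus: 4 terms (rows) x 4 documents (columns).
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

U_k, s_k, doc_vecs = build_lsi(A, k=2)

# A new entry arrives: fold it in instead of re-indexing everything.
new_doc = fold_in(U_k, s_k, [1, 1, 0, 0])
doc_vecs = np.vstack([doc_vecs, new_doc])

# Brute-force nearest neighbours for a query that uses only the first term.
query = fold_in(U_k, s_k, [1, 0, 0, 0])
ranking = np.argsort(-cosine_similarities(query, doc_vecs))
print(ranking)
```

The trade-off matches what the last comment describes: folding-in is cheap, but because the latent basis is frozen, results drift away from what a full rebuild would give, so a periodic full re-index is still a reasonable assumption. The brute-force cosine ranking is only practical for small collections.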
