Searchify, running the full open-sourced IndexTank search-as-a-service API
HoundSleuth, IndexTank Compatible API
IndexTanktoGO, IndexTank Compatible API
Bimaple, IndexTank Compatible API
IndexDen, IndexTank Compatible API
There's also my own http://websolr.com/ running Apache Solr. Some other Solr services are mentioned elsewhere on the page.
I've also recently launched http://bonsai.io/ for a hosted ElasticSearch service, because ElasticSearch is actually quite awesome (and I'm happy to answer questions about why).
For Sphinx, there's Flying Sphinx (by Pat Allan of Thinking Sphinx Ruby client fame, great guy), and IndexDen (which is Sphinx, not IndexTank).
There's a lot to be said for ElasticSearch's data distribution. It does sharding and replication really well. That makes my life easier as a service provider, and it makes life easier for anyone who has to manage and scale an ES cluster, or who doesn't want to deal with client-side sharding or worry about how many servers can crash before their search goes down.
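As a rough illustration of what that distribution involves, here is a simplified Python sketch. ElasticSearch documents its routing rule as `hash(_routing) % number_of_primary_shards`, with `_routing` defaulting to the document ID; the `crc32` hash below is only a stand-in for ES's own hash function, and the round-robin placement in the hypothetical `assign_copies` helper is not ES's actual shard allocator. What it does share with ES is the rule that no node holds two copies of the same shard, which is why a cluster with one replica per shard can lose any single node without losing data.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard by hashing the document ID.

    crc32 is a stand-in for ElasticSearch's own hash; it is used here only
    because it gives the same result in every Python process.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def assign_copies(num_primary_shards, num_replicas, nodes):
    """Hypothetical round-robin placement: shard s gets its primary on
    nodes[s % n] and its replicas on the following nodes, so no node
    ever holds two copies of the same shard."""
    copies = num_replicas + 1
    if copies > len(nodes):
        raise ValueError("need at least one node per copy of each shard")
    return {
        shard: [nodes[(shard + c) % len(nodes)] for c in range(copies)]
        for shard in range(num_primary_shards)
    }


def survives(layout, failed_nodes):
    """True if every shard still has at least one copy on a live node."""
    return all(
        any(node not in failed_nodes for node in holders)
        for holders in layout.values()
    )
```

With five nodes, five shards and one replica, every single-node failure leaves all shards available, while losing two neighbouring nodes can take a shard offline. Raising `num_replicas` raises the number of failures tolerated, at the cost of more copies to store.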
ElasticSearch has very little ceremony around creating a new index and getting started with using it. You will eventually need to do some configuration to tune its behavior for your specific application, but the learning curve is nice and gradual. This makes ES great for exploration.
The JSON document store aspect of ElasticSearch is indeed very nice. The RESTful API is simple enough that you don't really need a dedicated client: just grab your favorite HTTP client library and start integrating. Plus, coupled with solid distribution, you're looking at a pretty viable standalone data store, IMO.
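To show how little ceremony that involves, here is a sketch using nothing but Python's standard `urllib.request`. It assumes a node at `http://localhost:9200` (the default port) and a hypothetical index named `articles`; the paths follow current ElasticSearch versions (`/{index}/_doc/{id}`), whereas older releases put a mapping type name in place of `_doc`. The requests are only built here, and `send` is what would actually hit a running node.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default port


def es_request(method, path, body=None):
    """Build (but don't send) a JSON request against the ElasticSearch REST API."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(ES_URL + path, data=data, method=method)
    req.add_header("Content-Type", "application/json")
    return req


# Create an index; shard and replica counts are the only settings given.
create = es_request("PUT", "/articles", {
    "settings": {"number_of_shards": 2, "number_of_replicas": 1}
})

# Store a JSON document under an explicit ID.
index_doc = es_request("PUT", "/articles/_doc/1", {
    "title": "Hosted search",
    "body": "ElasticSearch, Solr and Sphinx",
})

# Full-text search using the JSON query DSL.
search = es_request("POST", "/articles/_search", {
    "query": {"match": {"body": "solr"}}
})


def send(req):
    """Send a request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(create)`, then `send(index_doc)`, then `send(search)` against a live node is the whole integration; any other HTTP library would work the same way.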
Also, very good documentation. And its user/developer community is full of really smart, enthusiastic early-adopter types right now :)
Not least, Lucene itself is hands down the last word when it comes to search.
Last I checked, Sphinx had a huge design flaw in that it indexed directly from an SQL database. In other words, your Sphinx configuration not only needs to have read access to the database, it needs to contain the required SQL queries.
This tightly couples Sphinx to your application and your schema, and creates serious issues for your ops team since every app change potentially needs to modify the Sphinx config. It gets particularly hairy when you want to host multiple applications using a single Sphinx daemon.
We started out with Sphinx for our apps but quickly discarded it in favour of ElasticSearch, a much more elegant and orthogonal piece of software.
I always find Sphinx limiting. For example, to add a single doc to the index, I have to run a full re-index.
Also, I can't programmatically get a list of all "words" in the index with their frequency, inverse doc freq, etc. With anything Lucene-based this kind of thing is really easy.
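Both complaints are easy to picture with a toy inverted index. The Python sketch below is hypothetical (`TinyIndex` is not how Lucene or Sphinx store data): `add` updates the postings for one new document without re-indexing the others, and `term_stats` lists every term with its total frequency, document frequency and a textbook IDF of log(N / df). Lucene keeps per-term statistics such as document frequency in its term dictionary, but its scoring uses a different IDF formula, so treat these numbers as illustrative. The sketch also assumes each `doc_id` is added only once.

```python
import math
import re
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index: term -> {doc_id: count in that doc}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.num_docs = 0

    def add(self, doc_id, text):
        """Index one document incrementally; existing postings are only
        extended, never rebuilt. Assumes doc_id has not been added before."""
        self.num_docs += 1
        for term in re.findall(r"\w+", text.lower()):
            counts = self.postings[term]
            counts[doc_id] = counts.get(doc_id, 0) + 1

    def term_stats(self):
        """Yield (term, total frequency, document frequency, idf) per term."""
        for term in sorted(self.postings):
            docs = self.postings[term]
            df = len(docs)
            idf = math.log(self.num_docs / df)  # textbook idf, not Lucene's formula
            yield term, sum(docs.values()), df, idf


idx = TinyIndex()
idx.add(1, "search as a service")
idx.add(2, "hosted search")
stats = {term: (tf, df, idf) for term, tf, df, idf in idx.term_stats()}
```

Here "search" appears in both documents, so its IDF is log(2 / 2) = 0, while "hosted" appears in one of two documents and gets log 2.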
Why the bashing on Elasticsearch? We are using it to index log files; we have over 275 million documents in our index and performance has been pretty impressive.
We're running on five EC2 instances, each running Elasticsearch configured to use 25GB of RAM. With the current data set we might be able to get by with less RAM; we're still figuring out what works best for us.
Yes, but billing is hourly, as usual:
You'll be billed based on the number of running search instances. There are three search instance sizes (Small, Large, and Extra Large) at prices ranging from $0.12 to $0.68 per hour (these are US East Region prices, since that's where we are launching CloudSearch).
PS: It looks like it's initially available only in the US East Region.
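To put the quoted hourly rates in monthly terms, here is a small Python calculation. It assumes an instance runs around the clock and uses 730 hours as an average month (365 * 24 / 12); `monthly_cost` is just an illustrative helper. At the low end of the quoted range that works out to $87.60 a month per instance, and $496.40 at the high end.

```python
# Assumption: an average month of 365 * 24 / 12 = 730 hours, instance always on.
HOURS_PER_MONTH = 730


def monthly_cost(hourly_rate, instances=1, hours=HOURS_PER_MONTH):
    """Dollar cost of keeping `instances` search instances running for `hours`."""
    return round(hourly_rate * instances * hours, 2)


# Low and high ends of the quoted $0.12 to $0.68 per hour range.
for label, rate in [("low end", 0.12), ("high end", 0.68)]:
    print(f"{label}: ${monthly_cost(rate):.2f} per month")
```

The `instances` parameter shows how quickly this scales: even two of the cheapest instances running all month come to roughly $175.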
Yeah, there's a side project I've been wanting to build for a long time, and it needs search. But the way these prices have been presented, it seems that CloudSearch is just not economically feasible for a SaaS / free multiuser offering.
Needing Solr- or Sphinx-based search is what pushed me out of most of the PaaS offerings and onto my own VPS. It's unfortunate that most of the hosted services out there are too expensive for side projects.
When I looked at WebSolr, the cost exceeded that of my entire VPS setup, even on the cheapest plans.
We think our prices actually compare pretty well to self-hosting. Hosting a search engine is not cheap when it comes to memory and disk IO, and I don't envy anyone trying to shoehorn production Solr traffic into a small VPS.
Not to mention, we've got transparent replicated redundancy on all our indexes (one of our better-kept secrets; I really need to update our marketing materials), so to match that you'd need to double your VPS count.
After setting Solr up, I definitely see the value that you guys provide for high-traffic sites. But there's a difference between production-level needs and hobbyist-level needs for side projects.
For side projects I'd be okay with indexing being less aggressive, sizes being more restrictive, and response times being higher if that made the pricing more accessible.
I'd be happy to give you guys more money when I've got the traffic to justify it, but it'd be nice to be able to flip that on when I need it.