
You know, this doesn't seem like a bad deal, though $100/mo might be high for someone just starting out. Right now my options for search are:

  Full text SQL search
  Apache Solr or something similar
  Google Search Appliance
  Custom search
  Google free search on your site
Yay for search as a service.


There's also:

   Searchify, Running the full open sourced IndexTank Search as a Service API  
   HoundSleuth, IndexTank Compatible API  
   IndexTanktoGO, IndexTank Compatible API  
   Bimaple, IndexTank Compatible API  
   IndexDen, IndexTank Compatible API


You seem to like IndexTank ;)

There's also my own http://websolr.com/ running Apache Solr. Some other Solr services are mentioned elsewhere on the page.

I've also recently launched http://bonsai.io/ for a hosted ElasticSearch service. Because ElasticSearch is actually quite awesome (and I'm happy to answer questions about why).

For Sphinx, there's Flying Sphinx (by Pat Allen of Thinking Sphinx Ruby client fame, great guy), and IndexDen (which is Sphinx, not IndexTank).


So, why is ElasticSearch awesome, apart from being a searchable document/JSON store? That's pretty obviously awesome :P


There's a lot to be said for ElasticSearch's data distribution. It does sharding and replication really well. That makes my life easier, as a service provider, as well as the life of anyone that has to manage and scale an ES cluster. Or who doesn't want to have to deal with client-side sharding or worry about how many servers can crash before their search goes down.
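For anyone who hasn't had to do it by hand, here is a rough sketch of what "client-side sharding" boils down to. It only illustrates the idea: CRC32 stands in for whatever hash a real engine uses, and `shard_for` and `copies_per_shard` are made-up names, not part of any ElasticSearch API.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the document ID (CRC32 here, for illustration)."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_shards


def copies_per_shard(replicas):
    """Each shard exists as one primary plus `replicas` replica copies."""
    return 1 + replicas


if __name__ == "__main__":
    # The same ID always maps to the same shard, so reads know where to look.
    for doc_id in ("user-1", "user-2", "user-3"):
        print(doc_id, "-> shard", shard_for(doc_id, 5))
    # With 1 replica there are 2 copies of every shard; if the copies sit on
    # different servers, one server can go away without losing any shard.
    print("copies per shard:", copies_per_shard(1))
```

The appeal of having the engine do this is that the routing, and the placement of replica copies on different servers, stop being your application's problem.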

Here's a good video on the subject from ElasticSearch's creator: http://vimeo.com/26710663

ElasticSearch has very little ceremony around creating a new index and getting started with using it. You will eventually need to do some configuration to tune its behavior for your specific application, but the learning curve is nice and gradual. This makes ES great for exploration.

The JSON document store aspect of ElasticSearch is indeed very nice. The RESTful API is simple enough that you don't really need a client, just grab your favorite HTTP client library and start integrating. Plus, coupled with solid distribution, you're looking at a pretty viable standalone data store, IMO.
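To give a feel for "just grab your favorite HTTP client library", here is a minimal sketch using only Python's standard library. The host, the `articles` index, and the `article` type are assumptions for illustration, and the URL layout has changed between ElasticSearch versions (newer releases use `_doc` in place of a type name), so check the docs for the version you run. The functions only build the requests; nothing is sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default port


def index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT that stores one JSON document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


def search_request(index, text):
    """Build (but do not send) a query-string search against one index."""
    query = {"query": {"query_string": {"query": text}}}
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_search", data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = index_request("articles", "article", 1, {"title": "Search as a service"})
    print(req.get_method(), req.full_url)
    # To actually send it against a running node:
    #   urllib.request.urlopen(req).read()
```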

Also, very good documentation. And its user/developer community is all full of the really smart, enthusiastic early-adopter types right now :)

Not least, Lucene itself is hands down the last word when it comes to search.


And if you want a hassle-free scalable search service with automatic sharding/scaling, a Lucene underpinning and a nice REST API:

Elasticsearch


http://senseidb.com/ is another alternative that I work on; it's used for search at LinkedIn. It's API compatible with Elasticsearch.


And if you want an actually good search product there is always SphinxSearch.


Last I checked, Sphinx had a huge design flaw in that it indexed directly from an SQL database. In other words, your Sphinx configuration not only needs to have read access to the database, it needs to contain the required SQL queries.

This tightly couples Sphinx to your application and your schema, and creates serious issues for your ops team since every app change potentially needs to modify the Sphinx config. It gets particularly hairy when you want to host multiple applications using a single Sphinx daemon.

We started out with Sphinx for our apps but quickly discarded it in favour of ElasticSearch, a much more elegant and orthogonal piece of software.


That's not true, you can pipe in data from any source:

http://sphinxsearch.com/docs/2.0.4/xmlpipe2.html
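As a rough sketch of what that looks like in practice: the indexer runs a command of your choosing and reads an XML stream from its stdout, so the feed can come from anywhere. The field names and sample rows below are made up for illustration, and the linked docs are the authority on the full schema (attributes, kill-lists and so on).

```python
import sys
from xml.sax.saxutils import escape, quoteattr


def xmlpipe2(rows, fields):
    """Yield the lines of an xmlpipe2 stream for an iterable of dict rows."""
    yield '<?xml version="1.0" encoding="utf-8"?>'
    yield "<sphinx:docset>"
    yield "<sphinx:schema>"
    for name in fields:
        yield f"<sphinx:field name={quoteattr(name)}/>"
    yield "</sphinx:schema>"
    for row in rows:
        # Every document needs a unique numeric ID.
        yield f"<sphinx:document id={quoteattr(str(row['id']))}>"
        for name in fields:
            yield f"<{name}>{escape(str(row[name]))}</{name}>"
        yield "</sphinx:document>"
    yield "</sphinx:docset>"


if __name__ == "__main__":
    # Hypothetical data; in a real feed this could come from an API, files, etc.
    sample = [
        {"id": 1, "title": "Search as a service", "body": "Solr & friends"},
        {"id": 2, "title": "Real-time indexes", "body": "a < b"},
    ]
    sys.stdout.write("\n".join(xmlpipe2(sample, ["title", "body"])) + "\n")
```

The Sphinx source would then use `type = xmlpipe2` with `xmlpipe_command` pointing at a script like this, so the config no longer needs database credentials or SQL.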


Also, Sphinx has real-time indexes.

You send data to Sphinx (when you update it), and it's indexed right away.

The original disk indexes (updated by a batch process) are still available.


I always find Sphinx limiting. For example, I can't add a single doc to the index; I have to run a full re-index.

Also, I can't programmatically get a list of all "words" in the index with their frequency and the inverse doc freq, etc. With anything Lucene-based this kind of thing is really easy.
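To be concrete about the numbers I mean, here is a toy sketch over a list of strings. It is not Lucene's API, and Lucene's own scoring formula differs in detail; `term_stats` is just an illustrative name, and the tokenizer is a plain whitespace split.

```python
import math
from collections import Counter


def term_stats(docs):
    """Return {term: (total_freq, doc_freq, idf)} for a list of strings."""
    total = Counter()     # how many times each term occurs overall
    doc_freq = Counter()  # how many documents contain each term
    for text in docs:
        terms = text.lower().split()
        total.update(terms)
        doc_freq.update(set(terms))
    n = len(docs)
    # Plain textbook idf: log(N / df). Rare terms get a larger value.
    return {t: (total[t], doc_freq[t], math.log(n / doc_freq[t])) for t in total}


if __name__ == "__main__":
    stats = term_stats(["solr search", "sphinx search search"])
    for term, (freq, df, idf) in sorted(stats.items()):
        print(f"{term}: freq={freq} doc_freq={df} idf={idf:.3f}")
```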


+1 on this. I really liked Sphinx until I started inserting records...


Why wouldn't you consider ElasticSearch to be an "actually good search product" ?


Why the bashing on Elasticsearch? We are using it to index log files; we have over 275 million documents in our index and performance has been pretty impressive.


What kind of hardware are you running that on? We are setting up a larger cluster, and are interested in the config of others. Thnx!


We're running on five EC2 instances, each instance is running Elasticsearch configured to use 25GB of RAM. With the current data set we might be able to get by with less RAM, we're still in the process of figuring out what works best for us.


That's why you could look at IndexDen.com, which is powered by a Sphinx Search cluster :)


And for the joke entry:

Yahoo! BOSS (http://developer.yahoo.com/search/boss/)


Yes, but the billing is hourly based as usual: You'll be billed based on the number of running search instances. There are three search instance sizes (Small, Large, and Extra Large) at prices ranging from $0.12 to $0.68 per hour (these are US East Region prices, since that's where we are launching CloudSearch).

PS It looks like it's initially available only in US East Region
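For comparison with flat monthly plans, here is a back-of-the-envelope conversion of those hourly rates. It assumes a 30-day month with the instance running the whole time, and it only covers the per-instance hourly charge quoted above, not any other fees. Working in whole cents avoids floating-point rounding.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month, instance always on


def monthly_cost_cents(cents_per_hour, instances=1):
    """Cost in cents of keeping `instances` search instances up for a month."""
    return cents_per_hour * HOURS_PER_MONTH * instances


def as_dollars(cents):
    """Format an integer number of cents as a dollar amount."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    print("cheapest ($0.12/h):", as_dollars(monthly_cost_cents(12)))   # $86.40
    print("priciest ($0.68/h):", as_dollars(monthly_cost_cents(68)))   # $489.60
```

So a single instance at the lowest quoted rate comes to a bit under $90 for a month of continuous use.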


Yeah, there's a side project I've been wanting to build for a long time, and it needs search. But the way these prices have been presented, it seems that CloudSearch is just not economically feasible for a SaaS / free multiuser offering.


"CloudSearch is just not economically feasible for a SaaS / free multiuser offering"

Are you saying it is too much? WebSolr etc have options that are cheaper.




"also" = additional to the already cited websolr ;)


And there's also LucidWorks Cloud. http://www.lucidimagination.com/products/lucidworks-search-p...

Lucid Imagination is run by some of the most experienced Solr and Lucene devs. Specifically Yonik Seeley, who created Solr.


Solr or Sphinx based searching is what pushed me out of most of the PaaS offerings and into my own VPS. It's unfortunate that most of the hosted services out there are too expensive for side projects.

When I looked at WebSolr, the cost exceeded my entire VPS structure, even for their cheapest plans.


We think our prices actually compare pretty well to self-hosting. Hosting a search engine is not cheap when it comes to memory and disk IO, and I don't envy anyone trying to shoehorn production Solr traffic into a small VPS.

Not to mention, we've got transparent replicated redundancy on all our indexes—one of our better-kept secrets, I really need to update our marketing materials—so double your VPSs there.


After setting Solr up, I definitely see the value that you guys provide for high traffic sites. But there's a difference between production level needs and hobbyist level needs for a side project.

For side projects I'd be okay with indexing being less aggressive, sizes being more restrictive, and response times being higher if that made the pricing more accessible.

I'd be happy to give you guys more money when I've got the traffic to justify it, but it'd be nice to be able to flip that on when I need it.


I'm in the same situation, and I'm thinking about using Solr in a VPS or trying one of those hosted cloud Solr solutions.



