How can I use Sphinx on Scalarium?

Though there is no default role for Sphinx yet, it's easy to get it up and running using custom Chef cookbooks. We prepared one for your convenience in our repository of example cookbooks. It's targeted at Rails applications using Thinking Sphinx, but should give you an idea about what steps are involved.

You can either add our cookbook repository to your cloud, or you can just use our sample cookbook for Sphinx and include it in your own collection.

Once you've done that and configured your cloud to use custom recipes, add a new role called Sphinx to your cloud and add an instance to it:

[Screenshot: the Sphinx role]

Next, add custom recipes to both roles: sphinx::install on setup and sphinx::deploy on deploy for the Sphinx role, and sphinx::client on deploy for the Rails Application Server role, like so:

[Screenshot: custom recipes for the Sphinx role]

[Screenshot: custom recipes for the Rails Application Server role]
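To make the role/event/recipe assignments easier to see at a glance, here is a small Ruby model of them. It is only an illustration, not Scalarium's own configuration format. The role label 'rails-app' and the helper recipes_for are hypothetical names chosen for this sketch.

```ruby
# Hypothetical model of the custom recipe assignments described above.
# Scalarium's real configuration format may differ; 'rails-app' is an assumed role label.
CUSTOM_RECIPES = {
  'sphinx'    => { setup: ['sphinx::install'], deploy: ['sphinx::deploy'] },
  'rails-app' => { deploy: ['sphinx::client'] }
}.freeze

# Returns the recipes that would run on an instance with the given roles
# for a lifecycle event (:setup or :deploy), in role order, without duplicates.
def recipes_for(roles, event)
  roles.flat_map { |role| CUSTOM_RECIPES.fetch(role, {}).fetch(event, []) }.uniq
end

p recipes_for(['sphinx'], :setup)               # => ["sphinx::install"]
p recipes_for(['rails-app', 'sphinx'], :deploy) # => ["sphinx::client", "sphinx::deploy"]
```

An instance that carries both roles would simply run the deploy recipes of both.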

Sphinx can be installed on any instance. If the instance is not a Rails Application Server, the cookbook checks out your application's source code, which Thinking Sphinx needs in order to talk to Sphinx. If it is, the cookbook simply uses the code that's already there.

When you start the instance, the Sphinx server (currently version 0.9.9) is installed. On the next deployment it is configured and the index is created. On subsequent deployments, the configuration is kept up to date and the Sphinx server is restarted accordingly. Your Rails apps are automatically configured to talk to the right Sphinx instance, so there's no need to maintain a sphinx.yml or any other configuration on your end.
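The lifecycle described in the last two paragraphs can be summarized as a rough Ruby sketch of what happens on the Sphinx instance at each event. The action symbols, their order, the assumption that the checkout runs on every deploy, and the 'rails-app' role label are all illustrative assumptions, not the cookbook's real identifiers or exact behavior.

```ruby
# Hypothetical model of the Sphinx instance's lifecycle as described above.
# Action names, ordering and the 'rails-app' role label are assumptions.
def sphinx_actions(event, roles:, indexed:)
  case event
  when :setup
    [:install_sphinx] # install the Sphinx server when the instance starts
  when :deploy
    # Without a deployed Rails app, the cookbook checks out the source code,
    # which Thinking Sphinx needs in order to talk to Sphinx.
    prep = roles.include?('rails-app') ? [] : [:check_out_app_code]
    if indexed
      prep + [:update_config, :restart_sphinx] # subsequent deployments
    else
      prep + [:write_config, :build_index]     # first deployment
    end
  else
    raise ArgumentError, "unknown event: #{event.inspect}"
  end
end

p sphinx_actions(:deploy, roles: ['sphinx'], indexed: false)
# => [:check_out_app_code, :write_config, :build_index]
```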

Don't forget to set ThinkingSphinx.remote_sphinx = true and to explicitly require 'riddle/0.9.9'; otherwise you'll see warnings pop up here and there. That's due to how Thinking Sphinx works, not to the cookbook or Scalarium itself. Only set remote_sphinx to true if Sphinx is not running on the same machine as your application, though.

The cookbook lets you set a few options, most notably the memory limit (defaults to 256M) and the cron job interval for reindexing (defaults to every 10 minutes); you're welcome to adapt them to your needs. See the attributes file for all the details.

Alternatively, instead of hard-coding remote_sphinx, you can check the Scalarium cluster state file, which contains all the information about the instance and the cluster it belongs to, and enable remote_sphinx only when the current instance does not have the sphinx role. Put this code in your environment.rb or an initializer:

# Enable remote_sphinx only if this instance does not itself have the sphinx role
cluster_state = '/var/lib/scalarium/cluster_state.json'
if Rails.env.production? && File.exist?(cluster_state)
  roles = ActiveSupport::JSON.decode(File.read(cluster_state))['instance']['roles']
  ThinkingSphinx.remote_sphinx = true unless roles.include?('sphinx')
end
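If you'd like to test that check outside Rails, here is a stdlib-only variant using Ruby's json library. It assumes the same layout the snippet above relies on (an "instance" object with a "roles" array). The method name remote_sphinx? is hypothetical, and treating a missing roles list as "remote" is a design choice of this sketch, not documented Scalarium behavior.

```ruby
require 'json'

# Returns true when the instance described by the cluster state JSON does not
# carry the sphinx role, i.e. when Thinking Sphinx should talk to a remote Sphinx.
# A missing "roles" array is treated as "no sphinx role" (an assumption).
def remote_sphinx?(cluster_state_json)
  roles = JSON.parse(cluster_state_json).dig('instance', 'roles') || []
  !roles.include?('sphinx')
end

local  = '{"instance": {"roles": ["rails-app", "sphinx"]}}'
remote = '{"instance": {"roles": ["rails-app"]}}'
p remote_sphinx?(local)  # => false
p remote_sphinx?(remote) # => true
```

In an initializer you could then write ThinkingSphinx.remote_sphinx = remote_sphinx?(File.read(cluster_state)), keeping the file-existence and environment checks from the snippet above.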

We set up a small example application to show how to integrate all the above in a Rails application. Also, don't forget to properly include the Rake tasks for Thinking Sphinx in your project's Rakefile like so:

require 'thinking_sphinx/tasks'

Otherwise Scalarium can't auto-manage the Sphinx index for you.
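The reindexing cron job and the memory limit mentioned earlier both depend on these pieces. The following is a hypothetical sketch of how the two attributes might be turned into a crontab line that runs Thinking Sphinx's thinking_sphinx:index Rake task and a sphinx.conf indexer section using Sphinx's mem_limit setting. The attribute names, the app path and RAILS_ENV=production are assumptions; check the cookbook's attributes file and templates for what it actually generates.

```ruby
# Hypothetical defaults mirroring the documented ones: 256M memory limit,
# reindexing every 10 minutes. Attribute names are assumptions.
DEFAULTS = { memory_limit: '256M', reindex_interval: 10 }.freeze

def reindex_cron_line(app_path, attrs = {})
  interval = DEFAULTS.merge(attrs)[:reindex_interval]
  # A "*/N" minute field only makes sense for N within 1..59.
  raise ArgumentError, 'interval must be 1..59 minutes' unless (1..59).cover?(interval)
  # Every `interval` minutes, run Thinking Sphinx's index task in the app directory.
  "*/#{interval} * * * * cd #{app_path} && RAILS_ENV=production rake thinking_sphinx:index"
end

def indexer_section(attrs = {})
  # Sphinx's indexer takes its RAM cap from the mem_limit setting.
  "indexer\n{\n  mem_limit = #{DEFAULTS.merge(attrs)[:memory_limit]}\n}\n"
end

puts reindex_cron_line('/srv/www/myapp/current')
puts indexer_section(memory_limit: '512M')
```

Note that "*/N" fires at minutes divisible by N within each hour, so intervals that don't divide 60 evenly produce an uneven gap at the top of the hour.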

We'll eventually turn Sphinx into a first-class role at Scalarium, but for now, this should do.