Sunday, 31 March 2013

Chef: Not for me

It will make life easy, they said...

Let me admit right up front that I don't program in Ruby. However, circumstances dictated that Chef, from Opscode, would occupy a large part of my time this last month. The result is that Chef and I have developed a mutual dislike.

Using a config management system makes sense. It spares you the drudgery of repeated tinkering to get multiple servers to the exact same state each time. And it helps with scaling. Saves time and creates a repeatable process? What's not to like?

How about an overly complicated model? The first hint is the cute naming: Cookbooks, Recipes and Knife, none of which map to the mental models I have of those items. Annoying, but not too serious. At least not as serious as requiring a Chef Server and a Chef Workstation to manage Chef Nodes, which are also Chef Clients. Sometimes.

The Chef Server is supposed to be the central repository of the expected final state. That information is uploaded to the Server from the Workstation, and immediately there is potential for confusion: did the latest changes get pushed to the Server? The rationale is that the Nodes/Clients can periodically request the latest expected state from the Server. So how do I test new Recipes without pushing them to the Server and potentially disrupting the Nodes/Clients, which will then read the new expected state? Why do I have to think about that?

But it gets worse.

I created a local environment in which to run my Chef Recipes: a shiny new Ubuntu 12.04 LTS install, with all updates applied, running in a VirtualBox VM and snapshotted so I could quickly get back to a base install. Part of that process was the need to delete the Node and Client information from the Server using knife. But knife kept throwing an error when I tried to delete the node:
chef knife delete_object': undefined method `destroy'
Sigh.

So I went ahead and upgraded Chef, and then my recipes, such as they were, stopped working. Mutter! After some investigation I discovered that Chef 11 had changed the way attributes were defined because...
Chef would load attributes files in essentially random order
Given that the words 'random order' appeared in the reasons for the change, I can appreciate the need for a change that breaks things. Also keep in mind that I was still learning Chef, so my Recipes weren't the best. Nevertheless, this kind of thing is annoying.

I continued working with Chef, but kept losing time and hair because of seemingly stupid things. Look at the following two excerpts:
----- excerpt one -----
if recipe_config['ssl_enabled']
    template ssl_cert_file do
        source 'ssl.crt'
        owner 'root'
        group 'root'
        mode '0600'
        variables(
            :content => certificate_secrets[node.chef_environment]['cert']
        )
        notifies :restart, resources(:service => "nginx"), :delayed
    end
end
----- excerpt two -----
if recipe_config['ssl_enabled']
    template ssl_cert_file do
        source 'ssl.crt'
        owner 'root'
        group 'root'
        mode '0600'
        variables(
            :content => certificate_secrets[node.chef_environment]['cert']
        )
    end
    notifies :restart, resources(:service => "nginx"), :delayed
end
The 'notifies' notification for the nginx resource is outside the template block in the second excerpt. Chef chokes on that, claiming that it
Cannot find a resource for notifies
What is the difference? Obviously this stems from my lack of understanding of Chef or Ruby, or both, but this seems unreasonable. I asked around the office and even those who had used Ruby couldn't tell me what the issue was. So I just used excerpt one.
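A likely explanation, offered with the same caveat about my Ruby: inside `template ... do ... end` the block is evaluated against the template resource itself, and the resource has a `notifies` method. Outside the block, `notifies` is called on the recipe, and the recipe DSL treats any method it doesn't recognise as the name of a resource type (that is how `template` and `service` work in the first place). So Chef goes looking for a resource type called `notifies` and, unsurprisingly, can't find one. The toy model below is plain Ruby, not Chef's code; `ToyRecipe`, `ToyResource` and `ToyResourceNotFound` are hypothetical names used only to show the `instance_eval`/`method_missing` pattern that would produce this behaviour.

```ruby
# Toy model (NOT Chef's implementation) of why `notifies` only works
# inside a resource block. All class names are hypothetical.

ToyResourceNotFound = Class.new(StandardError)

class ToyResource
  attr_reader :name, :notifications

  def initialize(name)
    @name = name
    @notifications = []
  end

  # A resource attribute: with an argument it sets, without one it reads.
  def source(value = nil)
    @source = value unless value.nil?
    @source
  end

  # `notifies` is an instance method of the resource, not of the recipe.
  def notifies(action, target, timing = :delayed)
    @notifications << [action, target, timing]
  end
end

class ToyRecipe
  # Resource types the recipe knows about, keyed by DSL method name.
  RESOURCE_TYPES = { template: ToyResource, service: ToyResource }.freeze

  attr_reader :resources

  def initialize
    @resources = []
  end

  # Evaluate the recipe body with `self` set to this recipe.
  def run(&body)
    instance_eval(&body)
    self
  end

  # Any method the recipe doesn't define is treated as a resource type,
  # which is how `template "..." do ... end` ends up here.
  def method_missing(type, *args, &block)
    klass = RESOURCE_TYPES[type]
    raise ToyResourceNotFound, "Cannot find a resource for #{type}" unless klass

    resource = klass.new(args.first)
    # The resource block runs with `self` set to the resource, so bare
    # calls such as `source` and `notifies` go to the resource.
    resource.instance_eval(&block) if block
    @resources << resource
    resource
  end

  def respond_to_missing?(type, include_private = false)
    RESOURCE_TYPES.key?(type) || super
  end
end

# Excerpt one: `notifies` inside the block is a call on the resource.
good = ToyRecipe.new.run do
  template "ssl.crt" do
    source "ssl.crt"
    notifies :restart, "service[nginx]", :delayed
  end
end
p good.resources.first.notifications
# prints [[:restart, "service[nginx]", :delayed]]

# Excerpt two: `notifies` after `end` is a call on the recipe, so it is
# looked up as a resource type and fails.
begin
  ToyRecipe.new.run do
    template "ssl.crt" do
      source "ssl.crt"
    end
    notifies :restart, "service[nginx]", :delayed
  end
rescue ToyResourceNotFound => e
  puts e.message # prints "Cannot find a resource for notifies"
end
```

If this model is right, the `if` in my excerpts is irrelevant: what matters is only which object `self` is when `notifies` runs.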

And the voice in my head said "You're going to regret this."

After further work on my Recipe I reached a state where I was ready to test what I had built on AWS (this was for an existing client that was already using AWS). Chef's Knife tool has a handy plugin for managing EC2 instances, which let me create and deploy with a single command:
knife ec2 server create -I 'ami-b6089bdf' -x ubuntu -i ~/.ssh/awskey.pem -r 'role[webserver]' -E stage -f t1.micro -G 'web-service' --region 'us-east' -Z 'us-east-1c'
This command creates a new t1.micro instance from the specified image and deploys it as a 'webserver'. The other parameters specify the user to log in as, the key file for authentication, the Chef environment, the region and availability zone, and the security group. All very useful, except that I kept getting an error:
.../gems/excon-0.20.0/lib/excon/socket.rb:42:in `getaddrinfo': getaddrinfo: nodename nor servname provided, or not known (SocketError) (Excon::Errors::SocketError)
from .../gems/excon-0.20.0/lib/excon/socket.rb:42:in `connect'
from .../gems/excon-0.20.0/lib/excon/ssl_socket.rb:72:in `connect'
from .../gems/excon-0.20.0/lib/excon/socket.rb:32:in `initialize'
from .../gems/excon-0.20.0/lib/excon/ssl_socket.rb:8:in `initialize'
from .../gems/excon-0.20.0/lib/excon/connection.rb:341:in `new'
from .../gems/excon-0.20.0/lib/excon/connection.rb:341:in `socket'
from .../gems/excon-0.20.0/lib/excon/connection.rb:87:in `request_call'
from .../gems/excon-0.20.0/lib/excon/middlewares/mock.rb:79:in `request_call'
from .../gems/excon-0.20.0/lib/excon/middlewares/instrumentor.rb:22:in `request_call'
from .../gems/excon-0.20.0/lib/excon/middlewares/base.rb:15:in `request_call'
from .../gems/excon-0.20.0/lib/excon/middlewares/base.rb:15:in `request_call'
from .../gems/excon-0.20.0/lib/excon/connection.rb:220:in `request'
from .../gems/excon-0.20.0/lib/excon/middlewares/idempotent.rb:11:in `error_call'
from .../gems/excon-0.20.0/lib/excon/middlewares/base.rb:10:in `error_call'
from .../gems/excon-0.20.0/lib/excon/connection.rb:236:in `rescue in request'
from .../gems/excon-0.20.0/lib/excon/connection.rb:197:in `request'
from .../gems/excon-0.20.0/lib/excon/middlewares/idempotent.rb:11:in `error_call'
from .../gems/excon-0.20.0/lib/excon/middlewares/base.rb:10:in `error_call'
from .../gems/excon-0.20.0/lib/excon/connection.rb:236:in `rescue in request'
from .../gems/excon-0.20.0/lib/excon/connection.rb:197:in `request'
from .../gems/excon-0.20.0/lib/excon/middlewares/idempotent.rb:11:in `error_call'
from .../gems/excon-0.20.0/lib/excon/middlewares/base.rb:10:in `error_call'
from .../gems/excon-0.20.0/lib/excon/connection.rb:236:in `rescue in request'
from .../gems/excon-0.20.0/lib/excon/connection.rb:197:in `request'
from .../gems/fog-1.10.0/lib/fog/core/connection.rb:21:in `request'
from .../gems/fog-1.10.0/lib/fog/aws/compute.rb:384:in `_request'
from .../gems/fog-1.10.0/lib/fog/aws/compute.rb:379:in `request'
from .../gems/fog-1.10.0/lib/fog/aws/requests/compute/describe_images.rb:54:in `describe_images'
from .../gems/fog-1.10.0/lib/fog/aws/models/compute/images.rb:49:in `all'
from .../gems/fog-1.10.0/lib/fog/aws/models/compute/images.rb:55:in `get'
from .../gems/knife-ec2-0.6.2/lib/chef/knife/ec2_server_create.rb:360:in `ami'
from .../gems/knife-ec2-0.6.2/lib/chef/knife/ec2_server_create.rb:367:in `validate!'
from .../gems/knife-ec2-0.6.2/lib/chef/knife/ec2_server_create.rb:226:in `run'
from .../gems/chef-11.4.0/lib/chef/knife.rb:460:in `run_with_pretty_exceptions'
from .../gems/chef-11.4.0/lib/chef/knife.rb:173:in `run'
from .../gems/chef-11.4.0/lib/chef/application/knife.rb:123:in `run'
from .../gems/chef-11.4.0/bin/knife:25:in `<top (required)>'
from .../bin/knife:19:in `load'
from .../bin/knife:19:in `<main>'
from .../bin/ruby_noexec_wrapper:14:in `eval'
from .../bin/ruby_noexec_wrapper:14:in `<main>'
It cost me a day while I debugged what I assumed were network issues, but the actual problem was that I had specified the region incorrectly. AWS's only US East region is 'us-east-1', not 'us-east' as I had it. A dumb error, but an even dumber error message. Moreover, obtuse error messages weren't uncommon.
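In hindsight the stack trace did contain a clue: it dies in `getaddrinfo`, that is, a hostname lookup, while fog is calling `describe_images`. My guess, which I haven't checked against fog's source, is that the EC2 endpoint hostname is built from the region name, so 'us-east' produces a hostname that doesn't exist. A sanity check on the arguments before running knife would have saved me that day. The sketch below is a hypothetical helper, `check_ec2_placement`, which is not part of knife, knife-ec2 or fog; its hard-coded region list is my assumption of the public regions at the time of writing.

```ruby
# Hypothetical pre-flight check for the --region and -Z arguments of
# `knife ec2 server create`. Not part of knife-ec2 or fog.

# Hand-maintained list of public EC2 regions as of early 2013 (assumption;
# check AWS's documentation for the current list).
KNOWN_EC2_REGIONS = %w[
  us-east-1 us-west-1 us-west-2 eu-west-1
  ap-southeast-1 ap-southeast-2 ap-northeast-1 sa-east-1
].freeze

# Returns a list of problems; an empty list means the pair looks sane.
def check_ec2_placement(region, zone)
  problems = []

  unless KNOWN_EC2_REGIONS.include?(region)
    # Suggest a region the name is a prefix of, e.g. 'us-east' -> 'us-east-1'.
    guess = KNOWN_EC2_REGIONS.find { |r| r.start_with?("#{region}-") }
    hint = guess ? " (did you mean '#{guess}'?)" : ""
    problems << "unknown region '#{region}'#{hint}"
  end

  # A standard availability zone name is its region name plus one letter.
  unless zone =~ /\A#{Regexp.escape(region)}[a-z]\z/
    problems << "zone '#{zone}' is not in region '#{region}'"
  end

  problems
end

# The arguments from my failing command:
puts check_ec2_placement("us-east", "us-east-1c")
# unknown region 'us-east' (did you mean 'us-east-1'?)
# zone 'us-east-1c' is not in region 'us-east'
```

The zone check is a second line of defence: since the availability zone name embeds the region, a mismatch between the two arguments is caught even if the region list goes stale.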

Finally I was able to repeatably create 'webserver' instances on EC2. Each instance was identical, had all the required software and configuration, and was ready for deployment once I had run a software update. (I want to keep software updates as a discrete, manual step, to allow a controlled rollout and to avoid new software breaking working systems.) So Chef is a legitimate solution to the problem of configuration management across multiple flavours of multiple server environments.

But not for me.

Chef is a very thick layer over the administration of a system. One has to learn Ruby and Chef. And in addition to Cookbooks, Recipes, Servers, Workstations, Nodes and Clients, there are Resources, Attributes, Providers, Notifications and Actions. For me, someone who has managed systems before, this layer gets in the way of doing things. I seem to have spent an inordinate amount of time coaxing Chef to do what I knew I could do (and have done) fairly easily with some shell and/or Perl scripts, albeit not with a large number of servers.

I prefer something less cumbersome. I think I will have a look at Ansible. It seems more my cup of tea.
