One of the perennial problems with a platform like Puppet is not a lack of documentation in and of itself, but rather a lack of documentation aimed at every level of consumer. What I mean is that we may have a lot of documentation available (take the veritable encyclopedia of material at https://puppet.com/docs), but we may not actually have content that is accessible to beginners.
What do I mean by “accessible”?
Most of the professional documentation I’ve encountered around the Puppet ecosystem is produced by brilliant engineers. Often, though, as we progress in our development on any platform, we tend to forget where we came from and how little we knew at our earliest stages. As a result, we end up with really advanced documentation that the novice user is left struggling to comprehend.
This series will attempt to alleviate some of that. It should be noted that the majority of the work in this series will focus on Puppet Community (the open source edition, as opposed to Puppet Enterprise). Where possible, when documenting features or functions, I will link official documentation, Puppet Git projects, or some other foundation for the assertions I am making.
What is Puppet?
First and foremost, we should answer the question “What is Puppet?” Generally, by the time people arrive at this question they have already answered it for themselves, and are likely looking here for more advanced information. However, since this is a primer, we will cover it here.
When we talk about Puppet, we’re not talking about marionettes or sock puppets.
What we’re talking about is an automation platform used (primarily) by system administrators and engineers to automate their work at scale. In the past, sysadmins would keep lists of the hosts they managed in local text files or in SSH session-management software like PuTTY, and would use any number of mechanisms to automate procedures across those nodes, sometimes 5-10 at a time, and in other cases hundreds or thousands. However, there were problems…
With hundreds or thousands of nodes, you could only “chunk” your actions into groups of perhaps 20-40 at a time, and the commands you executed were often run serially, leaving a window during which the environment was not fully in sync. People could conceivably reach resources that had already changed, refresh a browser or a thin client, and get an entirely different experience, feature set, or even data. This was no good.
A further problem was that as your fleet grew or shrank, your lists of nodes could fall out of date, and if you had multiple engineers, their lists might disagree with each other, causing coverage gaps when trying to make changes to the environment. Maybe some engineers were using one set of scripts or node lists, others were using different ones, and the functionality between them differed. This made for a lack of predictability in how an environment was configured and functioning, and would conceal “drift” from node to node.
Enter Puppet
Puppet’s creator, Luke Kanies, saw this disarray in the system administration space and set out to fix it. You can learn a bit more about the early days and development of Puppet from an O’Reilly interview with Luke here:
Luke’s key innovation, the one that revolutionized the configuration management paradigm, was the “Resource Abstraction Layer” (RAL), which he describes in the video. For more in-depth coverage of the RAL and how it works, check out these articles:
RAL-1
RAL-2
In short, the RAL is the “thing about the thing”. In these days of “meta-everything” it may seem odd to put it that way, but the reference fits.
When approaching a system for administration purposes, there are files, packages, users, configurations, text, binaries, repositories… many different things you have to be aware of. As a result, you build up a skillset covering not only what these things are, but how they work, how they are configured, and how they interact with other subsystems. Luke’s insight was to build the RAL as a “modeling system” for approaching a server, or a series of servers, programmatically.
As a result, a system is broken down into components known as “resources”, the fundamental unit for modeling a system configuration in Puppet.
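To make that concrete, here is a minimal sketch of two resource declarations in the Puppet DSL (the package and user names are illustrative examples of mine, not taken from any official documentation):

```puppet
# A "package" resource: declare that a package should be present.
# The RAL decides whether that means yum, apt, or another provider
# on the target system.
package { 'ntp':
  ensure => installed,
}

# A "user" resource: the same idea applied to a local account.
user { 'deploy':
  ensure     => present,
  shell      => '/bin/bash',
  managehome => true,
}
```

Each resource names a type (package, user), a title, and the attributes describing its desired state; nowhere do you spell out the commands needed to reach that state.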
Puppet not only simplified the configuration of systems in an IT infrastructure, it also made it possible to assert a configuration against many systems at once. Even that, though, is not the main power behind Puppet. The main power is the ability to collect resources into groupings and apply those configurations programmatically, allowing you to work with code as your infrastructure rather than with individual machine configurations.
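As a rough sketch of what such a grouping can look like (the class name, file paths, and node names below are purely illustrative), resources are commonly collected into a class, and that class is then assigned to as many nodes as you like:

```puppet
# A class collects related resources into a single reusable grouping.
class ntp {
  package { 'ntp':
    ensure => installed,
  }

  # Manage the configuration file once the package is in place.
  file { '/etc/ntp.conf':
    ensure  => file,
    source  => 'puppet:///modules/ntp/ntp.conf',
    require => Package['ntp'],
  }

  # Keep the service running, and restart it when its config changes.
  service { 'ntp':
    ensure    => running,
    enable    => true,
    subscribe => File['/etc/ntp.conf'],
  }
}

# Assign the grouping to nodes; every matching node converges on this state.
node 'web01.example.com', 'web02.example.com' {
  include ntp
}
```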
How Does it Work?
When you use Puppet, you define the desired state of the system in code. That code is written in a Domain Specific Language (the Puppet DSL) that can be used against a wide array of operating systems and devices, and it describes the desired state of those systems, not how to get there. Puppet, utilizing its RAL along with a Puppet Server and a Puppet Agent, interprets the code you’ve written and configures the destination machine to match it. The organization of that code flow and system configuration looks like so:
Above image from Puppet.com
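As a small illustration of what “desired state” means in practice (the file content here is just a placeholder of mine), a manifest like the following says nothing about which commands to run; it only describes the end result, and running it repeatedly changes nothing once that result is reached:

```puppet
# Desired state: this file exists with exactly this content and mode.
file { '/etc/motd':
  ensure  => file,
  mode    => '0644',
  content => "This node is managed by Puppet\n",
}
```

In the server/agent model pictured above, an agent run (for example, `puppet agent --test`) requests a compiled catalog from the Puppet Server and applies it; for quick local experiments, `puppet apply` evaluates a manifest file on a single machine without a server.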
Altogether, this suite of tools, code, methods, and components makes up the Puppet Platform. In our next installment, we will break the platform apart into its separate components and see what’s under the hood.