Questy.org

Client-Server Model in Puppet Architecture Explained


(Puppet Primer II – “The Conversation”)

What is a “System”?

To get to where we are heading with the entire idea of Puppet, and the larger concepts involved, we need to step back to the core of what we’re dealing with. When discussing servers, groups of servers, and their configuration, we first have to define what we mean when we say “systems”.

When we say “systems”, we include any computer operating system running on any server connected to an internetwork, whether locally on your business premises, in a co-location facility or data center, or even as system instances on a cloud services platform. This would include your home as well.

Clearly, since we are discussing a configuration management platform such as Puppet, we sort of already knew that, but it bears repeating.

How Do We Communicate With a System?

It is a given that computers that are “internetworked” together run an operating system containing a network stack, which allows the system to send data over “the wire” (an Ethernet network) to another system on the same or a remotely connected network. This is achieved via TCP/IP, the network communication protocol family known as the “Internet Protocol Suite”.

NOTE: The topics linked in the text above are considerably outside the scope of this article, but are of PARAMOUNT importance to the functioning of Puppet. It is assumed the reader is knowledgeable about the core infrastructure tools and protocols outlined here, but the links are provided for your convenience.

To communicate between systems, many different methods can be employed. For instance, one can utilize a web browser to access Puppet.com by simply typing https://puppet.com in the browser address bar. As anyone who has gone through a technical interview of sufficient thoroughness would attest, describing the entirety of the digital “conversation” that ensues would be prohibitive due to space. Just know that the web browser first looks up where to find the machine or machines that represent Puppet.com, traverses the Internet, retrieves the page, and displays it in your browser.

Puppet functions in much the same way.

The Client/Server Model

We’ve talked about systems theoretically and in actuality. Now, we have to discuss communications between these systems beyond just their “connectedness” across a wire.

Once machines are connected to one another, there is a series of networking protocols that communicate largely unseen. Your connection to the Internet, if it is up and working, “just works”™. You can go to YouTube, read a website, and even watch television. But in the case of applications like Puppet, there is a greater communications relationship between systems. This relationship is generally called “protocol”.

When we speak of “protocol” in non-digital scenarios, we might mean something like the following:

  1. The official procedure or system of rules governing affairs of state or diplomatic occasions. “protocol forbids the prince from making any public statement in his defense”
  2. The original draft of a diplomatic document, especially of the terms of a treaty agreed to in conference and signed by the parties. “signatories to the Montreal Protocol”

As you can see, not a whole lot of computer wrangling to be found in this definition. But the same guard rails exist. Namely: “The official procedure or system of rules governing…”. In terms of computer systems, this is an example of an official procedure or system of rules governing communications between systems. How long is the connection to be open? Who starts the conversation and who sets the parameters for that conversation including when it ends? The “protocol”, then, denotes those “rules of engagement”, if you will.

The promise of configuration management in general, and Puppet in particular, is this: one admin can systematize the way she manages infrastructure in such a way that configuration can be applied broadly across systems or groups of systems programmatically, rather than one system at a time. Configuring, say, SSH on a single system, then connecting to the next system and repeating the work for every affected system in your environment, takes precious time you likely do not have. By referring to systems, and the services on those systems, in groups, it is much more efficient to systematically configure SSH against an entire grouping of systems all at once.
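To make that concrete, here is a minimal, hypothetical sketch in Puppet’s own language: one class describes the desired SSH state, and a node definition applies it to a whole group of systems at once. The class body and node names are illustrative only, not a recommended SSH configuration.

    # A hypothetical sketch: one description of SSH, applied to many systems.
    class ssh {
      package { 'openssh-server':
        ensure => installed,
      }

      service { 'sshd':
        ensure  => running,
        enable  => true,
        require => Package['openssh-server'],
      }
    }

    # Apply the same class to every matching node in one stroke (example node names).
    node /^web\d+\.example\.com$/ {
      include ssh
    }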

In the Puppet world, this is achieved by the protocol Puppet has “decided” upon to communicate with your environment and to distribute and apply configurations to it.

The Puppet Conversation

There are many documents and explanations on the Internet around Puppet and how it works. What I will attempt to do is relate my explanation to as many specific Puppet documents as I can. I understand this presents a large volume of information to parse through, but this base operational function is paramount to understand if you wish to master the Puppet environment.

The Puppet ecosystem consists of many components, but we will begin at the most basic level: Puppet consists of the Puppet server system and any number of client systems that connect to this server to retrieve configuration elements that are available to apply to themselves. In many circumstances, when we discuss client/server communications, we speak of a server that contains the configuration and directs all the traffic in the ecosystem, “deciding” when, where, and how things happen.

Some of these elements are true, but in the case of Puppet, the power to establish and conduct the conversation has been placed in the hands of the system being managed rather than dictated by the Puppet server. What does this mean?

The writers of Puppet have given the Puppet agent software, which runs on the systems being managed, control over when it requests information from the server. By default, Puppet “wakes up” on the agent system every 30 minutes and asks the server for its configuration. This, in effect, is a client-server conversation. When the client (agent) system “wakes up” to perform a run, here are the procedures it follows in its default configuration:

  • The agent software, which runs on a timer, becomes active every 30 minutes.
  • The agent software downloads the CA (Certification Authority) bundle from the server.
  • If certificate revocation is enabled, it will also download the Certificate Revocation List (CRL), utilizing the CA bundle it just downloaded to verify the connection.
  • The agent loads or generates a private key. If the agent needs a certificate, it generates a Certificate Signing Request (CSR) which will include any dns_alt_names and csr_attributes, and submits the request to the Puppet server via an API call:
    PUT /puppet-ca/v1/certificate_request/:certname
  • The agent then attempts to download the signed certificate using the API endpoint:
    GET /puppet-ca/v1/certificate/:certname
    • If there is a conflict in retrieving the certificate requiring some kind of resolution or remediation on the Puppet server (such as cleaning an old CSR or certificate), the agent will “sleep” for the time period stored in the waitforcert configuration setting (default: 2 minutes).
  • If the downloaded certificate fails verification, such as not matching its private key, the agent will discard the certificate, sleep for the configured waitforcert period, and repeat the process. (The two timing settings involved are sketched just after this list.)
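For reference, both timing knobs in this list live in the agent’s puppet.conf: the 30-minute wake-up is governed by runinterval, and the certificate retry sleep by waitforcert. A minimal sketch with the documented defaults (the file path shown is the usual one for modern open source Puppet):

    # /etc/puppetlabs/puppet/puppet.conf (agent side)
    [agent]
    runinterval = 30m   # how often the agent wakes up to perform a run (default: 30 minutes)
    waitforcert = 2m    # how long to sleep between certificate retries (default: 2 minutes)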

While this may seem like a lot, this is the primary conversation entered into between the agent and the server to ensure (by way of SSL) that each node is the node it represents itself to be, and that it is authorized to communicate with the Puppet server.

Once the server and the node connecting to the server are reasonably certain of each other’s identity, the Puppet conversation can begin. Think of the above as the portion of a conversation where someone pulls you aside and says “let’s go somewhere we can talk privately”.

The next thing that happens is the agent node requests its “node object” (more on this later) and determines which working “environment” it belongs in.

  • Request a node object and switch environments with the API call:
    GET /puppet/v3/node/<NAME>
  • If the API call is successful, the agent reads the environment from the node object. If the node object specifies an environment, the agent uses that environment instead of the one in its config file for all subsequent requests during this run.
  • If the API call is unsuccessful, or if the node object has no environment set, the agent uses the environment setting from its config file.
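The agent-side fallback mentioned above is simply the environment setting in its puppet.conf. A minimal sketch (the environment name is an example):

    # /etc/puppetlabs/puppet/puppet.conf (agent side)
    [agent]
    environment = production   # used only if the server's node object does not specify one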

Since Puppet is an extensible platform, there are many added features and functions you can add to the Puppet environment. One of these is known as a “plugin”, and Puppet distributes these plugins from the server to the agent by a process known as pluginsync. Among other things, pluginsync is the mechanism by which Puppet synchronizes the custom code that gathers each system’s profile information (known as “facts”), delivered via a platform component known as “facter”. We will cover facter more completely at a later time; for now, just know that the facter “facts” contain a full profile of your system that the server needs to know about.
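As a small, hypothetical illustration of why the server needs those facts, a manifest can branch on what facter reports about each node, so the compiled catalog differs per system (the fact values and service names shown are common examples):

    # A hypothetical sketch: pick the SSH service name based on a facter fact.
    $ssh_service = $facts['os']['family'] ? {
      'Debian' => 'ssh',    # Debian/Ubuntu call the service "ssh"
      default  => 'sshd',   # Red Hat family and most others call it "sshd"
    }

    service { $ssh_service:
      ensure => running,
      enable => true,
    }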

  • If pluginsync is enabled on the agent system, the agent fetches plugins from a file server mountpoint that scans the lib/ directory of every Puppet module.
  • Request a “catalog” from the server while submitting the latest facts produced by facter.
    • The agent does a POST /puppet/v3/catalog/<NAME> where the POST data is all of the node’s facts encoded as JSON, and receives a compiled catalog from the server in return.
  • Make file resource requests when applying the catalog:
    • File resources can specify file contents as either a content or source attribute. Content attributes go into the catalog, and the agent needs no additional data.
    • Source attributes put only references into the catalog, and may require additional HTTPS requests. (A short manifest sketch of the two styles follows this list.)
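A minimal, hypothetical sketch of the two styles (paths, contents, and module names are illustrative):

    # 'content' is embedded directly in the catalog; the agent needs no further data.
    file { '/etc/motd':
      ensure  => file,
      content => "Managed by Puppet\n",
    }

    # 'source' puts only a reference in the catalog; the agent may make additional
    # HTTPS requests for the file's metadata and, if it is out of sync, its content.
    file { '/etc/ssh/sshd_config':
      ensure => file,
      source => 'puppet:///modules/ssh/sshd_config',
    }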

If you are using the default compiler, then for each file source, the agent makes a GET /puppet/v3/file_metadata/<SOMETHING> request and compares the metadata returned to the state of that file already on disk.

  • If the file is in sync, the agent moves on to the next file resource.
  • If the file is out of sync, the agent does a GET /puppet/v3/file_content/<SOMETHING> to retrieve the content of the file that should be on the disk.

If you are using the static compiler (a more efficient compiler), all file metadata is embedded in the catalog. For each file source, the agent compares the embedded metadata from the catalog to the file contents on disk.

  • If the file content is in sync, the agent moves on to the next file resource.
  • If the file is out of sync, it performs a GET /puppet/v3/file_bucket_file/md5/<CHECKSUM> for the content.

NOTE: Using the static compiler is more efficient in terms of network traffic than using the normal (dynamic) compiler, while the dynamic compiler is more efficient during catalog compilation. Large numbers of files, especially recursive directories, amplify either issue.
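As an assumption based on the Puppet documentation of that era, the static compiler was selected on the Puppet server (master) side with a catalog_terminus setting. A hedged sketch, which you should verify against your Puppet version, since newer releases embed file metadata via “static catalogs” by default:

    # /etc/puppetlabs/puppet/puppet.conf on the Puppet server -- an assumption for
    # older Puppet versions; newer releases use static catalogs by default.
    [master]
    catalog_terminus = static_compiler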

Finally, if “report” is enabled in the agent’s configuration (it is by default), the Puppet agent will submit a report of the run to the server by performing a PUT /puppet/v3/report/<NAME>.
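The agent-side switch involved is simply the report setting; a minimal sketch:

    # /etc/puppetlabs/puppet/puppet.conf (agent side)
    [agent]
    report = true   # submit a run report to the server after each run (default: true)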

Comments and Contents

This may seem like overkill in the grand scheme of a Puppet primer, but I guarantee you will be thankful we “went there” this early in the process. When we start breaking down all the various components like facter, catalogs, resources, and the push-pull nature of the conversation between the Puppet server and agents, having a fundamental knowledge of the process that is occurring, as well as a reference to come back to, will be invaluable.

I’d like to thank Puppet for their encyclopedic platform reference. Much of this page comes nearly directly from that documentation. I tried to link specific pages where I was either referring to or introducing some new concept or platform component we have not yet discussed. I am primarily referring to the Puppet reference on agent-server HTTPS communications found here, specifically as it relates to the agent-side checks and HTTPS requests made during a single Puppet run.

In practice, Puppet performs considerably more functions, in more specific ways, than what is related here, but the items here are documented and transparent, whereas some of the other items are “private” functions and API calls not intended for customer consumption.