Understanding reservations, concurrency, and locking in Nova

Imagine that two colleagues, Alice and Bob, issue a command to launch a new virtual machine at approximately the same moment in time. Both Alice’s and Bob’s virtual machines must be given an IP address within the range of IP addresses granted to their project. Let’s say that range is 192.168.20.0/28, which would allow for a total of 16 IP addresses for virtual machines [1]. At some point during the launch sequence of these instances, Nova must assign one of those addresses to each virtual machine.

How do we prevent Nova from assigning the same IP address to both virtual machines?

In this blog post, I’ll try to answer the above question and shed some light on problems that have surfaced in the way OpenStack projects currently address (and sometimes fail to address) this issue.

Demonstrating the problem

figure A

Dramatically simplified, the launch sequence of Nova looks like figure A. Of course, I’m leaving out hugely important steps, like the provisioning and handling of block devices, but the figure demonstrates the important steps in the launch sequence for the purposes of our discussion here. The specific step in which we find our IP address reservation problem is the determine networking details step.

figure B

Now, within the determine networking details step, we have a set of tasks that looks like figure B. All of the tasks except the last revolve around interacting with the Nova database [2]. The tasks are all pretty straightforward: we grab a record for a “free” IP address from the database and mark it “assigned” by setting the IP address record’s instance ID to the ID of the instance being launched, and the host field to the ID of the compute node that was selected during the determine host machine step in figure A. We then save the updated record to the database.

OK, so back to our problem situation. Imagine that Alice’s and Bob’s launch requests were made at essentially the same moment, that both requests arrived at the start of the determine networking details step at the same time, and that the tasks from figure B are executed in an interleaved fashion between Alice’s and Bob’s requests, as figure C shows.

figure C

If you step through the numbered actions in Alice’s and Bob’s request processes, you will notice a problem. Actions #7 and #9 will both return the same IP address information to their callers. Worse, the database record for that single IP address will show the IP address assigned to Alice’s instance, even though it was (very briefly) assigned to Bob’s instance, because the database update in action #5 occurred (and succeeded) before the database update in action #8 occurred (and also succeeded), overwriting it. In the words of Mr. Mackey, “this is bad, m’kay”.
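
The lost update in figure C is easy to reproduce. The following is a minimal Python sketch using the standard library's sqlite3 module and an in-memory stand-in for the fixed_ips table (its layout mirrors the test table used later in this post). The helper names grab_free_ip and assign, and the host and instance IDs, are hypothetical. The steps are run by hand in the problematic order, with no locking at all:

```python
import sqlite3

def make_db():
    # In-memory stand-in for the fixed_ips table: three free addresses.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fixed_ips ("
        " id INTEGER PRIMARY KEY, host INTEGER, instance_id INTEGER)"
    )
    conn.executemany("INSERT INTO fixed_ips (id) VALUES (?)", [(1,), (2,), (3,)])
    conn.commit()
    return conn

def grab_free_ip(conn):
    # Plain SELECT: nothing stops another request from reading the same row.
    row = conn.execute(
        "SELECT id FROM fixed_ips"
        " WHERE host IS NULL AND instance_id IS NULL"
        " ORDER BY id LIMIT 1"
    ).fetchone()
    return row[0]

def assign(conn, ip_id, host, instance_id):
    # Unconditional UPDATE: silently overwrites whatever is in the row.
    conn.execute(
        "UPDATE fixed_ips SET host = ?, instance_id = ? WHERE id = ?",
        (host, instance_id, ip_id),
    )
    conn.commit()

conn = make_db()
alice_ip = grab_free_ip(conn)   # Alice reads the first free row...
bob_ip = grab_free_ip(conn)     # ...and so does Bob, before Alice writes.
assign(conn, bob_ip, 7, 200)    # Bob's update lands first (hypothetical IDs)
assign(conn, alice_ip, 5, 100)  # Alice's update overwrites Bob's

print(alice_ip, bob_ip)  # 1 1
print(conn.execute(
    "SELECT host, instance_id FROM fixed_ips WHERE id = ?", (alice_ip,)
).fetchone())  # (5, 100)
```

Both callers walk away believing they own address 1, and only Alice's claim survives in the database.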

There are a number of ways to solve this problem. Nova happens to employ a traditional solution: database-level write-intent locks.

Database-level Locking

At its core, any locking solution is intended to protect some critical piece of data from simultaneous changes. Write-intent locks in traditional database systems are no different. One thread announces that it intends to change one or more records that it is reading from the database. The database server will mark the records in question as locked by the thread, and return the records to the thread. While these locks are held, any other thread that attempts to either read the same records with the intent to write, or write changes to those records, will get what is called a lock wait.

Only once the thread indicates that it is finished making changes to the records in question — by issuing a COMMIT statement — will the database release the locks on the records. What this lock strategy accomplishes is prevention of two threads simultaneously reading the same piece of data that they intend to change. One thread will wait for the other thread to finish reading and changing the data before its read succeeds. This means that using a write-intent lock on the database system results in the following order of events:

figure D

For MySQL and PostgreSQL, the SQL construct used to tell the database server that the calling thread intends to change the records it is asking for is SELECT ... FOR UPDATE.

Using a couple of MySQL command-line client sessions, I’ll show you what effect this SELECT ... FOR UPDATE construct has on a normal MySQL database server (though the effect is identical for PostgreSQL). I created a test database table called fixed_ips that looks like the following:

CREATE TABLE `fixed_ips` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `host` INT(11) DEFAULT NULL,
  `instance_id` INT(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

I then populate the table with a few records representing IP addresses, all “available” for an instance: the host and instance_id fields are set to NULL:

mysql> SELECT * FROM fixed_ips;
+----+------+-------------+
| id | host | instance_id |
+----+------+-------------+
|  1 | NULL |        NULL |
|  2 | NULL |        NULL |
|  3 | NULL |        NULL |
+----+------+-------------+
3 rows in set (0.00 sec)

And now, interleaved in time order, here are the SQL commands executed in each of the sessions. Commands prefixed with sessA> were run by Alice’s thread (session A), and commands prefixed with sessB> were run by Bob’s thread (session B).

sessA>BEGIN;
Query OK, 0 rows affected (...)

sessA>SELECT NOW();
+---------------------+
| NOW()               |
+---------------------+
| 2014-12-31 09:03:07 |
+---------------------+
1 row in set (0.00 sec)

sessA>SELECT * FROM fixed_ips
    -> WHERE instance_id IS NULL
    -> AND host IS NULL
    -> ORDER BY id LIMIT 1
    -> FOR UPDATE;
+----+------+-------------+
| id | host | instance_id |
+----+------+-------------+
|  2 | NULL |        NULL |
+----+------+-------------+
1 row in set (...)
 
 
sessB>BEGIN;
Query OK, 0 rows affected (...)

sessB>SELECT NOW();
+---------------------+
| NOW()               |
+---------------------+
| 2014-12-31 09:04:05 |
+---------------------+
1 row in set (0.00 sec)

sessB>SELECT * FROM fixed_ips
    -> WHERE instance_id IS NULL
    -> AND host IS NULL
    -> ORDER BY id LIMIT 1
    -> FOR UPDATE;
sessA>UPDATE fixed_ips
    -> SET host = 42,
    ->     instance_id = 42
    -> WHERE id = 2;
Query OK, 1 row affected (...)
Rows matched: 1  Changed: 1

sessA>COMMIT;
Query OK, 0 rows affected (...)
 
 
(Only now, after session A’s COMMIT, does session B’s blocked SELECT ... FOR UPDATE return:)
+----+------+-------------+
| id | host | instance_id |
+----+------+-------------+
|  3 | NULL |        NULL |
+----+------+-------------+
1 row in set (42.03 sec)

sessB>COMMIT;
Query OK, 0 rows affected (...)

There are two important things to note about the interplay between session A and session B. First, the 42.03 seconds reported for session B’s SELECT ... FOR UPDATE statement is the amount of time that statement waited on the write-intent locks held by session A. Second, the 3 returned by session B’s SELECT ... FOR UPDATE statement shows that a different row was returned for the same query that session A issued. In other words, MySQL waited until session A issued a COMMIT before executing session B’s SELECT ... FOR UPDATE statement.

In this way, the write-intent locks constructed with the SELECT ... FOR UPDATE statement prevent the collision of threads changing the same record at the same time.
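
The ordering that SELECT ... FOR UPDATE enforces can be mimicked in-process. The Python sketch below is only an analogy, not how InnoDB implements row locks: a single threading.Lock stands in for the write-intent lock on the free-address rows, acquiring it plays the role of SELECT ... FOR UPDATE, and releasing it plays the role of COMMIT. The FakeFixedIps class and its methods are hypothetical.

```python
import threading
import time

class FakeFixedIps:
    """Hypothetical in-memory table; the lock models FOR UPDATE + COMMIT."""

    def __init__(self, ids):
        self.rows = {i: None for i in ids}   # id -> instance_id (None = free)
        self.lock = threading.Lock()         # stand-in for the write-intent lock

    def reserve(self, instance_id, think_time=0.05):
        with self.lock:                      # "SELECT ... FOR UPDATE": wait here if held
            ip_id = min(i for i, owner in self.rows.items() if owner is None)
            time.sleep(think_time)           # the window in which the race used to happen
            self.rows[ip_id] = instance_id   # "UPDATE"
        return ip_id                         # leaving the block is the "COMMIT"

table = FakeFixedIps([1, 2, 3])
results = {}

def launch(name, instance_id):
    results[name] = table.reserve(instance_id)

threads = [
    threading.Thread(target=launch, args=("alice", 100)),
    threading.Thread(target=launch, args=("bob", 200)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.values()))  # [1, 2]
```

Whichever thread acquires the lock second waits, then sees the next free row, just as session B received row 3 only after session A committed.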

How locks “fail” with MySQL Galera Cluster

galera_replication1

At the Atlanta design summit, I co-led an Ops Meetup session on databases and was actually surprised by my poll of who was using which database server for their OpenStack deployments. Out of approximately 220 people in the room, MySQL Galera Cluster was by far the most popular way of deploying MySQL for use by OpenStack services, with around 200 or so operators raising their hands that they used it. Standard MySQL was next, and there was one person using PostgreSQL.

MySQL Galera Cluster is a system that wraps the standard MySQL row-level binary replication log transmission with something called working-set replication, enabling synchronous replication between many nodes running the MySQL database server. Now, that’s a lot of fancy words to really say that Galera Cluster allows you to run a cluster of database nodes that do not suffer from replication slave lag. You are guaranteed that the data on disk on each of the nodes in a Galera Cluster is exactly the same.

One interesting thing about MySQL Galera Cluster is that it can efficiently handle writes to any node in the cluster. This is different from standard MySQL replication, which generally relies on a single master database server that handles writes and real-time reads, and one or more slave database servers that serve read requests from applications that can tolerate some level of lag between the master and slave. Many people refer to this setup as multi-master mode, but that is actually a misnomer, because with Galera Cluster, there is no such thing as a master and a slave. Every node in a cluster is the same. Each can apply writes coming to the node directly from a MySQL client. For this reason, I like to refer to such a setup as multi-writer mode.

This ability to have writes be directed to and processed by any node in the Galera Cluster is actually pretty awesome. You can direct a load balancer to spread read and write load across all nodes in the cluster, allowing you to scale writes as well as reads. This multi-writer mode is ideal for WAN-replicated environments, believe it or not, as long as the amount of data being written is not crazy-huge (think: Ceilometer), because you can have application servers send writes to the closest database server in the cluster, and let Galera handle the efficiency of transmitting writesets across the WAN.

However, there’s a catch. Peter Boros, a principal architect at Percona, a company that makes a specialized version of Galera Cluster called Percona XtraDB Cluster, was actually the first to inform the OpenStack community about this catch — in the aforementioned Ops Meetup session. The problem with MySQL Galera Cluster is that it does not replicate the write-intent locks for SELECT ... FOR UPDATE statements. There’s actually a really good reason for this. Galera does not have any idea about the write-intent locks, because those locks are constructions of the underlying InnoDB storage engine, not the MySQL database server itself. So, there’s no good way for InnoDB to communicate to the MySQL row-based replication stream that write-intent locks are being held inside of InnoDB for a particular thread’s SELECT ... FOR UPDATE statement [3].

Figure E

The ramifications of this catch are interesting, indeed. If two application server threads issue the same SELECT ... FOR UPDATE request to a load balancer at the same time, which directs each thread to different Galera Cluster nodes, both threads will return the exact same record(s) with no lock waits [4]. Figure E illustrates this phenomenon, with the circled 1, 2, and 3 events representing things occurring at exactly the same time (due to no locks being acquired/held).

One might be tempted to say that Galera Cluster, due to its lack of support for SELECT ... FOR UPDATE write-intent locks, is no longer ACID-compliant, since now two threads can simultaneously select the same record with the intent of changing it. And while it is indeed true that two threads can select the same record with the intent of changing it, it is extremely important to point out that Galera Cluster is still ACID-compliant.

The reason is that even though two threads can simultaneously read the same record with the intent of changing it (which is the identical behaviour that would be seen if the FOR UPDATE was left off the SELECT statement), if both threads attempt to write a change to the same record via an UPDATE statement, either one or none of the threads would succeed in updating the record, but not both. This is due to the way that Galera Cluster certifies a working set (the set of changes to data). If node 1 writes an update to disk, it must certify with a quorum of nodes in the cluster that its update does not conflict with updates to those nodes. If node 3 has begun changing the same row of data, but has not certified with the other nodes in the cluster for that working set, then it will fail to certify the original working set from node 1 and will send a certification failure back to node 1.

This certification failure manifests itself as a MySQL deadlock error, specifically error 1213, which will look like this:

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

All nodes other than the one that first “won” — i.e. successfully committed and certified its transaction — will return this deadlock error to any other thread that attempted to change the same record(s) at the same time as the thread that “won”. Need a visual of all this interplay? Check out figure F, which I scraped together for the graphically-inclined.

Figure F
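
For readers who prefer code to diagrams, here is a deliberately simplified Python model of the first-committer-wins outcome described above. It is a sketch under stated assumptions, not Galera's actual algorithm: each transaction remembers the last committed sequence number it had seen when it began, and at commit time its writeset is rejected if any writeset committed after that point touched one of the same rows. ToyCluster, its methods, and the CertificationFailure exception (standing in for error 1213) are hypothetical.

```python
class CertificationFailure(Exception):
    """Stand-in for ERROR 1213 (40001): Deadlock found when trying to get lock."""

class ToyCluster:
    def __init__(self):
        self.seqno = 0       # last committed, globally ordered writeset
        self.history = []    # (seqno, frozenset of row keys) per commit

    def begin(self):
        # A transaction remembers how far the cluster had committed when it began.
        return {"seen": self.seqno, "keys": set()}

    def write(self, txn, *keys):
        txn["keys"].update(keys)   # each key is e.g. ("fixed_ips", 1)

    def commit(self, txn):
        for seqno, keys in self.history:
            if seqno > txn["seen"] and keys & txn["keys"]:
                # Someone else committed a change to one of our rows first.
                raise CertificationFailure("try restarting transaction")
        self.seqno += 1
        self.history.append((self.seqno, frozenset(txn["keys"])))
        return self.seqno

cluster = ToyCluster()
on_node1 = cluster.begin()   # both transactions start concurrently on
on_node3 = cluster.begin()   # different nodes, with no lock waits
cluster.write(on_node1, ("fixed_ips", 1))
cluster.write(on_node3, ("fixed_ips", 1))

print(cluster.commit(on_node1))  # 1: the first writeset certifies
try:
    cluster.commit(on_node3)
except CertificationFailure as exc:
    print("node 3:", exc)        # node 3: try restarting transaction
```

Writesets touching disjoint rows certify independently; only overlapping ones collide.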

If you ever wondered why, in the Nova codebase, we make prodigious use of a decorator called @_retry_on_deadlock in the SQLAlchemy API module, it is partly because of this issue. These deadlock errors can be consistently triggered by running load tests or things like Tempest that can put a load on the database that forces “hot spots” in the data to occur. This decorator does exactly what you’d think it would do: it retries the transaction if a deadlock error is returned from the database server.
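
To make the idea concrete, here is a minimal Python sketch of such a decorator. It is not Nova's actual @_retry_on_deadlock implementation: DBDeadlock is a hypothetical exception standing in for whatever the database layer raises on MySQL error 1213, and the retry limit and back-off values are arbitrary assumptions.

```python
import functools
import time

class DBDeadlock(Exception):
    """Hypothetical stand-in for the error raised on MySQL error 1213."""

def retry_on_deadlock(max_retries=5, delay=0.01):
    """Re-run the whole decorated transaction if it fails with a deadlock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    if attempt == max_retries:
                        raise                          # give up: let the caller see it
                    time.sleep(delay * (attempt + 1))  # brief, growing pause
        return wrapper
    return decorator

calls = {"n": 0}

@retry_on_deadlock(max_retries=3)
def assign_ip():
    calls["n"] += 1
    if calls["n"] < 3:   # simulate two certification failures
        raise DBDeadlock("Deadlock found when trying to get lock")
    return "assigned"

print(assign_ip(), calls["n"])  # assigned 3
```

Note that every retry repeats the entire transaction, and each failed attempt has already paid the full cost of reaching COMMIT.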

So, given what we know about MySQL Galera Cluster, one thing we are trying to do is entirely remove any use of SELECT ... FOR UPDATE from the Nova code base. Since we know it doesn’t work the way people think it works on Galera Cluster, we might as well stop using this construct in our code. However, the retry-on-deadlock mechanism is actually not the most effective or efficient mechanism we could use to solve the concurrent update problems in the Nova code base. There is another technique, which I’ll call compare and swap, which offers a variety of benefits over the retry-on-deadlock technique.

Compare and swap

One of the drawbacks of the retry-on-deadlock method of handling concurrency problems is that it is reactive by nature. We essentially wrap calls that may tend to deadlock with a decorator that catches the deadlock error if it arises and retries the entire database transaction. The problem with this is that the deadlock error that manifests itself from the Galera Cluster working set certification failure (see Figure F above) takes a non-trivial amount of time to occur.

Think about it. A thread manages to start a write transaction on a Galera Cluster node. It writes the transaction on the local node and gets all the way up to the point of doing the COMMIT. At that point, the node sends out a certification request to each node in the cluster (in parallel). It must wait until a quorum of those nodes respond with a successful certification. If another node has an active working set that changes the same modified rows, then a deadlock will occur, and that deadlock will eventually bubble its way back to the caller, who will retry the exact same database transaction. All of these things, while individually very quick in Galera Cluster, do take some amount of time.

Is there a technique that would allow us to structure our SQL statements in such a way that we can avoid the roundtrips from one Galera Cluster node to the other nodes? Well, there is.

Consider the following SQL statements, taken from the above CLI examples:

BEGIN;
/* Grab the "first" unassigned IP address */
SELECT id FROM fixed_ips
WHERE host IS NULL
AND instance_id IS NULL
ORDER BY id
LIMIT 1
FOR UPDATE;
/* Let's assume that the above query returned the
   fixed_ip with ID of 1
   We now "assign" the IP address to instance #42
   and on host #99 */
UPDATE fixed_ips
SET host = 99, instance_id = 42
WHERE id = 1;
COMMIT;

Now, we know that the locks taken for the FOR UPDATE statement won’t actually be considered by any other nodes in a Galera Cluster, so we need to get rid of the use of SELECT ... FOR UPDATE. But how can we structure things so that the SQL code sent to any node in the Galera Cluster guarantees both that we will not stumble into a deadlock error, and that the cluster node we end up executing our statements on will not need to contact any other node to determine whether another thread has updated the same record between the time we SELECTed our record and the time we go to UPDATE it?

The answer lies in constructing an UPDATE statement that contains a WHERE clause that contains all the fields from the previously SELECT‘ed record, like so:

/* Grab the "first" unassigned IP address */
SELECT id FROM fixed_ips
WHERE host IS NULL
AND instance_id IS NULL
ORDER BY id
LIMIT 1;
/* Let's assume that the above query returned the
   fixed_ip with ID of 1
   We now "assign" the IP address to instance #42
   and on host #99, but specify that the host and
   instance_id fields must match our original view
   of that record -- i.e., they must both be NULL
*/
UPDATE fixed_ips
SET host = 99, instance_id = 42
WHERE id = 1
AND host IS NULL
AND instance_id IS NULL;

If we structure our application code so that it is executing the above SQL statements, each statement can be executed on any node in the cluster, without waiting for certification failures to occur before “knowing” if the UPDATE would succeed. Remember that working set certification in Galera only happens once the local node (i.e. the node originally receiving the SQL statement) is ready to COMMIT the changes. Well, if thread B managed to update the fixed_ip record with id = 1 in between the time when thread A does its SELECT and the time thread A does its UPDATE, then the WHERE condition:

WHERE id = 1
AND host IS NULL
AND instance_id IS NULL;

will fail to match any rows in the database to update, since host IS NULL AND instance_id IS NULL will no longer be true if another thread updated the record. We can catch this failure to update any rows in the database more efficiently than the certification timeout, since the thread that sent the UPDATE ... WHERE ... host IS NULL AND instance_id IS NULL statement will receive notification that no rows were updated before any certification traffic would ever be generated (since there’s no certification needed if nothing was updated).
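
Here is a small Python sketch of the conditional UPDATE in action, again using the standard library's sqlite3 module against a stand-in fixed_ips table. SQLite has no Galera certification, of course; the point is only the rows-affected signal. Both threads read id 1 as free, thread B writes first, and thread A's compare-and-swap then matches nothing. The cas_assign helper and thread B's instance ID are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fixed_ips (id INTEGER PRIMARY KEY, host INTEGER, instance_id INTEGER)"
)
conn.executemany("INSERT INTO fixed_ips (id) VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

FIND_FREE = (
    "SELECT id FROM fixed_ips WHERE host IS NULL AND instance_id IS NULL"
    " ORDER BY id LIMIT 1"
)

def cas_assign(conn, ip_id, host, instance_id):
    # Only succeeds if the row still looks the way we saw it: unassigned.
    cur = conn.execute(
        "UPDATE fixed_ips SET host = ?, instance_id = ?"
        " WHERE id = ? AND host IS NULL AND instance_id IS NULL",
        (host, instance_id, ip_id),
    )
    changed = cur.rowcount  # 1 = we won the swap, 0 = someone changed the row first
    conn.commit()
    return changed

a_id = conn.execute(FIND_FREE).fetchone()[0]  # thread A sees id 1 as free
b_id = conn.execute(FIND_FREE).fetchone()[0]  # thread B also sees id 1 as free
print(cas_assign(conn, b_id, 99, 43))         # 1: thread B's swap succeeds
print(cas_assign(conn, a_id, 99, 42))         # 0: thread A's view is stale
```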

Do we still need a retry mechanism? Yes, of course we do, in order to retry the SELECT, then UPDATE ... WHERE statements when a previous UPDATE ... WHERE statement returned zero rows affected. The difference between this compare-and-swap approach and the brute-force retry-on-deadlock approach is that we’re no longer reacting to an exception being emitted after some timeout of certification, but instead being proactive and just structuring our UPDATE statement to pass in our previous view of the record we want to change, allowing for a tighter retry loop to occur (no timeout waits needed, simply detect whether rows_affected is greater than zero).
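
A minimal sketch of that tighter loop, assuming the same stand-in table and sqlite3 setup (all names are hypothetical and the attempt limit is an arbitrary choice): re-read the first free address, attempt the conditional UPDATE, and go around again only when it reports zero rows affected. The optional interfere callback exists purely so the example can inject a competing writer between the SELECT and the UPDATE.

```python
import sqlite3

def make_db(n=3):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fixed_ips (id INTEGER PRIMARY KEY, host INTEGER, instance_id INTEGER)"
    )
    conn.executemany("INSERT INTO fixed_ips (id) VALUES (?)",
                     [(i,) for i in range(1, n + 1)])
    conn.commit()
    return conn

class NoFreeAddresses(Exception):
    pass

def assign_fixed_ip(conn, host, instance_id, max_attempts=10, interfere=None):
    """Compare-and-swap reservation: no FOR UPDATE, no deadlock handling."""
    for _ in range(max_attempts):
        row = conn.execute(
            "SELECT id FROM fixed_ips WHERE host IS NULL AND instance_id IS NULL"
            " ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            raise NoFreeAddresses()
        ip_id = row[0]
        if interfere is not None:
            interfere(conn, ip_id)  # test hook: a rival grabs the row
        cur = conn.execute(
            "UPDATE fixed_ips SET host = ?, instance_id = ?"
            " WHERE id = ? AND host IS NULL AND instance_id IS NULL",
            (host, instance_id, ip_id),
        )
        changed = cur.rowcount
        conn.commit()
        if changed == 1:            # our view of the row was still current
            return ip_id
        # changed == 0: the row moved under us; loop and re-read immediately
    raise RuntimeError("gave up after %d attempts" % max_attempts)

conn = make_db()
raced = []

def rival(conn, ip_id):
    # Hypothetical competing writer that wins the race exactly once.
    if not raced:
        raced.append(ip_id)
        conn.execute("UPDATE fixed_ips SET host = 7, instance_id = 200 WHERE id = ?",
                     (ip_id,))
        conn.commit()

print(assign_fixed_ip(conn, 99, 42, interfere=rival))  # 2: id 1 was taken mid-flight
```

The retry is driven by a plain integer check rather than by an exception that only surfaces after certification.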

This compare and swap mechanism is what I describe in the lock-free-quota-management Nova blueprint specification. There have been a number of mailing list threads and IRC conversations about this particular issue, so I figured I would write a bit and create some pretty graphics to illustrate the sequencing of events that occurs. I hope this has been helpful. Let me know if you have thoughts on the topic or see any errors in my work. Always happy for feedback.

[1] This is just for example purposes. Technically, such a CIDR would result in 13 available addresses in Nova, since the gateway, cloudpipe VPN, and broadcast addresses are reserved for use by Nova.

[2] We are not using Neutron in our example here, but the same general problem resides in Neutron’s IPAM code as is described in this post.

[3] Technically, there are trade-offs between pessimistic locking (which InnoDB uses locally) and optimistic locking (which Galera uses in its working-set certification). For an excellent read on the topic, check out Jay Janssen’s blog article on multi-node writing and deadlocks in Galera.

[4] If both threads happened to hit the same Galera Cluster node, then the last thread to execute the SELECT ... FOR UPDATE would end up waiting for the locks (in InnoDB) on that particular cluster node.

Understanding the OpenStack CI System

This post describes in detail the upstream OpenStack continuous integration platform. In the process, I’ll be describing the code flow in the upstream system — from the time the contributor submits a patch to Gerrit, all the way through the creation of a devstack environment in a virtual machine, the running of the Tempest test suite against the devstack installation, and finally the reporting of test results and archival of test artifacts. Hopefully, with a good understanding of how the upstream tooling works, setting up your own linked external testing platform will be easier.

Some History and Concepts

Over the past four years, there has been a steady evolution in the way that the source code of OpenStack projects is tested and reviewed. I remember when we used Bazaar for source control and Launchpad merge proposals for code review. There was no automated or continuous testing to speak of in those early days, which put pressure on core reviewers to test proposed patches locally. There was also no standardized integration test suite, so a change in one project would often inadvertently break another project.

Thanks to the work of many contributors, particularly those patient souls in the OpenStack Infrastructure team, today there is a robust platform supporting continuous integration testing for OpenStack and Stackforge projects. At the center of this platform are the Jenkins CI servers, the Gerrit git and patch review server, and the Zuul gating system.

The Code Review System

When a contributor submits a patch to one of the OpenStack projects, they push their code to the git server managed by Gerrit running on review.openstack.org. Typically, contributors use the git-review Git plugin, which simplifies submitting to a git server managed by Gerrit. Gerrit controls which users or groups are allowed to propose code, merge code, and administer code repositories under its management. When a contributor pushes code to review.openstack.org, Gerrit creates a Changeset representing the proposed code. The original submitter and any other contributors can push additional amendments to that Changeset, and Gerrit collects all of the changes into the Changeset record. Here is a screenshot of a Changeset under review. You can see a number of patches (changes) listed in the review screen. Each of those patches was an amendment to the original commit.

Individual patches amend the changeset

For each patch in Gerrit, there are three sets of “labels” that may be applied to the patch. Anyone can comment on a Changeset and/or review the code. A review is shown on the patch in the Code-Review column in the patch “labels matrix”:

The "label matrix" on a Gerrit patch

Non-core team members may give the patch a Code-Review label of +1 (Looks good to me), 0 (No strong opinion), or -1 (I would prefer you didn’t merge this). Core team members can give any of those values, plus +2 (Looks good to me, approved) and -2 (Do not submit).

The other columns in the label matrix are Verified and Approved. Only non-interactive users of Gerrit, such as Jenkins, are allowed to add a Verified label to a patch. The external testing platform you will set up is one of these non-interactive users. The value of the Verified label will be +1 (check pipeline tests passed), -1 (check pipeline tests failed), +2 (gate pipeline tests passed), or -2 (gate pipeline tests failed).
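
The label rules above can be summarized as a small lookup table. This Python sketch only restates the values described in this section; it is not Gerrit's configuration format or API, and the dictionary names and the allowed_code_review helper are hypothetical.

```python
CODE_REVIEW = {
    2: "Looks good to me, approved",
    1: "Looks good to me",
    0: "No strong opinion",
    -1: "I would prefer you didn't merge this",
    -2: "Do not submit",
}

# Only non-interactive users (e.g. Jenkins) may set these.
VERIFIED = {
    2: "gate pipeline tests passed",
    1: "check pipeline tests passed",
    -1: "check pipeline tests failed",
    -2: "gate pipeline tests failed",
}

def allowed_code_review(is_core):
    """Code-Review values a reviewer may apply: core members also get +2/-2."""
    return sorted(CODE_REVIEW) if is_core else [-1, 0, 1]

print(allowed_code_review(False))  # [-1, 0, 1]
print(allowed_code_review(True))   # [-2, -1, 0, 1, 2]
```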

Only members of the OpenStack project’s core team can add an Approved label to a patch. It is either a +1 (Approved) value or not, appearing as a check mark in the Approved column of the label matrix:

An approved patch.

Continuous Integration Testing

Continuous integration (CI) testing is the act of running tests that validate a full application environment on a continual basis — i.e. when any change is proposed to the application. Typically, when talking about CI, we are referring to tests that are run against a full, real-world installation of the project. This type of testing, called integration testing, ensures that proposed changes to one component do not cause failures in other components. This is especially important for complex multi-project systems like OpenStack, with non-trivial dependencies between subsystems.

When code is pushed to Gerrit, a series of jobs are triggered that run a series of tests against the proposed code. Jenkins is the server that executes and manages these jobs. It is a Java application with an extensible architecture that supports plugins that add functionality to the base server.

Each job in Jenkins is configured separately. Behind the scenes, Jenkins stores this configuration information in an XML file in its data directory. You may manually edit a Jenkins job as an administrator in Jenkins. However, in a testing platform as large as the upstream OpenStack CI system, doing so manually would be virtually impossible and fraught with errors. Luckily, there is a helper tool called Jenkins Job Builder (JJB) that constructs these XML configuration files after reading a set of YAML files and job templating rules. We will describe JJB later in the article.

The “Gate”

When we talk about “the gate”, we are talking about the process by which code is kept out of a set of source code branches if certain conditions are not met.

OpenStack projects use a method of controlling merges into certain branches of their source trees called the Non-Human Gatekeeper model [1]. Gerrit (the non-human) is configured to allow merges by users in a group called “Non-Interactive Users” to the master and stable branches of git repositories under its control. The upstream main Jenkins CI server, as well as Jenkins CI systems running at third party locations, are the users in this group.

So, how do these non-interactive users actually decide whether to merge a proposed patch into the target branch? Well, there is a set of tests (different for each project) — unit, functional, integration, upgrade, style/linting — that is marked as “gating” that particular project’s source trees. For most of the OpenStack projects, there are unit tests (run in a variety of different supported versions of Python) and style checker tests for HACKING and PEP8 compliance. These unit and style tests are run in Python virtualenvs managed by the tox testing utility.

In addition to the Python unit and style tests, there are a number of integration tests that are executed against full installations of OpenStack. The integration tests are simply subsets of the Tempest integration test suite. Finally, many projects also include upgrade and schema migration tests in their gate tests.

How Upstream Testing Works

Graphically, the upstream continuous integration gate testing system works like this:

gerrit-zuul-jenkins-flow

We step through this event flow in detail below, referencing the numbered steps of the diagram in parentheses, such as (1a) and (2).

The Gerrit Event Stream and Zuul

After a contributor has pushed (1a) a new patch to a changeset or a core team member has reviewed the patch and added an Approved +1 label (1b), Gerrit pushes out a notification event to its event stream (2). This event stream can have a number of subscribers, including the Gerrit Jenkins plugin and Zuul. Zuul was developed to manage the many complex graphs of interdependent branch merge proposals in the upstream system. It monitors in-progress jobs for a set of related patches and will pre-emptively cancel any dependent test jobs that would not succeed due to a failure in a dependent patch [2].
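
A toy model can make this dependency tracking concrete. The Python sketch below is a simplification, not Zuul's code: changes queued for the gate are each tested speculatively on top of everything ahead of them, and when one fails it is removed from the queue, and the results for every change behind it (which assumed the failed change would merge) are discarded and re-run without it. The run_dependent_queue function, the passes callback, and the change names are hypothetical.

```python
def run_dependent_queue(queue, passes):
    """Return (merged, rejected, cancelled_runs) for a simplified gate queue.

    passes(change, ahead) -> bool says whether `change` passes its tests when
    applied on top of the list of changes `ahead` of it.
    """
    merged, rejected, cancelled = [], [], 0
    pending = list(queue)
    while pending:
        # Speculatively test every pending change on merged + changes ahead of it.
        results = [passes(c, merged + pending[:i]) for i, c in enumerate(pending)]
        if all(results):
            merged.extend(pending)
            break
        k = results.index(False)           # first failing change in the queue
        merged.extend(pending[:k])         # everything ahead of it is good
        rejected.append(pending[k])
        cancelled += len(pending) - k - 1  # these runs assumed it would merge
        pending = pending[k + 1:]          # re-test them without the failure
    return merged, rejected, cancelled

def passes(change, ahead):
    # Hypothetical rule: "B" is broken; "D" only works if "B" merged first.
    if change == "B":
        return False
    if change == "D":
        return "B" in ahead
    return True

print(run_dependent_queue(["A", "B", "C", "D"], passes))
# (['A', 'C'], ['B', 'D'], 2)
```

Change D initially appears to pass only because B was ahead of it; once B fails, D's result is thrown away and the re-test exposes its dependency.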

In addition to this dependency monitoring, Zuul is responsible for constructing the pipelines of jobs that should be executed on various events. One of these pipelines is called the “gate” pipeline, appropriately named for the set of jobs that must succeed in order for a proposed patch to be merged into a target branch.

Zuul’s pipelines are configured in a single file called layout.yaml in the OpenStack-Infra config project. Here’s a snippet from that file that constructs the gate pipeline:

  - name: gate
    description: Changes that have been approved by core developers...
    failure-message: Build failed. For information on how to proceed...
    manager: DependentPipelineManager
    precedence: low
    trigger:
      gerrit:
        - event: comment-added
          approval:
            - approved: 1
        - event: comment-added
          comment_filter: (?i)^\s*reverify( (?:bug|lp)[\s#:]*(\d+))\s*$
    start:
      gerrit:
        verified: 0
    success:
      gerrit:
        verified: 2
        submit: true
    failure:
      gerrit:
        verified: -2

Zuul listens to the Gerrit event stream (3), and matches the type of event to one or more pipelines (4). The matching conditions for the gate pipeline are configured in the trigger:gerrit: section of the YAML snippet above:

    trigger:
      gerrit:
        - event: comment-added
          approval:
            - approved: 1
        - event: comment-added
          comment_filter: (?i)^\s*reverify( (?:bug|lp)[\s#:]*(\d+))\s*$

The above indicates that Zuul should fire the gate pipeline when it sees reviews with an Approved +1 label, or a review comment consisting of “reverify” followed by a bug identifier (“bug” or “lp” and a bug number). Note that there is a similar pipeline that is fired when a new patchset is created or when a review comment is made with the word “recheck”. This pipeline is called the check pipeline. Look in the layout.yaml file for the configuration of the check pipeline.
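
A minimal Python sketch of this matching step, assuming a simplified event shape loosely modeled on Gerrit's comment-added events: a dict with a type, a list of approvals (values as strings), and the comment text. Real events carry more fields. The comment_filter expression is copied from the configuration above and applied with Python's re module. The matches_gate_trigger function is hypothetical, not Zuul's implementation.

```python
import re

# comment_filter copied from the gate pipeline's trigger section above.
REVERIFY = re.compile(r"(?i)^\s*reverify( (?:bug|lp)[\s#:]*(\d+))\s*$")

def matches_gate_trigger(event):
    """True if a (simplified) Gerrit event should enqueue the gate pipeline."""
    if event.get("type") != "comment-added":
        return False
    # Rule 1: the comment carries an Approved +1 vote.
    for approval in event.get("approvals", []):
        if approval.get("type") == "Approved" and int(approval.get("value", 0)) == 1:
            return True
    # Rule 2: the comment text matches the reverify filter.
    return REVERIFY.search(event.get("comment", "")) is not None

print(matches_gate_trigger({"type": "comment-added",
                            "approvals": [{"type": "Approved", "value": "1"}]}))  # True
print(matches_gate_trigger({"type": "comment-added",
                            "comment": "reverify bug 1234567"}))                  # True
print(matches_gate_trigger({"type": "comment-added", "comment": "reverify"}))     # False
print(matches_gate_trigger({"type": "patchset-created"}))                         # False
```

Because the bug-identifier group in the expression is not optional, a bare “reverify” does not match.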

Once the appropriate pipeline is matched, Zuul executes (5) that particular pipeline for the project that had a patch proposed.

“But wait, hold up…”, you may be asking yourself, “how does Zuul know which Jenkins jobs to execute for a particular project and pipeline?” Great question! :)

Also in the layout.yaml file, there is a section that configures which Jenkins jobs should be run for each project. Let’s take a look at the configuration of the gate pipeline for the Cinder project:

  - name: openstack/cinder
    template:
      - name: python-jobs
...snip...
    gate:
      - gate-cinder-requirements
      - gate-tempest-dsvm-full
      - gate-tempest-dsvm-postgres-full
      - gate-tempest-dsvm-neutron
      - gate-tempest-dsvm-large-ops
      - gate-tempest-dsvm-neutron-large-ops
      - gate-grenade-dsvm

Each of the lines in the gate: section indicates a specific Jenkins job that should be run in the gate pipeline for Cinder. In addition, there is the python-jobs item in the template: section. Project templates are a way for Zuul to consolidate the configuration of many similar jobs into a simple template configuration. The project template definition for python-jobs looks like this (still in layout.yaml):

project-templates:
  - name: python-jobs
...snip...
    gate:
      - 'gate-{name}-docs'
      - 'gate-{name}-pep8'
      - 'gate-{name}-python26'
      - 'gate-{name}-python27'

So, when determining which Jenkins jobs should be executed for a particular pipeline, Zuul sees the python-jobs project template in the Cinder configuration and expands it to execute the following Jenkins jobs:

  • gate-cinder-docs
  • gate-cinder-pep8
  • gate-cinder-python26
  • gate-cinder-python27
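
The expansion amounts to substituting the project's short name into each template entry and appending the project's own jobs. A minimal Python sketch, assuming that {name} is replaced with the part of the project name after the slash (cinder for openstack/cinder). The expand_project_jobs helper and the dict layout are hypothetical, not Zuul's internals, and Cinder's explicit gate jobs are abbreviated.

```python
PROJECT_TEMPLATES = {
    "python-jobs": {
        "gate": ["gate-{name}-docs", "gate-{name}-pep8",
                 "gate-{name}-python26", "gate-{name}-python27"],
    },
}

def expand_project_jobs(project, templates, pipeline):
    """Jobs for one pipeline: template jobs (with {name} filled in) + explicit jobs."""
    short_name = project["name"].split("/")[-1]  # openstack/cinder -> cinder
    jobs = []
    for tmpl in project.get("template", []):
        for job in templates[tmpl].get(pipeline, []):
            jobs.append(job.format(name=short_name))
    jobs.extend(project.get(pipeline, []))
    return jobs

cinder = {
    "name": "openstack/cinder",
    "template": ["python-jobs"],
    "gate": ["gate-cinder-requirements", "gate-tempest-dsvm-full"],  # abbreviated
}
for job in expand_project_jobs(cinder, PROJECT_TEMPLATES, "gate"):
    print(job)
```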

Jenkins Job Creation and Configuration

I previously mentioned that the configuration of an individual Jenkins job is stored in a config.xml file in the Jenkins data directory. Now, at last count, the upstream OpenStack Jenkins CI system has just shy of 2,000 jobs. It would be virtually impossible to manage the configuration of so many jobs using human-based processes. To solve this dilemma, the Jenkins Job Builder (JJB) Python tool was created. JJB consumes YAML files that describe both individual Jenkins jobs and templates for parameterized Jenkins jobs, and writes the config.xml files for all Jenkins jobs that are produced from those templates. Important: note that Zuul does not construct Jenkins jobs. JJB does that. Zuul simply configures which Jenkins jobs should run for a project and a pipeline.

There is a master projects.yaml file in the same directory that lists the “top-level” definitions of jobs for all projects, and it is in this file that many of the variables that are used in job template instantiation are defined (including the {name} variable, which corresponds to the name of the project).

When JJB constructs the set of all Jenkins jobs, it reads the projects.yaml file, and for each project, it sees the “name” attribute of the project, and substitutes that name attribute value wherever it sees {name} in any of the jobs that are defined for that project. Let’s take a look at the Cinder project’s definition in the projects.yaml file here:

- project:
    name: cinder
    github-org: openstack
    node: bare-precise
    tarball-site: tarballs.openstack.org
    doc-publisher-site: docs.openstack.org

    jobs:
      - python-jobs
      - python-grizzly-bitrot-jobs
      - python-havana-bitrot-jobs
      - openstack-publish-jobs
      - gate-{name}-pylint
      - translation-jobs

You will note that one of the items in the jobs section is called python-jobs. This is not a single Jenkins job but a job group. A job group definition is merely a list of jobs or job templates. Let’s take a look at the definition of the python-jobs job group:

- job-group:
    name: python-jobs
    jobs:
      - '{name}-coverage'
      - 'gate-{name}-pep8'
      - 'gate-{name}-python26'
      - 'gate-{name}-python27'
      - 'gate-{name}-python33'
      - 'gate-{name}-pypy'
      - 'gate-{name}-docs'
      - 'gate-{name}-requirements'
      - '{name}-tarball'
      - '{name}-branch-tarball'

Each of the items listed in the jobs section of the python-jobs job group definition above is a job template. Job templates are expanded in the same way as Zuul project templates and JJB job groups are expanded. Let’s take a look at one such job template in the list above, called gate-{name}-python27.
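The same kind of expansion for JJB, from a projects.yaml entry through a job group to concrete job names, can be sketched in Python. This is a simplified, hypothetical model: it only produces job names (JJB also instantiates each template's builders, publishers, and node), it uses a regex instead of JJB's own string formatting, and the python-jobs group is truncated to three entries.

```python
import re

# Truncated stand-in for the python-jobs job-group definition above.
JOB_GROUPS = {
    "python-jobs": [
        "gate-{name}-pep8",
        "gate-{name}-python27",
        "{name}-tarball",
    ],
}

# Stand-in for the Cinder entry in projects.yaml (only a few keys kept).
CINDER = {
    "name": "cinder",
    "github-org": "openstack",
    "node": "bare-precise",
    "jobs": ["python-jobs", "gate-{name}-pylint"],
}

def substitute(pattern, params):
    """Replace each {key} in pattern with params[key]."""
    return re.sub(r"\{([\w-]+)\}", lambda m: params[m.group(1)], pattern)

def expand_project(project):
    """Expand job groups into job templates, then instantiate each template name."""
    job_names = []
    for entry in project["jobs"]:
        templates = JOB_GROUPS.get(entry, [entry])  # a job group, or a single template
        for template in templates:
            job_names.append(substitute(template, project))
    return job_names

print(expand_project(CINDER))
```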

(Hint: all Jenkins jobs for any OpenStack or Stackforge project are described in the OpenStack-Infra Config project’s modules/openstack_project/files/jenkins_job_builder/config/ directory).

The python-jobs.yaml file in the modules/openstack_project/files/jenkins_job_builder/config directory contains the definition of common Python project Jenkins job templates. One of those job templates is gate-{name}-python27:

- job-template:
    name: 'gate-{name}-python27'
... snip ...
    builders:
      - gerrit-git-prep
      - python27:
          github-org: '{github-org}'
          project: '{name}'
      - assert-no-extra-files

    publishers:
      - test-results
      - console-log

    node: '{node}'

Looking through the above job template definition, you will see a section called “builders”. The builders section of a job template lists (in sequential order of expected execution) the executable sections or scripts of the Jenkins job. The first executable section in the gate-{name}-python27 job template is called “gerrit-git-prep”. This executable section is defined in macros.yaml, which contains a number of commonly-run scriptlets. Here’s the entire gerrit-git-prep macro definition:

- builder:
    name: gerrit-git-prep
    builders:
      - shell: "/usr/local/jenkins/slave_scripts/gerrit-git-prep.sh https://review.openstack.org http://zuul.openstack.org git://git.openstack.org"

So, gerrit-git-prep is simply executing a Bash script called “gerrit-git-prep.sh” that is stored in the /usr/local/jenkins/slave_scripts/ directory. Let’s take a look at that file. You can find it in the modules/jenkins/files/slave_scripts/ directory [3] in the same OpenStack Infra Config project:

#!/bin/bash -e
 
GERRIT_SITE=$1
ZUUL_SITE=$2
GIT_ORIGIN=$3
 
# ... snip ...
 
set -x
if [[ ! -e .git ]]
then
    ls -a
    rm -fr .[^.]* *
    if [ -d /opt/git/$ZUUL_PROJECT/.git ]
    then
        git clone file:///opt/git/$ZUUL_PROJECT .
    else
        git clone $GIT_ORIGIN/$ZUUL_PROJECT .
    fi
fi
git remote set-url origin $GIT_ORIGIN/$ZUUL_PROJECT
 
# attempt to work around bugs 925790 and 1229352
if ! git remote update
then
    echo "The remote update failed, so garbage collecting before trying again."
    git gc
    git remote update
fi
 
git reset --hard
if ! git clean -x -f -d -q ; then
    sleep 1
    git clean -x -f -d -q
fi
 
if [ -z "$ZUUL_NEWREV" ]
then
    git fetch $ZUUL_SITE/p/$ZUUL_PROJECT $ZUUL_REF
    git checkout FETCH_HEAD
    git reset --hard FETCH_HEAD
    if ! git clean -x -f -d -q ; then
        sleep 1
        git clean -x -f -d -q
    fi
else
    git checkout $ZUUL_NEWREV
    git reset --hard $ZUUL_NEWREV
    if ! git clean -x -f -d -q ; then
        sleep 1
        git clean -x -f -d -q
    fi
fi
 
if [ -f .gitmodules ]
then
    git submodule init
    git submodule sync
    git submodule update --init
fi

The purpose of the script above is simple: check out the source code of the proposed Gerrit changeset and ensure that the source tree is clean of any cruft from a previous run of a Jenkins job that may have run in the same Jenkins workspace. The concept of a workspace is important. When Jenkins runs a job, it must execute that job from within a workspace. The workspace is really just an isolated shell environment and filesystem directory that has a set of shell variables export’d inside it that indicate a variety of important identifiers, such as the Jenkins job ID, the name of the source code project that has triggered a job, the SHA1 git commit ID of the particular proposed changeset that is being tested, etc. [4]
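To make the branch in the middle of gerrit-git-prep.sh concrete, here is a Python sketch that takes the workspace variables the script reads (ZUUL_PROJECT, ZUUL_REF, ZUUL_NEWREV) plus the script's ZUUL_SITE argument, and returns the git commands the script would run for the checkout step. It only builds command lists and never runs git; the `checkout_commands` name is hypothetical, the retry around git clean is not modeled, and the variable values in the usage are made up for illustration.

```python
def checkout_commands(env, zuul_site):
    """Mirror the ZUUL_NEWREV branch of gerrit-git-prep.sh as a list of argv lists."""
    project = env["ZUUL_PROJECT"]
    newrev = env.get("ZUUL_NEWREV", "")  # unset and empty both count as "no newrev"
    clean = ["git", "clean", "-x", "-f", "-d", "-q"]
    if not newrev:
        # Fetch the ref Zuul prepared for the proposed change and check it out.
        return [
            ["git", "fetch", f"{zuul_site}/p/{project}", env["ZUUL_REF"]],
            ["git", "checkout", "FETCH_HEAD"],
            ["git", "reset", "--hard", "FETCH_HEAD"],
            clean,
        ]
    # ZUUL_NEWREV is set: check out that exact revision instead.
    return [
        ["git", "checkout", newrev],
        ["git", "reset", "--hard", newrev],
        clean,
    ]

# Hypothetical values for a Cinder gate job.
env = {"ZUUL_PROJECT": "openstack/cinder", "ZUUL_REF": "refs/zuul/master/Z123"}
for cmd in checkout_commands(env, "http://zuul.openstack.org"):
    print(" ".join(cmd))
```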

The next builder in the job template is the “python27” builder, which has two variables injected into itself:

      - python27:
          github-org: '{github-org}'
          project: '{name}'

The github-org variable is set to the value of the existing {github-org} variable, and the project variable is populated with the value of the {name} variable. Here’s how the python27 builder is defined (in macros.yaml):

- builder:
    name: python27
    builders:
      - shell: "/usr/local/jenkins/slave_scripts/run-unittests.sh 27 {github-org} {project}"
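The two-step parameter passing (job template variables into the macro, then macro parameters into its shell line) can be sketched in Python. The regex substitution is a simplification of how JJB formats these strings, and the `render_macro` helper is hypothetical.

```python
import re

# The python27 builder macro's shell line, as defined in macros.yaml above.
PYTHON27_MACRO = "/usr/local/jenkins/slave_scripts/run-unittests.sh 27 {github-org} {project}"

def render_macro(shell_line, params):
    """Substitute macro parameters such as {github-org} into a shell line."""
    return re.sub(r"\{([\w-]+)\}", lambda m: params[m.group(1)], shell_line)

# Step 1: the job template passes its own variables into the macro's parameters.
template_vars = {"name": "cinder", "github-org": "openstack"}
macro_params = {"github-org": template_vars["github-org"], "project": template_vars["name"]}

# Step 2: the macro's shell line is rendered with those parameters.
print(render_macro(PYTHON27_MACRO, macro_params))
```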

Again, just a wrapper around another Bash script, called run-unittests.sh in the /usr/local/jenkins/slave_scripts directory. Here’s what that script looks like:

version=$1
org=$2
project=$3
 
# ... snip ...
 
venv=py$version
 
# ... snip ...
 
source /usr/local/jenkins/slave_scripts/select-mirror.sh $org $project
 
tox -e$venv
result=$?
 
echo "Begin pip freeze output from test virtualenv:"
echo "======================================================================"
.tox/$venv/bin/pip freeze
echo "======================================================================"
 
if [ -d ".testrepository" ] ; then
# ... snip ...
    .tox/$venv/bin/python /usr/local/jenkins/slave_scripts/subunit2html.py ./subunit_log.txt testr_results.html
    gzip -9 ./subunit_log.txt
    gzip -9 ./testr_results.html
# ... snip ...
fi
 
# ... snip ...

In short, for the Python 2.7 builder, the above runs the command tox -epy27 and then runs a prettifying script and gzips up the results of the unit test run. And that’s really the meat of the Jenkins job. We will discuss the publishing of the job artifacts a little later in this article, but if you’ve gotten this far, you have delved deep into the mines of the OpenStack CI system. Congratulations!
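Two of the steps in run-unittests.sh translate directly into Python: deriving the tox environment name from the version argument, and compressing a result file at maximum compression the way gzip -9 does. This is an illustrative sketch; the helper names are hypothetical, and the file compressed in the usage is a throwaway placeholder rather than a real subunit log.

```python
import gzip
import os
import shutil
import tempfile

def tox_command(version):
    """Return the tox invocation for a version string, e.g. "27" -> tox -epy27."""
    venv = f"py{version}"  # mirrors venv=py$version
    return ["tox", f"-e{venv}"]

def gzip9(path):
    """Compress path to path + ".gz" at level 9 and delete the original, as gzip -9 does."""
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return path + ".gz"

# Usage: compress a throwaway placeholder file standing in for subunit_log.txt.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "subunit_log.txt")
with open(log_path, "w") as f:
    f.write("placeholder subunit stream\n")
compressed = gzip9(log_path)
print(" ".join(tox_command("27")))
print(compressed)
```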

Devstack-Gate and Running Tempest Against a Real Environment

OK, so unit tests running in a simple Jenkins slave workspace are one thing. But what about Jenkins jobs that run integration tests against a full set of OpenStack endpoints, interacting with real database and message queue services? For these types of Jenkins jobs, things are more complicated. Yes, I know. You probably think things have been complicated up until this point, and you’re right! But the simple unit test jobs above are just the tip of the proverbial iceberg when it comes to the OpenStack CI platform.

For these complex Jenkins jobs, an additional set of tools is added to the mix:

  • Nodepool — Provides virtual machine instances to Jenkins masters for running complex, isolation-sensitive Jenkins jobs
  • Devstack-Gate — Scripts that create an OpenStack environment with Devstack, run tests against that environment, and archive logs and results

Assignment of a Node to Run a Job

Different Jenkins jobs require different workspaces, or environments, in which to run. For basic unit or style-checking test jobs, like the gate-{name}-python27 job template we dug into above, not much more is needed than a tox-managed virtualenv running in a source checkout of the project with a proposed change. However, for Jenkins jobs that run a series of integration tests against a full OpenStack installation, a workspace with significantly more resources and isolation is necessary. For these latter types of jobs, the upstream CI platform uses a pool of virtual machine instances. This pool of virtual machine instances is managed by a tool called nodepool. The virtual machines run in both HP Cloud and Rackspace Cloud, who graciously donate these instances for the upstream CI system to use. You can see the configuration of the Nodepool-managed set of instances here.

Instances that are created by Nodepool run Jenkins slave software, so that they can communicate with the upstream Jenkins CI master servers. A script called prepare_node.sh runs on each Nodepool instance. This script just git clones the OpenStack Infra config project to the node, installs Puppet, and runs a Puppet manifest that sets up the node based on the type of node it is. There are bare nodes, nodes that are meant to run Devstack to install OpenStack, and nodes specific to the Triple-O project. The node type that we will focus on here is the node that is meant to run Devstack. The script that runs to prepare one of these nodes is prepare_devstack_node.sh, which in turn calls prepare_devstack.sh. This script caches all of the repositories needed by Devstack, along with Devstack itself, in a workspace cache on the node. This workspace cache is used to enable fast reset of the workspace that is used during the running of a Jenkins job that uses Devstack to construct an OpenStack environment.

Devstack-Gate

The Devstack-Gate project is a set of scripts that are executed by certain Jenkins jobs that need to run integration or upgrade tests against a realistic OpenStack environment. Going back to the Cinder project configuration in the Zuul layout.yaml file:

  - name: openstack/cinder
    template:
      - name: python-jobs
... snip ...
    gate:
      - gate-cinder-requirements
      - gate-tempest-dsvm-full
      - gate-tempest-dsvm-postgres-full
      - gate-tempest-dsvm-neutron
      - gate-tempest-dsvm-large-ops
      - gate-tempest-dsvm-neutron-large-ops
      - gate-grenade-dsvm
... snip ...

Note the gate-tempest-dsvm-full line. That Jenkins job is one that needs an isolated workspace with a full OpenStack environment running on it. Note that “dsvm” stands for “Devstack virtual machine”.

Let’s take a look at the JJB configuration of the gate-tempest-dsvm-full job:

- job-template:
    name: '{pipeline}-tempest-dsvm-full{branch-designator}'
    node: '{node}'
... snip ...
    builders:
      - devstack-checkout
      - shell: |
          #!/bin/bash -xe
          export PYTHONUNBUFFERED=true
          export DEVSTACK_GATE_TIMEOUT=180
          export DEVSTACK_GATE_TEMPEST=1
          export DEVSTACK_GATE_TEMPEST_FULL=1
          export BRANCH_OVERRIDE={branch-override}
          if [ "$BRANCH_OVERRIDE" != "default" ] ; then
              export OVERRIDE_ZUUL_BRANCH=$BRANCH_OVERRIDE
          fi
          cp devstack-gate/devstack-vm-gate-wrap.sh ./safe-devstack-vm-gate-wrap.sh
          ./safe-devstack-vm-gate-wrap.sh
      - link-logs

    publishers:
      - devstack-logs
      - console-log

The devstack-checkout builder is simply a Bash script macro that looks like this:

- builder:
    name: devstack-checkout
    builders:
      - shell: |
          #!/bin/bash -xe
          if [[ ! -e devstack-gate ]]; then
              git clone git://git.openstack.org/openstack-infra/devstack-gate
          else
              cd devstack-gate
              git remote set-url origin git://git.openstack.org/openstack-infra/devstack-gate
              git remote update
              git reset --hard
              if ! git clean -x -f ; then
                  sleep 1
                  git clean -x -f
              fi
              git checkout master
              git reset --hard remotes/origin/master
              if ! git clean -x -f ; then
                  sleep 1
                  git clean -x -f
              fi
              cd ..
          fi

All the above does is git clone the devstack-gate repository into the Jenkins workspace or, if the devstack-gate repository already exists there, update it and reset it to the latest commit on the master branch.

Returning to our gate-tempest-dsvm-full JJB job template, we see the remaining part of the builder is a Bash scriptlet like so:

          #!/bin/bash -xe
          export PYTHONUNBUFFERED=true
          export DEVSTACK_GATE_TIMEOUT=180
          export DEVSTACK_GATE_TEMPEST=1
          export DEVSTACK_GATE_TEMPEST_FULL=1
          export BRANCH_OVERRIDE={branch-override}
          if [ "$BRANCH_OVERRIDE" != "default" ] ; then
              export OVERRIDE_ZUUL_BRANCH=$BRANCH_OVERRIDE
          fi
          cp devstack-gate/devstack-vm-gate-wrap.sh ./safe-devstack-vm-gate-wrap.sh
          ./safe-devstack-vm-gate-wrap.sh

Not all that complicated. It exports some environment variables, copies the devstack-vm-gate-wrap.sh script from the devstack-gate repo (clone’d by the devstack-checkout macro) into the working directory as safe-devstack-vm-gate-wrap.sh, and then runs that copy.
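Expressed as a Python sketch, the environment that scriptlet hands to the wrapper looks like this. The `gate_environment` helper is hypothetical, the stable/havana override value in the usage is only an illustrative input, and what devstack-gate later does with OVERRIDE_ZUUL_BRANCH is not modeled.

```python
def gate_environment(branch_override):
    """Environment exported by the gate-tempest-dsvm-full scriptlet before the wrapper runs."""
    env = {
        "PYTHONUNBUFFERED": "true",
        "DEVSTACK_GATE_TIMEOUT": "180",
        "DEVSTACK_GATE_TEMPEST": "1",
        "DEVSTACK_GATE_TEMPEST_FULL": "1",
        "BRANCH_OVERRIDE": branch_override,
    }
    # Mirrors: if [ "$BRANCH_OVERRIDE" != "default" ] ; then export OVERRIDE_ZUUL_BRANCH=...
    if branch_override != "default":
        env["OVERRIDE_ZUUL_BRANCH"] = branch_override
    return env

print(gate_environment("default"))
print(gate_environment("stable/havana"))  # hypothetical override value
```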

The devstack-vm-gate-wrap.sh script is responsible for setting even more environment variables and then calling the devstack-vm-gate.sh script, which is where the real magic happens.

Construction of OpenStack Environment with Devstack

The devstack-vm-gate.sh script is responsible for constructing a full OpenStack environment and running integration tests against that environment. To construct this OpenStack environment, it uses the excellent Devstack project. Devstack is an elaborate series of Bash scripts and functions that clones each OpenStack project’s source code into /opt/stack/new/$project [5], runs python setup.py install in each project checkout, and starts each relevant OpenStack service (e.g. nova-compute, nova-scheduler, etc.) in a separate Linux screen session.

Devstack’s creation script (stack.sh) is called from devstack-vm-gate.sh after that script creates the localrc file that stack.sh uses when constructing the Devstack environment.

Execution of Integration Tests Against an OpenStack Environment

Once the OpenStack environment is constructed, the devstack-vm-gate.sh script continues on to run a series of integration tests:

    cd $BASE/new/tempest
    if [[ "$DEVSTACK_GATE_TEMPEST_ALL" -eq "1" ]]; then
        echo "Running tempest all test suite"
        sudo -H -u tempest tox -eall -- --concurrency=$TEMPEST_CONCURRENCY
        res=$?
    elif [[ "$DEVSTACK_GATE_TEMPEST_FULL" -eq "1" ]]; then
        echo "Running tempest full test suite"
        sudo -H -u tempest tox -efull -- --concurrency=$TEMPEST_CONCURRENCY
        res=$?
    # ... snip ...
    fi

You will note that the $DEVSTACK_GATE_TEMPEST_FULL Bash environment variable was set to “1” in the gate-tempest-dsvm-full Jenkins job builder scriptlet.
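The branch in that excerpt can be modeled as a small Python function that returns the command the script would run. It is a sketch: the `tempest_tox_command` name is hypothetical, it treats the flags as the string "1" (a simplification of Bash's numeric -eq test), and it returns None where the real script continues with branches that are snipped from the excerpt.

```python
def tempest_tox_command(env, concurrency):
    """Pick the Tempest tox environment the way the excerpt does (ALL is checked before FULL)."""
    if env.get("DEVSTACK_GATE_TEMPEST_ALL") == "1":
        tox_env = "all"
    elif env.get("DEVSTACK_GATE_TEMPEST_FULL") == "1":
        tox_env = "full"
    else:
        return None  # the remaining branches of the real script are not shown above
    return ["sudo", "-H", "-u", "tempest", "tox", f"-e{tox_env}",
            "--", f"--concurrency={concurrency}"]

# Hypothetical concurrency value; the job sets DEVSTACK_GATE_TEMPEST_FULL=1.
print(" ".join(tempest_tox_command({"DEVSTACK_GATE_TEMPEST_FULL": "1"}, 4)))
```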

sudo -H -u tempest tox -efull triggers the execution of Tempest’s integration test suite. Tempest is the collection of canonical OpenStack integration tests that are used to validate that OpenStack APIs work according to spec and that patches to one OpenStack service do not inadvertently cause failures in another service.

If you are curious what actual commands are run, you can check out the tox.ini file in Tempest:

[testenv:full]
# The regex below is used to select which tests to run and exclude the slow tag:
# See the testrepository bug: https://bugs.launchpad.net/testrepository/+bug/1208610
commands =
  bash tools/pretty_tox.sh '(?!.*\[.*\bslow\b.*\])(^tempest\.(api|scenario|thirdparty|cli)) {posargs}'

In short, the above runs the Tempest API, scenario, CLI, and thirdparty tests, excluding any test tagged slow.
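To see which tests that filter keeps, you can apply the same regular expression (minus the {posargs} placeholder) with Python's re module. This sketch assumes the test runner applies the filter with search semantics against fully qualified test IDs; the test IDs below are hypothetical examples, not taken from a real run.

```python
import re

# Filter regex from the [testenv:full] section of Tempest's tox.ini above.
FULL_FILTER = re.compile(r"(?!.*\[.*\bslow\b.*\])(^tempest\.(api|scenario|thirdparty|cli))")

def selected(test_ids):
    """Return the test IDs that the full filter would select."""
    return [t for t in test_ids if FULL_FILTER.search(t)]

# Hypothetical test IDs, for illustration only.
ids = [
    "tempest.api.compute.servers.test_servers.ServersTest.test_list[gate,smoke]",
    "tempest.api.compute.servers.test_servers.ServersTest.test_resize[gate,slow]",
    "tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_basic[compute]",
    "tempest.stress.actions.unit_test.UnitTest.test_run",
]
for test_id in selected(ids):
    print(test_id)
```

The negative lookahead rejects any ID whose bracketed tag list contains the whole word slow, and the ^ anchor means only IDs that start with one of the four package prefixes can match.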

Archival of Test Artifacts

The final piece of the puzzle is archiving all of the artifacts from the Jenkins job execution. These artifacts include log files from each individual OpenStack service running in Devstack’s screen sessions, the results of the Tempest test suite runs, as well as echo’d output from the devstack-vm-gate* scripts themselves.

These artifacts are gathered together and copied via scp to static.openstack.org by the devstack-logs and console-log JJB publisher macros:

- publisher:
    name: console-log
    publishers:
      - scp:
          site: 'static.openstack.org'
          files:
            - target: 'logs/$LOG_PATH'
              copy-console: true
              copy-after-failure: true


- publisher:
    name: devstack-logs
    publishers:
      - scp:
          site: 'static.openstack.org'
          files:
            - target: 'logs/$LOG_PATH'
              source: 'logs/**'
              keep-hierarchy: true
              copy-after-failure: true

Conclusion

I hope this article has helped you understand a bit more how the OpenStack continuous integration platform works. We’ve stepped through the flow of events across the various components of the platform, including which events trigger which actions in each component. You should now have a good idea how the various parts of the upstream CI infrastructure are configured and where to go look in the source code for more information.

The next article in this series discusses how to construct your own external testing platform that is linked with the upstream OpenStack CI platform. Hopefully, this article has provided most of the background information you need to understand the steps and tools involved in constructing that external testing platform.


[1]— The link describes and illustrates the non-human gatekeeper model with Bazaar, but the same concept is applicable to Git. See the OpenStack GitWorkflow pages for an illustration of the OpenStack specific model.
[2]— Zuul really is a pretty awesome bit of code kit. Jim Blair, the author, does an excellent job of explaining the merge proposal dependency graph and how Zuul can “trim” dead-end branches of the dependency graph in the Zuul documentation.
[3]— Looking for where a lot of the “magic” in the upstream gate happens? Take an afternoon to investigate the scripts in this directory. :)
[4]— The Gerrit Jenkins plugin and Zuul export a variety of workspace environment variables into the Jenkins jobs that they trigger. If you are curious what these variables are, check out the Zuul documentation on parameters.
[5]— The projects are installed into /opt/stack/new/$project because the current HEAD of the target git branch for the project is installed into /opt/stack/old/$project. This allows an upgrade test tool called Grenade to test upgrade paths.

Working with the OpenStack Code Review and CI system – Chef Edition

For too long, the state of the OpenStack Chef world had been one of duplicative effort, endless forks of Chef cookbooks, and little integration with how many of the OpenStack projects choose to control source code and integration testing. Recently, however, the Chef + OpenStack community has been getting its proverbial act together. Folks from lots of companies have come together and pushed to align efforts to produce a set of well-documented, flexible, but focused Chef cookbooks that install and configure OpenStack services.

My sincere thanks go out to the individuals who have helped to make progress in the last couple weeks, the individuals on the upstream openstack.org continuous integration team, and of course, the many authors of cookbooks whose code and work is being merged together.

OK, so what’s happened?

StackForge now hosting a set of Chef cookbooks for OpenStack

Individual cookbooks for each integrated OpenStack project have been created in the StackForge GitHub organization. Each cookbook name is prefixed with cookbook-openstack- followed by the OpenStack service name (not the project code name):

Note that we have not yet created the cookbook for Heat, but that will be coming in the Havana timeframe, for sure. Also note that the Ceilometer (metering) cookbook is empty right now. We’re in the process of pulling the ceilometer recipes out of the compute cookbook into a separate cookbook.

UPDATE: The Heat cookbook repository is now up on Stackforge.

In addition to the OpenStack project cookbooks listed above, there are three other related cookbooks:

Finally, there will be another repository called openstack-chef-repo that will contain example Chef roles, databags and documentation showing how all the OpenStack and supporting cookbooks are tied together to create an OpenStack deployment.

UPDATE: The OpenStack Chef Repository is now up on Stackforge.

Code in cookbooks gated by Gerrit like any other OpenStack project

The biggest advantage of hosting all these Chef cookbooks in the StackForge GitHub organization is the easy integration with the upstream continuous integration system. The upstream CI team has built a metric crap-ton (technical term) of automation code that enabled us to quickly have Gerrit managing the patch queues and code reviews for all these cookbook repositories, as well as have each repository guarded by a set of gate jobs that run linter and unit tests against the cookbooks.

The rest of this blog post explains how to use the development and continuous integration systems when working on the OpenStack Chef cookbooks housed in Stackforge.

Prepare to develop on a cookbook

OK, so you want to start working on one of the OpenStack Chef cookbooks? Great! You will need to install the git-review plugin. The easiest way to do so is to use pip:

sudo pip install git-review

The first thing you need to do is clone the appropriate Git repository containing the cookbook code and set up your Gerrit credentials. Here is the code to do that:

git clone git@github.com:stackforge/cookbook-openstack-$SERVICE
cd cookbook-openstack-$SERVICE
git review -s

Of course, replace $SERVICE above with one of common, compute, identity, image, block-storage, object-storage, network, metering, or dashboard. What that will do is clone the upstream Stackforge repository for the corresponding cookbook to your local machine, change directory into that clone’d repository, and set up a git remote called “gerrit” pointing to the review.openstack.org Gerrit system.

If everything was successful, you should see something like this:

jpipes@uberbox:~/gerrit-tut$ git clone git@github.com:stackforge/cookbook-openstack-common
Cloning into 'cookbook-openstack-common'...
remote: Counting objects: 506, done.
remote: Compressing objects: 100% (168/168), done.
remote: Total 506 (delta 246), reused 503 (delta 243)
Receiving objects: 100% (506/506), 81.97 KiB, done.
Resolving deltas: 100% (246/246), done.
jpipes@uberbox:~/gerrit-tut$ cd cookbook-openstack-common/
jpipes@uberbox:~/gerrit-tut/cookbook-openstack-common$ git review -s
Creating a git remote called "gerrit" that maps to:
	ssh://jaypipes@review.openstack.org:29418/stackforge/cookbook-openstack-common.git

Repeat the above for each cookbook you wish to clone and work on locally, or simply execute this to clone them all:

for i in common compute identity image block-storage object-storage network metering dashboard; do
    git clone git@github.com:stackforge/cookbook-openstack-$i
    # Run git review -s in a subshell so a failed clone cannot leave you in the wrong directory
    (cd cookbook-openstack-$i && git review -s)
done

Start to develop on a cookbook

Now that you have git clone’d the upstream cookbook repository and set up your Gerrit remote properly, you can begin coding on the cookbook. Remember, however, that you should never make changes in your local “master” branch. Always work in a local topic branch. This allows you to work on a branch of code separately from the local master branch you will use to bring in changes from other developers.

Create a new topic branch like so:

git checkout -b <TOPIC_NAME>

Here is an example of what you can expect to see:

jpipes@uberbox:~/gerrit-tut/cookbook-openstack-common$ git checkout -b tut-example
Switched to a new branch 'tut-example'

Once you are checked out into your topic branch, you can add, edit, delete, and move files around as you wish. When you have made the changes you want to make, you then need to commit them to your local repository.

IMPORTANT NOTE: If you created any new files while working in your branch, you will need to tell Git about those new files before you commit. An easy way to check whether you’ve created any new files that should be added to Git source control is to always call git status before doing your commit. git status will tell you if there are any untracked files in your working tree that you may need to add to Git:

jpipes@uberbox:~/gerrit-tut/cookbook-openstack-common$ touch something_new.txt
jpipes@uberbox:~/gerrit-tut/cookbook-openstack-common$ git status
# On branch tut-example
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	something_new.txt
nothing added to commit but untracked files present (use "git add" to track)

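As the hint in the git status output says, you use the git add command to add the untracked file to source control. After running git add something_new.txt, git status shows the file staged as a new file: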

jpipes@uberbox:~/gerrit-tut/cookbook-openstack-common$ git status
# On branch tut-example
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	new file:   something_new.txt
#

If you make changes to files, they will show up in git status as changed files, as shown here:

jpipes@uberbox:~/gerrit-tut/cookbook-openstack-common$ vi README.md 
jpipes@uberbox:~/gerrit-tut/cookbook-openstack-common$ git status
# On branch tut-example
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	new file:   something_new.txt
#
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   README.md
#

As you can see, I edited the README.md file, and the call to git status shows that file as modified. If you want to review the changes you made, use the git diff command:

jpipes@uberbox:~/gerrit-tut/cookbook-openstack-common$ git diff
diff --git a/README.md b/README.md
index 4fbbd57..e197d3b 100644
--- a/README.md
+++ b/README.md
@@ -24,6 +24,8 @@ of all the settable attributes for this cookbook.
 
 Note that all attributes are in the `default["openstack"]` "namespace"
 
+TODO(jaypipes): Should we list all the attributes in the README?
+
 Libraries
 =========

If you are happy with the changes, you’re now ready to commit those changes to source control. Call git commit, like so:

git commit -a

This will open up your text editor and present you with an area to write your commit message describing the contents of your patch. Commit messages should be properly formatted and abide by the upstream conventions. Feel free to read that link, but here is a brief rundown of stuff to keep in mind:

  • Make the first line of the commit message 50 chars or less
  • Separate the first line from the rest of the commit message with a blank line
  • Make the commit message descriptive of what the patch is and what the motivation for the patch was
  • Do NOT make the commit message into a list of the things in the patch you changed in each revision — we can already see what is contained in the patch
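The two mechanical rules in that list (a short first line and a blank second line) are easy to check automatically. The hypothetical `check_commit_message` function below is a sketch of such a check; it assumes the message has already had Git's # comment lines stripped, and it cannot judge the other two rules, which are about content.

```python
def check_commit_message(message):
    """Return a list of problems with a commit message's layout (empty list means OK)."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["commit message is empty or starts with a blank line"]
    problems = []
    if len(lines[0]) > 50:
        problems.append(f"first line is {len(lines[0])} chars; keep it to 50 or fewer")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    return problems

print(check_commit_message("Add TODO about attributes to README\n\nExplains why."))
print(check_commit_message("Short summary\nNo blank line here"))
```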

Save and close your editor to finalize the commit. Once successfully committed, you now need to push your changes to the Gerrit code review and patch management system on review.openstack.org. You do this using a call to git review.

When you issue a call to git review for any of the cookbooks, a patch review is created in Gerrit. Behind the scenes, the git review plugin is simply running git push to the Gerrit remote for you. You can always see what git review is doing by passing the -v flag, like so:

jpipes@uberbox:~/gerrit-tut/cookbook-openstack-common$ git commit -a
[tut-example 66f844a] Simple patch for tutorial -- please IGNORE
 1 file changed, 2 insertions(+)
 create mode 100644 something_new.txt
jpipes@uberbox:~/gerrit-tut/cookbook-openstack-common$ git review -v
2013-05-20 12:48:54.333823 Running: git log --color=never --oneline HEAD^1..HEAD
2013-05-20 12:48:54.337044 Running: git remote
2013-05-20 12:48:54.339673 Running: git branch -a --color=never
2013-05-20 12:48:54.342580 Running: git rev-parse --show-toplevel --git-dir
2013-05-20 12:48:54.345139 Running: git remote update gerrit
Fetching gerrit
2013-05-20 12:48:55.475616 Running: git rebase -i remotes/gerrit/master
2013-05-20 12:48:55.616412 Running: git reset --hard ORIG_HEAD
2013-05-20 12:48:55.620552 Running: git config --get color.ui
2013-05-20 12:48:55.623098 Running: git log --color=always --decorate --oneline HEAD --not remotes/gerrit/master --
2013-05-20 12:48:55.626745 Running: git branch --color=never
2013-05-20 12:48:55.629703 Running: git log HEAD^1..HEAD
Using local branch name "tut-example" for the topic of the change submitted
2013-05-20 12:48:55.634665 Running: git push gerrit HEAD:refs/publish/master/tut-example
remote: Resolving deltas: 100% (2/2)
remote: Processing changes: new: 1, done    
remote: 
remote: New Changes:
remote:   https://review.openstack.org/29797
remote: 
To ssh://jaypipes@review.openstack.org:29418/stackforge/cookbook-openstack-common.git
 * [new branch]      HEAD -> refs/publish/master/tut-example
2013-05-20 12:48:56.775939 Running: git rev-parse --show-toplevel --git-dir

As you can see in the above output, a new patch review was created at the location https://review.openstack.org/29797. You can go to that URL and see the patch review, shown here before any comments or reviews have been made on the patch.

[Screenshot: the Gerrit review page for change 29797]
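The last Running: line in the git review -v output earlier is the push that actually creates the review. A small Python sketch of how that refspec is assembled from the target branch and the topic (the local branch name) follows. The `review_push_command` helper is hypothetical, and the refs/publish/ namespace is simply what this 2013 output shows; newer git-review releases may push to a different ref namespace.

```python
def review_push_command(target_branch, topic, remote="gerrit"):
    """Build the push command seen at the end of the git review -v output."""
    refspec = f"HEAD:refs/publish/{target_branch}/{topic}"
    return ["git", "push", remote, refspec]

print(" ".join(review_push_command("master", "tut-example")))
```

This reproduces only the final push; git review also runs the other commands listed in that output, such as updating the gerrit remote and rebasing against remotes/gerrit/master.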

Reviewing patches with Gerrit

Gerrit has three separate levels of reviews: Verify, Code-Review, and Approve.

The Verify (V) review level

The Verify level is limited to the Jenkins Gerrit user, which runs the automated tests that protect each cookbook repository’s master branch. These automated tests are known as gate tests.

When you push code to Gerrit, there are a set of automatic tests that are run against your code by Jenkins. Jenkins is a continuous integration job system that the upstream OpenStack CI team has integrated into Gerrit so that projects managed by Gerrit may have a series of automated check jobs run against proposed patches to the project. Below, you can see that the Jenkins user in Gerrit has already executed two jobs — gate-cookbook-openstack-common-chef-lint and gate-cookbook-openstack-common-chef-unit — against the proposed code changes. As expected, both jobs pass, since I haven’t actually changed anything in the code; I only added a blank file and added a line to the README file.

[Screenshot: Jenkins gate job results on change 29797]

Curious about how those gate jobs are set up? Check out the github.com openstack-infra/config project. Hint: look at these two files.

If the Jenkins jobs fail, you will see Jenkins issue a -1 in the V column in the patch review. Any -1 from Jenkins as a result of a failed gate test will prevent the patch from being merged into the target branch, regardless of the reviews of any human.

The Code-Review (R) review level

Anyone who is logged in to Gerrit can review any proposed patch in Gerrit. To log in to Gerrit, click the “Sign In” link in the top right corner and log in using the Launchpad Single-Signon service. Note: This requires you to have an account on Launchpad.

Once logged in, you will see a “Review” button on each patch in the patchset. You can see this Review button in the images above. If you were the one that pushed the commit, you will also see buttons for “Abandon”, “Work in Progress”, and “Rebase Change”. The “Abandon” button simply lets you mark the patchset as abandoned and gets the patch off of the review radar. “Work in Progress” lets you mark the patchset as not ready for reviews, and “Rebase Change” is generally best left alone unless you know what you’re doing. ;)

Each file (including the commit message itself) has a corresponding link where you can view the diff of that file and add inline comments, similar to inline commenting on GitHub pull requests. Simply double-click directly below the line you wish to comment on, and a box for your comments will appear, as shown below:

Screenshot: adding an inline review comment

IMPORTANT NOTE: Unlike GitHub inline commenting on pull requests, your inline comments on Gerrit reviews are NOT viewable by others until you finalize your review by clicking the “Review” button. Your comments will appear in red as “Draft” comments on the main page of the patch review, as shown below:

Screenshot: draft inline comments on the patch review page

To put in a review for the patch, click the “Review” button. You will see options for:

  • +1 Looks good to me, but someone else must approve
  • +0 No score
  • -1 I would prefer you didn’t merge this

If you are a core reviewer, in addition to the above three options, you will also see:

  • +2 Looks good to me (core reviewer)
  • -2 Do Not Merge

There is also a comment area for your review comments, directly above an area that shows all the inline comments you have made:

Screenshot: the review screen with score options and comment area

After selecting the review +/- that matches your overall thoughts on the patch, and entering any comment, click the “Publish Comments” button, and your review will show in the comments of the patch, as shown below:

Screenshot: the published review in the patch comments

The Approve (A) review level

Members of the core review team also see a separate area in the review screen for the Approve (A) review level. This level tells Gerrit to either proceed with the merge of the patch into the target branch (+1 Approve) or to wait (0 No Score).

The general rule of thumb is that core reviewers should not hit +1 Approve until two or more core reviewers (which can include the individual doing the +1 Approve) have given the patch a +2 Code-Review (R) score. Core reviewers may use their discretion to relax this rule for trivial changes such as typo fixes.
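To make these rules concrete, here is a minimal Python sketch of the merge conditions described above, under a deliberately simplified model of votes. The `Vote` class and `ready_to_merge` function are hypothetical illustrations, not Gerrit’s API and not how Gerrit itself evaluates submit rules; in particular, the two +2 requirement is a team convention rather than something the sketch should be read as Gerrit enforcing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One score on a patch (a hypothetical, simplified model)."""
    reviewer: str
    label: str        # "Verified", "Code-Review" or "Approve"
    value: int
    is_core: bool = False


def ready_to_merge(votes, jenkins_user="jenkins"):
    """Return True if the votes satisfy the rules described above."""
    # Only the Jenkins user scores Verified. No vote yet, or any -1,
    # blocks the merge no matter what the human reviewers said.
    verified = [v.value for v in votes
                if v.label == "Verified" and v.reviewer == jenkins_user]
    if not verified or min(verified) < 0:
        return False
    reviews = [v for v in votes if v.label == "Code-Review"]
    # A -2 ("Do Not Merge") from a core reviewer blocks the merge.
    if any(v.value == -2 and v.is_core for v in reviews):
        return False
    # Rule of thumb: at least two distinct core reviewers gave +2
    # (the approver may be one of them).
    core_plus_twos = {v.reviewer for v in reviews
                      if v.is_core and v.value == 2}
    if len(core_plus_twos) < 2:
        return False
    # Finally, a core reviewer must have hit +1 Approve.
    return any(v.label == "Approve" and v.value == 1 and v.is_core
               for v in votes)


votes = [
    Vote("jenkins", "Verified", 1),
    Vote("alice", "Code-Review", 2, is_core=True),
    Vote("bob", "Code-Review", 2, is_core=True),
    Vote("bob", "Approve", 1, is_core=True),
]
print(ready_to_merge(votes))  # True
```

Note that the approver (bob) is counted toward the two +2 scores, matching the parenthetical in the rule above, and that the most negative Verified score from Jenkins wins.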

Summary

I hope this tutorial has been helpful to those new to the Gerrit and Jenkins integration used by the OpenStack upstream projects. Contributing to the Chef OpenStack cookbooks should now be no different from contributing to the upstream OpenStack projects, and additional gate tests (including full integration test runs using Vagrant or even a multi-node deployment) are on our TODO radar. Please sign up for the OpenStack Chef mailing list if you haven’t already. We look forward to your contributions!

Ushering in the OpenStack Essex Release

As some of you may have noticed, the OpenStack community published its latest six-month release, codenamed Essex, this week[1]. As shown in the release notes, there’s a massive amount of change that comes in this release.

Some of that change is quite visible. For example, the dashboard project, code-named Horizon, was entirely overhauled and became a core OpenStack project in the Essex release cycle. The new Horizon is pretty stunning[2], if I do say so myself. Other visible awesomeness comes from Anne Gentle and the dozens of contributors who worked on the new API documentation site. It’s an excellent and much-needed resource for the community of developers who want to build applications on OpenStack clouds.

Other innovations weren’t so visible, but were just as impactful. The Swift development team added the ability for objects to expire, the ability to post objects via HTML forms with the “tempurl” functionality, and integration with the authentication mechanism in the OpenStack Identity API 2.0.

Under Vishvananda Ishaya’s continued leadership, contributors to the OpenStack Compute project, code-named Nova, focused on a number of things in the past six months. Notably, on the feature front, floating IP pools and role-based access control were added. A variety of internal subsystems were dramatically refactored, including de-coupling the network service from Nova itself (something critical to scaling the network service with the Quantum and Melange incubated projects) and separating the volume endpoint into its own service. In addition, the remote procedure call subsystem was streamlined (again) and the way API extensions are handled in source code was cleaned up substantially. On the performance front, there were numerous bug fixes, but one that stands out is Vish’s overhaul of the metadata service. This one patch dramatically improves the performance of the metadata service used by tools like cloud-init when setting up newly launched instances. You can see the entire list of 53 blueprints implemented and 765 bugs fixed in Nova in the Essex release here. Pretty impressive.

Over in the OpenStack Image Service project, code-named Glance, we focused on performance and stability in this cycle. With a fresh infusion of contributors like Reynolds Chin, Eoghan Glynn and Stuart McLaren, the Glance project made some dramatic improvements. Notably, Reynolds Chin added a visual progress bar to the Glance CLI tool for image uploads, and Stuart McLaren submitted patches that significantly improved throughput by running the Glance API and registry services across multiple operating system processes. Eoghan Glynn fixed a massive number of bugs and added new functionality around external images, including having Glance’s API server automatically copy an image from an external datastore. Brian Waldon, Glance’s new PTL (congrats, Sir Glancelot!), added RBAC support and did the heavy lifting of converting Glance’s image identifiers to a UUID format. Check here for the complete list of 11 blueprints implemented and 185 bugs fixed in Glance in the Essex cycle.

The Keystone codebase was entirely rewritten, causing some late-cycle turmoil, but the team of contributors working on Keystone is dedicated to improving its stability and functionality in the Folsom release series. The new Keystone design should enable better extensibility, and I’m confident the new PTL, Joe Heck, will work actively with contributing organizations to make terrific improvements to Keystone in coming months.

I’m sure there are lots of names and contributions I’ve neglected to mention, and I apologize for that now! :) Here’s to a great design summit a week from now and a productive and cooperative Folsom release series. Thank you to all the OpenStack contributors. You are what makes OpenStack so special.

[1] In the OpenStack community — as in the Ubuntu community — we publish major releases every six months. We don’t hold up releases for a specific feature; if the feature isn’t completed, it simply goes into the next release when it is code-complete. In my opinion, this is one of the strengths of the OpenStack release model: it is predictable.

[2] What’s more, we can’t wait to introduce the goodness of the Horizon Essex dashboard into TryStack. We aim to get this done before the summit, but more on that in a later blog post.

Testing Essex RC1 with Devstack and Tempest

This past week, the first release candidates of a number of OpenStack projects were released. From now until the OpenStack Design Summit, we are mainly focused on testing the release candidates. One way you can help is to test the release candidate code yourself, and this article walks you through doing that with the Devstack and Tempest projects.

Setting up an OpenStack Essex RC1 Environment with Devstack

Before you test, you need an OpenStack environment. The easiest way to get an OpenStack environment up and running on a single machine [1] is to use Devstack. To get a version of Devstack that is designed to run against the release candidate branches of the OpenStack projects, simply clone the main repo of Devstack, like so:

git clone git://github.com/openstack-dev/devstack

Setting Up Your Devstack stackrc to pull RC1 branches of OpenStack Projects

Devstack contains a file called stackrc that is sourced by the main stack.sh script to create your OpenStack environment. The stackrc file contains environment variables that tell stack.sh which repositories and branches of OpenStack projects to clone. We will need to change the branches in the stackrc file from master to milestone-proposed to grab the release candidate branches. You will want to change the target branches for Nova, Glance, Keystone, and their respective client libraries. Here is what the default stackrc will look like:

stackrc before...

And here is what it should look like after you change the master branches to milestone-proposed branches appropriately:

stackrc after...
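If you would rather script this edit than make it by hand, here is a minimal Python sketch. It assumes that stackrc declares each branch on its own line in the form `NOVA_BRANCH=master`; the variable names in `RC1_PROJECTS` are assumptions, so check them against your own copy of stackrc, and `use_milestone_proposed` is a hypothetical helper name.

```python
import re

# Assumed variable names; check them against your own copy of stackrc.
RC1_PROJECTS = ("NOVA", "GLANCE", "KEYSTONE", "NOVACLIENT", "KEYSTONECLIENT")


def use_milestone_proposed(stackrc_text, projects=RC1_PROJECTS):
    """Point PROJECT_BRANCH=master lines at milestone-proposed."""
    names = "|".join(re.escape(name) for name in projects)
    # Only the listed projects are touched, so others (e.g. noVNC) stay
    # on master; the ^ and $ anchors restrict matches to whole lines.
    pattern = re.compile(rf"^({names})_BRANCH=master$", re.MULTILINE)
    return pattern.sub(r"\1_BRANCH=milestone-proposed", stackrc_text)


sample = "NOVA_BRANCH=master\nNOVNC_BRANCH=master\n"
print(use_milestone_proposed(sample))
```

To apply it, read stackrc, pass its contents through the function, and write the result back; diffing the file against the original before running stack.sh is a cheap sanity check.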

Setting Up Your Devstack localrc for Running Tempest

There are a couple of things you will want to put into your Devstack localrc file before actually creating your OpenStack environment for testing with Tempest. Open your editor of choice and make sure that you have at least the following in the localrc file in Devstack’s root source directory. (If the file does not exist, simply create it.)

API_RATE_LIMIT=False
MYSQL_PASSWORD=pass
RABBIT_PASSWORD=pass
SERVICE_PASSWORD=pass
ADMIN_PASSWORD=pass
SERVICE_TOKEN=servicetoken

The first line instructs Devstack to disable the default rate-limiting middleware in the Nova API server. We need to do this because Tempest issues hundreds of API requests in a short amount of time as it runs its tests. If we don’t, Tempest will take much longer to run and you will likely get test failures with a bunch of overLimitFault messages.

The next four lines simply set the various passwords to the easy-to-remember value “pass” for testing. The final line, SERVICE_TOKEN, is currently needed to set up some services but should be deprecated fairly soon.

Installing the OpenStack RC1 Environment

Now that you’ve cloned Devstack and created your localrc, it’s time to run the main stack.sh script, which installs OpenStack on your test machine. Be patient: on a first run the script can take ten or more minutes to complete. When it finishes, you will see a bunch of output followed by something like this:

$> ./stack.sh
<snip lots and lots of output...>
The default users are: admin and demo
The password: pass
This is your host ip: 192.168.1.98
stack.sh completed in 517 seconds.

At this point, feel free to run the info.sh script to verify all went well:

jpipes@librebox:~/repos/devstack$ ./tools/info.sh 
git|glance|milestone-proposed[f4a7035]
git|horizon|milestone-proposed[97fc4f8]
git|keystone|milestone-proposed[f3ce326]
git|nova|milestone-proposed[4e02ba1]
git|noVNC|master[22b9a75]
git|python-keystoneclient|milestone-proposed[bf13df1]
git|python-novaclient|milestone-proposed[aa0e87f]
os|vendor=Ubuntu
os|release=11.10
pkg|pep8|0.6.1-2ubuntu1
pkg|pylint|0.23.0-1
pkg|python-pip|1.0-1
<snip a whole bunch of package versions...>
pip|pika|0.9.5
localrc|API_RATE_LIMIT=False
localrc|HOST_IP=192.168.1.98
localrc|SERVICE_TOKEN=servicetoken

At this point, you have a fully functioning OpenStack RC1 environment. To check the logs (actually just the daemon output in screen windows), attach to the screen session:

$> screen -x

Switch screen windows using the <Ctrl>-a NUM key combination, where NUM is the number of the screen window you see at the bottom of your console. Type <Ctrl>-a d to detach from your screen session. The screenshot below shows what your screen session may look like. In the screenshot, I’ve hit <Ctrl>-a 4 to switch to the n-api screen window which is showing the Nova API server daemon output.

screen session showing the Nova API server window...

Testing the OpenStack Essex RC1 Environment with Tempest

The Tempest project is an integration test suite for the OpenStack projects. Personally, in my testing setup at home, I run Tempest from a different machine on my local network than the machine that I run devstack on. However, you are free to run Tempest on the same machine you just installed Devstack on.

Grab Tempest by cloning the canonical repo:

$> git clone git://github.com/openstack/tempest

Once cloned, change directory into tempest.

Creating Your Tempest Configuration File

Tempest needs some information about your OpenStack environment to run properly. Because Tempest executes a series of API commands against the OpenStack environment, it needs to know where to find the main Compute API endpoint or where it can find the Keystone server that can return a service catalog. In addition, Tempest needs to know the UUID of the base image(s) that Devstack downloaded and installed in the Glance server.

Create the Tempest configuration file by copying the sample config file included with Tempest at $tempest_dir/etc/tempest.conf.sample:

$> cp etc/tempest.conf.sample etc/tempest.conf

Next, you will want to query the Glance API server to get the UUID of the base AMI image used in testing. To do this, issue a call like so:

jpipes@uberbox:~/repos/tempest$ glance -I admin -K pass -T admin -N http://192.168.1.98:5000/v2.0 -S keystone index | grep ami | cut -f1 | awk '{print $1}'
99a48bc4-d356-4b4d-95d4-650f707699c2

Of course, you will want to replace the appropriate parts of the call above with values for your own environment. In my case, my Devstack environment is running on host 192.168.1.98, and I’m accessing Glance as an “admin” user in an “admin” tenant with a password of “pass”. Copy the UUID of the image returned by the command above (in my case, 99a48bc4-d356-4b4d-95d4-650f707699c2).
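The grep/cut/awk pipeline above keeps only lines that mention `ami` and prints the first whitespace-separated field of each. If you script your test setup in Python, an equivalent sketch could look like the following; the `ami_image_ids` helper is hypothetical, and it assumes, as in the transcript above, that `glance index` prints the image ID in the first column.

```python
def ami_image_ids(glance_index_output):
    """Mimic `grep ami | awk '{print $1}'` on `glance index` output."""
    ids = []
    for line in glance_index_output.splitlines():
        fields = line.split()
        # Like grep, match "ami" anywhere on the line; like awk, keep
        # only the first whitespace-separated field.
        if "ami" in line and fields:
            ids.append(fields[0])
    return ids
```

Like the shell version, this matches "ami" anywhere on a line, so an image whose name happens to contain those letters would also be picked up; eyeball the result before pasting it into your config.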

Now go ahead and open up the configuration file you just created by copying the tempest.conf.sample file. You will see something like this:

[identity]
use_ssl=False
host=127.0.0.1
port=5000
api_version=v2.0
path=tokens
user=admin
password=admin-password
tenant_name=admin-project
strategy=keystone

[compute]
# Reference data for tests. The ref and ref_alt should be
# distinct images/flavors.
image_ref=e7ddc02e-92fa-4f82-b36f-59b39bf66a67
image_ref_alt=346f4039-a81e-44e0-9223-4a3d13c92a07
flavor_ref=1
flavor_ref_alt=2
ssh_timeout=300
build_interval=10
build_timeout=600
catalog_type=compute
create_image_enabled=true
resize_available=true

[image]
username=admin
password=********
tenant=admin
auth_url=http://localhost:5000/v2.0

Replace the various configuration option values with ones that correspond to your environment. For the image_ref and image_ref_alt values in the [compute] section of the config file, use the UUID you copied above.

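Here is what my fully-replaced config file looks like. Keep in mind that my Devstack environment is running on 192.168.1.98. Compared with the sample config, the changed values are host, user, password and tenant_name in [identity], image_ref and image_ref_alt in [compute], and all four values in [image]: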

[identity]
use_ssl=False
host=192.168.1.98
port=5000
api_version=v2.0
path=tokens
user=demo
password=pass
tenant_name=demo
strategy=keystone

[compute]
# Reference data for tests. The ref and ref_alt should be
# distinct images/flavors.
image_ref=99a48bc4-d356-4b4d-95d4-650f707699c2
image_ref_alt=99a48bc4-d356-4b4d-95d4-650f707699c2
flavor_ref=1
flavor_ref_alt=2
ssh_timeout=300
build_interval=10
build_timeout=600
catalog_type=compute
create_image_enabled=true
resize_available=true

[image]
username=demo
password=pass
tenant=demo
auth_url=http://192.168.1.98:5000/v2.0
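Rather than editing these values by hand, you could script the substitutions with Python’s standard configparser module. This is a minimal sketch; the `customize` function and its parameters are hypothetical, and be aware that configparser discards comments (such as the “Reference data” lines) when it writes a file back out.

```python
import configparser


def customize(conf_text, host, user, password, tenant, image_uuid):
    """Apply Devstack-specific values to the text of tempest.conf.sample."""
    # interpolation=None keeps any "%" characters in values literal.
    conf = configparser.ConfigParser(interpolation=None)
    conf.read_string(conf_text)
    conf["identity"].update(host=host, user=user, password=password,
                            tenant_name=tenant)
    # The sample expects two distinct images; reusing one UUID mirrors
    # the single-image Devstack setup shown above.
    conf["compute"].update(image_ref=image_uuid, image_ref_alt=image_uuid)
    conf["image"].update(username=user, password=password, tenant=tenant,
                         auth_url=f"http://{host}:5000/v2.0")
    return conf
```

Writing the result back is then a matter of calling conf.write() with a file handle opened on etc/tempest.conf.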

Fire Away

The only thing left to do is fire Tempest at your OpenStack environment. Below, I’m running Tempest in verbose mode using nosetests, our standard test runner.

jpipes@uberbox:~/repos/tempest$ nosetests -v tempest
All public and private addresses for ... ok
Providing a network type should filter ... ok
<snip a whole bunch of tests>
An access IPv6 address must match a valid address pattern ... ok
Use an unencoded file when creating a server with personality ... ok
Create a server with name parameter empty ... ok

----------------------------------------------------------------------
Ran 131 tests in 798.125s

OK (SKIP=5)

After you’re done running Tempest — and hopefully everything runs OK — feel free to hit your Devstack Horizon dashboard and log in as your demo user. Unless you made some changes when installing Devstack above, your Horizon dashboard will be available at http://$DEVSTACK_HOST_IP.

If you encounter any test failures or issues, please be sure to log bugs for the appropriate project!

Known Issues

You can try running Tempest with the --processes=N option, which uses the nose multiprocessing plugin. You might get a successful test run… but probably not :)

You will probably hit two issues. First, because multiple processes will be creating instances and volumes at once, you will likely exceed the quota limits for your demo user. You can remedy this by raising the quotas for the tenant you run the compute tests with.

Secondly, you may run into error output that looks like this:

======================================================================
ERROR: An image for the provided server should be created
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jpipes/repos/tempest/tempest/tests/test_images.py", line 38, in test_create_delete_image
    self.servers_client.wait_for_server_status(server['id'], 'ACTIVE')
  File "/home/jpipes/repos/tempest/tempest/services/nova/json/servers_client.py", line 147, in wait_for_server_status
    raise exceptions.BuildErrorException(server_id=server_id)
BuildErrorException: Server 2e0b78fc-cc98-485b-8471-753778bee472 failed to build and is in ERROR status

And in your nova-compute log (or screen output) you might notice something like this:

libvirtError: operation failed: cannot restore domain 'instance-0000001f' uuid 2453b24b-87e1-4f85-9c25-ce3706a8c1d1 from a file which belongs to domain 'instance-0000001f' uuid deb8e941-4693-4768-90cc-03ad98444c85

I haven’t quite gotten to the bottom of this, but there seems to be a race condition in the Compute API that is triggered when similar requests are received nearly simultaneously. I can reliably reproduce the above error simply by adding --processes=2 to my invocation of Tempest. I suspect an issue with the seeding of identifiers, but that is just a guess, and I still have to figure it out. In the meantime, be aware of the issue.

[1] You can certainly use Devstack to install a multi-node OpenStack environment, but this tutorial sticks to a single-node environment for simplicity.

Looking for a Few Good Engineers

Do you know Python? Do you get a thrill breaking other people’s code? Do you have experience with Chef, Puppet, Cobbler, Orchestra, or Jenkins? Have you ever deployed or worked on highly distributed systems? Do you understand virtualization technologies like KVM, Xen, ESX or Hyper-V?

If you answered “Yes!” to any of the questions above and are interested in working in a distributed, high-energy engineering team on solving complex problems with cloud infrastructure software, I want to hear from you. Experience with OpenStack is a huge plus.

I’m looking for QA software engineers, software deployment and/or automation engineers and software developers that can hit the ground running and make a big impact from Day One. Feel free to email me at REVERSE('moc.liamg@sepipyaj').

Diagnose and fix PEP8 issues during code review

I figured I’d write a quick post about how to deal with “pep8 issues” that come up during code reviews on OpenStack core projects. These issues come up often for new contributors, and it can be a source of frustration until the contributor understands how to diagnose and fix the issues that come up.

PEP8 is the Python Enhancement Proposal (PEP) that describes the recommended coding style for Python code. All core (and peripheral Python) OpenStack projects validate that new code pushed to the source tree is “pep8-compliant”. When a new patchset is pushed from code review to Jenkins for the set of automated pre-merge tests, the pep8 command-line tool is run against the new source tree to ensure it meets the PEP8 style standards.

If this PEP8 Jenkins job fails, the code submitter will see a notification that the job failed, and the contributor must fix up any pep8 issues and push those fixes up for review again. Typically, this notification looks something like this:

Change subject: Added Keypair extension (os-keypairs) client and tests LP#900139
......................................................................


Patch Set 2: I would prefer that you didn't submit this

Build Unstable

https://jenkins.openstack.org/job/gate-tempest-pep8/38/ : UNSTABLE
https://jenkins.openstack.org/job/gate-tempest-merge/78/ : SUCCESS

--
To view, visit https://review.openstack.org/3179
To unsubscribe, visit https://review.openstack.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I34c7e9aa6a1796b8d4c3ac9b3b69438796752866
Gerrit-PatchSet: 2
Gerrit-Project: openstack/tempest
Gerrit-Branch: master
Gerrit-Owner: kavan-patil
Gerrit-Reviewer: Brian Waldon 
Gerrit-Reviewer: Jay Pipes 
Gerrit-Reviewer: Jenkins
Gerrit-Reviewer: kavan-patil

There are a couple of ways you can diagnose which style rules your code violated. Probably the easiest and fastest is to follow the link in the notification email to the Jenkins job. Clicking the link above takes me to the Jenkins job page, which looks like this:

Clicking on the graph, I get to a details screen showing the source files and lines of code that violated pep8 rules:

I can then go to line 86 of tempest/openstack.py and investigate the code style problem.

Alternatively, I could run the pep8 CLI tool on my local branch, which will tell me the pep8 violations and what to fix, as shown here:

jpipes@uberbox:~/repos/tempest$ pep8 --repeat tempest
tempest/openstack.py:86:73: W292 no newline at end of file

There we are… the tempest/openstack.py file doesn’t end with a newline. An easy fixup. :)
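To see what checks like these actually look at, here is a tiny stdlib-only sketch of two of them: W292 (no newline at end of file) and E501 (line longer than 79 characters). It is an illustration only, not the pep8 tool’s code; the `check_source` function is hypothetical, and its column numbers are meant to follow the style of the pep8 output above. Run the real pep8 command before pushing.

```python
MAX_LINE_LENGTH = 79  # PEP8's recommended maximum line length


def check_source(filename, source):
    """Return pep8-style 'file:line:col: CODE message' strings."""
    problems = []
    lines = source.splitlines()
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(
                f"{filename}:{number}:{MAX_LINE_LENGTH + 1}: E501 line too "
                f"long ({len(line)} > {MAX_LINE_LENGTH} characters)")
    # W292: the file's last line is not terminated by a newline.
    if source and not source.endswith("\n"):
        problems.append(f"{filename}:{len(lines)}:{len(lines[-1]) + 1}: "
                        "W292 no newline at end of file")
    return problems


print(check_source("example.py", "import os"))
# ['example.py:1:10: W292 no newline at end of file']
```

The fix for W292 is exactly what the post describes: make sure the file ends with a newline character.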

The Science (or Art?) of Commit Messages

There are some things in the world of development that you appreciate much more when you do a lot of code reviews. One of those things is commit messages.

At first glance, commit messages seem to be a small, relatively innocuous thing for a developer. When you commit code, you type in some description about your code changes and then, typically, push your code somewhere for review by someone.

Regardless of whether the code you pushed is going to an open source project, an internal proprietary code repository, or just some code exchanged between friends working on a joint project, that simple little commit message tells the person reading your code a whole lot about you. It speaks volumes about the way you feel about the code you submit and the quality of the review you expect for your code.

As an example, suppose I was working on some code that fixed a bug. I got my code ready for initial review and I did the following:

$> git commit -a -m "Fixes some stuff"

And then I push my commit somewhere using git push.

Inevitably, what happens is that another developer will get some email or notification that I have pushed code up to some repository. It is likely that this notification will look something like this:

Change subject: Fixes some stuff
......................................................................

Fixes some stuff

Change-Id: I79bbac32b5c99742b5cb283c6e55e6204bf92adc
---
M path/to/some/changed/file
1 file changed, 1 insertion(+), 1 deletion(-)

And in the notification will be some link to a place to go do a code review.

Now, what do you think is the first thought that goes through the reviewer’s mind? My guess would be: Really? Fixes what stuff? By not including any context about what the patch is attempting to solve, you leave the reviewer with a bad taste in their mouth. And a bad taste in the reviewer’s mouth generally means one thing: a reluctance to review your patch.

OK, so what could we do to make the commit message better, to provide the reviewer with more initial context about your patch? Well, the first thing that comes to mind is to reference a specific bug that you are fixing with this patch.

Alright, so we amend our commit message to include a bug identifier:

$> git commit --amend -m "Fixes Bug 123456"

And subsequently we push our amended commit. The reviewer now gets a new notification that you’ve amended a previous patch, and this time the notification includes the bug identifier. What do you think a typical reviewer’s next thought might be? My guess is this: What, does this developer think that I’ve memorized all the bug IDs for all open bugs? How should I know what Bug 123456 is about? And here comes that bad taste in the mouth again.

OK, so this time, we will forgo the use of the time-saving -m switch to git commit and actually type a proper, multi-line commit message in our editor of choice that describes the bug that our patch fixes, including a brief description of how we fixed the bug:

git commit --amend  # This will open up your editor...

Now we’d enter a good commit message … something like this would work:

Fixes Bug 123456 - ImportError raised improperly in Storage Driver

Due to a circular dependency, an ImportError was improperly
being thrown when the storage driver was set to XYZ. Rearranged
code to remove circular dependency.

The commit message now will give the reviewer everything they need in the notification to understand what the patch is for and how you solved a bug, without needing to go to their bug tracker to figure out what the bug was about.

A detailed commit message shows you care about the time that reviewers spend on your patch and that you value the code you are submitting.
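If you want a nudge before pushing, a small local check can catch the worst offenders described above. The sketch below is a hypothetical Python helper, not an OpenStack tool; it encodes only this post’s advice (a summary that says more than just a bug number, a blank line, then a body explaining the change), and the exact checks are assumptions.

```python
import re


def commit_message_problems(message):
    """Return reasons a commit message would frustrate a reviewer."""
    lines = message.strip().splitlines()
    if not lines:
        return ["empty commit message"]
    problems = []
    summary = lines[0]
    # A summary that is only a bug reference tells the reviewer nothing.
    if re.fullmatch(r"(Fixes\s+)?Bug\s+#?\d+", summary, re.IGNORECASE):
        problems.append("summary only references a bug ID")
    # A blank line should separate the summary from the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("missing blank line after summary")
    # The body should explain the problem and how it was fixed.
    if not any(line.strip() for line in lines[1:]):
        problems.append("no body describing the problem and the fix")
    return problems


print(commit_message_problems("Fixes Bug 123456"))
```

The good example above passes all three checks, while “Fixes Bug 123456” fails two of them. Note that a message containing only a Change-Id line after the summary would still count as having a body.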

Presentation: OpenStack QA – Walkthrough of Processes, Tools and Code

Last night I gave a short webinar to some folks about the basics of contributing to the Tempest project, which is the OpenStack integration test suite. It was the first time I’d used Google Docs to create and give a presentation and I must say I was really impressed with the ease-of-use of Google Docs Presentation. Well done, Google.

Anyway, I’ve uploaded a PDF of the presentation to this website and provided a link to the Google Docs presentation along with a brief overview of the topics covered in the slides below. As always, I love to get feedback on slides. Feel free to leave a comment here, email me or find me on IRC. Enjoy!

Google Presentation (HTML)
PDF slides


Topics included in the slides:

  • OpenStack Contribution Process
  • Running Devstack Locally
  • Running Tempest against an Environment
  • Walkthrough of the Tempest Source Code
  • Progressively improving a test case
  • Common Scenarios in Code Review and Submission

OpenStack Dev Tip — Easily Pull a Review Branch

Just a quick tip for developers on OpenStack projects who work on multiple development machines or want to pull a colleague’s code from the Gerrit review system and test it locally.

If you have successfully followed the instructions for setting up a development environment, you will have installed the git-review tool that Jim Blair and Monty Taylor maintain. The git-review tool has a nice little feature that lets you easily pull any branch that anyone has pushed up to code review:

$> git review -d $REVIEW_NUM

The $REVIEW_NUM variable should be replaced with the identifier of the review branch in Gerrit.

For example, I developed some code on my laptop that I now want to pull to my beefier work machine. The original branch is failing a few tests in Jenkins and I want to diagnose what’s going on. The review branch is here: https://review.openstack.org/#change,1656. The review number (ID) is 1656.

To grab that branch into my local environment and check it out, I do:

jpipes@uberbox:~/repos/glance$ git review -d 1656
Downloading refs/changes/56/1656/2 from gerrit into review/jay_pipes/bug/850377
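The ref in the output above follows Gerrit’s layout for review branches: refs/changes/, then the last two digits of the change number (zero-padded), then the change number, then the patchset number. A minimal Python sketch of that mapping follows; the function name is hypothetical.

```python
def gerrit_change_ref(change_number, patchset):
    """Build the Gerrit ref for a given change and patchset number."""
    if change_number < 1 or patchset < 1:
        raise ValueError("change and patchset numbers start at 1")
    # Gerrit shards change refs by the last two digits, zero-padded.
    shard = f"{change_number % 100:02d}"
    return f"refs/changes/{shard}/{change_number}/{patchset}"


print(gerrit_change_ref(1656, 2))  # refs/changes/56/1656/2
```

With that ref you could run git fetch gerrit refs/changes/56/1656/2 followed by git checkout FETCH_HEAD, which is roughly what git review -d automates for you (git-review also creates the named local branch).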

Doing a git status, you’ll note that I am now in the local branch called review/jay_pipes/bug/850377:

jpipes@uberbox:~/repos/glance$ git status
# On branch review/jay_pipes/bug/850377
# Your branch and 'gerrit/master' have diverged,
# and have 1 and 2 different commit(s) each, respectively.
#
nothing to commit (working directory clean)

I can now run tests, diagnose the issue(s), fix code up and do a:

$> git commit -a --amend
$> git review

And my changes will be pushed up to the original review in Gerrit for others to look at.