Introduction

Ganeti is…

a cluster-based virtualisation manager
handling for you the hypervisors, storage, networking, etc.
managing small (1-2 machines) and big (hundreds of machines) clusters
free software, naturally! and developed using open-source tools, too
- Python, Haskell
- Git, autoconf/automake
- shelltestrunner, QuickCheck, Buildbot, pep8, pylint, hlint, coverage, hpc
- Pandoc, Sphinx, Haddock, Epydoc, HsColour
- …
hosted on code.google.com

The problem

How to ensure that the code is correct? How to do it without proprietary tools?

Many possibilities:

code reviews
unittests
manual end-to-end tests

… but they do not work in isolation

Ganeti development life-cycle

Except for the design, each step needs to use automated methods in order to assure a consistent quality across the code base: people are creative, but not consistent; computers are “dumb”, but persistent.

Each phase can introduce/find bugs in multiple parts of the software:

design/architectural issues
implementation issues
user interface issues

Feature requirements/Design phase

no automated tools (no A.I. yet…)
not easy to balance internal features vs. external user needs
how to make the design process as transparent as possible?
decided to treat design the same way as code writing
- design documents stored in the source tree
- they go through the same code review process as the code
- still, the reasoning behind a decision is sometimes not fully transparent
fortunately, many of our external contributors have understood the process
most bugs in this phase are related to design and UI issues

Programming language

the first tool that helps writing correct code is the programming language
type system, debugger, logging, and similar features are very important
the two languages we use (Python & Haskell) are very different in these aspects
due to lack of static typing, data validation is much harder to do in Python
- ended up with three separate mini-type systems in the Python code-base (one generic, two specific)
- using pylint as a type-checker, rather than just lint tool
- still, we cannot ensure lack of type-related run-time errors
most bugs in this phase are implementation bugs

Code style

while the style doesn't affect what the code does, a consistent code style helps programmers parse the code more easily
but without enforcement, the style will diverge over time
here lint/style tools help:
- pep8, pylint, hlint
the downside of enforcing code style is that it's harder to accept one-off patches

Unit-testing

many choices here, for both languages
- Python's unittest module in the standard library
- QuickCheck for Haskell
- shelltest for testing programs
however, writing good unit-tests is both tedious and complex
- and common testing methodologies differ between the languages
enter coverage tools:
- coverage for Python
- built-in HPC tool for Haskell
- note that coverage tool behaviour differs between the two languages, due to Haskell's laziness
most bugs caught in this phase are implementation bugs
- QuickCheck can catch some scalability bugs
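
As an illustration of the two styles mentioned above, here is a minimal sketch using Python's `unittest` module: two example-based tests plus one randomized round-trip check in the spirit of QuickCheck. The helpers `parse_size` and `format_size` are hypothetical and exist only for this example; they are not taken from the Ganeti code-base.

```python
import random
import unittest

# Multipliers relative to one mebibyte (illustrative only).
_UNITS = {"M": 1, "G": 1024, "T": 1024 * 1024}


def parse_size(text):
    """Parse a size such as '4G' into mebibytes (hypothetical helper)."""
    number, unit = text[:-1], text[-1].upper()
    if unit not in _UNITS or not number.isdigit():
        raise ValueError("invalid size: %r" % (text,))
    return int(number) * _UNITS[unit]


def format_size(mib):
    """Format mebibytes using the largest unit that divides them exactly."""
    for unit in ("T", "G", "M"):
        if mib >= _UNITS[unit] and mib % _UNITS[unit] == 0:
            return "%d%s" % (mib // _UNITS[unit], unit)
    return "%dM" % mib


class SizeTest(unittest.TestCase):
    def test_known_values(self):
        # Example-based tests: hand-picked inputs and expected outputs.
        self.assertEqual(parse_size("4G"), 4096)
        self.assertEqual(parse_size("512m"), 512)

    def test_rejects_garbage(self):
        self.assertRaises(ValueError, parse_size, "12X")

    def test_round_trip(self):
        # Property-style test: parsing a formatted value gives it back.
        # A fixed seed keeps any failure reproducible.
        rng = random.Random(42)
        for _ in range(200):
            mib = rng.randint(0, 10 ** 7)
            self.assertEqual(parse_size(format_size(mib)), mib)
```

Saved as a module, this can be run with `python -m unittest`, and under the `coverage` tool to see which lines the tests never reach.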

Code reviews

code reviews: yes! … but with a few tiny caveats
within Google, code reviews are pervasive/mandatory, so the Google team is used to them
however, not many open-source projects use code reviews (especially before commit)
how to deal with external contributors that just want to contribute one feature?
- too much insistence on code style/design might put off people
- too lax standards will create problems later
- needs balance
code review usually catches implementation bugs
- … but good code review can also catch architectural/scalability issues
- and rarely UI issues

Continuous build

happy users of Buildbot
downside: internal installation only
- first feature that is not available to external developers
buildbot runs unit-tests, shell tests, etc. on multiple distributions
- Debian Lenny (32/64), Squeeze (32/64)
- Ubuntu Lucid (64)
- either physical machines or chroots
- not yet any RPM-based distribution ☹; working on Fedora
also, pristine environment for generating releases
integrates with Git and runs after each commit
alternatively, developers can request builds from their own working tree
this usually catches implementation bugs, and non-standard environment issues

Integration tests

two integration test tools
the simpler one is the burnin tool, which runs non-destructively on the cluster
- exercises much of the instance-related functionality, using an internal API
- if it passes, the hardware/software is probably OK for running Ganeti
- installed on all Ganeti clusters
the more complex one is the QA suite
- destructive mode (in the default configuration, at least)
- tries to exercise most cluster operations, using external APIs (CLI and RAPI)
  - create cluster, join nodes, remove nodes, destroy cluster
  - instance-related operations, hardware failures
- not installed by default
together, they are used both for testing/qualifying the Ganeti code and the hardware/software configuration

Integration tests #2

run automatically via Buildbot, for (most) commits
- a partial QA takes ~1 hour, a full QA takes 2-3 hours
however, coverage is incomplete:
- Ganeti supports ~3 hypervisors, 7 storage backends, 2 network modes (42 combinations)
- each hypervisor has many parameters (e.g. KVM has 45)
- testing all possible combinations is not feasible: there are more than ~2³⁰ of them (arbitrary value)
  - for example, a recent bug (found manually) was triggered only when live-migrating KVM instances using VNC (instead of serial console) with a custom keymap
so we test only the configurations we run internally ☹
still looking for a better way of testing more
- planning to implement a “virtual” cluster mode for faster testing
- ideas welcome!
this usually catches subtle implementation issues (e.g. race conditions)
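
The arithmetic behind the coverage problem can be sketched as follows. The names in the three lists are placeholders, since only the counts (~3, 7, 2) are given above, and treating each of the 45 KVM parameters as a simple on/off switch is an assumption made purely to show the order of magnitude.

```python
import itertools

# Placeholder names: only the counts come from the slide.
hypervisors = ["hv%d" % i for i in range(1, 4)]            # ~3 hypervisors
storage_backends = ["storage%d" % i for i in range(1, 8)]  # 7 storage backends
network_modes = ["net%d" % i for i in range(1, 3)]         # 2 network modes

base_configs = list(
    itertools.product(hypervisors, storage_backends, network_modes))
print(len(base_configs))  # 42

# Assumption: each of the 45 KVM parameters is a simple on/off switch.
kvm_parameter_count = 45
kvm_settings = 2 ** kvm_parameter_count
print(kvm_settings)  # 35184372088832

# Even one ~1 hour partial QA per base configuration, ignoring all
# hypervisor parameters, already costs 42 hours per tested commit.
partial_qa_hours = 1
print(len(base_configs) * partial_qa_hours)  # 42
```

Real parameters are often not binary, so the true number of settings would be different; the point is only that exhaustive testing is out of reach.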

Canary process

once a release has been made, we start a canary process on the internal fleet
- usually starts with 1-3 Ganeti clusters, gradually extending
- 4, 8, …, up until ~50% of the fleet
- after 50% the roll-out changes from gradual to full
- usually an x.y.0 release is made only after canary is successful
designed to test behaviour in real-life
- scalability, usability, real work-loads, etc.
- implementation bugs too (but more so for the Python code)
it is the first case where resilience to HW errors is significantly tested
- on a big enough fleet, there are always HW errors
- simulated HW failures are not real-world HW failures…
we're not aware of good open-source tools for realistic testing in a test environment
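
The gradual roll-out described above could be sketched as a small schedule generator. This is only an illustration: the function name `canary_schedule`, the strict doubling between stages and the exact handling of the 50% threshold are assumptions, not a description of the actual internal process.

```python
def canary_schedule(fleet_size, first_stage=2):
    """Return the cumulative number of clusters upgraded at each stage.

    Assumption: stages double in size until about half of the fleet is
    covered, then one final stage upgrades everything that is left.
    """
    if fleet_size < 1 or first_stage < 1:
        raise ValueError("fleet_size and first_stage must be positive")
    half = fleet_size // 2
    stages = []
    size = first_stage
    # Gradual part: 2, 4, 8, ... while still below half of the fleet.
    while size < half:
        stages.append(size)
        size *= 2
    # Stop the gradual part at ~50% of the fleet.
    if half >= first_stage:
        stages.append(half)
    # Full roll-out.
    stages.append(fleet_size)
    return stages


print(canary_schedule(100))  # [2, 4, 8, 16, 32, 50, 100]
```

With very small fleets the gradual part disappears: `canary_schedule(1)` returns `[1]`.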

Production use

last stage, but still important
some corner case issues appear only on a big enough fleet
for example, significant outages (e.g. power loss) are the biggest source of feedback on usability issues
internal use vs. external use
- internal use is wide-scale, but quite homogeneous
- external users might run smaller deployments, but in more varied scenarios
- we've had bugs found first in the internal environment, and vice versa
- yes, being open source helps our internal use too!

Common issues

dynamic typing in Python coupled with incomplete test coverage means some paths are exercised only in some corner cases
- after seeing the Nth type error in an error-reporting code path, we've added "simulation modes" to a few operations, which behave as if most errors happened (to test error reporting)
data validation issues
- require validation at Ganeti/system level, or between Ganeti components?
- how to degrade gracefully in the face of real-world issues?
- implemented ht mini-type system for Python, used also for data validation:
```
ht.TListOf(ht.TElemOf([1, 2, 3]))
ht.TString
ht.TAnd(ht.TNotNone, ht.TString)
ht.TDictOf(ht.TElemOf(["a", "b", "c"]), ht.TInt)
```
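
To give an idea of how such combinators can work, here is a minimal sketch in which every "type" is simply a function from a value to `True` or `False`. It reuses the names from the examples above as plain functions, but it is not the actual `ht` implementation; details such as rejecting `bool` in `TInt` are choices made for this sketch.

```python
def TNotNone(value):
    """Accept anything except None."""
    return value is not None


def TString(value):
    """Accept strings."""
    return isinstance(value, str)


def TInt(value):
    """Accept integers; bool is excluded (a choice made for this sketch)."""
    return isinstance(value, int) and not isinstance(value, bool)


def TElemOf(allowed):
    """Build a check accepting only members of `allowed`."""
    return lambda value: value in allowed


def TAnd(*checks):
    """Build a check that passes only if all given checks pass."""
    return lambda value: all(check(value) for check in checks)


def TListOf(check):
    """Build a check for lists whose items all pass `check`."""
    return lambda value: (isinstance(value, list)
                          and all(check(item) for item in value))


def TDictOf(key_check, value_check):
    """Build a check for dicts whose keys and values pass the given checks."""
    return lambda value: (isinstance(value, dict)
                          and all(key_check(k) and value_check(v)
                                  for k, v in value.items()))


check = TDictOf(TElemOf(["a", "b", "c"]), TInt)
print(check({"a": 1, "b": 2}))  # True
print(check({"a": "one"}))      # False
print(check({"z": 1}))          # False
```

Because every check is an ordinary callable, the same definitions can validate both internal data structures and data received from outside, which is the dual use mentioned above.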

And many others…

Assuring quality in the Ganeti code-base