Software Management

Why Use Package Managers?

Deploy Software in a Small Fraction of the Time

Valuable research is often hindered or outright prevented by the inability to install software.  This need not be the case.

Since I began supporting research computing in 1999, I’ve frequently seen researchers struggle for days or weeks trying to install a single open source application.  In most cases, they ultimately failed.

In many cases, they could have easily installed the software in seconds with one simple command, using a package manager such as Debian packages, FreeBSD ports, MacPorts, or Pkgsrc, just to name a few.

Developer websites often contain poorly written instructions for doing “caveman installs”: manually downloading, unpacking, patching, and building the software.  The same laborious process must often be repeated for each of the other software packages on which it depends, which can sometimes number in the dozens.  Many researchers are simply unaware that there are easier ways to install the software they need.  Caveman installs are a colossal waste of man-hours.  If 1000 people around the globe spend an average of 20 hours each trying to install the same program that could have been installed with a package manager (this is not uncommon), then 20,000 man-hours have been lost that could have gone toward science.  How many important discoveries are delayed by this?

The elite research institutions have ample funding and dozens of IT staff dedicated to research computing.  They can churn out publications even if their operation is inefficient.  Most institutions, however, have few or no IT staff dedicated to research, and cannot afford to squander precious man-hours on temporary, one-off software installs.  The wise approach for those of us in that situation is to collaborate on making software deployment easier for everyone.  If we do so, then even the smallest research groups can leverage that work to be more productive and make more frequent contributions to science.

Fortunately, the vast majority of open source software installs can be made trivial for anyone to do for themselves.  Modern package managers perform all the same steps as a caveman install, but automatically.  Package managers also install dependencies for us automatically.

The package managers named above generally have the latest stable version of most software, so unless you really need a feature that was just added, they’ll probably provide what you need to get your research done.

Flexibility

Package managers that support building from source (e.g. FreeBSD ports, Gentoo Portage, MacPorts, OpenBSD ports, Pkgsrc, etc.) also offer build options for many packages, so you can customize the installation to your needs.  For example, below is a FreeBSD ports dialog for building and installing the R statistics package.

[Figure: FreeBSD ports build options dialog for R]

Massively Parallel Workforce

Maintaining tens of thousands of packages requires a lot of man-hours.  Fortunately, there are also thousands of maintainers working together on it, each benefiting from the work of the rest.

Eliminate Redundant Effort

Sadly, thousands of end-users around the world often duplicate the same unnecessary effort trying to install a software package.  If they all used a package manager instead, each package would need only be created and maintained by one person, while everyone else can deploy it effortlessly.

Every Machine is Expendable

Have you ever been in a panic because your server went down and you’re approaching a deadline to get your analysis or models done?  If you deploy the software with a package manager, no problem…  Just install it on another machine and carry on.  If you’ve done a caveman install, you might be dead in the water for a while until you can restore the server or duplicate the installation on another.

Why Pkgsrc?

Portability

Most package managers are specific to one or a few similar platforms.  For example, Debian Packages are used only on Debian Linux and derivatives such as Ubuntu.  FreeBSD ports are used on FreeBSD and its derivatives (DesktopBSD, GhostBSD, TrueOS) and on DragonFly BSD.  MacPorts are only for OS X.

The pkgsrc package manager is unique in that it fully supports most POSIX compatible (Unix-like) operating systems.

Pkgsrc is the primary package manager for NetBSD and Joyent SmartOS, but is also well supported on other BSDs, Linux, Mac OS X, Solaris, and many others.

At UWM, we use pkgsrc extensively to install the latest open source packages on our CentOS HPC clusters and development servers.

For enterprise Linux platforms, pkgsrc provides by far the largest collection of packages available from any package manager.  In addition, it allows you to easily install the exact same software on Mac OS X, NetBSD, Solaris, and many other POSIX platforms you might be using.

Flexibility

Pkgsrc readily supports installing multiple package collections (trees) on the same system under different prefixes.  No need for chroots, jails, virtual machines, or other containers.  Just set your PATH for a given tree, and you’re on your way.  Each collection is fully contained in its own directory, separate from the others and from software installed via Yum or other means.  This allows older software to remain in-place indefinitely for long-term studies, while newer software can be easily deployed under another prefix.
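As a rough sketch of what “set your PATH for a given tree” amounts to, the helper below puts one tree's programs ahead of everything else in PATH.  The function name use_pkgsrc_tree is made up for illustration, and the prefix is only an example.  The RC scripts generated for a real installation may set more than PATH, so treat this as a minimal approximation rather than a replacement for them.

```shell
# Hypothetical helper: put one pkgsrc tree's programs first in PATH.
# The argument is the prefix under which that tree was installed.
use_pkgsrc_tree() {
    _tree_prefix=$1
    PATH="$_tree_prefix/bin:$_tree_prefix/sbin:$PATH"
    export PATH
}

# Example: prefer a tree assumed to live under /sharedapps/pkg-2017Q4.
use_pkgsrc_tree /sharedapps/pkg-2017Q4
echo "$PATH"
```

Switching to a different tree in a new shell session is then just a matter of calling the helper with a different prefix.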

In addition, individual packages can be installed under an alternate prefix within a given tree.  For example, you may wish to install a new development version of a package in the same pkgsrc tree as the latest stable version.  This is as easy as typing:

shell-prompt: bmake PREFIX=/path/to/install install

Modernity

Of course, RHEL-based systems such as CentOS have their own native package manager, Yum.  However, the RHEL Yum package collection is rather small, only a few thousand distinct packages, while pkgsrc currently offers more than 17,000.  The Yum repository also contains older versions of most software, because RHEL is designed for stability and long-term binary compatibility with commercial software in enterprise environments.  Red Hat does not want to introduce new bugs or break binary compatibility by upgrading to the latest kernel or cutting-edge versions of common libraries.

Pkgsrc packages tend to remain up-to-date with the latest stable versions of most software.  Unless you need the latest experimental features of a given application, pkgsrc will probably provide what you need.

RHEL and CentOS users often struggle with open source software that requires a newer compiler suite than provided by the Yum repository.  For example, RHEL/CentOS 6.x, the mainstream version at the time of this writing, uses GCC 4.4, which does not fully support the C++11 standard.  RHEL/CentOS 7 only advances this to GCC 4.8.  With pkgsrc, using a later GCC to build a package is as easy as adding the following to the package Makefile:

GCC_REQD=6.0

You can also add this macro on the command line for a one-time build:

shell-prompt: bmake GCC_REQD=6.0 install

The pkgsrc system will then automatically install GCC 6.x (if it’s not already installed) and use it to build this package.

In addition to easing the installation of scientific software, pkgsrc can turn a RHEL/CentOS system into a more viable development or desktop system, given all the non-scientific tools it provides, such as editors, compilers, office, multimedia and network tools, etc.

In addition, there is the pkgsrc-wip (work in progress) collection, which makes new packages available worldwide for testing before they are committed to the pkgsrc repository.  At the time of this writing, pkgsrc-wip contains over 4000 work-in-progress packages.

Quality and Security

Every package committed to the pkgsrc repository undergoes a quality control process, verifying clean installation and deinstallation, basic security checks, compatibility with other packages, etc.

Hence, the likelihood of encountering problems with software installed via pkgsrc is minimal.  Compare this with your past experience doing caveman installs.

Collaboration

The pkgsrc project is a worldwide collaborative effort with contributions from thousands of the smartest people on the planet.  All that’s needed to make it work is for each of us to contribute a little bit.  Each of us can get back 1,000 times as much as we put in, even if we’re not using the same operating systems.

If everyone involved in research computing maintained one or two pkgsrc packages, then virtually all of our open source software deployment troubles would be gone.

Convenience

Pkgsrc makes it easy to build binary packages, which can then be installed in seconds on other systems with the same architecture and operating system.  If you have many systems on which you need to deploy packages, you only have to build them once.  Binary packages are already available on the web for the more popular platforms as well.  (See below.)
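For the curious, here is a minimal sketch of building a binary package, assuming a pkgsrc source tree in ~/Pkgsrc/pkgsrc-2017Q4.  The package target is a standard pkgsrc make target.  The PKGSRC_DIR and DRY_RUN variables and the function name are our own inventions for this example.  By default the function only prints the command it would run, so it is safe to try on a machine without pkgsrc.

```shell
# Hypothetical settings: where the pkgsrc source tree lives, and whether to
# really run the build (DRY_RUN=0) or just print the command (the default).
PKGSRC_DIR=${PKGSRC_DIR:-$HOME/Pkgsrc/pkgsrc-2017Q4}
DRY_RUN=${DRY_RUN:-1}

# build_binary_package CATEGORY/NAME, e.g. math/blas
build_binary_package() {
    if [ "$DRY_RUN" = 0 ]; then
        # "bmake package" builds the software and writes a binary package file.
        ( cd "$PKGSRC_DIR/$1" && bmake package )
    else
        echo "cd $PKGSRC_DIR/$1 && bmake package"
    fi
}

build_binary_package math/blas
```

With default settings, the resulting package file typically lands under packages/All in the pkgsrc tree, though this location can be changed in mk.conf.  It can then be copied to a similar system and installed there with pkg_add.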

Support

The pkgsrc community includes thousands of contributors, including some of the brightest and most experienced Unix admins on the planet.  Hence, most problems with pkgsrc are solved before you encounter them and in a sustainable way.  There are numerous mailing lists you can join to ask questions and share your knowledge, depending on your needs and experience level.

Growth

The pkgsrc project is already a large and well-established project and has been growing rapidly in recent years.

View pkgsrc growth graphs

Join the party and help make pkgsrc even better!

Getting Started

If you’re running NetBSD or SmartOS, the OS installer can configure pkgsrc for you.

Otherwise, you can use our auto-pkgsrc-setup script to easily bootstrap multiple pkgsrc collections onto most other POSIX systems.  Please report any issues with auto-pkgsrc-setup to research-computing@uwm.edu.  For general pkgsrc support, please join the pkgsrc-users list.

This script can be run as root or as an ordinary user, and will guide you through the installation.  Just run it and answer the questions.  In about 15 minutes, you’ll be ready to start installing packages!

The script automatically generates sh and csh RC scripts and an environment module file for each pkgsrc collection.  If you want a particular collection in everyone’s path, the RC scripts can be simply copied into /etc/profile.d on systems that support it, or sourced by startup scripts on other systems.  (The auto-pkgsrc-setup script offers to do this for you automatically.)

Suppose you installed a pkgsrc collection into ~/Pkgsrc/pkg-2017Q4.  To begin using it, just run the following for csh, tcsh, or other C shell derivatives:

shell-prompt: source ~/Pkgsrc/pkg-2017Q4/etc/pkgsrc.csh

…or the following for sh, ksh, bash, dash, or other Bourne-shell derivatives:

shell-prompt: . ~/Pkgsrc/pkg-2017Q4/etc/pkgsrc.sh

…or if you use environment modules (yum install environment-modules on RHEL/CentOS):

shell-prompt: module load ~/Pkgsrc/pkg-2017Q4/etc/modulefiles/pkgsrc/2017Q4

Then, try it out:

shell-prompt: cd ~/Pkgsrc/pkgsrc-2017Q4/math/blas/
shell-prompt: bmake install

Don’t forget about the pkgsrc-wip collection, which contains many newer packages that have not yet been committed:

shell-prompt: cd ~/Pkgsrc/pkgsrc-2017Q4/wip/ncbi-blast
shell-prompt: bmake install

Note: The pkgsrc bootstrap process may fail if the running user or group name contains white space or special characters.  This is often caused by the use of AD authentication servers.  If you encounter this issue, try installing pkgsrc as root or another local user with canonical user and primary group names.

Note to FreeBSD users: While pkgsrc can be used on FreeBSD, you may find that bootstrapping additional FreeBSD ports trees is a better option.  This will not provide exactly the same package collection as pkgsrc, but it will provide an even bigger collection (27,000+ ports at the time of this writing).  You can use the auto-ports-setup script to get started.

Binary Packages for Research Computing

As a service to the research computing community, the University of Wisconsin — Milwaukee provides binary packages for multiple operating systems on which pkgsrc is commonly used.

Of course, you are never limited to using the binary packages.  You can still install pkgsrc packages from source as described above (e.g. to use non-default build options or install a new package from the WIP collection).

Unlike most package repositories, which have a single fixed prefix and frequently upgraded packages, these packages are available for multiple prefixes and remain unchanged for a given prefix.  Additional packages may be added and existing packages may be patched to fix bugs or security issues, but the software versions will not be changed.  This allows researchers to keep older software in-place indefinitely for long-term studies while deploying newer software in later snapshots.

Packages are generated for each quarterly pkgsrc snapshot under a prefix of /sharedapps/pkg-<year>Q<quarter>, e.g. /sharedapps/pkg-2017Q4.

As an example, to make use of the binary packages for 2017Q4:

  1. Install a pkgsrc tree in /sharedapps/pkg-2017Q4 following the “Getting Started” instructions above.  The script will ask for one or more binary package hosts.  Just enter one of the sites under “Current Mirrors” below.  This will add a line like the following to /sharedapps/pkg-2017Q4/etc/pkgin/repositories.conf, which is used by the pkgin system:
    http://mirror1.hpc.uwm.edu/pkgsrc/packages/sharedapps/pkg-2017Q4/RHEL7/All

You can edit this file any time to add or remove mirrors.  You can also configure multiple mirrors, separated by newlines:

http://mirror1.hpc.uwm.edu/pkgsrc/packages/sharedapps/pkg-2017Q4/RHEL7/All
http://mirror2.hpc.uwm.edu/pkgsrc/packages/sharedapps/pkg-2017Q4/RHEL7/All

Run “man pkgin” for details.
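If you manage several installations, adding mirrors by hand gets tedious.  The sketch below appends a mirror URL to a repositories.conf file only if that exact line is not already there.  The add_mirror function is hypothetical, and the demonstration deliberately works on a scratch file rather than a live installation.

```shell
# add_mirror CONF URL: append URL to CONF unless that exact line already exists.
add_mirror() {
    _conf=$1
    _url=$2
    touch "$_conf" || return 1
    # -x matches whole lines only, -F treats the URL as a fixed string.
    if ! grep -qxF "$_url" "$_conf"; then
        printf '%s\n' "$_url" >> "$_conf"
    fi
}

# Demonstration on a scratch file.
demo_conf=$(mktemp)
add_mirror "$demo_conf" http://mirror1.hpc.uwm.edu/pkgsrc/packages/sharedapps/pkg-2017Q4/RHEL7/All
add_mirror "$demo_conf" http://mirror2.hpc.uwm.edu/pkgsrc/packages/sharedapps/pkg-2017Q4/RHEL7/All
# Adding the first mirror again changes nothing.
add_mirror "$demo_conf" http://mirror1.hpc.uwm.edu/pkgsrc/packages/sharedapps/pkg-2017Q4/RHEL7/All
cat "$demo_conf"
```

After changing the real file, run “pkgin update” so that pkgin refreshes its package list from the new set of mirrors.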

Current mirrors:

Note: The temporary URLs http://unixdev.ceas.uwm.edu:8084 and http://unixdev.ceas.uwm.edu:8085 are no longer active.

Additional mirrors may be added at any time.  Check back frequently for changes and additions.

Site                          Geographic Location
http://mirror1.hpc.uwm.edu    University of Wisconsin - Milwaukee
http://mirror2.hpc.uwm.edu    University of Wisconsin - Milwaukee

Once your repositories.conf is properly configured and your PATH is set for the appropriate installation, you can use pkgin to install any of the thousands of packages in seconds:

shell-prompt: pkgin avail | wc -l
16944
shell-prompt: pkgin install clang gcc49 gcc5 gcc6 go emacs25 R lapack

To see what’s available, run “pkgin avail” or browse the complete package collections here.

If you would like to host a mirror of these packages, please contact us.

Warning: As these packages generally do not receive security updates, running programs in this collection as root is not recommended (although you may need to be root to install them).  If you wish to run system services or other admin tasks from pkgsrc, we recommend using pkgsrc-current and updating your packages regularly.  NetBSD and SmartOS users can configure the use of binary packages during OS installation.  Joyent Cloud Services provides regularly updated binary packages for SmartOS/Illumos, RHEL/CentOS and OS X.  Our joyent-pkgsrc script can be used to quickly and easily bootstrap a basic installation using Joyent’s binary packages.

Note: A fresh pkgsrc installation requires very little space, but can consume more than 50 gigabytes if you are building many large packages from source.  If your /sharedapps directory is on a partition that’s too small, create it as a symbolic link before bootstrapping pkgsrc, e.g.:

shell-prompt: mkdir /bigpartition/sharedapps
shell-prompt: ln -s /bigpartition/sharedapps /

Much less space is required when installing binary packages using pkg_add or pkgin.
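Before bootstrapping, it may be worth checking how much room the target partition has.  The following is a small sketch using the portable output format of df.  The avail_kb function and the CHECK_DIR variable are hypothetical names, and the 50 gigabyte threshold simply reuses the figure mentioned above.

```shell
# avail_kb DIR: print the space available on DIR's filesystem, in kilobytes.
# "df -P" guarantees one line per filesystem; field 4 is the available space.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# 50 gigabytes expressed in kilobytes.
needed_kb=$((50 * 1024 * 1024))

# Directory to check; defaults to / and can be overridden via CHECK_DIR.
check_dir=${CHECK_DIR:-/}

if [ "$(avail_kb "$check_dir")" -ge "$needed_kb" ]; then
    echo "$check_dir: enough room for building many packages from source"
else
    echo "$check_dir: less than 50 GB free, consider the symbolic link approach"
fi
```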

Note: If you use auto-pkgsrc-setup to bootstrap an installation for use with binary packages on RHEL/CentOS, you MUST choose the correct minimum GCC version.

2017Q3 uses GCC 4.8.

2017Q4 uses GCC 5.0.

If you use the binary bootstrap kits described below, you will not need to worry about this.
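To see how your system compiler compares with one of the minimum versions listed above, a quick version comparison can help.  This sketch relies on “sort -V” (version sort), which is available in GNU coreutils but not guaranteed everywhere.  The version_ge function is a hypothetical helper, not part of pkgsrc or auto-pkgsrc-setup.

```shell
# version_ge A B: succeed if version A is at least version B.
# Sorts the two versions and checks that B comes first (or they are equal).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=5.0    # minimum listed above for 2017Q4

if command -v gcc >/dev/null 2>&1; then
    have=$(gcc -dumpversion)
    if version_ge "$have" "$required"; then
        echo "System GCC $have meets the minimum of $required"
    else
        echo "System GCC $have is older than $required"
    fi
else
    echo "No gcc found in PATH"
fi
```

If the system compiler is older than the minimum, pkgsrc can build and use a newer GCC itself, as described in the “Modernity” section.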

Binary Bootstrap Kits

Beginning with 2017Q4, bootstrap kits are available for creating /sharedapps installations instead of using auto-pkgsrc-setup or otherwise building the pkgsrc installation from source.  A bootstrap kit is simply a tarball of a pristine pkgsrc installation.  Download one of the kits and the corresponding sha512 file from a mirror, e.g.

http://mirror1.hpc.uwm.edu/pkgsrc/bootstrap-kits/
http://mirror2.hpc.uwm.edu/pkgsrc/bootstrap-kits/

Verify the sha512 sum to protect against a man-in-the-middle attack, and unpack as the root user in /.

Example:

shell-prompt: sha512sum pkgsrc-RHEL7-gcc-5.0-sharedapps-pkg-2017Q4.tgz
(compare output with the corresponding .sha512 file you downloaded)
shell-prompt: tar -C / -zxvf pkgsrc-RHEL7-gcc-5.0-sharedapps-pkg-2017Q4.tgz

# Bourne-shell family users (sh/bash/dash/ksh/zsh):
shell-prompt: . /sharedapps/pkg-2017Q4/etc/pkgsrc.sh

# C-shell family users (csh/tcsh):
shell-prompt: source /sharedapps/pkg-2017Q4/etc/pkgsrc.csh

# Environment module users:
shell-prompt: module load /sharedapps/pkg-2017Q4/etc/modulefiles/pkgsrc/2017Q4

shell-prompt: pkgin avail | wc -l
16944
shell-prompt: pkgin install gcc5 gcc6 gcc7 clang R octave gromacs
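Comparing 128-character checksums by eye is error-prone, so you may prefer to script the comparison from the example above.  The sketch below assumes the sha512sum command from GNU coreutils, and assumes only that the downloaded .sha512 file contains the hexadecimal digest somewhere in its text; we have not relied on any particular layout of that file.  The verify_sha512 function is a hypothetical helper, and the demonstration uses a scratch file in place of a real kit.

```shell
# verify_sha512 FILE SUMFILE: succeed if FILE's SHA-512 digest matches the
# first 128-digit hexadecimal string found in SUMFILE.
verify_sha512() {
    _actual=$(sha512sum "$1" | awk '{ print $1 }')
    _expected=$(grep -Eo '[0-9a-fA-F]{128}' "$2" | head -n 1 | tr 'A-F' 'a-f')
    [ -n "$_expected" ] && [ "$_actual" = "$_expected" ]
}

# Demonstration with a scratch file standing in for a downloaded kit.
demo_kit=$(mktemp)
echo "example payload" > "$demo_kit"
sha512sum "$demo_kit" > "$demo_kit.sha512"

if verify_sha512 "$demo_kit" "$demo_kit.sha512"; then
    echo "checksum OK"
else
    echo "checksum MISMATCH, do not unpack"
fi
```

Only unpack the tarball if the check succeeds.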

RHEL/CentOS users will need to install gcc, gcc-c++, and gcc-gfortran from Yum.  (auto-pkgsrc-setup does this automatically if you choose that route instead.)

FYI: The kits are generated with auto-pkgsrc-setup and the following responses in addition to the prefix and appropriate GCC requirements:

Are you a package developer? y/[n] n
Accept all software licenses? y/[n] y
Allow vulnerable packages? y/[n] n

These settings can be changed by editing prefix/etc/mk.conf, for example /sharedapps/pkg-2017Q4/etc/mk.conf.
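To review the current policy settings before editing, you can search mk.conf for the relevant variables.  PKG_DEVELOPER, SKIP_LICENSE_CHECK, ACCEPTABLE_LICENSES, and ALLOW_VULNERABLE_PACKAGES are standard pkgsrc variables, but exactly which of them auto-pkgsrc-setup writes for each answer is an assumption on our part, so check your own file.  The function name is hypothetical and the demonstration uses a scratch file with made-up contents.

```shell
# show_pkgsrc_policy MKCONF: list lines that assign the policy variables.
show_pkgsrc_policy() {
    grep -E '^[[:space:]]*(PKG_DEVELOPER|SKIP_LICENSE_CHECK|ACCEPTABLE_LICENSES|ALLOW_VULNERABLE_PACKAGES)[[:space:]]*[+?]?=' "$1"
}

# Demonstration on a scratch file; not copied from a real installation.
demo_mk=$(mktemp)
cat > "$demo_mk" << 'EOF'
# Example mk.conf fragment
PKG_DEVELOPER= yes
SKIP_LICENSE_CHECK= yes
MAKE_JOBS= 4
EOF

show_pkgsrc_policy "$demo_mk"
```

On a real installation you would pass the actual file instead, for example /sharedapps/pkg-2017Q4/etc/mk.conf.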

Contributing to Pkgsrc

The pkgsrc project is a highly organized worldwide collaboration.  As more people get involved, life becomes easier for all of us.  Here are a few ways you can pitch in, depending on your skill level:

  • Use pkgsrc and report bugs as soon as you encounter them.
  • Join the pkgsrc-users mailing list.
  • Learn to fix or create packages of your own.  Pkgsrc has many tools to ease the creation of new packages.  Once you learn to use these tools, creating a new pkgsrc package will be much easier than a caveman install.
  • Tell your colleagues about pkgsrc.  The more people get involved, the easier life will be for all of us.
  • If you’re a software developer, make your software package-friendly.  Doing so will not only help pkgsrc, but other package managers as well (Debian packages, FreeBSD ports, MacPorts, Gentoo Portage, etc.).  If end-users install your software via a package manager, you won’t have to provide support for installation issues and you can focus on code development.
    • Provide an install target in your Makefile following the filesystem hierarchy standard under ${DESTDIR}${PREFIX}.
    • Not mentioned in the Wikipedia article: Programs not meant to be run directly by users can be installed under $PREFIX/libexec.
    • Use standard variables in your Makefile.  See the GNU Coding Standards and GNU Make documentation for some good examples.
    • Do not bundle dependencies.  Doing so is essentially creating your own esoteric fragment of a package manager.  Make your software work with external dependencies from the local package manager.  If you need to patch a function from an external library, just bundle the patched function with your source code rather than the whole library, so that the linker won’t pull the unpatched version from the external library.  Send patches to the external library developers ASAP so that they can be incorporated into the next release.
    • Package the source code of each release as project-version.tar.xz (e.g. foo-1.3.1.tar.xz) or similar, so that it unpacks to a directory called project-version (e.g. foo-1.3.1), or use a popular source repository like GitHub.
    • More tips about package-friendly development are available in our user’s guide at http://uwm.edu/hpc/support/.
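To illustrate the ${DESTDIR}${PREFIX} convention from the list above, here is a sketch of the steps an install target might perform, written as a shell function so it can be tried anywhere.  The project name foo, its helper program foo-helper, and the function name are all made up for this example.  A real project would normally put these steps in the install target of its Makefile.

```shell
# install_foo SRCDIR: stage a hypothetical project "foo" under
# ${DESTDIR}${PREFIX}.  PREFIX defaults to /usr/local; DESTDIR defaults to
# empty, which means installing directly into PREFIX.
install_foo() {
    _src=$1
    _bindir="${DESTDIR:-}${PREFIX:-/usr/local}/bin"
    _libexecdir="${DESTDIR:-}${PREFIX:-/usr/local}/libexec/foo"
    mkdir -p "$_bindir" "$_libexecdir"
    # Programs that users run directly go in bin.
    cp "$_src/foo" "$_bindir/foo"
    # Programs run only by foo itself go under libexec.
    cp "$_src/foo-helper" "$_libexecdir/foo-helper"
    chmod 755 "$_bindir/foo" "$_libexecdir/foo-helper"
}

# Demonstration using scratch directories in place of a real source tree.
src=$(mktemp -d)
stage=$(mktemp -d)
printf '#!/bin/sh\necho foo\n' > "$src/foo"
printf '#!/bin/sh\necho helper\n' > "$src/foo-helper"

DESTDIR=$stage
PREFIX=/usr/local
install_foo "$src"
find "$stage" -type f | sort
```

Because nothing is written outside ${DESTDIR}, a package manager can collect the staged files into a package without touching the running system.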

Building Your Own Binary Packages

If you’re interested in building your own binary pkgsrc packages, the standard tool is pbulk.  The system we use at UWM is described here.