Software Management

Why Use Package Managers?

Deploy Software in a Small Fraction of the Time

Valuable research is often hindered or outright prevented by the inability to install software.  This need not be the case.

Since I began supporting research computing in 1999, I’ve frequently seen researchers struggle for days or weeks trying to install a single open source application.  In most cases, they ultimately failed.

In many cases, they could have easily installed the software in seconds with one simple command, using a package manager such as Debian packages, FreeBSD ports, MacPorts, or Pkgsrc, just to name a few.

Developer websites often contain poorly written instructions for doing “caveman” installs: manually downloading, unpacking, patching, and building the software.  The same laborious process must often be repeated for each of the other packages on which it depends, which can number in the dozens.  Caveman installs are a colossal waste of man-hours, especially considering that hundreds or thousands of people around the world are duplicating this wasted effort.

Fortunately, the vast majority of open source software installs can be made trivial for anyone to do for themselves.  Modern package managers perform all the same steps as a caveman install, but automatically.  Package managers also install dependencies for us automatically.

The package managers named above generally have the latest stable version of most software, so unless you really need a feature that was just added, they’ll probably provide what you need to get your research done.

Flexibility

Package managers that support building from source (e.g. FreeBSD ports, Gentoo Portage, MacPorts, OpenBSD ports, Pkgsrc, etc.) also offer build options for many packages, so you can customize the installation to your needs.  For example, below is a FreeBSD ports dialog for building and installing the R statistics package.

[Figure: FreeBSD ports build options dialog for R]

Massively Parallel Workforce

Maintaining tens of thousands of packages requires a lot of man-hours.  Fortunately, there are also thousands of maintainers working together on it, each benefiting from the work of the rest.

Eliminate Redundant Effort

Sadly, thousands of end-users around the world often duplicate the same unnecessary effort installing a software package.  If they all used a package manager instead, each package would need to be created and maintained by only one person, while the rest could deploy it effortlessly.

The addition of every new package to a package collection has a “ratchet effect”, eliminating the need for thousands of future caveman installs and locking in an easier life for everyone.

Why Pkgsrc?

Portability

Most package managers are specific to one or a few similar platforms.  For example, Debian packages are used only on Debian Linux and derivatives such as Ubuntu.  FreeBSD ports are used on FreeBSD and its derivatives (DesktopBSD, GhostBSD, TrueOS), and DragonFly BSD.  MacPorts are only for OS X.

The pkgsrc package manager is unique in that it fully supports most POSIX compatible (Unix-like) operating systems.

Pkgsrc is the primary package manager for NetBSD and Joyent SmartOS, but is also well supported on other BSDs, Linux, Mac OS X, Solaris, and many others.

At UWM, we use pkgsrc extensively to install the latest open source packages on our CentOS HPC clusters and development servers.

For enterprise Linux platforms, pkgsrc provides by far the largest collection of packages available from any package manager.  In addition, it allows you to easily install the exact same software on Mac OS X, NetBSD, Solaris, and any other POSIX platforms you might be using.

Flexibility

Pkgsrc readily supports installing multiple package collections (trees) on the same system under different prefixes.  No need for chroots, jails, virtual machines, or other containers.  Just set your PATH for a given tree, and you’re on your way.  Each collection is fully contained in its own directory, separate from the others and from software installed via Yum or other means.  This allows older software to remain in-place indefinitely for long-term studies, while newer software can be easily deployed under another prefix.
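As a minimal sketch of what “set your PATH for a given tree” can look like, the helper below builds a PATH value with one tree's bin and sbin directories in front.  The function name pkgsrc_path and the example prefixes are assumptions made for illustration, and a real setup may need other variables (such as MANPATH) as well.

```shell
#!/bin/sh
# pkgsrc_path PREFIX CURRENT_PATH
# Print a PATH value with PREFIX/bin and PREFIX/sbin placed first.
# Both the function name and the prefixes used below are hypothetical.
pkgsrc_path() {
    printf '%s/bin:%s/sbin:%s\n' "$1" "$1" "$2"
}

# Select the older tree in one shell and the newer tree in another, e.g.:
#   PATH=$(pkgsrc_path /sharedapps/pkg-2016Q4 "$PATH"); export PATH
#   PATH=$(pkgsrc_path /sharedapps/pkg-2017Q1 "$PATH"); export PATH
```

Because the change only affects the shell in which it is made, two terminals on the same machine can work against two different trees at once.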

In addition, individual packages can be installed under an alternate prefix within a given tree.  For example, you may wish to install a new development version of a package in the same pkgsrc tree as the latest stable version.  This is as easy as typing

bmake PREFIX=/path/to/install install

Modernity

Of course, RHEL-based systems such as CentOS have their own native package manager, Yum.  However, the RHEL Yum package collection is rather small, only a few thousand distinct packages, while pkgsrc currently offers more than 17,000.  The Yum repository also contains older versions of most software, because RHEL is designed for stability and long-term binary compatibility with commercial software in enterprise environments.  Red Hat does not want to introduce new bugs or break binary compatibility by upgrading to the latest kernel or cutting-edge versions of common libraries.

Pkgsrc packages tend to remain up-to-date with the latest stable versions of most software.  Unless you need the latest experimental features of a given application, pkgsrc will probably provide what you need.

RHEL and CentOS users often struggle with open source software that requires a newer compiler suite than provided by the Yum repository.  For example, RHEL/CentOS 6.x, the mainstream version at the time of this writing, uses GCC 4.4, which does not fully support the C++11 standard.  With pkgsrc, using GCC 4.9 to build a package is as easy as adding the following to the package Makefile:

GCC_REQD=4.9

You can also add this macro on the command line for a one-time build:

bmake GCC_REQD=4.9 install

The pkgsrc system will then automatically install GCC 4.9 (if it’s not already installed) and use it to build this package.

In addition to easing the installation of scientific software, pkgsrc can turn a RHEL/CentOS system into a more viable development or desktop system, given all the non-scientific tools it provides, such as editors, compilers, office, multimedia and network tools, etc.

In addition, there is the pkgsrc-wip (work in progress) collection, which makes new packages available worldwide for testing before they are committed to the pkgsrc repository.  At the time of this writing, pkgsrc-wip contains over 4000 work-in-progress packages.

Quality and Security

Every package committed to the pkgsrc repository undergoes a quality control process, verifying clean installation and deinstallation, basic security checks, compatibility with other packages, etc.

Hence, the likelihood of encountering problems with software installed via pkgsrc is minimal.  Compare this with your past experience doing caveman installs.

Collaboration

The pkgsrc project is a worldwide collaborative effort with contributions from thousands of the smartest people on the planet.  All that’s needed to make it work is for each of us to contribute a little bit.  Each of us can get back 1,000 times as much as we put in, even if we’re not using the same operating systems.

If everyone in research computing maintained one or two pkgsrc packages, then virtually all of our open source software deployment troubles would be gone.

Convenience

Pkgsrc makes it easy to build binary packages, which can then be installed in seconds on other systems with the same architecture and operating system.  If you have many systems on which you need to deploy packages, you only have to build them once.  Binary packages are already available on the web for the more popular platforms as well.  (See below.)

Support

The pkgsrc community includes thousands of contributors, including some of the brightest and most experienced Unix admins on the planet.  Hence, most problems with pkgsrc are solved rather quickly and in a sustainable way.  There are numerous mailing lists you can join to ask questions and share your knowledge, depending on your needs and experience level.

Growth

The pkgsrc project is already a large and well-established project and has been growing rapidly in recent years.

View pkgsrc growth graphs

Join the party and help make pkgsrc even better!

Getting Started

If you’re running NetBSD or SmartOS, the OS installer can configure pkgsrc for you.

Otherwise, you can use our auto-pkgsrc-setup script to easily bootstrap multiple pkgsrc collections onto most other POSIX systems.  Please report any issues with auto-pkgsrc-setup to research-computing@uwm.edu.  For general pkgsrc support, please join the pkgsrc-users list.

This script can be run as root or as an ordinary user, and will guide you through the installation.  Just run it and answer the questions.  In about 15 minutes, you’ll be ready to start installing packages!

The script automatically generates sh and csh RC scripts and an environment module file for each pkgsrc collection.  If you want a particular collection in everyone’s path, the RC scripts can be simply copied into /etc/profile.d on systems that support it, or sourced by startup scripts on other systems.  (The auto-pkgsrc-setup script offers to do this for you automatically.)

Suppose you installed a pkgsrc collection into ~/Pkgsrc/pkg-2017Q1.  To begin using it, just run the following for csh, tcsh, or other C shell derivatives:

source ~/Pkgsrc/pkg-2017Q1/etc/pkgsrc.csh

…or the following for sh, ksh, bash, dash, or other Bourne-shell derivatives:

. ~/Pkgsrc/pkg-2017Q1/etc/pkgsrc.sh

…or if you use environment modules:

module load ~/Pkgsrc/pkg-2017Q1/etc/modulefiles/pkgsrc/2017Q1

Then, try it out:

cd ~/Pkgsrc/pkgsrc-2017Q1/math/blas/
bmake install

Don’t forget about the pkgsrc-wip collection, which contains many newer packages that have not yet been committed:

cd ~/Pkgsrc/pkgsrc-2017Q1/wip/ncbi-blast
bmake install

Note: The pkgsrc bootstrap process may fail if the running user or group name contains white space or special characters.  This is often caused by the use of AD authentication servers.  If you encounter this issue, try installing pkgsrc as root or another local user with canonical user and primary group names.

Note to FreeBSD users: While pkgsrc can be used on FreeBSD, you may find that bootstrapping additional FreeBSD ports trees is a better option.  This will not provide exactly the same package collection as pkgsrc, but it will provide a bigger collection (27,000+ ports at the time of this writing).  You can use the auto-ports-setup script to get started.

Binary Packages for Research Computing

The University of Wisconsin — Milwaukee provides binary pkgsrc packages for selected operating systems as a service to the research computing community.

The RHEL/CentOS package collections for 2017 quarter 1 contain more than 15,600 binary packages, most of which can be installed in a few seconds.  Of course, you are never limited to using the binary packages.  You can still install packages from source as described above (e.g. to use non-default build options or install a new package from the WIP collection).

Unlike most package repositories, which have a fixed prefix and frequently upgraded packages, these packages are available for multiple prefixes and remain unchanged for a given prefix.  Additional packages may be added and existing packages may be patched to fix bugs or security issues, but the software versions will not be changed.  This allows researchers to keep older software in-place indefinitely for long-term studies while deploying newer software in later snapshots.

Packages are generated for each quarterly pkgsrc snapshot under a prefix of /sharedapps/pkg-<year>Q<quarter>, e.g. /sharedapps/pkg-2017Q1.
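The naming scheme can be sketched as a small helper that assembles the prefix from a year and a quarter.  The function name snapshot_prefix is hypothetical; only the /sharedapps/pkg-<year>Q<quarter> pattern comes from the text above.

```shell
#!/bin/sh
# snapshot_prefix YEAR QUARTER
# Print the installation prefix for a quarterly snapshot, following the
# /sharedapps/pkg-<year>Q<quarter> pattern.  Fails if QUARTER is not 1-4.
snapshot_prefix() {
    case $2 in
        1|2|3|4)
            printf '/sharedapps/pkg-%sQ%s\n' "$1" "$2"
            ;;
        *)
            echo "snapshot_prefix: quarter must be 1, 2, 3, or 4" >&2
            return 1
            ;;
    esac
}

# Example:
#   prefix=$(snapshot_prefix 2017 1)    # /sharedapps/pkg-2017Q1
```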

As an example, to make use of the binary packages for 2017Q1:

  1. Install a pkgsrc tree in /sharedapps/pkg-2017Q1 following the Getting Started instructions above.  The script will ask for a binary package host.  Just enter one of the sites under Current Mirrors below.
  2. This will add a line like the following to /sharedapps/pkg-2017Q1/etc/pkg_install.conf:
    PKG_PATH=http://unixdev.ceas.uwm.edu:8084/pkgsrc/packages/sharedapps/pkg-2017Q1/RHEL6/All

You can edit this file any time to switch to a different mirror.  You can also configure multiple mirrors, separated by semicolons.

PKG_PATH=http://unixdev.ceas.uwm.edu:8084/pkgsrc/packages/sharedapps/pkg-2017Q1/RHEL6/All;http://unixdev.ceas.uwm.edu:8085/pkgsrc/packages/sharedapps/pkg-2017Q1/RHEL6/All

Run man pkg_install.conf for details.
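If you switch mirrors often, the PKG_PATH line can be generated rather than typed.  The sketch below assumes every mirror uses the same directory layout as the URLs shown above; the function name pkg_path_line is hypothetical.

```shell
#!/bin/sh
# pkg_path_line SNAPSHOT PLATFORM MIRROR...
# Print a PKG_PATH line for pkg_install.conf listing every MIRROR,
# separated by semicolons.  Assumes each mirror serves packages under
# /pkgsrc/packages/sharedapps/SNAPSHOT/PLATFORM/All, as in the examples.
pkg_path_line() {
    snapshot=$1
    platform=$2
    shift 2
    line=
    for mirror in "$@"; do
        url="$mirror/pkgsrc/packages/sharedapps/$snapshot/$platform/All"
        line="${line:+$line;}$url"    # add a semicolon only between entries
    done
    printf 'PKG_PATH=%s\n' "$line"
}

# Example:
#   pkg_path_line pkg-2017Q1 RHEL6 \
#       http://unixdev.ceas.uwm.edu:8084 http://unixdev.ceas.uwm.edu:8085
```

Review the output before pasting it into pkg_install.conf, since the directory layout of a given mirror may differ.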

Current mirrors:

Note: These are temporary URLs and additional mirrors may be added at any time.  Check back frequently for changes and additions.

Site                                Geographic Location
http://unixdev.ceas.uwm.edu:8085    University of Wisconsin — Milwaukee
http://unixdev.ceas.uwm.edu:8084    University of Wisconsin — Milwaukee

Once your pkg_install.conf is properly configured and your PATH is set for the appropriate installation, you can use pkg_add to install any of the thousands of packages in seconds:

pkg_add clang gcc49 gcc5 gcc6 go emacs25 R lapack plink unanimity

To see what’s available, you can browse the complete package collections here.

If you would like to host a mirror of these packages, please contact us.

Warning: As these packages generally do not receive security updates, running programs in this collection as root is not recommended.  If you wish to run system services or other admin tasks from pkgsrc, we recommend using pkgsrc-current and updating your packages regularly.  NetBSD and SmartOS users can configure the use of binary packages during OS installation.  Joyent Cloud Services provides regularly updated binary packages for SmartOS/Illumos, RHEL/CentOS and OS X.  Our joyent-pkgsrc script can be used to quickly and easily bootstrap a basic installation using Joyent’s binary packages.

Note: A pkgsrc installation can easily consume 50 gigabytes or more while building packages.  If your /sharedapps directory is on a partition that’s too small, create it as a symbolic link before bootstrapping pkgsrc, e.g.:

mkdir /bigpartition/sharedapps
ln -s /bigpartition/sharedapps /
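Before choosing a partition, it may help to check how much space is free.  The following is a minimal sketch using the POSIX df -P output format; the function name has_space and the directory in the example are assumptions, and the 50 gigabyte default is taken from the note above.

```shell
#!/bin/sh
# has_space DIR [GIGABYTES]
# Succeed if the filesystem holding DIR has at least GIGABYTES free
# (default 50).  Reads the "Available" column of POSIX-format df output,
# so it may misparse filesystem names that contain white space.
has_space() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    need_kb=$(( ${2:-50} * 1024 * 1024 ))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example (the directory is hypothetical):
#   has_space /bigpartition || echo "less than 50 GB free" >&2
```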

Note: You can also use the more modern pkgin command to manage your packages.

[root@centosdev bacon]# module load /sharedapps/pkg-2017Q2/etc/modulefiles/pkgsrc/2017Q2
[root@centosdev bacon]# pkgin avail | wc -l
 16121
[root@centosdev bacon]# pkgin install gcc7
calculating dependencies... done.
nothing to upgrade.
1 packages to be installed (0B to download, 361M to install):
gcc7-7.1.0nb3
proceed ? [Y/n] 
...
marking gcc7-7.1.0nb3 as non auto-removable

Don’t like our binary packages?  Fine, build your own!  Scroll down for instructions on setting up pbulk for your own package builds.

Contributing to Pkgsrc

The pkgsrc project is a highly organized worldwide collaboration.  As more people get involved, life becomes easier for all of us.  Here are a few ways you can pitch in, depending on your skill level:

  • Use pkgsrc and report bugs as soon as you encounter them.
  • Join the pkgsrc-users mailing list.
  • Learn to fix or create packages of your own.  Pkgsrc has fabulous tools to ease the creation of new packages.  Once you learn to use these tools, creating a new pkgsrc package will be much easier than a caveman install.
  • Promote pkgsrc among your colleagues.  The more people get involved, the easier life will be for all of us.
  • If you’re a software developer, make your software package-friendly.  Doing so will not only help pkgsrc, but other package managers as well (Debian packages, FreeBSD ports, MacPorts, Gentoo Portage, etc).  If most end-users install your software via a package manager, you won’t have to provide much support for installation issues and you can focus on code development.
    • Provide an install target in your Makefile following the filesystem hierarchy standard under ${DESTDIR}${PREFIX}.
    • Not mentioned in the Wikipedia article: Programs not meant to be run directly by users can be installed in $prefix/libexec.
    • Use standard variables in your Makefile.  See the GNU Coding Standards and GNU Make documentation for some good examples.
    • Do not bundle dependencies.  Make your software work with external dependencies from the local package manager.  If you need to patch a function from an external library, just bundle the patched function with your source code, so that the linker won’t pull the unpatched version from the external library.  Send patches to the external library developers ASAP so that they can be incorporated into the next release.
    • Package the source code of each release as project-version.tar.xz (e.g. foo-1.3.1.tar.xz) or similar, so that it unpacks to project-version (e.g. foo-1.3.1), or use a popular source repository like GitHub.
    • More tips about package-friendly development are available in our user’s guide at http://uwm.edu/hpc/support/.
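To illustrate the ${DESTDIR}${PREFIX} convention mentioned in the list above, here is a sketch of the commands an install target might run, written as a shell function.  The function name stage_install and the file names are hypothetical; the point is only that every destination path is built from DESTDIR followed by PREFIX, so a package manager can stage the files somewhere harmless before packaging them.

```shell
#!/bin/sh
# stage_install DESTDIR PREFIX FILE
# Copy FILE into DESTDIR/PREFIX/bin, as an install target might do.
# DESTDIR is a staging root chosen by the package manager; PREFIX is
# the final installation prefix.  All names here are hypothetical.
stage_install() {
    destdir=$1 prefix=$2 file=$3
    mkdir -p "$destdir$prefix/bin" &&
    cp "$file" "$destdir$prefix/bin/" &&
    chmod 755 "$destdir$prefix/bin/$(basename "$file")"
}

# Example: stage ./foo under /tmp/stage for a final prefix of /usr/local:
#   stage_install /tmp/stage /usr/local ./foo
```

With an empty DESTDIR the same commands install directly under PREFIX, which is why one install target can serve both manual installs and package builds.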

Building Your Own Binary Packages

There are many ways to go about building binary packages and many sets of instructions for doing so.

The approach we use is designed to be as simple and safe as possible.  Below are instructions for setting up the safest possible pbulk package-building environment using CentOS Linux.

One of the issues you will run into eventually while building packages is “leakage”.  For example, sometimes a badly written configure script with hard-coded search paths might slip through QA and cause pkgsrc to use a build tool or library installed by Yum or other means.

The pkgsrc build system does a fabulous job isolating itself from other software installed on your system and leakage is rare.  It’s not perfect, though.  For this reason, binary packages (which must work on any installation of the same OS, not just the one on which they’re built) should be built on a pristine system with no unnecessary software installed.  This will ensure that they work properly on other machines and don’t depend on anything outside pkgsrc.  This is most easily done using a virtual machine, chroot, or other container.
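One rough way to look for leakage in a finished build is to list the shared libraries a binary resolves outside the pkgsrc prefix.  The filter below is a sketch that assumes the “name => path (address)” line format printed by ldd on Linux; the function name outside_prefix and the binary in the example are hypothetical.  Base system libraries such as libc will also appear in the output, so the result is a list to review, not a verdict.

```shell
#!/bin/sh
# outside_prefix PREFIX
# Read ldd output on standard input and print the path of every
# resolved library that does not live under PREFIX.
outside_prefix() {
    awk -v prefix="$1" '
        $2 == "=>" && $3 ~ /^\// && index($3, prefix "/") != 1 { print $3 }
    '
}

# Example (the binary name is hypothetical):
#   ldd /sharedapps/pkg-2017Q1/bin/someprog | outside_prefix /sharedapps/pkg-2017Q1
```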

Here’s how to accomplish this easily with a simple chroot configuration.  You could just as easily use a dedicated machine or a virtual machine for your pbulk builds, but the chroot approach entails the lowest overhead of these methods.

  1. Do a minimal CentOS install, from the CentOS minimal ISO.  We perform this step under VirtualBox.  DO NOT INSTALL ANY ADDITIONAL YUM PACKAGES.  Use the exact same operating system and version as the system that will be used for package building, since we will be using this image in a chroot environment.
  2. Within the new installation as the root user:
    1. yum update -y && shutdown -r now
    2. Create a tarball of the entire installation, e.g. centos-6.tgz:
      1. cd /
      2. mkdir tar
      3. tar zcvf tar/os-version.tgz --exclude ./tar --exclude ./proc --exclude ./dev .
      4. Save this pristine system image in a safe place for repeated future deployments.
      5. Copy os-version.tgz to the system you plan to use for pbulk builds, e.g. using scp.
  3. On the system you will use for bulk builds:
    1. mkdir chroot-os-version
    2. cd chroot-os-version
    3. tar zxvf /path/to/os-version.tgz
    4. chmod 1777 tmp var/tmp
    5. cd ..
    6. Download pkg-pbulk-setup into chroot-os-version.
    7. Enter the chroot environment.  We use the pkg-start-chroot script for this.
  4. Within the chroot:
    1. Install the minimum Yum packages needed to bootstrap pkgsrc, and nothing more:
      1. yum install -y gcc gcc-c++ gcc-gfortran
    2. Download a pkgsrc snapshot from http://ftp.netbsd.org/pub/pkgsrc/ and unpack it in the desired location, e.g. /sharedapps/pkgsrc-2017Q1.
    3. Run pkg-pbulk-setup and follow the instructions on the screen.  This should leave you with a functional pbulk setup.  You can edit /usr/pbulk/etc/pbulk.conf and /usr/pbulk/etc/pbulk.list to tune the setup to your needs.  Warning: Be sure that the user you choose to run pbulk builds has a default shell of sh or bash.  The pbulk system as of May 2017 is sensitive to the user’s shell environment and may fail with other shells.  For the same reason, you cannot run pbulk builds as “nobody” or “daemon”, which generally have “nologin” as their default shell.
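
The shell requirement in the warning above can be checked before starting a build.  This is a minimal sketch; the function name pbulk_shell_ok and the user name in the example are hypothetical.

```shell
#!/bin/sh
# pbulk_shell_ok SHELL_PATH
# Succeed if SHELL_PATH names sh or bash, the two login shells the
# warning above describes as safe for pbulk builds.
pbulk_shell_ok() {
    case $1 in
        */sh|*/bash) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, looking up the login shell of a hypothetical user "pbulk"
# (getent is available on glibc-based Linux systems such as CentOS):
#   pbulk_shell_ok "$(getent passwd pbulk | cut -d: -f7)" ||
#       echo "change the login shell before running pbulk" >&2
```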