Tuesday, June 22, 2010

Apt Cacher: A local APT Server

APT-CACHER

apt-cacher is a program that sets up an APT cache server from which other computers can download packages. It reduces the network bandwidth used when several computers on a LAN download the same packages.

Tutorial to Setup Apt-Cacher

How To Set up a repository cache with apt-cacher

When several machines run the same Linux distribution, it is worth setting up a repository cache on your network. Once a package has been downloaded from an official repository, every other machine fetches it from the local area network, so common packages are downloaded from the official repositories only once.
In this setup, one machine called repository-cache acts as the repository cache, and every other machine on the network uses it as its repository.

1. Getting started

As usual, start by installing the required package. Type in a terminal:
$ sudo apt-get install apt-cacher
Once this is done, it is time to edit the configuration file /etc/apt-cacher/apt-cacher.conf.

2. Configuring Apt-Cacher

2.1. apt-cacher.conf

So now, open apt-cacher's main configuration file: /etc/apt-cacher/apt-cacher.conf and start editing it according to your settings.

By default apt-cacher listens on port 3142 (the daemon_port directive). You can change this value according to your needs.
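
To confirm which port a given installation actually uses, the value can be read straight from the configuration file. This is a minimal sketch, assuming the key=value layout shown in this post; conf_get is a hypothetical helper name, not something shipped with apt-cacher.

```shell
# conf_get KEY FILE: print the value of the last uncommented "KEY=value" line.
# Hypothetical helper (not part of apt-cacher); tolerates spaces around "=".
conf_get() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Example, using the path from this tutorial:
#   conf_get daemon_port /etc/apt-cacher/apt-cacher.conf
```

The same helper works for any other directive in the file, such as allowed_hosts or path_map.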

allowed_hosts: by default, all hosts are allowed to use the repository cache. You can change this value if you want to allow only certain hosts. In my case, I want to allow my LAN 192.168.1.0/24 and localhost (both 127.0.0.1 and 127.0.1.1 on Ubuntu boxes), so I changed the value to:
allowed_hosts=192.168.1.0/24, 127.0.1.1
As 127.0.0.1 is always allowed, there was no need to add it.

generate_reports: this directive makes apt-cacher create a daily report on how efficient your cache was. The default is 1; set it to 0 to disable reports.
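
Before restricting allowed_hosts, it can help to check by hand whether a given client address falls inside the network you plan to allow. The sketch below only illustrates the CIDR arithmetic; apt-cacher performs its own matching internally, and the function names ip_to_int and in_cidr are made up for this example.

```shell
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
# Runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# in_cidr IP NET/BITS: succeed if IP lies inside the given network.
# Illustrative only; this is not how apt-cacher itself is implemented.
in_cidr() (
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
)

# Example:
#   in_cidr 192.168.1.42 192.168.1.0/24 && echo "client would be allowed"
```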

path_map: this is an interesting directive. It lets you define aliases for different repository hosts. For my Ubuntu Edgy box, my path_map looks like this:
path_map = debuntu repository.debuntu.org ; ubuntu archive.ubuntu.com/ubuntu; ubuntu-updates archive.ubuntu.com/ubuntu ; ubuntu-security security.ubuntu.com/ubuntu
#######################################
A WORKING APT_CACHER FILE
#######################################
# This is the config file for apt-cacher. On most Debian systems
# you can safely leave the defaults alone.
#######################################

# cache_dir is used to set the location of the local cache. This can
# become quite large, so make sure it is somewhere with plenty of space.
cache_dir=/media/808d3175-bdb4-4e90-890f-81b7f27de33a/apt-cacher
#cache_dir=/media/disk/apt-cacher

# The email address of the administrator is displayed in the info page
# and traffic reports.
admin_email=root@localhost

# For the daemon startup settings please edit the file /etc/default/apt-cacher.

# Daemon port setting, only useful in stand-alone mode. You need to run the
# daemon as root to use privileged ports (<1024).
daemon_port=3142

# optional settings, user and group to run the daemon as. Make sure they have
# sufficient permissions on the cache and log directories. Comment the settings
# to run apt-cacher as the native user.
group=www-data
user=www-data

# optional setting, binds the listening daemon to specified IP(s). Use IP
# ranges for more advanced configuration, see below.
# daemon_addr=localhost

# If your apt-cacher machine is directly exposed to the Internet and you are
# worried about unauthorised machines fetching packages through it, you can
# specify a list of IPv4 addresses which are allowed to use it and another
# list of IPv4 addresses which aren't.
# Localhost (127.0.0.1) is always allowed. Other addresses must be matched
# by allowed_hosts and not by denied_hosts to be permitted to use the cache.
# Setting allowed_hosts to "*" means "allow all".
# Otherwise the format is a comma-separated list containing addresses,
# optionally with masks (like 10.0.0.0/22), or ranges of addresses (two
# addresses separated by a hyphen, no masks, like '192.168.0.3-192.168.0.56').
allowed_hosts=*
denied_hosts=

# And similarly for IPv6 with allowed_hosts_6 and denied_hosts_6.
# Note that IPv4-mapped IPv6 addresses (::ffff:w.x.y.z) are truncated to
# w.x.y.z and are handled as IPv4.
allowed_hosts_6=fec0::/16
denied_hosts_6=

# This thing can be done by Apache but is much simpler here - limit access to
# Debian mirrors based on server names in the URLs
#allowed_locations=ftp.uni-kl.de,ftp.nerim.net,debian.tu-bs.de

# Apt-cacher can generate usage reports every 24 hours if you set this
# directive to 1. You can view the reports in a web browser by pointing
# to your cache machine with '/apt-cacher/report' on the end, like this:
#      http://yourcache.example.com/apt-cacher/report
# Generating reports is very fast even with many thousands of logfile
# lines, so you can safely turn this on without creating much
# additional system load.
generate_reports=1

# Apt-cacher can clean up its cache directory every 24 hours if you set
# this directive to 1. Cleaning the cache can take some time to run
# (generally in the order of a few minutes) and removes all package
# files that are not mentioned in any existing 'Packages' lists. This
# has the effect of deleting packages that have been superseded by an
# updated 'Packages' list.
clean_cache=1

# Apt-cacher can be used in offline mode which just uses files already cached,
# but doesn't make any new outgoing connections by setting this to 1.
offline_mode=0

# The directory to use for apt-cacher access and error logs.
# The access log records every request in the format:
# date-time|client ip address|HIT/MISS/EXPIRED|object size|object name
# The error log is slightly more free-form, and is also used for debug
# messages if debug mode is turned on.
# Note that the old 'logfile' and 'errorfile' directives are
# deprecated: if you set them explicitly they will be honoured, but it's
# better to just get rid of them from old config files.
logdir=/var/log/apt-cacher

# apt-cacher can use different methods to decide whether package lists need to
# be updated,
# A) looking at the age of the cached files
# B) getting HTTP header from server and comparing that with cached data. This
# method is more reliable and avoids desynchronisation of data and index files
# but needs to transfer few bytes from the server every time somebody requests
# the files ("apt-get update")
# Set the following value to the maximum age (in hours) for method A or to 0
# for method B
expire_hours=0

# Apt-cacher can pass all its requests to an external http proxy like
# Squid, which could be very useful if you are using an ISP that blocks
# port 80 and requires all web traffic to go through its proxy. The
# format is 'hostname:port', eg: 'proxy.example.com:8080'.
#http_proxy=proxy.example.com:8080

# Use of an external proxy can be turned on or off with this flag.
# Value should be either 0 (off) or 1 (on).
use_proxy=0

# External http proxy sometimes need authentication to get full access. The
# format is 'username:password'.
#http_proxy_auth=proxyuser:proxypass

# Use of external proxy authentication can be turned on or off with this flag.
# Value should be either 0 (off) or 1 (on).
use_proxy_auth=0

# This sets the interface to use for the upstream connection.
# Specify an interface name, an IP address or a host name.
# If unset, the default route is used.
#interface=

# Rate limiting sets the maximum bandwidth in bytes per second to use
# for fetching packages. Syntax is fully defined in 'man wget'.
# Use 'k' or 'm' to use kilobits or megabits / second: eg, 'limit=25k'.
# Use 0 or a negative value for no rate limiting.
limit=0

# Debug mode makes apt-cacher spew a lot of extra debug junk to the
# error log (whose location is defined with the 'logdir' directive).
# Leave this off unless you need it, or your error log will get very
# big. Acceptable values are 0 or 1.
debug=0

# To enable data checksumming, install libberkeleydb-perl and set this option
# to 1. Then wait until the Packages/Sources files have been refreshed once
# (and so the database has been built up). You can also nuke them in the cache
# to trigger the update.
# checksum=1

# Print a 410 (Gone) HTTP message with the specified text when accessed via
# CGI. Useful to tell users to adapt their sources.list files when the
# apt-cacher server is being relocated (via apt-get's error messages while
# running "update")
#cgi_advise_to_use = Please use http://cacheserver:3142/ as apt-cacher access URL
#cgi_advise_to_use = Server relocated. To change sources.list, run perl -pe "s,/apt-cacher\??,:3142," -i /etc/apt/sources.list

# Server mapping - this allows to hide real server names behind virtual paths
# that appear in the access URL. This method is known from apt-proxy. This is
# also the only method to use FTP access to the target hosts. The syntax is
# simple, the part of the beginning to replace, followed by a list of mirror
# urls, all space separated. Multiple profile are separated by semicolons
# Note that you need to specify all target servers in the allowed_locations
# options if you make use of it. Also note that the paths should not overlap
# each other. FTP access method not supported yet, maybe in the future.
# path_map = debian ftp.uni-kl.de/pub/linux/debian ftp2.de.debian.org/debian ; ubuntu archive.ubuntu.com/ubuntu ; security security.debian.org/debian-security ftp2.de.debian.org/debian-security
path_map = ubuntu in.archive.ubuntu.com/ubuntu ; ubuntu-security  security.ubuntu.com/ubuntu ; scratchbox scratchbox.org/debian ; ubuntu-ports ports.ubuntu.com/ubuntu-ports; hasty-armv6el-vfp repository.handhelds.org/hasty-armv6el-vfp;


# Permitted package files - this is a perl regular expression which matches all
# package-type files (files that are uniquely identified by their filename).
# The default is:
#package_files_regexp = (?:\.deb|\.rpm|\.dsc|\.tar\.gz|\.diff\.gz|\.udeb|index\.db-.+\.gz|\.jigdo|\.template)$

# Permitted Index files - this is the perl regular expression which matches all
# index-type files (files that are uniquely identified by their full path and
# need to be checked for freshness).
#The default is:
#index_files_regexp = (?:Index|Packages\.gz|Packages\.bz2|Release|Release\.gpg|Sources\.gz|Sources\.bz2|Contents-.+\.gz|pkglist.*\.bz2|release|release\..*|srclist.*\.bz2|Translation-.+\.bz2)$


#######################################
Let me explain the path_map line from the Edgy example above. It creates mappings from names to hosts:
  • debuntu to repository.debuntu.org
  • ubuntu and ubuntu-updates to archive.ubuntu.com/ubuntu
  • ubuntu-security to security.ubuntu.com/ubuntu
To access a specific repository, append the mapping name to the cache server address, like: repository_cache_machine:port/mapping_name
So, for instance, the debuntu repository is reachable through http://repository-cache:3142/debuntu and the Ubuntu security repository through http://repository-cache:3142/ubuntu-security.
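
The name-to-URL rule above can be sketched as a small shell function that splits a path_map value on semicolons and prints the client-side URL for each mapping. The function name path_map_urls is hypothetical, and only the first mirror of each profile is shown.

```shell
# path_map_urls HOST PORT PATH_MAP: for each profile in PATH_MAP print
# "http://HOST:PORT/name -> first-upstream-mirror".
# Profiles are separated by ";"; empty profiles (trailing ";") are skipped.
path_map_urls() {
    printf '%s\n' "$3" | tr ';' '\n' |
        awk -v host="$1" -v port="$2" \
            'NF >= 2 { printf "http://%s:%s/%s -> %s\n", host, port, $1, $2 }'
}

# The path_map from the Edgy example, served by repository-cache on port 3142:
path_map_urls repository-cache 3142 \
    'debuntu repository.debuntu.org ; ubuntu archive.ubuntu.com/ubuntu; ubuntu-updates archive.ubuntu.com/ubuntu ; ubuntu-security security.ubuntu.com/ubuntu'
```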

2.2. Activating apt-cacher to start

apt-cacher only starts once it has been activated in /etc/default/apt-cacher. Open that file and set AUTOSTART to 1:
AUTOSTART=1
Now restart apt-cacher:
$ sudo /etc/init.d/apt-cacher restart
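
If you prefer not to edit the file by hand, for example when preparing several cache servers, the AUTOSTART change can be scripted. This is a sketch under the assumption that the file contains at most a plain AUTOSTART=... line; enable_autostart is a hypothetical helper name.

```shell
# enable_autostart FILE: rewrite any AUTOSTART=... line to AUTOSTART=1,
# or append the line if the file has none. Hypothetical helper.
enable_autostart() {
    if grep -q '^AUTOSTART=' "$1"; then
        # Write through a temporary copy so the original keeps its mode and owner.
        sed 's/^AUTOSTART=.*/AUTOSTART=1/' "$1" > "$1.tmp" &&
            cat "$1.tmp" > "$1" &&
            rm -f "$1.tmp"
    else
        echo 'AUTOSTART=1' >> "$1"
    fi
}

# Example (needs root):
#   enable_autostart /etc/default/apt-cacher && /etc/init.d/apt-cacher restart
```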
Now that apt-cacher is running, it is time to update the /etc/apt/sources.list file on every client so that each host on the network uses our repository-cache machine.

#######################################
#A WORKING SOURCES.LIST FILE
#deb cdrom:[Ubuntu 9.10 _Karmic Koala_ - Release i386 (20091028.5)]/ karmic main restricted
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.

#deb http://in.archive.ubuntu.com/ubuntu/ karmic main restricted multiverse universe
#deb-src http://in.archive.ubuntu.com/ubuntu/ karmic main restricted multiverse universe

deb http://192.168.10.101:3142/ubuntu/ karmic main restricted multiverse universe

## Major bug fix updates produced after the final release of the
## distribution.
#deb http://in.archive.ubuntu.com/ubuntu/ karmic-updates main restricted
#deb-src http://in.archive.ubuntu.com/ubuntu/ karmic-updates main restricted

#deb http://192.168.10.101:3142/ubuntu/ karmic-updates main restricted

## Uncomment the following two lines to add software from the 'backports'
## repository.
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
# deb http://in.archive.ubuntu.com/ubuntu/ karmic-backports main restricted universe multiverse
# deb-src http://in.archive.ubuntu.com/ubuntu/ karmic-backports main restricted universe multiverse

## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu karmic partner
# deb-src http://archive.canonical.com/ubuntu karmic partner

#deb http://security.ubuntu.com/ubuntu karmic-security main restricted universe multiverse
#deb-src http://security.ubuntu.com/ubuntu karmic-security main restricted  universe multiverse

deb http://192.168.10.101:3142/ubuntu-security karmic-security main restricted universe multiverse

deb http://192.168.10.101:3142/scratchbox stable main

###################################


3. Setting up the Clients and Server sources.list

Now it is time to set up the APT source list file, /etc/apt/sources.list, on the client hosts. It makes sense to use the repository cache on the server too, because any update the server installs then fills the cache.
Here is the original /etc/apt/sources.list:
#debuntu repository
deb http://repository.debuntu.org edgy multiverse
deb-src http://repository.debuntu.org edgy multiverse

#ubuntu main repository
deb http://archive.ubuntu.com/ubuntu/ edgy main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ edgy main restricted universe multiverse

#ubuntu updates repository
deb http://archive.ubuntu.com/ubuntu/ edgy-updates main restricted universe multiverse
deb-src http://archive.ubuntu.com/ubuntu/ edgy-updates main restricted universe multiverse

#ubuntu security updates repository
deb http://security.ubuntu.com/ubuntu edgy-security main restricted universe multiverse
deb-src http://security.ubuntu.com/ubuntu edgy-security main restricted universe multiverse
In order to use our cache repository, those entries need to be changed to:
#debuntu repository
deb http://repository-cache:3142/debuntu edgy multiverse
deb-src http://repository-cache:3142/debuntu edgy multiverse

#ubuntu main repository
deb http://repository-cache:3142/ubuntu edgy main restricted universe multiverse
deb-src http://repository-cache:3142/ubuntu edgy main restricted universe multiverse

#ubuntu updates repository
deb http://repository-cache:3142/ubuntu-updates edgy-updates main restricted universe multiverse
deb-src http://repository-cache:3142/ubuntu-updates edgy-updates main restricted universe multiverse

#ubuntu security updates repository
deb http://repository-cache:3142/ubuntu-security edgy-security main restricted universe multiverse
deb-src http://repository-cache:3142/ubuntu-security edgy-security main restricted universe multiverse
Now every host should be able to retrieve .deb packages from our repository cache once:
$ sudo apt-get update
has been run on every host.
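
With many clients, rewriting each sources.list by hand gets tedious. The before/after listings above can be reproduced with a sed filter; this is only a sketch that assumes the four mappings from the Edgy path_map example, and to_cache is a hypothetical name.

```shell
# to_cache CACHE: read sources.list lines on stdin and print them rewritten
# to go through http://CACHE/<mapping>. Assumes the debuntu, ubuntu,
# ubuntu-updates and ubuntu-security mappings from the path_map example.
# Order matters: the -updates rule must run before the generic ubuntu rule.
to_cache() {
    sed -E \
        -e "s#http://repository\.debuntu\.org/?#http://$1/debuntu#" \
        -e "s#http://archive\.ubuntu\.com/ubuntu/? +([a-z]+-updates)#http://$1/ubuntu-updates \1#" \
        -e "s#http://archive\.ubuntu\.com/ubuntu/?#http://$1/ubuntu#" \
        -e "s#http://security\.ubuntu\.com/ubuntu/?#http://$1/ubuntu-security#"
}

# Example: write a rewritten copy and review it before replacing the original.
#   to_cache repository-cache:3142 < /etc/apt/sources.list > sources.list.new
```

Writing to a separate file first makes it easy to diff the result against the original before putting it in place.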

4. Importing existing packages from /var/cache/apt/archives/ into the apt-cacher repository

Your server may already have a lot of packages cached in its local APT archive, /var/cache/apt/archives/. apt-cacher offers a tool to import those files into its own repository.
Several useful scripts can be found in /usr/share/apt-cacher/. The one we are interested in here is apt-cacher-import.pl. To import the .deb files from /var/cache/apt/archives into the apt-cacher repository, run:
$ sudo /usr/share/apt-cacher/apt-cacher-import.pl /var/cache/apt/archives
This must be run as root, or the .deb files might not be copied to the cache repository.
Now the directory /var/cache/apt-cacher/packages/ should be filled with packages.
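
A quick way to see whether the import did anything is to count the .deb files on both sides. This is a minimal sketch; count_debs is a hypothetical helper, and if you changed cache_dir (as in the working config file above) the packages directory lives under that path instead.

```shell
# count_debs DIR: print how many .deb files sit directly inside DIR.
count_debs() {
    find "$1" -maxdepth 1 -type f -name '*.deb' | wc -l | tr -d ' '
}

# Example, using the default locations named in this section:
#   echo "apt archives: $(count_debs /var/cache/apt/archives)"
#   echo "apt-cacher:   $(count_debs /var/cache/apt-cacher/packages)"
```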

5. Getting usage reports for your repository cache

If you left the generate_reports directive set to 1, apt-cacher generates a report on cache usage every day.
You can access it at: http://repository-cache:3142/report
If you need to regenerate the report, run:
$ sudo /usr/share/apt-cacher/apt-cacher-report.pl
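
For a quick look between reports, the access log can be summarised directly. The config file comments above give the line format as date-time|client ip address|HIT/MISS/EXPIRED|object size|object name. The sketch below counts hits and misses from that third field; the log file name access.log and the choice to count EXPIRED as a miss are assumptions of this example, not something the config file states.

```shell
# hit_stats: read access-log lines on stdin and print hit/miss counts.
# Field 3 is HIT, MISS or EXPIRED. EXPIRED is counted as a miss here,
# which is an assumption of this sketch.
hit_stats() {
    awk -F'|' '
        $3 == "HIT" { hits++ }
        $3 == "MISS" || $3 == "EXPIRED" { misses++ }
        END { printf "hits=%d misses=%d\n", hits, misses }
    '
}

# Example (log file name assumed; logdir is set in apt-cacher.conf):
#   hit_stats < /var/log/apt-cacher/access.log
```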

6. Conclusion

apt-cacher is an easy and efficient package that saves both time and bandwidth when several machines run the same distribution, as can happen in a home network or at a company.

###################################################
