
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 3.0//EN" "html.dtd">
<HTML>
<HEAD><TITLE>
Hercules: Compressed Dasd Emulation</TITLE>
<LINK REL=STYLESHEET TYPE="text/css" HREF="hercules.css">
</HEAD>
<BODY BGCOLOR="#ffffcc" TEXT="#000000" LINK="#0000A0"
      VLINK="#008040" ALINK="#000000">
<h1>Compressed Dasd Emulation</h1>
<hr noshade>
<h2>Contents</h2>
<ul>
<li><a href="#introduction">       Introduction       </a>
<li><a href="#shadowfiles">        Shadow Files       </a>
<li><a href="#filestructure">      File Structure     </a>
<li><a href="#howitworks">         How It Works       </a>
<li><a href="#cckdcommand">        The CCKD Command   </a>
<li><a href="#utilities">          Utilities          </a>
<li><a href="#faq">                FAQ                </a>
</ul>

<hr noshade>
<h3><a NAME="introduction">Introduction</a></h3>
Using compressed DASD files you can significantly reduce the file space
required for emulated DASD files and possibly gain a performance boost
because less physical I/O occurs.

Both <b>CKD</b> (Count-Key-Data) and <b>FBA</b> (Fixed-Block-Architecture)
emulation files can be compressed.
<p>
In regular (or uncompressed) files, each CKD track or FBA block occupies
a specific spot in the emulation file.  The offset of the track or block
in the file can be directly calculated knowing the track or block number
and the maximum size of the track or block.  In compressed files, each
track image or group of blocks may be compressed by
<a href="http://www.info-zip.org/pub/infozip/zlib/"><b>zlib</b></a> or
<a href="http://sourceware.cygnus.com/bzip2/"><b>bzip2</b></a>, and only
occupies the space necessary for the compressed image.  The offset of a compressed
track or block is obtained by performing a two-table lookup.  The lookup
tables themselves reside in the emulation file.
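<p>
For comparison, the direct calculation used for regular files can be sketched in C.
This is a minimal illustration that assumes a single-file volume, an uncompressed
CKD file that starts with the same 512-byte device header described under
<a href="#filestructure">File Structure</a> followed by fixed-size track slots of
<em>trksize</em> bytes, and an uncompressed FBA file with no header.  The constant
and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout (illustrative, not taken from the Hercules sources):
 *   CKD: 512-byte device header, then one trksize-byte slot per track.
 *   FBA: consecutive 512-byte blocks with no header. */
#define CKD_DEVHDR_SIZE 512u
#define FBA_BLKSIZE     512u

/* File offset of track `trk` in an uncompressed single-file CKD volume. */
static uint64_t ckd_track_offset(uint32_t trk, uint32_t trksize)
{
    return (uint64_t)CKD_DEVHDR_SIZE + (uint64_t)trk * trksize;
}

/* File offset of block `blkno` in an uncompressed FBA file. */
static uint64_t fba_block_offset(uint32_t blkno)
{
    return (uint64_t)blkno * FBA_BLKSIZE;
}
```

A compressed file cannot use such a formula, because each image has its own
length and may move whenever it is rewritten; that is why the lookup tables exist.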
<p>
Because FBA blocks are only 512 bytes long, they are grouped into
<b>block groups</b>.  Each block group
contains 120 FBA blocks (60K).
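<p>
The mapping from an FBA block number to its block group is simple integer
arithmetic.  The C sketch below restates the figures above (512-byte blocks,
120 blocks per group); the macro and function names are illustrative only.

```c
#include <stdint.h>

#define FBA_BLKSIZE 512u                        /* bytes per FBA block */
#define FBA_BLKGRP  120u                        /* blocks per block group */
#define FBA_GRPSIZE (FBA_BLKSIZE * FBA_BLKGRP)  /* 61440 bytes = 60K */

/* Block group that holds FBA block `blkno`. */
static uint32_t fba_group(uint32_t blkno)
{
    return blkno / FBA_BLKGRP;
}

/* Byte offset of block `blkno` inside its uncompressed 60K group image. */
static uint32_t fba_offset_in_group(uint32_t blkno)
{
    return (blkno % FBA_BLKGRP) * FBA_BLKSIZE;
}
```

For example, block 121 is the second block (offset 512) of block group 1.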
<p>
Whenever a track or block group is written to a compressed file,
it is written either to an existing free space within the file or at
the end of the file; the lookup tables are then updated, and finally the
space the track or block group previously occupied is freed.  As a result,
the location of a track or block group in the file can change many times.
<p>
In the event of a catastrophic failure (for example, Hercules crash,
operating system crash, power failure), the compressed emulation file
on the host's physical disk may be out of sync if the host operating
system defers physical writes to the file system containing the emulation
file.  A number of techniques have been provided to minimize emulation
file corruption in such an event.
<p>
A compressed file may occupy only 20% of the disk space required by an
uncompressed file.  In other words, you may be able to have 5 times more
emulated volumes using compressed DASD files.  However, compressed files
are more sensitive to failures and corruption may occur.

<p>
<hr noshade>
<p><h3><a NAME="shadowfiles">Shadow Files</a></h3>


A compressed CKD or FBA dasd can have more than one physical file.  The
additional files are called <em>shadow files</em>.
The function is implemented as a kind of
<i>snapshot</i>, where a new shadow file can be created on demand.
An emulated dasd is represented by a <em>base</em> file and 0 or more
shadow files.  All files are opened <em>read-only</em>
except for the <em>current</em> file, which is opened <em>read-write</em>.
<p>
Shadow files are specified by the <b>sf=</b><i>shadow-file-name</i> parameter
on the device statement for the compressed DASD device.  The shadow file name
should contain a spot where the shadow file number will be placed.  This spot
is the character preceding the last period after the last slash or, if there
is no such period, the last character of the name.  For example:<br><br>
<code>0100 3390 disks/linux1.dsk sf=shadows/linux1_*.dsk</code>
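<p>
The naming rule can be illustrated with a minimal C sketch.  It assumes that the
shadow file number simply replaces the character at that spot (as the <b>*</b>
in the example suggests) and that <b>/</b> is the only path separator;
<code>sf_name</code> is a hypothetical helper, not a Hercules function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build the name of shadow file n (1..8) from the
 * sf= template.  Returns 0 on success, or -1 if the template is empty or
 * too long for `out`, or n is out of range. */
static int sf_name(const char *tmpl, int n, char *out, size_t outlen)
{
    size_t len = strlen(tmpl);
    if (len == 0 || len >= outlen || n < 1 || n > 8)
        return -1;
    memcpy(out, tmpl, len + 1);                 /* copy including the NUL */

    char *slash = strrchr(out, '/');
    char *base  = slash ? slash + 1 : out;      /* last path component */
    char *dot   = strrchr(base, '.');           /* last period after it */

    /* The spot is the character before that period; with no usable period
     * (none, or the name starts with one) use the last character. */
    char *spot = (dot != NULL && dot > base) ? dot - 1 : out + len - 1;
    *spot = (char)('0' + n);
    return 0;
}
```

With the example above, shadow file 1 would be named
<code>shadows/linux1_1.dsk</code>.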
<p>
There can be up to 8 shadow files in use at any time for an
emulated dasd device.  The base file is designated file<b>[0]</b> and
the shadow files are file<b>[1]</b> to file<b>[8]</b>.
The <em>highest</em> numbered file in use at a given time is the <em>current</em>
file, where all writes will occur.  Track reads start with the <em>current</em>
file and proceed down until a file is found that actually contains the track
image.
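<p>
The read search order can be modeled with a small C sketch.  The <code>has</code>
array is a hypothetical stand-in for consulting each file's lookup tables, and
returning -1 when no file holds the track is a simplification made for this
sketch.

```c
#include <stdbool.h>

#define SF_MAX      8    /* file[0] is the base, file[1]..file[8] shadows */
#define SKETCH_TRKS 16   /* tiny device, just for illustration */

/* Hypothetical model: has[f][t] is true when file f holds an image for
 * track t.  Real code would answer this from each file's lookup tables. */
struct sf_dev {
    int  current;                          /* highest numbered file in use */
    bool has[SF_MAX + 1][SKETCH_TRKS];
};

/* File that satisfies a read of track `trk`: search from the current file
 * down to the base.  Returns -1 if no file holds the track. */
static int sf_read_file(const struct sf_dev *dev, int trk)
{
    for (int f = dev->current; f >= 0; f--)
        if (dev->has[f][trk])
            return f;
    return -1;
}

/* All writes go to the current file. */
static int sf_write_file(const struct sf_dev *dev)
{
    return dev->current;
}
```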
<p>
A shadow file contains all the changes made to the emulated dasd
since it was created, until the next shadow file is created.  The moment
of the shadow file's creation can be thought of as a <em>snapshot</em>
of the current emulated dasd at that time, because if the shadow file is
later removed, then the emulated dasd reverts back to the state it was at
when the <em>snapshot</em> was taken.
<p>
Using shadow files, you can keep the base file on a read-only device
such as a CD-ROM, or change the base file attributes to read-only,
ensuring that this file can never be corrupted.
<p>
Hercules console commands are provided to add a new shadow file, remove
the current shadow file (with or without backward merge), compress the
current shadow file, and display the shadow file status and statistics:<br><br>

<table>
<tr><td align="left"><b>sf+</b></td>
    <td align="left" colspan="2"><font size=-1>unit</font></td>
    <td align="left">&nbsp&nbsp Create a new shadow file</td>
<tr><td align="left"><b>sf-</b></td>
    <td align="left" colspan="2"><font size=-1>unit</font></td>
    <td align="left">&nbsp&nbsp Remove a shadow file with backwards merge</td>
<tr><td align="left"><b>sf-</b></td>
    <td align="left">            <font size=-1>unit</font></td>
    <td><b>nomerge</b></td>
    <td align="left">&nbsp&nbsp Remove a shadow file without backwards merge</td>
<tr><td align="left"><b>sfc</b></td>
    <td align="left" colspan="2"><font size=-1>unit</font></td>
    <td align="left">&nbsp&nbsp Compress the current file</td>
<tr><td align="left"><b>sfd</b></td>
    <td align="left" colspan="2"><font size=-1>unit</font></td>
    <td align="left">&nbsp&nbsp Display shadow file status and statistics</td>
</table>
<br>
<b><font size=-1>Note</font></b>.  You can use <b>*</b> in place of the unit
address to apply the command to all compressed dasd.

<p>
<hr noshade>
<p><h3><a NAME="filestructure">Compressed DASD File Structure</a></h3>

A compressed DASD file has 6 types of spaces: a <em>device header</em>,
a <em>compressed device header</em>, a <em>primary lookup table</em>,
<em>secondary lookup tables</em>, track or block group <em>images</em>,
and <em>free spaces</em>.  The first 3 types occur only once each, in that
order, at the beginning of the file.  The rest of the file is occupied by
the other 3 space types.
<p>
The first 512 bytes of a compressed DASD file contain a <b>device header</b>.
The device header contains an eye-catcher that identifies the file type
(CKD or FBA and base or shadow).  The device type and file size are also
specified in this header.  The header is identical to the header used
for uncompressed CKD files, except for the eye-catcher:
<p>
<table border=1>
<tr><td align="left" colspan="8"><font size=-1>devid</font></td>
    <td align="left" colspan="4"><font size=-1>heads</font></td>
    <td align="left" colspan="4"><font size=-1>trksize</font></td>
<tr><td align="left" colspan="1"><font size=-1>devt</font></td>
    <td align="left" colspan="1"><font size=-1>seq</font></td>
    <td align="left" colspan="2"><font size=-1>hicyl</font></td>
    <td align="left" colspan="12">&nbsp</td>
<tr><td align="center" valign="middle" colspan="16">
        <br><br><font size=-1>reserved</font><br><br><br></td>
</table>
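<p>
The table above translates naturally into a C structure.  This sketch takes the
field widths from the table and uses byte arrays throughout so that the compiler
adds no padding; the type name is hypothetical, and because this document does not
state the byte order of the multi-byte fields, decoding them is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 512-byte device header; widths follow the table above. */
typedef struct {
    char    devid[8];       /* eye-catcher: CKD or FBA, base or shadow */
    uint8_t heads[4];       /* heads */
    uint8_t trksize[4];     /* track size */
    uint8_t devt;           /* device type */
    uint8_t seq;            /* seq field */
    uint8_t hicyl[2];       /* high cylinder */
    uint8_t reserved[492];  /* pads the header to 512 bytes */
} DEVHDR_SKETCH;

_Static_assert(sizeof(DEVHDR_SKETCH) == 512, "device header is 512 bytes");
```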
<p>

The next 512 bytes contain the <b>compressed device header</b>.
This contains file usage information such as the amount of free
space in the file:
<p>
<table border=1>
<tr><td align="left" colspan="3"><font size=-1>vrm</font></td>
    <td align="left" colspan="1"><font size=-1>opts</font></td>
    <td align="left" colspan="4"><font size=-1>numl1</font></td>
    <td align="left" colspan="4"><font size=-1>numl2</font></td>
    <td align="left" colspan="4"><font size=-1>size</font></td>
<tr><td align="left" colspan="4"><font size=-1>used</font></td>
    <td align="left" colspan="4"><font size=-1>->free</font></td>
    <td align="left" colspan="4"><font size=-1>free</font></td>
    <td align="left" colspan="4"><font size=-1>largest</font></td>
<tr><td align="left" colspan="4"><font size=-1>number</font></td>
    <td align="left" colspan="4"><font size=-1>&nbsp</font></td>
    <td align="left" colspan="4"><font size=-1>cyls</font></td>
    <td align="left" colspan="1"><font size=-1>&nbsp</font></td>
    <td align="left" colspan="1"><font size=-1>comp</font></td>
    <td align="left" colspan="2"><font size=-1>parm</font></td>
<tr><td align="center" colspan="16">
        <br><br><font size=-1>reserved</font><br><br><br></td>
</table>
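<p>
The compressed device header can be sketched the same way.  This C structure
reads each table row as 16 bytes, which makes the <em>parm</em> field 2 bytes
wide; the names <code>unlabeled1</code> and <code>unlabeled2</code> are
placeholders for the cells the table leaves blank, and the comments only
paraphrase the field names, so treat them as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 512-byte compressed device header (hypothetical type name). */
typedef struct {
    uint8_t vrm[3];         /* vrm (presumably version/release/modification) */
    uint8_t opts;           /* option flags */
    uint8_t numl1[4];       /* number of l1tab entries */
    uint8_t numl2[4];       /* numl2 field */
    uint8_t size[4];        /* file size */
    uint8_t used[4];        /* space in use */
    uint8_t free_off[4];    /* ->free: offset of the first free space */
    uint8_t free_total[4];  /* free: amount of free space */
    uint8_t largest[4];     /* largest free space */
    uint8_t number[4];      /* number of free spaces */
    uint8_t unlabeled1[4];  /* blank in the table */
    uint8_t cyls[4];        /* cylinders */
    uint8_t unlabeled2;     /* blank in the table */
    uint8_t comp;           /* compression algorithm */
    uint8_t parm[2];        /* compression parameter */
    uint8_t reserved[464];  /* pads the header to 512 bytes */
} CDEVHDR_SKETCH;

_Static_assert(sizeof(CDEVHDR_SKETCH) == 512, "header is 512 bytes");
```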
<p>
After the compressed device header is the <b>primary lookup table</b>,
also called the <em>level 1 table</em> or <em>l1tab</em>.  Each
4 byte unsigned entry in the l1tab contains the file offset of
a <em>secondary lookup table</em> or <em>level 2 table</em> or
<em>l2tab</em>.  The track or block group number being accessed
divided by 256 gives the index into the l1tab.  That is, each l1tab
entry represents 256 tracks or block groups.  The number of entries
in the l1tab is dependent on the size of the emulated device:
<p>
<table border=1>
<tr><td align="left" colspan="4"><font size=-1>l2<sub>0</sub></font></td>
    <td align="left" colspan="4"><font size=-1>l2<sub>1</sub></font></td>
    <td align="left" colspan="4"><font size=-1>l2<sub>2</sub></font></td>
    <td align="left" colspan="4"><font size=-1>l2<sub>3</sub></font></td>
<tr><td align="left" colspan="4"><font size=-1>l2<sub>4</sub></font></td>
    <td align="left" colspan="4"><font size=-1>l2<sub>5</sub></font></td>
    <td align="left" colspan="4"><font size=-1>l2<sub>6</sub></font></td>
    <td align="left" colspan="4"><font size=-1>l2<sub>7</sub></font></td>
<tr><td align="left" colspan="16">
        <br><br><center>.&nbsp&nbsp.&nbsp&nbsp.</center><br><br></td>
<tr><td align="left" colspan="4"><font size=-1>l2<sub>n-4</sub></font></td>
    <td align="left" colspan="4"><font size=-1>l2<sub>n-3</sub></font></td>
    <td align="left" colspan="4"><font size=-1>l2<sub>n-2</sub></font></td>
    <td align="left" colspan="4"><font size=-1>l2<sub>n-1</sub></font></td>
</table>
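<p>
The size of the l1tab follows directly from the device size.  This C sketch
assumes, as described above, that the l1tab starts right after the two 512-byte
headers (offset 1024) and holds 4-byte entries, each covering 256 tracks or
block groups; the names are hypothetical.

```c
#include <stdint.h>

#define CCKD_L2ENTRIES 256u    /* tracks or block groups per l1tab entry */
#define CCKD_L1TAB_POS 1024u   /* device header (512) + compressed header (512) */

/* Number of l1tab entries for a device with `units` tracks or block groups,
 * rounding up so a partial final group of 256 still gets an entry. */
static uint32_t l1_entries(uint32_t units)
{
    return (units + CCKD_L2ENTRIES - 1) / CCKD_L2ENTRIES;
}

/* First file offset past the l1tab, where l2tabs, images and free
 * spaces begin. */
static uint32_t l1tab_end(uint32_t units)
{
    return CCKD_L1TAB_POS + 4u * l1_entries(units);
}
```

For example, a 3390 model 3 (3339 cylinders of 15 tracks, 50085 tracks) would
need 196 l1tab entries, so the l1tab would end at file offset 1808.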
<p>
Following the <em>l1tab</em>,
in no particular order, are <em>l2tabs</em>, track or block group
<em>images</em>, and <em>free spaces</em>.
<p>
Each <b>secondary lookup table</b> (or <em>l2tab</em>) contains 256 8-byte
entries (2K in all).  The entry is indexed
by the remainder of the track or block group number divided by 256.  Each
entry contains an unsigned 4-byte offset and an unsigned 2-byte length of the
track or block group image, followed by 2 unused bytes:<p>
<table border=1>
<tr><td align="left" colspan="4"><font size=-1>
       <sup>0</sup>&nbsp ->image
       &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp</font></td>
    <td align="left" colspan="2"><font size=-1>length</font></td>
    <td align="left" colspan="2"><font size=-1>unused</font></td>
<tr><td align="left" colspan="4"><font size=-1>
       <sup>1</sup>&nbsp ->image
       &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp</font></td>
    <td align="left" colspan="2"><font size=-1>length</font></td>
    <td align="left" colspan="2"><font size=-1>unused</font></td>
<tr><td align="center" colspan="8"><font size=-1>
       <br>.&nbsp&nbsp .&nbsp&nbsp .<br><br></td>
<tr><td align="left" colspan="4"><font size=-1>
       <sup>255</sup>&nbsp ->image
       &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp</font></td>
    <td align="left" colspan="2"><font size=-1>length</font></td>
    <td align="left" colspan="2"><font size=-1>unused</font></td>
</table>
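<p>
Putting the two tables together, the lookup can be sketched in C against an
in-memory copy of a compressed file.  Assumptions: the table entries are stored
in host byte order (this section does not specify their byte order), and a zero
offset marks an empty l1tab or l2tab entry, since offset 0 holds the device header
and can never be an l2tab or image.  <code>cckd_locate</code> and the helpers are
hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CCKD_L1TAB_POS 1024u   /* after the two 512-byte headers */
#define CCKD_L2ENTRIES 256u
#define CCKD_L2ENT_LEN 8u      /* 4-byte offset, 2-byte length, 2 unused */

/* Read a host-order integer at `off`, failing if it lies outside the image. */
static int rd32(const uint8_t *f, size_t flen, size_t off, uint32_t *v)
{
    if (off > flen || flen - off < 4) return -1;
    memcpy(v, f + off, 4);
    return 0;
}

static int rd16(const uint8_t *f, size_t flen, size_t off, uint16_t *v)
{
    if (off > flen || flen - off < 2) return -1;
    memcpy(v, f + off, 2);
    return 0;
}

/* Locate track or block group `trk`.  Returns 0 and sets *pos and *len
 * when an image exists, or -1 when there is none. */
static int cckd_locate(const uint8_t *f, size_t flen, uint32_t numl1,
                       uint32_t trk, uint32_t *pos, uint16_t *len)
{
    uint32_t l1x = trk / CCKD_L2ENTRIES;   /* index into the l1tab */
    uint32_t l2x = trk % CCKD_L2ENTRIES;   /* index into that l2tab */
    uint32_t l2pos;

    if (l1x >= numl1 ||
        rd32(f, flen, CCKD_L1TAB_POS + 4u * (size_t)l1x, &l2pos) != 0 ||
        l2pos == 0)
        return -1;                          /* no l2tab for this group */

    size_t ent = (size_t)l2pos + (size_t)l2x * CCKD_L2ENT_LEN;
    if (rd32(f, flen, ent, pos) != 0 || rd16(f, flen, ent + 4, len) != 0 ||
        *pos == 0)
        return -1;                          /* no image for this entry */
    return 0;
}
```

The returned length covers the 5-byte image header plus the (possibly
compressed) data, so a reader would fetch exactly <em>len</em> bytes at
<em>pos</em>.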
<p>
A track or block group <b>image</b> contains two fields, a 5-byte
<em>header</em> and a variable amount of data that may or may not be
compressed.  The length in the l2tab entry includes the length of the
header and the data.
<p>
<table border=1>
<tr><td align="left"><font size=-1>hdr</font></td>
    <td align="left"><font size=-1>track or block group data</font></td>
</table>
<p>
The 5 byte header contains a 1 byte flag field and 4 bytes that
identify the track or block group.  The format of the identifier
depends on whether the emulated device is CKD or FBA:
<p>
CKD hdr
<table border=1>
<tr><td><font size=-1>flags</font></td>
    <td align="left" colspan="2"><font size=-1><b>CC</b></font>&nbsp&nbsp</td>
    <td align="left" colspan="2"><font size=-1><b>HH</b></font>&nbsp&nbsp</td>
</table>
<p>
The 2 byte CC is the cylinder number for the track image and the HH
is the head number.  These numbers are stored in <em>big-endian</em>
byte order.  When the flag byte is zeroed, the 5 byte header is identical
to the <em>Home Address</em> (or <em>HA</em>) for the track image.
The data, which may or may not be compressed, begins with the <em>R0</em>
count and ends with the <em>end-of-track</em> (or <em>eot</em>) marker,
which is a count field consisting of eight 0xff bytes.  The <em>HA</em> plus the
uncompressed track data comprise the track image.
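<p>
Building the CKD image header and recognizing the end-of-track marker are
simple byte operations.  The C sketch below follows the layout just described
(flag byte, then CC and HH in big-endian order); the function names are
hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define CCKD_IMGHDR_LEN 5

/* Build the 5-byte image header for a CKD track.  With flags == 0 the
 * result is the track's Home Address. */
static void ckd_imghdr(uint8_t hdr[CCKD_IMGHDR_LEN], uint8_t flags,
                       uint16_t cyl, uint16_t head)
{
    hdr[0] = flags;
    hdr[1] = (uint8_t)(cyl >> 8);     /* CC, big-endian */
    hdr[2] = (uint8_t)(cyl & 0xff);
    hdr[3] = (uint8_t)(head >> 8);    /* HH, big-endian */
    hdr[4] = (uint8_t)(head & 0xff);
}

/* True if an 8-byte count field is the end-of-track marker (all 0xff). */
static int is_eot(const uint8_t count[8])
{
    static const uint8_t eot[8] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
    return memcmp(count, eot, 8) == 0;
}
```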
<p>
FBA hdr
<table border=1>
<tr><td><font size=-1>flags</font></td>
    <td align="left" colspan="4"><font size=-1>nnnn</font>
        &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp</td>
</table>
<p>
The 4 byte nnnn field is the FBA block group number in
<em>big-endian</em> byte order. The data contains 120 FBA blocks,
which may or may not be compressed.  Uncompressed, the FBA block
group is 60K.  The header for FBA, unlike CKD, is not used as part
of the uncompressed image.
<p>
The flags byte contains 8 bits in the format:
<table border=1>
<tr><td><font size=-1>0 &nbsp 0 &nbsp 0 &nbsp 0 &nbsp
                      0 &nbsp 0 &nbsp <b>c</b> &nbsp <b>c</b> &nbsp
    </font></td>
</table>
The first (high-order) 6 bits are currently always zero but may be used in
future releases.  The last (low-order) two bits, <em>cc</em>, indicate the
compression algorithm used for the data portion:
<table border="1">
<tr><td>0 &nbsp 0</td><td>&nbsp &nbsp Data is uncompressed</td>
<tr><td>0 &nbsp 1</td><td>&nbsp &nbsp Data is compressed using zlib</td>
<tr><td>1 &nbsp 0</td><td>&nbsp &nbsp Data is compressed using bzip2</td>
<tr><td>1 &nbsp 1</td><td>&nbsp &nbsp Not valid</td>
</table>
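<p>
Decoding an image header can be sketched in C as follows: the compression
algorithm comes from the low-order two bits of the flags byte, and for FBA the
block group number is the big-endian <em>nnnn</em> field.  The enum and function
names are hypothetical.

```c
#include <stdint.h>

enum cckd_comp { COMP_NONE = 0, COMP_ZLIB = 1, COMP_BZIP2 = 2, COMP_INVALID = 3 };

/* Compression algorithm encoded in the low-order two bits of the flags. */
static enum cckd_comp imghdr_comp(uint8_t flags)
{
    return (enum cckd_comp)(flags & 0x03);
}

/* Block group number from a 5-byte FBA image header (bytes 1-4, big-endian). */
static uint32_t fba_imghdr_group(const uint8_t hdr[5])
{
    return ((uint32_t)hdr[1] << 24) | ((uint32_t)hdr[2] << 16) |
           ((uint32_t)hdr[3] << 8)  |  (uint32_t)hdr[4];
}
```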

<p>

<b>Free space</b> contains a 4-byte <i>offset</i> to the next free space, a
4-byte <i>length</i> of the free space, and zero or more bytes of residual data:
<p>
<table border=1>
<tr><td><font size=-1>->next</font></td>
    <td><font size=-1>length</font></td>
    <td>&nbsp &nbsp<font size=-1>residual</font>&nbsp &nbsp</td>
</table>
<p>
The minimum length of a free space is 8 bytes.
The free space chain is ordered by file offset and no two free spaces are
adjacent.  The <em>compressed device header</em> contains the offset to the
first free space.  The chain is terminated by a free space whose offset
to the next free space is zero.  The free space chain is read when the file
is opened for read-write and written when the file is closed; while the file
is open, the free space chain is maintained in storage.
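<p>
Walking the free space chain can be sketched in C over an in-memory copy of the
file.  The sketch computes a count, a total and the largest free space, which
appear to correspond to the <em>number</em>, <em>free</em> and <em>largest</em>
fields of the compressed device header, and it rejects chains that break the
rules above (out of order, adjacent or overlapping, shorter than 8 bytes).  It
assumes the <em>->next</em> and <em>length</em> fields are in host byte order;
all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct free_stats { uint32_t number, total, largest; };

/* Walk the chain starting at `first` (the ->free offset from the compressed
 * device header).  Returns 0 on success or -1 if the chain is malformed. */
static int walk_free_chain(const uint8_t *f, size_t flen, uint32_t first,
                           struct free_stats *st)
{
    uint64_t prev_end = 0;                /* end of the previous free space */
    uint32_t pos = first;

    st->number = st->total = st->largest = 0;
    while (pos != 0) {
        uint32_t next, len;
        if (st->number > 0 && pos <= prev_end)
            return -1;                    /* out of order, overlapping or adjacent */
        if (pos > flen || flen - pos < 8)
            return -1;                    /* ->next/length would run off the image */
        memcpy(&next, f + pos, 4);
        memcpy(&len, f + pos + 4, 4);
        if (len < 8)
            return -1;                    /* below the minimum free space length */
        st->number++;
        st->total += len;
        if (len > st->largest)
            st->largest = len;
        prev_end = (uint64_t)pos + len;
        pos = next;
    }
    return 0;
}
```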
<p>

<hr noshade>
<p><h3><a NAME="howitworks">How It Works</a></h3>

<b>Reading</b><br>
A track or block group image is read while executing a channel
program or by the <em>readahead</em> thread.  An image has to
be read before it is updated or written to.  An image may be <em>cached</em>.
If an image is cached, then the channel program may complete
<em>synchronously</em>.  This means that if all the data a channel program
accesses is cached and Hercules does not have to perform physical I/O,
then the channel program runs synchronously within the SSCH or SIO
instruction in the <em>CPU</em> thread.  All DASD channel programs are started
synchronously.  If a CCW in the channel program requires physical I/O
then the channel program is interrupted and restarted at that CCW
<em>asynchronously</em> in a <em>device I/O</em> thread.
<p>
All compressed devices share a common cache; the devices can be a mixture
of FBA and/or CKD device types.  Each cache entry contains a pointer to
a 64K buffer containing an uncompressed track or block group image.
If the track or block group image being read is not found in the cache,
then the oldest (or <em>least recently used</em> or <em>LRU</em>) entry that
is not <em>busy</em> is <em>stolen</em>.  A cache entry is busy if it is
being read, or last accessed by an <em>active</em> channel program, or updated
but not yet written, or being written.  If no cache entries are available then
the read must enter a <em>cache wait</em>.  When images are detected to be
accessed sequentially then the readahead thread(s) may be signalled to read
following sequential images.
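<p>
Choosing which cache entry to steal can be sketched in C.  The entry structure
is hypothetical: a single <code>busy</code> flag stands for the conditions listed
above, and a larger <code>age</code> value means more recently used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache entry for this sketch. */
struct cache_ent {
    uint64_t age;      /* larger = more recently used */
    bool     busy;     /* being read, used by an active channel program,
                          updated but not yet written, or being written */
};

/* Index of the least recently used entry that is not busy, or -1 when every
 * entry is busy and the reader must enter a cache wait. */
static int cache_steal(const struct cache_ent *c, int n)
{
    int victim = -1;
    for (int i = 0; i < n; i++)
        if (!c[i].busy && (victim < 0 || c[i].age < c[victim].age))
            victim = i;
    return victim;
}
```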
<p>
<b>Writing</b><br>
When a cache entry is updated or written to, a bit is turned on indicating
the cache entry has been updated.  When a <em>cache wait</em> occurs, or
(more likely) during garbage collection, a cache <em>flush</em> is performed.
When the cache is flushed, if any entries have the updated bit on, then
the writer thread(s) are signalled.  The writer thread selects the oldest
cache entry with the updated bit on, compresses the image, and writes it
to the file.  The new image is written to a new space in the file and then
the space previously occupied by the image is freed.  In certain circumstances,
the image may be written under <em>stress</em>.  A stress write occurs when
a reading thread is in a <em>cache wait</em> or when a high percentage of
cache entries are pending write.  In this circumstance, the compression
parameters are relaxed to reduce the CPU requirements.  An image written
under stress is likely to take up more space than the same image written
not under stress.  The writer thread(s) run one <em>nice</em> level lower in
priority than the CPU thread(s), because compression is CPU-intensive.
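<p>
The stress decision can be illustrated with a small C sketch.  The percentage
threshold and the relaxed compression level are hypothetical parameters chosen
for illustration, not the values Hercules uses; the <em>nostress</em> argument
mirrors the cckd option of the same name described below.

```c
#include <stdbool.h>

/* Hypothetical stress test: a write is stressed when a reader is in a cache
 * wait or when at least `stress_pct` percent of the entries await writing. */
static bool write_is_stressed(int pending, int total, bool reader_waiting,
                              int stress_pct, bool nostress)
{
    if (nostress)
        return false;                  /* nostress=1: ignore stress */
    return reader_waiting || pending * 100 >= total * stress_pct;
}

/* Relax the compression parameter under stress (illustrative: drop to the
 * fastest level, 1, instead of the normal level). */
static int write_comp_level(int normal_level, bool stressed)
{
    return stressed ? 1 : normal_level;
}
```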
<p>
<b>Garbage Collection</b><br>
The primary function of the garbage collector is to keep the emulated
compressed DASD files as small as possible.  After all, that is the reason
for using compressed DASD files in the first place.  Another function
is to perform emulation file synchronization.
<p>
A single garbage collector thread runs for all compressed devices.
By default it wakes up at 5 second intervals.  The garbage collector
performs <em>space recovery</em> for each compressed device in the order
that the device was defined or attached.  After space recovery the garbage
collector flushes the cache to force all outstanding writes.  Once all the
writes have been completed, a file synchronization (<em>fsync()</em>) may
optionally be performed, which commits any outstanding host I/O to the
physical disk.  Finally, free space is flushed (described below).
<p>
With the fsync option enabled, therefore, the physical disk holds a
coherent emulation file at the end of each garbage collection cycle.
Space freed since the last garbage collection cycle completed is not
available for allocation until the current garbage collection cycle
completes.  This free space is called <em>pending free space</em>.
That is, previous track or block group images are not overwritten
until the current garbage collection completes.
If a catastrophic error occurs, then the emulation file should be
recoverable at least up to the point of the last garbage collection cycle.
<p>
However, performing an fsync() may decrease performance.  You can increase
the garbage collection interval to reduce the number of fsync()s, but this
may also increase the probability of a cache wait occurring.  You can increase
the size of the cache to decrease this probability, but you may then increase
paging or have to decrease the size of emulated memory.
<p>
Another possibility is to leave the fsync option disabled, which is the default.
In this case, freed space by default does not become available until 2
garbage collection cycles complete.  That is, <i>pending free space</i> is
not a simple yes/no attribute but a count of cycles.  You have the option to
explicitly set the pending free space count.  However, increasing the pending
free space count or the garbage collection interval may increase the size
of the emulation file.
<p>
At the very end of the garbage collection cycle, the free space is
<em>flushed</em>.  This means that the pending free space count is decremented
for all free spaces with a non-zero count.  If the count goes to zero and
the preceding space is a free space with a zero count then the spaces are
combined.
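<p>
The free space flush can be sketched in C over an array of free spaces ordered
by file offset.  The record type is hypothetical.  As a simplification, this
sketch merges any two contiguous spaces whose counts are both zero after the
decrement, which preserves the rule that no two available free spaces are
adjacent.

```c
#include <stdint.h>

/* Hypothetical in-storage free space record. */
struct fspace {
    uint32_t pos, len;
    int      pending;   /* garbage collection cycles before reuse */
};

/* Decrement each non-zero pending count, then combine a space whose count is
 * zero with the preceding space when that one also has a zero count and the
 * two are contiguous.  Returns the new number of free spaces. */
static int free_flush(struct fspace *fs, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        struct fspace cur = fs[i];
        if (cur.pending > 0)
            cur.pending--;
        if (out > 0 && cur.pending == 0 && fs[out - 1].pending == 0 &&
            fs[out - 1].pos + fs[out - 1].len == cur.pos)
            fs[out - 1].len += cur.len;    /* merge into the preceding space */
        else
            fs[out++] = cur;               /* keep as a separate space */
    }
    return out;
}
```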
<P>
The space recovery process of the garbage collector simply attempts to move
some amount of used space towards the beginning of the file causing free
space to move towards the end of the file.  When a free space reaches the
end of the file, the file is <em>truncated</em>, reducing its size.  The
amount of used space moved depends on the ratio of free space to used space
and on the number of free spaces.  The larger the numbers, the more space
the garbage collector attempts to move.  That is, the garbage collector
attempts to decrease the ratio of free space to used space and to decrease
the number of free spaces.  Within a cycle, the garbage collector might not
move the selected amount of used space if the moves are detected to be
counter-productive (that is, if the offset of the new space would be greater
than the current offset).

<hr noshade>
<p><h3><a NAME="cckdcommand">The cckd command</a></h3>

The <b>cckd</b> command and initialization statement can be used to
affect cckd processing.  Normally the defaults should suffice; however,
the cache size may need to be adjusted depending upon the number of
emulated devices and the amount of physical memory you have.
<p>
<b>Syntax:</b>
<table>
<tr><td><b>cckd</b></td><td><b>help</b></td><td>Display cckd help</td>
<tr><td><b>cckd</b></td><td><b>stats</b></td>
                        <td>Display current cckd statistics</td>
<tr><td><b>cckd</b></td><td><b>opts</b></td><td>Display current cckd options</td>
<tr><td><b>cckd</b></td><td>opt=value</td><td>Set a cckd option</td>
<tr><td>&nbsp;</td><td>&nbsp;</td><td>Multiple options may be specified,
                                      separated by a comma with no intervening
                                      blanks.</td>
<tr><td>&nbsp;</td><td><b>cache=</b>n</td><td>Cache size in megabytes</td>
<tr><td>&nbsp;</td><td><b>l2cache=</b>n</td><td>L2 cache size in kilobytes</td>
<tr><td>&nbsp;</td><td><b>ra=</b>n</td><td>Number of readahead threads</td>
<tr><td>&nbsp;</td><td><b>raq=</b>n</td><td>Readahead queue size</td>
<tr><td>&nbsp;</td><td><b>rat=</b>n</td><td>Number of tracks or block groups to read ahead</td>
<tr><td>&nbsp;</td><td><b>wr=</b>n</td><td>Number of writer threads</td>
<tr><td>&nbsp;</td><td><b>gcint=</b>n</td><td>Garbage collection interval</td>
<tr><td>&nbsp;</td><td><b>gcparm=</b>n</td><td>Garbage collection parameter</td>
<tr><td>&nbsp;</td><td><b>nostress=</b>n</td><td>Turn stress writes on or off</td>
<tr><td>&nbsp;</td><td><b>freepend=</b>n</td><td>Set the free pending value</td>
<tr><td>&nbsp;</td><td><b>fsync=</b>n</td><td>Turn fsync on or off</td>
<tr><td>&nbsp;</td><td><b>ftruncwa=</b>n</td><td>Turn ftruncate bug workaround
                                                 on or off</td>
<tr><td>&nbsp;</td><td><b>trace=</b>n</td><td>Number of trace table entries</td>
</table>
<p>
<b>Options:</b>
<table>
<tr><td valign="top"><b>cache=</b>n</td>
    <td>Size of the cache in megabytes.  Each cache entry points
        to a 64K buffer.  Therefore each megabyte represents 16 cache entries.
        <p>
        The default is <b>8</b>, or 128 cache entries.
        <p>
        You can specify a number between <b>1</b> and <b>64</b> (16 to
        1024 cache entries).
        <p>
    </td>
<tr><td valign="top"><b>l2cache=</b>n&nbsp</td>
    <td>Size of the level 2 table cache in kilobytes.
        Each cache entry points to a 2K l2tab, so each 2K of l2cache
        provides a single cache entry.
        <p>
        The default is <b>512</b>, or 256 cache entries.
        <p>
        You can specify a number between <b>256</b> and <b>2048</b> (128 to
        1024 cache entries).
        <p>
    </td>
<tr><td valign="top"><b>ra=</b>n</td>
    <td>Number of readahead threads.  When sequential track or block group
        access is detected, some number (<em>rat= </em>) of tracks or
        block groups are queued (<em>raq= </em>) to be read by one of the
        readahead threads.
        <p>
        The default is <b>2</b>.
        <p>
        You can specify a number between <b>1</b> and <b>9</b>.
        <p>
    </td>
<tr><td valign="top"><b>raq=</b>n</td>
    <td>Size of the readahead queue.  When sequential track or block group
        access is detected, some number (<em>rat= </em>) of tracks or
        block groups are queued in the readahead queue.
        <p>
        The default is <b>4</b>.
        <p>
        You can specify a number between <b>0</b> and <b>16</b> (a value
        of zero disables readahead).
        <p>
    </td>
<tr><td valign="top"><b>rat=</b>n</td>
    <td>Number of tracks or block groups to read ahead when sequential access
        has been detected.
        <p>
        The default is <b>2</b>.
        <p>
        You can specify a number between <b>0</b> and <b>16</b> (a value
        of zero disables readahead).
        <p>
    </td>
<tr><td valign="top"><b>wr=</b>n</td>
    <td>Number of writer threads.  When the cache is <em>flushed</em>, updated
        cache entries are marked write pending and a writer thread is signalled.
        The writer thread compresses the track or block group and writes the
        compressed image to the emulation file.  A writer thread is cpu-intensive
        while compressing the track or block group and i/o-intensive while writing
        the compressed image.  The writer thread runs one <em>nicer</em> than
        the CPU thread(s).
        <p>
        The default is <b>2</b>.
        <p>
        You can specify a number between <b>1</b> and <b>9</b>.
        <p>
    </td>
<tr><td valign="top"><b>gcint=</b>n</td>
    <td>Number of seconds the garbage collector thread waits during an interval.
        At the end of an interval, the garbage collector performs space recovery,
        flushes the cache, and optionally <em>fsync</em>s the emulation file.
        (However, the file will not be <em>fsync</em>ed unless at least 5
        seconds have elapsed since the last <em>fsync</em>).
        <p>
        The default is <b>5</b> seconds.
        <p>
        You can specify a number between <b>1</b> and <b>60</b>.
        <p>
    </td>
<tr><td valign="top"><b>gcparm=</b>n</td>
    <td>A value affecting the amount of data moved during the garbage collector's
        space recovery routine. The garbage collector determines an amount of
        space to move based on the ratio of free space to used space in an
        emulation file, and on the number of free spaces in the file.  (The
        garbage collector wants to reduce the free space to used space ratio
        and the number of free spaces).  The value is logarithmic: a value
        of 8 multiplies the selected amount by 2<sup>8</sup>, while a negative
        value divides it by the corresponding power of 2.  Normally, 256K
        will be moved for a file in an interval; specifying a value of 8 can
        increase the amount to 64M.  At least 64K is always moved.
        Interestingly, specifying a large value (such as 8) may not increase
        the garbage collection efficiency correspondingly.
        <p>
        The default is <b>0</b>.
        <p>
        You can specify a number between <b>-8</b> and <b>8</b>.
        <p>
    </td>
<tr><td valign="top"><b>nostress=</b>n&nbsp</td>
    <td>Indicates whether <em>stress</em> writes will occur or not.  A track
        or block group may be written under stress when a high percentage of
        the cache is pending write or when a device i/o thread is waiting for
        a cache entry.  When a stressed write occurs, the compression algorithm
        and/or compression parm may be relaxed, resulting in faster compression
        but usually a larger compressed image.  If <em>nostress</em> is set
        to one, then a stressed situation is ignored.  You would typically
        set this value to one when you want to create the smallest emulation file
        possible in exchange for a possible performance degradation.
        <p>
        The default is <b>0</b>.
        <p>
        You can specify <b>0</b> (enable stressed writes) or <b>1</b>
        (disable stressed writes).
        <p>
    </td>
<tr><td valign="top"><b>freepend=</b>n&nbsp</td>
    <td>Specifies the <em>free pending</em> value for freed space.  When a
        track or block group image is written the space it previously occupied
        is freed.  This space will not be available for future
        allocations until <em>n</em> garbage collection intervals have completed.
        In the event of a catastrophic failure, previously written track or
        block group images should be recoverable if the current image has
        not yet been written to the physical disk.  The default value of
        <b>-1</b> means 1 if <em>fsync</em> is enabled and 2 otherwise.
        If 0 is specified, then freed space is immediately available for
        new allocations.
        <p>
        The default is <b>-1</b>.
        <p>
        You can specify a number between <b>-1</b> and <b>4</b>.
        <p>
    </td>
<tr><td valign="top"><b>fsync=</b>n&nbsp</td>
    <td>Enables or disables <em>fsync</em>.  When fsync is enabled, then
        the disk emulation file is synchronized with the physical hard
        disk at the end of a garbage collection interval (however, no more
        often than every 5 seconds).  This means that, if <em>freepend</em> is
        non-zero and a catastrophic error occurs, the emulated disks
        <em>should</em> be recoverable coherently.  However, fsync may cause
        performance degradation depending on the host operating system and/or
        the host operating system level.
        <p>
        The default is <b>0</b> (fsync disabled).
        <p>
        You can specify <b>0</b> (disable fsync) or <b>1</b> (enable fsync).
        <p>
    </td>
<tr><td valign="top"><b>ftruncwa=</b>n&nbsp</td>
    <td>Work-around for a Linux kernel bug in 2.4.18 (shipped in at least RH7.3
        and RH8.0).  The symptom is an excessive amount of kernel cpu time and
        non-responsiveness of the associated Hercules emulated dasd file.
        The problem may still occur with this option turned on, although
        less frequently.<br>
        The problem appears to be fixed in 2.4.19.
        <p>
        The default is <b>0</b>.
        <p>
        You can specify <b>0</b> or <b>1</b> (enable workaround).
        <p>
    </td>
<tr><td valign="top"><b>trace=</b>n&nbsp</td>
    <td>Number of cckd trace entries.  You would normally specify a non-zero
        value when debugging or capturing a problem in cckd code.  When the
        problem occurs, enter the <b>k</b> Hercules console command,
        which prints the trace table entries.
        <p>
        The default is <b>0</b>.
        <p>
        You can specify a number between <b>0</b> and <b>200000</b>.
        Each entry occupies 128 bytes.  Normally, for debugging, I use
        100000 entries.
        <p>
    </td>
</table>
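<p>
The arithmetic behind several of the options above can be collected in a short
C sketch.  The function names are hypothetical; the formulas restate the option
descriptions (64K per cache entry, 2K per l2cache entry, the meaning of
<em>freepend=-1</em>, and the 2<sup>gcparm</sup> scaling with a 64K floor).
<code>base</code> stands in for the amount the garbage collector selects from the
free-to-used space ratio, which this document only describes as normally 256K.

```c
#include <stdint.h>

/* cache=n (megabytes): each cache entry has a 64K buffer. */
static int cache_entries(int cache_mb)
{
    return cache_mb * 1024 / 64;
}

/* l2cache=n (kilobytes): each cache entry holds a 2K l2tab. */
static int l2cache_entries(int l2cache_kb)
{
    return l2cache_kb / 2;
}

/* freepend=n: -1 means 1 when fsync is enabled, otherwise 2. */
static int effective_freepend(int freepend, int fsync)
{
    return freepend >= 0 ? freepend : (fsync ? 1 : 2);
}

/* gcparm=n (-8..8): scale the selected amount by 2^n, moving at least 64K. */
static uint64_t gc_move_amount(uint64_t base, int gcparm)
{
    uint64_t amt = gcparm >= 0 ? base << gcparm : base >> -gcparm;
    return amt < 64u * 1024u ? 64u * 1024u : amt;
}
```

For example, <b>cache=64</b> yields 1024 cache entries, and <b>gcparm=8</b>
turns the normal 256K into 64M.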
<h4>Notes</h4>
<ul>
    <li>The size of the <em>cache</em> is a difficult number to determine.
        The storage used by the cache could also be used for emulated virtual
        storage instead.  You don't want to steal storage from your emulated
        operating system so that it starts paging heavily.  You also don't
        want your host operating system to page.  However, you don't want
        your cache to flush too often because cpu cycles may have to be stolen
        from the cpu thread to compress updated images.
    <li>You need at least one l2cache entry per compressed device.  Since the
        maximum number of l2cache entries is 1024, no more than 1024
        compressed devices can be defined.  Let me know if this is a
        problem ;-)  If you have a large number of devices, specify the
        maximum value; otherwise the default should suffice.
    <li><em>raq</em> should be at least as large as <em>ra</em>, because
        readahead threads are scheduled from entries in the readahead queue.
        Likewise, <em>rat</em> should not exceed <em>raq</em>, because at most
        <em>raq</em> tracks or block groups can be queued at any time.
    <li>The number of writer threads (<em>wr</em>) should usually be one more
        than the number of host processors.  That way, while some writer
        threads are CPU-bound (compressing a track or block group image),
        another can be I/O-bound (writing a compressed image).
    <li>The garbage collection interval (<em>gcint</em>) governs the maximum
        time in seconds an updated track or block group image will reside in
        storage before being written to the emulation file.  A large value may
        mean more data loss if a catastrophic error occurs.  A small value may
        mean that more CPU time is spent compressing images.  For example,
        suppose that a particular image is updated several times each second.
        If the interval is changed from the default 5 seconds to 1 second, then
        that image will be compressed and written 5 times as often.  A large
        value may also cause more cache flushes within a garbage collection
        interval.  These kinds of flushes mean that a read must wait because
        there are no available cache entries, slowing the emulated operating
        system.  A large value also lets more pending free space build up
        (since free space is flushed once each interval).  This may mean that
        the garbage collector's space recovery routine performs more work and
        that the emulation file grows larger.
    <li>Specify <em>fsync=1</em> and <em>gcint=5</em> if you are absolutely
        paranoid about losing data due to a failure.  <em>fsync</em>
        ensures that your data on disk is coherent; however, it may cause
        a noticeable performance degradation.  Note that an fsync is never
        performed more often than once every 5 seconds.
</ul>
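<p>
The sizing rules in the notes above can be expressed as simple checks.  This
is a minimal sketch, not Hercules code: the struct, its field names and the
meanings assumed for <em>ra</em> and <em>rat</em> are hypothetical, inferred
from the wording of the notes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct; field meanings are inferred from the notes. */
struct cckd_opts {
    int ra;       /* number of readahead threads (assumed)       */
    int raq;      /* readahead queue size                        */
    int rat;      /* tracks/block groups to read ahead (assumed) */
    int l2cache;  /* number of l2cache entries                   */
};

/* Suggested writer thread count: one more than the host processors. */
int cckd_suggested_wr(int host_cpus)
{
    return host_cpus + 1;
}

/* Returns NULL if the options follow the guidelines in the notes for
   `ndevs` compressed devices, else a description of the first problem. */
const char *cckd_check_opts(const struct cckd_opts *o, int ndevs)
{
    if (o->raq < o->ra)
        return "raq should be at least as large as ra";
    if (o->rat > o->raq)
        return "rat should not exceed raq";
    if (o->l2cache > 1024)
        return "l2cache cannot exceed 1024 entries";
    if (o->l2cache < ndevs)
        return "need at least one l2cache entry per compressed device";
    return NULL;
}
```

On a host that provides it, the processor count passed to
<code>cckd_suggested_wr</code> could come from
<code>sysconf(_SC_NPROCESSORS_ONLN)</code>.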
My advice is to use the default options and adjust them if you have a very
good reason.

<hr noshade>
<p><h3><a NAME="utilities">Utilities</a></h3>

<a NAME="ckd2cckd"></a>
<ul>
<li><b>ckd2cckd</b> <i>[options] source-file target-file</i>
   <ul><li><small><b>Description</b></small> Copies a regular CKD Dasd emulation
                 file to a compressed CKD Dasd emulation file.  The target
                 file must not already exist.  If the emulated Dasd device
                 spans more than one file, specify the <em>first</em> file.
                 After the copy completes, the target file contains no
                 free space, embedded or otherwise.
       <li><small><b>Options</b></small>
           <ul><li><b>-c</b>ompress <i>n</i><br>Compression Algorithm
               <ul><li><b>0</b> don't compress
                   <li><b>1</b> compress using zlib
                   <li><b>2</b> compress using bzip2
               </ul>
               <li><b>-d</b>ontcompress <i>n</i><br>Same as <i>-compress 0</i>
               <li><b>-m</b>axerrs <i>errs</i><br>Maximum number of errors
                      that can occur before the copy is terminated;
                      if 0 then errors are ignored.  Default is 5.
               <li><b>-n</b>ofudge<br>[deprecated]
               <li><b>-q</b>uiet<br>Quiet mode; don't display status
               <li><b>-z</b> <i>parm</i><br>Parameter passed to the compression
                      algorithm
                      <br>
                      <br>zlib compression level:
                      <br>0 = no compression
                      <br>1 = fastest ... 9 = best
                      <br>
                      <br>bzip2 blockSize100k value:
                      <br>1 = fastest ... 9 = best
           </ul>
       </ul>
   </ul>
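<p>
The <b>-z</b> ranges above differ by algorithm: zlib accepts 0 through 9,
while bzip2 accepts only 1 through 9.  A minimal sketch of that check follows.
The function and enum names are hypothetical, and rejecting <b>-z</b> when no
compression is selected is an assumption, since the option list does not say
what happens in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* -compress codes as listed for ckd2cckd. */
enum { CCKD_NONE = 0, CCKD_ZLIB = 1, CCKD_BZIP2 = 2 };

/* Hypothetical check: is `z` a documented -z value for `compress`? */
bool z_parm_valid(int compress, int z)
{
    switch (compress) {
    case CCKD_ZLIB:
        return z >= 0 && z <= 9;   /* zlib level: 0 = none, 9 = best */
    case CCKD_BZIP2:
        return z >= 1 && z <= 9;   /* bzip2 blockSize100k: 1..9      */
    default:
        return false;              /* assumption: -z needs compression */
    }
}
```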
<a NAME="cckd2ckd"></a>
<ul>
    <li><b>cckd2ckd</b> <i>[options] source-file target-file</i>
   <ul><li><small><b>Description</b></small> Copies a compressed CKD Dasd emulation
                 file to a regular CKD Dasd emulation file.  The target
                 file must not already exist.  More than one target file may
                 be created.
       <li><small><b>Options</b></small>
           <ul><li><b>-c</b>yls <i>n</i><br>Number of cylinders to copy
                      if the entire file isn't to be copied.  If <b>0</b>
                      then only the number of cylinders in use are copied.
               <li><b>-m</b>axerrs <i>errs</i><br>Maximum number of errors
                      that can occur before the copy is terminated;
                      if 0 then errors are ignored.  Default is 5.
               <li><b>-q</b>uiet<br>Quiet mode; don't display status
               <li><b>-v</b>alidate<br>Validate track images [default]
               <li><b>-n</b>ovalidate<br>Don't validate track images
           </ul>
       </ul>
   </ul>
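<p>
Because a regular CKD file may be limited in size, <b>cckd2ckd</b> can split
its output across several target files.  The sketch below estimates how many
files are needed, assuming (not taken from the Hercules source) that each file
is capped at 2G (2<sup>31</sup>-1 bytes), holds only whole cylinders, and
begins with a header of <code>hdr_bytes</code> bytes.  All names are
hypothetical.

```c
#include <assert.h>

/* Assumed size cap per target file: 2G, i.e. 2^31 - 1 bytes. */
#define MAX_FILE_BYTES 2147483647LL

/* Hypothetical: whole cylinders that fit in one target file. */
long long cyls_per_file(long long heads, long long trk_bytes,
                        long long hdr_bytes)
{
    long long per = (MAX_FILE_BYTES - hdr_bytes) / (heads * trk_bytes);
    assert(per > 0);   /* at least one cylinder must fit in a file */
    return per;
}

/* Hypothetical: target files needed for `cyls` cylinders (rounded up). */
long long target_files(long long cyls, long long heads,
                       long long trk_bytes, long long hdr_bytes)
{
    long long per = cyls_per_file(heads, trk_bytes, hdr_bytes);
    return (cyls + per - 1) / per;
}
```

For example, a device with 3339 cylinders of 15 tracks each, an assumed track
image size of 56832 bytes and a 512-byte header fits 2519 cylinders per file
under these assumptions, so two target files would be needed.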
<a NAME="cckdcdsk"></a>
<ul>
    <li><b>cckdcdsk</b> <i>[-level] file-name</i>
   <ul><li><small><b>Description</b></small> Verifies the integrity of a compressed
                 or shadow CKD Dasd emulation file and performs recovery and repair.
       <li><small><b>Options</b></small>
           <ul><li>-<i>level</i><br>A digit 0, 1 or 3 that specifies
                   the level of checking.  The higher the level, the
                   longer the integrity check takes.
               <ul><li><b>0</b> Minimal checking.  Device headers are verified,
                       free space is verified, primary lookup table and secondary
                       lookup tables are verified.
                   <li><b>1</b> Same checks as level 0 plus all 5-byte track headers
                       are verified.
                   <li><b>3</b> Same checks as level 1 plus all track images are
                       read, uncompressed and verified.
               </ul>
           </ul>
       </ul>
   </ul>
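<p>
Level 1 adds verification of every 5-byte track header.  A minimal sketch of
such a check follows, assuming (from the <b>-compress</b> codes listed for
<b>ckd2cckd</b> and the <code>0CCHH</code> home address shown in the FAQ) that
byte 0 holds a compression code of 0, 1 or 2 and bytes 1-4 hold the big-endian
cylinder and head numbers.  The function name is hypothetical, and
<b>cckdcdsk</b> may well check more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical level-1 style check of one 5-byte track header.
   Assumed layout: byte 0 = compression code (0 none, 1 zlib, 2 bzip2),
   bytes 1-2 = cylinder (big-endian), bytes 3-4 = head (big-endian). */
bool trkhdr_ok(const unsigned char hdr[5], unsigned trk, unsigned heads)
{
    assert(heads > 0);
    unsigned cyl  = trk / heads;   /* expected cylinder */
    unsigned head = trk % heads;   /* expected head     */

    if (hdr[0] > 2)
        return false;              /* unknown compression code */
    return (((unsigned)hdr[1] << 8) | hdr[2]) == cyl
        && (((unsigned)hdr[3] << 8) | hdr[4]) == head;
}
```

For a device with 15 heads, track 33 is cylinder 2, head 3, so its header is
expected to end in <code>00 02 00 03</code>.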
<a NAME="cckdcomp"></a>
<ul>
    <li><b>cckdcomp</b> <i>[-level] file-name</i>
   <ul><li><small><b>Description</b></small> Removes all free space from a compressed
                 or shadow CKD Dasd emulation file.  (Compresses or compacts a cckd
                 file ... your choice!).
                 If <i>level</i> is specified, <b>cckdcdsk</b> is called first
                 with the specified level; this is a shorthand way to perform both
                 functions in one utility call.
       <li><small><b>Options</b></small>
           <ul><li>-<i>level</i><br>A digit 0, 1 or 3 that specifies
                   the level of checking.  The higher the level, the
                   longer the integrity check takes.
               <ul><li><b>0</b> Minimal checking.  Device headers are verified,
                       free space is verified, primary lookup table and secondary
                       lookup tables are verified.
                   <li><b>1</b> Same checks as level 0 plus all 5-byte track headers
                       are verified.
                   <li><b>3</b> Same checks as level 1 plus all track images are
                       read, uncompressed and verified.
               </ul>
           </ul>
       </ul>
   </ul>
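<p>
Conceptually, removing free space means sliding every stored image toward the
start of the file and updating its lookup table entry.  The sketch below
models only the offset arithmetic over an in-memory list of images; it is a
simplified illustration, not the actual <b>cckdcomp</b> algorithm, and the
struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical in-memory model of one stored track/block group image. */
struct image {
    unsigned      trk;  /* track or block group number   */
    unsigned long off;  /* offset of the image in the file */
    unsigned long len;  /* length of the image            */
};

static int by_offset(const void *a, const void *b)
{
    const struct image *x = a, *y = b;
    return (x->off > y->off) - (x->off < y->off);
}

/* Assign each image a new offset so that all images are contiguous
   starting at `base` (assumed to be where image data may begin).
   Returns the new end of data; the file could be truncated there. */
unsigned long compact(struct image *img, size_t n, unsigned long base)
{
    unsigned long next = base;

    qsort(img, n, sizeof *img, by_offset);   /* lowest offset first */
    for (size_t i = 0; i < n; i++) {
        assert(img[i].off >= next);          /* images only move down */
        /* A real tool would copy the image data here and update the
           image's lookup table entry. */
        img[i].off = next;
        next += img[i].len;
    }
    return next;
}
```

Processing images in ascending offset order means each image's new offset is
never greater than its old one, so (assuming images do not overlap and
<code>base</code> is at or below the lowest offset) an image can be moved
without overwriting one that has not yet been moved.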
<a NAME="cckdfix"></a>
<ul>
    <li><b>cckdfix</b> <i>file-name</i>
   <ul><li><small><b>Description</b></small> This is a skeleton program that is
                 not compiled during make.  It can be edited to change/repair
                 the device headers.
       <li><small><b>Compiling</b></small> Enter <code>cc -o cckdfix -DARCH=390 cckdfix.c</code>
                 to compile and link the edited program.
   </ul>
    <li><b>cckddump</b>
   <ul><li><small><b>Description</b></small> This is an OS/390 HLASM (High Level
                 Assembler) program that creates a compressed CKD emulation file
                 from an actual CKD device.  See <a href="#cckddump">below</a> for
                 a description of how to build and run this program.
   </ul>
</ul>

<hr noshade>
<p><h3><a NAME="faq">FAQ</a></h3>
<table>
<tr><td valign="top"><b>Q.</b><td>
             What devices are supported ?
<tr><td valign="top"><b>A.</b><td>
             2311, 2314, 3330, 3340, 3350, 3375, 3380, 3390 and 9345.
<br><br>

<tr><td valign="top"><b>Q.</b><td>
             Is a 3390 model 9 supported ?
<tr><td valign="top"><b>A.</b><td>
             Yes, maybe.  A 3390-9 is a little over 8G in size.
             A cckd file cannot exceed 2G on a system that does
             not support large files; otherwise it cannot exceed
             4G.  If the data on the 3390-9 compresses to below
             the applicable limit, then the answer is yes.
<br><br>

<tr><td valign="top"><b>Q.</b><td>
             How can I get rid of the free space in my files ?
<tr><td valign="top"><b>A.</b><td>
             Once the total amount of free space falls below 6% of
             the total file size, the garbage collector is not very
             aggressive about eliminating free space.  To remove
             all free space from the file while Hercules is running
             use the <b>sfc</b> console command.  See
             <a href="#usingsfiles">Using Shadow Files</a> above.
             Otherwise, you can use the <b>cckdcomp</b> utility.
             See <a href="#utilities">Utilities</a> above.

<br><br>

<tr><td valign="top"><b>Q.</b><td>
             How can I display the space statistics for a compressed
             file ?
<tr><td valign="top"><b>A.</b><td>
             The statistics are displayed when the compressed file
             is opened.  Currently, there is no supplied method to
             display these statistics at any other time.  However,
             it shouldn't be too hard to write a shell script
             (similar to <code>dasdlist</code>) to display these
             statistics.  The statistics are contained in the
             <code>CCKDDASD_DEVHDR</code> which is at offset 512
             in the compressed file; the header is mapped in
             <code>hercules.h</code>.
<br><br>

<tr><td valign="top"><b>Q.</b><td>
             What is a "null track" anyway ?
<tr><td valign="top"><b>A.</b><td>
             The term "null track" is just something I made up.  It is
             what is returned when a zero offset is found in either the
             primary or secondary lookup table for the track.  It contains
             the following fields:
             <table>
             <tr><td><code>0CCHH</code></td><td>Home address</td>
             <tr><td><code>CCHH0008 00000000</code></td><td>standard R0</td>
             <tr><td><code>CCHH1000</code></td><td>end-of-file marker</td>
             <tr><td><code>ffffffff</code></td><td>end-of-track marker</td>
             </table>
             When a null track is written, space previously occupied by
             the track is freed and the offset in the secondary lookup table
             is set to zero.  If all offsets in the secondary lookup table
             are zero, then the secondary lookup table is freed and the
             primary lookup table entry is zeroed.
<br><br>

<tr><td valign="top"><b>Q.</b><td>
             I want to try bzip2 but I'm getting compiler errors.
             What am I doing wrong ?
<tr><td valign="top"><b>A.</b><td>
             Probably bzip2 is not installed or is not installed
             properly. You can obtain bzip2 from
             <a href="http://sourceware.cygnus.com/bzip2/">here</a>.
             If bzip2 is installed, then you need to find the directory
             where <code>bzlib.h</code> is installed and the
             directory where <code>libbz2.a</code> is installed.
             You can then add "-I <i>bzlib.h-directory</i>" to the
             CFLAGS in the makefile and add "-L <i>libbz2.a-directory</i>"
             to the LFLAGS.
<br><br>

<tr><td valign="top"><b>Q.</b><td>
             Which is better, zlib or bzip2 ?
<tr><td valign="top"><b>A.</b><td>
             This is a religious question.  I have no actual preference,
             I just wanted to make a choice available.
<br><br>

<tr><td valign="top"><b>Q.</b><td>
             Can other compression programs be used ?
<tr><td valign="top"><b>A.</b><td>
             Yes.  The program is architecturally structured so that other
             compression algorithms can be added rather painlessly.  This
             will require, of course, an update to the source.
<br><br>

<tr><td valign="top"><b>Q.</b><td>
             Can this compression scheme be used for FBA devices too ?
<tr><td valign="top"><b>A.</b><td>
             Yes.  Because an FBA block is only 512 bytes, compressing
             each block individually would not be efficient, so FBA
             blocks are grouped into block groups and each block group
             is compressed as a unit.  See
             <a href="#introduction">Introduction</a> above.
<br><br>

</table>
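<p>
Reading each character in the null track table above as one byte, a null
track image is 37 bytes long.  The sketch below builds one for a given
cylinder and head.  The function name and <code>NULLTRK_LEN</code> are
hypothetical, and the sketch assumes cylinder and head are each stored as
2 big-endian bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 5 (home address) + 8 (R0 count) + 8 (R0 data)
   + 8 (end-of-file count) + 8 (end-of-track marker) */
#define NULLTRK_LEN 37

/* Store cylinder and head as 2 big-endian bytes each (assumption). */
static unsigned char *put_cchh(unsigned char *p, unsigned cyl, unsigned head)
{
    *p++ = (unsigned char)(cyl >> 8);
    *p++ = (unsigned char)(cyl & 0xff);
    *p++ = (unsigned char)(head >> 8);
    *p++ = (unsigned char)(head & 0xff);
    return p;
}

/* Hypothetical: build a null track image in buf (at least NULLTRK_LEN
   bytes) and return its length. */
size_t build_null_track(unsigned char *buf, unsigned cyl, unsigned head)
{
    unsigned char *p = buf;
    assert(cyl <= 0xffff && head <= 0xffff);

    *p++ = 0;                        /* home address: 0CCHH              */
    p = put_cchh(p, cyl, head);

    p = put_cchh(p, cyl, head);      /* R0 count: CCHH, R=0, KL=0, DL=8  */
    *p++ = 0; *p++ = 0; *p++ = 0; *p++ = 8;
    memset(p, 0, 8);                 /* R0 data: 8 zero bytes            */
    p += 8;

    p = put_cchh(p, cyl, head);      /* R1 count: CCHH, R=1, KL=0, DL=0, */
    *p++ = 1; *p++ = 0; *p++ = 0; *p++ = 0;   /* the end-of-file marker  */

    memset(p, 0xff, 8);              /* end-of-track marker              */
    p += 8;

    return (size_t)(p - buf);
}
```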

<hr noshade>
<p>
Greg Smith
<a href="mailto:gsmith@nc.rr.com"><em>gsmith</em>&#064;<em>nc.rr.com</em></a>
<p><small>Last updated 17 Nov 2002</small>
</BODY>
</HTML>