mirror of git://git.sv.gnu.org/coreutils.git synced 2026-02-14 03:12:10 +02:00

(shuf invocation, Random sources): New sections.

(Operating on sorted files): Add shuf.
(sort invocation, shred invocation): New option --random-source.
(sort invocation): Fix typo: -R -> -r.
Paul Eggert
2006-08-08 22:11:49 +00:00
parent f0992c673c
commit 0d98074403


@@ -97,6 +97,7 @@
* sha1sum: (coreutils)sha1sum invocation. Print or check SHA-1 digests.
* sha2: (coreutils)sha2 utilities. Print or check SHA-2 digests.
* shred: (coreutils)shred invocation. Remove files more securely.
* shuf: (coreutils)shuf invocation. Shuffling text files.
* sleep: (coreutils)sleep invocation. Delay for a specified time.
* sort: (coreutils)sort invocation. Sort text files.
* split: (coreutils)split invocation. Split into fixed-size pieces.
@@ -174,7 +175,7 @@ Free Documentation License''.
* Formatting file contents:: fmt pr fold
* Output of parts of files:: head tail split csplit
* Summarizing files:: wc sum cksum md5sum sha1sum sha2
-* Operating on sorted files:: sort uniq comm ptx tsort
+* Operating on sorted files:: sort shuf uniq comm ptx tsort
* Operating on fields within a line:: cut paste join
* Operating on characters:: tr expand unexpand
* Directory listing:: ls dir vdir dircolors
@@ -207,6 +208,7 @@ Common Options
* Exit status:: Indicating program success or failure.
* Backup options:: Backup options
* Block size:: Block size
* Random sources:: Sources of random data
* Target directory:: Target directory
* Trailing slashes:: Trailing slashes
* Traversing symlinks:: Traversing symlinks to directories
@@ -246,6 +248,7 @@ Summarizing files
Operating on sorted files
* sort invocation:: Sort text files.
* shuf invocation:: Shuffle text files.
* uniq invocation:: Uniquify files.
* comm invocation:: Compare two sorted files line by line.
* ptx invocation:: Produce a permuted index of file contents.
@@ -641,6 +644,7 @@ name.
* Exit status:: Indicating program success or failure.
* Backup options:: -b -S, in some programs.
* Block size:: BLOCK_SIZE and --block-size, in some programs.
* Random sources:: --random-source, in some programs.
* Target directory:: Specifying a target directory, in some programs.
* Trailing slashes:: --strip-trailing-slashes, in some programs.
* Traversing symlinks:: -H, -L, or -P, in some programs.
@@ -920,6 +924,44 @@ set. The @option{-h} or @option{--human-readable} option is equivalent to
@option{--block-size=human-readable}. The @option{--si} option is
equivalent to @option{--block-size=si}.
@node Random sources
@section Sources of random data
@cindex random sources
The @command{shuf}, @command{shred}, and @command{sort} commands
sometimes need random data to do their work. For example, @samp{sort
-R} must choose a hash function at random, and it needs random data to
make this selection.
Normally these commands use the device file @file{/dev/urandom} as the
source of random data. Typically, this device gathers environmental
noise from device drivers and other sources into an entropy pool, and
uses the pool to generate random bits. If the pool is short of data,
the device reuses the internal pool to produce more bits, using a
cryptographically secure pseudorandom number generator.
@file{/dev/urandom} suffices for most practical uses, but applications
requiring high-value or long-term protection of private data may
require an alternate data source like @file{/dev/random} or
@file{/dev/arandom}. The set of available sources depends on your
operating system.
To use such a source, specify the @option{--random-source=@var{file}}
option, e.g., @samp{shuf --random-source=/dev/random}. The contents
of @var{file} should be as random as possible. An error is reported
if @var{file} does not contain enough bytes to randomize the input
adequately.
To reproduce the results of an earlier invocation of a command, you
can save some random data into a file and then use that file as the
random source in both the earlier and later invocations of the command.
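As a minimal sketch of this technique, the commands below save random
bytes once and pass the same file to two @command{shuf} runs. The
variable names, the 4096-byte size, and the range @samp{1-20} are
illustrative assumptions, not part of the documented interface.

```shell
# Save some random bytes once; 4096 bytes is an arbitrary, generous amount.
seed_file=$(mktemp)
head -c 4096 /dev/urandom > "$seed_file"

# Both runs read the same bytes, so they should choose the same permutation.
first=$(shuf -i 1-20 --random-source="$seed_file")
second=$(shuf -i 1-20 --random-source="$seed_file")

if [ "$first" = "$second" ]; then
  echo "same permutation both times"
fi
rm -f "$seed_file"
```

Reproducibility of this kind should be expected only when both runs
use the same version of the command and the same input.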
Some old-fashioned or stripped-down operating systems lack support for
@file{/dev/urandom}. On these systems commands like @command{shuf}
by default fall back on an internal pseudorandom generator initialized
by a small amount of entropy.
@node Target directory
@section Target directory
@@ -3262,6 +3304,7 @@ These commands work with (or produce) sorted files.
@menu
* sort invocation:: Sort text files.
* shuf invocation:: Shuffle text files.
* uniq invocation:: Uniquify files.
* comm invocation:: Compare two sorted files line by line.
* ptx invocation:: Produce a permuted index of file contents.
@@ -3509,9 +3552,19 @@ appear earlier in the output instead of later.
@opindex -R
@opindex --random-sort
@cindex random sort
-Sort by hashing the input keys and then sorting the hash values. This
-is much like a random shuffle of the inputs, except that keys with the
-same value sort together. The hash function is chosen at random.
+Sort by hashing the input keys and then sorting the hash values.
+Choose the hash function at random, ensuring that it is free of
+collisions so that differing keys have differing hash values. This is
+like a random permutation of the inputs (@pxref{shuf invocation}),
+except that keys with the same value sort together.
+If multiple random sort fields are specified, the same random hash
+function is used for all fields. To use different random hash
+functions for different fields, you can invoke @command{sort} more
+than once.
+The choice of hash function is affected by the
+@option{--random-source} option.
@end table
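The grouping behavior of @option{-R} described above can be observed
with a short sketch. The input words and variable names are
illustrative assumptions; the order of the two groups varies from run
to run.

```shell
# Four input lines, but only two distinct keys.
shuffled=$(printf '%s\n' pear apple pear apple | sort -R)
printf '%s\n' "$shuffled"

# Equal keys hash alike and so end up adjacent; uniq then collapses
# the output to one line per distinct key.
groups=$(printf '%s\n' "$shuffled" | uniq | wc -l)
echo "adjacent groups: $groups"
```

A true shuffle of the same input, as with @command{shuf}, could
instead interleave the two words.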
@@ -3550,6 +3603,13 @@ On newer systems, @option{-o} cannot appear after an input file if
scripts should specify @option{-o @var{output-file}} before any input
files.
@item --random-source=@var{file}
@opindex --random-source
@cindex random source for sorting
Use @var{file} as a source of random data used to determine which
random hash function to use with the @option{-R} option. @xref{Random
sources}.
@item -s
@itemx --stable
@opindex -s
@@ -3559,7 +3619,7 @@ files.
Make @command{sort} stable by disabling its last-resort comparison.
This option has no effect if no fields or global ordering options
-other than @option{--reverse} (@option{-R}) are specified.
+other than @option{--reverse} (@option{-r}) are specified.
@item -S @var{size}
@itemx --buffer-size=@var{size}
@@ -3835,6 +3895,147 @@ ls */* | sort -t / -k 1,1R -k 2,2
@end itemize
@node shuf invocation
@section @command{shuf}: Shuffling text
@pindex shuf
@cindex shuffling files
@command{shuf} shuffles its input by outputting a random permutation
of its input lines. Each output permutation is equally likely.
Synopses:
@example
shuf [@var{option}]@dots{} [@var{file}]
shuf -e [@var{option}]@dots{} [@var{arg}]@dots{}
shuf -i @var{lo}-@var{hi} [@var{option}]@dots{}
@end example
@command{shuf} has three modes of operation that affect where it
obtains its input lines. By default, it reads lines from standard
input. The following options change the operation mode:
@table @samp
@item -e
@itemx --echo
@opindex -e
@opindex --echo
@cindex command-line operands to shuffle
Treat each command-line operand as an input line.
@item -i @var{lo}-@var{hi}
@itemx --input-range=@var{lo}-@var{hi}
@opindex -i
@opindex --input-range
@cindex input range to shuffle
Act as if input came from a file containing the range of unsigned
decimal integers @var{lo}@dots{}@var{hi}, one per line.
@end table
@command{shuf}'s other options can affect its behavior in all
operation modes:
@table @samp
@item -n @var{lines}
@itemx --head-lines=@var{lines}
@opindex -n
@opindex --head-lines
@cindex head of output
Output at most @var{lines} lines. By default, all input lines are
output.
@item -o @var{output-file}
@itemx --output=@var{output-file}
@opindex -o
@opindex --output
@cindex overwriting of input, allowed
Write output to @var{output-file} instead of standard output.
@command{shuf} reads all input before opening
@var{output-file}, so you can safely shuffle a file in place by using
commands like @code{shuf -o F <F} and @code{cat F | shuf -o F}.
@item --random-source=@var{file}
@opindex --random-source
@cindex random source for shuffling
Use @var{file} as a source of random data used to determine which
permutation to generate. @xref{Random sources}.
@item -z
@itemx --zero-terminated
@opindex -z
@opindex --zero-terminated
@cindex shuffle zero-terminated lines
Treat the input and output as a set of lines, each terminated by a zero byte
(@acronym{ASCII} @sc{nul} (Null) character) instead of an
@acronym{ASCII} @sc{lf} (Line Feed).
This option can be useful in conjunction with @samp{perl -0} or
@samp{find -print0} and @samp{xargs -0} which do the same in order to
reliably handle arbitrary file names (even those containing blanks
or other special characters).
@end table
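The following sketch combines two of the options above: it shuffles a
file in place with @option{-o} and then draws a small sample with
@option{-n}. The temporary file, its contents, and the sample size of
three are illustrative assumptions.

```shell
# Create a scratch file holding the integers 1 through 10.
work_file=$(mktemp)
seq 1 10 > "$work_file"

# Shuffle in place: shuf reads all input before opening the output file.
shuf -o "$work_file" < "$work_file"

# Draw at most three lines from the shuffled file.
sample=$(shuf -n 3 "$work_file")
printf '%s\n' "$sample"

# Sorting the shuffled file should recover the original contents.
sorted=$(sort -n "$work_file")
rm -f "$work_file"
```

Because the output is a permutation of the input, no line is lost or
duplicated by the in-place shuffle.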
For example:
@example
shuf <<EOF
A man,
a plan,
a canal:
Panama!
EOF
@end example
@noindent
might produce the output
@example
Panama!
A man,
a canal:
a plan,
@end example
@noindent
Similarly, the command:
@example
shuf -e clubs hearts diamonds spades
@end example
@noindent
might output:
@example
clubs
diamonds
spades
hearts
@end example
@noindent
and the command @samp{shuf -i 1-4} might output:
@example
4
2
1
3
@end example
@noindent
These examples all have four input lines, so @command{shuf} might
produce any of the twenty-four possible permutations of the input. In
general, if there are @var{N} input lines, there are @var{N}! (i.e.,
@var{N} factorial, or @var{N} * (@var{N} - 1) * @dots{} * 1) possible
output permutations.
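As a rough empirical check of this count, the sketch below runs
@samp{shuf -i 1-3} repeatedly and counts the distinct outputs, which
can never exceed 3! = 6. The run count of 200 and the variable names
are illustrative assumptions.

```shell
# Collect each run's output as a single line, then keep distinct lines.
seen=$(
  for run in $(seq 1 200); do
    shuf -i 1-3 | tr '\n' ' '
    echo
  done | sort -u
)

# Count the distinct permutations observed across all runs.
count=$(printf '%s\n' "$seen" | wc -l)
echo "distinct permutations seen: $count"
```

With this many runs all six permutations will very likely appear, but
the only guarantee is that the count stays at or below six.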
@exitstatus
@node uniq invocation
@section @command{uniq}: Uniquify files
@@ -7746,6 +7947,12 @@ for all of the useful overwrite patterns to be used at least once.
You can reduce this to save time, or increase it if you have a lot of
time to waste.
@item --random-source=@var{file}
@opindex --random-source
@cindex random source for shredding
Use @var{file} as a source of random data used to overwrite and to
choose pass ordering. @xref{Random sources}.
@item -s @var{BYTES}
@itemx --size=@var{BYTES}
@opindex -s @var{BYTES}