mirror of git://git.sv.gnu.org/coreutils.git synced 2026-04-21 03:12:48 +02:00

(uniq invocation, squeezing, The uniq command):

Use "repeated" rather than "duplicate" to describe adjacent
duplicates; this simplifies the description and makes it more
consistent with POSIX.
(uniq invocation): Make it clear that -d and -u suppress the
output of lines, rather than cause some lines to be output.
Mention what happens if a line lacks enough fields or characters.
Jim Meyering
2003-05-14 07:58:40 +00:00
parent 5f62a53f9c
commit 5d51fc8a5b

@@ -3271,12 +3271,12 @@ standard input if nothing is given or for an @var{input} name of
 uniq [@var{option}]@dots{} [@var{input} [@var{output}]]
 @end example
-By default, @command{uniq} prints the unique lines in a sorted file, i.e.,
-discards all but one of identical successive lines. Optionally, it can
-instead show only lines that appear exactly once, or lines that appear
-more than once.
+By default, @command{uniq} prints its input lines, except that
+it discards all but the first of adjacent repeated lines, so that
+no output lines are repeated. Optionally, it can instead discard
+lines that are not repeated, or all repeated lines.
-The input need not be sorted, but duplicate input lines are detected
+The input need not be sorted, but repeated input lines are detected
 only if they are adjacent. If you want to discard non-adjacent
 duplicate lines, perhaps you want to use @code{sort -u}.
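The reworded paragraph can be checked against a small input. The sketch below is only an illustration: the sample data and the variable names are made up for this example and are not part of the commit.

```shell
#!/bin/sh
# Hypothetical sample input: "a" is repeated adjacently at the start
# and appears once more, non-adjacently, at the end.
input=$(printf 'a\na\nb\nc\nc\nc\na\n')

# Default: discard all but the first of adjacent repeated lines.
default_out=$(printf '%s\n' "$input" | uniq)

# -d: discard lines that are not repeated.
repeated_out=$(printf '%s\n' "$input" | uniq -d)

# -u: discard all repeated lines.
unique_out=$(printf '%s\n' "$input" | uniq -u)

# sort -u also discards non-adjacent repeats, because sorting
# makes them adjacent first.
sorted_out=$(printf '%s\n' "$input" | sort -u)

printf 'default:\n%s\n' "$default_out"
printf -- '-d:\n%s\n' "$repeated_out"
printf -- '-u:\n%s\n' "$unique_out"
printf 'sort -u:\n%s\n' "$sorted_out"
```

The trailing "a" survives in the default output and in the -u output because it is not adjacent to the earlier "a" lines.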
@@ -3295,7 +3295,8 @@ The program accepts the following options. Also see @ref{Common options}.
 @itemx --skip-fields=@var{n}
 @opindex -f
 @opindex --skip-fields
-Skip @var{n} fields on each line before checking for uniqueness. Fields
+Skip @var{n} fields on each line before checking for uniqueness. Use
+a null string for comparison if a line has fewer than @var{n} fields. Fields
 are sequences of non-space non-tab characters that are separated from
 each other by at least one space or tab.
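A short sketch of the field-skipping rule, including the newly documented case of a line with too few fields. The input lines and variable names are hypothetical, and the results assume the behavior described in the text above.

```shell
#!/bin/sh
# "x 1" and "y 1" differ only in their first field, so with -f 1
# they compare equal and the second one is discarded.
fields_out=$(printf 'x 1\ny 1\nz 2\n' | uniq -f 1)

# Each of these lines has fewer than two fields, so with -f 2 both
# compare as the null string and the second one is discarded.
short_out=$(printf 'a\nb\n' | uniq -f 2)

printf '%s\n' "$fields_out"
printf '%s\n' "$short_out"
```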
@@ -3307,7 +3308,8 @@ does not allow this; use @option{-f @var{n}} instead.
 @itemx --skip-chars=@var{n}
 @opindex -s
 @opindex --skip-chars
-Skip @var{n} characters before checking for uniqueness. If you use both
+Skip @var{n} characters before checking for uniqueness. Use a null string
+for comparison if a line has fewer than @var{n} characters. If you use both
 the field and character skipping options, fields are skipped over first.
 On older systems, @command{uniq} supports an obsolete option
@@ -3330,31 +3332,34 @@ Ignore differences in case when comparing lines.
 @itemx --repeated
 @opindex -d
 @opindex --repeated
-@cindex duplicate lines, outputting
-Print one copy of each duplicate line.
+@cindex repeated lines, outputting
+Discard lines that are not repeated. When used by itself, this option
+causes @command{uniq} to print the first copy of each repeated line,
+and nothing else.
 @item -D
 @itemx --all-repeated[=@var{delimit-method}]
 @opindex -D
 @opindex --all-repeated
-@cindex all duplicate lines, outputting
-Print all copies of each duplicate line.
+@cindex all repeated lines, outputting
+Do not discard the second and subsequent repeated input lines,
+but discard lines that are not repeated.
 This option is useful mainly in conjunction with other options e.g.,
 to ignore case or to compare only selected fields.
 The optional @var{delimit-method} tells how to delimit
-groups of duplicate lines, and must be one of the following:
+groups of repeated lines, and must be one of the following:
 @table @samp
 @item none
-Do not delimit groups of duplicate lines.
+Do not delimit groups of repeated lines.
 This is equivalent to @option{--all-repeated} (@option{-D}).
 @item prepend
-Output a newline before each group of duplicate lines.
+Output a newline before each group of repeated lines.
 @item separate
-Separate groups of duplicate lines with a single newline.
+Separate groups of repeated lines with a single newline.
 This is the same as using @samp{prepend}, except that
 there is no newline before the first group, and hence
 may be better suited for output direct to users.
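The -D wording and the delimit methods may be easier to follow with a concrete run. This sketch assumes a GNU uniq that supports --all-repeated; the sample input and variable names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical input: two groups of repeated lines ("a" and "c")
# around one line that is not repeated ("b").
input=$(printf 'a\na\nb\nc\nc\n')

# -D keeps every copy of the repeated lines and discards "b".
all_out=$(printf '%s\n' "$input" | uniq -D)

# "separate" puts a single empty line between the groups.
sep_out=$(printf '%s\n' "$input" | uniq --all-repeated=separate)

# Combined with -i, lines differing only in case count as repeated.
case_out=$(printf 'Foo\nfoo\nbar\n' | uniq -D -i)

printf '%s\n' "$all_out"
printf '%s\n' "$sep_out"
printf '%s\n' "$case_out"
```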
@@ -3373,13 +3378,14 @@ This is a @sc{gnu} extension.
 @opindex -u
 @opindex --unique
 @cindex unique lines, outputting
-Print non-duplicate lines.
+Discard the first repeated line. When used by itself, this option
+causes @command{uniq} to print unique lines, and nothing else.
 @item -w @var{n}
 @itemx --check-chars=@var{n}
 @opindex -w
 @opindex --check-chars
-Compare @var{n} characters on each line (after skipping any specified
+Compare at most @var{n} characters on each line (after skipping any specified
 fields and characters). By default the entire rest of the lines are
 compared.
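The character-based options can be exercised the same way. The sketch below uses made-up input and variable names; -w is assumed to be available, as it is in GNU uniq.

```shell
#!/bin/sh
# -s 2 skips the first two characters, so "xxabc" and "yyabc"
# compare equal on "abc".
skip_out=$(printf 'xxabc\nyyabc\nzzdef\n' | uniq -s 2)

# "ab" and "cd" have fewer than five characters, so both compare as
# the null string; "abcdef" compares as "f" and is kept.
short_out=$(printf 'ab\ncd\nabcdef\n' | uniq -s 5)

# -w 2 compares at most the first two characters, so "apple" and
# "apricot" compare equal on "ap".
check_out=$(printf 'apple\napricot\nbanana\n' | uniq -w 2)

printf '%s\n' "$skip_out"
printf '%s\n' "$short_out"
printf '%s\n' "$check_out"
```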
@@ -4649,13 +4655,13 @@ tr -s '\n'
 @item
 Find doubled occurrences of words in a document.
-For example, people often write ``the the'' with the duplicated words
+For example, people often write ``the the'' with the repeated words
 separated by a newline. The bourne shell script below works first
 by converting each sequence of punctuation and blank characters to a
 single newline. That puts each ``word'' on a line by itself.
 Next it maps all uppercase characters to lower case, and finally it
 runs @command{uniq} with the @option{-d} option to print out only the words
-that were adjacent duplicates.
+that were repeated.
 @example
 #!/bin/sh
@@ -12055,8 +12061,8 @@ Finally (at least for now), we'll look at the @command{uniq} program. When
 sorting data, you will often end up with duplicate lines, lines that
 are identical. Usually, all you need is one instance of each line.
 This is where @command{uniq} comes in. The @command{uniq} program reads its
-standard input, which it expects to be sorted. It only prints out one
-copy of each duplicated line. It does have several options. Later on,
+standard input. It prints only one
+copy of each repeated line. It does have several options. Later on,
 we'll use the @option{-c} option, which prints each unique line, preceded
 by a count of the number of times that line occurred in the input.
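A minimal sketch of the -c usage mentioned in the last context lines, with hypothetical input. The input is sorted first so that identical lines become adjacent, and sed strips the leading padding that uniq puts before each count, since the exact padding width is not stated in the text above.

```shell
#!/bin/sh
# Count how often each line occurs in an unsorted input.
counts=$(printf 'pear\napple\npear\napple\npear\n' |
  sort |
  uniq -c |
  sed 's/^ *//')

printf '%s\n' "$counts"
```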