* tests/misc/printf-mb.sh: Given we shortcut the single char
(invalid multi-byte) case, add a case to ensure we're correctly
checking the return from mbrtowc().
* src/printf.c (STRTOX): Update to support multi-byte chars.
* tests/misc/printf-mb.sh: Add a new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the improvement.
Fixes https://bugs.gnu.org/54388
* tests/misc/wc-nbsp.sh: Only the en_US.iso8859-1 form
is accepted on macOS 10.15.7 at least. GNU/Linux also
accepts ISO-8859-1 (and canonicalizes the charmap to this).
COLORTERM is an environment used usually to expose truecolor support in
terminal emulators. Therefore support matches on that in addition
to TERM. Also set the default COLORTERM match pattern so that
we apply colors if COLORTERM is any value.
This implicitly supports a terminal like "foot"
without a need for an explicit TERM entry.
* NEWS: Mention the new feature.
* src/dircolors.c (main): Match COLORTERM like we do for TERM.
* src/dircolors.hin: Add default config to match any COLORTERM.
* tests/misc/dircolors.pl: Add test cases.
* NEWS: Mention the new feature.
* doc/coreutils.texi (dircolors invocation): Describe the new
--print-ls-colors option.
* src/dircolors.c (print_ls_colors): A new global to select
between shell or terminal output.
(append_entry): A new function refactored from dc_parse_stream()
to append the entry in the appropriate format.
(dc_parse_stream): Adjust to call append_entry().
* tests/misc/dircolors.pl: Add test cases.
This also affects ls -v in some corner cases.
Problems reported by Michael Debertol <https://bugs.gnu.org/49239>.
While looking into this, I spotted some more areas where the
code and documentation did not agree, or where the documentation
was unclear. In some cases I changed the code; in others
the documentation. I hope things are nailed down better now.
* doc/sort-version.texi: Distinguish more carefully between
characters and bytes. Say that non-identical strings can
compare equal, since they now can. Improve readability in
various ways. Make it clearer that a suffix can be the
entire string.
* src/ls.c (cmp_version): Fall back on strcmp if filevercmp
reports equality, since filevercmp is no longer a total order.
* src/sort.c (keycompare): Use filenvercmp, to treat NULs correctly.
* tests/misc/ls-misc.pl (v_files):
Adjust test to match new behavior.
* tests/misc/sort-version.sh: Add tests for stability,
and for sorting with NUL bytes.
Use more constrained argument matching
to improve forward compatibility and robustness.
For example it's better that `cksum -a sha3` is _not_
equivalent to `cksum -a sha386`, so that a user
specifying `-a sha3` on an older cksum would not be surprised.
Also argmatch() is used when parsing tags from lines like:
SHA3 (filename) = abcedf....
so it's more robust that older cksum instances to fail
earlier in the parsing process, when parsing output from
possible future cksum implementations that might support SHA3.
* src/digest.c (algorithm_from_tag): Use argmatch_exact()
to ensure we don't match abbreviated algorithms.
(main): Likewise.
* tests/misc/cksum-a.sh: Add a test case.
Run "make update-copyright" and then...
* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
* tests/sample-test: Adjust to use the single most recent year.
* tests/misc/env-signal-handler.sh: Use retry_delay_ to
avoid a false failure under load, where env hasn't setup
the SIGINT handling before timeout(1) sends the SIGINT.
Fixes https://bugs.gnu.org/51793
New warnings are added related to the handling
of thousands grouping characters, decimal points, and sign characters.
Examples now diagnosed are:
$ printf '0,9\n1,a\n' | sort -nk1 --debug -t, -s
sort: key 1 is numeric and spans multiple fields
sort: field separator ‘,’ is treated as a group separator in numbers
1,a
_
0,9
___
$ printf '1,a\n0,9\n' | LC_ALL=fr_FR.utf8 sort -gk1 --debug -t, -s
sort: key 1 is numeric and spans multiple fields
sort: field separator ‘,’ is treated as a decimal point in numbers
0,9
___
1,a
__
$ printf '1.0\n0.9\n' | LC_ALL=fr_FR.utf8 sort -s -k1,1g --debug
sort: note numbers use ‘,’ as a decimal point in this locale
0.9
_
1.0
_
$ LC_ALL=fr_FR.utf8 sort -n --debug /dev/null
sort: text ordering performed using ‘fr_FR.utf8’ sorting rules
sort: note numbers use ‘,’ as a decimal point in this locale
sort: the multi-byte number group separator in this locale \
is not supported
$ sort --debug -t- -k1n /dev/null
sort: key 1 is numeric and spans multiple fields
sort: field separator ‘-’ is treated as a minus sign in numbers
sort: note numbers use ‘.’ as a decimal point in this locale
$ sort --debug -t+ -k1g /dev/null
sort: key 1 is numeric and spans multiple fields
sort: field separator ‘+’ is treated as a plus sign in numbers
sort: note numbers use ‘.’ as a decimal point in this locale
* src/sort.c (key_warnings): Add the warnings above.
* tests/misc/sort-debug-warn.sh: Add test cases.
Also check that all sort invocations succeed.
* NEWS: Mention the improvement.
Addresses https://bugs.gnu.org/51011
* src/timeout.c (main): Propagate the killed status from the child.
* doc/coreutils.texi (timeout invocation): Remove the
description of the --foreground specific handling of SIGKILL,
now that it's consistent with the default mode of operation.
* tests/misc/timeout.sh: Add a test case.
* NEWS: Mention the change in behavior.
Fixes https://bugs.gnu.org/51135
* src/digest.c: Allow using the --untagged option with --check,
so that `cksum -a md5 --untagged` used to emulate md5sum for example,
may be augmented with the --check option. Also support the --tag
option with cksum, to allow overriding a previous --untagged setting.
* doc/coreutils.texi: Adjust accordingly.
* tests/misc/cksum-a.sh: Likewise.
* src/digest.c (digest_check): Treat empty lines like comments,
as commented checksum files very often have empty lines.
* tests/misc/md5sum.pl: Adjust accordingly.
* src/digest.c: Always set the digest_length, so that
we check the correct number of hex digits when parsing
non tagged format checksums.
* tests/misc/cksum-a.sh: Add a test case. Also fix
up this test which was ineffective due to fail=1
being set in a subshell and ignored.
Support checksum files with CRLF line endings,
which is a common gotcha for using --check on windows,
or with checksum files generated on windows.
Note we escape \r here to support the original coreutils format
(with file name at EOL), and file names with literal
\r characters as the last character of their name.
* src/digest.c (filename_unescape): Convert \\r -> \r.
(print_filename): Escape \r -> \\r.
(output_file): Detect \r chars in file names.
(digest_check): Ignore literal \r char at EOL.
* tests/misc/md5sum.pl: Add a test case.
* tests/misc/sha1sum.pl: Likewise.
* NEWS: Mention the improvement.
This only practically matters on windows.
But given there are separate text handling options in cygwin,
keep the interface simple, and avoid exposing the
confusing binary/text difference here.
* doc/coreutils.texi (md5sum invocation): Mention that
--binary and --text are not supported by the cksum command.
* src/digest.c: Set flag to use binary mode by default.
(output_file): Don't distinguish text and binary modes with
' ' and '*', and just use ' ' always.
This format is a better default, since it results in simpler usage,
as you don't need to specify --tag on generation or -a on
checking invocations. Also it's a more general format supporting
mixed and length adjusted digests.
* doc/coreutils.texi (cksum invocation): Document a new --untagged
option, to use the older coreutils format.
(md5sum invocation): Mention that cksum doesn't support --tag.
* src/digest.c: Adjust cksum(1) to default to --tag,
and accept the new --untagged option.
* tests/misc/b2sum.sh: Adjust accordingly.
* tests/misc/cksum-a.sh: Likewise.
* tests/misc/cksum-c.sh: Likewise.
* src/cksum.h: Thread DELIM through the output functions.
* src/digest.c: Likewise.
* src/sum.c: Likewise.
* src/sum.h: Likewise.
* src/cksum.c: Likewise. Also adjust check to allow -z
with traditional output modes. Also ajust the global variable
name to avoid shadowing warnings.
* tests/misc/cksum-a.sh: Adjust accordingly.
Support `cksum --check FILE` without having to specify a digest
algorithm, allowing for more generic file check instructions.
This also supports mixed digest checksum files, supporting
more robust multi digest checks.
* src/digest.c (algorithm_from_tag): A new function to
identify the digest algorithm from a tagged format line.
(split3): Set the algorithm depending on tag, and update
the expected digest length accordingly.
* tests/misc/cksum-c.sh: Add a new test.
* tests/local.mk: Reference the new test.
* tests/misc/md5sum.pl: Adjust to more generic error.
* tests/misc/sha1sum.pl: Likewise.
* doc/coreutils.texi (md5sum invocation): Mention the new -c feature.
* NEWS: Mention the new feature.
Add message digest sm3, which uses the OSCCA SM3 secure
hash (OSCCA GM/T 0004-2012 SM3) generic hash transformation.
* bootstrap.conf: Add the sm3 module.
* doc/coreutils.texi: Mention the cksum -a option.
* src/digest.c: Provide support for --algorithm='sm3'.
* tests/misc/sm3sum.pl: Add a new test (from Tianjia Zhang)
* tests/local.mk: Reference the new test.
* NEWS: Mention the new feature.
Tested-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
* src/digest.c: Organize HASH_ALGO_CKSUM to be table driven,
and amalgamate all digest algorithms.
(main): Parse all options if HASH_ALGO_CKSUM, and disallow
--tag, --zero, and --check with the traditional bsd, sysv, and crc
checksums for now.
* src/local.mk: Reorganize to include all digest modules in cksum.
* tests/misc/cksum-a.sh: Add a new test.
* tests/misc/b2sum.sh: Update to default to checking with cksum,
as b2sum's implementation diverges a bit from the others.
* tests/local.mk: Reference the new test.
* doc/coreutils.texi (cksum invocation): Adjust the summary to
identify the new mode, and document the new --algorithm option.
* man/cksum.x: Adjust description to be more general.
* man/*sum.x: Add [See Also] section referencing cksum(1).
* NEWS: Mention the new feature.
Adjust to output the file name if any name parameter is passed.
This is consistent with sum -s, cksum, and sum implementations
on other platforms. This should not cause significant compat
issues, as multiple fields are already output, and so already
need to be parsed.
* src/sum.c (bsd_sum_file): Output the file name
if any name parameter is passed.
* tests/misc/sum.pl: Adjust accordingly.
* doc/coreutils.texi (sum invocation): Likewise.
* NEWS: Mention the change in behavior.
* tests/misc/help-version.sh: Test that /dev/full causes
shell printf to fail. This ports better to NetBSD 9.88.46,
where it doesn’t. Problem reported by Nelson H. F. Beebe.
* tests/misc/help-version.sh: Merge gzip-related changes
back from gzip/tests/help-version. This fixes problems
when TERM is not 'dumb', and should simplify maintenance.
Emil Lundberg <lundberg.emil@gmail.com> reports in
https://bugs.gnu.org/49741 about a 'basenc --base64 -d' decoding bug.
The input buffer length was not divisible by 3, resulting in
decoding errors.
* NEWS: Mention fix.
* src/basenc.c (DEC_BLOCKSIZE): Change from 1024*5 to 4200 (35*3*5*8)
which is divisible by 3,4,5,8 - satisfying both base32 and base64;
Use compile-time verify() macro to enforce the above.
* tests/misc/basenc.pl: Add test.
This patch modifies basenc to prefer signed integers to
unsigned, as signed are less error-prone.
This patch also updates Gnulib to to latest, which updates Gnulib’s
base32 and base64 modules to prefer signed to unsigned integers.
* src/basenc.c: Include idx.h.
(struct base2_decode_context): Use unsigned char, not unsigned
for an octet that must fit in an unsigned char.
(base_encode, struct base_decode_context)
(base64_decode_ctx_wrapper, prepare_inbuf, base64url_encode)
(base64url_decode_ctx_wrapper, base32_decode_ctx_wrapper)
(base32hex_encode, base32hex_decode_ctx_wrapper, base16_encode)
(base16_decode_ctx, z85_encode, Z85_HI_CTX_TO_32BIT_VAL)
(z85_decoding, z85_decode_ctx, base2msbf_encode)
(base2lsbf_encode, base2lsbf_decode_ctx, base2msbf_decode_ctx)
(wrap_write, do_encode, do_decode, main):
Prefer signed integers to unsigned.
(main): Treat extremely large wrap columns as if they were
infinite; that’s good enough. Since we’re now using xstrtoimax,
this allows ‘-w -0’ (same as ‘-w 0’).
* tests/misc/base64.pl (gen_tests): -w-0 is no longer an error.
We must delay handling when \r is the last character
of the buffer being processed, as the next character
may or may not be \n.
* src/cat.c (pending_cr): A new global to record whether
the last character processed (in -E mode) is '\r'.
(cat): Honor pending_cr when processing the start of the buffer.
(main): Honor pending_cr if no more files to process.
* tests/misc/cat-E.sh: Add test cases.
Fixes https://bugs.gnu.org/49925
In documentation and comments, don’t assume that secondary storage
devices are disk devices. Similarly, don’t assume that main memory
uses magnetic cores, which became obsolete in the 1970s.
* src/du.c (usage):
* src/ls.c (usage):
* src/shred.c (usage): Reword to avoid “disk” in usage messages.
In preparation for changing the default device number
representation (to decomposed decimal), provide more
formatting options for device numbers.
These new (FreeBSD compat) formatting options are added:
%Hd major device number in decimal (st_dev)
%Ld minor device number in decimal (st_dev)
%Hr major device type in decimal (st_rdev)
%Lr minor device type in decimal (st_rdev)
%r (composed) device type in decimal (st_rdev)
%R (composed) device type in hex (st_rdev)
* doc/coreutils.texi (stat invocation): Document new formats.
* src/stat.c (print_it): Handle the new %H and %L modifiers.
(print_statfs): Adjust to passing the format as two chars
rather than an int. Using an int was introduced in commit db42ae78,
but using separate chars is cleaner and more extensible.
(print_stat): Likewise. Handle any modifiers and the new 'r' format.
(usage): Document the new formats.
* tests/misc/stat-fmt.sh: Add a test case for new modifiers.
Addresses https://bugs.gnu.org/48960
Use cpuid to detect CPU support for hardware instruction.
Fall back to slice by 8 algorithm if not supported.
A 500MiB file improves from 1.40s to 0.67s on an i3-2310M
* configure.ac [USE_PCLMUL_CRC32]: A new conditional,
set when __get_cpuid() and clmul compiler intrinsics are supported.
* src/cksum.c (pclmul_supported): A new function using __get_cpuid()
to determine if pclmul instructions are supported.
(cksum): A new function refactored from cksum_slice8(),
which calls pclmul_supported() and then cksum_slice8()
or cksum_pclmul() as appropriate.
* src/cksum.h: Export the crctab array for use in the new module.
* src/cksum_pclmul.c: A new module to implement using pclmul intrinsics.
* po/POTFILES.in: Reference the new cksum_pclmul module.
* src/local.mk: Likewise. Note we build it as a separate library
so that it can be portably built with separate -mavx etc. flags.
* tests/misc/cksum.sh: Add new test modes for pertinent buffer sizes.
- \r\n is common a line end combination
- catting such a file without options causes it to display normally
- overwriting the first char with $, loses info
* src/cat.c (cat): Convert \r preceeding a \n to ^M.
* tests/misc/cat-E.sh: New test.
* tests/local.mk: Reference new test.
* tests/misc/cat-proc.sh: Fix typo.
* doc/coreutils.texi (cat invocation): Mention the new behavior.
* NEWS: Mention the improvement.
Run "make update-copyright" and then...
* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
* tests/sample-test: Adjust to use the single most recent year.
* src/nl.c (main): Enforce the POSIX specified
behavior of assuming ':' is specified after a single
character argument to -d.
* tests/misc/nl.sh: Add a test case.
* NEWS: Mention the bug fix.
* doc/coreutils.texi (nl invocation): Mention the GNU extensions
of allowing arbitrary length and empty delimiter strings.
* src/nl.c (usage): Likewise.
* tests/misc/nl.sh: Add test cases for the GNU extensions.
The format can be determined from --options or the locale,
so it's useful to output the format string being used.
* src/date.c (show_date): Show the output format
along with the date being shown.
* tests/misc/date-debug.sh: Adjust accordingly.
Addresses https://bugs.gnu.org/44960
This crash was identified by Cyber Independent Testing Lab:
https://cyber-itl.org/2020/10/28/citl-7000-defects.html
and was introduced with commit v8.5-163-g3f48829c2
* src/tr.c (validate_case_classes): Don't apply these
extra case alignment checks in the --complement case,
which is even more restrictive as to the contents of SET2.
* tests/misc/tr-case-class.sh: Add a test case,
for a large SET1, which caused the length adjustment
in validate_case_classes to underflow and trigger the assert.
* NEWS: Mention the bug fix.
Previously we would have failed immediately upon internal overflow,
which didn't output the full line being processed, and assumed
there would be another numbered line.
* src/nl.c (line_no_overflow): A new global to track overflow.
(print_lineno): Only fail if about to output an overflowed number.
(reset_lineno): A new function to refactor resetting of the number,
and which also clears line_no_overflow.
* tests/misc/nl.sh: Add a test case.
* src/csplit.c (process_regexp): Process the line suppression
in all invocations so that the last match is suppressed.
Previously with a non infinite match count,
the last regex pattern was not suppressed.
* NEWS: Mention the bug fix.
* tests/misc/csplit-suppress-matched.pl: Add a test case.
Fixes https://bugs.gnu.org/42764