1
0
mirror of git://git.sv.gnu.org/coreutils.git synced 2026-02-18 05:12:15 +02:00
Commit Graph

29986 Commits

Author SHA1 Message Date
Pádraig Brady
0ee8b03422 doc: touch: clarify --time description in man page
* src/touch.c (usage): Reorganise the description to be similar to
the format used for the ls --time description, which formats better
when converted to a man page.  Also separate the description
to allow for more granular translations.
Fixes https://bugs.gnu.org/67656
2023-12-06 13:07:43 +00:00
dann frazier
73d119f4f8 tail: fix tailing sysfs files where PAGE_SIZE > BUFSIZ
* src/tail.c (file_lines): Ensure we use a buffer size >= PAGE_SIZE when
searching backwards to avoid seeking within a file,
which on sysfs files is accepted but also returns no data.
* tests/tail/tail-sysfs.sh: Add a new test.
* tests/local.mk: Reference the new test.
* NEWS: Mention the bug fix.
Fixes https://bugs.gnu.org/67490
2023-12-01 23:01:32 +00:00
Pádraig Brady
615167cc4d numfmt: support lowercase 'k' for Kilo and Kibi
For consistency with the "SI" standard, and with other coreutils
which output a lowercase 'k' in "SI" mode.

* src/numfmt.c (suffix_power): Treat 'k' like 'K' on input.
(double_to_human): Output lowercase 'k' in SI mode.
(usage): Adjust accordingly.
* doc/coreutils.texi: Mention 'k' accepted, and printed in SI mode.
* tests/misc/numfmt.pl: Adjust accordingly.
* NEWS: Mention the change in behavior.
Fixes https://bugs.gnu.org/47103
2023-11-27 19:41:49 +00:00
Paul Eggert
74b9d6a6e8 uniq: fix bug with -w in multibyte locales
-w counted bytes not characters, which is wrong in multibyte locales.
This bug exists even in Fedora, which is why the recently-added
test cases from Fedora didn’t catch it.
* src/uniq.c (find_field): New arg PLEN.  All callers changed.
Compute length of field correctly in multi-byte locales.
(different): Don’t worry about check_chars; find_field now does that.
* tests/uniq/uniq.pl: Test for this bug.
2023-11-16 11:37:25 -08:00
Paul Eggert
0ed9d1823a tests: omit inapplicable test code
* tests/misc/join.pl, tests/uniq/uniq.pl:
Remove test for "invalid byte, character or field list" message
that is not generated.
2023-11-16 11:37:25 -08:00
Paul Eggert
77201c506f uniq: change macro to function
* src/uniq.c (swap_lines): New static function, replacing
the old SWAP_LINES macro.  These days this is just as fast.
All uses changed.
2023-11-16 11:37:25 -08:00
Paul Eggert
3bee7c9754 uniq: prefer static init
* src/uniq.c (skip_fields, skip_chars, check_chars, count_occurrences)
(output_unique, output_first_repeated, output_later_repeated)
(delimit_groups): Initialize statically, rather than in ‘main’.
This shrinks the executable a bit.
2023-11-16 11:37:25 -08:00
Paul Eggert
a72b7823b4 uniq: simplify and fix unlikely bug by using bool
* src/uniq.c (enum countmode): Remove this type.
(count_occurrences): New static var, replacing the old countmode,
and of type boolean instead of a two-value enum type that was
confusing (and which caused a hard-to-test bug when the count
exceeded INTMAX_MAX - 1).  All uses changed.
2023-11-16 11:37:25 -08:00
Paul Eggert
a257b63ce7 uniq: prefer signed integers
* src/uniq.c (skip_fields, skip_chars, check_chars, size_opt)
(find_field, different, writeline, check_file, main):
Prefer signed to unsigned integer types, since this allows
for better runtime checking with -fsanitize=undefined.
2023-11-14 23:15:18 -08:00
Paul Eggert
23e26ed972 maint: DECIMAL_DIGIT_ACCUMULATE uses stdckdint.h
* src/system.h: Include <stdckdint.h>, since the new
DECIMAL_DIGIT_ACCUMULATE uses it.
Do not include stdckdint.h from files that also include system.h.
(DECIMAL_DIGIT_ACCUMULATE): Omit last arg, which is no longer needed.
Reimplement by using C23-style stdckdint.h’s ckd_mul and ckd_add,
as that’s more standard and is more likely to generate better code.
2023-11-14 20:38:24 -08:00
Paul Eggert
3e0d7787e6 pinky: fix string size calculation
* src/pinky.c (count_ampersands): Simplify and return idx_t.
(create_fullname): Compute proper destination string size,
basically, by adding (ulen - 1) * ampersands rather than ulen *
(ampersands - 1).  Problem found on CHERI-64.
2023-11-11 00:17:49 -08:00
Paul Eggert
4c15a1b6e6 maint: port randread to FreeBSD 14
* gl/lib/randread.c (POINTER_IS_ALIGNED): Rename from
ALIGNED_POINTER to avoid a collision with <machine/param.h>
on FreeBSD 14.
2023-11-11 00:17:49 -08:00
Paul Eggert
394b29aaff build: update gnulib submodule to latest 2023-11-11 00:17:48 -08:00
Pádraig Brady
7f2c97a241 ls: fix recent regression in size alignment
* src/ls.c (print_long_format): Use correct column width,
introduced due to a copy/paste error in commit v9.4-2-gcbb6dfec5
* tests/ls/size-align.sh: Add a test.
* tests/local.mk: Reference the new test.
Fixes https://bugs.gnu.org/66919
2023-11-03 16:34:38 +00:00
Paul Eggert
56e9acb292 join: fix recently introduced NUL bug
* src/join.c (xfields): Simplify and fix bug with fields
that start with a NUL byte when -t is not used.
* tests/misc/join-utf8.sh: Also test when -t is not used,
and when a field starts with NUL.
2023-10-30 10:49:44 -07:00
Paul Eggert
bd45f0963c maint: pacify ‘make syntax-check’
* tests/misc/join-utf8.sh: Omit fail=0.
Fix framework_failure_ typo.
* tests/misc/join.pl: Change ` to '.
2023-10-30 01:33:19 -07:00
Paul Eggert
ba5017b65a maint: copy join, uniq tests from Fedora
* tests/misc/join.pl, tests/uniq/uniq.pl:
Copy from Fedora 39.  This adds more multi-byte tests.
2023-10-30 01:24:43 -07:00
Paul Eggert
11b01fc21f join,uniq: support multi-byte separators
* NEWS: Mention this.
* bootstrap.conf (gnulib_modules): Remove cu-ctype, as this module
is now more trouble than it’s worth.  All uses removed.
Add skipchars.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
Remove.
* gl/lib/skipchars.c, gl/lib/skipchars.h, gl/modules/skipchars:
* tests/misc/join-utf8.sh:
New files.
* src/join.c: Include skipchars.h and mcel.h instead of cu-ctype.h.
(tab): Now mcel_t, not int.  All uses changed.
(output_separator, output_seplen): New static vars.
(eq_tab, newline_or_blank, comma_or_blank): New functions.
(xfields, prfields, prjoin, add_field_list, main):
Support multi-byte characters.
* src/numfmt.c: Include ctype.h, skipchars.h.
Do not include cu-ctype.h.
(newline_or_blank): New function.
(next_field): Support multi-byte characters.
* src/sort.c: Include ctype.h instead of cu-ctype.h.
(inittables): Open-code field_sep since it no longer exists.
‘sort’ is not multi-byte safe yet, but when it is this code
will need revamping anyway.
* src/uniq.c: Include mcel.h and skipchars.h instead of cu-ctype.h.
(newline_or_blank): New function.
(find_field): Support multi-byte characters.
* tests/local.mk (all_tests): Add tests/misc/join-utf8.sh
2023-10-30 00:58:04 -07:00
Paul Eggert
2709bea0f4 test: allow non-blank white space in numbers
* src/test.c (find_int): Use isspace, not isblank,
for compatibility with how strtol works, which
is how most other shells do this.
2023-10-30 00:58:04 -07:00
Paul Eggert
a3ce33c106 stdbuf: port to oddball toupper
* src/stdbuf.c: Do not include ctype.h.
(set_libstdbuf_options): Use c_toupper, not toupper,
since the C locale is intended here.
2023-10-30 00:58:04 -07:00
Paul Eggert
8d60cd8ad6 dircolors: assume C-locale spaces
* src/dircolors.c: Include c-ctype.h, not ctype.h.
(parse_line): Use c_isspace, not isspace, as the .dircolors
file format (which does not seem to be documented!) appears
to be ASCII.
2023-10-30 00:58:04 -07:00
Paul Eggert
5602342a16 maint: port to oddball tolower
* src/digest.c (hex_equal): Work even in oddball locales
where tolower does not work as expected on ASCII letters.
2023-10-30 00:58:04 -07:00
Paul Eggert
4edb14d20f maint: include ctype.h selectively
Include ctype.h only in files that need it.  Many of its uses
are incorrect, as they assume single-byte locales.  The idea is
to remove the incorrect uses later, when there is time.
* src/chroot.c, src/csplit.c, src/dd.c, src/digest.c, src/dircolors.c:
* src/expand-common.c, src/expand.c, src/fmt.c, src/fold.c, src/ls.c:
* src/od.c, src/pinky.c, src/pr.c, src/ptx.c, src/seq.c:
* src/set-fields.c, src/split.c, src/stdbuf.c, src/test.c:
* src/tr.c, src/truncate.c, src/unexpand.c, src/wc.c:
Include ctype.h.
* src/system.h: Do not include ctype.h.

include ctype.h.o
2023-10-30 00:58:04 -07:00
Paul Eggert
684e810ae2 maint: move field_sep into separate module
This is so that we don’t need to have every source file
include ctype.h.
* bootstrap.conf (gnulib_modules): Add cu-ctype.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
New files.
* src/join.c, src/numfmt.c, src/sort.c, src/uniq.c:
Include cu-ctype.h, for field_sep.
* src/system.h (field_sep): Remove; now supplied by cu-ctype.
2023-10-30 00:58:04 -07:00
Paul Eggert
2f3d9524bb digest: omit unnecessary b2sum includes
* src/blake2/b2sum.c: Do not include string.h, errno.h,
ctype.h, unistd.h, getopt.h.
2023-10-30 00:58:03 -07:00
Paul Eggert
0292a5678a maint: prefer c_isxdigit when that is the intent
* src/digest.c (valid_digits, split_3):
* src/echo.c (main):
* src/printf.c (print_esc):
* src/ptx.c (unescape_string):
* src/stat.c (print_it):
When the code is supposed to support only POSIX-locale hex digits,
use c_isxdigit rather than isxdigit.  Include c-ctype.h as needed.
This defends against oddball locales where isxdigit != c_isxdigit.
2023-10-30 00:58:03 -07:00
Pádraig Brady
f7e25d5bb5 maint: fix syntax check issue
* src/basenc.c: Fix preprocessor indentation.
2023-10-28 13:13:50 +01:00
Pádraig Brady
8c735f6585 base32,base64: disallow non-canonical encodings
This will make decoding more resilient to corruption
whether due to transmission errors or nefarious adjustment.
See https://eprint.iacr.org/2022/361.pdf

* gnulib: Update to commit 3f463202bd enforcing canonical encoding.
* tests/basenc/base64.pl: Add test cases, and adjust existing cases.
* NEWS: Mention the change in behavior.
2023-10-28 13:13:34 +01:00
Paul Eggert
60bd7bad9d basenc: fix unlikely locale issue; tune
This sped up ‘basenc -d --base16’ by 60% on my old platform,
AMD Phenom II X4 910e, Fedora 38.
* src/basenc.c (struct base16_decode_context): Simplify by
omitting have_nibble.  ‘nibble’ is now negative if it’s missing.
All uses changed.
(B16): New macro, inspired by ../lib/base64.c.
(base16_to_int): New static var, likewise.
(isubase16): Reimplement using base16_to_int, since isxdigit is
not guaranteed to succeed on the chars we want when the locale is
oddball.
(base16_decode_ctx): Tune by using base16_to_int and by
2023-10-25 15:09:27 -07:00
Paul Eggert
dcc1514d9a basenc: tweak checks to use unsigned char
This tends to generate better code, at least on x86-64,
because callers are just as fast and callees can avoid a conversion.
* src/basenc.c: The following renamings also change the arg type
from char to unsigned char.  All uses changed.
(isubase): Rename from isbase.
(isubase64url): Rename from isbase64url.
(isubase32hex): Rename from isbase32hex.
(isubase16): Rename from isbase16.
(isuz85): Rename from isz85.
(isubase2): Rename from isbase2.

2023-10-24  Paul Eggert  <eggert@cs.ucla.edu>

* src/basenc.c (struct base16_decode_context):
Simplify by storing -1 for missing nibbles.  All uses changed.
2023-10-25 15:09:27 -07:00
Paul Eggert
f4a59d453e build: update gnulib submodule to latest 2023-10-25 15:09:27 -07:00
Pádraig Brady
5f538c27a1 basenc: --base16: also allow lower case with --ignore-garbage
* src/basenc.c (isbase16): Also return true for lower case.
* tests/basenc/basenc.pl: Add a test case.
Reported by Paul Eggert.
2023-10-25 14:04:00 +01:00
Pádraig Brady
d733f2ec26 basenc: --base16: support lower case hex digits
* src/basenc.c (base16_decode_ctx): Convert to uppercase
before converting from hex.
* tests/basenc/basenc.pl: Add a test case.
* NEWS: Mention the change in behavior.
Addresses https://bugs.gnu.org/66698
2023-10-23 14:04:38 +01:00
Pádraig Brady
2e0dcd87bf doc: fix RFC references
* doc/coreutils.texi: Adjust RFC URLs as the original
now give 404 errors.
2023-10-23 12:29:03 +01:00
Pádraig Brady
caa716803a tests: move all basenc tests to their own directory
* tests/misc/base64.pl: Move to tests/basenc/base64.pl
* tests/misc/basenc.pl: Move to tests/basenc/basenc.pl
* tests/local.mk: Adjust accordingly
2023-10-06 18:22:35 +01:00
Pádraig Brady
378dc38f48 basenc: auto pad base32 and base64 inputs when decoding
Padding of encoded data is useful in cases where
base64 encoded data is concatenated / streamed.
I.e. where there are padding chars _within_ the stream.
In other cases padding is optional and can be inferred.
Note we continue to treat partial padding as invalid,
as that would be indicative of truncation.

* src/basenc.c (do_decode): Auto pad the end of the input.
* NEWS: Mention the change in behavior.
* tests/misc/base64.pl: Adjust to not fail for missing padding.
Addresses https://bugs.gnu.org/66265
2023-10-06 18:21:12 +01:00
Paul Eggert
a2434d3e58 sort: improve --help
Problem reported by Jorge Stolfi (bug#66253).
* src/sort.c (usage): Suggest looking at the manual for -n details.
2023-09-28 18:03:34 -07:00
Pádraig Brady
0c46704832 doc: rm --help: mention that '.' or '..' are rejected
* src/rm.c (usage): State that '.' or '..' are rejected.
2023-09-25 15:26:31 +01:00
Paul Eggert
de4e704273 wc: pacify ‘make syntax-check’
* src/wc_avx2.c (wc_lines_avx2): Explicitly make it ‘extern’.
Not sure why this is needed.
2023-09-23 17:20:26 -07:00
Paul Eggert
2245a95806 wc: distribute src/wc.h
* src/local.mk (noinst_HEADERS): Add src/wc.h.
2023-09-23 17:20:25 -07:00
Paul Eggert
f40c6b5cf2 wc: goto considered harmful
* src/wc.c: Do not include assure.h.  Replace the only
use of ‘assure’ with ‘unreachable’ which is good enough.
(wc, main): Remove labels and gotos.  This doesn’t affect
performance in any way I can measure, and makes the code
a bit easier to follow.
2023-09-23 17:07:52 -07:00
Paul Eggert
6b8b1f9e77 wc: prefer signed integers
Prefer signed to unsigned integers, to make it easier to catch
integer overflow errors.
* src/wc.c: Do not include safe-read.
(total_lines_overflow, total_words_overflow, total_chars_overflow)
(total_bytes_overflow): Now bool, not uintmax_t.  All uses changed.
(max_line_length): Now intmax_t, not uintmax_t.  All uses changed.
The total_... vars are still uintmax_t because overflow into them
is checked.
(page_size): Now idx_t, not size_t.
(wc_lines, wc, get_input_fstatus, compute_number_width, main):
Prefer signed to unsigned ints where either should do.
(wc_lines, wc): Use read rather than safe_read, since we don’t
need safe_read’s checks for huge buffers.
(wc): Redo call to mbrtoc32 to lessen the number of comparisons
against its returned value.  Do this partly by keeping a pointer
to the end of the buffer rather than a count.  Simplify
overflow-checking code.
(compute_number_width): Check for integer overflow.
Don’t assume size_t fits into unsigned long.
* src/wc.h (struct wc_lines): Prefer signed integers.
* src/wc_avx2.c: Do not include safe-read.h.
(wc_lines_avx2): Prefer signed integers.  Use read, not safe_read.
2023-09-23 17:07:52 -07:00
Paul Eggert
8d41285fe4 wc: improve avx2 API
* src/wc.c: Use "#include <...>" for files not in the current dir.
Include "wc.h" instead of declaring wc_lines_avx2 by hand.
(wc_lines): New API, with no file name (no longer needed) and
with a return struct rather than arg pointers.  All uses changed.
Use avx2_supported directly instead of using a function pointer.
Exploit C99-style declarations after statements.
Multiply by 15 rather than dividing; it’s faster and more accurate
and cannot overflow here.
(wc): Simplify based on wc_lines API change.
* src/wc.h: New file.
* src/wc_avx2.c: Include it, to check API better.
(wc_lines_avx2): Use new API.  All uses changed.  Exploit C99.
Make locals more local.
2023-09-23 17:07:52 -07:00
Paul Eggert
769ace51e8 factor,tail: avoid quadratic reallocation
* src/factor.c (struct mp_factors): New member nalloc.
(mp_factor_init): Initialize it.
* src/factor.c (mp_factor_insert):
* src/tail.c (parse_options): Use xpalloc to avoid quadratic
worst-case behavior on reallocation.
* src/tail.c (pids_alloc): New static var.
2023-09-23 01:15:50 -07:00
Paul Eggert
9ecc4f4e44 doc: mention Unicode exceptions for wc 2023-09-23 00:28:28 -07:00
Paul Eggert
a6064bb864 wc: simplify by removing SUPPORT_OLD_MBRTOWC
* src/wc.c (SUPPORT_OLD_MBRTOWC): Remove.  All uses removed.
(wc): Simplify by assuming C99-or-later behavior for mbrtoc32,
which after all is a C11 API.  Fix the !SUPPORT_OLD_MBRTOWC
code, which evidently was never tested seriously.
2023-09-23 00:28:27 -07:00
Paul Eggert
17a9e79023 wc: 3× speedup in C locale
The 3× speedup was measured by invoking 'wc $(find * -type f)'
on the coreutils sources etc. on an Ubuntu 23.04 x86-64.
These changes also speed up wc 20% in UTF-8 locales.
* src/wc.c (wc_isprint, wc_isspace): New static vars.
(wc): Use them for speed.
(main): Initialize them if needed.
(isnbspace): Remove; no longer used.
2023-09-23 00:28:27 -07:00
Paul Eggert
bee39b93f5 wc: treat encoding errors as non white space
* src/wc.c (wc): Treat encoding errors like non white space
characters.
2023-09-23 00:28:27 -07:00
Paul Eggert
31076e8689 wc: fix word count bug
* bootstrap.conf (gnulib_modules): Remove c32isprint.
* src/wc.c (wc): Consider all non-white-space characters
to be word constituents, even if they are not printable.
POSIX requires this, and it is what BSD does.
Partly do this by simplifying the check for a word,
by counting word starts rather than word ends.
* tests/wc/wc.pl: Test for the bug.
2023-09-23 00:28:27 -07:00
Paul Eggert
a6648d4102 maint: omit some unused function tests
* m4/jm-macros.m4: Do not check for ftruncate, iswspace,
mkfifo, mbrlen, sysctl.  Coreutils no longer uses the
corresponding HAVE_* macros, typically because Gnulib
handles them now.
* src/wc.c (iswspace): Remove; unused.
2023-09-23 00:28:27 -07:00