1
0
mirror of git://git.sv.gnu.org/coreutils.git synced 2026-04-20 02:36:16 +02:00
Commit Graph

8797 Commits

Author SHA1 Message Date
Paul Eggert
11b01fc21f join,uniq: support multi-byte separators
* NEWS: Mention this.
* bootstrap.conf (gnulib_modules): Remove cu-ctype, as this module
is now more trouble than it’s worth.  All uses removed.
Add skipchars.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
Remove.
* gl/lib/skipchars.c, gl/lib/skipchars.h, gl/modules/skipchars:
* tests/misc/join-utf8.sh:
New files.
* src/join.c: Include skipchars.h and mcel.h instead of cu-ctype.h.
(tab): Now mcel_t, not int.  All uses changed.
(output_separator, output_seplen): New static vars.
(eq_tab, newline_or_blank, comma_or_blank): New functions.
(xfields, prfields, prjoin, add_field_list, main):
Support multi-byte characters.
* src/numfmt.c: Include ctype.h, skipchars.h.
Do not include cu-ctype.h.
(newline_or_blank): New function.
(next_field): Support multi-byte characters.
* src/sort.c: Include ctype.h instead of cu-ctype.h.
(inittables): Open-code field_sep since it no longer exists.
‘sort’ is not multi-byte safe yet, but when it is this code
will need revamping anyway.
* src/uniq.c: Include mcel.h and skipchars.h instead of cu-ctype.h.
(newline_or_blank): New function.
(find_field): Support multi-byte characters.
* tests/local.mk (all_tests): Add tests/misc/join-utf8.sh
2023-10-30 00:58:04 -07:00
Paul Eggert
2709bea0f4 test: allow non-blank white space in numbers
* src/test.c (find_int): Use isspace, not isblank,
for compatibility with how strtol works, which
is how most other shells do this.
2023-10-30 00:58:04 -07:00
Paul Eggert
a3ce33c106 stdbuf: port to oddball toupper
* src/stdbuf.c: Do not include ctype.h.
(set_libstdbuf_options): Use c_toupper, not toupper,
since the C locale is intended here.
2023-10-30 00:58:04 -07:00
Paul Eggert
8d60cd8ad6 dircolors: assume C-locale spaces
* src/dircolors.c: Include c-ctype.h, not ctype.h.
(parse_line): Use c_isspace, not isspace, as the .dircolors
file format (which does not seem to be documented!) appears
to be ASCII.
2023-10-30 00:58:04 -07:00
Paul Eggert
5602342a16 maint: port to oddball tolower
* src/digest.c (hex_equal): Work even in oddball locales
where tolower does not work as expected on ASCII letters.
2023-10-30 00:58:04 -07:00
Paul Eggert
4edb14d20f maint: include ctype.h selectively
Include ctype.h only in files that need it.  Many of its uses
are incorrect, as they assume single-byte locales.  The idea is
to remove the incorrect uses later, when there is time.
* src/chroot.c, src/csplit.c, src/dd.c, src/digest.c, src/dircolors.c:
* src/expand-common.c, src/expand.c, src/fmt.c, src/fold.c, src/ls.c:
* src/od.c, src/pinky.c, src/pr.c, src/ptx.c, src/seq.c:
* src/set-fields.c, src/split.c, src/stdbuf.c, src/test.c:
* src/tr.c, src/truncate.c, src/unexpand.c, src/wc.c:
Include ctype.h.
* src/system.h: Do not include ctype.h.

include ctype.h.o
2023-10-30 00:58:04 -07:00
Paul Eggert
684e810ae2 maint: move field_sep into separate module
This is so that we don’t need to have every source file
include ctype.h.
* bootstrap.conf (gnulib_modules): Add cu-ctype.
* gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype:
New files.
* src/join.c, src/numfmt.c, src/sort.c, src/uniq.c:
Include cu-ctype.h, for field_sep.
* src/system.h (field_sep): Remove; now supplied by cu-ctype.
2023-10-30 00:58:04 -07:00
Paul Eggert
2f3d9524bb digest: omit unnecessary b2sum includes
* src/blake2/b2sum.c: Do not include string.h, errno.h,
ctype.h, unistd.h, getopt.h.
2023-10-30 00:58:03 -07:00
Paul Eggert
0292a5678a maint: prefer c_isxdigit when that is the intent
* src/digest.c (valid_digits, split_3):
* src/echo.c (main):
* src/printf.c (print_esc):
* src/ptx.c (unescape_string):
* src/stat.c (print_it):
When the code is supposed to support only POSIX-locale hex digits,
use c_isxdigit rather than isxdigit.  Include c-ctype.h as needed.
This defends against oddball locales where isxdigit != c_isxdigit.
2023-10-30 00:58:03 -07:00
Pádraig Brady
f7e25d5bb5 maint: fix syntax check issue
* src/basenc.c: Fix preprocessor indentation.
2023-10-28 13:13:50 +01:00
Paul Eggert
60bd7bad9d basenc: fix unlikely locale issue; tune
This sped up ‘basenc -d --base16’ by 60% on my old platform,
AMD Phenom II X4 910e, Fedora 38.
* src/basenc.c (struct base16_decode_context): Simplify by
omitting have_nibble.  ‘nibble’ is now negative if it’s missing.
All uses changed.
(B16): New macro, inspired by ../lib/base64.c.
(base16_to_int): New static var, likewise.
(isubase16): Reimplement using base16_to_int, since isxdigit is
not guaranteed to succeed on the chars we want when the locale is
oddball.
(base16_decode_ctx): Tune by using base16_to_int and by
2023-10-25 15:09:27 -07:00
Paul Eggert
dcc1514d9a basenc: tweak checks to use unsigned char
This tends to generate better code, at least on x86-64,
because callers are just as fast and callees can avoid a conversion.
* src/basenc.c: The following renamings also change the arg type
from char to unsigned char.  All uses changed.
(isubase): Rename from isbase.
(isubase64url): Rename from isbase64url.
(isubase32hex): Rename from isbase32hex.
(isubase16): Rename from isbase16.
(isuz85): Rename from isz85.
(isubase2): Rename from isbase2.

2023-10-24  Paul Eggert  <eggert@cs.ucla.edu>

* src/basenc.c (struct base16_decode_context):
Simplify by storing -1 for missing nibbles.  All uses changed.
2023-10-25 15:09:27 -07:00
Pádraig Brady
5f538c27a1 basenc: --base16: also allow lower case with --ignore-garbage
* src/basenc.c (isbase16): Also return true for lower case.
* tests/basenc/basenc.pl: Add a test case.
Reported by Paul Eggert.
2023-10-25 14:04:00 +01:00
Pádraig Brady
d733f2ec26 basenc: --base16: support lower case hex digits
* src/basenc.c (base16_decode_ctx): Convert to uppercase
before converting from hex.
* tests/basenc/basenc.pl: Add a test case.
* NEWS: Mention the change in behavior.
Addresses https://bugs.gnu.org/66698
2023-10-23 14:04:38 +01:00
Pádraig Brady
378dc38f48 basenc: auto pad base32 and base64 inputs when decoding
Padding of encoded data is useful in cases where
base64 encoded data is concatenated / streamed.
I.e. where there are padding chars _within_ the stream.
In other cases padding is optional and can be inferred.
Note we continue to treat partial padding as invalid,
as that would be indicative of truncation.

* src/basenc.c (do_decode): Auto pad the end of the input.
* NEWS: Mention the change in behavior.
* tests/misc/base64.pl: Adjust to not fail for missing padding.
Addresses https://bugs.gnu.org/66265
2023-10-06 18:21:12 +01:00
Paul Eggert
a2434d3e58 sort: improve --help
Problem reported by Jorge Stolfi (bug#66253).
* src/sort.c (usage): Suggest looking at the manual for -n details.
2023-09-28 18:03:34 -07:00
Pádraig Brady
0c46704832 doc: rm --help: mention that '.' or '..' are rejected
* src/rm.c (usage): State that '.' or '..' are rejected.
2023-09-25 15:26:31 +01:00
Paul Eggert
de4e704273 wc: pacify ‘make syntax-check’
* src/wc_avx2.c (wc_lines_avx2): Explicitly make it ‘extern’.
Not sure why this is needed.
2023-09-23 17:20:26 -07:00
Paul Eggert
2245a95806 wc: distribute src/wc.h
* src/local.mk (noinst_HEADERS): Add src/wc.h.
2023-09-23 17:20:25 -07:00
Paul Eggert
f40c6b5cf2 wc: goto considered harmful
* src/wc.c: Do not include assure.h.  Replace the only
use of ‘assure’ with ‘unreachable’ which is good enough.
(wc, main): Remove labels and gotos.  This doesn’t affect
performance in any way I can measure, and makes the code
a bit easier to follow.
2023-09-23 17:07:52 -07:00
Paul Eggert
6b8b1f9e77 wc: prefer signed integers
Prefer signed to unsigned integers, to make it easier to catch
integer overflow errors.
* src/wc.c: Do not include safe-read.
(total_lines_overflow, total_words_overflow, total_chars_overflow)
(total_bytes_overflow): Now bool, not uintmax_t.  All uses changed.
(max_line_length): Now intmax_t, not uintmax_t.  All uses changed.
The total_... vars are still uintmax_t because overflow into them
is checked.
(page_size): Now idx_t, not size_t.
(wc_lines, wc, get_input_fstatus, compute_number_width, main):
Prefer signed to unsigned ints where either should do.
(wc_lines, wc): Use read rather than safe_read, since we don’t
need safe_read’s checks for huge buffers.
(wc): Redo call to mbrtoc32 to lessen the number of comparisons
against its returned value.  Do this partly by keeping a pointer
to the end of the buffer rather than a count.  Simplify
overflow-checking code.
(compute_number_width): Check for integer overflow.
Don’t assume size_t fits into unsigned long.
* src/wc.h (struct wc_lines): Prefer signed integers.
* src/wc_avx2.c: Do not include safe-read.h.
(wc_lines_avx2): Prefer signed integers.  Use read, not safe_read.
2023-09-23 17:07:52 -07:00
Paul Eggert
8d41285fe4 wc: improve avx2 API
* src/wc.c: Use "#include <...>" for files not in the current dir.
Include "wc.h" instead of declaring wc_lines_avx2 by hand.
(wc_lines): New API, with no file name (no longer needed) and
with a return struct rather than arg pointers.  All uses changed.
Use avx2_supported directly instead of using a function pointer.
Exploit C99-style declarations after statements.
Multiply by 15 rather than dividing; it’s faster and more accurate
and cannot overflow here.
(wc): Simplify based on wc_lines API change.
* src/wc.h: New file.
* src/wc_avx2.c: Include it, to check API better.
(wc_lines_avx2): Use new API.  All uses changed.  Exploit C99.
Make locals more local.
2023-09-23 17:07:52 -07:00
Paul Eggert
769ace51e8 factor,tail: avoid quadratic reallocation
* src/factor.c (struct mp_factors): New member nalloc.
(mp_factor_init): Initialize it.
* src/factor.c (mp_factor_insert):
* src/tail.c (parse_options): Use xpalloc to avoid quadratic
worst-case behavior on reallocation.
* src/tail.c (pids_alloc): New static var.
2023-09-23 01:15:50 -07:00
Paul Eggert
a6064bb864 wc: simplify by removing SUPPORT_OLD_MBRTOWC
* src/wc.c (SUPPORT_OLD_MBRTOWC): Remove.  All uses removed.
(wc): Simplify by assuming C99-or-later behavior for mbrtoc32,
which after all is a C11 API.  Fix the !SUPPORT_OLD_MBRTOWC
code, which evidently was never tested seriously.
2023-09-23 00:28:27 -07:00
Paul Eggert
17a9e79023 wc: 3× speedup in C locale
The 3× speedup was measured by invoking 'wc $(find * -type f)'
on the coreutils sources etc. on an Ubuntu 23.04 x86-64.
These changes also speed up wc 20% in UTF-8 locales.
* src/wc.c (wc_isprint, wc_isspace): New static vars.
(wc): Use them for speed.
(main): Initialize them if needed.
(isnbspace): Remove; no longer used.
2023-09-23 00:28:27 -07:00
Paul Eggert
bee39b93f5 wc: treat encoding errors as non white space
* src/wc.c (wc): Treat encoding errors like non white space
characters.
2023-09-23 00:28:27 -07:00
Paul Eggert
31076e8689 wc: fix word count bug
* bootstrap.conf (gnulib_modules): Remove c32isprint.
* src/wc.c (wc): Consider all non-white-space characters
to be word constituents, even if they are not printable.
POSIX requires this, and it is what BSD does.
Partly do this by simplifying the check for a word,
by counting word starts rather than word ends.
* tests/wc/wc.pl: Test for the bug.
2023-09-23 00:28:27 -07:00
Paul Eggert
a6648d4102 maint: omit some unused function tests
* m4/jm-macros.m4: Do not check for ftruncate, iswspace,
mkfifo, mbrlen, sysctl.  Coreutils no longer uses the
corresponding HAVE_* macros, typically because Gnulib
handles them now.
* src/wc.c (iswspace): Remove; unused.
2023-09-23 00:28:27 -07:00
Paul Eggert
14d35d5bad maint: prefer char32_t to wchar_t
This should work better on non-glibc platforms that don’t
use Unicode for wchar_t.  However, POSIX appears to prohibit
this for printf.c so leave that alone.
* bootstrap.conf (gnulib_modules): Add btoc32, c32iscntrl,
c32isprint, c32isspace, c32width, mbrtoc32.  Remove btoc, wcwidth.
* src/df.c, src/ls.c, src/wc.c:
Include uchar.h instead of wchar.h and wctype.h.
* src/df.c (replace_invalid_chars):
* src/ls.c (quote_name_buf):
* src/wc.c (isnbspace, wc):
Use char32_t instead of wchar_t.
2023-09-23 00:28:27 -07:00
Paul Eggert
c5a210a9c8 wc: simplify #if MB_LEN_MAX
* src/wc.c: Don’t have special #ifs for platforms where
MB_LEN_MAX is 1.  On these platforms, MB_CUR_MAX is 1 as well,
so the compiler should optimize away all multi-byte code.
2023-09-23 00:28:26 -07:00
Paul Eggert
fb51f74ff6 wc: avoid undefined conversion state
* src/wc.c (wc): When mbrtowc returns (size_t) -1, zero the
conversion state, since POSIX says it’s undefined.
2023-09-23 00:28:26 -07:00
Paul Eggert
092f8178c0 maint: use mbszero
* bootstrap.conf (gnulib_modules): Add mbszero.
* src/df.c (replace_invalid_chars):
* src/ls.c (quote_name_buf):
* src/pathchk.c (portable_chars_only):
* src/printf.c (STRTOX):
* src/wc.c (wc):
Prefer mbszero to clearing an mbstate_t by hand.
2023-09-23 00:28:26 -07:00
Paul Eggert
6c16044d8d wc: stop worrying about EBCDIC, shift-JIS, etc
* src/wc.c: Do not include mbchar.h.
(wc): Check for ASCII characters instead of using is_basic.
Other parts of Gnulib and coreutils already assume the encoding
is upward compatible with ASCII, and the old code wouldn’t
have worked anyway with shift-JIS.
2023-09-23 00:28:26 -07:00
Paul Eggert
17bddc047b expr: use mcel
The mcel API is simpler and corresponds more closely to how
Emacs etc. behave when the input has encoding errors,
since it treats each encoding-error byte separately.
* bootstrap.conf (gnulib_modules): Add mcel.
* src/expr.c: Include mcel.h instead of mbuiter.h.
(mbs_logical_cspn, mbs_logical_substr, mbs_offset_to_chars):
Use mcel API.
(mbs_logical_substr): Use ximemdup0 so as not to waste memory in
the result, fixing a FIXME.
2023-09-23 00:28:26 -07:00
Pádraig Brady
2593502290 build: avoid build failures on gcc <= 10, or clang
On gcc 10 the following build failure occurs:
  "error: a label can only be part of a statement
   and a declaration is not a statement"
This is because the current code is non standards conforming,
but GCC >= 11 will compile it (even with the -Wpedantic option).
This issue is tracked for GCC at:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111526

* src/tail.c (parse_options): Avoid a declaration after label,
by using a surrounding block.
2023-09-21 19:04:14 +01:00
Stephen Kitt
d24a117707 tail: allow multiple PIDs
tail can watch multiple files, but currently only a single writer. It
can be useful to watch files from multiple writers, or even processes
not directly related to the files (e.g. watch log files written by a
server process, for the duration of a test driven by a separate
client).

* src/tail.c (writers_are_dead): New function.
(tail_forever): Use it to wait for writers.
(tail_forever_inotify): As above.
(parse_options): Manage --pid options in an array.
* doc/coreutils.texi: Update documentation.
* tests/tail/pid.sh: Add a variant with two PIDs.
* News: Mention the new feature.
2023-09-20 15:53:34 +01:00
Sylvestre Ledru
8367b95a13 ls: --dired now implies long format with hyperlinks disabled
Currently --dired is silently ignored
with conflicting output formats

* src/ls.c (decode_switches): Set default format and hyperlink mode
when the --dired option is specified.
* tests/ls/dired.sh: Check that formats are implied / overridden.
* NEWS: Mention the change in behavior.
* doc/coreutils.texi (ls invocation): Adjust --dired description.
2023-09-17 18:47:50 +01:00
Pádraig Brady
3b0f5b9971 maint: use C99 int size specifiers rather than PRI.MAX defines
Following on from commit v9.3-128-gf31229ebd
replace all uses of the PRI.MAX portability defines
with C99 size specifiers %z, %j, and %t.
2023-09-13 23:08:02 +01:00
Paul Eggert
c7ec75a276 cp,mv,install: add copy_internal comment
* src/copy.c (copy_internal): Add comment about
some particularly tricky logic.
2023-09-08 16:25:39 -07:00
Paul Eggert
3cff27ddc1 cp: avoid needless unlinkat after fstatat ELOOP
* src/copy.c (copy_internal): When cp -f's fstatat fails on the
destination with ELOOP, report an error immediately when fstatat
used AT_SYMLINK_NOFOLLOW, as the later unlinkat would fail too.
2023-09-08 16:25:39 -07:00
Paul Eggert
a66a4b77a5 cp,mv,install: minor copy_internal refactoring
* src/copy.c (copy_internal): Redo to avoid need for calculating
fstatat_flags when not needed.  This is for clarity, not speed.
2023-09-08 16:25:39 -07:00
Paul Eggert
67324bf19c cp,mv,install: fix comment punctuation
* src/copy.h: Fix punctuation in comment.
2023-09-08 16:25:39 -07:00
Paul Eggert
69bd8be403 cp,mv,install: simplify copy_internal
* src/copy.c (copy_internal): Simplify.
2023-09-08 16:25:39 -07:00
Paul Eggert
68f4c238ca maint: prefer psame_inode, PSAME_INODE, STP_*
Prefer psame_inode, PSAME_INODE, STP_NBLOCKS, and STP_BLKSIZE,
which take addresses of objects, to their counterparts that
take the whole objects.  In some cases the whole objects might
not be initialized, which would be undefined behavior strictly
speaking.
* gl/lib/root-dev-ino.h (ROOT_DEV_INO_CHECK):
* src/cp-hash.c (src_to_dest_compare):
* src/ls.c (dev_ino_compare):
* src/pwd.c (robust_getcwd):
Prefer PSAME_INODE to SAME_INODE.
* src/chown-core.c (restricted_chown):
* src/copy.c (copy_reg, same_file_ok, source_is_dst_backup)
(copy_internal):
* src/ln.c (do_link):
* src/pwd.c (logical_getcwd):
* src/sort.c (avoid_trashing_input):
* src/split.c (create):
* src/stat.c (find_bind_mount):
Prefer psame_inode to SAME_INODE.
* src/copy.c (infer_scantype):
* src/du.c (process_file):
* src/ls.c (gobble_file, print_long_format)
(print_file_name_and_frills, length_of_file_name_and_frills):
* src/stat.c (print_stat):
Prefer STP_NBLOCKS to ST_NBLOCKS.
* src/copy.c (copy_reg):
* src/head.c (elide_tail_bytes_file, elide_tail_lines_file):
* src/ioblksize.h (io_blksize):
* src/od.c (skip):
* src/shred.c (do_wipefd):
* src/stat.c (print_stat):
* src/tail.c (tail_bytes):
* src/truncate.c (do_ftruncate):
* src/wc.c (wc):
Prefer STP_BLKSIZE to ST_BLKSIZE.
* src/ioblksize.h (io_blksize):
Arg is now struct stat const *, not struct stat.  All callers changed.
2023-09-04 23:12:02 -07:00
Paul Eggert
65a1c5b441 cp,mv,install: a bit more up-to-date source stat
* src/copy.c (copy_reg): Replace caller’s source status
with the more recent version.
2023-09-04 23:12:02 -07:00
Paul Eggert
9cd52bd999 cp,mv,install: fix chmod on Linux CIFS
This bug occurs only when temporarily setting the mode to the
intersection of old and new modes when changing ownership.
* src/copy.c (owner_failure_ok): Treat EACCES like EPERM.
2023-09-02 13:28:23 -07:00
Paul Eggert
5f97136160 cp,mv,install: fix chown on Linux CIFS
* src/copy.c (chown_failure_ok): Also treat EACCES as OK.
2023-09-01 15:10:45 -07:00
Paul Eggert
de4a5220f2 maint: simplify set_owner
* src/copy.c (HAVE_FCHOWN, fchown): Remove.
(fchmod_or_lchmod): Move up.
(fchown_or_lchown): New function.
(set_owner): Use it to simplify.
2023-09-01 15:10:45 -07:00
Paul Eggert
74439d15d7 chown: port to mingw and MSVC 14
* src/chown-core.c (restricted_chown): Don’t assume fchown exists.
The Gnulib doc says that nowadays this is needed only for
ports to mingw and MSVC 14, but it’s an easy port so let’s do it.
2023-09-01 15:10:44 -07:00
Paul Eggert
e0326b0473 maint: regularize struct initializers
* src/chmod.c (process_file):
* src/df.c (replace_invalid_chars):
* src/iopoll.c (iopoll_internal):
* src/ls.c (quote_name_buf):
* src/pathchk.c (portable_chars_only):
* src/printf.c (STRTOX):
* src/shred.c (main):
* src/stat.c (neg_to_zero, do_stat):
* src/timeout.c (settimeout):
* src/tr.c (card_of_complement):
* src/wc.c (wc):
Prefer ‘{0}’ to initialize everything to zero.
* src/stat.c (do_stat):
* src/timeout.c (settimeout):
Do not assume the usual order for struct members,
as POSIX does not guarantee this.
2023-08-30 20:32:13 -07:00