coreutils

mirror of git://git.sv.gnu.org/coreutils.git synced 2026-04-20 02:36:16 +02:00

Author	SHA1	Message	Date
Paul Eggert	11b01fc21f	join,uniq: support multi-byte separators * NEWS: Mention this. * bootstrap.conf (gnulib_modules): Remove cu-ctype, as this module is now more trouble than it’s worth. All uses removed. Add skipchars. * gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype: Remove. * gl/lib/skipchars.c, gl/lib/skipchars.h, gl/modules/skipchars: * tests/misc/join-utf8.sh: New files. * src/join.c: Include skipchars.h and mcel.h instead of cu-ctype.h. (tab): Now mcel_t, not int. All uses changed. (output_separator, output_seplen): New static vars. (eq_tab, newline_or_blank, comma_or_blank): New functions. (xfields, prfields, prjoin, add_field_list, main): Support multi-byte characters. * src/numfmt.c: Include ctype.h, skipchars.h. Do not include cu-ctype.h. (newline_or_blank): New function. (next_field): Support multi-byte characters. * src/sort.c: Include ctype.h instead of cu-ctype.h. (inittables): Open-code field_sep since it no longer exists. ‘sort’ is not multi-byte safe yet, but when it is this code will need revamping anyway. * src/uniq.c: Include mcel.h and skipchars.h instead of cu-ctype.h. (newline_or_blank): New function. (find_field): Support multi-byte characters. * tests/local.mk (all_tests): Add tests/misc/join-utf8.sh	2023-10-30 00:58:04 -07:00
Paul Eggert	2709bea0f4	test: allow non-blank white space in numbers * src/test.c (find_int): Use isspace, not isblank, for compatibility with how strtol works, which is how most other shells do this.	2023-10-30 00:58:04 -07:00
Paul Eggert	a3ce33c106	stdbuf: port to oddball toupper * src/stdbuf.c: Do not include ctype.h. (set_libstdbuf_options): Use c_toupper, not toupper, since the C locale is intended here.	2023-10-30 00:58:04 -07:00
Paul Eggert	8d60cd8ad6	dircolors: assume C-locale spaces * src/dircolors.c: Include c-ctype.h, not ctype.h. (parse_line): Use c_isspace, not isspace, as the .dircolors file format (which does not seem to be documented!) appears to be ASCII.	2023-10-30 00:58:04 -07:00
Paul Eggert	5602342a16	maint: port to oddball tolower * src/digest.c (hex_equal): Work even in oddball locales where tolower does not work as expected on ASCII letters.	2023-10-30 00:58:04 -07:00
Paul Eggert	4edb14d20f	maint: include ctype.h selectively Include ctype.h only in files that need it. Many of its uses are incorrect, as they assume single-byte locales. The idea is to remove the incorrect uses later, when there is time. * src/chroot.c, src/csplit.c, src/dd.c, src/digest.c, src/dircolors.c: * src/expand-common.c, src/expand.c, src/fmt.c, src/fold.c, src/ls.c: * src/od.c, src/pinky.c, src/pr.c, src/ptx.c, src/seq.c: * src/set-fields.c, src/split.c, src/stdbuf.c, src/test.c: * src/tr.c, src/truncate.c, src/unexpand.c, src/wc.c: Include ctype.h. * src/system.h: Do not include ctype.h.	2023-10-30 00:58:04 -07:00
Paul Eggert	684e810ae2	maint: move field_sep into separate module This is so that we don’t need to have every source file include ctype.h. * bootstrap.conf (gnulib_modules): Add cu-ctype. * gl/lib/cu-ctype.c, gl/lib/cu-ctype.h, gl/modules/cu-ctype: New files. * src/join.c, src/numfmt.c, src/sort.c, src/uniq.c: Include cu-ctype.h, for field_sep. * src/system.h (field_sep): Remove; now supplied by cu-ctype.	2023-10-30 00:58:04 -07:00
Paul Eggert	2f3d9524bb	digest: omit unnecessary b2sum includes * src/blake2/b2sum.c: Do not include string.h, errno.h, ctype.h, unistd.h, getopt.h.	2023-10-30 00:58:03 -07:00
Paul Eggert	0292a5678a	maint: prefer c_isxdigit when that is the intent * src/digest.c (valid_digits, split_3): * src/echo.c (main): * src/printf.c (print_esc): * src/ptx.c (unescape_string): * src/stat.c (print_it): When the code is supposed to support only POSIX-locale hex digits, use c_isxdigit rather than isxdigit. Include c-ctype.h as needed. This defends against oddball locales where isxdigit != c_isxdigit.	2023-10-30 00:58:03 -07:00
Pádraig Brady	f7e25d5bb5	maint: fix syntax check issue * src/basenc.c: Fix preprocessor indentation.	2023-10-28 13:13:50 +01:00
Paul Eggert	60bd7bad9d	basenc: fix unlikely locale issue; tune This sped up ‘basenc -d --base16’ by 60% on my old platform, AMD Phenom II X4 910e, Fedora 38. * src/basenc.c (struct base16_decode_context): Simplify by omitting have_nibble. ‘nibble’ is now negative if it’s missing. All uses changed. (B16): New macro, inspired by ../lib/base64.c. (base16_to_int): New static var, likewise. (isubase16): Reimplement using base16_to_int, since isxdigit is not guaranteed to succeed on the chars we want when the locale is oddball. (base16_decode_ctx): Tune by using base16_to_int and by	2023-10-25 15:09:27 -07:00
Paul Eggert	dcc1514d9a	basenc: tweak checks to use unsigned char This tends to generate better code, at least on x86-64, because callers are just as fast and callees can avoid a conversion. * src/basenc.c: The following renamings also change the arg type from char to unsigned char. All uses changed. (isubase): Rename from isbase. (isubase64url): Rename from isbase64url. (isubase32hex): Rename from isbase32hex. (isubase16): Rename from isbase16. (isuz85): Rename from isz85. (isubase2): Rename from isbase2. 2023-10-24 Paul Eggert <eggert@cs.ucla.edu> * src/basenc.c (struct base16_decode_context): Simplify by storing -1 for missing nibbles. All uses changed.	2023-10-25 15:09:27 -07:00
Pádraig Brady	5f538c27a1	basenc: --base16: also allow lower case with --ignore-garbage * src/basenc.c (isbase16): Also return true for lower case. * tests/basenc/basenc.pl: Add a test case. Reported by Paul Eggert.	2023-10-25 14:04:00 +01:00
Pádraig Brady	d733f2ec26	basenc: --base16: support lower case hex digits * src/basenc.c (base16_decode_ctx): Convert to uppercase before converting from hex. * tests/basenc/basenc.pl: Add a test case. * NEWS: Mention the change in behavior. Addresses https://bugs.gnu.org/66698	2023-10-23 14:04:38 +01:00
Pádraig Brady	378dc38f48	basenc: auto pad base32 and base64 inputs when decoding Padding of encoded data is useful in cases where base64 encoded data is concatenated / streamed. I.e. where there are padding chars _within_ the stream. In other cases padding is optional and can be inferred. Note we continue to treat partial padding as invalid, as that would be indicative of truncation. * src/basenc.c (do_decode): Auto pad the end of the input. * NEWS: Mention the change in behavior. * tests/misc/base64.pl: Adjust to not fail for missing padding. Addresses https://bugs.gnu.org/66265	2023-10-06 18:21:12 +01:00
Paul Eggert	a2434d3e58	sort: improve --help Problem reported by Jorge Stolfi (bug#66253). * src/sort.c (usage): Suggest looking at the manual for -n details.	2023-09-28 18:03:34 -07:00
Pádraig Brady	0c46704832	doc: rm --help: mention that '.' or '..' are rejected * src/rm.c (usage): State that '.' or '..' are rejected.	2023-09-25 15:26:31 +01:00
Paul Eggert	de4e704273	wc: pacify ‘make syntax-check’ * src/wc_avx2.c (wc_lines_avx2): Explicitly make it ‘extern’. Not sure why this is needed.	2023-09-23 17:20:26 -07:00
Paul Eggert	2245a95806	wc: distribute src/wc.h * src/local.mk (noinst_HEADERS): Add src/wc.h.	2023-09-23 17:20:25 -07:00
Paul Eggert	f40c6b5cf2	wc: goto considered harmful * src/wc.c: Do not include assure.h. Replace the only use of ‘assure’ with ‘unreachable’ which is good enough. (wc, main): Remove labels and gotos. This doesn’t affect performance in any way I can measure, and makes the code a bit easier to follow.	2023-09-23 17:07:52 -07:00
Paul Eggert	6b8b1f9e77	wc: prefer signed integers Prefer signed to unsigned integers, to make it easier to catch integer overflow errors. * src/wc.c: Do not include safe-read. (total_lines_overflow, total_words_overflow, total_chars_overflow) (total_bytes_overflow): Now bool, not uintmax_t. All uses changed. (max_line_length): Now intmax_t, not uintmax_t. All uses changed. The total_... vars are still uintmax_t because overflow into them is checked. (page_size): Now idx_t, not size_t. (wc_lines, wc, get_input_fstatus, compute_number_width, main): Prefer signed to unsigned ints where either should do. (wc_lines, wc): Use read rather than safe_read, since we don’t need safe_read’s checks for huge buffers. (wc): Redo call to mbrtoc32 to lessen the number of comparisons against its returned value. Do this partly by keeping a pointer to the end of the buffer rather than a count. Simplify overflow-checking code. (compute_number_width): Check for integer overflow. Don’t assume size_t fits into unsigned long. * src/wc.h (struct wc_lines): Prefer signed integers. * src/wc_avx2.c: Do not include safe-read.h. (wc_lines_avx2): Prefer signed integers. Use read, not safe_read.	2023-09-23 17:07:52 -07:00
Paul Eggert	8d41285fe4	wc: improve avx2 API * src/wc.c: Use "#include <...>" for files not in the current dir. Include "wc.h" instead of declaring wc_lines_avx2 by hand. (wc_lines): New API, with no file name (no longer needed) and with a return struct rather than arg pointers. All uses changed. Use avx2_supported directly instead of using a function pointer. Exploit C99-style declarations after statements. Multiply by 15 rather than dividing; it’s faster and more accurate and cannot overflow here. (wc): Simplify based on wc_lines API change. * src/wc.h: New file. * src/wc_avx2.c: Include it, to check API better. (wc_lines_avx2): Use new API. All uses changed. Exploit C99. Make locals more local.	2023-09-23 17:07:52 -07:00
Paul Eggert	769ace51e8	factor,tail: avoid quadratic reallocation * src/factor.c (struct mp_factors): New member nalloc. (mp_factor_init): Initialize it. * src/factor.c (mp_factor_insert): * src/tail.c (parse_options): Use xpalloc to avoid quadratic worst-case behavior on reallocation. * src/tail.c (pids_alloc): New static var.	2023-09-23 01:15:50 -07:00
Paul Eggert	a6064bb864	wc: simplify by removing SUPPORT_OLD_MBRTOWC * src/wc.c (SUPPORT_OLD_MBRTOWC): Remove. All uses removed. (wc): Simplify by assuming C99-or-later behavior for mbrtoc32, which after all is a C11 API. Fix the !SUPPORT_OLD_MBRTOWC code, which evidently was never tested seriously.	2023-09-23 00:28:27 -07:00
Paul Eggert	17a9e79023	wc: 3× speedup in C locale The 3× speedup was measured by invoking 'wc $(find * -type f)' on the coreutils sources etc. on an Ubuntu 23.04 x86-64. These changes also speed up wc 20% in UTF-8 locales. * src/wc.c (wc_isprint, wc_isspace): New static vars. (wc): Use them for speed. (main): Initialize them if needed. (isnbspace): Remove; no longer used.	2023-09-23 00:28:27 -07:00
Paul Eggert	bee39b93f5	wc: treat encoding errors as non white space * src/wc.c (wc): Treat encoding errors like non white space characters.	2023-09-23 00:28:27 -07:00
Paul Eggert	31076e8689	wc: fix word count bug * bootstrap.conf (gnulib_modules): Remove c32isprint. * src/wc.c (wc): Consider all non-white-space characters to be word constituents, even if they are not printable. POSIX requires this, and it is what BSD does. Partly do this by simplifying the check for a word, by counting word starts rather than word ends. * tests/wc/wc.pl: Test for the bug.	2023-09-23 00:28:27 -07:00
Paul Eggert	a6648d4102	maint: omit some unused function tests * m4/jm-macros.m4: Do not check for ftruncate, iswspace, mkfifo, mbrlen, sysctl. Coreutils no longer uses the corresponding HAVE_* macros, typically because Gnulib handles them now. * src/wc.c (iswspace): Remove; unused.	2023-09-23 00:28:27 -07:00
Paul Eggert	14d35d5bad	maint: prefer char32_t to wchar_t This should work better on non-glibc platforms that don’t use Unicode for wchar_t. However, POSIX appears to prohibit this for printf.c so leave that alone. * bootstrap.conf (gnulib_modules): Add btoc32, c32iscntrl, c32isprint, c32isspace, c32width, mbrtoc32. Remove btoc, wcwidth. * src/df.c, src/ls.c, src/wc.c: Include uchar.h instead of wchar.h and wctype.h. * src/df.c (replace_invalid_chars): * src/ls.c (quote_name_buf): * src/wc.c (isnbspace, wc): Use char32_t instead of wchar_t.	2023-09-23 00:28:27 -07:00
Paul Eggert	c5a210a9c8	wc: simplify #if MB_LEN_MAX * src/wc.c: Don’t have special #ifs for platforms where MB_LEN_MAX is 1. On these platforms, MB_CUR_MAX is 1 as well, so the compiler should optimize away all multi-byte code.	2023-09-23 00:28:26 -07:00
Paul Eggert	fb51f74ff6	wc: avoid undefined conversion state * src/wc.c (wc): When mbrtowc returns (size_t) -1, zero the conversion state, since POSIX says it’s undefined.	2023-09-23 00:28:26 -07:00
Paul Eggert	092f8178c0	maint: use mbszero * bootstrap.conf (gnulib_modules): Add mbszero. * src/df.c (replace_invalid_chars): * src/ls.c (quote_name_buf): * src/pathchk.c (portable_chars_only): * src/printf.c (STRTOX): * src/wc.c (wc): Prefer mbszero to clearing an mbstate_t by hand.	2023-09-23 00:28:26 -07:00
Paul Eggert	6c16044d8d	wc: stop worrying about EBCDIC, shift-JIS, etc * src/wc.c: Do not include mbchar.h. (wc): Check for ASCII characters instead of using is_basic. Other parts of Gnulib and coreutils already assume the encoding is upward compatible with ASCII, and the old code wouldn’t have worked anyway with shift-JIS.	2023-09-23 00:28:26 -07:00
Paul Eggert	17bddc047b	expr: use mcel The mcel API is simpler and corresponds more closely to how Emacs etc. behave when the input has encoding errors, since it treats each encoding-error byte separately. * bootstrap.conf (gnulib_modules): Add mcel. * src/expr.c: Include mcel.h instead of mbuiter.h. (mbs_logical_cspn, mbs_logical_substr, mbs_offset_to_chars): Use mcel API. (mbs_logical_substr): Use ximemdup0 so as not to waste memory in the result, fixing a FIXME.	2023-09-23 00:28:26 -07:00
Pádraig Brady	2593502290	build: avoid build failures on gcc <= 10, or clang On gcc 10 the following build failure occurs: "error: a label can only be part of a statement and a declaration is not a statement" This is because the current code is non standards conforming, but GCC >= 11 will compile it (even with the -Wpedantic option). This issue is tracked for GCC at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111526 * src/tail.c (parse_options): Avoid a declaration after label, by using a surrounding block.	2023-09-21 19:04:14 +01:00
Stephen Kitt	d24a117707	tail: allow multiple PIDs tail can watch multiple files, but currently only a single writer. It can be useful to watch files from multiple writers, or even processes not directly related to the files (e.g. watch log files written by a server process, for the duration of a test driven by a separate client). * src/tail.c (writers_are_dead): New function. (tail_forever): Use it to wait for writers. (tail_forever_inotify): As above. (parse_options): Manage --pid options in an array. * doc/coreutils.texi: Update documentation. * tests/tail/pid.sh: Add a variant with two PIDs. * NEWS: Mention the new feature.	2023-09-20 15:53:34 +01:00
Sylvestre Ledru	8367b95a13	ls: --dired now implies long format with hyperlinks disabled Currently --dired is silently ignored with conflicting output formats * src/ls.c (decode_switches): Set default format and hyperlink mode when the --dired option is specified. * tests/ls/dired.sh: Check that formats are implied / overridden. * NEWS: Mention the change in behavior. * doc/coreutils.texi (ls invocation): Adjust --dired description.	2023-09-17 18:47:50 +01:00
Pádraig Brady	3b0f5b9971	maint: use C99 int size specifiers rather than PRI.MAX defines Following on from commit v9.3-128-gf31229ebd replace all uses of the PRI.MAX portability defines with C99 size specifiers %z, %j, and %t.	2023-09-13 23:08:02 +01:00
Paul Eggert	c7ec75a276	cp,mv,install: add copy_internal comment * src/copy.c (copy_internal): Add comment about some particularly tricky logic.	2023-09-08 16:25:39 -07:00
Paul Eggert	3cff27ddc1	cp: avoid needless unlinkat after fstatat ELOOP * src/copy.c (copy_internal): When cp -f's fstatat fails on the destination with ELOOP, report an error immediately when fstatat used AT_SYMLINK_NOFOLLOW, as the later unlinkat would fail too.	2023-09-08 16:25:39 -07:00
Paul Eggert	a66a4b77a5	cp,mv,install: minor copy_internal refactoring * src/copy.c (copy_internal): Redo to avoid need for calculating fstatat_flags when not needed. This is for clarity, not speed.	2023-09-08 16:25:39 -07:00
Paul Eggert	67324bf19c	cp,mv,install: fix comment punctuation * src/copy.h: Fix punctuation in comment.	2023-09-08 16:25:39 -07:00
Paul Eggert	69bd8be403	cp,mv,install: simplify copy_internal * src/copy.c (copy_internal): Simplify.	2023-09-08 16:25:39 -07:00
Paul Eggert	68f4c238ca	maint: prefer psame_inode, PSAME_INODE, STP_* Prefer psame_inode, PSAME_INODE, STP_NBLOCKS, and STP_BLKSIZE, which take addresses of objects, to their counterparts that take the whole objects. In some cases the whole objects might not be initialized, which would be undefined behavior strictly speaking. * gl/lib/root-dev-ino.h (ROOT_DEV_INO_CHECK): * src/cp-hash.c (src_to_dest_compare): * src/ls.c (dev_ino_compare): * src/pwd.c (robust_getcwd): Prefer PSAME_INODE to SAME_INODE. * src/chown-core.c (restricted_chown): * src/copy.c (copy_reg, same_file_ok, source_is_dst_backup) (copy_internal): * src/ln.c (do_link): * src/pwd.c (logical_getcwd): * src/sort.c (avoid_trashing_input): * src/split.c (create): * src/stat.c (find_bind_mount): Prefer psame_inode to SAME_INODE. * src/copy.c (infer_scantype): * src/du.c (process_file): * src/ls.c (gobble_file, print_long_format) (print_file_name_and_frills, length_of_file_name_and_frills): * src/stat.c (print_stat): Prefer STP_NBLOCKS to ST_NBLOCKS. * src/copy.c (copy_reg): * src/head.c (elide_tail_bytes_file, elide_tail_lines_file): * src/ioblksize.h (io_blksize): * src/od.c (skip): * src/shred.c (do_wipefd): * src/stat.c (print_stat): * src/tail.c (tail_bytes): * src/truncate.c (do_ftruncate): * src/wc.c (wc): Prefer STP_BLKSIZE to ST_BLKSIZE. * src/ioblksize.h (io_blksize): Arg is now struct stat const *, not struct stat. All callers changed.	2023-09-04 23:12:02 -07:00
Paul Eggert	65a1c5b441	cp,mv,install: a bit more up-to-date source stat * src/copy.c (copy_reg): Replace caller’s source status with the more recent version.	2023-09-04 23:12:02 -07:00
Paul Eggert	9cd52bd999	cp,mv,install: fix chmod on Linux CIFS This bug occurs only when temporarily setting the mode to the intersection of old and new modes when changing ownership. * src/copy.c (owner_failure_ok): Treat EACCES like EPERM.	2023-09-02 13:28:23 -07:00
Paul Eggert	5f97136160	cp,mv,install: fix chown on Linux CIFS * src/copy.c (chown_failure_ok): Also treat EACCES as OK.	2023-09-01 15:10:45 -07:00
Paul Eggert	de4a5220f2	maint: simplify set_owner * src/copy.c (HAVE_FCHOWN, fchown): Remove. (fchmod_or_lchmod): Move up. (fchown_or_lchown): New function. (set_owner): Use it to simplify.	2023-09-01 15:10:45 -07:00
Paul Eggert	74439d15d7	chown: port to mingw and MSVC 14 * src/chown-core.c (restricted_chown): Don’t assume fchown exists. The Gnulib doc says that nowadays this is needed only for ports to mingw and MSVC 14, but it’s an easy port so let’s do it.	2023-09-01 15:10:44 -07:00
Paul Eggert	e0326b0473	maint: regularize struct initializers * src/chmod.c (process_file): * src/df.c (replace_invalid_chars): * src/iopoll.c (iopoll_internal): * src/ls.c (quote_name_buf): * src/pathchk.c (portable_chars_only): * src/printf.c (STRTOX): * src/shred.c (main): * src/stat.c (neg_to_zero, do_stat): * src/timeout.c (settimeout): * src/tr.c (card_of_complement): * src/wc.c (wc): Prefer ‘{0}’ to initialize everything to zero. * src/stat.c (do_stat): * src/timeout.c (settimeout): Do not assume the usual order for struct members, as POSIX does not guarantee this.	2023-08-30 20:32:13 -07:00
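
The struct-initializer entry above (e0326b0473) prefers '{0}' for zeroing an object and avoids assuming the order of struct members. A minimal C sketch of both idioms follows. The helpers make_timespec and zero_timespec are hypothetical names used for illustration only, not functions from coreutils, and struct timespec is chosen merely as a convenient standard type whose member order is not guaranteed.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper (not from coreutils): build a timespec without
   relying on the order in which the platform declares its members.  */
static struct timespec
make_timespec (time_t sec, long nsec)
{
  /* Precondition chosen for this sketch: a normalized nanosecond count.  */
  assert (0 <= nsec && nsec < 1000000000L);

  /* Designated initializers name each member explicitly.  Members that
     are not named, such as platform-specific extras, are zeroed.  */
  return (struct timespec) { .tv_sec = sec, .tv_nsec = nsec };
}

/* Hypothetical helper: '{0}' initializes the first member to zero and
   every remaining member to zero as well.  */
static struct timespec
zero_timespec (void)
{
  struct timespec ts = {0};
  return ts;
}
```

A positional initializer such as '{ 1, 500 }' would compile, but it would silently swap the values on any platform that declared tv_nsec before tv_sec, which is the kind of assumption the commit message says to avoid.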

8797 Commits