coreutils

mirror of git://git.sv.gnu.org/coreutils.git synced 2026-06-05 09:17:58 +02:00

Author	SHA1	Message	Date
oech3	c495cbcfe2	tests: yes: support more zero-copy related syscalls * tests/misc/yes.sh: Disable other related zero-copy syscalls to ensure better testing of future or other implementations. https://github.com/coreutils/coreutils/pull/227	2026-03-24 20:35:01 +00:00
Collin Funk	430822d5f7	maint: remove some unnecessary casts * src/sort.c (begfield, limfield): Remove size_t casts.	2026-03-23 19:32:21 -07:00
Sylvestre Ledru	4fd2de166a	tests: cut: add test for -z with NUL delimiter and -s flag * tests/cut/cut.pl (zerot-7): New test. Identified https://github.com/uutils/coreutils/pull/11394 https://github.com/coreutils/coreutils/pull/226	2026-03-23 12:11:49 +00:00
Sylvestre Ledru	939def75f3	tests: tr: add test for invalid character class name * tests/tr/tr.pl (invalid-class): New test. Identified : https://github.com/uutils/coreutils/pull/11398 https://github.com/coreutils/coreutils/pull/225	2026-03-23 12:11:35 +00:00
Chris Down	67b8aaae8d	sort: speed up keyed field sorting significantly using memchr When sort is invoked with an explicit field separator with `-t SEP`, begfield() and limfield() scan for the separator to locate boundaries. Right now the implementation there uses a loop that iterates over bytes one by one, which is not ideal since we must scan past many bytes of non-separator data one byte at a time. Let's replace each of these loops with memchr(). On glibc systems, memchr() uses SIMD to scan 16 bytes per step (NEON on aarch64) or 32 bytes per step (AVX2 on x86_64), rather than 1 byte at a time, so any field longer than a handful of bytes stands to benefit quite significantly. Using the following input data: awk 'BEGIN { srand(42) for (i = 1; i <= 500000; i++) printf "%d,%d,%d\n", 4+int(rand()9), 0, 4+int(rand()9), 0, int(rand()10000) }' > short_csv_500k awk 'BEGIN { for (i = 1; i <= 500000; i++) printf "%100d,%100d,%d\n", 0, 0, int(rand()10000) }' > wide_csv_500k One can benchmark with: hyperfine --warmup 10 --runs 50 \ "LC_ALL=C sort_before -t, -k3,3n short_csv_500k > /dev/null" \ "LC_ALL=C sort_after -t, -k3,3n short_csv_500k > /dev/null" hyperfine --warmup 10 --runs 50 \ "LC_ALL=C sort_before -t, -k3,3n wide_csv_500k > /dev/null" \ "LC_ALL=C sort_after -t, -k3,3n wide_csv_500k > /dev/null" hyperfine --warmup 10 --runs 50 \ "LC_ALL=C sort_before wide_csv_500k > /dev/null" \ "LC_ALL=C sort_after wide_csv_500k > /dev/null" The results on i9-14900HX x86_64 with -O2: sort -t, -k3,3n (500K lines, 4-12 byte short fields): Before: 123.1 ms After: 108.1 ms (-12.2%) sort -t, -k3,3n (500K lines, 100 byte wide fields): Before: 243.5 ms After: 165.9 ms (-31.9%) sort (default, no -k, 500K lines): Before: 141.6 ms After: 141.8 ms (unchanged) And on M1 Pro aarch64 with -O2: sort -t, -k3,3n (500K lines, 4-12 byte short fields): Before: 98.0 ms After: 92.3 ms (-5.8%) sort -t, -k3,3n (500K lines, 100 byte wide fields): Before: 240.8 ms After: 183.0 ms (-24.0%) sort (default, no -k, 500K lines): Before: 145.6 ms After: 145.6 ms (unchanged) Looking at profiling, the improvement is larger on x86_64 in these runs because glibc's memchr uses AVX2 to scan 32 bytes per step versus 16 bytes per step with NEON on aarch64.	2026-03-23 11:49:02 +00:00
Collin Funk	26b5e3d3cf	maint: fix an incomplete sentence * tests/pwd/argument.sh: Fix the test description. Reported by G. Branden Robinson.	2026-03-22 13:32:52 -07:00
Collin Funk	35a1c88f48	tests: pwd: test the behavior when given an argument * tests/pwd/argument.sh: New file. * tests/local.mk (all_tests): Add the new test.	2026-03-21 22:16:59 -07:00
Collin Funk	4bc336b7ce	tac: avoid unnecessary standard output buffering This has removes a tiny amount of overhead: $ seq 10000000 > input $ perf stat -e cpu-clock --repeat 1000 taskset 1 ./src/tac \ input 2>&1 > /dev/null \| grep -F 'seconds time' 0.095707 +- 0.000223 seconds time elapsed ( +- 0.23% ) $ perf stat -e cpu-clock --repeat 1000 taskset 1 ./src/tac-prev \ input 2>&1 > /dev/null \| grep -F 'seconds time' 0.1009378 +- 0.0000995 seconds time elapsed ( +- 0.10% ) * src/tac.c (output): Use full_write instead of fread since we already buffer the output ourselves.	2026-03-21 15:50:33 -07:00
Collin Funk	6a37187a5f	tests: rm: fix a test that would sometimes hang * tests/rm/dash-hint.sh: Add the file name argument to grep, as I intended when adding this test.	2026-03-21 12:19:21 -07:00
Collin Funk	30a5cbec0e	tac: promptly diagnose write errors This patch also fixes a bug where 'tac' would print a vague error on some inputs: $ seq 10000 \| ./src/tac-prev > /dev/full tac-prev: write error $ seq 10000 \| ./src/tac > /dev/full tac: write error: No space left on device In this case ferror (stdout) is true, but errno has been set back to zero by a successful fclose (stdout) call. * src/tac.c (output): Call write_error() if fwrite fails. * tests/misc/io-errors.sh: Check that 'tac' prints a detailed write error. * NEWS: Mention the improvement.	2026-03-21 12:10:59 -07:00
Pádraig Brady	4b680f8392	tests: support checking for specific write errors * tests/misc/io-errors.sh: Support checkout for a specific error in commands that don't run indefinitely. Currently all the explicitly listed commands output a specific error and do not need to be tagged.	2026-03-21 12:43:38 +00:00
Collin Funk	f679990df2	tests: nl: check that all files are processed * tests/nl/multiple-files.sh: New file. * tests/local.mk (all_tests): Add the new test.	2026-03-20 19:46:04 -07:00
Collin Funk	5ec45a1aa5	test: truncate: improve the test added in the previous commit * tests/truncate/multiple-files.sh: Check that nothing is printed to standard output and that standard error has the correct error.	2026-03-19 23:17:47 -07:00
Collin Funk	71c17d6f2e	tests: truncate: check that all files are processed * tests/truncate/multiple-files.sh: New file. * tests/local.mk (all_tests): Add the new test.	2026-03-19 22:56:52 -07:00
Collin Funk	f421d01128	sort,split,yes: ensure pipe and pipe2 don't open standard descriptors * bootstrap.conf (gnulib_modules): Add pipe2-safer. * cfg.mk (sc_require_unistd_safer): New rule for 'make syntax-check'. * gl/lib/fd-reopen.c: Include unistd--.h instead of unistd.h. * src/sort.c: Include unistd--.h. * src/split.c: Likewise. * src/yes.c: Likewise.	2026-03-17 23:08:37 -07:00
Pádraig Brady	58d88d2435	tests: dd: fix false failure on NetBSD 10 * tests/dd/partial-write.sh: Skip the test if nothing written at all, as was seen on NetBSD 10. Reported by Bruno Haible.	2026-03-16 22:34:58 +00:00
Pádraig Brady	c1346460d1	tests: ls: fix false failure on FreeBSD * tests/ls/non-utf8-hidden.sh: Avoid sorting in ls, to avoid: ls: cannot compare file names ...: Illegal byte sequence seen on FreeBSD 14. Reported by Bruno Haible.	2026-03-16 22:25:42 +00:00
Collin Funk	8fc17919a3	maint: tee: remove an affirm call to silence coverity * src/iopoll.c (write_wait): Don't check that an unsigned integer is always great than or equal to zero since that is always true.	2026-03-16 15:04:24 -07:00
Collin Funk	9b29c9c899	wc: make sure input buffer for neon 'wc -l' is aligned * src/wc_neon.c (wc_lines_neon): Use alignas.	2026-03-16 12:20:38 -07:00
Collin Funk	9b855166ed	tee: prefer file descriptors over streams We disable buffering on the streams anyways, so we were effectively calling the write system call previously despite using streams. * src/iopoll.h (fclose_wait, fwrite_wait): Remove declarations. (close_wait, write_wait): Add declarations. * src/iopoll.c (fwait_for_nonblocking_write, fclose_wait, fwrite_wait): Remove functions. (wait_for_nonblocking_write): New function based on fwait_for_nonblocking_write. (close_wait): New function based on fclose_wait. (write_wait): New function based on fwrite_wait. * src/tee.c: Include fcntl--.h. Don't include stdio--.h. (get_next_out): Operate on file descriptors instead of streams. (fail_output): Likewise. Remove clearerr call since we no longer call fwrite on stdout. (tee_files): Operate on file descriptors instead of streams. Remove calls to setvbuf.	2026-03-15 15:32:08 -07:00
Collin Funk	e644eea122	timeout: don't exit immediately if the parent is the init process * src/timeout.c (main): Save the process ID before creating a child process. Check if the result of getppid is different than the saved process ID instead of checking if it is 1. * tests/timeout/init-parent.sh: New file. * tests/local.mk (all_tests): Add the new test. * NEWS: Mention the bug fix. Also mention that this change allows 'timeout' to work when reparented by a subreaper process instead of init.	2026-03-13 20:37:10 -07:00
Pádraig Brady	8cbe20a2ff	doc: fix missing '=' in texi option descriptions * doc/coreutils.texi (cut invocation, fold invocation): Fix missing '=' before option parameters.	2026-03-13 10:30:46 +00:00
Pádraig Brady	711fa8f9a5	dd: always diagnose partial writes on write failure * src/dd.c (dd_copy): Increment the partial write count upon failure. * tests/dd/partial-write.sh: Add a new test. * tests/local.mk: Reference the new test. * NEWS: Mention the bug fix. Fixes https://bugs.gnu.org/80583	2026-03-12 21:19:33 +00:00
Pádraig Brady	6382995b0a	doc: clarify a recent NEWS item * NEWS: It was ambiguous as to whether we quoted a range of observered throughputs. Clarify this was the old and new throughput on a single test system.	2026-03-12 20:05:58 +00:00
Collin Funk	0da738fe40	doc: NEWS: adjust 'wc -l' aarch64 benchmark after recent commit After commit `e0190a9d1` (wc: improve aarch64 Neon optimization for 'wc -l', 2026-03-09), on a Ampere eMAG machine: $ yes \| head -n 10000000000 > input $ (time ./src/wc -l input) 10000000000 input real 0m3.447s user 0m1.533s sys 0m1.913s $ (export GLIBC_TUNABLES='glibc.cpu.hwcaps=-ASIMD,-AVX2,-AVX512F'; \ time ./src/wc -l input) 10000000000 input real 0m15.758s user 0m14.039s sys 0m1.720s * NEWS: Mention the improved benchmark.	2026-03-10 23:16:09 -07:00
Collin Funk	ba85544a19	tests: rm: check for hints when running 'rm -foo' * tests/rm/dash-hint.sh: New file. * tests/local.mk (all_tests): Add the new test.	2026-03-10 21:39:22 -07:00
Pádraig Brady	e016498d6f	maint: adjust to placate coverity * src/system.h (c32issep): Adjust to more standard layout.	2026-03-10 20:14:42 +00:00
Pádraig Brady	2b1c059e6a	yes: use a zero-copy implementation via (vm)splice A good reference for the concepts used here is: https://mazzo.li/posts/fast-pipes.html We don't consider huge pages or busy loops here, but use vmsplice(), and splice() to get significant speedups: i7-5600U-laptop $ taskset 1 yes \| taskset 2 pv > /dev/null ... [4.98GiB/s] i7-5600U-laptop $ taskset 1 src/yes \| taskset 2 pv > /dev/null ... [34.1GiB/s] IBM,9043-MRX $ taskset 1 yes \| taskset 2 pv > /dev/null ... [11.6GiB/s] IBM,9043-MRX $ taskset 1 src/yes \| taskset 2 pv > /dev/null ... [175GiB/s] Also throughput to file (on BTRFS) was seen to increase significantly. With a Fedora 43 laptop improving from 690MiB/s to 1.1GiB/s. * bootstrap.conf: Ensure sys/uio.h is present. This was an existing transitive dependency. * m4/jm-macros.m4: Define HAVE_SPLICE appropriately. We assume vmsplice() is available if splice() is as they were introduced at the same time to Linux and glibc. * src/yes.c (repeat_pattern): A new function to efficiently duplicate a pattern in a buffer with memcpy calls that double in size. This also makes the setup for the existing write() path more efficient. (pipe_splice_size): A new function to increase the kernel pipe buffer if possible, and use an appropriately sized buffer based on that (25%). (splice_write): A new function to call vmplice() when outputting to a pipe, and also splice() if outputting to a non-pipe. * tests/misc/yes.sh: Verify the non-pipe output case, (main): Adjust to always calling write on the minimal buffer first, then trying vmsplice(), then falling back to write from bigger buffer. and the vmsplice() fallback to write() case. * NEWS: Mention the improvement.	2026-03-10 18:13:34 +00:00
Pádraig Brady	5ca27c1929	all: use more consistent blank character determination * src/system.h (c32issep): A new function that is essentially iswblank() on GLIBC platforms, and iswspace() with exceptions elsewhere. * src/expand.c: Use it instead of c32isblank(). * src/fold.c: Likewise. * src/join.c: Likewise. * src/numfmt.c: Likewise. * src/unexpand.c: Likewise. * src/uniq.c: Likewise. * NEWS: Mention the improvement.	2026-03-10 16:30:52 +00:00
Pádraig Brady	3ef107fa12	cksum: fix tagged output on 32 bit platforms Fix an unreleased issue due to the recent change to using idx_t in commit v9.10-91-g02983e493 * src/cksum.c (output_file): Cast the idx_t before passing to printf.	2026-03-10 16:18:29 +00:00
Collin Funk	e0190a9d1b	wc: improve aarch64 Neon optimization for 'wc -l' $ yes abcdefghijklmnopqrstuvwxyz \| head -n 200000000 > input $ time ./src/wc-prev -l input 200000000 input real 0m1.240s user 0m0.456s sys 0m0.784s $ time ./src/wc -l input 200000000 input real 0m0.936s user 0m0.141s sys 0m0.795s * configure.ac: Use unsigned char for the buffer to avoid potential compiler warnings. Check for the functions being used in src/wc_neon.c after this patch. * src/wc_neon.c (wc_lines_neon): Use vreinterpretq_s8_u8 to convert 0xff into -1 instead of bitwise AND instructions into convert it into 1. Perform the pairwise addition and lane extraction once every 8192 bytes instead of once every 64 bytes. Thanks to Lasse Collin for spotting this and reviewing a draft of this patch.	2026-03-09 20:06:07 -07:00
Pádraig Brady	a4cf72f5a7	tests: expand: fix false failure on various systems * tests/expand/mb.sh: Use $LOCALE_FR_UTF8 rather than hardcoding "en_US.UTF-8". * tests/unexpand/mb.sh: Likewise. Reported by Bruno Haible.	2026-03-09 21:08:40 +00:00
Pádraig Brady	0422f0fd4b	build: update to latest gnulib * src/ls.c: Adjust for renamed acl permissions member.	2026-03-09 13:14:54 +00:00
Collin Funk	466ffeb847	maint: remove duplicate names from THANKS * .mailmap: Prefer the most recently used email address from each commit author.	2026-03-08 10:37:45 -07:00
Collin Funk	ab80700bd2	maint: prefer memset_explicit to explicit_bzero The explicit_bzero function is a common extension, but memset_explicit was standardized in C23. It will likely become more portable in the future, and Gnulib provides an implementation if needed. * bootstrap.conf (gnulib_modules): Add memset_explicit. Remove explicit_bzero. * gl/lib/randint.c (randint_free): Use memset_explicit instead of explicit_bzero. * gl/lib/randread.c (randread_free_body): Likewise.	2026-03-07 16:16:01 -08:00
Lukáš Zaoral	2b92c16d26	expand,unexpand: support multi-byte input * src/expand.c: Use mbbuf to support multi-byte input. * src/unexpand.c: Likewise. * tests/expand/mb.sh: New multi-byte test. * tests/unexpand/mb.sh: Likewise. * tests/local.mk: Reference new tests. * NEWS: Mention the improvement.	2026-03-07 12:39:37 +00:00
Weixie Cui	052eed069f	maint: shred: fix typo in comment * src/shred.c: Fix "then" -> "than" in comment.	2026-03-06 20:39:38 -08:00
Weixie Cui	e71ed47088	maint: dd: fix typo in comment * src/dd.c: Fix "that that" -> "that the" in comment.	2026-03-06 13:27:52 +00:00
Collin Funk	6ec84cfa15	build: update gnulib submodule to latest	2026-03-06 01:10:07 -08:00
Collin Funk	694e18c356	build: update gnulib submodule to latest	2026-03-05 22:24:38 -08:00
Collin Funk	90d2b511e4	maint: touch: reduce variable scope * src/touch.c (main): Declare variables where they are used instead of at the start of the function.	2026-03-04 23:40:03 -08:00
Collin Funk	e2e78f9e8e	maint: chown,chgrp: reduce variable scope * src/chown-core.c (describe_change, restricted_chown) (change_file_owner, chown_files): Declare variables where they are used instead of at the start of the function. * src/chown.c (main): Likewise.	2026-03-04 23:34:45 -08:00
Collin Funk	b7ff7e7c2c	install: allow the combination of --compare and --preserve-timestamps * NEWS: Mention the improvement. * src/install.c (enum copy_status): New type to let the caller know if the copy was performed or skipped. (copy_file): Return the new type instead of bool. Reduce variable scope. (install_file_in_file): Only strip the file if the copy was performed. Update the timestamps if the copy was skipped. (main): Don't error when --compare and --preserve-timestamps are combined. * tests/install/install-C.sh: Add some test cases.	2026-03-04 18:42:48 -08:00
Pádraig Brady	b3fe24213e	cksum: use more defensive escaping for --check cksum --check is often the first interaction users have with possibly untrusted downloads, so we should try to be as defensive as possible when processing it. Specifically we currently only escape \n characters in file names presented in checksum files being parsed with cksum --check. This gives some possibilty of dumping arbitrary data to the terminal when checking downloads from an untrusted source. This change gives these advantages: 1. Avoids dumping arbitrary data to vulnerable terminals 2. Avoids visual deception with ansi codes hiding checksum failures 3. More secure if users copy and paste file names from --check output 4. Simplifies programmatic parsing Note this changes programmatic parsing, but given the original format was so awkward to parse, I expect that's extremely rare. I was not able to find example in the wild at least. To parse the new format from from shell, you can do something like: cksum -c checksums \| while IFS= read -r line; do case $line in ': FAILED') filename=$(eval "printf '%s' ${line%: FAILED}") cp -v "$filename" /quarantine ;; esac done This change also slightly reduces the size of the sum(1) utility. This change also apples to md5sum, shasum, and b2sum. * src/cksum.c (digest_check): Call quotef() instead of cksum(1) specific quoting. * tests/cksum/md5sum-bsd.sh: Adjust accordingly. * doc/coreutils.texi (cksum general options): Describe the shell quoting used for problematic file names. * NEWS: Mention the change in behavior. Reported by: Aaron Rainbolt	2026-03-04 22:17:39 +00:00
Pádraig Brady	e24372e6d0	maint: tests: refactor uses of bad_unicode() * init.cfg: Use 0xFF rather than 0xC3 everywhere. * tests/fold/fold-characters.sh: Reuse bad_unicode(). * tests/tac/tac-locale.sh: Likewise.	2026-03-04 17:57:54 +00:00
Pádraig Brady	a85e9182b1	fold: fix output truncation with 0xFF bytes in input On signed char platforms, 0xFF was converted to -1 which matches MBBUF_EOF, causing fold to stop processing. * NEWS: Mention the bug fix. * gl/lib/mbbuf.h: Avoid sign extension on signed char platforms. * tests/fold/fold-characters.sh: Adjust test case. Reported at https://src.fedoraproject.org/rpms/coreutils/pull-request/20	2026-03-04 17:45:57 +00:00
Sylvestre Ledru	6c6bb37e2f	tests: date: add timezone conversion test *tests/date/date.pl: Add the test case. Add test case for https://github.com/uutils/coreutils/issues/10800 to verify `date -u -d '10:30 UTC-05'` converts to 15:30 UTC.	2026-03-04 13:22:46 +00:00
Sylvestre Ledru	452bf39162	tests: date: add edge cases for modifiers * tests/date/date.pl: Add the test case. Add test cases for https://github.com/uutils/coreutils/issues/10957	2026-03-04 13:20:33 +00:00
Sylvestre Ledru	8fe6d92989	tests: cut: add test case for newline delimiter with -s flag * tests/cut/cut.pl: Add a new test case. https://github.com/coreutils/coreutils/pull/211	2026-03-04 12:18:34 +00:00
oech3	3ba39c3c24	tests: mktemp: ensure mktemp does not depend on getrandom and ASLR * tests/mktemp/mktemp-misc.sh: Add new test. * tests/local.mk: Reference new test. https://github.com/coreutils/coreutils/pull/206	2026-03-03 13:03:23 +00:00

1 2 3 4 5 ...

31311 Commits