Input buffering is best avoided because it introduces
delayed processing of output for intermittent input,
especially when the output size is less than that of
the input buffer. This is significant when output
is being further processed which could happen if split
is writing to precreated fifos, or through --filter.
If input is arriving quickly from a pipe then this will
already be buffered before we read it, so fast arriving
input shouldn't be a performance issue.
* src/split.c (lines_split, lines_bytes_split, bytes_split,
lines_chunk_split, bytes_chunk_extract): s/full_read/safe_read/.
* THANKS.in: Mention the reporter.
* NEWS: Mention the improvement.
Run "make update-copyright", but then also run this,
perl -pi -e 's/2\d\d\d-//' tests/sample-test
to make that one script use the single most recent year number.
Also do not end option descriptions with a period, properly indent
continuation lines, and make some tiny clarifications.
* src/du.c (usage): Lowercase after semicolon.
* src/ls.c (usage): Semicolons instead of periods, small rephrasing
and two hyphens for clarity, proper indentation.
* src/mktemp.c (usage): Semicolons and lowercase.
* src/od.c (usage): Semicolons.
* src/ptx.c (usage): Use the standard phrase, clarify default option.
* src/setuidgid.c (usage): Properly indent continuation line.
* src/split.c (usage): Semicolons, lowercase, no final period.
* src/stat.c (usage): Semicolons, lowercase.
* src/tail.c (usage): Proper indentation, one shorter rephrasing,
semicolons, no final periods.
* src/timeout.c (usage): Properly indent, semicolons, no final periods.
Fixes http://bugs.gnu.org/14976
* src/split.c (line_bytes_split): Rewrite to only buffer
when necessary. I.E. only increase the buffer when we've
already lines output in a split and we encounter a line
larger than the input buffer size, in which case a hold
buffer will be increased in increments of the input buffer size.
(lines_rr): Use the more abstract xalloc_die() just like
we did in line_bytes_split(), rather than explicitly
printing the "memory exhausted" message and exiting.
* tests/split/line-bytes.sh: Add a new test for this
function which previously had no test coverage.
* tests/local.mk: Reference the new test.
* NEWS: Mention the improvement.
Fixes http://bugs.gnu.org/13537
Each program with at least one long option which is marked as
'required_argument' and which has also a short option for that
option, should print a note about mandatory arguments.
Define that well-known note centrally and use it rather than
literal printf/fputs, and add it where it was missing.
* src/system.h (emit_mandatory_arg_note): Add new function.
* src/cp.c (usage): Use it rather than literal printf/fputs.
* src/csplit.c, src/cut.c, src/date.c, src/df.c, src/du.c:
* src/expand.c, src/fmt.c, src/fold.c, src/head.c, src/install.c:
* src/kill.c, src/ln.c, src/ls.c, src/mkdir.c, src/mkfifo.c:
* src/mknod.c, src/mv.c, src/nl.c, src/od.c, src/paste.c:
* src/pr.c, src/ptx.c, src/shred.c, src/shuf.c, src/sort.c:
* src/split.c, src/stdbuf.c, src/tac.c, src/tail.c, src/timeout.c:
* src/touch.c, src/truncate.c, src/unexpand.c, src/uniq.c:
Likewise.
* src/base64.c (usage): Add call of the above new function
because at least one long option has a required argument.
* src/basename.c, src/chcon.c, src/date.c, src/env.c:
* src/nice.c, src/runcon.c, src/seq.c, src/stat.c, src/stty.c:
Likewise.
Run "make update-copyright", but then also run this,
perl -pi -e 's/2\d\d\d-//' tests/sample-test
to make that one script use the single most recent year number.
* src/split.c (lines_rr) [IF_LINT]: Plug a harmless leak.
(main) [IF_LINT]: Free a usually-small (~70KB) buffer
just before exit, mainly to take this off the radar of
leak-detecting tools.
Improved-by: Pádraig Brady.
* src/split.c (create): Check if output file is the
same inode as the input file.
* tests/split/guard-input: New test case.
* tests/Makefile.am: Reference new test case.
* NEWS: Mention the fix.
Improved-by: Jim Meyering
Reported-by: François Pinard
Problem reported by Samuel Thibault in <http://bugs.gnu.org/11424>.
* NEWS: Document this.
* src/dd.c (skip): Handle skipping past EOF on shared or typed
memory objects the same way as with regular files.
(dd_copy): It's OK to truncate shared memory objects.
* src/du.c (duinfo_add): Check for overflow.
(print_only_size): Report overflow.
(process_file): Ignore negative file sizes in the --apparent-size case.
* src/od.c (skip): Fix comment about st_size.
* src/split.c (main):
* src/truncate.c (do_ftruncate, main):
On files where st_size is not portable, fall back on using lseek
with SEEK_END to determine the size. Although strictly speaking
POSIX says the behavior is implementation-defined, in practice
if lseek returns a nonnegative value it's a reasonable one to
use for the file size.
* src/system.h (usable_st_size): Symlinks have reliable st_size too.
* tests/misc/truncate-dir-fail: Don't assume that getting the size
of a dir is not allowed, as it's now allowed on many platforms,
e.g., GNU/Linux.
* src/split.c (main): Use stat.st_size only for regular files.
Samuel Thibault reported in http://bugs.gnu.org/11424 that the
/dev/zero-splitting tests would appear to infloop on GNU/Hurd,
because /dev/zero's st_size is LONG_MAX. It was only a problem
when using the --number (-n) option.
* NEWS (Bug fixes): Mention it.
This bug was introduced with the --number option, via
commit v8.7-25-gbe10739
* src/split.c (next_file_name): If `suffix_auto' is true and the first
suffix character is 'z', generate a new file file name adding `z' to
the prefix and increasing the suffix length by one.
(set_suffix_length): Disable auto suffix width in various cases.
* tests/split/suffix-auto-length: Test it.
* doc/coreutils.texi (split invocation): Mention it.
* NEWS (Improvements): Likewise.
Add the --additional-suffix option, to append an
additional static suffix to output file names.
* src/split.c (next_file_name): Append suffix to output file names.
(main): Handle new --additional-suffix option.
* NEWS (New features): Mention it.
* doc/coreutils.texi (split invocation): Mention it.
* tests/split/additional-suffix: New file. Test --additional-suffix.
* tests/Makefile.am (TESTS): Add it.
Requested by Peng Yu, in bug 6554
Allow changing the --numeric-suffixes start number
from the default of 0.
* src/split.c (next_file_name): Initialize the suffix index
and the output filename according to start value.
(main): Check that the suffix length is large enough for the
numerical suffix start value.
* doc/coreutils.texi (split invocation): Mention it.
* NEWS (New features): Mention it.
* tests/split/numeric: New file. Test --numeric-suffixes[=FROM].
* tests/Makefile.am (TESTS): Reference the new test.
All affected lines end with \ or \n\, so run this command
until it produces no new changes (4 times):
git grep -E -l '`[^ ]+'\''.*\\' src \
|xargs perl -pi -e 's/`([^ ]+'\''.*\\)/'\''$1/'
* src/split.c (lines_chunk_split): Fix logic bug that led to
unwarranted failure of "split -n l/2 /dev/zero" on NetBSD 5.1.
The same would happen when splitting a growing file, where
open/lseek-end gives one size, but by the time we read, there
is more data available.
(bytes_chunk_extract): Likewise.
* NEWS (Bug fixes): Mention this.
* tests/split/l-chunk: The latter case was not exercised.
Add code to do that.
Bug introduced with the chunk-selecting feature in v8.7-25-gbe10739.
Co-authored-by: Jim Meyering <meyering@redhat.com>
... which is not available on some platforms,
and the replacement currently requires linking
with threading libraries.
* src/split.c (closeout): Remove the call to strsignal()
which is largely redundant anyway as sig2str()
is already used to map number to name in the error.
Reported by Bruno Haible on AIX 6.1 and 7.1
* gnulib: Update to latest.
* src/system.h: Definitions of ST_* macros have moved into the
gnulib module stat-size (specifically, the header file
stat-size.h), so remove them from here.
* src/truncate.c: Include stat-size.h.
* src/stat.c: Likewise.
* src/shred.c: Likewise.
* src/ls.c: Likewise.
* src/du.c: Likewise.
* src/ioblksize.h: New file. Move definition of io_blksize out of
system.h so that system.h does not have to include stat-size.h.
* src/cat.c: Include ioblksize.h.
* src/split.c: Likewise.
* src/copy.c: Include both stat-size.h and ioblksize.h.
* src/Makefile.am (noinst_HEADERS): Add ioblksize.h.
* src/split.c (lines_chunk_split): Don't use ignore_error() which
is redundant and confusing when not running with --filter.
(lines_rr): Likewise.
(ofile_open): Likewise. Add a comment to clarify that
filters aren't restarted under file descriptor pressure.
* src/split.c (main): Exit with a diagnostic if --filter
is specified along with a specific chunk number.
* test/split/filter: Ensure this combination fails.
* src/split.c (bytes_split): Stop reading when we
can no longer write to a child process.
(lines_rr): Likewise.
(lines_bytes_split): No change is made here since
input is bounded by the original file size.
* test/split/filter: Add test cases.
src/split.c (main): Don't unblock SIGPIPE before cleanup,
as then any pending signals will be sent and cause
the main split process to exit with a non zero status (141).
* test/split/filter: Add a test for this case.
* src/split.c (lines_bytes_chunk): Handle the edge case
where the file is truncated as we read.
* tests/misc/split-lchunk: Cleanup; no functional change.
* src/split.c (lines_chunk_split): Ensure that data is only
written to stdout when k specified. Also ensure that
extra files are not created when there is more data available
than reported in the file size.
* tests/misc/split-lchunk: Verify that split -n l/k/n doesn't
generate any files, and that -n l/n always generates n files.
* NEWS: Mention the fix.
* src/split.c: Include <signal.h>, <sys/wait.h> and "sig2str.h".
(FILTER_OPTION): New anonymous enum member.
(filter_command, filter_pid): New globals.
(open_pipes, open_pipes_alloc, n_open_pipes): Likewise.
(oldblocked, newblocked): Likewise.
(longopts): Add "filter".
(usage): Document --filter.
(create): Extend to create a pipe and fork "sh -c CMD".
(closeout): Adapt to close a pipe and wait for child process.
(cwrite): Call closeout, not just close.
(lines_chunk_split): FIXME
(bytes_chunk_extract): FIXME
(opid, ofile_open, lines_rr, main): FIXME
(ignorable): New function, to encapsulate EPIPE test.
* src/split.c (set_suffix_length): Only auto-calculate
the suffix length when the number of files is specified.
* tests/misc/split-a: Add a case to trigger the bug,
and exercise the suffix length auto-calculation.
* NEWS: Mention the fix.
Reported by Dmitry V. Levin and Sergey Vlasov at
https://bugzilla.altlinux.org/show_bug.cgi?id=24841
When -n l/N is used and long lines are present that both
span partitions and multiple buffers, one would get
inconsistent chunk sizes.
* src/split.c (main): Add a new undocumented ---io-blksize option
to support full testing with varied buffer sizes.
(cwrite): Refactor most handling of --elide-empty to here.
(bytes_split): Remove handling of --elide-empty.
(lines_chunk_split): Likewise. The specific issue here
was the first handling of elide_empty_files interfered
with the replenishing of the input buffer.
* test/misc/split-lchunk: Add -e and the new ---io-blksize
combinations to the test.
* src/split.c (usage, long_options, main): New options --number,
--unbuffered, --elide-empty-files.
(set_suffix_length): New function to auto increase suffix length
to handle a specified number of files.
(create): New function. Refactored from cwrite() and ofile_open().
(bytes_split): Add max_files argument to support byte chunking.
(lines_chunk_split): New function. Split file into chunks of lines.
(bytes_chunk_extract): New function. Extract a chunk of file.
(of_info): New struct. Used by functions lines_rr and ofile_open
to keep track of file descriptors associated with output files.
(ofile_open): New function. Shuffle file descriptors when there
are more output files than available file descriptors.
(lines_rr): New function to distribute lines round-robin to files.
(chunk_parse): New function. Parses K/N syntax.
* tests/misc/split-bchunk: New test for byte chunking.
* tests/misc/split-lchunk: New test for line delimited chunking.
* tests/misc/split-rchunk: New test for round-robin chunking.
* tests/Makefile.am: Reference new tests.
* tests/misc/split-fail: Add failure scenarios for new options.
* tests/misc/split-l: Fix a typo. s/ln/split/.
* doc/coreutils.texi (split invocation): Document --number.
* NEWS: Mention the new feature.
* .mailmap: Map new email address for shortlog.
Signed-off-by: Pádraig Brady <P@draigBrady.com>
The bug was introduced with commit 23f6d41f, 19-02-2003.
* src/split.c (bytes_split, lines_split, line_bytes_split):
Correctly check the return from full_read().
* tests/misc/split-fail: Ensure split fails when
it can't read its input.
* NEWS: Mention the fix.
* src/system.h: Rename emit_bug_reporting_address() to
emit_ancillary_info() and update it to not print the translation
project address in en_* locales, and _do_ print it in the 'C'
(and other) locales so that it's included in the default man page.
Also mention how to invoke the texinfo documentation for each command.
Also move the "hard-locale.h" include to the 8 files that now use it.
* man/help2man: Strip the newly added texinfo reference from the
--help output as a more verbose version is already added by help2man.
Suggestion from C de-Avillez
* doc/coreutils.texi (multiplierSuffixes): Mention that
the suffix can be specified without a leading number
* src/split.c (usage): Refactor SIZE help to within a function
* src/truncate.c (usage): Likewise
* src/ls.c (usage): Likewise
* src/df.c (usage): Likewise. Also add a function with BLOCKSIZE help
* src/du.c (usage): Likewise.
* src/system.h: Define 2 functions to emit common help text
This was prompted by https://bugzilla.redhat.com/show_bug.cgi?id=511188
This is following on from this change:
[02c3dc9d 2008-03-06 cat: use larger buffer sizes ...]
which increased the IO block size used by cat by 8 times,
but also capped it at 32KiB.
* NEWS: Mention the change in behavior.
* src/system.h: Add a new io_blksize() function that
returns the max of ST_BLKSIZE or 32KiB, as this was
seen as a good value for a minimum block size to use
to get good performance while minimizing system call overhead.
* src/cat.c: Use it.
* src/copy.c: ditto
* src/split.c: ditto