1
0
mirror of git://git.sv.gnu.org/coreutils.git synced 2026-04-20 18:56:39 +02:00
Commit Graph

18 Commits

Author SHA1 Message Date
Pádraig Brady
a499a0ce58 cut: add the -z,--zero-terminated option
* doc/coreutils.texi (cut invocation): Reference the description.
* src/cut.c: Parameterize '\n' references.
* tests/misc/cut.pl: Add tests for character and field processing.
* NEWS: Mention the new feature.
2016-01-13 10:59:56 +00:00
Pádraig Brady
b16e999f55 maint: update all copyright year number ranges
Run "make update-copyright" and then...

* gnulib: Update to latest with copyright year adjusted.
* tests/init.sh: Sync with gnulib to pick up copyright year.
* bootstrap: Likewise.
* tests/sample-test: Adjust to use the single most recent year.
2016-01-01 14:10:41 +00:00
Assaf Gordon
bd1bc2b3fe cut: refactor into set-fields module
Extract the functionality of parsing --field=LIST into a separate
module, to be used by other programs.

* src/cut.c: move field parsing code from here ...
* src/set-fields.{c,h}: ... to here.
  (set_fields): generalize by supporting multiple parsing/reporting
  options.
  (struct range_pair): rename to field_range_pair.
* src/local.mk: link cut with set-field.
* po/POTFILES.in: add set-field.c
* tests/misc/cut.pl: update wording of error messages
2015-09-12 02:27:32 +00:00
Pádraig Brady
e0afeb0099 maint: update all copyright year number ranges
Run "make update-copyright" and then...

* tests/sample-test: Adjust to use the single most recent year.
* tests/du/bind-mount-dir-cycle-v2.sh: Fix case in copyright message,
so that year is updated automatically in future.
2015-01-01 04:52:17 +00:00
Pádraig Brady
5c6cf94ba5 cut: restore special case handling of -f with -d$'\n'
commits v8.20-98-g51ce0bf and v8.20-99-gd302aed changed cut(1)
to process each line independently and thus promptly output
each line without buffering.  As part of those changes we removed
the special handling of --delimiter=$'\n' --fields=... which
could be used to select arbitrary (ranges of) lines, so as to
simplify and optimize the implementation while also matching the
behavior of different cut(1) implementations.

However that GNU behavior was in place for a long time, and
could be useful in certain cases like making a separated list like
`seq 10 | cut -f1- -d$'\n' --output-delimiter=,` although other tools
like head(1) and paste(1) are more suited to this operation.
This patch reinstates that functionality but restricts the
"line behind" buffering behavior to only the -d$'\n' case.

We also fix the following related edge case to be more consistent:

  before> printf "\n" | cut -s -d$'\n' -f1- | wc -l
  2
  before> printf "\n" | cut    -d$'\n' -f1- | wc -l
  1
  after > printf "\n" | cut -s -d$'\n' -f1- | wc -l
  1
  after > printf "\n" | cut    -d$'\n' -f1- | wc -l
  1

* src/cut.c (cut_fields): Adjust as discussed above.
* tests/misc/cut.pl: Likewise.
* NEWS: Mention the change in behavior both for v8.21
and this effective revert.
* cfg.mk (old_NEWS_hash): Adjust for originally omitted v8.21 entry.
* src/paste.c: s/delimeter/delimiter/ comment typo fix.
2014-06-01 12:14:39 +01:00
Bernhard Voelker
275c078fb4 maint: update all copyright year number ranges
Run "make update-copyright", but then also run this,
  perl -pi -e 's/2\d\d\d-//' tests/sample-test
to make that one script use the single most recent year number.
2014-01-02 22:19:59 +01:00
Cojocaru Alexandru
3e466ad051 cut: make memory allocation independent of range width
The current implementation of cut, uses a bit array,
an array of `struct range_pair's, and (when --output-delimiter
is specified) a hash_table.  The new implementation will use
only an array of `struct range_pair's.
The old implementation is memory inefficient because:
 1. When -b with a big num is specified, it allocates a lot of
    memory for `printable_field'.
 2. When --output-delimiter is specified, it will allocate 31 buckets.
    Even if only a few ranges are specified.

Note CPU overhead is increased to determine if an item is to be printed,
as shown by:

$ yes abcdfeg | head -n1MB > big-file
$ for c in with-bitarray without-bitarray; do
    src/cut-$c 2>/dev/null
    echo -ne "\n== $c =="
    time src/cut-$c -b1,3 big-file > /dev/null
  done

== with-bitarray ==
real    0m0.084s
user    0m0.078s
sys     0m0.006s

== without-bitarray ==
real    0m0.111s
user    0m0.108s
sys     0m0.002s

Subsequent patches will reduce this overhead.

* src/cut.c (set_fields): Set and initialize RP
instead of printable_field.
* src/cut.c (is_range_start_index): Use CURRENT_RP rather than a hash.
* tests/misc/cut.pl: Check if `eol_range_start' is set correctly.
* tests/misc/cut-huge-range.sh: Rename from cut-huge-to-eol-range.sh,
and add a test to verify large amounts of mem aren't allocated.
Fixes http://bugs.gnu.org/13127
2013-04-29 17:54:27 +01:00
Pádraig Brady
be7932e863 cut: fix a segfault with disjoint open ended ranges
Fixes the issue introduced in unreleased commit v8.20-60-gec48bea.

* src/cut.c (set_fields): Don't access the bit array if
we've an open ended range that's outside any finite range.
* tests/misc/cut.pl: Add tests for this case.
Reported by Marcel Böhme in http://bugs.gnu.org/13627
2013-02-04 13:55:01 +00:00
Pádraig Brady
d302aed182 cut: fix -f to work with the -d$'\n' edge case
* src/cut.c (cut_fields): Handle the edge case where '\n' is
the delimiter, which could be used for example to suppress
the last line if it doesn't contain a '\n'.
* test/misc/cut.pl: Add tests for this edge case.
2013-01-26 02:31:59 +00:00
Pádraig Brady
51ce0bf844 cut: with -f, process each line independently
Previously line N+1 was inspected before line N was fully output,
which causes output ordering issues at the terminal or delays
from intermittent sources like tail -f.

* src/cut.c (cut_fields): Adjust so that we record the
previous output character so we can use that info to
determine wether to output a '\n' or not.
* tests/misc/cut.pl: Add tests to ensure existing
functionality isn't broken.
* NEWS: Mention the fix.
Fixes bug http://bugs.gnu.org/13498
2013-01-26 02:31:53 +00:00
Jim Meyering
77da73c754 maint: update all copyright year number ranges
Run "make update-copyright", but then also run this,
  perl -pi -e 's/2\d\d\d-//' tests/sample-test
to make that one script use the single most recent year number.
2013-01-01 04:51:20 +01:00
Pádraig Brady
00743a1f6e maint: fix a referenced coreutils version in a test comment
* tests/misc/cut.pl: This particular bug existed up to v8.10.
2012-12-06 18:28:46 +00:00
Pádraig Brady
64b8751813 tests: cut.pl: adjust for changed diagnostic
* tests/misc/cut.pl: Since we now output the more
complete error message irrespective of running
in a multi-byte locale or not, adjust the test accordingly.
2012-12-06 09:37:17 +00:00
Cojocaru Alexandru
dd3193cd95 cut: improve error reporting
* src/cut.c (main): Treat a NUL delimiter (-d '') consistently
with non NUL delimiters, and disallow such a delimiter option,
unless a field is also specified.
(set_fields): Provide a more accurate error message when
a given list is invalid.
* tests/misc/cut.pl: Add a test case.
2012-12-06 09:37:16 +00:00
Jim Meyering
06aeeecb3f cut: do not print extraneous delimiters in some unusual cases
When printing output delimiters, and when a to-EOL range subsumes
at least one other range, cut would mistakenly print delimiters for
the subsumed range.  This bug was probably introduced via commit
v5.2.1-639-g847e066.
* src/cut.c (set_fields): Ignore any range that is subsumed by a
to-EOL range.  Also, move two declarations down.
* tests/misc/cut.pl: Add tests to exercise this.
* NEWS (Bug fixes): Mention it.
Reported by Marcel Böhme in http://bugs.gnu.org/12966
2012-11-24 15:23:31 -08:00
Jim Meyering
1b874511b6 cut: treat -b2-,3- like -b2-, not like -b3-
* src/cut.c (set_fields): When two right-open-ended ranges are
specified, don't blindly let the latter one take precedence over
the former.  Instead, use the union of the ranges.
* tests/misc/cut.pl: Add tests to exercise this.
* NEWS (Bug fixes): Mention it.
Reported by Marcel Böhme in http://bugs.gnu.org/12966
Thanks to Berhard Voelker for catching log and NEWS typos.
2012-11-24 15:23:28 -08:00
Bernhard Voelker
1482f730b4 cut: do not accept the invalid range 0-
The command "echo 12345 | cut -b 0-" prints an empty line while
it should fail with "fields and positions are numbered from 1".

* src/cut.c (set_fields): Add a diagnostic for the invalid open
range which starts with Zero, i.e., the range 0-.
* tests/misc/cut.pl: Add tests to ensure the range 0- fails for
fields (-f) and for positions (-b, -c).
* NEWS: Mention the fix.

Reported by Marcel Böhme in <http://bugs.gnu.org/12903>.
2012-11-19 00:08:38 +01:00
Stefano Lattarini
9eb4c31eb7 tests: add .sh and .pl suffixes to shell and perl tests, respectively
Not only this shrinks the size of the generated Makefile (from > 6300
lines to ~3000), but will allow further simplifications in future
changes.

* tests/Makefile.am (TEST_EXTENSIONS): Add '.sh' and '.pl'.
(PL_LOG_COMPILER, SH_LOG_COMPILER): New, still defined simply to
$(LOG_COMPILER) for the time being.
(TESTS, root_tests): Adjust as described.
* All tests: Rename as described.
2012-08-30 18:55:59 +02:00