* bootstrap.conf: Include the new module
* gl/lib/fadvise.c: Provide a simpler interface to posix_fadvise.
(fadvise): Provide hint to the whole file associated with a stream.
(fdadvise): Provide hint to the specific portion of a file
associated with a file descriptor.
* gl/lib/fadvise.h: Redefine POSIX_FADV_* to FADVISE_* enums.
* gl/modules/fadvise: New file.
* m4/jm-macros.m4: Remove the no longer needed posix_fadvise check.
* .x-sc_program_name: Exclude test-fadvise.c from this check.
* gl/tests/test-fadvise (main): New test program.
* gl/modules/fadvise-testss: A new index to reference the tests.
* src/sort.c (stream_open): Use the new interface.
* src/dd.c (iwrite): Likewise.
This patch is by Gene Auyeung, Chris Dickens, Chen Guo, and Mike
Nichols, based off of a patch by Paul Eggert, Glen Lenker, et. al.,
with a basic heap implementation based off of the GDSL heap,
originally by Nicolas Darnis.
The number of sorts done in parallel is limited to the number
of available processors by default, or can be further restricted
with the --parallel option.
On a dual-die, 8 core Intel Xeon, results show sorting with
8 threads is almost 4 times faster than using a single thread.
Timings when sorting a 96MB file:
THREADS TIME (s)
1 5.10
2 2.87
4 1.75
8 1.31
Single threaded sorting has also been improved,
especially for cheaper comparison operations:
COMMAND BEFORE (s) AFTER (s)
sort 8.822 8.716
sort -g 10.336 10.222
sort -n 3.077 2.961
LANG=C sort 2.169 2.066
* bootstrap.conf: Add heap, pthread.
* coreutils.texi (sort): Describe the new --parallel option.
* gl/lib/heap.c: New file. Very basic heap implementation.
* gl/lib/heap.h: New file.
* gl/modules/heap: New file.
* src/Makefile.am: Add LIB_PTHREAD.
* src/sort.c: Include heap.h, nproc.h, pthread.h.
(MAX_MERGE): New macro.
(SUBTHREAD_LINES_HEURISTIC, PARALLEL_OPTION): New constants.
(MERGE_END, MERGE_ROOT): New constants.
(struct merge_node): New struct.
(struct merge_node_queue): New struct.
(sortlines temp): Remove declaration.
(usage, long_options, main): New option, --parallel.
(specify_nthreads): New function.
(mergelines): New signature, to emphasize the fact that the HI area
must be part of the destination. All callers changed.
(sequential_sort): New function, renamed from sortlines. Merge in
the functionality of sortlines_temp.
(compare_nodes): New function.
(lock_node, unlock_node): New functions.
(queue_destroy): New function.
(queue_init): New function.
(queue_insert): New function.
(queue_pop): New function.
(write_unique): New function.
(mergelines_node): New function.
(check_insert): New function.
(update_parent): New function.
(merge_loop): New function.
(sortlines): Rewrite to support and use parallelism, with a new
signature. All callers changed.
(struct thread_args): New struct.
(sortlines_thread): New function.
(sortlines_temp): Remove.
(sort): New argument NTHREADS. All uses changed. Output moved to
mergelines_node.
(main): disable threading if we are sorting at random.
* tests/Makefile.am (TESTS): Add misc/sort-benchmark-random.
* tests/misc/sort-benchmark-random: New file.
Signed-off-by: Pádraig Brady <P@draigBrady.com>
When processing a hard-linked file, du must keep track of the file's
device and inode numbers in order to avoid counting its storage
more than once. When du would process many hard linked files --
as are created by some backup tools -- the amount of memory required
for the supporting data structure could become prohibitively large.
This patch takes advantage of the fact that the amount of information
in the numbers of the typical dev,inode pair is far less than even
32 bits, and hence usually fits in the space of a pointer, be it
32 or 64 bits wide. A typical du traversal examines files on no
more than a handful of distinct devices, so the device number can
be encoded in just a few bits. Similarly, few inode numbers use
all of the high bits in an ino_t. Before, we would represent the
dev,inode pair using a naive struct, and allocate space for each.
Thus, an entry in the hash table consisted of a pointer (to that
struct) and a "next" pointer. With this change, we encode the
dev,inode information and put those bits in place of the pointer,
and thus do away with the need to allocate additional space for
each dev,inode pair.
* src/du.c: Include "di-set.h".
Don't include "hash.h"; it's no longer used.
(INITIAL_DI_SET_SIZE): Define.
(di_set): New global, to replace "htab".
(entry_hash, entry_compare, hash_init): Remove functions.
(hash_ins): Use di-set functions, rather than ones from the hash module.
(main): Likewise.
* bootstrap.conf (gnulib_modules): Add the new di-set module.
* NEWS (New features): Mention it.
* src/stat.c (alignof): Remove definition.
Instead, include "alignof.h", and sort the #include directives.
And get its definition from the gnulib module by that name:
* bootstrap.conf (gnulib_modules): Add alignof.
* bootstrap.conf (gnulib_modules): Add the following:
netinet_in, sys_ioctl, sys_wait, so that we can eliminate
the #if HAVE_<header>_H tests guarding their header inclusions.
Now that even MinGW provides ftruncate, we know that all
reasonable portability targets provide this function.
Remove the workaround code. We nearly removed the gnulib
module three years ago:
http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/9203
and it is now officially "obsolete".
* bootstrap.conf (gnulib_modules): Remove ftruncate.
* src/copy.c (copy_reg): Remove use of HAVE_FTRUNCATE and its
no-longer-used workaround code.
* src/truncate.c: Remove a comment about handling missing ftruncate.
* configure.ac: Require autoconf-2.62 and automake-1.11.1 or newer.
* bootstrap.conf (buildreq): Require automake-1.11.1 or newer,
to ensure people use a version with the fix for CVE-2009-4029.
Note that the coreutils-8.2 tarball included a fixed Makefile.in.
Require autoconf-2.62, per automake.
* bootstrap.conf (gnulib_modules): Remove the strverscmp module
which is not used since commit e505736f, on 03-10-2008,
"ls and sort: use filevercmp instead of strverscmp"
If stdin or stdout is closed, then freopen(,stderr) can violate
the premise that STDERR_FILENO==fileno(stderr), which in turn
breaks mktemp -q.
* bootstrap.conf (gnulib_modules): Add freopen-safer.
* src/mktemp.c (includes): Use stdio--.h.
* tests/misc/close-stdout: Enhance test to catch bug.
* bootstrap.conf (gnulib_modules): Add do-release-commit-and-tag.
* build-aux/do-release-commit-and-tag: Remove file. Now it's in gnulib.
* gnulib: Update submodule to the latest, to get the just-moved script.
* src/mktemp.c (main): Remove just-created file if stdout had
problems.
* bootstrap.conf (gnulib_modules): Add remove.
* tests/misc/close-stdout: Test it.
* NEWS: Document it.
* gl/lib/mbsalign.c (mbsalign): Mark unused parameter.
* bootstrap.conf (gnulib_modules): Remove obsolete
rename-dest-slash.
* gnulib-tests/Makefile.am (AM_CFLAGS): Reduce set of warnings for
gnulib tests.
* gl/modules/rename-tests.diff (Makefile.am): New file, to add
LIBINTL to LDADD, since we avoid canonicalize-lgpl module.
* gl/lib/regcomp.c.diff (regerror, calc_next)
(build_collating_symbol, parse_bracket_element, build_equiv_class)
(free_tree): Mark unused parameters.
* gl/lib/regex_internal.h.diff (re_string_elem_size_at): New file,
to mark unused parameters.
* gl/lib/printf-args.c.diff (PRINTF_FETCHARGS): New file, to avoid
type mismatch.
* gl/lib/vasnprintf.c (VASNPRINTF): New file, to avoid shadowing
local variable name.
* .gitignore: Ignore temporary build artifacts.
* src/env.c (main): Use unsetenv rather than putenv to remove
items from environ, and check for failure.
* bootstrap.conf (gnulib_modules): Add unsetenv.
* tests/misc/env: Test this.
* NEWS: Document it.
* bootstrap.conf (gnulib_modules): Add gnu-web-doc-update.
Remove gendocs, since gnu-web-doc-update depends on it.
* gnu-web-doc-update: Remove file, now that we get it from gnulib.
* bootstrap.conf (gnulib_modules): Add freopen, strsignal, fsync.
Exposed via make CFLAGS=-DGNULIB_POSIXCHECK 2>&1 \
|perl -lne '/.* use gnulib module (\S+).*/ and print $1' \
|sort |uniq -c|sort -nr
(avoided_gnulib_modules): Don't avoid the "lock" module.
Now it's required, as a dependency of the strsignal module.
* bootstrap (bootstrap_epilogue): Define a default, empty function.
Remove coreutils-specific code, and instead,
invoke this new function at the end of this script.
* bootstrap.conf (bootstrap_epilogue): Define, to override the default.
* bootstrap.conf (obsolete_gnulib_modules): Move rename...
(gnulib_modules): ...here. Add symlink.
* NEWS: Document the change in readlink.
* doc/coreutils.texi (readlink invocation): Likewise.
* tests/readlink/can-f: Update test to new semantics, and add test
of loop.
* bootstrap.conf (gnulib_modules): Drop rmdir-errno.
* src/rmdir.c (errno_rmdir_non_empty): Check both cases allowed by
POSIX, rather than relying on configure-time check that might
fail during cross-compilation. Reverts commit 9b6eb98d41.
* bootstrap.conf (gnulib_modules): Add faccessat. Replace strdup
with strdup-posix.
* m4/jm-macros.m4 (coreutils_MACROS): Revert previous change, now
that gnulib does it for us.
* src/remove.c (write_protected_non_symlink): Use faccessat in
more situations.
Before this change, :>f; ln -T f no-such/ would succeed on Solaris 10.
After it, ln fails, as it should: ln: accessing `z/': Not a directory
The command, link f no-such/, had the same problem on that system.
* bootstrap.conf (gnulib_modules): Add "link".
* tests/ln/slash-decorated-nonexistent-dest: New test.
* tests/Makefile.am (TESTS): Add it.
* NEWS (Portability): Mention the improvement.
* bootstrap.conf (obsolete_gnulib_modules): Remove memchr from
the list, now that it fixes a problem in some modern C libraries.
(gnulib_modules): Add it here.
Use gnulib's new priv-set module and updated write-any-file.
With them, the remove-called can_write_any_file function no
longer tries to drop the unlink-directory privilege, so now
each caller of remove must do that separately, calling
priv_set_remove_linkdir.
* bootstrap.conf (gnulib_modules): Add priv-set.
* src/rm.c: Include "priv-set.h".
(main): Call priv_set_remove_linkdir.
* src/mv.c (main): Likewise.
* gnulib: Update submodule to latest.
Reported by Samuel Thibault and Stéphane Raimbault, as the glibc fr_FR
locale has recently changed to use the official but variable width
abbreviated month names. Other glibc locales also have variable widths.
http://sourceware.org/ml/libc-locales/2008-q1/msg00035.htmlhttp://sourceware.org/bugzilla/show_bug.cgi?id=9859
* NEWS: Mention the fix
* gl/lib/mbsalign.c: A new module to align and truncate a
string in a specified number of screen cells, while handling
multi-byte characters appropriately.
* gl/lib/mbsalign.h: Ditto
* gl/modules/mbsalign: Ditto
* bootstrap.conf: Reference the new module
* src/ls.c (abmon_init): New function, precompute the abbreviated
months aligned left in a minimum width column <= 5 screen cells.
(align_nstrftime): New function, replace the first %b in the
format specification to strftime with the precomputed month string.
Note using the cached month strings speeds up `ls -lU` by around 17%
on glibc-2.7-2 on linux at least. Also if we implement this function
using heap storage rather than automatic storage, and use snprintf
instead of strcpy, ls will slow down by 2% and 1% respectively
(i.e. a net gain of 14% rather than 17%).
* tests/ls/abmon-align: A new test to test ls alignment for
various formats and locales
* tests/Makefile.am: Reference the new test
* bootstrap.conf (gnulib_modules): Add manywarnings.
* configure.ac: Use gl_MANYWARN_ALL_GCC, and exclude options
I don't want or that provoke too many warnings.
(WARN_CFLAGS, WERROR_CFLAGS): Define.
(lint, GNULIB_PORTCHECK): Define.
(_FORTIFY_SOURCE): Define to 2.