1
0
mirror of git://git.sv.gnu.org/coreutils.git synced 2026-02-20 22:32:17 +02:00
Commit Graph

35 Commits

Author SHA1 Message Date
Chen Guo
9face836f3 sort: parallelize internal sort
This patch is by Gene Auyeung, Chris Dickens, Chen Guo, and Mike
Nichols, based off of a patch by Paul Eggert, Glen Lenker, et. al.,
with a basic heap implementation based off of the GDSL heap,
originally by Nicolas Darnis.

The number of sorts done in parallel is limited to the number
of available processors by default, or can be further restricted
with the --parallel option.

On a dual-die, 8 core Intel Xeon, results show sorting with
8 threads is almost 4 times faster than using a single thread.
Timings when sorting a 96MB file:
THREADS     TIME (s)
1            5.10
2            2.87
4            1.75
8            1.31

Single threaded sorting has also been improved,
especially for cheaper comparison operations:
COMMAND             BEFORE (s)  AFTER (s)
sort                 8.822       8.716
sort -g             10.336      10.222
sort -n              3.077       2.961
LANG=C sort          2.169       2.066

* bootstrap.conf: Add heap, pthread.
* coreutils.texi (sort): Describe the new --parallel option.
* gl/lib/heap.c: New file. Very basic heap implementation.
* gl/lib/heap.h: New file.
* gl/modules/heap: New file.
* src/Makefile.am: Add LIB_PTHREAD.
* src/sort.c: Include heap.h, nproc.h, pthread.h.
(MAX_MERGE): New macro.
(SUBTHREAD_LINES_HEURISTIC, PARALLEL_OPTION): New constants.
(MERGE_END, MERGE_ROOT): New constants.
(struct merge_node): New struct.
(struct merge_node_queue): New struct.
(sortlines temp): Remove declaration.
(usage, long_options, main): New option, --parallel.
(specify_nthreads): New function.
(mergelines): New signature, to emphasize the fact that the HI area
must be part of the destination.  All callers changed.
(sequential_sort): New function, renamed from sortlines. Merge in
the functionality of sortlines_temp.
(compare_nodes): New function.
(lock_node, unlock_node): New functions.
(queue_destroy): New function.
(queue_init): New function.
(queue_insert): New function.
(queue_pop): New function.
(write_unique): New function.
(mergelines_node): New function.
(check_insert): New function.
(update_parent): New function.
(merge_loop): New function.
(sortlines): Rewrite to support and use parallelism, with a new
signature. All callers changed.
(struct thread_args): New struct.
(sortlines_thread): New function.
(sortlines_temp): Remove.
(sort): New argument NTHREADS. All uses changed. Output moved to
mergelines_node.
(main): disable threading if we are sorting at random.
* tests/Makefile.am (TESTS): Add misc/sort-benchmark-random.
* tests/misc/sort-benchmark-random: New file.

Signed-off-by: Pádraig Brady <P@draigBrady.com>
2010-07-13 01:44:46 +01:00
Paul Eggert
fb1a26c3f6 du: Hash with a mechanism that's simpler and takes less memory.
* gl/lib/dev-map.c, gl/lib/dev-map.h, gl/modules/dev-map: Remove.
* gl/lib/ino-map.c, gl/lib/ino-map.h, gl/modules/ino-map: New files.
* gl/modules/dev-map-tests, gl/tests/test-dev-map.c: Remove.
* gl/modules/ino-map-tests, gl/tests/test-ino-map.c: New files.
* gl/lib/di-set.h (struct di_set): Renamed from struct di_set_state,
and now private.  All uses changed.
(_ATTRIBUTE_NONNULL_): Don't assume C99.
(di_set_alloc): Renamed from di_set_init, with no size arg.
Now allocates the object rather than initializing it.
For now, this no longer takes an initial size; we can put this
back later if it is needed.
* gl/lib/di-set.c: Include hash.h, ino-map.h, and limits.h instead of
stdio.h, assert.h, stdint.h, sys/types.h (di-set.h includes that
now), sys/stat.h, and verify.h.
(N_DEV_BITS_4, N_INO_BITS_4, N_DEV_BITS_8, N_INO_BITS_8): Remove.
(struct dev_ino_4, struct dev_ino_8, struct dev_ino_full): Remove.
(enum di_mode): Remove.
(hashint): New typedef.
(HASHINT_MAX, LARGE_INO_MIN): New macros.
(struct di_ent): Now maps a dev_t to a inode set, instead of
containing a union.
(struct dev_map_ent): Remove.
(struct di_set): New type.
(is_encoded_ptr, decode_ptr, di_ent_create): Remove.
(di_ent_hash, di_ent_compare, di_ent_free, di_set_alloc, di_set_free):
(di_set_insert): Adjust to new representation.
(di_ino_hash, map_device, map_inode_number): New functions.
* gl/modules/di-set (Depends-on): Replace dev-map with ino-map.
Remove 'verify'.
* gl/tests/test-di-set.c: Adjust to the above changes to API.
* src/du.c (INITIAL_DI_SET_SIZE): Remove.
(hash_ins, main): Adjust to new di-set API.
2010-07-06 14:58:48 -07:00
Jim Meyering
f42496b72b di-set: manipulate sets of dev/inode pairs efficiently
* gl/lib/di-set.c: Implementation.
* gl/lib/di-set.h: Declarations.
* gl/modules/di-set: Define module.
* gl/modules/di-set-tests: Define test module.
* gl/tests/test-di-set.c: Likewise.
2010-07-04 08:40:40 +02:00
Jim Meyering
6357909ee6 dev-map: map device number to small non-negative
* gl/lib/dev-map.c: New file.
* gl/lib/dev-map.h: Declarations.
* gl/modules/dev-map: Define primary modules.
* gl/modules/dev-map-tests: Define test module.
* gl/tests/test-dev-map.c: Test it.
2010-07-04 08:40:40 +02:00
Pádraig Brady
dfe0d336a0 maint: update the mbsalign module
* gl/lib/mbsalign.c (mbsalign):  Support the MBA_UNIBYTE_FALLBACK
flag which reverts to unibyte mode if one can't allocate memory
or if there are invalid multibyte characters present.
Note memory is no longer dynamically allocated in unibyte mode so
one can assume that mbsalign() will not return an error if this
flag is present.  Don't calculate twice, the number of spaces,
when centering.  Suppress a signed/unsigned comparison warning.
(ambsalign): A new wrapper function to dynamically allocate
the minimum memory required to hold the aligned string.
* gl/lib/mbsalign.h: Add the MBA_UNIBYTE_FALLBACK flag and
also document others that may be implemented in future.
(ambsalign): A prototype for the new wrapper.
* gl/tests/test-mbsalign.c (main): New test program.
* gl/modules/mbsalign-tests: A new index to reference the tests.
* .x-sc_program_name: Exclude test-mbsalign.c from this check.
2010-03-19 19:23:45 +00:00
Eric Blake
b498c58013 tests: fix link failure on cygwin
Counterpart to commit 8fe40b84bd, since test-link.c uses rename,
and we override gnulib with a rename() replacement that can xalloc_die.

* gl/modules/link-tests.diff: New file.
2009-11-24 06:36:13 -07:00
Jim Meyering
395b1c9375 maint: move xfreopen module to gnulib
* gl/lib/xfreopen.c: Remove file.
* gl/lib/xfreopen.h: Likewise.
* gl/modules/xfreopen: Likewise.
2009-11-20 07:37:56 +01:00
Eric Blake
6a31fd8d73 build: update gnulib, for getgroups improvements
A replacement getgroups is now guaranteed to exist, but it may
fail with ENOSYS.  mgetgroups is moved to gnulib, and now takes
gid_t instead of GETGROUPS_T (but setgroups still needs GETGROUPS_T).

* gnulib: Update to latest.
* gl/modules/mgetgroups: Delete, moved to gnulib.
* gl/m4/mgetgroups.m4: Likewise.
* gl/lib/mgetgroups.h: Likewise.
* gl/lib/mgetgroups.c: Likewise.
* src/group-list.c (print_group_list): Adjust callers.
* src/id.c (print_full_info): Likewise.
2009-11-13 07:50:20 -07:00
Eric Blake
56b85e035b build: consistently use freopen-safer
cat, head, ptx, shuf, tac, tail, tee, tr, and uniq used freopen
on stdout, and were potentially vulnerable.  dircolors, du, and
tsort only used it on stdin, which is unaffected by freopen_safer,
but this covers all uses for consistency.

* cfg.mk (sc_require_stdio_safer): New rule.
* gl/modules/xfreopen (Depends-on): Add freopen-safer.
* gl/lib/xfreopen.c (includes): Use stdio--.h.
* src/ptx.c (includes): Likewise.
* src/shuf.c (includes): Likewise.
* src/uniq.c (includes): Likewise.
* src/dircolors.c (includes): Likewise.
* src/du.c (includes): Likewise.
* src/tsort.c (includes): Likewise.
2009-11-07 10:10:28 -07:00
Eric Blake
8fe40b84bd build: avoid some warnings
* gl/lib/mbsalign.c (mbsalign): Mark unused parameter.
* bootstrap.conf (gnulib_modules): Remove obsolete
rename-dest-slash.
* gnulib-tests/Makefile.am (AM_CFLAGS): Reduce set of warnings for
gnulib tests.
* gl/modules/rename-tests.diff (Makefile.am): New file, to add
LIBINTL to LDADD, since we avoid canonicalize-lgpl module.
* gl/lib/regcomp.c.diff (regerror, calc_next)
(build_collating_symbol, parse_bracket_element, build_equiv_class)
(free_tree): Mark unused parameters.
* gl/lib/regex_internal.h.diff (re_string_elem_size_at): New file,
to mark unused parameters.
* gl/lib/printf-args.c.diff (PRINTF_FETCHARGS): New file, to avoid
type mismatch.
* gl/lib/vasnprintf.c (VASNPRINTF): New file, to avoid shadowing
local variable name.
* .gitignore: Ignore temporary build artifacts.
2009-11-02 06:34:24 -07:00
Jim Meyering
c5c15884df maint: move selinux-at module from gl/ to gnulib
* gl/lib/selinux-at.c: Remove file.
* gl/lib/selinux-at.h: Likewise.
* gl/modules/selinux-at: Likewise.
* gnulib: update to latest, to get the new module.
2009-08-06 14:32:38 +02:00
Jim Meyering
ec34511ccf move argv-iter module to gnulib
* gl/lib/argv-iter.c: Remove file.
* gl/lib/argv-iter.h: Remove file.
* gl/modules/argv-iter: Remove file.
* gl/modules/argv-iter-tests: Remove file.
* gl/tests/test-argv-iter.c: Remove file.
* gnulib: Update submodule, to get argv-iter
2009-07-04 17:41:39 +02:00
Pádraig Brady
612b647dd1 ls: fix alignment when month names have varying widths
Reported by Samuel Thibault and Stéphane Raimbault, as the glibc fr_FR
locale has recently changed to use the official but variable width
abbreviated month names. Other glibc locales also have variable widths.
http://sourceware.org/ml/libc-locales/2008-q1/msg00035.html
http://sourceware.org/bugzilla/show_bug.cgi?id=9859
* NEWS: Mention the fix
* gl/lib/mbsalign.c: A new module to align and truncate a
string in a specified number of screen cells, while handling
multi-byte characters appropriately.
* gl/lib/mbsalign.h: Ditto
* gl/modules/mbsalign: Ditto
* bootstrap.conf: Reference the new module
* src/ls.c (abmon_init): New function, precompute the abbreviated
months aligned left in a minimum width column <= 5 screen cells.
(align_nstrftime): New function, replace the first %b in the
format specification to strftime with the precomputed month string.
Note using the cached month strings speeds up `ls -lU` by around 17%
on glibc-2.7-2 on linux at least.  Also if we implement this function
using heap storage rather than automatic storage, and use snprintf
instead of strcpy, ls will slow down by 2% and 1% respectively
(i.e. a net gain of 14% rather than 17%).
* tests/ls/abmon-align: A new test to test ls alignment for
various formats and locales
* tests/Makefile.am: Reference the new test
2009-04-03 00:34:11 +01:00
Jim Meyering
d3b5555f10 argv-iter: add tests
* gl/modules/argv-iter-tests: New module.
* gl/tests/test-argv-iter.c: New file.
2008-12-01 17:41:08 +01:00
Jim Meyering
ca738e4414 argv-iter: new module
* gl/lib/argv-iter.h: New file.
* gl/lib/argv-iter.c: New file.
* gl/modules/argv-iter: New file.
With a suggestion for improved memory management by Pádraig Brady.
2008-12-01 17:40:59 +01:00
Jim Meyering
32d4d0dd5e xfreopen: new module
* gl/lib/xfreopen.c: New file.
* gl/lib/xfreopen.h: New file.
* gl/modules/xfreopen: New file.
2008-11-10 08:11:59 +01:00
Jim Meyering
5afac2aee1 clean up gl/modules/selinux-at
* gl/modules/selinux-at:
Ensure that LIB_SELINUX is cleared, in case it's set in the environment.
m4-quote the first two args to AC_SEARCH_LIBS.
Don't violate autoconf's ac_ namespace: s/ac_save/gl_save/
Drop the useless double quotes around a simple assignment RHS.
2008-10-22 12:44:17 +02:00
Jim Meyering
aa67daf63b move selinux-h module from gl/ to gnulib
* gl/lib/se-context.in.h: Remove file.
* gl/lib/se-selinux.in.h: Likewise.
* gl/m4/selinux-context-h.m4: Likewise.
* gl/m4/selinux-selinux-h.m4: Likewise.
* gl/modules/selinux-h: Likewise.
2008-10-21 18:05:55 +02:00
Jim Meyering
f7a009c17c prepare to move selinux-h module to gnulib
* gl/modules/selinux-h (Makefile.am)
[selinux/selinux.h, selinux/context.h]:
Remove temporary file and target, in case they're read-only.
Use $(MKDIR_P), not mkdir -p.
(License): Relax to LGPLv2+.
Remove vestigial comments.
2008-10-21 15:58:11 +02:00
Jim Meyering
31a884dab7 accommodate gnulib header removals
* src/copy.c: Don't include "euidaccess.h" or "lchmod.h".
* src/cp.c: Don't include "lchmod.h".
* src/ls.c: Don't include "dirfd.h".
* src/mkdir.c: Don't include "lchmod.h".
* src/pwd.c: Don't include "dirfd.h".
* src/remove.c: Don't include "dirfd.h" or "euidaccess.h".
* src/test.c: Don't include "euidaccess.h".
* gl/modules/getloadavg.diff: Adjust diff for changed context.
* src/uptime.c (uptime): Remove declaration.
2008-10-19 14:15:31 +02:00
Jim Meyering
433881d802 move sha256 and sha512 modules to gnulib
* bootstrap.conf (gnulib_modules) [sha256, sha512]: Add "crypto/"
prefix to module name, now that they come from gnulib.
* gl/lib/sha256.c: Remove file.
* gl/lib/sha256.h: Likewise.
* gl/lib/sha512.c: Likewise.
* gl/lib/sha512.h: Likewise.
* gl/lib/u64.h: Likewise.
* gl/m4/sha256.m4: Likewise.
* gl/m4/sha512.m4: Likewise.
* gl/modules/sha256: Likewise.
* gl/modules/sha512: Likewise.
2008-05-11 09:00:59 +02:00
Jim Meyering
f33599c144 Create sha256 and sha512 modules and move files into gl/.
* bootstrap.conf (gnulib_modules): Add sha256 and sha512.
* m4/prereq.m4: Don't require gl_SHA256 or gl_SHA512.
* gl/modules/sha512: New file.
* gl/modules/sha256: New file.
* m4/sha256.m4: Move to ...
* gl/m4/sha256.m4: ...here, removing use of AC_SOURCES.
* m4/sha512.m4: Move to ...
* gl/m4/sha512.m4: ...here, removing use of AC_SOURCES.
* lib/sha256.c, lib/sha256.h: Move to ...
* gl/lib/sha256.c, gl/lib/sha256.h: ...here.
* lib/sha512.c, lib/sha512.h: Move to ...
* gl/lib/sha512.c, gl/lib/sha512.h: ...here.
* lib/u64.h: Move to ...
* gl/lib/u64.h: ...here.
2008-03-02 12:16:49 +01:00
Jim Meyering
46e51208c7 Don't depend on gnulib's deprecated "free" module.
* bootstrap.conf (obsolete_gnulib_modules): Remove free.
* gl/modules/mgetgroups (Depends-on): Remove free.
2008-03-01 09:56:56 +01:00
Paul Eggert
77f8502ea7 * gl/modules/randread (Depends-on): Remove nonexistent rand-isaac. 2007-11-28 22:29:38 +01:00
Jim Meyering
7eab7d027e Make tempname more random, via the randint module.
* gl/modules/tempname (Depends-on): Add randint and stdbool.
* gl/lib/tempname.c: Include randint.h and stdbool.h.
(uint64_t): Remove definition.  Not needed.
[_LIBC] (RANDOM_BITS): Remove this block, now that we have proper random bits.
(check_x_suffix): New function.
(gen_tempname_len): Rename from __gen_tempname.
Add a parameter, x_suffix_len, telling how many X's there must be at
the end of the template.
Use pseudo-random numbers all the way, rather than adding 7777
from one iteration to the next.
(__gen_tempname): New function, to call gen_tempname_len, requiring a
suffix length of 6.
* gl/lib/tempname.h: Add prototype for gen_tempname_len.

Signed-off-by: Jim Meyering <meyering@redhat.com>
2007-10-07 19:44:07 +02:00
Jim Meyering
58a7ead41d Convert coreutils' rand*.{c,h,m4} into modules.
First step: move these files to gl/lib:
* lib/rand-isaac.c, lib/rand-isaac.h
* lib/randint.c, lib/randint.h
* lib/randperm.c, lib/randperm.h
* lib/randread.c, lib/randread.h

Step 2: add modules/rand* and remove now-unneeded .m4 files.
* gl/modules/randint: New file.
* gl/modules/randperm: New file.
* gl/modules/randread: New file.
* m4/randint.m4: Remove file.
* m4/randperm.m4: Remove file.
* m4/randread.m4: Remove file.

Step 3: use the new modules
* bootstrap.conf (gnulib_modules): Add randint and randperm.
* m4/prereq.m4 (gl_RANDINT, gl_RANDREAD, gl_RANDPERM): Don't require;
These have been removed.
(gl_ROOT_DEV_INO): Don't require; already handled via bootstrap.conf.
2007-10-07 19:43:35 +02:00
Jim Meyering
696f4a8042 Copy from gnulib the parts of tempname that we'll modify.
* gl/lib/tempname.c: Copy from gnulib.
* gl/lib/tempname.h: Likewise.
* gl/modules/tempname: Likewise.

Allow GPLv2 on temporarily(?)-imported file from gnulib/libc.
* .x-sc_GPL_version: New file.
* Makefile.am (EXTRA_DIST): Add .x-sc_GPL_version
2007-10-07 19:42:53 +02:00
Jim Meyering
43d0c30bf0 Adapt to new gnulib naming scheme.
* gl/lib/se-context.in.h: Rename from gl/lib/se-context_.h.
* gl/lib/se-selinux.in.h: Rename from gl/lib/se-selinux_.h.
* gl/m4/selinux-context-h.m4: Remove use of AC_LIBSOURCES.
* gl/m4/selinux-selinux-h.m4: Likewise.
* gl/modules/selinux-h (Files, Makefile.am): Reflect renaming.
(Makefile.am) [lib_SOURCES]: Add se-context.in.h and se-selinux.in.h.
2007-10-04 12:33:32 +02:00
Jim Meyering
491c54ca99 Move file-set and hash-triple modules to gnulib.
* bootstrap.conf (gnulib_modules): Remove file-set, now that
it's in gnulib, and the canonicalize module requires it there.
* gl/lib/file-set.c, gl/lib/file-set.h, gl/modules/hash-triple: Remove.
* gl/lib/hash-triple.c, gl/lib/hash-triple.h, gl/modules/file-set: Remove.
2007-09-27 10:53:36 +02:00
Jim Meyering
22ed81c410 Move functions from copy.c into new modules, since ln needs them, too.
* bootstrap.conf (gnulib_modules): Add file-set.
* gl/lib/file-set.c (record_file, seen_file): Functions from copy.c.
* gl/lib/file-set.h: Add prototypes.
* gl/lib/hash-triple.c (triple_hash, triple_hash_no_name):
(triple_compare, triple_free): Functions from copy.c.
* gl/lib/hash-triple.h (struct F_triple): Define.  From copy.c.
Add prototypes.
* gl/modules/file-set: New module.
* gl/modules/hash-triple: New module.
* src/Makefile.am (copy_sources): New variable.
(ginstall_SOURCES, cp_SOURCES, mv_SOURCES): Use it.
* src/copy.c: Include hash-triple.h.
No longer include hash-pjw.h.
(copy_internal): Don't pass a NULL third argument to record_file,
since that function no longer accepts that.
(record_file): Move this function to file-set.c.
Along the way, remove the code to allow a NULL stat-buffer pointer.
Adjust sole caller.
(seen_file): Move this function to file-set.c.
(struct F_triple): Move declaration to hash-triple.h.
(triple_compare, triple_free, triple_hash, triple_hash_no_name):
Move these functions to hash-triple.c.

Signed-off-by: Jim Meyering <jim@meyering.net>
2007-08-23 13:59:40 +02:00
Jim Meyering
e0066f36c2 setuidgid: set all groups, not just the primary one; mgetgroups: new module
I wanted to use the xgetgroups function from id.c, so factored it out
and made it into a non-exiting function (hence the "m" prefix rather than "x").
* src/setuidgid.c (main): Use mgetgroups. Include "mgetgroups.h".

* src/id.c (xgetgroups): Remove function.
Include "mgetgroups.h".
(print_group_list): Use mgetgroups, not xgetgroups.

* gl/modules/mgetgroups: New module.
* gl/lib/mgetgroups.c: New file.  mgetgroups is derived from
id.c's xgetgroups function.
* bootstrap.conf (gnulib_modules): Add mgetgroups.
* gl/m4/mgetgroups.m4: New file.
* gl/lib/mgetgroups.h: New file.
2007-07-06 07:44:39 +02:00
Jim Meyering
87516b80a5 New program: chcon
* gl/modules/selinux-at: New module.  Check for libselinux and set
LIB_SELINUX here, unconditionally, rather than depending on
the configure-time --enable-selinux option.
* gl/modules/selinux-h: New module.
* bootstrap.conf (gnulib_modules): Add selinux-at.
* gl/lib/selinux-at.c, gl/lib/selinux-at.h: New files.
* gl/lib/se-selinux_.h: New file.
* gl/lib/se-context_.h: New file.
* gl/m4/selinux-selinux-h.m4: New file.
* gl/m4/selinux-context-h.m4: New file.
* src/Makefile.am (bin_PROGRAMS): Add chcon.
(chcon_LDADD): Define.
* README: Add chcon to the list of programs.
* src/chcon.c: Rewrite the original (Red Hat) chcon to use fts.
2007-03-29 21:37:05 +02:00
Jim Meyering
3ef2f939f7 Adapt to new version of gnulib-tool.
* gl/modules/root-dev-ino: New file.
* lib/root-dev-ino.c, lib/root-dev-ino.h: Move these files ...
* gl/lib/root-dev-ino.c, gl/lib/root-dev-ino.h: ... to here.
* m4/root-dev-ino.m4: Move this file ...
* gl/m4/root-dev-ino.m4: ... to here.
* bootstrap.conf (gnulib_modules): Add root-dev-ino.
2006-11-14 14:02:18 +01:00
Jim Meyering
328efced8b remove trailing blanks 2006-09-23 17:00:08 +00:00
Jim Meyering
7c8dece8c6 * gl/modules/getloadavg.diff: New file. Work around the way the latest
version of the getloadavg module interacts with our bootstrap script.
* bootstrap (gnulib_tool_options): Add "--local-dir gl".
2006-09-23 16:51:53 +00:00