|
View:
New views
8 Messages
—
Rating Filter:
Alert me
|
|
|
[PATCH] grep: -r no follows symlinksHere's a proposed change to GNU grep so that 'grep -r'
no longer follows symlinks, and by default does not read devices, which is more convenient in the typical use case. Symlinks and devices on the command line are still dereferenced; the change affects only symlinks and devices encountered recursively. Users who want dereferencing for all files can use -R, which retains its old meaning. I considered adding --dereference, --no-dereference, --dereference-command-line, etc, but that was a lot of extra complexity (particularly in the documentation) for little payoff -- anybody who wants something that complicated can use 'find', as the whole point of 'grep -r' is that it should be simple. As a side effect this patch changes grep to use fts, which should make it more robust with large directory hierarchies. This is why the patch subtracts more lines than it adds. This assumes the latest gnulib, so the first patch syncs to gnulib, and the second one does the real work. From b2abf4baf9bb6c6c091d0ac13a6c27fadecf5794 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@...> Date: Sat, 10 Mar 2012 12:23:03 -0800 Subject: [PATCH 1/2] build: update gnulib submodule to latest --- gnulib | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/gnulib b/gnulib index eb21377..f9d2fe2 160000 --- a/gnulib +++ b/gnulib @@ -1 +1 @@ -Subproject commit eb213779301aa663ab84ac947e8e181e9ad554d0 +Subproject commit f9d2fe251f3a104df656ab6ffc64821893ab9003 -- 1.7.6.5 From 7a97e788391b13ebf6242ff74d7a19a1944d56c1 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@...> Date: Sat, 10 Mar 2012 11:29:32 -0800 Subject: [PATCH 2/2] grep: -r no follows symlinks Change -r to follow only command-line symlinks, and by default to reads only devices named on the command line. This is a simple way to get a more-useful behavior when searching random directories; the idea is to use 'find' if you want something fancy. -R acts as before and gets a new alias --dereference-recursive. The code now uses fts internally, so it should be more robust and faster with large hierarchies. * .gitignore: Remove lib/savedir.c, lib/savedir.h. * tests/symlink: New file * Makefile.boot (LIB_OBJS_core): Remove isdir.o, savedir.o. Perhaps other changes are needed too, but I'm not sure what this makefile is for. * NEWS: Document changes. * doc/grep.texi (File and Directory Selection): Likewise. * bootstrap.conf (gnulib_modules): Remove dirent, dirname, isdir, open. Add fstatat, fts, openat-safer. * lib/Makefile.am (libgreputils_a_SOURCES): Remove savedir.c, savedir.h. * lib/savedir.c, lib/savedir.h: Remove. * po/POTFILES.in: Add lib/openat-die.c. * src/main.c: Include fcntl-safer.h, fts_.h. Don't include isdir.h, savedir.h. (struct stats, stats_base): Remove. (long_options, usage, main): Add --dereference-recursive and implement -r vs -R. (filename_prefix_len, fts_options): New static vars. (basic_fts_options, READ_COMMAND_LINE_DEVICES): New constants. (devices): Now defaults to READ_COMMAND_LINE_DEVICES. (reset, grep): Now takes just struct stat rather than file name and struct stats. All callers changed. (fillbuf): Now takes struct stat reather than struct stats. All callers changed. (grep): Don't worry about recursing too deeply; fts and grepdesc handle this now. (grepdirent, grepdesc, grep_command_line_args): New functions. (grepfile): New args DIRDESC, FOLLOW, COMMAND_LINE. Remove struct stats arg. All callers changed. Use openat_safer rather than open. Use desc == STDIN_FILENO to tell whether we're reading "-". Don't worry about EINTR when closing -- not possible, since we're not catching signals. * tests/Makefile.am (TESTS): Add symlink. * tests/symlink: New file. --- .gitignore | 2 - Makefile.boot | 2 - NEWS | 11 ++ bootstrap.conf | 7 +- doc/grep.texi | 26 +++- lib/Makefile.am | 2 +- lib/savedir.c | 163 ---------------------- lib/savedir.h | 11 -- po/POTFILES.in | 1 + src/main.c | 400 +++++++++++++++++++++++++++------------------------- tests/Makefile.am | 1 + tests/symlink | 65 +++++++++ 12 files changed, 312 insertions(+), 379 deletions(-) delete mode 100644 lib/savedir.c delete mode 100644 lib/savedir.h create mode 100755 tests/symlink diff --git a/.gitignore b/.gitignore index 35f5d10..0b195d9 100644 --- a/.gitignore +++ b/.gitignore @@ -64,5 +64,3 @@ TAGS !/lib/colorize-posix.c !/lib/colorize-w32.c !/lib/colorize.h -!/lib/savedir.c -!/lib/savedir.h diff --git a/Makefile.boot b/Makefile.boot index 043429b..4414110 100644 --- a/Makefile.boot +++ b/Makefile.boot @@ -41,10 +41,8 @@ LIB_OBJS_core = \ $(libdir)/error.$(OBJEXT) \ $(libdir)/exclude.$(OBJEXT) \ $(libdir)/hard-locale.$(OBJEXT) \ - $(libdir)/isdir.$(OBJEXT) \ $(libdir)/quotearg.$(OBJEXT) \ $(libdir)/regex.$(OBJEXT) \ - $(libdir)/savedir.$(OBJEXT) \ $(libdir)/strtoumax.$(OBJEXT) \ $(libdir)/xmalloc.$(OBJEXT) \ $(libdir)/xstrtol.$(OBJEXT) \ diff --git a/NEWS b/NEWS index d0a63d5..3a752b1 100644 --- a/NEWS +++ b/NEWS @@ -2,6 +2,17 @@ GNU grep NEWS -*- outline -*- * Noteworthy changes in release ?.? (????-??-??) [?] +** New features + + The -R option now has a long-option alias --dereference-recursive. + +** Changes in behavior + + The -r (--recursive) option now follows only command-line symlinks. + Also, by default -r now reads only devices named on the command + line; this can be overridden with --devices. -R acts as before, so + use -R if you prefer the old behavior of following all symlinks and + defaulting to reading all devices. * Noteworthy changes in release 2.11 (2012-03-02) [stable] diff --git a/bootstrap.conf b/bootstrap.conf index 45bb33d..7a71015 100644 --- a/bootstrap.conf +++ b/bootstrap.conf @@ -24,13 +24,13 @@ binary-io btowc c-ctype closeout -dirent -dirname do-release-commit-and-tag error exclude fcntl-h fnmatch +fstatat +fts getopt-gnu getpagesize gettext-h @@ -44,7 +44,6 @@ intprops inttypes isatty isblank -isdir iswctype largefile lseek @@ -58,7 +57,7 @@ memchr mempcpy minmax obstack -open +openat-safer progname propername quote diff --git a/doc/grep.texi b/doc/grep.texi index c014d8f..1840e21 100644 --- a/doc/grep.texi +++ b/doc/grep.texi @@ -606,16 +606,21 @@ if the terminal driver interprets some of it as commands. @opindex --devices @cindex device search If an input file is a device, FIFO, or socket, use @var{action} to process it. -By default, @var{action} is @samp{read}, -which means that devices are read just as if they were ordinary files. +If @var{action} is @samp{read}, +all devices are read just as if they were ordinary files. If @var{action} is @samp{skip}, devices, FIFOs, and sockets are silently skipped. +By default, devices are read if they are on the command line or if the +@option{-R} (@option{--dereference-recursive}) option is used, and are +skipped if they are encountered recursively and the @option{-r} +(@option{--recursive}) option is used. @item -d @var{action} @itemx --directories=@var{action} @opindex -d @opindex --directories @cindex directory search +@cindex symbolic links If an input file is a directory, use @var{action} to process it. By default, @var{action} is @samp{read}, which means that directories are read just as if they were ordinary files @@ -624,7 +629,8 @@ and will cause @command{grep} to print error messages for every directory or silently skip them). If @var{action} is @samp{skip}, directories are silently skipped. If @var{action} is @samp{recurse}, -@command{grep} reads all files under each directory, recursively; +@command{grep} reads all files under each directory, recursively, +following command-line symbolic links and skipping other symlinks; this is equivalent to the @option{-r} option. @item --exclude=@var{glob} @@ -663,16 +669,28 @@ Search only files whose base name matches @var{glob} (using wildcard matching as described under @option{--exclude}). @item -r -@itemx -R @itemx --recursive @opindex -r @opindex --recursive @cindex recursive search @cindex searching directory trees +@cindex symbolic links For each directory operand, read and process all files in that directory, recursively. +Follow symbolic links on the command line, but skip symlinks +that are encountered recursively. This is the same as the @samp{--directories=recurse} option. +@item -R +@itemx --dereference-recursive +@opindex -R +@opindex --dereference-recursive +@cindex recursive search +@cindex searching directory trees +@cindex symbolic links +For each directory operand, read and process all files in that +directory, recursively, following all symbolic links. + @end table @node Other Options diff --git a/lib/Makefile.am b/lib/Makefile.am index 04ae51e..527c6f5 100644 --- a/lib/Makefile.am +++ b/lib/Makefile.am @@ -32,7 +32,7 @@ INCLUDES = -I.. -I$(srcdir) AM_CFLAGS += $(GNULIB_WARN_CFLAGS) $(WERROR_CFLAGS) libgreputils_a_SOURCES += \ - colorize.c colorize.h savedir.c savedir.h + colorize.c colorize.h EXTRA_DIST += colorize-posix.c colorize-w32.c diff --git a/lib/savedir.c b/lib/savedir.c deleted file mode 100644 index 3f93c16..0000000 --- a/lib/savedir.c +++ /dev/null @@ -1,163 +0,0 @@ -/* savedir.c -- save the list of files in a directory in a string - Copyright (C) 1990, 1997-2001, 2009-2012 Free Software Foundation, Inc. - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 3, or (at your option) - any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software Foundation, - Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA. */ - -/* Written by David MacKenzie <djm@...>. */ - -#include <config.h> - -#include <sys/types.h> -#include <unistd.h> -#include <dirent.h> - -#ifdef CLOSEDIR_VOID -/* Fake a return value. */ -# define CLOSEDIR(d) (closedir (d), 0) -#else -# define CLOSEDIR(d) closedir (d) -#endif - -#include <stdlib.h> -#include <string.h> -#include <fnmatch.h> -#include "savedir.h" -#include "xalloc.h" - -static char *path; -static size_t pathlen; - -extern int isdir (const char *name); - -static int -isdir1 (const char *dir, const char *file) -{ - size_t dirlen = strlen (dir); - size_t filelen = strlen (file); - - while (dirlen && dir[dirlen - 1] == '/') - dirlen--; - - if ((dirlen + filelen + 2) > pathlen) - { - pathlen *= 2; - if ((dirlen + filelen + 2) > pathlen) - pathlen = dirlen + filelen + 2; - - path = xrealloc (path, pathlen); - } - - memcpy (path, dir, dirlen); - path[dirlen] = '/'; - strcpy (path + dirlen + 1, file); - return isdir (path); -} - -/* Return a freshly allocated string containing the filenames - in directory DIR, separated by '\0' characters; - the end is marked by two '\0' characters in a row. - NAME_SIZE is the number of bytes to initially allocate - for the string; it will be enlarged as needed. - Return NULL if DIR cannot be opened or if out of memory. */ -char * -savedir (const char *dir, off_t name_size, struct exclude *included_patterns, - struct exclude *excluded_patterns, struct exclude *excluded_directory_patterns ) -{ - DIR *dirp; - struct dirent *dp; - char *name_space; - char *namep; - - dirp = opendir (dir); - if (dirp == NULL) - return NULL; - - /* Be sure name_size is at least `1' so there's room for - the final NUL byte. */ - if (name_size <= 0) - name_size = 1; - - name_space = (char *) malloc (name_size); - if (name_space == NULL) - { - closedir (dirp); - return NULL; - } - namep = name_space; - - while ((dp = readdir (dirp)) != NULL) - { - /* Skip "." and ".." (some NFS file systems' directories lack them). */ - if (dp->d_name[0] != '.' - || (dp->d_name[1] != '\0' - && (dp->d_name[1] != '.' || dp->d_name[2] != '\0'))) - { - size_t namlen = strlen (dp->d_name); - size_t size_needed = (namep - name_space) + namlen + 2; - - if ((included_patterns || excluded_patterns) - && !isdir1 (dir, dp->d_name)) - { - if (included_patterns - && excluded_file_name (included_patterns, dp->d_name)) - continue; - if (excluded_patterns - && excluded_file_name (excluded_patterns, dp->d_name)) - continue; - } - - if ( excluded_directory_patterns - && isdir1 (dir, dp->d_name) ) - { - if (excluded_directory_patterns - && excluded_file_name (excluded_directory_patterns, dp->d_name)) - continue; - } - - if (size_needed > name_size) - { - char *new_name_space; - - while (size_needed > name_size) - name_size += 1024; - - new_name_space = realloc (name_space, name_size); - if (new_name_space == NULL) - { - closedir (dirp); - goto fail; - } - namep = new_name_space + (namep - name_space); - name_space = new_name_space; - } - strcpy (namep, dp->d_name); - namep += namlen + 1; - } - } - *namep = '\0'; - if (CLOSEDIR (dirp)) - { - fail: - free (name_space); - name_space = NULL; - } - if (path) - { - free (path); - path = NULL; - pathlen = 0; - } - return name_space; -} diff --git a/lib/savedir.h b/lib/savedir.h deleted file mode 100644 index 00cb1a9..0000000 --- a/lib/savedir.h +++ /dev/null @@ -1,11 +0,0 @@ -#if !defined SAVEDIR_H_ -# define SAVEDIR_H_ - -#include "exclude.h" - -extern char * -savedir (const char *dir, off_t name_size, - struct exclude *, struct exclude *, - struct exclude *); - -#endif diff --git a/po/POTFILES.in b/po/POTFILES.in index b33c126..65f4f57 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -20,6 +20,7 @@ lib/closeout.c lib/error.c lib/getopt.c lib/obstack.c +lib/openat-die.c lib/quotearg.c lib/regcomp.c lib/version-etc.c diff --git a/src/main.c b/src/main.c index 2f6c761..5364cd1 100644 --- a/src/main.c +++ b/src/main.c @@ -36,14 +36,14 @@ #include "error.h" #include "exclude.h" #include "exitfail.h" +#include "fcntl-safer.h" +#include "fts_.h" #include "getopt.h" #include "grep.h" #include "intprops.h" -#include "isdir.h" #include "progname.h" #include "propername.h" #include "quote.h" -#include "savedir.h" #include "version-etc.h" #include "xalloc.h" #include "xstrtol.h" @@ -63,15 +63,6 @@ avoiding a potential (racy) infinite loop. */ static struct stat out_stat; -struct stats -{ - struct stats const *parent; - struct stat stat; -}; - -/* base of chain of stat buffers, used to detect directory loops */ -static struct stats stats_base; - /* if non-zero, display usage information and exit */ static int show_help; @@ -347,7 +338,7 @@ static struct option const long_options[] = {"only-matching", no_argument, NULL, 'o'}, {"quiet", no_argument, NULL, 'q'}, {"recursive", no_argument, NULL, 'r'}, - {"recursive", no_argument, NULL, 'R'}, + {"dereference-recursive", no_argument, NULL, 'R'}, {"regexp", required_argument, NULL, 'e'}, {"invert-match", no_argument, NULL, 'v'}, {"silent", no_argument, NULL, 'q'}, @@ -369,6 +360,7 @@ unsigned char eolbyte; /* For error messages. */ /* The input file name, or (if standard input) "-" or a --label argument. */ static char const *filename; +static int filename_prefix_len; static int errseen; static int write_error_seen; @@ -392,14 +384,19 @@ ARGMATCH_VERIFY (directories_args, directories_types); static enum directories_type directories = READ_DIRECTORIES; +enum { basic_fts_options = FTS_CWDFD | FTS_NOSTAT | FTS_TIGHT_CYCLE_CHECK }; +static int fts_options = basic_fts_options | FTS_COMFOLLOW | FTS_PHYSICAL; + /* How to handle devices. */ static enum { + READ_COMMAND_LINE_DEVICES, READ_DEVICES, SKIP_DEVICES - } devices = READ_DEVICES; + } devices = READ_COMMAND_LINE_DEVICES; -static int grepdir (char const *, struct stats const *); +static int grepfile (int, char const *, int, int); +static int grepdesc (int, int); #if defined HAVE_DOS_FILE_CONTENTS static inline int undossify_input (char *, size_t); #endif @@ -473,7 +470,7 @@ static off_t after_last_match; /* Pointer after last matching line that /* Reset the buffer for a new file, returning zero if we should skip it. Initialize on the first time through. */ static int -reset (int fd, char const *file, struct stats *stats) +reset (int fd, struct stat const *st) { if (! pagesize) { @@ -488,9 +485,9 @@ reset (int fd, char const *file, struct stats *stats) bufbeg[-1] = eolbyte; bufdesc = fd; - if (S_ISREG (stats->stat.st_mode)) + if (S_ISREG (st->st_mode)) { - if (file) + if (fd != STDIN_FILENO) bufoffset = 0; else { @@ -510,7 +507,7 @@ reset (int fd, char const *file, struct stats *stats) to the beginning of the buffer contents, and 'buflim' points just after the end. Return zero if there's an error. */ static int -fillbuf (size_t save, struct stats const *stats) +fillbuf (size_t save, struct stat const *st) { size_t fillsize = 0; int cc = 1; @@ -543,9 +540,9 @@ fillbuf (size_t save, struct stats const *stats) is large. However, do not use the original file size as a heuristic if we've already read past the file end, as most likely the file is growing. */ - if (S_ISREG (stats->stat.st_mode)) + if (S_ISREG (st->st_mode)) { - off_t to_be_read = stats->stat.st_size - bufoffset; + off_t to_be_read = st->st_size - bufoffset; off_t maxsize_off = save + to_be_read; if (0 <= to_be_read && to_be_read <= maxsize_off && maxsize_off == (size_t) maxsize_off @@ -1085,7 +1082,7 @@ grepbuf (char const *beg, char const *lim) but if the file is a directory and we search it recursively, then return -2 if there was a match, and -1 otherwise. */ static intmax_t -grep (int fd, char const *file, struct stats *stats) +grep (int fd, struct stat const *st) { intmax_t nlines, i; int not_text; @@ -1095,19 +1092,9 @@ grep (int fd, char const *file, struct stats *stats) char *lim; char eol = eolbyte; - if (!reset (fd, file, stats)) + if (! reset (fd, st)) return 0; - if (file && directories == RECURSE_DIRECTORIES - && S_ISDIR (stats->stat.st_mode)) - { - /* Close fd now, so that we don't open a lot of file descriptors - when we recurse deeply. */ - if (close (fd) != 0) - suppressible_error (file, errno); - return grepdir (file, stats) - 2; - } - totalcc = 0; lastout = 0; totalnl = 0; @@ -1119,7 +1106,7 @@ grep (int fd, char const *file, struct stats *stats) residue = 0; save = 0; - if (! fillbuf (save, stats)) + if (! fillbuf (save, st)) { suppressible_error (filename, errno); return 0; @@ -1190,7 +1177,7 @@ grep (int fd, char const *file, struct stats *stats) totalcc = add_count (totalcc, buflim - bufbeg - save); if (out_line) nlscan (beg); - if (! fillbuf (save, stats)) + if (! fillbuf (save, st)) { suppressible_error (filename, errno); goto finish_grep; @@ -1214,53 +1201,183 @@ grep (int fd, char const *file, struct stats *stats) } static int -grepfile (char const *file, struct stats *stats) +grepdirent (FTS *fts, FTSENT *ent) { - int desc; - intmax_t count; - int status; + int follow, dirdesc; + int command_line = ent->fts_level == FTS_ROOTLEVEL; + struct stat *st = ent->fts_statp; - filename = (file ? file : label ? label : _("(standard input)")); + if (ent->fts_info == FTS_DP) + { + if (directories == RECURSE_DIRECTORIES && command_line) + out_file &= ~ (2 * !no_filenames); + return 1; + } - if (! file) - desc = STDIN_FILENO; - else if (devices == SKIP_DEVICES) + if ((ent->fts_info == FTS_D || ent->fts_info == FTS_DC + || ent->fts_info == FTS_DNR) + ? (directories == SKIP_DIRECTORIES + || (excluded_directory_patterns + && excluded_file_name (excluded_directory_patterns, + ent->fts_name))) + : ((included_patterns + && excluded_file_name (included_patterns, ent->fts_name)) + || (excluded_patterns + && excluded_file_name (excluded_patterns, ent->fts_name)))) { - /* Don't open yet, since that might have side effects on a device. */ - desc = -1; + fts_set (fts, ent, FTS_SKIP); + return 1; } - else + + filename = ent->fts_path + filename_prefix_len; + follow = (fts->fts_options & FTS_LOGICAL + || (fts->fts_options & FTS_COMFOLLOW && command_line)); + + switch (ent->fts_info) { - /* When skipping directories, don't worry about directories - that can't be opened. */ - desc = open (file, O_RDONLY); - if (desc < 0 && directories != SKIP_DIRECTORIES) - { - suppressible_error (file, errno); + case FTS_D: + { + int skip_this_directory = + (directories == SKIP_DIRECTORIES + || (excluded_directory_patterns + && excluded_file_name (excluded_directory_patterns, filename))); + if (skip_this_directory || directories == READ_DIRECTORIES) + fts_set (fts, ent, FTS_SKIP); + if (skip_this_directory) return 1; + if (directories == RECURSE_DIRECTORIES) + { + out_file |= 2 * !no_filenames; + return 1; + } + } + break; + + case FTS_DC: + if (!suppress_errors) + error (0, 0, _("warning: %s: %s"), filename, + _("recursive directory loop")); + return 1; + + case FTS_DNR: + case FTS_ERR: + case FTS_NS: + suppressible_error (filename, ent->fts_errno); + return 1; + + case FTS_DEFAULT: + case FTS_NSOK: + if (devices == SKIP_DEVICES + || (devices == READ_COMMAND_LINE_DEVICES && !command_line)) + { + struct stat st1; + if (! st->st_mode) + { + /* The file type is not already known. Get the file status + before opening, since opening might have side effects + on a device. */ + int flag = follow ? 0 : AT_SYMLINK_NOFOLLOW; + if (fstatat (fts->fts_cwd_fd, ent->fts_accpath, &st1, flag) != 0) + { + suppressible_error (filename, errno); + return 1; + } + st = &st1; + } + if (S_ISCHR (st->st_mode) || S_ISBLK (st->st_mode) + || S_ISSOCK (st->st_mode) || S_ISFIFO (st->st_mode)) + return 1; } + break; + + case FTS_F: + case FTS_SLNONE: + break; + + case FTS_SL: + case FTS_W: + return 1; + + default: + abort (); } - if (desc < 0 - ? stat (file, &stats->stat) != 0 - : fstat (desc, &stats->stat) != 0) + dirdesc = ((fts->fts_options & (FTS_NOCHDIR | FTS_CWDFD)) == FTS_CWDFD + ? fts->fts_cwd_fd + : AT_FDCWD); + return grepfile (dirdesc, ent->fts_accpath, follow, command_line); +} + +static int +grepfile (int dirdesc, char const *name, int follow, int command_line) +{ + int desc = openat_safer (dirdesc, name, O_RDONLY | (follow ? 0 : O_NOFOLLOW)); + if (desc < 0) { - suppressible_error (filename, errno); - if (file) - close (desc); + if (follow || errno != ELOOP) + suppressible_error (filename, errno); return 1; } + return grepdesc (desc, command_line); +} - if ((directories == SKIP_DIRECTORIES && S_ISDIR (stats->stat.st_mode)) - || (devices == SKIP_DEVICES && (S_ISCHR (stats->stat.st_mode) - || S_ISBLK (stats->stat.st_mode) - || S_ISSOCK (stats->stat.st_mode) - || S_ISFIFO (stats->stat.st_mode)))) +static int +grepdesc (int desc, int command_line) +{ + intmax_t count; + int status = 1; + struct stat st; + + /* Get the file status, possibly for the second time. This catches + a race condition if the directory entry changes after the + directory entry is read and before the file is opened. For + example, normally DESC is a directory only at the top level, but + there is an exception if some other process substitutes a + directory for a non-directory while 'grep' is running. */ + if (fstat (desc, &st) != 0) { - if (file) - close (desc); - return 1; + suppressible_error (filename, errno); + goto closeout; } + if (desc != STDIN_FILENO + && directories == RECURSE_DIRECTORIES && S_ISDIR (st.st_mode)) + { + /* Traverse the directory starting with its full name, because + unfortunately fts provides no way to traverse the directory + starting from its file descriptor. */ + + FTS *fts; + FTSENT *ent; + int opts = fts_options & ~(command_line ? 0 : FTS_COMFOLLOW); + char *fts_arg[2]; + + /* Close DESC now, to conserve file descriptors if the race + condition occurs many times in a deep recursion. */ + if (close (desc) != 0) + suppressible_error (filename, errno); + + fts_arg[0] = (char *) filename; + fts_arg[1] = NULL; + fts = fts_open (fts_arg, opts, NULL); + + if (!fts) + xalloc_die (); + while ((ent = fts_read (fts))) + status &= grepdirent (fts, ent); + if (errno) + suppressible_error (filename, errno); + if (fts_close (fts) != 0) + suppressible_error (filename, errno); + return status; + } + if ((directories == SKIP_DIRECTORIES && S_ISDIR (st.st_mode)) + || ((devices == SKIP_DEVICES + || (devices == READ_COMMAND_LINE_DEVICES && !command_line)) + && (S_ISCHR (st.st_mode) + || S_ISBLK (st.st_mode) + || S_ISSOCK (st.st_mode) + || S_ISFIFO (st.st_mode)))) + goto closeout; /* If there is a regular file on stdout and the current file refers to the same i-node, we have to report the problem and skip it. @@ -1282,24 +1399,12 @@ grepfile (char const *file, struct stats *stats) condition that could result in "alternate" output. */ if (!out_quiet && list_files == 0 && 1 < max_count && S_ISREG (out_stat.st_mode) && out_stat.st_ino - && SAME_INODE (stats->stat, out_stat)) + && SAME_INODE (st, out_stat)) { if (! suppress_errors) error (0, 0, _("input file %s is also the output"), quote (filename)); errseen = 1; - if (file) - close (desc); - return 1; - } - - if (desc < 0) - { - desc = open (file, O_RDONLY); - if (desc < 0) - { - suppressible_error (file, errno); - return 1; - } + goto closeout; } #if defined SET_BINARY @@ -1309,7 +1414,7 @@ grepfile (char const *file, struct stats *stats) SET_BINARY (desc); #endif - count = grep (desc, file, stats); + count = grep (desc, &st); if (count < 0) status = count + 2; else @@ -1334,103 +1439,35 @@ grepfile (char const *file, struct stats *stats) fputc ('\n' & filename_mask, stdout); } - if (! file) + if (desc == STDIN_FILENO) { off_t required_offset = outleft ? bufoffset : after_last_match; if (required_offset != bufoffset && lseek (desc, required_offset, SEEK_SET) < 0 - && S_ISREG (stats->stat.st_mode)) + && S_ISREG (st.st_mode)) suppressible_error (filename, errno); } - else - while (close (desc) != 0) - if (errno != EINTR) - { - suppressible_error (file, errno); - break; - } } + closeout: + if (desc != STDIN_FILENO && close (desc) != 0) + suppressible_error (filename, errno); return status; } static int -grepdir (char const *dir, struct stats const *stats) +grep_command_line_arg (char const *arg) { - char const *dir_or_dot = (dir ? dir : "."); - struct stats const *ancestor; - char *name_space; - int status = 1; - if (excluded_directory_patterns - && excluded_file_name (excluded_directory_patterns, dir)) - return 1; - - /* Mingw32 does not support st_ino. No known working hosts use zero - for st_ino, so assume that the Mingw32 bug applies if it's zero. */ - if (stats->stat.st_ino) + if (STREQ (arg, "-")) { - for (ancestor = stats; (ancestor = ancestor->parent) != 0; ) - { - if (ancestor->stat.st_ino == stats->stat.st_ino - && ancestor->stat.st_dev == stats->stat.st_dev) - { - if (!suppress_errors) - error (0, 0, _("warning: %s: %s"), dir, - _("recursive directory loop")); - errseen = 1; - return 1; - } - } - } - - name_space = savedir (dir_or_dot, stats->stat.st_size, included_patterns, - excluded_patterns, excluded_directory_patterns); - - if (! name_space) - { - if (errno) - suppressible_error (dir_or_dot, errno); - else - xalloc_die (); + filename = label ? label : _("(standard input)"); + return grepdesc (STDIN_FILENO, 1); } else { - size_t dirlen = 0; - int needs_slash = 0; - char *file_space = NULL; - char const *namep = name_space; - struct stats child; - if (dir) - { - dirlen = strlen (dir); - needs_slash = ! (dirlen == FILE_SYSTEM_PREFIX_LEN (dir) - || ISSLASH (dir[dirlen - 1])); - } - child.parent = stats; - out_file += !no_filenames; - while (*namep) - { - size_t namelen = strlen (namep); - char const *file; - if (! dir) - file = namep; - else - { - file_space = xrealloc (file_space, dirlen + 1 + namelen + 1); - strcpy (file_space, dir); - file_space[dirlen] = '/'; - strcpy (file_space + dirlen + needs_slash, namep); - file = file_space; - } - namep += namelen + 1; - status &= grepfile (file, &child); - } - out_file -= !no_filenames; - free (file_space); - free (name_space); + filename = arg; + return grepfile (AT_FDCWD, arg, 1, 1); } - - return status; } _Noreturn void usage (int); @@ -1500,7 +1537,8 @@ Output control:\n\ ACTION is `read', `recurse', or `skip'\n\ -D, --devices=ACTION how to handle devices, FIFOs and sockets;\n\ ACTION is `read' or `skip'\n\ - -R, -r, --recursive equivalent to --directories=recurse\n\ + -r, --recursive like --directories=recurse\n\ + -R, --dereference-recursive likewise, but follow all symlinks\n\ ")); printf (_("\ --include=FILE_PATTERN search only files that match FILE_PATTERN\n\ @@ -1981,6 +2019,8 @@ main (int argc, char **argv) break; case 'R': + fts_options = basic_fts_options | FTS_LOGICAL; + /* Fall through. */ case 'r': directories = RECURSE_DIRECTORIES; last_recursive = prev_optind; @@ -2177,48 +2217,24 @@ main (int argc, char **argv) if (max_count == 0) exit (EXIT_FAILURE); + if (fts_options & FTS_LOGICAL && devices == READ_COMMAND_LINE_DEVICES) + devices = READ_DEVICES; + if (optind < argc) { status = 1; do - { - char *file = argv[optind]; - if (!STREQ (file, "-") - && (included_patterns || excluded_patterns - || excluded_directory_patterns)) - { - if (isdir (file)) - { - if (excluded_directory_patterns - && excluded_file_name (excluded_directory_patterns, - file)) - continue; - } - else - { - if (included_patterns - && excluded_file_name (included_patterns, file)) - continue; - if (excluded_patterns - && excluded_file_name (excluded_patterns, file)) - continue; - } - } - status &= grepfile (STREQ (file, "-") ? (char *) NULL : file, - &stats_base); - } + status &= grep_command_line_arg (argv[optind]); while (++optind < argc); } else if (directories == RECURSE_DIRECTORIES && prepended < last_recursive) { - status = 1; - if (stat (".", &stats_base.stat) == 0) - status = grepdir (NULL, &stats_base); - else - suppressible_error (".", errno); + /* Grep through ".", omitting leading "./" from diagnostics. */ + filename_prefix_len = 2; + status = grep_command_line_arg ("."); } else - status = grepfile ((char *) NULL, &stats_base); + status = grep_command_line_arg ("-"); /* We register via atexit() to test stdout. */ exit (errseen ? EXIT_TROUBLE : status); diff --git a/tests/Makefile.am b/tests/Makefile.am index c2cd2f7..13061fe 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -82,6 +82,7 @@ TESTS = \ spencer1 \ spencer1-locale \ status \ + symlink \ turkish-I \ warn-char-classes \ word-delim-multibyte \ diff --git a/tests/symlink b/tests/symlink new file mode 100755 index 0000000..012d8f8 --- /dev/null +++ b/tests/symlink @@ -0,0 +1,65 @@ +#!/bin/sh +# Check that "grep -r" does the right thing with symbolic links. + +# Copyright (C) 2012 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +# written by Paul Eggert + +. "${srcdir=.}/init.sh"; path_prepend_ ../src + +mkdir dir || framework_failure_ +echo a > dir/a || framework_failure_ +echo b > dir/b || framework_failure_ +ln -s a dir/c || framework_failure_ +ln -s . dir/d || framework_failure_ +ln -s dangling dir/e || framework_failure_ + +touch out || framework_failure_ + +for recursion in '' -r -R +do + for files in '' '*' + do + case $recursion,$files in + -R,* | *,'*') expected_status=2 ;; + *) expected_status=0 ;; + esac + + (cd dir && grep $recursion '^' $files <a ) >grepout + test $? -eq $expected_status || fail=1 + + case $recursion,$files in + ,) + exp='a\n' ;; + ,'*' | -R,) + exp='a:a\nb:b\nc:a\n' ;; + -r,) + exp='a:a\nb:b\n' ;; + -r,'*') + exp='a:a\nb:b\nc:a\nd/a:a\nd/b:b\n' ;; + -R,'*') + exp='a:a\nb:b\nc:a\nd/a:a\nd/b:b\nd/c:a\n' ;; + *) + framework_failure_ ;; + esac + + printf "$exp" >exp || framework_failure_ + LC_ALL=C sort grepout >out || fail=1 + compare exp out || fail=1 + done +done + +Exit $fail -- 1.7.6.5 |
|
|
Re: [PATCH] grep: -r no follows symlinksPaul Eggert wrote:
> Here's a proposed change to GNU grep so that 'grep -r' > no longer follows symlinks, and by default does not > read devices, which is more convenient in the typical > use case. Symlinks and devices on the command line are > still dereferenced; the change affects only symlinks and > devices encountered recursively. Users who want dereferencing > for all files can use -R, which retains its old meaning. > > I considered adding --dereference, --no-dereference, > --dereference-command-line, etc, but that was a lot of extra complexity > (particularly in the documentation) for little payoff -- > anybody who wants something that complicated can use 'find', > as the whole point of 'grep -r' is that it should be simple. > > As a side effect this patch changes grep to use fts, > which should make it more robust with large directory > hierarchies. This is why the patch subtracts more lines > than it adds. > > This assumes the latest gnulib, so the first patch > syncs to gnulib, and the second one does the real work. Thanks for working on this. Just a couple of nits, so far. I ran out of time before I got into the meat of it, not that I expect to find anything... > Subject: [PATCH 1/2] build: update gnulib submodule to latest > > --- > gnulib | 2 +- > 1 files changed, 1 insertions(+), 1 deletions(-) > > diff --git a/gnulib b/gnulib > index eb21377..f9d2fe2 160000 > --- a/gnulib > +++ b/gnulib > @@ -1 +1 @@ > -Subproject commit eb213779301aa663ab84ac947e8e181e9ad554d0 > +Subproject commit f9d2fe251f3a104df656ab6ffc64821893ab9003 > -- > 1.7.6.5 > > >>From 7a97e788391b13ebf6242ff74d7a19a1944d56c1 Mon Sep 17 00:00:00 2001 > From: Paul Eggert <eggert@...> > Date: Sat, 10 Mar 2012 11:29:32 -0800 > Subject: [PATCH 2/2] grep: -r no follows symlinks > > Change -r to follow only command-line symlinks, and by default to > reads only devices named on the command line. This is a simple s/reads/read/ > way to get a more-useful behavior when searching random > directories; the idea is to use 'find' if you want something fancy. > -R acts as before and gets a new alias --dereference-recursive. > The code now uses fts internally, so it should be more robust and > faster with large hierarchies. > * .gitignore: Remove lib/savedir.c, lib/savedir.h. > * tests/symlink: New file > * Makefile.boot (LIB_OBJS_core): Remove isdir.o, savedir.o. > Perhaps other changes are needed too, but I'm not sure what > this makefile is for. > * NEWS: Document changes. > * doc/grep.texi (File and Directory Selection): Likewise. > * bootstrap.conf (gnulib_modules): Remove dirent, dirname, isdir, open. > Add fstatat, fts, openat-safer. > * lib/Makefile.am (libgreputils_a_SOURCES): Remove savedir.c, savedir.h. > * lib/savedir.c, lib/savedir.h: Remove. > * po/POTFILES.in: Add lib/openat-die.c. > * src/main.c: Include fcntl-safer.h, fts_.h. Don't include > isdir.h, savedir.h. > (struct stats, stats_base): Remove. > (long_options, usage, main): Add --dereference-recursive and > implement -r vs -R. > (filename_prefix_len, fts_options): New static vars. > (basic_fts_options, READ_COMMAND_LINE_DEVICES): New constants. > (devices): Now defaults to READ_COMMAND_LINE_DEVICES. > (reset, grep): Now takes just struct stat rather than file name and > struct stats. All callers changed. > (fillbuf): Now takes struct stat reather than struct stats. > All callers changed. > (grep): Don't worry about recursing too deeply; fts and grepdesc > handle this now. > (grepdirent, grepdesc, grep_command_line_args): New functions. > (grepfile): New args DIRDESC, FOLLOW, COMMAND_LINE. Remove struct stats > arg. All callers changed. Use openat_safer rather than open. > Use desc == STDIN_FILENO to tell whether we're reading "-". > Don't worry about EINTR when closing -- not possible, since we're > not catching signals. > * tests/Makefile.am (TESTS): Add symlink. > * tests/symlink: New file. > --- > .gitignore | 2 - > Makefile.boot | 2 - > NEWS | 11 ++ > bootstrap.conf | 7 +- > doc/grep.texi | 26 +++- > lib/Makefile.am | 2 +- > lib/savedir.c | 163 ---------------------- > lib/savedir.h | 11 -- > po/POTFILES.in | 1 + > src/main.c | 400 +++++++++++++++++++++++++++------------------------- > tests/Makefile.am | 1 + > tests/symlink | 65 +++++++++ > 12 files changed, 312 insertions(+), 379 deletions(-) > delete mode 100644 lib/savedir.c > delete mode 100644 lib/savedir.h > create mode 100755 tests/symlink > > diff --git a/.gitignore b/.gitignore > index 35f5d10..0b195d9 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -64,5 +64,3 @@ TAGS > !/lib/colorize-posix.c > !/lib/colorize-w32.c > !/lib/colorize.h > -!/lib/savedir.c > -!/lib/savedir.h > diff --git a/Makefile.boot b/Makefile.boot > index 043429b..4414110 100644 > --- a/Makefile.boot > +++ b/Makefile.boot > @@ -41,10 +41,8 @@ LIB_OBJS_core = \ > $(libdir)/error.$(OBJEXT) \ > $(libdir)/exclude.$(OBJEXT) \ > $(libdir)/hard-locale.$(OBJEXT) \ > - $(libdir)/isdir.$(OBJEXT) \ > $(libdir)/quotearg.$(OBJEXT) \ > $(libdir)/regex.$(OBJEXT) \ > - $(libdir)/savedir.$(OBJEXT) \ > $(libdir)/strtoumax.$(OBJEXT) \ > $(libdir)/xmalloc.$(OBJEXT) \ > $(libdir)/xstrtol.$(OBJEXT) \ > diff --git a/NEWS b/NEWS > index d0a63d5..3a752b1 100644 > --- a/NEWS > +++ b/NEWS > @@ -2,6 +2,17 @@ GNU grep NEWS -*- outline -*- > > * Noteworthy changes in release ?.? (????-??-??) [?] > > +** New features > + > + The -R option now has a long-option alias --dereference-recursive. > + > +** Changes in behavior > + > + The -r (--recursive) option now follows only command-line symlinks. > + Also, by default -r now reads only devices named on the command > + line; this can be overridden with --devices. -R acts as before, so > + use -R if you prefer the old behavior of following all symlinks and > + defaulting to reading all devices. > > * Noteworthy changes in release 2.11 (2012-03-02) [stable] > > diff --git a/bootstrap.conf b/bootstrap.conf > diff --git a/doc/grep.texi b/doc/grep.texi ... > diff --git a/lib/Makefile.am b/lib/Makefile.am ... > diff --git a/lib/savedir.c b/lib/savedir.c ... > diff --git a/lib/savedir.h b/lib/savedir.h ... > diff --git a/po/POTFILES.in b/po/POTFILES.in ... > diff --git a/src/main.c b/src/main.c ... > @@ -369,6 +360,7 @@ unsigned char eolbyte; > /* For error messages. */ > /* The input file name, or (if standard input) "-" or a --label argument. */ > static char const *filename; > +static int filename_prefix_len; I'd use an unsigned type, since it will never be negative. > static int errseen; > static int write_error_seen; > > @@ -392,14 +384,19 @@ ARGMATCH_VERIFY (directories_args, directories_types); > > static enum directories_type directories = READ_DIRECTORIES; > > +enum { basic_fts_options = FTS_CWDFD | FTS_NOSTAT | FTS_TIGHT_CYCLE_CHECK }; > +static int fts_options = basic_fts_options | FTS_COMFOLLOW | FTS_PHYSICAL; > + > /* How to handle devices. */ > static enum > { > + READ_COMMAND_LINE_DEVICES, > READ_DEVICES, > SKIP_DEVICES > - } devices = READ_DEVICES; > + } devices = READ_COMMAND_LINE_DEVICES; > > -static int grepdir (char const *, struct stats const *); > +static int grepfile (int, char const *, int, int); > +static int grepdesc (int, int); I see legacy types like the latter "int" that should be bool, but that can obviously wait. > #if defined HAVE_DOS_FILE_CONTENTS > static inline int undossify_input (char *, size_t); > #endif > @@ -473,7 +470,7 @@ static off_t after_last_match; /* Pointer after last matching line that > /* Reset the buffer for a new file, returning zero if we should skip it. > Initialize on the first time through. */ > static int > -reset (int fd, char const *file, struct stats *stats) > +reset (int fd, struct stat const *st) Nice simplification. I'll have to stop here for now. Haven't even tested yet. Thanks again. |
|
|
Re: [PATCH] grep: -r no follows symlinksOn 03/10/2012 02:19 PM, Jim Meyering wrote:
> I see legacy types like the latter "int" that should be bool, > but that can obviously wait. My sentiments exactly. > I'd use an unsigned type, since it will never be negative. I've changed it to 'size_t' in my private copy, but I can't resist telling you my reasoning. The value in question is always either 0 or 2. So, 'int' is wrong (because it allows bogus negative values); but 'unsigned' is equally wrong (because it allows values greater than INT_MAX, which is the same number of bogus values). If it's a question of disallowing bogus values, then (other things being equal) signed is better than unsigned, since some debugging C implementations diagnose signed overflow, whereas C implementations are not allowed to diagnose unsigned overflow. Furthermore, C compilers can sometimes generate better code for signed arithmetic than unsigned, precisely because overflow has undefined behavior so this gives them more freedom to optimize. Finally, signed arithmetic is closer to what naive programmers expect. It is true that using size_t allows carefully-written code to access arrays whose sizes are in the range PTRDIFF_MAX+1 .. SIZE_MAX bytes, and this is an advantage of using size_t. However, most code is not that carefully-written -- including most GNU code, I expect -- because pointer subtraction does not work with such arrays, and in applications this disadvantage often outweighs any advantage of supporting such large arrays. So, when either a signed or an unsigned value will work, it's typically better to use signed. |
|
|
Re: [PATCH] grep: -r no follows symlinksPaul Eggert wrote:
> Here's a proposed change to GNU grep so that 'grep -r' > no longer follows symlinks, and by default does not > read devices, which is more convenient in the typical > use case. Symlinks and devices on the command line are > still dereferenced; the change affects only symlinks and > devices encountered recursively. Users who want dereferencing > for all files can use -R, which retains its old meaning. > > I considered adding --dereference, --no-dereference, > --dereference-command-line, etc, but that was a lot of extra complexity > (particularly in the documentation) for little payoff -- > anybody who wants something that complicated can use 'find', > as the whole point of 'grep -r' is that it should be simple. > > As a side effect this patch changes grep to use fts, > which should make it more robust with large directory > hierarchies. This is why the patch subtracts more lines > than it adds. As you mentioned below, it is more efficient. For example, to search a dir-only hierarchy of depth 2040 on a tmpfs file system, I see ~20-30x speed-up (0.60 elapsed vs 0.02) The hierarchy looks like z/z/z/.../z with 2039 slashes. $ env time grep -r FFF z 0.03user 0.55system 0:00.59elapsed 98%CPU (0avgtext+0avgdata 5744maxresident)k 0inputs+0outputs (0major+1482minor)pagefaults 0swaps $ env time ~/w/co/grep/src/grep -r FFF z 0.00user 0.01system 0:00.02elapsed 91%CPU (0avgtext+0avgdata 1560maxresident)k 0inputs+0outputs (0major+421minor)pagefaults 0swaps [those times remained consistent across repeated trials] Making it slightly deeper (when the name reached length 4096/PATH_MAX), the result became too long for the old version, which failed with "File name too long". With 500,000 empty files on a tmpfs file system, created like this: mkdir j; cd j; seq 500000|xargs touch I see that env time grep -r FFF . is 4-5% faster with elapsed time going from (old-to-new) 1.22s to 1.15s. > Subject: [PATCH 1/2] build: update gnulib submodule to latest ... > Subject: [PATCH 2/2] grep: -r no follows symlinks s/no/no longer/ Please mention fts in the subject, too, since rewriting the dir-traversal code is pretty fundamental. Maybe this: grep: -r no longer follows symlinks; use fts > Change -r to follow only command-line symlinks, and by default to > reads only devices named on the command line. This is a simple > way to get a more-useful behavior when searching random > directories; the idea is to use 'find' if you want something fancy. > -R acts as before and gets a new alias --dereference-recursive. > The code now uses fts internally, so it should be more robust and s/should be/is/ ;-) > faster with large hierarchies. > * .gitignore: Remove lib/savedir.c, lib/savedir.h. > * tests/symlink: New file > * Makefile.boot (LIB_OBJS_core): Remove isdir.o, savedir.o. > Perhaps other changes are needed too, but I'm not sure what > this makefile is for. > * NEWS: Document changes. > * doc/grep.texi (File and Directory Selection): Likewise. > * bootstrap.conf (gnulib_modules): Remove dirent, dirname, isdir, open. > Add fstatat, fts, openat-safer. > * lib/Makefile.am (libgreputils_a_SOURCES): Remove savedir.c, savedir.h. > * lib/savedir.c, lib/savedir.h: Remove. > * po/POTFILES.in: Add lib/openat-die.c. > * src/main.c: Include fcntl-safer.h, fts_.h. Don't include > isdir.h, savedir.h. > (struct stats, stats_base): Remove. > (long_options, usage, main): Add --dereference-recursive and > implement -r vs -R. > (filename_prefix_len, fts_options): New static vars. > (basic_fts_options, READ_COMMAND_LINE_DEVICES): New constants. > (devices): Now defaults to READ_COMMAND_LINE_DEVICES. > (reset, grep): Now takes just struct stat rather than file name and > struct stats. All callers changed. > (fillbuf): Now takes struct stat reather than struct stats. > All callers changed. > (grep): Don't worry about recursing too deeply; fts and grepdesc > handle this now. > (grepdirent, grepdesc, grep_command_line_args): New functions. > (grepfile): New args DIRDESC, FOLLOW, COMMAND_LINE. Remove struct stats > arg. All callers changed. Use openat_safer rather than open. > Use desc == STDIN_FILENO to tell whether we're reading "-". > Don't worry about EINTR when closing -- not possible, since we're > not catching signals. > * tests/Makefile.am (TESTS): Add symlink. > * tests/symlink: New file. I should finish the review tonight. |
|
|
Re: [PATCH] grep: -r no follows symlinksThanks for the review. Here's a followup that takes your
suggestions into account. Also, the recent bugfix for -r --exclude-dir means the patch needs to be adjusted slightly. The new patch is a tad shorter, yay! As before, this patch assumes a sync to the latest gnulib. From e6cee987efe899ad1a88c9af0ffc62011b99a006 Mon Sep 17 00:00:00 2001 From: Paul Eggert <eggert@...> Date: Sat, 10 Mar 2012 11:29:32 -0800 Subject: [PATCH] grep: -r no longer follows symlinks; use fts Change -r to follow only command-line symlinks, and by default to read only devices named on the command line. This is a simple way to get a more-useful behavior when searching random directories; the idea is to use 'find' if you want something fancy. -R acts as before and gets a new alias --dereference-recursive. The code now uses fts internally, so it is more robust and faster with large hierarchies. * .gitignore: Remove lib/savedir.c, lib/savedir.h. * tests/symlink: New file * Makefile.boot (LIB_OBJS_core): Remove isdir.o, savedir.o. Perhaps other changes are needed too, but I'm not sure what this makefile is for. * NEWS: Document changes. * doc/grep.texi (File and Directory Selection): Likewise. * bootstrap.conf (gnulib_modules): Remove dirent, dirname, isdir, open. Add fstatat, fts, openat-safer. * lib/Makefile.am (libgreputils_a_SOURCES): Remove savedir.c, savedir.h. * lib/savedir.c, lib/savedir.h: Remove. * po/POTFILES.in: Add lib/openat-die.c. * src/main.c: Include fcntl-safer.h, fts_.h. Don't include isdir.h, savedir.h. (struct stats, stats_base): Remove. (long_options, usage, main): Add --dereference-recursive and implement -r vs -R. (filename_prefix_len, fts_options): New static vars. (basic_fts_options, READ_COMMAND_LINE_DEVICES): New constants. (devices): Now defaults to READ_COMMAND_LINE_DEVICES. (reset, grep): Now takes just struct stat rather than file name and struct stats. All callers changed. (fillbuf): Now takes struct stat reather than struct stats. All callers changed. (grep): Don't worry about recursing too deeply; fts and grepdesc handle this now. (grepdirent, grepdesc, grep_command_line_args): New functions. (grepfile): New args DIRDESC, FOLLOW, COMMAND_LINE. Remove struct stats arg. All callers changed. Use openat_safer rather than open. Use desc == STDIN_FILENO to tell whether we're reading "-". Don't worry about EINTR when closing -- not possible, since we're not catching signals. * tests/Makefile.am (TESTS): Add symlink. * tests/symlink: New file. --- .gitignore | 2 - Makefile.boot | 2 - NEWS | 16 +++ bootstrap.conf | 7 +- doc/grep.texi | 26 +++- lib/Makefile.am | 2 +- lib/savedir.c | 163 ---------------------- lib/savedir.h | 11 -- po/POTFILES.in | 1 + src/main.c | 390 +++++++++++++++++++++++++++-------------------------- tests/Makefile.am | 1 + tests/symlink | 65 +++++++++ 12 files changed, 308 insertions(+), 378 deletions(-) delete mode 100644 lib/savedir.c delete mode 100644 lib/savedir.h create mode 100755 tests/symlink diff --git a/.gitignore b/.gitignore index 35f5d10..0b195d9 100644 --- a/.gitignore +++ b/.gitignore @@ -64,5 +64,3 @@ TAGS !/lib/colorize-posix.c !/lib/colorize-w32.c !/lib/colorize.h -!/lib/savedir.c -!/lib/savedir.h diff --git a/Makefile.boot b/Makefile.boot index 043429b..4414110 100644 --- a/Makefile.boot +++ b/Makefile.boot @@ -41,10 +41,8 @@ LIB_OBJS_core = \ $(libdir)/error.$(OBJEXT) \ $(libdir)/exclude.$(OBJEXT) \ $(libdir)/hard-locale.$(OBJEXT) \ - $(libdir)/isdir.$(OBJEXT) \ $(libdir)/quotearg.$(OBJEXT) \ $(libdir)/regex.$(OBJEXT) \ - $(libdir)/savedir.$(OBJEXT) \ $(libdir)/strtoumax.$(OBJEXT) \ $(libdir)/xmalloc.$(OBJEXT) \ $(libdir)/xstrtol.$(OBJEXT) \ diff --git a/NEWS b/NEWS index d4d70f5..6e413d8 100644 --- a/NEWS +++ b/NEWS @@ -7,6 +7,22 @@ GNU grep NEWS -*- outline -*- grep no longer segfaults with -r --exclude-dir and no file operand. I.e., ":|grep -r --exclude-dir=D PAT" would segfault. + Recursive grep now uses fts for directory traversal, so it can + handle much-larger directories without reporting things like "File + name too long", and it can run much faster when dealing with large + directory hierarchies. + +** New features + + The -R option now has a long-option alias --dereference-recursive. + +** Changes in behavior + + The -r (--recursive) option now follows only command-line symlinks. + Also, by default -r now reads only devices named on the command + line; this can be overridden with --devices. -R acts as before, so + use -R if you prefer the old behavior of following all symlinks and + defaulting to reading all devices. * Noteworthy changes in release 2.11 (2012-03-02) [stable] diff --git a/bootstrap.conf b/bootstrap.conf index 45bb33d..7a71015 100644 --- a/bootstrap.conf +++ b/bootstrap.conf @@ -24,13 +24,13 @@ binary-io btowc c-ctype closeout -dirent -dirname do-release-commit-and-tag error exclude fcntl-h fnmatch +fstatat +fts getopt-gnu getpagesize gettext-h @@ -44,7 +44,6 @@ intprops inttypes isatty isblank -isdir iswctype largefile lseek @@ -58,7 +57,7 @@ memchr mempcpy minmax obstack -open +openat-safer progname propername quote diff --git a/doc/grep.texi b/doc/grep.texi index c014d8f..1840e21 100644 --- a/doc/grep.texi +++ b/doc/grep.texi @@ -606,16 +606,21 @@ if the terminal driver interprets some of it as commands. @opindex --devices @cindex device search If an input file is a device, FIFO, or socket, use @var{action} to process it. -By default, @var{action} is @samp{read}, -which means that devices are read just as if they were ordinary files. +If @var{action} is @samp{read}, +all devices are read just as if they were ordinary files. If @var{action} is @samp{skip}, devices, FIFOs, and sockets are silently skipped. +By default, devices are read if they are on the command line or if the +@option{-R} (@option{--dereference-recursive}) option is used, and are +skipped if they are encountered recursively and the @option{-r} +(@option{--recursive}) option is used. @item -d @var{action} @itemx --directories=@var{action} @opindex -d @opindex --directories @cindex directory search +@cindex symbolic links If an input file is a directory, use @var{action} to process it. By default, @var{action} is @samp{read}, which means that directories are read just as if they were ordinary files @@ -624,7 +629,8 @@ and will cause @command{grep} to print error messages for every directory or silently skip them). If @var{action} is @samp{skip}, directories are silently skipped. If @var{action} is @samp{recurse}, -@command{grep} reads all files under each directory, recursively; +@command{grep} reads all files under each directory, recursively, +following command-line symbolic links and skipping other symlinks; this is equivalent to the @option{-r} option. @item --exclude=@var{glob} @@ -663,16 +669,28 @@ Search only files whose base name matches @var{glob} (using wildcard matching as described under @option{--exclude}). @item -r -@itemx -R @itemx --recursive @opindex -r @opindex --recursive @cindex recursive search @cindex searching directory trees +@cindex symbolic links For each directory operand, read and process all files in that directory, recursively. +Follow symbolic links on the command line, but skip symlinks +that are encountered recursively. This is the same as the @samp{--directories=recurse} option. +@item -R +@itemx --dereference-recursive +@opindex -R +@opindex --dereference-recursive +@cindex recursive search +@cindex searching directory trees +@cindex symbolic links +For each directory operand, read and process all files in that +directory, recursively, following all symbolic links. + @end table @node Other Options diff --git a/lib/Makefile.am b/lib/Makefile.am index 04ae51e..527c6f5 100644 --- a/lib/Makefile.am +++ b/lib/Makefile.am @@ -32,7 +32,7 @@ INCLUDES = -I.. -I$(srcdir) AM_CFLAGS += $(GNULIB_WARN_CFLAGS) $(WERROR_CFLAGS) libgreputils_a_SOURCES += \ - colorize.c colorize.h savedir.c savedir.h + colorize.c colorize.h EXTRA_DIST += colorize-posix.c colorize-w32.c diff --git a/lib/savedir.c b/lib/savedir.c deleted file mode 100644 index 3f93c16..0000000 --- a/lib/savedir.c +++ /dev/null @@ -1,163 +0,0 @@ -/* savedir.c -- save the list of files in a directory in a string - Copyright (C) 1990, 1997-2001, 2009-2012 Free Software Foundation, Inc. - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 3, or (at your option) - any later version. - - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software Foundation, - Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA. */ - -/* Written by David MacKenzie <djm@...>. */ - -#include <config.h> - -#include <sys/types.h> -#include <unistd.h> -#include <dirent.h> - -#ifdef CLOSEDIR_VOID -/* Fake a return value. */ -# define CLOSEDIR(d) (closedir (d), 0) -#else -# define CLOSEDIR(d) closedir (d) -#endif - -#include <stdlib.h> -#include <string.h> -#include <fnmatch.h> -#include "savedir.h" -#include "xalloc.h" - -static char *path; -static size_t pathlen; - -extern int isdir (const char *name); - -static int -isdir1 (const char *dir, const char *file) -{ - size_t dirlen = strlen (dir); - size_t filelen = strlen (file); - - while (dirlen && dir[dirlen - 1] == '/') - dirlen--; - - if ((dirlen + filelen + 2) > pathlen) - { - pathlen *= 2; - if ((dirlen + filelen + 2) > pathlen) - pathlen = dirlen + filelen + 2; - - path = xrealloc (path, pathlen); - } - - memcpy (path, dir, dirlen); - path[dirlen] = '/'; - strcpy (path + dirlen + 1, file); - return isdir (path); -} - -/* Return a freshly allocated string containing the filenames - in directory DIR, separated by '\0' characters; - the end is marked by two '\0' characters in a row. - NAME_SIZE is the number of bytes to initially allocate - for the string; it will be enlarged as needed. - Return NULL if DIR cannot be opened or if out of memory. */ -char * -savedir (const char *dir, off_t name_size, struct exclude *included_patterns, - struct exclude *excluded_patterns, struct exclude *excluded_directory_patterns ) -{ - DIR *dirp; - struct dirent *dp; - char *name_space; - char *namep; - - dirp = opendir (dir); - if (dirp == NULL) - return NULL; - - /* Be sure name_size is at least `1' so there's room for - the final NUL byte. */ - if (name_size <= 0) - name_size = 1; - - name_space = (char *) malloc (name_size); - if (name_space == NULL) - { - closedir (dirp); - return NULL; - } - namep = name_space; - - while ((dp = readdir (dirp)) != NULL) - { - /* Skip "." and ".." (some NFS file systems' directories lack them). */ - if (dp->d_name[0] != '.' - || (dp->d_name[1] != '\0' - && (dp->d_name[1] != '.' || dp->d_name[2] != '\0'))) - { - size_t namlen = strlen (dp->d_name); - size_t size_needed = (namep - name_space) + namlen + 2; - - if ((included_patterns || excluded_patterns) - && !isdir1 (dir, dp->d_name)) - { - if (included_patterns - && excluded_file_name (included_patterns, dp->d_name)) - continue; - if (excluded_patterns - && excluded_file_name (excluded_patterns, dp->d_name)) - continue; - } - - if ( excluded_directory_patterns - && isdir1 (dir, dp->d_name) ) - { - if (excluded_directory_patterns - && excluded_file_name (excluded_directory_patterns, dp->d_name)) - continue; - } - - if (size_needed > name_size) - { - char *new_name_space; - - while (size_needed > name_size) - name_size += 1024; - - new_name_space = realloc (name_space, name_size); - if (new_name_space == NULL) - { - closedir (dirp); - goto fail; - } - namep = new_name_space + (namep - name_space); - name_space = new_name_space; - } - strcpy (namep, dp->d_name); - namep += namlen + 1; - } - } - *namep = '\0'; - if (CLOSEDIR (dirp)) - { - fail: - free (name_space); - name_space = NULL; - } - if (path) - { - free (path); - path = NULL; - pathlen = 0; - } - return name_space; -} diff --git a/lib/savedir.h b/lib/savedir.h deleted file mode 100644 index 00cb1a9..0000000 --- a/lib/savedir.h +++ /dev/null @@ -1,11 +0,0 @@ -#if !defined SAVEDIR_H_ -# define SAVEDIR_H_ - -#include "exclude.h" - -extern char * -savedir (const char *dir, off_t name_size, - struct exclude *, struct exclude *, - struct exclude *); - -#endif diff --git a/po/POTFILES.in b/po/POTFILES.in index b33c126..65f4f57 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -20,6 +20,7 @@ lib/closeout.c lib/error.c lib/getopt.c lib/obstack.c +lib/openat-die.c lib/quotearg.c lib/regcomp.c lib/version-etc.c diff --git a/src/main.c b/src/main.c index f4f1235..5bd150d 100644 --- a/src/main.c +++ b/src/main.c @@ -36,14 +36,14 @@ #include "error.h" #include "exclude.h" #include "exitfail.h" +#include "fcntl-safer.h" +#include "fts_.h" #include "getopt.h" #include "grep.h" #include "intprops.h" -#include "isdir.h" #include "progname.h" #include "propername.h" #include "quote.h" -#include "savedir.h" #include "version-etc.h" #include "xalloc.h" #include "xstrtol.h" @@ -63,15 +63,6 @@ avoiding a potential (racy) infinite loop. */ static struct stat out_stat; -struct stats -{ - struct stats const *parent; - struct stat stat; -}; - -/* base of chain of stat buffers, used to detect directory loops */ -static struct stats stats_base; - /* if non-zero, display usage information and exit */ static int show_help; @@ -347,7 +338,7 @@ static struct option const long_options[] = {"only-matching", no_argument, NULL, 'o'}, {"quiet", no_argument, NULL, 'q'}, {"recursive", no_argument, NULL, 'r'}, - {"recursive", no_argument, NULL, 'R'}, + {"dereference-recursive", no_argument, NULL, 'R'}, {"regexp", required_argument, NULL, 'e'}, {"invert-match", no_argument, NULL, 'v'}, {"silent", no_argument, NULL, 'q'}, @@ -369,6 +360,7 @@ unsigned char eolbyte; /* For error messages. */ /* The input file name, or (if standard input) "-" or a --label argument. */ static char const *filename; +static size_t filename_prefix_len; static int errseen; static int write_error_seen; @@ -392,14 +384,19 @@ ARGMATCH_VERIFY (directories_args, directories_types); static enum directories_type directories = READ_DIRECTORIES; +enum { basic_fts_options = FTS_CWDFD | FTS_NOSTAT | FTS_TIGHT_CYCLE_CHECK }; +static int fts_options = basic_fts_options | FTS_COMFOLLOW | FTS_PHYSICAL; + /* How to handle devices. */ static enum { + READ_COMMAND_LINE_DEVICES, READ_DEVICES, SKIP_DEVICES - } devices = READ_DEVICES; + } devices = READ_COMMAND_LINE_DEVICES; -static int grepdir (char const *, struct stats const *); +static int grepfile (int, char const *, int, int); +static int grepdesc (int, int); #if defined HAVE_DOS_FILE_CONTENTS static inline int undossify_input (char *, size_t); #endif @@ -473,7 +470,7 @@ static off_t after_last_match; /* Pointer after last matching line that /* Reset the buffer for a new file, returning zero if we should skip it. Initialize on the first time through. */ static int -reset (int fd, char const *file, struct stats *stats) +reset (int fd, struct stat const *st) { if (! pagesize) { @@ -488,9 +485,9 @@ reset (int fd, char const *file, struct stats *stats) bufbeg[-1] = eolbyte; bufdesc = fd; - if (S_ISREG (stats->stat.st_mode)) + if (S_ISREG (st->st_mode)) { - if (file) + if (fd != STDIN_FILENO) bufoffset = 0; else { @@ -510,7 +507,7 @@ reset (int fd, char const *file, struct stats *stats) to the beginning of the buffer contents, and 'buflim' points just after the end. Return zero if there's an error. */ static int -fillbuf (size_t save, struct stats const *stats) +fillbuf (size_t save, struct stat const *st) { size_t fillsize = 0; int cc = 1; @@ -543,9 +540,9 @@ fillbuf (size_t save, struct stats const *stats) is large. However, do not use the original file size as a heuristic if we've already read past the file end, as most likely the file is growing. */ - if (S_ISREG (stats->stat.st_mode)) + if (S_ISREG (st->st_mode)) { - off_t to_be_read = stats->stat.st_size - bufoffset; + off_t to_be_read = st->st_size - bufoffset; off_t maxsize_off = save + to_be_read; if (0 <= to_be_read && to_be_read <= maxsize_off && maxsize_off == (size_t) maxsize_off @@ -1085,7 +1082,7 @@ grepbuf (char const *beg, char const *lim) but if the file is a directory and we search it recursively, then return -2 if there was a match, and -1 otherwise. */ static intmax_t -grep (int fd, char const *file, struct stats *stats) +grep (int fd, struct stat const *st) { intmax_t nlines, i; int not_text; @@ -1095,19 +1092,9 @@ grep (int fd, char const *file, struct stats *stats) char *lim; char eol = eolbyte; - if (!reset (fd, file, stats)) + if (! reset (fd, st)) return 0; - if (file && directories == RECURSE_DIRECTORIES - && S_ISDIR (stats->stat.st_mode)) - { - /* Close fd now, so that we don't open a lot of file descriptors - when we recurse deeply. */ - if (close (fd) != 0) - suppressible_error (file, errno); - return grepdir (file, stats) - 2; - } - totalcc = 0; lastout = 0; totalnl = 0; @@ -1119,7 +1106,7 @@ grep (int fd, char const *file, struct stats *stats) residue = 0; save = 0; - if (! fillbuf (save, stats)) + if (! fillbuf (save, st)) { suppressible_error (filename, errno); return 0; @@ -1190,7 +1177,7 @@ grep (int fd, char const *file, struct stats *stats) totalcc = add_count (totalcc, buflim - bufbeg - save); if (out_line) nlscan (beg); - if (! fillbuf (save, stats)) + if (! fillbuf (save, st)) { suppressible_error (filename, errno); goto finish_grep; @@ -1214,53 +1201,175 @@ grep (int fd, char const *file, struct stats *stats) } static int -grepfile (char const *file, struct stats *stats) +grepdirent (FTS *fts, FTSENT *ent) { - int desc; - intmax_t count; - int status; + int follow, dirdesc; + int command_line = ent->fts_level == FTS_ROOTLEVEL; + struct stat *st = ent->fts_statp; - filename = (file ? file : label ? label : _("(standard input)")); + if (ent->fts_info == FTS_DP) + { + if (directories == RECURSE_DIRECTORIES && command_line) + out_file &= ~ (2 * !no_filenames); + return 1; + } - if (! file) - desc = STDIN_FILENO; - else if (devices == SKIP_DEVICES) + if ((ent->fts_info == FTS_D || ent->fts_info == FTS_DC + || ent->fts_info == FTS_DNR) + ? (directories == SKIP_DIRECTORIES + || (! (command_line && filename_prefix_len != 0) + && excluded_directory_patterns + && excluded_file_name (excluded_directory_patterns, + ent->fts_name))) + : ((included_patterns + && excluded_file_name (included_patterns, ent->fts_name)) + || (excluded_patterns + && excluded_file_name (excluded_patterns, ent->fts_name)))) { - /* Don't open yet, since that might have side effects on a device. */ - desc = -1; + fts_set (fts, ent, FTS_SKIP); + return 1; } - else + + filename = ent->fts_path + filename_prefix_len; + follow = (fts->fts_options & FTS_LOGICAL + || (fts->fts_options & FTS_COMFOLLOW && command_line)); + + switch (ent->fts_info) { - /* When skipping directories, don't worry about directories - that can't be opened. */ - desc = open (file, O_RDONLY); - if (desc < 0 && directories != SKIP_DIRECTORIES) + case FTS_D: + if (directories == RECURSE_DIRECTORIES) { - suppressible_error (file, errno); + out_file |= 2 * !no_filenames; return 1; } + fts_set (fts, ent, FTS_SKIP); + break; + + case FTS_DC: + if (!suppress_errors) + error (0, 0, _("warning: %s: %s"), filename, + _("recursive directory loop")); + return 1; + + case FTS_DNR: + case FTS_ERR: + case FTS_NS: + suppressible_error (filename, ent->fts_errno); + return 1; + + case FTS_DEFAULT: + case FTS_NSOK: + if (devices == SKIP_DEVICES + || (devices == READ_COMMAND_LINE_DEVICES && !command_line)) + { + struct stat st1; + if (! st->st_mode) + { + /* The file type is not already known. Get the file status + before opening, since opening might have side effects + on a device. */ + int flag = follow ? 0 : AT_SYMLINK_NOFOLLOW; + if (fstatat (fts->fts_cwd_fd, ent->fts_accpath, &st1, flag) != 0) + { + suppressible_error (filename, errno); + return 1; + } + st = &st1; + } + if (S_ISCHR (st->st_mode) || S_ISBLK (st->st_mode) + || S_ISSOCK (st->st_mode) || S_ISFIFO (st->st_mode)) + return 1; + } + break; + + case FTS_F: + case FTS_SLNONE: + break; + + case FTS_SL: + case FTS_W: + return 1; + + default: + abort (); } - if (desc < 0 - ? stat (file, &stats->stat) != 0 - : fstat (desc, &stats->stat) != 0) + dirdesc = ((fts->fts_options & (FTS_NOCHDIR | FTS_CWDFD)) == FTS_CWDFD + ? fts->fts_cwd_fd + : AT_FDCWD); + return grepfile (dirdesc, ent->fts_accpath, follow, command_line); +} + +static int +grepfile (int dirdesc, char const *name, int follow, int command_line) +{ + int desc = openat_safer (dirdesc, name, O_RDONLY | (follow ? 0 : O_NOFOLLOW)); + if (desc < 0) { - suppressible_error (filename, errno); - if (file) - close (desc); + if (follow || errno != ELOOP) + suppressible_error (filename, errno); return 1; } + return grepdesc (desc, command_line); +} - if ((directories == SKIP_DIRECTORIES && S_ISDIR (stats->stat.st_mode)) - || (devices == SKIP_DEVICES && (S_ISCHR (stats->stat.st_mode) - || S_ISBLK (stats->stat.st_mode) - || S_ISSOCK (stats->stat.st_mode) - || S_ISFIFO (stats->stat.st_mode)))) +static int +grepdesc (int desc, int command_line) +{ + intmax_t count; + int status = 1; + struct stat st; + + /* Get the file status, possibly for the second time. This catches + a race condition if the directory entry changes after the + directory entry is read and before the file is opened. For + example, normally DESC is a directory only at the top level, but + there is an exception if some other process substitutes a + directory for a non-directory while 'grep' is running. */ + if (fstat (desc, &st) != 0) { - if (file) - close (desc); - return 1; + suppressible_error (filename, errno); + goto closeout; + } + if (desc != STDIN_FILENO + && directories == RECURSE_DIRECTORIES && S_ISDIR (st.st_mode)) + { + /* Traverse the directory starting with its full name, because + unfortunately fts provides no way to traverse the directory + starting from its file descriptor. */ + + FTS *fts; + FTSENT *ent; + int opts = fts_options & ~(command_line ? 0 : FTS_COMFOLLOW); + char *fts_arg[2]; + + /* Close DESC now, to conserve file descriptors if the race + condition occurs many times in a deep recursion. */ + if (close (desc) != 0) + suppressible_error (filename, errno); + + fts_arg[0] = (char *) filename; + fts_arg[1] = NULL; + fts = fts_open (fts_arg, opts, NULL); + + if (!fts) + xalloc_die (); + while ((ent = fts_read (fts))) + status &= grepdirent (fts, ent); + if (errno) + suppressible_error (filename, errno); + if (fts_close (fts) != 0) + suppressible_error (filename, errno); + return status; } + if ((directories == SKIP_DIRECTORIES && S_ISDIR (st.st_mode)) + || ((devices == SKIP_DEVICES + || (devices == READ_COMMAND_LINE_DEVICES && !command_line)) + && (S_ISCHR (st.st_mode) + || S_ISBLK (st.st_mode) + || S_ISSOCK (st.st_mode) + || S_ISFIFO (st.st_mode)))) + goto closeout; /* If there is a regular file on stdout and the current file refers to the same i-node, we have to report the problem and skip it. @@ -1282,24 +1391,12 @@ grepfile (char const *file, struct stats *stats) condition that could result in "alternate" output. */ if (!out_quiet && list_files == 0 && 1 < max_count && S_ISREG (out_stat.st_mode) && out_stat.st_ino - && SAME_INODE (stats->stat, out_stat)) + && SAME_INODE (st, out_stat)) { if (! suppress_errors) error (0, 0, _("input file %s is also the output"), quote (filename)); errseen = 1; - if (file) - close (desc); - return 1; - } - - if (desc < 0) - { - desc = open (file, O_RDONLY); - if (desc < 0) - { - suppressible_error (file, errno); - return 1; - } + goto closeout; } #if defined SET_BINARY @@ -1309,7 +1406,7 @@ grepfile (char const *file, struct stats *stats) SET_BINARY (desc); #endif - count = grep (desc, file, stats); + count = grep (desc, &st); if (count < 0) status = count + 2; else @@ -1334,103 +1431,35 @@ grepfile (char const *file, struct stats *stats) fputc ('\n' & filename_mask, stdout); } - if (! file) + if (desc == STDIN_FILENO) { off_t required_offset = outleft ? bufoffset : after_last_match; if (required_offset != bufoffset && lseek (desc, required_offset, SEEK_SET) < 0 - && S_ISREG (stats->stat.st_mode)) + && S_ISREG (st.st_mode)) suppressible_error (filename, errno); } - else - while (close (desc) != 0) - if (errno != EINTR) - { - suppressible_error (file, errno); - break; - } } + closeout: + if (desc != STDIN_FILENO && close (desc) != 0) + suppressible_error (filename, errno); return status; } static int -grepdir (char const *dir, struct stats const *stats) +grep_command_line_arg (char const *arg) { - char const *dir_or_dot = (dir ? dir : "."); - struct stats const *ancestor; - char *name_space; - int status = 1; - if (dir && excluded_directory_patterns - && excluded_file_name (excluded_directory_patterns, dir)) - return 1; - - /* Mingw32 does not support st_ino. No known working hosts use zero - for st_ino, so assume that the Mingw32 bug applies if it's zero. */ - if (stats->stat.st_ino) + if (STREQ (arg, "-")) { - for (ancestor = stats; (ancestor = ancestor->parent) != 0; ) - { - if (ancestor->stat.st_ino == stats->stat.st_ino - && ancestor->stat.st_dev == stats->stat.st_dev) - { - if (!suppress_errors) - error (0, 0, _("warning: %s: %s"), dir, - _("recursive directory loop")); - errseen = 1; - return 1; - } - } - } - - name_space = savedir (dir_or_dot, stats->stat.st_size, included_patterns, - excluded_patterns, excluded_directory_patterns); - - if (! name_space) - { - if (errno) - suppressible_error (dir_or_dot, errno); - else - xalloc_die (); + filename = label ? label : _("(standard input)"); + return grepdesc (STDIN_FILENO, 1); } else { - size_t dirlen = 0; - int needs_slash = 0; - char *file_space = NULL; - char const *namep = name_space; - struct stats child; - if (dir) - { - dirlen = strlen (dir); - needs_slash = ! (dirlen == FILE_SYSTEM_PREFIX_LEN (dir) - || ISSLASH (dir[dirlen - 1])); - } - child.parent = stats; - out_file += !no_filenames; - while (*namep) - { - size_t namelen = strlen (namep); - char const *file; - if (! dir) - file = namep; - else - { - file_space = xrealloc (file_space, dirlen + 1 + namelen + 1); - strcpy (file_space, dir); - file_space[dirlen] = '/'; - strcpy (file_space + dirlen + needs_slash, namep); - file = file_space; - } - namep += namelen + 1; - status &= grepfile (file, &child); - } - out_file -= !no_filenames; - free (file_space); - free (name_space); + filename = arg; + return grepfile (AT_FDCWD, arg, 1, 1); } - - return status; } _Noreturn void usage (int); @@ -1500,7 +1529,8 @@ Output control:\n\ ACTION is `read', `recurse', or `skip'\n\ -D, --devices=ACTION how to handle devices, FIFOs and sockets;\n\ ACTION is `read' or `skip'\n\ - -R, -r, --recursive equivalent to --directories=recurse\n\ + -r, --recursive like --directories=recurse\n\ + -R, --dereference-recursive likewise, but follow all symlinks\n\ ")); printf (_("\ --include=FILE_PATTERN search only files that match FILE_PATTERN\n\ @@ -1981,6 +2011,8 @@ main (int argc, char **argv) break; case 'R': + fts_options = basic_fts_options | FTS_LOGICAL; + /* Fall through. */ case 'r': directories = RECURSE_DIRECTORIES; last_recursive = prev_optind; @@ -2177,48 +2209,24 @@ main (int argc, char **argv) if (max_count == 0) exit (EXIT_FAILURE); + if (fts_options & FTS_LOGICAL && devices == READ_COMMAND_LINE_DEVICES) + devices = READ_DEVICES; + if (optind < argc) { status = 1; do - { - char *file = argv[optind]; - if (!STREQ (file, "-") - && (included_patterns || excluded_patterns - || excluded_directory_patterns)) - { - if (isdir (file)) - { - if (excluded_directory_patterns - && excluded_file_name (excluded_directory_patterns, - file)) - continue; - } - else - { - if (included_patterns - && excluded_file_name (included_patterns, file)) - continue; - if (excluded_patterns - && excluded_file_name (excluded_patterns, file)) - continue; - } - } - status &= grepfile (STREQ (file, "-") ? (char *) NULL : file, - &stats_base); - } + status &= grep_command_line_arg (argv[optind]); while (++optind < argc); } else if (directories == RECURSE_DIRECTORIES && prepended < last_recursive) { - status = 1; - if (stat (".", &stats_base.stat) == 0) - status = grepdir (NULL, &stats_base); - else - suppressible_error (".", errno); + /* Grep through ".", omitting leading "./" from diagnostics. */ + filename_prefix_len = 2; + status = grep_command_line_arg ("."); } else - status = grepfile ((char *) NULL, &stats_base); + status = grep_command_line_arg ("-"); /* We register via atexit() to test stdout. */ exit (errseen ? EXIT_TROUBLE : status); diff --git a/tests/Makefile.am b/tests/Makefile.am index c2cd2f7..13061fe 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -82,6 +82,7 @@ TESTS = \ spencer1 \ spencer1-locale \ status \ + symlink \ turkish-I \ warn-char-classes \ word-delim-multibyte \ diff --git a/tests/symlink b/tests/symlink new file mode 100755 index 0000000..012d8f8 --- /dev/null +++ b/tests/symlink @@ -0,0 +1,65 @@ +#!/bin/sh +# Check that "grep -r" does the right thing with symbolic links. + +# Copyright (C) 2012 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +# written by Paul Eggert + +. "${srcdir=.}/init.sh"; path_prepend_ ../src + +mkdir dir || framework_failure_ +echo a > dir/a || framework_failure_ +echo b > dir/b || framework_failure_ +ln -s a dir/c || framework_failure_ +ln -s . dir/d || framework_failure_ +ln -s dangling dir/e || framework_failure_ + +touch out || framework_failure_ + +for recursion in '' -r -R +do + for files in '' '*' + do + case $recursion,$files in + -R,* | *,'*') expected_status=2 ;; + *) expected_status=0 ;; + esac + + (cd dir && grep $recursion '^' $files <a ) >grepout + test $? -eq $expected_status || fail=1 + + case $recursion,$files in + ,) + exp='a\n' ;; + ,'*' | -R,) + exp='a:a\nb:b\nc:a\n' ;; + -r,) + exp='a:a\nb:b\n' ;; + -r,'*') + exp='a:a\nb:b\nc:a\nd/a:a\nd/b:b\n' ;; + -R,'*') + exp='a:a\nb:b\nc:a\nd/a:a\nd/b:b\nd/c:a\n' ;; + *) + framework_failure_ ;; + esac + + printf "$exp" >exp || framework_failure_ + LC_ALL=C sort grepout >out || fail=1 + compare exp out || fail=1 + done +done + +Exit $fail -- 1.7.6.5 |
|
|
Re: [PATCH] grep: -r no follows symlinksPaul Eggert wrote:
> Here's a proposed change to GNU grep so that 'grep -r' > no longer follows symlinks, and by default does not > read devices, which is more convenient in the typical > use case. Symlinks and devices on the command line are > still dereferenced; the change affects only symlinks and > devices encountered recursively. Users who want dereferencing > for all files can use -R, which retains its old meaning. ... > diff --git a/NEWS b/NEWS ... > +** Changes in behavior > + > + The -r (--recursive) option now follows only command-line symlinks. > + Also, by default -r now reads only devices named on the command That could be misconstrued ;-) How about this instead: Also, by default -r now reads a device only if it is named on the command Something came up, so this is all for now. |
|
|
Re: [PATCH] grep: -r no follows symlinksPaul Eggert wrote:
> Thanks for the review. Here's a followup that takes your > suggestions into account. All of the rest looks fine, too. Only one suggestion: > Subject: [PATCH] grep: -r no longer follows symlinks; use fts ... > + if (S_ISCHR (st->st_mode) || S_ISBLK (st->st_mode) > + || S_ISSOCK (st->st_mode) || S_ISFIFO (st->st_mode)) > + return 1; ... > + if ((directories == SKIP_DIRECTORIES && S_ISDIR (st.st_mode)) > + || ((devices == SKIP_DEVICES > + || (devices == READ_COMMAND_LINE_DEVICES && !command_line)) > + && (S_ISCHR (st.st_mode) > + || S_ISBLK (st.st_mode) > + || S_ISSOCK (st.st_mode) > + || S_ISFIFO (st.st_mode)))) > + goto closeout; With the same 4-term disjunct in two places, we might as well give it a name and factor it out. Thanks for doing all that work. Do you know anyone who might be prepared to convert copy.c or ls.c in a similar manner? Those are the only remaining dir-traversing coreutils programs that fail due to names longer than PATH_MAX. |
|
|
Re: [PATCH] grep: -r no follows symlinksOn 03/14/2012 12:39 PM, Jim Meyering wrote:
> With the same 4-term disjunct in two places, > we might as well give it a name and factor it out. OK, thanks, I made that change and pushed it. > Do you know anyone who might be prepared to convert copy.c or ls.c > in a similar manner? I can ask around.... Maybe I can talk a student or two into doing it. |
| Free embeddable forum powered by Nabble | Forum Help |