SDIR(1) SDIR(1)

NAME
sdir - scan the directory hierarchy and execute any command

SYNOPSIS
sdir [OPTIONS]... any_command any_options_passed_to_this_command

DESCRIPTION
The command sdir scans directories starting from the current directory and passes through the whole directory tree rooted at it, visiting all its subdirectories, then their subdirectories, and so on. On request it prints the names of the visited directories, and on request it executes a given command in each of them. Sdir changes the current directory to each visited subdirectory in turn and runs the target command there, once per directory. It passes no arguments to the target command other than any_options_passed_to_this_command.

The program supports several ways of calling the executed "target" command, so that the output can be adapted to the user's needs.

Each option of sdir is a single character preceded by "-". As is usual for programs like ls or tar, several options can be combined in one word (for example, -tpw). Sdir treats the first argument that does not begin with "-" as the target command; all further arguments are passed to the target command.

To form the target command and its arguments correctly, the user has to take into account how UNIX shells expand file names, among other basic features. To get correct name expansion, the user will often need to start the target command with a call of a shell and to enclose it in double quotes (""), both to gather several words into one argument and to avoid premature name expansion. When an error is likely, the program prints a warning.

The most typical target command for sdir is grep; ls and du may also be convenient in some cases. These commands, used alone or in combination with sdir or find, perform slightly different actions and give alternative forms of output.
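The scanning loop described above can be illustrated with a minimal C sketch. It is not the author's implementation, only an illustration under stated assumptions: the function name scan_tree is hypothetical, the target command is run through system(3) (as with the option -s), symbolic links are never followed (as without -l, so no visited-directory bookkeeping is needed), and the header uses the "-->" format shown in the EXAMPLE section. A descriptor of the parent directory is kept so that the walker can return with fchdir(2) after each subdirectory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch of the sdir scanning loop (not the author's code).
 * Enters the directory `name` (relative to the current directory),
 * prints "-->display:", runs `cmd` through system(3) unless cmd is NULL,
 * then recurses into every real subdirectory. Symbolic links are not
 * followed (lstat is used), as in sdir without -l.
 * Returns the number of directories visited. */
static int scan_tree(const char *name, const char *display, const char *cmd)
{
    assert(name != NULL && display != NULL);

    int back = open(".", O_RDONLY);       /* handle for returning to the parent */
    if (back < 0)
        return 0;
    if (chdir(name) != 0) {               /* inaccessible directory: skip it */
        close(back);
        return 0;
    }

    printf("-->%s:\n", display);
    fflush(stdout);                       /* header must precede the child's output */
    if (cmd != NULL && system(cmd) == -1) /* /bin/sh expands "*" in THIS directory */
        perror("system");

    int visited = 1;
    DIR *d = opendir(".");
    if (d != NULL) {
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            struct stat st;
            if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
                continue;
            if (lstat(e->d_name, &st) != 0 || !S_ISDIR(st.st_mode))
                continue;                 /* plain files and symlinks are not entered */
            char sub[PATH_MAX];
            snprintf(sub, sizeof sub, "%s/%s", display, e->d_name);
            visited += scan_tree(e->d_name, sub, cmd);
        }
        closedir(d);
    }

    if (fchdir(back) != 0)                /* restore the caller's directory */
        perror("fchdir");
    close(back);
    return visited;
}
```

For instance, scan_tree(".", ".", "grep -H DBL_MAX *.h") called from /usr/include would print a header for every directory, roughly like sdir -s without -p. Because system(3) hands the string to /bin/sh inside each directory, the pattern *.h is expanded separately there, which is exactly the per-directory name expansion discussed above.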
If you need totals, such as the frequently needed number of lines of source code in a directory tree, you can also use sdirstat(1). This "sister" program is built with a similar technique, but it does not call any other commands. Instead, it scans and analyses the directory tree itself and prints useful summaries at the end. Lines in source files can also be counted on UNIX with the wc command, used either alone or together with sdir. Unfortunately, both ways have rather limited possibilities; sdirstat solves "lines of code" tasks much more consistently and conveniently.

Internally, sdir runs the target command either through system(3) or with fork(2) and execvp(3). In the latter case it can pass the standard output of the target program through a pipe. This lets sdir recognize the directories in which the target program produces no meaningful output and skip printing their names, which keeps the output as meaningful as possible.

OPTIONS
-v     Verbose: report the names of scanned directories and some other information.

-w     No warnings or auxiliary information, which could interfere when this program is run from scripts.

-e     Output the error stream of the target command. Otherwise (by default) it is sent to /dev/null, which for many commands is a convenient way to get rid of pointless messages like "grep: No match." (such messages can also be issued by the shell during file name expansion).

-h     Print help and quit.

-c     Print copyright and quit.

-s     Use system(3). Otherwise the program uses fork(2) and execvp(3).

-t     Insert "tcsh -c" at the beginning of the target command passed to system(3) or to execvp(3), depending on the presence of the -s option. You can achieve the same effect by typing the composite target command manually in the form
           tcsh -fc "target command with necessary parameters"
       Here -f is needed to prevent tcsh from executing the .tcshrc file, which could make scanning of large directory trees too slow. In this way you can invoke not only tcsh (or bash, see below) but any other shell. The only reason for using -t (or -b) instead of inserting an explicit shell call into the command is to reduce typing. A shell has to be invoked, one way or the other, so that file names are expanded separately in each directory. Otherwise "*" signs would be expanded according to the contents of the starting directory, which is obviously wrong for its subdirectories.

-b     Insert "bash -c" at the beginning of the target command passed to system(3) or to execvp(3), depending on the presence of the -s option. You can achieve the same effect by inserting bash -c manually.

-p     Receive the standard output of the target command through a pipe and pass it on. Works only with fork, that is, -s must not be given. With this option sdir determines the directories in which the target command produces no output and avoids printing their names. In practice this option is convenient for finding a particular file in very large and deep directory trees whose complete listing is too long, such as the installation directories of some large programs. On the other hand, when the user needs the whole directory listing, this option should not be used.

-l     Follow symbolic links; by default they are not followed. Even with -l, the program never enters a directory a second time, which avoids infinite loops caused by links pointing back up the tree.

-f     Print directory names with full paths. Otherwise the full path is printed only for directories outside the tree rooted at the directory from which the program was started. This can happen only if the option -l is supplied and the tree contains links to directories outside it.
-n     Only print the header messages and do nothing else. This allows you to check visually whether the target command and its arguments are set correctly.

-a     Print statistics at the end: the number of visited directories and levels (that is, the deepest level found in the tree), and the number of files and their total size. For more detailed statistics about the visited directories use the "sister" program sdirstat(1).

VERSION
This description is valid for version 2.2, issued 27.11.2007.

AUTHOR
Igor B. Smirnov

REPORTING BUGS
ibsmirnov@mail.ru

COPYRIGHT
Copyright (c) 2007, I. B. Smirnov

This program can be used, copied, modified, and distributed according to the terms of the GNU General Public License version 3, published by the Free Software Foundation on 29 June 2007, provided that the above copyright notice, this permission notice, and notices about any modifications of the original text appear in all copies and in supporting documentation. The program is provided "as is" without express or implied warranty.

SEE ALSO
system(3), fork(2), execvp(3), tcsh(1), bash(1), ls(1), grep(1), du(1), locate(1), sdirstat(1)

SUMMARY
Frequently used commands to scan files and directories.

1. To list the current directory:
    ls                      (obviously; a note for beginners: use its various options)

2. To list subdirectories (including the current one):
    du                      (with cumulative disk usage)
    du -S                   (with disk usage of individual directories)
    du -s                   (no subdirectories, total disk usage only)
    sdir                    (without sizes)
    sdir du -sS             (disk usage of individual directories)

3. To list subdirectories and the files in them (possibly with sizes):
    find -name "*" -exec ls -ld {} ";"
    du -a                   (cumulative disk usage for directories, individual for files)
    du -aS                  (individual disk usage)
    sdir ls                 (without sizes)
    sdir ls -l              (with file sizes)
    ls -R                   (without sizes)
    ls -Rl                  (with file sizes)

4. To list or find files with a certain name (an exact file name, not a pattern) in the directory tree rooted at the current directory:
    find -name filename
    sdir -a ls filename     (longer output with summary)
    sdir ls filename        (longer output)
    sdir -p ls filename     (shorter output)
    sdir -pw ls filename    (even shorter output)

5. To list or find files with a certain name extension (let us assume *.txt) in the directory tree rooted at the current directory:
    find -name "*.txt"
    sdir -efta "ls *.txt"   (longer output with summary)
    sdir -eft "ls *.txt"    (longer output)
    sdir -ft "ls *.txt"     (medium output)
    sdir -ftp "ls *.txt"    (shorter output)
    sdir -ftpw "ls *.txt"   (even shorter output)
    sdir -tpw "ls *.txt"    (the shortest output)
    sdirstat -A             (gives, in particular, a table of all name extensions found in the tree rooted at the current directory, with the number of files having each extension)

6. To list or find files with a certain name extension (let us assume *.txt) in the whole file system:
    locate .txt             (the name should be more specific than in this example, to reduce the possibly huge amount of output; alternatively, pipe the output of locate to grep as follows)
    locate .txt | grep some_other_pattern_of_path_or_filename
   (In theory all the commands based on find and sdir will also work for the whole file system, but they will probably take too long.)

7. To find files in a directory tree that contain a keyword (let us assume "gold"):
    find -name "*" -exec grep -H gold {} \;   (or)
    find -name "*" -exec grep -H gold {} ";"
    grep -RH gold *
    sdir -eft "grep -H gold *"    (longer output)
   (The option -H works only in GNU grep.)
    sdir -ft "grep -H gold *"     (medium output)
    sdir -ftp "grep -H gold *"    (shorter output)
    sdir -ftpw "grep -H gold *"   (even shorter output)
    sdir -tpw "grep -H gold *"    (the shortest output)
    sdir -tpwa "grep -H gold *"   (the shortest output with statistical summary)

8. To find files with a certain name extension (let us assume *.txt) that contain a keyword (let us assume "gold"):
    find -name "*.txt" -exec grep -H gold {} \;   (or)
    find -name "*.txt" -exec grep -H gold {} ";"
    grep -RH --include="*.txt" gold *
                            (scans all files in the current directory and the *.txt files in subdirectories)
    grep -RH --include="*.txt" gold *.txt
                            (scans the *.txt files in the current directory, then enters the subdirectories whose names match *.txt [which is, of course, nonsense] and scans the *.txt files in them; there seems to be no way to make grep scan ONLY the *.txt files in the current directory and in ALL subdirectories, but the following combinations with sdir do this)
    sdir -eft "grep -H gold *.txt"    (longer output)
    sdir -ft "grep -H gold *.txt"     (medium output)
    sdir -ftp "grep -H gold *.txt"    (shorter output)
    sdir -ftpw "grep -H gold *.txt"   (even shorter output)
    sdir -tpw "grep -H gold *.txt"    (the shortest output)
    sdir -tpwa "grep -H gold *.txt"   (the shortest output with statistical summary)

EXAMPLE
Suppose, for example, that you have forgotten in which system file the C/C++ symbol DBL_MAX is defined, and the compiler refuses to compile your program without its definition. Assume also that you have no good C++ textbook at hand and no Internet connection. To find this file on your own computer, change to /usr/include (as C/C++ programmers know, this is the root of the directory tree in which most C/C++ header files reside on UNIX-like systems) and run
    sdir -tpw "grep -H DBL_MAX *.h"
Here -t means that the target command "grep -H DBL_MAX *.h" will be preceded by tcsh -c. So in each directory the command that is run is
    tcsh -c "grep -H DBL_MAX *.h"
The quotation marks are needed in both cases for two purposes. The first is to avoid expansion of *.h according to the contents of the directory in which the program was launched, which usually differ from the contents of its subdirectories.
The second purpose is to pass the whole target command to tcsh as a single argument, which tcsh requires in order to execute it (see its man page).

After running this sdir command and waiting a few seconds, you will receive terminal output like the following (an example from the author's computer):
    -->.:
    values.h:#define MAXDOUBLE DBL_MAX
    values.h:#define DMAXEXP DBL_MAX_EXP
    -->./kde/arts/gsl:
    gslglib.h:#define G_MAXDOUBLE DBL_MAX
    -->./kpathsea:
    c-minmax.h:#ifndef DBL_MAX
    c-minmax.h:#define DBL_MAX 1e+37
    -->./mysql:
    my_global.h:#ifndef DBL_MAX
    my_global.h:#define DBL_MAX 1.79769313486231470e+308
Here the mark "-->" means that what follows is the name of the directory in which the target program is run. This mark makes it easy to distinguish the directory names printed by sdir from the output of the target program, grep in this case. With this combination of options (which includes -p), only the names of the directories for which the target program writes non-empty output to the standard output stream are printed. Had the program printed all directory names, the output would be much less convenient. For example, running the same command with the additional option -a, that is, the full command
    sdir -tpwa "grep -H DBL_MAX *.h"
gives additional output lines like these:
    Statistics recorded by sdir:
    460 directories at 7 levels were visited.
    7732 file names where seen (including the directory names and links, but not including "." and ".." symbols).
    7249 real files (not links or directories) were met.
    Their total size is 86762452 bytes or 86.7625 Mbytes.
(The size is also given in Mbytes so that you do not have to count the decimal digits of the number of bytes.) Obviously, seeing the names of all these 460 directories and visually searching for the few useful lines among them would not be convenient. (You can easily print a full listing of subdirectory names by just typing "sdir" without arguments.
See item 2 above for other methods of listing directories.)

You can see that the symbol you are looking for is mentioned in 4 files and defined in 3. As it happens, one of these files is located in the current directory (so it could be found by plain grep), but sdir would have found it even if it had been located in any subdirectory (or, if the option -l is supplied, in directories linked to the current directory or its subdirectories). Indeed, it also found the symbol in the 3 other files listed above. Note that plain grep is not adequate for this kind of search; see item 8 above.

This output shows that you should include into your C++ program the file found in the current directory (values.h in the output above), or first just look into it. Obviously, the other 3 files belong to particular packages. If you look into that file, you will find that you can also include another header instead of it (that header was not itself found by this command because, as you can determine with the locate command, its directory starts with "/usr/lib/..." followed by a compiler-related path), or, following ordinary C++ practice, you may guess that the corresponding C++-style header can also be included. All these possibilities will work. Looking further at the output, you might also notice that the macro in question is defined differently in the packages kpathsea and mysql, and obviously incorrectly in the first one. But this does not matter for us.

Note that you can use the find program for the same purpose, like this:
    find -name "*.h" -exec grep -H DBL_MAX {} ";"
This gives the following output:
    ./values.h:#define MAXDOUBLE DBL_MAX
    ./values.h:#define DMAXEXP DBL_MAX_EXP
    ./kpathsea/c-minmax.h:#ifndef DBL_MAX
    ./kpathsea/c-minmax.h:#define DBL_MAX 1e+37
    ./mysql/my_global.h:#ifndef DBL_MAX
    ./mysql/my_global.h:#define DBL_MAX 1.79769313486231470e+308
    ./kde/arts/gsl/gslglib.h:#define G_MAXDOUBLE DBL_MAX
The results are the same, but printed in a different order.
(It is also interesting that different versions of Linux systems give this output of find in different orders.) In the given case, however, the order is unimportant, and you can interpret the output of find in the same way. But if you have a very deep directory tree with long subdirectory names, you may find it very inconvenient to see file names on the same line as subdirectory names and the contents of lines from the files: the lines get too long, and you may even need to widen your terminal beyond the classical 80 characters to make sense of what you see.

Now compare which form is easier:
    sdir -tpw "grep -H DBL_MAX *.h"
    find -name "*.h" -exec grep -H DBL_MAX {} ";"
First of all, the form with sdir is shorter (by about 1/3). Second, with sdir you see (and type) the target command as it is, usually in quotes, which nicely separates it from the preceding sdir options, whereas with find the target command is mangled by the need to insert the "{}" mark instead of the file name pattern and to terminate the command with ";". The file name pattern must also appear separately after the -name option, usually in quotation marks. One cannot remember all this without dealing with it every day! Thus the form with sdir is simpler and shorter than the form with find, although neither can be considered trivial. Therefore the SUMMARY above may serve as a useful quick reference.

Sdir sometimes works slightly faster than find in an identical environment and for similar tasks, but sometimes it is also moderately slower. Since it usually launches a shell in each directory, sdir depends strongly on how quickly the shell is initialized. The idea of launching a shell once and then making it change the current directory and execute the target command in each directory without quitting leads to many difficult problems of interprocess communication and synchronization.
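Each visited directory therefore costs at least one process creation, plus a shell start-up when -t or -b is used. The following C sketch illustrates, under stated assumptions, what a single visit with the option -p might look like: fork(2), chdir(2) in the child, execvp(3) of the target command with its standard output redirected into a pipe, and a "-->" header printed by the parent only when the first bytes arrive. The function name run_in_dir_piped is hypothetical; this is not the author's code, and the real program may differ, for example in how it treats the error stream (here stderr is simply left untouched, as with -e).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch of one directory visit with sdir's -p option
 * (not the author's code). Runs argv[0] with arguments argv inside
 * `dir` using fork(2) and execvp(3), reads the child's standard output
 * through a pipe and prints the "-->dir:" header only if that output
 * is non-empty. The child's stderr is left untouched.
 * Returns the number of bytes the command wrote to stdout, or -1. */
static long run_in_dir_piped(const char *dir, char *const argv[])
{
    assert(dir != NULL && argv != NULL && argv[0] != NULL);

    int fd[2];
    if (pipe(fd) != 0)
        return -1;

    fflush(stdout);                  /* the child must not inherit buffered text */
    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                  /* child: stdout -> pipe, enter dir, exec */
        close(fd[0]);
        if (dup2(fd[1], STDOUT_FILENO) < 0 || chdir(dir) != 0)
            _exit(127);
        close(fd[1]);
        execvp(argv[0], argv);
        _exit(127);                  /* reached only if exec failed */
    }

    close(fd[1]);                    /* the parent keeps only the read end */
    long total = 0;
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd[0], buf, sizeof buf);
        if (n < 0 && errno == EINTR)
            continue;                /* interrupted by a signal: retry */
        if (n <= 0)
            break;                   /* end of output or read error */
        if (total == 0)              /* first output: the header is worth printing */
            printf("-->%s:\n", dir);
        fwrite(buf, 1, (size_t)n, stdout);
        total += n;
    }
    close(fd[0]);

    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;                            /* reap the child so it does not stay a zombie */
    fflush(stdout);
    return total;
}
```

With argv set to {"tcsh", "-fc", "grep -H DBL_MAX *.h", NULL}, a directory without matches produces no output at all, so its name is never printed, which is the -p behavior described in OPTIONS. The sketch also makes the cost visible: every call performs a fork and an exec, so a shell that starts slowly slows down the whole scan.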