Shell Script to Traverse All Internal URLs and Report Any Errors in the “traverse.errors” File
If you run a web server or are responsible for a website, whether simple or complex, you probably find yourself performing certain tasks frequently, most notably identifying broken internal and external site links. Shell scripts let you automate many of these tasks, as well as other common client/server chores such as managing access information for a password-protected website. The shell script below traverses all internal URLs on a given website and reports any errors in the “traverse.errors” file.
Usage: traverse.sh <URL LINK>
#!/bin/sh
# traverse.sh - traverse all internal URLs on a website with lynx and
# report any errors found in the "traverse.errors" file.

lynx="/usr/local/bin/lynx"    # adjust if lynx is installed elsewhere

# Remove lynx's temporary traversal files when the script exits.
trap "$(which rm) -f traverse.dat traverse2.dat" 0

if [ -z "$1" ] ; then
  echo "Usage: $(basename "$0") URL" >&2
  exit 1
fi

# Host part of the URL, used to name the saved output files.
baseurl="$(echo "$1" | cut -d/ -f3 | sed 's/http:\/\///')"

"$lynx" -traversal -accept_all_cookies -realm "$1" > /dev/null

if [ -s "traverse.errors" ] ; then
  /bin/echo -n $(wc -l < traverse.errors) "errors encountered. "
  echo "Checked $(grep '^http' traverse.dat | wc -l) pages at ${1}:"
  sed "s|$1||g" < traverse.errors
  mv traverse.errors "${baseurl}.errors"
  echo "A copy of this output has been saved in ${baseurl}.errors"
else
  /bin/echo -n "No errors encountered. "
  echo "Checked $(grep '^http' traverse.dat | wc -l) pages at ${1}"
fi

if [ -s "reject.dat" ] ; then
  mv reject.dat "${baseurl}.rejects"
fi

exit 0
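The script names its saved output files after the host part of the URL, which it extracts with cut. As a minimal sketch, that step can be tried on its own without running lynx. The function name base_host and the example.com URL are illustrative assumptions and are not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): print the host part
# of a URL, the same value the script stores in $baseurl.
base_host() {
  # Splitting "http://host/path" on "/" gives: "http:", "", "host", "path"
  # so the host is always field 3.
  printf '%s\n' "$1" | cut -d/ -f3
}

# Example call with a placeholder URL; prints: www.example.com
base_host "http://www.example.com/docs/index.html"
```

With this host, the script would save its results as www.example.com.errors and, if lynx rejected any links, www.example.com.rejects.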
Scenario 1: No Errors
Scenario 2: Some Errors