BASH Website

From Indie IT Wiki

Get A List Of All Links On A Web Page

lynx -listonly -dump https://www.google.co.uk | awk '/http/{print $2}' | sort | uniq
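The pipeline can be tried offline against a hypothetical snippet of lynx's -listonly output (the indented, numbered References list), to show how the awk filter and de-duplication behave:

```shell
# Hypothetical sample of `lynx -listonly -dump` output (no network needed).
sample='References

   1. https://www.example.com/about
   2. https://www.example.com/about
   3. https://other.example.org/page'

# Keep only the lines containing a URL, print the URL column,
# then de-duplicate (sort -u is shorthand for sort | uniq).
printf '%s\n' "$sample" | awk '/http/{print $2}' | sort -u
```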

Reload Page In Remote Web Browser Over SSH

sudo apt-get install xdotool
DISPLAY=:0 xdotool key F5

https://stackoverflow.com/questions/28132070/how-to-reload-google-chrome-tab-from-terminal
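To keep refreshing the page on a schedule, the key press can be wrapped in a small loop. In this sketch the command to repeat is passed as arguments, so the timing logic works without an X server; repeat_every is a made-up helper name, and DISPLAY=:0 is an assumption that must match the remote session's actual display.

```shell
# Run a command a fixed number of times, sleeping between runs.
# The command is passed as arguments, so anything (xdotool, echo, ...) works.
repeat_every() {
    interval=$1; times=$2; shift 2
    i=0
    while [ "$i" -lt "$times" ]; do
        [ "$i" -gt 0 ] && sleep "$interval"   # no sleep before the first run
        "$@"
        i=$((i + 1))
    done
}

# Refresh the remote browser every 30 seconds, 120 times (one hour):
# repeat_every 30 120 env DISPLAY=:0 xdotool key F5
```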

Check Dead Links

sudo apt install linkchecker
linkchecker --verbose --file-output=text -r1 --no-follow-url=www.mydomain.com http://www.mydomain.com/path/to/page
grep -B2 'Error' linkchecker-out.txt
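To get a quick total rather than the full context, the report can be summarized with a small helper. count_errors is a hypothetical name; it assumes the default linkchecker-out.txt report produced by the command above.

```shell
# Count the lines flagged as errors in a linkchecker text report.
# Defaults to linkchecker's standard text output file.
count_errors() {
    grep -c 'Error' "${1:-linkchecker-out.txt}"
}

# count_errors            # uses linkchecker-out.txt
# count_errors other.txt  # or any other report file
```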

Dump HTTP Headers Using WGET

wget --server-response --spider http://www.google.co.uk
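Since --server-response prints every response's headers (including any redirects) to stderr, a small awk filter can reduce that to just the final status code. status_code is a hypothetical name for this sketch:

```shell
# Print the status code of the last HTTP response seen on stdin.
# With redirects, wget prints several header blocks; the last one wins.
status_code() {
    awk '/HTTP\//{code=$2} END{print code}'
}

# Usage (network required; wget writes headers to stderr, hence 2>&1):
# wget --server-response --spider http://www.google.co.uk 2>&1 | status_code
```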

Download Web Page And All Files Linked

wget -r -np -k http://syncapp.bittorrent.com/1.4.111/

Download An Entire Web Site

Quietly download an entire web site, including assets served from a CDN such as Amazon CloudFront:

# --span-hosts + --domains : follow links onto the listed hosts only
# --page-requisites        : also fetch the images, CSS and JS each page needs
# -r --level 1             : recurse, but only one level deep
# -E                       : save pages with an .html extension
# -e robots=off -U mozilla : ignore robots.txt and send a browser-like user agent
wget --no-verbose --append-output=wget.log --span-hosts --page-requisites \
     --no-clobber --convert-links --random-wait -r --level 1 -E \
     -e robots=off -U mozilla --domains cloudfront.net --domains domain.co.uk \
     --no-parent https://www.domain.co.uk/

Show Web Page Headers

curl -I www.bbc.co.uk

Sample output...

HTTP/1.1 200 OK
Server: Apache
Content-Type: text/html
Content-Language: en-GB
Etag: "4c572592f520bedc9a1e2c0238accaa6"
X-PAL-Host: pal041.back.live.cwwtf.local:80
Transfer-Encoding: chunked
Date: Fri, 06 Mar 2015 11:30:51 GMT
Connection: keep-alive
Set-Cookie: BBC-UID=15d4ff99887f5eab78a60516b12f651317b26b71e7f4e4a61ad0970254e467200curl/7.35.0; expires=Tue, 05-Mar-19 11:30:51 GMT; path=/; domain=.bbc.co.uk
X-Cache-Action: HIT
X-Cache-Hits: 1149
X-Cache-Age: 64
Cache-Control: private, max-age=0, must-revalidate
Vary: X-CDN
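To pull a single header's value out of output like that, the response can be piped through a small case-insensitive filter (HTTP header names are case-insensitive; get_header is a made-up helper name for this sketch):

```shell
# Print the value of one header from an HTTP response on stdin.
# Matches the name case-insensitively and strips the trailing CR.
get_header() {
    grep -i "^$1:" | head -n 1 | cut -d' ' -f2- | tr -d '\r'
}

# Usage (network required):
# curl -sI www.bbc.co.uk | get_header Server
```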

Thanks to CyberCiti.