BASH Website

Get List Of All Links On A Web Page

lynx -listonly -dump https://www.google.co.uk | awk '/http/{print $2}' | sort -u
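
To list pages across a whole site rather than just one page, a recursive wget spider run can be logged and the URLs parsed out afterwards. A minimal sketch, assuming a placeholder domain and a crawl depth of 2 ...

wget --spider --recursive --level=2 --no-verbose --output-file=spider.log https://www.domain.co.uk/
grep -oE 'https?://[^ ]+' spider.log | sort -u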

Reload Page In Remote Web Browser Over SSH

sudo apt-get install xdotool
DISPLAY=:0 xdotool key F5

https://stackoverflow.com/questions/28132070/how-to-reload-google-chrome-tab-from-terminal
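
If another window has focus, the F5 will land in the wrong place. A sketch that raises the browser window first before sending the key, assuming a window class of chromium (substitute your own browser's class) ...

DISPLAY=:0 xdotool search --onlyvisible --class chromium windowactivate --sync key F5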

Check Dead Links

sudo apt install linkchecker
linkchecker --verbose --file-output=text -r1 --no-follow-url=www.mydomain.com http://www.mydomain.com/path/to/page
grep -B2 'Error' linkchecker-out.txt
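
For a scheduled check, a small wrapper can mail the error context only when something breaks. A sketch, assuming a working local mail command; the address and domain are placeholders ...

linkchecker --file-output=text -r1 http://www.mydomain.com/path/to/page > /dev/null 2>&1
grep -q 'Error' linkchecker-out.txt && grep -B2 'Error' linkchecker-out.txt | mail -s 'Dead links found' admin@mydomain.com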

Dump HTTP Header Using WGET

wget --server-response --spider http://www.google.co.uk
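
wget prints the response headers to stderr, so redirect them if you want to filter for a single field ...

wget --server-response --spider http://www.google.co.uk 2>&1 | grep -i 'Content-Type'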

Download Web Page And All Files Linked

Recursively fetch everything below the given path (-r), without ascending to the parent directory (-np), rewriting links for local browsing (-k) ...

wget -r -np -k http://syncapp.bittorrent.com/1.4.111/
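
To keep only particular file types from such a directory listing, add an accept list and flatten the output directories. A sketch; the suffixes are assumptions ...

wget -r -np -nd -A 'exe,dmg' http://syncapp.bittorrent.com/1.4.111/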

Download An Entire Web Site

Quietly download an entire web site, including assets served from a CDN such as AWS CloudFront ...

wget --no-verbose --append-output=wget.log --span-hosts --page-requisites --convert-links --random-wait --recursive --level=1 -E -e robots=off -U mozilla --domains=cloudfront.net,domain.co.uk --no-parent https://www.domain.co.uk/
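
For a full-depth copy rather than a single level, wget's --mirror shorthand (infinite recursion plus timestamping) can stand in for --recursive and --level. A sketch against the same placeholder domains ...

wget --mirror --span-hosts --page-requisites --convert-links --random-wait -E -e robots=off -U mozilla --domains=cloudfront.net,domain.co.uk --no-parent https://www.domain.co.uk/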

Show Web Page Headers

curl -I www.bbc.co.uk

Sample output...

HTTP/1.1 200 OK
Server: Apache
Content-Type: text/html
Content-Language: en-GB
Etag: "4c572592f520bedc9a1e2c0238accaa6"
X-PAL-Host: pal041.back.live.cwwtf.local:80
Transfer-Encoding: chunked
Date: Fri, 06 Mar 2015 11:30:51 GMT
Connection: keep-alive
Set-Cookie: BBC-UID=15d4ff99887f5eab78a60516b12f651317b26b71e7f4e4a61ad0970254e467200curl/7.35.0; expires=Tue, 05-Mar-19 11:30:51 GMT; path=/; domain=.bbc.co.uk
X-Cache-Action: HIT
X-Cache-Hits: 1149
X-Cache-Age: 64
Cache-Control: private, max-age=0, must-revalidate
Vary: X-CDN

Thanks to CyberCiti.
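
To follow any redirects and pick out a single header, chain -s and -L with a grep ...

curl -sIL www.bbc.co.uk | grep -i '^content-type'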