AndyJarrett

Understanding Google's Skipfish arguments

Skipfish works off dictionary files (*.wl) in the /dictionaries/ folder; make sure you read the README-FIRST file in there to understand what these do. After the dictionary files come the arguments, which make this a powerful tool, but due to the sheer amount of them I've created this blog post to take in the information. Don't get me wrong, a lot of the text here is just copied from the Wiki, but it's either summarised or re-written.
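
To put the flags below in context, a basic scan looks something like this. The URL and output folder are placeholders, -o is the (required) directory the report gets written to, and the wordlist is one of the stock files shipped in /dictionaries/:

./skipfish -W dictionaries/complete.wl -o output_dir http://www.example.com/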

-A user:pass
Pass HTTP authentication details
-B
In some cases, you do not want to actually crawl a third-party domain, but you trust the owner of that domain enough not to worry about cross-domain content inclusion from that location. To suppress warnings, you can use the -B option.
-C name=val
If the site relies on HTTP cookies instead, log in via your browser or a simple curl script, and then provide skipfish with the session cookie.
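For example, either of these would do the job (the credentials, cookie name and value are placeholders):
./skipfish -A admin:s3cret -o output_dir http://www.example.com/
./skipfish -C PHPSESSID=1234abcd -o output_dir http://www.example.com/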
-D
Allows you to specify additional hosts or domains to consider in-scope for the test, e.g. sub-domains (-D test2.example.com) or wildcards (-D .example.com).
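So a scan that should also wander onto a sub-domain might look like this (hosts are just examples):
./skipfish -D test2.example.com -o output_dir http://www.example.com/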
-E
Certain pedantic sites may care about cases where caching is restricted on the HTTP/1.1 level, but no explicit HTTP/1.0 caching directive is given. Specifying -E on the command line causes skipfish to log all such cases carefully.
-f
controls the maximum number of consecutive HTTP errors you are willing to see before aborting the scan
-g
set the maximum number of connections to maintain, globally, to all targets (it is sensible to keep this under 50 or so to avoid overwhelming the TCP/IP stack on your system or on the nearby NAT / firewall devices)
-I
only spider URLs matching a substring
-J
By default, skipfish complains loudly about all MIME or character set mismatches on renderable documents, and classifies many of them as "medium risk"; this is because, if any user-controlled content is returned, the situation could lead to cross-site scripting attacks in certain browsers. On some poorly designed and maintained sites, this may contribute too much noise; if so, you may use -J to mark these issues as "low risk" unless the scanner can explicitly see its own user input being echoed back on the resulting page. This may miss many subtle attack vectors, though.
-L
suppress auto-learning
-m
set the per-IP limit (experiment a bit: 2-4 is usually good for localhost, 4-8 for local networks, 10-20 for external targets, 30+ for really lagged or non-keep-alive hosts).
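Putting -g and -m together for, say, an external target (the numbers are only a starting point - tune them per the notes above):
./skipfish -g 40 -m 10 -o output_dir http://www.example.com/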
-M
Some sites that handle sensitive user data care about SSL - and about getting it right. Skipfish may optionally assist you in figuring out problematic mixed content scenarios - use the -M option to enable this. The scanner will complain about situations such as http:// scripts being loaded on https:// pages - but will disregard non-risk scenarios such as images.
-O
Inhibits all form parsing and submission steps, which reduces the risk of a scan having persistent side effects. Not to be confused with the lowercase -o, which sets the directory the report is written to.
-p
An interesting option is available for repeated assessments: -p. By specifying a percentage between 1 and 100%, it is possible to tell the crawler to follow fewer than 100% of all links, and try fewer than 100% of all dictionary entries. This - naturally - limits the completeness of a scan, but unlike most other settings, it does so in a balanced, non-deterministic manner. It is extremely useful when you are setting up time-bound, but periodic assessments of your infrastructure.
-P
In certain quick assessments, you might also have no interest in paying any particular attention to the desired functionality of the site - hoping to explore non-linked secrets only. In such a case, you may specify -P to inhibit all HTML parsing. This limits the coverage and takes away the ability for the scanner to learn new keywords by looking at the HTML, but speeds up the test dramatically.
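A quick, low-impact run along those lines might combine -O and -P (placeholder URL again):
./skipfish -O -P -o output_dir http://www.example.com/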
-q
which sets the initial random seed for the crawler to a specified value. This can be used to exactly reproduce a previous scan to compare results. Randomness is relied upon most heavily in the -p mode, but also for making a couple of other scan management decisions elsewhere.
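For instance, a repeatable partial crawl could be kicked off like this (the percentage and the seed value are arbitrary placeholders):
./skipfish -p 25 -q 0x1234abcd -o output_dir http://www.example.com/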
-Q
Some particularly complex (or broken) services may involve a very high number of identical or nearly identical pages. Although these occurrences are by default grayed out in the report, they still use up some screen estate and take a while to process on JavaScript level. In such extreme cases, you may use the -Q option to suppress reporting of duplicate nodes altogether, before the report is written. This may give you a less comprehensive understanding of how the site is organized, but has no impact on test coverage.
-R
drop old dictionary entries
-s
sets the maximum length of a response to fetch and parse (longer responses will be truncated).
-S
ignore links on pages where a substring appears in response body
-t
set the total request timeout, to account for really slow or really fast sites.
-T login=test123 -T password=test321
Specify form fields for autocompletion. These values should be non-malicious, as they are not meant to implement security checks.
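The rules are simply repeated on the command line, so a scan using the test credentials above would look something like:
./skipfish -T login=test123 -T password=test321 -o output_dir http://www.example.com/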
-V
suppress dictionary updates
-X /icons
Helps you exclude folders, e.g. /icons/, /doc/, /manuals/.
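If I remember right, -X can be repeated to stack exclusions, and -I (above) does the opposite job; the paths here are just examples:
./skipfish -X /icons/ -X /doc/ -o output_dir http://www.example.com/
./skipfish -I /store/ -o output_dir http://www.example.com/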
-w
to set the I/O timeout (i.e., skipfish will wait only so long for an individual read or write)
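For a slow or flaky target you might loosen these limits a little, together with -s above (values are purely illustrative; -t and -w are in seconds, -s in bytes):
./skipfish -t 30 -w 20 -s 400000 -o output_dir http://www.example.com/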
-W
specify a custom wordlist/dictionary file
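The dictionary switches (-W, -L, -V, -R) tend to get combined; for example, a scan that reads a custom wordlist but never writes back to it might look like this (the filename is a placeholder):
./skipfish -W custom.wl -L -V -o output_dir http://www.example.com/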

Notes:

Skipfish is an active web application security reconnaissance tool. It prepares an interactive sitemap for the targeted site by carrying out a recursive crawl and dictionary-based probes.

This tool allows you to test your server against low- to high-risk issues, ranging from server-side SQL injection, XML/XPath injection, and XSS attacks right down to bad caching directives on less sensitive content.

You can find more at http://code.google.com/p/skipfish/