PHP Cross Reference of phpwcms V1.4.3 _r380 (23.11.09) |
htmlfilter.inc
---------------
This set of functions allows you to filter HTML in order to remove any malicious tags from it. Useful in cases when you need to filter user input for any cross-site-scripting attempts.
Copyright (C) 2002-2004 by Duke University
Author: | Konstantin Riabitsev <icon@linux.duke.edu> |
Version: | 1.1 ($Date: 2005/06/30 18:06:08 $) |
File Size: | 1021 lines (38 kb) |
Included or required: | 1 time |
Referenced: | 0 times |
Includes or requires: | 0 files |
spew($message) X-Ref |
This is a debugging function used throughout the code. To enable debugging, set a global variable called "debug" to true before calling sanitize(). Note: the debugging calls slow the filter down slightly even when $debug is set to false. If you wish to get rid of all debugging calls, run the following command:

    fgrep -v 'spew("' htmlfilter.inc > htmlfilter.inc.new

htmlfilter.inc.new will contain no debugging calls.
param: $message A string with the message to output.
return: void. |
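As an illustration of the global-flag pattern described above, here is a minimal sketch. The name spew_sketch is hypothetical and the body is an assumption about how such a helper could look; it is not the library's own code.

```php
<?php
// Hypothetical stand-in for a debug helper gated by a global flag.
function spew_sketch(string $message): void
{
    global $debug;            // the global variable the description refers to
    if (!empty($debug)) {     // unset, false, or 0 all mean "stay silent"
        echo $message;
    }
}

// At file scope, `$debug = true;` has the same effect as this assignment.
$GLOBALS['debug'] = true;     // must be set before the filter runs
spew_sketch("scanning from offset 0\n");

$GLOBALS['debug'] = false;
spew_sketch("this line is never printed\n");
```

Because the flag is checked on every call, each call still costs a function invocation when debugging is off, which is why the description suggests stripping the calls entirely.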
tagprint($tagname, $attary, $tagtype) X-Ref |
This function returns the final tag out of the tag name, an array of attributes, and the type of the tag. This function is called by sanitize internally. param: $tagname the name of the tag. param: $attary the array of attributes and their values param: $tagtype The type of the tag (see in comments). return: a string with the final tag representation. |
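A rough sketch of assembling a tag from its parts might look like the following. The function name, the three tag-type constants, and their numeric values are all hypothetical (the listing only says the types are described in the source comments), and attribute values are assumed to carry their own quotes.

```php
<?php
// Hypothetical tag-type codes; the real ones are defined in the source comments.
define('TAG_OPENING', 1);
define('TAG_CLOSING', 2);
define('TAG_SELF_CLOSING', 3);

// Hypothetical helper: joins a tag name and attribute array into one string.
function tag_to_string_sketch(string $tagname, array $attary, int $tagtype): string
{
    if ($tagtype === TAG_CLOSING) {
        return "</$tagname>";          // closing tags carry no attributes
    }
    $out = "<$tagname";
    foreach ($attary as $name => $value) {
        // Assumption: values already include their quotes, e.g. '"_new"'.
        $out .= " $name=$value";
    }
    return $out . ($tagtype === TAG_SELF_CLOSING ? ' />' : '>');
}

$tag = tag_to_string_sketch(
    'a',
    array('href' => '"/x"', 'target' => '"_new"'),
    TAG_OPENING
);
// $tag is '<a href="/x" target="_new">'
```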
casenormalize(&$val) X-Ref |
A small helper function to use with array_walk. Modifies a by-ref value and makes it lowercase. param: $val a value passed by-ref. return: void since it modifies a by-ref value. |
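The by-reference pattern with array_walk can be sketched as follows. The name lowercase_in_place is hypothetical; the point is only that a callback taking its first argument by reference can rewrite each element without returning anything.

```php
<?php
// Hypothetical equivalent of the helper: lowercases the value it is handed.
function lowercase_in_place(&$val): void
{
    $val = strtolower($val);
}

$tags = array('B', 'Img', 'STRONG');
array_walk($tags, 'lowercase_in_place');   // each element is passed by reference
// $tags is now array('b', 'img', 'strong')
```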
skipspace($body, $offset) X-Ref |
This function skips any whitespace from the current position within a string and to the next non-whitespace value. param: $body the string param: $offset the offset within the string where we should start return: the location within the $body where the next |
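A minimal sketch of the same idea, assuming a hypothetical skip_whitespace_sketch helper built on strspn. What the original returns when only whitespace remains is not visible in this listing; the sketch simply returns the string length in that case.

```php
<?php
// Hypothetical helper: returns the offset of the next non-whitespace character.
function skip_whitespace_sketch(string $body, int $offset): int
{
    // strspn() measures the run of listed characters starting at $offset.
    return $offset + strspn($body, " \t\r\n", $offset);
}

$next = skip_whitespace_sketch("a  \t b", 1);
// $next is 5, the position of "b"
```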
findnxstr($body, $offset, $needle) X-Ref |
This function looks for the next character within a string. It's really just a glorified "strpos", except it catches the failures nicely. param: $body The string to look for needle in. param: $offset Start looking from this position. param: $needle The character/string to look for. return: location of the next occurrence of the needle, or
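The failure that a strpos wrapper has to catch is the false return value, which compares loosely equal to offset 0. The sketch below uses a hypothetical find_next_sketch helper. The listing cuts off before saying what the original returns on a miss, so falling back to the string length here is purely an assumption chosen so the caller always gets an integer offset.

```php
<?php
// Hypothetical wrapper around strpos() that never returns false.
function find_next_sketch(string $body, int $offset, string $needle): int
{
    $pos = strpos($body, $needle, $offset);
    // strpos() returns false on a miss and 0 for a hit at the very start,
    // so the comparison has to be strict.
    if ($pos === false) {
        return strlen($body);   // assumption: treat "not found" as end of string
    }
    return $pos;
}

$hit  = find_next_sketch("<b>hi</b>", 0, ">");   // 2
$miss = find_next_sketch("<b>hi", 3, ">");       // 5, the length of the string
```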
findnxreg($body, $offset, $reg) X-Ref |
This function takes a PCRE-style regexp and tries to match it within the string. param: $body The string to look for needle in. param: $offset Start looking from here. param: $reg A PCRE-style regex to match. return: Returns a false if no matches found, or an array |
getnxtag($body, $offset) X-Ref |
This function looks for the next tag. param: $body String where to look for the next tag. param: $offset Start looking from here. return: false if no more tags exist in the body, or |
deent(&$attvalue, $regex, $hex=false) X-Ref |
Translates entities into literal values so they can be checked. param: $attvalue the by-ref value to check. param: $regex the regular expression to check against. param: $hex whether the entities are hexadecimal. return: True or False depending on whether there were matches. |
defang(&$attvalue) X-Ref |
This function checks attribute values for entity-encoded values and returns them translated into 8-bit strings so we can run checks on them. param: $attvalue A string to run entity check against. return: Nothing, modifies a reference value. |
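To show why entity decoding has to happen before any value checks, here is a small sketch that turns decimal and hexadecimal numeric entities into single bytes. The function name and both regular expressions are assumptions made for illustration and are not taken from the library.

```php
<?php
// Hypothetical helper: decodes numeric entities below 256 into raw bytes.
function decode_numeric_entities_sketch(string $value): string
{
    // Map a code point to one byte; leave anything above 255 untouched.
    $toByte = function (int $code, string $original): string {
        return $code < 256 ? chr($code) : $original;
    };

    // Decimal form such as &#106; (the trailing semicolon is optional).
    $value = preg_replace_callback(
        '/&#0*(\d{1,7});?/',
        function (array $m) use ($toByte): string {
            return $toByte((int) $m[1], $m[0]);
        },
        $value
    );

    // Hexadecimal form such as &#x6A;
    $value = preg_replace_callback(
        '/&#x0*([0-9a-f]{1,6});?/i',
        function (array $m) use ($toByte): string {
            return $toByte((int) hexdec($m[1]), $m[0]);
        },
        $value
    );

    return $value;
}

$decoded = decode_numeric_entities_sketch("&#106;avascript:alert(1)");
// $decoded is "javascript:alert(1)", which a later check can now recognize
```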
unspace(&$attvalue) X-Ref |
Kill any tabs, newlines, or carriage returns. Our friends the makers of the browser with 95% market share decided that it'd be funny to make "java[tab]script" be just as good as "javascript". param: $attvalue The attribute value before extraneous whitespace is removed. return: Nothing, modifies a reference value. |
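The "java[tab]script" trick can be neutralized along these lines. The helper name is hypothetical, and the sketch removes only the three characters named in the description.

```php
<?php
// Hypothetical helper: strips tabs, carriage returns and newlines in place.
function strip_control_whitespace_sketch(string &$attvalue): void
{
    $attvalue = str_replace(array("\t", "\r", "\n"), '', $attvalue);
}

$href = "java\tscr\nipt:alert(1)";
strip_control_whitespace_sketch($href);
// $href is now "javascript:alert(1)", so a scheme check can match it
```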
fixatts($tagname, $attary, $rm_attnames,$bad_attvals,$add_attr_to_tag) X-Ref |
This function runs various checks against the attributes. param: $tagname String with the name of the tag. param: $attary Array with all tag attributes. param: $rm_attnames See description for sanitize param: $bad_attvals See description for sanitize param: $add_attr_to_tag See description for sanitize return: Array with modified attributes. |
sanitize($body, $tag_list = array() X-Ref |
This is the main function and the one you should actually be calling. There are several variables you should be aware of and which need special description.

$tag_list
----------
This is a simple one-dimensional array of strings, except for the very first member. The first member should be either false or true. If it is FALSE, the following list is treated as a list of tags that should be explicitly REMOVED from the body, and all others that do not match the list will be allowed. If the first member is TRUE, then the list is the list of tags that should be explicitly ALLOWED -- any tag not matching this list will be discarded.

Examples:

    $tag_list = Array(
        false,
        "blink", "link", "object", "meta", "marquee", "html"
    );

This will allow all tags except for blink, link, object, meta, marquee, and html.

    $tag_list = Array(
        true,
        "b", "a", "i", "img", "strong", "em", "p"
    );

This will remove all tags from the body except b, a, i, img, strong, em and p.

$rm_tags_with_content
---------------------
This is a simple one-dimensional array of strings, which specifies the tags to be removed with any and all content between the beginning and the end of the tag.

Example:

    $rm_tags_with_content = Array( "script", "style", "applet", "embed" );

This will remove the following structure:

    <script>
        window.alert("Isn't cross-site-scripting fun?!");
    </script>

$self_closing_tags
------------------
This is a simple one-dimensional array of strings, which specifies which tags contain no content and should not be forcefully closed if this option is turned on (see further).

Example:

    $self_closing_tags = Array( "img", "br", "hr", "input" );

$force_tag_closing
------------------
Set it to true to forcefully close any tags opened within the document. This is good if you want to take care of people who like to screw up the pages by leaving unclosed tags like <a>, <b>, <i>, etc.

$rm_attnames
-------------
Now we come to parameters that are more obscure.
This parameter is a nested array which is used to specify which attributes should be removed. It goes like so:

    $rm_attnames = Array(
        "PCRE regex to match tag name" => Array(
            "PCRE regex to match attribute name"
        )
    );

Example:

    $rm_attnames = Array(
        "|.*|" => Array( "|target|i", "|^on.*|i" )
    );

This will match all tags (.*), and specify that all attributes named "target" and all attributes starting with "on" should be removed. This will take care of the following problem:

    <em onmouseover="window.alert('muahahahaha')">

The "onmouseover" will be removed.

$bad_attvals
------------
This is where it gets ugly. This is a nested array with many levels. It goes like so:

    $bad_attvals = Array(
        "pcre regex to match tag name" => Array(
            "pcre regex to match attribute name" => Array(
                Array( "pcre regex to match attribute value" ),
                Array( "what to replace a match from above with" )
            )
        )
    );

An extensive example:

    $bad_attvals = Array(
        "|.*|" => Array(
            "/^src|background|href|action/i" => Array(
                Array(
                    "/^([\'\"])\s*\S+script\s*:.*([\'\"])/si"
                ),
                Array(
                    "\\1http://veryfunny.com/\\2"
                )
            ),
            "/^style/i" => Array(
                Array(
                    "/expression/si",
                    "/url\(([\'\"])\s*https*:.*([\'\"])\)/si",
                    "/url\(([\'\"])\s*\S+script:.*([\'\"])\)/si"
                ),
                Array(
                    "idiocy",
                    "url(\\1http://veryfunny.com/\\2)",
                    "url(\\1http://veryfynny.com/\\2)"
                )
            )
        )
    );

This will take care of nearly all known cross-site scripting exploits, plus some (see my filter sample at http://www.mricon.com/html/phpfilter.html for a working version).

$add_attr_to_tag
----------------
This is a useful little feature which lets you add attributes to certain tags. It is a nested array as well, but not at all like the previous one. It goes like so:

    $add_attr_to_tag = Array(
        "PCRE regex to match tag name" => Array(
            "attribute name" => '"attribute value"'
        )
    );

Note: don't forget quotes around attribute value.
Example:

    $add_attr_to_tag = Array(
        "/^a$/si" => Array( 'target' => '"_new"' )
    );

This will change all <a> tags and add target="_new" to them so all links open in a new window.

param: $body the string with HTML you wish to filter
param: $tag_list see description above
param: $rm_tags_with_content see description above
param: $self_closing_tags see description above
param: $force_tag_closing see description above
param: $rm_attnames see description above
param: $bad_attvals see description above
param: $add_attr_to_tag see description above
return: sanitized HTML safe to show on your pages. |
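The following sketch puts the configuration arrays together and illustrates two of the rules described above: the $tag_list mode flag and the nested $rm_attnames regexes. The helpers tag_survives_sketch and strip_attributes_sketch are hypothetical and only mimic the documented behaviour; they are not the library's code. The final call assumes the argument order given in the param list and runs only if htmlfilter.inc has been included.

```php
<?php
// Hypothetical helper: applies the $tag_list rule to a single tag name.
function tag_survives_sketch(string $tagname, array $tag_list): bool
{
    $is_allow_list = (bool) array_shift($tag_list);   // first member is the mode flag
    $listed = in_array(strtolower($tagname), $tag_list, true);
    return $is_allow_list ? $listed : !$listed;
}

// Hypothetical helper: applies the nested $rm_attnames regexes to one tag.
function strip_attributes_sketch(string $tagname, array $attary, array $rm_attnames): array
{
    foreach ($rm_attnames as $tag_regex => $att_regexes) {
        if (!preg_match($tag_regex, $tagname)) {
            continue;                                  // rule is for other tags
        }
        foreach ($att_regexes as $att_regex) {
            foreach (array_keys($attary) as $attname) {
                if (preg_match($att_regex, $attname)) {
                    unset($attary[$attname]);
                }
            }
        }
    }
    return $attary;
}

$tag_list             = array(true, "b", "a", "i", "img", "strong", "em", "p");
$rm_tags_with_content = array("script", "style", "applet", "embed");
$self_closing_tags    = array("img", "br", "hr", "input");
$force_tag_closing    = true;
$rm_attnames          = array("|.*|" => array("|target|i", "|^on.*|i"));
$bad_attvals          = array();                       // left empty in this sketch
$add_attr_to_tag      = array('/^a$/si' => array('target' => '"_new"'));

$kept = strip_attributes_sketch(
    "em",
    array("onmouseover" => '"window.alert(1)"', "class" => '"note"'),
    $rm_attnames
);
// $kept holds only the "class" attribute

// Only runs when htmlfilter.inc has actually been included.
if (function_exists('sanitize')) {
    $html = '<em onmouseover="window.alert(1)">hi</em><script>x()</script>';
    echo sanitize(
        $html, $tag_list, $rm_tags_with_content, $self_closing_tags,
        $force_tag_closing, $rm_attnames, $bad_attvals, $add_attr_to_tag
    );
}
```

Single quotes are used around '/^a$/si' in the sketch so that PHP never tries to interpolate the dollar sign.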
Generated: Wed Dec 30 05:55:15 2009 | Cross-referenced by PHPXref 0.7 |