Diffstat (limited to 'src/hydrilla/proxy/self_doc/url_patterns.html.jinja'):
-rw-r--r--  src/hydrilla/proxy/self_doc/url_patterns.html.jinja  409
1 file changed, 0 insertions, 409 deletions
diff --git a/src/hydrilla/proxy/self_doc/url_patterns.html.jinja b/src/hydrilla/proxy/self_doc/url_patterns.html.jinja
deleted file mode 100644
index f3415c5..0000000
--- a/src/hydrilla/proxy/self_doc/url_patterns.html.jinja
+++ /dev/null
@@ -1,409 +0,0 @@
-{#
-SPDX-License-Identifier: GPL-3.0-or-later OR CC-BY-SA-4.0
-
-Documentation page describing URL patterns understood by Haketilo.
-
-This file is part of Hydrilla&Haketilo.
-
-Copyright (C) 2022 Wojtek Kosior
-
-Dual licensed under
-* GNU General Public License v3.0 or later and
-* Creative Commons Attribution Share Alike 4.0 International.
-
-You can choose to use either of these licenses or both.
-
-
-I, Wojtek Kosior, thereby promise not to sue for violation of this
-file's licenses. Although I request that you do not make use of this
-code in a proprietary work, I am not going to enforce this in court.
-#}
-{% extends "doc_base.html.jinja" %}
-
-{% block title %} URL patterns {% endblock %}
-
-{% block main %}
-  {{ big_heading('Haketilo URL patterns') }}
-
-  {% call section() %}
-    {% call paragraph() %}
-      We want to be able to apply different rules and custom scripts for
-      different websites. However, merely specifying "do this for all documents
-      under <code>https://example.com</code>" is not enough. A single site's pages
-      might differ strongly and require different custom scripts to be
-      loaded. Always matching against a full URL like
-      <code>https://example.com/something/somethingelse</code> is also not
-      a good option. It doesn't allow us to properly handle a site that serves
-      similar pages for multiple values substituted for
-      <code>somethingelse</code>.
-    {% endcall %}
-  {% endcall %}
-
-  {% call section() %}
-    {{ medium_heading('Employed solution') }}
-
-    {% call paragraph() %}
-      Wildcards are used to address this problem. Each payload and rule in
-      Haketilo has a URL pattern that specifies to which internet pages it
-      applies. A URL pattern can be as simple as a literal URL, in which case it
-      only matches itself. It can also contain wildcards in the form of one or
-      more asterisks (<code>*</code>) that correspond to multiple possible
-      strings occurring in that place.
-    {% endcall %}
-
-    {% call paragraph() %}
-      Wildcards can appear in a URL's domain and in the path that follows it. These
-      two types of wildcards are handled separately.
-    {% endcall %}
-  {% endcall %}
-
-  {% call section() %}
-    {{ small_heading('Domain wildcards') }}
-
-    {% call paragraph() %}
-      A domain wildcard takes the form of one, two or three asterisks occurring
-      in place of a single domain name segment at the beginning
-      (left). Depending on the number of asterisks, the meaning is as follows:
-    {% endcall %}
-
-    {% call unordered_list() %}
-      {% call list_entry() %}
-        no asterisks (e.g. <code>example.com</code>) - match domain name exactly
-        (e.g. <code>example.com</code>)
-      {% endcall %}
-      {% call list_entry() %}
-        one asterisk (e.g. <code>*.example.com</code>) - match all domains
-        resulting from substituting <code>*</code> with a
-        <span class="bold">single</span> segment (e.g.
-        <code>banana.example.com</code> or <code>pineapple.example.com</code>
-        but <span class="bold">not</span> <code>pineapple.pen.example.com</code>
-        nor <code>example.com</code>)
-      {% endcall %}
-      {% call list_entry() %}
-        two asterisks (e.g. <code>**.example.com</code>) - match all domains
-        resulting from substituting <code>**</code> with
-        <span class="bold">two or more</span> segments (e.g.
-        <code>monad.breakfast.example.com</code> or
-        <code>pure.monad.breakfast.example.com</code> but
-        <span class="bold">not</span> <code>cabalhell.example.com</code> nor
-        <code>example.com</code>)
-      {% endcall %}
-      {% call list_entry() %}
-        three asterisks (e.g. <code>***.example.com</code>) - match all domains
-        resulting from substituting <code>***</code> with
-        <span class="bold">zero or more</span> segments (e.g.
-        <code>hello.parkmeter.example.com</code> or
-        <code>iliketrains.example.com</code> or <code>example.com</code>)
-      {% endcall %}
-    {% endcall %}
-  {% endcall %}
-
-  {% call section() %}
-    {{ small_heading('Path wildcards') }}
-
-    {% call paragraph() %}
-      A path wildcard takes the form of one, two or three asterisks occurring in
-      place of a single path segment at the end of the path (right). Depending on
-      the number of asterisks, the meaning is as follows:
-    {% endcall %}
-
-    {% call unordered_list() %}
-      {% call list_entry() %}
-        no asterisks (e.g. <code>/joke/clowns</code>) - match path exactly (e.g.
-        <code>/joke/clowns</code>)
-      {% endcall %}
-      {% call list_entry() %}
-        one asterisk (e.g. <code>/itscalled/*</code>) - match all paths
-        resulting from substituting <code>*</code> with a
-        <span class="bold">single</span> segment (e.g.
-        <code>/itscalled/gnulinux</code> or <code>/itscalled/glamp</code> but
-        <span class="bold">not</span> <code>/itscalled/</code> nor
-        <code>/itscalled/gnu/linux</code>)
-      {% endcall %}
-      {% call list_entry() %}
-        two asterisks (e.g. <code>/another/**</code>) - match all paths
-        resulting from substituting <code>**</code> with
-        <span class="bold">two or more</span> segments (e.g.
-        <code>/another/nsa/backdoor</code> or
-        <code>/another/best/programming/language</code> but
-        <span class="bold">not</span> <code>/another/apibreak</code> nor
-        <code>/another</code>)
-      {% endcall %}
-      {% call list_entry() %}
-        three asterisks (e.g. <code>/mail/dmarc/***</code>) - match all paths
-        resulting from substituting <code>***</code> with
-        <span class="bold">zero or more</span> segments (e.g.
-        <code>/mail/dmarc/spf</code>, <code>/mail/dmarc</code> or
-        <code>/mail/dmarc/dkim/failure</code> but
-        <span class="bold">not</span> <code>/mail/</code>)
-      {% endcall %}
-    {% endcall %}
-
-    {% call paragraph() %}
-      If a pattern ends <span class="bold">without</span> a trailing slash, it
-      matches paths with any number of trailing slashes, including zero. If a
-      pattern ends <span class="bold">with</span> a trailing slash, it only
-      matches paths with one or more trailing slashes. For example,
-      <code>/itscalled/*</code> matches <code>/itscalled/gnulinux</code>,
-      <code>/itscalled/gnulinux/</code> and <code>/itscalled/gnulinux//</code>
-      while <code>/itscalled/*/</code> only matches
-      <code>/itscalled/gnulinux/</code> and <code>/itscalled/gnulinux//</code>
-      out of those three.
-    {% endcall %}
-
-    {% call paragraph() %}
-      If two patterns only differ by the presence of a trailing slash, the
-      pattern <span class="bold">with</span> a trailing slash is considered
-      <span class="bold">more specific</span>.
-    {% endcall %}
-
-    {% call paragraph() %}
-      Additionally, any path with literal trailing asterisks is matched by
-      itself, even if such a pattern would otherwise be treated as a wildcard
-      (e.g. <code>/gobacktoxul/**</code> matches <code>/gobacktoxul/**</code>).
-      This is likely to change in the future and is best not relied upon.
-      Appending three additional asterisks to a path pattern to represent literal
-      asterisks is being considered.
-    {% endcall %}
-  {% endcall %}
-
-  {% call section() %}
-    {{ small_heading('URL scheme wildcard') }}
-
-    {% call paragraph() %}
-      <code>http://</code> and <code>https://</code> schemes in the URL are
-      matched exactly. However, starting with Haketilo 3.0, it is also possible
-      for the scheme pseudo-wildcard <code>http*://</code> to be used. Using a URL
-      pattern with this scheme is equivalent to using 2 separate patterns
-      starting with <code>http://</code> and <code>https://</code>,
-      respectively. For example, the pattern <code>http*://example.com</code> shall
-      match both <code>https://example.com</code> and
-      <code>http://example.com</code>.
-    {% endcall %}
-
-    {% call paragraph() %}
-      <code>http*://</code> may be considered not to be a true wildcard but
-      rather an alias for either of the other 2 values. As of Haketilo 3.0, the
-      specificity of a URL pattern starting with <code>http*://</code> is
-      considered to be the same as that of the corresponding URL pattern
-      starting with <code>http://</code> or <code>https://</code>. In case of a
-      conflict, the order of precedence of such patterns is unspecified. This
-      behavior is likely to change in future versions of Haketilo.
-    {% endcall %}
-  {% endcall %}
-
-  {% call section() %}
-    {{ small_heading('Wildcard pattern priorities and querying') }}
-
-    {% call paragraph() %}
-      If multiple patterns match a URL, the most specific one is
-      preferred. Specificity is determined as follows:
-    {% endcall %}
-
-    {% call unordered_list() %}
-      {% call list_entry() %}
-        If patterns only differ in the final path segment, the one with the fewest
-        wildcard asterisks in that segment is preferred.
-      {% endcall %}
-      {% call list_entry() %}
-        If patterns, besides the above, only differ in path length, the one with
-        the longer path is preferred. Neither the final wildcard segment nor trailing
-        slashes count toward path length.
-      {% endcall %}
-      {% call list_entry() %}
-        If patterns, besides the above, only differ in the initial domain
-        segment, the one with the fewest wildcard asterisks in that segment is preferred.
-      {% endcall %}
-      {% call list_entry() %}
-        If patterns differ in domain length, the one with the longer domain is
-        preferred. The initial wildcard segment does not count toward domain length.
-      {% endcall %}
-    {% endcall %}
-
-    {% call paragraph() %}
-      As an example, consider the URL
-      <code>http://settings.query.example.com/google/tries/destroy/adblockers//</code>.
-      The patterns matching it are, in order of decreasing specificity:
-    {% endcall %}
-
-    {% call verbatim() %}
-http://settings.query.example.com/google/tries/destroy/adblockers/
-http://settings.query.example.com/google/tries/destroy/adblockers
-http://settings.query.example.com/google/tries/destroy/adblockers/***/
-http://settings.query.example.com/google/tries/destroy/adblockers/***
-http://settings.query.example.com/google/tries/destroy/*/
-http://settings.query.example.com/google/tries/destroy/*
-http://settings.query.example.com/google/tries/destroy/***/
-http://settings.query.example.com/google/tries/destroy/***
-http://settings.query.example.com/google/tries/**/
-http://settings.query.example.com/google/tries/**
-http://settings.query.example.com/google/tries/***/
-http://settings.query.example.com/google/tries/***
-http://settings.query.example.com/google/**/
-http://settings.query.example.com/google/**
-http://settings.query.example.com/google/***/
-http://settings.query.example.com/google/***
-http://settings.query.example.com/**/
-http://settings.query.example.com/**
-http://settings.query.example.com/***/
-http://settings.query.example.com/***
-http://***.settings.query.example.com/google/tries/destroy/adblockers/
-http://***.settings.query.example.com/google/tries/destroy/adblockers
-http://***.settings.query.example.com/google/tries/destroy/adblockers/***/
-http://***.settings.query.example.com/google/tries/destroy/adblockers/***
-http://***.settings.query.example.com/google/tries/destroy/*/
-http://***.settings.query.example.com/google/tries/destroy/*
-http://***.settings.query.example.com/google/tries/destroy/***/
-http://***.settings.query.example.com/google/tries/destroy/***
-http://***.settings.query.example.com/google/tries/**/
-http://***.settings.query.example.com/google/tries/**
-http://***.settings.query.example.com/google/tries/***/
-http://***.settings.query.example.com/google/tries/***
-http://***.settings.query.example.com/google/**/
-http://***.settings.query.example.com/google/**
-http://***.settings.query.example.com/google/***/
-http://***.settings.query.example.com/google/***
-http://***.settings.query.example.com/**/
-http://***.settings.query.example.com/**
-http://***.settings.query.example.com/***/
-http://***.settings.query.example.com/***
-http://*.query.example.com/google/tries/destroy/adblockers/
-http://*.query.example.com/google/tries/destroy/adblockers
-http://*.query.example.com/google/tries/destroy/adblockers/***/
-http://*.query.example.com/google/tries/destroy/adblockers/***
-http://*.query.example.com/google/tries/destroy/*/
-http://*.query.example.com/google/tries/destroy/*
-http://*.query.example.com/google/tries/destroy/***/
-http://*.query.example.com/google/tries/destroy/***
-http://*.query.example.com/google/tries/**/
-http://*.query.example.com/google/tries/**
-http://*.query.example.com/google/tries/***/
-http://*.query.example.com/google/tries/***
-http://*.query.example.com/google/**/
-http://*.query.example.com/google/**
-http://*.query.example.com/google/***/
-http://*.query.example.com/google/***
-http://*.query.example.com/**/
-http://*.query.example.com/**
-http://*.query.example.com/***/
-http://*.query.example.com/***
-http://***.query.example.com/google/tries/destroy/adblockers/
-http://***.query.example.com/google/tries/destroy/adblockers
-http://***.query.example.com/google/tries/destroy/adblockers/***/
-http://***.query.example.com/google/tries/destroy/adblockers/***
-http://***.query.example.com/google/tries/destroy/*/
-http://***.query.example.com/google/tries/destroy/*
-http://***.query.example.com/google/tries/destroy/***/
-http://***.query.example.com/google/tries/destroy/***
-http://***.query.example.com/google/tries/**/
-http://***.query.example.com/google/tries/**
-http://***.query.example.com/google/tries/***/
-http://***.query.example.com/google/tries/***
-http://***.query.example.com/google/**/
-http://***.query.example.com/google/**
-http://***.query.example.com/google/***/
-http://***.query.example.com/google/***
-http://***.query.example.com/**/
-http://***.query.example.com/**
-http://***.query.example.com/***/
-http://***.query.example.com/***
-http://**.example.com/google/tries/destroy/adblockers/
-http://**.example.com/google/tries/destroy/adblockers
-http://**.example.com/google/tries/destroy/adblockers/***/
-http://**.example.com/google/tries/destroy/adblockers/***
-http://**.example.com/google/tries/destroy/*/
-http://**.example.com/google/tries/destroy/*
-http://**.example.com/google/tries/destroy/***/
-http://**.example.com/google/tries/destroy/***
-http://**.example.com/google/tries/**/
-http://**.example.com/google/tries/**
-http://**.example.com/google/tries/***/
-http://**.example.com/google/tries/***
-http://**.example.com/google/**/
-http://**.example.com/google/**
-http://**.example.com/google/***/
-http://**.example.com/google/***
-http://**.example.com/**/
-http://**.example.com/**
-http://**.example.com/***/
-http://**.example.com/***
-http://***.example.com/google/tries/destroy/adblockers/
-http://***.example.com/google/tries/destroy/adblockers
-http://***.example.com/google/tries/destroy/adblockers/***/
-http://***.example.com/google/tries/destroy/adblockers/***
-http://***.example.com/google/tries/destroy/*/
-http://***.example.com/google/tries/destroy/*
-http://***.example.com/google/tries/destroy/***/
-http://***.example.com/google/tries/destroy/***
-http://***.example.com/google/tries/**/
-http://***.example.com/google/tries/**
-http://***.example.com/google/tries/***/
-http://***.example.com/google/tries/***
-http://***.example.com/google/**/
-http://***.example.com/google/**
-http://***.example.com/google/***/
-http://***.example.com/google/***
-http://***.example.com/**/
-http://***.example.com/**
-http://***.example.com/***/
-http://***.example.com/***
-    {% endcall %}
-
-    {% call paragraph() %}
-      Variants of those patterns starting with <code>http*://</code> would of
-      course match as well. They have been omitted for simplicity.
-    {% endcall %}
-
-    {% call paragraph() %}
-      For a simpler URL like <code>https://example.com</code>, the patterns would
-      be:
-    {% endcall %}
-
-    {% call verbatim() %}
-https://example.com
-https://example.com/***
-https://***.example.com
-https://***.example.com/***
-    {% endcall %}
-
-    {% call paragraph() %}
-      Variants of those patterns with a trailing slash added
-      would <span class="bold">not</span> match the URL. Also, the pattern
-      variants starting with <code>http*://</code> have once again been omitted.
-    {% endcall %}
-  {% endcall %}
-
-  {% call section() %}
-    {{ small_heading('Limits') }}
-
-    {% call paragraph() %}
-      In order to prevent some easy-to-conduct DoS attacks, older versions of
-      Haketilo and Hydrilla limited the lengths of the domain and path parts of
-      processed URLs. This is no longer the case.
-    {% endcall %}
-  {% endcall %}
-
-  {% call section() %}
-    {{ medium_heading('Alternative solution idea: mimicking web server mechanics') }}
-
-    {% call paragraph() %}
-      While wildcard patterns as presented give a lot of flexibility, they are
-      not the only viable approach to specifying which URLs to apply
-      rules/payloads to. In fact, wildcards differ from how the server
-      side of a typical website decides what to return for a given URL request.
-    {% endcall %}
-
-    {% call paragraph() %}
-      In a typical scenario, an HTTP server like Apache reads configuration
-      files provided by its administrator and uses various (virtual host,
-      redirect, request rewrite, CGI, etc.) instructions to decide how to handle
-      a given URL. Perhaps a scheme that mimics the configuration options
-      typically used with web servers would make it more efficient to specify
-      which page settings to apply when.
-    {% endcall %}
-
-    {% call paragraph() %}
-      This approach may be considered in the future.
-    {% endcall %}
-  {% endcall %}
-{% endblock main %}
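The domain, path, trailing-slash and scheme rules described in the removed page can be illustrated with a small matcher. The following Python code is a minimal sketch written for illustration and is not taken from Haketilo. All function names (`split_url`, `count_ok`, `scheme_matches`, `domain_matches`, `path_matches`, `url_matches`) are hypothetical. The sketch assumes URLs without ports or user info, drops any query string or fragment, and compares strings case-sensitively.

```python
# Illustrative sketch of the URL pattern matching rules; not Haketilo's code.
# Assumptions: no ports or user info in URLs; query and fragment are dropped.

WILDCARDS = ("*", "**", "***")


def split_url(url):
    """Return (scheme, domain segments, path segments, has trailing slash)."""
    scheme, _, rest = url.partition("://")
    rest = rest.split("#", 1)[0].split("?", 1)[0]  # assumption: ignore query/fragment
    domain, slash, path = rest.partition("/")
    segments = [s for s in path.split("/") if s]
    return scheme, domain.split("."), segments, (slash + path).endswith("/")


def count_ok(wildcard, extra):
    """Check whether a wildcard may stand for `extra` segments."""
    if wildcard == "*":
        return extra == 1   # exactly one segment
    if wildcard == "**":
        return extra >= 2   # two or more segments
    return extra >= 0       # "***": zero or more segments


def scheme_matches(pattern_scheme, url_scheme):
    # "http*" acts as an alias for both "http" and "https".
    if pattern_scheme == "http*":
        return url_scheme in ("http", "https")
    return pattern_scheme == url_scheme


def domain_matches(pattern, domain):
    # A domain wildcard, if present, is the leftmost segment.
    if pattern[0] not in WILDCARDS:
        return pattern == domain
    suffix = pattern[1:]
    start = len(domain) - len(suffix)
    if start < 0 or domain[start:] != suffix:
        return False
    return count_ok(pattern[0], start)


def path_matches(pattern, pattern_slash, path, path_slash):
    # A pattern ending with "/" needs at least one trailing slash in the URL;
    # a pattern without it accepts any number of trailing slashes.
    if pattern_slash and not path_slash:
        return False
    # Literal match; this also covers literal trailing asterisks ("/x/**").
    if pattern == path:
        return True
    if not pattern or pattern[-1] not in WILDCARDS:
        return False
    prefix = pattern[:-1]
    if path[:len(prefix)] != prefix:
        return False
    return count_ok(pattern[-1], len(path) - len(prefix))


def url_matches(pattern, url):
    p_scheme, p_domain, p_path, p_slash = split_url(pattern)
    u_scheme, u_domain, u_path, u_slash = split_url(url)
    return (scheme_matches(p_scheme, u_scheme)
            and domain_matches(p_domain, u_domain)
            and path_matches(p_path, p_slash, u_path, u_slash))
```

For instance, `url_matches("http*://*.example.com/itscalled/*", "https://banana.example.com/itscalled/gnulinux/")` returns `True`. Checking the trailing slash before the literal comparison keeps the rule "a pattern with a trailing slash needs one in the URL" valid even for literal patterns.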
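The querying described in the removed page, which lists every pattern that matches a URL from most to least specific, can be sketched as a generator. This Python code is illustrative only. `domain_patterns`, `path_patterns` and `candidate_patterns` are hypothetical names. The rule that a domain wildcard never replaces the last two domain segments is an assumption inferred from the page's examples: no `*.com` candidate appears for `https://example.com`. Like the page, the sketch omits `http*://` variants, and it assumes URLs without a query string or fragment.

```python
# Illustrative sketch of pattern querying; not Haketilo's code.
# Assumption (inferred from the examples): a domain wildcard never replaces
# the last two domain segments, so e.g. "*.com" is never a candidate.


def domain_patterns(segments):
    """Yield domain patterns matching the domain, most specific first."""
    yield ".".join(segments)
    for i in range(len(segments) - 1):
        rest = ".".join(segments[i:])
        if i == 1:
            yield "*." + rest    # "*" stands for exactly one segment
        elif i >= 2:
            yield "**." + rest   # "**" stands for two or more segments
        yield "***." + rest      # "***" stands for zero or more segments


def path_patterns(segments, trailing_slash):
    """Yield path patterns matching the path, most specific first."""
    def variants(path):
        # The trailing-slash variant is more specific, so it comes first.
        if trailing_slash:
            yield path + "/"
        yield path

    n = len(segments)
    # Walk from the longest literal prefix down to the empty one.
    for i in range(n, -1, -1):
        prefix = "".join("/" + s for s in segments[:i])
        if i == n:
            yield from variants(prefix)           # exact path
        if i == n - 1:
            yield from variants(prefix + "/*")    # one remaining segment
        if i <= n - 2:
            yield from variants(prefix + "/**")   # two or more remaining
        yield from variants(prefix + "/***")      # zero or more remaining


def candidate_patterns(url):
    """Yield patterns (without http*:// variants) matching the URL."""
    scheme, _, rest = url.partition("://")
    domain, slash, path = rest.partition("/")
    segments = [s for s in path.split("/") if s]
    trailing_slash = (slash + path).endswith("/")
    # Domain loop outside, path loop inside: domain specificity dominates.
    for d in domain_patterns(domain.split(".")):
        for p in path_patterns(segments, trailing_slash):
            yield f"{scheme}://{d}{p}"
```

The domain loop is placed outside the path loop to match the page's example list. There, every pattern with a more specific domain ranks above every pattern with a less specific domain, regardless of its path. Running `candidate_patterns` on `http://settings.query.example.com/google/tries/destroy/adblockers//` produces the 120 patterns of that list in the same order. Running it on `https://example.com` produces the four patterns listed for that URL.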