path: root/src/hydrilla/proxy/self_doc/url_patterns.html.jinja
Diffstat (limited to 'src/hydrilla/proxy/self_doc/url_patterns.html.jinja')
-rw-r--r-- src/hydrilla/proxy/self_doc/url_patterns.html.jinja | 181
1 files changed, 141 insertions, 40 deletions
diff --git a/src/hydrilla/proxy/self_doc/url_patterns.html.jinja b/src/hydrilla/proxy/self_doc/url_patterns.html.jinja
index 7d2718f..f3415c5 100644
--- a/src/hydrilla/proxy/self_doc/url_patterns.html.jinja
+++ b/src/hydrilla/proxy/self_doc/url_patterns.html.jinja
@@ -20,123 +20,210 @@ code in a proprietary work, I am not going to enforce this in court.
#}
{% extends "doc_base.html.jinja" %}
-{% block title %}{{ _('doc.url_patterns.title') }}{% endblock %}
+{% block title %} URL patterns {% endblock %}
{% block main %}
- {{ big_heading(_('doc.url_patterns.h_big')) }}
+ {{ big_heading('Haketilo URL patterns') }}
{% call section() %}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.intro')|safe }}
+ We want to be able to apply different rules and custom scripts for
+ different websites. However, merely specifying "do this for all documents
+ under <code>https://example.com</code>" is not enough. A single site's pages
+ might differ strongly and require different custom scripts to be
+ loaded. Always matching against a full URL like
+ <code>https://example.com/something/somethingelse</code> is also not
+ a good option. It doesn't allow us to properly handle a site that serves
+ similar pages for multiple values substituted for
+ <code>somethingelse</code>.
{% endcall %}
{% endcall %}
{% call section() %}
- {{ medium_heading(_('doc.url_patterns.h_medium.employed_solution')) }}
+ {{ medium_heading('Employed solution') }}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.wildcards_intro')|safe }}
+ Wildcards are used to address the problem. Each payload and rule in
+ Haketilo has a URL pattern that specifies which internet pages it
+ applies to. A URL pattern can be as simple as a literal URL, in which case
+ it only matches itself. It can also contain wildcards in the form of one or
+ more asterisks (<code>*</code>) that correspond to multiple possible
+ strings occurring in that place.
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.wildcards_types_introduced')|safe }}
+ Wildcards can appear in a URL's domain and in the path that follows it.
+ These two types of wildcards are handled separately.
{% endcall %}
{% endcall %}
{% call section() %}
- {{ small_heading(_('doc.url_patterns.h_small.domain_wildcards')) }}
+ {{ small_heading('Domain wildcards') }}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.domain_wildcards_intro')|safe }}
+ A domain wildcard takes the form of one, two or three asterisks occurring
+ in place of a single domain name segment at the beginning
+ (left). Depending on the number of asterisks, the meaning is as follows:
{% endcall %}
{% call unordered_list() %}
{% call list_entry() %}
- {{ _('doc.url_patterns.html.domain_no_asterisks_example')|safe }}
+ no asterisks (e.g. <code>example.com</code>) - match domain name exactly
+ (e.g. <code>example.com</code>)
{% endcall %}
{% call list_entry() %}
- {{ _('doc.url_patterns.html.domain_one_asterisk_example')|safe }}
+ one asterisk (e.g. <code>*.example.com</code>) - match all domains
+ resulting from substituting <code>*</code> with a
+ <span class="bold">single</span> segment (e.g.
+ <code>banana.example.com</code> or <code>pineapple.example.com</code>
+ but <span class="bold">not</span> <code>pineapple.pen.example.com</code>
+ nor <code>example.com</code>)
{% endcall %}
{% call list_entry() %}
- {{ _('doc.url_patterns.html.domain_two_asterisks_example')|safe }}
+ two asterisks (e.g. <code>**.example.com</code>) - match all domains
+ resulting from substituting <code>**</code> with
+ <span class="bold">two or more</span> segments (e.g.
+ <code>monad.breakfast.example.com</code> or
+ <code>pure.monad.breakfast.example.com</code> but
+ <span class="bold">not</span> <code>cabalhell.example.com</code> nor
+ <code>example.com</code>)
{% endcall %}
{% call list_entry() %}
- {{ _('doc.url_patterns.html.domain_three_asterisks_example')|safe }}
+ three asterisks (e.g. <code>***.example.com</code>) - match all domains
+ resulting from substituting <code>***</code> with
+ <span class="bold">zero or more</span> segments (e.g.
+ <code>hello.parkmeter.example.com</code> or
+ <code>iliketrains.example.com</code> or <code>example.com</code>)
{% endcall %}
{% endcall %}
{% endcall %}
{% call section() %}
- {{ small_heading(_('doc.url_patterns.h_small.path_wildcards')) }}
+ {{ small_heading('Path wildcards') }}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.path_wildcards_intro')|safe }}
+ A path wildcard takes the form of one, two or three asterisks occurring in
+ place of a single path segment at the end of the path (right). Depending on
+ the number of asterisks, the meaning is as follows:
{% endcall %}
{% call unordered_list() %}
{% call list_entry() %}
- {{ _('doc.url_patterns.html.path_no_asterisks_example')|safe }}
+ no asterisks (e.g. <code>/joke/clowns</code>) - match path exactly (e.g.
+ <code>/joke/clowns</code>)
{% endcall %}
{% call list_entry() %}
- {{ _('doc.url_patterns.html.path_one_asterisk_example')|safe }}
+ one asterisk (e.g. <code>/itscalled/*</code>) - match all paths
+ resulting from substituting <code>*</code> with a
+ <span class="bold">single</span> segment (e.g.
+ <code>/itscalled/gnulinux</code> or <code>/itscalled/glamp</code> but
+ <span class="bold">not</span> <code>/itscalled/</code> nor
+ <code>/itscalled/gnu/linux</code>)
{% endcall %}
{% call list_entry() %}
- {{ _('doc.url_patterns.html.path_two_asterisks_example')|safe }}
+ two asterisks (e.g. <code>/another/**</code>) - match all paths
+ resulting from substituting <code>**</code> with
+ <span class="bold">two or more</span> segments (e.g.
+ <code>/another/nsa/backdoor</code> or
+ <code>/another/best/programming/language</code> but
+ <span class="bold">not</span> <code>/another/apibreak</code> nor
+ <code>/another</code>)
{% endcall %}
{% call list_entry() %}
- {{ _('doc.url_patterns.html.path_three_asterisks_example')|safe }}
+ three asterisks (e.g. <code>/mail/dmarc/***</code>) - match all paths
+ resulting from substituting <code>***</code> with
+ <span class="bold">zero or more</span> segments (e.g.
+ <code>/mail/dmarc/spf</code>, <code>/mail/dmarc</code> or
+ <code>/mail/dmarc/dkim/failure</code> but
+ <span class="bold">not</span> <code>/mail/</code>)
{% endcall %}
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.path_trailing_slash')|safe }}
+ If a pattern ends <span class="bold">without</span> a trailing slash, it
+ matches paths with any number of trailing slashes, including zero. If a
+ pattern ends <span class="bold">with</span> a trailing slash, it only
+ matches paths with one or more trailing slashes. For example,
+ <code>/itscalled/*</code> matches <code>/itscalled/gnulinux</code>,
+ <code>/itscalled/gnulinux/</code> and <code>/itscalled/gnulinux//</code>
+ while <code>/itscalled/*/</code> only matches
+ <code>/itscalled/gnulinux/</code> and <code>/itscalled/gnulinux//</code>
+ out of those three.
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.path_trailing_slash_priority')|safe }}
+ If two patterns differ only by the presence of a trailing slash,
+ the pattern <span class="bold">with</span> a trailing slash is considered
+ <span class="bold">more specific</span>.
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.path_literal_trailing_asterisks')|safe }}
+ Additionally, any path with literal trailing asterisks is matched by
+ itself, even if such a pattern would otherwise be treated as a wildcard
+ (e.g. <code>/gobacktoxul/**</code> matches <code>/gobacktoxul/**</code>).
+ This is likely to change in the future and is best not relied upon.
+ Appending three additional asterisks to a path pattern to represent literal
+ asterisks is being considered.
{% endcall %}
{% endcall %}
{% call section() %}
- {{ small_heading(_('doc.url_patterns.h_small.protocol_wildcards')) }}
+ {{ small_heading('URL scheme wildcard') }}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.protocol_wildcards')|safe }}
+ The <code>http://</code> and <code>https://</code> schemes in the URL are
+ matched exactly. However, starting with Haketilo 3.0, it is also possible
+ to use the scheme pseudo-wildcard <code>http*://</code>. Using a URL
+ pattern with this scheme is equivalent to using two separate patterns
+ starting with <code>http://</code> and <code>https://</code>,
+ respectively. For example, the pattern <code>http*://example.com</code>
+ matches both <code>https://example.com</code> and
+ <code>http://example.com</code>.
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.protocol_wildcards_are_aliases')|safe }}
+ <code>http*://</code> may be considered not a true wildcard but
+ rather an alias for either of the other two values. As of Haketilo 3.0, the
+ specificity of a URL pattern starting with <code>http*://</code> is
+ considered to be the same as that of the corresponding URL pattern
+ starting with <code>http://</code> or <code>https://</code>. In case of a
+ conflict, the order of precedence of such patterns is unspecified. This
+ behavior is likely to change in future versions of Haketilo.
{% endcall %}
{% endcall %}
{% call section() %}
- {{ small_heading(_('doc.url_patterns.h_small.wildcard_priorities')) }}
+ {{ small_heading('Wildcard pattern priorities and querying') }}
{% call paragraph() %}
- {{ _('doc.url_patterns.priorities_intro') }}
+ In case multiple patterns match some URL, the more specific one is
+ preferred. Specificity is determined as follows:
{% endcall %}
{% call unordered_list() %}
{% call list_entry() %}
- {{ _('doc.url_patterns.priorities_rule_path_ending')|safe }}
+ If patterns differ only in the final path segment, the one with the fewest
+ wildcard asterisks in that segment is preferred.
{% endcall %}
{% call list_entry() %}
- {{ _('doc.url_patterns.priorities_rule_path_length')|safe }}
+ If patterns, besides the above, differ only in path length, the one with
+ the longer path is preferred. Neither the final wildcard segment nor
+ trailing slashes count toward path length.
{% endcall %}
{% call list_entry() %}
- {{ _('doc.url_patterns.priorities_rule_domain_beginning')|safe }}
+ If patterns, besides the above, differ only in the initial domain
+ segment, the one with the fewest wildcard asterisks in that segment is preferred.
{% endcall %}
{% call list_entry() %}
- {{ _('doc.url_patterns.priorities_rule_domain_length')|safe }}
+ If patterns differ in domain length, the one with the longer domain is
+ preferred. The initial wildcard segment does not count toward domain length.
{% endcall %}
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.priorities_example1_intro')|safe }}
+ As an example, consider the URL
+ <code>http://settings.query.example.com/google/tries/destroy/adblockers//</code>.
+ The patterns matching it are, in the following order:
{% endcall %}
{% call verbatim() %}
@@ -263,11 +350,13 @@ http://***.example.com/***
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.priorities_example1_note')|safe }}
+ Variants of those patterns starting with <code>http*://</code> would of
+ course match as well. They have been omitted for simplicity.
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.priorities_example2_intro')|safe }}
+ For a simpler URL like <code>https://example.com</code>, the patterns would
+ be:
{% endcall %}
{% call verbatim() %}
@@ -278,31 +367,43 @@ https://***.example.com/***
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.html.priorities_example2_note')|safe }}
+ Variants of those patterns with a trailing slash added
+ would <span class="bold">not</span> match the URL. Also, the pattern
+ variants starting with <code>http*://</code> have been once again omitted.
{% endcall %}
{% endcall %}
{% call section() %}
- {{ small_heading(_('doc.url_patterns.h_small.limits')) }}
+ {{ small_heading('Limits') }}
{% call paragraph() %}
- {{ _('doc.url_patterns.limits')|safe }}
+ In order to prevent some easy-to-conduct DoS attacks, older versions of
+ Haketilo and Hydrilla limited the lengths of domain and path parts of
+ processed URLs. This is no longer the case.
{% endcall %}
{% endcall %}
{% call section() %}
- {{ medium_heading(_('doc.url_patterns.h_medium.alt_solution')) }}
+ {{ medium_heading('Alternative solution idea: mimicking web server mechanics') }}
{% call paragraph() %}
- {{ _('doc.url_patterns.url_pattern_drawbacks') }}
+ While wildcard patterns as presented give a lot of flexibility, they are
+ not the only viable approach to specifying what URLs to apply
+ rules/payloads to. In fact, wildcards are different from how the server
+ side of a typical website decides what to return for a given URL request.
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.server_behavior_mimicking_idea') }}
+ In a typical scenario, an HTTP server like Apache reads configuration
+ files provided by its administrator and uses various (virtual host,
+ redirect, request rewrite, CGI, etc.) instructions to decide how to handle
+ a given URL. Perhaps using a scheme that mimics the configuration options
+ typically used with web servers would give more efficiency in specifying
+ what page settings to apply when.
{% endcall %}
{% call paragraph() %}
- {{ _('doc.url_patterns.approach_may_be_considered') }}
+ This approach may be considered in the future.
{% endcall %}
{% endcall %}
{% endblock main %}
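
The domain and path wildcard rules described in the template above can be sketched in a few lines of Python. This is only an illustration of the documented semantics, not the matching code Haketilo or Hydrilla actually uses. The names `WILDCARD_RANGES`, `domain_matches` and `path_matches` are hypothetical, and the sketch leaves out scheme handling (`http*://`), pattern priorities and the special case for literal trailing asterisks.

```python
# Hypothetical sketch of the wildcard semantics described in the documentation.
# Each wildcard maps to (minimum, maximum) number of segments it may stand for;
# None means "no upper bound".
WILDCARD_RANGES = {
    "*": (1, 1),       # exactly one segment
    "**": (2, None),   # two or more segments
    "***": (0, None),  # zero or more segments
}


def _count_ok(wildcard, count):
    """Check that `count` segments is allowed for the given wildcard."""
    low, high = WILDCARD_RANGES[wildcard]
    return count >= low and (high is None or count <= high)


def domain_matches(pattern, domain):
    """Match a domain against a pattern whose FIRST segment may be a wildcard."""
    pat = pattern.split(".")
    dom = domain.split(".")
    if pat[0] in WILDCARD_RANGES:
        fixed = pat[1:]                # literal segments on the right
        extra = len(dom) - len(fixed)  # segments the wildcard stands for
        return extra >= 0 and dom[extra:] == fixed and _count_ok(pat[0], extra)
    return pat == dom


def _path_segments(path):
    """Split a path into segments, ignoring any number of trailing slashes."""
    return path.rstrip("/").split("/")[1:]


def path_matches(pattern, path):
    """Match a path against a pattern whose LAST segment may be a wildcard."""
    # A pattern ending with a slash requires at least one trailing slash.
    if pattern.endswith("/") and not path.endswith("/"):
        return False
    pat = _path_segments(pattern)
    segs = _path_segments(path)
    if pat and pat[-1] in WILDCARD_RANGES:
        fixed = pat[:-1]               # literal segments on the left
        extra = len(segs) - len(fixed)
        return (extra >= 0 and segs[:len(fixed)] == fixed
                and _count_ok(pat[-1], extra))
    return pat == segs
```

With these definitions, `domain_matches("**.example.com", "cabalhell.example.com")` is false because `**` needs at least two segments, while `path_matches("/itscalled/*", "/itscalled/gnulinux//")` is true because a pattern without a trailing slash accepts any number of them.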