Diffstat (limited to 'src/hydrilla/proxy/self_doc/url_patterns.html.jinja')
-rw-r--r-- src/hydrilla/proxy/self_doc/url_patterns.html.jinja | 409
1 files changed, 0 insertions, 409 deletions
diff --git a/src/hydrilla/proxy/self_doc/url_patterns.html.jinja b/src/hydrilla/proxy/self_doc/url_patterns.html.jinja
deleted file mode 100644
index f3415c5..0000000
--- a/src/hydrilla/proxy/self_doc/url_patterns.html.jinja
+++ /dev/null
@@ -1,409 +0,0 @@
-{#
-SPDX-License-Identifier: GPL-3.0-or-later OR CC-BY-SA-4.0
-
-Documentation page describing URL patterns understood by Haketilo.
-
-This file is part of Hydrilla&Haketilo.
-
-Copyright (C) 2022 Wojtek Kosior
-
-Dual licensed under
-* GNU General Public License v3.0 or later and
-* Creative Commons Attribution Share Alike 4.0 International.
-
-You can choose to use either of these licenses or both.
-
-
-I, Wojtek Kosior, thereby promise not to sue for violation of this
-file's licenses. Although I request that you do not make use of this
-code in a proprietary work, I am not going to enforce this in court.
-#}
-{% extends "doc_base.html.jinja" %}
-
-{% block title %} URL patterns {% endblock %}
-
-{% block main %}
- {{ big_heading('Haketilo URL patterns') }}
-
- {% call section() %}
- {% call paragraph() %}
- We want to be able to apply different rules and custom scripts for
- different websites. However, merely specifying "do this for all documents
- under <code>https://example.com</code>" is not enough. A single site's pages
- might differ strongly and require different custom scripts to be
- loaded. Always matching against a full URL like
- <code>https://example.com/something/somethingelse</code> is also not
- a good option. It doesn't allow us to properly handle a site that serves
- similar pages for multiple values substituted for
- <code>somethingelse</code>.
- {% endcall %}
- {% endcall %}
-
- {% call section() %}
- {{ medium_heading('Employed solution') }}
-
- {% call paragraph() %}
- Wildcards are used to address the problem. Each payload and rule in
- Haketilo has a URL pattern that specifies which internet pages it
- applies to. A URL pattern can be as simple as a literal URL, in which case
- it only matches itself. It can also contain wildcards in the form of one or
- more asterisks (<code>*</code>) that correspond to multiple possible
- strings occurring in that place.
- {% endcall %}
-
- {% call paragraph() %}
- Wildcards can appear in a URL's domain and in the path that follows it.
- These two types of wildcards are handled separately.
- {% endcall %}
- {% endcall %}
-
- {% call section() %}
- {{ small_heading('Domain wildcards') }}
-
- {% call paragraph() %}
- A domain wildcard takes the form of one, two or three asterisks occurring
- in place of a single domain name segment at the beginning
- (left). Depending on the number of asterisks, the meaning is as follows:
- {% endcall %}
-
- {% call unordered_list() %}
- {% call list_entry() %}
- no asterisks (e.g. <code>example.com</code>) - match domain name exactly
- (e.g. <code>example.com</code>)
- {% endcall %}
- {% call list_entry() %}
- one asterisk (e.g. <code>*.example.com</code>) - match all domains
- resulting from substituting <code>*</code> with a
- <span class="bold">single</span> segment (e.g.
- <code>banana.example.com</code> or <code>pineapple.example.com</code>
- but <span class="bold">not</span> <code>pineapple.pen.example.com</code>
- nor <code>example.com</code>)
- {% endcall %}
- {% call list_entry() %}
- two asterisks (e.g. <code>**.example.com</code>) - match all domains
- resulting from substituting <code>**</code> with
- <span class="bold">two or more</span> segments (e.g.
- <code>monad.breakfast.example.com</code> or
- <code>pure.monad.breakfast.example.com</code> but
- <span class="bold">not</span> <code>cabalhell.example.com</code> nor
- <code>example.com</code>)
- {% endcall %}
- {% call list_entry() %}
- three asterisks (e.g. <code>***.example.com</code>) - match all domains
- resulting from substituting <code>***</code> with
- <span class="bold">zero or more</span> segments (e.g.
- <code>hello.parkmeter.example.com</code> or
- <code>iliketrains.example.com</code> or <code>example.com</code>)
- {% endcall %}
- {% endcall %}
- {% endcall %}
-
- {% call section() %}
- {{ small_heading('Path wildcards') }}
-
- {% call paragraph() %}
- A path wildcard takes the form of one, two or three asterisks occurring in
- place of a single path segment at the end of the path (right). Depending on
- the number of asterisks, the meaning is as follows:
- {% endcall %}
-
- {% call unordered_list() %}
- {% call list_entry() %}
- no asterisks (e.g. <code>/joke/clowns</code>) - match path exactly (e.g.
- <code>/joke/clowns</code>)
- {% endcall %}
- {% call list_entry() %}
- one asterisk (e.g. <code>/itscalled/*</code>) - match all paths
- resulting from substituting <code>*</code> with a
- <span class="bold">single</span> segment (e.g.
- <code>/itscalled/gnulinux</code> or <code>/itscalled/glamp</code> but
- <span class="bold">not</span> <code>/itscalled/</code> nor
- <code>/itscalled/gnu/linux</code>)
- {% endcall %}
- {% call list_entry() %}
- two asterisks (e.g. <code>/another/**</code>) - match all paths
- resulting from substituting <code>**</code> with
- <span class="bold">two or more</span> segments (e.g.
- <code>/another/nsa/backdoor</code> or
- <code>/another/best/programming/language</code> but
- <span class="bold">not</span> <code>/another/apibreak</code> nor
- <code>/another</code>)
- {% endcall %}
- {% call list_entry() %}
- three asterisks (e.g. <code>/mail/dmarc/***</code>) - match all paths
- resulting from substituting <code>***</code> with
- <span class="bold">zero or more</span> segments (e.g.
- <code>/mail/dmarc/spf</code>, <code>/mail/dmarc</code> or
- <code>/mail/dmarc/dkim/failure</code> but
- <span class="bold">not</span> <code>/mail/</code>)
- {% endcall %}
- {% endcall %}
-
- {% call paragraph() %}
- If a pattern ends <span class="bold">without</span> a trailing slash, it
- matches paths with any number of trailing slashes, including zero. If a
- pattern ends <span class="bold">with</span> a trailing slash, it only
- matches paths with one or more trailing slashes. For example,
- <code>/itscalled/*</code> matches <code>/itscalled/gnulinux</code>,
- <code>/itscalled/gnulinux/</code> and <code>/itscalled/gnulinux//</code>
- while <code>/itscalled/*/</code> only matches
- <code>/itscalled/gnulinux/</code> and <code>/itscalled/gnulinux//</code>
- out of those three.
- {% endcall %}
-
- {% call paragraph() %}
- If two patterns differ only by the presence of a trailing slash, the
- pattern <span class="bold">with</span> a trailing slash is considered
- <span class="bold">more specific</span>.
- {% endcall %}
-
- {% call paragraph() %}
- Additionally, any path with literal trailing asterisks is matched by
- itself, even if such a pattern would otherwise be treated as a wildcard
- (e.g. <code>/gobacktoxul/**</code> matches <code>/gobacktoxul/**</code>).
- This is likely to change in the future and is best not relied upon.
- Appending three additional asterisks to a path pattern to represent
- literal asterisks is being considered.
- {% endcall %}
- {% endcall %}
-
- {% call section() %}
- {{ small_heading('URL scheme wildcard') }}
-
- {% call paragraph() %}
- The <code>http://</code> and <code>https://</code> schemes in the URL are
- matched exactly. However, starting with Haketilo 3.0, it is also possible
- to use the scheme pseudo-wildcard <code>http*://</code>. Using a URL
- pattern with this scheme is equivalent to using 2 separate patterns
- starting with <code>http://</code> and <code>https://</code>,
- respectively. For example, the pattern <code>http*://example.com</code>
- matches both <code>https://example.com</code> and
- <code>http://example.com</code>.
- {% endcall %}
-
- {% call paragraph() %}
- <code>http*://</code> may be considered not to be a true wildcard but
- rather an alias for either of the other 2 values. As of Haketilo 3.0, the
- specificity of a URL pattern starting with <code>http*://</code> is
- considered to be the same as that of the corresponding URL pattern
- starting with <code>http://</code> or <code>https://</code>. In case of a
- conflict, the order of precedence of such patterns is unspecified. This
- behavior is likely to change in future versions of Haketilo.
- {% endcall %}
- {% endcall %}
-
- {% call section() %}
- {{ small_heading('Wildcard pattern priorities and querying') }}
-
- {% call paragraph() %}
- If multiple patterns match a URL, the more specific one is
- preferred. Specificity is determined as follows:
- {% endcall %}
-
- {% call unordered_list() %}
- {% call list_entry() %}
- If patterns differ only in the final path segment, the one with the
- fewest wildcard asterisks in that segment is preferred.
- {% endcall %}
- {% call list_entry() %}
- If patterns, besides the above, differ only in path length, the one with
- the longer path is preferred. Neither the final wildcard segment nor
- trailing slashes count towards path length.
- {% endcall %}
- {% call list_entry() %}
- If patterns, besides the above, differ only in the initial domain
- segment, the one with the fewest wildcard asterisks in that segment is
- preferred.
- {% endcall %}
- {% call list_entry() %}
- If patterns differ in domain length, the one with the longer domain is
- preferred. The initial wildcard segment does not count towards domain
- length.
- {% endcall %}
- {% endcall %}
-
- {% call paragraph() %}
- As an example, consider the URL
- <code>http://settings.query.example.com/google/tries/destroy/adblockers//</code>.
- The patterns matching it are, from most to least specific:
- {% endcall %}
-
- {% call verbatim() %}
-http://settings.query.example.com/google/tries/destroy/adblockers/
-http://settings.query.example.com/google/tries/destroy/adblockers
-http://settings.query.example.com/google/tries/destroy/adblockers/***/
-http://settings.query.example.com/google/tries/destroy/adblockers/***
-http://settings.query.example.com/google/tries/destroy/*/
-http://settings.query.example.com/google/tries/destroy/*
-http://settings.query.example.com/google/tries/destroy/***/
-http://settings.query.example.com/google/tries/destroy/***
-http://settings.query.example.com/google/tries/**/
-http://settings.query.example.com/google/tries/**
-http://settings.query.example.com/google/tries/***/
-http://settings.query.example.com/google/tries/***
-http://settings.query.example.com/google/**/
-http://settings.query.example.com/google/**
-http://settings.query.example.com/google/***/
-http://settings.query.example.com/google/***
-http://settings.query.example.com/**/
-http://settings.query.example.com/**
-http://settings.query.example.com/***/
-http://settings.query.example.com/***
-http://***.settings.query.example.com/google/tries/destroy/adblockers/
-http://***.settings.query.example.com/google/tries/destroy/adblockers
-http://***.settings.query.example.com/google/tries/destroy/adblockers/***/
-http://***.settings.query.example.com/google/tries/destroy/adblockers/***
-http://***.settings.query.example.com/google/tries/destroy/*/
-http://***.settings.query.example.com/google/tries/destroy/*
-http://***.settings.query.example.com/google/tries/destroy/***/
-http://***.settings.query.example.com/google/tries/destroy/***
-http://***.settings.query.example.com/google/tries/**/
-http://***.settings.query.example.com/google/tries/**
-http://***.settings.query.example.com/google/tries/***/
-http://***.settings.query.example.com/google/tries/***
-http://***.settings.query.example.com/google/**/
-http://***.settings.query.example.com/google/**
-http://***.settings.query.example.com/google/***/
-http://***.settings.query.example.com/google/***
-http://***.settings.query.example.com/**/
-http://***.settings.query.example.com/**
-http://***.settings.query.example.com/***/
-http://***.settings.query.example.com/***
-http://*.query.example.com/google/tries/destroy/adblockers/
-http://*.query.example.com/google/tries/destroy/adblockers
-http://*.query.example.com/google/tries/destroy/adblockers/***/
-http://*.query.example.com/google/tries/destroy/adblockers/***
-http://*.query.example.com/google/tries/destroy/*/
-http://*.query.example.com/google/tries/destroy/*
-http://*.query.example.com/google/tries/destroy/***/
-http://*.query.example.com/google/tries/destroy/***
-http://*.query.example.com/google/tries/**/
-http://*.query.example.com/google/tries/**
-http://*.query.example.com/google/tries/***/
-http://*.query.example.com/google/tries/***
-http://*.query.example.com/google/**/
-http://*.query.example.com/google/**
-http://*.query.example.com/google/***/
-http://*.query.example.com/google/***
-http://*.query.example.com/**/
-http://*.query.example.com/**
-http://*.query.example.com/***/
-http://*.query.example.com/***
-http://***.query.example.com/google/tries/destroy/adblockers/
-http://***.query.example.com/google/tries/destroy/adblockers
-http://***.query.example.com/google/tries/destroy/adblockers/***/
-http://***.query.example.com/google/tries/destroy/adblockers/***
-http://***.query.example.com/google/tries/destroy/*/
-http://***.query.example.com/google/tries/destroy/*
-http://***.query.example.com/google/tries/destroy/***/
-http://***.query.example.com/google/tries/destroy/***
-http://***.query.example.com/google/tries/**/
-http://***.query.example.com/google/tries/**
-http://***.query.example.com/google/tries/***/
-http://***.query.example.com/google/tries/***
-http://***.query.example.com/google/**/
-http://***.query.example.com/google/**
-http://***.query.example.com/google/***/
-http://***.query.example.com/google/***
-http://***.query.example.com/**/
-http://***.query.example.com/**
-http://***.query.example.com/***/
-http://***.query.example.com/***
-http://**.example.com/google/tries/destroy/adblockers/
-http://**.example.com/google/tries/destroy/adblockers
-http://**.example.com/google/tries/destroy/adblockers/***/
-http://**.example.com/google/tries/destroy/adblockers/***
-http://**.example.com/google/tries/destroy/*/
-http://**.example.com/google/tries/destroy/*
-http://**.example.com/google/tries/destroy/***/
-http://**.example.com/google/tries/destroy/***
-http://**.example.com/google/tries/**/
-http://**.example.com/google/tries/**
-http://**.example.com/google/tries/***/
-http://**.example.com/google/tries/***
-http://**.example.com/google/**/
-http://**.example.com/google/**
-http://**.example.com/google/***/
-http://**.example.com/google/***
-http://**.example.com/**/
-http://**.example.com/**
-http://**.example.com/***/
-http://**.example.com/***
-http://***.example.com/google/tries/destroy/adblockers/
-http://***.example.com/google/tries/destroy/adblockers
-http://***.example.com/google/tries/destroy/adblockers/***/
-http://***.example.com/google/tries/destroy/adblockers/***
-http://***.example.com/google/tries/destroy/*/
-http://***.example.com/google/tries/destroy/*
-http://***.example.com/google/tries/destroy/***/
-http://***.example.com/google/tries/destroy/***
-http://***.example.com/google/tries/**/
-http://***.example.com/google/tries/**
-http://***.example.com/google/tries/***/
-http://***.example.com/google/tries/***
-http://***.example.com/google/**/
-http://***.example.com/google/**
-http://***.example.com/google/***/
-http://***.example.com/google/***
-http://***.example.com/**/
-http://***.example.com/**
-http://***.example.com/***/
-http://***.example.com/***
- {% endcall %}
-
- {% call paragraph() %}
- Variants of those patterns starting with <code>http*://</code> would of
- course match as well. They have been omitted for simplicity.
- {% endcall %}
-
- {% call paragraph() %}
- For a simpler URL like <code>https://example.com</code>, the matching
- patterns would be:
- {% endcall %}
-
- {% call verbatim() %}
-https://example.com
-https://example.com/***
-https://***.example.com
-https://***.example.com/***
- {% endcall %}
-
- {% call paragraph() %}
- Variants of those patterns with a trailing slash added
- would <span class="bold">not</span> match the URL. Also, the pattern
- variants starting with <code>http*://</code> have once again been omitted.
- {% endcall %}
- {% endcall %}
-
- {% call section() %}
- {{ small_heading('Limits') }}
-
- {% call paragraph() %}
- In order to prevent some easy-to-conduct DoS attacks, older versions of
- Haketilo and Hydrilla limited the lengths of domain and path parts of
- processed URLs. This is no longer the case.
- {% endcall %}
- {% endcall %}
-
- {% call section() %}
- {{ medium_heading('Alternative solution idea: mimicking web server mechanics') }}
-
- {% call paragraph() %}
- While wildcard patterns as presented give a lot of flexibility, they are
- not the only viable approach to specifying what URLs to apply
- rules/payloads to. In fact, wildcards are different from how the server
- side of a typical website decides what to return for a given URL request.
- {% endcall %}
-
- {% call paragraph() %}
- In a typical scenario, an HTTP server like Apache reads configuration
- files provided by its administrator and uses various (virtual host,
- redirect, request rewrite, CGI, etc.) instructions to decide how to handle
- a given URL. Perhaps a scheme that mimics the configuration options
- typically used with web servers would make it more efficient to specify
- which page settings to apply when.
- {% endcall %}
-
- {% call paragraph() %}
- This approach may be considered in the future.
- {% endcall %}
- {% endcall %}
-{% endblock main %}
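
The domain and path wildcard rules described in the removed page can be illustrated with a small Python sketch. This is not Haketilo's or Hydrilla's actual implementation; the function names `domain_matches` and `path_matches` and the helper `_wildcard_ok` are hypothetical, and the sketch assumes that empty path segments (repeated slashes) can simply be ignored when comparing segments. It does not cover scheme handling or specificity ordering.

```python
WILDCARDS = ('*', '**', '***')


def _wildcard_ok(stars: int, extra: int) -> bool:
    """Say whether a wildcard of `stars` asterisks may stand for `extra` segments."""
    if stars == 1:
        return extra == 1   # exactly one segment
    if stars == 2:
        return extra >= 2   # two or more segments
    return extra >= 0       # three asterisks: zero or more segments


def domain_matches(pattern: str, domain: str) -> bool:
    """Match a domain against a pattern with an optional leftmost wildcard."""
    pat = pattern.split('.')
    dom = domain.split('.')
    if pat[0] in WILDCARDS:
        fixed = pat[1:]
        extra = len(dom) - len(fixed)
        return (extra >= 0 and dom[extra:] == fixed
                and _wildcard_ok(len(pat[0]), extra))
    return pat == dom


def path_matches(pattern: str, path: str) -> bool:
    """Match a path against a pattern with an optional rightmost wildcard."""
    # A pattern with a trailing slash only matches paths that also have one.
    if pattern.endswith('/') and not path.endswith('/'):
        return False
    # Assumption of this sketch: empty segments are dropped before comparing.
    pat = [seg for seg in pattern.split('/') if seg]
    segs = [seg for seg in path.split('/') if seg]
    if pat == segs:
        # Literal match, including paths with literal trailing asterisks.
        return True
    if pat and pat[-1] in WILDCARDS:
        fixed = pat[:-1]
        extra = len(segs) - len(fixed)
        return (extra >= 0 and segs[:len(fixed)] == fixed
                and _wildcard_ok(len(pat[-1]), extra))
    return False
```

Under these assumptions, `domain_matches('**.example.com', 'settings.query.example.com')` and `path_matches('/google/tries/**', '/google/tries/destroy/adblockers//')` both hold, in line with the example pattern list in the page.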