From cc40af334e619bb549038238507407866f774f8f Mon Sep 17 00:00:00 2001 From: hongbotian Date: Mon, 30 Nov 2015 01:35:09 -0500 Subject: upload apache JIRA: BOTTLENECK-10 Change-Id: I67eae31de6dc824097dfa56ab454ba36fdd23a2c Signed-off-by: hongbotian --- rubbos/app/apache2/manual/rewrite/index.html | 9 + rubbos/app/apache2/manual/rewrite/index.html.en | 99 ++ .../app/apache2/manual/rewrite/index.html.tr.utf8 | 91 ++ .../app/apache2/manual/rewrite/rewrite_guide.html | 5 + .../apache2/manual/rewrite/rewrite_guide.html.en | 788 ++++++++++++ .../manual/rewrite/rewrite_guide_advanced.html | 5 + .../manual/rewrite/rewrite_guide_advanced.html.en | 1289 ++++++++++++++++++++ .../app/apache2/manual/rewrite/rewrite_intro.html | 5 + .../apache2/manual/rewrite/rewrite_intro.html.en | 117 ++ .../app/apache2/manual/rewrite/rewrite_tech.html | 5 + .../apache2/manual/rewrite/rewrite_tech.html.en | 166 +++ 11 files changed, 2579 insertions(+) create mode 100644 rubbos/app/apache2/manual/rewrite/index.html create mode 100644 rubbos/app/apache2/manual/rewrite/index.html.en create mode 100644 rubbos/app/apache2/manual/rewrite/index.html.tr.utf8 create mode 100644 rubbos/app/apache2/manual/rewrite/rewrite_guide.html create mode 100644 rubbos/app/apache2/manual/rewrite/rewrite_guide.html.en create mode 100644 rubbos/app/apache2/manual/rewrite/rewrite_guide_advanced.html create mode 100644 rubbos/app/apache2/manual/rewrite/rewrite_guide_advanced.html.en create mode 100644 rubbos/app/apache2/manual/rewrite/rewrite_intro.html create mode 100644 rubbos/app/apache2/manual/rewrite/rewrite_intro.html.en create mode 100644 rubbos/app/apache2/manual/rewrite/rewrite_tech.html create mode 100644 rubbos/app/apache2/manual/rewrite/rewrite_tech.html.en (limited to 'rubbos/app/apache2/manual/rewrite') diff --git a/rubbos/app/apache2/manual/rewrite/index.html b/rubbos/app/apache2/manual/rewrite/index.html new file mode 100644 index 00000000..23ec1ec0 --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/index.html @@ -0,0 +1,9 @@ +# GENERATED FROM XML -- DO NOT EDIT + +URI: index.html.en +Content-Language: en +Content-type: text/html; charset=ISO-8859-1 + +URI: index.html.tr.utf8 +Content-Language: tr +Content-type: text/html; charset=UTF-8 diff --git a/rubbos/app/apache2/manual/rewrite/index.html.en b/rubbos/app/apache2/manual/rewrite/index.html.en new file mode 100644 index 00000000..a442b2e8 --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/index.html.en @@ -0,0 +1,99 @@ + + + +Apache mod_rewrite - Apache HTTP Server + + + + + +
<-
+

Apache mod_rewrite

+
+

Available Languages:  en  | + tr 

+
+ +
+

``The great thing about mod_rewrite is it gives you + all the configurability and flexibility of Sendmail. + The downside to mod_rewrite is that it gives you all + the configurability and flexibility of Sendmail.''

+ +

-- Brian Behlendorf
+ Apache Group

+ +
+ +
+

`` Despite the tons of examples and docs, + mod_rewrite is voodoo. Damned cool voodoo, but still + voodoo. ''

+ +

-- Brian Moore
+ bem@news.cmc.net

+ +
+ +

Welcome to mod_rewrite, the Swiss Army Knife of URL + manipulation!

+ +

This module uses a rule-based rewriting engine (based on a + regular-expression parser) to rewrite requested URLs on the + fly. It supports an unlimited number of rules and an + unlimited number of attached rule conditions for each rule to + provide a really flexible and powerful URL manipulation + mechanism. The URL manipulations can depend on various tests: + server variables, environment variables, HTTP headers, time + stamps, and even external database lookups in various formats + can be used to achieve granular URL matching.

+ +

This module operates on the full URLs (including the + path-info part) both in per-server context + (httpd.conf) and per-directory context + (.htaccess) and can even generate query-string + parts in the result. The rewritten result can lead to internal + sub-processing, external request redirection or even to an + internal proxy throughput.

+ +

But all this functionality and flexibility has its + drawback: complexity. So don't expect to understand this + entire module in just one day.

+ +
+ +
top
+
top
+
+

mod_rewrite

+

Extensive documentation on the directives +provided by this module is available in the mod_rewrite reference documentation. +

+
+
+

Available Languages:  en  | + tr 

+
+ \ No newline at end of file diff --git a/rubbos/app/apache2/manual/rewrite/index.html.tr.utf8 b/rubbos/app/apache2/manual/rewrite/index.html.tr.utf8 new file mode 100644 index 00000000..d9fd5806 --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/index.html.tr.utf8 @@ -0,0 +1,91 @@ + + + +Apache mod_rewrite - Apache HTTP Sunucusu + + + + + +
<-
+

Apache mod_rewrite

+
+

Available Languages:  en  | + tr 

+
+ +
+

``The great thing about mod_rewrite is it gives you + all the configurability and flexibility of Sendmail. + The downside to mod_rewrite is that it gives you all + the configurability and flexibility of Sendmail.''

+ +

-- Brian Behlendorf
+ Apache Group

+
+ +
+

``Despite the tons of examples and docs, + mod_rewrite is voodoo. Damned cool voodoo, but still + voodoo.''

+ +

-- Brian Moore
+ bem@news.cmc.net

+
+ +

Welcome to mod_rewrite, the Swiss Army Knife of URL + manipulation!

+ +

This module uses a rule-based rewriting engine (based on a + regular-expression parser) to rewrite requested URLs on the + fly. It supports an unlimited number of rules and an unlimited + number of attached rule conditions for each rule to provide a + really flexible and powerful URL manipulation mechanism. The URL + manipulations can depend on various tests: server variables, + HTTP headers, environment variables, time stamps, and even + external database lookups in various formats can be used.

+ +

This module operates on the full URLs (including the + path-info part) both in per-server context + (httpd.conf) and per-directory context + (.htaccess) and can even generate query-string parts + in the result. The rewritten result can lead to internal + sub-processing, external request redirection or even to an + internal proxy throughput.

+ +

But all this functionality and flexibility has its + drawback: complexity. So don't expect to understand this + entire module in just one day.

+ +
+ +
top
+
top
+
+

The mod_rewrite Module

+

The directives and environment variables provided by + this module are described in detail in the + mod_rewrite reference documentation.

+
+
+

Available Languages:  en  | + tr 

+
+ \ No newline at end of file diff --git a/rubbos/app/apache2/manual/rewrite/rewrite_guide.html b/rubbos/app/apache2/manual/rewrite/rewrite_guide.html new file mode 100644 index 00000000..69c288c8 --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/rewrite_guide.html @@ -0,0 +1,5 @@ +# GENERATED FROM XML -- DO NOT EDIT + +URI: rewrite_guide.html.en +Content-Language: en +Content-type: text/html; charset=ISO-8859-1 diff --git a/rubbos/app/apache2/manual/rewrite/rewrite_guide.html.en b/rubbos/app/apache2/manual/rewrite/rewrite_guide.html.en new file mode 100644 index 00000000..527cdac2 --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/rewrite_guide.html.en @@ -0,0 +1,788 @@ + + + +URL Rewriting Guide - Apache HTTP Server + + + + + +
<-
+

URL Rewriting Guide

+
+

Available Languages:  en 

+
+ + +

This document supplements the mod_rewrite + reference documentation. + It describes how one can use Apache's mod_rewrite + to solve typical URL-based problems with which webmasters are + commonly confronted. We give detailed descriptions on how to + solve each problem by configuring URL rewriting rulesets.

+ +
ATTENTION: Depending on your server configuration + it may be necessary to slightly change the examples for your + situation, e.g. adding the [PT] flag when + additionally using mod_alias and + mod_userdir, etc. Or rewriting a ruleset + to fit in .htaccess context instead + of per-server context. Always try to understand what a + particular ruleset really does before you use it. This + avoids many problems.
+ +
+ +
top
+
+

Canonical URLs

+ + + +
+
Description:
+ +
+

On some webservers there is more than one URL for a + resource. Usually there are canonical URLs (which are meant to be + actually used and distributed) and those which are just + shortcuts, internal ones, etc. Independent of which URL the + user supplies with the request, he should finally see the + canonical one only.

+
+ +
Solution:
+ +
+

We do an external HTTP redirect for all non-canonical + URLs to fix them in the location view of the Browser and + for all subsequent requests. In the example ruleset below + we replace /~user by the canonical + /u/user and fix a missing trailing slash for + /u/user.

+ +
+RewriteRule   ^/~([^/]+)/?(.*)    /u/$1/$2  [R]
+RewriteRule   ^/([uge])/([^/]+)$  /$1/$2/   [R]
+
+
+
+ +
top
+
+

Canonical Hostnames

+ +
+
Description:
+ +
The goal of this rule is to force the use of a particular + hostname, in preference to other hostnames which may be used to + reach the same site. For example, if you wish to force the use + of www.example.com instead of + example.com, you might use a variant of the + following recipe.
+ +
Solution:
+ +
+

For sites running on a port other than 80:

+
+RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
+RewriteCond %{HTTP_HOST}   !^$
+RewriteCond %{SERVER_PORT} !^80$
+RewriteRule ^/(.*)         http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R]
+
+ +

And for a site running on port 80

+
+RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
+RewriteCond %{HTTP_HOST}   !^$
+RewriteRule ^/(.*)         http://fully.qualified.domain.name/$1 [L,R]
+
+
+
+ +
top
+
+

Moved DocumentRoot

+ + + +
+
Description:
+ +
+

Usually the DocumentRoot +of the webserver directly relates to the URL "/". +But often this data is not really of top-level priority. For example, +you may wish for visitors, on first entering a site, to go to a +particular subdirectory /about/. This may be accomplished +using the following ruleset:

+
+ +
Solution:
+ +
+

We redirect the URL / to + /about/: +

+ +
+RewriteEngine on
+RewriteRule   ^/$  /about/  [R]
+
+ +

Note that this can also be handled using the RedirectMatch directive:

+ +

+RedirectMatch ^/$ http://example.com/e/www/ +

+
+
+ +
top
+
+

Trailing Slash Problem

+ + + +
+
Description:
+ +

The vast majority of "trailing slash" problems can be dealt + with using the techniques discussed in the FAQ + entry. However, occasionally, there is a need to use mod_rewrite + to handle a case where a missing trailing slash causes a URL to + fail. This can happen, for example, after a series of complex + rewrite rules.

+
+ +
Solution:
+ +
+

The solution to this subtle problem is to let the server + add the trailing slash automatically. To do this + correctly we have to use an external redirect, so the + browser correctly requests subsequent images etc. If we + only did an internal rewrite, this would only work for the + directory page, but would go wrong when any images are + included into this page with relative URLs, because the + browser would request an in-lined object. For instance, a + request for image.gif in + /~quux/foo/index.html would become + /~quux/image.gif without the external + redirect!

+ +

So, to do this trick we write:

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo$  foo/  [R]
+
+ +

Alternately, you can put the following in a + top-level .htaccess file in the content directory. + But note that this creates some processing overhead.

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteCond    %{REQUEST_FILENAME}  -d
+RewriteRule    ^(.+[^/])$           $1/  [R]
+
+
+
+ +
top
+
+

Move Homedirs to Different Webserver

+ + + +
+
Description:
+ +
+

Many webmasters have asked for a solution to the + following situation: They wanted to redirect all + homedirs on a webserver to another webserver. They usually + need such things when establishing a newer webserver which + will replace the old one over time.

+
+ +
Solution:
+ +
+

The solution is trivial with mod_rewrite. + On the old webserver we just redirect all + /~user/anypath URLs to + http://newserver/~user/anypath.

+ +
+RewriteEngine on
+RewriteRule   ^/~(.+)  http://newserver/~$1  [R,L]
+
+
+
+ +
top
+
+

Search pages in more than one directory

+ + + +
+
Description:
+ +
+

Sometimes it is necessary to let the webserver search + for pages in more than one directory. Here MultiViews or + other techniques cannot help.

+
+ +
Solution:
+ +
+

We program an explicit ruleset which searches for the + files in the directories.

+ +
+RewriteEngine on
+
+#   first try to find it in custom/...
+#   ...and if found stop and be happy:
+RewriteCond         /your/docroot/dir1/%{REQUEST_FILENAME}  -f
+RewriteRule  ^(.+)  /your/docroot/dir1/$1  [L]
+
+#   second try to find it in pub/...
+#   ...and if found stop and be happy:
+RewriteCond         /your/docroot/dir2/%{REQUEST_FILENAME}  -f
+RewriteRule  ^(.+)  /your/docroot/dir2/$1  [L]
+
+#   else go on for other Alias or ScriptAlias directives,
+#   etc.
+RewriteRule   ^(.+)  -  [PT]
+
+
+
+ +
top
+
+

Set Environment Variables According To URL Parts

+ + + +
+
Description:
+ +
+

Perhaps you want to keep status information between + requests and use the URL to encode it. But you don't want + to use a CGI wrapper for all pages just to strip out this + information.

+
+ +
Solution:
+ +
+

We use a rewrite rule to strip out the status information + and remember it via an environment variable which can be + later dereferenced from within XSSI or CGI. This way a + URL /foo/S=java/bar/ gets translated to + /foo/bar/ and the environment variable named + STATUS is set to the value "java".

+ +
+RewriteEngine on
+RewriteRule   ^(.*)/S=([^/]+)/(.*)    $1/$3 [E=STATUS:$2]
+
+
+
+ +
top
+
+

Virtual User Hosts

+ + + +
+
Description:
+ +
+

Assume that you want to provide + www.username.host.com + for the homepage of username via just DNS A records to the + same machine and without any virtualhosts on this + machine.

+
+ +
Solution:
+ +
+

For HTTP/1.0 requests there is no solution, but for + HTTP/1.1 requests which contain a Host: HTTP header we + can use the following ruleset to rewrite + http://www.username.host.com/anypath + internally to /home/username/anypath:

+ +
+RewriteEngine on
+RewriteCond   %{HTTP_HOST}                 ^www\.[^.]+\.host\.com$
+RewriteRule   ^(.+)                        %{HTTP_HOST}$1          [C]
+RewriteRule   ^www\.([^.]+)\.host\.com(.*) /home/$1$2
+
+
+
+ +
top
+
+

Redirect Homedirs For Foreigners

+ + + +
+
Description:
+ +
+

We want to redirect homedir URLs to another webserver + www.somewhere.com when the requesting user + does not come from the local domain + ourdomain.com. This is sometimes used in + virtual host contexts.

+
+ +
Solution:
+ +
+

Just a rewrite condition:

+ +
+RewriteEngine on
+RewriteCond   %{REMOTE_HOST}  !^.+\.ourdomain\.com$
+RewriteRule   ^(/~.+)         http://www.somewhere.com/$1 [R,L]
+
+
+
+ +
top
+
+

Redirecting Anchors

+ + + +
+
Description:
+ +
+

By default, redirecting to an HTML anchor doesn't work, + because mod_rewrite escapes the # character, + turning it into %23. This, in turn, breaks the + redirection.

+
+ +
Solution:
+ +
+

Use the [NE] flag on the + RewriteRule. NE stands for No Escape. +
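A minimal sketch (the page and anchor names are hypothetical): with [NE] the # survives unescaped and the browser lands on the named section:
+RewriteRule ^oldpage\.html$  newpage.html#section2  [NE,R]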

+
+
+ +
top
+
+

Time-Dependent Rewriting

+ + + +
+
Description:
+ +
+

When tricks like time-dependent content are needed, many + webmasters still use CGI scripts which, for instance, + redirect to specialized pages. How can this be done + via mod_rewrite?

+
+ +
Solution:
+ +
+

There are a lot of variables named TIME_xxx + for rewrite conditions. In conjunction with the special + lexicographic comparison patterns <STRING, + >STRING and =STRING we can + do time-dependent redirects:

+ +
+RewriteEngine on
+RewriteCond   %{TIME_HOUR}%{TIME_MIN} >0700
+RewriteCond   %{TIME_HOUR}%{TIME_MIN} <1900
+RewriteRule   ^foo\.html$             foo.day.html
+RewriteRule   ^foo\.html$             foo.night.html
+
+ +

This provides the content of foo.day.html + under the URL foo.html from + 07:00-19:00, and the contents of + foo.night.html during the remaining time. Just a nice + feature for a homepage...

+
+
+ +
top
+
+

Backward Compatibility for YYYY to XXXX migration

+ + + +
+
Description:
+ +
+

How can we make URLs backward compatible (still + existing virtually) after migrating document.YYYY + to document.XXXX, e.g. after translating a + bunch of .html files to .phtml?

+
+ +
Solution:
+ +
+

We just rewrite the name to its basename and test for + existence of the new extension. If it exists, we take + that name, else we rewrite the URL to its original state.

+ + +
+#   backward compatibility ruleset for
+#   rewriting document.html to document.phtml
+#   when and only when document.phtml exists
+#   but no longer document.html
+RewriteEngine on
+RewriteBase   /~quux/
+#   parse out basename, but remember the fact
+RewriteRule   ^(.*)\.html$              $1      [C,E=WasHTML:yes]
+#   rewrite to document.phtml if exists
+RewriteCond   %{REQUEST_FILENAME}.phtml -f
+RewriteRule   ^(.*)$ $1.phtml                   [S=1]
+#   else reverse the previous basename cutout
+RewriteCond   %{ENV:WasHTML}            ^yes$
+RewriteRule   ^(.*)$ $1.html
+
+
+
+ +
top
+
+

Content Handling

+ + + +

From Old to New (intern)

+ + + +
+
Description:
+ +
+

Assume we have recently renamed the page + foo.html to bar.html and now want + to provide the old URL for backward compatibility. Ideally, + users of the old URL should not even notice that + the page was renamed.

+
+ +
Solution:
+ +
+

We rewrite the old URL to the new one internally via the + following rule:

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo\.html$  bar.html
+
+
+
+ + + +

From Old to New (extern)

+ + + +
+
Description:
+ +
+

Assume again that we have recently renamed the page + foo.html to bar.html and now want + to provide the old URL for backward compatibility. But this + time we want users of the old URL to be directed to + the new one, i.e., their browser's location field should + change, too.

+
+ +
Solution:
+ +
+

We force an HTTP redirect to the new URL, which leads to a + change in the browser's (and thus the user's) view:

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo\.html$  bar.html  [R]
+
+
+
+ + + +

From Static to Dynamic

+ + + +
+
Description:
+ +
+

How can we transform a static page + foo.html into a dynamic variant + foo.cgi in a seamless way, i.e., without the + browser/user noticing?

+
+ +
Solution:
+ +
+

We just rewrite the URL to the CGI-script and force the + correct MIME-type so it gets really run as a CGI-script. + This way a request to /~quux/foo.html + internally leads to the invocation of + /~quux/foo.cgi.

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo\.html$  foo.cgi  [T=application/x-httpd-cgi]
+
+
+
+ + +
top
+
+

Access Restriction

+ + + +

Blocking of Robots

+ + + +
+
Description:
+ +
+

How can we block a really annoying robot from + retrieving pages of a specific webarea? A + /robots.txt file containing entries of the + "Robot Exclusion Protocol" is typically not enough to get + rid of such a robot.

+
+ +
Solution:
+ +
+

We use a ruleset which forbids the URLs of the webarea + /~quux/foo/arc/ (perhaps a very deep + directory indexed area where the robot traversal would + create big server load). We have to make sure that we + forbid access only to the particular robot, i.e. just + forbidding the host where the robot runs is not enough. + This would block users from this host, too. We accomplish + this by also matching the User-Agent HTTP header + information.

+ +
+RewriteCond %{HTTP_USER_AGENT}   ^NameOfBadRobot.*
+RewriteCond %{REMOTE_ADDR}       ^123\.45\.67\.[8-9]$
+RewriteRule ^/~quux/foo/arc/.+   -   [F]
+
+
+
+ + + +

Blocked Inline-Images

+ + + +
+
Description:
+ +
+

Assume we have under http://www.quux-corp.de/~quux/ + some pages with inlined GIF graphics. These graphics are + nice, so others directly incorporate them via hyperlinks to + their pages. We don't like this practice because it adds + useless traffic to our server.

+
+ +
Solution:
+ +
+

While we cannot 100% protect the images from inclusion, + we can at least restrict the cases where the browser + sends an HTTP Referer header.

+ +
+RewriteCond %{HTTP_REFERER} !^$
+RewriteCond %{HTTP_REFERER} !^http://www.quux-corp.de/~quux/.*$ [NC]
+RewriteRule .*\.gif$        -                                    [F]
+
+ +
+RewriteCond %{HTTP_REFERER}         !^$
+RewriteCond %{HTTP_REFERER}         !.*/foo-with-gif\.html$
+RewriteRule ^inlined-in-foo\.gif$   -                        [F]
+
+
+
+ + + +

Proxy Deny

+ + + +
+
Description:
+ +
+

How can we forbid a certain host or even a user of a + special host from using the Apache proxy?

+
+ +
Solution:
+ +
+

We first have to make sure mod_rewrite + is below(!) mod_proxy in the Configuration + file when compiling the Apache webserver. This way it gets + called before mod_proxy. Then we + configure the following for a host-dependent deny...

+ +
+RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
+ +

...and this one for a user@host-dependent deny:

+ +
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}  ^badguy@badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
+
+
+ + + +
top
+
+

Other

+ + + +

External Rewriting Engine

+ + + +
+
Description:
+ +
+

A FAQ: How can we solve the FOO/BAR/QUUX/etc. + problem? There seems to be no solution by the use of + mod_rewrite...

+
+ +
Solution:
+ +
+

Use an external RewriteMap, i.e. a program which acts + like a RewriteMap. It is run once on startup of Apache, + receives the requested URLs on STDIN, and has + to put the resulting (usually rewritten) URL on + STDOUT (in the same order!).

+ +
+RewriteEngine on
+RewriteMap    quux-map       prg:/path/to/map.quux.pl
+RewriteRule   ^/~quux/(.*)$  /~quux/${quux-map:$1}
+
+ +
+#!/path/to/perl
+
+#   disable buffered I/O which would lead
+#   to deadloops for the Apache server
+$| = 1;
+
+#   read URLs one per line from stdin and
+#   generate substitution URL on stdout
+while (<>) {
+    s|^foo/|bar/|;
+    print $_;
+}
+
+ +

This is a demonstration-only example and just rewrites + all URLs /~quux/foo/... to + /~quux/bar/.... Actually you can program + whatever you like. But notice that while such maps can + also be used by an average user, only the + system administrator can define them.

+
+
+ + + +
+
+

Available Languages:  en 

+
+ \ No newline at end of file diff --git a/rubbos/app/apache2/manual/rewrite/rewrite_guide_advanced.html b/rubbos/app/apache2/manual/rewrite/rewrite_guide_advanced.html new file mode 100644 index 00000000..d08ed10d --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/rewrite_guide_advanced.html @@ -0,0 +1,5 @@ +# GENERATED FROM XML -- DO NOT EDIT + +URI: rewrite_guide_advanced.html.en +Content-Language: en +Content-type: text/html; charset=ISO-8859-1 diff --git a/rubbos/app/apache2/manual/rewrite/rewrite_guide_advanced.html.en b/rubbos/app/apache2/manual/rewrite/rewrite_guide_advanced.html.en new file mode 100644 index 00000000..c457e5af --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/rewrite_guide_advanced.html.en @@ -0,0 +1,1289 @@ + + + +URL Rewriting Guide - Advanced topics - Apache HTTP Server + + + + + +
<-
+

URL Rewriting Guide - Advanced topics

+
+

Available Languages:  en 

+
+ + +

This document supplements the mod_rewrite + reference documentation. + It describes how one can use Apache's mod_rewrite + to solve typical URL-based problems with which webmasters are + commonly confronted. We give detailed descriptions on how to + solve each problem by configuring URL rewriting rulesets.

+ +
ATTENTION: Depending on your server configuration + it may be necessary to adjust the examples for your + situation, e.g., adding the [PT] flag if + using mod_alias and + mod_userdir, etc. Or rewriting a ruleset + to work in .htaccess context instead + of per-server context. Always try to understand what a + particular ruleset really does before you use it; this + avoids many problems.
+ +
+ +
top
+
+

Web Cluster with Consistent URL Space

+ + + +
+
Description:
+ +
+

We want to create a homogeneous and consistent URL + layout across all WWW servers on an Intranet web cluster, i.e., + all URLs (by definition server-local and thus + server-dependent!) become server independent! + What we want is to give the WWW namespace a single consistent + layout: no URL should refer to + any particular target server. The cluster itself + should connect users automatically to a physical target + host as needed, invisibly.

+
+ +
Solution:
+ +
+

First, the knowledge of the target servers comes from + (distributed) external maps which contain information on + where our users, groups, and entities reside. They have the + form:

+ +
+user1  server_of_user1
+user2  server_of_user2
+:      :
+
+ +

We put them into files map.xxx-to-host. + Second we need to instruct all servers to redirect URLs + of the forms:

+ +
+/u/user/anypath
+/g/group/anypath
+/e/entity/anypath
+
+ +

to

+ +
+http://physical-host/u/user/anypath
+http://physical-host/g/group/anypath
+http://physical-host/e/entity/anypath
+
+ +

while the URL paths need not be valid on every server. The + following ruleset does this for us with the help of the map + files (assuming that server0 is a default server which + will be used if a user has no entry in the map):

+ +
+RewriteEngine on
+
+RewriteMap      user-to-host   txt:/path/to/map.user-to-host
+RewriteMap     group-to-host   txt:/path/to/map.group-to-host
+RewriteMap    entity-to-host   txt:/path/to/map.entity-to-host
+
+RewriteRule   ^/u/([^/]+)/?(.*)   http://${user-to-host:$1|server0}/u/$1/$2
+RewriteRule   ^/g/([^/]+)/?(.*)  http://${group-to-host:$1|server0}/g/$1/$2
+RewriteRule   ^/e/([^/]+)/?(.*) http://${entity-to-host:$1|server0}/e/$1/$2
+
+RewriteRule   ^/([uge])/([^/]+)/?$          /$1/$2/.www/
+RewriteRule   ^/([uge])/([^/]+)/([^.]+.+)   /$1/$2/.www/$3
+
+
+
+ +
top
+
+

Structured Homedirs

+ + + +
+
Description:
+ +
+

Some sites with thousands of users use a + structured homedir layout, i.e. each homedir is in a + subdirectory which begins (for instance) with the first + character of the username. So, /~foo/anypath + is /home/f/foo/.www/anypath + while /~bar/anypath is + /home/b/bar/.www/anypath.

+
+ +
Solution:
+ +
+

We use the following ruleset to expand the tilde URLs + into the above layout.

+ +
+RewriteEngine on
+RewriteRule   ^/~(([a-z])[a-z0-9]+)(.*)  /home/$2/$1/.www$3
+
+
+
+ +
top
+
+

Filesystem Reorganization

+ + + +
+
Description:
+ +
+

This really is a hardcore example: a killer application + which heavily uses per-directory + RewriteRules to get a smooth look and feel + on the Web while its data structure is never touched or + adjusted. Background: net.sw is + my archive of freely available Unix software packages, + which I started to collect in 1992. It is both my hobby + and job to do this, because while I'm studying computer + science I have also worked for many years as a system and + network administrator in my spare time. Every week I need + some sort of software so I created a deep hierarchy of + directories where I stored the packages:

+ +
+drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
+drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
+drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
+drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
+drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
+drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
+drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
+drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
+drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
+drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
+drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
+drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
+drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
+drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
+drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
+drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/
+
+ +

In July 1996 I decided to make this archive public to + the world via a nice Web interface. "Nice" means that I + wanted to offer an interface where you can browse + directly through the archive hierarchy. And "nice" means + that I didn't want to change anything inside this + hierarchy - not even by putting some CGI scripts at the + top of it. Why? Because the above structure should later be + accessible via FTP as well, and I didn't want any + Web or CGI stuff mixed in there.

+
+ +
Solution:
+ +
+

The solution has two parts: The first is a set of CGI + scripts which create all the pages at all directory + levels on-the-fly. I put them under + /e/netsw/.www/ as follows:

+ +
+-rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
+drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
+-rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
+-rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
+-rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
+-rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
+-rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
+-rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
+drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
+-rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
+-rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
+-rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
+-rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst
+
+ +

The DATA/ subdirectory holds the above + directory structure, i.e. the real + net.sw stuff, and gets + automatically updated via rdist from time to + time. The second part of the problem remains: how to link + these two structures together into one smooth-looking URL + tree? We want to hide the DATA/ directory + from the user while running the appropriate CGI scripts + for the various URLs. Here is the solution: first I put + the following into the per-directory configuration file + in the DocumentRoot + of the server to rewrite the public URL path + /net.sw/ to the internal path + /e/netsw:

+ +
+RewriteRule  ^net.sw$       net.sw/        [R]
+RewriteRule  ^net.sw/(.*)$  e/netsw/$1
+
+ +

The first rule is for requests which miss the trailing + slash! The second rule does the real thing. And then + comes the killer configuration which stays in the + per-directory config file + /e/netsw/.www/.wwwacl:

+ +
+Options       ExecCGI FollowSymLinks Includes MultiViews
+
+RewriteEngine on
+
+#  we are reached via /net.sw/ prefix
+RewriteBase   /net.sw/
+
+#  first we rewrite the root dir to
+#  the handling cgi script
+RewriteRule   ^$                       netsw-home.cgi     [L]
+RewriteRule   ^index\.html$            netsw-home.cgi     [L]
+
+#  strip out the subdirs when
+#  the browser requests us from perdir pages
+RewriteRule   ^.+/(netsw-[^/]+/.+)$    $1                 [L]
+
+#  and now break the rewriting for local files
+RewriteRule   ^netsw-home\.cgi.*       -                  [L]
+RewriteRule   ^netsw-changes\.cgi.*    -                  [L]
+RewriteRule   ^netsw-search\.cgi.*     -                  [L]
+RewriteRule   ^netsw-tree\.cgi$        -                  [L]
+RewriteRule   ^netsw-about\.html$      -                  [L]
+RewriteRule   ^netsw-img/.*$           -                  [L]
+
+#  anything else is a subdir which gets handled
+#  by another cgi script
+RewriteRule   !^netsw-lsdir\.cgi.*     -                  [C]
+RewriteRule   (.*)                     netsw-lsdir.cgi/$1
+
+ +

Some hints for interpretation:

+ +
    +
1. Notice the L (last) flag and no + substitution field ('-') in the fourth part
2. Notice the ! (not) character and + the C (chain) flag at the first rule + in the last part
3. Notice the catch-all pattern in the last rule
+
+
+ +
top
+
+

Redirect Failing URLs to Another Web Server

+ + + +
+
Description:
+ +
+

A typical FAQ about URL rewriting is how to redirect + failing requests on webserver A to webserver B. Usually + this is done via ErrorDocument CGI scripts in Perl, but + there is also a mod_rewrite solution. + But note that this performs more poorly than using an + ErrorDocument + CGI script!

+
+ +
Solution:
+ +
+

The first solution has the best performance but less + flexibility, and is less safe:

+ +
+RewriteEngine on
+RewriteCond   /your/docroot/%{REQUEST_FILENAME} !-f
+RewriteRule   ^(.+)                             http://webserverB.dom/$1
+
+ +

The problem here is that this will only work for pages + inside the DocumentRoot. While you can add more + Conditions (for instance to also handle homedirs, etc.) + there is a better variant:

+ +
+RewriteEngine on
+RewriteCond   %{REQUEST_URI} !-U
+RewriteRule   ^(.+)          http://webserverB.dom/$1
+
+ +

This uses the URL look-ahead feature of mod_rewrite. + The result is that this will work for all types of URLs + and is safe. But it does have a performance impact on + the web server, because for every request there is one + more internal subrequest. So, if your web server runs on a + powerful CPU, use this one. If it is a slow machine, use + the first approach or better an ErrorDocument CGI script.

+
+
+ +
top
+
+

Archive Access Multiplexer

+ + + +
+
Description:
+ +
+

Do you know the great CPAN (Comprehensive Perl Archive + Network) under http://www.perl.com/CPAN? + CPAN automatically redirects browsers to one of many FTP + servers around the world (generally one near the requesting + client); each server carries a full CPAN mirror. This is + effectively an FTP access multiplexing service. + CPAN runs via CGI scripts, but how could a similar approach + be implemented via mod_rewrite?

+
+ +
Solution:
+ +
+

First we notice that as of version 3.0.0, + mod_rewrite can + also use the "ftp:" scheme on redirects. + And second, the location approximation can be done by a + RewriteMap + over the top-level domain of the client. + With a tricky chained ruleset we can use this top-level + domain as a key to our multiplexing map.

+ +
+RewriteEngine on
+RewriteMap    multiplex                txt:/path/to/map.cxan
+RewriteRule   ^/CxAN/(.*)              %{REMOTE_HOST}::$1                 [C]
+RewriteRule   ^.+\.([a-zA-Z]+)::(.*)$  ${multiplex:$1|ftp.default.dom}$2  [R,L]
+
+ +
+##
+##  map.cxan -- Multiplexing Map for CxAN
+##
+
+de        ftp://ftp.cxan.de/CxAN/
+uk        ftp://ftp.cxan.uk/CxAN/
+com       ftp://ftp.cxan.com/CxAN/
+ :
+##EOF##
+
+
+
+ +
top
+
+

Content Handling

+ + + +

Browser Dependent Content

+ + + +
+
Description:
+ +
+

At least for important top-level pages it is sometimes + necessary to provide the optimum of browser dependent + content, i.e., one has to provide one version for + current browsers, a different version for the Lynx and text-mode + browsers, and another for other browsers.

+
+ +
Solution:
+ +
+

We cannot use content negotiation because the browsers do + not provide their type in that form. Instead we have to + act on the HTTP header "User-Agent". The following config + does the following: If the HTTP header "User-Agent" + begins with "Mozilla/3", the page foo.html + is rewritten to foo.NS.html and the + rewriting stops. If the browser is "Lynx" or "Mozilla" of + version 1 or 2, the URL becomes foo.20.html. + All other browsers receive page foo.32.html. + This is done with the following ruleset:

+ +
+RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/3.*
+RewriteRule ^foo\.html$         foo.NS.html          [L]
+
+RewriteCond %{HTTP_USER_AGENT}  ^Lynx/.*         [OR]
+RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/[12].*
+RewriteRule ^foo\.html$         foo.20.html          [L]
+
+RewriteRule ^foo\.html$         foo.32.html          [L]
+
+
+
+ + + +

Dynamic Mirror

+ + + +
+
Description:
+ +
+

Assume there are nice web pages on remote hosts we want + to bring into our namespace. For FTP servers we would use + the mirror program which actually maintains an + explicit up-to-date copy of the remote data on the local + machine. For a web server we could use the program + webcopy which runs via HTTP. But both + techniques have a major drawback: The local copy is + always only as up-to-date as the last time we ran the program. It + would be much better if the mirror was not a static one we + have to establish explicitly. Instead we want a dynamic + mirror with data which gets updated automatically + as needed on the remote host(s).

+
+ +
Solution:
+ +
+

To provide this feature we map the remote web page or even + the complete remote web area to our namespace by the use + of the Proxy Throughput feature + (flag [P]):

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^hotsheet/(.*)$  http://www.tstimpreso.com/hotsheet/$1  [P]
+
+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^usa-news\.html$   http://www.quux-corp.com/news/index.html  [P]
+
+
+
+ + + +

Reverse Dynamic Mirror

+ + + +
+
Description:
+ +
The converse of the dynamic mirror above: we keep an + up-to-date local mirror of a remote site and serve proxy + requests for that site from the mirror whenever the mirrored + copy exists.
+ +
Solution:
+ +
+
+RewriteEngine on
+RewriteCond   /mirror/of/remotesite/$1           -U
+RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
+
+
+
+ + + +

Retrieve Missing Data from Intranet

+ + + +
+
Description:
+ +
+

This is a tricky way of virtually running a corporate + (external) Internet web server + (www.quux-corp.dom), while actually keeping + and maintaining its data on an (internal) Intranet web server + (www2.quux-corp.dom) which is protected by a + firewall. The trick is that the external web server retrieves + the requested data on-the-fly from the internal + one.

+
+ +
Solution:
+ +
+

First, we must make sure that our firewall still + protects the internal web server and only the + external web server is allowed to retrieve data from it. + On a packet-filtering firewall, for instance, we could + configure a firewall ruleset like the following:

+ +
+ALLOW Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port 80
+DENY  Host *                 Port *     --> Host www2.quux-corp.dom Port 80
+
+ +

Just adjust it to your actual configuration syntax. + Now we can establish the mod_rewrite + rules which request the missing data in the background + through the proxy throughput feature:

+ +
+RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2
+RewriteCond %{REQUEST_FILENAME}       !-f
+RewriteCond %{REQUEST_FILENAME}       !-d
+RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom/~$1/pub/$2 [P]
+
+
+
+ + + +

Load Balancing

+ + + +
+
Description:
+ +
+

Suppose we want to load balance the traffic to + www.foo.com over www[0-5].foo.com + (a total of 6 servers). How can this be done?

+
+ +
Solution:
+ +
+

There are many possible solutions for this problem. + We will first discuss a common DNS-based method, + and then one based on mod_rewrite:

+ +
    +
  1. + DNS Round-Robin + +

The simplest method for load-balancing is to use + DNS round-robin. + Here you just configure www[0-5].foo.com + as usual in your DNS with A (address) records, e.g.,

    + +
    +www0   IN  A       1.2.3.1
    +www1   IN  A       1.2.3.2
    +www2   IN  A       1.2.3.3
    +www3   IN  A       1.2.3.4
    +www4   IN  A       1.2.3.5
    +www5   IN  A       1.2.3.6
    +
    + +

    Then you additionally add the following entries:

    + +
    +www   IN  A       1.2.3.1
    +www   IN  A       1.2.3.2
    +www   IN  A       1.2.3.3
    +www   IN  A       1.2.3.4
+www   IN  A       1.2.3.5
+www   IN  A       1.2.3.6
    +
    + +

    Now when www.foo.com gets + resolved, BIND gives out www0-www5 + - but in a permutated (rotated) order every time. + This way the clients are spread over the various + servers. But notice that this is not a perfect load + balancing scheme, because DNS resolutions are + cached by clients and other nameservers, so + once a client has resolved www.foo.com + to a particular wwwN.foo.com, all its + subsequent requests will continue to go to the same + IP (and thus a single server), rather than being + distributed across the other available servers. But the + overall result is + okay because the requests are collectively + spread over the various web servers.

    +
2. + DNS Load-Balancing + +

A sophisticated DNS-based method for + load-balancing is to use the program + lbnamed which can be found at + http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html. + It is a Perl 5 program which, in conjunction with auxiliary + tools, provides real load-balancing via + DNS.

    +
3. + Proxy Throughput Round-Robin + +

    In this variant we use mod_rewrite + and its proxy throughput feature. First we dedicate + www0.foo.com to be actually + www.foo.com by using a single

    + +
    +www    IN  CNAME   www0.foo.com.
    +
    + +

    entry in the DNS. Then we convert + www0.foo.com to a proxy-only server, + i.e., we configure this machine so all arriving URLs + are simply passed through its internal proxy to one of + the 5 other servers (www1-www5). To + accomplish this we first establish a ruleset which + contacts a load balancing script lb.pl + for all URLs.

    + +
    +RewriteEngine on
    +RewriteMap    lb      prg:/path/to/lb.pl
    +RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]
    +
    + +

    Then we write lb.pl:

    + +
    +#!/path/to/perl
    +##
    +##  lb.pl -- load balancing script
    +##
    +
    +$| = 1;
    +
    +$name   = "www";     # the hostname base
    +$first  = 1;         # the first server (not 0 here, because 0 is myself)
    +$last   = 5;         # the last server in the round-robin
    +$domain = "foo.dom"; # the domainname
    +
    +$cnt = 0;
    +while (<STDIN>) {
    +    $cnt = (($cnt+1) % ($last+1-$first));
    +    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    +    print "http://$server/$_";
    +}
    +
    +##EOF##
    +
    + +
A last notice: Why is this useful? Seems like + www0.foo.com still is overloaded? The + answer is yes, it is overloaded, but only with plain proxy + throughput requests! All SSI, CGI, ePerl, etc. + processing is handled on the other machines. + For a complicated site, this may work well. The biggest + risk here is that www0 is now a single point of failure -- + if it crashes, the other servers are inaccessible.
    +
4. + Dedicated Load Balancers + +

    There are more sophisticated solutions, as well. Cisco, + F5, and several other companies sell hardware load + balancers (typically used in pairs for redundancy), which + offer sophisticated load balancing and auto-failover + features. There are software packages which offer similar + features on commodity hardware, as well. If you have + enough money or need, check these out. The lb-l mailing list is a + good place to research.

    +
+
+
+ + + +

New MIME-type, New Service

+ + + +
+
Description:
+ +
+

On the net there are many nifty CGI programs. But + their usage is usually boring, so a lot of webmasters + don't use them. Even Apache's Action handler feature for + MIME-types is only appropriate when the CGI programs + don't need special URLs (actually PATH_INFO + and QUERY_STRINGS) as their input. First, + let us configure a new file type with extension + .scgi (for secure CGI) which will be processed + by the popular cgiwrap program. The problem + here is that for instance if we use a Homogeneous URL Layout + (see above) a file inside the user homedirs might have a URL + like /u/user/foo/bar.scgi, but + cgiwrap needs URLs in the form + /~user/foo/bar.scgi/. The following rule + solves the problem:

+ +
+RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
+... /internal/cgi/user/cgiwrap/~$1/$2.scgi$3  [NS,T=application/x-httpd-cgi]
+
+ +

Or assume we have some more nifty programs: + wwwlog (which displays the + access.log for a URL subtree) and + wwwidx (which runs Glimpse on a URL + subtree). We have to provide the URL area to these + programs so they know which area they are really working with. + But usually this is complicated, because they may still be + requested by the alternate URL form, i.e., typically we would + run the wwwidx program from within + /u/user/foo/ via hyperlink to

+ +
+/internal/cgi/user/wwwidx?i=/u/user/foo/
+
+ +

which is ugly, because we have to hard-code + both the location of the area + and the location of the CGI inside the + hyperlink. When we have to reorganize, we spend a + lot of time changing the various hyperlinks.

+
+ +
Solution:
+ +
+

The solution here is to provide a special new URL format + which automatically leads to the proper CGI invocation. + We configure the following:

+ +
+RewriteRule   ^/([uge])/([^/]+)(/?.*)/\*  /internal/cgi/user/wwwidx?i=/$1/$2$3/
+RewriteRule   ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
+
+ +

Now the hyperlink to search at + /u/user/foo/ reads only

+ +
+HREF="*"
+
+ +

which internally gets automatically transformed to

+ +
+/internal/cgi/user/wwwidx?i=/u/user/foo/
+
+ +

The same approach leads to an invocation for the + access log CGI program when the hyperlink + :log gets used.

+
+
+ + + +

On-the-fly Content-Regeneration

+ + + +
+
Description:
+ +
+

Here comes a really esoteric feature: Dynamically + generated but statically served pages, i.e., pages should be + delivered as pure static pages (read from the filesystem + and just passed through), but they have to be generated + dynamically by the web server if missing. This way you can + have CGI-generated pages which are statically served unless an + admin (or a cron job) removes the static contents. Then the + content gets refreshed.

+
+ +
Solution:
+ +
+ This is done via the following ruleset: + +
+RewriteCond %{REQUEST_FILENAME}   !-s
+RewriteRule ^page\.html$          page.cgi   [T=application/x-httpd-cgi,L]
+
+ +

Here a request for page.html leads to an + internal run of a corresponding page.cgi if + page.html is missing or has a filesize of + zero. The trick here is that page.cgi is a + CGI script which (additionally to its STDOUT) + writes its output to the file page.html. + Once it has completed, the server sends out + page.html. When the webmaster wants to force + a refresh of the contents, he just removes + page.html (typically from cron).

+
+
+ + + +

Document With Autorefresh

+ + + +
+
Description:
+ +
+

Wouldn't it be nice, while creating a complex web page, if + the web browser would automatically refresh the page every + time we save a new version from within our editor? + Impossible?

+
+ +
Solution:
+ +
+

No! We just combine the MIME multipart feature, the + web server NPH feature, and the URL manipulation power of + mod_rewrite. First, we establish a new + URL feature: Adding just :refresh to any + URL causes the 'page' to be refreshed every time it is + updated on the filesystem.

+ +
+RewriteRule   ^(/[uge]/[^/]+/?.*):refresh  /internal/cgi/apache/nph-refresh?f=$1
+
+ +

Now when we reference the URL

+ +
+/u/foo/bar/page.html:refresh
+
+ +

this leads to the internal invocation of the URL

+ +
+/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
+
+ +

The only missing part is the NPH-CGI script. Although + one would usually say "left as an exercise to the reader" + ;-) I will provide this, too.

+ +
+#!/sw/bin/perl
+##
+##  nph-refresh -- NPH/CGI script for auto refreshing pages
+##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
+##
+$| = 1;
+
+#   split the QUERY_STRING variable
+@pairs = split(/&/, $ENV{'QUERY_STRING'});
+foreach $pair (@pairs) {
+    ($name, $value) = split(/=/, $pair);
+    $name =~ tr/A-Z/a-z/;
+    $name = 'QS_' . $name;
+    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
+    eval "\$$name = \"$value\"";
+}
+$QS_s = 1 if ($QS_s eq '');
+$QS_n = 3600 if ($QS_n eq '');
+if ($QS_f eq '') {
+    print "HTTP/1.0 200 OK\n";
+    print "Content-type: text/html\n\n";
+    print "<b>ERROR</b>: No file given\n";
+    exit(0);
+}
+if (! -f $QS_f) {
+    print "HTTP/1.0 200 OK\n";
+    print "Content-type: text/html\n\n";
+    print "<b>ERROR</b>: File $QS_f not found\n";
+    exit(0);
+}
+
+sub print_http_headers_multipart_begin {
+    print "HTTP/1.0 200 OK\n";
+    $bound = "ThisRandomString12345";
+    print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
+    &print_http_headers_multipart_next;
+}
+
+sub print_http_headers_multipart_next {
+    print "\n--$bound\n";
+}
+
+sub print_http_headers_multipart_end {
+    print "\n--$bound--\n";
+}
+
+sub displayhtml {
+    local($buffer) = @_;
+    $len = length($buffer);
+    print "Content-type: text/html\n";
+    print "Content-length: $len\n\n";
+    print $buffer;
+}
+
+sub readfile {
+    local($file) = @_;
+    local(*FP, $size, $buffer, $bytes);
+    ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
+    $size = sprintf("%d", $size);
+    open(FP, "<$file");
+    $bytes = sysread(FP, $buffer, $size);
+    close(FP);
+    return $buffer;
+}
+
+$buffer = &readfile($QS_f);
+&print_http_headers_multipart_begin;
+&displayhtml($buffer);
+
+sub mystat {
+    local($file) = $_[0];
+    local($time);
+
+    ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
+    return $mtime;
+}
+
+$mtimeL = &mystat($QS_f);
+$mtime = $mtimeL;
+for ($n = 0; $n < $QS_n; $n++) {
+    while (1) {
+        $mtime = &mystat($QS_f);
+        if ($mtime ne $mtimeL) {
+            $mtimeL = $mtime;
+            sleep(2);
+            $buffer = &readfile($QS_f);
+            &print_http_headers_multipart_next;
+            &displayhtml($buffer);
+            sleep(5);
+            $mtimeL = &mystat($QS_f);
+            last;
+        }
+        sleep($QS_s);
+    }
+}
+
+&print_http_headers_multipart_end;
+
+exit(0);
+
+##EOF##
+
+
+
+ + + +

Mass Virtual Hosting

+ + + +
+
Description:
+ +
+

The <VirtualHost> feature of Apache is nice + and works great when you just have a few dozen + virtual hosts. But when you are an ISP and have hundreds of + virtual hosts, this feature is suboptimal.

+
+ +
Solution:
+ +
+

To provide this feature we map each virtual host name to its + document root via a RewriteMap and rewrite the + requested URL into that document root:

+ +
+##
+##  vhost.map
+##
+www.vhost1.dom:80  /path/to/docroot/vhost1
+www.vhost2.dom:80  /path/to/docroot/vhost2
+     :
+www.vhostN.dom:80  /path/to/docroot/vhostN
+
+ +
+##
+##  httpd.conf
+##
+    :
+#   use the canonical hostname on redirects, etc.
+UseCanonicalName on
+
+    :
+#   add the virtual host in front of the CLF-format
+CustomLog  /path/to/access_log  "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
+    :
+
+#   enable the rewriting engine in the main server
+RewriteEngine on
+
+#   define two maps: one for fixing the URL and one which defines
+#   the available virtual hosts with their corresponding
+#   DocumentRoot.
+RewriteMap    lowercase    int:tolower
+RewriteMap    vhost        txt:/path/to/vhost.map
+
+#   Now do the actual virtual host mapping
+#   via a huge and complicated single rule:
+#
+#   1. make sure we don't map for common locations
+RewriteCond   %{REQUEST_URI}  !^/commonurl1/.*
+RewriteCond   %{REQUEST_URI}  !^/commonurl2/.*
+    :
+RewriteCond   %{REQUEST_URI}  !^/commonurlN/.*
+#
+#   2. make sure we have a Host header, because
+#      currently our approach only supports
+#      virtual hosting through this header
+RewriteCond   %{HTTP_HOST}  !^$
+#
+#   3. lowercase the hostname
+RewriteCond   ${lowercase:%{HTTP_HOST}|NONE}  ^(.+)$
+#
+#   4. lookup this hostname in vhost.map and
+#      remember it only when it is a path
+#      (and not "NONE" from above)
+RewriteCond   ${vhost:%1}  ^(/.*)$
+#
+#   5. finally we can map the URL to its docroot location
+#      and remember the virtual host for logging purposes
+RewriteRule   ^/(.*)$   %1/$1  [E=VHOST:${lowercase:%{HTTP_HOST}}]
+    :
+
+
+
+ + + +
top
+
+

Access Restriction

+ + + +

Host Deny

+ + + +
+
Description:
+ +
+

How can we forbid a list of externally configured hosts + from using our server?

+
+ +
Solution:
+ +
+

For Apache >= 1.3b6:

+ +
+RewriteEngine on
+RewriteMap    hosts-deny  txt:/path/to/hosts.deny
+RewriteCond   ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
+RewriteCond   ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
+RewriteRule   ^/.*  -  [F]
+
+ +

For Apache <= 1.3b6:

+ +
+RewriteEngine on
+RewriteMap    hosts-deny  txt:/path/to/hosts.deny
+RewriteRule   ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
+RewriteRule   !^NOT-FOUND/.* - [F]
+RewriteRule   ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
+RewriteRule   !^NOT-FOUND/.* - [F]
+RewriteRule   ^NOT-FOUND/(.*)$ /$1
+
+ +
+##
+##  hosts.deny
+##
+##  ATTENTION! This is a map, not a list, even when we treat it as such.
+##             mod_rewrite parses it for key/value pairs, so at least a
+##             dummy value "-" must be present for each entry.
+##
+
+193.102.180.41 -
+bsdti1.sdm.de  -
+192.76.162.40  -
+
+
+
+ + + +

Proxy Deny

+ + + +
+
Description:
+ +
+

How can we forbid a certain host or even a user of a + special host from using the Apache proxy?

+
+ +
Solution:
+ +
+

We first have to make sure mod_rewrite + is below(!) mod_proxy in the Configuration + file when compiling the Apache web server. This way it gets + called before mod_proxy. Then we + configure the following for a host-dependent deny...

+ +
+RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
+ +

...and this one for a user@host-dependent deny:

+ +
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}  ^badguy@badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
+
+
+ + + +

Special Authentication Variant

+ + + +
+
Description:
+ +
+

Sometimes very special authentication is needed, for + instance authentication which checks for a set of + explicitly configured users. Only these should be granted + access, and without explicit prompting (which would occur + when using Basic Auth via mod_auth).

+
+ +
Solution:
+ +
+

We use a list of rewrite conditions to exclude all except + our friends:

+ +
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend1@client1.quux-corp\.com$
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend2@client2.quux-corp\.com$
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend3@client3.quux-corp\.com$
+RewriteRule ^/~quux/only-for-friends/      -                                 [F]
+
+
+
+ + + +

Referer-based Deflector

+ + + +
+
Description:
+ +
+

How can we program a flexible URL Deflector which acts + on the "Referer" HTTP header and can be configured with as + many referring pages as we like?

+
+ +
Solution:
+ +
+

Use the following really tricky ruleset...

+ +
+RewriteMap  deflector txt:/path/to/deflector.map
+
+RewriteCond %{HTTP_REFERER} !=""
+RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
+RewriteRule ^.* %{HTTP_REFERER} [R,L]
+
+RewriteCond %{HTTP_REFERER} !=""
+RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
+RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]
+
+ +

... in conjunction with a corresponding rewrite + map:

+ +
+##
+##  deflector.map
+##
+
+http://www.badguys.com/bad/index.html    -
+http://www.badguys.com/bad/index2.html   -
+http://www.badguys.com/bad/index3.html   http://somewhere.com/
+
+ +

This automatically redirects the request back to the + referring page (when "-" is used as the value + in the map) or to a specific URL (when a URL is specified + in the map as the second argument).

+
+
+ + + +
+
+

Available Languages:  en 

+
+ \ No newline at end of file diff --git a/rubbos/app/apache2/manual/rewrite/rewrite_intro.html b/rubbos/app/apache2/manual/rewrite/rewrite_intro.html new file mode 100644 index 00000000..e6e697d2 --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/rewrite_intro.html @@ -0,0 +1,5 @@ +# GENERATED FROM XML -- DO NOT EDIT + +URI: rewrite_intro.html.en +Content-Language: en +Content-type: text/html; charset=ISO-8859-1 diff --git a/rubbos/app/apache2/manual/rewrite/rewrite_intro.html.en b/rubbos/app/apache2/manual/rewrite/rewrite_intro.html.en new file mode 100644 index 00000000..32f666e1 --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/rewrite_intro.html.en @@ -0,0 +1,117 @@ + + + +Apache mod_rewrite Introduction - Apache HTTP Server + + + + + +
<-
+

Apache mod_rewrite Introduction

+
+

Available Languages:  en 

+
+ +

This document supplements the mod_rewrite +reference documentation. It +describes the basic concepts necessary for use of +mod_rewrite. Other documents go into greater detail, +but this doc should help the beginner get their feet wet. +

+
+ +
+
+

Introduction

+

The Apache module mod_rewrite is a very powerful and sophisticated module which provides a way to do URL manipulations. With it, you can do nearly all types of URL rewriting that you may need. It is, however, somewhat complex, and may be intimidating to the beginner. There is also a tendency to treat rewrite rules as magic incantations, using them without actually understanding what they do.

+ +

This document attempts to give sufficient background so that what follows is understood, rather than just copied blindly.

+
+
+

Regular Expressions

+

Basic regex building blocks
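As a quick, partial reminder (a condensed sketch, not the full reference table): . matches any single character, ? makes the preceding item optional, * repeats it zero or more times, + one or more times, ^ and $ anchor the match at the start and end, and parentheses capture text for back-references such as $1. For instance (the /show.php target is an invented example):

+# ^ and $ anchor the pattern; ([0-9]+) captures one or more digits as $1
+RewriteRule ^/products/([0-9]+)$  /show.php?id=$1  [L]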

+
+
+

RewriteRule basics

+

Basic anatomy of a RewriteRule, with exhaustively annotated simple examples.
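As a minimal sketch of that anatomy (names invented for illustration), a RewriteRule consists of a Pattern matched against the URL, a Substitution, and optional flags:

+#           Pattern        Substitution   Flags
+RewriteRule ^/old\.html$   /new.html      [R=301,L]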

+
+
+

Rewrite Flags

+

Discussion of the flags to RewriteRule, and when and why one might use them.
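As a taste of the most common ones (targets invented for this sketch): [R] issues an external redirect (optionally with a status code), [L] stops processing further rules, [F] answers with Forbidden, and [NC] makes the pattern case-insensitive:

+RewriteRule ^/webcam$   http://example.com/live/  [R=301,L]
+RewriteRule ^/private/  -                         [F]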

+
+
+

Rewrite conditions

+

Discussion of RewriteCond, looping, and other related concepts.
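In short, a RewriteCond attaches a precondition to the RewriteRule that immediately follows it; the rule fires only if the condition's TestString matches its CondPattern. A minimal sketch (host names invented):

+RewriteCond %{HTTP_HOST}  ^old\.example\.com$ [NC]
+RewriteRule ^(.*)$        http://new.example.com$1 [R,L]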

+
+
+

Rewrite maps

+

Discussion of RewriteMap, including simple, but heavily annotated, examples.
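For a first taste (file path and keys invented for this sketch), a txt: map is a plain file of key/value pairs which rules consult with the ${mapname:key|default} syntax:

+RewriteMap  legacy  txt:/path/to/legacy.map
+RewriteRule ^/go/(.*)$  ${legacy:$1|/index.html}  [R,L]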

+
+
+

.htaccess files

+

Discussion of the differences between rewrite rules in httpd.conf and in .htaccess files.
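The most visible difference, sketched here with invented paths: in per-directory context the local directory prefix is stripped from the URL before the Pattern is matched, so a rule that needs a leading /dir/ in httpd.conf is written without it in .htaccess, usually together with RewriteBase:

+# httpd.conf (per-server context)
+RewriteRule ^/foo/old\.html$  /foo/new.html
+
+# .htaccess inside /foo (per-directory context)
+RewriteEngine On
+RewriteBase   /foo
+RewriteRule   ^old\.html$  new.html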

+
+
+

Environment Variables

+ +

This module keeps track of two additional (non-standard) CGI/SSI environment variables named SCRIPT_URL and SCRIPT_URI. These contain the logical Web-view to the current resource, while the standard CGI/SSI variables SCRIPT_NAME and SCRIPT_FILENAME contain the physical System-view.

+ +

Notice: These variables hold the URI/URL as they were initially requested, i.e., before any rewriting. This is important because the rewriting process is primarily used to rewrite logical URLs to physical pathnames.

+ +

Example

+SCRIPT_NAME=/sw/lib/w3s/tree/global/u/rse/.www/index.html
+SCRIPT_FILENAME=/u/rse/.www/index.html
+SCRIPT_URL=/u/rse/
+SCRIPT_URI=http://en1.engelschall.com/u/rse/
+
+ +
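As a rough sketch of how such a logical/physical split can arise (this rule is an assumption constructed to match the values above, not the actual configuration of that server), a per-server rule could map the logical URL into the physical tree before SCRIPT_NAME and SCRIPT_FILENAME are computed:

+# Hypothetical per-server rule matching the example values above
+RewriteRule ^/u/([^/]+)/?(.*)$  /sw/lib/w3s/tree/global/u/$1/.www/$2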
+
+


+
+ \ No newline at end of file diff --git a/rubbos/app/apache2/manual/rewrite/rewrite_tech.html b/rubbos/app/apache2/manual/rewrite/rewrite_tech.html new file mode 100644 index 00000000..18b37ed7 --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/rewrite_tech.html @@ -0,0 +1,5 @@ +# GENERATED FROM XML -- DO NOT EDIT + +URI: rewrite_tech.html.en +Content-Language: en +Content-type: text/html; charset=ISO-8859-1 diff --git a/rubbos/app/apache2/manual/rewrite/rewrite_tech.html.en b/rubbos/app/apache2/manual/rewrite/rewrite_tech.html.en new file mode 100644 index 00000000..20c83e6c --- /dev/null +++ b/rubbos/app/apache2/manual/rewrite/rewrite_tech.html.en @@ -0,0 +1,166 @@ + + + +Apache mod_rewrite Technical Details - Apache HTTP Server + + + + + +
+

Apache mod_rewrite Technical Details

+
+


+
+ +

This document discusses some of the technical details of mod_rewrite and URL matching.

+
+ +
+
+

Internal Processing

+ +

The internal processing of this module is very complex, but it needs to be explained once even to the average user to avoid common mistakes and to let you exploit its full functionality.

+
+
+

API Phases

+ +

First you have to understand that when Apache processes an HTTP request, it does so in phases. A hook for each of these phases is provided by the Apache API. mod_rewrite uses two of these hooks: the URL-to-filename translation hook, which is used after the HTTP request has been read but before any authorization starts, and the Fixup hook, which is triggered after the authorization phases and after the per-directory config files (.htaccess) have been read, but before the content handler is activated.

+ +

So, after a request comes in and Apache has determined the corresponding server (or virtual server), the rewriting engine starts processing all mod_rewrite directives from the per-server configuration in the URL-to-filename phase. A few steps later, when the final data directories are found, the per-directory configuration directives of mod_rewrite are triggered in the Fixup phase. In both situations mod_rewrite rewrites URLs, either to new URLs or to filenames, although there is no obvious distinction between them. This is a use of the API which was not intended when the API was designed, but as of Apache 1.x this is the only way mod_rewrite can operate. To make this point clearer, remember the following two points:

+ +
  1. Although mod_rewrite rewrites URLs to URLs, URLs to filenames and even filenames to filenames, the API currently provides only a URL-to-filename hook. In Apache 2.0 the two missing hooks will be added to make the processing clearer. But this point has no drawbacks for the user; it is just a fact which should be remembered: Apache does more in the URL-to-filename hook than the API intends for it.

  2. Unbelievably, mod_rewrite provides URL manipulations in per-directory context, i.e., within .htaccess files, although these are reached a very long time after the URLs have been translated to filenames. It has to be this way, because .htaccess files live in the filesystem, so processing has already reached this stage. In other words: according to the API phases, at this point it is too late for any URL manipulations. To overcome this chicken-and-egg problem, mod_rewrite uses a trick: when you manipulate a URL/filename in per-directory context, mod_rewrite first rewrites the filename back to its corresponding URL (which is usually impossible, but see the RewriteBase directive below for the trick used to achieve this) and then initiates a new internal sub-request with the new URL. This restarts processing of the API phases (a configuration sketch follows this list).

     Again, mod_rewrite tries hard to make this complicated step totally transparent to the user, but you should remember here: while URL manipulations in per-server context are really fast and efficient, per-directory rewrites are slow and inefficient due to this chicken-and-egg problem. On the other hand, this is the only way mod_rewrite can provide (locally restricted) URL manipulations to the average user.
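As an illustration of the per-directory trick described in point 2 (directory and file names are invented for this sketch), a .htaccess file uses RewriteBase to tell mod_rewrite which local URL prefix to put back when it maps the filename to a URL for the internal sub-request:

+#  .htaccess in /abc/def/ (an invented path)
+RewriteEngine On
+RewriteBase   /abc/def
+RewriteRule   ^oldstuff\.html$  newstuff.html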

Don't forget these two points!

+
+
+

Ruleset Processing

+ +

Now when mod_rewrite is triggered in these two API phases, it reads the configured rulesets from its configuration structure (which itself was either created on startup for per-server context or during the directory walk of the Apache kernel for per-directory context). Then the URL rewriting engine is started with the contained ruleset (one or more rules together with their conditions). The operation of the URL rewriting engine itself is exactly the same for both configuration contexts. Only the final result processing is different.

+ +

The order of rules in the ruleset is important because the rewriting engine processes them in a special (and not very obvious) order. The rule is this: the rewriting engine loops through the ruleset rule by rule (RewriteRule directives) and, when a particular rule matches, it optionally loops through the existing corresponding conditions (RewriteCond directives). For historical reasons the conditions are given first, and so the control flow is a little bit long-winded. See Figure 1 for more details.

+

Figure 1: The control flow through the rewriting ruleset (the diagram itself is not reproducible in this text rendering)

+

As you can see, first the URL is matched against the Pattern of each rule. When it fails, mod_rewrite immediately stops processing this rule and continues with the next rule. If the Pattern matches, mod_rewrite looks for corresponding rule conditions. If none are present, it simply substitutes the URL with a new value constructed from the string Substitution and goes on with its rule-looping. But if conditions exist, it starts an inner loop for processing them in the order in which they are listed. For conditions the logic is different: we don't match a pattern against the current URL. Instead, we first create a string TestString by expanding variables, back-references, map lookups, etc., and then we try to match CondPattern against it. If the pattern doesn't match, the complete set of conditions and the corresponding rule fails. If the pattern matches, then the next condition is processed until no more conditions are available. If all conditions match, processing continues with the substitution of the URL with Substitution.
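To see this loop in action, consider the following small per-server sketch (host, hours and paths invented): the two conditions belong only to the first rule, so if either fails, the engine simply falls through and tries the second rule's pattern against the still-unrewritten URL:

+# Day page only on www.example.com between 09:00 and 17:59
+RewriteCond %{HTTP_HOST}  ^www\.example\.com$
+RewriteCond %{TIME_HOUR}  ^(09|1[0-7])$
+RewriteRule ^/status$     /status-day.html    [L]
+RewriteRule ^/status$     /status-night.html  [L]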

+ +
+
+


+
+ \ No newline at end of file