source: branches/streams/abcl/doc/design/pathnames/url-pathnames.markdown

Last change on this file was 13353, checked in by Mark Evenson, 14 years ago

Fix problems with whitespace in JAR-PATHNAME.

For dealing with URI Encoding (also known as [Percent Encoding][]), we
implement the following rules, which were previously implicit.

[Percent Encoding]: http://en.wikipedia.org/wiki/Percent-encoding

  1. All pathname components are represented "as is" without escaping.
  1. Namestrings are suitably escaped if the Pathname is a URL-PATHNAME
     or a JAR-PATHNAME.
  1. Namestrings should all "round-trip":

         (when (typep p 'pathname)
           (equal (namestring p)
                  (namestring (pathname p))))

Users may use EXT:URI-ENCODE and EXT:URI-DECODE to access the escaping
rules in circumstances where they wish to manipulate PATHNAME
namestrings more directly.

All tests in JAR-PATHNAMES now pass.

Constructors for PATHNAME now produce ERROR rather than FILE-ERROR as
CLHS says "The type file-error consists of error conditions that occur
during an attempt to open or close a file, or during some low-level
transactions with a file system," which doesn't apply here.

URL Pathnames ABCL
==================

    Mark Evenson
    Created:  25 MAR 2010
    Modified: 21 JUN 2011

Notes towards an implementation of URL references to be contained in
Common Lisp `PATHNAME` objects within ABCL.


References
----------

RFC3986   Uniform Resource Identifier (URI): Generic Syntax


URL vs URI
----------

We use the term URL as shorthand in describing the URL Pathnames, even
though the corresponding encoding is more akin to a URI as described
in RFC3986.


Goals
-----

1.  Use Common Lisp pathnames to refer to representations referenced
by a URL.

2.  The URL schemes supported shall include at least "http", and those
enabled by the URLStreamHandler extension mechanism.

3.  Use URL schemes that are understood by the java.net.URL object.

    Example of a Pathname specified by URL:

        #p"http://example.org/org/armedbear/systems/pgp.asd"

4.  MERGE-PATHNAMES works with URL pathnames:

        (merge-pathnames "url.asd"
            "http://example/org/armedbear/systems/pgp.asd")
        ==> "http://example/org/armedbear/systems/url.asd"

5.  PROBE-FILE returns the state of URL accessibility.

6.  TRUENAME is "aliased" to PROBE-FILE, signalling an error if the URL is
not accessible (see "Non-goal 1").

7.  DIRECTORY works for non-wildcards.

8.  URL pathnames work as a valid argument for OPEN with :DIRECTION :INPUT.

9.  Enable the loading of ASDF2 systems referenced by a URL pathname.

10.  Pathnames constructed with the "file" scheme
(i.e. #p"file:/this/file") need to be properly URI encoded according
to RFC3986 and will otherwise signal FILE-ERROR.

11.  The "file" scheme will continue to be represented by an
"ordinary" Pathname.  Thus, after construction of a URL Pathname with
the "file" scheme, the namestring of the resulting PATHNAME will no
longer contain the "file:" prefix.

12.  The "jar" scheme will continue to be represented by a jar
Pathname.


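As a rough illustration of goals 4 and 11, the following sketch shows how the behaviour might be exercised from a REPL. It is only meaningful on ABCL, so the forms are guarded with `#+abcl`; the printed results are whatever the implementation returns, and the comments restate the goals rather than verified output.

```lisp
;;; Sketch only: assumes an ABCL build in which URL pathnames are available.
;;; On other implementations the reader skips the #+abcl forms.

;; Goal 4: NAME and TYPE come from "url.asd", everything else from the URL.
#+abcl
(let ((merged (merge-pathnames "url.asd"
                               "http://example/org/armedbear/systems/pgp.asd")))
  (format t "~A~%" (namestring merged)))

;; Goal 11: a "file" URL is represented by an ordinary pathname, so its
;; namestring is expected to lack the "file:" prefix.
#+abcl
(format t "~A~%" (namestring (pathname "file:/this/file")))
```
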
Non-goals
---------

1.  We will not implement canonicalization of URL schemas (such as
following "http" redirects).

2.  DIRECTORY will not work for URL pathnames containing wildcards.


Implementation
--------------

A PATHNAME referring to a resource referenced by a URL is known as a
URL PATHNAME.

A URL PATHNAME always has a HOST component which is a proper list.
This list will be a property list (plist).  The property list
values must be character strings.

    :SCHEME
        Scheme of URI ("http", "ftp", "bundle", etc.)
    :AUTHORITY
        Valid authority according to the URI scheme.  For "http" this
        could be "example.org:8080".
    :QUERY
        The query of the URI
    :FRAGMENT
        The fragment portion of the URI

The DIRECTORY, NAME and TYPE fields of the PATHNAME are used to form
the URI `path` according to the conventions of the UNIX filesystem
(i.e. '/' is the directory separator).  In a sense the HOST contains
the base URL, to which the `path` is a relative URL (although this
abstraction is violated somewhat by the storing of the QUERY and
FRAGMENT portions of the URI in the HOST component).

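To make the division of labour between HOST and the path components concrete, here is a minimal sketch in portable Common Lisp. `URL-STRING-FROM-PARTS` is a hypothetical helper written for this note, not an ABCL function. It assumes that the QUERY and FRAGMENT values are stored without their leading "?" and "#", and that the scheme is followed by "://"; neither detail is specified above.

```lisp
;;; Hypothetical helper, for illustration only (not part of ABCL).
(defun url-string-from-parts (host directory name type)
  "Assemble a URL string from a HOST plist and the path components.
DIRECTORY is a list of directory name strings, taken as absolute."
  (with-output-to-string (out)
    ;; The "base URL" part comes entirely from the HOST plist.
    (format out "~A://~A" (getf host :scheme) (getf host :authority))
    ;; The path: '/' separates directories, as on a UNIX filesystem.
    (format out "/~{~A/~}" directory)
    (when name (write-string name out))
    (when type (format out ".~A" type))
    ;; QUERY and FRAGMENT also live in the HOST plist, not in the path.
    (when (getf host :query) (format out "?~A" (getf host :query)))
    (when (getf host :fragment) (format out "#~A" (getf host :fragment)))))

(url-string-from-parts '(:scheme "http" :authority "example.org:8080"
                         :query "rev=1" :fragment "goals")
                       '("org" "armedbear" "systems") "pgp" "asd")
;; => "http://example.org:8080/org/armedbear/systems/pgp.asd?rev=1#goals"
```
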
For the purposes of PATHNAME-MATCH-P, two URL pathnames may be said to
match if their HOST components are EQUAL, and all other components are
considered to match according to the existing rules for Pathnames.

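Because the HOST is a plist of strings, comparing it with EQUAL has two consequences worth keeping in mind: the order of the properties matters, and the strings are compared case-sensitively. The plists below are made-up examples, and whether ABCL always builds the plist in one fixed order is not stated in this note.

```lisp
(let ((a (list :scheme "http" :authority "example.org"))
      (b (list :scheme "http" :authority "example.org"))
      (c (list :authority "example.org" :scheme "http"))
      (d (list :scheme "http" :authority "EXAMPLE.ORG")))
  (list (equal a b)    ; same keys and strings in the same order
        (equal a c)    ; same properties, different order
        (equal a d)))  ; EQUAL compares strings case-sensitively
;; => (T NIL NIL)
```
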
A URL pathname must have a DEVICE whose value is NIL.

Upon creation, any ".." and "." components in the DIRECTORY are
removed.  The DIRECTORY component, if present, is always absolute.

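The removal of "." and ".." can be pictured with a small portable sketch. `REMOVE-DOT-COMPONENTS` is a hypothetical name, and the function works on a plain list of directory strings; it is not ABCL's code. Common Lisp pathname directories may also express a parent reference with the keywords :UP or :BACK, which this sketch does not handle.

```lisp
;;; Hypothetical helper, for illustration only (not part of ABCL).
(defun remove-dot-components (directories)
  "Return DIRECTORIES, a list of strings, with each \".\" dropped and each
\"..\" cancelling the component before it."
  (let ((result '()))
    (dolist (dir directories (nreverse result))
      (cond ((string= dir ".") nil)             ; "." names the same directory
            ((string= dir "..") (pop result))   ; at the root there is nothing to pop
            (t (push dir result))))))

(remove-dot-components '("a" "." "b" ".." "c"))
;; => ("a" "c")
```
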
The namestring of a URL pathname shall be formed by the usual
conventions of a URL.

A URL Pathname has type URL-PATHNAME, derived from PATHNAME.


URI Encoding
------------

For dealing with URI Encoding (also known as [Percent Encoding][]), we
adopt the following rules:

[Percent Encoding]: http://en.wikipedia.org/wiki/Percent-encoding

1.  All pathname components are represented "as is" without escaping.

2.  Namestrings are suitably escaped if the Pathname is a URL-PATHNAME
    or a JAR-PATHNAME.

3.  Namestrings should all "round-trip":

        (when (typep p 'pathname)
           (equal (namestring p)
                  (namestring (pathname p))))


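The split between unescaped components and escaped namestrings can be illustrated with a small percent-encoding sketch in portable Common Lisp. The function names are hypothetical, and this is not the implementation behind ABCL's EXT:URI-ENCODE and EXT:URI-DECODE. It assumes ASCII input and assumes that "/" and the RFC3986 unreserved characters are left alone; which characters ABCL actually escapes is not specified here.

```lisp
;;; Hypothetical helpers, for illustration only (ASCII input assumed).
(defun unreserved-char-p (char)
  "True for RFC3986 unreserved characters: ASCII letters, digits, - . _ ~"
  (or (char<= #\a char #\z)
      (char<= #\A char #\Z)
      (char<= #\0 char #\9)
      (find char "-._~")))

(defun percent-encode-sketch (string)
  "Escape every character of STRING except unreserved ones and '/'."
  (with-output-to-string (out)
    (loop for char across string
          do (if (or (unreserved-char-p char) (char= char #\/))
                 (write-char char out)
                 (format out "%~2,'0X" (char-code char))))))

(defun percent-decode-sketch (string)
  "Inverse of PERCENT-ENCODE-SKETCH for ASCII input."
  (with-output-to-string (out)
    (loop with i = 0
          while (< i (length string))
          do (let ((char (char string i)))
               (cond ((and (char= char #\%) (<= (+ i 3) (length string)))
                      ;; "%XX": read the two hex digits as a character code.
                      (write-char (code-char (parse-integer string
                                                            :start (+ i 1)
                                                            :end (+ i 3)
                                                            :radix 16))
                                  out)
                      (incf i 3))
                     (t
                      (write-char char out)
                      (incf i)))))))

;; The component keeps its space; only the namestring form is escaped.
(percent-encode-sketch "/a dir/file name.txt")
;; => "/a%20dir/file%20name.txt"
```
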
Status
------

This design has been implemented.


History
-------

26 NOV 2010 Changed implementation to use URI encodings for the "file"
  schemes, including those nested within the "jar" scheme,
  e.g. "jar:file:/location/of/some.jar!/".

21 JUN 2011 Fixed implementation to properly handle URI encodings
  referring to nested jar archives.