source: trunk/abcl/doc/design/pathnames/notes.tex

Last change on this file was 13611, checked in by Mark Evenson, 13 years ago

Start article describing the implementation of URL-PATHNAME.

\begin{verbatim}
JARs and JAR entries in ABCL
============================

    Mark Evenson
    Created:  09 JAN 2010
    Modified: 21 JUN 2011

Notes towards an implementation of "jar:" references to be contained
in Common Lisp `PATHNAME`s within ABCL.

Goals
-----

1.  Use Common Lisp pathnames to refer to entries in a jar file.

2.  Use the `'jar:'` scheme as documented in [`java.net.JarURLConnection`][jarURLConnection] for
    namestring representation.

    An entry in a JAR file:

        #p"jar:file:baz.jar!/foo"

    A JAR file:

        #p"jar:file:baz.jar!/"

    A JAR file accessible via URL:

        #p"jar:http://example.org/abcl.jar!/"

    An entry in an ABCL FASL in a URL-accessible JAR file:

        #p"jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"

[jarURLConnection]: http://java.sun.com/javase/6/docs/api/java/net/JarURLConnection.html

3.  `MERGE-PATHNAMES` working for jar entries in the following use cases:

        (merge-pathnames "foo-1.cls" "jar:jar:file:baz.jar!/foo.abcl!/foo._")
        ==> "jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls"

        (merge-pathnames "foo-1.cls" "jar:file:foo.abcl!/")
        ==> "jar:file:foo.abcl!/foo-1.cls"

4.  TRUENAME and PROBE-FILE working with "jar:" references, with TRUENAME
    canonicalizing the JAR reference.

5.  DIRECTORY working within JAR files (and within a JAR in a JAR).

6.  References of the form "jar:<URL>" working for all strings <URL>
    that java.net.URL can resolve.

7.  Make jar pathnames work as a valid argument for OPEN with
    :DIRECTION :INPUT.

8.  Enable the loading of ASDF systems packaged within jar files.

9.  Enable the matching of jar pathnames with PATHNAME-MATCH-P:

        (pathname-match-p
          "jar:file:/a/b/some.jar!/a/system/def.asd"
          "jar:file:/**/*.jar!/**/*.asd")
        ==> t

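The MERGE-PATHNAMES cases in goal 3 keep the chain of enclosing JARs
from the defaults and replace only the file name of the innermost
entry. The Java sketch below reproduces that string-level behaviour.
`JarMergeSketch` and `mergeEntryName` are hypothetical names, and this
is not the code in ABCL's `Pathname.java`. The sketch also assumes that
the merged argument is a bare file name with no directory part.

```java
// Hypothetical sketch, not ABCL's Pathname.java: merge a bare entry
// name into the innermost entry of a "jar:" namestring.
final class JarMergeSketch {
    private JarMergeSketch() {}

    /**
     * Keeps everything up to and including the last "!/" (the chain of
     * enclosing JARs) plus the directory of the innermost entry, and
     * replaces the entry's file name with {@code name}.
     * Assumes {@code name} contains no '/'.
     */
    static String mergeEntryName(String name, String defaults) {
        int bang = defaults.lastIndexOf("!/");
        if (!defaults.startsWith("jar:") || bang < 0) {
            throw new IllegalArgumentException("not a jar namestring: " + defaults);
        }
        String prefix = defaults.substring(0, bang + 2); // e.g. "jar:jar:file:baz.jar!/foo.abcl!/"
        String entry = defaults.substring(bang + 2);     // e.g. "foo._", or "" for a whole JAR
        int slash = entry.lastIndexOf('/');
        String entryDirectory = slash < 0 ? "" : entry.substring(0, slash + 1);
        return prefix + entryDirectory + name;
    }
}
```

A real MERGE-PATHNAMES works component by component (DEVICE list,
DIRECTORY, NAME, TYPE). The sketch only shows which parts of the
defaults survive the merge.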
Status
------

All the above goals have been implemented and tested.


Implementation
--------------

A PATHNAME referring to a file within a JAR is known as a JAR PATHNAME.
It can refer either to the entire JAR file or to an entry within the
JAR file.

A JAR PATHNAME always has a DEVICE which is a proper list.  This
distinguishes it from other uses of PATHNAME.

The DEVICE of a JAR PATHNAME will be a list with either one or two
elements.  The first element of the DEVICE list can be either a
PATHNAME representing a JAR on the filesystem, or a URL PATHNAME.

A PATHNAME occurring in the list in the DEVICE of a JAR PATHNAME is
known as a DEVICE PATHNAME.

Only the first entry in the DEVICE list may be a URL PATHNAME.

Otherwise, a DEVICE PATHNAME denotes the PATHNAME of a JAR file.

The DEVICE PATHNAME list of enclosing JARs runs from outermost to
innermost.  The implementation currently limits this list to have at
most two elements.

The DIRECTORY component of a JAR PATHNAME should be a list starting
with the :ABSOLUTE keyword.  Even though hierarchical entries in jar
files are stored in the form "foo/bar/a.lisp", not "/foo/bar/a.lisp",
the meaning of the DIRECTORY component is better represented as an
absolute path.

A JAR PATHNAME has type JAR-PATHNAME, derived from PATHNAME.

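The constraints above can be read as constructor checks on a data
model. This hypothetical Java sketch uses the invented names
`DevicePathname` and `JarPathnameModel`, and reduces each DEVICE
PATHNAME to a namestring plus a URL flag. It rejects a DEVICE that is
empty, that is longer than two, or that has a URL PATHNAME anywhere but
first. It also requires the entry DIRECTORY to start with :ABSOLUTE.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the JAR PATHNAME invariants; not ABCL's classes.
final class DevicePathname {
    final boolean isUrl;       // true for a URL PATHNAME, e.g. "http://example.org/abcl.jar"
    final String namestring;   // stands in for the full PATHNAME components

    DevicePathname(boolean isUrl, String namestring) {
        this.isUrl = isUrl;
        this.namestring = namestring;
    }
}

final class JarPathnameModel {
    final List<DevicePathname> device;  // enclosing JARs, outermost first
    final List<String> directory;       // e.g. [":ABSOLUTE", "a", "b"]; empty stands for NIL
    final String name;                  // null when the pathname denotes a JAR itself
    final String type;

    JarPathnameModel(List<DevicePathname> device, List<String> directory,
                     String name, String type) {
        if (device == null || device.isEmpty() || device.size() > 2) {
            throw new IllegalArgumentException("DEVICE must hold one or two pathnames");
        }
        for (int i = 1; i < device.size(); i++) {
            if (device.get(i).isUrl) {
                throw new IllegalArgumentException("only the first DEVICE pathname may be a URL");
            }
        }
        if (!directory.isEmpty() && !":ABSOLUTE".equals(directory.get(0))) {
            throw new IllegalArgumentException("entry DIRECTORY must start with :ABSOLUTE");
        }
        this.device = Collections.unmodifiableList(new ArrayList<>(device));
        this.directory = Collections.unmodifiableList(new ArrayList<>(directory));
        this.name = name;
        this.type = type;
    }

    /** True when the pathname names an entry rather than the whole JAR. */
    boolean isEntry() {
        return name != null || directory.size() > 1;
    }
}
```

In this model, UC5 in the Use Cases below would become two
`DevicePathname` objects, "a/foo/baz.jar" and "c/d/foo.abcl". The
entry would have DIRECTORY (":ABSOLUTE" "a" "b"), NAME "bar-1" and
TYPE "cls".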

BNF
---

An incomplete BNF of the syntax of a JAR PATHNAME namestring would be:

      JAR-PATHNAME ::= "jar:" URL "!/" [ ENTRY ]

      URL ::= <URL parsable via java.net.URL.URL()>
            | JAR-FILE-PATHNAME

      JAR-FILE-PATHNAME ::= "jar:" "file:" JAR-NAMESTRING "!/" [ ENTRY ]

      JAR-NAMESTRING  ::=  ABSOLUTE-FILE-NAMESTRING
                         | RELATIVE-FILE-NAMESTRING

      ENTRY ::= [ DIRECTORY "/"]* FILE


### Notes

1.  `ABSOLUTE-FILE-NAMESTRING` and `RELATIVE-FILE-NAMESTRING` can use
the local filesystem conventions, meaning that on Windows they may
contain '\' as the directory separator, which is always normalized to
'/'.  An `ENTRY` always uses '/' to separate directories within the
jar archive.

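Read operationally, the BNF says that each leading "jar:" opens one
more level of nesting, closed by a "!/". Whatever follows the last
"!/" is the ENTRY. The following is a minimal Java sketch of that
splitting, assuming the two-level limit described above.
`JarNamestringSketch` is a hypothetical name. The '\' normalization is
applied only to a leading "file:" JAR-NAMESTRING, as the note above
describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting a "jar:" namestring along the BNF above.
// It accepts at most two levels of nesting, matching the current limit.
final class JarNamestringSketch {
    final List<String> device = new ArrayList<>(); // enclosing JAR locations, outermost first
    String entry = "";                              // "" when the namestring names a JAR itself

    static JarNamestringSketch parse(String namestring) {
        int depth = 0;
        String rest = namestring;
        while (rest.startsWith("jar:")) {           // one "jar:" prefix per enclosing JAR
            depth++;
            rest = rest.substring("jar:".length());
        }
        if (depth < 1 || depth > 2) {
            throw new IllegalArgumentException("expected one or two \"jar:\" prefixes: " + namestring);
        }
        JarNamestringSketch result = new JarNamestringSketch();
        for (int i = 0; i < depth; i++) {
            int bang = rest.indexOf("!/");          // each level is closed by "!/"
            if (bang < 0) {
                throw new IllegalArgumentException("missing \"!/\" in " + namestring);
            }
            String part = rest.substring(0, bang);
            if (i == 0 && part.startsWith("file:")) {
                part = part.replace('\\', '/');      // JAR-NAMESTRING: normalize '\' to '/'
            }
            result.device.add(part);
            rest = rest.substring(bang + 2);
        }
        result.entry = rest;                         // ENTRY already uses '/'
        return result;
    }
}
```

The sketch does not turn each device string into a DEVICE PATHNAME (a
URL PATHNAME for "http:", an ordinary PATHNAME for "file:"). It also
does not split the entry into DIRECTORY, NAME and TYPE.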

Use Cases
---------

    // UC1 -- JAR
    pathname: {
      namestring: "jar:file:foo/baz.jar!/"
      device: (
        pathname: {
          device: "jar:file:"
          directory: (:RELATIVE "foo")
          name: "baz"
          type: "jar"
        }
      )
    }


    // UC2 -- JAR entry
    pathname: {
      namestring: "jar:file:baz.jar!/foo.abcl"
      device: ( pathname: {
        device: "jar:file:"
        name: "baz"
        type: "jar"
      })
      name: "foo"
      type: "abcl"
    }


    // UC3 -- JAR file in a JAR entry
    pathname: {
      namestring: "jar:jar:file:baz.jar!/foo.abcl!/"
      device: (
        pathname: {
          name: "baz"
          type: "jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
    }

    // UC4 -- JAR entry in a JAR entry with directories
    pathname: {
      namestring: "jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "b" "c")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "this" "that")
      name: "foo-20"
      type: "cls"
    }

    // UC5 -- JAR entry in a JAR entry
    pathname: {
      namestring: "jar:jar:file:a/foo/baz.jar!/c/d/foo.abcl!/a/b/bar-1.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "foo")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "c" "d")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "a" "b")
      name: "bar-1"
      type: "cls"
    }

    // UC6 -- JAR entry in an http: accessible JAR file
    pathname: {
      namestring: "jar:http://example.org/abcl.jar!/org/armedbear/lisp/Version.class"
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
      )
      directory: (:ABSOLUTE "org" "armedbear" "lisp")
      name: "Version"
      type: "class"
    }

    // UC7 -- JAR entry in a JAR entry in a URL-accessible JAR file
    pathname: {
      namestring: "jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
      name: "foo-1"
      type: "cls"
    }

    // UC8 -- JAR in an absolute directory
    pathname: {
      namestring: "jar:file:/a/b/foo.jar!/"
      device: (
        pathname: {
          directory: (:ABSOLUTE "a" "b")
          name: "foo"
          type: "jar"
        }
      )
    }

    // UC9 -- JAR in a relative directory with entry
    pathname: {
      namestring: "jar:file:a/b/foo.jar!/c/d/foo.lisp"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "b")
          name: "foo"
          type: "jar"
        }
      )
      directory: (:ABSOLUTE "c" "d")
      name: "foo"
      type: "lisp"
    }


URI Encoding
------------

As a subtype of URL-PATHNAME, JAR-PATHNAME follows all the rules for
that type.  Most notably, this means that all #\Space characters should
be encoded as '%20' in the namestrings of jar entries.

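To illustrate the %20 rule, here is a hypothetical Java helper,
`EntryEncoding`, which is not part of ABCL. It percent-encodes an entry
path as UTF-8 octets. It is deliberately conservative: it keeps only
the RFC3986 unreserved characters and '/'. As a result, it escapes some
characters that a path may legally carry unescaped. For example, it
turns '!' into "%21", so a '!' inside an entry name cannot be mistaken
for part of a "!/" separator.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not ABCL code: percent-encode a jar entry path.
final class EntryEncoding {
    private EntryEncoding() {}

    // RFC 3986 unreserved characters, plus '/' which separates entry directories.
    private static boolean keep(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9')
            || b == '-' || b == '.' || b == '_' || b == '~' || b == '/';
    }

    /** Encodes every other UTF-8 byte as %XX, so ' ' becomes "%20". */
    static String encode(String path) {
        StringBuilder out = new StringBuilder();
        for (byte raw : path.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (keep(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(String.format("%02X", b));
            }
        }
        return out.toString();
    }

    /** Inverse of encode(); assumes ASCII input such as encode() produces. */
    static String decode(String encoded) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                bytes.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

java.net.URLEncoder cannot be used in place of this helper. It
implements application/x-www-form-urlencoded, which encodes a space as
'+' rather than '%20'.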

History
-------

Previously, ABCL did have some support for jar pathnames.  This support
used the convention that if the device field was itself a pathname,
the device pathname contained the location of the jar.

While analyzing what it would take to treat jar pathnames as valid
locations for `LOAD`, we determined that we needed a "double" pathname
so we could refer to the components of a packed FASL in a jar.  At first
we thought we could support such a syntax by having the device
pathname's device refer to the inner jar.  But with this use of
`PATHNAME`s linked by the `DEVICE` field, we ran into a problem: UNC
path support already uses the `DEVICE` field, so JARs located on UNC
mounts could not be referenced via '\\'.  For example,

    jar:jar:file:\\server\share\a\b\foo.jar!/this\that!/foo.java

would not have a valid representation.

So instead of having `DEVICE` point to a `PATHNAME`, we decided that the
`DEVICE` shall be a list of `PATHNAME`s, so we would have:

    pathname: {
      namestring: "jar:jar:file:\\server\share\foo.jar!/foo.abcl!/"
      device: (
                pathname: {
                  host: "server"
                  device: "share"
                  name: "foo"
                  type: "jar"
                }
                pathname: {
                  name: "foo"
                  type: "abcl"
                }
              )
    }

Although there is a fair amount of special logic inside `Pathname.java`
itself in the resulting implementation, the logic in `Load.java` seems
to have been considerably simplified.

When we implemented URL Pathnames, the special syntax for a URL as an
abstract string in the first position of the device list was naturally
replaced with a URL pathname.

\end{verbatim}
\begin{verbatim}

URL Pathnames in ABCL
=====================

    Mark Evenson
    Created:  25 MAR 2010
    Modified: 21 JUN 2011

Notes towards an implementation of URL references to be contained in
Common Lisp `PATHNAME` objects within ABCL.


References
----------

RFC3986   Uniform Resource Identifier (URI): Generic Syntax


URL vs URI
----------

We use the term URL as shorthand in describing URL Pathnames, even
though the corresponding encoding is more akin to a URI as described
in RFC3986.


Goals
-----

1.  Use Common Lisp pathnames to refer to representations referenced
by a URL.

2.  The URL schemes supported shall include at least "http", and those
enabled by the URLStreamHandler extension mechanism.

3.  Use URL schemes that are understood by the java.net.URL object.

    Example of a Pathname specified by URL:

        #p"http://example.org/org/armedbear/systems/pgp.asd"

4.  `MERGE-PATHNAMES` working for URL pathnames:

        (merge-pathnames "url.asd"
            "http://example/org/armedbear/systems/pgp.asd")
        ==> "http://example/org/armedbear/systems/url.asd"

5.  PROBE-FILE returning the state of URL accessibility.

6.  TRUENAME "aliased" to PROBE-FILE, signalling an error if the URL is
not accessible (see "Non-goal 1").

7.  DIRECTORY works for URL pathnames that contain no wildcards.

8.  URL pathnames work as a valid argument for OPEN with :DIRECTION :INPUT.

9.  Enable the loading of ASDF2 systems referenced by a URL pathname.

10.  Pathnames constructed with the "file" scheme
(i.e. #p"file:/this/file") need to be properly URI-encoded according
to RFC3986; otherwise FILE-ERROR is signalled.

11.  The "file" scheme will continue to be represented by an
"ordinary" Pathname.  Thus, after construction of a URL Pathname with
the "file" scheme, the namestring of the resulting PATHNAME will no
longer contain the "file:" prefix.

12.  The "jar" scheme will continue to be represented by a jar
Pathname.

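Goal 4 corresponds to relative reference resolution as defined in
RFC3986, which java.net.URI implements in its resolve() method. The
sketch below uses the hypothetical class `UrlMergeSketch`. It shows
goal 4 through resolve(). It shows goals 10 and 11 through
URI.getPath(), which returns the decoded path that an "ordinary"
Pathname would hold. In Java, a "file" URL that is not properly encoded
is rejected with an IllegalArgumentException. The sketch does not show
how ABCL maps such failures onto FILE-ERROR.

```java
import java.net.URI;

// Sketch using java.net.URI (RFC 3986 reference resolution); not ABCL code.
final class UrlMergeSketch {
    private UrlMergeSketch() {}

    /** Goal 4: resolve a relative name against a URL used as defaults. */
    static String merge(String relative, String defaults) {
        return URI.create(defaults).resolve(relative).toString();
    }

    /** Goals 10-11: a properly encoded "file" URL becomes a plain, decoded path. */
    static String fileUrlToPath(String fileUrl) {
        URI uri = URI.create(fileUrl);     // IllegalArgumentException if not valid RFC 3986
        if (!"file".equals(uri.getScheme())) {
            throw new IllegalArgumentException("not a file URL: " + fileUrl);
        }
        return uri.getPath();              // percent escapes are decoded: "%20" -> " "
    }
}
```

ABCL's MERGE-PATHNAMES works on pathname components rather than on
strings. Edge cases, such as a query in the defaults, may therefore
behave differently from URI.resolve().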

Non-goals
---------

1.  We will not implement canonicalization of URLs (such as
following "http" redirects).

2.  DIRECTORY will not work for URL pathnames containing wildcards.


Implementation
--------------

A PATHNAME referring to a resource referenced by a URL is known as a
URL PATHNAME.

A URL PATHNAME always has a HOST component which is a proper list.
This list will be a property list (plist).  The property list
values must be character strings.

    :SCHEME
        Scheme of URI ("http", "ftp", "bundle", etc.)
    :AUTHORITY
        Valid authority according to the URI scheme.  For "http" this
        could be "example.org:8080".
    :QUERY
        The query of the URI
    :FRAGMENT
        The fragment portion of the URI

The DIRECTORY, NAME and TYPE fields of the PATHNAME are used to form
the URI `path` according to the conventions of the UNIX filesystem
(i.e. '/' is the directory separator).  In a sense the HOST contains
the base URL, to which the `path` is a relative URL (although this
abstraction is violated somewhat by the storing of the QUERY and
FRAGMENT portions of the URI in the HOST component).

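The following is a hypothetical Java sketch of that split;
`UrlPathnameParts` is an invented name. java.net.URI supplies the
scheme, authority, query and fragment for the HOST plist. The
'/'-separated path is cut into an :ABSOLUTE DIRECTORY, a NAME and a
TYPE. Following the URI Encoding rules later in this note, the sketch
stores components unescaped by using URI's decoding getters.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not ABCL code: split a URL into the HOST plist and
// the DIRECTORY, NAME and TYPE components described above.
final class UrlPathnameParts {
    final Map<String, String> host = new LinkedHashMap<>(); // keeps plist order
    final List<String> directory = new ArrayList<>();
    String name;
    String type;

    static UrlPathnameParts of(String url) {
        URI uri = URI.create(url);
        if (uri.isOpaque() || uri.getScheme() == null) {
            throw new IllegalArgumentException("expected an absolute, hierarchical URL: " + url);
        }
        UrlPathnameParts p = new UrlPathnameParts();
        // Components are stored unescaped ("as is"), hence the decoding getters.
        p.host.put(":SCHEME", uri.getScheme());
        putIfPresent(p.host, ":AUTHORITY", uri.getAuthority());
        putIfPresent(p.host, ":QUERY", uri.getQuery());
        putIfPresent(p.host, ":FRAGMENT", uri.getFragment());

        String path = uri.getPath();               // '/'-separated, like a UNIX path
        int slash = path.lastIndexOf('/');
        p.directory.add(":ABSOLUTE");
        for (String segment : path.substring(0, slash + 1).split("/")) {
            if (!segment.isEmpty()) {
                p.directory.add(segment);
            }
        }
        String file = path.substring(slash + 1);
        int dot = file.lastIndexOf('.');           // assumption: TYPE follows the last '.'
        if (dot > 0) {
            p.name = file.substring(0, dot);
            p.type = file.substring(dot + 1);
        } else if (!file.isEmpty()) {
            p.name = file;
        }
        return p;
    }

    private static void putIfPresent(Map<String, String> plist, String key, String value) {
        if (value != null) {
            plist.put(key, value);
        }
    }
}
```

The sketch splits NAME and TYPE at the last '.', which mirrors the
usual PATHNAME convention. This split is the sketch's own assumption
and is not taken from the ABCL source.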
For the purposes of PATHNAME-MATCH-P, two URL pathnames may be said to
match if their HOST components are EQUAL, and all other components are
considered to match according to the existing rules for Pathnames.

A URL pathname must have a DEVICE whose value is NIL.

Upon creation, any ".." and "." components in the DIRECTORY are
removed.  The DIRECTORY component, if present, is always absolute.

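Dot-segment removal can be delegated to java.net.URI.normalize(). The
following is a minimal sketch using the hypothetical class
`UrlNormalizeSketch`. It does not claim to be how ABCL performs this
step.

```java
import java.net.URI;

// Sketch: remove "." and ".." directory segments when a URL pathname is
// created, delegating to java.net.URI.normalize(). Not ABCL code.
final class UrlNormalizeSketch {
    private UrlNormalizeSketch() {}

    static String normalize(String url) {
        return URI.create(url).normalize().toString();
    }
}
```

According to its Javadoc, normalize() removes a ".." segment only when
it is preceded by a non-".." segment. A leading ".." is therefore kept,
as in "http://example.org/../x". The remove_dot_segments procedure of
RFC3986 would drop it. A strict implementation of the rule above would
need to handle that case separately.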
The namestring of a URL pathname shall be formed by the usual
conventions of a URL.

A URL PATHNAME has type URL-PATHNAME, derived from PATHNAME.


URI Encoding
------------

For dealing with URI Encoding (also known as [Percent Encoding]), we
adopt the following rules:

[Percent Encoding]: http://en.wikipedia.org/wiki/Percent-encoding

1.  All pathname components are represented "as is" without escaping.

2.  Namestrings are suitably escaped if the Pathname is a URL-PATHNAME
    or a JAR-PATHNAME.

3.  Namestrings should all "round-trip":

        (when (typep p 'pathname)
           (equal (namestring p)
                  (namestring (pathname p))))

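The three rules map fairly directly onto java.net.URI: components go
in and come back out unescaped, while the namestring carries the
escapes. The following is a sketch under that assumption;
`RoundTripSketch` is a hypothetical helper. It does not imply that
ABCL's namestring printer uses java.net.URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the three encoding rules with java.net.URI; not ABCL code.
final class RoundTripSketch {
    private RoundTripSketch() {}

    /** Rule 2: build an escaped namestring from unescaped components. */
    static String namestring(String scheme, String authority, String path) {
        try {
            // The multi-argument constructor quotes illegal characters (and always '%').
            return new URI(scheme, authority, path, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Rule 1: the stored path component is the unescaped form. */
    static String pathComponent(String namestring) {
        return URI.create(namestring).getPath();
    }

    /** Rule 3: parsing a namestring and printing it again is lossless. */
    static boolean roundTrips(String namestring) {
        return URI.create(namestring).toASCIIString().equals(namestring);
    }
}
```

The sketch uses toASCIIString() rather than toString() so that
non-ASCII characters are also percent-encoded as UTF-8 octets.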

Status
------

This design has been implemented.


History
-------

26 NOV 2010 Changed implementation to use URI encodings for the "file"
  scheme, including when nested within the "jar" scheme, e.g.
  "jar:file:/location/of/some.jar!/".

21 JUN 2011 Fixed implementation to properly handle URI encodings
  referring to nested jar archives.

\end{verbatim}