source: branches/streams/abcl/doc/design/pathnames/jar-pathnames.markdown

Last change on this file was 13353, checked in by Mark Evenson, 14 years ago

Fix problems with whitespace in JAR-PATHNAME.

For dealing with URI Encoding (also known as [Percent Encoding]) we
implement the following rules, which were previously only implicit.

[Percent Encoding]: http://en.wikipedia.org/wiki/Percent-encoding

  1. All pathname components are represented "as is" without escaping.

  2. Namestrings are suitably escaped if the Pathname is a URL-PATHNAME
     or a JAR-PATHNAME.

  3. Namestrings should all "round-trip":

         (when (typep p 'pathname)
           (equal (namestring p)
                  (namestring (pathname p))))

Users may use EXT:URI-ENCODE and EXT:URI-DECODE to access the escaping
rules in circumstances where they wish to manipulate PATHNAME
namestrings more directly.

All tests in JAR-PATHNAMES now pass.

Constructors for PATHNAME now produce ERROR rather than FILE-ERROR as
CLHS says "The type file-error consists of error conditions that occur
during an attempt to open or close a file, or during some low-level
transactions with a file system," which doesn't apply here.

JARs and JAR entries in ABCL
============================

    Mark Evenson
    Created:  09 JAN 2010
    Modified: 21 JUN 2011

Notes towards an implementation of "jar:" references to be contained
in Common Lisp `PATHNAME`s within ABCL.

Goals
-----

1.  Use Common Lisp pathnames to refer to entries in a jar file.

2.  Use the `'jar:'` scheme as documented in [`java.net.JarURLConnection`][jarURLConnection] for
    namestring representation.

    An entry in a JAR file:

         #p"jar:file:baz.jar!/foo"

    A JAR file:

         #p"jar:file:baz.jar!/"

    A JAR file accessible via URL:

         #p"jar:http://example.org/abcl.jar!/"

    An entry in an ABCL FASL in a URL-accessible JAR file:

         #p"jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"

[jarUrlConnection]: http://java.sun.com/javase/6/docs/api/java/net/JarURLConnection.html

3.  `MERGE-PATHNAMES` working for jar entries in the following use cases:

        (merge-pathnames "foo-1.cls" "jar:jar:file:baz.jar!/foo.abcl!/foo._")
        ==> "jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls"

        (merge-pathnames "foo-1.cls" "jar:file:foo.abcl!/")
        ==> "jar:file:foo.abcl!/foo-1.cls"

4.  TRUENAME and PROBE-FILE working with "jar:", with TRUENAME
    canonicalizing the JAR reference.

5.  DIRECTORY working within JAR files (and within JAR in JAR).

6.  References of the form `"jar:<URL>"` work for every string `<URL>`
    that `java.net.URL` can resolve.

7.  Make jar pathnames work as a valid argument for OPEN with
    :DIRECTION :INPUT.

8.  Enable the loading of ASDF systems packaged within jar files.

9.  Enable the matching of jar pathnames with PATHNAME-MATCH-P:

        (pathname-match-p
          "jar:file:/a/b/some.jar!/a/system/def.asd"
          "jar:file:/**/*.jar!/**/*.asd")
        ==> T

Status
------

All the above goals have been implemented and tested.


Implementation
--------------

A PATHNAME referring to a file within a JAR is known as a JAR PATHNAME.
It can refer either to the entire JAR file or to an entry within the JAR
file.

A JAR PATHNAME always has a DEVICE which is a proper list.  This
distinguishes it from other uses of PATHNAME.

The DEVICE of a JAR PATHNAME will be a list with either one or two
elements.  The first element of the DEVICE list can be either a
PATHNAME representing a JAR on the filesystem, or a URL PATHNAME.

A PATHNAME occurring in the list in the DEVICE of a JAR PATHNAME is
known as a DEVICE PATHNAME.

Only the first entry in the DEVICE list may be a URL PATHNAME.

Otherwise a DEVICE PATHNAME denotes the PATHNAME of the JAR file.

The DEVICE PATHNAME list of enclosing JARs runs from outermost to
innermost.  The implementation currently limits this list to at
most two elements.

The DIRECTORY component of a JAR PATHNAME should be a list starting
with the :ABSOLUTE keyword.  Even though hierarchical entries in jar
files are stored in the form "foo/bar/a.lisp" not "/foo/bar/a.lisp",
the meaning of the DIRECTORY component is better represented as an
absolute path.

A JAR PATHNAME has type JAR-PATHNAME, derived from PATHNAME.


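The shape rules for the DEVICE component can be written down as a small predicate. This is only a sketch in portable Common Lisp: `valid-jar-device-p` is a hypothetical helper, not an ABCL function. It checks the list shape alone (a proper list of one or two pathnames) and does not check that only the first entry may be a URL PATHNAME. Circular lists are not handled.

```lisp
;;; Hypothetical helper for illustration; not part of ABCL.
(defun valid-jar-device-p (device)
  "Return true if DEVICE is a proper list of one or two pathnames."
  (and (consp device)
       ;; A proper list ends in NIL rather than in a dotted atom.
       (null (cdr (last device)))
       (<= 1 (length device) 2)
       (every #'pathnamep device)))

;; A string device, such as a Windows drive letter, is not a JAR device.
(valid-jar-device-p "C")
;; One or two DEVICE PATHNAMEs are accepted.
(valid-jar-device-p (list (pathname "baz.jar")))
(valid-jar-device-p (list (pathname "baz.jar") (pathname "foo.abcl")))
```
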
BNF
---

An incomplete BNF of the syntax of JAR PATHNAME would be:

      JAR-PATHNAME ::= "jar:" URL "!/" [ ENTRY ]

      URL ::= <URL parsable via java.net.URL.URL()>
            | JAR-FILE-PATHNAME

      JAR-FILE-PATHNAME ::= "jar:" "file:" JAR-NAMESTRING "!/" [ ENTRY ]

      JAR-NAMESTRING  ::=  ABSOLUTE-FILE-NAMESTRING
                         | RELATIVE-FILE-NAMESTRING

      ENTRY ::= [ DIRECTORY "/"]* FILE


### Notes

1.  `ABSOLUTE-FILE-NAMESTRING` and `RELATIVE-FILE-NAMESTRING` can use
    the local filesystem conventions, meaning that on Windows these could
    contain '\' as the directory separator, which is always normalized to
    '/'.  An `ENTRY` always uses '/' to separate directories within the
    jar archive.


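To make the grammar concrete, the following sketch splits a namestring on its "jar:" prefixes and "!/" separators. It is plain string processing in portable Common Lisp, not the parser ABCL uses. `split-jar-namestring` is a hypothetical name, and the sketch makes no attempt to validate the URL or entry syntax.

```lisp
;;; Hypothetical helper for illustration; not part of ABCL.
(defun split-jar-namestring (namestring)
  "Return two values: the list of JAR locations, outermost first, and the
entry string (or NIL).  Return NIL if NAMESTRING lacks a \"jar:\" prefix."
  (let ((start 0)
        (locations '()))
    ;; Skip every leading "jar:" prefix; nested JARs have one per level.
    (loop while (and (<= (+ start 4) (length namestring))
                     (string= "jar:" namestring
                              :start2 start :end2 (+ start 4)))
          do (incf start 4))
    (when (zerop start)
      (return-from split-jar-namestring nil))
    ;; Each "!/" terminates the location of one JAR.
    (loop for pos = (search "!/" namestring :start2 start)
          while pos
          do (push (subseq namestring start pos) locations)
             (setf start (+ pos 2)))
    ;; Whatever follows the last "!/" is the entry, if any.
    (let ((entry (subseq namestring start)))
      (values (nreverse locations)
              (if (string= entry "") nil entry)))))

(split-jar-namestring "jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls")
;; => ("file:baz.jar" "foo.abcl"), "foo-1.cls"
```
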
Use Cases
---------

    // UC1 -- JAR
    pathname: {
      namestring: "jar:file:foo/baz.jar!/"
      device: (
        pathname: {
          device: "jar:file:"
          directory: (:RELATIVE "foo")
          name: "baz"
          type: "jar"
        }
      )
    }


    // UC2 -- JAR entry
    pathname: {
      namestring: "jar:file:baz.jar!/foo.abcl"
      device: ( pathname: {
        device: "jar:file:"
        name: "baz"
        type: "jar"
      })
      name: "foo"
      type: "abcl"
    }


    // UC3 -- JAR file in a JAR entry
    pathname: {
      namestring: "jar:jar:file:baz.jar!/foo.abcl!/"
      device: (
        pathname: {
          name: "baz"
          type: "jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
    }

    // UC4 -- JAR entry in a JAR entry with directories
    pathname: {
      namestring: "jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "b" "c")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "this" "that")
      name: "foo-20"
      type: "cls"
    }

    // UC5 -- JAR Entry in a JAR Entry
    pathname: {
      namestring: "jar:jar:file:a/foo/baz.jar!/c/d/foo.abcl!/a/b/bar-1.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "foo")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "c" "d")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "a" "b")
      name: "bar-1"
      type: "cls"
    }

    // UC6 -- JAR entry in an http: accessible JAR file
    pathname: {
      namestring: "jar:http://example.org/abcl.jar!/org/armedbear/lisp/Version.class"
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
      )
      directory: (:ABSOLUTE "org" "armedbear" "lisp")
      name: "Version"
      type: "class"
    }

    // UC7 -- JAR Entry in a JAR Entry in a URL accessible JAR FILE
    pathname: {
      namestring: "jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
      name: "foo-1"
      type: "cls"
    }

    // UC8 -- JAR in an absolute directory
    pathname: {
      namestring: "jar:file:/a/b/foo.jar!/"
      device: (
        pathname: {
          directory: (:ABSOLUTE "a" "b")
          name: "foo"
          type: "jar"
        }
      )
    }

    // UC9 -- JAR in a relative directory with entry
    pathname: {
      namestring: "jar:file:a/b/foo.jar!/c/d/foo.lisp"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "b")
          name: "foo"
          type: "jar"
        }
      )
      directory: (:ABSOLUTE "c" "d")
      name: "foo"
      type: "lisp"
    }


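Read in the other direction, the use cases show how a namestring is assembled: one "jar:" prefix per DEVICE PATHNAME, each JAR location followed by "!/", then the entry. The sketch below does this with strings only. `join-jar-namestring` is a hypothetical helper, not an ABCL function, and it takes already-formed location strings rather than real pathname objects.

```lisp
;;; Hypothetical helper for illustration; not part of ABCL.
(defun join-jar-namestring (locations &optional entry)
  "Build a jar namestring from LOCATIONS (outermost first) and ENTRY."
  (with-output-to-string (out)
    ;; One "jar:" prefix for each enclosing JAR.
    (loop repeat (length locations)
          do (write-string "jar:" out))
    ;; Each JAR location is terminated by "!/".
    (dolist (location locations)
      (write-string location out)
      (write-string "!/" out))
    ;; The entry, when present, comes last.
    (when entry
      (write-string entry out))))

;; The components of UC5.
(join-jar-namestring '("file:a/foo/baz.jar" "c/d/foo.abcl") "a/b/bar-1.cls")
;; => "jar:jar:file:a/foo/baz.jar!/c/d/foo.abcl!/a/b/bar-1.cls"
```
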
URI Encoding
------------

As a subtype of URL-PATHNAME, JAR-PATHNAMEs follow all the rules for
that type.  Most notably, this means that all #\Space characters should
be encoded as '%20' when dealing with jar entries.


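As a minimal illustration of the space rule only, the sketch below replaces each #\Space with '%20'. `percent-encode-spaces` is a hypothetical helper written in portable Common Lisp. It does not implement full percent-encoding, for which ABCL provides EXT:URI-ENCODE and EXT:URI-DECODE.

```lisp
;;; Hypothetical helper for illustration; not part of ABCL.
(defun percent-encode-spaces (string)
  "Return a copy of STRING with every space replaced by \"%20\"."
  (with-output-to-string (out)
    (loop for ch across string
          do (if (char= ch #\Space)
                 (write-string "%20" out)
                 (write-char ch out)))))

(percent-encode-spaces "jar:file:/a/b/foo.jar!/my entry.lisp")
;; => "jar:file:/a/b/foo.jar!/my%20entry.lisp"
```
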
History
-------

Previously, ABCL did have some support for jar pathnames. This support
used the convention that if the device field was itself a
pathname, the device pathname contained the location of the jar.

In analyzing the desire to treat jar pathnames as valid
locations for `LOAD`, we determined that we needed a "double" pathname
so we could refer to the components of a packed FASL in a jar.  At first
we thought we could support such a syntax by having the device
pathname's device refer to the inner jar.  But with this use of
`PATHNAME`s linked by the `DEVICE` field, we found the problem that UNC
path support uses the `DEVICE` field, so JARs located on UNC mounts can't
be referenced via '\\', i.e.

    jar:jar:file:\\server\share\a\b\foo.jar!/this\that!/foo.java

would not have a valid representation.

So instead of having `DEVICE` point to a `PATHNAME`, we decided that the
`DEVICE` shall be a list of `PATHNAME`s, so we would have:

    pathname: {
      namestring: "jar:jar:file:\\server\share\foo.jar!/foo.abcl!/"
      device: (
                pathname: {
                  host: "server"
                  device: "share"
                  name: "foo"
                  type: "jar"
                }
                pathname: {
                  name: "foo"
                  type: "abcl"
                }
              )
    }

Although there is a fair amount of special logic inside `Pathname.java`
itself in the resulting implementation, the logic in `Load.java` seems
to have been considerably simplified.

When we implemented URL Pathnames, the special syntax for URL as an
abstract string in the first position of the device list was naturally
replaced with a URL pathname.