source: trunk/abcl/doc/design/pathnames/jar-pathnames.markdown

Last change on this file was 15184, checked in by Mark Evenson, 5 years ago

doc: start describing current problems with CL:PATHNAME

Also needs rethinking about literals like wildcard.

JARs and JAR entries in ABCL
============================

    Mark Evenson
    Created:  09 JAN 2010
    Modified: 02 NOV 2019

Notes towards an implementation of "jar:" references to be contained
in Common Lisp `PATHNAME`s within ABCL.

Broken implementation
---------------------

abcl-1.5.0 was discovered to be broken with respect to nested jar
entries in November 2019.  This is evidenced by the tests invoked via

    (asdf:test-system :abcl)

failing with

    Failed to parse URL 'jar:jar:file:a/baz.jar!/b/c/foo.abcl!/'Nested JAR URLs are not supported

While researching where to fix this, we found a flaw in the reasoning
about nesting jar pathnames.  The current implementation uses the
DEVICE as a CONS to store the results of the hacky processing around
the `jar` scheme.  This was reasoned to be "good enough" in that it
kept the number of pathnames referencing other pathnames to a minimum,
and no meaningful counter-example had been put forward.  In the days
of Überjars, where it is perfectly acceptable to have jars within
jars, here is one:

    The jar containing the jar containing the abcl fasl

We need to be able to name all possible locations of ABCL fasl files.

To fix this, we need to allow a namestring with the structure

    #p"jar:jar:file:abcl.jar!/b/c/foo.abcl!/foo.cls"

to resolve to linked PATHNAME-DEVICE references:

    "foo.cls"  --device--> "foo.abcl" --device--> "abcl.jar"

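As a rough illustration of the linked structure proposed above, the following Java sketch splits a nested `jar:` namestring at its last `!/` and recurses into the enclosing URL, yielding one node per archive level. The class `DeviceLink` and its fields are hypothetical, not part of ABCL. Unlike the arrow diagram, each node keeps the full entry path, and the outermost node keeps the URL (e.g. `file:abcl.jar`). The sketch also assumes entry names never contain `!/`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ABCL's Pathname.java: one node per archive level,
// each linked to the archive that encloses it through `device`.
final class DeviceLink {
    final String location;   // entry within the enclosing archive, or the outermost URL
    final DeviceLink device; // enclosing archive; null for the outermost URL

    DeviceLink(String location, DeviceLink device) {
        this.location = location;
        this.device = device;
    }

    // "jar:" URL "!/" ENTRY: split at the LAST "!/" (assumes entries never
    // contain "!/"), then recurse, since URL may itself be a jar: namestring.
    static DeviceLink parse(String namestring) {
        if (!namestring.startsWith("jar:")) {
            return new DeviceLink(namestring, null); // e.g. "file:abcl.jar"
        }
        int bang = namestring.lastIndexOf("!/");
        if (bang < "jar:".length()) {
            throw new IllegalArgumentException("no \"!/\" in " + namestring);
        }
        String url = namestring.substring("jar:".length(), bang);
        String entry = namestring.substring(bang + 2); // "" names the archive itself
        return new DeviceLink(entry, parse(url));
    }

    // Innermost first, using the arrow notation of this document.
    @Override
    public String toString() {
        List<String> parts = new ArrayList<>();
        for (DeviceLink link = this; link != null; link = link.device) {
            parts.add("\"" + link.location + "\"");
        }
        return String.join(" --device--> ", parts);
    }

    public static void main(String[] args) {
        // prints "foo.cls" --device--> "b/c/foo.abcl" --device--> "file:abcl.jar"
        System.out.println(parse("jar:jar:file:abcl.jar!/b/c/foo.abcl!/foo.cls"));
    }
}
```

Splitting at the last `!/` mirrors the grammar: the innermost ENTRY is whatever follows the final separator, and everything before it is the URL of the enclosing archive.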
Towards Fixing
==============

It would be better to reflect the pathname hierarchy as Java classes.
Although hooking things up is going to take some elbow grease, being
able to cleanly separate the logic for our schemes like "jar" from the
special handling that should happen for all pathnames whose namestring
starts with a scheme we handle (like percent-encoding characters when
converting to and from namestrings) would be helpful.

We make a breaking change in how we abstract the notions of "Archive"
and "Archive Entries".

Pathname DEVICE fields currently contain either

+ a string denoting a drive or UNC share (Windows)

+ a list containing one or two pathnames denoting paths within archives

It is conceptually much more correct to have only a single Pathname in
the DEVICE field to denote the source of an archive.
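One possible shape for such a hierarchy is sketched below in Java. The class names are hypothetical and do not correspond to ABCL's actual `Pathname.java`. The point is that only the jar subclass knows how to wrap its device in `jar:` and `!/`, and that its DEVICE is a single enclosing pathname, which may itself be a jar pathname when archives nest.

```java
// Hypothetical class names; a sketch, not ABCL's actual Pathname.java API.

// A jar pathname: the DEVICE is a single pathname naming the enclosing archive.
final class JarSketchPathname extends SketchPathname {
    private final SketchPathname device; // may itself be a JarSketchPathname
    private final String entry;          // "" refers to the archive itself

    JarSketchPathname(SketchPathname device, String entry) {
        this.device = device;
        this.entry = entry;
    }

    // Only this class knows the "jar:" URL "!/" ENTRY syntax; nesting falls
    // out of the recursive call on the device.
    @Override
    String namestring() {
        return "jar:" + device.namestring() + "!/" + entry;
    }

    public static void main(String[] args) {
        SketchPathname jar = new UrlSketchPathname("file:abcl.jar");
        SketchPathname fasl = new JarSketchPathname(jar, "b/c/foo.abcl");
        SketchPathname cls = new JarSketchPathname(fasl, "foo.cls");
        // prints jar:jar:file:abcl.jar!/b/c/foo.abcl!/foo.cls
        System.out.println(cls.namestring());
    }
}

// Common supertype: every pathname can render its own namestring.
abstract class SketchPathname {
    abstract String namestring();
}

// Any non-jar location (file: or http: URL), kept as a plain string here.
final class UrlSketchPathname extends SketchPathname {
    private final String url;

    UrlSketchPathname(String url) {
        this.url = url;
    }

    @Override
    String namestring() {
        return url;
    }
}
```

With this split, scheme-specific encoding rules could live in the URL subclass without the jar subclass needing to know about them.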

Goals
-----

1.  Use Common Lisp pathnames to refer to entries in a jar file.

2.  Use the `'jar:'` scheme as documented in [`java.net.JarURLConnection`][jarURLConnection] for
    namestring representation.

    An entry in a JAR file:

         #p"jar:file:baz.jar!/foo"

    A JAR file:

         #p"jar:file:baz.jar!/"

    A JAR file accessible via URL:

         #p"jar:http://example.org/abcl.jar!/"

    An entry in an ABCL FASL in a URL-accessible JAR file:

         #p"jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"

[jarUrlConnection]: http://java.sun.com/javase/6/docs/api/java/net/JarURLConnection.html

3.  `MERGE-PATHNAMES` working for jar entries in the following use cases:

        (merge-pathnames "foo-1.cls" "jar:jar:file:baz.jar!/foo.abcl!/foo._")
        ==> "jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls"

        (merge-pathnames "foo-1.cls" "jar:file:foo.abcl!/")
        ==> "jar:file:foo.abcl!/foo-1.cls"

4.  TRUENAME and PROBE-FILE working with "jar:", with TRUENAME
    canonicalizing the JAR reference.

5.  DIRECTORY working within JAR files (and within JAR in JAR).

6.  References of the form `"jar:<URL>"` working for all strings `<URL>`
    that java.net.URL can resolve.

7.  Make jar pathnames work as a valid argument for OPEN with
    :DIRECTION :INPUT.

8.  Enable the loading of ASDF systems packaged within jar files.

9.  Enable the matching of jar pathnames with PATHNAME-MATCH-P

        (pathname-match-p
          "jar:file:/a/b/some.jar!/a/system/def.asd"
          "jar:file:/**/*.jar!/**/*.asd")
        ==> t

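The two `MERGE-PATHNAMES` cases in goal 3 reduce to keeping everything up to the last `!/` (the chain of enclosing archives) unchanged and merging only within the innermost entry. A minimal Java sketch of that idea follows. `JarMerge.merge` is a hypothetical helper working on namestrings, and it assumes PATHNAME is a relative entry that supplies its own name and type, so it ignores hosts, versions, `:BACK` and the defaulting of missing components.

```java
// Hypothetical helper: a simplified MERGE-PATHNAMES over jar namestrings.
final class JarMerge {
    // Assumes `pathname` is a relative entry such as "foo-1.cls" or "x/y.cls"
    // that supplies both name and type.
    static String merge(String pathname, String defaults) {
        if (pathname.startsWith("/") || pathname.contains("!/")) {
            throw new IllegalArgumentException("expected a relative entry: " + pathname);
        }
        int bang = defaults.lastIndexOf("!/");
        if (bang < 0) {
            throw new IllegalArgumentException("not a jar namestring: " + defaults);
        }
        String archives = defaults.substring(0, bang + 2); // enclosing archives, unchanged
        String entry = defaults.substring(bang + 2);        // e.g. "foo._" or ""
        // Directory of the default entry, including its trailing '/'; "" if none.
        String defaultDirectory = entry.substring(0, entry.lastIndexOf('/') + 1);
        // A relative directory in PATHNAME is appended to the default directory,
        // and PATHNAME's name and type replace those of the defaults.
        return archives + defaultDirectory + pathname;
    }

    public static void main(String[] args) {
        // prints jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls
        System.out.println(merge("foo-1.cls", "jar:jar:file:baz.jar!/foo.abcl!/foo._"));
        // prints jar:file:foo.abcl!/foo-1.cls
        System.out.println(merge("foo-1.cls", "jar:file:foo.abcl!/"));
    }
}
```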
Status
------

All the above goals have been implemented and tested.


Implementation
--------------

A PATHNAME referring to a file within a JAR is known as a JAR PATHNAME.
It can either refer to the entire JAR file or an entry within the JAR
file.

A JAR PATHNAME always has a DEVICE which is a proper list.  This
distinguishes it from other uses of Pathname.

The DEVICE of a JAR PATHNAME will be a list with either one or two
elements.  The first element of this DEVICE list can be either a
PATHNAME representing a JAR on the filesystem, or a URL PATHNAME.

A PATHNAME occurring in the list in the DEVICE of a JAR PATHNAME is
known as a DEVICE PATHNAME.

Only the first entry in the DEVICE list may be a URL PATHNAME.

Otherwise, the DEVICE PATHNAME denotes the PATHNAME of the JAR file.

The DEVICE PATHNAME list of enclosing JARs runs from outermost to
innermost.  The implementation currently limits this list to at most
two elements.

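The DEVICE rules above (a non-empty list of at most two DEVICE PATHNAMEs, ordered outermost to innermost, with only the first allowed to be a URL PATHNAME) can be stated as a small check. The Java sketch below models a DEVICE PATHNAME as a hypothetical `DevicePathname` holding a namestring and a URL flag. It illustrates the invariants and is not ABCL's code.

```java
import java.util.List;

// Hypothetical checker for the DEVICE of a JAR PATHNAME (outermost first).
final class JarDeviceCheck {
    static final int MAX_DEPTH = 2; // limit of the current implementation

    static void check(List<DevicePathname> device) {
        if (device == null || device.isEmpty()) {
            throw new IllegalArgumentException("DEVICE must be a non-empty list");
        }
        if (device.size() > MAX_DEPTH) {
            throw new IllegalArgumentException("at most " + MAX_DEPTH + " DEVICE PATHNAMEs");
        }
        // Index 0 is the outermost archive; only it may be a URL PATHNAME.
        for (int i = 1; i < device.size(); i++) {
            if (device.get(i).isUrl) {
                throw new IllegalArgumentException("only the first DEVICE PATHNAME may be a URL");
            }
        }
    }

    public static void main(String[] args) {
        // UC7: a FASL inside a URL-accessible JAR passes the check.
        check(List.of(new DevicePathname("http://example.org/abcl.jar", true),
                      new DevicePathname("foo.abcl", false)));
        System.out.println("UC7 device is valid");
    }
}

// Hypothetical model of a DEVICE PATHNAME: its namestring plus whether it is a
// URL PATHNAME (e.g. http://...) rather than a JAR on the local filesystem.
final class DevicePathname {
    final String namestring;
    final boolean isUrl;

    DevicePathname(String namestring, boolean isUrl) {
        this.namestring = namestring;
        this.isUrl = isUrl;
    }
}
```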
The DIRECTORY component of a JAR PATHNAME should be a list starting
with the :ABSOLUTE keyword.  Even though hierarchical entries in jar
files are stored in the form "foo/bar/a.lisp", not "/foo/bar/a.lisp",
the meaning of the DIRECTORY component is better represented as an
absolute path.

A jar Pathname has type JAR-PATHNAME, derived from PATHNAME.
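To make the DIRECTORY convention concrete, the Java sketch below maps an entry name as stored in the archive (`foo/bar/a.lisp`, no leading slash) to an `(:ABSOLUTE "foo" "bar")` directory list plus name and type, and back again. `EntryComponents` is a hypothetical class, and splitting the type at the last `.` is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: archive entry name <-> DIRECTORY/NAME/TYPE components.
final class EntryComponents {
    final List<String> directory; // always starts with ":ABSOLUTE"
    final String name;
    final String type;            // null when the file name has no '.'

    EntryComponents(List<String> directory, String name, String type) {
        this.directory = directory;
        this.name = name;
        this.type = type;
    }

    // "foo/bar/a.lisp" -> (:ABSOLUTE "foo" "bar"), name "a", type "lisp"
    static EntryComponents fromEntryName(String entryName) {
        List<String> directory = new ArrayList<>();
        directory.add(":ABSOLUTE");
        String[] segments = entryName.split("/", -1);
        for (int i = 0; i < segments.length - 1; i++) {
            directory.add(segments[i]);
        }
        String file = segments[segments.length - 1];
        int dot = file.lastIndexOf('.');
        if (dot > 0) {
            return new EntryComponents(directory, file.substring(0, dot), file.substring(dot + 1));
        }
        return new EntryComponents(directory, file, null);
    }

    // Back to the archive's form: the :ABSOLUTE marker yields no leading '/'.
    String toEntryName() {
        StringBuilder sb = new StringBuilder();
        for (String segment : directory.subList(1, directory.size())) {
            sb.append(segment).append('/');
        }
        sb.append(name);
        if (type != null) {
            sb.append('.').append(type);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        EntryComponents c = fromEntryName("foo/bar/a.lisp");
        // prints [:ABSOLUTE, foo, bar] a lisp
        System.out.println(c.directory + " " + c.name + " " + c.type);
        // prints foo/bar/a.lisp
        System.out.println(c.toEntryName());
    }
}
```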


BNF
---

An incomplete BNF of the syntax of a JAR PATHNAME would be:

      JAR-PATHNAME ::= "jar:" URL "!/" [ ENTRY ]

      URL ::= <URL parsable via java.net.URL.URL()>
            | JAR-FILE-PATHNAME

      JAR-FILE-PATHNAME ::= "jar:" "file:" JAR-NAMESTRING "!/" [ ENTRY ]

      JAR-NAMESTRING  ::=  ABSOLUTE-FILE-NAMESTRING
                         | RELATIVE-FILE-NAMESTRING

      ENTRY ::= [ DIRECTORY "/"]* FILE


### Notes

1.  `ABSOLUTE-FILE-NAMESTRING` and `RELATIVE-FILE-NAMESTRING` can use
    the local filesystem conventions, meaning that on Windows they could
    contain `\` as the directory separator, which is always normalized
    to `/`.  An `ENTRY` always uses `/` to separate directories within
    the jar archive.

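Note 1 amounts to rewriting `\` as `/` only in the JAR-NAMESTRING, i.e. before the first `!/`, while leaving the ENTRY parts alone. Below is a minimal Java sketch; `SeparatorNormalizer.normalize` is a hypothetical helper, and it deliberately ignores UNC paths, whose leading `\\` would need separate handling.

```java
// Hypothetical helper: normalize Windows separators in the JAR-NAMESTRING only.
final class SeparatorNormalizer {
    static String normalize(String namestring) {
        int bang = namestring.indexOf("!/"); // end of the outermost JAR-NAMESTRING
        if (bang < 0) {
            return namestring.replace('\\', '/');
        }
        // Everything from the first "!/" on is ENTRY text, which already uses '/'.
        return namestring.substring(0, bang).replace('\\', '/') + namestring.substring(bang);
    }

    public static void main(String[] args) {
        // prints jar:file:C:/abcl/lib/baz.jar!/a/b.lisp
        System.out.println(normalize("jar:file:C:\\abcl\\lib\\baz.jar!/a/b.lisp"));
    }
}
```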

Use Cases
---------

    // UC1 -- JAR
    pathname: {
      namestring: "jar:file:foo/baz.jar!/"
      device: (
        pathname: {
          device: "jar:file:"
          directory: (:RELATIVE "foo")
          name: "baz"
          type: "jar"
        }
      )
    }


    // UC2 -- JAR entry
    pathname: {
      namestring: "jar:file:baz.jar!/foo.abcl"
      device: ( pathname: {
        device: "jar:file:"
        name: "baz"
        type: "jar"
      })
      name: "foo"
      type: "abcl"
    }


    // UC3 -- JAR file in a JAR entry
    pathname: {
      namestring: "jar:jar:file:baz.jar!/foo.abcl!/"
      device: (
        pathname: {
          name: "baz"
          type: "jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
    }

    // UC4 -- JAR entry in a JAR entry with directories
    pathname: {
      namestring: "jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "b" "c")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "this" "that")
      name: "foo-20"
      type: "cls"
    }

    // UC5 -- JAR Entry in a JAR Entry
    pathname: {
      namestring: "jar:jar:file:a/foo/baz.jar!/c/d/foo.abcl!/a/b/bar-1.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "foo")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "c" "d")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "a" "b")
      name: "bar-1"
      type: "cls"
    }

    // UC6 -- JAR entry in an http: accessible JAR file
    pathname: {
      namestring: "jar:http://example.org/abcl.jar!/org/armedbear/lisp/Version.class"
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
      )
      directory: (:ABSOLUTE "org" "armedbear" "lisp")
      name: "Version"
      type: "class"
    }

    // UC7 -- JAR Entry in a JAR Entry in a URL accessible JAR FILE
    pathname: {
       namestring: "jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
       device: (
         pathname: {
           namestring: "http://example.org/abcl.jar"
         }
         pathname: {
           name: "foo"
           type: "abcl"
         }
       )
       name: "foo-1"
       type: "cls"
    }

    // UC8 -- JAR in an absolute directory
    pathname: {
       namestring: "jar:file:/a/b/foo.jar!/"
       device: (
         pathname: {
           directory: (:ABSOLUTE "a" "b")
           name: "foo"
           type: "jar"
         }
       )
    }

    // UC9 -- JAR in a relative directory with entry
    pathname: {
       namestring: "jar:file:a/b/foo.jar!/c/d/foo.lisp"
       device: (
         pathname: {
           directory: (:RELATIVE "a" "b")
           name: "foo"
           type: "jar"
         }
       )
       directory: (:ABSOLUTE "c" "d")
       name: "foo"
       type: "lisp"
    }


URI Encoding
------------

As a subtype of URL-PATHNAMES, JAR-PATHNAMES follow all the rules for
that type.  Most notably, this means that every `#\Space` character
should be encoded as `%20` when dealing with jar entries.
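In Java, one way to get `%20` rather than `+` is to let `java.net.URI` quote the entry path: its multi-argument constructors percent-encode characters such as space while keeping `/`, whereas `java.net.URLEncoder` performs HTML form encoding and turns a space into `+` and `/` into `%2F`. The `EntryEncoding` class below is a hypothetical sketch of that approach, not ABCL's implementation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical sketch: percent-encode jar entry paths the way URL-PATHNAMEs expect.
final class EntryEncoding {
    // "my dir/foo bar.lisp" -> "my%20dir/foo%20bar.lisp"; '/' is left as is.
    static String encodeEntry(String entry) {
        try {
            // URI(scheme, host, path, fragment) quotes illegal characters in `path`.
            return new URI(null, null, entry, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Inverse direction: "%20" -> ' '.
    static String decodeEntry(String encoded) {
        try {
            return new URI(encoded).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // prints my%20dir/foo%20bar.lisp
        System.out.println(encodeEntry("my dir/foo bar.lisp"));
        // Form encoding is the wrong tool here: prints my+dir%2Ffoo+bar.lisp
        System.out.println(URLEncoder.encode("my dir/foo bar.lisp", "UTF-8"));
    }
}
```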


History
-------

Previously, ABCL did have some support for jar pathnames. This support
used the convention that if the device field was itself a pathname,
the device pathname contained the location of the jar.

In the analysis of the desire to treat jar pathnames as valid
locations for `LOAD`, we determined that we needed a "double" pathname
so we could refer to the components of a packed FASL in a jar.  At
first we thought we could support such a syntax by having the device
pathname's device refer to the inner jar.  But with this use of
`PATHNAME`s linked by the `DEVICE` field, we found a problem: UNC path
support uses the `DEVICE` field, so JARs located on UNC mounts can't be
referenced via `\\`; i.e.,

    jar:jar:file:\\server\share\a\b\foo.jar!/this\that!/foo.java

would not have a valid representation.

So instead of having `DEVICE` point to a `PATHNAME`, we decided that the
`DEVICE` shall be a list of `PATHNAME`s, so we would have:

    pathname: {
      namestring: "jar:jar:file:\\server\share\foo.jar!/foo.abcl!/"
      device: (
                pathname: {
                  host: "server"
                  device: "share"
                  name: "foo"
                  type: "jar"
                }
                pathname: {
                  name: "foo"
                  type: "abcl"
                }
              )
    }

Although there is a fair amount of special logic inside `Pathname.java`
itself in the resulting implementation, the logic in `Load.java` seems
to have been considerably simplified.

When we implemented URL Pathnames, the special syntax for a URL as an
abstract string in the first position of the device list was naturally
replaced with a URL pathname.
