JARs and JAR entries in ABCL
============================

Mark Evenson
Created: 09 JAN 2010
Modified: 02 NOV 2019

Notes towards an implementation of "jar:" references to be contained
in Common Lisp `PATHNAME`s within ABCL.

Broken implementation
---------------------

abcl-1.5.0 was discovered to be broken with respect to nested jar
entries in November 2019.  This is evidenced by the tests invoked via

    (asdf:test-system :abcl)

failing with

    Failed to parse URL 'jar:jar:file:a/baz.jar!/b/c/foo.abcl!/'Nested JAR URLs are not supported

In researching where to fix this, a flaw in the reasoning about nesting jar
pathnames emerged.  The current implementation uses the device as a
CONS for storing the results of the hacky processing around the `jar`
scheme.  This was reasoned to be "good enough" in that it kept the
pathnames referencing pathnames to a minimum and no suitable case had
been meaningfully put forward.  In the days of Überjars, where it is
perfectly acceptable to have jars within jars, here is a counter-example:

    The jar containing the jar containing the abcl fasl

We need to name all possible locations of ABCL fasl files.

To fix this, we need to allow a namestring such as

    #p"jar:jar:file:abcl.jar!/b/c/foo.abcl!/foo.cls"

to resolve to linked PATHNAME-DEVICE references:

    "foo.cls" --device--> "foo.abcl" --device--> "abcl.jar"

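The chain of device references can be illustrated in Java, the implementation language of ABCL. This is only a minimal sketch and not ABCL's `Pathname` code: the class `ArchiveRef` and its methods are hypothetical names, and the parser assumes that each `jar:` prefix pairs with exactly one `!/` separator, as in the other namestrings in these notes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one link in a chain of archive references. */
final class ArchiveRef {
    final String path;        // "foo.cls", "b/c/foo.abcl", "file:abcl.jar", ...
    final ArchiveRef device;  // the enclosing archive; null for the outermost jar

    ArchiveRef(String path, ArchiveRef device) {
        this.path = path;
        this.device = device;
    }

    /** Parses a "jar:" namestring and returns the innermost reference. */
    static ArchiveRef parse(String namestring) {
        int depth = 0;
        String rest = namestring;
        while (rest.startsWith("jar:")) {       // each prefix adds one nesting level
            rest = rest.substring(4);
            depth++;
        }
        String[] parts = rest.split("!/", -1);  // -1 keeps a trailing empty entry
        if (depth == 0 || parts.length != depth + 1) {
            throw new IllegalArgumentException("Malformed jar namestring: " + namestring);
        }
        ArchiveRef ref = null;
        for (int i = 0; i < parts.length; i++) {  // the outermost archive comes first
            if (parts[i].isEmpty()) {
                if (i == parts.length - 1) {
                    continue;  // a trailing "!/" names the archive itself
                }
                throw new IllegalArgumentException("Empty archive name: " + namestring);
            }
            ref = new ArchiveRef(parts[i], ref);
        }
        return ref;
    }

    /** Lists the paths from this reference outwards by following the device links. */
    List<String> chain() {
        List<String> out = new ArrayList<>();
        for (ArchiveRef r = this; r != null; r = r.device) {
            out.add(r.path);
        }
        return out;
    }

    public static void main(String[] args) {
        ArchiveRef ref = parse("jar:jar:file:abcl.jar!/b/c/foo.abcl!/foo.cls");
        System.out.println(String.join(" --device--> ", ref.chain()));
    }
}
```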
Towards Fixing
==============

It would be better to reflect the pathname hierarchy as Java classes.
Although hooking things up is going to take some elbow grease, being able to
cleanly separate the logic for our schemas like "jar", and the special
handling that should happen with all pathnames whose namestring starts
with a schema we handle (like HTML encoding into/out of expression),
would be helpful.

We make a breaking change with how we abstract the notion of "Archive"
and "Archive Entries".

Pathname DEVICE fields currently contain either

+ a single digit denoting a UNC drive (Windows)

+ a list containing one or two pathnames denoting paths within archives

It is conceptually much more correct to only have a single Pathname in
a file to denote the source of an archive.


Goals
-----

1.  Use Common Lisp pathnames to refer to entries in a jar file.

2.  Use the `'jar:'` schema as documented in [`java.net.JarURLConnection`][jarURLConnection] for
    namestring representation.

    An entry in a JAR file:

        #p"jar:file:baz.jar!/foo"

    A JAR file:

        #p"jar:file:baz.jar!/"

    A JAR file accessible via URL:

        #p"jar:http://example.org/abcl.jar!/"

    An entry in an ABCL FASL in a URL accessible JAR file:

        #p"jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"

[jarUrlConnection]: http://java.sun.com/javase/6/docs/api/java/net/JarURLConnection.html

3.  `MERGE-PATHNAMES` working for jar entries in the following use cases:

        (merge-pathnames "foo-1.cls" "jar:jar:file:baz.jar!/foo.abcl!/foo._")
        ==> "jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls"

        (merge-pathnames "foo-1.cls" "jar:file:foo.abcl!/")
        ==> "jar:file:foo.abcl!/foo-1.cls"

4.  TRUENAME and PROBE-FILE working with "jar:" with TRUENAME
    canonicalizing the JAR reference.

5.  DIRECTORY working within JAR files (and within JAR in JAR).

6.  References "jar:<URL>" for all strings <URL> that java.net.URL can
    resolve work.

7.  Make jar pathnames work as a valid argument for OPEN with
    :DIRECTION :INPUT.

8.  Enable the loading of ASDF systems packaged within jar files.

9.  Enable the matching of jar pathnames with PATHNAME-MATCH-P:

        (pathname-match-p
          "jar:file:/a/b/some.jar!/a/system/def.asd"
          "jar:file:/**/*.jar!/**/*.asd")
        ==> t

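The wildcard match in goal 9 can be approximated at the namestring level by translating the wildcard namestring into a regular expression. The Java sketch below is an illustration under stated assumptions, not how ABCL implements `PATHNAME-MATCH-P` (which compares pathname components rather than strings): the class `JarWildMatch` is a hypothetical name, `**/` is assumed to match zero or more directories, and `*` is assumed to match within a single path segment.

```java
import java.util.regex.Pattern;

/** Hypothetical string-level approximation of wildcard matching on jar namestrings. */
final class JarWildMatch {
    /** Translates a wildcard namestring into an anchored regular expression. */
    static Pattern toRegex(String wild) {
        StringBuilder re = new StringBuilder();
        int i = 0;
        while (i < wild.length()) {
            if (wild.startsWith("**/", i)) {
                re.append("(?:[^/]+/)*");   // assumption: zero or more directories
                i += 3;
            } else if (wild.charAt(i) == '*') {
                re.append("[^/]*");          // assumption: stays inside one segment
                i++;
            } else {
                // Quote every literal character so '.', '!' etc. match themselves.
                re.append(Pattern.quote(String.valueOf(wild.charAt(i))));
                i++;
            }
        }
        return Pattern.compile(re.toString());
    }

    /** Returns true when the whole namestring matches the wildcard namestring. */
    static boolean matches(String namestring, String wild) {
        return toRegex(wild).matcher(namestring).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(
            "jar:file:/a/b/some.jar!/a/system/def.asd",
            "jar:file:/**/*.jar!/**/*.asd"));
    }
}
```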
Status
------

All the above goals have been implemented and tested.


Implementation
--------------

A PATHNAME referring to a file within a JAR is known as a JAR PATHNAME.
It can either refer to the entire JAR file or an entry within the JAR
file.

A JAR PATHNAME always has a DEVICE which is a proper list.  This
distinguishes it from other uses of Pathname.

The DEVICE of a JAR PATHNAME will be a list with either one or two
elements.  The first element of this list can be either a
PATHNAME representing a JAR on the filesystem, or a URL PATHNAME.

A PATHNAME occurring in the list in the DEVICE of a JAR PATHNAME is
known as a DEVICE PATHNAME.

Only the first entry in the DEVICE list may be a URL PATHNAME.

Otherwise the DEVICE PATHNAME denotes the PATHNAME of the JAR file.

The DEVICE PATHNAME list of enclosing JARs runs from outermost to
innermost.  The implementation currently limits this list to have at
most two elements.

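The constraints on the DEVICE list can be written down as a small check. The following Java sketch is illustrative only: `JarDevice` is a hypothetical class, plain strings stand in for DEVICE PATHNAMEs, and "is a URL PATHNAME" is approximated by the presence of a `scheme://` prefix, which is an assumption rather than ABCL's actual test.

```java
import java.util.List;

/** Hypothetical check of the DEVICE list rules for a JAR PATHNAME. */
final class JarDevice {
    /** Assumption: a string with a "scheme://" prefix stands in for a URL PATHNAME. */
    static boolean isUrl(String s) {
        return s.matches("[A-Za-z][A-Za-z0-9+.-]*://.*");
    }

    /**
     * The list runs from the outermost to the innermost archive, has one or
     * two elements, and only its first element may be a URL.
     */
    static boolean isValid(List<String> device) {
        if (device.isEmpty() || device.size() > 2) {
            return false;
        }
        for (int i = 1; i < device.size(); i++) {
            if (isUrl(device.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(List.of("http://example.org/abcl.jar", "foo.abcl")));
        System.out.println(isValid(List.of("baz.jar", "http://example.org/abcl.jar")));
    }
}
```

Keeping the rule in one predicate makes the two-element limit easy to relax if deeper nesting is ever supported.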
The DIRECTORY component of a JAR PATHNAME should be a list starting
with the :ABSOLUTE keyword.  Even though hierarchical entries in jar
files are stored in the form "foo/bar/a.lisp" not "/foo/bar/a.lisp",
the meaning of the DIRECTORY component is better represented as an
absolute path.

A jar Pathname has type JAR-PATHNAME, derived from PATHNAME.


BNF
---

An incomplete BNF of the syntax of JAR PATHNAME would be:

    JAR-PATHNAME ::= "jar:" URL "!/" [ ENTRY ]

    URL ::= <URL parsable via java.net.URL.URL()>
          | JAR-FILE-PATHNAME

    JAR-FILE-PATHNAME ::= "jar:" "file:" JAR-NAMESTRING "!/" [ ENTRY ]

    JAR-NAMESTRING  ::=  ABSOLUTE-FILE-NAMESTRING
                       | RELATIVE-FILE-NAMESTRING

    ENTRY ::= [ DIRECTORY "/"]* FILE


### Notes

1.  `ABSOLUTE-FILE-NAMESTRING` and `RELATIVE-FILE-NAMESTRING` can use
    the local filesystem conventions, meaning that on Windows these could
    contain '\' as the directory separator, which is always normalized to
    '/'.  An `ENTRY` always uses '/' to separate directories within the
    jar archive.


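The normalization described in the note can be sketched in Java as follows. This is a hedged illustration, not ABCL's code: `JarNamestrings.normalize` is a hypothetical helper that handles only `file:` based namestrings, and it assumes that only the text before the first `!/` follows local filesystem conventions.

```java
/** Hypothetical helper illustrating the separator normalization of Note 1. */
final class JarNamestrings {
    /**
     * Rewrites '\' to '/' in the JAR-NAMESTRING part only, i.e. the text
     * between "file:" and the first "!/".  Everything after that is an ENTRY,
     * which is left untouched because it already uses '/'.
     */
    static String normalize(String namestring) {
        int start = 0;
        while (namestring.startsWith("jar:", start)) {  // skip nested "jar:" prefixes
            start += 4;
        }
        if (start == 0 || !namestring.startsWith("file:", start)) {
            throw new IllegalArgumentException("Not a jar:file: namestring: " + namestring);
        }
        int bang = namestring.indexOf("!/", start);
        if (bang < 0) {
            throw new IllegalArgumentException("Missing \"!/\" separator: " + namestring);
        }
        String jarPart = namestring.substring(0, bang).replace('\\', '/');
        return jarPart + namestring.substring(bang);
    }

    public static void main(String[] args) {
        System.out.println(normalize("jar:file:C:\\lib\\baz.jar!/foo/bar.lisp"));
    }
}
```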
Use Cases
---------

    // UC1 -- JAR
    pathname: {
      namestring: "jar:file:foo/baz.jar!/"
      device: (
        pathname: {
          device: "jar:file:"
          directory: (:RELATIVE "foo")
          name: "baz"
          type: "jar"
        }
      )
    }


    // UC2 -- JAR entry
    pathname: {
      namestring: "jar:file:baz.jar!/foo.abcl"
      device: ( pathname: {
        device: "jar:file:"
        name: "baz"
        type: "jar"
      })
      name: "foo"
      type: "abcl"
    }


    // UC3 -- JAR file in a JAR entry
    pathname: {
      namestring: "jar:jar:file:baz.jar!/foo.abcl!/"
      device: (
        pathname: {
          name: "baz"
          type: "jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
    }

    // UC4 -- JAR entry in a JAR entry with directories
    pathname: {
      namestring: "jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "b" "c")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:RELATIVE "this" "that")
      name: "foo-20"
      type: "cls"
    }

    // UC5 -- JAR Entry in a JAR Entry
    pathname: {
      namestring: "jar:jar:file:a/foo/baz.jar!/c/d/foo.abcl!/a/b/bar-1.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "foo")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "c" "d")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "a" "b")
      name: "bar-1"
      type: "cls"
    }

    // UC6 -- JAR entry in a http: accessible JAR file
    pathname: {
      namestring: "jar:http://example.org/abcl.jar!/org/armedbear/lisp/Version.class"
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
      )
      directory: (:RELATIVE "org" "armedbear" "lisp")
      name: "Version"
      type: "class"
    }

    // UC7 -- JAR Entry in a JAR Entry in a URL accessible JAR FILE
    pathname: {
      namestring: "jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
      name: "foo-1"
      type: "cls"
    }

    // UC8 -- JAR in an absolute directory
    pathname: {
      namestring: "jar:file:/a/b/foo.jar!/"
      device: (
        pathname: {
          directory: (:ABSOLUTE "a" "b")
          name: "foo"
          type: "jar"
        }
      )
    }

    // UC9 -- JAR in a relative directory with entry
    pathname: {
      namestring: "jar:file:a/b/foo.jar!/c/d/foo.lisp"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "b")
          name: "foo"
          type: "jar"
        }
      )
      directory: (:ABSOLUTE "c" "d")
      name: "foo"
      type: "lisp"
    }


URI Encoding
------------

As a subtype of URL-PATHNAMES, JAR-PATHNAMES follow all the rules for
that type.  Most notably this means that all #\Space characters should
be encoded as '%20' when dealing with jar entries.


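As an illustration of the '%20' rule, the standard `java.net.URI` class can percent-encode and decode an entry path. This is a minimal sketch and not ABCL's own encoding routine; `JarEntryEncoding` is a hypothetical class name. `java.net.URLEncoder` is deliberately not used, since it targets HTML form encoding and turns a space into '+'.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper showing percent-encoding of jar entry paths. */
final class JarEntryEncoding {
    /** Percent-encodes characters that are illegal in a URI path, such as ' '. */
    static String encode(String entryPath) {
        try {
            // The multi-argument constructor quotes illegal characters in the path.
            return new URI(null, null, entryPath, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Decodes percent-escapes such as "%20" back into plain characters. */
    static String decode(String encodedPath) {
        return URI.create(encodedPath).getPath();
    }

    public static void main(String[] args) {
        String encoded = encode("/a dir/foo bar.lisp");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```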
History
-------

Previously, ABCL did have some support for jar pathnames.  This support
used the convention that if the device field was itself a
pathname, the device pathname contained the location of the jar.

In the analysis of the desire to treat jar pathnames as valid
locations for `LOAD`, we determined that we needed a "double" pathname
so we could refer to the components of a packed FASL in a jar.  At first
we thought we could support such a syntax by having the device
pathname's device refer to the inner jar.  But within this use of
`PATHNAME`s linked by the `DEVICE` field, we found the problem that UNC
path support uses the `DEVICE` field, so JARs located on UNC mounts can't
be referenced via '\\', i.e.

    jar:jar:file:\\server\share\a\b\foo.jar!/this\that!/foo.java

would not have a valid representation.

So instead of having `DEVICE` point to a `PATHNAME`, we decided that the
`DEVICE` shall be a list of `PATHNAME`s, so we would have:

    pathname: {
      namestring: "jar:jar:file:\\server\share\foo.jar!/foo.abcl!/"
      device: (
        pathname: {
          host: "server"
          device: "share"
          name: "foo"
          type: "jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
    }

Although there is a fair amount of special logic inside `Pathname.java`
itself in the resulting implementation, the logic in `Load.java` seems
to have been considerably simplified.

When we implemented URL Pathnames, the special syntax for URL as an
abstract string in the first position of the device list was naturally
replaced with a URL pathname.