\begin{verbatim}
JARs and JAR entries in ABCL
============================

Mark Evenson
Created:  09 JAN 2010
Modified: 21 JUN 2011

Notes towards an implementation of "jar:" references to be contained
in Common Lisp `PATHNAME`s within ABCL.

Goals
-----

1.  Use Common Lisp pathnames to refer to entries in a jar file.

2.  Use the `'jar:'` scheme as documented in [`java.net.JarURLConnection`][jarURLConnection] for
    namestring representation.

    An entry in a JAR file:

        #p"jar:file:baz.jar!/foo"

    A JAR file:

        #p"jar:file:baz.jar!/"

    A JAR file accessible via URL:

        #p"jar:http://example.org/abcl.jar!/"

    An entry in an ABCL FASL in a URL accessible JAR file:

        #p"jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"

[jarURLConnection]: http://java.sun.com/javase/6/docs/api/java/net/JarURLConnection.html

3.  `MERGE-PATHNAMES` working for jar entries in the following use cases:

        (merge-pathnames "foo-1.cls" "jar:jar:file:baz.jar!/foo.abcl!/foo._")
        ==> "jar:jar:file:baz.jar!/foo.abcl!/foo-1.cls"

        (merge-pathnames "foo-1.cls" "jar:file:foo.abcl!/")
        ==> "jar:file:foo.abcl!/foo-1.cls"

4.  TRUENAME and PROBE-FILE working with "jar:" with TRUENAME
    canonicalizing the JAR reference.

5.  DIRECTORY working within JAR files (and within JAR in JAR).

6.  References of the form "jar:<URL>" work for all strings <URL> that
    java.net.URL can resolve.

7.  Make jar pathnames work as a valid argument for OPEN with
    :DIRECTION :INPUT.

8.  Enable the loading of ASDF systems packaged within jar files.

9.  Enable the matching of jar pathnames with PATHNAME-MATCH-P:

        (pathname-match-p
          "jar:file:/a/b/some.jar!/a/system/def.asd"
          "jar:file:/**/*.jar!/**/*.asd")
        ==> t

Status
------

All the above goals have been implemented and tested.


Implementation
--------------

A PATHNAME referring to a file within a JAR is known as a JAR PATHNAME.
It can either refer to the entire JAR file or an entry within the JAR
file.

A JAR PATHNAME always has a DEVICE which is a proper list.  This
distinguishes it from other uses of Pathname.

The DEVICE of a JAR PATHNAME will be a list with either one or two
elements.  The first element of the DEVICE list can be either a
PATHNAME representing a JAR on the filesystem, or a URL PATHNAME.

A PATHNAME occurring in the list in the DEVICE of a JAR PATHNAME is
known as a DEVICE PATHNAME.

Only the first entry in the DEVICE list may be a URL PATHNAME.

Otherwise the DEVICE PATHNAME denotes the PATHNAME of the JAR file.

The DEVICE PATHNAME list of enclosing JARs runs from outermost to
innermost.  The implementation currently limits this list to have at
most two elements.

The DIRECTORY component of a JAR PATHNAME should be a list starting
with the :ABSOLUTE keyword.  Even though hierarchical entries in jar
files are stored in the form "foo/bar/a.lisp" not "/foo/bar/a.lisp",
the meaning of the DIRECTORY component is better represented as an
absolute path.

A jar Pathname has type JAR-PATHNAME, derived from PATHNAME.
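
The constraints on the DEVICE list stated above can be checked
mechanically.  The following Java sketch is illustrative only:
`JarDeviceRules`, `Device` and `check` are hypothetical names invented
for this note and are not taken from ABCL's `Pathname.java`.  It merely
restates two of the rules (one or two entries; only the first entry may
be a URL PATHNAME).

```java
import java.util.List;

// Hypothetical helper; not ABCL's actual implementation.
final class JarDeviceRules {
    // Stand-in for a DEVICE PATHNAME: a namestring plus a flag saying
    // whether it denotes a URL pathname rather than a file on disk.
    static final class Device {
        final String namestring;
        final boolean isUrl;

        Device(String namestring, boolean isUrl) {
            this.namestring = namestring;
            this.isUrl = isUrl;
        }
    }

    // Returns null when the DEVICE list is well-formed, otherwise a
    // description of the first rule it violates.
    static String check(List<Device> devices) {
        if (devices.isEmpty() || devices.size() > 2) {
            return "DEVICE must contain one or two pathnames";
        }
        for (int i = 1; i < devices.size(); i++) {
            if (devices.get(i).isUrl) {
                return "only the first DEVICE entry may be a URL pathname";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Device outer = new Device("http://example.org/abcl.jar", true);
        Device inner = new Device("foo.abcl", false);
        System.out.println(check(List.of(outer, inner))); // well-formed list
        System.out.println(check(List.of(inner, outer))); // URL in second position
    }
}
```

The list is ordered from outermost to innermost JAR, so a URL can only
ever name the outermost archive.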


BNF
---

An incomplete BNF of the syntax of JAR PATHNAME would be:

    JAR-PATHNAME ::= "jar:" URL "!/" [ ENTRY ]

    URL ::= <URL parsable via java.net.URL.URL()>
          | JAR-FILE-PATHNAME

    JAR-FILE-PATHNAME ::= "jar:" "file:" JAR-NAMESTRING "!/" [ ENTRY ]

    JAR-NAMESTRING  ::= ABSOLUTE-FILE-NAMESTRING
                      | RELATIVE-FILE-NAMESTRING

    ENTRY ::= [ DIRECTORY "/"]* FILE


### Notes

1.  `ABSOLUTE-FILE-NAMESTRING` and `RELATIVE-FILE-NAMESTRING` can use
    the local filesystem conventions, meaning that on Windows this could
    contain '\' as the directory separator, which is always normalized to
    '/'.  An `ENTRY` always uses '/' to separate directories within the
    jar archive.
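
A minimal sketch of how a namestring following the BNF above might be
taken apart into its list of enclosing JARs and its entry.  This is not
the parser in `Pathname.java`; the class `JarNamestring` and its fields
are hypothetical.  It assumes that "!/" never occurs inside an entry
name, enforces the two-level nesting limit described under
Implementation, and normalizes '\' to '/' for "file:" locations as
described in Note 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; ABCL's real parsing lives in Pathname.java.
final class JarNamestring {
    final List<String> devices; // enclosing JARs, outermost first
    final String entry;         // "" when the namestring denotes the JAR itself

    private JarNamestring(List<String> devices, String entry) {
        this.devices = devices;
        this.entry = entry;
    }

    static JarNamestring parse(String namestring) {
        // Each leading "jar:" prefix adds one level of nesting.
        String rest = namestring;
        int depth = 0;
        while (rest.startsWith("jar:")) {
            depth++;
            rest = rest.substring(4);
        }
        if (depth == 0) {
            throw new IllegalArgumentException("not a jar namestring");
        }
        if (depth > 2) {
            throw new IllegalArgumentException("at most two nested JARs");
        }

        // What remains is: URL "!/" [ inner-jar "!/" ] [ ENTRY ]
        String[] parts = rest.split("!/", -1);
        if (parts.length != depth + 1) {
            throw new IllegalArgumentException("expected " + depth + " \"!/\" separators");
        }
        List<String> devices = new ArrayList<>();
        for (int i = 0; i < depth; i++) {
            String device = parts[i];
            if (i == 0 && device.startsWith("file:")) {
                // Local file: drop the scheme, normalize Windows separators.
                device = device.substring(5).replace('\\', '/');
            }
            devices.add(device);
        }
        return new JarNamestring(devices, parts[depth]);
    }

    public static void main(String[] args) {
        JarNamestring p = parse("jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls");
        System.out.println(p.devices); // [a/baz.jar, b/c/foo.abcl]
        System.out.println(p.entry);   // this/that/foo-20.cls
    }
}
```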


Use Cases
---------

    // UC1 -- JAR
    pathname: {
      namestring: "jar:file:foo/baz.jar!/"
      device: (
        pathname: {
          device: "jar:file:"
          directory: (:RELATIVE "foo")
          name: "baz"
          type: "jar"
        }
      )
    }


    // UC2 -- JAR entry
    pathname: {
      namestring: "jar:file:baz.jar!/foo.abcl"
      device: (
        pathname: {
          device: "jar:file:"
          name: "baz"
          type: "jar"
        }
      )
      name: "foo"
      type: "abcl"
    }


    // UC3 -- JAR file in a JAR entry
    pathname: {
      namestring: "jar:jar:file:baz.jar!/foo.abcl!/"
      device: (
        pathname: {
          name: "baz"
          type: "jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
    }

    // UC4 -- JAR entry in a JAR entry with directories
    pathname: {
      namestring: "jar:jar:file:a/baz.jar!/b/c/foo.abcl!/this/that/foo-20.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "b" "c")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:RELATIVE "this" "that")
      name: "foo-20"
      type: "cls"
    }

    // UC5 -- JAR Entry in a JAR Entry
    pathname: {
      namestring: "jar:jar:file:a/foo/baz.jar!/c/d/foo.abcl!/a/b/bar-1.cls"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "foo")
          name: "baz"
          type: "jar"
        }
        pathname: {
          directory: (:RELATIVE "c" "d")
          name: "foo"
          type: "abcl"
        }
      )
      directory: (:ABSOLUTE "a" "b")
      name: "bar-1"
      type: "cls"
    }

    // UC6 -- JAR entry in a http: accessible JAR file
    pathname: {
      namestring: "jar:http://example.org/abcl.jar!/org/armedbear/lisp/Version.class"
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
      )
      directory: (:RELATIVE "org" "armedbear" "lisp")
      name: "Version"
      type: "class"
    }

    // UC7 -- JAR Entry in a JAR Entry in a URL accessible JAR FILE
    pathname: {
      namestring: "jar:jar:http://example.org/abcl.jar!/foo.abcl!/foo-1.cls"
      device: (
        pathname: {
          namestring: "http://example.org/abcl.jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
      name: "foo-1"
      type: "cls"
    }

    // UC8 -- JAR in an absolute directory
    pathname: {
      namestring: "jar:file:/a/b/foo.jar!/"
      device: (
        pathname: {
          directory: (:ABSOLUTE "a" "b")
          name: "foo"
          type: "jar"
        }
      )
    }

    // UC9 -- JAR in a relative directory with entry
    pathname: {
      namestring: "jar:file:a/b/foo.jar!/c/d/foo.lisp"
      device: (
        pathname: {
          directory: (:RELATIVE "a" "b")
          name: "foo"
          type: "jar"
        }
      )
      directory: (:ABSOLUTE "c" "d")
      name: "foo"
      type: "lisp"
    }


URI Encoding
------------

As a subtype of URL-PATHNAMES, JAR-PATHNAMES follow all the rules for
that type.  Most notably this means that all #\Space characters should
be encoded as '%20' when dealing with jar entries.


History
-------

Previously, ABCL did have some support for jar pathnames.  This support
used the convention that if the device field was itself a pathname,
the device pathname contained the location of the jar.

In the analysis of the desire to treat jar pathnames as valid
locations for `LOAD`, we determined that we needed a "double" pathname
so we could refer to the components of a packed FASL in a jar.  At first
we thought we could support such a syntax by having the device
pathname's device refer to the inner jar.  But within this use of
`PATHNAME`s linked by the `DEVICE` field, we found the problem that UNC
path support uses the `DEVICE` field, so JARs located on UNC mounts can't
be referenced via '\\', i.e.

    jar:jar:file:\\server\share\a\b\foo.jar!/this\that!/foo.java

would not have a valid representation.

So instead of having `DEVICE` point to a `PATHNAME`, we decided that the
`DEVICE` shall be a list of `PATHNAME`s, so we would have:

    pathname: {
      namestring: "jar:jar:file:\\server\share\foo.jar!/foo.abcl!/"
      device: (
        pathname: {
          host: "server"
          device: "share"
          name: "foo"
          type: "jar"
        }
        pathname: {
          name: "foo"
          type: "abcl"
        }
      )
    }

Although there is a fair amount of special logic inside `Pathname.java`
itself in the resulting implementation, the logic in `Load.java` seems
to have been considerably simplified.

When we implemented URL Pathnames, the special syntax for URL as an
abstract string in the first position of the device list was naturally
replaced with a URL pathname.

\end{verbatim}
\begin{verbatim}

URL Pathnames ABCL
==================

Mark Evenson
Created:  25 MAR 2010
Modified: 21 JUN 2011

Notes towards an implementation of URL references to be contained in
Common Lisp `PATHNAME` objects within ABCL.


References
----------

RFC3986   Uniform Resource Identifier (URI): Generic Syntax


URL vs URI
----------

We use the term URL as shorthand in describing the URL Pathnames, even
though the corresponding encoding is more akin to a URI as described
in RFC3986.


Goals
-----

1.  Use Common Lisp pathnames to refer to representations referenced
    by a URL.

2.  The URL schemes supported shall include at least "http", and those
    enabled by the URLStreamHandler extension mechanism.

3.  Use URL schemes that are understood by the java.net.URL object.

    Example of a Pathname specified by URL:

        #p"http://example.org/org/armedbear/systems/pgp.asd"

4.  MERGE-PATHNAMES

        (merge-pathnames "url.asd"
                         "http://example/org/armedbear/systems/pgp.asd")
        ==> "http://example/org/armedbear/systems/url.asd"

5.  PROBE-FILE returning the state of URL accessibility.

6.  TRUENAME "aliased" to PROBE-FILE signalling an error if the URL is
    not accessible (see "Non-goal 1").

7.  DIRECTORY works for non-wildcards.

8.  URL pathnames work as a valid argument for OPEN with :DIRECTION :INPUT.

9.  Enable the loading of ASDF2 systems referenced by a URL pathname.

10. Pathnames constructed with the "file" scheme
    (i.e. #p"file:/this/file") need to be properly URI encoded according
    to RFC3986 or otherwise will signal FILE-ERROR.

11. The "file" scheme will continue to be represented by an
    "ordinary" Pathname.  Thus, after construction of a URL Pathname with
    the "file" scheme, the namestring of the resulting PATHNAME will no
    longer contain the "file:" prefix.

12. The "jar" scheme will continue to be represented by a jar
    Pathname.
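
The MERGE-PATHNAMES example in goal 4 behaves like relative reference
resolution on the Java side.  The sketch below uses only the standard
`java.net.URI.resolve` call to reproduce that one example; the class
name `UrlMerge` and method `merge` are made up for illustration, and
this is an analogy rather than a description of how ABCL implements
MERGE-PATHNAMES.  In particular, Common Lisp merging fills in missing
NAME and TYPE components from the defaults, which URI resolution does
not do.

```java
import java.net.URI;

// Hypothetical illustration; ABCL's MERGE-PATHNAMES is not implemented this way.
final class UrlMerge {
    // Resolves a relative reference against a default URL, replacing the
    // last path segment of the default with the relative reference.
    static String merge(String relative, String defaults) {
        return URI.create(defaults).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // prints http://example/org/armedbear/systems/url.asd
        System.out.println(merge("url.asd", "http://example/org/armedbear/systems/pgp.asd"));
    }
}
```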


Non-goals
---------

1.  We will not implement canonicalization of URL schemes (such as
    following "http" redirects).

2.  DIRECTORY will not work for URL pathnames containing wildcards.


Implementation
--------------

A PATHNAME referring to a resource referenced by a URL is known as a
URL PATHNAME.

A URL PATHNAME always has a HOST component which is a proper list.
This list will be a property list (plist).  The property list
values must be character strings.

    :SCHEME
        Scheme of URI ("http", "ftp", "bundle", etc.)
    :AUTHORITY
        Valid authority according to the URI scheme.  For "http" this
        could be "example.org:8080".
    :QUERY
        The query of the URI
    :FRAGMENT
        The fragment portion of the URI

The DIRECTORY, NAME and TYPE fields of the PATHNAME are used to form
the URI `path` according to the conventions of the UNIX filesystem
(i.e. '/' is the directory separator).  In a sense the HOST contains
the base URL, to which the `path` is a relative URL (although this
abstraction is violated somewhat by the storing of the QUERY and
FRAGMENT portions of the URI in the HOST component).

For the purposes of PATHNAME-MATCH-P, two URL pathnames may be said to
match if their HOST components are EQUAL, and all other components are
considered to match according to the existing rules for Pathnames.

A URL pathname must have a DEVICE whose value is NIL.

Upon creation, any ".." and "." components in the DIRECTORY are
removed.  The DIRECTORY component, if present, is always absolute.

The namestring of a URL pathname shall be formed by the usual
conventions of a URL.

A URL Pathname has type URL-PATHNAME, derived from PATHNAME.
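
To make the component layout above concrete, here is a small Java
sketch that splits a URL into a HOST plist plus DIRECTORY, NAME and
TYPE.  It relies only on standard `java.net.URI` accessors and
`URI.normalize()`; the class `UrlPathnameSketch`, its fields, and the
use of a map with keyword-like string keys are assumptions made for
illustration, not ABCL's actual data structures.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a URL PATHNAME; not ABCL's actual classes.
final class UrlPathnameSketch {
    final Map<String, String> host = new LinkedHashMap<>(); // the HOST plist
    final List<String> directory = new ArrayList<>();       // empty when absent
    String name;
    String type;

    static UrlPathnameSketch parse(String url) {
        // normalize() removes "." segments and resolvable ".." segments.
        URI uri = URI.create(url).normalize();
        UrlPathnameSketch p = new UrlPathnameSketch();
        p.host.put(":SCHEME", uri.getScheme());
        p.host.put(":AUTHORITY", uri.getAuthority());
        if (uri.getQuery() != null) p.host.put(":QUERY", uri.getQuery());
        if (uri.getFragment() != null) p.host.put(":FRAGMENT", uri.getFragment());

        String path = uri.getPath();
        if (path == null || path.isEmpty()) return p;

        // Everything before the last '/' forms the DIRECTORY, which is
        // treated as absolute whenever it is present.
        int slash = path.lastIndexOf('/');
        if (slash >= 0) {
            p.directory.add(":ABSOLUTE");
            for (String segment : path.substring(0, slash).split("/")) {
                if (!segment.isEmpty()) p.directory.add(segment);
            }
        }

        // The final segment is split at its last '.' into NAME and TYPE.
        String file = path.substring(slash + 1);
        int dot = file.lastIndexOf('.');
        if (dot > 0) {
            p.name = file.substring(0, dot);
            p.type = file.substring(dot + 1);
        } else if (!file.isEmpty()) {
            p.name = file;
        }
        return p;
    }

    public static void main(String[] args) {
        UrlPathnameSketch p =
            parse("http://example.org:8080/org/./armedbear/x/../systems/pgp.asd?rev=1#top");
        System.out.println(p.host);
        System.out.println(p.directory + " " + p.name + " " + p.type);
    }
}
```

Keeping QUERY and FRAGMENT in the HOST map mirrors the compromise noted
above: the path components stay purely hierarchical, at the cost of the
HOST holding more than the base URL.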


URI Encoding
------------

For dealing with URI Encoding (also known as [Percent Encoding][]) we
adopt the following rules:

[Percent Encoding]: http://en.wikipedia.org/wiki/Percent-encoding

1.  All pathname components are represented "as is" without escaping.

2.  Namestrings are suitably escaped if the Pathname is a URL-PATHNAME
    or a JAR-PATHNAME.

3.  Namestrings should all "round-trip":

        (when (typep p 'pathname)
          (equal (namestring p)
                 (namestring (pathname p))))
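
The split between unescaped components and escaped namestrings can be
illustrated with the standard `java.net.URI` class, whose
multi-argument constructors quote illegal characters and whose
`getPath()` accessor decodes them again.  The sketch below is only an
illustration of rules 1 to 3 for a path; `UriEncodingSketch`,
`encodePath` and `decodePath` are hypothetical names, and ABCL's own
encoding code may differ in which characters it escapes.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration of the encoding rules; not ABCL's code.
final class UriEncodingSketch {
    // Component form -> namestring form: characters that are illegal in
    // a URI path (such as a space) are percent-encoded.
    static String encodePath(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Namestring form -> component form: percent escapes are decoded.
    static String decodePath(String encodedPath) {
        return URI.create(encodedPath).getPath();
    }

    public static void main(String[] args) {
        String component = "/a dir/foo bar.lisp";
        String namestring = encodePath(component);
        System.out.println(namestring);             // /a%20dir/foo%20bar.lisp
        System.out.println(decodePath(namestring)); // /a dir/foo bar.lisp
    }
}
```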


Status
------

This design has been implemented.


History
-------

26 NOV 2010 Changed implementation to use URI encodings for the "file"
  schemes, including those nested within the "jar" scheme, such as
  "jar:file:/location/of/some.jar!/".

21 JUN 2011 Fixed implementation to properly handle URI encodings
  referring to nested jar archives.

\end{verbatim}