Decompress zlib RFC-1950 in APL2
[page last modified 2023-04-04]


This is APL2 code to decompress (inflate) a block of RFC-1950 (zlib) data. It was originally written 3/13/2023 in Micro Apl APLX for Windows version 5.1.0. As of 3/29/2023 it also runs in Dyalog Windows version 18.2.45405.0 64 Unicode.

The "de" decompression function must be provided the entire compressed block at once as a byte data type vector (eg ⎕dr 4 80 83); it returns the decompressed result as data of the same type. If there is any decompression error, "de" returns a scalar negative 1.

⎕dr is used: (1) to convert byte data to binary (11 ⎕dr x); (2) to convert decoded uncompressed bytes from binary to the output byte data type.

The "deh" function is used by "de" to build Huffman decoding tables.

The exact number of RFC-1950 bytes must be supplied or an error will occur (this means the adler32 sum must be the last 4 bytes). The decompressor checks that the last compressed bit is read from the byte preceding the adler32 sum -- this check can be disabled by modifying the "ad:" line.

By default the adler32 sum itself is NOT verified, since it makes the decompression 10x slower. The adler32 check may be enabled by removing the ",0" in the "ad:" line.

Note: there is extra logic to enlarge the output buffer in large chunks. Originally raw bytes and copies were simply concatenated to existing output; however, this was very slow (30x slower).

The ⎕wa needed to run will be approx. 16meg + 2× size of the decompressed output.



 r←de b;c;d;dt;e;f;ft;g;h;i;l;lt;n;o;p;t;u;x;⎕IO
⍝decompress RFC-1950. paulhoule.com 3/29/2023
 o←x←ft←⎕IO←0 ⋄ t←⎕dr r←0⍴b ⋄ i←¯16+0⊃⍴b←11 ⎕dr⌽b
 u←t ⎕dr,1=⍉(8⍴2)⊤⍳256 ⋄ lt←256 1 8 21/rw bs cp cx
 →(c[2 8]∨(8≠2⊥¯4↑c)∨×31|2⊥8⌽c←¯16↑b)/er
 lt←lt,¯286 2↑(258⌊2++\2*c),[0.5]1⌽c←¯29↑4/⍳6
 dt←(+\2*0,¯1↓c),[0.5]c←2/0,⍳14
ex:p←¯258+0⊃⍴r←(4096+⍴r)↑r
bs:→x⍴ad ⋄ →(2⊥2↑(c d x)←b[(⍳3)+i←i-3])⊃nc fh dh er
nc:→(≠/c←2⊥⍉1 0≠[0]2 16⍴b[(⍳32)+i←i-32+8|i])⍴er
 o←0⊃⍴r←(o⍴r),⌽t ⎕dr b[(⍳c)+i←i-c←8×0⊃c] ⋄ →ex
fh:→(0≢(e g d f)←ft)⍴fa ⋄ c←144 112 24 8/8 9 7 8
 ft←(1+⍳¨9 5),(⊂deh c),⊂(⍳32),[0.5]5 ⋄ →fh
dh:→(286<n←257 1 4+⌽0 32 32⊤2⊥b[(⍳14)+i←i-14])/er
 e←16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15
 f←0⍴d←(19↑⌽2⊥⍉d 3⍴b[(⍳c)+i←i-c←3×d←2⊃n])[⍋e]
 e←1+⍳⌈/d ⋄ →(¯1≡d←deh d)⍴er ⋄ (n g)←+\2⍴n
hl:i←i-1⊃c←d[2⊥b[i-e];] ⋄ →(16>c←0⊃c)⍴h5 ⋄ c←c-16
 c←((2⊥b[(⍳c)+i←i-c←c⊃2 3 7])+c⊃3 3 11)⍴1↑c↓¯1↑f
h5:→(g>⍴f←f,c)⍴hl ⋄ →(g≠⍴f)⍴er ⋄ f←(n⍴f)(n↓f)
 ⍎(2>c←+/1⊃f)/'f[1]←⊂(1⊃f),c↓1 1 ⍝very rare'
 (e g)←1+⍳¨⌈/¨f ⋄ →(¯1∊(d f)←deh¨f)⊃fa er
rw:r[o]←u[c] ⋄ →(p≥o←o+1)⍴fa
ov:r←(258+p←p+16777216⌊p)↑r ⋄ →fa
cx:l←l+2⊥b[(⍳n)+i←i-n]
cp:(c n)←f[2⊥b[i-g];]
 i←i-n ⋄ (c n)←dt[c;] ⋄ c←c+2⊥b[(⍳n)+i←i-n]
 r[o+⍳l]←l⍴r[(o-c)+⍳c⌊l] ⋄ →(p<o←o+l)⍴ov
fa:(c n)←d[2⊥b[i-e];] ⋄ i←i-n ⋄ →(h l n)←lt[c;]
ad:r←o⍴r ⋄ →(i∊32+⍳8)↓er,0 ⋄ c←1,i←0 ⋄ →o↓ax
an:c←65521|+\c+2↑2⊥11 ⎕dr i⊃r ⋄ →(o≠i←i+1)⍴an
ax:→er×(2⊥,⊖4 8⍴b)≠65536⊥⌽c
er:r←¯1

 r←deh b;c;d;e;f;i;l;n;⎕IO
⍝r←(n,2) huffman decoding table, or ¯1 if error
⍝b: bit length (0..15) per code; 0= code unused
 l←b~c←⎕IO←0 ⋄ r←¯1 ⋄ →⍳(1⌈⍴l)≠+/f←+⌿l∘.=1+⍳15
 →⍳(f+.×15↑2*⌽⍳l)≠d←2*l←⌈/l ⋄ r←d 2⍴0 ⋄ d←(×f)/⍳⍴f
nx:e←2*l-i←1+n←0⊃d ⋄ c←2⊥i↑c ⋄ n←n⊃f
 r[(⍳e)∘.+e×c+⍳n;]←e n 2⍴((b=i)/⍳⍴b),[0.5]i
 c←(i⍴2)⊤c+n ⋄ →(⍴d←1↓d)⍴nx


Widget is loading comments...

You are visitor 1409       Go to Home Page