We know empirically that there exists a mapping from an 'incompressible' vector space such that a source message of length L1 maps to a compressible target message of length L2 via a vector of length L3. We also know empirically that there exist mappings for which (L4+L3) < L1. Furthermore, we know that there are mappings for which (L4+L3) becomes arbitrarily small.
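As a toy illustration of the kind of mapping described above, the following sketch builds a source stream that a general-purpose compressor cannot shrink, then maps it to a compressible target using a short piece of side information. Everything here is an assumption made for illustration: the XOR keystream transform, the 4-byte seed standing in for the vector of length L3, the sample text, and the reading of L4 as the compressed length of the target. None of it is taken from the text.

```python
import random
import zlib

def keystream(seed: int, n: int) -> bytes:
    """Deterministic pseudorandom bytes; the seed plays the role of the 'vector'."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR; applying it twice with the same key restores the input."""
    return bytes(x ^ y for x, y in zip(a, b))

SEED = 12345     # hypothetical side information (the 'vector')
SEED_BYTES = 4   # assumed cost of storing the seed, standing in for L3

# A compressible target message, and a source that looks incompressible.
target = b"the quick brown fox jumps over the lazy dog. " * 100
source = xor_bytes(target, keystream(SEED, len(target)))

l1 = len(source)                         # length of the source message
direct = len(zlib.compress(source, 9))   # compressing the source as-is
l4 = len(zlib.compress(target, 9))       # assumed meaning of L4
l3 = SEED_BYTES

# Decoder side: decompress the target, then undo the mapping with the seed.
recovered = xor_bytes(zlib.decompress(zlib.compress(target, 9)),
                      keystream(SEED, l1))

print("L1 =", l1, "direct zlib =", direct, "L4 + L3 =", l4 + l3)
print("lossless round trip:", recovered == source)
```

The sketch does not count the cost of the keystream generator itself, which a decoder would also need. It only shows that a short vector can turn a stream that resists direct compression into one that compresses well.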
It is easy to show that it is impossible to compress all streams of size N bits to N-1 bits or fewer:
Assume that the program can compress without loss all files of size >= N bits. Compress with this program all the 2^N files which have exactly N bits. All compressed files have at most N-1 bits, so there are at most (2^N)-1 different compressed files [2^(N-1) files of size N-1, 2^(N-2) of size N-2, and so on, down to 1 file of size 0]. So at least two different input files must compress to the same output file. Hence the compression program cannot be lossless. (Much stronger results about the number of incompressible files can be obtained, but the proofs are a little more complex.)
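The counting step in this argument can be checked mechanically. The sketch below confirms that there are exactly (2^N)-1 bit strings shorter than N bits, and then exhibits a collision for a deliberately naive 'compressor' that drops the last bit. The function names and the choice of compressor are hypothetical and serve only to illustrate the pigeonhole step.

```python
from itertools import product

def count_strings_shorter_than(n: int) -> int:
    """Number of distinct bit strings of length 0, 1, ..., n-1."""
    return sum(2 ** k for k in range(n))

def find_collision(compress, n: int):
    """Return two distinct n-bit inputs with the same output, or None."""
    seen = {}
    for bits in product("01", repeat=n):
        s = "".join(bits)
        out = compress(s)
        if out in seen:
            return seen[out], s
        seen[out] = s
    return None

def drop_last_bit(s: str) -> str:
    """Hypothetical 'compressor' that always saves exactly one bit."""
    return s[:-1]

# 2^(N-1) + 2^(N-2) + ... + 1 = 2^N - 1, one fewer than the 2^N inputs.
for n in range(1, 9):
    assert count_strings_shorter_than(n) == 2 ** n - 1

collision = find_collision(drop_last_bit, 4)
print(collision)
```

Any function that maps every N-bit string to a shorter string would produce a collision in the same way, since 2^N inputs cannot fit into (2^N)-1 outputs.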
The argument above shows that it is not possible to find a transformation certain to compress any arbitrary stream. However, the task is not to compress arbitrary streams. The task is to compress 'streams of interest'. The streams of interest in this case are any data that might likely form a meaningful message.
Our hypothesis is that there exists a 'reachable' vector space whereby incompressible data streams of interest map to compressible data streams.
If we were to determine that there were 'only' 2^1024 (a very large number) streams of interest and found a suitable algorithm, every stream could be tokenized to 128 bytes (1024 bits), regardless of the length of the source. This theoretical limit is not likely to be reachable, but we speculate that by essentially building robots to explore various spaces, we may be able to find suitable spaces which result in gains over current methods of compression.
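The 128-byte figure follows from simple arithmetic: a fixed-length token that distinguishes 2^1024 streams needs 1024 bits, and 1024 / 8 = 128 bytes. A minimal sketch of that calculation, with hypothetical helper names, is:

```python
def token_bits(num_streams: int) -> int:
    """Smallest fixed token width, in bits, that can index num_streams items."""
    if num_streams < 1:
        raise ValueError("need at least one stream")
    # Indices run from 0 to num_streams - 1; bit_length gives the width needed.
    return (num_streams - 1).bit_length()

def token_bytes(num_streams: int) -> int:
    """Token width rounded up to whole bytes."""
    return (token_bits(num_streams) + 7) // 8

streams_of_interest = 2 ** 1024
print(token_bits(streams_of_interest), "bits")
print(token_bytes(streams_of_interest), "bytes")
```

Integer arithmetic is used throughout so that the result is exact even for numbers far beyond floating-point range.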
At present, no technology exists that is capable of finding and exploiting these spaces; indeed, it is not even known whether they exist in a form that lends itself to a general data compression tool.
Our goal is to create a technology which will allow us to find and exploit vector spaces which yield meaningful compression. Our approach is a technological/empirical one. By building tools to explore the results of transformations, we accomplish two things. First, we improve our chances of stumbling upon promising vector spaces by automating searches using compute time. Second, should other researchers discover appropriate transformations, or should the constraints of current technology change, we will have tools available which are ready to exploit the breakthroughs.
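To make the idea of an automated search concrete, here is a deliberately small sketch of such a tool. It tries a handful of reversible transformations on a stream and keeps whichever one lets a standard compressor do best. The candidate set (identity, XOR with a constant, byte-wise delta), the use of zlib as the back-end compressor, and the synthetic test stream are all assumptions chosen for illustration; a real exploration tool would search far larger families.

```python
import random
import zlib

def identity(data: bytes) -> bytes:
    return data

def xor_const(data: bytes) -> bytes:
    """XOR every byte with 0x55; the transform is its own inverse."""
    return bytes(b ^ 0x55 for b in data)

def delta_encode(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte (mod 256)."""
    prev, out = 0, bytearray()
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)

def delta_decode(data: bytes) -> bytes:
    """Inverse of delta_encode: running sum mod 256."""
    acc, out = 0, bytearray()
    for d in data:
        acc = (acc + d) % 256
        out.append(acc)
    return bytes(out)

# Hypothetical search space: name -> (forward transform, inverse transform).
CANDIDATES = {
    "identity": (identity, identity),
    "xor_0x55": (xor_const, xor_const),
    "delta": (delta_encode, delta_decode),
}

def search(data: bytes):
    """Compress under every reversible candidate; return the best name and all sizes."""
    sizes = {}
    for name, (forward, inverse) in CANDIDATES.items():
        transformed = forward(data)
        if inverse(transformed) != data:
            continue  # discard any transform that is not lossless on this input
        sizes[name] = len(zlib.compress(transformed, 9))
    return min(sizes, key=sizes.get), sizes

# Synthetic stream: a running sum of small random steps, wrapped to one byte.
rng = random.Random(0)
value, buf = 0, bytearray()
for _ in range(4096):
    value = (value + rng.choice((1, 2, 3))) % 256
    buf.append(value)
sample = bytes(buf)

best_name, sizes = search(sample)
print(best_name, sizes)
```

New transformations can be added to the candidate table without touching the search loop, which is the property that lets such a tool pick up later discoveries.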
In the more than 40 years since the Huffman algorithm was discovered, there has been tremendous change in computer science, particularly in hardware and software. However, current general-purpose compression is little better than it was back then. We believe this is testimony to the difficulty, and hence the uncertainty, of solving the problem.