15 Commits

Author SHA1 Message Date
Elf M. Sternberg 3978ac2928 Merge remote-tracking branch 'refs/remotes/origin/master'
* refs/remotes/origin/master:
  Added a TODO.
  Added documentation to the Squozen README describing the decompressor algorithm.
2022-12-29 11:38:22 -08:00
Elf M. Sternberg 9f0158be7e Some documentation. 2022-12-29 11:38:13 -08:00
Elf M. Sternberg 73bca38bb9 Added a TODO. 2022-11-30 08:27:49 -08:00
Elf M. Sternberg c7ba092293 Added documentation to the Squozen README describing the decompressor algorithm. 2022-11-30 08:20:04 -08:00
Elf M. Sternberg e624aea369 Adding an Errors block. 2022-11-27 10:37:44 -08:00
Elf M. Sternberg a01fcbee68 Still not working. 2022-11-26 17:59:57 -08:00
Elf M. Sternberg 550d4c1876 Intermediate progress: Squozen
I realized that the C version of this thing does multiple things
in the same function: it loads the bigrams, it iterates through
the database, and it compares the things found in the database
to the prepared pattern.  It seems to me, therefore, that we're
better off with an instance that loads the bigrams, then closes
the database immediately.

Later, the client can ask for one of two iterators: one that returns
each entry in sequence, or one that returns only the entries that
match the pattern passed in.
2022-11-26 16:49:25 -08:00
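
The split described in this commit can be sketched roughly as follows. This is only an illustration, not the repository's code: the `Squozen` struct layout, the `load`, `bigram`, `entries`, and `matching` names, and the 256-byte bigram table size are all assumptions, and the decompressor is stubbed out by passing already-decoded paths in as a slice.

```rust
use std::io::{self, Read};

/// Assumed size of the bigram table: 128 two-byte pairs.
const BIGRAM_BYTES: usize = 256;

/// Hypothetical holder for the bigram table. It owns no file handle,
/// so the database can be closed as soon as `load` returns.
pub struct Squozen {
    bigrams: [u8; BIGRAM_BYTES],
}

impl Squozen {
    /// Read only the bigram table from the start of the database.
    pub fn load<R: Read>(mut db: R) -> io::Result<Self> {
        let mut bigrams = [0u8; BIGRAM_BYTES];
        db.read_exact(&mut bigrams)?;
        Ok(Squozen { bigrams })
    }

    /// Look up one bigram pair; panics if `index` is 128 or more.
    pub fn bigram(&self, index: usize) -> (u8, u8) {
        (self.bigrams[2 * index], self.bigrams[2 * index + 1])
    }

    /// Iterator 1: every entry in sequence. `decoded` stands in for
    /// the output of the real decompressor, which is not shown here.
    pub fn entries<'a>(&'a self, decoded: &'a [&'a str]) -> impl Iterator<Item = &'a str> + 'a {
        decoded.iter().copied()
    }

    /// Iterator 2: only the entries that contain `pattern`. A plain
    /// substring test stands in for the prepared-pattern comparison.
    pub fn matching<'a>(
        &'a self,
        decoded: &'a [&'a str],
        pattern: &'a str,
    ) -> impl Iterator<Item = &'a str> + 'a {
        self.entries(decoded).filter(move |path| path.contains(pattern))
    }
}
```

Building the matching iterator on top of the plain one keeps loading, iteration, and comparison in three separate places, which is the separation the commit message argues for.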
Elf M. Sternberg d13a76f08a Make prepare_pattern more Rust-like.
This just removes the layer between `prepare_pattern` and
`prepare_pattern_raw`; the function now always returns the
allocated vector.

Oddly, it wasn't possible to encode this using an Option<> in the
`hunt()` function.  The `usize` of the scan variable meant that we'd
never go below zero legally (and Rust wouldn't let that happen), so
the "if we're at zero we have some special cases to check" had to
remain here.  The C version of this code could say "If this pointer
is below the allocated space" which is, to a Rust developer, hella
weird (you're literally pointed at memory you don't own!).

And despite the allocation, despite the special case checks, this code
is *still* twice as fast as its C implementation.
2022-11-24 12:56:13 -08:00
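
The `usize` constraint mentioned above can be illustrated with a small backwards scan. This is a sketch under assumptions, not the `hunt()` function itself: `scan_back` and its `is_stop` predicate are hypothetical names, and the stop condition is made up for the example.

```rust
/// Walk backwards from the end of `bytes` and return the index just
/// past the last byte for which `is_stop` is true, or 0 if none is.
///
/// C code can decrement a pointer until it falls below the start of
/// the buffer and test for that. A `usize` index cannot go below
/// zero, so reaching zero has to be an explicit case in the loop.
pub fn scan_back(bytes: &[u8], is_stop: impl Fn(u8) -> bool) -> usize {
    let mut scan = bytes.len();
    while scan > 0 {
        if is_stop(bytes[scan - 1]) {
            break;
        }
        scan -= 1;
    }
    scan
}
```

A result of zero is ambiguous here (either the input was empty or no stop byte was found), which is the kind of special case the commit message says had to stay in the caller.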
Elf M. Sternberg a6d4fda582 Minor re-arrangement for compatibility. 2022-11-13 12:40:39 -08:00
Elf M. Sternberg 40811151ab The C and Rust versions are now comparable.
The C and Rust versions are now comparable, with a memory-reuse and
a memory-safe version for Rust.  The memory-safe version is five times
faster than the C version; the memory-reuse version (technically safe,
but can panic under some very rare circumstances) is ten times faster.

I suspect the reasons for the speedup are strictly in the `for()` loop
in the C version for copying the string, where the Rust version probably
uses memcpy() under the covers to transfer the short string into the
destination.
2022-11-13 12:33:34 -08:00
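
The difference between the two Rust variants might look something like the sketch below. The function names are hypothetical and the bodies are a guess at the general shape, not the code that was benchmarked; the timings above are the author's and are not reproduced here.

```rust
/// Memory-safe variant: allocate a fresh Vec for every call.
pub fn copy_fresh(src: &[u8]) -> Vec<u8> {
    src.to_vec()
}

/// Reuse variant with a growable buffer: the caller owns one Vec and
/// the allocation is kept between calls.
pub fn copy_reuse(src: &[u8], dst: &mut Vec<u8>) {
    dst.clear();
    dst.extend_from_slice(src);
}

/// Reuse variant with a fixed slice: no allocation at all, but the
/// indexing panics if `src` is longer than `dst`. Returns the number
/// of bytes written.
pub fn copy_into_slice(src: &[u8], dst: &mut [u8]) -> usize {
    dst[..src.len()].copy_from_slice(src);
    src.len()
}
```

The standard library documents `copy_from_slice` as a `memcpy`, which fits the suspicion in the commit message that a bulk copy is replacing the byte-by-byte `for()` loop.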
Elf M. Sternberg 2eab17934c Moved everything around so it's more project-y
Added the squozen patprep function, added unit tests to the
patprep `c` code, and ensured that the rust version works the
same way.  The only remaining code slowdown is that re-allocating
the Vec 50 million times turns out to be slower than re-using the
same slice of RAM over and over and over.
2022-11-11 08:31:22 -08:00
Elf M. Sternberg bf2b2715d4 Just pushing stuff around. Note: It was _dictd_ that has the nifty GZIP-with-indexes trick. 2021-07-06 18:09:30 -07:00
Elf M. Sternberg 965df106d9 Separation of concerns. We now have a trait for identifying
a database type, and an implementation for mLocate.  That's a
start.
2021-07-06 18:03:30 -07:00
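
A trait of the kind described here could be sketched as below. The `DatabaseKind` trait, its methods, the `MLocate` unit struct, and the `identify` helper are all hypothetical names chosen for illustration; only the `\0mlocate` magic comes from the mlocate.db format.

```rust
/// Hypothetical trait: each supported database format says whether
/// the first bytes of a file look like one of its databases.
pub trait DatabaseKind {
    fn name(&self) -> &'static str;
    fn matches(&self, header: &[u8]) -> bool;
}

/// Assumed implementation for mlocate, keyed on its 8-byte magic.
pub struct MLocate;

impl DatabaseKind for MLocate {
    fn name(&self) -> &'static str {
        "mlocate"
    }

    fn matches(&self, header: &[u8]) -> bool {
        header.starts_with(b"\0mlocate")
    }
}

/// Return the name of the first known format that claims `header`.
pub fn identify(kinds: &[&dyn DatabaseKind], header: &[u8]) -> Option<&'static str> {
    kinds.iter().find(|k| k.matches(header)).map(|k| k.name())
}
```

Supporting another format would then only mean adding another implementation to the list passed to `identify`.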
Elf M. Sternberg 3f7ae7bd8b Add a README and LICENSE file. 2021-06-24 11:59:47 -07:00
Elf M. Sternberg 76a905da4e Successfully read the mlocatedb header.
This commit shows how to read the mlocatedb header,
with a test to assert that the read is correct.
2021-06-23 19:00:56 -07:00
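
For reference, reading the header might look like the following sketch. The field layout follows the mlocate.db(5) manual page (8-byte magic, 4-byte big-endian configuration block size, 1-byte format version, 1-byte visibility flag, 2 bytes of padding, then a NUL-terminated root path); the `Header` struct and `read_header` function are hypothetical names, not the ones used in this commit.

```rust
use std::io::{self, BufRead, Read};

const MAGIC: &[u8; 8] = b"\0mlocate";

/// Hypothetical in-memory form of the mlocate.db header.
pub struct Header {
    pub config_block_size: u32,
    pub version: u8,
    pub require_visibility: bool,
    pub root: Vec<u8>,
}

pub fn read_header<R: BufRead>(mut db: R) -> io::Result<Header> {
    // Fixed-size part: magic, block size, version, flag, padding.
    let mut fixed = [0u8; 16];
    db.read_exact(&mut fixed)?;
    if !fixed.starts_with(MAGIC) {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }
    let config_block_size = u32::from_be_bytes([fixed[8], fixed[9], fixed[10], fixed[11]]);

    // Variable-size part: root path, terminated by a NUL byte.
    let mut root = Vec::new();
    db.read_until(0, &mut root)?;
    if root.pop() != Some(0) {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "unterminated root path"));
    }

    Ok(Header {
        config_block_size,
        version: fixed[12],
        require_visibility: fixed[13] != 0,
        root,
    })
}
```

The root path is kept as raw bytes because a path on disk is not guaranteed to be valid UTF-8.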