
lina_database_decoder
Created Apr 2026
Added a preprocessing step to the decoder pipeline that discards any database row containing fewer than two sign groups. Filtering out these short, less informative or noisy sequences lets the frequency analysis and random-init decipherment models focus on richer, higher-quality data and avoids wasted computation.
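
A minimal sketch of the filter, assuming each row exposes its sign groups as a list under a hypothetical sign_groups key:

```python
MIN_SIGN_GROUPS = 2  # rows below this threshold are dropped

def filter_rows(rows):
    """Keep only rows with at least MIN_SIGN_GROUPS sign groups.

    Assumes each row is a dict with a 'sign_groups' list; the actual
    field name in the database may differ.
    """
    return [row for row in rows
            if len(row.get("sign_groups", [])) >= MIN_SIGN_GROUPS]
```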
Added a new iterative search mechanism to the random-initialization decoding strategy, which now runs $N$ consecutive random cipher generations and retains only the one with the highest internal semantic consistency score (measured via WordNet). Also introduced a centralized summary output (outputs/strategy_summary.csv) to track and rank the performance of all active strategies. This upgrade substantially improves the quality of random-init outputs and streamlines benchmarking across approaches.
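
A sketch of the best-of-$N$ loop, with generate_random_cipher and score_consistency standing in for the project's actual generation and WordNet-based scoring functions:

```python
def best_of_n(n, generate_random_cipher, score_consistency):
    """Run n independent random cipher generations and keep the one
    with the highest internal semantic consistency score."""
    best_cipher, best_score = None, float("-inf")
    for _ in range(n):
        cipher = generate_random_cipher()
        score = score_consistency(cipher)
        if score > best_score:
            best_cipher, best_score = cipher, score
    return best_cipher, best_score
```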
Optimized the semantic consistency scoring process by caching synset lookups on a per-token basis. Previously, the meaning_scorer re-queried WordNet synsets for every pairwise comparison, leading to redundant overhead. By shifting to a synset_map, we significantly reduce computation time during translation analysis. 
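
A sketch of the caching, assuming NLTK's WordNet interface; build_synset_map is a hypothetical name for the helper that fills the synset_map:

```python
from nltk.corpus import wordnet as wn

def build_synset_map(tokens):
    # One WordNet query per distinct token; pairwise comparisons then
    # read from this map instead of calling wn.synsets repeatedly.
    return {tok: wn.synsets(tok) for tok in set(tokens)}
```

Pairwise scoring then becomes a pair of dictionary reads (synset_map[a], synset_map[b]) rather than two fresh WordNet queries per comparison.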
Replaced the meaning_scorer's dependency on large semantic embedding models with a lightweight, deterministic approach based on WordNet's Wu-Palmer similarity metric. This refactor improves maintainability by removing heavy external ML libraries such as sentence-transformers, while keeping a robust way to evaluate semantic consistency between translated tokens.
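
A minimal illustration of the metric using NLTK's WordNet bindings; the function name and best-pair aggregation are assumptions, not the scorer's exact code:

```python
from nltk.corpus import wordnet as wn

def token_similarity(word_a, word_b):
    """Deterministic semantic similarity via Wu-Palmer, in [0, 1].

    Takes the best score over all synset pairs; returns 0.0 when
    either word is unknown to WordNet.
    """
    scores = [
        sa.wup_similarity(sb)
        for sa in wn.synsets(word_a)
        for sb in wn.synsets(word_b)
    ]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)
```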
Improved the suffix-stripping mechanism in the meaning_scorer's linguistic fallback logic. Reordering the suffix list to prioritize longer matches and simplifying the stripping process makes domain vocabulary matching more accurate and robust, which improves the quality of meaning assessments when the primary transformer model is unavailable.
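
A sketch of the longest-first ordering; the suffix list here is illustrative, not the scorer's actual list:

```python
# Sorting longest-first ensures e.g. 'matches' strips 'es' to 'match'
# rather than stripping only 's' and leaving 'matche'.
SUFFIXES = sorted(["ation", "ness", "ing", "ed", "es", "s"],
                  key=len, reverse=True)

def strip_suffix(token, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem chars."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token
```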
Added a new meaning_scorer subcomponent to evaluate the quality of transcriptions. It uses sentence-transformers for embedding-based semantic coherence and domain relevance, with a robust offline fallback based on vocabulary overlap and heuristics. These scores are integrated to provide better diagnostic insight into the quality of generated decipherments.
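
A rough sketch of an embedding-based coherence score as this component might have computed it (before the later WordNet refactor); the model checkpoint and aggregation are assumptions:

```python
from sentence_transformers import SentenceTransformer, util

# The checkpoint is an assumption; the entry does not name the model.
model = SentenceTransformer("all-MiniLM-L6-v2")

def coherence_score(tokens):
    """Mean pairwise cosine similarity of token embeddings, as a rough
    measure of semantic coherence for a candidate transcription."""
    embeddings = model.encode(tokens)
    sims = util.cos_sim(embeddings, embeddings)
    n = len(tokens)
    off_diag = [sims[i][j].item()
                for i in range(n) for j in range(n) if i != j]
    return sum(off_diag) / len(off_diag) if off_diag else 0.0
```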
Added a new utility that fetches, cleans, and samples English words from online dictionary repositories to populate the project's word pool. This replaces the hardcoded CSV dependency with a refreshable, automated process for better scalability and data consistency. 
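
A sketch of such a utility; the repository URL and sampling parameters are illustrative, not necessarily the ones the project uses:

```python
import random
import urllib.request

# Illustrative source; the utility may pull from a different repository.
WORDLIST_URL = ("https://raw.githubusercontent.com/dwyl/"
                "english-words/master/words_alpha.txt")

def fetch_word_pool(sample_size=5000, min_len=3, max_len=10, seed=42):
    """Fetch, clean, and sample English words for the decoder's word pool."""
    with urllib.request.urlopen(WORDLIST_URL) as resp:
        words = resp.read().decode("utf-8").split()
    # Keep purely alphabetic words of reasonable length.
    cleaned = [w for w in words if w.isalpha() and min_len <= len(w) <= max_len]
    rng = random.Random(seed)  # seeded for reproducible pools
    return rng.sample(cleaned, min(sample_size, len(cleaned)))
```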
This update transitions the decoder to CSV-only outputs, replacing the JSON-based formats for consistency. It also adds a new random_init strategy, which uses a predefined word pool to generate random sign-to-word mappings for testing. These changes streamline the decipherment workflow and give a clearer structure for evaluating different translation strategies.
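
A minimal sketch of the random_init mapping and its CSV output, under the assumption that the word pool is at least as large as the sign inventory:

```python
import csv
import random

def random_init_mapping(signs, word_pool, seed=None):
    """Assign each distinct sign a word drawn from the pool.

    Sampling without replacement assumes len(word_pool) >= len(signs).
    """
    rng = random.Random(seed)
    return dict(zip(signs, rng.sample(word_pool, len(signs))))

def write_mapping_csv(mapping, path):
    """Persist a sign-to-word mapping as CSV, matching the move to
    CSV-only outputs."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["sign", "word"])
        writer.writerows(sorted(mapping.items()))
```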
