pushed
zhengqunkoo/tweet-prediction • 6:36 AM - Jun 8, 2017
We updated the metadata preprocessing pipeline to only include English data. By integrating the langdetect library, the generator now evaluates the expected output string and ensures it is classified as English before yielding it. This simple filter helps maintain higher data quality by automatically discarding multi-language artifacts or unsupported test cases from the dataset.
