Commit b1f43ce

Add polish data
1 parent c87231d commit b1f43ce

3 files changed

+125144
-0
lines changed

polish/COPYING

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+The word list for Polish was downloaded from the Wortschatz Leipzig website:
+
+https://wortschatz.uni-leipzig.de/en/download/Polish
+
+I took the News 2024 100K archive as the basis and cleaned it up manually
+(mostly by removing words with non-Polish characters):
+
+https://downloads.wortschatz-leipzig.de/corpora/pol_news_2024_30K.tar.gz
+
+According to the following page, the corpora are licensed under CC BY 4.0:
+
+https://wortschatz-leipzig.de/en/usage
