Bibliography (40):

  1. Language Identification

  2. https://arxiv.org/pdf/2010.14571.pdf#page=16&org=google

  3. https://inria.hal.science/hal-02148693/file/Asynchronous_Pipeline_for_Processing_Huge_Corpora_on_Medium_to_Low_Resource_Infrastructures.pdf

  4. https://archive.org/details/psychobiologyofl0000zipf

  5. https://slate.com/culture/2020/05/facebook-ants-roleplay-coronavirus-biologist-interview.html

  6. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

  7. Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models

  8. Wikipedia Bibliography:

    1. https://en.wikipedia.org/wiki/Language_identification  :

    2. https://en.wikipedia.org/wiki/F-score  :

    3. https://en.wikipedia.org/wiki/Varhadi_dialect  :

    4. https://en.wikipedia.org/wiki/Aymara_language  :

    5. https://en.wikipedia.org/wiki/Turkmen_language  :

    6. Unicode

    7. https://en.wikipedia.org/wiki/Zaza_language  :

    8. https://en.wikipedia.org/wiki/Dogri_language  :

    9. Zalgo text  :

    10. https://en.wikipedia.org/wiki/Fula_language  :

    11. https://en.wikipedia.org/wiki/Ilocano_language  :

    12. https://en.wikipedia.org/wiki/Zhuang_languages  :

    13. https://en.wikipedia.org/wiki/Soft_hyphen#Encodings_and_definitions  :

    14. https://en.wikipedia.org/wiki/Cherokee_syllabary  :

    15. https://en.wikipedia.org/wiki/Balinese_script  :

    16. https://en.wikipedia.org/wiki/Maldivian_language  :

    17. N-gram

    18. Zipf’s law

    19. Power law

    20. https://en.wikipedia.org/wiki/Oromo_language  :

    21. https://en.wikipedia.org/wiki/Chechen_language  :

    22. https://en.wikipedia.org/wiki/Lambadi  :

    23. https://en.wikipedia.org/wiki/Santali_language  :

    24. https://en.wikipedia.org/wiki/Northern_Sotho  :

    25. https://en.wikipedia.org/wiki/Nigerian_Pidgin  :

    26. https://en.wikipedia.org/wiki/Uyghur_language  :

    27. https://en.wikipedia.org/wiki/Kazakh_language  :

    28. https://en.wikipedia.org/wiki/Kyrgyz_language  :

    29. https://en.wikipedia.org/wiki/Arabic_script  :

    30. Cyrillic script

    31. https://en.wikipedia.org/wiki/Wikipedia  :

    32. Latent and observable variables

    33. https://en.wikipedia.org/wiki/Palochka#Computing_codes  :