Scws

Jul 20, 2023

Simple Chinese word segmentation program and lib

SCWS (Simple Chinese Word Segmentation) is a Chinese word segmentation engine based on a word-frequency dictionary. It cuts a whole passage of Chinese text into words. Words are the smallest meaningful units that can stand on their own in Chinese, but Chinese text does not separate them with spaces, so word segmentation is an important step in Chinese language processing. SCWS is written in C with no other dependencies and accepts GBK and UTF-8 encodings for both Simplified Chinese (zh_CN) and Traditional Chinese (such as zh_TW).
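
To make the idea of frequency-dictionary segmentation concrete, here is a minimal Python sketch. It does not use SCWS and does not reproduce its actual algorithm: the `TOY_DICT` entries, their counts, the `segment` function, and the `max_len` parameter are all hypothetical, chosen only to show how word frequencies can pick one split over another.

```python
import math

# Hypothetical toy dictionary: word -> frequency count.
# The counts are invented for illustration and are not taken from SCWS.
TOY_DICT = {
    "中文": 50,
    "分词": 30,
    "中": 10,
    "文": 10,
    "分": 8,
    "词": 8,
    "很": 20,
    "重要": 25,
}


def segment(text, freq=TOY_DICT, max_len=4):
    """Split text into the word sequence with the highest total log-frequency."""
    total = sum(freq.values())
    # Characters missing from the dictionary become one-character words
    # with a score lower than any dictionary entry.
    unknown = math.log(0.5 / total)

    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: best score for text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word in text[:i]

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in freq:
                score = math.log(freq[piece] / total)
            elif end - start == 1:
                score = unknown
            else:
                continue  # multi-character piece that is not a known word
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start

    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]
```

With this toy dictionary, `segment("中文分词很重要")` returns `['中文', '分词', '很', '重要']`, because each two-character word scores higher than the pair of single characters it could be split into. A real engine such as SCWS ships a much larger dictionary and applies additional rules on top of it.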



Check out these related ports:
  • Zxing-cpp - ZXing C++ Library for QR code recognition
  • Zu-hunspell - Zulu hunspell dictionaries
  • Zu-aspell - Aspell Zulu dictionary
  • Zq - Easier and faster alternative to jq
  • Zorba - General purpose C++ XQuery processor
  • Zenxml - Simple C++ XML Processing
  • Zed - Command-line tool to manage and query Zed data lakes
  • Yq - Command-line YAML and XML processor, jq wrapper for YAML/XML documents
  • Yould - Pronounceable word generator
  • Yodl - Easy to use but powerful document formatting/preparation language
  • Yi-hunspell - Yiddish hunspell dictionaries
  • Yi-aspell - Aspell Yiddish dictionary
  • Yelp-xsl - DocBook XSLT stylesheets for yelp
  • Yelp-tools - Utilities to help manage documentation for Yelp and the web
  • Ydiff - Diff readability enhancer for color terminals