libunibreak 6.1
wordbreak.c File Reference

Implementation of the word breaking algorithm as described in Unicode Standard Annex 29.

#include <assert.h>
#include <stddef.h>
#include <string.h>
#include "unibreakdef.h"
#include "wordbreak.h"
#include "wordbreakdata.c"
#include "emojidef.h"

Macros

#define IS_WB3ab(cls)
 

Functions

void init_wordbreak (void)
 Initializes the wordbreak internals.
 
void set_wordbreaks_utf8 (const utf8_t *s, size_t len, const char *lang, char *brks)
 Sets the word breaking information for a UTF-8 input string.
 
void set_wordbreaks_utf16 (const utf16_t *s, size_t len, const char *lang, char *brks)
 Sets the word breaking information for a UTF-16 input string.
 
void set_wordbreaks_utf32 (const utf32_t *s, size_t len, const char *lang, char *brks)
 Sets the word breaking information for a UTF-32 input string.
 

Detailed Description

Implementation of the word breaking algorithm as described in Unicode Standard Annex 29.

Author
Tom Hacohen

Macro Definition Documentation

◆ IS_WB3ab

#define IS_WB3ab (   cls)
Value:
((cls == WBP_Newline) || (cls == WBP_CR) || \
(cls == WBP_LF))
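Tests whether the word break property class cls is Newline, CR, or LF, the classes covered by rules WB3a and WB3b of UAX #29 (break after and before these characters). The enumerators WBP_Newline, WBP_CR, and WBP_LF are defined in wordbreakdef.h (lines 65, 63, and 64).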
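To show how a predicate of this shape is used, here is a minimal sketch of rules WB3, WB3a, and WB3b from UAX #29 applied to a pair of adjacent classes. It assumes a hypothetical local enum that only mirrors the names WBP_CR, WBP_LF, and WBP_Newline (the real values live in wordbreakdef.h and may differ) and a hypothetical helper wb3_rules(); it is an illustration, not the library's own rule engine.

```c
#include <assert.h>

/* Hypothetical stand-in for the word break property classes; only the
 * names of CR, LF and Newline mirror wordbreakdef.h, the values are
 * arbitrary and WBP_Other / WBP_ALetter are placeholders. */
enum wb_class { WBP_Other, WBP_CR, WBP_LF, WBP_Newline, WBP_ALetter };

/* Same test as IS_WB3ab, with the argument parenthesized for safety. */
#define IS_WB3ab(cls) \
    (((cls) == WBP_Newline) || ((cls) == WBP_CR) || ((cls) == WBP_LF))

/* Hypothetical helper: decision for the boundary between prev and cur.
 * Returns 1 for a break, 0 for no break, and -1 when WB3..WB3b do not
 * apply and later rules must decide. */
static int wb3_rules(enum wb_class prev, enum wb_class cur)
{
    if (prev == WBP_CR && cur == WBP_LF)
        return 0;               /* WB3: CR x LF */
    if (IS_WB3ab(prev))
        return 1;               /* WB3a: break after Newline, CR, LF */
    if (IS_WB3ab(cur))
        return 1;               /* WB3b: break before Newline, CR, LF */
    return -1;
}
```

WB3 has to be tested first: otherwise WB3a would split a CR LF pair into two segments.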

Function Documentation

◆ init_wordbreak()

void init_wordbreak (void)

Initializes the wordbreak internals.

It currently does nothing, but it may perform initialization in future versions.

◆ set_wordbreaks_utf16()

void set_wordbreaks_utf16 (const utf16_t *s, size_t len, const char *lang, char *brks)

Sets the word breaking information for a UTF-16 input string.

Parameters
[in] s: input UTF-16 string
[in] len: length of the input, in UTF-16 code units
[in] lang: language of the input (reserved for future use)
[out] brks: pointer to the output breaking data, one entry per code unit (so it must hold at least len elements), containing WORDBREAK_BREAK, WORDBREAK_NOBREAK, or WORDBREAK_INSIDEACHAR
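Because brks has one entry per UTF-16 code unit, a character outside the Basic Multilingual Plane (a surrogate pair) occupies two entries. The sketch below computes which code units are not the last unit of their character. It assumes that the break value is recorded at the last code unit of each character and that earlier units receive WORDBREAK_INSIDEACHAR; the helper mark_inside_utf16() and its 0/1 output are hypothetical, and uint16_t stands in for utf16_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: inside[i] = 1 when s[i] is a high surrogate that
 * is immediately followed by a low surrogate (so s[i] is not the last
 * code unit of its character), otherwise 0. Unpaired surrogates are
 * treated as one-unit characters. */
static void mark_inside_utf16(const uint16_t *s, size_t len, char *inside)
{
    for (size_t i = 0; i < len; i++) {
        int is_high = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        int next_is_low = i + 1 < len
                          && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        inside[i] = (char)(is_high && next_is_low);
    }
}
```

For "A", U+1F600, "B" encoded as {0x0041, 0xD83D, 0xDE00, 0x0042}, only index 1 is marked, so under the stated assumption brks[1] would hold WORDBREAK_INSIDEACHAR.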

◆ set_wordbreaks_utf32()

void set_wordbreaks_utf32 (const utf32_t *s, size_t len, const char *lang, char *brks)

Sets the word breaking information for a UTF-32 input string.

Parameters
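[in] s: input UTF-32 string
[in] len: length of the input, in code points (utf32_t elements)
[in] lang: language of the input (reserved for future use)
[out] brks: pointer to the output breaking data, one entry per code point (so it must hold at least len elements), containing WORDBREAK_BREAK, WORDBREAK_NOBREAK, or WORDBREAK_INSIDEACHAR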
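With UTF-32 input each array element is a whole code point, so position i in brks corresponds directly to s[i]. The sketch below uses that direct indexing to mark the boundaries that rules WB3a and WB3b force, classifying code points by their UAX #29 Word_Break value (CR is U+000D, LF is U+000A, Newline is U+000B, U+000C, U+0085, U+2028, U+2029). The enum nl_class and the helpers classify_nl() and mark_forced_breaks_utf32() are hypothetical and cover only these rules, not the full algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classes for the code points covered by WB3..WB3b. */
enum nl_class { NL_NONE, NL_CR, NL_LF, NL_NEWLINE };

/* Word_Break values of CR, LF and Newline code points per UAX #29. */
static enum nl_class classify_nl(uint32_t cp)
{
    switch (cp) {
    case 0x000D: return NL_CR;
    case 0x000A: return NL_LF;
    case 0x000B: case 0x000C: case 0x0085:
    case 0x2028: case 0x2029: return NL_NEWLINE;
    default: return NL_NONE;
    }
}

/* Hypothetical helper: forced[i] = 1 when WB3a or WB3b requires a break
 * between s[i] and s[i + 1]; WB3 (CR x LF) takes precedence. The last
 * entry is left 0 because end of text (WB2) is outside this sketch. */
static void mark_forced_breaks_utf32(const uint32_t *s, size_t len,
                                     char *forced)
{
    for (size_t i = 0; i < len; i++) {
        forced[i] = 0;
        if (i + 1 == len)
            break;
        enum nl_class a = classify_nl(s[i]);
        enum nl_class b = classify_nl(s[i + 1]);
        if (a == NL_CR && b == NL_LF)
            continue;                   /* WB3: keep CR LF together */
        if (a != NL_NONE || b != NL_NONE)
            forced[i] = 1;              /* WB3a / WB3b */
    }
}
```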

◆ set_wordbreaks_utf8()

void set_wordbreaks_utf8 (const utf8_t *s, size_t len, const char *lang, char *brks)

Sets the word breaking information for a UTF-8 input string.

Parameters
[in] s: input UTF-8 string
[in] len: length of the input, in bytes
[in] lang: language of the input (reserved for future use)
[out] brks: pointer to the output breaking data, one entry per byte (so it must hold at least len elements), containing WORDBREAK_BREAK, WORDBREAK_NOBREAK, or WORDBREAK_INSIDEACHAR
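A common next step after filling brks is to cut the input into word segments. The sketch below does that from a brks array, assuming brks[i] describes the boundary after byte i. The enum values WB_BREAK, WB_NOBREAK, and WB_INSIDEACHAR are hypothetical stand-ins for the constants in wordbreak.h, the helper segment_ends() is hypothetical, and the example brks data is written by hand for illustration rather than produced by the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for WORDBREAK_BREAK, WORDBREAK_NOBREAK and
 * WORDBREAK_INSIDEACHAR; use the real constants from wordbreak.h. */
enum { WB_BREAK, WB_NOBREAK, WB_INSIDEACHAR };

/* Hypothetical helper: stores the exclusive end offset (in bytes) of
 * each segment in ends[], which must have room for len entries, and
 * returns the number of segments. A segment ends after index i when
 * brks[i] is WB_BREAK; the end of the input always closes the last one. */
static size_t segment_ends(const char *brks, size_t len, size_t *ends)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (brks[i] == WB_BREAK || i + 1 == len)
            ends[n++] = i + 1;
    }
    return n;
}
```

For "hé you" (bytes 68 C3 A9 20 79 6F 75), UAX #29 boundaries give the segments "hé", " ", and "you"; with the two bytes of "é" marked inside-a-character at index 1, the helper returns the end offsets 3, 4, and 7. Treating the final index as a segment end regardless of its value keeps the helper independent of what is stored at the last position.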