MADA (V.3.1)

Price: FREE

Much work has addressed individual natural language processing tasks for Arabic, such as tokenization, diacritization, morphological disambiguation, part-of-speech (POS) tagging, stemming and lemmatization. The MADA system (currently V.3.1), together with TOKAN, provides a single solution to all of these problems.

Our approach distinguishes between morphological analysis (what are the possible readings of a word out of context?) and morphological disambiguation (which reading is correct in a specific context?). Once a morphological analysis is chosen in context, we can determine its full POS tag, lemma and diacritization. Both morphological analysis and disambiguation are handled by the MADA component of our system.

Knowing the morphological analysis also allows us to tokenize and stem deterministically. Since there are many different ways to tokenize Arabic (tokenization is a convention adopted by researchers), the TOKAN component allows the user to specify any tokenization scheme that can be generated from disambiguated analyses.

Please visit the MADA+TOKAN home page for more information. To receive email notifications when new releases and patches of MADA+TOKAN become available, join our mailing list by visiting the MADA users mailing list page and following the subscription instructions there.
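
As a rough illustration of the split between morphological analysis, disambiguation and tokenization described above, the following Python sketch walks a single ambiguous word through the three steps. It is not the MADA or TOKAN interface: the `Analysis` class, the toy lexicon, the functions `analyze`, `disambiguate` and `tokenize`, and the `split_proclitics` flag are all hypothetical names invented for this example, and the transliterations are approximate. The "predicted POS" argument is a stand-in for whatever contextual evidence a real disambiguator would use.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Analysis:
    """One out-of-context reading of a word (hypothetical structure)."""
    diacritized: str             # vowelled reading, rough transliteration
    lemma: str
    pos: str
    proclitics: Tuple[str, ...]  # clitics attached before the stem
    stem: str                    # undiacritized stem


# Toy lexicon, invented for illustration: the undiacritized word "wktb"
# can be read as "and he wrote" (verb) or "and books" (noun).
TOY_LEXICON = {
    "wktb": [
        Analysis("wa+kataba", "katab", "verb", ("w",), "ktb"),
        Analysis("wa+kutub", "kitAb", "noun", ("w",), "ktb"),
    ],
}


def analyze(word: str) -> List[Analysis]:
    """Morphological analysis: list every reading, ignoring context."""
    return list(TOY_LEXICON.get(word, []))


def disambiguate(analyses: List[Analysis], predicted_pos: str) -> Analysis:
    """Morphological disambiguation: pick one reading using context.

    Here the context is reduced to a single predicted POS tag; this is a
    toy stand-in, not a description of how MADA ranks analyses.
    """
    if not analyses:
        raise ValueError("no analyses to choose from")
    for analysis in analyses:
        if analysis.pos == predicted_pos:
            return analysis
    return analyses[0]  # fall back to the first reading


def tokenize(analysis: Analysis, split_proclitics: bool) -> List[str]:
    """Deterministic tokenization driven by the chosen analysis."""
    if split_proclitics:
        return [c + "+" for c in analysis.proclitics] + [analysis.stem]
    return ["".join(analysis.proclitics) + analysis.stem]


readings = analyze("wktb")
chosen = disambiguate(readings, predicted_pos="noun")
print(chosen.lemma, chosen.pos, chosen.diacritized)
print(tokenize(chosen, split_proclitics=True))
print(tokenize(chosen, split_proclitics=False))
```

Once a reading is chosen, its lemma, POS tag and diacritized form come along with it, and each tokenization convention is just a different deterministic rendering of the same analysis.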