[Exam] 110-1 陳建錦 Introduction to Text Mining Final Exam

Author: unmolk (UJ)   2022-01-28 01:47:16
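Course name: Introduction to Text Mining
Course type: Elective, Department of Information Management
Instructor: 陳建錦
College: College of Management
Department: Department of Information Management
Exam date (ROC year.month.day): 110.01.07
Time limit (minutes): 180
Questions: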
1. (5 points) Why is token normalization (e.g., stemming) important to text
mining? (5 points) Also, what is the difference between stemming and
lemmatization?
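A toy illustration of why normalization matters and how the two approaches differ. The suffix list in `toy_stem` and the dictionary in `toy_lemmatize` are hypothetical stand-ins, not the Porter stemmer or a real lemmatizer: stemming chops suffixes by rule and may produce non-words ("studi"), while lemmatization maps a word to its dictionary form ("study", or "good" for "better").

```python
from collections import Counter

# Hypothetical suffix rules: a crude stand-in for a rule-based stemmer such as Porter's.
SUFFIXES = ("ing", "ion", "ed", "es", "s")

# Hypothetical lemma dictionary: real lemmatizers use a lexicon plus part-of-speech tags.
LEMMAS = {"studies": "study", "studied": "study", "better": "good", "mice": "mouse"}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Return the dictionary form if known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)


tokens = ["connect", "connected", "connecting", "connection", "connects"]
print(Counter(tokens))                        # five separate features before normalization
print(Counter(toy_stem(t) for t in tokens))   # Counter({'connect': 5})
print(toy_stem("studies"), toy_lemmatize("studies"))  # studi study
```

After stemming, five surface forms collapse into one feature, so their counts are pooled instead of being spread thin across a larger, sparser vocabulary.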
2. TF-IDF is a classic term weighting scheme in text mining. (5 points) Explain
why TF alone is not good enough to measure the weight of a term. (10 points)
Show the definition of IDF and explain how it helps discriminate important
terms.
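One common definition is idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. Textbooks also use smoothed variants and other log bases, so the exact form below is an assumption. The sketch shows a frequent but ubiquitous word such as "the" getting zero weight, while a word that appears in only one document ("mat") is boosted.

```python
import math
from collections import Counter


def idf(term, docs):
    """idf(t) = log(N / df(t)); df(t) = number of documents containing t."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, doc, docs):
    """Raw term count in doc multiplied by the term's idf over the corpus."""
    return Counter(doc)[term] * idf(term, docs)


# Hypothetical toy corpus, already tokenized.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
for term in ("the", "cat", "mat"):
    print(term, "tf =", Counter(docs[0])[term], "tf-idf =", round(tf_idf(term, docs[0], docs), 3))
```

"the" has the highest TF in the first document, but since it appears in every document its IDF is log(3/3) = 0, so TF-IDF ranks it below the discriminative term "mat".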
3. (5 points) Explain the role of validation data in building supervised text
mining models (e.g., classification).
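A minimal sketch of the usual workflow, assuming a toy 1-D dataset and a hand-written k-nearest-neighbour classifier (all names are hypothetical): the model is fit on the training split, the hyperparameter k is chosen by accuracy on the validation split, and a separate test set (not shown) would be used only once at the end for an unbiased estimate.

```python
import random


def train_val_split(examples, val_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out val_fraction of it for validation."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_val = int(len(examples) * val_fraction)
    return examples[n_val:], examples[:n_val]


def knn_predict(x, train, k):
    """Majority vote (labels 0/1) among the k training points closest to x."""
    nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    return int(2 * sum(label for _, label in nearest) > k)


def accuracy(k, train, data):
    """Fraction of (x, label) pairs in data that the k-NN model gets right."""
    return sum(knn_predict(x, train, k) == y for x, y in data) / len(data)


def choose_k(train, val, candidates=(1, 3, 5)):
    """Select the hyperparameter by validation accuracy, never by test accuracy."""
    return max(candidates, key=lambda k: accuracy(k, train, val))


# Hypothetical toy data: the label is 1 when x >= 10.
data = [(x, int(x >= 10)) for x in range(20)]
train, val = train_val_split(data, val_fraction=0.25)
best_k = choose_k(train, val)
print("chosen k:", best_k, "validation accuracy:", accuracy(best_k, train, val))
```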
4. (10 points) What are classification precision and recall? (5 points) Why do
we say that precision and recall generally trade off against each other?
(5 points) Also, when measuring multi-class classification results, which
average (micro or macro) lets large classes dominate small classes?
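The sketch computes per-class precision TP / (TP + FP) and recall TP / (TP + FN), then both averages, on a hypothetical imbalanced example where the classifier always predicts the majority class. Treating 0 / 0 as 0 is a convention assumed here.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives for each class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed an instance of t
    return tp, fp, fn


def safe_div(a, b):
    """Return a / b, treating 0 / 0 as 0 (an assumed convention)."""
    return a / b if b else 0.0


def precision_recall(y_true, y_pred, average):
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = per_class_counts(y_true, y_pred)
    if average == "micro":
        # Pool counts over all instances: large classes contribute most of the counts.
        tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
        return safe_div(tp_sum, tp_sum + fp_sum), safe_div(tp_sum, tp_sum + fn_sum)
    if average == "macro":
        # Score each class separately, then give every class equal weight.
        precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
        recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
        return sum(precisions) / len(labels), sum(recalls) / len(labels)
    raise ValueError("average must be 'micro' or 'macro'")


# Hypothetical imbalanced example: 8 "a" docs, 2 "b" docs, classifier always says "a".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
print(precision_recall(y_true, y_pred, "micro"))  # (0.8, 0.8)
print(precision_recall(y_true, y_pred, "macro"))  # (0.4, 0.5)
```

Micro-averaging pools every decision, so the majority class "a" dominates and the result looks good (0.8); macro-averaging weights the ignored minority class "b" equally and exposes the failure.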
5. (10 points) What is the advantage of Latent Semantic Analysis (dimension
reduction) over the bag-of-words model when computing cosine similarity between
documents?
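A small numeric sketch, assuming a hypothetical 6-document corpus and k = 2 latent dimensions. The one-word documents "car" and "auto" share no terms, so their bag-of-words cosine is 0. Because both words co-occur with "engine" elsewhere, however, the truncated SVD places them on the same latent "vehicle" dimension and their LSA cosine is close to 1.

```python
import numpy as np

# Hypothetical toy vocabulary and corpus.
TERMS = ["car", "auto", "engine", "petal", "bloom"]
DOCS = ["car engine", "auto engine", "car", "auto", "petal bloom", "petal bloom"]


def term_doc_matrix(docs, terms):
    """Raw-count term-by-document matrix (rows = terms, columns = documents)."""
    index = {t: i for i, t in enumerate(terms)}
    A = np.zeros((len(terms), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            A[index[tok], j] += 1
    return A


def lsa_doc_vectors(A, k):
    """Project documents onto the top-k singular directions: rows of (S_k V_k^T)^T."""
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


A = term_doc_matrix(DOCS, TERMS)
bow = A.T                      # one bag-of-words row per document
lsa = lsa_doc_vectors(A, k=2)  # one 2-dimensional latent row per document
print("BoW cosine('car', 'auto'):", cosine(bow[2], bow[3]))  # 0.0: no shared terms
print("LSA cosine('car', 'auto'):", round(cosine(lsa[2], lsa[3]), 3))  # close to 1
```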
6. (5 points) Explain why kernel SVM is capable of solving difficult
classification problems. (5 points) Is kernel SVM still a linear classification
model? Explain your answer.
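The kernel trick can be checked numerically for the degree-2 polynomial kernel K(x, z) = (x · z)^2, whose explicit feature map in 2-D is φ(x) = (x1², √2·x1·x2, x2²). The XOR-style points below are not linearly separable in the input space, yet the hand-picked weight vector W = (0, 1, 0) (an assumption for illustration, not a trained SVM) separates them linearly in feature space: the decision function is linear in φ(x) but nonlinear in x.

```python
import math

# XOR-style toy data (hypothetical): not linearly separable in the input space.
POINTS = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
LABELS = [1, 1, -1, -1]


def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel K(x, z) = (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map whose inner product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hand-picked (not trained) weights: a hyperplane in feature space.
W = (0.0, 1.0, 0.0)


def predict(x):
    """Linear in phi(x), hence nonlinear (a function of x1 * x2) in the original x."""
    return 1 if dot(W, phi(x)) > 0 else -1


for x, y in zip(POINTS, LABELS):
    print(x, "kernel matches feature map:",
          math.isclose(poly2_kernel(x, x), dot(phi(x), phi(x))),
          "predicted:", predict(x), "true:", y)
```

An SVM never needs to build φ(x) explicitly; it only evaluates K on pairs of points, which is what makes very high-dimensional (even infinite-dimensional, e.g. RBF) feature spaces affordable.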
7. (5 points) What is an n-gram? (10 points) Also, explain why n is usually not
a large number in practice.
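An n-gram is a contiguous sequence of n tokens. The sketch below extracts n-grams from a hypothetical sentence and compares how many distinct n-grams are observed with how many are possible (|V|^n for a vocabulary of size |V|). The possible space grows exponentially with n, so for larger n almost every n-gram is unseen or seen once: counts become sparse, probability estimates unreliable, and the feature space and memory explode.

```python
def ngrams(tokens, n):
    """All contiguous length-n token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the cat sat on the mat".split()
vocab_size = len(set(tokens))
for n in (1, 2, 3):
    observed = len(set(ngrams(tokens, n)))
    possible = vocab_size ** n
    print(f"n={n}: {observed} distinct observed out of {possible} possible")
# n=1: 5 distinct observed out of 5 possible
# n=2: 5 distinct observed out of 25 possible
# n=3: 4 distinct observed out of 125 possible
```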
8. (10 points) Explain why Word2Vec is able to produce similar embeddings for
semantically similar words (e.g., synonyms).
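Word2Vec's skip-gram objective trains each word's vector to predict the words around it, so two words that appear in the same contexts receive the same training signal and are pushed toward nearby vectors. Training a real network is out of scope for a short sketch, so this one extracts the skip-gram (center, context) pairs and then substitutes plain context-count vectors (a count-based stand-in, not Word2Vec itself) to show the distributional effect on a hypothetical corpus.

```python
import math
from collections import Counter, defaultdict


def skipgram_pairs(tokens, window=1):
    """(center, context) pairs that skip-gram Word2Vec is trained to predict."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def context_vectors(sentences, window=1):
    """Count-based stand-in for embeddings: each word's context-word histogram."""
    vecs = defaultdict(Counter)
    for sentence in sentences:
        for center, context in skipgram_pairs(sentence.split(), window):
            vecs[center][context] += 1
    return vecs


def cosine(a, b):
    num = sum(a[k] * b[k] for k in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


# Hypothetical toy corpus: "cat" and "dog" occur in the same contexts, "car" does not.
sentences = ["the cat drinks milk", "the dog drinks milk", "a car needs fuel"]
vecs = context_vectors(sentences)
print(skipgram_pairs("the cat drinks milk".split()))
print("cat~dog:", round(cosine(vecs["cat"], vecs["dog"]), 3))  # 1.0
print("cat~car:", round(cosine(vecs["cat"], vecs["car"]), 3))  # 0.0
```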
9. (5 points) Please explain the following code.
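model.fit(partial_x_train, partial_y_train,  # training inputs and labels
          epochs=20,                         # 20 full passes over the training data
          batch_size=512,                    # 512 samples per gradient update
          validation_data=(x_val, y_val))    # held-out data scored after each epoch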
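Keras `Model.fit` trains for `epochs` passes over the training data, updating the weights once per mini-batch of `batch_size` samples (reshuffling the training data each epoch by default, `shuffle=True`), and evaluates the loss and metrics on `validation_data` at the end of each epoch without training on it. The sketch below reproduces that loop in plain NumPy for a logistic-regression model; `fit_sketch`, its learning rate and the random data are hypothetical, and the real Keras loop additionally handles the compiled optimizer, loss, metrics and callbacks.

```python
import numpy as np


def fit_sketch(x_train, y_train, x_val, y_val, epochs=20, batch_size=512, lr=0.1, seed=0):
    """Mini-batch logistic regression run in the same loop structure as model.fit."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(x_train.shape[1]), 0.0
    history = {"val_loss": [], "steps_per_epoch": []}
    for _ in range(epochs):                         # epochs=20: 20 passes over the data
        order = rng.permutation(len(x_train))       # reshuffle each epoch
        steps = 0
        for start in range(0, len(x_train), batch_size):  # one update per batch
            idx = order[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-(x_train[idx] @ w + b)))
            grad = p - y_train[idx]                 # gradient of log loss w.r.t. logits
            w -= lr * x_train[idx].T @ grad / len(idx)
            b -= lr * grad.mean()
            steps += 1
        # validation_data: evaluated after each epoch, never used for weight updates
        p_val = np.clip(1.0 / (1.0 + np.exp(-(x_val @ w + b))), 1e-12, 1 - 1e-12)
        val_loss = -np.mean(y_val * np.log(p_val) + (1 - y_val) * np.log(1 - p_val))
        history["val_loss"].append(float(val_loss))
        history["steps_per_epoch"].append(steps)
    return w, b, history


rng = np.random.default_rng(1)
x = rng.normal(size=(1400, 3))
y = (x[:, 0] > 0).astype(float)                     # hypothetical labels
partial_x_train, partial_y_train = x[:1200], y[:1200]
x_val, y_val = x[1200:], y[1200:]
_, _, history = fit_sketch(partial_x_train, partial_y_train, x_val, y_val)
print(history["steps_per_epoch"][0], "steps per epoch")  # 3 (= ceil(1200 / 512))
print("validation loss after each epoch:", [round(v, 3) for v in history["val_loss"]])
```

In Keras the same per-epoch validation numbers come back in the `History` object returned by `fit`, which is typically plotted to spot the epoch where validation loss starts rising (overfitting).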
