Data Mining and Warehousing Question with Answer

Q.no 1. What does a leaf node in a decision tree indicate?

A : sub tree

B : class label

C : testing node

D : condition

Q.no 2. Sensitivity is also known as

A : false rate

B : recall

C : negative rate

D : recognition rate

Q.no 3. The negative tuples that were correctly labeled by the classifier are called

A : False positives (FP)

B : True positives (TP)

C : True negatives (TN)

D : False negatives (FN)

Q.no 4. Removing duplicate records is a process called

A : recovery

B : data cleaning

C : data cleansing

D : data pruning

Q.no 5. For the Apriori algorithm, what is the first phase?

A : Pruning

B : Partitioning

C : Candidate generation

D : Itemset generation

Q.no 6. Multi-class classification makes the assumption that each sample is assigned to

A : one and only one label

B : many labels

C : one or many labels

D : no label

Q.no 7. Multilevel association rules can be mined efficiently using

A : Support

B : Confidence

C : Support count

D : Concept Hierarchies under support-confidence framework

Q.no 8. What is the method to interpret the results after rule generation?

A : Absolute Mean

B : Lift ratio

C : Gini Index

D : Apriori

Q.no 9. Self-training is the simplest form of

A : supervised classification

B : semi-supervised classification

C : unsupervised classification

D : regression

Q.no 10. Which of the following is a direct application of frequent itemset mining?

A : Social Network Analysis

B : Market Basket Analysis

C : Outlier Detection

D : Intrusion Detection

Q.no 11. Hidden knowledge refers to

A : A set of databases from different vendors, possibly using different database paradigms

B : An approach to a problem that is not guaranteed to work but performs well in most cases

C : Information that is hidden in a database and that cannot be recovered by a simple SQL query

D : None of these

Q.no 12. The schema is a collection of stars. Identify the type of schema.

A : Star Schema

B : Snowflake schema

C : Fact constellation

D : Database schema

Q.no 13. The Synonym for data mining is

A : Data warehouse

B : Knowledge discovery in database

C : ETL

D : Business Intelligence

Q.no 14. Which of the following are methods for supervised classification?

A : Decision tree

B : K-Means

C : Hierarchical

D : Apriori

Q.no 15. These are the intermediate servers that stand in between a relational back-end server and client front-end tools

A : ROLAP

B : MOLAP

C : HOLAP

D : HaoLap

Q.no 16. Color is an example of which type of attribute

A : Nominal

B : Binary

C : Ordinal

D : Numeric

Q.no 17. What are the two approaches to tree pruning?

A : Pessimistic pruning and Optimistic pruning

B : Postpruning and Prepruning

C : Cost complexity pruning and time complexity pruning

D : None of the options

Q.no 18. A data cube is defined by

A : Dimensions

B : Facts

C : Dimensions and Facts

D : Dimensions or Facts

Q.no 19. For the Apriori algorithm, what is the second phase?

A : Pruning

B : Partitioning

C : Candidate generation

D : Itemset generation

Q.no 20. What is the range of the cosine similarity of two documents?

A : Zero to One

B : Zero to infinity

C : Infinity to infinity

D : Zero to Zero

Answers

Answer for Question No 1. is b

Answer for Question No 2. is b

Answer for Question No 3. is c

Answer for Question No 4. is b

Answer for Question No 5. is c

Answer for Question No 6. is a

Answer for Question No 7. is d

Answer for Question No 8. is b

Answer for Question No 9. is b

Answer for Question No 10. is b

Answer for Question No 11. is c

Answer for Question No 12. is c

Answer for Question No 13. is b

Answer for Question No 14. is a

Answer for Question No 15. is a

Answer for Question No 16. is a

Answer for Question No 17. is b

Answer for Question No 18. is c

Answer for Question No 19. is a

Answer for Question No 20. is a
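Worked illustration for Questions 5 and 19: the sketch below shows one pass of Apriori as a candidate generation (join) step followed by a pruning step. It is a minimal sketch, not a full implementation: the function names `generate_candidates` and `prune` and the item names are hypothetical, and the database scan that counts support is left out.

```python
from itertools import combinations


def generate_candidates(frequent_prev, k):
    """Join step: merge pairs of frequent (k-1)-itemsets whose union has k items."""
    prev = list(frequent_prev)
    return {
        a | b
        for i, a in enumerate(prev)
        for b in prev[i + 1:]
        if len(a | b) == k
    }


def prune(candidates, frequent_prev, k):
    """Prune step: drop any candidate that has an infrequent (k-1)-subset."""
    return {
        c
        for c in candidates
        if all(frozenset(s) in frequent_prev for s in combinations(c, k - 1))
    }


# Hypothetical frequent 2-itemsets (assumed to come from an earlier pass).
frequent_2 = {
    frozenset(pair)
    for pair in [("tea", "milk"), ("tea", "sugar"), ("milk", "sugar"), ("milk", "bread")]
}

candidates_3 = generate_candidates(frequent_2, 3)  # phase 1: candidate generation
pruned_3 = prune(candidates_3, frequent_2, 3)      # phase 2: pruning

print(sorted(sorted(c) for c in candidates_3))
print(sorted(sorted(c) for c in pruned_3))
```

With this toy input, three candidate 3-itemsets are generated and only {milk, sugar, tea} survives pruning, because the other two each contain a 2-itemset that is not frequent.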

Q.no 21. Lazy learner classification approach is

A : the learner waits until the last minute before constructing a model to classify

B : given training data, the learner constructs a model first and then uses it to classify

C : the network is constructed by human experts

D : None of the options

Q.no 22. Cross validation involves

A : testing the machine on all possible ways by substituting the original sample into training set

B : testing the machine on all possible ways by dividing the original sample into training and validation sets.

C : testing the machine with only validation sets

D : testing the machine on only testing datasets.

Q.no 23. Rules are considered interesting if

A : They satisfy both minimum support and minimum confidence threshold

B : They satisfy both maximum support and maximum confidence threshold

C : They satisfy maximum support and minimum confidence threshold

D : They satisfy minimum support and maximum confidence threshold

Q.no 24. Data independence means

A : Data is defined separately and not included in programs

B : Programs are not dependent on the physical attributes of the data

C : Programs are not dependent on the logical attributes of the data

D : Programs are not dependent on the physical attributes as well as logical attributes of the data

Q.no 25. Which of the following is a predictive model?

A : Clustering

B : Regression

C : Summarization

D : Association rules

Q.no 26. The data cubes are generally

A : 1 Dimensional

B : 2 Dimensional

C : 3 Dimensional

D : n-Dimensional

Q.no 27. Identify the example of sequence data

A : weather forecast

B : data matrix

C : market basket data

D : genomic data

Q.no 28. How many fields does the frequent-item header table consist of?

A : Only one

B : Two

C : Three

D : Four

Q.no 29. How are metarules useful in mining of association rules?

A : Allow users to specify threshold measures

B : Allow users to specify task relevant data

C : Allow users to specify the syntactic forms of rules

D : Allow users to specify correlation or association

Q.no 30. Which of the following activities is a data mining task?

A : Monitoring the heart rate of a patient for abnormalities

B : Extracting the frequencies of a sound wave

C : Predicting the outcomes of tossing a (fair) pair of dice

D : Dividing the customers of a company according to their profitability

Q.no 31. When do you consider an association rule interesting?

A : If it only satisfies minimum support

B : If it only satisfies minimum confidence

C : If it satisfies both minimum support and minimum confidence

D : There are other measures to check interesting rules

Q.no 32. What approach does the basic algorithm for decision tree induction use?

A : Greedy

B : Top Down

C : Procedural

D : Step by Step

Q.no 33. What do you mean by support(A)?

A : Total number of transactions containing A

B : Total Number of transactions not containing A

C : Number of transactions containing A / Total number of transactions

D : Number of transactions not containing A / Total number of transactions

Q.no 34. Which of the following probabilities are used in Bayes' theorem?

A : P(Ci|X)

B : P(Ci)

C : P(X|Ci)

D : P(X)

Q.no 35. In which step of Knowledge Discovery are multiple data sources combined?

A : Data Cleaning

B : Data Integration

C : Data Selection

D : Data Transformation

Q.no 36. The Galaxy Schema is also called a

A : Star Schema

B : Snowflake schema

C : Fact constellation

D : Database schema

Q.no 37. Handwritten digit recognition, which classifies an image of a handwritten number into a digit from 0 to 9, is an example of

A : Multi-class classification

B : Multi-label classification

C : Imbalanced classification

D : Binary Classification

Q.no 38. What type of data do you need for a chi-square test?

A : Categorical

B : Ordinal

C : Interval

D : Scales

Q.no 39. Consider a classification problem with highly imbalanced classes, where the majority class is observed 99% of the time in the training data. Your model has 99% accuracy on the test data. Which of the following is not true in such a case?

A : Imbalanced problems should not be measured using Accuracy metric.

B : Accuracy metric is not a good idea for imbalanced class problems.

C : Precision and recall metrics aren’t good for imbalanced class problems.

D : Precision and recall metrics are good for imbalanced class problems.

Q.no 40. Which of the following properties typically does not hold for similarity measures between two objects?

A : Symmetry

B : Definiteness

C : Triangle inequality

D : Transitive

Answers

Answer for Question No 21. is a

Answer for Question No 22. is b

Answer for Question No 23. is a

Answer for Question No 24. is d

Answer for Question No 25. is b

Answer for Question No 26. is d

Answer for Question No 27. is d

Answer for Question No 28. is b

Answer for Question No 29. is c

Answer for Question No 30. is a

Answer for Question No 31. is c

Answer for Question No 32. is a

Answer for Question No 33. is c

Answer for Question No 34. is a

Answer for Question No 35. is b

Answer for Question No 36. is c

Answer for Question No 37. is a

Answer for Question No 38. is a

Answer for Question No 39. is c

Answer for Question No 40. is c
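Worked illustration for Questions 23, 31 and 33: the sketch below computes support as a fraction of transactions, derives confidence from it, and applies the minimum support and minimum confidence test. The transactions, thresholds, and function names are hypothetical and chosen only to make the arithmetic easy to follow. The lift ratio from Question 8 is included as well, defined here in the usual way as confidence divided by the support of the consequent.

```python
def support(transactions, itemset):
    """Number of transactions containing the itemset / total number of transactions."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """confidence(antecedent => consequent) / support(consequent)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


def is_interesting(transactions, antecedent, consequent, min_sup, min_conf):
    """A rule passes only if it meets BOTH the support and confidence thresholds."""
    rule_support = support(transactions, set(antecedent) | set(consequent))
    rule_confidence = confidence(transactions, antecedent, consequent)
    return rule_support >= min_sup and rule_confidence >= min_conf


# Hypothetical transaction database.
transactions = [
    {"tea", "milk", "sugar"},
    {"tea", "milk", "sugar"},
    {"tea", "milk"},
    {"bread", "milk"},
    {"bread", "sugar"},
]

rule = ({"tea", "milk"}, {"sugar"})
print(support(transactions, {"tea", "milk"}))
print(confidence(transactions, *rule))
print(lift(transactions, *rule))
print(is_interesting(transactions, *rule, min_sup=0.3, min_conf=0.6))
```

Here {tea, milk} appears in 3 of 5 transactions (support 0.6) and the rule {tea, milk} => {sugar} holds in 2 of those 3 (confidence about 0.67), so the rule passes thresholds of 0.3 and 0.6. Raising either threshold above the rule's value makes it fail, which is why both conditions are required.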

Q.no 41. Cost complexity pruning algorithm is used in?

A : CART

B : C4.5

C : ID3

D : ALL

Q.no 42. In a frequent itemset example, it is observed that if tea and milk are bought, then sugar is also purchased by customers. After generating an association rule among the given set of items, it is inferred:

A : {Tea} is antecedent and {sugar} is consequent

B : {Tea} is antecedent and the itemset {milk, sugar} is consequent

C : The itemset {Tea, milk} is consequent and {sugar} is antecedent

D : The itemset {Tea, milk} is antecedent and {sugar} is consequent

Q.no 43. When do we use Manhattan distance in data mining?

A : Dimension of the data decreases

B : Dimension of the data increases

C : Underfitting

D : Moderate size of the dimensions

Q.no 44. An ordinal attribute has three distinct values: Fair, Good, and Excellent. If x and y are two objects of the ordinal attribute with the values Fair and Good respectively, then what is the distance from object y to x?

A : 1

B : 0

C : 0.5

D : 0.75

Q.no 45. Which of the following operations is required to calculate cosine similarity?

A : Vector dot product

B : Exponent

C : Modulus

D : Percentage

Q.no 46. Which is the most well-known association rule algorithm, used in most commercial products?

A : Apriori algorithm

B : Pincer-search algorithm

C : Distributed algorithm

D : Partition algorithm

Q.no 47. What is another name for the Supremum distance?

A : Weighted Euclidean distance

B : City Block distance

C : Chebyshev distance

D : Euclidean distance

Q.no 48. A model predicts 50 examples as belonging to the minority class, 45 of which are true positives and five of which are false positives. The precision of the model is

A : Precision= 0.90

B : Precision= 0.79

C : Precision= 0.45

D : Precision= 0.68

Q.no 49. How can a Bayesian network be used to answer any query?

A : Full distribution

B : Joint distribution

C : Partial distribution

D : All of the mentioned

Q.no 50. A sub-database which consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern is called a

A : Suffix path

B : FP-tree

C : Prefix path

D : Conditional pattern base

Q.no 51. Which of the following statements is FALSE regarding regression?

A : It relates inputs to outputs.

B : It is used for prediction.

C : It may be used for interpretation.

D : It discovers causal relationships.

Q.no 52. The basic idea of the Apriori algorithm is to generate item sets of a particular size and scan the database. These item sets are

A : Primary

B : Secondary

C : Superkey

D : Candidate

Q.no 53. Which operation does a data warehouse require?

A : Initial loading of data

B : Transaction processing

C : Recovery

D : Concurrency control mechanisms

Q.no 54. The problem of finding hidden structure in unlabeled data is called

A : Supervised learning

B : Unsupervised learning

C : Reinforcement Learning

D : Semisupervised learning

Q.no 55. A model makes predictions and predicts 120 examples as belonging to the minority class, 90 of which are correct, and 30 of which are incorrect. The precision of the model is

A : Precision = 0.89

B : Precision = 0.23

C : Precision = 0.45

D : Precision = 0.75

Q.no 56. Accuracy is

A : Number of correct predictions out of total no. of predictions

B : Number of incorrect predictions out of total no. of predictions

C : Number of predictions out of total no. of predictions

D : Total number of predictions

Q.no 57. What does a Pearson's product-moment correlation allow you to identify?

A : Whether there is a relationship between variables

B : Whether there is a significant effect and interaction of independent variables

C : Whether there is a significant difference between variables

D : Whether there is a significant effect and interaction of dependent variables

Q.no 58. A model makes predictions and predicts 90 of the positive class predictions correctly and 10 incorrectly. The recall of the model is

A : Recall=0.9

B : Recall=0.39

C : Recall=0.65

D : Recall=5.0

Q.no 59. Rotating the axes in a 3-D cube is the example of

A : Pivot

B : Roll up

C : Drill down

D : Slice

Q.no 60. Which server performs faster computation?

A : ROLAP

B : MOLAP

C : HOLAP

D : HaoLap

Answers

Answer for Question No 41. is a

Answer for Question No 42. is d

Answer for Question No 43. is b

Answer for Question No 44. is c

Answer for Question No 45. is a

Answer for Question No 46. is a

Answer for Question No 47. is c

Answer for Question No 48. is a

Answer for Question No 49. is b

Answer for Question No 50. is d

Answer for Question No 51. is d

Answer for Question No 52. is d

Answer for Question No 53. is a

Answer for Question No 54. is b

Answer for Question No 55. is d

Answer for Question No 56. is a

Answer for Question No 57. is a

Answer for Question No 58. is a

Answer for Question No 59. is a

Answer for Question No 60. is b
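Worked check for Questions 44, 48, 55 and 58: the sketch below reproduces the arithmetic behind those answers. The function names are hypothetical. Two assumptions are made: for Question 58, the 10 incorrect predictions are treated as false negatives (positive examples the model missed), and for Question 44, ordinal values are mapped onto the range 0 to 1 by rank before taking the absolute difference.

```python
def precision(true_positives, false_positives):
    """Fraction of predicted positives that are actually positive."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Fraction of actual positives that the model found."""
    return true_positives / (true_positives + false_negatives)


def ordinal_distance(a, b, levels):
    """Distance between two ordinal values after scaling ranks onto [0, 1].

    `levels` lists the attribute values from lowest to highest rank.
    """
    last_rank = len(levels) - 1
    return abs(levels.index(a) - levels.index(b)) / last_rank


print(precision(45, 5))    # Question 48
print(precision(90, 30))   # Question 55
print(recall(90, 10))      # Question 58, assuming 10 false negatives
print(ordinal_distance("Good", "Fair", ["Fair", "Good", "Excellent"]))  # Question 44
```

The four printed values are 0.9, 0.75, 0.9 and 0.5, matching the answer key. The ordinal distance is symmetric, so the distance from y to x equals the distance from x to y.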
