dmdw-mid 1

  • 8/9/2019 dmdw-mid 1

    JNTU ONLINE EXAMINATIONS [Mid 1 - DMDW]

    1. Which of the following is one of the most commonly available and richest information repositories?

    a. Temporal databases

    b. Relational databases

    c. Transactional databases

    d. Spatial databases

    2. Which of the following databases is used to store time-related data?

    a. Spatial databases

    b. Text databases

    c. Multimedia databases

    d. Temporal databases

    3. From a DWH perspective, data mining can be viewed as an advanced stage of

    a. On-Line Transaction Processing

    b. On-Line Data Processing

    c. On-Line Analytical Processing

    d. On-Line Electronic Processing

    4. A _ _ _ _ _ _ is a group of heterogeneous databases?

    a. Time series databases

    b. Object oriented databases

    c. Legacy databases

    d. Spatial databases

    5. Spatial databases include

    a. Legacy databases

    b. Time series databases

    c. Satellite image databases

    d. Temporal databases

    6. Many people treat data mining as a synonym for another popularly used term

    a. Knowledge Discovery in Databases

    b. Knowledge inventory in databases

    c. Knowledge acceptance in databases

    d. Knowledge disposal in databases

    7. A database is a collection of

    a. Related data

    b. Interrelated data

    c. Irrelevant data

    d. Distributed data

    8. A relational database is a collection of

    a. Tables

    b. Events

    c. Attributes

    d. Values

    9. A _ _ _ _ _ _ _ is a repository of information collected from multiple sources, stored under a unified schema, and which usually resides at a single site.

    a. Data mining

    b. Database

    c. Data warehouse

    d. Legacy databases

    10. Which of the following databases is used to store image, audio, and video data?

    a. Heterogeneous databases

    b. Temporal databases

    c. Legacy databases

    d. Multimedia databases

    11. What is the single-dimensional association rule for the following rule, written in the predicate notation used for multidimensional association rules: contains(T, "computer") ⇒ contains(T, "software")?

    a. Computer ⇒ software

    b. Software ⇒ computer

    c. Software ⇒ computer

    d. Computer ⇒ software

    12. Which of the following analyses attempts to identify attributes that do not contribute to the classification or prediction process?

    a. Cluster analysis

    b. Outlier analysis

    c. Relevance analysis

    d. Evolution analysis

    13. Which of the following is a summarization of the general characteristics or features of a target class of data?

    a. Data discrimination

    b. Data characterization

    c. Data compression

    d. Meta data

    14. _ _ _ _ _ _ _ is a comparison of the general features of target class data objects with the general features of objects from one or a set of contrasting classes.

    a. Data characterization

    b. Data summarization

    c. Data discrimination

    d. Meta data

    15. _ _ _ _ _ _ _ interestingness measures are based on user beliefs in the data.

    a. Objective

    b. Descriptive

    c. Collective

    d. Subjective

    16. _ _ _ _ _ _ mining tasks characterize the general properties of the data in the

    databases.

    a. Descriptive

    b. Predictive

    c. Metadata

    d. Data

    17. _ _ _ _ _ mining tasks perform inference on the current data in order to make

    predictions.

    a. Descriptive

    b. Predictive

    c. Data

    d. Metadata

    18. The derived model may be represented in the form of

    a. ER model

    b. Flow chart

    c. Decision trees

    d. DFD

    19. Which of the following is the classification of data mining systems?

    a. Summarization

    b. Visualization

    c. Discrimination

    d. Characterization

    20. _ _ _ _ _ _ _ analysis describes and models regularities or trends for objects whose behavior changes over time.

    a. Data evolution

    b. Cluster

    c. Outlier

    d. Summarization

    21. Which of the following is an issue relating to the diversity of database types?

    a. Handling noisy or incomplete data

    b. Incorporation of background knowledge

    c. Handling of relational and complex types of data

    d. Efficiency and scalability of data mining algorithms

    22. Which of the following is not a major issue in data mining?

    a. Mining methodology and user interaction issues

    b. Performance issues

    c. Issues relating to the diversity of database types

    d. Issues relating to the measurement

    23. Processing _ _ _ _ _ queries in operational databases would substantially degrade the performance of operational tasks.

    a. On-Line Transaction Processing

    b. On-Line Electronic Processing

    c. On-Line Data Processing

    d. On-Line Analytical Processing

    24. An _ _ _ _ _ _ system typically adopts either a star or snowflake model and a subject-oriented database design.

    a. On-Line Transaction Processing

    b. On-Line Electronic Processing

    c. On-Line Analytical Processing

    d. On-Line Data Processing

    25. The access patterns of an _ _ _ _ system consist mainly of short, atomic

    transactions.

    a. On-Line Analytical Processing

    b. On-Line Transaction Processing

    c. On-Line Electronic Processing

    d. On-Line Data Processing

    26. Which of the following approaches requires complex information filtering and integration processes, and competes for resources with processing at local sources?

    a. Update-driven approach

    b. Integrate-driven approach

    c. Query-driven approach

    d. Data-driven approach

    27. Mining different kinds of knowledge in databases is an issue in

    a. Performance issue

    b. Mining methodology and user interaction issues

    c. Diversity of database types issues

    d. Time complexity

    28. Pattern evolution is an issue related to

    a. Mining methodology and user interaction issues

    b. Performance issues

    c. Issues relating to the diversity of database types

    d. Issues relating to the measurement

    29. A DWH is a subject-oriented, integrated, time-variant, and _ _ _ _ _ _ collection of data in support of management's decision-making process.

    a. Nonvolatile

    b. Volatile

    c. Disintegrated

    d. Object-oriented

    30. An _ _ _ system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations.

    a. On-Line Analytical Processing

    b. On-Line Data Processing

    c. On-Line Electronic Processing

    d. On-Line Transaction Processing

    31. The basic characteristic of On-line Analytical Processing is

    a. Informational processing

    b. Operational processing

    c. Data processing

    d. Data cleaning

    32. Which of the following cuboids holds the highest level of summarization?

    a. Cuboid

    b. Base cuboid

    c. Non-base cuboid

    d. Apex cuboid

    33. _ _ _ _ _ _ _ _ _ _ is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data.

    a. Rollup

    b. Drill down

    c. Pivot

    d. Slice & dice

    34. _ _ _ _ _ _ tables can be specified by users or experts, or automatically generated and adjusted based on data distributions.

    a. Fact

    b. Summarized

    c. Dimension

    d. Relational

    35. _ _ _ _ _ _ _ executes queries involving more than one fact table.

    a. Drill-through

    b. Drill-across

    c. Drill-down

    d. Rotate

    36. A _ _ _ _ _ allows data to be modeled and viewed in multiple dimensions.

    a. Meta data

    b. Data cube

    c. Database

    d. Fact table

    37. The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in _ _ _ _ form.

    a. Standard

    b. De-normalized

    c. Normalized

    d. Multi dimensional

    38. Which of the following is not a category of measure based on the kind of aggregate function used?

    a. Cumulative

    b. Distributive

    c. Algebraic

    d. Holistic

    39. A concept hierarchy that is a total or partial order among attributes in a database schema is called a _ _ _ _ _ _ _ _ _ _ _ hierarchy.

    a. Set-grouping

    b. Grouping

    c. Decision

    d. Schema

    40. Which of the following focuses on socioeconomic applications?

    a. Statistical database systems

    b. Online Analytical Processing systems

    c. Spatial database systems

    d. Temporal database systems

    41. A _ _ _ _ _ _ _ _ _ model consists of radial lines emanating from a central point, where each line represents a concept hierarchy for a dimension.

    a. Cube net

    b. Triangle net

    c. Square net

    d. Star net

    42. Which of the following is constructed so that the enterprise warehouse is the sole custodian of all warehouse data, which is then distributed to the various dependent data marts?

    a. Enterprise DWH

    b. Two-tier DWH

    c. Multi-tier DWH

    d. Virtual warehouse

    43. Which of the following is a Multidimensional Online Analytical Processing (MOLAP) server?

    a. Essbase

    b. Database

    c. Swiss base

    d. Red Brick

    44. The _ _ _ _ _ _ view includes fact tables and dimension tables.

    a. DWH

    b. Top-down

    c. Data source

    d. Business Query

    45. Which of the following is a Hybrid OLAP server?

    a. MS SQL Server 1.0

    b. MS SQL 5.0

    c. MS SQL Server 7.0

    d. MS SQL Server 3.0

    46. ETL stands for

    a. Evaluate, Transport and Link

    b. Extract, Transform and Load

    c. Error, Tracking and Load

    d. Extract, Transient and Load

    47. To architect the DWH, the major driving factor to support is

    a. An inability to cope with requirements evolution

    b. Not populating the warehouse

    c. Day-to-day management of the warehouse

    d. Supporting Online Transaction Processing

    48. A _ _ _ _ _ _ _ contains a subset of corporate-wide data that is of value to a specific group of users.

    a. Enterprise warehouse

    b. Virtual warehouse

    c. Data warehouse

    d. Data mart

    49. A _ _ _ _ _ _ _ is a set of views over operational databases.

    a. Enterprise warehouse

    b. Virtual warehouse

    c. Data warehouse

    d. Data mart

    50. What kind of intermediate servers stand between a relational back-end server and client front-end tools?

    a. Hybrid OLAP servers

    b. Multidimensional OLAP servers

    c. Relational OLAP servers

    d. Specialized SQL servers

    51. Choose the _ _ _ _ _ _ _ _ _ that will populate each fact table record

    a. Measures

    b. Dimensions

    c. Grain

    d. Business Process

    52. How many cuboids are there in an n-dimensional data cube?

    a.

    b.

    c.

    d.
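A minimal Python sketch of the cuboid count behind question 52, under two stated assumptions: with no concept hierarchies, every subset of the n dimensions is one cuboid (2^n in total, from the apex cuboid to the base cuboid); if dimension i has L_i hierarchy levels (not counting the virtual top level "all"), the total becomes the product of (L_i + 1). The function names and sample dimensions are illustrative only.

```python
from itertools import combinations
from math import prod


def cuboids_without_hierarchies(dims):
    """Enumerate every group-by (cuboid) of a cube with no concept hierarchies.

    Each subset of the dimensions is one cuboid: the empty subset is the
    apex cuboid and the full set is the base cuboid.
    """
    return [subset for k in range(len(dims) + 1)
            for subset in combinations(dims, k)]


def total_cuboids(levels_per_dim):
    """Total cuboids when dimension i has levels_per_dim[i] hierarchy levels.

    The "+ 1" accounts for the virtual top level "all" of each dimension.
    """
    return prod(levels + 1 for levels in levels_per_dim)


dims = ["time", "item", "location"]
print(len(cuboids_without_hierarchies(dims)))  # 8 == 2**3
print(total_cuboids([1, 1, 1]))                 # 8, same as above
print(total_cuboids([4, 3, 5]))                 # 5 * 4 * 6 = 120
```

With one level per dimension the product formula reduces to 2^n, which is why the two calls above agree.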

    53. Meta data repository contains

    a. Operational meta data

    b. Data irrelevant to system performance

    c. The mapping from the DWH to the operational environment

    d. Summarized data

    54. Which of the following supports bitmap indices?

    a. Sybase IQ

    b. Oracle 7

    c. COBOL

    d. SQL
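A bitmap index keeps one bit vector per distinct attribute value, with bit j set when row j has that value, so selections on low-cardinality attributes become bitwise AND/OR operations. The sketch below uses Python integers as bit vectors; the table, column values, and function names are hypothetical, and it does not describe how Sybase IQ or any other product actually stores its indices.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit j is 1 if row j has that value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_from_bitmap(bitmap, n_rows):
    """Decode a bit vector back into a list of row numbers."""
    return [row for row in range(n_rows) if (bitmap >> row) & 1]


city = ["Delhi", "Pune", "Delhi", "Chennai", "Pune"]
item = ["TV", "TV", "PC", "TV", "PC"]
city_idx = build_bitmap_index(city)
item_idx = build_bitmap_index(item)

# "city = 'Delhi' OR city = 'Pune'" becomes a bitwise OR of two bit vectors,
# and adding "AND item = 'TV'" becomes a bitwise AND.
hits = (city_idx["Delhi"] | city_idx["Pune"]) & item_idx["TV"]
print(rows_from_bitmap(hits, len(city)))  # [0, 1]
```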

    55. _ _ _ _ _ _ _ are created for the data names and definitions of the given warehouse.

    a. Data cube

    b. Summarized data

    c. Meta data

    d. Detailed Information

    56. Because the chunking technique involves "overlapping" some of the aggregation computations, it is referred to as _ _ _ _ _ aggregation in data cube computation.

    a. Two-way array

    b. Three-way array

    c. Multi-way array

    d. Sparse array

    57. The _ _ _ _ _ _ _ operator computes aggregates over all subsets of the dimensions specified in the operation.

    a. Database

    b. Compute cube

    c. Define cube

    d. Group by

    58. Which of the following is a subcube that is small enough to fit into the memory available for cube computation?

    a. Bulk

    b. Array

    c. Structure

    d. Chunk

    59. The bit mapped join indices method is an integrated form of

    a. Composite join indexing and bitmap indexing

    b. Join indexing and composite join indexing

    c. Join indexing and bitmap indexing

    d. Bitmap indexing and outer join indexing

    60. A set of attributes in a relation schema that forms a primary key for another relation schema is called a _ _ _ _ _ _ _

    a. Primary key

    b. Foreign key

    c. Secondary key

    d. Composite key

    61. Which of the following typically gathers data from multiple, heterogeneous, and external sources?

    external sources?

    a. Data cleaning

    b. Load

    c. Refresh

    d. Data extraction

    62. OLAM is particularly important for which of the following reasons?

    a. High quality of data in DWH

    b. Data processing

    c. OLTP-based exploratory data analysis

    d. Online selection of data mining functions

    63. Which of the following sets a good example for interactive data analysis and provides the necessary preparations for exploratory data mining?

    a. OLP

    b. OLAP

    c. OLTP

    d. OLDP

    64. Which of the following is not an exception indicator?

    a. Out Exp

    b. Self Exp

    c. In Exp

    d. Path Exp

    65. _ _ _ _ _ _ _ _ _ can help business managers find and reach more suitable customers, as well as gain critical business insights that may help to drive market share and raise profits.

    a. Data warehouse

    b. Data mining

    c. Data summarization

    d. Data processing

    66. _ _ _ _ _ _ _ _ _ _ _ is an alternative approach in which pre-computed measures indicating data exceptions are used to guide the user in the data analysis process at all levels of aggregation.

    a. Hypothesis-driven exploration

    b. Inventory-driven exploration

    c. Discovery-driven exploration

    d. Exception-driven exploration

    67. Which of the following is an exception indicator that indicates the degree of surprise of the cell value, relative to other cells at the same level of aggregation?

    a. Out Exp

    b. In Exp

    c. Path Exp

    d. Self Exp

    68. _ _ _ _ _ is a powerful paradigm that integrates OLAP with data mining technology.

    a. Online Analytical Modeling

    b. Online Analytical Machine

    c. Online Analytical Mining

    d. Online Analytical Monitoring

    69. Data warehouse application is _ _ _ _ _ _ _ _ _

    a. Data Processing

    b. Transaction Processing

    c. Data cube

    d. Data mining

    70. _ _ _ _ _ _ _ _ _ cubes compute complex queries involving multiple dependent aggregates at multiple granularities.

    a. Multi feature

    b. Data

    c. Meta

    d. Solid

    71. Which of the following performs a linear transformation on the original data?

    a. Z-score normalization

    b. Normalization with decimal scaling

    c. Zero-standard deviation

    d. Min-max normalization
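The normalization methods named in questions 71 and 77 can be sketched in Python as follows. Min-max normalization maps v to (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A, a linear transformation of the original data; z-score (zero-mean) normalization maps v to (v - mean_A) / std_A; decimal scaling divides by 10^j for the smallest integer j that brings every |v'| below 1. Assumptions: the population standard deviation is used for the z-score, and the sample values and function names are made up for illustration.

```python
from statistics import mean, pstdev


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Zero-mean normalization: (v - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer with max |v'| < 1."""
    j = 0
    while max(abs(v) for v in values) / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


incomes = [12000, 73600, 98000]
print(min_max(incomes))              # 73600 maps to roughly 0.716
print(decimal_scaling([-986, 917]))  # [-0.986, 0.917]
```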

    72. Which of the following is the best method for handling missing values in data cleaning?

    a. Fill in the missing value manually

    b. Use the most probable value to fill in the missing value

    c. Use the attribute mean to fill in the missing value

    d. Use a global constant to fill in the missing value

    73. The minimum and maximum values in a given bin are identified as the

    a. Bin means

    b. Bin average

    c. Bin medians

    d. Bin boundaries
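Smoothing by bin means and by bin boundaries (questions 73 and 76) can be illustrated with equal-frequency (equal-depth) bins. This is a sketch: the price list, the bin depth of 3, and the rule that ties go to the lower boundary are assumptions made for the example, and the function names are hypothetical.

```python
from statistics import mean


def equal_frequency_bins(values, depth):
    """Sort the values and split them into consecutive bins of `depth` items."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's min and max (the bin boundaries).

    Ties go to the lower boundary.
    """
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)
print(smooth_by_means(bins))       # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```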

    74. Which of the following is data transformation operation?

    a. Normalization

    b. Regression

    c. Clustering

    d. Binning

    75. The correlation between attributes A and B can be measured by

    a.

    b.

    c.

    d.

    76. _ _ _ _ _ methods smooth a sorted data value by consulting its neighborhood, i.e., the values around it.

    a. Clustering

    b. Binning

    c. Regression

    d. Data reduction

    77. Z-score normalization is also called

    a. Min-max normalization

    b. Zero-standard deviation normalization

    c. Zero-mean normalization

    d. Normalization by decimal scaling

    78. _ _ _ _ _ _ is a random error or variance in a measured variable.

    a. Bin

    b. Cluster

    c. Noise

    d. Regression

    79. Consolidating the data into forms appropriate for mining is called

    a. Data reduction

    b. Data redundancy

    c. Data cleaning

    d. Data transformation

    80. Which of the following is a decision tree algorithm?

    a. C3.2

    b. ID3

    c. PP2

    d. DIM

    81. If the tuples in D are grouped into M mutually disjoint clusters, then a simple random sample of m clusters can be obtained, where m < M. Which of the following does this describe?

    a. Stratified sample

    b. SRS without replacement

    c. Cluster sample

    d. SRS with replacement
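The four sampling schemes listed in question 81 can be contrasted in a few lines of Python. This is an illustrative sketch: the "pages" used as clusters, the seed, and the function names are assumptions, and the standard-library `random.Random.sample` and `random.Random.choice` calls stand in for whatever sampling a real system would use.

```python
import random


def srswor(data, s, rng):
    """Simple random sample without replacement: no tuple is drawn twice."""
    return rng.sample(data, s)


def srswr(data, s, rng):
    """Simple random sample with replacement: a tuple may be drawn again."""
    return [rng.choice(data) for _ in range(s)]


def cluster_sample(clusters, m, rng):
    """Choose m of the M disjoint clusters by SRSWOR and keep all of their tuples."""
    chosen = rng.sample(clusters, m)
    return [t for cluster in chosen for t in cluster]


def stratified_sample(strata, per_stratum, rng):
    """Draw an SRSWOR of `per_stratum` tuples from every stratum."""
    return [t for stratum in strata for t in rng.sample(stratum, per_stratum)]


rng = random.Random(7)  # seeded only so that runs are repeatable
pages = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]  # M = 4 clusters
print(cluster_sample(pages, 2, rng))  # every tuple of 2 whole clusters
```

The key difference from SRS is that a cluster sample keeps or drops whole clusters, so the result always consists of complete groups.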

    82. Multidimensional index trees include

    a. A-trees

    b. T-trees

    c. P-trees

    d. R-trees

    83. In which of the following data reduction strategies are irrelevant, weakly relevant, or redundant attributes detected and removed?

    a. Data cube aggregation

    b. Dimension reduction

    c. Data compression

    d. Numerosity reduction

    84. In database systems, _ _ _ _ _ are primarily used for providing fast data access.

    a. Red-black trees

    b. Game trees

    c. Multidimensional index trees

    d. Splay trees

    85. If the mining task is classification, and the mining algorithm itself is used to determine the attribute subset, then this is called a _ _ _ _ _ _ approach.

    a. Filter

    b. Reduction

    c. Smoothing

    d. Wrapper

    86. The discrete wavelet transformation is closely related to the _ _ _ _ _ _ _ transform.

    a. Discrete Fourier

    b. Fourier

    c. Laplace

    d. Wavelet
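The discrete wavelet transform in question 86 can be illustrated with its simplest case, the Haar wavelet: each pair of values becomes an average and a detail coefficient, and dropping near-zero details is what makes the transform useful for data reduction. This sketch assumes the unnormalized averaging/differencing convention (orthonormal variants scale by 1/sqrt(2)) and an even-length input; the function names and data are illustrative.

```python
def haar_step(values):
    """One level of an (unnormalized) Haar transform on an even-length list.

    Each pair (a, b) becomes the average (a + b) / 2 and the detail (a - b) / 2,
    so the pair is recoverable as (avg + det, avg - det).
    """
    if len(values) % 2:
        raise ValueError("length must be even")
    pairs = list(zip(values[0::2], values[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def haar_inverse(averages, details):
    """Undo haar_step."""
    out = []
    for avg, det in zip(averages, details):
        out.extend([avg + det, avg - det])
    return out


data = [2, 2, 0, 2, 3, 5, 4, 4]
avg, det = haar_step(data)
print(avg)  # [2.0, 1.0, 4.0, 4.0]
print(det)  # [0.0, -1.0, -1.0, 0.0]
```

Applying `haar_step` again to the averages gives the next, coarser level of the transform.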

    87. Principal components analysis is also called the

    a. Karhunen-Loeve method

    b. Kinen-liva method

    c. Kruskal-learn method

    d. Kutni-lara method

    88. _ _ _ _ _ _ can be used as a data reduction technique since it allows a large data set to be represented by a much smaller random subset of the data.

    a. Clustering

    b. Regression

    c. Histograms

    d. Sampling

    89. Log-linear models are

    a. Parametric methods

    b. Discrete methods

    c. Non-parametric methods

    d. Non-discrete methods

    90. Which of the following is a method for generating concept hierarchies for categorical data?

    a. Specification of a portion of a hierarchy by implicit data grouping

    b. Specification of their partial ordering, but not of a set of attributes

    c. Specification of a set of attributes, but not of their partial order

    d. Specification of only a partial set of entities

    91. Which of the following methods uses class information?

    a. Histogram analysis

    b. Binning

    c. Cluster analysis

    d. Entropy-based discretization

    92. _ _ _ _ _ _ _ _ _ hierarchies for categorical attributes or dimensions typically involve a group of attributes.

    a. Discretization

    b. Semantic

    c. Index

    d. Concept

    93. Which of the following is based on the maximal asset values, which may lead to a highly biased hierarchy?

    a. Cluster analysis

    b. Segmentation

    c. Binning

    d. Histogram analysis

    94. The _ _ _ _ _ can be used to segment numeric data into relatively uniform, "natural" intervals.

    a. 1-2-3 rule

    b. 2-3-4 rule

    c. 3-4-5 rule

    d. 4-5-6 rule

    95. _ _ _ _ _ _ _ _ hierarchies for numeric attributes can be constructed automatically based on data distribution analysis.

    a. Concept

    b. Discretization

    c. Tree

    d. Index
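The 3-4-5 rule from question 94 is one way to build such numeric concept hierarchies automatically. The sketch below covers only its top-level split: the range is rounded outward at the most significant digit (msd), and the number of distinct msd values covered (3, 6, 7, 9 give 3 intervals, with 7 grouped 2-3-2; 2, 4, 8 give 4; 1, 5, 10 give 5) picks the partition. Assumptions: the msd is taken from the larger of |low| and |high|, only ranges spanning 1 to 10 msd units are handled, the rule's later steps (percentile-based low/high, adjusting outer intervals to the true min/max, recursion) are omitted, and the function name and inputs are hypothetical.

```python
import math


def three_four_five(low, high):
    """Top-level partition of the range low..high by the 3-4-5 rule.

    low and high are rounded outward at the most significant digit (msd).
    The number of distinct msd values the rounded range covers decides the
    split: 3, 6, 7 or 9 -> 3 intervals (7 uses widths 2-3-2),
    2, 4 or 8 -> 4 equal-width intervals, 1, 5 or 10 -> 5 equal-width intervals.
    """
    msd = 10 ** math.floor(math.log10(max(abs(low), abs(high))))
    lo = math.floor(low / msd) * msd
    hi = math.ceil(high / msd) * msd
    distinct = round((hi - lo) / msd)
    if distinct == 7:
        widths = [2, 3, 2]
    elif distinct in (3, 6, 9):
        widths = [distinct / 3] * 3
    elif distinct in (2, 4, 8):
        widths = [distinct / 4] * 4
    elif distinct in (1, 5, 10):
        widths = [distinct / 5] * 5
    else:
        raise ValueError(f"no 3-4-5 split for {distinct} msd values")
    edges = [lo]
    for w in widths:
        edges.append(edges[-1] + w * msd)
    return list(zip(edges[:-1], edges[1:]))


# Low/high of a hypothetical profit attribute
print(three_four_five(-159876, 1838761))
# [(-1000000, 0.0), (0.0, 1000000.0), (1000000.0, 2000000.0)]
```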

    96. _ _ _ _ _ _ _ techniques can be used to reduce the number of values for a given continuous attribute, by dividing the range of the attribute into intervals.

    a. Concept hierarchy

    b. Discretization

    c. Tree-based

    d. Index

    97. A _ _ _ _ _ _ _ _ _ algorithm can be applied to partition data into groups

    a. Binning

    b. Histogram

    c. Clustering

    d. Entropy-based

    98. An information-based measure called _ _ _ _ can be used to recursively partition the values of a numeric attribute A, resulting in a hierarchical discretization.

    a. Entropy

    b. Cluster

    c. Binning

    d. Segmentation

    99. The kinds of knowledge include

    a. Image analysis

    b. Query process

    c. Association

    d. Multimedia analysis

    100. Which of the following is a simplicity measure?

    a. Rule strength

    b. Rule quality

    c. Rule reliability

    d. Rule length

    101. _ _ _ _ _ _ hierarchies can be used to refine or enrich schema-defined hierarchies when the two types of hierarchies are combined.

    a. Schema

    b. Set-grouping

    c. Operation-derived

    d. Rule-based

    102. _ _ _ _ _ _ _ are those that contribute new information or increased performance to the given pattern set.

    a. Utility patterns

    b. Certainty patterns

    c. Novelty patterns

    d. Simplicity patterns

    103. Certainty factor is also known as

    a. Rule length

    b. Noise threshold

    c. Minable view

    d. Rule strength

    104. Which of the following primitives specifies the data mining functions to be performed?

    a. Task-relevant data

    b. The kind of knowledge to be mined

    c. Background knowledge

    d. Interestingness measures

    105. _ _ _ _ _ _ _ may be used to guide the mining process or, after discovery, to evaluate the discovered patterns.

    a. Task-relevant data

    b. The kind of knowledge to be mined

    c. Background knowledge

    d. Interestingness measures

    106. A _ _ _ _ _ hierarchy is a total or partial order among attributes in the database schema.

    a. Schema

    b. Set-grouping

    c. Operation-derived

    d. Rule-based

    107. Given a set of task-relevant data tuples, the confidence of "A ⇒ B" is defined as

    a.

    b.

    c.

    d.
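Assuming the usual definitions, support(A ⇒ B) is the fraction of task-relevant tuples that contain both A and B (asked in question 109), and confidence(A ⇒ B) is the number of tuples containing both A and B divided by the number containing A. A small Python sketch with made-up market baskets; the function names are illustrative.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, a, b):
    """confidence(A => B) = support(A together with B) / support(A)."""
    return support(transactions, set(a) | set(b)) / support(transactions, a)


baskets = [
    {"computer", "software", "printer"},
    {"computer", "software"},
    {"computer", "scanner"},
    {"software"},
]
print(support(baskets, {"computer", "software"}))       # 0.5
print(confidence(baskets, {"computer"}, {"software"}))  # about 0.667
```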

    108. _ _ _ _ _ hierarchies include the decoding of information-encoded strings, information extraction from complex data objects, and data clustering.

    a. Rule-based

    b. Operation-derived

    c. Schema

    d. Set grouping

    109. For association rules of the form "A ⇒ B", where A and B are sets of items, support is defined as

    a.

    b.

    c.

    d.

    110. Which of the following clauses is the task-irrelevant data primitive?

    a. In relevance to

    b. Use for warehouse

    c. Analysis

    d. Order by

    111. Mining with the use of _ _ _ _ allows additional flexibility for ad hoc rule mining.

    a. Image patterns

    b. Data patterns

    c. Information patterns

    d. Meta patterns

    112. Which of the following clauses lists the attributes or dimensions for exploration?

    a. Order by

    b. Group by

    c. Having

    d. In relevance to

    113. Which of the following clauses uses the meta pattern?

    a. Analyze

    b. In relevance to

    c. Matching

    d. Use data warehouse

    114. Which of the following clauses is used for discrimination?

    a. Mine characteristics

    b. Mine discriminant

    c. Mine association

    d. Mine comparison

    115. DMQL stands for

    a. Data Modeling Queue Level

    b. Design Modeling Query Language

    c. Data Mining Query Language

    d. Data & Metadata Query Language

    116. The _ _ _ _ _ clause, when used for characterization, specifies aggregate measures, such as count, sum or count.

    a. Use database

    b. Analyze

    c. Matching

    d. Use hierarchy

    117. Which of the following clauses specifies the condition by which groups of data are considered relevant?

    a. Having

    b. Group by

    c. Order by

    d. Analyze

    118. The _ _ _ _ _ _ _ _ statement is used to specify the kind of knowledge to be mined.

    a. Knowledge-mine-specification

    b. Mine-knowledge-specification

    c. Knowledge-specification-mine

    d. Specification-mine-knowledge

    119. An example of interestingness measures and threshold values is

    a. Without support threshold=

    b. With confidence threshold=

    c. Without Confidence threshold=

    d. With support threshold=

    120. Which of the following issues does CRISP-DM address?

    a. Mapping from data mining problems to business issues

    b. Capturing and misunderstanding the data

    c. Disintegrating data mining results within the business context

    d. Deploying and maintaining data mining results

    121. An example of a set-grouping hierarchy is

    a. Define hierarchy age-hierarchy for age as customer on level1: {young, middle-aged, senior} < level0: all; level2: {20, ..., 39} < level1: young; level2: {40, ..., 59} < level1: middle-aged; level2: {60, ..., 89} < level1: senior

    b. Define hierarchy age-hierarchy as age for customer on level1: {young, middle-aged, senior} < level0: all; level2: {20, ..., 39} < level1: young; level2: {40, ..., 59} < level1: middle-aged; level2: {60, ..., 89} < level1: senior

    c. Define hierarchy age-hierarchy for age on customer as level1: {young, middle-aged, senior} < level0: all; level2: {20, ..., 39} < level1: young; level2: {40, ..., 59} < level1: middle-aged; level2: {60, ..., 89} < level1: senior

    d. Define hierarchy age-hierarchy on age for customer as level1: {young, middle-aged, senior} < level0: all; level2: {20, ..., 39} < level1: young; level2: {40, ..., 59} < level1: middle-aged; level2: {60, ..., 89} < level1: senior

    122. Which of the following data mining languages uses SQL-like syntax and serves as rule-generation queries for mining association rules?

    a. MINE RULE operator

    b. RULE MINE operator

    c. DATA MINE operator

    d. DWH operator

    123. Which of the following is not a data mining language?

    a. DMQL

    b. MSQL

    c. PSQL

    d. OLE DB for

    124. The syntax of a schema hierarchy is

    a. Define hierarchy location-hierarchy on address as [street, city, country]

    b. Define hierarchy location-hierarchy as address on [street, city, country]

    c. Define hierarchy location-hierarchy from address to [street, city, country]

    d. Define hierarchy location-hierarchy for address all [street, city, country]

    125. The DMQL statement syntax is

    a. display as <result_form>

    b. display <result_form>

    c. display on <result_form>

    d. display for <result_form>

    126. Which of the following is a data mining query language?

    a. PSQL

    b. QSQL

    c. MSQL

    d. RSQL

    127. _ _ _ _ _ is used for efficient implementations of a few essential data mining primitives.

    a. No coupling

    b. Loose coupling

    c. Tight coupling

    d. Semi tight coupling

    128. _ _ _ _ _ _ _ is a compromise between loose and tight coupling.

    a. No coupling

    b. Loose coupling

    c. Tight coupling

    d. Semi tight coupling

    129. Which of the following coupling schemes is used to fetch data from a data repository managed by database systems?

    a. No coupling

    b. Loose coupling

    c. Tight coupling

    d. Semi tight coupling

    130. A well-designed data mining system should offer _ _ _ _ _ _ _ with a data warehouse system.

    a. Semi tight coupling

    b. No coupling

    c. Loose coupling

    d. Normal coupling

    131. With which of the following is it difficult to achieve high scalability and good performance with large data sets?

    a. No coupling

    b. Tight coupling

    c. Semi tight coupling

    d. Loose coupling

    132. _ _ _ _ _ _ _ _ means that a data mining system will not utilize any function of a data warehouse system.

    a. Loose coupling

    b. Semi tight coupling

    c. Loose coupling

    d. No coupling

    133. _ _ _ _ _ _ _ _ means that a data mining system is smoothly integrated into the database system.

    a. No coupling

    b. Loose coupling

    c. Tight coupling

    d. Semi tight coupling

    134. Which of the following provides a concise and succinct summarization of the given collection of data?

    a. Comparison

    b. Characterization

    c. Summarization

    d. Aggregation

    135. _ _ _ _ _ _ _ _ data mining describes the data set in a concise and summarative manner and presents interesting general properties of the data.

    a. Descriptive

    b. Predictive

    c. Active

    d. Constructive

    136. _ _ _ _ _ _ data mining analyzes the data in order to construct one or a set of models and attempts to predict the behavior of new data sets.

    a. Descriptive

    b. Predictive

    c. Active

    d. Constructive

    137. Attribute removal is based on the following rule: if there is a large set of distinct values for an attribute of the initial working relation, but

    a. There is a generalization operator on the attribute

    b. There is no generalization operand on the attribute

    c. There is no generalization operator on the attribute

    d. There is no aggregation operator on the attribute

    138. On-line analytical processing in data warehouses is a purely _ _ _ _ _-controlled process.

    a. Machine

    b. Database

    c. Developer

    d. User

    139. Which of the following approaches is used to control the generalization process?

    a. Generalized relation threshold control

    b. Generalized class threshold control

    c. Generalized dimension threshold control

    d. Generalized query threshold control

    140. Many current OLAP systems confine dimensions to _ _ _ _ _ _ _ _ _ _ data.

    a. Numeric

    b. Non-numeric

    c. Meta

    d. Summarized

    141. _ _ _ _ _ _ _ is a process that abstracts a large set of task-relevant data in a database from a relatively low conceptual level to higher conceptual levels.

    a. Data realization

    b. Data characterization

    c. Data summarization

    d. Data generalization

    142. The _ _ _ _ _ _ approach can be considered a data warehouse-based, pre-computation-oriented, materialized-view approach.

    a. Object-oriented induction

    b. Data cube

    c. Attribute-oriented induction

    d. Data square

    143. Which of the following approaches is a relational database query-oriented, generalization-based, on-line data analysis technique?

    a. Attribute-oriented induction

    b. Object-oriented approach

    c. Data cube

    d. Data square

    144. _ _ _ _ _ _ _ _ performs off-line aggregation before an OLAP or data mining query is submitted for processing.

    a. Object-oriented induction

    b. Data cube

    c. Attribute-oriented induction

    d. Data square

    145. The range of t-weight is

    a.

    b.

    c.

    d.
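Assuming the common definition t-weight(q_a) = count(q_a) / Σ_{i=1}^{n} count(q_i), where q_1, ..., q_n are the generalized tuples of the target class, each t-weight lies between 0 and 1 and all of them sum to 1. A short Python sketch with hypothetical tuple names and counts:

```python
def t_weights(counts):
    """Return count(q_a) / (sum of all counts) for every generalized tuple q_a.

    `counts` maps each generalized tuple of the target class to the number of
    original tuples it covers.
    """
    total = sum(counts.values())
    return {q: c / total for q, c in counts.items()}


# Hypothetical generalized relation with three tuples
counts = {"q1": 30, "q2": 50, "q3": 20}
weights = t_weights(counts)
print(weights)  # {'q1': 0.3, 'q2': 0.5, 'q3': 0.2}
```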

    146. How can the t-weight and interestingness measures in general be used by the data mining system to display only the concept descriptions that it objectively evaluates as interesting?

    a. By threshold

    b. By generalization

    c. By comparison

    d. By characterization

    147. The data cube implementation of attribute-oriented induction can be performed by

    a. Using defined data cube

    b. Using a predefined data cube

    c. Using a generalized data cube

    d. Using a quantified data cube

    148. A _ _ _ _ _ can be represented by a 3-D data cube.

    a. Cross-tab

    b. Bar chart

    c. Pie chart

    d. Flow chart

    149. Step one of the attribute-oriented induction algorithm is essentially a relational query to collect the task-relevant data into the _ _ _ _ _ _ _ _ _ _ _ .

    a. Prime relation

    b. Secondary relation

    c. Working relation

    d. Analyzing relation

    150. Which of the following relations collects the statistics of the attribute-oriented induction algorithm?

    a. Working relation

    b. Prime relation

    c. Secondary relation

    d. Analyzing relation

    151. Descriptions can also be visualized in the form of _ _ _ _ _ _ _ _ .

    a. Cross-relations

    b. Cross-checks

    c. Cross-boards

    d. Cross-tabs

    152. Step three of attribute-oriented induction derives the _ _ _ _ _ _ _ relation.

    a. Working

    b. Prime

    c. Secondary

    d. Analysing

    153. The _ _ _ _ _ _ is an interestingness measure that describes the typicality of each disjunct in the rule, or of each tuple in the corresponding generalized relation.

    a. Quantitative rule

    b. Quantitative characteristic rule

    c. c-weight

    d. t-weight

    154. The information gain is obtained by

    a. Expected information + entropy

    b. Entropy - Expected information

    c. Expected information entropy

    d. Entropy Expected information

    155. The expected information needed to classify a given sample is

    a. $I(s_1, s_2, \ldots, s_m) = \sum_{i=1}^{n} (s_i/s) \log_2 (s_i/s)$

    b. $I(s_1, s_2, \ldots, s_m) = (s_i/s) \log_2 (s_i/s)$

    c. $I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{n} (s_i/s) \log_2 (s_i/s)$

    d. $I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{n} (s_i/s) \log_2 (s_i/s)$
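Assuming the standard ID3-style definitions, I(s_1, ..., s_m) = -Σ_{i=1}^{m} (s_i/s) log_2(s_i/s), E(A) weights the expected information of each subset produced by attribute A by that subset's share of the samples, and Gain(A) = I(s_1, ..., s_m) - E(A) (the quantity in question 154). A minimal Python sketch with illustrative class counts; the function names are hypothetical.

```python
from math import log2


def expected_info(*counts):
    """I(s1, ..., sm) = -sum(p_i * log2(p_i)) with p_i = s_i / s; 0 * log2(0) counts as 0."""
    s = sum(counts)
    return -sum((c / s) * log2(c / s) for c in counts if c > 0)


def entropy_of_split(partitions):
    """E(A): expected information after splitting on A, weighted by subset size."""
    s = sum(sum(p) for p in partitions)
    return sum(sum(p) / s * expected_info(*p) for p in partitions)


def info_gain(class_counts, partitions):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    return expected_info(*class_counts) - entropy_of_split(partitions)


# Illustrative counts: 9 "yes" / 5 "no" samples, split by an attribute into
# three subsets with (yes, no) counts (2, 3), (4, 0) and (3, 2).
print(round(expected_info(9, 5), 3))                          # 0.94
print(round(info_gain((9, 5), [(2, 3), (4, 0), (3, 2)]), 3))  # 0.247
```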

    156. Class comparison is also called

    a. Composition

    b. Aggregation

    c. Discrimination

    d. Characterization

    157. _ _ _ _ _ _ can be used to perform some preliminary relevance analysis on the data by removing or generalizing attributes having a very large number of distinct values.

    a. Object-oriented induction

    b. Attribute-oriented induction

    c. Batch-oriented induction

    d. Class-oriented induction

    158. Class characterization that includes the analysis of attribute/dimension relevance is called _ _ _ _ _ .

    a. Analytical comparison

    b. Analytical measurement

    c. Analytical characterization

    d. Analytical difference

    159. _ _ _ _ _ _ _ irrelevant and weakly relevant attributes using the selected relevance analysis measure.

    a. Insert

    b. Update

    c. Modify

    d. Remove

    160. The _ _ _ _ _ class is the class to be characterized.

    a. Base

    b. Target

    c. Contrasting

    d. Sub

    161. The _ _ _ _ _ _ class is the set of comparable data that are not in the target class.

    a. base

    b. target

    c. contrasting

    d. sub

    162. Generalization is performed on the _ _ _ _ _ _ _ _ to the level controlled by a user- or expert-specified dimension threshold, which results in a _ _ _ _ _ _ _ .

    a. Target class, Prime target class relation

    b. Contrasting class, Prime contrasting class relation

    c. Target class, Secondary target class relation

    d. Contrasting class, Secondary contrasting class relation

    163. Let be a generalized tuple, and be the target class; the d-weight is defined as

    a. d-weight = condition( ) / count( )

    b. d-weight = condition( ) / $\sum_{i=1}^{m}$ count( )

    c. d-weight = condition( ) / count( )

    d. d-weight = condition( ) / count( )
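The symbols in the d-weight formula of Question 163 were lost in extraction. The sketch below assumes the common textbook definition: the d-weight of a generalized tuple q_a for class C_j is count(q_a ∈ C_j) divided by the sum of count(q_a ∈ C_i) over all m classes, meaning the target class plus the contrasting classes. Under that definition the d-weight always lies between 0 and 1. The class names and counts are hypothetical.

```python
def d_weight(counts_by_class, target):
    """d-weight of one generalized tuple q_a for the class `target`.

    counts_by_class -- {class_name: count of q_a in that class}
    d-weight = count(q_a in target) / sum of count(q_a in C_i) over all classes
    """
    total = sum(counts_by_class.values())
    return counts_by_class[target] / total if total else 0.0


# Hypothetical counts of one generalized tuple in a target class
# (graduate students) and a contrasting class (undergraduates).
counts = {"graduate": 90, "undergraduate": 210}
print(d_weight(counts, "graduate"))       # 0.3
print(d_weight(counts, "undergraduate"))  # 0.7
```

Here the tuple comes mostly from the contrasting class, so its d-weight for the target class is low.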

    164. Can class comparison mining be implemented efficiently using data cube techniques?

    a. yes

    b. no

    c. limited

    d. difficult

    165. Class discrimination is also called

    a. class comparison

    b. class hierarchy

    c. class aggregation

    d. class concept

    166. The set of relevant data in the database is collected by query processing and is partitioned into a target class and one or a set of _ _ _ _ _ class(es).

    a. discrimination

    b. contrasting

    c. comparable

    d. target

    167. The range for the d-weight is

    a.

    b.

    c.

    d.

    168. A _ _ _ _ _ _ d-weight in the target class indicates that the concept represented by the generalized tuple is primarily derived from the target class.

    a. Low

    b. High

    c. Average

    d. Middle

    169. A _ _ _ _ _ _ d-weight implies that the concept is primarily derived from the contrasting class.

    a. Low

    b. High

    c. Average

    d. Middle

    170. A quantitative discriminant rule for the target class of a given comparison description is written in the form

    a. ∀x, target_class(x) ⇐ compare(x) [d: d-weight]

    b. ∀x, contrasting_class(x) ⇐ condition(x) [d: d-weight]

    c. ∀x, contrasting_class(x) ⇐ compare(x) [d: d-weight]

    d. ∀x, target_class(x) ⇐ condition(x) [d: d-weight]

    171. In d-weight, d stands for

    a. divide

    b. dead

    c. discrimination

    d. degree

    172. The interquartile range is defined as

    a. First quartile - Third quartile

    b. First quartile + Third quartile

    c. Third quartile + First quartile

    d. Third quartile - First quartile
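The sketch below is a quick check of the definition in Question 172. It computes the quartiles with Python's standard `statistics.quantiles` and returns Q3 - Q1. Textbooks and tools use different quartile conventions, so another method may give slightly different quartiles for the same data. The data are made up.

```python
import statistics


def interquartile_range(values):
    """IQR = Q3 - Q1.

    Quartiles come from statistics.quantiles with its default 'exclusive'
    method; other quartile conventions can give slightly different values.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(statistics.quantiles(data, n=4))  # [2.25, 4.5, 6.75]
print(interquartile_range(data))        # 4.5
```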

    173. One common rule of thumb for identifying suspected outliers is to single out values falling at least _ _ _ _ _ _ _ above the third quartile or below the first quartile.

    a.

    b.

    c.

    d.

    174. The most commonly used percentiles other than the median are _ _ _ _ _ _

    a. Outliers

    b. Boxplots

    c. Quartiles

    d. Modes

    175. A popularly used visual representation of a distribution is the _ _ _ _ _ _ __

    a. Boxplot

    b. Outlier

    c. Quartile

    d. Histogram
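The boxplot in Question 175 is built from the five-number summary: minimum, Q1, median, Q3 and maximum. The outlier rule of thumb in Question 173 is commonly stated with a threshold of 1.5 × IQR. The sketch below assumes that 1.5 multiplier (exposed as the parameter `k`) and the default quartile method of `statistics.quantiles`. It returns the summary and the suspected outliers rather than drawing a plot.

```python
import statistics


def boxplot_summary(values, k=1.5):
    """Five-number summary plus suspected outliers under the k * IQR rule.

    Returns (minimum, Q1, median, Q3, maximum, outliers), where outliers are
    values more than k * IQR below Q1 or above Q3.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return min(values), q1, median, q3, max(values), outliers


# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 50]
print(boxplot_summary(data))  # (1, 2.5, 5.0, 7.5, 50, [50])
```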

    176. Dispersion is also called

    a. Mean

    b. Variance

    c. Median

    d. Mode

    177. Which of the following is a measure of central tendency?

    a. Outliers

    b. Variance

    c. Quartiles

    d. Mode

    178. Which of the following is a measure of data dispersion?

    a. Mean

    b. Variance

    c. Mode

    d. Median
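Questions 176 and 178 use variance as the measure of dispersion: the average squared deviation from the mean. The standard deviation is its square root. The sketch below computes the population variance by hand on made-up data. For comparison it also computes the sample variance with `statistics.variance`, which divides by n - 1 instead of n.

```python
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
# Population variance: average squared deviation from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = variance ** 0.5

# Sample variance divides by n - 1 instead of n.
sample_variance = statistics.variance(data)

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```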

    179. The average of the largest and smallest values in a data set is called the

    a. Median

    b. Mean

    c. Midrange

    d. Mode

    180. The _ _ _ _ _ _ _ _ for a set of data is the value that occurs most frequently in the set.

    a. Median

    b. Mean

    c. Midrange

    d. Mode
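Questions 177 and 179 to 181 name several measures of central tendency, and the sketch below computes them on one small, made-up data set. It uses Python's `statistics` module for the mean, median and mode. That module has no midrange function, so the midrange is computed directly as the average of the largest and smallest values.

```python
import statistics

# Hypothetical data set.
data = [1, 3, 3, 3, 5, 6, 14]

mean = statistics.mean(data)            # arithmetic average: 35 / 7
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequent value
midrange = (max(data) + min(data)) / 2  # average of largest and smallest

print(mean, median, mode, midrange)
```

The single large value 14 pulls the mean up to 5 and the midrange up to 7.5. The median and mode both stay at 3.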

    181. Which of the following is not a measure of central tendency?

    a. Variance

    b. Mean

    c. Median

    d. Mode

    182. A _ _ _ _ _ _ _ _ is one of the most effective graphical methods for determining whether there is a relationship, pattern, or trend between two quantitative variables.

    a. q-q plot

    b. scatter plot

    c. quantile plot

    d. q-q-q plot

    183. A _ _ _ _ _ _ _ _ is another important exploratory graphic aid that adds a smooth curve to a scatter plot in order to provide better perception of the pattern of dependence.

    a. Loess curve

    b. Scatter curve

    c. Bar chart

    d. Quantile plot

    184. Histograms are also called _ _ _ _ _ _ _ _ _ histograms.

    a. frequency

    b. variance

    c. quartile

    d. outlier

    185. The word loess is short for

    a. Load compression

    b. Local compression

    c. Load regression

    d. Local regression
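A loess curve (Questions 183 and 185) comes from local regression. Each point on the curve is obtained by fitting a weighted regression to the data nearest that x value. The sketch below implements one simplified variant: a local straight-line fit with tricube weights over the nearest fraction `frac` of the points. Production loess implementations add further options such as local quadratic fits and robustness iterations, and the sketch omits all of these. The function name and the sample data are hypothetical.

```python
import math


def loess(xs, ys, frac=0.5):
    """Simplified loess smoother: weighted local linear fit at each x.

    For each x0 in xs, the k = ceil(frac * n) nearest points (at least 3) are
    weighted with the tricube function w = (1 - (d / d_max) ** 3) ** 3, a
    weighted least-squares line is fitted to them, and its value at x0 is kept.
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))
    smoothed = []
    for x0 in xs:
        # indices of the k nearest neighbours of x0
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        d_max = max(abs(xs[i] - x0) for i in nearest)
        if d_max == 0:
            w = [1.0] * len(nearest)  # all neighbours share x0: equal weights
        else:
            w = [(1 - (abs(xs[i] - x0) / d_max) ** 3) ** 3 for i in nearest]
        # weighted means, then weighted slope of the local line
        sw = sum(w)
        mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
        my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
        sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
        sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, nearest))
        slope = sxy / sxx if sxx else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Hypothetical noisy, roughly linear data.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]
print([round(v, 2) for v in loess(xs, ys, frac=0.5)])
```

Plotting the smoothed values against `xs` on top of the scatter plot gives the loess curve. On perfectly linear data the local fits reproduce the line exactly.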

    186. A _ _ _ _ _ _ _ _ _ consists of a set of rectangles that reflect the counts of

    the classes present in the given data.

    a. Quartile plot

    b. q-q plot

    c. Histogram

    d. Loess curves
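In the histograms of Questions 184, 186 and 187, each rectangle's height is the count of values that fall in one bucket. The sketch below shows equal-width bucketing. The `prices` data and the bucket boundaries are hypothetical, chosen for illustration.

```python
def histogram_counts(values, bins, low, high):
    """Equal-width histogram: count how many values fall in each bucket.

    [low, high] is split into `bins` buckets of equal width; each bucket is
    half-open except the last, which also includes `high`. Values outside
    the range are ignored.
    """
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v <= high:
            index = min(int((v - low) / width), bins - 1)
            counts[index] += 1
    return counts


prices = [40, 43, 47, 52, 55, 58, 60, 61, 65, 70, 74, 79, 80]
print(histogram_counts(prices, bins=4, low=40, high=80))  # [3, 3, 3, 4]
```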

    187. A _ _ _ _ _ _ is a simple and effective way to have a first look at a univariate data distribution.

    a. q-q plot

    b. scatter plot

    c. histogram

    d. quantile plot

    188. A _ _ _ _ _ _ _ _ _ graphs the quantiles of one univariate distribution against the corresponding quantiles of another.

    a. quantile plot

    b. q-q-q plot

    c. q-q plot

    d. Scatter plot
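Question 188 contrasts two plots, and the sketch below generates the points for each. A quantile plot pairs each sorted value x_i with f_i = (i - 0.5)/n, one common convention for the fraction of data at or below x_i. A q-q plot pairs the quantiles of one sample with the corresponding quantiles of another. For simplicity the q-q part assumes both samples have the same size, so the i-th smallest values can be paired directly. Samples of different sizes would need interpolated quantiles, and the sketch does not handle that case. The sample names and values are hypothetical.

```python
def quantile_points(values):
    """(f_i, x_i) pairs for a quantile plot, using f_i = (i - 0.5) / n."""
    xs = sorted(values)
    n = len(xs)
    return [((i - 0.5) / n, x) for i, x in enumerate(xs, start=1)]


def qq_points(a, b):
    """(quantile of a, quantile of b) pairs for two samples of equal size."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles samples of equal size")
    return list(zip(sorted(a), sorted(b)))


# Hypothetical unit prices at two branches.
branch1 = [40, 43, 47, 52, 60]
branch2 = [42, 45, 50, 58, 66]
print(quantile_points(branch1))
print(qq_points(branch1, branch2))
```

If the two distributions were identical, the q-q points would fall on the line y = x. Here every point lies above that line, indicating that branch2's prices are higher at every quantile.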