1.1. What methods and tools does the predictive analytics use? fileLogarithmic Regression and Multiple Linear Regression. ... Rapid Insight Veera, DMWay, Lavastorm Analytics Engine,

Tasks 1.1 and 1.2. Detailed answers are given in English.

The following criteria are taken into account:

1. Completeness and correctness of the answer

2. Clarity of presentation

3. Correct use of terminology

The grader assigns an overall score based on the rules above.

Sample answers.

1.1. What methods and tools does the predictive analytics use?

Predictive analytics is the branch of advanced analytics used to make predictions about unknown future events. It uses techniques from data mining, statistics, modeling, machine learning, and artificial intelligence to analyze current data and historical facts in order to make predictions about future or otherwise unknown events. Predictive analytics allows organizations to become proactive and forward-looking, anticipating outcomes and behaviors based on data rather than on hunches or assumptions.

Generally, the term predictive analytics is used to mean predictive modeling, "scoring" data with predictive models, and forecasting. However, statistical analysis and descriptive modeling are also used.

Statistical analysis is the study of the collection, organization, analysis, interpretation and presentation of data. It makes it possible to validate assumptions and to test hypotheses using standard statistical models. One goal of statistical analysis is to identify trends. A retail business, for example, might use statistical analysis to find patterns in unstructured and semi-structured customer data that can be used to create a more positive customer experience and increase sales.

Some of the statistical tests and procedures used in predictive analytics are:

Analysis of variance (ANOVA): ANOVA models are used to analyze the differences between group means and the variation among and between the groups.

Factor analysis: Describes the variability among observed, correlated variables in terms of unobserved variables called factors.

Regression analysis: Estimates the relationships among variables.

Time series analysis: Analyzes a sequence of data points measured at successive points in time.

k-nearest neighbor algorithm: A non-parametric method for classification and regression that predicts an object's value or class membership based on the k closest training examples in the feature space.

Logistic regression: A technique in which unknown values of a discrete variable are predicted based on known values of one or more continuous and/or discrete variables.

Naive Bayes classifier: A simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions.
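To illustrate the k-nearest neighbor idea from the list above, here is a minimal Python sketch. The function name `knn_classify`, the choice of Euclidean distance with a majority vote, and the toy data are assumptions made for this example; they are not part of the original answer.

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest
    training examples (Euclidean distance in feature space)."""
    # `train` is a list of (feature_vector, label) pairs.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two features per customer, label 1 = target event.
train = [
    ((1.0, 1.0), 0), ((1.2, 0.8), 0), ((0.9, 1.1), 0),
    ((4.0, 4.2), 1), ((4.1, 3.9), 1), ((3.8, 4.0), 1),
]

print(knn_classify(train, (1.1, 1.0)))  # query close to the label-0 cluster
print(knn_classify(train, (4.0, 4.0)))  # query close to the label-1 cluster
```

The method is called non-parametric because no model is fitted in advance: the training examples themselves are consulted at prediction time.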

Predictive modeling is the process of creating, testing, validating and evaluating a model to best predict the probability of an outcome. Predictive modeling tools can build such models automatically, and many offer multi-model evaluation to help choose the best solution. Each model is made up of a number of predictors, which are variables that are likely to influence future results.

There are two types of predictive models. Classification models predict class membership. For instance, you may try to classify whether someone is likely to leave, whether they will respond to a solicitation, whether they are a good or bad credit risk, and so on. Usually, the model results are in the form of 0 or 1, with 1 being the event you are targeting. Regression models predict a number, for example how much revenue a customer will generate over the next year or the number of months before a component will fail on a machine.

Three of the most widely used predictive modeling techniques are decision trees, regression and neural networks.

Decision tree algorithms classify and predict one or more discrete variables based on other variables in the dataset. Example algorithms are C4.5 and CNR Tree.

Regression algorithms predict continuous variables based on other variables in the dataset. Example algorithms are Linear Regression, Exponential Regression, Geometric Regression, Logarithmic Regression and Multiple Linear Regression.

Neural network algorithms perform forecasting, classification, and statistical pattern recognition. Example algorithms are NNet Neural Network and MONMLP Neural Network.
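As a rough illustration of two of the regression algorithms named above, the following Python sketch fits a simple linear regression by ordinary least squares and a logarithmic regression by fitting the same linear model on ln(x). The function names `fit_linear` and `fit_logarithmic` and the sample data are hypothetical and chosen only for this example.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def fit_logarithmic(xs, ys):
    """Logarithmic regression y = a + b*ln(x), fitted as a linear model in ln(x)."""
    return fit_linear([math.log(x) for x in xs], ys)

# Hypothetical data lying exactly on y = 1 + 2x.
a, b = fit_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)

# Hypothetical data lying exactly on y = 2 + 3*ln(x).
a_log, b_log = fit_logarithmic([1.0, math.e, math.e ** 2], [2.0, 5.0, 8.0])
print(a_log, b_log)
```

Exponential and geometric (power) regression can be reduced to the same linear fit by taking the logarithm of y, or of both x and y.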

Descriptive modeling is a mathematical process that describes real-world events and the relationships between factors responsible for them. In descriptive modeling, customer groups are clustered according to demographics, purchasing behavior, expressed interests and other descriptive factors. Statistics can identify where the customer groups share similarities and where they differ.

The main aspects of descriptive modeling include:

Customer segmentation: Partitions a customer base into groups with various impacts on marketing and service.

Value-based segmentation: Identifies and quantifies the value of a customer to the organization.

Behavior-based segmentation: Analyzes customer product usage and purchasing patterns.

Needs-based segmentation: Identifies ways to capitalize on motives that drive customer behavior.

Top predictive analytics software includes RapidMiner Studio, KNIME Analytics Platform, IBM Predictive Analytics, SAP Predictive Analytics, Dataiku DSS, SAS Predictive Analytics, Oracle Data Mining (ODM), Angoss Predictive Analytics, Microsoft R, Minitab, TIBCO Spotfire, AdvancedMiner, Microsoft Azure Machine Learning, STATISTICA, Anaconda, Alteryx Analytics, ABM, Google Cloud Prediction API, DataRobot, HP Haven Predictive Analytics, Analytic Solver, H2O.ai, Actian Analytics Platform, GMDH Shell, GoodData, Alpine Chorus, Portrait Predictive Analytics, FICO Model Central, GraphLab Create, Viscovery Software Suite, Information Builders WebFOCUS Platform, MATLAB, Predixion Insight, Mathematica, Rapid Insight Veera, DMWay, Lavastorm Analytics Engine, TIMi Suite, CMSR Data Miner Suite, Vanguard Business Analytics Suite, DataRPM, Feature Labs, Salford Systems SPM, Skytree, QIWare, Grapheur, Emcien, and RapidMiner Server.

1.2. Everything-as-a-Service: what do I need to implement this idea?

Everything-as-a-Service (EaaS, XaaS, *aaS) is a term for the wide variety of services and applications that users can access on demand over the Internet rather than through on-premises means. It is a subset of cloud computing. The term "as a service" has been associated with many core components of cloud computing, including communication, infrastructure, data and platforms.

Key characteristics. Offerings tagged with the "as a service" suffix have a number of common attributes, including:

Low barriers to entry are common, with services typically being available to or targeting consumers and small businesses.

Little or no capital expenditure, as infrastructure is owned by the provider.

Massive scalability is also common, though this is not an absolute requirement and many of the offerings have yet to achieve large scale.

Multitenancy enables resources (and costs) to be shared among many users.

Device independence enables users to access systems regardless of what device they are using (e.g. PC, mobile, etc.).

Location independence allows users remote access to systems.

Benefits of EaaS. When speaking about EaaS as it relates to cloud computing, there are a number of benefits:

Lower costs.

Flexibility. This also includes easier scalability.

Maintenance is done by the provider. This frees up the customer’s resources and allows them to focus on what they do best.

Easy access to new technologies (which are being developed rapidly).

New business services are able to debut quickly (think weeks instead of months).

Allows for quick responses to market developments.

EaaS gives users and companies the flexibility to customize their computing environments and craft the experiences they desire, all on demand. EaaS depends on a strong cloud services platform and reliable Internet connectivity to gain traction and acceptance among both individuals and enterprises.

EaaS has frequently been used as an umbrella term to encompass SaaS (Software-as-a-Service), PaaS (Platform-as-a-Service), and IaaS (Infrastructure-as-a-Service).

Software as a Service (SaaS): Enables consumers to use the provider’s applications running on a cloud infrastructure.

Platform as a Service (PaaS): Deploys consumer-created or acquired applications onto the cloud infrastructure.

Infrastructure as a Service (IaaS): Provisions processing, storage and other fundamental computing resources to deploy and run operating systems and applications.

Other examples of EaaS are storage-as-a-service, desktop-as-a-service, and disaster-recovery-as-a-service.

EaaS now extends to anything as a service, not just cloud computing: marketing-as-a-service, healthcare-as-a-service, transportation-as-a-service, grocery-as-a-service, accommodation-as-a-service and many other examples.

There are a number of reasons why subscription models appeal to consumers. Consumers and businesses are embracing the many benefits of the subscription model, and businesses should be looking at how to incorporate EaaS into their own roadmaps for future success.

Adopting EaaS is a radical change: a modernization of the company's services toward the service model. CIOs and business leaders can begin their EaaS journeys by answering the following questions:

What can everything-as-a-service do for your business? Viewing business models, processes, and strategies through an EaaS lens may illuminate entirely new opportunities to grow revenue and drive efficiency. Bringing these opportunities to fruition may require that you overhaul some legacy systems and reimagine your operations and the way you engage customers. The good news is that there are core modernization techniques that can help you extract more value from legacy assets while laying the groundwork for a service-oriented future, from replatforming to remediating to revitalizing.

How can EaaS transform the way your employees work? Think about how your employees currently do their jobs. What departmental or task-specific systems do they rely upon? What processes do they follow, and how does your operational model help or hinder them as they work? Then, imagine those same systems, processes, and operating models as services that are no longer siloed by task or department. Instead, they are horizontal, extending across organizational boundaries for use by internal and external customers, business partners, and suppliers, among others. What opportunities can you identify?

What new products and service offerings can EaaS enable? EaaS is as much a mind-set as it is a strategic and operational vision. What products do you offer that could manifest as services? What operational verticals could take on new life as horizontals?

Task 2.1.

Answer grading rules:

1. The answer contains a database schema in a commonly accepted notation, in accordance with the task, satisfying third normal form, with relationship types and directions indicated (0-4 points).

2. The answer contains a detailed description of the tables, explaining the field names and specifying data types and properties and the key fields (0-3 points).

3. The answer contains correct queries using SQL statements that retrieve the information required by the task (0-8 points). The maximum score for a single query is 4 points.

The grader assigns an overall score based on the rules above.

Task 2.2. Solution.

Let $z_t$ denote the random variable equal to the number of particles at time $t$. The number of offspring of a single particle takes only two values, with probabilities 0.1 and 0.9.

The $z_0 = 100$ particles form, in the terminology of the theory of birth-and-death processes, the zeroth generation. Each particle produces 2 new particles with probability 0.1. The particles of each generation reproduce independently of one another.

The expected number of particles at time $t$ per initial particle is $\mu^t$, where $\mu = 2 \cdot 0.1 = 0.2$ is the expected number of direct descendants of one particle.

In that case, the probability that the particles survive up to and including time $t = 10$ is:

$$p(z_{10} > 0 \mid z_0 = 100) = 1 - (1 - \mu^t)^{z_0}$$

Answer: $1.024 \cdot 10^{-5}$.
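The arithmetic behind the answer to Task 2.2 can be checked with a short Python sketch. It only evaluates the formula as stated in the solution, with $\mu = 2 \cdot 0.1 = 0.2$, $t = 10$ and $z_0 = 100$; the function name `survival_probability` is hypothetical, and the sketch does not independently verify the formula itself.

```python
def survival_probability(mu, t, z0):
    """Evaluate 1 - (1 - mu**t)**z0, the expression used in the solution."""
    return 1 - (1 - mu ** t) ** z0

mu = 2 * 0.1  # expected number of direct descendants of one particle
p = survival_probability(mu, t=10, z0=100)
print(f"{p:.3e}")  # prints 1.024e-05
```

Because $\mu^{10} = 1.024 \cdot 10^{-7}$ is very small, the result is close to $z_0 \cdot \mu^{10} = 1.024 \cdot 10^{-5}$, which matches the stated answer.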

Task 2.3. Solution.

By the problem statement, every string of the language Lang contains each of the digits 0 and 1. Moreover, every occurrence of a 1 has a symbol 0, or several symbols 0, to its left and to its right.

It is then easy to see that Lang is the set of all strings of the form

x[1]...x[k]0^m, where k ≥ 1, m ≥ 1, and for i = 1,..., k each x[i] is some string of the form 0^n 1, where n ≥ 1.

Let the set of auxiliary symbols be Help-set = {start, A, B, C}, and let P be the system of productions

start --> BAC (1)

B --> BA (2)

B --> empty-string (3)

A --> C 1 (4)

C --> 0C (5)

C --> 0 (6) .

It is then easy to verify that, by applying the productions from P as substitutions in any order and always starting from the symbol start, one can derive any string of the language Lang and cannot derive any string that does not belong to Lang.

Example 1: derivation of the string y1 = 010 from the symbol start.

start ==> (1) BAC ==> (3) AC ==> (6) A0 ==> (4) C10 ==> (6) 010.

Example 2: derivation of the string y2 = 00100010 from the symbol start.

start ==> (1) BAC ==> (2) BAAC ==> (6) BAA0 ==> (4) BAC10 ==> (5) BA0C10 ==> (5) BA00C10 ==> (6) BA00010 ==> (4) BC100010 ==> (3) C100010 ==> (5) 0C100010 ==> (6) 00100010.
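The description of Lang and the grammar can be cross-checked with a small Python sketch. It assumes that the form x[1]...x[k]0^m with x[i] = 0^n 1 corresponds to the regular expression `(0+1)+0+`; the helper names `in_lang` and `derive` are hypothetical and introduced only for this illustration.

```python
import re

# x[1]...x[k] 0^m with k >= 1, m >= 1 and each x[i] = 0^n 1, n >= 1.
LANG_PATTERN = re.compile(r"(0+1)+0+")

def in_lang(s):
    """Return True if the whole string s belongs to Lang."""
    return LANG_PATTERN.fullmatch(s) is not None

def derive(zero_runs, tail):
    """Build the string obtained from start --> BAC with one A per entry of
    zero_runs: each A --> C1 yields 0^n 1, and the final C yields 0^tail."""
    blocks = ["0" * n + "1" for n in zero_runs]
    return "".join(blocks) + "0" * tail

print(derive([1], 1), in_lang("010"))            # Example 1
print(derive([2, 3], 1), in_lang("00100010"))    # Example 2
print(in_lang("0110"), in_lang("10"), in_lang("000"))
```

Strings such as 0110 (two adjacent 1s), 10 (no 0 to the left of the 1) and 000 (no 1 at all) are rejected, in line with the description of Lang.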

Task 2.4. Solution.

Let $B$ be the event that at least 2 of the $n$ processors fail ($n$ is the number of processors), and let $\bar{B}$ be the event that fewer than 2 processors fail.

Combining the Bernoulli formula with the addition theorem for mutually exclusive events gives:

$$P(\bar{B}) = p^0 q^n + C_n^1 p q^{n-1} = 0.8^n (1 + n/4),$$

where $p = 0.2$, $q = 1 - p = 0.8$. Accordingly, the probability of event $B$ is:

$$P(B) = 1 - P(\bar{B}) = 1 - 0.8^n (1 + n/4)$$

The required number of processors is the solution of the inequality:

$$1 - 0.8^n (1 + n/4) \ge 0.9$$

The inequality holds for $n \ge 18$.

Answer: at least 18 processors.
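The threshold in Task 2.4 can be confirmed by direct search. The Python sketch below evaluates $P(B) = 1 - (q^n + n p q^{n-1})$ for increasing $n$ and stops at the first $n$ where it reaches 0.9; the function names `p_at_least_two` and `min_processors` are hypothetical.

```python
def p_at_least_two(n, p=0.2):
    """P(B): probability that at least 2 of n processors fail, each
    independently with probability p (Bernoulli scheme)."""
    q = 1 - p
    return 1 - (q ** n + n * p * q ** (n - 1))

def min_processors(target=0.9, p=0.2):
    """Smallest n with P(B) >= target; P(B) increases with n."""
    n = 2
    while p_at_least_two(n, p) < target:
        n += 1
    return n

print(min_processors())  # prints 18
```

For $n = 17$ the probability is about 0.882, and for $n = 18$ it is about 0.901, so 18 is the smallest number that satisfies the inequality.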

Task 2.5. Solution.

By definition,

$$M = \int_0^{\infty} (2 - q)\lambda x \left[1 - (1 - q)\lambda x\right]^{1/(1-q)} dx$$

An antiderivative of the integrand:

$$F(x) = \frac{1 + \lambda x - (q^2 - 3q + 2)\lambda^2 x^2}{(2q - 3)\lambda \left[1 + (q - 1)\lambda x\right]^{1/(q-1)}}$$

The value of the antiderivative at the lower limit of integration:

$$F(0) = \frac{1}{(2q - 3)\lambda}$$

The limit $\lim_{x \to \infty} F(x) = 0$ only for $1 < q < 3/2$. For other values of the parameter $q$ the limit does not exist.

Hence $M = [\lambda(3 - 2q)]^{-1}$ for $1 < q < 3/2$.

Answer: $[\lambda(3 - 2q)]^{-1}$ for $1 < q < 3/2$.
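As a sanity check on the closed form in Task 2.5, the integral can be approximated numerically for one admissible parameter choice. The Python sketch below uses composite Simpson's rule on a truncated interval; the values $q = 1.2$ and $\lambda = 1$, the truncation point, the step count and the function names are all assumptions made for this illustration.

```python
def mean_numeric(q, lam, upper=1000.0, steps=200_000):
    """Approximate M = integral from 0 to infinity of
    (2-q)*lam*x*[1-(1-q)*lam*x]**(1/(1-q)) dx by composite Simpson's rule
    on [0, upper]; the tail beyond `upper` is neglected."""
    def integrand(x):
        return (2 - q) * lam * x * (1 - (1 - q) * lam * x) ** (1 / (1 - q))

    h = upper / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3

def mean_closed_form(q, lam):
    """M = 1 / (lam * (3 - 2q)), valid for 1 < q < 3/2."""
    return 1 / (lam * (3 - 2 * q))

q, lam = 1.2, 1.0
print(mean_numeric(q, lam), mean_closed_form(q, lam))
```

For these values the integrand decays like $x^{-4}$, so truncating at 1000 changes the result by less than $10^{-6}$, and the two printed numbers should agree to several decimal places.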