Impact of Data Science on the Future of the Finance Function
Roger Fried, Sr. Data Scientist at Teradata
2
Let’s Talk About Hadoop, MapReduce, Spark, and…, and …
No, not for this talk. For this talk, let’s focus on the basics.
3
The Future of Finance and Accounting
• Various technologies employing Data Science will lead to a hollowing out of the field: many tasks will become much easier (performed, perhaps, by the CFO’s administrative assistant), while others will require more mathematical, technical, and people skills
• The changes will be so profound that they will blur the traditional lines between Finance and Accounting and Data Science
In fact, the lines between all of the analytic functions will blur as they converge with Data Science
4
Examples of “applied Data Science”
• Software will automatically create both your PowerPoint presentations and the text for your earnings call
• Any employee may make sophisticated queries about your financial performance using natural language questions (think Google combined with Siri, tied to all of your financial data, with better answers)
• Applications will use “big data” (especially web crawling of regulatory filings) to predict your performance and your competitors’ performance
5
Let’s talk about how a simple journey can change your perspective
6
The problem with averages (and Statistics)
Technically, this is an illustration of the Central Limit Theorem, but it applies to this argument too.
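A small simulation can make the averaging effect concrete. This is a sketch under made-up assumptions (a hypothetical population in which 80% of transactions are routine items near 100 and 20% are large items near 1,000; none of these numbers come from the talk): the raw values form two distinct clusters, yet averages of 50 draws collapse into a single bell-shaped cluster near 280, a value that almost no individual transaction takes.

```python
import random
import statistics


def draw_transaction(rng: random.Random) -> float:
    """Hypothetical bimodal 'detail': 80% routine items near 100, 20% large items near 1000."""
    if rng.random() < 0.8:
        return rng.gauss(100.0, 10.0)
    return rng.gauss(1000.0, 50.0)


def sample_means(n_samples: int, sample_size: int, seed: int = 0) -> list:
    """Average `sample_size` draws, repeated `n_samples` times."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw_transaction(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    raw = [draw_transaction(rng) for _ in range(10_000)]
    means = sample_means(2_000, 50)
    # The raw detail has two separate clusters (around 100 and around 1000) ...
    print("share of raw values above 500:", sum(v > 500 for v in raw) / len(raw))
    # ... while the averages form one bell-shaped cluster near the overall mean
    # (0.8 * 100 + 0.2 * 1000 = 280), a value almost no individual item has.
    print("mean of averages:", round(statistics.fmean(means), 1))
    print("stdev of averages:", round(statistics.stdev(means), 1))
```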
7
Aggregates (averages, sums, minimums, maximums) all have the same effect
Your organization is not a fish, but it too has a complex story that is only visible in the detail.
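To illustrate with hypothetical numbers: the two monthly cost series below (one rising all year, one falling all year, invented for this example) report identical sums, means, minimums, and maximums, so a summary report cannot tell them apart, while a simple detail-level comparison of the two halves of the year can.

```python
from __future__ import annotations

import statistics

# Hypothetical monthly costs (illustrative only): one series rising all year,
# the other the same twelve values in reverse order, falling all year.
rising = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
falling = list(reversed(rising))


def aggregates(series: list[float]) -> dict[str, float]:
    """The summary figures a typical report shows."""
    return {"sum": sum(series), "mean": statistics.fmean(series),
            "min": min(series), "max": max(series)}


def half_year_change(series: list[float]) -> float:
    """A detail-level view: second-half total minus first-half total."""
    return sum(series[6:]) - sum(series[:6])


if __name__ == "__main__":
    print(aggregates(rising) == aggregates(falling))            # True
    print(half_year_change(rising), half_year_change(falling))  # 180 -180
```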
8
Think of the financial statement and financial ratios as that poor fish, in a shot glass, after the blender
Drilling down into the detail with BI tools does not really address this issue, even though it can give you false confidence that you have a handle on the detail.
9
We need the detail: time series decomposition
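The slide names time series decomposition without showing a method. One common, simple variant is classical additive decomposition, which splits a monthly series into trend, seasonal, and residual components. The following is a minimal pure-Python sketch of that technique, assuming an even period such as 12 and at least two full cycles of data; libraries such as statsmodels provide more robust implementations (`seasonal_decompose`, `STL`).

```python
from __future__ import annotations


def centered_moving_average(y: list[float], period: int) -> list[float | None]:
    """2 x `period` centered moving average (assumes an even period such as 12)."""
    half = period // 2
    trend: list[float | None] = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]  # period + 1 observations centered on t
        # The two end points get half weight so the weights sum to `period`.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def decompose(y: list[float], period: int = 12):
    """Classical additive decomposition: y = trend + seasonal + residual."""
    trend = centered_moving_average(y, period)
    # Average the detrended values for each position in the cycle (e.g. each month).
    buckets: list[list[float]] = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    raw = [sum(b) / len(b) for b in buckets]  # needs at least two full cycles of data
    offset = sum(raw) / period
    pattern = [s - offset for s in raw]  # re-center so the seasonal effects sum to zero
    seasonal = [pattern[t % period] for t in range(len(y))]
    resid = [
        None if tr is None else obs - tr - s
        for obs, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, resid
```

Applied to transaction-level monthly totals rather than annual aggregates, the residual component is where unusual months stand out. The first and last `period // 2` trend values are undefined, a known limitation of the moving-average approach.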
10
We need the detail: automatic categorization
Image of a Decision Tree, one of the most intuitive of the categorization techniques
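As a rough illustration of how a decision tree learns categories from data, here is a minimal CART-style tree (Gini impurity, depth-limited) in plain Python. The features (amount, day of week) and the category labels are hypothetical; a production setting would more likely use a library implementation such as scikit-learn's `DecisionTreeClassifier`.

```python
from __future__ import annotations

from collections import Counter


def gini(labels: list[str]) -> float:
    """Gini impurity: 0 for a pure group, higher for mixed groups."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows: list[list[float]], labels: list[str]):
    """Return (impurity, feature, threshold) for the best `row[f] <= threshold` split, or None."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= threshold]
            right = [lab for r, lab in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree of (feature, threshold, left, right) tuples; leaves are category names."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    go_left = [r[f] <= t for r in rows]
    left = build_tree([r for r, g in zip(rows, go_left) if g],
                      [lab for lab, g in zip(labels, go_left) if g], depth + 1, max_depth)
    right = build_tree([r for r, g in zip(rows, go_left) if not g],
                       [lab for lab, g in zip(labels, go_left) if not g], depth + 1, max_depth)
    return (f, t, left, right)


def predict(tree, row):
    """Follow the splits from the root down to a leaf category."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


if __name__ == "__main__":
    # Hypothetical transactions: [amount, day_of_week] with known categories.
    rows = [[45, 1], [60, 2], [1200, 3], [1500, 4], [80, 5], [2000, 1]]
    labels = ["meals", "meals", "travel", "travel", "meals", "travel"]
    tree = build_tree(rows, labels)
    print(tree)                     # (0, 80, 'meals', 'travel')
    print(predict(tree, [900, 2]))  # travel
```

The learned rule ("amount <= 80 means meals") is directly readable, which is what makes decision trees one of the more intuitive categorization techniques.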
11
We need the detail: automatic clustering into comparable units
Combination of dimensionality reduction (PCA), SVM, and GMM clustering techniques
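The slide names PCA, SVM, and GMM. The sketch below covers only the PCA and GMM parts, assuming NumPy and two hypothetical groups of cost centers described by five made-up metrics: PCA (via SVD) compresses the metrics to one component, then a two-component Gaussian mixture fitted with EM groups the units. In practice, scikit-learn's `PCA` and `GaussianMixture` classes cover the same steps.

```python
import numpy as np


def pca(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def gmm_1d(z: np.ndarray, n_iter: int = 100) -> np.ndarray:
    """Two-component 1-D Gaussian mixture fitted with EM; returns hard cluster labels."""
    mu = np.array([z.min(), z.max()], dtype=float)  # crude initialisation at the extremes
    var = np.full(2, z.var())
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = w * np.exp(-(z[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances.
        nk = resp.sum(axis=0)
        w = nk / len(z)
        mu = (resp * z[:, None]).sum(axis=0) / nk
        var = (resp * (z[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6  # small floor
    return resp.argmax(axis=1)


if __name__ == "__main__":
    # Hypothetical data: 50 + 50 cost centers, five metrics each.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, size=(50, 5)),
                   rng.normal(5.0, 1.0, size=(50, 5))])
    labels = gmm_1d(pca(X, 1)[:, 0])
    print(np.bincount(labels))
```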
12
Human classification = Bias
The science of research design invests immense resources in dealing with this problem, yet scientific research is still plagued by it. Accounting barely recognizes it.
13
Accounting is pretty much the process of pouring bias into transaction data:
• Account
• Department
• Cost Center
• Fiscal Period
• Fund Account
• Asset Category
14
Accounting categories limit analysis because they are…
• Static and out of date
• Driven by accounting system needs rather than analysis needs
• Created by a limited number of people with particular, possibly incompatible viewpoints
• Unified and standard
• Biased by preconceived (or ill-conceived) notions
This is not to say that we have to stop assigning accounting categories, but rather that we cannot limit ourselves to them during analysis.
15
Techniques for analyzing all the data
• Forecasting: i.e., Econometrics, Time Series Decomposition; e.g., ARIMA models, SAX + Shapelets…
• Clustering: i.e., Unsupervised learning; e.g., KMeans, KNN, Hierarchical Clustering…
• Categorization: i.e., Supervised learning; e.g., Naïve Bayes, Decision Trees, GMM…
• Dimensionality Reduction: i.e., Feature selection and extraction; e.g., Principal Component Analysis, SVD, FA…
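Of the clustering techniques listed, k-means is the simplest to sketch. Below is a minimal pure-Python version of Lloyd's algorithm, using a deterministic farthest-point initialisation as a simple stand-in for k-means++; the vendor data (average invoice size in $k, invoices per month) is hypothetical.

```python
from __future__ import annotations

import math


def kmeans(points: list[tuple[float, ...]], k: int, n_iter: int = 100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point farthest from
    # all centroids chosen so far (a simple stand-in for k-means++).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels: list[int] = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical vendors: (average invoice in $k, invoices per month).
    vendors = [(1.0, 20.0), (1.2, 22.0), (0.8, 19.0), (10.0, 2.0), (11.0, 3.0), (9.5, 2.5)]
    labels, centroids = kmeans(vendors, 2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```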
16
Integrating data for a more complete picture
Datamart / Enterprise Data Warehouse, combining:
• Accounting Transactions
• Budgeting System
• Inventory System
• Asset Management System
• Employee Hierarchy, H/R & T&E Data
• Customer Relationship Management
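In a real warehouse this integration would typically be done with SQL joins; as a minimal in-memory sketch, the code below enriches each accounting transaction with employee hierarchy and budget data while keeping the transaction-level detail. All record layouts and field names (`employee_id`, `cost_center`, and so on) are hypothetical.

```python
from __future__ import annotations

# Hypothetical extracts from three source systems (field names invented for illustration).
transactions = [
    {"txn_id": 1, "employee_id": "E1", "cost_center": "CC10", "amount": 1200.0},
    {"txn_id": 2, "employee_id": "E2", "cost_center": "CC10", "amount": 300.0},
    {"txn_id": 3, "employee_id": "E3", "cost_center": "CC20", "amount": 950.0},
]
employees = {  # employee hierarchy / HR system
    "E1": {"manager_id": "M1", "department": "Sales"},
    "E2": {"manager_id": "M1", "department": "Sales"},
    "E3": {"manager_id": "M2", "department": "Operations"},
}
budgets = {"CC10": 1000.0, "CC20": 2000.0}  # budgeting system, per cost center


def integrate(transactions: list[dict], employees: dict, budgets: dict) -> list[dict]:
    """Left-join each transaction to HR and budget data, keeping one row per transaction."""
    enriched = []
    for txn in transactions:
        emp = employees.get(txn["employee_id"], {})
        enriched.append({
            **txn,
            "manager_id": emp.get("manager_id"),
            "department": emp.get("department"),
            "cost_center_budget": budgets.get(txn["cost_center"]),
        })
    return enriched


def spend_vs_budget(enriched: list[dict]) -> dict[str, float]:
    """Spend minus budget per cost center (positive = over budget); skips missing budgets."""
    spend: dict[str, float] = {}
    budget: dict[str, float | None] = {}
    for row in enriched:
        cc = row["cost_center"]
        spend[cc] = spend.get(cc, 0.0) + row["amount"]
        budget[cc] = row["cost_center_budget"]
    return {cc: spend[cc] - budget[cc] for cc in spend if budget[cc] is not None}


if __name__ == "__main__":
    enriched = integrate(transactions, employees, budgets)
    print(spend_vs_budget(enriched))  # {'CC10': 500.0, 'CC20': -1050.0}
```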
17
This data can be very awkward to use
• The data needs to be at the multi-year transactional level
  – Aggregation is for reporting, not analysis
  – Detail is for analysis
• Start with, for example, 100 columns of data
  – Scale and bin numeric columns in multiple ways to arrive at 250 columns of data
  – Create alternative hierarchies and distance calculations for date and categorical data to arrive at 500 columns of data
• Data transformation may involve a large number of steps, which will be repeated many times during analysis
We need the right tools and processes
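A sketch of how the column counts on this slide can grow: each helper below turns one source column into several derived columns (scaled, binned, date distances, alternative calendar hierarchies). The specific transformations and the July fiscal-year start are illustrative assumptions, not the talk's recipe.

```python
from __future__ import annotations

import bisect
import calendar
import math
import statistics
from datetime import date


def expand_numeric(name: str, values: list[float], n_bins: int = 4) -> dict[str, list]:
    """Derive several scaled and binned versions of one numeric column."""
    mean, sd = statistics.fmean(values), statistics.pstdev(values)
    lo, hi = min(values), max(values)
    ranked = sorted(values)
    # Quantile cut points, so each bin holds roughly the same number of rows.
    cuts = [ranked[len(values) * q // n_bins] for q in range(1, n_bins)]
    return {
        f"{name}_z": [(v - mean) / sd if sd else 0.0 for v in values],
        f"{name}_minmax": [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values],
        f"{name}_signed_log": [math.copysign(math.log1p(abs(v)), v) for v in values],
        f"{name}_qbin": [bisect.bisect_right(cuts, v) for v in values],
    }


def expand_date(name: str, dates: list[date], reference: date,
                fy_start_month: int = 7) -> dict[str, list]:
    """Derive distances and alternative calendar hierarchies from one date column."""
    return {
        f"{name}_days_since_ref": [(d - reference).days for d in dates],
        f"{name}_weekday": [d.weekday() for d in dates],  # Monday = 0
        f"{name}_days_to_month_end": [
            calendar.monthrange(d.year, d.month)[1] - d.day for d in dates
        ],
        # Hypothetical fiscal year starting in `fy_start_month` (July by default).
        f"{name}_fiscal_quarter": [((d.month - fy_start_month) % 12) // 3 + 1 for d in dates],
    }


if __name__ == "__main__":
    cols = expand_numeric("amount", [10.0, 20.0, 30.0, 40.0])
    cols.update(expand_date("txn_date", [date(2024, 1, 31), date(2024, 2, 15)],
                            reference=date(2024, 1, 1)))
    print(sorted(cols))  # two source columns have become eight derived columns
```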
18
Use cases for “core Data Science”
• Automatic categorization
• (Useful) Outlier identification and evaluation
• Fraud and embezzlement identification
• Performance evaluation
• Organizational reengineering
• Improved forecasting (self-learning and granular)
• Radically new visualization of the enterprise
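For the outlier use case, one common robust approach (a swapped-in technique; the talk does not name a method) is the modified z-score of Iglewicz and Hoaglin, which uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation. The sketch below contrasts it with a naive z-score on hypothetical expense amounts.

```python
import statistics


def naive_outliers(amounts: list, threshold: float = 3.0) -> list:
    """Flag values more than `threshold` standard deviations from the mean."""
    mean, sd = statistics.fmean(amounts), statistics.pstdev(amounts)
    if sd == 0:
        return []
    return [i for i, a in enumerate(amounts) if abs(a - mean) / sd > threshold]


def robust_outliers(amounts: list, threshold: float = 3.5) -> list:
    """Flag values whose modified z-score (0.6745 * (x - median) / MAD) exceeds `threshold`."""
    med = statistics.median(amounts)
    mad = statistics.median([abs(a - med) for a in amounts])
    if mad == 0:
        return []
    return [i for i, a in enumerate(amounts) if abs(0.6745 * (a - med) / mad) > threshold]


if __name__ == "__main__":
    # Hypothetical expense amounts with one suspicious entry at index 4.
    amounts = [120, 135, 110, 128, 5000, 140, 125, 118]
    print(naive_outliers(amounts))   # []
    print(robust_outliers(amounts))  # [4]
```

With only eight values, a population z-score can never exceed √7 ≈ 2.65 (Samuelson's inequality), so the naive rule with a threshold of 3 misses even an extreme entry, whereas the median and MAD are barely moved by it.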
19
Thank you for your time today
Graph analysis of fund flows among and within companies (produced with Teradata Aster)
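The graph on this slide was produced with Teradata Aster; the sketch below does not use Aster and is only a plain-Python illustration of two basic fund-flow analyses on hypothetical intercompany transfers: net flow per entity, and detection of short directed cycles, which can indicate round-tripping of funds. Graph libraries such as NetworkX (`simple_cycles`) offer more complete versions.

```python
from __future__ import annotations

from collections import defaultdict


def net_flows(transfers: list[tuple[str, str, float]]) -> dict[str, float]:
    """Money received minus money sent, per entity."""
    net: dict[str, float] = defaultdict(float)
    for src, dst, amount in transfers:
        net[src] -= amount
        net[dst] += amount
    return dict(net)


def find_cycles(transfers: list[tuple[str, str, float]], max_len: int = 4) -> list[list[str]]:
    """Simple directed cycles with at most `max_len` transfers (candidate round-tripping)."""
    graph: dict[str, set[str]] = defaultdict(set)
    for src, dst, _ in transfers:
        graph[src].add(dst)
    cycles: list[list[str]] = []

    def dfs(start: str, node: str, path: list[str]) -> None:
        for nxt in sorted(graph[node]):
            if nxt == start:
                cycles.append(path + [start])
            # Only visit nodes that sort after `start`, so each cycle is reported
            # once, beginning at its alphabetically smallest entity.
            elif nxt > start and nxt not in path and len(path) < max_len:
                dfs(start, nxt, path + [nxt])

    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles


if __name__ == "__main__":
    # Hypothetical intercompany transfers: (from, to, amount).
    transfers = [
        ("ParentCo", "SubA", 500.0),
        ("SubA", "SubB", 480.0),
        ("SubB", "ParentCo", 470.0),
        ("ParentCo", "Vendor1", 200.0),
    ]
    print(net_flows(transfers))
    print(find_cycles(transfers))  # [['ParentCo', 'SubA', 'SubB', 'ParentCo']]
```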