SQL Server and Machine Learning: What’s Your Superpower?

codemag.com - THE LEADING INDEPENDENT DEVELOPER MAGAZINE - US $8.95, Can $11.95 - NOV/DEC 2019

Azure Functions, Accessibility, Machine Learning, Data Visualization

An Introduction to Digital Accessibility

Get Started with Serverless Azure Functions



Page 2: What’s Your - library.ashoka.edu.inlibrary.ashoka.edu.in/wp-content/uploads/2020/01/... · lets me read and write upside-down with great facility (I can’t write in cursive upside-down,

DEVintersection.com 203-264-8220 M-F, 9-4 EDT AzureAIConf.com

Follow us on: twitch.tv/devintersection Twitter: @DEVintersection Facebook.com/DEVintersection LinkedIn.com/company/devintersectionconference/

Twitter: @AzureAIConf Facebook.com/MicrosoftAzureAIConference LinkedIn.com/company/microsoftazureaiconf/

GET THE INSIDER VIEW

MGM Grand Las Vegas, NV, November 18-21, 2019

REGISTER by JANUARY 13 for a WORKSHOP PACKAGE and receive a choice of hardware or hotel gift card! Shown are samples of past hardware choices.

Xbox One X

Surface Go

Xbox One S

Surface Headphones

Powered by

SCOTT HUNTER

Director of Program Management .NET,

Microsoft

SCOTT HANSELMAN

Principal Program Manager, Web Platform Team, Microsoft

SCOTT GUTHRIE

Executive Vice President, Cloud + AI Platform,

Microsoft

ERIC BOYD

Corporate Vice President,

AI Platform, Microsoft

JEFF FRITZ

Senior Program Manager, Microsoft

JOHN PAPA

Principal Developer Advocate, Microsoft

KATHLEEN DOLLARD

Principal Program Manager, Microsoft

BOB WARD

Principal Architect Azure Data/SQL Server Team,

Microsoft

ROBERT GREEN

Technical Evangelist, DPE, Microsoft

ANNA THOMAS

Data & Applied Scientist, Microsoft

150+ Sessions • 75+ Microsoft and industry experts • Full-day workshops • Evening events

Walt Disney World Swan and Dolphin

April 7-9, 2020, Orlando, FL. Workshops April 5, 6, 10.



4 codemag.com

TABLE OF CONTENTS

4 Table of Contents

50 POURing Over Your Website: An Introduction to Digital Accessibility
Everyone knows that there are standards when it comes to building apps. And most people know that there are standards for accessibility. But did you know that writing accessible apps is better for everyone? Ashleigh shows you what to think about the next time you sit down to create something.
Ashleigh Lodge

54 Best Practices for Data Visualizations: A Recipe for Success
Helen shows you the ins and outs of creating really useful charts and graphs with Tableau. You’ll never make a boring old pie chart again.
Helen Wall

Columns

74 What Captain Marvel Can Teach Us About Management
Dian spends an evening re-watching Captain Marvel with a group of friends and they realize that there’s a lot more to that movie than just a rollicking good film.
Dian Schaffhauser

Departments

6 Editorial

24 Advertisers Index

73 Code Compilers

Features

8 Enhance Your Search Applications with Artificial Intelligence
Search is everywhere. But unless you add it to your app, you won’t find it there! Sahil examines the various search tools in the Microsoft ecosystem and shows you how to make the most of them.
Sahil Malik

14 Synchronizing the In-Browser Database with the Server
Craig shows you how to gracefully resolve conflicts and synchronization issues with disconnected databases.
Craig Shoemaker

22 Get Started with Serverless Azure Functions
Azure Functions take care of most of the server-related problems tied to hosting. Julie shows you how to integrate them with your own app and then monitor the results.
Julie Lerman

28 Women in STEM: An Interview
Whether you’re in the middle of your career or just starting out, women in science, technology, engineering, and math (STEM) have unique challenges. Listen in as Sumeya and Sara interview each other about it.
Sumeya Block and Sara Chipps

32 Stone Soup: Cooking Up Custom Solutions with SQL Server Machine Learning
SQL Server 2017 has machine learning services baked right in. If you’ve been wondering how to use it, you’ll be fascinated by what Jeannine serves up.
Jeannine Takaki-Nelson

44 Emotional Code
Whether you know it or not, your code says something about you. Kate tells you how to read emotions in existing code and how to be a better member of the coding community when writing your own.
Kate Gregory

US subscriptions are US $29.99 for one year. Subscriptions outside the US pay US $49.99. Payments should be made in US dollars drawn on a US bank. American Express, MasterCard, Visa, and Discover credit cards are accepted. Bill Me option is available only for US subscriptions. Back issues are available. For subscription information, send e-mail to [email protected] or contact Customer Service at 832-717-4445 ext. 9.

Subscribe online at www.codemag.com

CODE Component Developer Magazine (ISSN # 1547-5166) is published bimonthly by EPS Software Corporation, 6605 Cypresswood Drive, Suite 425, Spring, TX 77379 U.S.A. POSTMASTER: Send address changes to CODE Component Developer Magazine, 6605 Cypresswood Drive, Suite 425, Spring, TX 77379 U.S.A.


EDITORIAL

Sock, Sock, Shoe, Shoe

Melanie Spiller

I was at the opera, and the main character pretended she was a man for most of the evening. She began the show by dressing so we could see that she was a woman pretending to be a man, and I was struck by how she put on her shoes: first one sock, then the shoe, then the other sock, and the other shoe. That’s how I put on my socks and shoes.

The next day, I was at the acupuncture clinic, and I noticed someone putting on his shoes: both socks first, then both shoes, starting on the right both times.

I asked a few people, and everyone seemed to have really strong opinions about the correct order of such things. Several people explained that they put on both socks first “in case there was a fire.” (Wouldn’t you rather have one shoe than no shoes in a fire? What if you needed to stomp the fire out?) People were very committed to the “correct” order of things.

It got me thinking about what other things we do in a certain order by rote: brushing teeth, brushing hair, starting the car and putting on a seat belt. I walk every day and always take the same route or some version of the same route. I told myself it was because I loved the walk so much. So I decided to prove it. I walked the route in the opposite direction.

A walk that normally takes an hour and a half took two and a quarter hours! I had trouble getting across a major intersection—normally I crossed there an hour later when the traffic had died down. People I routinely nodded and smiled at and said good morning to didn’t recognize me (nor I them until I realized that I wasn’t paying attention). I noticed houses and shops that I’d never seen before, I got a whole different view of some massive construction, and I began to tire right where I usually got to my happy place.

I know that I’m a creature of many habits, so I decided that it was clearly time to shake things up. I sat to my work in a different place, listened to different music, practiced my music at a different time of day, watched different things on television, wore different clothes than usual, did my shopping and laundry on different days, went to bed at a different time—even brushed my teeth in a different order. Some of the changes felt refreshing, some were annoying or felt like interference, and some, like the walk reversal, were interesting and informative.

The most useful thing was what I found energizing—working in a different place, starting at the other end of the To Do List, changing the music I listen to when I’m working.

It seems logical that shaking up the usual order of things would be useful in a larger project with more staff, too. Not change for the sake of change if it’s disruptive, but maybe the alphabet could start in the middle every now and then.

Back in the day, when we used to edit on printed-out copies, I learned a neat trick for catching things like repeated words or extra spaces—things that are hard to catch under normal circumstances. Turn the page upside-down. As it happens, I have some peculiarity in my brain that lets me read and write upside-down with great facility (I can’t write in cursive upside-down, only printing, which makes me wonder if I learned this trick before I learned cursive), but even so, I found more errors like that than I did when editing right-side up.

Another trick is for when I get stuck while writing. Write it out of order! You don’t have to start at the beginning of the article/chapter/book/paper/story when you’re writing any more than you have to start at the beginning when you’re reading. Write the bits that interest you most or that are easiest to write, and then go back and introduce them. Or, if you’re really stuck, write each heading/chapter title/topic on three-by-five cards and toss them in the air. Start by writing the cards that land right-side-up. Or the upside-down ones. Boom! Responsibility for writing in order is over. Writing this way guarantees that you’ll go back and read from end to end to make sure it all gels. (Believe me, your editor knows that you didn’t reread your article a single time because of all the stupid typos, but also, things written in haste read like things written in haste.)

Software development projects are commonly done out of order. Sometimes you need to test the viability of a project by tackling the most difficult bits first. Sometimes you start with the output of a project: reports and user interfaces come to mind. Sometimes it’s just how you delegate responsibilities for features. Assigning people to the various tasks in a large project gives you the opportunity for different views about how to proceed. Even the act of delegating tasks gives you different perspectives on timing, tooling, and approaches.

There are many ways to shake up your process, and I plan to be more interested in them, and to take note when I’ve fallen into a habit. Even so, it’s just wrong to put on both socks and then both shoes. That’s just crazy talk.


ONLINE QUICK ID 1911021

Enhance Search Applications with AI

Search. It’s so ubiquitous, so easy, yet so difficult. Users expect to see that friendly search box in their applications. They seem to really like it, because it’s so simple to use. You don’t need a user manual to figure out search. In fact, if your application doesn’t have search, you’ll be pelted with negative reviews. No wonder you see search in so many applications. Yet, search is hard.

It’s very difficult to implement. We all know it’s more than just simple text matching. Even simple text matching isn’t easy. Those of us with database backgrounds know that searching for “prefix*” is a lot easier than searching for “*suffix”. And users want to do all sorts of weird searches like “*run*”, which should match ran, or shrunken or brunt, or—you get the idea. Quick search results and performance are important, as are accuracy and ranking. You almost have to read the user’s mind. And then there’s the whole idea of keeping your search results fresh. Not an easy task, is it?

What’s amazing is that all that complexity barely scratches the surface of the endless possibilities. About 70% of the data on the Internet is visual. Photos and videos. Another big part is audio. Wouldn’t it be useful to be able to search through audio and video as well?

Have you ever found yourself asking a question such as, “I have this tune stuck in my head; what song is that?” Yes, we know there are apps that’ll do that on your phone. But what if that power were brought to your corporate world? Say, “Someone said xyz in a meeting, or perhaps an email, or maybe it was a document; I wish I could find out easily where xyz was said and by whom.” Personally, I struggle with this deluge of information every day. Finding that needle in the haystack while my boss is on the call with me is something I deal with far too often.

Search is incredibly powerful. It saves the user’s time. In this article, I’ll show you how, with a very gentle learning curve, you can build an application that provides such functionality and more.

But first, let’s start by clarifying the various search products available in the Microsoft space.

Search Products

In the Microsoft ecosystem, there are multiple search products with overlapping names. The Microsoft Department of Confusing Naming can sometimes do a great job, so it’s best to clarify them first.

Microsoft has three search products.

The first is Bing, which you can find at www.bing.com. It’s an Internet-facing search engine, it’s free to use, and searches execute against the open, anonymous Internet.

The second is search under Cognitive Services, not to be confused with cognitive search under Azure Search, which is an entirely different product. You can read more about Cognitive Services search at https://azure.microsoft.com/en-us/services/cognitive-services/directory/search/. But, to put it simply, this is your way to tap into the power of Bing, to create an ad-free search experience, completely brandable to your requirements, available as a paid offering.

Finally, there’s Azure Search, which is the focus of this article. Azure Search is one of the products under the Azure umbrella. It allows you to create your own private search corpus in the cloud. It’s best viewed as a cloud-hosted, Internet-scale, search-as-a-service solution. It allows you to search your data, in an index you define, with documents you put in the index, at a schedule you define. All this, but with none of the complexity that’s typically paired with an enterprise-class search product. Microsoft Azure manages all of the infrastructure complexity, and, as I mentioned earlier, I assure you, the learning curve here is indeed quite gentle.

One pretty amazing capability of Azure Search is the ability to enhance it with the power of AI using the cognitive capabilities of Azure Search. The typical process of search is to define the index, import data, and execute queries. Cognitive capabilities allow you to make further sense of the imported data. For instance, a video could be further deciphered into the people appearing in the video, and speech-to-text capabilities could make the spoken text in the video searchable. Or you could use OCR capabilities to make the text in images searchable. I’ll show you how to do all of this in this article.

Again, I assure you, the learning curve is quite gentle.

Create a Simple Search Engine

The best way to learn how to swim is to dive in. Without much further ado, let’s go ahead and build a simple search engine. I’ll do it using Azure Search, and I’ll explain the important concepts as I go along.

The first thing I’ll need is the data I wish to search. There are two ways to put data in an Azure Search index: Push and Pull.

Push Data into Azure Search

The first way to get data into your Azure Search index is by pushing data into it. Azure Search comes with a REST API, as well as .NET and Java SDKs. You can choose to push any searchable data into the index using this push-based mechanism. Certainly, this has its advantages. You can now make almost anything searchable, as long as you can programmatically push the data in. Also, you control how and when new data becomes searchable. This means that if you have a specific requirement where new data must become searchable with a very short latency, the push-based mechanism is what you need.

At a high level, the process of pushing data involves defining an index first. When you define an index, you get to define a lot of details, such as which columns in the entity are searchable, which columns are retrievable in search results, which columns you can perform facets on, etc. Once you define such an index, you can push in documents that match that data structure.
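As a concrete sketch of the push model, here is how such a REST call might be assembled. The service name, index name, and the sample Customers fields are hypothetical placeholders; the endpoint shape (POST to indexes/{index}/docs/index, with an @search.action per document) follows the Azure Search REST API.

```python
import json

# Hypothetical service and index names -- substitute your own.
SERVICE = "my-search-service"
INDEX = "customers-index"
API_VERSION = "2019-05-06"

def build_push_request(documents):
    """Assemble the URL, headers, and JSON body for pushing
    documents into an Azure Search index via the REST API."""
    url = (f"https://{SERVICE}.search.windows.net"
           f"/indexes/{INDEX}/docs/index?api-version={API_VERSION}")
    headers = {
        "Content-Type": "application/json",
        "api-key": "<admin key>",  # pushing requires an admin key
    }
    # Each document names its action: upload, merge,
    # mergeOrUpload, or delete.
    body = {"value": [{**doc, "@search.action": "upload"}
                      for doc in documents]}
    return url, headers, json.dumps(body)

url, headers, body = build_push_request(
    [{"CustomerID": "ALFKI", "CompanyName": "Alfreds Futterkiste"}])
```

Sending this with any HTTP client (and a real admin key) makes those documents searchable almost immediately, which is the low-latency advantage described above.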

Sahil Malik
www.winsmarts.com
@sahilmalik

Sahil Malik is a Microsoft MVP, INETA speaker, a .NET author, consultant, and trainer.

Sahil loves interacting with fellow geeks in real time. His talks and trainings are full of humor and practical nuggets.

His areas of expertise are cross-platform mobile app development, Microsoft anything, and security and identity.


The Customers table that I intend to make searchable is rather interesting. It contains information about the customers’ names, their companies, contact names, contact titles, and so much more. Of special interest is the Country column. It has 21 countries. Perhaps it would be useful to treat this column differently. For instance, maybe I want to issue queries for “sales representative” in Brazil.

But, before you can do any of that, you need to set up a search instance.

Create a Search Instance

Creating a search instance is rather simple. You simply attempt to create an instance of “Azure Search.” It’ll ask you the usual questions, such as what resource group you wish to put this search into, the name of the search instance, the location, etc. The most important question it’ll ask is what pricing tier you’d like to put this search instance into.

Search can be provisioned in free, basic, standard, or storage optimized tiers. Free is fine for this tutorial or testing simple systems. The biggest downside of the free tier is that you can’t scale it. Basic, standard, and storage optimized can be scaled, but storage optimized is optimized more for storage—your indexing is quicker, but your query latency is poorer. You scale the search instances via search units, which is a combination of replicas and partitions.

Partitions provide index storage and I/O for read/write operations. The more partitions, the quicker the indexing. Replicas, on the other hand, are instances of your search service. They’re used to load balance your query operations. Each replica always hosts one copy of an index. If you have six replicas, you’ll have six copies of every index loaded onto the service.
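As a back-of-the-envelope sketch, assuming billing follows the usual replicas-times-partitions formula for search units:

```python
def search_units(replicas: int, partitions: int) -> int:
    """Search units consumed by a service configuration:
    one unit per replica/partition combination (assumed formula
    from the Azure Search scaling model)."""
    return replicas * partitions

# 3 replicas of a 2-partition service consume 6 search units,
# and 3 full copies of every index are kept loaded.
units = search_units(replicas=3, partitions=2)
```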

It is important to realize that:

• There’s no feature set difference between free, basic, standard, and storage optimized.
• The only difference is scale.
• Replicas aren’t your answer for disaster recovery. For proper disaster recovery, you need to create a separate and identical search instance in another data center.

There is another tier, the S3HD tier. S3HD is designed for multi-tenant environments, and it does have a feature set difference: indexers are not available in S3HD.

For the purposes of this article, go ahead and provision an instance of Azure Search under the free tier.

Once inside Azure Search, you’ll see a number of interesting things. Right through the portal, you can choose to scale it. Because you went with the free tier, this will be disabled. All tiers, including the free tier, give you access to keys. There are two kinds of keys: admin and query. The admin key can be used to programmatically affect search service configuration. You can create up to a maximum of two equivalent admin keys, or up to 50 query keys, which only let you query data. An admin key also lets you query data, but an admin key is a lot more powerful than a query key; therefore, you should use query keys for pure querying functions.

Have Azure Search Pull Data

Azure Search can also pull data in using indexers. An indexer in Azure Search is a crawler that extracts searchable data and metadata from an external Azure data source and populates an index based on field-to-field mappings between the index and your data source. There are indexers available for Azure SQL, Azure Cosmos DB, Azure Blob Storage, and Azure Table Storage.

The process of using indexers is fairly straightforward. You first need a data source matching one of the supported data sources for which indexers are available. Then you can either define an index or set up an import data job. As a part of import data, the indexer can query the data structure and suggest an index structure to you, which you can tweak further. And then you can perform a one-time import or set up a recurring schedule for newer data to become available in search results.

The obvious advantage of having Azure Search pull data is that you can set it up with simple point and click. The disadvantage is that you can only pull from data sources for which an indexer is available, and you have to wait for the indexer to run again for newer data to show up in search results.

It’s also important to consider that although indexers are commonly set up to run on a scheduled job, it’s possible to run an indexer on demand using the REST API and the command shown below.

POST https://[service name].search.windows.net/indexers/[indexer name]/run?api-version=2019-05-06
api-key: [Search service admin key]

You can also get the status of the current running indexer easily, as shown below.

GET https://[service name].search.windows.net/indexers/[indexer name]/status?api-version=2019-05-06
api-key: [Search service admin key]

Although these operations give you some flexibility, they won’t be as efficient as pushing a document straight into an index, as the push mechanism allows you to do. That’s because when you kick-start an indexer or get its status, the indexer still has to find all the new changes and then pull them in, one by one. But hey, it’s a good middle ground between fresher indexing results and not having to write a lot of code.
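A small sketch tying the two on-demand calls together. The service and indexer names are hypothetical; the URL shapes mirror the run and status requests shown above.

```python
# Hypothetical names -- substitute your own service and indexer.
SERVICE = "my-search-service"
INDEXER = "northwind-indexer"
API_VERSION = "2019-05-06"

def indexer_url(action: str) -> str:
    """Build the URL for the on-demand 'run' (POST) or
    'status' (GET) indexer operations."""
    return (f"https://{SERVICE}.search.windows.net"
            f"/indexers/{INDEXER}/{action}?api-version={API_VERSION}")

run_url = indexer_url("run")        # POST, admin key in api-key header
status_url = indexer_url("status")  # GET, admin key in api-key header
```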

Set Up a Data Source

For the purposes of this article, I’ll index the Northwind database. Yes, that old, tired, boring Northwind database. You can grab the script for the Northwind database from https://github.com/Microsoft/sql-server-samples/tree/master/samples/databases/northwind-pubs. Why did I pick Northwind? Well, I didn’t have to; I just wanted a data source. But feel free to target any other similar content source.

Once I set up the data source, I see the usual Northwind tables. One of those tables is the Customers table that I intend to make searchable.


The Country column is a great candidate for filtering, sorting, and faceting because there are fewer, more well-defined distinct values in this column. You can see how I’ve set up my index in Figure 3.

Clicking Next creates the index and starts the first crawl to put data into the index. You can examine the progress of your search instance on the overview page. If you wish to dive into the details of any particular indexer, you can view it under the Indexers tab on the overview page. Certainly, all this information is also accessible via the API.

Because the Customers table has only 91 records, you’ll see the indexing operation complete in almost no time.

Execute Search Queries

You can execute search queries directly through the Azure portal. Remember, this is for you, the administrator, to test the queries.

Import Data and Create an Index

Right through the Azure portal, at the very top, you’ll also see a button to import data. This can be seen in Figure 1.

Clicking on that import button gives you a simple form to fill in, where you can point this indexer to your SQL Server. Feel free to explore what other indexers are available to you.

You can see the configuration information I provided in Figure 2.

Once you click the Test connection button, you’ll see a drop-down appear with all the tables and views that can be indexed. Click the Next button—for now, I’ll skip all settings, including Cognitive Services details, and go to Customize target index. Here, you can define which columns are returned in search results, which columns are searchable, and you can even define which columns are filterable, sortable, and facetable.

Figure 1: The import data button at the top of your search instance overview page

Figure 2: The import data screen for Azure SQL database

Figure 3: My Azure Search index


Leveraging the Power of AI

You have a neat little search engine, and it wasn’t too hard to create. Everything I showed via the Azure portal can also be built using the REST API or the .NET or Java SDKs. And remember, this example used an indexer to query the Northwind database. What if you don’t have an indexer for the objects you wish to make searchable? For instance, what if the data resides in an ERP system that has a weird, arcane Web API? You can still push the objects in, in a neat and clean JSON format that matches your index.

“Neat and clean JSON format.” Did that make you hiccup? We all know that the real world is hardly neat and clean. The real world is messy. So in my next example, I’m going to leverage the power of AI to make sense of unstructured data via search.

In order to do so, I’ll use a fantastic capability of search called Cognitive Search. Put simply, Cognitive Search is a bunch of skillsets that leverage the power of AI to make sense out of unstructured data. For instance, you can OCR text out of images and make those images searchable. You can submit a bunch of pictures and have AI recognize celebrities in those pictures. Or you can do speech-to-text and so much more. Where the out-of-the-box abilities fall short, you’re welcome to write your own skill.

For this part of the article, I had a hard time coming up with a good example, so I just took a screenshot of this article as I was writing it. Seriously: the text you see here, unedited so far by the editor, is a screenshot I took, and I decided to make it searchable. The goal is that, via OCR, I’ll be able to search through the text of this article. You’re welcome to make this more compelling by uploading pictures of other kinds, such as landmarks, celebrities, your dog—whatever floats your boat.

Back in the search service instance, go ahead and delete the previous index. I’m doing that just to keep my search results clean.

To integrate searches within your applications, you need to make a REST call to a request URL, with the api-key header. The value of the header is the query key.

Here’s a little tip. You’ll pay for data egress costs, but you only pay for what leaves the data center. So, if you have a Web front-end for the search results, place it in the same data center as your search instance. That way, you only pay for the data egress once.

Let’s execute some search queries now. Under the search explorer button, as can be seen in Figure 1, type in a search query. For instance, I’m trying to search for “London.” This can be seen in Figure 4.

That’s fantastic! Just like that, I was able to search for all customers that had the word “London” anywhere in their entity.

I can even do some wildcard searches; for instance, try searching for AR*. You’ll see that all of the objects returned have “ar” somewhere in the object. Also note that the returned object, as can be seen in Figure 4, contains all of the columns that you marked as retrievable when defining the index.

Remember that the Country field was special? You made it filterable. Can you search for AR* in just the UK? Sure, just use the search query like this:

AR*&$filter=Country eq 'UK'

This simple query should now show you the customers with the pattern AR only from the UK. Integrating this within your application is also quite trivial. All you need to do is pick the request URL from Figure 4 and execute a simple REST call to that URL with your query key in the api-key header.
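Here is a sketch of that integration. The service and index names are hypothetical; note that the $filter expression should be URL-encoded, and that the query key (not the admin key) goes in the api-key header.

```python
from urllib.parse import quote

# Hypothetical names -- substitute your own.
SERVICE = "my-search-service"
INDEX = "customers-index"

def query_url(search, filter_expr=None):
    """Build the GET URL for a search query against the index;
    send it with the query key in the api-key header."""
    url = (f"https://{SERVICE}.search.windows.net/indexes/{INDEX}"
           f"/docs?api-version=2019-05-06&search={quote(search)}")
    if filter_expr:
        url += "&$filter=" + quote(filter_expr)
    return url

# All customers matching AR*, restricted to the UK:
url = query_url("AR*", "Country eq 'UK'")
```

Issuing a GET to this URL returns the same JSON that the search explorer shows in the portal.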

Congratulations, you’ve just made yourself a neat little search engine with your data.

Figure 4: Searching for London


Depending upon your input data, which may be more than just images, feel free to check whatever seems fit.

Next, choose to customize the target index. I’ll simply accept the defaults, as shown in Figure 7.

Finally, choose to create the index.

Clicking the submit button causes the search engine to crawl the document. In my case, it’s a simple screenshot of one page of text, so it shouldn’t take too long. You can click the Refresh button you see in Figure 1 to keep up to date with your progress.

Once the search crawling is done, visit the search explorer and execute a search.

Now choose to click on “Import data” as shown in Figure 1 and choose Azure Blob Storage as the data source. Choose to go with the default parsing mode, choose to extract Content and Metadata, and point it to wherever your image is located.

Click Next to add cognitive skills. Here’s where things be-come interesting. Under the Add enrichments section, choose to Enable OCR and merge all text into merged_con-tent field as shown in Figure 5.

Notice the other capabilities you can tap into, as shown in Figure 6.

I know my data is a simple screenshot of this article with just text, so I’ll skip checking all those textboxes. Depend-

Figure 5: Enabling OCR

Figure 6: Additional cognitive skills

Figure 7: The target index

Enhance Search Applications with AI

Page 13: What’s Your - library.ashoka.edu.inlibrary.ashoka.edu.in/wp-content/uploads/2020/01/... · lets me read and write upside-down with great facility (I can’t write in cursive upside-down,

13codemag.com

vice solution, hosted in the cloud. This means that you can now bring the power of search into your applications with ease. Did you notice that there was no code in this article? Well, everything I showed can be done via the SDK or the REST APIs. That’s the gentle learning curve of Azure Search putting all that power in your hands with such ease.

And then you add AI to the mix and the power multiplies exponentially. Now you can search any kind of content. You can issue search queries in various languages. You can make sense of completely unstructured data. Have you ever run into a law firm saying, “we have so many documents and we wish we could search through them easily” and they wish to keep their data private?

Azure search is your answer, and it’s an answer to many other commonly heard problems.

How will you use Azure Search? Do let me know.

Until next time, happy searching!

ously, the text you see here”, so let me just search for the word Seriously.

The search results can be seen in Figure 8.

This is truly mind-blowing. I just did an OCR search of an im-age. And it doesn’t even begin to scratch the surface of the possibilities here. For instance, if you were a media com-pany and all your photos, audio, and video were in Azure BLOB storage, just through a simple point-and-click, you could make all that media searchable.

Then you could issue a search query such as “Picture of Satya Nadella” and it’ll show pictures of Satya Nadella, as-suming your media library had such pictures. Or you could search “a dog lying on grass” and it’ll match the pictures. Or you could even issue queries in English, and it’ll match non-English documents via the magical powers of AI-powered language translation.

SummarySearch is a great feature to have. Users find it useful. So much so that they almost demand to see it in your appli-cations. But building a search engine is a non-trivial task. There are products out there that will help, but they’re ex-pensive to both buy and to run. And they need hardware, expensive and powerful hardware. Azure Search eliminates all such complexity by providing you with a search-as-a-ser-

Figure 8: Searching an image using OCR.

Sahil Malik

Enhance Search Applications with AI

Page 14: What’s Your - library.ashoka.edu.inlibrary.ashoka.edu.in/wp-content/uploads/2020/01/... · lets me read and write upside-down with great facility (I can’t write in cursive upside-down,


ONLINE QUICK ID 1911041

Synchronizing the In-Browser Database with the Server
Whether you’re building a traditional distributed system or an offline Web app, synchronizing data and reconciling conflicts are accompanied by some hard realities. Sometimes data gets stale, sometimes users update the same data simultaneously, and sometimes synchronization attempts fail. This article demonstrates how to gracefully resolve conflicts and synchronize disconnected databases. The examples explored in this article demonstrate how to work with the PouchDB API (Listing 1) as well as how to create a to-do list application that synchronizes with the server (Listing 2 and Listing 3). Figure 1 shows a screenshot of the running application. The application is available on GitHub at https://github.com/craigshoemaker/synchronize-dbs-demo.

Different Databases in Different Contexts
CouchDB (http://couchdb.apache.org) is a server-side multi-master document database that seamlessly synchronizes data among disconnected database instances. As data changes, a complete revision history for each document is stored, giving CouchDB the context to handle synchronization and resolve conflicts. As databases are synchronized, the revision history is used to decide which revisions prevail among the different versions. When dealing with conflicts, the revision information is used to allow users to select winning revisions.

A core aspect of CouchDB known as “eventual consistency” means that changes are incrementally replicated across the network. This same principle is at work when dealing with databases found inside a Web browser.

PouchDB (https://pouchdb.com) is a browser-based database interface that’s tailor-made to synchronize with CouchDB.

In the same fashion as with multiple server instances of CouchDB, data from PouchDB synchronizes with server-side databases. This means that data manipulated in a disconnected state from the server can seamlessly flow up to the server.

PouchDB is a JavaScript implementation of CouchDB that uses IndexedDB and, on rare occasion, Web SQL. The following similarities exist between PouchDB and CouchDB:

• The APIs are consistent. Although not identical, much of the code you write for PouchDB works directly against CouchDB.

• PouchDB implements CouchDB’s replication algorithm. The same rules are enforced on the client as exist on the server that decide how data is synchronized across multiple database instances.

Craig Shoemaker
craigshoemaker.net
@craigshoemaker

Craig Shoemaker is a developer, author, speaker, and Senior Content Developer for Microsoft on the Azure Functions team. From building samples and internal tools to writing articles, Craig helps developers around the world learn to build serverless applications.

As a Pluralsight author, Craig specializes in teaching JavaScript, HTML5, and IndexedDB.

In the future, Craig wants to learn how to tell a joke.

Figure 1: Screenshot of running application

PouchDB is a browser-based database interface that’s tailor-made to synchronize with CouchDB. This means that data manipulated in the browser can seamlessly flow up to the server.



• HTTP as a core transport. CouchDB exposes RESTful HTTP/JSON APIs that allow direct access to data. Exposing data through HTTP side-steps the data access layers often required to work with other databases. PouchDB capitalizes on this feature and sends JSON payloads via HTTP to interface directly with CouchDB.

Document Revisions
Synchronization is made possible by carefully tracking document revisions. Each document revision generates a unique identifier, known as the revision ID. There are two parts to a revision ID. The first part is a human-readable incrementing integer. The second part of the revision ID is a GUID-like value generated by the database API.

let localDB;

const api = {
  init: async () => {
    const databaseName = 'people';

    localDB = new PouchDB(databaseName);
    await localDB.destroy();

    localDB = new PouchDB(databaseName);
    console.log(localDB);

    api.seed();
  },

  add: async () => {
    const person = {
      _id: 'craigshoemaker',
      name: 'Craig Shoemaker',
      twitter: 'craigshoemaker'
    };

    const response = await localDB.put(person);
    console.log(response);
  },

  get: async () => {
    const person = await localDB.get('craigshoemaker');
    console.log(person);
  },

  update: async () => {
    const person = await localDB.get('craigshoemaker');
    console.log(person);

    person.github = 'craigshoemaker';

    const response = await localDB.put(person);
    console.log(response);
  },

  remove: async () => {
    const person = await localDB.get('craigshoemaker');
    console.log(person);

    const response = await localDB.remove(person);
    console.log(response);
  },

  getAll: async () => {
    const options = { include_docs: true, conflicts: true };

    const response = await localDB.allDocs(options);
    console.log(response);

    return response.rows;
  },

  syncer: {},

  sync: (live = true, retry = true) => {
    const options = {
      live: live,
      retry: retry
    };

    api.syncer = localDB.sync(remoteDB, options);
    api.syncer.on('change', e => console.log('change', e));
    api.syncer.on('paused', e => console.log('paused', e));
    api.syncer.on('active', e => console.log('active', e));
    api.syncer.on('denied', e => console.log('denied', e));
    api.syncer.on('complete', e => console.log('complete', e));
    api.syncer.on('error', e => console.log('error', e));

    // api.syncer.cancel();
  },

  resolveImmediateConflict: async (selectedSource) => {
    // databaseRecord and incomingRecord are defined elsewhere in the app
    const record = /database/i.test(selectedSource) ?
      databaseRecord : incomingRecord;

    const person = await localDB.get(record.id);
    person.title = record.name;

    const response = await localDB.put(person);
  },

  resolveEventualConflict: async (id, winningRevId) => {
    const options = { conflicts: true };

    // get item with conflicts
    const item = await localDB.get(id, options);

    // filter out the revision you want to keep
    let revIds = item._conflicts;
    revIds.push(item._rev);
    revIds = revIds.filter(conflictId => conflictId !== winningRevId);

    // an array of items to delete
    const conflicts = revIds.map(rev => {
      return { _id: item._id, _rev: rev, _deleted: true };
    });

    const response = await localDB.bulkDocs(conflicts);
  },

  seed: async () => {
    await localDB.bulkDocs([
      { _id: 'jimnasium', name: 'Jim Nasium' },
      { _id: 'ottopartz', name: 'Otto Partz' },
      { _id: 'dinahmite', name: 'Dinah Mite' }
    ]);

    console.log('The local database is seeded');
  }
};

Listing 1: Working with the PouchDB API



const app = new Vue({
  el: '#app',

  data() {
    return {
      localTodos: [],
      remoteTodos: [],
      localTitle: '',
      remoteTitle: '',
      isLiveSyncing: false
    };
  },

  created() {
    this.getAll();
  },

  methods: {
    async add(location) {
      const title = this.localTitle.length > 0 ?
        this.localTitle : this.remoteTitle;

      const todo = {
        _id: (new Date()).toISOString(),
        title: title
      };
      const response = await db[location].put(todo);
      console.log(response);

      this.localTitle = '';
      this.remoteTitle = '';
    },

    async get(location, _id) {
      return await db[location].get(_id);
    },

    async update(location, item) {
      const todo = await this.get(location, item._id);

      todo.title = item.title;
      const response = await db[location].put(todo);
      return response;
    },

    async remove(location, item) {
      const todo = await this.get(location, item._id);
      return await db[location].remove(todo);
    },

    async getAll() {
      const options = { include_docs: true, conflicts: true };

      const localData = await db.local.allDocs(options);
      this.localTodos = localData.rows;

      const remoteData = await db.remote.allDocs(options);
      this.remoteTodos = remoteData.rows;
    },

    manualSync() {
      db.synchronize();
    },

    liveSync() {
      db.synchronize(true);
      this.isLiveSyncing = true;
    },

    cancel() {
      db.cancel();
      this.isLiveSyncing = false;
    }
  }
});

Listing 3: Implementing a synchronized to-do list, using Vue.js

When you create a new document in the database, the revision ID is prefixed with the number 1 followed by a GUID-like value generated by the database API. In the following examples, a three-letter string is used instead of an actual GUID value to make the examples readable. When you create a document in the database, the first revision ID is generated, as shown in the following example.

1-abc

const db = {
  local: new PouchDB('todos'),

  remote: new PouchDB('http://localhost:5984/todos', {
    skipSetup: true,
    auth: {
      username: 'smoothwookie',
      password: 'ThisIsMyPassword!'
    }
  }),

  _sync: {},

  listenForChanges() {
    this.local.changes({ since: 'now', live: true })
      .on('change', app.getAll)
      .on('error', console.log);

    this.remote.changes({ since: 'now', live: true })
      .on('change', app.getAll)
      .on('error', console.log);
  },

  synchronize(live = false, retry = true) {
    const options = { live: live, retry: retry };

    this._sync = this.local.sync(this.remote, options);

    this._sync.on('change', e => console.log('change', e));
    this._sync.on('paused', e => console.log('paused', e));
    this._sync.on('active', e => console.log('active', e));
    this._sync.on('denied', e => console.log('denied', e));
    this._sync.on('complete', e => console.log('complete', e));
    this._sync.on('error', e => console.log('error', e));
  },

  cancel() {
    this._sync.cancel();
  }
};

db.listenForChanges();

Listing 2: Implementing a synchronized to-do list, database setup




Working with PouchDB
To begin working with a database in the browser, you first need to reference the pouchdb.js script in your HTML page.

<script src="scripts/pouchdb.js"></script>

Next, inside a script tag or in a separate JavaScript file, create a new instance of PouchDB. The constructor accepts the database name.

const localDB = new PouchDB('people');

As you create a new instance of PouchDB, the resulting object either points to an existing database or it creates a new database for you. In this case, a new IndexedDB database is created in the browser. PouchDB uses one of a series of adapters to interface with different databases. If you inspect the localDB instance in the browser console, notice that the adapter, as shown in Figure 2, is set as idb. This alludes to the fact that in the browser, PouchDB is using the IndexedDB adapter.

Figure 2: Create a local instance of PouchDB.

PouchDB is architected with a Promise-based API that provides an opportunity to use JavaScript’s async/await syntax when calling methods. The following snippet demonstrates how to add a new object to the database by calling the put method.

const add = async () => {
  const person = {
    _id: 'craigshoemaker',
    name: 'Craig Shoemaker',
    twitter: 'craigshoemaker'
  };

  const response = await localDB.put(person);
  console.log(response);
};

The result returned from the database resembles an HTTP response code. When successful, PouchDB returns a response with ok: true, the document’s unique identifier, and the revision ID value.

{
  ok: true,
  id: "craigshoemaker",
  rev: "1-747b2b81bf8ef992e8ec1f44aa737c48"
}

Once you have the identifier and revision ID, you can access and manipulate the data as you wish. To retrieve a record from the database, you pass the document ID to the get method.

const get = async () => {
  const person = await localDB.get('craigshoemaker');
  console.log(person);
};

The response from the database includes the full document data, including the unique identifier and revision ID.

{
  _id: "craigshoemaker",
  _rev: "1-747b2b81bf8ef992e8ec1f44aa737c48",
  name: "Craig Shoemaker",
  twitter: "craigshoemaker"
}

As the document changes, the prefix is incremented by 1 and a new GUID is generated. Therefore, when you update the document, the revision ID prefix advances from 1 to 2.

2-def

Revision IDs are updated in this way in concert with any data changes. Even if you delete the document from the database at this point, the revision ID advances to 3 and the document metadata is marked as deleted. Tracking with revision IDs allows the database to maintain a full revision history of each document. By sustaining a running revision history for every document, the database has the context necessary to replicate changes among different database instances.
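To make the revision scheme concrete, here is a small illustrative stand-in (not PouchDB's actual implementation): the numeric prefix advances by 1 on every change while the suffix is regenerated each time.

```javascript
// Illustrative only: PouchDB generates revision IDs internally. This
// sketch just shows the two-part "prefix-suffix" shape and the rule
// that the numeric prefix increments with every change.
const nextRev = (currentRev) => {
  const prefix = currentRev ? parseInt(currentRev.split('-')[0], 10) : 0;
  const suffix = Math.random().toString(16).slice(2, 10); // fake GUID
  return `${prefix + 1}-${suffix}`;
};

let rev = nextRev(null); // create: "1-..."
rev = nextRev(rev);      // update: "2-..."
rev = nextRev(rev);      // delete: "3-..."
console.log(rev.split('-')[0]); // prints "3"
```

Note that a delete advances the prefix just like any other change, which is why a deleted document's revision history keeps growing.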



To retrieve a list of documents from the database, you use the allDocs method. The response from allDocs varies depending on the options you provide. In the following snippet, the include_docs: true option is set, which tells the method to return full document data along with the query. The default value for include_docs is false and, when not enabled, the only information returned from allDocs is the _id and _rev values.

const getAll = async () => {
  const options = {
    include_docs: true
  };

  const response = await localDB.allDocs(options);
  console.log(response);

  return response.rows;
};

Figure 3: Return value from the allDocs method

Updating data in the database requires that you have the latest revision ID associated with a specific document. Often, the most reliable way to reference the latest revision ID is to get the latest version of the document from the database just prior to updating values. To update the document, you can call the get method, add or update the object’s values, and then call the put method to persist changes to the database.

const update = async () => {
  const person = await localDB.get('craigshoemaker');

  person.github = 'craigshoemaker';

  const response = await localDB.put(person);
  console.log(response);
};

Once updated, the response from the database includes the new revision ID, as shown in the following code snippet.

{
  ok: true,
  id: "craigshoemaker",
  rev: "2-101931707fec4f12ff20776d94690c9f"
}

Figure 4: State of the document after removal




The response, as shown in Figure 3, includes a rows array that holds data from the database. Inside each element, the id and key values are copied from the data document to make working with the data easier, and the entire document’s data is available via the doc property.

Removing a document from the database also requires reference to the unique identifier and latest revision ID values. The best way to get the latest values is to call get immediately before attempting to remove the document from the database.

const remove = async () => {
  const person = await localDB.get('craigshoemaker');
  const response = await localDB.remove(person);

  console.log(response);
};

The response from the database is reminiscent of the response returned from the get method. Here, you get back the document’s ID and a new revision number.

{
  ok: true,
  id: "craigshoemaker",
  rev: "3-70fb7e034b076663cd6861a46516c7f9"
}

Internally, the database hasn’t deleted your record, but has marked it as deleted by adding the _deleted property to the document. Figure 4 shows how a deleted record appears in the database.

In fact, if you tried to create a new document in the database with the same primary key value, instead of getting an entirely new revision ID, the database returns a document with a revision ID incremented from the deleted state. The following snippet shows the database’s response after creating a new document with the same ID as the previously deleted document.

{
  ok: true,
  id: "craigshoemaker",
  rev: "4-ffc5ec971505cfb9b37318877441e646"
}

The revision ID starts with a 4 instead of a 1, even though a new document is inserted into the database. Building on these API basics, you can begin synchronizing data between two databases.

Synchronizing with the Server
To synchronize with the server, you first need to create an instance of PouchDB in the client script that points to the server-side database. By providing PouchDB with a URL and authorization credentials, the browser creates a secure connection to the remote database.

const remoteDB = new PouchDB(
  'http://localhost:5984/people', {
    skipSetup: true,
    auth: {
      username: 'account_user_name',
      password: 'secret_password'
    }
  });

When you create an instance of PouchDB against the server, the adapter used is http, as shown in Figure 5. This means that each call to the PouchDB API is ultimately expressed as an HTTP call to CouchDB over the network. The benefit to you is that your application code remains unchanged regardless of whether your commands run against the local database or the server.

Figure 5: Create a remote instance of PouchDB

This example uses the PouchDB Authentication (https://github.com/pouchdb-community/pouchdb-authentication) plugin to handle authentication with the remote server. The plugin allows you to add options to the constructor that authenticate your connection to the server.




Once you have instances of PouchDB that point to both the in-browser database and the server, you can begin to synchronize data between the two.

The code required to handle synchronization accepts a few options. During synchronization, you can create a persistent live connection and choose to retry failed attempts. The following example creates a function that sets up synchronization between the local and remote databases.

let syncer = {};

const sync = (live = true, retry = true) => {
  const options = {
    live: live,
    retry: retry
  };

  syncer = localDB.sync(remoteDB, options);

  syncer.on('complete', e => {
    // handle complete
  });

  syncer.on('error', e => {
    // handle error
  });
};

The syncer object is declared outside the sync function so that you have access to the synchronization instance throughout the application. The arguments defined in the function allow you to select whether you want to establish a live connection and whether you want to retry failed synchronization attempts.

As the databases are synchronized, data flows bi-directionally. Data added to the remote database is replicated to the local database, and vice versa. Ultimately, the sync method is a wrapper for CouchDB’s underlying replication feature. As data is replicated among individual databases, conflicts are not just a possibility, but an inevitability.

Managing Conflicts
Dealing with conflicting data sits at the heart of any attempt to synchronize databases. Embracing the inevitability of conflicts, the CouchDB and PouchDB APIs make conflict management a first-class concern. The dual nature of the revision ID allows the database to resolve different types of conflicts. Conflicts are managed by continuously evaluating the revision ID during any operation that manipulates data. There are at least two different types of conflicts that arise when using PouchDB.

Immediate and Eventual Conflicts
An immediate conflict arises when you attempt to save changes to a document, but the revision ID provided is older than what’s in the database. For instance, let’s say a new record added to the database results in a revision ID of 1-aa1. As the document is updated, the revision ID becomes 2-aa2. If the first revision of the document (1-aa1) is cached somewhere and the user tries to persist that version of the document while the database holds a newer version, an immediate conflict is encountered.

To handle conflicts, any operation that manipulates data should be nested inside a try/catch block, giving you the chance to handle conflicts.

try {
  const response = await db.put(person);
} catch(error) {
  if(error.name === 'conflict') {
    // handle conflict
  } else {
    // handle other error
  }
}

The following code snippet shows the error object returned from the database during an immediate conflict. As conflicts are encountered, PouchDB returns a 409 (conflict) error.

{
  "status": 409,
  "name": "conflict",
  "message": "Document update conflict",
  "error": true,
  "id": "2019-06-08T12:33:00.169Z",
  "docId": "2019-06-08T12:33:00.169Z"
}

Figure 6: Conflicts array in a document

The easiest way to resolve this conflict is to fetch the document’s latest version, update the required values, and then attempt to save the document again.
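That fetch-merge-retry approach for immediate conflicts can be sketched as follows. To keep the example self-contained, an in-memory stand-in plays the role of the database; a real app would call PouchDB's get and put instead.

```javascript
// In-memory stand-in for a PouchDB-like database: put() throws a
// 409 'conflict' error when the supplied _rev is stale.
const makeDb = (doc) => ({
  doc,
  async get(id) {
    return { ...this.doc };
  },
  async put(candidate) {
    if (candidate._rev !== this.doc._rev) {
      const error = new Error('Document update conflict');
      error.name = 'conflict';
      error.status = 409;
      throw error;
    }
    const prefix = parseInt(this.doc._rev.split('-')[0], 10);
    this.doc = { ...candidate, _rev: `${prefix + 1}-xyz` };
    return { ok: true, id: candidate._id, rev: this.doc._rev };
  }
});

// Fetch-merge-retry: if the put conflicts, reload the latest
// revision, reapply the change, and try the put once more.
const saveWithRetry = async (db, doc, applyChange) => {
  applyChange(doc);
  try {
    return await db.put(doc);
  } catch (error) {
    if (error.name !== 'conflict') throw error;
    const latest = await db.get(doc._id);
    applyChange(latest);
    return await db.put(latest);
  }
};

// A stale revision (1-aa1) triggers the conflict path, then succeeds.
const db = makeDb({ _id: 'craigshoemaker', _rev: '2-aa2', title: 'old' });
const stale = { _id: 'craigshoemaker', _rev: '1-aa1', title: 'old' };
const demo = saveWithRetry(db, stale, d => { d.title = 'new title'; });
demo.then(result => console.log(result.rev)); // prints "3-xyz"
```

Expressing the change as a function (applyChange) is what makes the retry safe: the same edit can be reapplied to whichever revision turns out to be current.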




Painless IndexedDB

Even if you don’t have the need to synchronize data with a remote database, PouchDB is still a great library for working with IndexedDB. The API for IndexedDB is notoriously difficult to use and is based on callbacks and events rather than Promises. PouchDB makes adding, editing, and deleting data in IndexedDB easy and comes complete with modern JavaScript syntax support.

Craig Shoemaker

By contrast, an eventual conflict happens when a revision ID is mismatched during a synchronization attempt. Consider the situation when an existing document is updated in the browser and the resulting revision ID becomes 2-bb1. Then the same document is updated directly on the server, and that copy’s revision ID becomes 2-bb2. The document is updated for the second time in both locations, but the disconnected databases are unaware of each other’s change. Eventually, the databases will synchronize, and the conflict for this document must be resolved.

The CouchDB replication logic handles conflicts seamlessly. As databases are synchronized, the replication algorithm automatically selects a revision as the winner for you. As the winning version is selected, the metadata of the document is flagged as being in a conflicted state and is associated with an array of revision IDs that represent the conflicted versions.

When you retrieve data from the database, you have the option to request conflicts associated with a specified document. The following example demonstrates how you can request conflicts when calling the allDocs method.

const options = {
  include_docs: true,
  conflicts: true
};

const response = await localDB.allDocs(options);
console.log(response);

This code returns a collection of documents from the database, each of which includes an array of revision IDs that conflict with the current version of the document. Figure 6 shows a document in a conflicted state.

Storing conflicting revision IDs as document metadata allows your applications to always be aware of conflicts. By writing conflict-aware code, you can give users early and frequent opportunities to resolve conflicts by letting them decide which version is ultimately the winner.
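As a small illustration of conflict-aware code, a helper can scan the rows returned by allDocs (requested with conflicts: true) and surface the documents that need attention. The helper and sample rows below are my own invention, shaped like the allDocs response:

```javascript
// Pick out documents that carry a _conflicts array so the UI can
// prompt the user to choose a winning revision.
const findConflictedDocs = (rows) =>
  rows
    .filter(row => Array.isArray(row.doc._conflicts) &&
                   row.doc._conflicts.length > 0)
    .map(row => ({
      id: row.id,
      winningRev: row.doc._rev,
      conflictingRevs: row.doc._conflicts
    }));

// Invented sample rows in the shape allDocs returns.
const rows = [
  { id: 'jimnasium', doc: { _id: 'jimnasium', _rev: '2-aa1' } },
  { id: 'ottopartz', doc: { _id: 'ottopartz', _rev: '2-bb1',
      _conflicts: ['2-bb2'] } }
];

console.log(findConflictedDocs(rows));
// only 'ottopartz' is reported, with '2-bb2' as a conflicting revision
```

A list like this is a natural input for a "resolve conflicts" screen, where each entry offers the winning and conflicting revisions for the user to choose between.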

Resolving eventual conflicts involves fetching documents from the database with the associated conflict data. Con-flict data is not returned by default, so when you call the get method, you need to enable the conflicts option. Once data is returned from the database, you can allow the user to designate which revision is the desired version.

The following example extracts a document from the database with conflict information. The revision IDs are evaluated to find the winning revision, and the database is updated to mark all revisions as deleted except the winning revision.

// get item with conflicts
const item = await localDB.get(
  id, { conflicts: true });

// filter out the item you want to keep
let revIds = item._conflicts;
revIds.push(item._rev);
revIds = revIds.filter(
  conflictId => conflictId !== winningRevId);

// delete the rest of the items
const conflicts = revIds.map(rev => {
  return {
    _id: item._id,
    _rev: rev,
    _deleted: true
  };
});

const response = await localDB.bulkDocs(conflicts);

The call to get includes the document ID and an options object where conflicts is set to true. This tells the database to fetch the document with the matching ID and return its conflicting revision IDs in an array named _conflicts. Next, the revision IDs are isolated into a variable named revIds. The current winning revision ID is added to the array with revIds.push, and then the user-selected winning version is filtered out of the revision IDs array. Now the revIds array only contains revision IDs that weren’t selected by the user as the winning document version. These revisions are meant to be deleted from the database.

The map method is used to transform the revIds array into an array named conflicts. This becomes an array of objects that includes the unique identifier, the losing revision ID and the _deleted property set to true. This object array is then passed to bulkDocs to update the database revisions simultaneously.

Conclusion
Built as a multi-master database from the ground up, CouchDB makes conflict resolution a first-class concern. The replication logic, which powers synchronization, is robust enough to recognize conflicts, temporarily select winning versions, and provide the context necessary to allow users to decide how to resolve conflicted data. In the browser, PouchDB is a JavaScript implementation of CouchDB that makes it easy not only to carry out simple data operations but also to synchronize data from the browser to the server.



ONLINE QUICK ID 1911051

Julie Lerman
thedatafarm.com/blog
@julielerman

Julie Lerman is a Microsoft Regional Director, Docker Captain, and a long-time Microsoft MVP who now counts her years as a coder in decades. She makes her living as a coach and consultant to software teams around the world. You can find Julie presenting on Entity Framework, Domain-Driven Design, and other topics at user groups and conferences around the world. Julie blogs at thedatafarm.com/blog, is the author of the highly acclaimed “Programming Entity Framework” books, the MSDN Magazine Data Points column, and popular videos on Pluralsight.com.

Get Started with Serverless Azure Functions
“Serverless” is a hot tech term that doesn’t quite mean what it says. That’s because there truly is a server hosting serverless apps and functions. The point is that it feels like there’s no server, because you don’t have to handle or manage one, or worry about things like scaling, because it’s all done for you. Serverless functions aren’t simply Web services that you’re hosting in the cloud.

The functions are event-driven and have a beautiful way to orchestrate a variety of services through configurable triggers and bindings, reducing the amount of code you have to write. You can just focus on the logic you’re trying to achieve, not on the effort of wiring up the orchestrations.

HTTP requests, like a Web service, are just one type of event that can trigger your function to run. You can also wire functions up to other triggers, such as listening for changes in an Azure Cosmos DB database or a message queue. Other tasks you might want to perform can also be specified through configurable bindings that also don’t require code. For example, you can use an input binding to retrieve some data from a database that your function needs to process. Output bindings (again defined with configurations, not code) let you send results to another service. Your function only needs to create the results in code, and the binding will ensure that those results get passed on to their destination. A single function could be triggered by a write to your Cosmos DB database, then use an input binding to gather relevant data from the database, and then use a message queue output binding to update a cache. If you don’t need any additional logic, the function is totally defined by the trigger and bindings.
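As a rough illustration of how such an orchestration is configured rather than coded, a function.json along these lines pairs a Cosmos DB trigger with a queue output binding. All names and values here are hypothetical, and the exact property names should be checked against the Azure Functions bindings documentation:

```json
{
  "bindings": [
    {
      "type": "cosmosDBTrigger",
      "direction": "in",
      "name": "changedItems",
      "databaseName": "storeDb",
      "collectionName": "orders",
      "connectionStringSetting": "CosmosConnection",
      "createLeaseCollectionIfNotExists": true
    },
    {
      "type": "queue",
      "direction": "out",
      "name": "cacheUpdates",
      "queueName": "cache-refresh",
      "connection": "StorageConnection"
    }
  ]
}
```

The function body then only deals with the changedItems input and the cacheUpdates output; the runtime handles listening to the database and writing to the queue.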

All of these bindings and triggers remove many of the redundant tasks that you might otherwise have to perform, and they allow you to focus on the logic of the function. The Azure Functions service takes care of all of the server-related problems tied to hosting. Integration with Application Insights lets you monitor your functions to observe how they’re performing and being used.

The Structure of Azure Functions

The structure of Azure Functions is defined by a Function App that hosts one or more related functions. The app has its own settings, which are secure by default. This is a good place to store details like connection strings and credentials. Each function within the app is then a self-contained set of triggers and bindings with its own additional settings. The only things the functions share are the subdomain URL and the app settings. Figure 1 shows some of the Function Apps in my subscription. I’ve expanded the DataAPINode app so you can also see the three functions I created for that app.
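When you develop locally, those app-level settings have a local counterpart in a local.settings.json file that the tooling creates for you. As a sketch (the CosmosDBConnection entry is a hypothetical example, and the placeholder value is something you’d fill in yourself), the file looks roughly like this:

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "CosmosDBConnection": "<your Cosmos DB connection string>"
  }
}
```

Anything under Values becomes an app setting the function can read, which is why connection strings and credentials belong here rather than in code.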

Preparing Your Environment

Although it’s possible to create functions directly in the portal, both Visual Studio and Visual Studio Code have extensions that make it easy to develop, debug, and deploy Azure Functions. I’ll use VS Code and its Azure Functions extension.

I found an easy inspiration for a new function. I often need to know how many words I’ve written for things like conference abstract submissions, and I find myself copying the text into Microsoft Word to get that count. You’ll get to create your own function that returns character and word counts for a given bit of text.

I’ll be using Visual Studio Code along with its Azure Functions extension and a few other related extensions. If you haven’t used Visual Studio Code before, I invite you to install it (for free on macOS, Linux, or Windows) and try it out as you follow along. VS Code is cross-platform and a breeze to install. (Go to code.visualstudio.com to install and learn more.) You can also use Visual Studio 2017 or 2019, which have a similar extension built into the Azure workload. The VS extension doesn’t have the same workflow, however; you can see how to get started with it in the docs and then come back to walk through the functions built in this article.

In VS Code, start by installing the Azure Functions extension through the Extensions icon in VS Code’s Activity Bar (along the left side of the IDE). A prerequisite of the Azure Functions extension is the Azure Functions Core Tools. There are links to the OS-specific installers in the Prerequisites section of the extension details (marketplace.visualstudio.com/items?itemName=ms-azuretools.vscode-azurefunctions). I’ll warn you now that even for Windows, it’s an npm install, but it’s quick and painless.

You don’t need an account to use this extension unless you plan to follow the deployment test later on. Note that if you have a Visual Studio Subscription, you have an Azure account. If you have neither, you can get a free account at https://azure.microsoft.com/en-us/free/.

Creating Your First Function

I began with a parent folder named CodeFunctions, in case I want to add more functions later. Then I created a subfolder called WordCount. Open VS Code in the WordCount folder.

Next, you’ll use the Azure Functions extension to turn the WordCount folder into an Azure Functions project.
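If you prefer the command line, the Core Tools CLI that the extension drives under the hood can scaffold the same kind of project. This is a sketch of the rough equivalents (template and option names as of the v2 tools; check func --help if they’ve changed):

```
func init WordCount --worker-runtime dotnet
cd WordCount
func new --template "HTTP trigger" --name WordCount
```

The extension’s prompts, described next, walk you through the same choices interactively.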

Click the Azure icon in the Activity Bar to show the Azure explorer. You’ll see a window for Azure Functions. If you’re logged into an Azure account, the Azure Functions explorer will also show you the Function apps in your account (Figure 2). I have a lot of demo function apps in my subscription. To save resources in my account, I’ve stopped them all while they’re not actively in use.


Figure 1: Some Azure Apps and Functions in the Azure Portal for my account

Hover over the Functions toolbar (as I’ve done in Figure 2) to see the extension’s icons appear. The folder icon in the upper right-hand corner creates an Azure Functions project in a folder. The bolt creates a function inside the project. The up arrow deploys your function to the cloud, and the last one is a standard refresh icon.

Figure 2: The Azure Functions explorer listing my live functions as well as icons for working with local code

Click the folder icon to create an Azure Functions project. You’ll get a number of prompts to walk you through the project setup.

1. Select the target folder for the project: Browse to the WordCount folder if needed.
2. Select the language for your project from a drop-down list: I’m choosing C#, but other current options are JavaScript, TypeScript, Java, and previews for Python and PowerShell.
3. Select a runtime: This prompt shows only if the Core Tools aren’t in your system’s PATH. Choose Azure Functions v2 to build a .NET Standard-based function.
4. Select a template: Choose HttpTrigger.
5. Provide a function name: I named my function WordCount.
6. Provide a namespace: I entered CodeMagFunctions.WordCount.
7. Select Access Rights: For this demo, I’m using Anonymous to make things easy.

That’s it. The extension will then build out the assets needed for this project.

You’ll see a new “stake in the ground” HttpTrigger function file in the editor and the assets from the template in the solution explorer (see Figure 3). The full code listing is in Listing 1. Notice that there are four files in the .vscode folder inside the parent (CodeFunctions) folder. The extensions.json file is a VS Code recommendation file (code.visualstudio.com/docs/editor/extension-gallery#_workspace-recommended-extensions) to ensure that anyone who also works on this source will be alerted about installing required extensions. The WordCount folder has a few files in addition to the csproj and cs files. The most important of these new files is local.settings.json. This is where you store the local version of function app settings, like connection strings and credentials, if you’re using them. I won’t be doing any of that in this demo. When you have a Function App in Azure, which is comprised of one or more functions, there’s a set of application-level settings shared by all of its functions, and you have to add them explicitly. That’s what these settings align with.

Figure 3: Initial result of creating a function project using the Azure Functions extension

Testing the Default Function

Before modifying our function for counting words, let’s see the template logic in action. The full listing of the file, excluding using statements, is shown in Listing 1. This function takes an incoming string, assuming it’s someone’s name, and responds with “Hello,” to that name. The function is flexible, checking for the name both in the HttpRequest query parameters and in a request body (if one exists and has a property called “name”).

 1 public static class WordCount
 2 {
 3     [FunctionName("WordCount")]
 4     public static async Task<IActionResult> Run(
 5         [HttpTrigger(AuthorizationLevel.Anonymous, "get",
 6          "post", Route = null)] HttpRequest req, ILogger log)
 7     {
 8         log.LogInformation("C# HTTP trigger function
 9             processed a request.");
10
11         string name = req.Query["name"];
12         string requestBody =
13             await new StreamReader(req.Body).ReadToEndAsync();
14         dynamic data = JsonConvert.DeserializeObject(
15             requestBody);
16         name = name ?? data?.name;
17         return name != null
18             ? (ActionResult)new OkObjectResult($"Hello, {name}")
19             : new BadRequestObjectResult(
20                 "Please pass a name on the query string or
21                  in the request body");
22     }
23 }

Listing 1: Default function code created by the HttpTrigger template

Running this function isn’t the same as simply running a .NET Core app because it has to run the Azure Functions logic. The template that created the function project also created a VS Code launch configuration called “Attach to .NET Functions” in the launch.json file. If you click on the debug icon in the Activity Bar, you’ll see that the debug configuration is set to this. To debug the function, you can either click the green arrow next to that dropdown or press F5.

When you run the function, the Azure Functions SDK starts by calling dotnet build on the project. Then it runs a command from its own Command Line Interface (CLI): func host start. After this, you’ll see the Azure Functions banner (Figure 4), which is a good clue that things are working.

Figure 4: The Azure Functions logo displayed in the terminal when the SDK is running properly

Following some status messages, you’ll then see in yellow text:

Http Functions: WordCount: [GET,POST] http://localhost:7071/api/WordCount

The SDK will continue to perform tasks and report them in the terminal. Wait until it’s complete and displays the final message “Host lock lease acquired by instance ID …” Now it’s ready for you to try it out. You can CTRL-click (or CMD-click) the URL, which will open your browser, where you’ll see the message “Please pass a name on the query string or in the request body.” That’s because you still need to provide either a query parameter or a body. Modify the URL to include the name parameter, e.g., http://localhost:7071/api/WordCount?name=Julie. The browser will then display the response sent back from the function, which, in my case, is “Hello, Julie” as you can see in Figure 5.

Figure 5: The response from the default HttpTrigger function

When you’re finished testing the function, return to the VS Code terminal and press CTRL-C to stop the function.

Transforming the Template Logic

Now let’s modify the file created by the template. The key change will be adding logic to perform character and word counts.

I want the function only to read a request body, so I’ll remove the code to read a query parameter.

11 //string name = req.Query["name"];

And further down, I only need to read the data from the request body. While I’m at it, I’ll change the variable name from name to text.

old 16 //name = name ?? data?.name;
new 16 var text = data?.text;

Now it’s time for the logic that reads the text and creates an output with the character and word counts. Rather than creating yet another Azure Function to perform that task, I first added a nested class called DocObject to encapsulate the results of the analysis.

public class DocObject
{
    public string Text { get; set; }
    public int WordCount { get; set; }
    public int CharCount { get; set; }
    public int NonSpaceCharCount { get; set; }
}

Then I added a method (AnalyzeText) to the current function class. It returns an empty DocObject if there’s no text to analyze.

private static DocObject AnalyzeText(string text)
{
    var charsLen = text.Length;
    if (charsLen == 0)
    {
        return new DocObject { Text = text };
    }
    var noSpacesLen = text.Replace(" ", "").Length;
    char[] delimiters = new char[] { ' ', '\r', '\n' };
    var wordCount = text.Split(delimiters,
        StringSplitOptions.RemoveEmptyEntries).Length;
    var docObject = new DocObject
    {
        Text = text,
        WordCount = wordCount,
        CharCount = charsLen,
        NonSpaceCharCount = noSpacesLen
    };
    return docObject;
}

These both go after the Run method. I had some help from https://stackoverflow.com/questions/8784517/counting-number-of-words-in-c-sharp to find an efficient way to count words that takes some punctuation into account.

Finally, you’ll create a string from the results of AnalyzeText to send back in the function’s result. Here’s the new listing for the Run method of the WordCount function. I’ve removed the ILogger parameter from the method’s signature along with the log.LogInformation call from the original logic.

[FunctionName("WordCount")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Anonymous, "get",
     "post", Route = null)] HttpRequest req)
{
    string requestBody = await new StreamReader(req.Body)
        .ReadToEndAsync();
    dynamic data = JsonConvert.DeserializeObject(requestBody);
    string text = data?.text;
    if (text == null)
    {
        return new BadRequestObjectResult(
            "Please include text to analyze");
    }
    var docResults = AnalyzeText(text);
    var httpResponseText = $@"CountText results:
  Total Chars: {docResults.CharCount}
  Chars No spaces: {docResults.NonSpaceCharCount}
  Word count: {docResults.WordCount}";
    return new OkObjectResult(httpResponseText);
}

If you want to be sure that you haven’t made any typos, you can run dotnet build from the terminal window. I think that’s always a good idea.

Debugging the WordCount Function

One of the great features of the Azure Functions extension, in VS Code as well as Visual Studio, is that because you’re writing the function in the IDE, you can take advantage of all of the debugging features the IDE provides. If you’re new to VS Code, you may not realize that it has a very rich debugging experience with breakpoints, watch variables, CodeLens, and more. So, if you find that your function isn’t doing what you expect, you can set a breakpoint and debug as you would any other code.

Because the function now relies on a request body, you’ll need a way to compose the body and include it with the HTTP request to the function for testing. You may already be a wiz with tools like Fiddler, Postman, or the browser developer tools. I’m a fan of another VS Code extension called REST Client, by Huachao Mao.

REST Client Extension

The Visual Studio Code REST Client extension by Huachao Mao (marketplace.visualstudio.com/items?itemName=humao.rest-client) has over 2.5 million downloads, so I’m not alone in my fandom. If you want to try it, go ahead and install the extension and follow my lead. Otherwise, use the tool you’re familiar with.

I created a new folder in the .vscode folder called RESTClientTesters to keep these out of my function project. Add a new file called WordCountTest.http. The .http extension isn’t required for the REST Client to work; I just use it to identify my REST Client files. Enter an HTTP request, defining the method, headers, and body. Here’s the simple request I’m using to start. The request body itself is the last three lines.

POST http://localhost:7071/api/WordCount HTTP/1.1
content-type: application/json

{
    "text": "Hey, I built a serverless function!"
}

Before you can test the request, you’ll need to get the function running again. Put some breakpoints into the code if you want to step through and watch the logic in action, and use F5 to run it in debug mode. Remember to wait for the “Host lock lease acquired” message to know that the function is ready to accept requests. Then, with the WordCountTest.http editor window active, you can send the request. You can use keystrokes (Windows: CTRL-ALT-R; macOS: Opt-Cmd-R) or press F1 and select Rest Client: Send Request. The extension will open a new window to display the results returned from the function. The result shown in Figure 6 tells me that my text has 35 characters, 30 without spaces, and the word count is six.

Figure 6: Using the REST Client extension to test the function

Deploying My New Function to the Cloud

For my new function to be truly useful, I’ll need to deploy it to Azure. Even with 30 years of software experience, the term “deploy” still makes my heart rate go up a little. Luckily, both Visual Studio and VS Code make it easy. Remember the “upload” icon in the Azure Functions explorer shown in Figure 2? As long as you’re logged into your account, there’s not much to deploying this function. In my case, I’ll need to ensure that the Function App is created first and then the WordCount function inside of it. Also, keep in mind those local settings I pointed out earlier. For this beginner function, I didn’t do much that involved settings, such as defining bindings or providing connection strings or credentials. You’ll get a chance to upload your local settings at the end of the deployment process.

Go ahead and click the upload button. You’ll follow a series of prompts as you did when creating the function. First, create a new Function App in Azure (as opposed to adding it to an existing function app). You’ll see that there’s also an Advanced version of this option, but choose the simple version for this demo. You’ll need to provide a new name for the function app. The prompt asks for a “globally unique” name. That doesn’t just mean unique to your account, but to anyone’s account in the whole world. That’s because the final URI will be a subdomain of azurewebsites.net. I chose CodeMagFunctions for this example. The simple version of “Create new Function App” doesn’t give you the chance to select a resource group or choose the location. The advanced option lets you specify these and additional settings for the new Function App. You can also modify settings in the Azure Portal after the fact.

After the extension creates the new Function App, it zips up the compiled function and pushes it up to Azure. You’ll get some status reports and then a notification when it’s done. At the end, you’ll be prompted to upload the settings. I didn’t need to do that, so I just closed that prompt window.

Because it’s a zip file deployment, the function code will be read-only in the portal. You’ll get a message about that with guidance to change an app setting if you want to edit directly in the portal. Essentially, if you’re creating these in VS Code or VS, the assumption is that you’ll make any changes in your IDE and then re-deploy the updated function.

Figure 7: Response from running the second POST request with the REST Client extension

Figure 8: Cosmos DB document created by the Cosmos DB output binding



Julie Lerman


My function was published into the new Function App, and the notification tells me the URI is https://codemagfunctions.azurewebsites.net/api/WordCount. You can continue to use the REST Client or your HTTP tool of choice to test your live function. Note that I probably won’t leave my function public for long, to avoid eating up the Azure credits associated with my Visual Studio subscription.

To test from VS Code, I added a new POST command to the WordCountTest.http file that points to the new URL. Then I was able to select that full command (lines 8 through 13), press F1, and run the REST Client Send Request command again. The extension only ran the request I selected in the file. Figure 7 shows the new POST message and the response.
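As a sketch of what that .http file might look like at this point (REST Client also lets you separate requests with a line of three # characters, so you can run whichever request the cursor is in):

```
POST http://localhost:7071/api/WordCount HTTP/1.1
content-type: application/json

{
    "text": "Hey, I built a serverless function!"
}

###

POST https://codemagfunctions.azurewebsites.net/api/WordCount HTTP/1.1
content-type: application/json

{
    "text": "Hey, I built a serverless function!"
}
```

The first request targets the local host started by the Core Tools; the second targets the deployed Function App.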

Next Steps: Try Out Some Bindings!

I originally started my Azure Functions journey by building and testing them directly in the portal, where you can easily add trigger, input, and output bindings by clicking on UI elements. As my comprehension of how the functions worked increased, the Azure Functions extension for VS Code also evolved, and I eventually transitioned to using VS Code with the extension to build, debug, and deploy the functions. I’ve also created a number of functions that read data from Azure Cosmos DB with input bindings, store data into Cosmos DB with output bindings, and even send text messages with an output binding for the Twilio API. You can see all of this in action in a recorded conference session I gave at Oredev in 2018, which is available on YouTube at https://youtu.be/fp9bB3L5utM.

I won’t detail how to do all of this in this already lengthy article, so here’s a quick look at what a Cosmos DB output binding looks like. This does require an Azure account and an existing Cosmos DB account. See this documentation to create a Cosmos DB account: https://docs.microsoft.com/en-us/azure/cosmos-db/how-to-manage-database-account. In C#, the bindings are described, like the trigger in our WordCount function, as attributes of the Run method in code. This is quite different from how they’re configured in a JavaScript or C# script-based function, where the configurations live in a JSON file.

I’ve modified the WordCount function’s Run method to include a Cosmos DB output binding attribute that will store the documents in a container called Items in my database, with a connection string now defined in local.settings.json:

public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Anonymous, "get",
     "post", Route = null)] HttpRequest req,
    [CosmosDB(
        databaseName: "WordCounts",
        collectionName: "Items",
        ConnectionStringSetting = "CosmosDBConnection",
        CreateIfNotExists = true)]
    ICollector<DocObject> docs)
{

There are three ways to ensure that the data intended for the output binding is discovered by the binding. One is to create an out parameter in the method signature. Because I’m using an asynchronous method, that’s not possible with C#.

Another way to make sure the output binding is discoverable is to return the value, but I’m already using return to send the HttpResponse. The third way is to use an ICollector object. An ICollector is also needed if you want to send back multiple documents, but here I’m using the ICollector pattern to return a single document, because it solves both problems.
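For contrast, the out-parameter style mentioned above only works on a synchronous method. This is a sketch, not from the article’s project; the function name is hypothetical, and the binding names mirror the ones used above:

```
[FunctionName("WordCountSync")]
public static IActionResult Run(
    [HttpTrigger(AuthorizationLevel.Anonymous, "post",
     Route = null)] HttpRequest req,
    [CosmosDB(
        databaseName: "WordCounts",
        collectionName: "Items",
        ConnectionStringSetting = "CosmosDBConnection")]
    out DocObject doc)
{
    // Every code path must assign the out parameter;
    // the runtime writes it to Cosmos DB when Run returns.
    doc = new DocObject { Text = "placeholder" };
    return new OkObjectResult("stored");
}
```

The compiler’s requirement that out parameters be assigned on every path is exactly why this style doesn’t mix with async methods.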

After my code has received the populated DocObject in the docResults, I’ll add that to the docs ICollector object:

docs.Add(docResults);

That’s all I need to do. The output binding takes care of the rest, which is one of the truly amazing benefits of bindings. Now when I run my function, not only do I get the HttpResponse, but the document shown in Figure 8 is added to my Cosmos DB database (which, by the way, was created on the fly thanks to the CreateIfNotExists setting).

Summing Up

There’s so much more to learn about using Azure Functions and how you can use them as part of your production solutions, not just little demo apps. I love working with tools that aren’t just productive but are a joy to use; the combination of Azure Functions, Visual Studio Code, and the Azure Functions extension definitely falls into this category!


ONLINE QUICK ID 1911061

Women in STEM, an Interview

Sumeya Block is a high school student who’s discovered that coding is creative, builds communities, and provides an excellent platform for activism. Through her explorations, she’s made a good friend of JavaScript coder Sara Chipps. The two of them interviewed each other, and they’re letting us listen in. During this interview, we learn more about Sara’s own encounters, advice, and work, and how what she had to say was inspiring to Sumeya, and can be to all of us. We can also pick up that sense of wonder and excitement from Sumeya’s infectious interest. Here’s Sumeya’s introduction to the interview.

Sumeya Block

Sumeya is a passionate writer, lover of creative expression, and a recent Teen Tix press corps writer. She’s currently in her sophomore year of high school and spends most of her time going to poetry slams, writing art reviews, and speaking at events. She’s been published in The Evergrey, the Teen Tix blog, and in the Poetry on Busses contest. She has also presented at CppCon and .NET Fringe. When Sumeya isn’t running around, she enjoys bingeing Netflix, reading books, and attending social events.

Sara Chipps
[email protected]

Sara’s an engineering manager at Stack Overflow and the cofounder of http://Jewelbots.com. Sara was formerly the CTO of http://FlatironSchool.com and in 2010, she cofounded Girl Develop It, a non-profit focused on helping women become software developers.

“When I think of inspiring women who are making a difference in the tech world, a few women come to mind. One is Sara Chipps, JavaScript lover and co-founder of Girl Develop It, where women can learn computer programming skills online. She’s currently at Jewelbots (which she also co-founded). Jewelbots launched on Kickstarter just over six years ago and, since then, has been committed to getting girls interested in STEM fields. Jewelbots currently sells two products. One is a Jewelbits science kit that sparks creativity through DIY neon-colored light-up signs. The second is a programmable friendship bracelet that can be used to talk to friends through Morse code. It can light up when paired with other bracelets and do even more as the users develop their coding skills.

“When I called Sara on a rainy New York night, she talked admiringly about the women she works with and mentors. She talked about how she and others (not just other women) can support their female coworkers by stressing the importance of reaching out and sharing opportunities. Talking to Sara, I learned about why she continues working to create spaces for girls and women to learn about STEM. I learned more about her own encounters, advice, and work. And what she had to say was inspiring.”

Sara’s Answers as Asked by Sumeya

When did you first become interested in tech, and was there a moment when you knew you were going to be a computer programmer?

I was around 11 or 12. This was before the Internet existed, and there were these things called BBSs (bulletin board systems) that were linked to your computer and were like early chat rooms. I used to hang out on those a bunch and realized how much I loved computing and computers because they could make communities happen. I knew I was going to be a computer programmer in my senior year of high school. I took a C++ class with a teacher named Mrs. Gaul, and for the first time, I felt like the computers thought the same way that I thought: very logically.

Do you still see sexism and discrimination in the workplace?

I definitely experienced it when I first started my career, and I know a lot of women who ended up leaving the industry because of it. I think the positive thing now, being on this side of my career, is that I can mentor younger women, and now that I’m older, I can step in when I see it happening to other women. I think it’s important that we’re all aware of these things and keep our eyes on them.

Before you started Jewelbots, you were in Girl Develop It. How does your work in both organizations help to encourage more diversity in STEM fields?

The thing they both have in common is helping to teach women and girls that coding isn’t something that’s impossible to achieve. It can be something that’s fun and powerful. Often, I heard from women in Girl Develop It classes that they didn’t know what an engineer was until they got to college, and by then they felt it was too late to learn or take the classes needed. The interesting thing about what I do now with Jewelbots is to help encourage younger girls.

Would you say that the environment has changed since those first girls became women? Is it the same for kids now?

I’ve really seen a push to get more girls involved at a younger age, and I think that’s really important. It’s important that we help girls understand that this is something that is for them, something that can really help their lives, and something that they can really have fun doing.

Jewelbots has really changed and it’s developed into a re-ally great community of girls. For me, that was something I always loved when I was first learning. As the CEO, how does the process of developing a product and working with beta testers like me change how you work?

I learned a ton! It really gave me respect for people who do product management and things that aren’t strictly engineering. Something I learned really early on is that a lot of assumptions that you make about a product can be wrong. Just because I happen to remember what it’s like to be a girl doesn’t mean that now, 20 years later, I understand what girls want today. One thing that it’s really taught me is to not make assumptions or pretend that I can understand what someone else might be facing just because I think I can imagine it or I think I can remember it from a long time ago. No matter what, it’s really important to talk to the people you’re building a product for.


Jewelbots is aimed at girls ages 8 to 14. Why that age group?

That’s one of my favorite questions. The age group is 8-14 because when I started talking to my peers who were coding, my male peers, I started asking them: why did you get into coding and your sister didn’t? What’s happening here? Why did you find this when the girls in your life did not? Usually what I heard is that they were in middle school or elementary school and they found a game, got really into gaming, and decided “when I grow up, I want to be a game developer.” Additionally, if you look at the research, you can see that somewhere in their preteen years is often when girls in western culture and the US start thinking about math and science as things that aren’t really for them. So that’s why we really wanted to aim for this age group, right at the same time they might be thinking that math and science aren’t for them. We want to reach them with products that show them that math and science ARE for them.

The Jewelbots YouTube channel has tons of different challenge videos. What do you like about them?

I like when the challenge videos focus on friendship stuff and doing cool colors with friends. I really like that because it requires more than one person interacting, which is super fun.

What do readers of CODE Magazine need to know in order to empower their female coworkers and young girls to keep pursuing fields of STEM and to feel encouraged to do so?

The best thing that people who already have a career can do is sponsoring. That means not just being a mentor to them, but also giving them opportunities that come to you. For example, if someone is recruiting you for a job or to speak at a conference, whether it’s something you’re not going to do or even something you want to, helping someone that you’re mentoring into that opportunity is a great way to sponsor them throughout their career. That might mean, if you know a young woman who’s studying computer science, just making yourself available for questions or for talking things through. It’s important to make sure that they have an opportunity on the other side of school, too.

How should people reach out?

Anyone can reach out and just say, “How can I be helpful?” or ask, “Are you job hunting?” “Are you practicing for interviews?” “Are you facing anything at work that you could use some help with from someone with more experience?” Just making yourself available and saying, “How can I help?” is a great way to do it.

Sumeya’s Answers to Sara’s Questions

Why is coding important to you?

It’s really important to me because I know that in this society, coding has definitely become prominent in general. Especially in my age group, technology is just so prominent and I really like coding as a great way to be creative. I haven’t really been able to do a lot of it since I started high school because high school is very demanding. But what I’ve always loved is the creativity about it. The community of getting to share the things you’ve done with other people. I think it’s important to know for the future because, like I said, it’s just so integral.

What’s the first thing you ever coded?

I’m pretty sure the first thing I coded was with my dad. We programmed this series of colors with the Jewelbots bracelet.

Is there anything you’ve changed about Jewelbots based on feedback received?

Initially, we were going to make a bracelet that could change colors to match your outfit instead of a friendship bracelet. I thought that was a great idea! I thought that I remembered what it was like to be ten, and I know I would have loved that. But when I started talking to some girls that age, they were like “that sounds really boring, I don’t think I would do that.” I was just like wow, it’s so good we talked to people before making this whole thing that they wouldn’t have liked.

Recently, Jewelbits were released. What are they? What can you do with them?

Jewelbits are STEM-themed craft boxes where you can learn certain science concepts. The first box is a Hello World Neon Box. It gives you all the components you need to make a neon-colored sign that lights up different colors. The point of the boxes is to introduce other STEM concepts. They are also less expensive than Jewelbots. One thing we heard from parents a bunch as we were selling Jewelbots is that the price point is a little high. That it’s not something that some people could afford to do, you know, if it wasn’t a birthday or Christmas. The one thing we set out to do is make something that’s more accessible pricewise and still delivers the great STEM content and education that Jewelbots does. Our next box is Hello World Lava, full of lava beads that you can make friendship bracelets with for your friends. It’s filled with tassels and letters, and the lesson is all about lava from volcanoes. It’s about how hot lava gets, where it comes from, and how volcanoes work.

How often are Jewelbits going to be released?

We plan to release a box every few months. So far, people have been having a lot of fun with them and making really cool neon-colored signs. I’m excited about that.

That sounds really fun! Are these lessons also being taught in science classes? What’s special about using these boxes, as opposed to learning about it in school?

That’s a really good question. Often, these are things that people are learning in science class. The difference here is learning through play, something we at Jewelbots believe in a lot. I was an okay student, but when I really cared about something and it was something that I could play with and have fun with, those are the things that I really remember now, 20 years later. The goal is more education and a better grasp and understanding of concepts through play.

Jewelbots has its own YouTube channel. In your opinion, why are these videos important and how do they create a community? Why is a community so important?

Community really helps with learning, whether it’s the YouTube community or any other type of on- or off-line community. People really respond when they see other people their own age doing the same things they are. I think that’s a really neat thing about kids and girls.

On the Jewelbots website, it’s stated that your age market is 8-14-year-old girls. What’s so special about this particular age group? Why is Jewelbots targeted at this audience?


you’re using. I think the other ways that I’m going to be using coding in my activism is creating social media. Social media is a really great way to carry all these messages across that we want to talk about, informing people and using freedom of speech to share ideas and communicate between different places. I think coding social media is a really great way to do networking and that’s how it could be involved in activism and really working to empower everybody to learn.

Do you feel that your peers are interested in coding?

Yes, definitely. I don’t know if it’s like that all over the country: I was just talking to my mom about that because I live in a very tech-centered city. In my school, we have a coding club and it’s full. All of my friends who are girls go to it. All of my guy friends go to it—everybody goes to it and it’s really great because when a computer programming class opened, no question, all of my friends signed up for it. I think what inspired them is that here, a lot of opportunities have been pushed to get girls into coding. Ever since I was in fourth grade, we’ve had a lot of people come to talk to us about tech and I think what’s really great is that the schools I’ve been to have taken care to represent that there are males who work in coding and also females who work in coding. There are women in tech who work really hard.

We’ve had several instances at my school where they took us out of class to go to workshops to hear about women in STEM. For example, many of the girls in my class went to Nordstrom’s when they held a workshop and they talked all about entrepreneurship—coming from women from all different walks of life and ages. They talked about running websites and a lot of the things that you don’t know are happening at Nordstrom’s. It was really cool because they all had super powerful positions. Even if you’re not interested in coding, you still know that it’s an option. And my friends who are interested in coding know that they can be developers when they grow up if they want to be. With Jewelbots, I learned how important that is, seeing that we’re working to empower girls and that it’s really working out and people are confident in their skills. That’s really awesome.

In that vein, what’s your favorite thing that you’ve built with coding?

I programmed this Pomodoro Timer, and I’ve talked about this several times, but it was really cool for me to see “Oh my God, I can change the colors!” Obviously, this was with Jewelbots. And once I learned even more about coding, I was able to create this program where I could set up a timer and it would go on for five minutes of rest and then 30 minutes of work, and then five minutes of rest and I could actually use that in my homework. For me, that was really exciting because it was a time when I was able to use code in my own life and incorporate it into my lifestyle. The second best thing is the game Catch the Leprechaun, which was really fun for me because it was really cool seeing what you could do with code. That you could actually create a game on a bracelet that you wouldn’t think you could do that much with was really exciting. So those are my two favorite things I’ve built to this day.

What is Sumeya’s life like in 2025?

Oh my God. I was actually just talking to my mom about this in the car and I was like, “Oh, that’s like 10 years away”

And it was my first time really getting into coding because in my classes, I did one of those websites where it’s not really coding but you just learn the basics of it. I got to do that, but then with Jewelbots, I was able to do more raw coding—like going into Arduino IDE and actually putting the language in, and I just programmed a series of colors. I remember that was so exciting to me because I got to see what I created and I was like, “Oh wow, there’s a lot I can do with this.” And I told all my friends about it. I remember it was a really great bonding experience for me and my dad because he’s a computer programmer who really loves programming and he got to share that with me. So yeah, the first thing I coded was a rainbow series of flashing lights to match the outfit I was wearing. And then later on, I learned how to do the actual friendship coding aspects.

Can you tell me a bit about the activism work that you’ve done in the past?

I used to run a middle school youth group called “Besought Youth.” It was aimed at creating spaces for Muslims ranging from 12 to 14 years old, to just talk about being Muslim and really anything that they want to talk about. I didn’t have a lot of friends who were my age who were Muslim. And I didn’t always feel welcome to be able to speak up during things because adults sometimes are like, “No, you’re the kid. You need to sit down.” So it was really great because it was an environment where we kids could talk and we did a lot of community-based work when I was running it, which I no longer am because I’m in high school now. But when I was, we held a food drive and in the middle of the year, we worked on a book drive.

After that, some other activism that I’ve been doing is I’ve become involved with this really great organization called Kids4Peace. I’m in the Kids4Peace local chapter but there are chapters all over the world. We work on interfaith connections. It’s really great because we visit churches, synagogues, mosques, and all of these places of worship. You get to learn about all the different religions but then additionally, we do a lot of activism-based work about equity and working against racism, working with women’s rights, and of course discrimination against religions such as anti-Semitism and Islamophobia. That’s the kind of activism I’ve been doing right now.

That’s so awesome that you’ve done so much and you’re so young. I bet your parents are super proud.

I think that young people have to start standing up now because there’s a lot of work to do. You can’t just sit down, you know? Everybody I know that’s around my age is really interested in activism and is definitely working to fight for our world because you know, we have to take it up in like five years.

How do you see coding empowering your career as an activist in the future?

I definitely think I’m going to include coding in my activism. I think that the stereotype is “You have to be good at math” or “You have to be good at science to do computer programming” and those are two things I’m not particularly great at. I really think that I can use my voice to advocate that everyone should learn how to code. It doesn’t matter if it’s going to be part of your profession. It’s really important to learn because it helps you to understand the world and the tools that

Women in STEM, an Interview


because I’m bad at math. And I’m like wait. Oh my God. It’s 2019. I’m going to be a junior in college in like five years. That’s terrifying. My mom is hoping I go to UC Berkeley, which would be fun because I love San Francisco. I think that what I’ll be doing is something with writing and journalism. I really love journalism. I think that I’ll be in college and definitely advocating for women, and I hope that I’ll be going to conventions, continuing my path of learning about coding because I think it’s important that everyone learns about it. And I also hope to continue my work with activism, empowering women and empowering all people, making sure to amplify the voices of those to whom injustices happen so they can be heard and their needs met.

The people at CODE Magazine want to know: What is the best way to inspire and excite young women to code?

Let’s see, something that I really love is just seeing these opportunities that I’m getting and that my friends are getting, and I really hope that’s happening across the nation. I think when code is talked about and celebrated, it’s definitely more exciting. Girls who learn about coding are more like, “Oh, this is like a real career.” I don’t think everybody knows exactly what coding is. I’ve talked to people who don’t understand how much of a great opportunity it is. I think one way that people who read CODE Magazine can help to empower girls and women is to see if young people can come in and learn about coding and learn about what you do at your workplace. One of the things that hinders people is not knowing what coding is and thinking that they’re not empowered to do it. And when you don’t learn, you might be like, “Oh, this is really cool, but it feels kind of out of reach.” Going to someone’s workplace and seeing that they’re coding and doing all these things, working or running a business, definitely shows that you can do it and you can be inspired to do it. The best thing people can do is just inspire us all to work really hard—all young people, not just women and girls—to get to our goals and to learn about coding.

SPONSORED SIDEBAR:

Need FREE Project Advice? CODE Can Help!

How does no-strings, free advice on a new or existing project sound? Need advice on migrating an existing application from an aging legacy platform to a modern cloud or Web application? CODE Consulting experts have experience in cloud, Web, desktop, mobile, microservices, and DevOps, and are a great resource for your team! Contact us today to schedule your free hour of CODE consulting with our expert consultants (not a sales call!). For more information, visit www.codemag.com/consulting or email us at [email protected].

Sumeya Block, Sara Chipps




ONLINE QUICK ID 1911071

Jeannine [email protected] @jrrnt

While at Microsoft, Jeannine worked as a tester and wrote technical documentation for machine learning products, including SQL Server Data Mining, SQL Server Machine Learning, and Azure Machine Learning Studio. She’s currently retired, which gives her more time to read about data science and run really inefficient code. She’s grateful to the many writers in the R-blogger and SQL Server community for their excellent examples and gentle explanations.

Stone Soup: Cooking Up Custom Solutions with SQL Server Machine Learning

This article describes the machine learning services provided in SQL Server 2017, which support in-database use of the Python and R languages. The integration of SQL Server with open source languages popular for machine learning makes it easier to use the appropriate tool—SQL, Python, or R—for data exploration and modeling. R and Python scripts can also be used in T-SQL scripts or Integration Services packages, expanding the capabilities of ETL and database scripting.

What has this to do with stone soup, you ask? It’s a metaphor, of course, but one that captures the essence of why SQL Server works so well with Python and R. To illustrate the point, I’ll provide a simple walkthrough of data exploration and modeling combining SQL and Python, using a food and nutrition analysis dataset from the US Department of Agriculture.

Let’s get cooking!

Machine Learning, from Craft to Pro

You might have heard that data science is more of a craft than a science. Many ingredients have to come together efficiently to process intake data and generate models and predictions that can be consumed by business users and end customers.

However, what works well at the level of “craftsmanship” often has to change at commercial scale. Much like the home cook who has ventured out of the kitchen into a restaurant or food factory, big changes are required in the roles, ingredients, and processes. Moreover, cooking can no longer be a “one-man show”; you need the help of professionals with different specializations and their own tools to create a successful product or make the process more efficient. These specialists include data scientists, data developers and taxonomists, SQL developers, DBAs, application developers, and the domain specialists or end users who consume the results.

Any kitchen would soon be chaos if the tools used by each professional were incompatible with each other, or if processes had to be duplicated and slightly changed at each step. What restaurant would survive if carrots chopped up at one station were unusable at the next? Unfortunately, the variety (and sometimes incompatibility) of tools used in data science means that a lot of work has had to be reinvented or created ad hoc and left unmanaged. For example, ETL processes often create data slices that are too big for analysis or they change core aspects of the data irreparably.

The core business proposition of integrating Python and R with SQL and the RDBMS is to end such duplication of effort by creating commercial-strength bridges among all the tools and processes:

• Your data science tools can connect securely to the database to develop models without duplicating or compromising data.

• You can save trained models to a database and generate predictions using customer data and leave optimization to your DBA.

• You can build predictive or analytical capacity into your ETL processes using embedded R or Python scripts.

Let’s look at how it works and how the integration makes it easier to combine tools as needed.
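To make the mechanism concrete, here’s a minimal sketch of the stored procedure the rest of this article builds on. It assumes SQL Server 2017 with Machine Learning Services (Python) installed and external script execution enabled; it simply echoes a SQL result set through Python:

```sql
-- Minimal sp_execute_external_script call: SQL rows in, Python, SQL rows out.
-- Assumes ML Services (Python) is installed and 'external scripts enabled' is on.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',   -- pass the input frame straight through
    @input_data_1 = N'SELECT CAST(1 AS INT) AS answer'
WITH RESULT SETS ((answer INT));
```

Inside the script, the query results arrive as a pandas DataFrame named InputDataSet, and whatever DataFrame you assign to OutputDataSet is returned to SQL Server as a result set.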

The article is targeted at the developer with an interest in machine learning (ML), who’s concerned about the complexity of ML and is looking for an easier way to incorporate ML with other services and processes. I’ve chosen “stone soup” as a metaphor to describe the process of collaboration between data scientists and database professionals to brew up the perfectly performant ML solution.

Security Architecture

First off, let’s be clear about the priorities in this platform: security, security, and security. Also, accountability and management at scale.

Data science, like cooking, can be tremendous fun when you’re experimenting in your own kitchen. Remove variables, mash data into new formats, and no one cares if the result is half-baked. But once you move into production, or use secure data, the stakes go up. You don’t want someone contaminating the ingredients that go into a recipe or spying on your data and production processes. So how do you control who’s allowed in the kitchen, when you can’t have just anyone involved in preparing your food or touching your data?

With ML in SQL Server, security and management are enforced at four layers (see Figure 1):

• Isolation of Python or R processes: When you install the ML service, the database server gets its own local instance of Python (or R). Only a database administrator or someone with the appropriate permissions can run scripts or modify installed packages. (No more installing packages from the Internet on a whim.)

• Secure lockdown of Python launcher: The stored procedure that calls the Python (or R) runtime is not enabled by default; after the feature has been installed, an administrator must enable external code execution


From the standpoint of the DBA, drawbacks include not just the crazy data scientists asking for Python installs, but new workloads. The administrator must allocate server resources to support ML workloads, which can have very, very different performance profiles. ML also uses new database and server roles to control script execution as well as the ability to install Python or R packages. Other new tasks for the DBA include backing up your ML data, along with your data science users and their installed libraries.

The SQL Server development team put a lot of effort into figuring out workflows that support data science without burdening the DBA too much. However, data scientists who lack familiarity with the SQL Server security model might need help to use the features effectively.

Package Installation and Management

Security is great, but the data scientist needs to be able to install open source Python or R packages. Moreover, they expect to install those new packages and all their dependencies straight from the Internet. How does this even work in a secured environment?

First off, the default installation includes the most popular packages used for data science, including nltk, scikit-learn, numpy, etc. SQL Server also supports installing new packages and sharing packages among a group of data scientists. However, the package installation process is restricted to admins and super users. This is understandable because new Python or R libraries can be a security risk. Also, if you install version x.n of a Python package, you risk breaking the work of everyone who’s been using a different version of the package.
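As a sketch of how you might inventory what’s already present in the server’s local Python instance (assuming ML Services is installed and external scripts are enabled), the embedded script can simply return the installed packages as a result set:

```sql
-- List the Python packages installed in SQL Server's local Python instance.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import pandas as pd
import pkg_resources
OutputDataSet = pd.DataFrame(
    sorted((p.project_name, p.version) for p in pkg_resources.working_set),
    columns=["package", "version"])
'
WITH RESULT SETS ((package NVARCHAR(128), version NVARCHAR(32)));
```

Checking versions this way before writing code helps you avoid the version-mismatch problems described above.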

at the server level, and then assign specific users the permissions to access data and run the stored procedure.

• Data access: Managed by traditional SQL Server security. To get access to data, you must have database permissions, either via SQL login or Windows authentication. You can run Python or R entirely in the server context and return the results back to SQL Server tables. If you need more flexibility, data scientists with permission to connect to the database can also connect from a remote client, read data from text files stored on the local computers, and use the XDF file format to make local copies of models or intermediate data.

• Internal data movement and data storage: The SQL Server process manages all connections to the server and manages hand-offs of data from the database to the Python or R processes. Data is transferred between SQL Server and the local Python (or R) process via a compressed, optimized data stream. Interim data is stored in a secure file directory accessible only by the server admin.
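The lockdown and permission steps described above amount to a couple of T-SQL statements. A sketch (the [data_scientist] principal is a placeholder name):

```sql
-- Run as an administrator. Enables the external script launcher (off by default).
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;

-- Then grant a specific user the right to call sp_execute_external_script.
-- [data_scientist] is a placeholder login/user name.
GRANT EXECUTE ANY EXTERNAL SCRIPT TO [data_scientist];
```

On SQL Server 2017, the change may also require restarting the Launchpad service before external scripts run.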

Whereas data science used to be a headache for control-minded DBAs, the integrated ML platform in SQL Server provides room for growth, as well as all the monitoring and management required in a commercial solution. Compare this to the old paradigm of exporting customer data to a data scientist to build a model on an unsecured laptop. Add in the SQL Server infrastructure that supports monitoring—who viewed what data, who ran which job, and for how long—infrastructure that would be complex to implement in an all-Python or R environment.

For details on the new services, and the network protocols used to exchange data between Python and SQL Server, I recommend the articles listed in Table 1 from Microsoft:

Now that I’ve touted the advantages of the platform, let’s look at some of the drawbacks:

From the standpoint of the data scientist (the freewheeling home cook, if you will), the framework is far more restrictive. You can’t install just any Python or R library onto the server. Some packages are incompatible with a database environment, and often the package you need isn’t compatible with the version installed with the server.

Some standardization and refactoring of your Python or R code will also be required. Just as in your commercial kitchen, where vegetables have to be diced to a particular size and shape or else, your data has to match SQL’s requirements. You can’t dump in just any Python code, either. Code typically has to be rewritten to work inside a SQL Server stored procedure. Usually this work is trivial, such as getting tabular data from SQL Server rather than a text file and avoiding incompatible data types.
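As a hypothetical sketch of that refactoring (the column names are invented): inside SQL Server, the input query arrives as a pandas DataFrame named InputDataSet and results are returned by assigning to OutputDataSet, so keeping the core logic in a function makes the same code usable both locally and in the stored procedure.

```python
# Sketch: adapting a local Python script to the sp_execute_external_script convention.
import pandas as pd

def analyze(frame: pd.DataFrame) -> pd.DataFrame:
    """The core logic stays the same whether data comes from a file or from SQL."""
    return frame.groupby("food_group", as_index=False)["kcal"].mean()

# Local development might read from a file instead:
# InputDataSet = pd.read_csv("nutrition.csv")

# Simulated here with an in-memory frame (stand-in for the SQL input query):
InputDataSet = pd.DataFrame({
    "food_group": ["fruit", "fruit", "dairy"],
    "kcal": [52, 89, 403],
})
OutputDataSet = analyze(InputDataSet)
print(OutputDataSet)
```

On the server, only the body of analyze() plus the final assignment would go into the @script parameter; the file-reading line disappears because SQL Server supplies InputDataSet.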

Introduction to the extensibility framework: https://docs.microsoft.com/sql/advanced-analytics/concepts/extensibility-framework?view=sql-server-2017

Network protocols and how Python is called from SQL Server: https://docs.microsoft.com/sql/advanced-analytics/concepts/extension-python?view=sql-server-2017

Security details at the database level: https://docs.microsoft.com/sql/advanced-analytics/concepts/security?view=sql-server-2017

Figure 1: Four layers of security and management for Python

Table 1: Security and architecture resources


Management, Optimization, and Monitoring

If you’re a database professional, you already know how to optimize server performance and have experienced the challenges of balancing multiple server workloads. For ML, you’ll want to make full use of your DBA’s knowledge in this area and think hard about server allocation. But you’ll also need to lean hard on your data scientist.

Let’s start with the basics. Calling Python (or R) does add processing time. Like any other service, you’ll notice the lag the first time the executable is called, or the first time a model is loaded from a table. Successive processing is much faster, and SQL Server keeps models in cache to improve scoring performance.

If you set up some event traces, you might also detect small effects on performance from factors such as:

• Moving data, plots, or models between a remote client and the server

• Moving data between SQL Server and the Python or R executables

• Converting text or numeric data types as required by Python, R, or the RevoScale implementations

(For the nitty-gritty details of performance, I strongly recommend the blog series by SQL Server MVP Niels Berglund on Machine Learning Services internals: https://nielsberglund.com/2018/05/20/sp_execute_external_script-and-sql-compute-context---i/)

Considered as a platform, SQL Server Machine Learning offers a lot of options for optimization. Several of the most important use cases have been baked into the platform. For example, native scoring uses C++ libraries in T-SQL (https://docs.microsoft.com/sql/advanced-analytics/sql-native-scoring?view=sql-server-2017) to generate predictions from a stored model very fast. Optimized correctly, this feature can generate as many as a million predictions per second (see One million predictions per second: https://blogs.technet.microsoft.com/machinelearning/2016/09/22/predictions-at-the-speed-of-data/).
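A hedged sketch of what native scoring looks like in T-SQL. The table, column, and model names below are hypothetical; the stored model must have been trained with a RevoScaleR/revoscalepy algorithm and serialized (rx_serialize_model) before being saved to the table:

```sql
-- Native scoring sketch: dbo.models, dbo.new_foods, and the column names
-- are placeholder names for illustration only.
DECLARE @model VARBINARY(MAX) =
    (SELECT model_object FROM dbo.models WHERE model_name = N'nutrition_model');

SELECT d.*, p.predicted_kcal
FROM PREDICT(MODEL = @model, DATA = dbo.new_foods AS d)
WITH (predicted_kcal FLOAT) AS p;
```

Because PREDICT runs in native C++ code, no Python or R runtime is invoked at scoring time, which is where the headline throughput numbers come from.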

The key phrase is “optimized correctly.” To get the super-performance goodies, you really need to know something about server configuration, SQL Server optimization, the algorithms and distributed computing features in RevoScaleR/revoscalepy, and, of course, some basic R or Python optimization. That’s a tall order, and it’s another reason you benefit from having multiple contributors to your stone soup.

For example, ML workloads can have very different profiles depending on whether the task is training or scoring, which algorithm has been used, and how big or wide the data is (see Figure 2).

Therefore, a database administrator typically has to perform or approve the installation. You can install new packages on the server directly if an admin gives you permissions to install packages. After that, installation is as easy or hard as any other Python install, assuming the server can access the Internet. Whoops. Fortunately, there are workarounds for that too.

The SQL Server product team has thought long and hard about how to enable additional packages without breaking the database, annoying the DBA, or blocking data scientists. Package management features in SQL Server 2017 let the DBA control package permissions at the server and database level. Typically, a super user with the correct database role installs needed packages and shares them with a team. The package management features also help the DBA back up and restore a set of Python libraries and their users. Remote installation is also supported for R packages.

Because this feature is complex, I won’t provide more details here. Just know that in a secure server, there are necessarily restrictions on package installation. Table 2 lists some great resources.

Some caveats before I move on:

• Azure SQL DB uses a different method for managing packages. Because multiple databases can run in a container, stricter control is applied. For example, the SQL Server ML development team has tested and “whitelisted” R packages for compatibility and use in Azure SQL DB. At this time, the R language is the only one supported for Azure SQL DB.

• There is no comparable “whitelist” of Python packages that are safe to run on SQL Server. ML in the Linux edition of SQL Server is also still in preview. Watch the documentation for more details.

Package management roles: https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2017/05/11/enterprise-grade-r-package-management-made-easy-in-sql-server/

Using sqlmlutils to install packages remotely: https://docs.microsoft.com/sql/advanced-analytics/package-management/install-additional-r-packages-on-sql-server?view=sql-server-2017

Table 2: Package management resources

Figure 2: Optimization depends on both data size and model complexity


and Agriculture and represents a summary of food stamp spending across the nation.

Finally, I’ll build a simple visualization using a Python package.
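To give a flavor of the exploration step before getting to setup, here’s a toy pandas sketch. The tiny DataFrame is a hypothetical stand-in for the USDA nutrition data, and the column names are invented:

```python
# Toy sketch of nutrition-data exploration; stand-in values, not USDA data.
import pandas as pd

foods = pd.DataFrame({
    "food": ["apple", "cheddar", "lentils", "rice"],
    "kcal_per_100g": [52, 403, 116, 130],
    "protein_g": [0.3, 25.0, 9.0, 2.7],
})

# Derive a simple feature: protein per calorie
foods["protein_per_kcal"] = foods["protein_g"] / foods["kcal_per_100g"]
best = foods.sort_values("protein_per_kcal", ascending=False).iloc[0]["food"]
print(best)
```

The same kind of script can run on a client against data pulled from SQL Server, or inside sp_execute_external_script with the query results as input.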

Prepare the Environment

For specific setup steps, see the Microsoft documentation. Links to all the pertinent setup docs are provided in Table 5, near the end of this section. Setup of the server takes about an hour, and ditto for the client tools.

Choosing which features to install, and which version, is the first step. Which features you install depends on what version is available and what you’ll be doing with ML. Figure 3 summarizes the versions of SQL Server that support ML.

For this demo, I installed Developer Edition of SQL Server 2017, because Python support first became available in that release.

You can use a laptop or any other personal computer with sufficient memory and processing speed, or you can create an Azure Windows VM. Remember, you need to meet at least the minimum SQL Server requirements and then have extra memory to support the Python or R workloads. Such an environment will let you try out all the features, including passing data between R and Python.

Your big data might require minimal resources if processed in parallel or in batches, compared to a neural network model using lots of features, or even a small dataset with an algorithm that must keep the entire dataset in memory until processing is complete.

If you're curious about the performance characteristics of a particular type of model, the ML literature these days is chock-full of interesting research on which algorithm is better or faster, how much data is enough, and what constitutes complexity. You can even find cheat sheets specific to a type of algorithm, such as a comparison of the different implementations of decision trees or neural networks in terms of feature size, processing capacity, etc. Table 3 is a good start on these resources.

The key to success is capturing a baseline so that your DBA and your data scientist can begin the process of optimization: figuring out which processes are taking the most time, and how you can adjust your data sources and code to streamline performance. The goal here is simply to provide a set of starter resources that you can use to begin to optimize a workload in SQL Server Machine Learning.
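Capturing that baseline doesn't require fancy tooling. As a rough sketch in plain Python, outside SQL Server (the function name and repeat count are my own invention), you can wrap any workload in a wall-clock timer and keep the best of several runs:

```python
import time

def baseline(fn, *args, repeats=3):
    """Run fn several times and return the best wall-clock time in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)                        # the workload being measured
        times.append(time.perf_counter() - start)
    return min(times)                    # best-of-n damps scheduler noise

# Time a trivial workload so there's a number to compare against later
elapsed = baseline(sum, range(100_000))
print(f"best of 3: {elapsed:.4f}s")
```

Comparing such numbers before and after each configuration change is the "baseline first" habit that the optimization resources in Table 3 all assume.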

Python and SQL: A Walkthrough

Let's get cooking! For this walkthrough, the goal is simple: get the server and client tools set up and learn the basics of the stored procedure sp_execute_external_script.

The first two sections cover basic setup of Machine Learning Services on SQL Server, as well as setup of R or Python client tools. If you already have SQL Server 2017 installed, including the ML features, you can skip the first part.

In the third and fourth sections, I'll explore a simple data set. The data was obtained from the US Department of Agriculture (USDA) and represents a summary of food stamp spending across the nation. Finally, I'll build a simple visualization using a Python package.

Optimize Windows Server and SQL Server: Although this case study was originally for R, most of the tips apply to Python models as well. The experiment compares a solution before and after server optimizations such as use of NUMA and maximum parallelism. https://docs.microsoft.com/sql/advanced-analytics/r/sql-server-configuration-r-services?view=sql-server-2017
Be sure to catch this part of the series, which covers use of compression and columnstore indexes: https://docs.microsoft.com/sql/advanced-analytics/r/sql-server-r-services-performance-tuning?view=sql-server-2017

Optimize for concurrent execution: The Microsoft Tiger Team captures real-world customer problems and periodically distills them into useful blogs. https://blogs.msdn.microsoft.com/microsoftrservertigerteam/2016/09/20/tips-sql-r-services-optimization-for-concurrent-execution-of-in-database-analytics-using-sp_execute_external_script/

Choose models and data processing methods: There are many ways that the RevoScale platform can improve performance: enhanced numerical computation, streaming and batching, parallel algorithms, and pretrained algorithms. This guide to distributed and parallel computing provides a high-level introduction to the types of distributed computing provided by the RevoScale algorithms. https://docs.microsoft.com/machine-learning-server/r/how-to-revoscaler-distributed-computing

Use pretrained models: Talk about a shortcut: the pretrained models in microsoftml (for Python and R) support sentiment analysis and image recognition, two areas where it would be impossible for most users to get and use enough training data. https://docs.microsoft.com/sql/advanced-analytics/install/sql-pretrained-models-install?view=sql-server-2017

Manage SQL resources: Resource Governance is an awesome feature for helping manage ML workloads, although it's available only with Enterprise Edition. https://docs.microsoft.com/sql/advanced-analytics/administration/resource-governance?view=sql-server-2017

Optimize for specific high-priority tasks: As noted earlier, fast scoring is particularly important for enterprise customers. There are lots of ways to accomplish this based on whether you're using a single server, distributed servers, or even a Web farm.
https://docs.microsoft.com/sql/advanced-analytics/r/how-to-do-realtime-scoring?view=sql-server-2017
https://docs.microsoft.com/machine-learning-server/operationalize/concept-what-are-web-services

Table 3: Performance optimization resources

Figure 3: Versions of SQL Server that support machine learning


That said, not everyone will need to set up a client, and such software might not be allowed in certain highly secured environments. If you can accept the limitations around debugging and viewing charts, you can develop and run everything in SQL Server.

However, there are good reasons to set up a client. One is that SQL Server Management Studio does not support the R graphics device and can't display charts returned from Python. If you want to view charts, find a client. For model development or data processing, it's not critical.

The download that installs the R and Python clients includes some basic tools, but not a full-fledged IDE. You might want to install an IDE or hook up an existing IDE.

• Jupyter notebook is included with the Python client install and offers support for charts.

• Spyder is included with the Python client install, but the IDE is not preconfigured to work with the revoscalepy packages.

• If you install another IDE to use as client, such as Visual Studio or PyCharm, you must create a Python environment that uses the correct libraries. Otherwise your IDE will, by default, use the Python executable named in PYTHON_PATH. Configuring this and getting it to work can be tricky. I used Visual Studio 2019, which has nice support for Python.

• The client included with the R install is much simpler to set up and configure. It’s also relatively easy to run RevoScaleR from RStudio or other popular IDEs.

• If you don't have administrative rights to the SQL Server computer, connection from a remote client can be tricky. See the Troubleshooting Tips and Known Issues list (in Table 4) for current firewall and network issues.

Table 4 is a list of some resources for setting up and troubleshooting.

If you have an existing setup, take some time to verify the version of the Python (or R) executable that is used by SQL Server. You can run the following code in T-SQL (either in SSMS or a remote client like Azure Data Studio) to verify the version of Python installed on the server:

EXECUTE sp_execute_external_script
    @language = N'Python',
    @script = N'
import sys
print(sys.version)
print("\n".join(sys.path))
'

The version of revoscalepy also must be the same on the server and any client you connect from. Run the following T-SQL code to check the revoscalepy version.
-- check revoscalepy version
EXECUTE sp_execute_external_script
    @language = N'Python',
    @script = N'
import revoscalepy
import sys
print(revoscalepy.__version__)
print(sys.version)
'

If you know that you won’t be building models, only running predictions, you have many more options. You can build a model on your beefiest server, save it to a table, and then copy that model to another server or into Azure SQLDB to generate predictions.

Before you begin installation, be aware that setup is a multistage process, which includes reconfiguring SQL Server to enable the ML features, possibly changing firewall or network protocols, an optional client install, and testing of client connectivity. Figure 4 shows these stages.

Caveats:

• Be sure to choose Machine Learning Services in SQL Server setup.
• Do not install the "standalone" Machine Learning Server. That's a different product, included in SQL Server setup mostly for licensing reasons. Basically, if you have an enterprise license agreement, you can install Machine Learning Server on a different computer and use it either as a development suite or for distributed computing on Windows/Linux without SQL Server.
• After setup, do run all the connection tests suggested in the documentation. And see the troubleshooting tips listed in Table 4.

After the server is installed and ML features have been enabled, consider whether you need to install a remote client. The free client tools from Microsoft basically give you the same R or Python packages that you run on the server, to use for testing and developing solutions remotely. You must have the client to interact with SQL Server; you can't run a vanilla instance of PyCharm or RStudio.

Figure 4: Multiple stages of set up and development

Set up SQL Server with Python: https://docs.microsoft.com/sql/advanced-analytics/install/sql-machine-learning-services-windows-install?view=sql-server-2017

Python client: https://docs.microsoft.com/sql/advanced-analytics/python/setup-python-client-tools-sql?view=sql-server-2017

R client: https://docs.microsoft.com/sql/advanced-analytics/r/set-up-a-data-science-client?view=sql-server-2017

Troubleshooting and known issues: https://docs.microsoft.com/sql/advanced-analytics/known-issues-for-sql-server-machine-learning-services?view=sql-server-2017
https://docs.microsoft.com/sql/advanced-analytics/common-issues-external-script-execution?view=sql-server-2017

What's different between the versions? https://docs.microsoft.com/azure/sql-database/sql-database-machine-learning-services-differences

Table 4: Set up and troubleshooting resources


Restrictions on Package Installation in SQL Server Support Security

You can’t use a Python or R user library no matter where it’s installed or how you call it. New packages must always be installed in the server context.

You might consider these added hurdles to be a blessing, given the headache caused by package instability and the proliferation of Python environments.


It’s not a practical example, but it demonstrates some key aspects of running Python (or R) code in SQL Server:

• The stored procedure generally takes T-SQL as input—not from a text file, console, or other ad hoc source.

• The column names from SQL aren't necessarily preserved in the output, although they're available to the Python code internally. You can change column names as part of your R or Python code.

• The default inputs and outputs are InputDataSet and OutputDataSet. All identifiers are case-sensitive. You can optionally provide new names for the inputs and outputs.

• Tabular data returned to SQL Server has to be a data frame (pandas for Python). Errors are generated whenever you return something that isn't a data frame, even if implicit conversion is sometimes allowed.

• The Launchpad service returns error messages and other status text to the console (in SQL Server Management Studio, the Messages pane). The error messages are generally helpful, although verbose.

• Providing a SQL schema for the output data is optional but can help your database developer.

A word before I go any further: SQL Server and Python are kind of like a chain saw and a food processor. Both can process huge amounts of data, but they differ in the way they chop it up. Python has lists, series, matrices, and other structures that often can be converted to a data frame, and sometimes can't. R, although it's a delightfully flexible language for data exploration, has its own quirks.

Such differences can break your code if you aren't aware of them or don't prepare your code to account for the differences. Be sure to review the data type conversion topics listed in Table 5, as well as the Known Issues in Books Online, before you port over existing code.


Basic Tools and Recipes

SQL Server Machine Learning uses a stored procedure to encapsulate all Python (or R) code, as well as some related services under the covers. All interactions with the Python executables are managed by SQL Server, to guarantee data security. You call the stored procedure like you would call any other SQL command and get back tabular results. This architecture makes it easy to send data to ML, or to get back predictions: Just make a T-SQL call.

The key requirements are simple:

• Pass in a supported language as nvarchar.
• Pass in a well-formed script, as nvarchar. With Python, the "well-formed" part can be tricky, as all the rules about spaces, indents, and casing apply even in the context of the SQL design window.
• Provide inputs from SQL Server (typically as a variable, query, or view).
• Align the inputs to the variables in your Python code.
• Generate results from Python that can be passed back to SQL Server. You get one and only one tabular dataset back, as a data frame, but can return multiple other values or objects as SQL variables (models, charts, individual values, etc.).
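To see how these requirements fit together, here's a toy harness in plain Python. It is purely illustrative (this is not how the Launchpad service actually works): it injects the input data under the chosen name, runs the script text, and insists that the output is a single pandas data frame, mirroring the contract described above.

```python
import pandas as pd

def run_external_script(script, input_df, input_name="InputDataSet"):
    """Toy stand-in for sp_execute_external_script's Python contract."""
    ns = {input_name: input_df, "OutputDataSet": None}
    exec(script, ns)                        # run the user's script text
    out = ns["OutputDataSet"]
    if not isinstance(out, pd.DataFrame):   # mirror the data-frame-only rule
        raise TypeError("OutputDataSet must be a pandas DataFrame")
    return out

demo = pd.DataFrame({"Region": ["midwest", "south"], "Pct": [0.4, 0.6]})
result = run_external_script(
    "OutputDataSet = InputDataSet[InputDataSet.Region == 'midwest']", demo)
print(result)
```

The payoff of the real architecture is the same as in this sketch: the caller only ever sees tabular input and tabular output.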

For now, let's assume that you just want to view the data from the USDA analysis in SQL and maybe do something with it in Python. (You can use your own data, but I've provided a database backup. The dataset is quite small by SQL standards.)

The following code merges two views as inputs and returns some subset from Python.

EXECUTE sp_execute_external_script
    @language = N'Python'
  , @input_data_1 = N'(SELECT * FROM [dbo].[vw_allmidwest])
        UNION
        (SELECT * FROM [dbo].[vw_allsouth])'
  , @input_data_1_name = N'SummaryByRegion'
  , @script = N'
import revoscalepy
import pandas as pd
df = pd.DataFrame(SummaryByRegion)
Midwest = df[df.Region == "midwest"]
OutputDataSet = Midwest'
WITH RESULT SETS ((col1 nvarchar(50), col2 varchar(50),
    Rank1 int, Amt1 float, Pct1 float,
    Rank2 int, Amt2 float, Pct2 float))

Data type mismatches and other warnings for SQL to Python conversion: https://docs.microsoft.com/sql/advanced-analytics/python/python-libraries-and-data-types?view=sql-server-2017

Data type mismatches and other warnings for SQL to R conversion: https://docs.microsoft.com/sql/advanced-analytics/r/r-libraries-and-data-types?view=sql-server-2017

Issues that apply only to R scripts: https://docs.microsoft.com/sql/advanced-analytics/r-script-execution-errors?view=sql-server-2017

Table 5: Known issues and data type conversion



I decided that it would be interesting to compare the three regions included in the report. A bar chart might work, but radar charts are also handy for displaying complex differences at a glance. A radar chart doesn't show much detail, but it does highlight the similarities, and might suggest some areas a nutritionist might want to dig into, such as the heavy use of frozen prepared foods. See Figure 5 for the summary of purchases by food type, per region.

The cool graphic was produced not in Python at all, but in Excel. The Python library matplotlib includes a function for creating a radar chart, but it was pretty complex, and I'm a Python amateur, whereas creating a radar chart in Excel takes only a few clicks. That's right; I don't particularly care which tool I use, as long as I can explore the data interactively. I don't have to ask my users to learn Python or code up an interactive interface for them; the data is in SQL Server and easily accessible to existing tools that my users are familiar with.

Explore Some Data

You're welcome to use AdventureWorks or any existing data to play with Python. The dataset provided here is extremely simple, and any similar dataset would do. For this article, I used nutritional and purchase data related to the food stamp program.

The Nutrition database (provided as a backup file) was imported from a USDA report on the Supplemental Nutrition Assistance Program, known as SNAP (or food stamps). The study analyzed food stamp purchases nationwide and classified purchases by food and nutrition type, to try to understand whether money was being well spent and how the program might be improved.

The principal finding by the USDA was that there were no major differences in the spending patterns of households that use food assistance vs. those that do not. For example, both types of households spend about 40 cents of every dollar of food expenditures on basic items such as meat, fruits, vegetables, milk, eggs, and bread. However, the study found some ways to improve access to healthy food choices for SNAP participants. For example, authorized retailers were required to offer a larger inventory and variety of healthy food options, and convenience-type stores were asked to stock at least seven varieties of dairy products, including plant-based alternatives.

I'll do some easy data exploration to find additional insights into food consumption by food stamp users:

• Differences between regions in terms of seasonal vegetable consumption, or meat purchases
• Top commodities for each region and for all regions
• Differences based on age of the head of household and poverty level of the surrounding community

Such descriptive statistics have long been the domain of traditional BI, and there are lots of tools for displaying nice summary charts, from Power BI to Reporting Services and plain old T-SQL, to the many graphing libraries in Python and R. Fortunately, in the SQL Server 2017 environment, you're not constrained to any one tool and can use whatever gets the job done.

Figure 5: Comparison of major food purchases per region

Figure 6: Differences between SNAP and non-SNAP households for infant formula


Food is Big Data

Uses for data science are growing fast in the food industry. For example, there's an app that can take a picture of your meal and use image recognition to tell you how many calories and other nutrients it contains. Another ambitious project uses neural networks and crowd-sourced data to identify possible connections between food allergens and diseases.


A secondary goal was to explore areas where the program could potentially be modified to assist SNAP users or improve store offerings. There are lots of ways to do exploratory analysis, and I would have loved to use one of the newer clustering methods, but the summary data was too sparse. So let's fall back on a favorite "broad qualitative method": the word cloud.

The wordcloud package in Python makes a handy example for these reasons:

• It's not installed by default, but it has very few dependencies. All those required packages are already installed.

• The method for outputting an image file is a useful one to know. Neither SSMS nor most other SQL tools (such as Azure Data Studio) can directly display graphs created by Python or R. Instead, you output the plot as a binary data stream, which you can use elsewhere. This method is important because it's also how you output a model and save it to a table for reuse.

Step 1. Install wordcloud on the server and client. The default installation of SQL Server Machine Learning includes many of the most popular packages for ML, and to try out the features, you don’t need to install additional packages. To see a list of all currently installed packages, you can run something like this:

-- view packages in instance library
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
import pip
import pandas as pd
installed_pkgs = pip.get_installed_distributions()
installed_pkgs_list = sorted(["%s==%s" % (i.key, i.version)
                              for i in installed_pkgs])
df = pd.DataFrame(installed_pkgs_list)
OutputDataSet = df'
WITH RESULT SETS ((PackageVersion nvarchar(150)))
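One caveat: pip.get_installed_distributions() was removed in pip 10, so on a newer client machine that call will fail. As a sketch of the same listing using only the standard library (requires Python 3.8+; this is my substitute, not the article's server-side code):

```python
from importlib import metadata

# Enumerate installed distributions as name==version strings
installed = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in metadata.distributions()
)
print(installed[:5])   # first few entries, alphabetically
```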

Assuming that wordcloud isn't already present, you can install it on the SQL Server instance by opening a command prompt as administrator at the folder containing the Python scripts library, typically C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\Scripts.

pip.exe install wordcloud

Installing wordcloud on the client might require a little more work. On my computer, multiple Python environments complicated the issue to the point where I finally removed all the old client tools and old Python versions and installed Visual Studio 2019 Community Edition. (Although only in preview release, it has a nice UI and some improvements in Python support.)

Ensure that the custom environment uses the downloaded Microsoft Python client, to match the Python libraries on SQL Server, and open a command prompt (not a Python interactive window) to run pip as follows:

python -m pip install wordcloud


As long as I had Excel open, I wanted to try out a new ML feature in Excel. The Ideas pane, which debuted in Office 365 in late 2018, takes any tabular data set and computes some quick correlations on every value in every dimension in the table. Imagine how complicated that code would be if you had to write it yourself! The Ideas feature returns a series of text boxes listing the most interesting correlations.

For example, I created some quick features on the dataset that represent the delta between target groups in terms of percentage expenditures. Features were generated for poverty level in the county, the age of the head of the household, and, of course, the region (Midwest, South, and West). Analyzing the total matrix of correlations for these features took about three seconds, and Excel returned 35 different “ideas.”
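You can approximate the kind of scan Ideas performs with a couple of lines of pandas. In this sketch the feature names and numbers are made up for illustration; they just mimic the delta features described above:

```python
import pandas as pd

df = pd.DataFrame({
    "SNAP_Pct":    [0.41, 0.35, 0.52, 0.47],   # hypothetical expenditure shares
    "NonSNAP_Pct": [0.40, 0.36, 0.30, 0.29],
})
df["delta"] = df["SNAP_Pct"] - df["NonSNAP_Pct"]   # derived delta feature

corr = df.corr()    # full correlation matrix, the raw material for "ideas"
print(corr.round(2))
```

Ideas then ranks and verbalizes the strongest of these correlations; the matrix itself is the boring part.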

The Idea presented in Figure 6 tells you that there is a bigger difference than expected between SNAP and non-SNAP households in terms of their purchases of infant formula.

The wording is a bit opaque, and you would have to do further analysis to see exactly what this means. But in fact, this particular correlation was one of the primary findings in the original USDA study, so the results are valid.

Other "ideas" from Excel suggested that ice cream purchases were higher in southern households than in other regions, and that southern households spent more money on meat than those in the West or Midwest. What's behind those differences? Ideas is a fun, easy way to start the data exploration process.

Predictive Analytics

In the original study, the goal of analysis was to determine whether there were significant differences between consumption patterns of food stamp users vs. other consumers, and the answer was clearly negative. That's okay. We tend to expect that brilliant insights will emerge from every set of data and forget that the original goal of statistical analysis is to disprove that an effect exists.


Step 3. Create the plot. Having extracted the table of words and weights, it’s simple to input the data to sp_execute_external_script as a view or query, and build a word cloud using Python or R. The Python script has these basic steps:

1. Import the required libraries.
2. Put word data from the SQL query into a data frame.
3. Create a word cloud using Python libraries.
4. Dump the plot object as a serialized variable using pickle.
5. Save the variable as an output to SQL Server.

You can see the full text of the stored procedure in Listing 1, but this excerpt shows the key steps:

from wordcloud import WordCloud, ImageColorGenerator

# Handle and prepare data
df = pd.DataFrame(WesternFoods)
descriptors = df.MergedText.values
text = descriptors[0]
wordcloud = WordCloud().generate(text)
plot0 = pd.DataFrame(data=[pickle.dumps(wordcloud)],
                     columns=["plot"])

The pattern of creating a complex object and saving it to a binary data stream is standard for handling complex structures like plots or predictive models. SQL Server can't understand or display them, so you generate the object, save it as a binary data stream, and then pass the variable to another SQL statement or client, or save it to a table.

In the case of a predictive model, you'll generally save the model to a table. That way you can save and manage models, and add metadata about when the model was run, on how many rows of data, and which prediction runs it was used for. To see an example of this process for models used in production, I recommend this tutorial from the Microsoft data science team: Python for SQL developers (https://docs.microsoft.com/sql/advanced-analytics/tutorials/sqldev-py3-explore-and-visualize-the-data?view=sql-server-2017).
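Stripped of the SQL plumbing, the serialize-store-score pattern is just pickle. Here's a minimal sketch using a stand-in dictionary in place of a real trained model (the coefficients are invented):

```python
import pickle

# Stand-in for a trained model; in practice this is a fitted model object
model = {"coef": [0.2, 1.7], "intercept": 0.05}

blob = pickle.dumps(model)   # this byte string is what goes into varbinary(max)

# Later, on the scoring server: read the column back and rehydrate
restored = pickle.loads(blob)
score = restored["intercept"] + sum(
    c * x for c, x in zip(restored["coef"], [1.0, 2.0]))
print(round(score, 2))
```

The metadata columns (run date, row counts, and so on) live alongside the blob in the same table row, which is what makes model management in SQL so convenient.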


Step 2. Prepare text data used for the word cloud. There are many ways to create and provide weights to a word cloud. To simplify the demo, I used Python to process the list of top commodities for each region and wrote that data back to a table.

Data preparation is another area where the SQL Server platform gives you the ability to use the most convenient, fastest tool for the job. This data set had very short text entries, so I merely concatenated the text and removed nulls, but you can imagine text data sources where the ability to process data in Python's nltk and return the tokenized text to SQL Server would be useful. On a later iteration, I'll probably add a stopword list or expand abbreviations.

INSERT [dbo].[WesternFoods]
EXECUTE sp_execute_external_script
    @language = N'Python'
  , @input_data_1 = N'(SELECT Region, [Subcommodity], [CompositeSubcat],
        [OtherSubcat], [SNAP_Pct] FROM [dbo].[vw_FoodListWest])'
  , @script = N'
import revoscalepy
import pandas as pd
df = pd.DataFrame(InputDataSet)
# prevent Python from inserting None
df = df.fillna("")
df["mergedtext"] = df["Subcommodity"].map(str) + " " + df["CompositeSubcat"].map(str)
print(list(df.columns.values))
OutputDataSet = df[["Region", "SNAP_Pct", "mergedtext"]]'

Note: There's some slight cost incurred when moving data between SQL Server and Python, but the pipeline is highly compressed and optimized; certainly, it's faster than moving data across the network to a Python client.

USE [Nutrition]
GO

/*** Object: StoredProcedure [dbo].[mysp_Westernwordcloud] ***/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[mysp_Westernwordcloud]
AS
BEGIN
EXECUTE sp_execute_external_script
    @language = N'Python'
  , @input_data_1 = N'SELECT CAST(ROUND([Weight1] * 100, 0) AS INT) AS Weights,
        [MergedText] FROM [dbo].[WesternFoods]'
  , @input_data_1_name = N'WesternFoods'
  , @script = N'
import revoscalepy
import pandas as pd
import pickle
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

# Handle and prepare data
df = pd.DataFrame(WesternFoods)
descriptors = df.MergedText.values
ncounts = df.Weights.values
print(type(ncounts))

text = descriptors[0]
print(type(text))

# Generate basic plot
wordcloud = WordCloud().generate(text)

# Serialize and output to variable
plot0 = pd.DataFrame(data=[pickle.dumps(wordcloud)], columns=["plot"])
OutputDataSet = plot0
'
WITH RESULT SETS ((plot varbinary(max)))
END
GO

Listing 1: Stored procedure that creates the Western word cloud


For plots, you need some way to view the object, and there are several options:

• Save the plot to a local image file, and then copy the file elsewhere to view it. That way, you aren’t inviting people to open image files on the server.

• Save the binary object to a table.
• Pass the binary variable to a Python client so that it can be read and displayed.

I recommend using a Python client to view charts. Configuring the client correctly is critical. To ensure compatibility, the Python version, the revoscalepy version, and the package versions must exactly match what's installed on SQL Server.
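On the client side, a two-line check tells you what you're actually running, to compare against the server's output from the earlier T-SQL version checks:

```python
import sys

print(sys.executable)            # which interpreter this client really uses
print(sys.version.split()[0])    # e.g., the 3.5.x line for a SQL Server 2017-era client
```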

However, the clients provided in the Microsoft download don’t come preconfigured to use the right libraries; you’ll need to supply the Python environment definition yourself. The documentation has guidance on the properties that need to be supplied in the environment.

Set up a Python client: https://docs.microsoft.com/sql/advanced-analytics/python/setup-python-client-tools-sql?view=sql-server-2017. You can see the dialog in Figure 7.

In my case, connections from several tools failed, possibly because so many old Python clients and environments were cluttering the system. After uninstalling several Python IDEs, I installed Visual Studio 2019 Community Edition as the client and configured a custom environment using the tips in the documentation. It worked great, and Visual Studio 2019 has improved support for Python, but other Microsoft demos have used Visual Studio 2017 and PyCharm. Let me know what works for you!

Because my client is installed on the same computer as SQL Server, I also created a custom environment that points to the server libraries, as you can see in Figure 7. Typically, you’d never run code using this environment unless you had a problem you needed to debug.

The code that loads the word cloud in the client as a plot is shown in Listing 2. I ran the code using the interactive window, which opens the plot in a separate graphics window.

The first word cloud, in Figure 8, is much too simple, of course. I’ll want to add weights, adjust font and canvas size, and tweak the colors. However, now that I have the code in a stored procedure, it’s relatively easy to change the parameters and create different graphic objects. I can also clone working code to use other regional data sets and add metadata to the plots table. In short, the use of stored procedures makes it easy to build on existing code, pass in parameters, and generate metadata for storing with the plot objects.

Advanced ML Recipes for the Enterprise

I've illustrated the integration of Python and R with SQL Server using a food dataset because it was an interesting domain. However, astute readers will have noted that the dataset was extremely small and didn't really require the resources of the server. SQL Server Machine Learning can certainly be used for this type of exploration, but it's really

Figure 7: Configuring a custom Python environment

Figure 8: Initial word cloud showing Midwest purchases

import matplotlib
import pyodbc
import pickle
import os
from matplotlib import pyplot as plt
from matplotlib import rcParams
rcParams["figure.figsize"] = (10, 10)

# connect to database and get plot object
nutritiondb = pyodbc.connect('DRIVER=SQL Server;SERVER=LEGIONJ1060;'
                             'DATABASE=Nutrition;Trusted_Connection=True;')
cursor = nutritiondb.cursor()
# From a stored procedure, use the following line
# cursor.execute("EXECUTE [dbo].[mysp_Westernwordcloud]")
# From a query on a table with saved plots, use this line
cursor.execute("SELECT * FROM [dbo].[pythonplots]")

# Display each saved plot using matplotlib
tables = cursor.fetchall()
for row in tables:
    WC = pickle.loads(row[0])   # deserialize the WordCloud object
    plt.imshow(WC)
    plt.axis("off")
    plt.show()
nutritiondb.close()

Listing 2: Viewing the wordcloud from a remote Python client

41Stone Soup: Cooking Up Custom Solutions with SQL Server Machine Learning


codemag.com

Food Is Big Business

Enrollment in food assistance programs grew from nine percent of the U.S. population in 2007 to about 15% (47.6 million Americans) in 2013.

After the recession, participation gradually declined, to 43.4 million people as of July 2016 and 40.3 million in 2018.

• Formally defining the business problem and the data required

• Defining the scope and lifecycle of the data science project. Describing the people who are required in a large data science project and their roles

• Providing to partners the detailed requirements in terms of packages, server resources, data types, data SLAs, etc.

• Specifying ownership and SLAs for related operations such as data cleansing, scoring pipelines, backups, etc.

In case you're thinking "this is all too complex for my little project," consider how many applications have started as demo projects but ended up in production and ran for years with scant documentation. Given that data science projects typically entail massive amounts of data that change frequently, with small tweaks to algorithms that few people understand, best start documenting early!

Scaling Up = Changing Your Recipe

A core value proposition for integration of SQL Server with an open source language like Python (or R) is to increase the limited processing capacity of Python and R to consume more data, and to build models on wider data sets. Revolution Analytics did the pioneering work on scaling R to meet enterprise needs, and their acquisition by Microsoft led to the incorporation of R (and later Python) into the SQL Server platform.

Other solutions exist to support scaling ML, of course: dis-tributed cloud computing, specialized chipsets such as FPGAs, use of GPUs, and very large VMs customized specifically to support data science. However, the SQL Server platform has the advantage of being both ubiquitous and accessible by most developers, and it offers a well-tested security model.

Here are some challenges of scaling data science, and solutions in the SQL Server Machine Learning platform:

Scaling up is rarely a linear process. This applies to cooking as well as to ML. A recipe is not a formula, and a model that runs on your laptop will not magically scale to millions of rows on the server. Training time can be quadratic in the number of data points, depending on the type of model and the data. And it's not just the size of the data or the number of input features that can blow you out of the water: even algorithms widely known for speed and tractability with large datasets can include features that greatly increase the number of computations, and thus the time your model churns in memory.
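To make that nonlinearity concrete, here's a toy illustration (plain Python, nothing SQL Server-specific): any algorithm that compares every row against every other row, as kernel methods do, performs work that grows quadratically, so doubling the rows roughly quadruples the computation.

```python
# Toy illustration of quadratic scaling: an all-pairs computation
# compares every row with every other row.
def pairwise_ops(n_rows):
    """Number of row-to-row comparisons for an all-pairs computation."""
    return n_rows * (n_rows - 1) // 2

for n in (1_000, 2_000, 4_000):
    print(n, pairwise_ops(n))
# Doubling the rows roughly quadruples the comparisons:
# 1000 -> 499500, 2000 -> 1999000, 4000 -> 7998000
```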

There are different ways to address computational complexity. Is the model big because it uses a lot of data, or because it's complex, with many columns and different types of features? In the first case, SQL Server Machine Learning might be the solution because it supports algorithm optimizations and parallel processing that allow distributed workloads. In the second case, SQL Server and Machine Learning Server offer ways to chunk and stream data, to process far more data than is possible with native R or Python. You might also collaborate with your DBA and ETL folks to ensure that the data is available for model training and that the workload can be scheduled.
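The chunk-and-stream idea can be sketched in plain Python. This is only an illustration of the pattern, not how SQL Server ML does it internally: the generator below stands in for a database cursor's fetchmany() loop, and a statistic is accumulated chunk by chunk so the full dataset never sits in memory at once.

```python
# A minimal sketch of chunk-and-stream processing: compute a statistic
# over a million rows without ever holding them all in memory.
def row_chunks(total_rows, chunk_size):
    """Yield lists of synthetic row values, chunk_size at a time."""
    for start in range(0, total_rows, chunk_size):
        yield [float(i) for i in range(start, min(start + chunk_size, total_rows))]

count, total = 0, 0.0
for chunk in row_chunks(total_rows=1_000_000, chunk_size=50_000):
    count += len(chunk)     # only one chunk is ever alive at a time
    total += sum(chunk)

print(total / count)  # running mean over all rows: 499999.5
```

The same accumulate-per-chunk shape works for any statistic that can be updated incrementally, which is exactly why streaming algorithms scale where load-everything code does not.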

Refactoring processes takes time but saves time in the long run. Functions such as data cleansing or feature engi-

required in scenarios where performance and the handling of large data matter, such as these:

• Creating models that can be processed in parallel using revoscalepy, RevoScaleR, or microsoftml algorithms

• Saving models to a table, which you can then reuse in any server that supports native scoring

• Loading pretrained models from disk and caching for subsequent predictions

• Scaling predictions from a saved model to multiple servers

• Embedding Python or R scripts in ETL tasks, using an Execute SQL task in Integration Services
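The "save a model to a table" item above boils down to serializing the trained model to bytes. Here's a hedged sketch of just that round trip, with a plain dict standing in for a real trained model; actual code would INSERT these bytes into a varbinary(max) column (the table and column are hypothetical) and read them back at scoring time.

```python
import pickle

# Sketch of the "save a model to a table" pattern. A real implementation
# would train with revoscalepy and store the bytes in a varbinary(max)
# column; here a dict stands in for the model, and only the
# serialize/deserialize round trip is shown.
model = {"intercept": 1.5, "coef": [0.5, -0.25]}   # stand-in for a trained model
model_bytes = pickle.dumps(model)                   # what you'd write to the table

# ... later, on any server that can read that table ...
restored = pickle.loads(model_bytes)                # what scoring code would load
prediction = restored["intercept"] + sum(
    c * x for c, x in zip(restored["coef"], [2.0, 4.0]))
print(prediction)  # 1.5 + 1.0 - 1.0 = 1.5
```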

Go Pro in the Kitchen with Team Data Science

Going pro with data science is more complicated than getting bigger data or moving the data to a server from a file share. Scaling up requires fundamental changes in the way you work.

Back to the cooking metaphor, imagine a pastry chef who has crafted an elaborate French pastry. Like the data scientist who has painstakingly selected and prepared data and fine-tuned the results using feature selection and parameters, the result is a one-off masterpiece.

Now imagine that chef being asked to turn that delightful recipe into a commodity at the pace of several hundred thousand per day. The problem is no longer one of taste and invention, but of scale and process. And because a lot of money rests on the results, consistency and guaranteed results are critical, as well as accountability for preparation and cooking time and ingredient cost.

Data scientists often find themselves scrambling to efficiently productize and hand over the perfect mix of data and algorithm. Tasks include documenting what was done and why, ensuring that results are repeatable, changing the recipe as needed to support scale and cost reduction, and tracking the consistency and quality of results.

The good news is that there's help from the Team Data Science Process (TDSP). TDSP is a solution created by Microsoft data science teams to guide a team through development, iteration, and production management of a large data science project. You can read more here: https://docs.microsoft.com/azure/machine-learning/team-data-science-process/overview.

Based loosely on the CRISP-DM standard, TDSP provides a set of templates for reproducible, well-managed data mining projects. The templates apply to multiple products, not solely SQL Server, and provide a structure around key data science tasks that you can use to organize a project or communicate with a client, such as:





Parameter Help

If you find any part of the parameters mysterious, I highly recommend the series by SQL Server MVP Niels Berglund on the mechanics of sp_execute_external_script: https://nielsberglund.com/2018/03/07/microsoft-sql-server-r-services---sp_execute_external_script---i/

lm(y ~ x1 + x2)

If you like a challenge, you could always implement it entirely in T-SQL. But using R or Python sure is a shortcut! To see other examples of what you can do with a few lines of R, look up "R one-liners."
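For a sense of what that R one-liner is doing under the hood, here's the same fit spelled out in stdlib Python via the least-squares normal equations. This is a sketch for illustration only; in production you'd call a library rather than hand-roll the linear algebra.

```python
# A rough Python counterpart to R's lm(y ~ x1 + x2): solve the
# least-squares normal equations (X'X)b = X'y with a small Gaussian
# elimination. Stdlib only; real code would use a library.
def fit_linear(xs, y):
    """xs: rows of predictors [x1, x2, ...]; returns [intercept, b1, b2, ...]."""
    X = [[1.0] + list(row) for row in xs]     # prepend intercept column
    k = len(X[0])
    # Build the normal equations A b = c, where A = X'X and c = X'y.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        c[i], c[p] = c[p], c[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for j in range(i, k):
                A[r][j] -= f * A[i][j]
            c[r] -= f * c[i]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

xs = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]    # exact plane, no noise
print([round(v, 6) for v in fit_linear(xs, y)])  # [1.0, 2.0, 3.0]
```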

For some additional ideas of how a DBA might have fun with R, I recommend this book by a long-time SQL MVP and a Microsoft PM: SQL Server 2017 Machine Learning with R: Data Exploration, Modeling and Advanced Analytics by Tomaz Kastrun and Julie Koesmarno. The book is a departure from the usual "data science"-centered discussions of Python and R and is written with the database pro in mind. It includes multiple scenarios where R is applied to typical DBA tasks.

Conclusion

My goal was to demonstrate that running Python (or R) in SQL Server is a fun, flexible, and extensible way to do machine learning. Moving from the kitchen to the factory is a true paradigm shift that requires coordination as well as innovation, and flexibility in the choice of tools and processes. Here's how the new process-oriented, scalable, commercial data science kitchen works:

• Your data scientist contributes code that has been developed and tested in Python, then optimized by the new options in revoscalepy.

• Your DBA brings to the table the ability to keep data optimized through the model training and scoring processes and guarantees security of your product.

• Your data architect is busy cooking up new ideas for using R and Python in the ETL and reporting.

Stone soup? Sure, the combination of ingredients—SQL Server plus an open source language—might seem like an odd one, but in fact they complement each other well, and the results improve with the contribution of each tool, cook, or bit of data.

neering that were run in-line as part of the model building process now might need to be offloaded to an ETL process. Reporting and exploration are other areas where the typical workflow of the data scientist might need drastic change. For example, rather than display a chart in your Python client, push results to a table so that they can be presented in interactive reports or consuming applications.

Scoring (prediction) is a business priority. Scoring is the process of doing work with a model, of serving up predictions to users. Optimizing this process can either be a "nice to have" or a showstopper for your ML project. For example, real-time constraints on recommendation systems mean that you must provide a user with a list of "more items like this" within a second or they leave the page. Retail stores must obtain a credit score for a new customer instantly or risk losing a customer.

For this reason, SQL Server Machine Learning has placed great emphasis on scoring optimization. Models might be retrained infrequently but called often, using millions of rows of input data. Several features are provided to optimize scoring:

• Parallel processing from SQL Server
• Native scoring, in which the model "rules" are written to C, so that predictions can be generated without ever loading R or Python. Native scoring from a stored model is extremely fast and generally can make maximum use of SQL Server parallelism. Native scoring can also run in Azure SQL DB. (There are some restrictions on the types of models that support native scoring.)
• Distributed scoring, in which a model is saved to a table in another server that can run predictions with or without R/Python
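A related optimization, loading a pretrained model once and caching it for subsequent predictions, can be sketched like this. load_model() is a hypothetical stand-in for reading and unpickling model bytes from a table; the point is only that the expensive load happens once, however many predictions follow.

```python
import functools

# Sketch of "load once, score many times". In SQL Server the model
# bytes would come from a table; here load_model() fakes that, and
# lru_cache ensures the expensive load runs once per model name.
LOAD_CALLS = {"count": 0}

@functools.lru_cache(maxsize=8)
def load_model(name):
    LOAD_CALLS["count"] += 1            # stands in for a slow table read + unpickle
    return {"name": name, "coef": 2.0}  # stand-in for the deserialized model

def score(name, x):
    return load_model(name)["coef"] * x

results = [score("wordcloud_model", x) for x in range(5)]
print(results, LOAD_CALLS["count"])  # [0.0, 2.0, 4.0, 6.0, 8.0] 1
```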

Data Science for the DBA

Want to know the secret cook in the data science kitchen? It's your DBA. They don't just slave away on your behalf, optimizing queries and preventing resource contention. They also do their own analyses, proactively finding and fending off problems and even intrusions.

To do all this, DBAs often have code or tools of their own running on a local SQL Server instance. However, rather than install some other tool on the server and pass data through the SQL Server security boundary, doesn't it make more sense to use the Python or R capabilities provided by Machine Learning Services? Simply by enabling the external script execution feature, the DBA gains the ability to push event logs to Python inside the SQL Server security boundary and do some cool analytics. Like, for example:

• Looking for a more sophisticated method of analyzing log data? Try clustering or sequence analysis in R or Python.

• Tired of “reinventing the wheel” in T-SQL? Put R in SQL functions to perform complex statistics on all your data.

• Need to detect patterns of intrusion or patterns of user behavior from event logs? Check out the process mining packages in R or Python.
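As a toy version of the log-analysis ideas above, here's a z-score outlier check on hourly event counts in stdlib Python. The data and the threshold are made up for illustration; the clustering and process-mining packages just mentioned offer far richer methods.

```python
import statistics

# Toy DBA analytics: flag hours whose event counts sit far from the
# mean, using a z-score. Fake data; real log analysis would use the
# richer R/Python packages mentioned in the article.
hourly_events = [102, 98, 110, 95, 105, 99, 480, 101, 97, 103]

mean = statistics.fmean(hourly_events)
sd = statistics.stdev(hourly_events)
suspicious = [(hour, n) for hour, n in enumerate(hourly_events)
              if abs(n - mean) / sd > 2]
print(suspicious)  # the spike at hour 6 stands out: [(6, 480)]
```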

For example, a fun use for Python/R in the database engine is to embed specialized statistical computations inside stored procedures. This bit of R code fits a multivariate linear regression model:

Jeannine Takaki-Nelson




ONLINE QUICK ID 1911081

Kate [email protected] www.gregcons.com/kateblog @gregcons

Kate Gregory has been using C++ since before Microsoft had a C++ compiler, and has been paid to program since 1979. She loves C++ and believes that software should make our lives easier. That includes making the lives of developers easier! She'll stay up late arguing about deterministic destruction or how modern C++ is not the C++ you remember.

Kate runs a small consulting firm in rural Ontario and provides mentoring and management consulting services, as well as writing code every week. She has spoken all over the world, written over a dozen books, and helped thousands of developers to be better at what they do. Kate is a Visual C++ MVP, an Imagine Cup judge and mentor, and an active contributor to StackOverflow and other StackExchange sites. She develops courses for Pluralsight, primarily on C++ and Visual Studio. Since its founding in 2014, she has served on the Planning and Program committees for CppCon, the largest C++ conference ever held, where she also delivers sessions.

Emotional Code

I've been paid to program since 1979 and for most of that time, I've been working with other people's code. At first it was "add this little feature to something we already have." These days, it's "how can we be better" and "is this code worth keeping?" Reading code has always been a huge part of my job, and so I care a lot about the kind of code I (and the people I work with) write.

Of course, I want it to be fast – I'm a C++ programmer after all. It also needs to be correct, yes. But there's more to it than those two things: I want code that's readable, understandable, sensible, and even pleasant.

I've put a lot of work into looking at code and seeing how it could be better. Often, I recommend making it better by using things—language keywords, library functionality, etc.—that we've added to C++ in this century or even this decade. I show people how to write code that does the same thing but is clearer, shorter, more transparent, or better encapsulated. Until recently, I didn't spend a lot of time thinking about why people wrote that code the way they did, even when there were things they could have done at the time that were clearly better. I just made the code better. In this article, I want to talk about that why factor, and about the humans who write the code you read and maintain.

What Are Emotions?

Because I mention emotions in the title, it's probably a good idea to discuss what they are. I think of emotions as out-of-band interrupts. Emotions deliver a conclusion without all the supporting evidence being clearly listed. For instance, you're walking down the street, or negotiating to buy a car, or on a date, and suddenly your brain tells you something like:

• Get out!
• Trust this person.
• Smile, relax, everything's fine.
• Fight, yell, hit, and scream!

These reactions may be right—you may be talking to a wonderful person you can trust—or wrong—the sales rep may be trying to flatter you into paying too much for the car. But the point is that at the moment the message arrives, you don't have a nice clear list of reasons to feel that way. Your brain has done some pattern matching and delivered a conclusion. You can act on it or not.

Some people don't like it when other people take actions based on emotions. If you become angry and leave a situation, you may not be able to explain the precise details that led to your conclusion. You may not be able to prove that leaving was right. I've been told that relying on emotional reactions to make decisions is lazy, non-rigorous, or cutting corners.

No Emotions Allowed

People who feel that emotional reactions are inappropriate are especially common in the field of software development. They ban the disruptive out-of-band interrupts that emotions are. They insist that you must win arguments with logic and not with feeling strongly about things. Just the pure crystalline logic of the 1s and 0s of the matrix. A lot of people really feel this way and try to suppress emotions in themselves and in others.

To a certain extent, they have a point. If we’re arguing about how many parameters a function takes, I don’t want to hear that you feel five is the right number: “It just makes me happy seeing it like that.” I want you to persuade me with logic. But for many cases, the quick overview conclusion delivered by emotions is super useful—not for winning arguments, but for doing my work. I look over an API, a whole collection of functions and all their parameters, and something in me says EEEWWWW. I don’t know exactly what is so yucky, but it draws me in there to give a closer look and to see what I rationally think about that part of the code. I’m not going to say, “pay me to re-architect your whole system because it feels kind of gross and wrong,” but that first emotional response had great value in bringing my attention to a place that needed it. I’ve learned to value those signals a great deal.

But not everyone does. When I tell them "I don't know what specifically I dislike about this API right now; at first glance, my gut has a problem with it and I know it needs to be looked at, so I'll write you a summary this afternoon," they may reject that because there's nothing to argue with; they have to just trust me. They may reject it because they think I'm being weird or emotional instead of using my experience. (This is odd, because emotional and intuitive reactions to situations are how your experience generally shows itself to you.) Perhaps they may just be in the habit of telling other people not to feel emotions: not to get really happy about things, or really upset either. We have a whole strain of humor about this: "Oh, the humanity" and other memes where people are mocked for being upset over relatively small things or for being happy.

Mocking people for their “first-world problems” or replying “oh, the humanity” to them when they’re getting worked up is our way of enforcing a social norm within the programming community, especially programming communities that have roots in the 20th century rather than the 21st. The social norm says “don’t express emotions to me,” and ideally, don’t even have emotions. But here’s the thing: Programmers are human beings and human beings have emotions. Therefore, whether you like it or not, programmers have emotions.

In reality, emotions are a big part of software development. It’s a lot more than writing and debugging code, and the not-code parts of it are FULL of emotion: getting users to tell you




what they want instead of what they think they want, trusting your team, telling the truth about your limitations and dreams, being brave enough to go against the stream when you have to, keeping your integrity and values when you find yourself in a place that doesn't share them, and much more.

I’m not here to talk about any of that today. I want to focus in on just one myth, just one not-true thing that we all tell ourselves: There are no emotions in code. That there are no messy feelings when it comes to writing code, because code is purely logical.

Your Code Shows Emotions

In fact, code is full of emotions, right there on the page for anyone to read. Most people don't believe me, but seeing is believing. Let me show you some examples.

Fear

When you start asking why people do things, you can get some interesting answers. Take commented-out code as an example. You can't find anyone who likes it. We rant and rail that there is source control; people can take work notes and paste the deleted stuff in there to use later; it's just confusing when you're trying to read later; it messes up your searches; and so on. But people do it, they do it every day. Why? Because they're scared. "I might not be doing this right; I might need this later."

Here’s an example from real code, found in the middle of a long function.

//if (m_nCurrentX != g_nCurrentX
//    || m_nCurrentABC != g_nCurrentABC) {
//}

That’s a semi-tricky “if” comparing the member variable values of things to some global values with similar names. That’s apparently no longer necessary, but the developer can’t bring themself to throw it away. They aren’t sure they could get it back if they needed it again. They feel timid and afraid about this code base, and about the consequences for not being able to find what they need later.

Another frustrating category of comments: the "don't blame me" comment. This developer is afraid they'll be caught doing something that another developer, or a manager, will disapprove of, so they leave a comment explaining that it wasn't their choice; someone told them to make this change. You'll see things like:

limit = InputLimit - 1; //as per Bill

This developer hasn't decided how to calculate the limit, isn't prepared to stand behind that calculation, and is telling you: "hey, if you have any issues with this line of code, don't talk to me, go and find Bill, it was all Bill's idea." And it's not just that they aren't confident how to calculate the limit; it's also that they're worried someone is going to object to this line and they want to defend themselves against hostile reviews. Somehow the environment has left people unable to feel confident about even the simplest calculations.

Fear is also why people don't delete variables they're not using. They leave in lines of code calculating something that's never used after it's calculated. Functions that are never called. Member variables that are never set or used. "How can I be sure we won't need it?" the developer is clearly saying. Other times, they know they don't need it, but they still don't take the time to clean up. They don't have time for that. They're going to "get in trouble" for not getting things done quickly enough, for not getting enough things done each day, and deleting code that "isn't hurting anyone" just doesn't make it onto the priority list. Yes, perhaps some future developer will be slowed a little by trying to figure out what's happening, but this developer right here is trying to make a deadline or trying not to get fired, and so the old unneeded junk stays behind.

It takes bravery to divert from patterns you see in existing code, even when you know they’re wrong, to stand up for the right way to do things. I see things like this next sample in almost every C++ codebase I meet:

int c, n;
int r1, r2, r3, r4;
double factor;
double pct1, pct2, pct3, v1, v2, v3, v4, v5;
double d1, d2, d3;

There’s so much wrong here! None of these variables are being initialized. None are declared close to where they’re used. And the names! I can guess what factor is, but what does r stand for? I am pretty sure v just stands for value and d for double. There’s no information at all in these names. But a scared developer, a developer worried about what code reviewers will say, a developer afraid they’re putting bugs in working code by messing with it, that developer just adds d4 to the end of the list and feels pretty sure that’s the safest thing to do today.

I also see code that checks conditions that don’t need to be checked. Here’s a C++ example:

if (pPolicy) { delete pPolicy; }

Deleting a null pointer is a no-op. It's harmless. There is no need for this check, but I see it all the time. (If the pointer was set to null after the delete, or anything else was done inside those braces, then it would be fine, but that's not happening here.) I'm also likely to wonder why you're doing manual memory management here, but ignore that for now and concentrate on the mindset of someone who keeps checking things "just in case," someone who's paying in runtime performance every single time this application runs, because they feel tentative and unsure. Afraid.

It's easy to say that training people and doing code reviews will teach them that delete on a null pointer is a no-op. But what if the developer is afraid their coworkers will let them down? They check the same conditions repeatedly because they can't be sure the conditions are always met. "That was in a teammate's code; they might have changed it without telling me." Here's a place where a thorough test suite, and running those tests after every change, can improve your runtime performance. If you can confidently write the code knowing you're getting valid parameters from the other parts of the application, think how many of those runtime checks (make sure x is positive, check that y is not over the limit, and so on) could be dropped!



think they're wrong. I came across something called UndoStevesNonsense() once. Only it wasn't Steve and it wasn't nonsense; it was a cruder expression of disagreement. Apparently, Steve wrote some code that did things this developer didn't think should be done, so rather than settling the design as a group or going to an authority, this person just undid those steps and named the function accordingly. Again, if you're all about runtime performance, the thought of one developer doing steps A, B, and C (that all take time on every run) and another developer turning around and arranging for them to be undone is horrifying. But it's real in live code today.

Selfishness

Fear, arrogance, and anger don't explain all the bad code out there. Another huge group of developers are selfish. They don't take the time to clean up, refactor, rearrange, or rename as they go. You can hear them asking "Why should I spend my time making things easy for you?" Imagine you're given an hour to fix something. You get it fixed in 50 minutes. If you spend 20 minutes tidying up, subsequent fixes in this area will only take half an hour. You'll be even after only one fix in this area, and ahead for every fix after that. But if the team is measured on each fix, it's possible someone else will be saving that half hour, while the original developer is punished for the 20 minutes extra. So, unsurprisingly, that 20 minutes isn't taken.

Selfish code has short and opaque names because the developer didn't bother thinking of good ones. It uses magic numbers instead of constants with names. There are side effects and consequences everywhere, like using public variables because it's quicker, or mutable global state because it's quicker. Sure, it might be slower next time, but next time is someone else's problem, right?

Selfishness also leads to information hoarding. My job is safe if nobody else can do this. If I explain this to others, I'll be less valuable because I'll be replaceable. (As an outside consultant, my usual reaction on hearing that someone is irreplaceable is to change that first. It's not good for teams and it doesn't lead to good code in most cases.) This developer sees their coworkers as competitors and doesn't want to help them. That's not how good teams work.

Laziness

Not all programmers are selfish, of course. Some of them just can't be bothered. "Whatever; it works. Mostly. We have testers, right?" They don't use libraries because they resist learning new things or finding new ways to do things. They're busy typing code. Or copying and pasting code. "Abstraction? Sounds like work to me!" When you suggest that they add some testing, or build automation or scripts, you're likely to hear, "If you think that matters, you do it." They don't show any kind of commitment to the quality of the code, the user's experience, deadlines, their own future, the success of the company, or the goals of the team. They just want to come in, type without thinking a lot, and go home, having been paid for the day regardless of what was actually accomplished. And it shows in the code!

It’s not just that they haven’t refactored, haven’t spotted useful abstractions, haven’t given things good names. You see repetition, really long functions, a mishmash of naming conventions—all things that are easy to clean up on a slow day when you don’t want to be thinking about new code. But

Sometimes fear is also why a developer does everything by hand instead of using a library. They got hurt by a library once and now they need to see it, step through it, watch it work. They can’t trust anyone else’s code and they’re willing to take longer writing it all themselves, or write naïve code that misses some edge and corner cases, out of the fear of what unknown library code might do.

Arrogance and Anger

Earlier, I showed a code snippet with variable names like r1 and v2. I've actually asked a developer what those names meant and been asked "aren't you smart enough to figure out what these are?" When I ask people to explain obscure function names, the responses sound an awful lot like "Why should I explain myself to people who can't understand it without an explanation?" You'll see this with deliberately opaque names like f(), foo, and bar. Setting aside their origins in the dark humor of people facing wartime death, foo and bar mean "this doesn't have a name, it doesn't need a name, and your desire for it to have a name is wrong." I think that's a terrible thing to leave in code for others to read later.

It may come from arrogance, from believing you're just better than everyone else and they don't deserve explanations, or not wanting to take the time to provide them. It can also come from someone who is angry about something else and showing that in their code.

Sometimes this sort of "I'm better than anyone else" is what drives developers to use raw loops instead of something from a library, to write their own containers, and so on. They perhaps ran into a performance problem a decade or two ago in one popular library and concluded that they would always be better than those library writers. Although "it ain't bragging if you can do it," very few developers can actually outperform those who concentrate on a specific library all day long. Maybe they measured the performance of their hand-rolled solution with their unusual data against the library solution, but then again, maybe they didn't. It's worth checking.

You'd be surprised (I no longer am) how often I find sneering comments and names in live code. People who say lusers, PEBCAK, and RTFM in emails and Slack say it in their code too. They say it in their commit messages too: April Wensel found a pile of hits for "stupid user" in GitHub commit comments. Obviously, those comments are public (she found them). How do you imagine users would feel discovering a commit message that was nothing more than "Be nicer to the stupid user" when learning about a product they used? And that committer needs to try harder at "be nicer," by the way, because that comment shows a distinct lack of nice. Steve Miller searches executables for swear words: Malware has more of them than non-malware does.

I’ve also seen function names and variable names that drip with disdain and contempt for the work being done. No, I don’t mean calling a variable “dummy.” I mean things like putting a coworker’s name in a function to show you





miss a deadline, or don't close enough tickets each day, or get changes requested after a code review, then naturally the way to help them write better code is to reassure them about those fears. Yelling at them to get their code quality up is only going to increase their fear and will probably make the code worse. But if someone is writing bad code because they're selfish or arrogant and not interested in making things easier for the rest of the team, they don't need reassurance; they need reminders about what's important to their employer.


I also keep the emotions my code demonstrates in mind when I’m writing—even one-page samples that can fit on a slide or in an article. People will copy what they see in your code base as they work on it. Do you want them to copy the fear or arrogance or selfishness they see there? Do you want them to believe that good developers use short and meaningless names?

Look Where You Want to Go

It’s tempting to conclude that you should try to take the emotions out of your code. But I don’t think it’s possible. The bad code showing that the author was afraid is timid, overly cautious, self-protective. Take that fear away and you don’t have neutral code, you have confident and capable code. The selfish code that hoards information, after you’ve done some refactoring so it explains itself and has clear names, isn’t neutral: it’s generous. Once you understand that everything you write is lighting up with good or bad emotions, why not, when you can, take the time and put in the work to show what you stand for and create code that inspires and helps others?

Confidence

You can show confidence in your code. Start by deleting things you don’t need—old code, variables whose values aren’t used, member variables that are never set or read. You have source control, after all. I take “work notes” as I tackle specific tasks, and if I rip out several lines of code that are now obsolete, I can paste them into those notes, where they’ll generally be easier to find than if they stay in the code, but commented out. As a nice side effect, my code now appears less tentative, less worried about the future, less concerned with what a reviewer will think of me.

When you’ve just added a feature or fixed a bug or otherwise been working on some code, take the time to clean up afterwards. You’ll never know more about this code than you do right now, and now is the time to record that knowledge in the code itself. Give things names that explain your thinking. Leave comments that guide people through the sharp edges if there are some. This might help you later or it might help someone else. Code like this says, “I know I’m right, let me show you.”

When you come across something obsolete that can be done better using a new language feature, change it. (You have tests, right?)

Laziness shows up as time that just isn’t taken. You see bugs that would be easily caught if you turned up the warning level, because turning down the warning level is a classic lazy person’s way of getting their code to compile and run. And you see disorganization and mess, because the effort to think clearly about the problem and organize the code to communicate about the problem is more effort than this developer is willing to make.

Now, a warning. Some teams practice crunch. If you keep a team in crunch indefinitely, they will end up behaving in a way that is indistinguishable from laziness. They literally don’t have time to find out how to do things more quickly. They live in fear of the drop-dead deadline: that if it’s missed, they will be fired, the company will collapse, they will forever be associated with the failed project. They can’t invest an hour to save a day or a week. Nobody is letting them have the hour. They already aren’t sleeping enough and are keeping track of whose marriage has failed so far on a whiteboard in the common room. The code that comes out of crunch is rarely good code. Sometimes, that doesn’t matter. When it does matter, good teams go back and clean up afterwards, when the deadline has been met. If no one has done that and the artifacts are still in the code, I know there’s a lazy developer that’s getting away with it, or a burned-out developer that no one looks after, or perhaps perennial crunch. The bad code points directly to bad management practices.

Why Does This Matter?

So yes, your code can show emotions to me. I can see fear, arrogance, selfishness, laziness, and more, right there on the screen. In some sense, it’s true that code is only logic and has no emotions. You’re given a rule like “if today’s date is after the due date, then the item is overdue and this is how it gets processed.” There’s no emotion to that. The person doesn’t get a break on the overdue process because they’re cute, or get extra fees charged because they’re rude. But everything in the way you implement that rule, including the variable and function names you use, whether you make checking for overdue-ness a member function of something, what data type you use for dates and how you test “after,” all of that can and does carry emotion to someone who knows how to read it there.

Of course, one single-letter variable name does not a psychopath make. Sometimes calling a variable i is the right thing to do. I see emotions in patterns, not in single instances. And when you see a pattern, and learn about the team and the developers, you don’t go and find the code author and say “wow! I never knew before how scared you are all the time!” Seeing the emotional causes of bad code can give you empathy as you read and fix legacy code. I no longer yell out “what were you thinking?” as I read old code. I understand now that sometimes people were under time pressure, were being measured, were getting unpleasant code reviews, or were missing the tools we all rely on, and that put them in a mindset that produced this sort of code.

For me, gaining these insights also leads naturally to suggesting changes to a team or a workplace. That might mean that a particular team member should learn something, or that a particular management practice should be amended. Ask yourself: Are your management practices causing runtime performance issues? They often do, and knowing that may be enough to get them changed. It can also direct the way you try to get a particular developer to write better code, because fear, arrogance, selfishness, and laziness each call for a different response.


Someone might change the 50 in the first case but forget to change it in the second. And so on. Most of all, it takes you some time to reason through this and figure out what the rule is as implemented here. Compare to this line:

return (application.CreditScore > 50);

It does exactly the same thing. It’s much shorter, and it can’t get inconsistent. There’s only one 50 in there—if you change it, you’ve changed it in all the places you need to. You don’t leave the reader looking at your default case trying to imagine an integer that doesn’t meet either of the first two cases. And there’s no local variable to trace through the code: That’s one less moving part for the reader to hold in their head.

Something that’s rare for me when I start to look through a team’s existing code is that it simply compiles, links, runs, and passes the tests. Oh, it compiles, but there are warnings. Maybe hundreds of them. And the team members all know there are 417 and if ever there are 418, somebody needs to look into it. Or it runs, but you get an exception on startup, just hit Continue and don’t worry about it. Or it leaves a few stray files behind and you have to hand delete them before you run it again. Or it passes the tests, but there are only seven of them. When I meet code that compiles without warnings, runs smoothly without by-hand steps, and has complete and well documented tests, I feel really looked after. Here’s someone who doesn’t have to be asked to do it right. They aren’t using tools for the sake of tools, or for fun, but to make things run smoothly.

I can see that sort of work ethic when I look inside the code, too. It uses modern constructs or libraries because the developers are always learning. It follows modern practices, not just churning out code. Code like this (and the scripts and tests that surround it) shows a commitment to the future, to that developer’s own ease and to the team’s success. This is the code of a hard-working programmer who doesn’t do just the minimum to skate by. Who doesn’t just copy what was there before, including the bad patterns and the bad code. Who takes the time to see if now is the time to change that thing that sort of grew organically and has become unwieldy and almost unmaintainable.

Choose to Show Positive Emotions

So sure, your code could show fear, selfishness, laziness, and arrogance. But why not show confidence, generosity, humility, and how hard working you are? Your code will be easier to read and maintain. You’ll enjoy reading and maintaining it more, and your reputation will improve as other people realize they can understand what you write, it’s easy to change it when life changes, and it’s generally better code. Even if the code isn’t better, there’s a lot to be gained from writing this way. But it probably will be better.

I want you to care about those who wrote the code you maintain and those who maintain the code you write. When you find crummy code, fix it. Show your confidence. Clean up. Make it right. Name things well. You’re going to show emotions in your code and they might as well be positive ones!

When you come across something hand-rolled, replace it with a known-good library approach. (You have tests. You can do this.) When you find a raft of variable names like r1, r2, r3, change them to something better. Let your code say, “I’m brave enough to stand up for doing things the right way.”

Humility

The opposite of arrogance is humility. Knowing that you’re good is not the same as thinking you’re the best at everything. Code acknowledging that the next person to read it is likely to be a good developer who deserves an explanation is humble code. So use libraries, and include a link to the documentation somewhere. In C++, it’s easy to add a comment on the line where you #include the header file. In other languages, you can find somewhere else to put that comment so that people copying and pasting later will paste the link as well as the code that uses the library.

Write gentle comments that tell why, not what. But don’t rely only on comments. Where things aren’t obvious, you want to leave some help for the next person, and names are always better than comments for functions, variables—everything. When you imagine the next reader of your code, don’t imagine someone less than you; imagine someone better than you. After all, future you is likely to be better than current you, and future you is the most likely next reader of this code.

Generosity and Hard Work

My eyes light up when I read code that’s truly well done: clean engineering that was done to make next time easier, well thought-out encapsulation, and that elusive appropriate level of abstraction. Someone has taken the time to clean up: to refactor, rearrange, and rename things so that the code makes sense and leads me through the process.

Information sharing is also generous: it’s the work of someone who’s thinking, “my job is safe if we can all do this.” Their comments are enlightening, they’ve chosen good names, and they’ve arranged the code in an order that makes sense to the reader. “First we prep the order, then ship it, then update the inventory. I get it.” That sort of clarity isn’t easy, and it’s generally not what you write the first time. Someone has put in the effort to make this code good.

Sometimes code just strikes you as brilliant. It’s clearly and obviously right, and dramatically easy to grasp. I don’t mean that it’s clever. I mean that it’s obvious. Consider this C#:

var creditScore = application.CreditScore;
switch (creditScore)
{
    case int n when (n <= 50):
        return false;
    case int n when (n > 50):
        return true;
    default:
        return false;
}

This code isn’t wrong. It compiles without any warnings, and it implements the logic that’s needed. But there are three cases (two ranges and a default) and lots of places to make mistakes: someone might return true in two of the return statements instead of one.

Kate Gregory

SPONSORED SIDEBAR:

Are Your Apps Stuck in the Past?

Need free advice on migrating an existing VB6, FoxPro, WinForms, Access, ASP Classic, or C++ application to a modern platform? CODE Consulting has years of experience migrating applications and has experience in ASP.NET MVC, .NET Core, HTML5, JavaScript/TypeScript, Node.js, mobile (iOS and Android), WPF, and more. Contact us today to schedule your free hour of CODE Consulting (not a sales call!). For more information, visit www.codemag.com/consulting or email us at [email protected].


ONLINE QUICK ID 1911091

POURing Over Your Website: An Introduction to Digital Accessibility

In 2010 when I was diagnosed with fibromyalgia, I did what any good nerd would: I started searching for more information online. While searching, I came across many disabled people and disability communities discussing the issues they had with accessibility—both physical and digital. This opened up a whole new world for me: Accessibility was never taught in college.

Ashleigh [email protected] @shimmoril

Ashleigh is the Application Development Manager at Neovation Learning Solutions (www.neovation.com/) in Winnipeg, Manitoba, Canada.

Ashleigh is a vocal advocate for accessibility and inclusive design. Earlier this year, Ashleigh was a speaker at TEDxWinnipeg, bringing the idea and foundations of digital accessibility to an audience of nearly 700 Winnipeggers.

In her free time, Ashleigh consumes a truly frightening amount of pop culture media, including movies, TV shows, comic books, and novels. You can usually find her with Pokémon Go open on her phone, no matter where she is or what she’s (supposed to be) doing.

Accessibility was never mentioned in any programming job I’d ever had, either. I, like many people, assumed that the Web was for everyone, and it would just magically work. My eyes were opened, and I found it terribly unfair that people were struggling so much to access the greatest resource of human information ever. Wasn’t the Internet supposed to be the great equalizer?

Accessibility Standards

Let’s start with an introduction to some of the standards that govern Web accessibility—there are several. You’ve likely heard of the W3C, or the World Wide Web Consortium (https://www.w3.org/), the standards body for the Web. The Web Accessibility Initiative (WAI) (https://www.w3.org/WAI/) is the part of the W3C that deals specifically with accessibility. There are two primary standards developed by the WAI: WCAG 2.1 and ATAG 2.0.

The Web Content Accessibility Guidelines (WCAG) (https://www.w3.org/TR/WCAG21/) are currently at version 2.1, released in June 2018. WCAG deals with accessing content—think about reading a newspaper or magazine article online or watching videos on YouTube.

The Authoring Tool Accessibility Guidelines (ATAG) (https://www.w3.org/TR/ATAG20/) are a newer standard; version 2.0 dates from 2015. ATAG applies when tools are provided to create content, e.g., writing a post on Medium or creating an Instagram story.

You’ll notice that many familiar services are bound by both standards. If you provide a product that allows users to create and post their own content, you’ll need to consider ATAG for content creation and WCAG for content display. I’ll focus on WCAG in this article, because there are far more people viewing and reading content than there are creating it, even when you consider social media.

WCAG Overview

WCAG is broken down into four guiding principles: Perceivable, Operable, Understandable, and Robust, often abbreviated as “POUR.”

These principles are key to understanding digital accessibility: If you remember nothing else, try to keep POUR in mind. There are three WCAG ratings: single A (A), double A (AA), and triple A (AAA). In general, you’ll be aiming for AA, except in cases where AAA is easily attainable with a little extra effort. If you’re new to accessibility, A may seem like a good place to start, but be aware that level A criteria are seen as the barest of the bare minimum, not even a real improvement over forgetting or disregarding accessibility altogether.

Perceivable

Let’s start with perceivable, the first step for accessibility and the foundation for the other three elements. After all, how can you operate or understand something if you can’t perceive it?

All input to our brains comes via one or more senses, with seeing, hearing, and touching being the primary ways for humans to take in and convey information. A person who’s blind or visually impaired needs to use another sense, such as hearing, to access information that sighted people get visually. You may have heard of screen readers, which are tools that literally read out the underlying code of a site so a blind or low vision user can access it.

Screen Readers

The most common categories for disabilities are pain, flexibility, and mobility, followed by mental health, dexterity, and hearing, and then seeing, learning, and memory. Although screen readers are essential tools, focusing only on screen reader users means that you’ve ignored six other more common disability categories completely. And of course, disabilities have multiple aspects—over 75% of disabled people reported in more than one category.

If you’re curious about how screen readers work, I can almost guarantee that you have one in your hand or pocket right now. All Apple and Android phones and tablets come with one built in, and they’re pretty easy to pick up and start using. Take a look at the settings and enable VoiceOver (Apple) or TalkBack (Android) and give it a try for yourself. What’s the same and what’s different about accessing sites and apps this way? Can you access them at all?

Semantic HTML

For Web accessibility, properly semantic HTML is key for many reasons; knowing how screen readers work can help you understand why. These tools present much more of the underlying structure of a site than a sighted user may expect. For example:

• A screen reader describes a link with the text of “All” as “All, link” or “All, visited link” depending on the state.

• A screen reader announces specific elements such as buttons and headings, along with their level (i.e., h1, h2, etc.).

• For images, screen readers attempt to read out a description of the image (alt text), if one’s provided. If not, the file name of the image is used instead, even if it’s obfuscated or otherwise confusing and unhelpful.

• Most screen readers ignore non-semantic elements, such as divs and spans.
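A few ordinary elements make those announcements concrete (the link targets and file names here are made up for illustration):

```html
<!-- Announced roughly as "All, link" (or "All, visited link") -->
<a href="/photos/all">All</a>

<!-- Announced with its level: "Heading level 2, Search results" -->
<h2>Search results</h2>

<!-- The alt text is read aloud; without it, the screen reader
     falls back to the unhelpful file name -->
<img src="img_20191104_093512.jpg" alt="Two dogs playing in the snow">

<!-- A div has no role: its text is read as plain text, but the
     element itself is never announced as anything -->
<div>Some text inside a non-semantic wrapper</div>
```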

The div example brings up a common scenario: If you’ve done any Web development, I’ll bet that you’ve seen cases where a div is styled to look like a button instead of using the actual button element.


Did you know that all the major social media platforms have a feature where you can add alt text to your images as you post them? It’s not enabled by default (which could be considered a violation of ATAG), but check out the settings on Facebook, Twitter, and Instagram and start adding alt text to all your images. Unfortunately, you can’t add alt text to a GIF on these platforms, but you could include a description as part of your content. After all, you never know when you’re going to go viral, and everyone really should be able to experience your brilliance and wit.

Operable

Next up is operable. How do you usually move around online? Probably with a mouse or by tapping on a phone or tablet, right? If you have to enter information, you use a keyboard.

Some users, such as those with Parkinson’s, may not have the fine motor control needed to use a mouse. In other cases, people with limited mobility may use assistive technology that replaces their mouse or keyboard entirely. These users can also have difficulty with areas that are too small to tap accurately, particularly on a phone or tablet.

Operable generally refers to keyboard accessibility, because most assistive devices mimic keyboard functionality. This also makes operable one of the easiest principles to test: I dare you to disconnect your mouse and try to navigate through a site you use every day.

• Is it always clear where you are on a page (i.e., the focus state)?

• Can you interact with all the controls you normally would—things like menus, buttons, and drop-downs?

• Is it possible to bypass irrelevant or repetitive sections easily, such as the header or navigation, via a skip link?

• Is the content structured in a consistent and meaningful way, with labels always positioned before, and properly associated with, their controls (i.e., can you interact with a control by clicking or tapping on the label)?

• And what about hotkeys? You probably already know about using the backspace key instead of the back button, but do you know about the hotkeys specifically for, say, the Twitter Web client, or do you have to figure them out by trial and error?

Speaking of errors, users will make them, so it’s crucial to provide ways for users to recover from these errors. For the most part, this isn’t too difficult: If someone makes a post by accident, you can allow them to edit it or delete it and start again. But what if a user transfers $1,000 to the wrong account? This can be much harder (or impossible) to undo. Things like confirmation screens, alerts, and warnings benefit all users, not just those with disabilities. Even better, provide instructions before users start a long or complicated process; these help users avoid errors from the start.

These examples show inclusive design principles—rather than designing a product and only then figuring out how it might work for disabled people, you can intentionally design products that work for all users, including those with disabilities. Giving users control over timing and time limits is also crucial to operability—I’m looking at you, Ticketmaster! Consider the situation where you’re trying to buy tickets for your favorite band and there’s a time limit on the process. Now think about the experience of someone with a cognitive disability who has a slower response time. And what if they’re also using an assistive device, which could increase their response time even further?

When a div has been styled to look like a button instead of using the actual button element, that’s problematic accessibility-wise for a variety of reasons. A screen reader or other assistive device may skip it completely, confusing users. Using the wrong element may cause assistive technologies to announce it incorrectly, which causes frustration. Or, your button may be inaccessible by keyboard, and many disabled people rely on keyboard-based navigation. Ultimately, you lose all of the built-in benefits available with a semantic button.

Semantic elements such as buttons, links, inputs, checkboxes, etc. provide information to a browser that gives context to the element, even when it stands alone. Non-semantic elements, such as divs and spans, don’t provide this contextual information to the browser, and therefore they don’t provide necessary information to assistive devices or to users.
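The difference is easy to see side by side (the class name and handler below are hypothetical):

```html
<!-- Looks like a button, but has no role, can't receive keyboard
     focus, and doesn't respond to Enter or Space -->
<div class="btn" onclick="save()">Save</div>

<!-- Announced as "Save, button"; focusable and keyboard-operable
     with no extra work -->
<button type="button" onclick="save()">Save</button>
```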

Color

Color is a big aspect of perceivable. A large number of people are color-blind or have low vision, so in order to have accessible content, you must use colors that provide sufficient contrast between the background and text (or image) colors. The WCAG AA standard for color contrast requires a ratio of 4.5:1; for AAA, it’s 7:1.

Luckily, you don’t need to do the math yourself! There are a number of tools that let you input the background and foreground colors in hex, RGB, and HSLA formats and receive the contrast ratio. Some tools even take font size into consideration—a larger font can have a lower contrast ratio than the general guideline and still be AA or AAA compliant.
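If you’re curious what those tools are doing, the formulas come from WCAG itself: linearize each sRGB channel, compute relative luminance, and take the ratio of lighter to darker. A minimal sketch (the function names are my own):

```javascript
// WCAG 2.x relative luminance: linearize each 0-255 sRGB channel first
function linear(channel) {
  const s = channel / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color
function luminance(hex) {
  const n = parseInt(hex.replace('#', ''), 16);
  return 0.2126 * linear((n >> 16) & 0xff) +
         0.7152 * linear((n >> 8) & 0xff) +
         0.0722 * linear(n & 0xff);
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1:1 to 21:1
function contrastRatio(a, b) {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

console.log(contrastRatio('#000000', '#ffffff').toFixed(2)); // 21.00
console.log(contrastRatio('#767676', '#ffffff') >= 4.5);     // true: AA for body text
```

Note how sharp the cutoff is: #767676 on white just clears 4.5:1, while the slightly lighter #777777 just misses it, which is exactly why a tool beats eyeballing.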

Using color alone to convey meaning causes accessibility issues as well. Someone who’s red-green color blind may not be able to tell the difference between a grade written in green (i.e., a pass) and one written in red (a failure). Adding another indicator, such as a thumbs-up or thumbs-down icon, provides additional context and makes the site more usable for everyone, not only users with disabilities. Be careful with icons, though—solely using icons (no text) can cause issues for people with cognitive disabilities or non-disabled users who have a different cultural background.

Transformable

In order for content to be perceivable, it has to be transformable as well, to support users with different abilities. In general, you’ll want to provide text alternatives for images, video, and audio.

• For images, transformable content is called alt or alternative text, and it’s a description of an image that conveys the necessary meaning to a screen reader user. You may also need to consider whether this image is purely decorative, in which case you provide an empty alt text attribute to hide it from screen readers.

• With videos, transformable content can mean captions or described audio, which are distinct and have different features. Captions transcribe speech; a similar feature you may be familiar with is subtitles on a TV show or movie. Described audio is an audio description of everything appearing on the screen. Think of it as telling a friend about the amazing hotel room you had while on vacation: “When I turned the corner, and saw the room for the first time, it was HUGE! There was a sunken coffee table area and…”

• When you have purely audio content (like podcasts), a transcript is required.
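In markup, the image and video cases above look something like this (all file names are invented for the example):

```html
<!-- Meaningful image: alt text conveys what the image conveys -->
<img src="q3-sales-chart.png" alt="Bar chart: Q3 sales up 12% over Q2">

<!-- Purely decorative image: an empty alt hides it from screen readers -->
<img src="divider-swirl.png" alt="">

<!-- Video with captions supplied as a WebVTT text track -->
<video src="hotel-tour.mp4" controls>
  <track kind="captions" src="hotel-tour.en.vtt" srclang="en" label="English">
</video>
```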


There are standards for HTML and CSS and you should be following them. Not only is this good for accessibility, it also provides a certain amount of future-proofing. When Apple first announced that websites would be available on the Watch, they noted that any sites using standard HTML would just work. Other sites, well, they were in a bit of trouble.

Accessibility with ARIA

So how do you make Web content accessible when there’s no standard? For example, there’s no semantic HTML element for tabs, so you must create your own visual representations using lists and divs.

Fortunately, there’s a tool called ARIA—Accessible Rich Internet Applications—that’s meant for cases when you have to go above and beyond the existing standards. The first rule of ARIA is that you don’t use ARIA.

Let me explain: Bad ARIA is actually worse than no ARIA at all. When you have a non-standard HTML element or have used a standard element in a non-standard way, you may cause confusion or problems for disabled people using assistive technologies, but it’s likely that they’re familiar with these types of problems and can work through it. However, if you have added ARIA without knowing exactly what you’re doing, it’s very likely that you’ve made that element, or possibly your whole site, completely unusable.

ARIA should be used in three cases:

• When you have dynamic content. A common pattern in Web development is to create a single-page application (SPA), where the page never actually refreshes. This is problematic for screen reader users, because they have no indication that the page contents have changed or have been updated.

• When you need to expose your content to either sighted users or assistive device users, but not both (or both but in different ways). For example, you wouldn’t want your skip link to be visible to all users, but it should be focusable by anyone using assistive technology.

• When you have advanced UI elements. As I mentioned above, there’s no standard HTML element for tabs, so you create these controls using lists, divs, and a whole bunch of CSS to make them appear visually like tabs. But what about screen reader users? You can use ARIA to indicate the relationship between a list item and a div, as well as the current state of the div (i.e., is the tab active/visible).
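The first two cases can be sketched in a few lines (the IDs and class name are hypothetical, and the skip link’s hiding is done in CSS rather than ARIA):

```html
<!-- Dynamic content: a polite live region is announced to screen
     reader users when a SPA injects new text into it, without
     yanking their focus away from what they were doing -->
<div id="order-status" aria-live="polite"></div>

<!-- A skip link kept off-screen by CSS until keyboard focus
     reaches it, so sighted mouse users never see it -->
<a href="#main-content" class="skip-link">Skip to main content</a>
```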

ARIA is an extremely powerful tool that allows you to hook directly into the accessibility tree of a browser. The accessibility tree is basically the DOM (document object model), but for accessibility: All styling is removed, any elements that have been hidden from assistive devices aren’t included, etc. There are three aspects of ARIA that affect how an element is presented in the accessibility tree: role, relationship, and state.

Role

Semantic and interactive HTML elements have a role based on the type of control—button, checkbox, heading, etc. Non-interactive HTML elements can be given a role via ARIA, and the default role of a semantic element can be overridden. Consider this snippet of code:
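As a sketch of the idea, a div can be handed the checkbox role. Everything here, including the handler wiring and helper function, is illustrative, and it’s all work the native input element would do for you:

```html
<!-- role tells the accessibility tree this div is a checkbox;
     tabindex makes it focusable, and aria-checked carries its state -->
<div role="checkbox" aria-checked="false" tabindex="0"
     onclick="toggle(this)"
     onkeydown="if (event.key === ' ') toggle(this)">
  Subscribe to the newsletter
</div>
<script>
  // You now own keeping aria-checked in sync yourself
  function toggle(el) {
    const checked = el.getAttribute('aria-checked') === 'true';
    el.setAttribute('aria-checked', String(!checked));
  }
</script>
```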

This kind of control also applies to animations and videos: An epileptic user needs to be able to stop or pause a rapidly flashing animation or video before it causes a seizure, and all users benefit by being able to scan through a video to see what they missed (or skip ahead to the good parts).

Understandable

Once your content is perceivable and operable by a wide variety of users, you need to ask whether they can understand it. Language choices matter to accessibility: What counts as “understandable” depends on your audience, but unless you’re 100% sure about education, background knowledge, life experience, and culture, it’s best to default to very simple and concise text.

Basically, you don’t want to use a long word when a short one would work just as well—and probably better. For example, instead of saying additional, say added. Instead of contains, say has, and rather than expiration, just say ends.

Functionality also needs to be understandable. At some point, developers decided that the hamburger icon (three stacked horizontal lines) would represent menus, and people have become familiar with this icon. So, unless there is a very good reason to break from the standard, you should also use the hamburger icon to represent menus on your site. If you do decide to do something unique, it’s important that you’re at least consistent within your own site. Don’t use both the hamburger and your custom icon or you’ll just end up confusing your users.

Forms often present problems with understandability. For example, when the Save or Next button is disabled, it’s not always immediately clear what a user needs to do to enable that button. Rather than force users to poke around the form, filling in values at random until the button becomes active, consider accessibility and inclusive design. Although it’s possible that someone without a disability may figure out how to get the form working more quickly than someone with a disability, it’s a terrible experience for both users.

Robust

Last, but absolutely not least, we have robust. Does your site work in Chrome? Probably. What about Firefox? Yeah, probably that one too. What about IE11 and Edge? Safari, UC Browser, Brave? Does your site work on older versions of these browsers? What about the developer/canary versions? Does your app work on an iPhone 5? What about an iPhone X, with the notch at the top? What about allllll the different types of Android devices?

The key for robust is that unless there is a critical compatibility issue, everything should just work. Users need to be able to choose their own technologies in order to meet their own unique needs. I can feel you glaring at me through the page: This doesn’t mean that you must support every version of every browser out there, just that you shouldn’t be too restrictive.

Having a message on your site that it only works in the latest version of Chrome doesn’t do anyone any good. When thinking about inclusive design as well as accessibility, consider the user who is at work and doesn’t have permission to install the latest version of Chrome, as well as the disabled user who can’t use Chrome with their assistive technology.

POURing Over Your Website: An Introduction to Digital Accessibility



this custom checkbox to make it properly accessible. Or, you could just use the standard, semantically correct checkbox input element instead and get all the functionality automatically. Up to you!

Next Steps

Knowing anything about accessibility—even just a vague idea of what it means—puts you in a better position than many people out there, unfortunately. I urge you to keep accessibility in mind, remembering that small changes are important and they do add up.

When you’re thinking about which library or framework to use; when you’re considering performance enhancements; when you’re writing a spec doc, designing a mockup, or writing a test script—just put the word accessibility out there. Maybe it’ll force you to reconsider some assumptions. Maybe you’ll realize that you don’t need a library at all, because there’s a native HTML element that does what you need. And you know what? Maybe nothing will change. But sometimes it will. In the future, you’ll consider another aspect of accessibility. Build an accessibility toolbox, just as you would with any skill.

Maybe you’re thinking that you don’t have time for accessibility. However, using properly semantic HTML is much quicker than using custom elements, so you’ll likely be able to gain some time that way. Then think about all the times you’ve given an estimate for finishing a task and completed it quicker than expected. Now you have an extra half hour or more to dedicate to accessibility, and tasks like checking the contrast ratio, adding alt text to images, and verifying keyboard navigation take very little time.

Don’t forget about your internal tools and systems: Just because you’re not selling something doesn’t mean that accessibility doesn’t matter. Could you hire a visually impaired developer tomorrow and have them use the same tools as your current team?

I’ll also ask you to think about physical accessibility, which is still a problem and concern for many disabled people. When you go to meetups or conventions, ask if they’re wheelchair accessible (and make sure they’re truly accessible, not “oh, there’s just one small step, I’m sure it’s fine”). Ask if they have accessible and gender-neutral washrooms. Ask if there are accommodations for visually impaired, blind, or deaf users. If they’re serving food, ask if there’s a process for handling allergies and sensitivities.

Disabled people are already very aware of places they’re not welcome, which allows those venues to say, “disabled people don’t come here, they’re just not interested in what we provide.” In reality, disabled people just don’t want to deal with the frustration and possible humiliation of not being able to access a building or event. However, if abled people start asking these questions, accessibility will be presented to the venue as something that’s desirable and changes (may) happen.

I’m here to put accessibility into your heads, and I hope you’ll do the same with others in your communities.

<h1 role="button">
  This is a heading - or is it?
</h1>

Here, I’ve used a semantic element, a heading, but I’ve also provided an ARIA role of type button. Visually, this will be rendered as a heading. However, as far as the accessibility tree is concerned, this code represents a button. Screen readers and other assistive devices will announce this as a button, but the user won’t actually be able to interact with it as a button—there’s no click event or handler, no hover state or tooltip.

And of course, I can also do the opposite:

<button role="heading" aria-level="1">
  My Button?
</button>

Again, I’ve overridden the role of a semantic element (in this case a button), so the accessibility tree believes it’s a Level 1 heading. You can see why it would be better to have no ARIA at all, rather than misleading and just plain wrong ARIA.

Relationship

ARIA can be used to describe the relationship between two elements. This is similar to setting the for attribute on a label, which links the label and the control together, as far as the browser is concerned.

If you think about the tab example again, you can use ARIA to link the first element in a list (i.e., the “tab”) to the appropriate div. This tells the browser—and the user, ultimately—that clicking on a specific tab activates or displays a specific div.

Another use for ARIA relationships would be for a description of a chart’s data. In this case, you’d display your chart, and then have a visually hidden div or span linked via the aria-describedby attribute. When a screen reader user reaches the chart, the contents of the div will be read out because the two elements have been linked together in the accessibility tree.

<canvas aria-describedby="chartDesc"></canvas>
<div id="chartDesc" style="display:none;">
  This is a description of the data in the chart.
</div>

State

If, for some reason, you decide that you need to create a custom checkbox instead of using the default one, you could use ARIA to expose the current state (checked or unchecked).

<div role="checkbox" aria-checked="false">
  On
</div>
<div role="checkbox" aria-checked="true">
  Off
</div>

Please, please never do this! This is an extremely simplified example—you’d need at least five other attributes on

Ashleigh Lodge



Best Practices for Data Visualizations: A Recipe for Success

ONLINE QUICK ID 1911101

Helen Wall
www.helendatadesign.com

Helen Wall is a power user of Microsoft Power BI, Excel, and Tableau. The primary driver behind working in these tools is finding the point where data analytics meets design principles, thus making data visualization platforms both an art and a science. She considers herself both a lifelong teacher and learner. She is a LinkedIn Learning instructor for Power BI courses that focus on all aspects of using the application, including data methods, dashboard design, and programming in DAX and M formula language. Her work background spans an array of industries and numerous functional groups, including actuarial, financial reporting, forecasting, IT, and management consulting. She has a double bachelor’s degree from the University of Washington where she studied math and economics, and also was a Division I varsity rower. On a note about brushing with history, the real-life characters from the book The Boys in the Boat were also Husky rowers that came before her. She also has a master’s degree in financial management from Durham University (in the United Kingdom).

Over the last decade, many companies debuted data visualization platforms that enable organizations to analyze trends and understand their businesses better. Tools such as Microsoft Power BI, Tableau, Amazon QuickSight, and QlikView represent just a few of the many potential applications businesses could leverage. Given that CODE Magazine focuses on computer programming,

many of us easily fixate on the potential capabilities for creating queries, developing custom programming code, or creating extensive calculations on the back-end behind the scenes. However, the majority of business users use these data visualization platforms to dynamically interact with the top layer of the application: the dashboard. To develop these user-friendly dashboards, we need to think more like designers instead of programmers.

The Importance of Dashboard Design

The best-designed and implemented dashboards appear effortless; we like them, yet can’t quite explain exactly why. Their likeability comes not by accident, but from mindfully following design principles implemented with the end viewer in mind. Good design isn’t an accidental result, but rather a strategic decision to make the small design choices that have a huge impact on the end result.

Nudging

In order to maximize the likelihood that users make your dashboard part of their everyday processes, you need to design dashboards that guide users through not only key visuals and figures, but also how they collectively interact with each other. A dashboard can check all of the user’s requirements, yet still not yield much value to the user because there’s a dissonance between what they tell you they would like and how their actual behavior responds to the dashboard. How do you bridge this gap? Let’s put this discussion in the context of the nudging proposition.

Nudging is defined as tiny prompts that alter our behavior—specifically social behavior. Richard Thaler extensively examined nudge theory in his book Nudge. Examples of nudging techniques include:

• Encouraging recycling by placing a larger recycling bin in a more prominent location than a smaller garbage bin in many cities and businesses.

• Sending out electricity bills that compare usage to that of neighboring housing units to encourage users to limit their electricity usage.

• Charging even a nominal amount of a few cents for single-use plastic bags, which encourages people to bring their own reusable bags when shopping or not use one at all.

Much like Ikea furniture comes in a box with pre-cut pieces and instructions for assembly, you want to present your dashboards to the user in a similar manner. Think of the instruction manual as the nudging component to the product. This translates to techniques in designing data visualizations (which I’ll discuss later) such as:

• Choosing visuals to convey strategic points
• Positioning and the number of charts/visuals
• Instruction prompts
• Colors to point out key trends

Users want you to do the analysis of the components beforehand, but they want to interact with the data themselves using the instructions you provide. If the pieces don’t fit together, or if the instructions don’t make sense, building a successful final product becomes much more difficult.

The Starting Dashboard

I chose Tableau for this example because I think it enables a focus on making changes with best design practices in mind, given that the tool focuses on the visuals themselves. It also runs on Apple products as of this article’s publication, which Microsoft Power BI, unfortunately, does not. You can use different versions of Tableau, but for the purposes of this article, I’ll use Tableau Public Desktop, which is free to download. You can download the starting file from Tableau Public Online to follow along with the changes I make in this article, and then save it by uploading it to your own Tableau Public Online account.

I obtained the infant mortality data from the impressive data section of the Gapminder website developed by the late Hans Rosling. The data set contains the infant mortality rates by country and by year from 1800 until 2015. Note that there are incomplete data sets because some countries may not have data (or at least useable data) for all of these years. I categorized each country into its own region of the world, as you can see in the region mapping key file in Figure 1.

The Gapminder site defines infant mortality as the number of deaths in the first two years of life for every 1000 live births. Although you may think this project is taking a morbid direction, you’ll see that the Tableau dashboard helps communicate a much more positive outcome. A decrease in these rates means that the survival rates are increasing and improving global health outcomes.

To walk through the steps of applying best visualization design practices, let’s begin with a less-than-optimal Tableau dashboard I created, as seen in Figure 1. I can make strategic design and formatting changes that transform it into a more effective dashboard built with the end user in mind.

You will need several components to try this out on your own, including the Tableau Public dashboard links below, as well as the two Excel files and the PNG image that are available on the CODE Magazine page associated with this article:



and they enjoy the process, you’re telling them that you value their ability to analyze the data trends and take ownership in this process. Psychologists define this as the “Ikea effect” (https://www.bbc.com/worklife/article/20190422-how-the-ikea-effect-subtly-influences-how-you-spend) where the customers (in this case dashboard viewers or users) feel they achieve the greatest value for their investment. This is the Holy Grail for many businesses.

On the flip side, you need to do a lot of work on your end to get the users to feel this empowerment and ownership in the process. Making the process easy for the user involves putting yourself into their thought patterns to analyze the unknowns and numbers before they even see them in the dashboard. These areas for you to analyze beforehand for them include:

• The meaning and magnitude of data set numbers
• The relationship between data points and data fields
• Optimal ways to see the data in visuals and charts

How do you want to measure the infant mortality rate? Does a higher rate indicate a better or worse metric? You need to establish that you want to see lower infant mortality numbers because this means that more babies are surviving out of infancy, which also indicates improving public health outcomes. You can’t assume that the reader already knows this, and you need to explicitly say what the numbers mean in context of the bigger picture.

Furthermore, you also need to indicate how you’re aggregating these infant mortality numbers. Each point in the data source represents the infant mortality for a given year and country. If you wanted to determine the global infant mortality rate for 1990, for example, you need to analyze the data points for all of the countries that year. It doesn’t make any sense to sum them together because they represent rates and not absolute values. It makes more sense to average these data points across the years and countries we want to see to get the global mortality rate as a single aggregated number.
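The sum-versus-average distinction is easy to sketch in a few lines of Python. The country names and rates below are made-up illustrative values, not taken from the Gapminder data:

```python
# Hypothetical infant mortality rates (deaths per 1,000 live births)
# for three countries in 1990; illustrative values only.
rates_1990 = {"Country A": 40.0, "Country B": 10.0, "Country C": 25.0}

values = list(rates_1990.values())

# Summing rates yields a meaningless figure: 75 deaths per 1,000?
misleading_sum = sum(values)

# Averaging yields a sensible aggregate rate for the group.
global_rate = sum(values) / len(values)

print(misleading_sum)  # 75.0
print(global_rate)     # 25.0
```

The same reasoning applies whenever the values are rates or ratios rather than counts: sums inflate with the number of data points, while averages stay on the same scale as the underlying measure.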

• Starting Tableau dashboard: https://public.tableau.com/views/VisualizationBestPracticesstartingfile/Dashboard1?:embed=y&:display_count=yes&:origin=viz_share_link
• The Excel file from Gapminder data with the infant mortality rates
• The Excel file for country to region mapping
• The Gapminder logo
• Ending Tableau dashboard: https://public.tableau.com/shared/8TZ3K2TWW?:display_count=yes&:origin=viz_share_link

You need to update the Tableau file to point to the Excel file on your own dashboard:

1. Download the Tableau Desktop Public application (free version), or you can use Tableau Desktop if you already use that.

2. Download both Excel files and the Gapminder logo to a folder in your own desktop or documents folder.

3. Open up the Tableau link for the starting dashboard with your own Tableau application.

4. Go into the Data Source tab of the Tableau file, click on each of the connections for the rates and region key, and set the folder connection to the path on your own desktop.

After you update the sources, the rest of the visualization will update as well, and you can begin the transformation process.

The “Ikea Effect”

Looking at Figure 1, can you tell at first glance what the initial dashboard is analyzing? Inconspicuous legends and axis labels serve as the only indications that you’re studying infant mortality rates. The user shouldn’t have to guess at what you’re trying to do.

Building successful data visualizations involves striking a balance between giving dynamic options to the viewer and your own design and analysis process. When viewers feel like they do most of the work by interacting with the dashboard

Figure 1: Initial infant mortality dashboard



Effective chart options include:

• Bar charts
• Line charts
• Scatter plots
• Box and whisker plots
• KPI metrics
• Heat maps
• Highlight tables

You can see the infant mortality rates by region represented as a pie chart in the upper left-hand corner of Figure 1. This chart presents two big issues when you view it:

• You can’t easily distinguish between the slices of the pie because you have to guess the angles rather than actually measuring the numbers.

• The chart represents the infant mortality rate as a sum of the rates, which can mislead the audience because regions with more data points and complete data will have a bigger slice, even if they have lower infant mortality rates.

The bar chart (Figure 2) does a much more effective job of showing the average infant mortality rate by region (changed from the sum aggregation in the pie chart), and you can easily rank and compare these rates between regions directly within the chart.

To change a pie chart into a horizontal bar chart:

1. Move the “Region” dimension to rows.

2. Change the chart from a pie chart to a horizontal bar chart in the Show Me options menu.

3. Change the aggregation of the infant mortality rate from Sum to Average.

You also lose the time dimension this way because you’re measuring the average infant mortality rate for all years for countries within each region in Figure 2 rather than in a certain year.

Because you want to measure a time value, you can use a line graph or a bar graph. What if you took this infant mor-

If you buy furniture from Ikea, you assemble it yourself from pre-cut pieces that come in a box that designers planned out and tested ahead of time. Similarly, in dashboards, you want to analyze the data and plan out the dashboards before passing it off to the user to interact with in a pre-packaged box in the form of a dashboard. If you don’t include a necessary piece or if the sizing doesn’t work, neither you nor the viewer get the desired finished product or result. Much like Ikea furniture comes in a box with pre-cut pieces and instructions for assembly, you want to present your dashboards to the user in a similar manner. Users want you to do the analysis of the components beforehand, but they want to interact with the data themselves using the instructions you provide. If the pieces don’t fit together or if the instructions don’t make sense, the likelihood that they will embrace this dashboard as their own goes down substantially.

Good Design Isn’t an Accident

How do you approach designing your best possible version of the dashboard? What do you consider for the job at hand? You want to analyze the data initially to create components for the dashboard that fit together with each other. Much like an Ikea furniture pack, you design and test the pieces to make sure they fit together beforehand.

When you like the way something looks, you can’t always quite explain why. The “why” comes through your decision to strategically choose to apply design principles to it. Designing an effective dashboard that users embrace interacting with is not an accident, but rather a well-planned approach that keeps the user in mind.

Choose the Right Visuals for the Job at Hand

The first step in this process is selecting visuals that represent the data correctly, and also effectively communicate the results and trends in the data. There’s no one chart that works for all data and no one data set that works for all charts. I encourage you to experiment with charts within the data visualization application to compare how they represent the data and what visual works best for your intended result.

Figure 2: Updated bar chart




To convert from a vertical bar chart into a line chart: In the Show Me options menu, select the line chart. You can see the updated chart in Figure 4.

The world map you saw in Figure 1 represents the infant mortality rate by country, with the color representing the region, and the size of the bubble representing the sum of the infant mortality rate for that country across all years. Notice that the map shows the bubble size as the sum of the infant mortality rates, which misleads the viewer, so you’ll need to update the aggregation to average across all years for each country. You may also find it difficult to distinguish between the sizes of the bubbles or to determine trends or discrepancies within a region because the bubble sizes are small to start with on the dashboard.

I changed the map type to the filled map you see in Figure 5, where the darker colors represent higher infant mortality rates over a two-hundred-year range, and the lighter colors represent lower infant mortality rates. You can also see higher rates concentrated among neighboring countries in sub-Saharan Africa. Notice that the visual automatically dropped the region dimension completely from the map visual. The filled map makes it easier to compare rates between neighboring countries because you can see color differences much more easily than bubble-sized differences.

To change from a bubble chart (Figure 1) into a shaded filled chart (Figure 5):

1. Select the Show Me option menu and pick the maps chart icon.

2. Change the aggregation from Sum to Average for the Color Marks card option.

Now I’m going to tackle the two charts you saw in Figure 1 that show the infant mortality rates by country as a bar chart and the average infant mortality rate by year as a data table by combining them into a single visual rather than two, which makes them easier to read. You can see some key

tality data by region and put the trends in a bar chart using a time dimension x-axis? You can see the results of this visual in Figure 3, where each region is distinguished within each year on the x-axis with a color as well as a label.

This chart option still poses some issues, including:

• Because you now have both the region and time on the x-axis, the axis becomes very long and difficult to read. Even adjusting the fit, you still need to process a lot of data points.

• If you compare the trends by year for a certain region (say Europe), how can you quickly tell if the infant mortality rate improves from the previous year?

To change from a horizontal bar chart (Figure 2) to a verti-cal bar chart (Figure 3) with time on the x-axis:

1. Move the Regions dimension to columns and the average Infant Mortality aggregation field to rows, and you now see the chart automatically update.

2. Add Years to the columns in front of the Regions, and you now see a bar chart with a very long x-axis.

3. Take the Regions dimension from the fields and place it on the Color Marks card, and you now see regions in two places of the chart: one for the x-axis and one for the color.

You can’t stack up the region bars in Figure 3 because you want to average rather than sum the rates. Showing the average infant mortality rates by region as a line chart mitigates the size and readability issues that you encounter for this scenario with bar charts.

The line chart in Figure 4 allows you to easily rank the regions for each year, and you can see the infant mortality rate trends by region because each point in the graph joins to the point for the next year and the previous year and so on. More importantly, you can also easily see that the rates trend downward across all regions, which means that global health outlooks are improving, even if you continue to see disparity among the regions.

Figure 3: A vertical bar chart over time and region




for each of these coordinates, it doesn’t technically matter whether you select sum or average as the aggregation.
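A table with one rate per country-and-year coordinate can be sketched from long-format records with just the standard library. The records below are hypothetical, not the real Gapminder numbers:

```python
from collections import defaultdict

# Assumed long-format records: (country, year, rate);
# illustrative values only, not real Gapminder data.
records = [
    ("Angola", 1990, 150.0), ("Angola", 1991, 148.0),
    ("Belgium", 1990, 8.0),  ("Belgium", 1991, 7.5),
]

# Pivot: countries as rows, years as columns, one rate per cell,
# so the sum and the average of a single cell are identical.
table = defaultdict(dict)
for country, year, rate in records:
    table[country][year] = rate

print(table["Belgium"][1991])  # 7.5
```

Because each cell holds exactly one value, any aggregation over that cell returns the value itself, which is why the choice of sum or average doesn't change the displayed numbers here.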

You also need to ask yourself if you think that the table visual in Figure 6 serves as the most effective way to easily view and analyze the data. Although you may find it nice to more easily see with numeric values how the rates are improving for each country over a two-hundred-year time frame, looking at a lot of numbers without visual cues or assistance can get fatiguing.

You can still use the idea of a table, but what if you use a highlight table instead, as you see in Figure 7? This

pieces of information in both graphs, such as easily identifying the countries with lower infant mortality rates and also that the infant mortality rates go down substantially over the last fifty years.

Figure 6 shows a data table summarizing the infant mortality rate with the countries on the row labels, the years in the column labels, and the corresponding values effectively as cell coordinates in the middle. If you looked around the Gapminder Excel file storing this data, you might notice that it looks like the original data set with the countries listed alphabetically in the rows and the years listed chronologically as columns at the top. Because there’s only a single rate

Figure 4: A line chart over time with colors for region

Figure 5: Filled map




3. In the values area (or the Text Marks card), you have an average of infant mortality rate. It doesn’t matter if you use sum or average here because for each row and column coordinate in the data table, you only have a single corresponding value, but it makes the most sense to just select the average aggregation to line up with the other visuals. If you inspected the Excel file, you may remember this is what the data table looks like.

4. To convert to a highlight table, select the highlight table icon from the Show Me menu. If it switches the rows and columns when you convert the visual type, just move them back into the correct positions.

visual shows both the rate as a text value and a color, and you can see that the color effectively illustrates an improved infant survival rate in recent years for all countries in this view.

Now change it into a single table (Figure 6) with rows and columns combined with the data in the chart, and then into a highlight table (Figure 7):

1. Move the Year dimension from rows to columns.

2. Add the Country dimension to the rows, so that you now have a data table with years in the column labels and countries in the row labels.

Figure 6: Infant mortality rate by country and year in a single table

Figure 7: Highlight table

Data Aggregation Options

Within data visualization platforms like Tableau, you can aggregate data through functions such as counting, count distinct, sum, average, minimum, or maximum.

Selecting an aggregation option allows you to analyze trends in the data set.
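The aggregation options in the sidebar map to simple operations. A stdlib Python sketch over made-up values:

```python
# Illustrative values only; they stand in for one field's data points.
values = [4.0, 7.0, 7.0, 10.0]

# The common aggregation functions offered by visualization tools.
aggregations = {
    "count":          len(values),
    "count distinct": len(set(values)),
    "sum":            sum(values),
    "average":        sum(values) / len(values),
    "minimum":        min(values),
    "maximum":        max(values),
}

print(aggregations["count distinct"])  # 3
print(aggregations["average"])         # 7.0
```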



2. Convert this new field from a measure to a dimension by right-clicking on the new measure and selecting Convert to Dimension.

3. Now, right-click on this new dimension Year (YEAR), select Create > Bins, and a new dialog box will open up.

4. Use 10 (as in ten years) for the size of the bin and keep the name as Year (Bins).

5. You can now see a little histogram icon next to the field name in the dimensions list (see Figure 10).

6. Now add the new year bin to the data table next to the Year dimension in the column shelf, and remove the Year dimension because you no longer need it.

Sort the Labels

The highlight table you created (shown in Figure 10) shows the trends by country, where the order of the countries in the labels is determined simply by alphabetical order. You might find this a helpful setup if you want to easily find Belgium in the list, for example, but you can’t identify the country with the lowest rates without having to navigate through the list. I think it makes the most sense to put the country with the best outcome since 2010 at the top and rank the rest of the countries as they fall into the subsequent order for infant mortality rates, as you see in Figure 11.

To sort the country order:

1. Go to the last column where the 2010 through 2015 years aggregate and click on the header. You’ll see the sorting icon that looks like a little bar chart appear.

2. Click on the little horizontal bar chart icon once, where you see the highest rate for Angola, then click on it again to see the list sorted by lowest infant mortality rate.
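The ranking the two clicks produce amounts to sorting countries by their value in the last column. A minimal sketch with hypothetical averages for the 2010-2015 bin (not real Gapminder values):

```python
# Hypothetical average rates for the 2010-2015 bin; illustrative only.
latest_bin = {"Angola": 100.0, "Belgium": 3.5, "Iceland": 2.0}

# Ascending sort puts the healthiest country (lowest rate) first.
ranked = sorted(latest_bin, key=latest_bin.get)

print(ranked)  # ['Iceland', 'Belgium', 'Angola']
```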

Make the Labels Easy to Read

If you cut off the label names in a visual, do you expect the viewer to fill in the missing letters and guess the name? In Figure 11, you can’t see the entire country name for the healthiest country. Even if you know to add an “in” to

Notice that the empty values show colors, which you don’t want to see. You’re not done with this visual yet, and it won’t look like this after you finish making modifications to it.

Use Bins to Simplify Visuals

Notice in the data table you just created in Figure 9 that because you’re measuring rates over more than two hundred years, you can’t see all the years without having to use the scroll bar. You also already know that you don’t have consistent data back to historical periods farther in the past and even across contiguous time spans. Averaging the infant mortality rate across ten-year time segments rather than a single year using bins in Tableau creates some noted advantages:

• If a country has gaps in a time range, averaging out the rates within a ten-year segment allows you to smooth out those inconsistencies.

• It also makes it possible to see the entire time range in a single view, as you can see in Figure 8. I chose to use ten years because it allows me to have enough aggregated rates within a reasonable time range without missing too many data points.
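The binning-and-averaging idea can be sketched in standard-library Python; the records below are illustrative, with 1992 deliberately missing to show the smoothing:

```python
from collections import defaultdict

# Illustrative (year, rate) records for one country; 1992 is missing.
records = [(1990, 30.0), (1991, 28.0), (1993, 26.0), (2001, 20.0)]

def year_bin(year, size=10):
    # Tableau-style bin: floor the year to the start of its bin.
    return (year // size) * size

bins = defaultdict(list)
for year, rate in records:
    bins[year_bin(year)].append(rate)

# Averaging within each ten-year bin smooths over the 1992 gap.
decade_avg = {b: sum(v) / len(v) for b, v in bins.items()}

print(decade_avg[1990])  # 28.0
print(decade_avg[2000])  # 20.0
```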

To create bins for the years and update the highlight table (Figure 8):

1. Add a new calculated field named Year (YEAR) with the formula YEAR([Year]) (see Figure 9).

Figure 8: Updated highlight table with year bins

Figure 9: Years measure calculation




You also add the year filter to the filled map chart as you can see in Figure 15, with the option to select multiple or all years within the view rather than showing the average infant mortality rate across all available years. This allows the viewer to create a custom view of the map based on their selected year, and dynamically update the colors on the map that represent the infant mortality rates averaged across the selected year or years.

To add the Years filter:

1. Go into the map worksheet and drag the Year field onto the Filters shelf.

2. Select all the years, and then select to show the filter.
3. Set the filter up as a drop-down list, which has many benefits, including taking up less space. I’d recommend the drop-down list because the single-column list takes up a great deal of space and doesn’t do much for you.

The drop-down list is shown in Figure 16.

I’ll revisit the filter options in much more detail later when you set up the dashboard.

Applying Color Effectively

In Tableau and many other data visualization platforms, the application automatically assigns a color palette to the chart. The default option, however, may not present the best color scheme. To leverage color effectively, limit the color scheme you use (as contradictory as that sounds): strategically applying just a few colors not only makes the visuals easier to read, but also lets users focus on key trends and numbers.

Color-blindness is a visual disability that affects roughly one in ten men and a smaller share of women. You may be color-blind yourself or work with someone who is, or you may not even realize that this impairment is among your

complete “Liechtenstein,” that may not be an option when you have more than two hundred or so country names to guess, a pain that Figure 12 spares you.

To expand the size of the country column:

1. Hover over the border between the country name and the values section until a double arrow appears.

2. Drag the arrow until the column width expands to comfortably fit the country names in the immediate view, as you see in Figure 12.

You can also wrap the text fields, but I wouldn’t recommend that approach for the highlight table because it increases the height of the wrapped fields, throws off the sizing for the entire visual, and can make it harder to read.

Use Filters Within Visuals

The line chart you created in Figure 4 looks like colored spaghetti over a two-hundred-year time frame, which can make it difficult to analyze. Because incomplete data drives much of this fluctuation, if you want to use this line chart as an effective analysis tool, it seems sensible to filter the chart down to only show the trends from 1950 onward, as you see in Figure 13.

To adjust the line chart:

1. Go to Sheet 1 and add Years to the Filters Marks card.
2. Select Years and set the condition as greater than or equal to the 1st of January 1950, as you see in Figure 14.
3. Filter out the null region from the table (remember, these are countries that don’t have a matching region because they are so small) by dragging the Region dimension to the Filter card.
4. Also filter out nulls by excluding them from the data: click on the null values at the bottom of the chart and select Filter data.
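The steps above combine a date condition with two null exclusions. A minimal sketch of the same row-filtering logic in plain Python (the row layout and sample values are hypothetical):

```python
def filter_rows(rows, start_year=1950):
    """Keep rows from start_year onward and drop rows missing a region or a
    rate, mirroring the Year >= 1950 condition and the null exclusions."""
    return [r for r in rows
            if r["year"] >= start_year
            and r["region"] is not None
            and r["rate"] is not None]

rows = [
    {"year": 1900, "region": "Europe", "rate": 180.0},   # before 1950: dropped
    {"year": 1960, "region": None,     "rate": 95.0},    # no matching region: dropped
    {"year": 1960, "region": "Africa", "rate": None},    # missing measurement: dropped
    {"year": 1960, "region": "Africa", "rate": 150.0},   # kept
]
print(filter_rows(rows))  # only the complete 1960 Africa row survives
```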

Figure 10: 10-year bins dialog box

Figure 11: A table sorting by lowest mortality rate in 2010 to 2015 to highest rate



You update the filled map in Figure 15 to use a diverging Orange-Blue color scale with a very light gray serving as the color representing the midpoint. Although blue represents lower infant mortality rates and orange represents high infant mortality rates, the key part about setting up this color scale is selecting what value to use as the midpoint (Figure 20).

In 2015, the country of Angola experienced the highest infant mortality rate of all the countries, at 96 deaths in the first two years of a baby’s life for every 1000 live births. I decided to set the center or midpoint of the color scale to this rate because it allows the viewer to put in context how historical infant mortality rates across all countries compare to today. It allows the viewer to see that although unfortunately Angola still lags behind other countries in population health in today’s world,

colleagues and peers. If you look at the diagram in Figure 17, you can see how orange and blue remain distinguishable to those with impaired color vision. Green and red, on the other hand, look the same to those with color-blindness. Also remember that, like many other disabilities, it occurs on a spectrum rather than as an absolute.

Accounting for those with color-blindness can serve as a starting point for selecting your own color palette. Tableau has a color-blind palette, shown in Figure 18, that offers ten color options to choose from.

You want to give each region a unique color so that even viewers who are color-blind can distinguish the regions in the line charts, as seen in Figure 19.

Figure 12: Adjusting column widths for country label names

Figure 13: Filtered line graph visual




You can then apply this same Orange-Blue color scale palette to the highlight table from Figure 12 to better analyze two aspects of the infant mortality rate data: rankings between countries and improved health outcomes over time. You use the same midpoint as for the filled map: Angola’s 2015 infant mortality rate of 96.

As you see in Figure 22, although all the countries in the most recent time frame of 2010 to 2015 have better infant survival rates than Angola (indicated with the blue cell color), when you look back at historical trends for these rates, some of the most developed countries today, like Japan and Singapore, had higher infant mortality rates only fifty years ago than Angola does today. Even developed European countries, like France, Germany, and Austria, also have much lower rates. Although you may lament about the difference between the

it still represents an improved infant survival from historical infant mortality rates across a two-hundred-year time span as you see in Figure 20, including a lower mortality rate than a developed country, like Germany, saw historically.

To change the colors on the filled map (Figure 21):

1. Go to Sheet 2 and click on the Color Marks card.
2. A dialog box opens where you select the Orange-Blue color scheme; then select to reverse the colors so that blue indicates lower rates and orange indicates higher rates (Figure 20).
3. To change the midpoint that the color scale uses, go to the Advanced options and put a check mark next to the Center option, where you can now type in 96 as the value to center the color scale.
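Centering a diverging scale means the midpoint value maps to the neutral middle color while the two halves stretch independently toward the extremes. A small sketch of that mapping in plain Python (the vmin/vmax bounds of 0 and 400 are hypothetical; 96 is Angola’s 2015 rate from the article):

```python
def diverging_position(value, vmin, vcenter, vmax):
    """Map a rate onto [0, 1] with vcenter pinned at 0.5, the same idea as
    centering the Orange-Blue diverging scale on Angola's 2015 rate."""
    if value <= vcenter:
        # Lower half: vmin -> 0.0, vcenter -> 0.5 (the light-gray midpoint)
        return 0.5 * (value - vmin) / (vcenter - vmin)
    # Upper half: vcenter -> 0.5, vmax -> 1.0
    return 0.5 + 0.5 * (value - vcenter) / (vmax - vcenter)

print(diverging_position(96, 0, 96, 400))   # 0.5: the neutral midpoint
print(diverging_position(48, 0, 96, 400))   # 0.25: well into the blue half
```

This is the piecewise-linear behavior that libraries such as matplotlib expose as a two-slope normalization; Tableau does the equivalent internally when you set the Center value.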

Figure 14: Setting up a Year filtering condition

Figure 15: Adding years filter to map



how all of these visuals come together in a single consolidated dashboard with which the viewer can dynamically interact. After making several strategic design and formatting decisions for the visuals, you end up with the dashboard you see in Figure 23.

You want the dashboard to line up in a way that’s easy to read by purposely deciding:

• To make all of the charts big enough to view without scrolling or panning in.

• To place visuals, such as the highlight map, to the side to effectively create a border.

• To use a maximum of three large visuals to avoid clutter.

You now need to make adjustments to get everything to fit together after updating each visual to get the updated dashboard, as you see in Figure 24.

To make these changes:

1. Remove Sheet 4 from the dashboard by selecting the visual so you can see a box around it and then click on the X in the upper right-hand corner.

2. Take the Region key on the upper right-hand side and drag it over so that you can see it on the top of the line chart, and then adjust it to make it smaller. You can also adjust the width of the region names by clicking into the legend and when you see an arrow pop up, drag the width of the text field to where you want.

3. Now drag the infant mortality color gradient to underneath the map and to the left of the highlight table. Drag the top of the scale up to put the label above it.

4. Remove the layout container from the right-hand side of the screen where the legends used to be by clicking on it, then clicking on the X.

5. To make the highlight table bigger so you can see it better on the dashboard, select the container and pull it over until it takes up almost half the screen on the right-hand side.

6. To get the highlight table to fit on an entire view, select the chart, then click on the down arrow at the edge, select Fit and choose Entire View.

healthiest and the sickest countries today, you need to remember that these outcomes still represent a much healthier world for everyone now than less than half a century ago.

In the highlight table, to update the color scheme (Figure 22):

1. Click on the Color Marks card to open up the selection options.

2. Select Orange-Blue diverging, reverse the color scheme, and then, on the Advanced options, select Center and put in 96, which is the infant mortality rate for Angola in the most recent year, 2015.

3. To remove the extra blue spaces for the null values, right click on the mortality rates with the colors and select Filter. Then select the Special options by choosing the button on the far right and selecting Non-null values.

4. You also want to remove the text values, so select the second infant mortality aggregation with the text icon next to it and delete it.

If you run into problems where Tableau won’t let you remove the color from the null cells (I ran into this a time or two when testing), I suggest trying to recreate the visual again or going back to previous steps and removing the null values at that point (you may have to test different options). If the visual flips the rows and columns but removes the nulls, you can easily flip it back into the positions you want.

Placing the Visuals on the Dashboard

So far, you’ve updated the chart type and formatting of individual visuals, but you now need to take a step back and see

Figure 16: Setting up a Years filter on the filled map

Figure 17: Normal vs. color-blind vision




We Prefer to Eat Pie Rather than See It in a Visual

Although pie chart visuals allow you to see the rough breakdown of the “pieces of the pie,” you can’t easily compare the size of the actual pieces within just a few seconds. Can you tell more than which piece of the pie chart represents the largest value?

If you want to know the totals and ranks of the aggregated numbers, and you can’t easily do that analysis in a pie chart, you need another chart; a bar chart handles this easily. If you’re analyzing two numbers, you can put the totals and their percentages in a small table or even a KPI metric, which saves space and stays easy to read for a small amount of data.

option, and the floating option frees up even more space to use for the rest of the visuals on the dashboard.

You’ll want to add:

• A title to the entire dashboard to explicitly tell the viewers what data they’re working with and give them a nudge to indicate that falling infant mortality rates represent improved health outcomes, and also to give them context on the trends and encourage them to learn more.

• Titles on individual visuals, where needed, to provide context on what they see and how to potentially think about the results and interact with the data.

Put Titles and Labels on the Dashboard

Although you already made strategic design and formatting decisions to improve the dashboard, you still need to effectively label these components or visuals so that the viewer understands relatively quickly exactly what each component represents, as you see in Figure 25. Although you may know what Sheet 1 does because you designed the visual, you can’t assume that the viewer does as well.

You can also add the year filter to the map chart, which allows you to see the infant mortality rates around the world for the selected year or years. This lets users dynamically change their own map view in the dashboard, as you saw in Figure 25. The drop-down list takes up less space than the list

Figure 18: Color-blind palette

Figure 19: Updating region key colors




2. Make the title in size 14 font and the subtitle details in size 10.

3. Add names to the visuals. Double click on the title for Sheet 1 and in the dialog box, enter Since 1950, infant mortality trends across all regions trend downward, and set the font size to 12.

4. Next, double click on Sheet 2 and delete all the text in the dialog box so you no longer have a title on the map.

5. Double click on Sheet 3 to change the name of the highlight table to All countries show improvements in the infant mortality rates, from the healthiest countries to the sickest and set the font size to 12.

6. Select the map visual container, then click on the down arrow on the outside frame, select Filters, and select

• The Years filter that enables you to see different views of the filled map by selecting the years you want to see.

Determine whether you can remove some titles, too, such as the map title, because you already know it’s a map.

The steps to add titles are:

1. In the Dashboard 1 tab, select the Dashboard menu at the top, then select Show Title, where you can input the name Infant Mortality Rates Trend Downwards for ALL Countries Over the Last Half Century and the subtitle Gapminder data helps us prove that the world is getting better for almost everyone (see Figure 26).

Figure 20: Orange-Blue diverging color palette

Figure 21: Diverging color palette filled map




down, which indicates an improved population health outlook. Similar to the way you set up the blue-orange color scale for the filled map and highlight tables in Figure 21 and Figure 22, with the highest mortality rate in 2015 as the midpoint of the diverging color scale, you can also use a reference line as another way for the viewer to analyze these rate trends. In 1950, the healthiest region, Europe, had an infant mortality rate of 58.9, as you see in Figure 27. By setting a constant reference line at this point on the y-axis, you can show that although other regions may lag behind Europe in terms of relative improved health outcomes, you can use this number as a benchmark for what you could call a time delay in this trend

the Year of Year from the list, where you now see the filter on the far right.

7. Select this filter container, click on the down arrow, and choose Multiple Values (dropdown). Then click on the down arrow again, select Floating, which means that you can move the filter over the map to select the year, and you can drag it over to the bottom of the map where it doesn’t directly sit on top of any countries in the map.

Adding Elements of Analysis

In the line chart in Figure 19, you saw that since 1950, average infant mortality rates across all regions trend

Figure 22: Blue-orange heat applied to highlight table

Figure 23: The updated dashboard




how to enable the user (the customer) to experience the “Ikea Effect” discussed earlier, where they do most of the work by interacting with the dashboard, but still feel that their investment in using the dashboard was worthwhile. Like a furniture pack with components ready for assembly, as the developer of the dashboard, you analyzed, measured, and packaged the pre-made components for the customer before they even receive them.

Now you need to communicate the instructions for how to assemble the product by telling users how to interact with the visuals within the dashboard. You can’t assume that because you know how to interact with the data, they will as well. You need to provide clear instructions that guide them in how to change filters or click on countries in the map to change the dashboard view.

To help viewers navigate the dashboard and make the most of it, use nudging techniques that give them instruction prompts for how to use the visuals. These nudges appear as instructions in the visual or filter title to gently guide their selection options and encourage them to explore the dashboard. Avoid wordy instructions or procedures that are difficult to follow; simply tell viewers what you want them to do, without coming across as authoritarian. Examples of the nudging instruction cues that you see in Figure 31 include:

• Adding the Gapminder logo by selecting the Dashboard tab of the pane on the far left-hand side of the screen, and then selecting Image from the options at the bottom, where a new container opens up in which you can select the image path in the dialog box.
• The logo now appears in the dashboard but looks strange because of its current position, so you need to highlight the container and move it to the top left-hand corner of the screen before the title details. This can be quite tricky, so put the logo in with a container and move the dashboard title into the blank container.

To add a reference line to the line chart:

1. Right-click on the y-axis and select Add reference line.
2. Select the entire table, select a constant line, enter the value of 58.9, choose to see no label, and then use a thick dashed line, as you see in Figure 28.
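The “time delay” reading of the benchmark line can be made concrete: for each region, find the first decade whose average rate drops below Europe’s 1950 rate of 58.9. A small sketch in plain Python (the regional decade averages below are hypothetical, not the Gapminder figures):

```python
EUROPE_1950_RATE = 58.9  # the benchmark shown as a reference line in Figure 27

def first_decade_below(series, benchmark=EUROPE_1950_RATE):
    """Return the first decade whose average rate falls below the benchmark,
    quantifying the time delay the reference line makes visible."""
    for decade, rate in sorted(series.items()):
        if rate < benchmark:
            return decade
    return None  # the region never crosses the benchmark in this data

# Hypothetical decade averages for one region
asia = {1950: 140.0, 1960: 110.0, 1970: 80.0, 1980: 55.0, 1990: 40.0}
print(first_decade_below(asia))  # 1980
```

The gap between a region’s crossing decade and 1950 is the delay relative to Europe that the reference line lets the viewer estimate by eye.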

Use Tooltips as a Hidden Design Weapon

Tooltips let you increase the capabilities of interactive data visualization applications because you can hide some of the information away from the immediate view of the user; it pops up when the user hovers over the relevant data points in a visual. I sometimes think of it as a third dimension that you can add to a two-dimensional dashboard. You can also add data to tooltips that you don’t see in the visual, and you can customize the wording and structure of the tooltips, as seen in Figure 29.

In the highlight table, I find it difficult to read the row and column headers in the visual because there are so many of them in a small space. By pushing the details into the tooltips, as you see in Figure 30, you can ultimately format a clean visual without compromising the design or the details behind it.

To edit the text and values within the tooltips:

1. Click on the Tooltips Marks card, which opens a new dialog box on the Sheet 1 tab for the line chart.

2. Edit the tooltip to refine and summarize what you want to say (you can type in the tooltip box to rename the labels or create sentences). Change the year for the bins to Decade and delete Avg from the mortality rate details. You can also change the details in the map to make them easier to scroll over.
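The rewording above amounts to composing the tooltip string yourself instead of accepting the auto-generated field labels. A minimal sketch in plain Python (the exact label text is an assumption for illustration):

```python
def tooltip_text(decade, country, rate):
    """Compose a reworded tooltip: 'Decade' instead of the raw bin field
    name, and no 'Avg' prefix on the mortality rate."""
    return f"Decade: {decade}\n{country} infant mortality rate: {rate}"

print(tooltip_text(1950, "Japan", 40.2))
# prints two lines: 'Decade: 1950' and 'Japan infant mortality rate: 40.2'
```

In Tableau you achieve the same thing by typing the sentence structure into the tooltip editor and dropping the field placeholders where the values should appear.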

Creating Interactivity Through Instruction Prompts

Businesses decide to leverage data visualization tools like Tableau because of the interactive capabilities that enable end users to explore and analyze data trends. You want to think about

Figure 24: Dashboard with fitted visuals




in the highlight table. Make this addition font size 8 so the users can see it, but it’s not too prominent.

2. In the filter for the map, double-click on the year filter, and change the title to say Select year to see trends in map. You can also double-click on the legend below to update the Avg in the legend title to Average.

3. Now in the highlight table, double-click on the title and add the text Hover over a cell to see the country, year, and infant mortality rate. Again change the font size to 8.

4. You also want to remove the labels from the highlight table because you can’t even read them in the first place. Right-click on both the column labels and the

• Selecting a year from the filter to change the map view.

• Clicking on a region to filter the entire view.
• Hovering over a color cell block in the highlight table to see more details.

To add instruction prompts and the final formatting details to the dashboard (Figure 29):

1. In the title of the line chart, double-click to edit the title, and, underneath the title, add the instructions by entering the text Select a Region from the legend to see the related countries in the map and their trends

Figure 25: Dashboard titles and visual titles

Figure 26: Adding dashboard titles




the chart to see the entire logo as well as the other visuals without them pushing one another out.

7. Now you need to set up the interactivity between the charts so the instruction prompts work. To do so, go to Sheet 2 and make sure to add the Region to the Details Marks card, so that the region legend can filter this map. Do the same to the highlight table by going to Sheet 3 and adding the Region to the Details Marks card.

8. Now go back to the dashboard and select the line chart to highlight its container. Select the filter icon to set up this chart as a filter for the other charts.

row labels separately and choose to remove labels for both of them.

5. Also remove Sheet 4 because the dashboard doesn’t use it as a visual.

6. You can add the Gapminder logo to the dashboard by first dragging an empty container (found on the bottom left) onto the canvas. Push it into position next to the dashboard title so they share the same space. Keep this layout container selected, choose Image from the same selection option box, and point to the location where you saved the Gapminder logo. Now adjust

Figure 27: Updated line chart with reference line

Figure 28: Reference line dialog box conditions




Now that you’ve set up the dashboard, you can put yourself in the position of the user and test it out. In Figure 32, you see Asia selected as the region in the line chart legend. This creates a new view of the data that highlights key trends and analysis for the Asia region. You can also select a single year in the map to see the Asian countries’ health for that year. Choosing Asia filters all three charts, and you can see in the highlight table the disparity between the rankings within the Asian countries.

9. Make sure that you can see all your visuals. The best way to do this is to leave enough white space between the legend fields and the visuals, and then upload to Tableau Public Online to make sure it fits as you anticipated. If it doesn’t, go back to the dashboard and make adjustments based on what was pushed out of place. This may take a little bit of practice!

Figure 29: Tooltip dialog box

Figure 30: Highlight table with updated tooltips



Mitigating the Blind Spots of Those with Color-Blindness

Roughly one in 10 people experience color-blindness, and you need to be mindful of them. Some of you reading this are color-blind.

Although it can seem overwhelming to create a color palette specifically for color-blind viewers, you can follow a few simple rules to avoid problems. As a rule of thumb, I avoid green and red together, and instead substitute orange and blue for heat maps, for example.

I encourage you to take these best-practice techniques, use some creativity to set up and experiment with visual options, then see how the users respond and go from there!

You maximize your dashboard’s influence by taking the initial Tableau dashboard and strategically making changes to update the design, formatting, and interactivity. This, in turn, increases users’ understanding of and interaction with the dashboard interface, and the likelihood they will use it, by letting them take ownership in changing the views. Designing an effective dashboard gives you that flexibility.

Figure 32: Filter entire dashboard for Asia region

Helen Wall


Figure 31: The final dashboard with instruction prompts


What Captain Marvel Can Teach Us about Management


Nov/Dec 2019, Volume 20, Issue 6

Group Publisher: Markus Egger

Associate Publisher: Rick Strahl

Editor-in-Chief: Rod Paddock

Managing Editor: Ellen Whitney

Content Editor: Melanie Spiller

Editorial Contributors: Otto Dobretsberger, Jim Duffy, Jeff Etter, Mike Yeager

Writers in This Issue: Sumeya Block, Sara Chipps, Kate Gregory, Julie Lerman, Ashleigh Lodge, Sahil Malik, Jeannine Takaki-Nelson, Dian Schaffhauser, Craig Shoemaker, Helen Wall

Technical Reviewers: Markus Egger, Rod Paddock

Production: Franz Wimmer, King Laurin GmbH, 39057 St. Michael/Eppan, Italy

Printing: Fry Communications, Inc., 800 West Church Rd., Mechanicsburg, PA 17055

Advertising Sales: Tammy Ferguson, 832-717-4445 ext [email protected]

Circulation & Distribution: General Circulation: EPS Software Corp. Newsstand: The NEWS Group (TNG) Media Solutions

Subscriptions: Subscription Manager Colleen [email protected]

US subscriptions are US $29.99 for one year. Subscriptions outside the US are US $49.99. Payments should be made in US dollars drawn on a US bank. American Express, MasterCard, Visa, and Discover credit cards accepted. Bill me option is available only for US subscriptions. Back issues are available. For subscription information, e-mail [email protected].

Subscribe online at www.codemag.com

CODE Developer Magazine, 6605 Cypresswood Drive, Ste 425, Spring, Texas 77379, Phone: 832-717-4445, Fax: 832-717-4460

CODE COMPILERS

(Continued from 74)

should consider the kind of example you’re setting for your daughter.”

And so Maria heads off with Carol and the rest of the gang. That’s a clear-cut case of a manager allowing her people to “lead up,” to make the big decision. For growth to happen, those at every level need to exert influence on the people above them in the organizational chart and managers can help them do that by responding favorably to their ideas.

When You Fall Down, Get Back Up

Failure hurts. In Carol’s case, that includes crashing a go-kart as a kid, falling off a climbing rope as a young woman, and putting up with another pilot, this one male, in a bar, over a beer, telling her she’s a “decent pilot, but [she’s] too emotional.”

Later on, these scenes re-emerge, but this time we get to see how those scenes play out, with Carol picking herself up after every failure. Sure, it’s a montage just like the ones Nike feeds us, but those commercials get a bazillion views because they work. Managers don’t give up; they get up.

“Failure, failure, failure, failure,” said Staci. “You keep getting up and that will get you closer toward your goal.”

Don’t Let Anybody Tie Your Hands

Throughout Captain Marvel, Carol is prevented from using her full powers. In that opening fight scene, she complains to Yon-Rogg that he won’t let her use her special energy waves, and he insists that if she were ready to apply them, she’d also be able to knock him down without them. Later, when she confronts the head of her old planet, the Supreme Intelligence, she realizes that she’s “been fighting with one arm tied behind my back” and yanks out the chip attached to her neck that they’ve been using to control her. Finally, in a scene with her former mentor, Yon-Rogg tosses away his weapon and eggs her on, encouraging her to just “turn off the light show” and fight him arm to arm. Her response as she blasts him away: “I have nothing to prove to you.”

The best managers don’t force their people to act in ways that minimize their powers. They embrace the brilliance, help them turn their flaws into good qualities, and allow them to remain true to themselves.

Yes, Sometimes You Have to Do the Dirty Jobs

After saving C-53, otherwise known as Planet Earth, it’s only right that Carol and Fury wash the dishes. Even superheroes need to help out

with household chores. And like them, managers should always look for opportunities to do the grunt work, if only to remember how hard and mind-numbing it can sometimes be.

Plus, said Suzanne, an administrative services officer for a county government (and my wife), “Doing tasks with people can sometimes lead to greater strength of relationships for the people you need to have follow you.”

Compile Your Playlist

Heck, yeah, few jobs are as hard as managing people. But music can help you, like nothing else, push through the limitations, recommit to the mission, and inspire your team to keep up. What’s on Captain Marvel’s playlist? For a few, try Heart’s “Crazy on You,” No Doubt’s “Just A Girl,” and Des’ree’s “You Gotta Be.”

With tunes like that and a little work honing that inspirational stance, Danvers, I’d think about signing onto your team.

Dian [email protected]

Her management days over, Dian Schaffhauser prefers to go it alone as a freelance reporter covering business and technology from Northern California.



Carol expresses curiosity about the “communicator” and Fury reassures her that he’s only texting his mom.

When they finally do escape, Carol comes under attack again from what we believe at first to be a S.H.I.E.L.D. team; Fury has led them right to her. He quickly realizes, however, that appearances can be deceptive and rejoins Carol in her attempt to leave the bunker, this time via fighter jet. They barricade themselves on the bunker’s flight deck, and moments from certain doom, Carol holds her hand out. She wants Fury to give her the communicator. Now. As she tells him, “You obviously can’t be trusted with it.”

“She jumps right on it,” observes my friend Staci, firefighter and forest aviation officer. “She doesn’t let it fester.” She takes care of what she views as a problem immediately. Good managers don’t avoid conflict.

Likewise, they don’t hold grudges. That takes too much energy.

Let Your People Lead Up

When Carol’s friend Maria is invited to join the mission as a co-pilot to track down Dr. Lawson’s ship in a jerry-rigged plane, she begs off. As a single mom, she reminds Carol that she can’t leave her daughter Monica. “There’s no way I’m going, baby,” she says. “It’s too dangerous.”

But Monica won’t have any of that. “Testing brand new aerospace tech is dangerous. Didn’t you use to do that?” she suggests. Besides, she adds, she’ll stay with her grandparents.

Maria turns to Carol, who’s listening in on the conversation. “Your plan is to leave the atmosphere in a craft not designed for the journey, and you anticipate hostile encounters with a technologically superior foreign enemy. Correct?”

Carol doesn’t say a word; just shrugs. But Monica speaks up: “That’s what I’m saying. You have to go.” Besides, she adds, “I just think that you

After all, who of us alive will ever forget the outstretched arms and taunting stance the purple-haired Rapinoe displayed every time she scored during the Women’s World Cup? (Alas, the movie appeared months before the American win in France.)

Pose aside, however, there was plenty more signaling that “Vers” was somebody worth following. Recently, a small corps of friends gathered in front of my flatscreen to rewatch the movie (they’d all seen it in the theater), finish off a couple of bottles of California vino and share what struck them about Captain Marvel’s management qualities, viewed from their own perspectives as managers.

Warning! Yes, the rest of this column contains a mighty collection of what some might consider spoilers, but that I prefer to call “preparatory notes.” I enjoyed my second viewing of Captain Marvel more than my first, not only because I knew what was coming but also because I understood things better. (The wine helped too.) No need to thank me.

It’s Not Always Bad to Be Driven by Your Emotions

During a sparring match, when mentor Yon-Rogg has our hero Carol pinned to the floor and her fists begin to sizzle orange in frustration, he tells her, “There is nothing more dangerous to a warrior than emotion.” (Oh, how often have we heard, “Don’t let your emotions get the best of you”?) Yet leaders know that the passion behind the emotion can drive them and their staff to keep going when things are looking down. On top of that, having a grasp of emotional intelligence—being able to understand your own emotions and influence the emotions of those around you—will get you much further than sheer technical skill. Understanding how people work and what motivates them emotionally is critically important to pulling them in to help you achieve your goals and is far more effective than sending out yet another directive-by-Slack.

Stop Apologizing

Carol has just spent the last five minutes chasing down a continually shape-shifting Skrull through a moving Metro Rail train. Imagine trying to hunt down prey that can take the form of anybody around you. How do you identify the enemy? But because she’s our hero, she has an innate ability to pick out the bad guy, which becomes more obvious when he blasts his way through the top of a railcar and they take the fight up on the roof. As 1990s Los Angeles flashes by in the background, Carol gives as good as she gets until they head into a tunnel. Suddenly, she can’t see anything until they stop at the station.

She leaps out of the car to join the surge of people disembarking and spies the shapeshifter walking away, still in the image of the man whose form he last took. She grabs him from behind and he turns, ready to receive the blow. But one look tells her this is the authentic person, not the form stolen by the shapeshifter. Her fist drops and she moves on—with no apology.

Later on, after she’s insulted Tom, the guy who lives next door to her BFF Maria, the same thing happens.

As my friend Donna, a college instructor, pointed out, “Look at that! She didn’t apologize. Women apologize far too freakin’ much.”

Sure, a good manager is capable of saying, “I’m sorry,” but not every time she makes a blunder. So when does an apology pass Carol’s lips? Only when she finds out just why Skrull General Talos has been trying to capture her. And then it’s an authentic apology, born of self-awareness. When a manager uses “sorry” too much, it loses impact and can be perceived as weakness.

Jump on the Problem

Carol and S.H.I.E.L.D. agent Fury are in a hidden government mountain bunker hunting down information about the mysterious Dr. Lawson, whom Carol believes holds the key to stopping the Skrulls from taking over the universe. When the pair tell officials why they’re there, they’re locked into a nameless office. After a half-hearted attempt to get free, Fury pulls out his “state-of-the-art two-way pager” and sends a furtive message to his work partner, Agent Coulson: “Detained with target. Need backup.”


What Captain Marvel Can Teach Us about Management

Too bad the timing wasn’t better for Captain Marvel and Megan Rapinoe to coordinate. Otherwise, instead of standing like Tinker Bell as she stared down Ronan the Accuser, with his take-no-prisoners Kree military force and shipload of warheads, Carol Danvers’ posture would have been a bit more, well, managerial.

MANAGED CODER