Socorro Documentation
Release 2
Mozilla
June 18, 2014
Contents
1 Overview 3
   1.1 Socorro Server 3
   1.2 Socorro UI 3
   1.3 Data Flow 3

2 Installation 5
   2.1 Socorro VM (built with Vagrant + Puppet) 5
   2.2 Automated Install using Puppet 5
   2.3 Manual Install 5

3 Collector 11
   3.1 Collector Python Configuration 11
   3.2 Common Configuration 11
   3.3 Collector Configuration 11

4 Processor 13
   4.1 Introduction 13

5 Middleware API 15
   5.1 API map 15
   5.2 Bugs 16
   5.3 Crashes Comments 17
   5.4 Crashes Frequency 19
   5.5 Crashes Paireduuid 21
   5.6 Crashes Signatures 22
   5.7 Extensions 23
   5.8 Crash Trends 24
   5.9 Job 25
   5.10 Priorityjobs 26
   5.11 Products 26
   5.12 Products Builds 27
   5.13 Signature URLs 29
   5.14 Search 30
   5.15 List Report 33
   5.16 Versions Info 36
   5.17 Forcing an implementation 37

6 Socorro UI 39
   6.1 Coding Standards 39
   6.2 Adding new reports 39

7 UI Installation 43
   7.1 Installation 43
   7.2 Trouble Shooting 45

8 Server 47
   8.1 The Applications 47

9 crontabber 49
   9.1 crontab runs crontabber 49
   9.2 Dependencies 49
   9.3 Own configurations 50
   9.4 App names versus/or class names 51
   9.5 Manual intervention 51
   9.6 Frequency and execution time 52
   9.7 Timezone and UTC 52
   9.8 Writing cron apps (aka. jobs) 52

10 Throttling 55
   10.1 throttleConditions 55

11 Deployment 57
   11.1 Introduction 57
   11.2 Outage Page 57

12 Development Discussions 59
   12.1 Coding Conventions 59
   12.2 New Developer Guide 59
   12.3 Glossary 70
   12.4 Standalone Development Environment 86
   12.5 Unit Testing 87
   12.6 Crash Repro Filtering Report 88
   12.7 Disk Performance Tests 89
   12.8 Dumping Dump Tables 91
   12.9 JSON Dump Storage 93
   12.10 Processed Dump Storage 96
   12.11 Report Database Design 97
   12.12 Code and Database Update 99
   12.13 Out-of-Date Data Warning 106
   12.14 Database Schema 107
   12.15 Package 113
   12.16 Schema 113
   12.17 Tables used primarily when processing Jobs 113
   12.18 Tables primarily used during data extraction 115
   12.19 Tables primarily used for materialized views 116
   12.20 Dimensions tables 116
   12.21 View tables 117
   12.22 Bug tracking 119
   12.23 Meta data 120
   12.24 Database Setup 120
   12.25 Common Config 121
   12.26 Populate ElasticSearch 124

13 PostgreSQL Database 127
   13.1 PostgreSQL Database Tables by Data Source 127
   13.2 Manually Populated Tables 127
   13.3 Tables Receiving External Data 127
   13.4 Automatically Populated Reference Tables 128
   13.5 Matviews 128
   13.6 Application Management Tables 129
   13.7 Deprecated Tables 129
   13.8 PostgreSQL Database Table Descriptions 130
   13.9 Raw Data Tables 130
   13.10 Normalized Fact Tables 131
   13.11 Dimensions 133
   13.12 Matviews 135
   13.13 Note On Release Channel Columns 137
   13.14 Application Support Tables 137
   13.15 Creating a New Matview 138
   13.16 Do I Want a Matview? 138
   13.17 Components of a Matview 139
   13.18 Creating the Matview Table 139
   13.19 Database Admin Function Reference 142
   13.20 MatView Functions 142
   13.21 Schema Management Functions 146
   13.22 Other Administrative Functions 148
   13.23 Custom Time-Date Functions 148
   13.24 Database Misc Function Reference 150
   13.25 Formatting Functions 150
   13.26 API Functions 151
   13.27 Populate PostgreSQL 151

14 How generic app and an example works using configman 155
   14.1 The minimum app 155
   14.2 Connecting and handling transactions 155
   14.3 What was the point of that?! 156

15 Writing documentation 157
   15.1 Installing Sphinx 157
   15.2 Making the HTML 157
   15.3 Making it appear on ReadTheDocs 157
   15.4 Or, just send the pull request 157
   15.5 Or, just edit the documentation online 158

16 Indices and tables 159
Socorro Documentation, Release 2
The current focus of Socorro development is to make a server which can accept crash reports from Firefox. See http://wiki.mozilla.org/Breakpad for more information.

Socorro mailing list: https://lists.mozilla.org/listinfo/tools-socorro

This documentation is available on GitHub; if you want to, feel free to clone the repo, make some changes in a fork and send us a pull request.
Contents:
CHAPTER 1
Overview
The Socorro Crash Reporting system consists of two pieces, the Socorro Server and the Socorro UI.
1.1 Socorro Server
The Socorro Server is a Python API and a collection of applications and web services that use the API. Together, the applications embody a set of servers that take crash dumps generated by remote clients, process them using the breakpad_stackdump application, and save the results in HBase. Additional processes aggregate and filter data for storage in a relational database.
The server consists of these components:
• Collector
• Hadoop/HBase
• Processor
• SocorroRegistrar
• SocorroWebServices
1.2 Socorro UI
Socorro UI is a Web application to access and analyze the database contents via search and generated reports.
1.3 Data Flow
Crash dumps are accepted by the Collector, a mod_wsgi application running under Apache. The Collector stores the crashes into HBase.

Using Hadoop jobs, the crash dumps in HBase are converted into searchable JSON files by the Processor.

The Processors are also long-running applications that live on Hadoop processing nodes. They accept tasks from map-reduce jobs and employ stackwalk_server to convert crashes into JSON files stored back into HBase, filtering the converted crashes using the throttling rules initially applied by the Collector.

The Socorro UI allows developers to browse the crash information from the relational database. In addition to being able to examine specific individual crash reports, there are trend reports that show which crashes are the most common, as well as the status of bugs about those crashes in Bugzilla.
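The collector end of this flow can be sketched in a few lines of Python. This is an illustration only: the field names, the in-memory store, and the use of a plain UUID for the crash id are assumptions, not Socorro's actual implementation.

```python
import json
import uuid

def accept_crash(form_fields, dump_bytes, storage):
    """Sketch of what a collector does with one submitted crash:
    assign an id, keep the form metadata as JSON, keep the dump as-is."""
    crash_id = str(uuid.uuid4())  # stand-in for Socorro's real id scheme
    storage[crash_id + ".json"] = json.dumps(form_fields)
    storage[crash_id + ".dump"] = dump_bytes
    return crash_id

# In-memory stand-in for the local file system / HBase.
store = {}
crash_id = accept_crash({"ProductName": "Firefox", "Version": "4.0.1"},
                        b"\x00minidump-bytes", store)
```

After this step, a mover process would pick the pair of files up and ship them to long-term storage.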
Next Steps:
Installation
CHAPTER 2
Installation
2.1 Socorro VM (built with Vagrant + Puppet)
You can build a standalone Socorro development VM - see Setup a development environment for more info.
The config files and puppet manifests in ./puppet/ are a useful reference when setting up Socorro for the first time.
2.2 Automated Install using Puppet
It is possible to use Puppet to script an install onto an existing environment. This has been tested in EC2 but should work on any regular Ubuntu Lucid install.
See puppet/bootstrap.sh for an example.
2.3 Manual Install
2.3.1 Requirements
Breakpad client and symbols
Socorro aggregates and reports on Breakpad crashes. Read more about getting started with Breakpad. You will need to produce symbols for your application and make these files available to Socorro.
• Linux (tested on Ubuntu Lucid and RHEL/CentOS 6)
• HBase (Cloudera CDH3)
• PostgreSQL 9.0
• Python 2.6
2.3.2 Ubuntu
1. Add PostgreSQL 9.0 PPA from https://launchpad.net/~pitti/+archive/postgresql
2. Add Cloudera apt source from https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation#CDH3Installation-InstallingCDH3onUbuntuSystems
3. Install dependencies using apt-get
As root:
apt-get install supervisor rsyslog libcurl4-openssl-dev build-essential \
  sun-java6-jdk ant python-software-properties subversion libpq-dev \
  python-virtualenv python-dev libcrypt-ssleay-perl phpunit php5-tidy \
  python-psycopg2 python-simplejson apache2 libapache2-mod-wsgi memcached \
  php5-pgsql php5-curl php5-dev php-pear php5-common php5-cli php5-memcache \
  php5 php5-gd php5-mysql php5-ldap hadoop-hbase hadoop-hbase-master \
  hadoop-hbase-thrift curl liblzo2-dev postgresql-9.0 postgresql-plperl-9.0 \
  postgresql-contrib
2.3.3 RHEL/Centos
Use the “text install” option and choose “minimal” as the install type.
1. Add Cloudera yum repo from https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation#CDH3Installation-InstallingCDH3onRedHatSystems
2. Add PostgreSQL 9.0 yum repo from http://www.postgresql.org/download/linux#yum
3. Install Sun Java JDK version JDK 6u16 - Download appropriate package fromhttp://www.oracle.com/technetwork/java/javase/downloads/index.html
4. Install dependencies using YUM:
As root:
yum install python-psycopg2 simplejson httpd mod_ssl mod_wsgi \
  postgresql-server postgresql-plperl perl-pgsql_perl5 postgresql-contrib \
  subversion make rsync php-pecl-memcache memcached php-pgsql gcc-c++ \
  curl-devel ant python-virtualenv php-phpunit-PHPUnit hadoop-0.20 \
  hadoop-hbase daemonize
5. Disable SELinux
As root: Edit /etc/sysconfig/selinux and set “SELINUX=disabled”
6. Reboot
As root:
shutdown -r now
2.3.4 Download and install Socorro
Determine latest release tag from https://wiki.mozilla.org/Socorro:Releases#Previous_Releases
Clone from github, as the socorro user:
git clone https://github.com/mozilla/socorro
cd socorro
git checkout LATEST_RELEASE_TAG_GOES_HERE
cp scripts/config/commonconfig.py.dist scripts/config/commonconfig.py
Edit scripts/config/commonconfig.py
From inside the Socorro checkout, as the socorro user, change:
databaseName.default = 'breakpad'
databaseUserName.default = 'breakpad_rw'
databasePassword.default = 'aPassword'
If you change the password, make sure to change it in sql/roles.sql as well.
2.3.5 Run unit/functional tests, and generate report
From inside the Socorro checkout, as the socorro user:
make coverage
2.3.6 Set up directories and permissions
As root:
mkdir /etc/socorro
mkdir /var/log/socorro
mkdir -p /data/socorro
useradd socorro
chown socorro:socorro /var/log/socorro
mkdir /home/socorro/primaryCrashStore /home/socorro/fallback
chown apache /home/socorro/primaryCrashStore /home/socorro/fallback
chmod 2775 /home/socorro/primaryCrashStore /home/socorro/fallback
Note - use www-data instead of apache for debian/ubuntu
Compile minidump_stackwalk
From inside the Socorro checkout, as the socorro user:
make minidump_stackwalk
2.3.7 Install socorro
From inside the Socorro checkout, as the socorro user:
make install
By default, this installs files to /data/socorro. You can change this by specifying the PREFIX:
make install PREFIX=/usr/local/socorro
2.3.8 How Socorro Works
There are two main parts to Socorro:
1. a pipeline that collects, processes, and allows real-time searches and results for individual crash reports

This requires both HBase and PostgreSQL, as well as the Collector, Crashmover, Monitor, Processor, Middleware and UI.

Individual crash reports are pulled from long-term storage (HBase) using the /report/index/ page, for example: http://crash-stats/report/index/YOUR_CRASH_ID_GOES_HERE
The search feature is at: http://crash-stats/query
2. a set of batch jobs which compiles aggregate reports and graphs, such as “Top Crashes by Signature”
This requires PostgreSQL, the Middleware and the UI. It is triggered once per day by the “daily_matviews” cron job, covering data processed in the previous UTC day.
Every other page on http://crash-stats is of this type.
2.3.9 Crash Flow
The basic flow of an incoming crash is:
(breakpad client) -> (collector) -> (local file system) -> (newCrashMover.py) -> (hbase)
A single machine will need to run the Monitor service, which watches HBase for incoming crashes and queues them up for the Processor service (which can run on one or more servers). Monitor and Processor use PostgreSQL to coordinate.

Finally, processed jobs are inserted into both HBase and PostgreSQL.
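The monitor/processor hand-off above can be sketched with an in-process queue standing in for the PostgreSQL job table. The names here are illustrative, not Socorro's API:

```python
import queue

job_queue = queue.Queue()  # stands in for the jobs table in PostgreSQL

def monitor(new_crash_ids):
    """Monitor: watch for incoming crashes and queue them for processing."""
    for crash_id in new_crash_ids:
        job_queue.put(crash_id)

def processor():
    """Processor: drain queued jobs, returning the ids it 'processed'."""
    processed = []
    while not job_queue.empty():
        processed.append(job_queue.get())
    return processed

monitor(["crash-1", "crash-2"])
done = processor()
```

In the real system the queue lives in the database, so several processor machines can share it safely.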
2.3.10 Configure Socorro
These pages show how to start the services manually; please also see the next section, “Install startup scripts”:
• Start configuration with Common Config
• On the machine(s) to run collector, setup Collector
• On the machine(s) to run collector, setup Crash Mover
• On the machine to run monitor, setup Monitor
• On same machine that runs monitor, setup Deferred Cleanup
• On the machine(s) to run processor, setup Processor
2.3.11 Install startup scripts
RHEL/CentOS only (Ubuntu TODO - see ./puppet/files/etc_supervisor for supervisord example)
As root:
ln -s /data/socorro/application/scripts/init.d/socorro-{monitor,processor,crashmover} /etc/init.d/
chkconfig socorro-monitor on
chkconfig socorro-processor on
chkconfig socorro-crashmover on
service httpd restart
chkconfig httpd on
service memcached restart
chkconfig memcached on
2.3.12 Install Socorro cron jobs
As root:
ln -s /data/socorro/application/scripts/crons/socorrorc /etc/socorro/
crontab /data/socorro/application/scripts/crons/example.crontab
2.3.13 PostgreSQL Config
RHEL/CentOS - Initialize and enable on startup (not needed for Ubuntu)
As root:
service postgresql initdb
service postgresql start
chkconfig postgresql on
As root:
• edit /var/lib/pgsql/data/pg_hba.conf and change IPv4/IPv6 connection from “ident” to “md5”
• edit /var/lib/pgsql/data/postgresql.conf and:
– uncomment # listen_addresses = 'localhost'

– change TimeZone to 'UTC'

• edit other postgresql.conf parameters per www.postgresql.org community guides
2.3.14 Populate PostgreSQL Database
Refer to Populate PostgreSQL for information about loading the schema and populating the database.
This step is required to get basic information about existing product names and versions into the system.
2.3.15 Configure Apache
As root:
cp config/socorro.conf /etc/httpd/conf.d/socorro.conf
edit /etc/httpd/conf.d/socorro.conf
mkdir /var/log/httpd/{crash-stats,crash-reports,socorro-api}.example.com
chown apache /data/socorro/htdocs/application/logs/
Note - use www-data instead of apache for debian/ubuntu
2.3.16 Enable PHP short_open_tag
As root:
edit /etc/php.ini and make the following changes:
short_open_tag = On
date.timezone = 'America/Los_Angeles'
2.3.17 Configure Kohana (PHP/web UI)
Refer to UI Installation (deprecated as of 2.2, new docs TODO)
2.3.18 Hadoop+HBase install
Configure Hadoop 0.20 + HBase 0.89 Refer to https://ccp.cloudera.com/display/CDHDOC/HBase+Installation
Note - you can start with a standalone setup, but read all of the above for info on a real, distributed setup!
RHEL/CentOS only (not needed for Ubuntu) Install startup scripts
As root:
service hadoop-hbase-master start
chkconfig hadoop-hbase-master on
service hadoop-hbase-thrift start
chkconfig hadoop-hbase-thrift on
2.3.19 Load Hbase schema
FIXME: this skips LZO support; remove the “sed” command if you have it installed.
From inside the Socorro checkout, as the socorro user:
cat analysis/hbase_schema | sed 's/LZO/NONE/g' | hbase shell
2.3.20 System Test
Generate a test crash:
1. Install http://code.google.com/p/crashme/ add-on for Firefox
2. Point your Firefox install at http://crash-reports/submit
See: https://developer.mozilla.org/en/Environment_variables_affecting_crash_reporting
If you already have a crash available and wish to submit it, you can use the standalone submitter tool:
From inside the Socorro checkout, as the socorro user:
virtualenv socorro-virtualenv
. socorro-virtualenv/bin/activate
pip install poster
cp scripts/config/submitterconfig.py.dist scripts/config/submitterconfig.py
export PYTHONPATH=.:thirdparty
python scripts/submitter.py -u http://crash-reports/submit -j ~/Downloads/crash.json -d ~/Downloads/crash.dump
You should get a “CrashID” returned. Check the syslog logs (user.* facility); you should see the returned CrashID being collected.
Attempt to pull up the newly inserted crash: http://crash-stats/report/index/YOUR_CRASH_ID_GOES_HERE
The (syslog “user” facility) logs should show this new crash being inserted for priority processing, and it should be available shortly thereafter.
CHAPTER 3
Collector
Collector is an application that runs under Apache using mod-python. Its task is accepting crash reports from remote clients and saving them in a place and format usable by further applications.

Raw crashes are accepted via HTTP POST. The form data from the POST is then arranged into JSON and saved into the local file system. The collector is responsible for assigning an ooid (Our Own ID) to the crash. It also assigns a Throttle value, which determines whether the crash eventually goes into the relational database.

Should saving to the local file system fail, there is a fallback storage mechanism: a second file system can be configured to take the failed saves. This file system would likely be an NFS-mounted file system.

After a crash is saved, an app called Crash Mover transfers the crashes to HBase.
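The throttle decision can be pictured as a deterministic sampling rule. The sketch below is an assumption for illustration (hashing the crash id and keeping a percentage); Socorro's real throttle rules are configurable and richer than this:

```python
import zlib

def throttle_accept(crash_id, sample_percent=10):
    """Illustrative throttle decision: deterministically keep roughly
    sample_percent of crashes for the relational database, based on a
    CRC32 hash of the crash id. Not Socorro's actual rule set."""
    bucket = zlib.crc32(crash_id.encode("utf-8")) % 100
    return bucket < sample_percent
```

Being deterministic means the same crash id always gets the same answer, which keeps retries and reprocessing consistent.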
3.1 Collector Python Configuration
Like all the Socorro applications, the configuration is actually executable Python code. Two configuration files are relevant for the collector:

• Copy .../scripts/config/commonconfig.py.dist to .../config/commonconfig.py. This configuration file contains constants used by many of the Socorro applications.

• Copy .../scripts/config/collectorconfig.py.dist to .../config/collectorconfig.py
3.2 Common Configuration
There are two constants in .../scripts/config/commonconfig.py of interest to the collector: jsonFileSuffix and dumpFileSuffix. Other constants in this file are ignored.
To setup the common configuration, see Common Config.
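For orientation, the two constants look roughly like this in a commonconfig.py. The Option class below is a minimal stand-in, and the suffix values are typical assumptions; check the .dist file shipped with your release for the real definitions:

```python
class Option:
    """Minimal stand-in for the option objects Socorro config files define."""
    doc = None
    default = None

# The two constants the collector reads from commonconfig.py; the
# values shown are typical defaults, not guaranteed ones.
jsonFileSuffix = Option()
jsonFileSuffix.doc = 'the suffix used to identify a JSON file'
jsonFileSuffix.default = '.json'

dumpFileSuffix = Option()
dumpFileSuffix.doc = 'the suffix used to identify a dump file'
dumpFileSuffix.default = '.dump'
```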
3.3 Collector Configuration
collectorconfig.py has several options to adjust how files are stored:
See sample config code on Github
CHAPTER 4
Processor
4.1 Introduction
Socorro Processor is a multithreaded application that applies JSON/dump pairs to the stackwalk_server application, parses the output, and records the results in HBase. The processor, coupled with stackwalk_server, is computationally intensive. Multiple instances of the processor can be run simultaneously on different machines.
See sample config code on Github
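The multithreaded structure can be sketched as a worker pool mapping a stackwalk step over JSON/dump pairs. The stackwalk function here is a stand-in, not the real stackwalk_server invocation:

```python
from concurrent.futures import ThreadPoolExecutor

def stackwalk(pair):
    """Stand-in for running stackwalk_server on one JSON/dump pair and
    parsing its output into a signature."""
    crash_id, dump = pair
    return crash_id, "signature-for-" + dump

# Each pair would really be a JSON metadata file plus a minidump.
pairs = [("c1", "dumpA"), ("c2", "dumpB")]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(stackwalk, pairs))
```

Because each pair is independent, throughput scales by adding threads or whole processor machines, as the text notes.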
CHAPTER 5
Middleware API
5.1 API map
5.1.1 New-style, documented services
• /bugs/
• /crashes/
– /crashes/comments
– /crashes/frequency
– /crashes/paireduuid
– /crashes/signatures
• /extensions/

• /crashtrends/

• /job/

• /priorityjobs/

• /products/

– /products/builds/

– /products/versions/

• /report/

– /report/list/

• /signatureurls/

• /search/

– /search/crashes/

– /search/signatures/

• /util/

– /util/versions_info/
5.1.2 Old-style, undocumented services
See source code in .../socorro/services/ for more details.
• /adu/byday
• /adu/byday/details
• /bugs/by/signatures
• /crash
• /current/versions
• /emailcampaigns/campaign
• /emailcampaigns/campaigns/page
• /emailcampaigns/create
• /emailcampaigns/subscription
• /emailcampaigns/volume
• /reports/hang
• /schedule/priority/job
• /topcrash/sig/trend/history
• /topcrash/sig/trend/rank
5.2 Bugs
Return a list of associations between signatures and bug ids.
5.2.1 API specifications
HTTP method: POST
URL schema: /bugs/
Full URL: /bugs/
Example: http://socorro-api/bpapi/bugs/ with POST data signatures=mysignature+anothersig+jsCrashSig
5.2.2 Mandatory parameters
signatures (List of strings; default None)
    Signatures of bugs to get.
5.2.3 Optional parameters
None.
5.2.4 Return value
In normal cases, return something like this:
{
    "hits": [
        {
            "id": "789012",
            "signature": "mysignature"
        },
        {
            "id": "405060",
            "signature": "anothersig"
        }
    ],
    "total": 2
}
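A client for this service can be sketched as below. The host name is the placeholder from the examples above, and the HTTP round-trip is stubbed with a canned response in the documented shape, so only the request/response handling is shown:

```python
import json
from urllib.parse import urlencode

def build_bugs_request(signatures, base_url="http://socorro-api/bpapi"):
    """Prepare the POST for the /bugs/ service: target URL plus form body."""
    return base_url + "/bugs/", urlencode({"signatures": "+".join(signatures)})

def parse_bugs_response(body):
    """Map each signature in the documented response shape to its bug ids."""
    mapping = {}
    for hit in json.loads(body)["hits"]:
        mapping.setdefault(hit["signature"], []).append(hit["id"])
    return mapping

url, form_body = build_bugs_request(["mysignature", "anothersig"])
# Canned response standing in for a real server:
canned = json.dumps({"hits": [{"id": "789012", "signature": "mysignature"},
                              {"id": "405060", "signature": "anothersig"}],
                     "total": 2})
bugs = parse_bugs_response(canned)
```

Note that a signature may be associated with several bugs, so the parser collects bug ids into lists.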
5.3 Crashes Comments
Return a list of comments on crash reports, filtered by signatures and other fields.
5.3.1 API specifications
HTTP method: GET
URL schema: /crashes/comments/(parameters)
Full URL: /crashes/comments/signature/(signature)/products/(products)/from/(from_date)/to/(to_date)/versions/(versions)/os/(os_name)/branches/(branches)/reasons/(crash_reason)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/
Example: http://socorro-api/bpapi/crashes/comments/signature/SocketSend/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/
5.3.2 Mandatory parameters
signature (String; default None)
    Signature of crash reports to get.
5.3.3 Optional parameters
products (String or list of strings; default 'Firefox')
    The product we are interested in. (e.g. Firefox, Fennec, Thunderbird...)
from (Date; default Now - 7 days)
    Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
to (Date; default Now)
    Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
versions (String or list of strings; default None)
    Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
os (String or list of strings; default None)
    Restrict to an Operating System. (e.g. Windows, Mac, Linux...) Several versions can be specified, separated by a + symbol.
branches (String or list of strings; default None)
    Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
reasons (String or list of strings; default None)
    Restricts search to crashes caused by this reason.
build_ids (Integer or list of integers; default None)
    Restricts search to crashes that happened on a product with this build ID.
build_from (Integer or list of integers; default None)
    Restricts search to crashes with a build id greater than this.
build_to (Integer or list of integers; default None)
    Restricts search to crashes with a build id lower than this.
report_process (String; default 'any')
    Can be 'any', 'browser' or 'plugin'.
report_type (String; default 'any')
    Can be 'any', 'crash' or 'hang'.
plugin_in (String or list of strings; default 'name')
    Search for a plugin in this field. 'report_process' has to be set to 'plugin'.
plugin_search_mode (String; default 'default')
    How to search for this plugin. report_process has to be set to plugin. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
plugin_terms (String or list of strings; default None)
    Terms to search for. Several terms can be specified, separated by a + symbol. report_process has to be set to plugin.
5.3.4 Return value
In normal cases, return something like this:
{
    "hits": [
        {
            "date_processed": "2011-03-16 06:54:56.385843",
            "uuid": "06a0c9b5-0381-42ce-855a-ccaaa2120116",
            "user_comments": "My firefox is crashing in an awesome way",
            "email": "[email protected]"
        },
        {
            "date_processed": "2011-03-16 06:54:56.385843",
            "uuid": "06a0c9b5-0381-42ce-855a-ccaaa2120116",
            "user_comments": "I <3 Firefox crashes!",
            "email": "[email protected]"
        }
    ],
    "total": 2
}
If no signature is passed as a parameter, return null.
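The URL schema above interleaves parameter names and values as path segments. A small helper to build such URLs might look like this; the function name and the parameter ordering are illustrative, not part of Socorro:

```python
def build_middleware_url(base, service, **params):
    """Build the /name1/value1/name2/value2/ style URLs this middleware
    documents; segment order follows the keyword order given."""
    segments = [service]
    for name, value in params.items():
        segments += [name, str(value)]
    return base + "/" + "/".join(segments) + "/"

url = build_middleware_url("http://socorro-api/bpapi", "crashes/comments",
                           signature="SocketSend", products="Firefox")
```

Real parameter values would need URL-escaping if they can contain slashes or spaces; the documented examples use simple tokens.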
5.4 Crashes Frequency
Return the number and frequency of crashes on each OS.
5.4.1 API specifications
HTTP method: GET
URL schema: /crashes/frequency/(parameters)
Full URL: /crashes/frequency/signature/(signature)/products/(products)/from/(from_date)/to/(to_date)/versions/(versions)/os/(os_name)/branches/(branches)/reasons/(crash_reason)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/
Example: http://socorro-api/bpapi/crashes/frequency/signature/SocketSend/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/
5.4.2 Mandatory parameters
signature (String; default None)
    Signature of crash reports to get.
5.4.3 Optional parameters
- products (String or list of strings; default: 'Firefox'): The product we are interested in (e.g. Firefox, Fennec, Thunderbird...).
- from (Date; default: now - 7 days): Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- to (Date; default: now): Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- versions (String or list of strings; default: None): Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
- os (String or list of strings; default: None): Restrict to an operating system (e.g. Windows, Mac, Linux...). Several can be specified, separated by a + symbol.
- branches (String or list of strings; default: None): Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
- reasons (String or list of strings; default: None): Restricts search to crashes caused by this reason.
- build_ids (Integer or list of integers; default: None): Restricts search to crashes that happened on a product with this build ID.
- build_from (Integer or list of integers; default: None): Restricts search to crashes with a build ID greater than this.
- build_to (Integer or list of integers; default: None): Restricts search to crashes with a build ID lower than this.
- report_process (String; default: 'any'): Can be 'any', 'browser' or 'plugin'.
- report_type (String; default: 'any'): Can be 'any', 'crash' or 'hang'.
- plugin_in (String or list of strings; default: 'name'): Search for a plugin in this field. report_process has to be set to 'plugin'.
- plugin_search_mode (String; default: 'default'): How to search for this plugin. report_process has to be set to 'plugin'. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
- plugin_terms (String or list of strings; default: None): Terms to search for. Several terms can be specified, separated by a + symbol. report_process has to be set to 'plugin'.
5.4.4 Return value
In normal cases, return something like this:
{
    "hits": [
        {
            "count": 167,
            "build_date": "20120129064235",
            "count_mac": 0,
            "frequency_windows": 1,
            "count_windows": 167,
            "frequency": 1,
            "count_linux": 0,
            "total": 167,
            "frequency_linux": 0,
            "frequency_mac": 0
        },
        {
            "count": 1,
            "build_date": "20120129063944",
            "count_mac": 1,
            "frequency_windows": 0,
            "count_windows": 0,
            "frequency": 1,
            "count_linux": 0,
            "total": 1,
            "frequency_linux": 0,
            "frequency_mac": 1
        }
    ],
    "total": 2
}
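A response shaped like the one above is easy to post-process client-side. A small sketch (not part of the API itself) that tallies the per-OS crash counts across all returned builds:

```python
def os_totals(frequency_response):
    # Sum the count_windows / count_mac / count_linux fields over every hit
    # in a /crashes/frequency/ response.
    totals = {"windows": 0, "mac": 0, "linux": 0}
    for hit in frequency_response["hits"]:
        for os_name in totals:
            totals[os_name] += hit.get("count_" + os_name, 0)
    return totals
```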
5.5 Crashes Paireduuid
Return the paired UUID for a given UUID and an optional hang ID.
5.5.1 API specifications
HTTP method: GET
URL schema: /crashes/paireduuid/(optional_parameters)
Full URL: /crashes/paireduuid/uuid/(uuid)/hangid/(hangid)/
Example: http://socorro-api/bpapi/crashes/paireduuid/uuid/e8820616-1462-49b6-9784-e99a32120201/
5.5.2 Mandatory parameters
- uuid (String): Unique identifier of the crash report.
5.5.3 Optional parameters
- hangid (String; default: None): Hang ID of the crash report.
5.5.4 Return value
Return an object like the following:
{
    "hits": [
        {
            "uuid": "e8820616-1462-49b6-9784-e99a32120201"
        }
    ],
    "total": 1
}
Note that if a hangid is passed to the service, it will return at most one result. Omit the hangid to get all paired UUIDs.
5.6 Crashes Signatures
Return top crashers by signatures.
5.6.1 API specifications
HTTP method: GET
URL schema: /crashes/signatures/(optional_parameters)
Full URL: /crashes/signatures/product/(product)/version/(version)/to_from/(to_date)/duration/(number_of_days)/crash_type/(crash_type)/limit/(number_of_results)/os/(operating_system)/
Example: http://socorro-api/bpapi/crashes/signatures/product/Firefox/version/9.0a1/
5.6.2 Mandatory parameters
- product (String): Product for which to get top crashes by signatures.
- version (String): Version of the product for which to get top crashes.
5.6.3 Optional parameters
- crash_type (String; default: all): Type of crashes to get; can be "browser", "plugin", "content" or "all".
- end_date (Date; default: now): Date before which to get top crashes.
- duration (Int; default: one week): Number of hours during which to get crashes.
- os (String; default: None): Limit crashes to only one OS.
- limit (Int; default: 100): Number of results to retrieve.
5.6.4 Return value
Return an object like the following:
{
    "totalPercentage": 0.9999999999999994,
    "end_date": "2011-12-08 00:00:00",
    "start_date": "2011-12-07 17:00:00",
    "crashes": [
        {
            "count": 3,
            "mac_count": 3,
            "changeInRank": 11,
            "currentRank": 0,
            "previousRank": 11,
            "percentOfTotal": 0.142857142857143,
            "win_count": 0,
            "changeInPercentOfTotal": 0.117857142857143,
            "linux_count": 0,
            "hang_count": 0,
            "signature": "objc_msgSend | __CFXNotificationPost",
            "previousPercentOfTotal": 0.025,
            "plugin_count": 0
        }
    ],
    "totalNumberOfCrashes": 1
}
5.7 Extensions
Return a list of extensions associated with a crash’s UUID.
5.7.1 API specifications
HTTP method: GET
URL schema: /extensions/(optional_parameters)
Full URL: /extensions/uuid/(uuid)/date/(crash_date)/
Example: http://socorro-api/bpapi/extensions/uuid/xxxx-xxxx-xxxx/date/2012-02-29T01:23:45+00:00/
5.7.2 Mandatory parameters
- uuid (String; default: None): Unique identifier of the specific crash to get extensions from.
- date (Datetime; default: None): Exact datetime of the crash.
5.7.3 Optional parameters
None
5.7.4 Return value
Return a list of extensions:
{
    "total": 1,
    "hits": [
        {
            "report_id": 1234,
            "date_processed": "2012-02-29T01:23:45+00:00",
            "extension_key": 5678,
            "extension_id": "[email protected]",
            "extension_version": "1.2"
        }
    ]
}
5.8 Crash Trends
Return a list of nightly or aurora crashes that took place between two dates.
5.8.1 API specifications
HTTP method: GET
URL schema: /crashtrends/(optional_parameters)
Full URL: /crashtrends/start_date/(start_date)/end_date/(end_date)/product/(product)/version/(version)
Example: http://socorro-api/bpapi/crashtrends/start_date/2012-03-01/end_date/2012-03-15/product/Firefox/version/13.0a1
5.8.2 Mandatory parameters
- start_date (Datetime; default: None): The earliest date of crashes we wish to evaluate.
- end_date (Datetime; default: None): The latest date of crashes we wish to evaluate.
- product (String; default: None): The product.
- version (String; default: None): The version.
5.8.3 Optional parameters
None
5.8.4 Return value
Return a total of crashes, along with their build date, by build ID:
[
    {
        "build_date": "2012-02-10",
        "version_string": "12.0a2",
        "product_version_id": 856,
        "days_out": 6,
        "report_count": 515,
        "report_date": "2012-02-16",
        "product_name": "Firefox"
    }
]
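The days_out field in each record is the gap between the build date and the report date, which a client can recompute as a sanity check. A minimal sketch:

```python
from datetime import date

def days_out(record):
    # Number of days between a crash-trends record's build_date and
    # report_date, both ISO "YYYY-MM-DD" strings as in the example above.
    build = date.fromisoformat(record["build_date"])
    report = date.fromisoformat(record["report_date"])
    return (report - build).days
```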
5.9 Job
Handle the job queue for crash report processing.
5.9.1 API specifications
HTTP method: GET
URL schema: /job/(parameters)
Full URL: /job/uuid/(uuid)/
Example: http://socorro-api/bpapi/job/uuid/e8820616-1462-49b6-9784-e99a32120201/
5.9.2 Mandatory parameters
- uuid (String; default: None): Unique identifier of the crash report to find.
5.9.3 Optional parameters
None
5.9.4 Return value
With a GET HTTP method, the service will return data in the following form:
{
    "hits": [
        {
            "id": 1,
            "pathname": "",
            "uuid": "e8820616-1462-49b6-9784-e99a32120201",
            "owner": 3,
            "priority": 0,
            "queueddatetime": "2012-02-29T01:23:45+00:00",
            "starteddatetime": "2012-02-29T01:23:45+00:00",
            "completeddatetime": "2012-02-29T01:23:45+00:00",
            "success": true,
            "message": "Hello"
        }
    ],
    "total": 1
}
5.10 Priorityjobs
Handle the priority job queue for crash report processing.
5.10.1 API specifications
HTTP method: GET, POST
URL schema: /priorityjobs/(parameters)
Full GET URL: /priorityjobs/uuid/(uuid)/
GET example: http://socorro-api/bpapi/priorityjobs/uuid/e8820616-1462-49b6-9784-e99a32120201/
POST example: http://socorro-api/bpapi/priorityjobs/, data: uuid=e8820616-1462-49b6-9784-e99a32120201
5.10.2 Mandatory parameters
- uuid (String; default: None): Unique identifier of the crash report to mark.
5.10.3 Optional parameters
None
5.10.4 Return value
With a GET HTTP method, the service will return data in the following form:
{
    "hits": [
        {
            "uuid": "e8820616-1462-49b6-9784-e99a32120201"
        }
    ],
    "total": 1
}
With a POST HTTP method, it will return true if the uuid has been successfully added to the priorityjobs queue, and false if the uuid is already in the queue or if there has been a problem.
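The POST form sends the uuid as ordinary form data. A hedged sketch that builds (but does not send) such a request with the standard library; the host name is hypothetical:

```python
import urllib.request
from urllib.parse import urlencode

def priorityjobs_request(uuid, base="http://socorro-api/bpapi"):
    # Build the POST request that queues a crash report for priority
    # processing. Supplying data= makes urllib issue a POST.
    data = urlencode({"uuid": uuid}).encode("ascii")
    return urllib.request.Request(base + "/priorityjobs/", data=data)
```

Sending it would then be `urllib.request.urlopen(priorityjobs_request(some_uuid))`, whose body is "true" on success and "false" otherwise, per the paragraph above.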
5.11 Products
Return information about product(s) and version(s) depending on the parameters the service is called with.
5.11.1 API specifications
HTTP method: GET
URL schema: /products/(optional_parameters)
Full URL: /products/versions/(versions)
Example: http://socorro-api/bpapi/products/versions/Firefox:9.0a1/
5.11.2 Optional parameters
- versions (String or list of strings; default: None): Several product:version strings can be specified, separated by a + symbol.
5.11.3 Return value
If the service is called with the optional versions parameter, the service returns an object with an array of results labeled as hits and a total:
{
    "hits": [
        {
            "is_featured": boolean,
            "throttle": float,
            "end_date": "string",
            "start_date": "integer",
            "build_type": "string",
            "product": "string",
            "version": "string"
        }
        ...
    ],
    "total": 1
}
If the service is called with no parameters, it returns an object containing a list of products as well as a total, indicating the number of products returned:
{
    "hits": [
        {
            "sort": 1,
            "release_name": "firefox",
            "rapid_release_version": "5.0",
            "product_name": "Firefox"
        },
        ...
    ],
    "total": 6
}
5.12 Products Builds
Query and update information about builds for products.
5.12.1 API specifications
HTTP method: GET, POST
URL schema: /products/builds/(optional_parameters)
Full URL: /products/builds/product/(product)/version/(version)/date_from/(date_from)/
GET example: http://socorro-api/bpapi/products/builds/product/Firefox/version/9.0a1/
POST example: http://socorro-api/bpapi/products/builds/product/Firefox/, data: version=10.0&platform=macosx&build_id=20120416012345&build_type=Beta&beta_number=2&repository=mozilla-central
5.12.2 Mandatory GET parameters
- product (String; default: None): Product for which to get nightly builds.
5.12.3 Optional GET parameters
- version (String; default: None): Version of the product for which to get nightly builds.
- from_date (Date; default: now - 7 days): Date from which to get nightly builds.
5.12.4 GET return value
Return an array of objects:
[
    {
        "product": "string",
        "version": "string",
        "platform": "string",
        "buildid": "integer",
        "build_type": "string",
        "beta_number": "string",
        "repository": "string",
        "date": "string"
    },
    ...
]
5.12.5 Mandatory POST parameters
- product (String; default: None): Product for which to add a build.
- version (String; default: None): Version for the new build, e.g. "10.0".
- platform (String; default: None): Platform for the new build, e.g. "macosx".
- build_id (String; default: None): Build ID for the new build (YYYYMMDD######).
- build_type (String; default: None): Type of build, e.g. "Release", "Beta", "Aurora", etc.
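The documented build_id shape is YYYYMMDD followed by six more digits. A loose client-side check for that shape can be sketched as follows; it is illustrative only and does not verify that the date portion is a real calendar date:

```python
import re

# Fourteen digits total: eight for YYYYMMDD plus six trailing digits.
BUILD_ID_RE = re.compile(r"\A\d{14}\Z")

def is_valid_build_id(build_id):
    # True when build_id matches the documented YYYYMMDD###### shape.
    return BUILD_ID_RE.match(build_id) is not None
```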
5.12.6 Optional POST parameters
- beta_number (String; default: None): Beta number if build_type is "Beta". Mandatory if build_type is "Beta", ignored otherwise.
- repository (String; default: ""): The repository from which this release came.
5.12.7 POST return value
On success, returns a 303 See Other redirect to the newly-added build’s API page at:
/products/builds/product/(product)/version/(version)/
5.13 Signature URLs
Returns a list of URLs for a specific signature, product(s) and version(s) within a given date range, along with the total number of times each URL has been reported for those parameters.
5.13.1 API specifications
HTTP method: GET
URL schema: /signatureurls/(parameters)
Full URL: /signatureurls/signature/(signature)/start_date/(start_date)/end_date/(end_date)/products/(products)/versions/(versions)
Example: http://socorro-api/bpapi/signatureurls/signature/samplesignature/start_date/2012-03-01T00:00:00+00:00/end_date/2012-03-31T00:00:00+00:00/products/Firefox+Fennec/versions/Firefox:4.0.1+Fennec:13.0/
5.13.2 Mandatory parameters
- signature (String; default: None): The signature for which URLs should be found.
- start_date (Date; default: None): Date from which to collect URLs.
- end_date (Date; default: None): Date up to, but not including, for which URLs should be collected.
- products (String; default: None): Product(s) for which to find URLs.
- versions (String; default: None): Version(s) of the above products to find URLs for.
5.13.3 Return value
Returns an object with a list of URLs and the crash count for each, as well as a counter, 'total', for the total number of results in the result set.
{
    "hits": [
        {"url": "about:blank", "crash_count": 1936},
        ...
    ],
    "total": 1
}
5.14 Search
Search for crashes according to a large number of parameters and return a list of crashes or a list of distinct signatures.
5.14.1 API specifications
HTTP method: GET
URL schema: /search/(data_type)/(optional_parameters)
Full URL: /search/(data_type)/for/(terms)/products/(products)/from/(from_date)/to/(to_date)/in/(fields)/versions/(versions)/os/(os_name)/branches/(branches)/search_mode/(search_mode)/reasons/(crash_reasons)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/result_number/(number)/result_offset/(offset)/
Example: http://socorro-api/bpapi/search/crashes/for/libflash.so/in/signature/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/
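Since every parameter is a URL path segment and list values are joined with a + symbol, a client usually builds these URLs programmatically. An illustrative sketch (the host name is hypothetical):

```python
from urllib.parse import quote

def search_url(data_type, terms, base="http://socorro-api/bpapi", **params):
    # Build a /search/ URL. List values are '+'-joined after URL-encoding
    # each element, matching the separator convention used by the service.
    def segment(value):
        if isinstance(value, (list, tuple)):
            return "+".join(quote(str(v), safe="") for v in value)
        return quote(str(value), safe=":")
    url = "%s/search/%s/for/%s/" % (base, data_type, segment(terms))
    for name, value in params.items():
        url += "%s/%s/" % (name, segment(value))
    return url
```

For example, `search_url("crashes", ["libflash.so"], products="Firefox", versions="Firefox:4.0.1")` yields a URL of the same shape as the example above.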
5.14.2 Mandatory parameters
- data_type (String; default: 'signatures'): Type of data we are looking for. Can be 'crashes' or 'signatures'.
5.14.3 Optional parameters
- for (String or list of strings; default: None): Terms we are searching for. Each term must be URL encoded. Several terms can be specified, separated by a + symbol.
- products (String or list of strings; default: 'Firefox'): The product we are interested in (e.g. Firefox, Fennec, Thunderbird...).
- from (Date; default: now - 7 days): Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- to (Date; default: now): Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- in (String or list of strings; default: all): Fields we are searching in. Several fields can be specified, separated by a + symbol. This is NOT implemented for PostgreSQL.
- versions (String or list of strings; default: None): Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
- os (String or list of strings; default: None): Restrict to an operating system (e.g. Windows, Mac, Linux...). Several can be specified, separated by a + symbol.
- branches (String or list of strings; default: None): Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
- search_mode (String; default: 'default'): Set how to search. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
- reasons (String or list of strings; default: None): Restricts search to crashes caused by this reason.
- build_ids (Integer or list of integers; default: None): Restricts search to crashes that happened on a product with this build ID.
- build_from (Integer or list of integers; default: None): Restricts search to crashes with a build ID greater than this.
- build_to (Integer or list of integers; default: None): Restricts search to crashes with a build ID lower than this.
- report_process (String; default: 'any'): Can be 'any', 'browser' or 'plugin'.
- report_type (String; default: 'any'): Can be 'any', 'crash' or 'hang'.
- plugin_in (String or list of strings; default: 'name'): Search for a plugin in this field. report_process has to be set to 'plugin'.
- plugin_search_mode (String; default: 'default'): How to search for this plugin. report_process has to be set to 'plugin'. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
- plugin_terms (String or list of strings; default: None): Terms to search for. Several terms can be specified, separated by a + symbol. report_process has to be set to 'plugin'.
- result_number (Integer; default: 100): Number of results to return.
- result_offset (Integer; default: 0): Offset of the first result to return.
5.14.4 Return value
If data_type is signatures, the return value looks like:

{
    "hits": [
        {
            "count": 1,
            "signature": "arena_dalloc_small | arena_dalloc | free | CloseDir"
        },
        {
            "count": 1,
            "signature": "XPCWrappedNativeScope::TraceJS(JSTracer*, XPCJSRuntime*)",
            "is_solaris": 0,
            "is_linux": 0,
            "numplugin": 0,
            "is_windows": 0,
            "is_mac": 0,
            "numhang": 0
        }
    ],
    "total": 2
}

If data_type is crashes, the return value looks like:

{
    "hits": [
        {
            "client_crash_date": "2011-03-16 13:55:10.0",
            "dump": "...",
            "signature": "arena_dalloc_small | arena_dalloc | free | CloseDir",
            "process_type": null,
            "id": 231224257,
            "hangid": null,
            "version": "4.0b13pre",
            "build": "20110314162350",
            "product": "Firefox",
            "os_name": "Mac OS X",
            "date_processed": "2011-03-16 06:54:56.385843",
            "reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS",
            "address": "0x1d3aff03",
            "...": "..."
        }
    ],
    "total": 1
}
If an error occurred, the API will return something like this:

Well, for the moment it doesn't return anything but an Internal Error HTTP header... We will improve that soon! :)
5.15 List Report
Return a list of crash reports with a specified signature and filtered by a wide range of options.
5.15.1 API specifications
HTTP method: GET
URL schema: /report/list/(parameters)
Full URL: /report/list/signature/(signature)/products/(products)/from/(from_date)/to/(to_date)/versions/(versions)/os/(os_name)/branches/(branches)/reasons/(crash_reason)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/
Example: http://socorro-api/bpapi/report/list/signature/SocketSend/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/
5.15.2 Mandatory parameters
- signature (String; default: None): Signature of crash reports to get.
5.15.3 Optional parameters
- products (String or list of strings; default: 'Firefox'): The product we are interested in (e.g. Firefox, Fennec, Thunderbird...).
- from (Date; default: now - 7 days): Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- to (Date; default: now): Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- versions (String or list of strings; default: None): Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
- os (String or list of strings; default: None): Restrict to an operating system (e.g. Windows, Mac, Linux...). Several can be specified, separated by a + symbol.
- branches (String or list of strings; default: None): Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
- reasons (String or list of strings; default: None): Restricts search to crashes caused by this reason.
- build_ids (Integer or list of integers; default: None): Restricts search to crashes that happened on a product with this build ID.
- build_from (Integer or list of integers; default: None): Restricts search to crashes with a build ID greater than this.
- build_to (Integer or list of integers; default: None): Restricts search to crashes with a build ID lower than this.
- report_process (String; default: 'any'): Can be 'any', 'browser' or 'plugin'.
- report_type (String; default: 'any'): Can be 'any', 'crash' or 'hang'.
- plugin_in (String or list of strings; default: 'name'): Search for a plugin in this field. report_process has to be set to 'plugin'.
- plugin_search_mode (String; default: 'default'): How to search for this plugin. report_process has to be set to 'plugin'. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
- plugin_terms (String or list of strings; default: None): Terms to search for. Several terms can be specified, separated by a + symbol. report_process has to be set to 'plugin'.
- result_number (Integer; default: 100): Number of results to return.
- result_offset (Integer; default: 0): Offset of the first result to return.
5.15.4 Return value
In normal cases, return something like this:
{
    "hits": [
        {
            "client_crash_date": "2011-03-16 13:55:10.0",
            "dump": "...",
            "signature": "arena_dalloc_small | arena_dalloc | free | CloseDir",
            "process_type": null,
            "id": 231224257,
            "hangid": null,
            "version": "4.0b13pre",
            "build": "20110314162350",
            "product": "Firefox",
            "os_name": "Mac OS X",
            "date_processed": "2011-03-16 06:54:56.385843",
            "reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS",
            "address": "0x1d3aff03",
            "...": "..."
        },
        {
            "client_crash_date": "2011-03-16 11:35:37.0",
            "...": "..."
        }
    ],
    "total": 2
}
If signature is empty or nonexistent, raise a BadRequest error.
If another error occurred, the API will return a 500 Internal Error HTTP header.
5.16 Versions Info
Return information about one or several product:version pairs.
5.16.1 API specifications
HTTP method: GET
URL schema: /util/versions_info/(optional_parameters)
Full URL: /util/versions_info/versions/(versions)/
Example: http://socorro-api/bpapi/util/versions_info/versions/Firefox:9.0a1+Fennec:7.0/
5.16.2 Mandatory parameters
None.
5.16.3 Optional parameters
- versions (String or list of strings; default: None): product:version pairs for which information is requested.
5.16.4 Return value
If the versions parameter is invalid, the return value is None. Otherwise it looks like this:
{
    "product_name:version_string": {
        "product_version_id": integer,
        "version_string": "string",
        "product_name": "string",
        "major_version": "string" or None,
        "release_channel": "string" or None,
        "build_id": [list, of, decimals] or None
    }
}
5.17 Forcing an implementation
For debugging reasons, you can add a parameter to force the API to use a specific implementation module. That module must be inside socorro.external and contain the needed service implementation.
- force_api_impl (String; default: None): Force the service to use a specific module.
For example, if you want to force search to be executed with ElasticSearch, you can add force_api_impl/elasticsearch/ to the middleware call. If socorro.external.elasticsearch exists and contains a search module, it will get loaded and used.
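Conceptually, the override just changes which dotted module path gets imported. This is an illustrative sketch only; the real middleware's loader may differ in detail:

```python
import importlib

def load_service(service_name, force_api_impl=None, base="socorro.external"):
    # Resolve the module implementing a service. With no override, look
    # under the base package; with force_api_impl, look under
    # base.force_api_impl instead, as the docs describe.
    package = base if force_api_impl is None else "%s.%s" % (base, force_api_impl)
    return importlib.import_module("%s.%s" % (package, service_name))
```

For instance, `load_service("search", force_api_impl="elasticsearch")` would attempt to import socorro.external.elasticsearch.search.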
CHAPTER 6
Socorro UI
The Socorro UI is a KohanaPHP application that serves as the frontend website for the crash reporter.
6.1 Coding Standards
Maintaining coding standards will encourage current developers and future developers to implement clean and consistent code throughout the codebase.
The PEAR Coding Standards (http://pear.php.net/manual/en/standards.php) will serve as the basis for the Socorro UI coding standards.
• Always include header documentation for each class and each method.
– When updating a class or method that does not have header documentation, add header documentation before committing.

– Header documentation should be added for all methods within each controller, model, library and helper class.

– @param documentation is required for all parameters.

– Header documentation should be less than 80 characters in width.

• Add inline documentation for complex logic within a method.

• Use 4 character tab indentations for both PHP and JavaScript.

• Method names must inherently describe the functionality within that method.

– Method names must be written in a camel-case format, e.g. getThisThing.

– Method names should follow the verb-noun format, such as getThing, editThing, etc.

• Use carriage returns in if statements containing more than 2 statements and in arrays containing more than 3 array members for readability.

• All important files, such as controllers, models and libraries, must have the Mozilla Public License at the top of the file.
6.2 Adding new reports
Here is an example of a new report which uses a web service to fetch data (JSON via HTTP) and displays the result as an HTML table.
Kohana uses the Model-View-Controller (MVC) pattern: http://en.wikipedia.org/wiki/Model-view-controller
Create a model, view(s) and controller for the new report (substituting "newreport" for something more appropriate):
6.2.1 Configuration (optional)
webapp-php/application/config/new_report.php
<?php defined('SYSPATH') OR die('No direct access allowed.');

// The number of rows to display.
$config['numberofrows'] = 20;

// The number of results to display on the by_version page.
$config['byversion_limit'] = 300;
?>
6.2.2 Model
webapp-php/application/models/newreport.php
See Add a service to the Middleware for details about writing a middleware service for this to use.
<?php
class NewReport_Model extends Model {

    public function getNewReportViaWebService() {
        // this should be pulled from the middleware service
    }
}
?>
6.2.3 View
webapp-php/application/views/newreport/byversion.php
<?php slot::start('head') ?>
<title>New Report for <?php out::H($product) ?> <?php out::H($version) ?></title>
<?php echo html::script(array(
    'js/path/to/scripts/you/need.js'
))?>
<?php echo html::stylesheet(array(
    'css/path/to/css/you/need.css'
), 'screen')?>
<?php slot::end() ?>
<!-- Your custom front end HTML goes here -->
6.2.4 Controller
webapp-php/application/controllers/newreport.php
<?php defined('SYSPATH') or die('No direct script access.');
require_once(Kohana::find_file('libraries', 'somelib', TRUE, 'php'));
class NewReport_Controller extends Controller {
    public function __construct() {
        parent::__construct();
        $this->newreport_model = new NewReport_Model();
    }

    // Public functions map to routes on the controller
    // http://<base-url>/NewReport/index/[product, version, ?'foo'='bar', etc]
    public function index() {
        $resp = $this->newreport_model->getNewReportViaWebService();
        if ($resp) {
            $this->setViewData(array(
                'resp' => $resp,
                'nav_selection' => 'new_report',
                'foo' => $resp->foo,
            ));
        } else {
            header("Data access error", TRUE, 500);
            $this->setViewData(array(
                'resp' => $resp,
                'nav_selection' => 'new_report',
            ));
        }
    }

}
?>
CHAPTER 7
UI Installation
7.1 Installation
Follow these steps to get the Socorro UI up and running.
7.1.1 Apache
Set up Apache with a vhost as you see fit. You will either need AllowOverride to enable .htaccess files or you may paste the .htaccess rules into your vhost.
7.1.2 KohanaPHP Installation
1. Copy the .htaccess file and edit the host path if your webapp is not at the domain root:

   cp htaccess-dist .htaccess
   vim .htaccess
2. Copy application/config/config.php-dist and change the hosting path and domain:

   cp application/config/config.php-dist application/config/config.php
   vim application/config/config.php
For a production install, you may want to set $config['display_errors'] to FALSE.
3. Copy application/config/database.php and edit its database settings:

   cp application/config/database.php-dist application/config/database.php
   vim application/config/database.php
4. Copy application/config/cache.php and update the cache setting to be file-based or memcache-based:

   cp application/config/cache.php-dist application/config/cache.php
   vim application/config/cache.php
5. If you selected memcache-based caching, copy application/config/cache_memcache.php and update the settings accordingly:

   cp application/config/cache_memcache.php-dist application/config/cache_memcache.php
   vim application/config/cache_memcache.php
6. Copy all other config -dist files to their config location:
   cp application/config/application.php-dist application/config/application.php
   cp application/config/webserviceclient.php-dist application/config/webserviceclient.php
   cp application/config/daily.php-dist application/config/daily.php
   cp application/config/products.php-dist application/config/products.php
7. Copy application/config/auth.php and edit it to set up your preferred authentication method, or to disable authentication. Edit $config['driver'] to change your authentication method. Edit $config['proto'] to remove the https requirement if necessary:

   cp application/config/auth.php-dist application/config/auth.php
   vim application/config/auth.php
8. If you are using LDAP, copy application/config/ldap.php and edit its settings:

   cp application/config/ldap.php-dist application/config/ldap.php
   vim application/config/ldap.php
9. Ensure that the application logs and cache directories are writable:

   chmod a+rw application/logs application/cache
7.1.3 Dump Files
Socorro UI needs to access the processed dump files via HTTP. You will need to set up Apache or some other system to ensure that dump files may be accessed at http://example.com/dumps/<UUID>.jsonz. This can be accomplished via mod_rewrite rules, just like in the next section, "Raw Dump Files".
Example config: processeddumps.mod_rewrite.txt
Next, update the $config['crash_dump_local_url'] value in application/config/application.php to point to the proper directory.
7.1.4 Raw Dump Files
When a user is logged in to Socorro UI as an admin, they may view raw crash dump files. These raw crashes can be served up by Apache by adding the following rewrite rules. The values should match the values in the middleware code at scripts/config/commonconfig.py settings. Links to raw dumps are available in the http://example.com/report/index/{uuid} crash report pages.
Example config: webapp-php/docs/rawdumps.mod_rewrite.txt
Next, update the $config['raw_dump_url'] value in application/config/application.php to point to the proper directory.
7.1.5 Web Services
Many parts of Socorro UI rely on web services provided by the Python-based middleware layer.
7.1.6 Middleware
Copy the scripts/config/webapiconfig.py file, edit it accordingly and execute the script to listen on the indicated port:
cp scripts/config/webapiconfig.py-dist scripts/config/webapiconfig.py
vim scripts/config/webapiconfig.py
python scripts/webservices.py 8083
7.1.7 Socorro UI
Copy application/config/webserviceclient.php, edit the file and change $config['socorro_hostname'] to contain the proper hostname and port number. If necessary, update $config['basic_auth']:

cp application/config/webserviceclient.php-dist application/config/webserviceclient.php
vim application/config/webserviceclient.php
7.1.8 Testing Your Setup
There are 2 ways in which you can test your Socorro UI setup.
7.1.9 Search
Visit the website containing the Socorro UI, and click Advanced Search. Perform a search for the product you've added to the site, which you know has crash reports associated with it in the reports table in your database.
7.1.10 Report
Within the search results set you received, click a signature in the results set. Next click the timestamp for a particular signature, which will take you to a page that displays an individual crash report.
7.2 Troubleshooting
7.2.1 println the sql
To see what SQL queries are being executed: Edit ‘webapp-php/system/libraries/Database.php’ line 443 Ko-hana::log(‘debug’, $sql); Do a svn ignore on this file, if you plan on checking in code.
This will show up in the debug log 'application/logs/date.log.php'.
Examine your database and see why you don’t get the expected results.
7.2.2 404?
Is your ‘.htaccess’ properly setup?
7.2.3 /report/pending never goes to /report/index?
If you see a pending screen and didn't expect one, this means that the record in reports and dumps couldn't be joined, so it is waiting for the processor on the backend to populate one or both tables. Investigate with the uuid and look at the reports and dumps tables.
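A minimal way to investigate, assuming the default schema in which both tables are keyed by uuid (verify the table and column names against your own database before running these):

```sql
-- Does the crash exist in the reports table?
SELECT uuid FROM reports WHERE uuid = 'YOUR_CRASH_UUID';
-- And in the dumps table?
SELECT uuid FROM dumps WHERE uuid = 'YOUR_CRASH_UUID';
-- If one query returns a row and the other does not, the processor
-- has not yet populated the missing table.
```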
7.2.4 Config Files
Ensure that the appropriate config files in webapp/application/config have been copied from .php-dist to .php
CHAPTER 8
Server
The Socorro Server is a collection of Python applications and a Python package ([[SocorroPackage]]) that runs the backend of the Socorro system.
8.1 The Applications
Executables for the applications are generally found in the .../scripts directory.
• ../scripts/startCollector.py - Collector
• ../scripts/startDeferredCleanup.py - Deferred Cleanup
• ../scripts/startMonitor.py - Monitor
• ../scripts/startProcessor.py - Processor
• ../scripts/startTopCrashes.py - Top Crashers By Signature
• ../scripts/startBugzilla.py - BugzillaAssociations
• ../scripts/startMtfb.py - MeanTimeBeforeFailure
• ../scripts/startServerStatus.py - server status
• ../scripts/startTopCrashByUrl.py - Top Crashers By URL
CHAPTER 9
crontabber
crontabber is a script that handles all cron job scripting. Unlike traditional UNIX crontab, all execution is done via the ./crontabber.py script, and the configuration of frequency and exact run time is part of the configuration files. The configuration is done using configman and it looks something like this:
# name: jobs
# doc: List of jobs and their frequency separated by `|`
# converter: configman.converters.class_list_converter
jobs=socorro.cron.jobs.foo.FooCronApp|12h
     socorro.cron.jobs.bar.BarCronApp|1d
     socorro.cron.jobs.pgjob.PGCronApp|1d|03:00
9.1 crontab runs crontabber
crontabber can be run at any time. Because the exact execution time is in the configuration, you can't accidentally execute jobs that aren't supposed to execute simply by running crontabber.
However, it can't be run as a daemon. It actually needs to be run by UNIX crontab every, say, 5 minutes. So instead of your crontab being a huge list of jobs at different times, all you need is this:
*/5 * * * * PYTHONPATH="..." socorro/cron/crontabber.py
That's all you need! Obviously the granularity of crontabber is limited by the granularity at which you execute it.
By moving away from UNIX crontab we have better control of the cron apps and their inter-relationships. We can also remove unnecessary boilerplate cruft.
9.2 Dependencies
In crontabber the state of previous runs of the cron apps is remembered (stored internally in a JSON file), which makes it possible to assign dependencies between the cron apps.
This is used to potentially prevent jobs from running, not to automatically run those that depend. For example, if FooCronApp depends on BarCronApp, it just won't run if BarCronApp last resulted in an error or simply hasn't been run when it should have been.
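The gating described above can be sketched roughly like this (an illustrative simplification, not crontabber's actual implementation; the state dictionary shape is made up for the example, while crontabber's real bookkeeping lives in its internal JSON file):

```python
def dependencies_met(app_class, state):
    """Return True if every app this one depends_on last ran
    successfully and on schedule.

    `state` is a hypothetical dict keyed by app_name, e.g.
    {"BarCronApp": {"error": None, "overdue": False}}.
    """
    for dep in getattr(app_class, "depends_on", ()):
        info = state.get(dep)
        if info is None:          # dependency has never run
            return False
        if info.get("error"):     # dependency last ended in an error
            return False
        if info.get("overdue"):   # dependency missed its scheduled run
            return False
    return True


class FooCronApp:
    depends_on = ("BarCronApp",)

print(dependencies_met(FooCronApp, {"BarCronApp": {"error": "NameError"}}))
# -> False (BarCronApp errored, so FooCronApp is skipped)
```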
Overriding dependencies is possible with the --force parameter. For example, suppose you know BarCronApp can now be run; you do that like this:
./crontabber.py --job=BarCronApp --force
Dependencies inside the cron apps are defined by setting a class attribute on the cron app. The attribute is called depends_on and its value can be a string, a tuple or a list. In this example, since BarCronApp depends on FooCronApp, its class would look something like this:
from socorro.cron.crontabber import BaseCronApp

class BarCronApp(BaseCronApp):
    app_name = 'BarCronApp'
    app_description = 'Does some bar things'
    depends_on = ('FooCronApp',)

    def run(self):
        ...
9.3 Own configurations
Each cron app can have its own configuration(s). Obviously they must always have defaults that are good enough; otherwise you can't run crontabber to run all jobs that are due. To make overridable configuration options, add the required_config class attribute. Here's an example:
from configman import Namespace
from socorro.cron.crontabber import BaseCronApp

class FooCronApp(BaseCronApp):
    app_name = 'foo'

    required_config = Namespace()
    required_config.add_option(
        'bugzilla_url',
        default='https://bugs.mozilla.org',
        doc='Base URL for bugzilla'
    )

    def run(self):
        ...
        print self.config.bugzilla_url
        ...
Note: Inside the run() method in that example, the self.config object is a special one. It's basically a reference to the configuration specifically for this class, but it has access to all configuration objects defined in the "root". I.e. you can access things like self.config.logger here too, but other cron apps won't have access to self.config.bugzilla_url since that's unique to this app.
To override cron app specific options on the command line you need to use a special syntax to associate them with this cron app class. Usually, the best hint of how to do this is to use python crontabber.py --help. In this example it would be:
python crontabber.py --job=foo --class-FooCronApp.bugzilla_url=...
9.4 App names versus/or class names
Every cron app in crontabber must have a class attribute called app_name. This value must be unique. If you like, it can be the same as the class it's in. When you list jobs you list the full path to the class, but it's the app_name within the found class that gets remembered.
If you change the app_name, all previously known information about it being run is lost. If you change the name and path of the class, the only other thing you need to change is the configuration that refers to it.
Best practice recommendation is this:
• Name the class like a typical python class, i.e. capitalize and optionally camel case the rest. For example: UpdateADUCronApp
• Optional but good practice is to keep the CronApp suffix on the class name.
• Make the app_name value lower case and replace spaces with -.
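Following those recommendations, a hypothetical cron app would pair its class name and app_name like this (UpdateADUCronApp and 'update-adu' are made-up names for illustration; a real app would subclass BaseCronApp and define run()):

```python
class UpdateADUCronApp:
    app_name = 'update-adu'                    # lower case, spaces -> '-'
    app_description = 'Updates the ADU counts'

# crontabber's state is keyed on app_name, so the class can be renamed
# or moved without losing run history, as long as app_name is unchanged.
assert UpdateADUCronApp.app_name == 'update-adu'
```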
9.5 Manual intervention
First of all, to add a new job all you need to do is add it to the config file that crontabber is reading from. Thanks to being a configman application, it automatically picks up configuration from files called crontabber.ini, crontabber.conf or crontabber.json. To create a new config file, use admin.dump_conf like this:
python socorro/cron/crontabber.py --admin.dump_conf ini
All errors that happen are reported to the standard python logging module. The latest error (type, value and traceback) is also stored in the JSON database. If any of your cron apps have an error you can see it with:
python socorro/cron/crontabber.py --list-jobs
Here’s a sample output:
=== JOB ========================================================================
Class: socorro.cron.jobs.foo.FooCronApp
App name: foo
Frequency: 12h
Last run: 2012-04-05 14:49:56 (1 minute ago)
Next run: 2012-04-06 02:49:56 (in 11 hours, 58 minutes)

=== JOB ========================================================================
Class: socorro.cron.jobs.bar.BarCronApp
App name: bar
Frequency: 1d
Last run: 2012-04-05 14:49:56 (1 minute ago)
Next run: 2012-04-06 14:49:56 (in 23 hours, 58 minutes)
Error!! (1 times)
  File "socorro/cron/crontabber.py", line 316, in run_one
    self._run_job(job_class)
  File "socorro/cron/crontabber.py", line 369, in _run_job
    instance.main()
  File "/Use[snip]orro/socorro/cron/crontabber.py", line 47, in main
    self.run()
  File "/Use[snip]orro/socorro/cron/jobs/bar.py", line 10, in run
    raise NameError('doesnotexist')
It will only keep the latest error, but it will include an error count that tells you how many times it has tried and failed. The error count increments every time any error happens and is reset once no error happens. So only the latest error is kept, and to find out about past errors you have to inspect the log files.

NOTE: If a cron app that is configured to run every 2 days runs into an error, it will try to run again in 2 days.
So, suppose you inspect the error and write a fix. If you’re impatient and don’t want to wait till it’s time to run again,you can start it again like this:
python socorro/cron/crontabber.py --job=my-app-name
# or if you prefer
python socorro/cron/crontabber.py --job=path.to.MyCronAppClass
This will attempt it again and, no matter if it works or errors, it will pick up the frequency from the configuration and update the time it will next run.
9.6 Frequency and execution time
The format for configuring jobs looks like this:
socorro.cron.jobs.bar.BarCronApp|30m
or like this:
socorro.cron.jobs.pgjob.PGCronApp|2d|03:00
Hopefully the format is self-explanatory. The first number is required and it must be a number followed by “y”, “d”,“h” or “m”. (years, days, hours, minutes).
For jobs that have a frequency longer than 24 hours you can specify exactly when they should run. This has to be in the 24-hour format of HH:MM.
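The format above can be illustrated with a small parser (a sketch for illustration only, not crontabber's actual parsing or validation code, which is done via configman):

```python
import re

def parse_job_spec(spec):
    """Split 'class.path|frequency[|HH:MM]' into its parts.

    Illustrative only: real crontabber does its own parsing.
    """
    parts = spec.split('|')
    class_path, frequency = parts[0], parts[1]
    match = re.match(r'^(\d+)([ydhm])$', frequency)
    if not match:
        raise ValueError('frequency must be a number followed by y/d/h/m')
    run_time = parts[2] if len(parts) > 2 else None
    # An exact HH:MM only makes sense for frequencies of a day or longer.
    if run_time is not None and match.group(2) not in ('y', 'd'):
        raise ValueError('exact run time requires a frequency of 1d or more')
    return class_path, int(match.group(1)), match.group(2), run_time

print(parse_job_spec('socorro.cron.jobs.pgjob.PGCronApp|2d|03:00'))
# -> ('socorro.cron.jobs.pgjob.PGCronApp', 2, 'd', '03:00')
```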
If you're ever uncertain whether your recent changes to the configuration file are correct, instead of waiting around you can check it with:
python socorro/cron/crontabber.py --configtest
which will do nothing if all is OK.
9.7 Timezone and UTC
There is no timezone in any of the dates and times in crontabber. Everything is assumed to be local time, i.e. whatever the server it's running on is using.
The reason for this is the ability to specify exactly when something should be run. So if you want something to run at exactly 3AM every day, that's 3AM relative to where the server is located.
9.8 Writing cron apps (aka. jobs)
Because of the configurable nature of crontabber, the actual cron apps can be located anywhere. For example, if an app is related to HBase it could live in socorro/external/hbase/mycronapp.py. However, for the most part it's probably a good idea to write them in socorro/cron/jobs/ and to write one class per file to keep things clear. There are already some "sample apps" in there that do nothing except serve as good examples. With time, we can hopefully delete these as other, real apps can work as examples and inspiration.
The most common apps will execute specific pieces of SQL against the PostgreSQL database. For those, the socorro/cron/jobs/pgjob.py example is good to look at. At the time of writing it looks like this:
from socorro.cron.crontabber import PostgreSQLCronApp

class PGCronApp(PostgreSQLCronApp):
    app_name = 'pg-job'
    app_description = 'Does some foo things'

    def run(self, connection):
        cursor = connection.cursor()
        cursor.execute('select relname from pg_class')
Let's pick that apart a bit... The most important difference is the base class. Unlike the BaseCronApp class, this one executes the run() method with a connection instance as the one and only parameter. That connection will automatically take care of transactions! That means you don't have to run connection.commit(), and if you want the transaction to roll back, all you have to do is raise an error. For example:
def run(self, connection):
    cursor = connection.cursor()
    today = datetime.datetime.today()
    cursor.execute("INSERT INTO jobs (room) VALUES ('bathroom')")
    if today.strftime('%A') in ('Saturday', 'Sunday'):
        raise ValueError("Today is not a good day!")
    else:
        cursor.execute("INSERT INTO jobs (tool) VALUES ('brush')")
Silly but hopefully it’s clear enough.
Raising an error inside a cron app will not stop the other jobs from running, other than those that depend on it.
CHAPTER 10
Throttling
The Collector has the ability to vet crashes as they come into the system. Originally, this system was used to provide a statistical sampling from the incoming stream of crashes. In 1.8, throttling is a way to allow a sampling of crashes to be put into the database.
Throttling, the disposition of a JSON/dump pair, is controlled by the contents of the JSON file. The JSON files are collections of keys and values. Collector can examine these key/value pairs and assign a pass-through probability. For example, we may want to pass 100% of all alpha or beta releases to the database. In production, however, we may want to save only 10%.
For details on how to configure throttling, see the configuration section of Collector. Below is a section about the collector throttling rules.
10.1 throttleConditions
This option tells the collector how to route a given JSON/dump pair to storage for further processing or to deferred storage. It consists of a list of conditions in this form: (JsonFileKey, ConditionFunction, Probability)
• JsonFileKey: the name of a field from the HTTP POST form. The possibilities are: "StartupTime", "Vendor", "InstallTime", "timestamp", "Add-ons", "BuildID", "SecondsSinceLastCrash", "UserID", "ProductName", "URL", "Theme", "Version", "CrashTime"
• ConditionFunction: a function returning a boolean, a regular expression, or a constant used to test the value for the JsonFileKey.
• Probability: an integer between 0 and 100 inclusive. At 100, all JSON files for which the ConditionFunction returns true will be saved in the database. At 0, none of them will be saved. At 25, there is a twenty-five percent probability that a matching JSON file will be written to the database.
There must be at least one entry in the throttleConditions list. The example below shows the default case.
These conditions are applied one at a time to each submitted crash. The first match of a condition function to a value stops the iteration through the list. The probability of that first matched condition will be applied to that crash.
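That first-match-wins behaviour can be sketched like this (illustrative only; the rule shapes follow the description above rather than the collector's actual code):

```python
import random
import re

def throttle(crash, conditions):
    """Return True if the crash should be saved, applying the first
    matching condition's probability. Illustrative sketch only."""
    for key, test, probability in conditions:
        value = crash.get(key) if key is not None else None
        if key is None:
            matched = bool(test)          # catch-all rule like (None, True, 10)
        elif callable(test):
            matched = bool(test(value))
        elif hasattr(test, 'match'):      # compiled regular expression
            matched = test.match(value or '') is not None
        else:
            matched = (value == test)     # plain constant
        if matched:
            return random.randint(1, 100) <= probability
    return False

conditions = [
    ("Version", lambda x: x[-3:] == "pre", 100),
    (None, True, 0),
]
print(throttle({"Version": "10.0pre"}, conditions))  # first rule matches -> True
```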
Keep the list short to avoid bogging down the collector:
throttleConditions = cm.Option()
throttleConditions.default = [
    #("Version", lambda x: x[-3:] == "pre", 25),  # queue 25% of crashes with version ending in "pre"
    #("Add-ons", re.compile('inspector\@mozilla\.org\:1\..*'), 75),  # queue 75% of crashes where the inspector addon is at 1.x
    #("UserID", "d6d2b6b0-c9e0-4646-8627-0b1bdd4a92bb", 100),  # queue all of this user's crashes
    #("SecondsSinceLastCrash", lambda x: 300 >= int(x) >= 0, 100),  # queue all crashes that happened within 5 minutes of another crash
    (None, True, 10)  # queue 10% of what's left
]
CHAPTER 11
Deployment
11.1 Introduction
Below are general deployment instructions for installations of Socorro.
11.2 Outage Page
If the system is to be taken down for maintenance, these steps will show users an outage page during the maintenance period:
• Back up webapp-php/index.php.

• Copy webapp-php/docs/outage.php over webapp-php/index.php; all traffic will then be served this outage message.

• Do the maintenance work.

• Copy the backup over webapp-php/index.php.
add other task instructions here
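The steps above can be sketched as shell commands. This sketch works in a scratch copy under /tmp so it is self-contained; in a real deployment you would run the cp/mv lines against the actual webapp-php paths:

```shell
set -e
# Scratch copy standing in for the real webapp tree (illustrative only)
mkdir -p /tmp/outage-demo/webapp-php/docs
cd /tmp/outage-demo
echo "real site" > webapp-php/index.php
echo "outage page" > webapp-php/docs/outage.php

cp webapp-php/index.php webapp-php/index.php.bak      # back up the front controller
cp webapp-php/docs/outage.php webapp-php/index.php    # serve the outage page to all traffic
# ... perform the maintenance work ...
mv webapp-php/index.php.bak webapp-php/index.php      # restore the real site
cat webapp-php/index.php                              # prints "real site"
```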
CHAPTER 12
Development Discussions
12.1 Coding Conventions
12.1.1 Introduction
The following coding conventions are designed to ensure that the Socorro code is easy to read, hack, test, and deploy.
12.1.2 Style Guide
• Python should follow PEP 8 with 4 space indents
• PHP code follows the PEAR coding standard
• JavaScript is indented by four spaces
• Unit Testing is strongly encouraged
12.1.3 Review
New checkins that are non-trivial should be reviewed by one of the core hackers. The commit message should indicate the reviewer and the issue number if applicable.
12.1.4 Testing
Any features that are only available to admins should be tested to ensure that non-admin users do not have access.
Before checking in changes to the socorro python code, be sure to run the unit tests.
12.2 New Developer Guide
If you are new to Socorro, here are good resources to start hacking:
12.2.1 General architecture of Socorro
If you clone our git repository, you will find the following folders. Here is what each of them contains:
Folder       Description
analysis/    Contains metrics jobs such as mapreduce. Will be moved.
config/      Contains the Apache configuration for the different parts of the Socorro application.
docs/        Documentation of the Socorro project (the one you are reading right now).
scripts/     Scripts for launching the different parts of the Socorro application.
socorro/     Core code of the Socorro project.
sql/         SQL scripts related to our PostgreSQL database. Contains schemas and update queries.
thirdparty/  External libraries used by Socorro.
tools/       External tools used by Socorro.
webapp-php/  Front-end PHP application (also called UI). See Socorro UI.
Socorro submodules
The core code module of Socorro, called socorro, contains a lot of code. Here are descriptions of each submodule in there:
Module           Description
collector        All code related to collectors.
cron             All cron jobs running around Socorro.
database         PostgreSQL related code.
deferredcleanup  Obsolete.
external         APIs related to external resources like databases.
integrationtest  Obsolete.
lib              Different libraries used all over Socorro's code.
middleware       New-style middleware services place.
monitor          All code related to monitors.
othertests       Some other tests?
services         Old-style middleware services place.
storage          HBase related code.
unittest         All our unit tests are here.
webapi           Contains a few tools used by web-based services.
12.2.2 Setup a development environment
The best and easiest way to get started with a complete dev environment is to use Vagrant and our installation script.
Standalone dev environment in your existing environment
If you don't want to do things the easy way, or can't use a virtual machine, you can install everything in your own development environment. All steps are described in Standalone Development Environment.
1. Install VirtualBox from: http://www.virtualbox.org/
2. Install Vagrant from: http://vagrantup.com/
3. Download base box
# NOTE: if you have a 32-bit host, change "lucid64" to "lucid32"
vagrant box add socorro-all http://files.vagrantup.com/lucid64.box
4. Copy base box, boot VM and provision it with puppet:
vagrant up
5. Add to /etc/hosts (on the HOST machine!):
33.33.33.10 crash-stats crash-reports socorro-api
Enjoy your Socorro environment!
• browse UI: http://crash-stats
• submit crashes: http://crash-reports/submit (accepts HTTP POST only; see System Test for information on submitting test crashes)
• query data via middleware API: http://socorro-api/bpapi/adu/byday/p/WaterWolf/v/1.0/rt/any/osx/start/YYYY-MM-DD/end/YYYY-MM-DD (where WaterWolf is a valid product name and YYYY-MM-DD are valid start/end dates)
Apply your changes
Edit files in your git checkout on the host as usual. To actually make changes take effect, you can run:
vagrant provision
This reruns puppet inside the VM to deploy the source to /data/socorro and restarts any necessary services.
How Socorro works
See How Socorro Works and Crash Flow.
Setting up a new database
Note that the existing puppet manifests populate PostgreSQL if the "breakpad" database does not exist. See Populate PostgreSQL for more information on how this process works and how to customize it.
Enabling HBase
Socorro supports HBase as a long-term storage archive for both raw and processed crashes. Since it requires Sun (now Oracle) Java, does not work with OpenJDK, and generally has much higher memory requirements than all the other dependencies, it is not enabled by default.
If you wish to enable it, edit the nodes.pp file:
vi puppet/manifests/nodes/nodes.pp
And remove the comment (‘#’) marker from the socorro-hbase include:
# include socorro-hbase
Re-provision vagrant, and HBase will be installed, started and the default Socorro schema will be loaded:
vagrant provision
NOTE - this will download and install Java from Oracle, which means that you will be bound by the terms of their license agreement - http://www.oracle.com/technetwork/java/javase/terms/license/
Debugging
You can SSH into your VM by running:
vagrant ssh
By default, your socorro git checkout will be shared into the VM via NFS at /home/socorro/dev/socorro
Running "make install" as the socorro user in /home/socorro/dev/socorro will cause Socorro to be installed to /data/socorro/. You will need to restart the apache2 or supervisord services if you modify middleware or backend code, respectively (note that "vagrant provision" as described above does all of this for you).
Logs for the (PHP Kohana) webapp are at:
/data/socorro/htdocs/application/logs/
All other Socorro apps log to syslog, using the user.* facility:
/var/log/user.log
Apache may log important errors too, such as WSGI apps not starting up or problems with the Apache or PHP configs:
/var/log/apache/error.log
Supervisord captures the stderr/stdout of the backend jobs; these are normally the same as syslog, but may log important errors if the daemons cannot be started. You can also find stdout/stderr from cron jobs in this location:
/var/log/socorro/
Loading data from an existing Socorro install
Given a PostgreSQL dump named "minidb.dump", run the following:
vagrant ssh

# shut down database users
sudo /etc/init.d/supervisor force-stop
sudo /etc/init.d/apache2 stop

# drop old db and load snapshot
sudo su - postgres
dropdb breakpad
createdb -E 'utf8' -l 'en_US.utf8' -T template0 breakpad
pg_restore -Fc -d breakpad minidb.dump
This may take several hours, depending on your hardware. One way to speed this up is to add more CPU cores to the VM (via the VirtualBox GUI); the default is 1. Then add "-j n" to the pg_restore command above, where n is the number of CPU cores minus 1.
Pulling crash reports from an existing production install
The Socorro PostgreSQL database only contains a small subset of the information about individual crashes (enough to run aggregate reports). For instance, the full stack is only available in long-term storage (such as HBase).
If you have imported a database from a production instance, you may want to configure the web UI to pull individual crash reports from production via the web service (so URLs such as http://crash-stats/report/index/YOUR_CRASH_ID_GOES_HERE will work).
The /report/index page actually pulls its data from a URL such as: http://crash-stats/dumps/YOUR_CRASH_ID_GOES_HERE.jsonz
You can cause your dev instance to fall back to your production instance by modifying:
webapp-php/application/config/application.php
Change the URL in this config value to point to your desired production instance:
<?php
$config['crash_dump_local_url_fallback'] = 'http://crash-stats/dumps/%1$s.jsonz';
?>
Note that the crash ID must be in both your local database and the remote (production) HBase instance for this to work.
See https://github.com/mozilla/socorro/blob/master/webapp-php/application/config/application.php-dist
(OPTIONAL) Populating Elastic Search
See Populate ElasticSearch.
12.2.3 Add a service to the Middleware
Architecture overview
The middleware is a simple REST API providing JSON data depending on the URL that is called. It is made of a list of services, each one binding a certain URL with parameters. Documentation for each service is available in the Middleware API page.
Those services do not contain any code; they are only interfaces. They use other resources from the external module. That external module is composed of one submodule for each external resource we are using. For example, there is a PostgreSQL submodule, an ElasticSearch submodule and an HBase submodule.
You will also find some common code among external resources in socorro.lib.
Class hierarchy
REST services in Socorro are divided into two separate modules. socorro.middleware is the module that contains the actual services, the classes that will receive HTTP requests and return the right data. However, services do not do any kind of computation; they only find the right implementation class and call it.
Implementations of services are found in socorro.external. They are separated into submodules, one for each external resource that we use. For example, in socorro.external.postgresql you will find everything that is related to data stored in PostgreSQL: mainly SQL queries, but also argument sanitizing and data formatting.
The way it works overall is simple: the service in socorro.middleware defines a URL and parses the arguments when the service is called. That service then finds the right implementation class in socorro.external and calls it with the parameters. The implementation class does what it has to do (SQL query, computation...) and returns a Python dictionary. The service then automatically transforms that dictionary into a JSON string and sends it back via HTTP.
Create the service
First create a new file for your service in socorro/middleware/ and call it nameofservice_service.py. This is a convention for the next version of our config manager. Then create a class inside as follows:
import logging

from socorro.middleware.service import DataAPIService

logger = logging.getLogger("webapi")


class MyService(DataAPIService):

    service_name = "my_service"  # name of the submodule to look for in external
    uri = "/my/service/(.*)"     # URL of the service

    def __init__(self, config):
        super(MyService, self).__init__(config)
        logger.debug('MyService service __init__')

    def get(self, *args):
        # Parse parameters of the URL
        params = self.parse_query_string(args[0])

        # Find the implementation module in external depending on the configuration
        module = self.get_module(params)

        # Instantiate the implementation class
        impl = module.MyService(config=self.context)

        # Call and return the result of the implementation method
        return impl.mymethod(**params)
uri is the URL pattern you want to match. It is a regular expression, and the content of each group ((.*)) will be in args.

service_name will be used to find the corresponding implementation resource. It has to match the filename of the module you need.
If you want to add mandatory parameters, modify the URI and values will be passed in args.
Use external resources
The socorro.external module contains everything related to outer resources like databases. Each submodule has a base class and classes for specific functionalities. If the function you need for your service is not already in there, create a new file and a new class to implement it. To do so, follow this pattern:
from socorro.external.myresource.base import MyResourceBase


class MyModule(MyResourceBase):

    def __init__(self, *args, **kwargs):
        super(MyModule, self).__init__(*args, **kwargs)

    def my_method(self, **kwargs):
        do_stuff()
        return my_json_result
One of the things you will want to do is filter arguments and give them default values. There is a function to do that in socorro.lib.external_common called parse_arguments. The documentation of that function says:
Return a dict of parameters.

Take a list of filters and for each try to get the corresponding
value in arguments or a default value. Then check that value's type.

Example:
    filters = [
        ("param1", "default", ["list", "str"]),
        ("param2", None, "int"),
        ("param3", ["list", "of", 4, "values"], ["list", "str"])
    ]
    arguments = {
        "param1": "value1",
        "unknown": 12345
    }
    =>
    {
        "param1": ["value1"],
        "param2": 0,
        "param3": ["list", "of", "4", "values"]
    }
Here is an example of how to use this:
class Products(PostgreSQLBase):

    def versions_info(self, **kwargs):
        # Parse arguments
        filters = [
            ("product", "Firefox", "str"),
            ("versions", None, ["list", "str"])
        ]
        params = external_common.parse_arguments(filters, kwargs)

        params.product   # "Firefox" by default or a string
        params.versions  # [] by default or a list of strings
Configuration
Finally, add your service to the list of running services in scripts/config/webapiconfig.py.dist as follows:
import socorro.middleware.search_service as search
import socorro.middleware.myservice_service as myservice  # add

servicesList = cm.Option()
servicesList.doc = 'a python list of classes to offer as services'
servicesList.default = [myservice.MyService, search.Search, (...)]  # add
You can also add a config key for the implementation of your service. If you don't, your service will use the default config key (serviceImplementationModule). To add a specific configuration key:
# MyService service config
myserviceImplementationModule = cm.Option()
myserviceImplementationModule.doc = "String, name of the module myservice uses."
myserviceImplementationModule.default = 'socorro.external.elasticsearch'  # for example
Then restart Apache and you should be good to go! If you’re using a Vagrant VM, you can hit the middleware directlyby calling http://socorro-api/bpapi/myservice/params/.
And then?
Once you are done creating your service in the middleware, you might want to use it in the WebApp. If so, have a lookat Socorro UI.
You might also want to document it. We keep track of all existing services' documentation in our Middleware API page. Please add yours!
Writing a PostgreSQL middleware unit test
First create your new test file in the appropriate location as specified above, for example socorro/unittest/external/postgresql/test_myservice.py.
Next you want to import the following:
from socorro.external.postgresql.myservice import MyService
import socorro.unittest.testlib.util as testutil
As this is a PostgreSQL service unit test we also add:
from .unittestbase import PostgreSQLTestCase
Next item to add is your setup_module function; below is a barebones version that would be sufficient for most tests:

#------------------------------------------------------------------------------
def setup_module():
    testutil.nosePrintModule(__file__)
Next is the setUp function, in which you create and populate your dummy table(s):
#==============================================================================
class TestMyService(PostgreSQLTestCase):

    #--------------------------------------------------------------------------
    def setUp(self):
        super(TestMyService, self).setUp()

        cursor = self.connection.cursor()

        # Create table
        cursor.execute("""
            CREATE TABLE product_info
            (
                product_version_id integer not null,
                product_name citext,
                version_string citext
            );
        """)

        # Insert data
        cursor.execute("""
            INSERT INTO product_info VALUES
            (
                1,
                '%s',
                '%s'
            );
        """ % ("Firefox", "8.0"))

        self.connection.commit()
For your test table(s) you can include as many, or as few, columns and rows of data as your tests require. Next we add the tearDown function that will clean up after our tests have run, by dropping the tables we created in the setUp function.
#--------------------------------------------------------------------------
def tearDown(self):
    """Clean up the database: delete tables and functions."""
    cursor = self.connection.cursor()
    cursor.execute("""
        DROP TABLE product_info;
    """)
    self.connection.commit()
    super(TestMyService, self).tearDown()
Next, we write our actual tests against the dummy data we created in setUp. The first step is to create an instance of the class we are going to test:
#--------------------------------------------------------------------------
def test_get(self):
    products = Products(config=self.config)
Next we write our first test, passing the parameters our function expects:
    #......................................................................
    # Test 1: find one exact match for one product and one version
    params = {
        "versions": "Firefox:8.0"
    }
Next we call our function passing the above parameters:
res = products.get_versions(**params)
The above will return a response that we need to test to determine whether it contains what we expect. In order to do this we create our expected response:
        res_expected = {
            "hits": [
                {
                    "product_version_id": 1,
                    "product_name": "Firefox",
                    "version_string": "8.0"
                }
            ],
            "total": 1
        }
And finally we call assertEqual to test whether our response matches the expected response:
self.assertEqual(res, res_expected)
Running a PostgreSQL middleware unit test
If you have not already done so, install nose. From the command line run:
sudo apt-get install python-nose
Once the installation completes, change directory to socorro/unittest/config/ and run the following:
cp commonconfig.py.dist commonconfig.py
Now you can open the file and edit its contents to match your testing environment. If you are running this in a VM via Socorro Vagrant, you can leave the contents of the file as is. Next, cd into socorro/unittest. To run all of the unit tests, run the following:
nosetests
When writing a new test you are most likely interested in running just your own test, instead of all of the unit tests that form part of Socorro. If your test is located in, for example, unittest/external/postgresql/test_myservice.py, then you can run your test as follows:
nosetests socorro.external.postgresql.test_myservice
Ensuring good style
To ensure that the Python code you wrote passes PEP 8, you need to run check.py. To do this, your first step is to install it. From the terminal run:
pip install -e git://github.com/jbalogh/check.git#egg=check
P.S. You may need to run the command above with sudo.
Once installed, run the following:
check.py /path/to/your/file
12.2.4 How to Review a Pull Request
Part of our job as developers is to review and provide feedback on what our colleagues do. The goal of this process is to:
• test that a new feature works as expected
• make sure the code is clean
• make sure the code doesn’t break anything
Here are several steps you can follow when reviewing a pull request. Depending on the size of that pull request, you might want to skip some phases.
Read the code
The first task when reviewing is to read the code and verify that it is coherent and clean. Try to understand the algorithm and its goal, and make sure that it is what was asked for in the related bug. When there is something that you find non-trivial and that is not documented, ask for a doc-string or an inline comment so it becomes easier for others to understand the code.
Pull the code into your local environment
To go on testing, you will need to have the code in your local environment. Let’s say you want to test the branch my-dev-branch of rhelmer’s git repository. Here is one method to get the content of that remote branch into your repo:
git remote add rhelmer https://github.com/rhelmer/socorro.git  # the first time only
git fetch rhelmer my-dev-branch:my-dev-branch
git checkout my-dev-branch
Once you are in that branch, you can actually test the code or run tools on it.
Use a code quality tool
Running a code quality tool is a good and easy way to find coding and styling problems. For Python, we use check.py (check by jbalogh on GitHub). This tool runs pyflakes on a file or a folder, and then checks that PEP 8 is respected.
To install check.py, run the following command:
pip install -e git://github.com/jbalogh/check.git#egg=check
For JavaScript, we suggest that you use JSHint. There are also a lot of tools for PHP; you can choose one you like.
For HTML and CSS files, please use the tools from the W3C: CSS Validator and HTML Validator.
Run the unit tests
Socorro has a growing number of unit tests that are very helpful at verifying nothing breaks. Before approving and merging a pull request, you should run all unit tests to make sure they still pass.
Note that those unit tests will be run when the pull request is merged, but it is easier to fix something before it lands on master than after.
To run the unit tests in a Vagrant VM, do the following:
make test
This installs all the needed dependencies and runs all the tests. You need to have a running PostgreSQL instance for this to work, with a specific config file for the tests in socorro/unittest/config/commonconfig.py.
For further documentation on unit tests, please read Unit Testing.
Test manually
This is not always possible in a local environment, but when it is, you should make sure the new code behaves as expected. Read applychanges-label.
Test before
This is a process to verify that one’s work is good and can go into master with little risk of breaking something. However, the developer is responsible for his or her bug, and this review process doesn’t mean he or she shouldn’t go through all these steps. The reviewer is here to make sure the developer didn’t miss something, but it’s easier to fix something before a review process than after. Please test your code before opening a pull request!
12.3 Glossary
Build: a date encoding used to identify when a client was compiled. (submission metadata)
Crash Report Details Page - A crash stats page displaying all known details of a crash
Crash Dump/Metadata pair - shorthand for the pair of a Raw Crash Dump and its corresponding Raw Crash Metadata

Deferred Job Storage: a file system location where Crash Dump/Metadata pairs are kept without being processed.
Dump File: See Raw Crash Dump, don’t use this term it makes me giggle
Job: a job queue item for a Raw Crash Dump that needs to be processed
JSON Dump Storage: the Python module that implements File System
Materialized view: the tables in the database containing the data used in statistical analysis, including [[MeanTimeBeforeFailure]], Top Crashers By Signature, and Top Crashers By URL. The “Trend Reports” from the Socorro UI display information from these tables.
Minidump: see ‘raw crash dump’
Minidump_stackwalk: an application from the Breakpad project that takes a raw dump file, marries it with symbolsand produces output usable by developers. This application is invoked by Processor.
Monitor: the Socorro application in charge of queuing jobs. See Monitor
OOID: A crash report ID. Originally a 32-bit value, the original legacy system stored it in the database in hexadecimal text form. Each crash is assigned an OOID by the Collector when the crash is received.

Platform: the OS that a client runs on. This term has historically been a point of confusion and it is preferred that the term OS or Client OS be used instead.

Processed Dump Storage: the disk location where the output files of the minidump_stackwalk program are stored. The actual files are stored with a .jsonz extension.
Processor: the Socorro application in charge of applying minidump_stackwalk to queued jobs. See Processor
Raw Crash Dump, Raw Dump: the data sent from a client to Socorro containing the state of the application at the time of failure. It is paired with a Raw Crash Metadata file.

Raw Crash Metadata - the metadata sent from a client to Socorro to describe the Raw Crash. It is saved in JSON format, not to be confused with a Cooked Crash Dump.

Raw JSON file: See Crash Dump Metadata... a file in the JSON format containing metadata about a ‘dump file’. Saved with a ‘.json’ suffix.

Release: a categorization of an application’s product name and version. The categories are: “major”, “milestone”, or “development”. Within the database, an enum called ReleaseEnum represents these categories.
Reporter: another name for the Socorro UI
Skip List: lists of signature regular expressions used in generating a crash’s overall signature in the Processor. See Signature Generation.
Standard Job Storage: a file system location where JSON/dump pairs are kept for processing
Throttling: statistically, we don’t have to save every single crash. This option of the Collector configuration allows us to selectively throw away dumps.
Trend Reports: the pages in the Socorro UI that display the data from the materialized views.
UUID: a universal unique identifier. The term is being deprecated in favor of OOID.
Web head: a machine that runs Collector
12.3.1 Deferred Job Storage
Deferred storage is where the JSON/dump pairs are saved if they’ve been filtered out by Collector throttling. The location of the deferred job storage is determined by the configuration parameter deferredStorageRoot found in the Common Config.
JSON/dump pairs that are saved in deferred storage are not likely to ever be processed further. They are held for a configurable number of days until deleted by Deferred Cleanup.
Occasionally, a developer will request a report via Reporter on a job that was saved in deferred storage. Monitor will look for the job in deferred storage if it cannot find it in standard storage.
For more information on the storage technique, see File System
12.3.2 JSON Dump Storage
What this system offers
Crash data is stored so that it can be quickly located based on a Universally Unique Identifier (uuid) or visited by the date and time when reported.
Directory Structure
The crash files are located in a tree with two branches: the name or “index” branch and the date branch.
• The name branch consists of paths based on the first few pairs of characters of the uuid. The name branch holds the two data files and a relative symbolic link to the date branch directory associated with the particular uuid. Take the uuid 22adfb61-f75b-11dc-b6be-001321b0783d. The “depth” is the number of sub-directories between the name directory and the actual file. By default, to conserve inodes, the depth is two.
– By default, the json file is stored (depth 2) as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.json
– The json file could be stored (depth 4) as %(root)s/name/22/ad/fb/61/22adfb61-f75b-11dc-b6be-001321b0783d.json
– The dump file is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.dump
– The symbolic link is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d and (see below) references (own location)/%(toDateFromName)s/2008/09/30/12/05/webhead01_0/
• The date branch consists of paths based on the year, month, day, hour, minute-segment, webhead host name and a small sequence number. For each uuid, it holds a relative symbolic link referring to the actual name directory holding the data for that uuid. For the uuid above, submitted at 2008-09-30T12:05 from webhead01
– The symbolic link is stored as %(root)s/date/2008/09/30/12/05/webhead01_0/22adfb61-f75b-11dc-b6be-001321b0783d and references (own location)/%(toNameFromDate)s/22/ad/fb/61/
• Note (name layout) In the examples on this page, the name/index branch uses the first 4 characters of the uuid as two character-pairs naming subdirectories. This is a configurable setting called storageDepth in the Collector configuration. To use 8 characters, set storageDepth to 4; to use 6 characters, set it to 3. The default storageDepth is 2 because on our system, with (approximately) 64K leaf directories, the number of files per leaf is reasonable, and the number of inodes required by directory entries is not so large as to cause undue difficulty. A storageDepth of 4 was examined, and was found to crash the file system by requiring too many inodes.
• If the uuids are such that their initial few characters are well spread among all possible values, then the lookup can be very quick. If the first few characters of the uuids are not well distributed, the resulting directories may be very large. If, despite well chosen uuids, the leaf name directories become too large, it would be simple to add another level, reducing the number of files by approximately a factor of 256; however, bear in mind the issue of inodes.
• Note (symbolic links) The symbolic links are relative rather than absolute, to avoid issues that might arise from variously mounted NFS volumes.
• Note (maxDirectoryEntries) If the number of links in a particular webhead subdirectory would exceed maxDirectoryEntries, then a new webhead directory is created by appending a larger _N: .../webhead01_0 first, then .../webhead01_1, etc. For the moment, maxDirectoryEntries is ignored for the name branch.
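The name-branch layout described above can be sketched in a few lines of Python. This is an illustrative helper only, not the actual JsonDumpStorage API; the function name and root path are invented for the example:

```python
import os

def name_branch_path(root, uuid, storage_depth=2):
    """Derive the name-branch directory for a uuid by using the first
    storage_depth two-character pairs of the uuid as subdirectory names."""
    pairs = [uuid[i:i + 2] for i in range(0, storage_depth * 2, 2)]
    return os.path.join(root, "name", *pairs)

# depth 2 (the default): %(root)s/name/22/ad
print(name_branch_path("/crashes", "22adfb61-f75b-11dc-b6be-001321b0783d"))
# depth 4: %(root)s/name/22/ad/fb/61
print(name_branch_path("/crashes", "22adfb61-f75b-11dc-b6be-001321b0783d", 4))
```

The data files for a uuid then live inside the returned directory, named by the full uuid plus the configured suffixes.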
How it’s used
We use the file system storage for incoming dumps caught by Collector. There are two instances of the file systemused for different purposes: standard storage and deferred storage.
Standard Job Storage
This is where json/dump pairs are stored for further processing. The Monitor finds new dumps and queues them for processing. It does this by walking the date branch of the file system using the API function destructiveDateWalk. As it moves through the date branch, it notes every uuid (in the form of a symbolic link) that it encounters. It queues the information from the symbolic link and then deletes the symbolic link. This ensures that it only ever finds new entries. Later, the Processor will read the json/dump pair by doing a direct lookup of the uuid on the name branch.
In the case of priority processing, the target uuid is looked up directly on the name branch. Then the link to the date branch is used to locate and delete the link on the date branch. This ensures that a priority job is not found a second time as a new job by the Monitor.
Deferred Job Storage
This is where jobs go that are deferred by Monitor’s throttling mechanism. If a json/dump pair is needed for priority processing, it can be looked up directly on the name branch. In such a case, just as with priority jobs in standard storage, we destroy the links between the two branches. However, in this case, destroying the links prevents the json/dump pair from being deleted by the deferred cleanup process.
When it comes time to drop old json/dump pairs that are no longer needed within the deferred storage, the system is given a date threshold. It walks the appropriate parts of the date branch older than the threshold. It uses the links to the name branch to blow away the elderly json/dump pairs.
class JsonDumpStorage
socorro.lib.JsonDumpStorage holds data and implements methods for creating and accessing crash files.
public methods
• __init__(self, root=".", maxDirectoryEntries=1024, **kwargs)
Take note of our root directory, maximum allowed date->name links per directory, some relative relations, and whatever else we may need. Much of this (c|sh)ould be read from a config file.
Recognized keyword args:
– dateName. Default = 'date'

– indexName. Default = 'name'

– jsonSuffix. Default = '.json'. If not startswith('.') then '.' is prepended

– dumpSuffix. Default = '.dump'. If not startswith('.') then '.' is prepended

– dumpPermissions. Default = 660

– dirPermissions. Default = 770

– dumpGID. Default = None. If None, then owned by the owner of the running script.
• newEntry (self, uuid, webheadHostName='webhead01', timestamp=DT.datetime.now())
Sets up the name and date storage for the given uuid.
– Creates any directories that it needs along the path to the appropriate storage location (possibly adjusting ownership and mode)
– Creates two relative symbolic links:
* the date branch link pointing to the name directory holding the files;
* the name branch link pointing to the date branch directory holding that link.
– Returns a 2-tuple containing files open for writing: (jsonfile,dumpfile)
• getJson (self, uuid)
Returns an absolute pathname for the json file for a given uuid. Raises OSError if the file is missing
• getDump (self, uuid)
Returns an absolute pathname for the dump file for a given uuid. Raises OSError if the file is missing
• markAsSeen (self,uuid)
Removes the links associated with the two data files for this uuid, thus marking them as seen. Quietlyreturns if the uuid has no associated links.
• destructiveDateWalk (self)
This function is a generator that yields all(see note) uuids found by walking the date branch of thefile system.
Just before yielding a value, it deletes both the links (from date to name and from name to date). After visiting all the uuids in a given date branch, it recursively deletes any empty subdirectories in the date branch. Since the file system may be manipulated in a different thread, if no .json or .dump file is found, the links are left, and we do not yield that uuid. Note: to avoid race conditions, it does not visit the date subdirectory corresponding to the current time.
• remove (self, uuid)
Removes all instances of the uuid from the file system including the json file, the dump file, and the two links if they still exist.
– Ignores missing link, json and dump files: you may call it with bogus data, though of course you should not.
• move (self, uuid, newAbsolutePath)
Moves the json file then the dump file to newAbsolutePath.
– Removes associated symbolic links if they still exist.
– Raises IOError if either the json or dump file for the uuid is not found, and retains any links, but does not roll back the json file if the dump file is not found.
• removeOlderThan (self, timestamp)
– Walks the date branch removing all entries strictly older than the timestamp.
– Removes the corresponding entries in the name branch.
member data
Most of the member data are set in the constructor; a few are constants, and the rest are simple calculations based on the others.
• root: The directory that holds both the date and index(name) subdirectories
• maxDirectoryEntries: The maximum number of links in each webhead directory on the date branch. Default = 1024

• dateName: The name of the date branch subdirectory. Default = 'date'

• indexName: The name of the index branch subdirectory. Default = 'name'

• jsonSuffix: The suffix of the json crash file. Default = '.json'

• dumpSuffix: The suffix of the dump crash file. Default = '.dump'
• dateBranch: The full path to the date branch
• nameBranch: The full path to the index branch
• dumpPermissions: The permissions for the crash files. Default = 660
• dirPermissions: The permissions for the directories holding crash files. Default = 770
• dumpGID: The group ID for the directories and crash files. Default: Owned by the owner of the running script.
• toNameFromDate: The relative path from a leaf of the dateBranch to the nameBranch
• toDateFromName: The relative path from a leaf of the nameBranch to the dateBranch
• minutesPerSlot: How many minutes in each sub-hour slot. Default = 5
• slotRange: A precalculated range of slot edges = range(self.minutesPerSlot, 60, self.minutesPerSlot)
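The sub-hour slot computation implied by minutesPerSlot and slotRange can be illustrated with a tiny helper. This is a sketch for clarity (minute_slot is a hypothetical name, not part of the class), assuming the class rounds a timestamp's minute down to the nearest slot edge:

```python
def minute_slot(minute, minutes_per_slot=5):
    """Round a minute value (0-59) down to the start of its sub-hour slot."""
    return (minute // minutes_per_slot) * minutes_per_slot

# With the default 5-minute slots, a crash at 12:07 lands in the 12:05 slot:
print(minute_slot(7))             # 5
print(list(range(5, 60, 5))[:3])  # slotRange begins [5, 10, 15]
```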
12.3.3 Processed Dump Storage
Processed dumps are stored in two places: the relational database as well as in flat files within a file system. This forking of the storage scheme came from the realization that the infrequently used data within the database ‘dumps’ tables was causing performance problems within PostgreSQL. The ‘dumps’ tables took nearly eighty percent of the total storage, making replication and backup problematic. Since the ‘dumps’ table’s data is used only when a user requests a specific crash dump by uuid, most of the data is rarely, if ever, accessed.
We decided to migrate these dumps into file system storage outside the database. Details can be seen at: Dumping Dump Tables
In the file system, after processing, dumps are stored in a gzip compressed JSON file format. This format echoes a flattening of the ‘reports’, ‘extensions’ and the now deprecated ‘dumps’ tables within the database.
Directory Structure
Just as in the JsonDumpStorage scheme, there are two branches: ‘name’ and ‘date’
Access by Name
Most lookups of processed crash data happen by name. We use a radix storage technique where the first 4 characters of the file name are used for two levels of directory names. A file called aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz would be found in the file system as .../aa/bb/aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz
Access by Date
For the purposes of finding crashes that happened at a specific date and time, a hierarchy of date directories offers quick lookup. The leaves of the date directories contain symbolic links to the locations of crash data.
JSON File Format
example:
{"signature": "nsThread::ProcessNextEvent(int, int*)","uuid": "aabbf9cb-395b-47e8-9600-4f20e2090331","date_processed": "2009-03-31 14:45:09.215601","install_age": 100113,"uptime": 7,"last_crash": 95113,"product": "SomeProduct","version": "3.5.2","build_id": "20090223121634","branch": "1.9.1","os_name": "Mac OS X","os_version": "10.5.6 9G55","cpu_name": "x86","cpu_info": "GenuineIntel family 6 model 15 stepping 6","crash_reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS","crash_address": "0xe9b246","User Comments": "This thing crashed.\nHelp me Kirk.","app_notes": "","success": true,"truncated": false,"processor_notes": "","distributor":"","distributor_version": "","add-ons": [["{ABDE892B-13A8-4d1b-88E6-365A6E755758}", "1.0"], ["{b2e293ee-fd7e-4c71-a714-5f4750d8d7b7}", "2.2.0.9"], ["{972ce4c6-7e08-4474-a285-3208198ce6fd}", "3.5.2"]],"dump":"OS|Mac OS X|10.5.6 9G55\\nCPU|x86|GenuineIntel family 6 model 15 stepping 6|2\\nCrash|EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE|0x1558c095|0\\nModule|firefox-bin||firefox-bin|988FA8BFC789C4C07C32D61867BB42B60|0x00001000|0x00001fff|\\n....."}
The “dump” component is the direct streamed output from the Breakpad “minidump_stackwalk” program. Unfortunately, that project does not give detailed documentation of the format.
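Since a .jsonz file is just gzip-compressed JSON, it can be read back with the Python standard library alone. A minimal sketch (the helper name and demo path are invented for illustration, not part of Socorro):

```python
import gzip
import json
import os
import tempfile

def read_processed_crash(path):
    """Load a gzip-compressed JSON (.jsonz) processed-crash file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

# Round-trip demo: write a tiny .jsonz, then read it back.
path = os.path.join(tempfile.gettempdir(), "demo.jsonz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump({"signature": "nsThread::ProcessNextEvent(int, int*)"}, f)

print(read_processed_crash(path)["signature"])
```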
12.3.4 Standard Job Storage
Standard storage is where the JSON/dump pairs are saved while they wait for processing. The location of the standard storage is determined by the configuration parameter storageRoot found in the Common Config.
The file system is divided into two parts: date based storage and name based storage. Both branches use a radix sort breakdown to locate files. The original version of Socorro used only the date based storage, but it was found to be too slow to search when under a heavy load.
For a deeper discussion of the storage technique: see File System
12.3.5 Top Crashers By URL
Introduction
The Top Crashers By URL report displays aggregate crash counts by unique urls or by unique domains. From here one can drill down to crash signatures. For crashes with comments, we display the comment in a link to the individual crash. In the future, signatures will be linked to search results, once we support url/domain as a search parameter.
Details
Data Definitions
Urls - This is everything before the query string.

Domains - This is the entire hostname.
Examples:
http://www.example.com/page.html?foo=bar
• url - http://www.example.com/page.html
• domain - www.example.com
chrome://example/content/extension.xul
• url - chrome://example/content/extension.xul
• domain - example
about:config
invalid, no protocol
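The url/domain rules illustrated above can be sketched with urllib.parse. This helper is hypothetical (not part of Socorro), assuming the rules as stated: strip the query string for the url, take the host for the domain, and reject anything without a protocol and host:

```python
from urllib.parse import urlsplit

def url_and_domain(raw_url):
    """Return (url-without-query, domain), or None for invalid URLs."""
    parts = urlsplit(raw_url)
    if not parts.scheme or not parts.netloc:
        return None  # e.g. about:config has no protocol/host to aggregate on
    return parts.scheme + "://" + parts.netloc + parts.path, parts.netloc

print(url_and_domain("http://www.example.com/page.html?foo=bar"))
# ('http://www.example.com/page.html', 'www.example.com')
print(url_and_domain("chrome://example/content/extension.xul"))
# ('chrome://example/content/extension.xul', 'example')
print(url_and_domain("about:config"))
# None
```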
Filtering
For a crash report to be counted it must have the following:
• A url which is not null or empty and which has a protocol
• Aggregates are calculated 1 day at a time for the previous day
• At the level of aggregation, it must have more than 1 record
Crash data viewed from the url perspective is a very long tail of crashes for a single unique url. We cut off this tail, which reduces data storage and processing time by an order of magnitude.
A consequence of this filtering (only good urls + multiple crashes) makes the total crash aggregates much lower than top crashers or raw queries. Keep this in mind when using aggregates: Top Crashers (by OS) is a much better gauge.
Administration
Configuring new products
The Top Crashers By URL report is powered by the tcbyurlconfig and productdims tables.
1. Make sure your product is in the productdims table
(a) If not, insert it. The following sets up a specific version of a specific product for the ALL, Win, and Mac platforms:
INSERT INTO productdims (product, version, os_name, release) VALUES ('Firefox', '3.0.4', 'ALL', 'major');
INSERT INTO productdims (product, version, os_name, release) VALUES ('Firefox', '3.0.4', 'Win', 'major');
INSERT INTO productdims (product, version, os_name, release) VALUES ('Firefox', '3.0.4', 'Mac', 'major');
2. Insert a config entry for the exact product you want to report on. Usually this is os_name = ALL:
INSERT INTO tcbyurlconfig (productdims_id, enabled)
SELECT id, 'Y' FROM productdims WHERE product = 'Firefox' AND version = '3.0.4' AND os_name = 'ALL';
3. wait for results
4. reap the profit.
Suspending Reports
Table tcbyurlconfig has an ‘enabled’ column. Set it to false to stop the cron from updating the reports for a particular product.
Mozilla Specific
Make sure to match up the release type: versions with pre in them are milestone; versions with a or b in them are development.
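The categorization rule above could be expressed as a small helper. This is a sketch only (release_type is a hypothetical function; the real mapping lives in the productdims release column):

```python
def release_type(version):
    """Classify a version string per the Mozilla convention described above:
    'pre' -> milestone, 'a' or 'b' -> development, otherwise major."""
    if "pre" in version:
        return "milestone"
    if "a" in version or "b" in version:
        return "development"
    return "major"

print(release_type("3.0.4"))     # major
print(release_type("3.1b3"))     # development
print(release_type("3.7a1pre"))  # milestone
```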
Operations
This report is populated by a cron Python script which runs at 10:00 PM PST. The run is controlled by configuration data from a table in the database. All products which are enabled in this config table will have their daily report generated.
In the future this will be managed via an admin page, but currently it is managed via SQL.
Development
Details about the database design are in Report Database Design
12.3.6 Top Crashers By Signature
Introduction
Topcrashers By Signature compiles 14 days’ worth of crash reports (organized by signature) for a given version. This report is useful for finding new topcrashes, determining if topcrashes have been filed, and seeing trending of topcrashes over time (for a specific version).
Details
For the ideal topcrashers by signature report, we want to gather the following data:
• crashes by version (e.g., Firefox 3.0.9)
• date a crash occurred (to know if it’s within our window)
• stack signature
• average uptime (since last browser start) averaged over window
• bug numbers related to crash signature
Additionally, we need the ability to either a) go back in time or b) “freeze” the topcrashers by signature report on a specific day. This allows us to compare, say, the last day of a release to the newest release (e.g., Firefox 3.0.8 to Firefox 3.0.9). Without the ability to go back to a specific day of topcrash reports or freeze topcrash reports, we have no easy ability to compare releases (as new crashes come in for old releases, the topcrash list changes substantially).
Ideal Outputs
(to be filled)
See [[SocorroUIInstallation]] for additional details.
Operations
• Need a recalculation every 4 to 6 hours
• Need top 500 signatures, ranked over last 14 days
• Note that this implies for the database that each slice is aggregated from the full window (which slides forward each time)
12.3.7 Signature Generation
Introduction
The Processor creates an overall signature for a crash based on the signatures of the stack frames of the crashing thread. It walks the stack from the frame with the lowest number (the top of the stack), applying rules and accumulating a list of signatures found to be relevant. Once the rules are done, the list of signatures is concatenated into a single string. That single string becomes the crash’s overall signature.
Normalization
Before any frame signatures are considered, they are normalized. This is just a string formatting change. Runs of spaces are compressed to just one space, commas are ensured to always be followed by a space, and integer values are replaced by ‘int’. Signatures that match the signaturesWithLineNumbersRegEx regular expression are combined with their source code line. Frames that have no function information are written as source code/line number pairs. If no source code is available, it tries to find a module/address pair. Failing that, it falls back to just an address.
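The whitespace, comma, and integer normalizations can be sketched with a few regular expressions. This is an illustrative approximation, not the Processor's exact code:

```python
import re

def normalize_frame(name):
    """Approximate the frame-name normalization described above."""
    name = re.sub(r"\s+", " ", name)        # collapse runs of spaces
    name = re.sub(r",(?!\s)", ", ", name)   # ensure a space after each comma
    name = re.sub(r"\b\d+\b", "int", name)  # replace integer literals with 'int'
    return name.strip()

print(normalize_frame("Foo::bar(char*,unsigned  42)"))
# Foo::bar(char*, unsigned int)
```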
The SkipList Rules
The signature is generated by walking through each stack frame considering its ‘name’ (as normalized above). Frames/names are skipped or added to the signature list according to the rules. When a signature list is complete, it is converted to a string by concatenating the frame names with spaces and a vertical bar between each name. For example, objc_msgSend | IdleTimerVector is the signature for a stack that contained (irrelevant frames), “objc_msgSend”, “IdleTimerVector” which matched neither prefix nor irrelevant regular expressions, and possibly other frames which did not become part of the signature.
regular expressions
Each SkipList rule is a regular expression. Typically, it takes the form of an alternation of frame names, but any legal regular expression can be used. Regular expression alternation syntax is a|b|c: match on ‘a’ or ‘b’ or ‘c’. This work is done in Python, so use Python Regular Expression Syntax.
signatureSentinels
A typical rule might be: “_purecall”.
This is the first rule to be applied. The code iterates through the stack frames, throwing away everything it finds until it encounters a match to this regular expression or the end of the stack. If it finds a match, it passes all the frames after the match to the next step. If it finds no match, it passes the whole list of frames to the next step.
irrelevantSignatureRegEx
A typical rule might be: “@0x[0-9a-fA-F]{2,}|@0x[1-9a-fA-F]|RaiseException|CxxThrowException”.
A frame which matches this regular expression will be appended to the signature only if a prefix frame has already been seen (see the next rule).
prefixSignatureRegEx
A typical rule might be “@0x0|strchr|strstr|strlen|PL_strlen|strcmp|wcslen|memcpy|memmove|memcmp|malloc|realloc|objc_msgSend”, though at Mozilla it has grown much longer.
This is the rule that generates compound signatures. A frame that matches this regular expression changes the state of the machine to ‘seen prefix’. In the ‘seen prefix’ state, irrelevant or prefix frames are appended. As soon as a frame is neither, it is appended and the signature list is complete.
Once the signature list is complete, the signature is generated as mentioned above.
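Putting the three rule types together, the walk can be sketched as a small state machine. The rule patterns below are toy stand-ins (the real lists are configuration, and much longer), but the control flow follows the description above:

```python
import re

# Toy rule set for illustration only; not the production configuration.
SENTINEL = re.compile(r"_purecall")
IRRELEVANT = re.compile(r"RaiseException|CxxThrowException")
PREFIX = re.compile(r"strchr|strlen|memcpy|objc_msgSend")

def generate_signature(frames):
    """Skip to the sentinel (if any), then accumulate prefix/irrelevant
    frames until the first frame that is neither, and join with ' | '."""
    for i, frame in enumerate(frames):
        if SENTINEL.search(frame):
            frames = frames[i + 1:]  # keep only frames after the sentinel
            break
    signature = []
    seen_prefix = False
    for frame in frames:
        if IRRELEVANT.search(frame):
            if seen_prefix:          # irrelevant frames count only after a prefix
                signature.append(frame)
            continue
        signature.append(frame)
        if PREFIX.search(frame):
            seen_prefix = True       # stay in 'seen prefix' state
        else:
            break                    # first frame that is neither: we are done
    return " | ".join(signature)

print(generate_signature(["RaiseException", "objc_msgSend", "IdleTimerVector", "main"]))
# objc_msgSend | IdleTimerVector
```

This reproduces the example from the text: the irrelevant frame is skipped, the prefix frame is kept, and the first plain frame ends the signature.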
12.3.8 Crash Mover
The Collector dumps all the crashes that it receives into the local file system. This application is responsible for transferring those crashes into HBase.
Configuration:
import stat
import socorro.lib.ConfigurationManager as cm

#-------------------------------------------------------------------------------
# general
numberOfThreads = cm.Option()
numberOfThreads.doc = 'the number of threads to use'
numberOfThreads.default = 4
#-------------------------------------------------------------------------------
# source storage
sourceStorageClass = cm.Option()
sourceStorageClass.doc = 'the fully qualified name of the source storage class'
sourceStorageClass.default = 'socorro.storage.crashstorage.CrashStorageSystemForLocalFS'
sourceStorageClass.fromStringConverter = cm.classConverter

from config.collectorconfig import localFS
from config.collectorconfig import localFSDumpDirCount
from config.collectorconfig import localFSDumpGID
from config.collectorconfig import localFSDumpPermissions
from config.collectorconfig import localFSDirPermissions
from config.collectorconfig import fallbackFS
from config.collectorconfig import fallbackDumpDirCount
from config.collectorconfig import fallbackDumpGID
from config.collectorconfig import fallbackDumpPermissions
from config.collectorconfig import fallbackDirPermissions

from config.commonconfig import jsonFileSuffix
from config.commonconfig import dumpFileSuffix
#-------------------------------------------------------------------------------
# destination storage
destinationStorageClass = cm.Option()
destinationStorageClass.doc = 'the fully qualified name of the destination storage class'
destinationStorageClass.default = 'socorro.storage.crashstorage.CrashStorageSystemForHBase'
destinationStorageClass.fromStringConverter = cm.classConverter

from config.commonconfig import hbaseHost
from config.commonconfig import hbasePort
from config.commonconfig import hbaseTimeout
#-------------------------------------------------------------------------------
# logging
syslogHost = cm.Option()
syslogHost.doc = 'syslog hostname'
syslogHost.default = 'localhost'

syslogPort = cm.Option()
syslogPort.doc = 'syslog port'
syslogPort.default = 514

syslogFacilityString = cm.Option()
syslogFacilityString.doc = 'syslog facility string ("user", "local0", etc)'
syslogFacilityString.default = 'user'

syslogLineFormatString = cm.Option()
syslogLineFormatString.doc = 'python logging system format for syslog entries'
syslogLineFormatString.default = 'Socorro Storage Mover (pid %(process)d): %(asctime)s %(levelname)s - %(threadName)s - %(message)s'

syslogErrorLoggingLevel = cm.Option()
syslogErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
syslogErrorLoggingLevel.default = 10
stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(threadName)s - %(message)s'

stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 10
12.3.9 Collector
Collector is an application that runs under Apache using mod-python. Its task is accepting crash reports from remote clients and saving them in a place and format usable by further applications.
Raw crashes are accepted via HTTP POST. The form data from the POST is then arranged into a JSON document and saved into the local file system. The collector is responsible for assigning an ooid (Our Own ID) to the crash. It also assigns a throttle value which determines if the crash is eventually to go into the relational database.
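The collector's two assignments can be sketched as follows. This is an illustrative sketch only, not Socorro's actual algorithm: the real ooid is derived from a uuid but encodes storage depth and submission date in its tail, and real throttling applies configurable rules rather than the bare sample rate invented here.

```python
import random
import uuid

def assign_ooid():
    # Illustrative only: the real collector starts from a uuid but
    # embeds the storage depth and submission date in the last bytes.
    return str(uuid.uuid4())

def throttle(sample_rate=0.10):
    # Decide whether this crash will eventually reach the relational
    # database; sample_rate is an invented example value.
    return random.random() < sample_rate
```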
Should the saving to a local file system fail, there is a fallback storage mechanism. A second file system can be configured to take the failed saves. This file system would likely be an NFS mounted file system.
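The fallback behavior amounts to a try/except around the primary save. A minimal sketch, assuming store objects with a save() method (an illustrative interface, not Socorro's actual crashstorage API):

```python
def save_raw_crash(crash, primary, fallback):
    # Try the local file system first; on failure, divert the crash to
    # the second (e.g. NFS mounted) file system.
    try:
        primary.save(crash)
        return 'primary'
    except (IOError, OSError):
        fallback.save(crash)
        return 'fallback'
```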
After a crash is saved, there is an app called Crash Mover that will transfer the crashes to HBase.
Collector Python Configuration
Like all the Socorro applications, the configuration is actually executable Python code. Two configuration files are relevant for collector:
• Copy .../scripts/config/commonconfig.py.dist to .../config/commonconfig.py. This configuration file contains constants used by many of the Socorro applications.
• Copy .../scripts/config/collectorconfig.py.dist to .../config/collectorconfig.py
Common Configuration
There are two constants in ‘.../scripts/config/commonconfig.py’ of interest to collector: jsonFileSuffix, and dumpFileSuffix. Other constants in this file are ignored.
To setup the common configuration, see Common Config.
Collector Configuration
collectorconfig.py has several options to adjust how files are stored:
See sample config code on Github
12.3.10 Reporter
Deprecated.
See UI Installation.
12.3.11 Monitor
Monitor is a multithreaded application with several mandates. Its main job is to find new JSON/dump pairs and queue them for further processing. It looks for new JSON/dump pairs in the file system location designated by the constant storageRoot from the Common Config file. Once it finds a pair, it queues them as a “job” in the database ‘jobs’ table and assigns it to a specific processor. Once queued, the monitor goes on to find other new jobs to queue.

Monitor also locates and queues priority jobs. If a user requests a report via the Reporter and that crash report has not yet been processed, the Reporter puts the requested crash’s UUID into the database’s ‘priorityjobs’ table. Monitor looks in three places for the requested job:

• the processors - if monitor finds the job already assigned to a processor, it raises the priority of that job so the processor will do it quickly

• the storageRoot file system - if the job is found here, it queues it for priority processing immediately rather than waiting for the standard mechanism to eventually find it

• the deferredStorageRoot file system - if the requested crash was filtered out by server side throttling, monitor will find it and queue it immediately from that location.
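The search order above can be sketched as pure logic. The container arguments and return tags are illustrative, not Monitor's real API:

```python
def locate_priority_job(uuid, processor_jobs, standard_storage, deferred_storage):
    # Check the three locations in the order Monitor uses.
    if uuid in processor_jobs:
        return 'processor'   # already queued: just raise its priority
    if uuid in standard_storage:
        return 'standard'    # found in storageRoot: queue immediately
    if uuid in deferred_storage:
        return 'deferred'    # throttled out: queue from deferredStorageRoot
    return None              # not yet received
```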
Monitor is also responsible for keeping the StandardJobStorage file system neat and tidy. It monitors the ‘jobs’ queue in the database. Once it sees that a previously queued job has been completed, it moves the JSON/dump pairs to long term storage or it deletes them (based on a configuration setting). Jobs that fail their further processing stage are also either saved in a “failed” storage area or deleted.

Monitor is a command line application meant to be run continuously as a daemon. It can log its actions to stderr and/or to automatically rotating log files. See the configuration options below beginning with stderr* and logFile* for more information.
The monitor app is found at .../scripts/monitor.py. In order to run monitor, the socorro package must be visible somewhere on the Python path.
Configuration
Monitor, like all the Socorro applications, uses the common configuration for several of its constants. For setup of common configuration, see Common Config.

monitor also has an executable configuration file of its own. A sample file is found at .../scripts/config/monitorconfig.py.dist. Copy this file to .../scripts/config/monitorconfig.py and edit it for site specific settings.
In each case where a site specific value is desired, replace the value for the .default member.
standardLoopDelay
Monitor has to scan the StandardJobStorage looking for jobs. This value represents the delay between scans:

standardLoopDelay = cm.Option()
standardLoopDelay.doc = 'the time between scans for jobs (HHH:MM:SS)'
standardLoopDelay.default = '00:05:00'
standardLoopDelay.fromStringConverter = cm.timeDeltaConverter
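The fromStringConverter turns the 'HHH:MM:SS' string into a time delta. A sketch of what cm.timeDeltaConverter presumably does (the real converter lives in socorro.lib.ConfigurationManager):

```python
from datetime import timedelta

def time_delta_converter(value):
    # 'HHH:MM:SS' -> timedelta; the hours field may exceed two digits.
    hours, minutes, seconds = (int(part) for part in value.split(':'))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
```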
cleanupJobsLoopDelay
Monitor archives or deletes JSON/dump pairs from the StandardJobStorage. This value represents the delay between runs of the archive/delete routines:

cleanupJobsLoopDelay = cm.Option()
cleanupJobsLoopDelay.doc = 'the time between runs of the job clean up routines (HHH:MM:SS)'
cleanupJobsLoopDelay.default = '00:05:00'
cleanupJobsLoopDelay.fromStringConverter = cm.timeDeltaConverter
priorityLoopDelay
The frequency to look for priority jobs:

priorityLoopDelay = cm.Option()
priorityLoopDelay.doc = 'the time between checks for priority jobs (HHH:MM:SS)'
priorityLoopDelay.default = '00:01:00'
priorityLoopDelay.fromStringConverter = cm.timeDeltaConverter
saveSuccessfulMinidumpsTo:
saveSuccessfulMinidumpsTo = cm.Option()
saveSuccessfulMinidumpsTo.doc = 'the location for saving successfully processed dumps (leave blank to delete them instead)'
saveSuccessfulMinidumpsTo.default = '/tmp/socorro-sucessful'
saveFailedMinidumpsTo:
saveFailedMinidumpsTo = cm.Option()
saveFailedMinidumpsTo.doc = 'the location for saving dumps that failed processing (leave blank to delete them instead)'
saveFailedMinidumpsTo.default = '/tmp/socorro-failed'
logFilePathname
Monitor can log its actions to a set of automatically rotating log files. This is the name and location of the logs:

logFilePathname = cm.Option()
logFilePathname.doc = 'full pathname for the log file'
logFilePathname.default = './monitor.log'
logFileMaximumSize
This is the maximum size in bytes allowed for a log file. Once this number is reached, the logs rotate and a new log is started:

logFileMaximumSize = cm.Option()
logFileMaximumSize.doc = 'maximum size in bytes of the log file'
logFileMaximumSize.default = 1000000
logFileMaximumBackupHistory
The maximum number of log files to keep:

logFileMaximumBackupHistory = cm.Option()
logFileMaximumBackupHistory.doc = 'maximum number of log files to keep'
logFileMaximumBackupHistory.default = 50
logFileLineFormatString
A Python format string that controls the format of individual lines in the logs:

logFileLineFormatString = cm.Option()
logFileLineFormatString.doc = 'python logging system format for log file entries'
logFileLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
logFileErrorLoggingLevel
Logging is done in severity levels - the lower the number, the more verbose the logs:

logFileErrorLoggingLevel = cm.Option()
logFileErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
logFileErrorLoggingLevel.default = 10
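These logFile* options map naturally onto Python's standard rotating file handler. A sketch of equivalent stdlib wiring using the defaults above (the function name is ours, not Socorro's):

```python
import logging
import logging.handlers

def build_rotating_logger(pathname='./monitor.log'):
    # Mirror the option defaults: 1 MB files, 50 backups, DEBUG level.
    handler = logging.handlers.RotatingFileHandler(
        pathname,
        maxBytes=1000000,    # logFileMaximumSize
        backupCount=50,      # logFileMaximumBackupHistory
    )
    handler.setFormatter(
        logging.Formatter('%(asctime)s %(levelname)s - %(message)s'))
    logger = logging.getLogger('monitor')
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)   # logFileErrorLoggingLevel = 10
    return logger
```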
stderrLineFormatString
In parallel with creating log files, Monitor can log to stderr. This is a Python format string that controls the format of individual lines sent to stderr:

stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
stderrErrorLoggingLevel
Logging to stderr is done in severity levels independently from the log file severity levels - the lower the number, the more verbose the output to stderr:

stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 40
12.3.12 File System
Socorro uses two similar file system storage schemes in two distinct places within the system. Raw crash dumps from the field use a system called JSON Dump Storage while at the other end, processed dumps use the Processed Dump Storage scheme.
12.3.13 Deferred Cleanup
When the Collector throttles the flow of crash dumps, it saves deferred crashes into Deferred Job Storage. These JSON/dump pairs will live in deferred storage for a configurable number of days. It is the task of the deferred cleanup application to implement the policy to delete old crash dumps.

The deferred cleanup application is a command line app meant to be run as a cron job. It should be set to run once every twenty-four hours.
Configuration
deferredcleanup uses the common configuration to get the constant deferredStorageRoot. For setup of common configuration, see Common Config.

deferredcleanup also has an executable configuration file of its own. A sample file is found at .../scripts/config/deferredcleanupconfig.py.dist. Copy this file to .../scripts/config/deferredcleanupconfig.py and edit it for site specific settings.
In each case where a site specific value is desired, replace the value for the .default member.
maximumDeferredJobAge
This constant specifies how many days deferred jobs are allowed to stay in deferred storage. Job deletion is permanent:

maximumDeferredJobAge = cm.Option()
maximumDeferredJobAge.doc = 'the maximum number of days that deferred jobs stick around'
maximumDeferredJobAge.default = 2
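The policy reduces to an age check against a cutoff. A sketch of the deletion logic, assuming a plain directory tree (the real app walks deferredStorageRoot and also tidies symlinks and empty directories):

```python
import os
import time

def clean_deferred_storage(root, maximum_deferred_job_age=2, dry_run=False):
    # Remove files whose modification time is older than the cutoff.
    cutoff = time.time() - maximum_deferred_job_age * 86400
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                removed.append(path)
                if not dry_run:      # dryRun reports without deleting
                    os.remove(path)
    return removed
```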
dryRun
Used during testing and development, this prevents deferredcleanup from actually deleting things:
dryRun = cm.Option()
dryRun.doc = "don't really delete anything"
dryRun.default = False
dryRun.fromStringConverter = cm.booleanConverter
logFilePathname
Deferredcleanup can log its actions to a set of automatically rotating log files. This is the name and location of the logs:

logFilePathname = cm.Option()
logFilePathname.doc = 'full pathname for the log file'
logFilePathname.default = './processor.log'
logFileMaximumSize
This is the maximum size in bytes allowed for a log file. Once this number is reached, the logs rotate and a new log is started:

logFileMaximumSize = cm.Option()
logFileMaximumSize.doc = 'maximum size in bytes of the log file'
logFileMaximumSize.default = 1000000
logFileMaximumBackupHistory
The maximum number of log files to keep:

logFileMaximumBackupHistory = cm.Option()
logFileMaximumBackupHistory.doc = 'maximum number of log files to keep'
logFileMaximumBackupHistory.default = 50
logFileLineFormatString
A Python format string that controls the format of individual lines in the logs:

logFileLineFormatString = cm.Option()
logFileLineFormatString.doc = 'python logging system format for log file entries'
logFileLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
logFileErrorLoggingLevel
Logging is done in severity levels - the lower the number, the more verbose the logs:

logFileErrorLoggingLevel = cm.Option()
logFileErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
logFileErrorLoggingLevel.default = 20
stderrLineFormatString
In parallel with creating log files, deferredcleanup can log to stderr. This is a Python format string that controls the format of individual lines sent to stderr:

stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
stderrErrorLoggingLevel
Logging to stderr is done in severity levels independently from the log file severity levels - the lower the number, the more verbose the output to stderr:

stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 40
12.4 Standalone Development Environment
You can easily bring up a full Socorro VM, see Setup a development environment for more info.
However, in some cases it can make sense to run components standalone in a development environment, for example if you want to run just one or two components and connect them to an existing Socorro install for debugging.
12.4.1 Setting up
1) clone the repo (http://github.com/mozilla/socorro)
git clone git://github.com/mozilla/socorro.git
cd socorro/
2) set up Python path
export PYTHONPATH=.:thirdparty/
3) create virtualenv and use it (this installs all needed Socorro dependencies)
make virtualenv
. socorro-virtualenv/bin/activate
4) copy default Socorro config (also see Common Config)
pushd scripts/config
for file in *.py.dist; do cp $file `basename $file .dist`; done
edit commonconfig.py (...)
popd
12.4.2 Install and configure UI
1) symlink webapp-php/ to HTDOCS area
mv ~/public_html ~/public_html.old
ln -s ./webapp-php ~/public_html
2) copy default webapp config (also see UI Installation)
cp htaccess-dist .htaccess
pushd webapp-php/application/config/
for file in *.php-dist; do cp $file `basename $file -dist`; done
edit database.php config.php (...)
popd
3) make sure log area is writable to webserver user
chmod o+rwx webapp-php/application/logs
12.4.3 Launch standalone Middleware instance
Edit scripts/config/webapiconfig.py and change wsgiInstallation to False (this allows the middleware to run in standalone mode):
wsgiInstallation.default = False
NOTE - make sure to use an unused port; it should be the same as whatever you configure in webapp-php/application/config/webserviceclient.php
python scripts/webservices.py 9191
This will use whichever database you configured in commonconfig.py.
12.5 Unit Testing
There are (some, and a growing number of) unit tests for the Socorro code.
12.5.1 How to Unit Test
• configure your test environment (see below)
• install nosetests
• cd to socorro/unittests
• chant nosetests and observe the result
– You should expect more than 185 tests (186 as of 2009-03-25)

– You should see exactly two failures (unless you are running as root), with this assertion: AssertionError: You must run this test as root (don't forget root's PYTHONPATH):

ERROR: testCopyFromGid (socorro.unittest.lib.testJsonDumpStorageGid.TestJsonDumpStorageGid)
ERROR: testNewEntryGid (socorro.unittest.lib.testJsonDumpStorageGid.TestJsonDumpStorageGid)
• You may ‘observe’ the result by chanting nosetests > test.out 2>&1 and then examining test.out (or any name you prefer)
• There is a bash shell file: socorro/unittest/red which may be sourced to provide a bash function red that simplifies watching test logfiles in a separate terminal window. In that window, cd to the unittest sub-directory of interest, then source the file: . ../red, then chant red. The effect is to clear the screen, then tail -F the logfile associated with tests in that directory. You may chant red --help to be reminded.

• The red file also provides a function noseErrors which simplifies the examination of nosetests output. Chant noseErrors --help for a brief summary.
12.5.2 How to write Unit Tests
Nose provides some nice tools. Some of the tests require nose and nosetests (or a tool that mimics their behavior). However, it is also quite possible to use Python's unittest. No tutorial here. Instead, take a look at an existing test file and do something usefully similar.
12.5.3 Where to write Unit Tests
To maintain the current test layout, note that for every directory under socorro, there is a same-name directory under socorro/unittest where the test code for the working directory should be placed. In addition, there is unittest/testlib that holds a library of useful testing code as well as some tests for that library.

If you add a unittest subdirectory holding new tests, you must also provide __init__.py (which may be empty), or nosetests will not enter the directory looking for tests.
12.5.4 How to configure your test environment
• You must have a working PostgreSQL installation (see Installation for the version). It need not be locally hosted, though if not, please be careful about username and password for the test user. Also be careful not to step on a working database: The test cleanup code drops tables.

• You must either provide for a PostgreSQL account with name and password that matches the config file or edit the test config file to provide an appropriate test account and password. That file is socorro/unittest/config/commonconfig.py. If you add a new test config file that needs database access, you should import the details from commonconfig, as exemplified in the existing config files.

• You must provide a database appropriate for the test user (default: test). That database must support PLPGSQL. As the owner of the test database, while connected to that database, invoke CREATE LANGUAGE PLPGSQL;

• You must have installed nose and nosetests; nosetests should be on your PATH and the nose code/egg should be on your PYTHONPATH

• You must have installed the psycopg2 Python module

• You must adjust your PYTHONPATH to include the directory holding socorro. E.g. if you have installed socorro at /home/tester/Mozilla/socorro then your PYTHONPATH should look like ...:/home/tester/Mozilla:/home/tester/Mozilla/thirdparty:...
12.6 Crash Repro Filtering Report
12.6.1 Introduction
This page describes a report that assists in analyzing crash data for a stack signature in order to try to reproduce a crash and develop a reproducible test case.
12.6.2 Details
for each release pull a data set of one week's worth of data ranked by signature like:
http://crash-stats.mozilla.com/query/query?do_query=1&product=Firefox&version=Firefox%3A3.0.10&date=&range_value=7&range_unit=days&query_search=signature&query_type=contains&query=
then provide a list like this with several fields of interest for examining the data:
Date Product Version Build OS CPU Reason Address Uptime Comments
but also need to add URLs into the version of this report that is behind auth. “reason” is not so helpful to me at this stage, but others can weigh in on the idea of removing it.
maybe just make it include all these or allow users to pick the fields it shows like bugzilla does?
Signature, Crash Address, UUID, Product, Version, Build, OS, Time, Uptime, Last Crash, URL, User Comments
anyway, get something close to what we have now in “Crash Reports in PR_MD_SEND”
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.0.10&query_search=signature&query_type=contains&query=&date=&range_value=7&range_unit=days&do_query=1&signature=_PR_MD_SEND
next allow the report user to apply filters to build more precise queries from the set of reports. Filters might be from any of the fields, or it would be really cool if we could also filter on other items in the crash report like the full stack trace and/or module list:

filter uptime < 60 seconds
and filter address exactly_matches 0x187d000
and filter url contains mail.google.com
or filter url contains mail.yahoo.com
and filter modulelist does_not_contain "mswsock.dll 5.1.2600.3394"
that last example of module list might be a stretch, but it would be very valuable to check the module list for existence or non-existence of binary components and their version numbers.

from there we would want to see the results and export to CSV, to import things like URL lists into page load testing systems to look for reproducible crashers.
12.7 Disk Performance Tests
12.7.1 Introduction
Any DBMS for a database which is larger than memory can be no faster than disk speed. This document outlines a series of tests for testing disk speed to determine if you have an issue. Written originally by PostgreSQL Experts Inc. for Mozilla.
12.7.2 Running Tests
Note: all of the below require you to have plenty of disk space available. And their figures are only reliable if nothing else is running on the system.
Simplest Test: The DD Test
This test measures the most basic single-threaded disk access: a large sequential write, followed by a large sequential read. It is relevant to database performance because it gives you a maximum speed for sequential scans for large tables. Real table scans are generally about 30% of this maximum.

dd is a Unix command line utility which simply writes to a block device. We use it for this 3-step test. The other thing you need to know for this test is your RAM size.

1. We create a large file which is 2x the size of RAM, and sync it to disk. This makes sure that we get the real sustained write rate, because caching can have little effect. Since there are 125000 blocks per GB (8k block size is used because it's what Postgres uses), if we had 8GB of RAM, we would run the following:
time sh -c "dd if=/dev/zero of=ddfile bs=8k count=1000000 && sync"
dd will report a time and write rate to us, and “time” will report a larger time. The time and rate reported by dd represent the rate without any lag or sync time; divide the data size by the time reported by “time” for the synchronous file writing rate.
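The arithmetic is just bytes over wall-clock seconds. A sketch using the 8 GB example above (1,000,000 blocks of 8 KB) and a hypothetical 110 second “time” figure, since no measured value appears in the text:

```python
def sustained_write_rate(data_bytes, wall_seconds):
    # Use the wall-clock time from `time`, which includes the final
    # sync, rather than dd's own optimistic figure.
    return data_bytes / wall_seconds

# 1,000,000 blocks * 8192 bytes, flushed in a hypothetical 110 s:
rate_mb_per_s = sustained_write_rate(1000000 * 8192, 110.0) / 1e6
```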
2. Next we want to write another large file, this one the size of RAM, in order to flush out the FS cache so that we can read directly from disk later:

dd if=/dev/zero of=ddfile2 bs=8K count=500000

3. Now, we want to read the first file back. Since the FS cache is full from the second file, this should be 100% disk access:
time dd if=ddfile of=/dev/null bs=8k
This time, “time” and dd will be very close together; any difference will be strictly storage lag time.
12.7.3 Bonnie++
Bonnie++ is a more sophisticated set of tests which tests random reads and writes, as well as seeks, and file creation and deletion operations. For a modern system, you want to use the last version, 1.95, downloaded from http://www.coker.com.au/bonnie++/experimental/. This final version of bonnie++ supports concurrency and measures lag time. However, it is not available in package form in most OSes, so you'll have to compile it using g++.

Again, for Mozilla we want to test performance for a database which is larger than RAM, since that's what we have. Therefore, we're going to run a concurrent Bonnie++ test where the total size of the files is about 150% of RAM, forcing the use of disk. We're also going to run 8 threads to simulate concurrent file access. Our command line for a machine with 16GB RAM is:
bonnie++ -d /path/to/storage -c 8 -r 16000 -n 100
The results we get back look something like this:
Version  1.95       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   8     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
tm-breakpad0 32000M   757  99 71323  16 30594   5  2192  99 57555   4 262.5  13
Latency             15462us    6918ms    4933ms   11096us     706ms     241ms
Version  1.95       ------Sequential Create------ --------Random Create--------
tm-breakpad01-maste -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                100 44410  75 +++++ +++ 72407  81 45787  77 +++++ +++ 63167  72
Latency              9957us     477us     533us     649us      93us     552us
So, the interesting parts of this are:
Sequential Output: Block: this is sequential writes like dd does. It’s 70MB/s.
Sequential Input: Block: this is sequential reads from disk. It’s 57MB/s.
Sequential Output: Rewrite: is reading, then writing, a file which has been flushed to disk. This rate will be lower than either of the above, and is at 30MB/s.
Random: Seeks: this is how many individual blocks Bonnie can seek to per second; it’s a fast 262.
Latency: this is the full round-trip lag time for the mentioned operation. On this platform, these times are catastrophically bad; 1/4 second round-trip to return a single random block, and 3/4 seconds to return the start of a large file.

The figures on file creations and deletion are generally less interesting to databases. The +++++ are for runs that were so fast the error margin makes the figures meaningless; for better figures, increase -n.
12.7.4 IOZone
Now, if you don't think Bonnie++ told you enough, you'll want to run Iozone. Iozone is a benchmark mostly known for creating pretty graphs (http://www.iozone.org/) of filesystem performance with different file, batch, and block sizes. However, this kind of comprehensive profiling is completely unnecessary for a DBMS, where we already know the file access pattern, and can take up to 4 days to run. So do not run Iozone in automated (-a) mode!

Instead, run a limited test. This test will still take several hours to run, but will return a more limited set of relevant results. Run this on a 16GB system with 8 cores, from a directory on the storage you want to measure:
iozone -R -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 8 -l 6 -u 6 -r 8k -s 4G -F f1 f2 f3 f4 f5 f6
This runs the following tests: write/read, rewrite/reread, random-read/write, read-backwards, re-write-record, stride-read, random mix. It does these tests using 6 concurrent processes, a block size of 8k (Postgres' block size) for 4G files named f1 to f6. The aggregate size of the files is 24G, so that they won't all fit in memory at once.
In theory, the relevance of these tests to database activity is the following:
write/read: basic sequential writes and reads.
rewrite/reread: writes and reads of frequently accessed tables (in memory)
random-read/write: index access, and writes of individual rows
read-backwards: might be relevant to reverse index scans.
re-write-record: frequently updated row behavior
stride-read: might be relevant to bitmapscan
random mix: general database access average behavior.
The results you get will look like this:
Children see throughput for 6 initial writers = 108042.81 KB/sec
Parent sees throughput for 6 initial writers  =  31770.90 KB/sec
Min throughput per process                    =  13815.83 KB/sec
Max throughput per process                    =  35004.07 KB/sec
Avg throughput per process                    =  18007.13 KB/sec
Min xfer                                      = 1655408.00 KB

And so on through all the tests. These results are pretty self-explanatory, except that I have no idea what the difference between “Children see” and “Parent sees” means. Iozone documentation is next-to-nonexistent.

Note: IOZone appears to have several bugs, and places where its documentation and actual features don't match. Particularly, it appears to have locking issues in concurrent access mode for some writing activity, so that concurrency throughput may be lower than actual.
12.8 Dumping Dump Tables
A work item that came out of the Socorro Postgres work week is to dump the dump tables and store cooked dumps as gzipped files:

• Drop the dumps table

• convert each dumps table row to a compressed file on disk
12.8.1 Bugzilla
https://bugzilla.mozilla.org/show_bug.cgi?id=484032
12.8.2 Library support
‘done’ as of 2009-05-07 in socorro.lib.dmpStorage (coding and testing are done; integration testing is done; ‘go live’ is today)

Socorro UI
/report/index/{uuid}
• Will stop using the dumps table.
• Will start using gzipped files
– Will use the report uuid to locate the dump on a file system
– Will use apache mod-rewrite to serve the actual file. The rewrite rule is based on the uuid, and is ‘simple’: AABBCCDDEEFFGGHHIIJJKKLLM2090308.jsonz => AA/BB/AABBCCDDEEFFGGHHIIJJKKLLM2090308.jsonz
– report/index will include a link to JSON dump
link rel='alternate' type='application/json' href='/reporter/dumps/cdaa07ae-475b-11dd-8dfa-001cc45a2ce4.jsonz'
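The rewrite rule's path computation can be sketched in a few lines (the function name is ours, for illustration):

```python
def jsonz_path(ooid):
    # The first two character pairs of the uuid become directories:
    # AABBCC... -> AA/BB/AABBCC....jsonz
    return '%s/%s/%s.jsonz' % (ooid[0:2], ooid[2:4], ooid)
```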
12.8.3 Dump file format
• Will be gzip compressed JSON encoded cooked dump files
• Partial JSON file
• Full JSONZ file
12.8.4 On Disk Location
application.conf dumpPath. Example for kahn:

$config['dumpPath'] = '/mnt/socorro_dumps/named';
In the dumps directory we will have an .htaccess file:
AddType "application/json; charset=UTF-8" jsonz
AddEncoding gzip jsonz
Webhead will serve these files as:
Content-Type: application/json; charset=utf-8
Content-Encoding: gzip
Note: You'd expect the dump files to be named json.gz, but this is broken in Safari. By setting HTTP headers and naming the file jsonz, an unknown file extension, this works across browsers.
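Producing a .jsonz file is just gzip over JSON, equivalent to the gzip dump.json command used for the test page. A sketch (function names ours):

```python
import gzip
import json

def write_jsonz(path, report):
    # Serialize the cooked dump as JSON, gzip-compressed on disk.
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        json.dump(report, f)

def read_jsonz(path):
    # Decompress and parse; browsers do the same given the headers above.
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        return json.load(f)
```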
12.8.5 Socorro UI
• Existing URL won’t change.
• Second JSON request back to server will load jsonz file
Example:
• http://crash-stats.mozilla.com/report/index/d92ebf79-9858-450d-9868-0fe042090211
• http://crash-stats.mozilla.com/dump/d92ebf79-9858-450d-9868-0fe042090211.jsonz
mod rewrite rules will match /dump/.jsonz and change them to access a file share.
12.8.6 Future Enhancement
A future enhancement, if we find webheads are high CPU, would be to move populating the report/index page to the client side.
12.8.7 Test Page
http://people.mozilla.org/~aking/Socorro/dumpingDump/json-test.html - Uses the browser to decompress a gzip compressed JSON file during an AJAX request, pulls it apart and appends to the page.
Test file made with gzip dump.json
12.9 JSON Dump Storage
12.9.1 What this system offers
Crash data is stored so that it can be quickly located based on a Universally Unique Identifier (uuid) or visited by the date and time when reported.
12.9.2 Directory Structure
The crash files are located in a tree with two branches: the name or “index” branch and the date branch.
• The name branch consists of paths based on the first few pairs of characters of the uuid. The name branch holds the two data files and a relative symbolic link to the date branch directory associated with the particular uuid. Take the uuid 22adfb61-f75b-11dc-b6be-001321b0783d. The “depth” is the number of sub-directories between the name directory and the actual file. By default, to conserve inodes, depth is two.
– By default, the json file is stored (depth 2) as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.json
– The json file could be stored (depth 4) as %(root)s/name/22/ad/fb/61/22adfb61-f75b-11dc-b6be-001321b0783d.json
– The dump file is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.dump
– The symbolic link is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d and (see below) references (own location)/%(toDateFromName)s/2008/09/30/12/05/webhead01_0/
• The date branch consists of paths based on the year, month, day, hour, minute-segment, webhead host name and a small sequence number. For each uuid, it holds a relative symbolic link referring to the actual name directory holding the data for that uuid. For the uuid above, submitted at 2008-09-30T12:05 from webhead01
– The symbolic link is stored as %(root)s/date/2008/09/30/12/05/webhead01_0/22adfb61-f75b-11dc-b6be-001321b0783d and references (own location)/%(toNameFromDate)s/22/ad/fb/61/
• Note (name layout) In the examples on this page, the name/index branch uses the first 4 characters of the uuid as two character-pairs naming subdirectories. This is a configurable setting called storageDepth in the Collector configuration. To use 8 characters, set storageDepth to 4; to use 6 characters, set it to 3. The default storageDepth is 2 because on our system, with (approximately) 64K leaf directories, the number of files per leaf is reasonable, and the number of inodes required by directory entries is not so large as to cause undue difficulty. A storageDepth of 4 was examined, and was found to crash the file system by requiring too many inodes.
• If the uuids are such that their initial few characters are well spread among all possibles, then the lookup can be very quick. If the first few characters of the uuids are not well distributed, the resulting directories may be very large. If, despite well chosen uuids, the leaf name directories become too large, it would be simple to add another level, reducing the number of files by approximately a factor of 256; however, bear in mind the issue of inodes.
• Note (symbolic links) The symbolic links are relative rather than absolute, to avoid issues that might arise from variously mounted nfs volumes.
• Note (maxDirectoryEntries) If the number of links in a particular webhead subdirectory would exceed maxDirectoryEntries, then a new webhead directory is created by appending a larger _N: .../webhead01_0 first, then .../webhead01_1, etc. For the moment, maxDirectoryEntries is ignored for the name branch.
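The name-branch layout described above can be sketched in a few lines of Python. This is an illustrative helper only, not Socorro's actual implementation; the function name and signature are assumptions.

```python
import os

def name_branch_path(root, uuid, storage_depth=2):
    """Build the name-branch directory for a uuid: storage_depth
    two-character pairs taken from the front of the uuid become
    nested subdirectory names."""
    pairs = [uuid[i:i + 2] for i in range(0, storage_depth * 2, 2)]
    return os.path.join(root, "name", *pairs)
```

With the default storageDepth of 2, the example uuid maps to %(root)s/name/22/ad; with storageDepth 4 it maps to %(root)s/name/22/ad/fb/61.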
12.9.3 How it’s used
We use the file system storage for incoming dumps caught by Collector. There are two instances of the file system used for different purposes: standard storage and deferred storage.
12.9.4 Standard Job Storage
This is where json/dump pairs are stored for further processing. The Monitor finds new dumps and queues them for processing. It does this by walking the date branch of the file system using the API function destructiveDateWalk. As it moves through the date branch, it notes every uuid (in the form of a symbolic link) that it encounters. It queues the information from the symbolic link and then deletes the symbolic link. This ensures that it only ever finds new entries. Later, the Processor will read the json/dump pair by doing a direct lookup of the uuid on the name branch.
In the case of priority processing, the target uuid is looked up directly on the name branch. Then the link to the date branch is used to locate and delete the link on the date branch. This ensures that a priority job is not found a second time as a new job by the Monitor.
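The destructive walk that the Monitor relies on can be sketched as follows. This is a simplified illustration (the real destructiveDateWalk also cleans up empty directories and skips the current time slot, as described later); the function name here is a hypothetical stand-in.

```python
import os

def destructive_date_walk(date_root):
    """Yield each uuid found as a symbolic link under the date branch,
    deleting the link so the same entry is never found twice."""
    for dirpath, dirnames, filenames in os.walk(date_root, topdown=False):
        for name in list(dirnames):
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                os.remove(full)  # consume the entry
                yield name       # the link is named after the uuid
```

Note that symbolic links pointing at directories appear in the dirnames list when os.walk is not following links, which is why the uuid links show up there.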
12.9.5 Deferred Job Storage
This is where jobs go that are deferred by Monitor's throttling mechanism. If a json/dump pair is needed for priority processing, it can be looked up directly on the name branch. In such a case, just as with priority jobs in standard storage, we destroy the links between the two branches. However, in this case, destroying the links prevents the json/dump pair from being deleted by the deferred cleanup process.
When it comes time to drop old json/dump pairs that are no longer needed within the deferred storage, the system is given a date threshold. It walks the appropriate parts of the date branch older than the threshold. It uses the links to the name branch to blow away the elderly json/dump pairs.
12.9.6 class JsonDumpStorage
socorro.lib.JsonDumpStorage holds data and implements methods for creating and accessing crash files.
public methods
• __init__(self, root=".", maxDirectoryEntries=1024, **kwargs)
Take note of our root directory, maximum allowed date->name links per directory, some relative relations, and whatever else we may need. Much of this (c|sh)ould be read from a config file.
Recognized keyword args:
– dateName. Default = ‘date’
– indexName. Default = ‘name’
– jsonSuffix. Default = ‘.json’. If not startswith(‘.’) then ‘.’ is prepended
– dumpSuffix. Default = ‘.dump’. If not startswith(‘.’) then ‘.’ is prepended
– dumpPermissions. Default 660
– dirPermissions. Default 770
– dumpGID. Default None. If None, then owned by the owner of the running script.
• newEntry (self, uuid, webheadHostName=’webhead01’, timestamp=DT.datetime.now())
Sets up the name and date storage for the given uuid.
– Creates any directories that it needs along the path to the appropriate storage location (possiblyadjusting ownership and mode)
– Creates two relative symbolic links:
* the date branch link pointing to the name directory holding the files;
* the name branch link pointing to the date branch directory holding that link.
– Returns a 2-tuple containing files open for writing: (jsonfile,dumpfile)
• getJson (self, uuid)
Returns an absolute pathname for the json file for a given uuid. Raises OSError if the file is missing
• getDump (self, uuid)
Returns an absolute pathname for the dump file for a given uuid. Raises OSError if the file is missing
• markAsSeen (self,uuid)
Removes the links associated with the two data files for this uuid, thus marking them as seen. Quietly returns if the uuid has no associated links.
• destructiveDateWalk (self)
This function is a generator that yields all (see note) uuids found by walking the date branch of the file system.

Just before yielding a value, it deletes both links (from date to name and from name to date). After visiting all the uuids in a given date branch, it recursively deletes any empty subdirectories in the date branch. Since the file system may be manipulated in a different thread, if no .json or .dump file is found, the links are left in place and that uuid is not yielded. Note: to avoid race conditions, it does not visit the date subdirectory corresponding to the current time.
• remove (self, uuid)
Removes all instances of the uuid from the file system including the json file, the dump file, and the two links if they still exist.
– Ignores missing link, json and dump files: you may call it with bogus data, though of course you should not
• move (self, uuid, newAbsolutePath)
Moves the json file then the dump file to newAbsolutePath.
– Removes associated symbolic links if they still exist.
– Raises IOError if either the json or dump file for the uuid is not found, and retains any links, but does not roll back the json file if the dump file is not found.
• removeOlderThan (self, timestamp)
– Walks the date branch removing all entries strictly older than the timestamp.
– Removes the corresponding entries in the name branch.
member data
Most of the member data are set in the constructor, a few are constants, and the rest are simple calculations based on the others.
• root: The directory that holds both the date and index(name) subdirectories
• maxDirectoryEntries: The maximum number of links in each webhead directory on the date branch. Default = 1024
• dateName: The name of the date branch subdirectory. Default = ‘date’
• indexName: The name of the index branch subdirectory. Default = ‘name’
• jsonSuffix: the suffix of the json crash file. Default = ‘.json’
• dumpSuffix: the suffix of the dump crash file. Default = ‘.dump’
• dateBranch: The full path to the date branch
• nameBranch: The full path to the index branch
• dumpPermissions: The permissions for the crash files. Default = 660
• dirPermissions: The permissions for the directories holding crash files. Default = 770
• dumpGID: The group ID for the directories and crash files. Default: Owned by the owner of the running script.
• toNameFromDate: The relative path from a leaf of the dateBranch to the nameBranch
• toDateFromName: The relative path from a leaf of the nameBranch to the dateBranch
• minutesPerSlot: How many minutes in each sub-hour slot. Default = 5
• slotRange: A precalculated range of slot edges = range(self.minutesPerSlot, 60, self.minutesPerSlot)
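The minutesPerSlot / slotRange members describe how timestamps are bucketed into sub-hour slot directories; the underlying arithmetic is simply rounding a minute down to its slot edge. A hypothetical helper for illustration:

```python
def minute_slot(minute, minutes_per_slot=5):
    """Round a minute (0-59) down to the start of its sub-hour slot.

    Slot edges correspond to range(minutes_per_slot, 60, minutes_per_slot),
    matching the slotRange member described above."""
    return minute - minute % minutes_per_slot
```

So, for example, a crash submitted at 12:07 lands in the 05 slot directory under the default 5-minute slots.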
12.10 Processed Dump Storage
Processed dumps are stored in two places: the relational database as well as in flat files within a file system. This forking of the storage scheme came from the realization that the infrequently used data within the database ‘dumps’ tables was causing performance problems within PostgreSQL. The ‘dumps’ tables took nearly eighty percent of the total storage, making replication and backup problematic. Since the ‘dumps’ table’s data is used only when a user requests a specific crash dump by uuid, most of the data is rarely, if ever, accessed.
We decided to migrate these dumps into file system storage outside the database. Details can be seen at: Dumping Dump Tables
In the file system, after processing, dumps are stored in a gzip-compressed JSON file format. This format echoes a flattening of the ‘reports’, ‘extensions’ and the now-deprecated ‘dumps’ tables within the database.
12.10.1 Directory Structure
Just as in the JsonDumpStorage scheme, there are two branches: ‘name’ and ‘date’
12.10.2 Access by Name
Most lookups of processed crash data happen by name. We use a radix storage technique where the first 4 characters of the file name are used for two levels of directory names. A file called aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz would be found in the file system as .../aa/bb/aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz
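Assuming the layout described above, the radix lookup is just string slicing. This is an illustrative helper, not the actual Socorro code:

```python
import os

def processed_crash_path(root, uuid):
    """Radix path: the first 4 characters of the file name
    supply two levels of directory names."""
    return os.path.join(root, uuid[0:2], uuid[2:4], uuid + ".jsonz")
```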
12.10.3 Access by Date
For the purposes of finding crashes that happened at a specific date and time, a hierarchy of date directories offers quick lookup. The leaves of the date directories contain symbolic links to the locations of crash data.
12.10.4 JSON File Format
example:
{
  "signature": "nsThread::ProcessNextEvent(int, int*)",
  "uuid": "aabbf9cb-395b-47e8-9600-4f20e2090331",
  "date_processed": "2009-03-31 14:45:09.215601",
  "install_age": 100113,
  "uptime": 7,
  "last_crash": 95113,
  "product": "SomeProduct",
  "version": "3.5.2",
  "build_id": "20090223121634",
  "branch": "1.9.1",
  "os_name": "Mac OS X",
  "os_version": "10.5.6 9G55",
  "cpu_name": "x86",
  "cpu_info": "GenuineIntel family 6 model 15 stepping 6",
  "crash_reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS",
  "crash_address": "0xe9b246",
  "User Comments": "This thing crashed.\nHelp me Kirk.",
  "app_notes": "",
  "success": true,
  "truncated": false,
  "processor_notes": "",
  "distributor": "",
  "distributor_version": "",
  "add-ons": [["{ABDE892B-13A8-4d1b-88E6-365A6E755758}", "1.0"],
              ["{b2e293ee-fd7e-4c71-a714-5f4750d8d7b7}", "2.2.0.9"],
              ["{972ce4c6-7e08-4474-a285-3208198ce6fd}", "3.5.2"]],
  "dump": "OS|Mac OS X|10.5.6 9G55\\nCPU|x86|GenuineIntel family 6 model 15 stepping 6|2\\nCrash|EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE|0x1558c095|0\\nModule|firefox-bin||firefox-bin|988FA8BFC789C4C07C32D61867BB42B60|0x00001000|0x00001fff|\\n....."
}
The “dump” component is the direct streamed output from the Breakpad “minidump_stackwalk” program. Unfortunately, that project does not give detailed documentation of the format.
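Since the .jsonz files are ordinary gzip-compressed JSON, they can be read with nothing but the standard library. A sketch; the function name is an assumption:

```python
import gzip
import json

def read_processed_crash(path):
    """Load a gzip-compressed processed-crash JSON (.jsonz) file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```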
12.11 Report Database Design
12.11.1 Introduction
With the launch of [[MeanTimeBeforeFailure]] and Top Crashers By URL reports, we have added 8 new database tables. They fall into the following categories:
• configuration
– mtbfconfig
– tcbyurlconfig
• facts
– mtbffacts
– topcrashurlfacts
• dimensions
– productdims
– urldims
– signaturedims
• relational
– topcrashurlfactsreports
What relational? Aren’t they all?
12.11.2 Star Schema
Taking inspiration from data warehousing, we implement the datastore with dimensional modeling instead of relational modeling. The pattern uses star schemas. Our implementation is a very lightweight approach, as we don't automatically generate facts for every combination of dimensions. This is not a Pentaho competitor :)
Star schemas are optimized for:
• read only systems
• large amounts of data
• viewed from different levels of granularity
12.11.3 Pattern
The dimensions and facts are the heart of the pattern.
dimensions
Each dimension is a property with various attributes and values at different levels of granularity. Example:
urldims - table would have the columns: id, domain, url
Sample values
1. en-us.www.mozilla.com, ALL
2. http://en-us.www.mozilla.com/en-US/firefox/3.0.5/whatsnew/
3. en-us.www.mozilla.com, http://en-us.www.mozilla.com/en-US/firefox/features/
We see a dimension that describes the property “url”. This is useful for talking about crashes that happen on a specific url. We also see two levels of granularity, a specific URL as well as all urls under a domain.
Dimensions give us ways to slice and dice aggregate crash data, then drill down or rollup this information.
Note: time could be a dimension (and usually is in data warehouses). For MTBF and Top Crash By URL we don't treat it as a first-class dimension, as there are no requirements to roll it up (say, to Q1 crashes, etc.), and having it be a column in the facts table provides better performance.
facts
A given report is powered by a main facts table.
Example:
topcrashurlfacts - table would have the columns: id, count, rank, day, productdims_id, urldims_id, signaturedims_id
A top crashers by url fact has two key elements: an aggregate crash count and the rank relative to other facts. So if we have static values for all dimensions and day, then we can see who has the most crashes.
Reporting
The general pattern of creating a report is: for a series of static dimensions and one or two variable dimensions, display the facts that meet these criteria.
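The fact/dimension join behind such a report can be illustrated with an in-memory SQLite toy. The column set is simplified and the sample values are made up; the real tables live in PostgreSQL:

```python
import sqlite3

# Miniature star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE productdims (id INTEGER PRIMARY KEY, product TEXT, version TEXT);
CREATE TABLE urldims (id INTEGER PRIMARY KEY, domain TEXT, url TEXT);
CREATE TABLE topcrashurlfacts (
    id INTEGER PRIMARY KEY, count INTEGER, rank INTEGER, day TEXT,
    productdims_id INTEGER, urldims_id INTEGER);
INSERT INTO productdims VALUES (1, 'Firefox', '3.0.5');
INSERT INTO urldims VALUES (1, 'en-us.www.mozilla.com', 'ALL');
INSERT INTO topcrashurlfacts VALUES (1, 42, 1, '2009-03-31', 1, 1);
""")

# Static dimensions (product, day) fixed; variable dimension (url) displayed.
rows = conn.execute("""
    SELECT u.domain, f.count, f.rank
      FROM topcrashurlfacts f
      JOIN productdims p ON p.id = f.productdims_id
      JOIN urldims u ON u.id = f.urldims_id
     WHERE p.product = 'Firefox' AND f.day = '2009-03-31'
     ORDER BY f.rank
""").fetchall()
```

The query pattern is the point: filter the fact table through the dimension tables on the static values, then order by rank.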
12.12 Code and Database Update
12.12.1 Socorro Wish List
One of my (griswolf) directives is approximately “make everything work efficiently and the same.” Toward this end, there are several tasks:
Probably most important, we have an inefficient database design, and some inefficient code working with it.
Next, we have a collection of ‘one-off’ code (and database schemas) that could be more easily maintained using a common infrastructure, common coding conventions, common schema layout, and common patterns.
Finally, we have enhancement requests that would become more feasible after such changes: such requests would be more easily handled in a cleaner programming environment; and in a cleaner environment there might be fewer significant bugs, leaving more time to work on enhancements.
Current state: See [[SocorroDatabaseSchema]]
12.12.2 Another Way to do Materialized Views?
The current system is somewhere between ad hoc reporting and a star architecture. The main part of this proposal focuses on converting further toward a star architecture. However, there may be another way: MapReduce techniques, which could possibly be run external to Mozilla (for instance: Amazon Web Services), could be used to mine dump files and create statistical data stored in files or a database. Lars mentioned to me that we now have some statistics folk on board who are interested in this.
12.12.3 Database Design
• There are some legacy tables (reports, topcrasher) that are not normalized. Other tables are partly normalized. Non-normal form has consequences:
– Data is duplicated, causing possible synchronization issues.
* JOSH: duplicated data is normal for materialized views and is not a problem a priori.
– Data is duplicated, increasing size.
* JOSH: I don’t believe that the matview tables are that large, although we will want to look at partitioning them in the future because they will continue to grow.
* FRANK: Lars points out that size-limiting partitions which reference each other must all be partitioned on the same key. This makes partitions a little more interesting.
– SELECT statements on multiple varchar fields, even when indexed, are probably slower than SELECT statements on a single foreign key. (And even if not, maintaining larger index tables has a time and space cost.)
• There are legacy tables that contain deprecated columns, a slight inefficiency.
• In some cases, separable details are conflated, making it difficult to access by a single area of concern. For instance, the table that describes our products has an os_name column, requiring us to pretend we deal with an os named ‘ALL’ in order to examine product data without regard to os.
• According to PostgreSQL consultants, some types are not as efficient as others. Example: TEXT (which we use only a little) is slightly more time-efficient than VARCHAR(n) (which we mostly use)
– JOSH: this is a minor issue, and should only be changed if we’re modifying the fields/tables anyway.
– FRANK: We have already run into a size limitation for signatures, which are now VARCHAR(255). Experiment shows that conversion to TEXT is slow because of index rebuilding, but conversion to VARCHAR(BIGGER_NUMBER) can be done by manipulating typemod (the number of chars in VARCHAR) in the system tables. So a change from VARCHAR to TEXT needs to be scheduled in advance, with an expected ‘long’ turnaround.
• Current indexes were carefully audited during PGExperts week. Schema changes will require careful reevaluation.
12.12.4 Commonality
• Some of the tables that provide statistics (Mean Time Before Failure, for example) use a variant of the “Star” data warehousing pattern, which is well known and understood. Some do not. After discussion we have reached agreement that all should be partly ‘starred’
– osdims and productdims are appropriate dimension tables for each view that cares about operatingsystem or product
– url and signature ‘dimension’ tables are used to filter materialized views:
* the ‘fact’ tables for views will use ids from these filter/dimension tables
* the filter/dimension tables will hold only data that has passed a particular frequency threshold; initial guess at threshold: 3 per week.
• Python code has been written by a variety of people with various skill levels, doing things in a variety of ways. Mostly, this is acceptable, but required changes give us an opportunity.
• We now specify Python version 2.4, which is adequate. It is possible to upgrade to 2.5.x or 2.6.x with both ease and safety. This is an opportunity to do so. No code needs to change for this.
• New features (safely) available in Python 2.5:
– unified try/except/finally: instead of a try/finally block holding a try/except block
– there is a very nice with: syntax useful for block-scoped non-GC’d resources such as open files (like try: with an automatic finally: at block end)
– generators are significantly more powerful, which might have some uses in our code
– and lots more that seems less obviously useful to Socorro
– better exception hierarchy
• New features (safely) available in Python 2.6
– json library ships with Python 2.6
– multiprocessing library parallel to threading library ships with Python 2.6
– Command line option ‘-3’ flags things that will work differently or fail in Python 3 (looking ahead is good)
• We use nosetests, which is not correctly and fully functional in a Python 2.4 environment.
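As an illustration of the 2.5-era features mentioned above, the with: statement replaces the nested try/finally bookkeeping around an open file, and the unified try/except/finally form lets a single try hold both clauses. The helper below is hypothetical, purely for illustration:

```python
def read_first_line(path):
    """Return the first line of a file, or None if it cannot be opened."""
    try:
        with open(path) as f:  # implicit finally: f.close() at block end
            return f.readline()
    except IOError:            # unified try/except wrapping the with block
        return None
```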
12.12.5 Viewable Interface
• We have been gradually providing a more useful view of the crash data. Sometimes this is intrinsically hard, sometimes it is made more difficult by our schema.
• We have requests for:
– Better linkage between crash reports and bugs
– Ability to view by OS and OS version, by signature, by product, by product version (some of this will be easier with a new schema)
– Ability to view historical data, current data, (sliding) windows of data and trends
• Some of the requests seem likely to be too time or space costly. In some cases these might be feasible with a more efficient system.
12.12.6 Consequences of Possible Changes
• (Only) Add new tables (two kinds of changes)
– “replace in place”, for instance add table reports_normal while leaving table reports in place
– “brand new”, for instance add new productdims and osdims tables to serve a new topcrashbysignature table
– Existing views are not impacted (for good or ill)
– Duplication of data (some tables near normal form, some not, etc) becomes worse than it now is
– No immediate need to migrate data: Options
* Maybe provide two views: “Historic” and “Current”
* Maybe write ‘orrible look-both-ways code to access both tables from single view
* Maybe migrate data
– Code that looks at old schema is (mostly?) unchanged
– Code that looks at new schema is opportunity for improved design, etc.
– Can do one thing at a time, with multiple ‘easy’ rollouts (each one is still a rollout, though)
– Long term goal: Stop using old tables and code
• (Only) Drop redundant or deprecated columns in existing tables:
– Existing views are no less useful, Viewer and Controller code will need some maintenance
– Data migration is ‘simple’
* beware that dropped columns may be part of a (foreign) key or index
– Data migration is needed at rollout
– Minimally useful
• Optimize database types, indexes, keys:
– Existing views are not much impacted
* May want to optimize queries in Viewer and Controller code
* May need to guard for field size or type in Controller code
– Details of changes are ‘picky’ and may need some hand holding by consultants, maybe testing.
• Normalize existing tables (while adding new tables as needed):
– Much existing code needs re-write
* With different Model comes a need for different Viewers and Controllers
* Opportunity to clarify old code
* Opportunity to optimize queries
– Data migration is needed at rollout
– Rollout is complex (but need only one for complete conversion)
– JOSH: in general, matview generation should be optimized to be insert-only. In some cases, this will involve having a “current week” partition which gets dropped and recreated until the current week is completed. Updates are generally at least 4x as expensive as inserts.
12.12.7 Rough plan as of 2009 June
• Soon: Materialized views will make use of dimensions and ‘filtered dimensions’ tables
• Later: Normalize the ‘raw’ data to make use of tables describing operating system and product details. Leave signatures and urls raw.
12.12.8 Specific Database Changes
Star Data Warehousing
Existing tables
• (struck out) dimension: signaturedims: associate the base crash signature string with an id. Use signature TEXT directly instead.
• dimension: productdims: associate a product, version, release and os_name with an id
– os_name is neither sufficient for os drill-down (which wants os_version) nor properly part of a product dimension
• dimension: urldims: associate (a large number of) domains and urls, each pair with an id
• config: mtbfconfig: specifies the date-interval during which a given product (productdims) is of interest for MTBF analysis
• config: tcbyurlconfig: specifies whether a particular product (productdims) is now of interest for Top Crash by URL analysis.
• fact: mtbffacts: collects daily summary of average time before failure for each product
• (struck out) report: topcrashurlfactsreports: associates a crash uuid and a comment with a row of topcrashurlfacts. ?Apparently never used?
Needed/Changed tables
Matview changes “Soon”
• config (new): product_visibility: Specifies the date interval during which a product (productdims id) is of interest for any view. ?Replaces mtbfconfig?
• dimension (new): osdims: associate an os name and os version with an id
• dimension (edit): productdims: remove the os_name column (replaced by another dimension osdims above)
• fact (replace): topcrashers: The table now in use to provide the Top Crash by Signature view. Will be replaced by topcrashfacts.
• fact (new): topcrashfacts: collect periodic count of crashes, average uptime before crash, and rank of each signature, grouped by signature, os, and product
– replaces existing topcrashers table which is poorly organized for current needs
• config (new): tcbysignatureconfig: specify which products and operating systems are currently of interest for tcbysigfacts
• fact: (renamed, edit) top_crashes_by_url: collects daily summary of crashes by product, url (productdims, urldims)
• fact: (new): top_crashes_by_url_signature: associates a given row from top_crashes_by_url with one or more signatures
Incoming (raw) changes “Later”
• details (new): osdetails, parallel to osdims, but on the incoming side will be implemented later
• details (new): productdetails, parallel to productdims, but on the incoming side will be implemented later
• reports: Holds details of each analyzed crash report. It is not in normal form, which causes some ongoing difficulty
– columns product, version, build should be replaced by productdetails foreign key later
– column signature LARS: NULL is a legal value here. We’ll have to make sure that we use left outer joins to retrieve the report records.
– columns cpu_name, cpu_info are not currently in use in any other table, but could be a foreign key into cpudims
– columns os_name, os_version should be replaced by osdims foreign key
– columns email, user_id are deprecated and should be dropped
Details
New or significantly changed tables
New product_visibility table (soon, matview):
table product_visibility (
    id serial NOT NULL PRIMARY KEY,
    productdims_id integer not null,
    start_date timestamp,        -- used by MTBF
    end_date timestamp,
    ignore boolean default False -- force aggregation off for this product id
);
New osdims table (soon, matview) NOTE: Data available only if ‘recently frequent’:
table osdims (
    id serial NOT NULL PRIMARY KEY,
    os_name TEXT NOT NULL,
    os_version TEXT,
    constraint osdims_key unique (os_name, os_version)
);
Edited productdims table (soon, matview) NOTE: use case for adding products is under discussion:
CREATE TYPE release_enum AS ENUM (’major’, ’milestone’, ’development’);

table productdims (
    id serial NOT NULL PRIMARY KEY,
    product TEXT NOT NULL,
    version TEXT NOT NULL,
    release release_enum NOT NULL,
    constraint productdims_key unique (product, version)
);
New product_details table (later, raw data) NOTE: All data will be stored (raw data should not lose details):
table product_details (
    id serial NOT NULL PRIMARY KEY,
    product TEXT NOT NULL,        -- /was/ character varying(30)
    version TEXT NOT NULL,        -- /was/ character varying(16)
    release release_enum NOT NULL -- /was/ character varying(50) NOT NULL
);
Edit mtbffacts to use edited productdims and new osdims (soon, matview):
table mtbffacts (
    id serial NOT NULL PRIMARY KEY,
    avg_seconds integer NOT NULL,
    report_count integer NOT NULL,
    window_end timestamp, -- was DATE
    productdims_id integer,
    osdims_id integer,
    constraint mtbffacts_key unique (productdims_id, osdims_id, day)
);
New top_crashes_by_signature table (soon, matview):
table top_crashes_by_signature (
    id serial NOT NULL PRIMARY KEY,
    count integer NOT NULL DEFAULT 0,
    average_uptime real DEFAULT 0.0,
    window_end timestamp without time zone,
    window_size interval,
    productdims_id integer NOT NULL, -- foreign key. NOTE: Filtered by recent frequency
    osdims_id integer NOT NULL,      -- foreign key. NOTE: Filtered by recent frequency
    signature TEXT,
    constraint top_crash_by_signature_key unique (window_end, signature, productdims_id, osdims_id)
);
-- some INDEXes are surely needed --
New/Renamed top_crashes_by_url table (soon, matview):
table top_crashes_by_url (
    id serial NOT NULL,
    count integer NOT NULL,
    window_end timestamp without time zone NOT NULL,
    window_size interval not null,
    productdims_id integer,
    osdims_id integer NOT NULL,
    urldims_id integer,
    constraint top_crashes_by_url_key unique (urldims_id, osdims_id, productdims_id, window_end)
);
New top_crashes_by_url_signature (soon, matview):
table top_crash_by_url_signature (
    top_crashes_by_url_id integer, -- foreign key
    count integer NOT NULL,
    signature TEXT NOT NULL,
    constraint top_crashes_by_url_signature_key unique (top_crashes_by_url_id, signature)
);
New crash_reports table (later, raw view) Replaces reports table:
table crash_reports (
    id serial NOT NULL PRIMARY KEY,
    uuid TEXT NOT NULL,           -- /was/ character varying(50)
    client_crash_date timestamp with time zone,
    install_age integer,
    last_crash integer,
    uptime integer,
    cpu_name TEXT,                -- /was/ character varying(100)
    cpu_info TEXT,                -- /was/ character varying(100)
    reason TEXT,                  -- /was/ character varying(255)
    address TEXT,                 -- /was/ character varying(20)
    build_date timestamp without time zone,
    started_datetime timestamp without time zone,
    completed_datetime timestamp without time zone,
    date_processed timestamp without time zone,
    success boolean,
    truncated boolean,
    processor_notes TEXT,
    user_comments TEXT,           -- /was/ character varying(1024)
    app_notes TEXT,               -- /was/ character varying(1024)
    distributor TEXT,             -- /was/ character varying(20)
    distributor_version TEXT,     -- /was/ character varying(20)
    signature TEXT,
    productdims_id INTEGER,       -- /new/ foreign key NOTE Filtered by recent frequency
    osdims_id INTEGER,            -- /new/ foreign key NOTE Filtered by recent frequency
    urldims_id INTEGER            -- /new/ foreign key NOTE Filtered by recent frequency
    -- /remove - see productdims_id/ product character varying(30),
    -- /remove - see productdims_id/ version character varying(16),
    -- /remove - redundant with build_date/ build character varying(30),
    -- /remove - see urldims_id/ url character varying(255),
    -- /remove - see osdims_id/ os_name character varying(100),
    -- /remove - see osdims_id/ os_version character varying(100),
    -- /remove - deprecated/ email character varying(100),
    -- /remove - deprecated/ user_id character varying(50)
);
-- This is a partitioned table: INDEXes are provided on date-based partitions
Tables with Minor Changes: varchar->text:
table branches (
    product TEXT NOT NULL, -- /was/ character varying(30)
    version TEXT NOT NULL, -- /was/ character varying(16)
    branch TEXT NOT NULL,  -- /was/ character varying(24)
    PRIMARY KEY (product, version)
);
table extensions (
    report_id integer NOT NULL, -- foreign key
    date_processed timestamp without time zone,
    extension_key integer NOT NULL,
    extension_id TEXT NOT NULL, -- /was/ character varying(100)
    extension_version TEXT      -- /was/ character varying(16)
);
table frames (
    report_id integer NOT NULL,
    date_processed timestamp without time zone,
    frame_num INTEGER NOT NULL,
    signature TEXT -- /was/ varchar(255)
);
table priority_jobs (
    uuid TEXT NOT NULL PRIMARY KEY -- /was/ varchar(255)
);
table processors (
    id serial NOT NULL PRIMARY KEY,
    name TEXT NOT NULL UNIQUE, -- /was/ varchar(255)
    startdatetime timestamp without time zone NOT NULL,
    lastseendatetime timestamp without time zone
);
table jobs (
    id serial NOT NULL PRIMARY KEY,
    pathname TEXT NOT NULL,    -- /was/ character varying(1024)
    uuid TEXT NOT NULL UNIQUE, -- /was/ varchar(50)
    owner integer,
    priority integer DEFAULT 0,
    queueddatetime timestamp without time zone,
    starteddatetime timestamp without time zone,
    completeddatetime timestamp without time zone,
    success boolean,
    message TEXT,
    FOREIGN KEY (owner) REFERENCES processors (id)
);
table urldims (
    id serial NOT NULL PRIMARY KEY,
    domain TEXT NOT NULL, -- /was/ character varying(255)
    url TEXT NOT NULL,    -- /was/ character varying(255)
    key url,    -- for drilling by url
    key domain  -- for drilling by domain
);
table topcrashurlfactsreports (
    id serial NOT NULL PRIMARY KEY,
    uuid TEXT NOT NULL, -- /was/ character varying(50)
    comments TEXT,      -- /was/ character varying(500)
    topcrashurlfacts_id integer
);
12.13 Out-of-Date Data Warning
While portions of this doc are still relevant and interesting for current Socorro usage, be aware that it is extremely out of date when compared to the current schema.
12.14 Database Schema
12.14.1 Introduction
Socorro is married to the PostgreSQL database: it makes use of a significant number of PostgreSQL and psycopg2 (Python) features and extensions. Making a database-neutral API has been explored and, for now, is not being pursued.
The tables can be divided into three major categories: crash data, aggregate reporting and process control.
12.14.2 crash data
12.14.3 reports
This table participates in DatabasePartitioning
Holds a lot of data about each crash report:
Table "reports"
       Column        |            Type             |    Modifiers    | Description
---------------------+-----------------------------+-----------------+-------------
 id                  | integer                     | not null serial | unique id
 client_crash_date   | timestamp with time zone    |                 | as reported by client
 date_processed      | timestamp without time zone |                 | when entered into jobs table
 uuid                | character varying(50)       | not null        | unique tag for job
 product             | character varying(30)       |                 | name of product ("Firefox")
 version             | character varying(16)       |                 | version of product ("3.0.6")
 build               | character varying(30)       |                 | build of product ("2009041522")
 signature           | character varying(255)      |                 | signature of 'top' frame of crash
 url                 | character varying(255)      |                 | associated with crash
 install_age         | integer                     |                 | in seconds since installed
 last_crash          | integer                     |                 | in seconds since last crash
 uptime              | integer                     |                 | in seconds since recent start
 cpu_name            | character varying(100)      |                 | as reported by client ("x86")
 cpu_info            | character varying(100)      |                 | as reported by client ("GenuineIntel family 15 model 4 stepping 1")
 reason              | character varying(255)      |                 | as reported by client
 address             | character varying(20)       |                 | memory address
 os_name             | character varying(100)      |                 | name of os ("Windows NT")
 os_version          | character varying(100)      |                 | version of os ("5.1.2600 Service Pack 3")
 email               | character varying(100)      |                 | -- deprecated
 build_date          | timestamp without time zone |                 | product build date (column build has same info, different format)
 user_id             | character varying(50)       |                 | -- deprecated
 started_datetime    | timestamp without time zone |                 | when processor starts processing report
 completed_datetime  | timestamp without time zone |                 | when processor finishes processing report
 success             | boolean                     |                 | whether finish was good
 truncated           | boolean                     |                 | whether some dump data was removed
 processor_notes     | text                        |                 | error messages during monitor processing of report
 user_comments       | character varying(1024)     |                 | if any, by user
 app_notes           | character varying(1024)     |                 | arbitrary, sent by client (exception detail, etc)
 distributor         | character varying(20)       |                 | future use: "Linux distro"
 distributor_version | character varying(20)       |                 | future use: "Linux distro version"

Partitioned Child Table
Indexes:
    "reports_aDate_pkey" PRIMARY KEY, btree (id)
    "reports_aDate_unique_uuid" UNIQUE, btree (uuid)
    "reports_aDate_date_processed_key" btree (date_processed)
    "reports_aDate_product_version_key" btree (product, version)
    "reports_aDate_signature_date_processed_key" btree (signature, date_processed)
    "reports_aDate_signature_key" btree (signature)
    "reports_aDate_url_key" btree (url)
    "reports_aDate_uuid_key" btree (uuid)
Check constraints:
    "reports_aDate_date_check" CHECK ('aDate'::timestamp without time zone <= date_processed AND date_processed < 'aDate+WEEK'::timestamp without time zone)
Inherits: reports
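The aDate placeholders above stand for each weekly partition's start date. As an illustration only (the helper name and exact DDL are assumptions of this sketch, not Socorro code), a weekly child partition in this style could be generated like so:

```python
from datetime import date, timedelta

def weekly_partition_ddl(start: date) -> str:
    """Build DDL for a weekly 'reports' child partition in the style
    shown above. Names and layout are illustrative only."""
    end = start + timedelta(weeks=1)
    name = "reports_%s" % start.strftime("%Y%m%d")
    return (
        "CREATE TABLE %s (\n"
        "    PRIMARY KEY (id),\n"
        "    UNIQUE (uuid),\n"
        "    CHECK (TIMESTAMP '%s' <= date_processed\n"
        "           AND date_processed < TIMESTAMP '%s')\n"
        ") INHERITS (reports);" % (name, start.isoformat(), end.isoformat())
    )

print(weekly_partition_ddl(date(2010, 1, 25)))
```

The generated CHECK constraint mirrors the `aDate <= date_processed < aDate+WEEK` pattern in the partitions documented above.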
12.14.4 dumps
This table is deprecated (dump data is now stored in the file system); see DumpingDumpTables for more information.
12.14.5 branches
This table has been replaced by a view of productdims:
CREATE VIEW branches AS SELECT product,version,branch FROM productdims;
12.14.6 extensions
This table participates in DatabasePartitioning.
Holds data about what extensions are associated with a given report:
Table "extensions"
      Column       |            Type             | Modifiers | Description
-------------------+-----------------------------+-----------+-------------
 report_id         | integer                     | not null  | in child: foreign key reference to child of table 'reports'
 date_processed    | timestamp without time zone |           | set to time when the row is inserted
 extension_key     | integer                     | not null  | the name of this extension
 extension_id      | character varying(100)      | not null  | the id of this extension
 extension_version | character varying(30)       |           | the version of this extension

Partitioned Child Table
Indexes:
    "extensions_aDate_pkey" PRIMARY KEY, btree (report_id)
    "extensions_aDate_report_id_date_key" btree (report_id, date_processed)
Check constraints:
    "extensions_aDate_date_check" CHECK ('aDate'::timestamp without time zone <= date_processed AND date_processed < 'aDate+WEEK'::timestamp without time zone)
Foreign-key constraints:
    "extensions_aDate_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports_aDate(id) ON DELETE CASCADE
Inherits: extensions
12.14.7 frames
This table participates in DatabasePartitioning.
Holds data about the frames in the dump associated with a particular report:
Table "frames"
     Column     |            Type             | Modifiers | Description
----------------+-----------------------------+-----------+-------------
 report_id      | integer                     | not null  | in child: foreign key reference to child of table reports
 date_processed | timestamp without time zone |           | set to time when the row is inserted (?)
 frame_num      | integer                     | not null  | ordinal: one row per stack-frame per report, from 0=top
 signature      | character varying(255)      |           | signature as returned by minidump_stackwalk

Partitioned Child Table
Indexes:
    "frames_aDate_pkey" PRIMARY KEY, btree (report_id, frame_num)
    "frames_aDate_report_id_date_key" btree (report_id, date_processed)
Check constraints:
    "frames_aDate_date_check" CHECK ('aDate'::timestamp without time zone <= date_processed AND date_processed < 'aDate+WEEK'::timestamp without time zone)
Foreign-key constraints:
    "frames_aDate_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports_aDate(id) ON DELETE CASCADE
Inherits: frames
Aggregate Reporting

(Schema diagram: SocorroSchema.Aggregate.20090722.png)
12.14.8 productdims
Dimension table that describes the product, version, gecko version ('branch'), and type of release. Note that the release string is completely determined by the version string: a version like 'X.Y.Z' is 'major', a version with suffix 'pre' is 'development', and a version with 'a' or 'b' (alpha or beta) is 'milestone'. Note: the current version does not conflate OS details (see osdims):

Table productdims
 Column  |     Type     | Modifiers | Description
---------+--------------+-----------+-------------
 id      | integer      | (serial)  |
 product | text         | not null  |
 version | text         | not null  |
 branch  | text         | not null  | gecko version
 release | release_enum |           | 'major', 'milestone', 'development'

Indexes:
    "productdims_pkey1" PRIMARY KEY, btree (id)
    "productdims_product_version_key" UNIQUE, btree (product, version)
    "productdims_release_key" btree (release)
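The version-to-release rule described above is simple enough to sketch directly; this helper is illustrative, not actual Socorro code:

```python
def release_for_version(version: str) -> str:
    """Classify a version string per the rule above: a 'pre' suffix
    means development, an 'a' or 'b' (alpha/beta) means milestone,
    and a plain X.Y.Z version means major."""
    if version.endswith("pre"):
        return "development"
    if "a" in version or "b" in version:
        return "milestone"
    return "major"

print(release_for_version("3.0.6"))   # major
print(release_for_version("3.5b4"))   # milestone
print(release_for_version("3.6pre"))  # development
```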
12.14.9 osdims
Dimension table that describes an operating system name and version. Because there are so many very similar Linux versions, the data saved here is simplified, which allows many different 'detailed version' Linuxen to share the same row in this table:

Table osdims
   Column   |          Type          | Modifiers | Description
------------+------------------------+-----------+-------------
 id         | integer                | (serial)  |
 os_name    | character varying(100) |           |
 os_version | character varying(100) |           |

Indexes:
    "osdims_pkey" PRIMARY KEY, btree (id)
    "osdims_name_version_key" btree (os_name, os_version)
12.14.10 product_visibility
Specifies the date interval during which a given product (productdims_id is the foreign key) is of interest for aggregate analysis. MTBF obeys start_date, but calculates its own end date as 60 days later. Top crash by (url|signature) tables obey both start_date and end_date. Column ignore is a boolean, default false, which allows a product version to be quickly turned off. Note: supersedes mtbfconfig and tcbyurlconfig. (MTBF is not now in use):

Table product_visibility
     Column     |            Type             |   Modifiers   | Description
----------------+-----------------------------+---------------+-------------
 productdims_id | integer                     | not null      |
 start_date     | timestamp without time zone |               |
 end_date       | timestamp without time zone |               |
 ignore         | boolean                     | default false |

Indexes:
    "product_visibility_pkey" PRIMARY KEY, btree (productdims_id)
    "product_visibility_end_date" btree (end_date)
    "product_visibility_start_date" btree (start_date)
Foreign-key constraints:
    "product_visibility_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
12.14.11 time_before_failure
Collects a daily summary of average (mean) time before failure for each product of interest, without regard to specific signature:

Table time_before_failure
       Column       |            Type             | Modifiers | Description
--------------------+-----------------------------+-----------+-------------
 id                 | integer                     | (serial)  |
 sum_uptime_seconds | double precision            | not null  |
 report_count       | integer                     | not null  |
 productdims_id     | integer                     |           |
 osdims_id          | integer                     |           |
 window_end         | timestamp without time zone | not null  |
 window_size        | interval                    | not null  |

Indexes:
    "time_before_failure_pkey" PRIMARY KEY, btree (id)
    "time_before_failure_os_id_key" btree (osdims_id)
    "time_before_failure_product_id_key" btree (productdims_id)
    "time_before_failure_window_end_window_size_key" btree (window_end, window_size)
Foreign-key constraints:
    "time_before_failure_osdims_id_fkey" FOREIGN KEY (osdims_id) REFERENCES osdims(id) ON DELETE CASCADE
    "time_before_failure_productdims_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
12.14.12 top_crashes_by_signature
The “fact” table that associates signatures with crash statistics:
Table top_crashes_by_signature
     Column     |            Type             |     Modifiers      | Description
----------------+-----------------------------+--------------------+-------------
 id             | integer                     | (serial)           |
 count          | integer                     | not null default 0 |
 uptime         | real                        | default 0.0        |
 signature      | text                        |                    |
 productdims_id | integer                     |                    |
 osdims_id      | integer                     |                    |
 window_end     | timestamp without time zone | not null           |
 window_size    | interval                    | not null           |

Indexes:
    "top_crashes_by_signature_pkey" PRIMARY KEY, btree (id)
    "top_crashes_by_signature_osdims_key" btree (osdims_id)
    "top_crashes_by_signature_productdims_key" btree (productdims_id)
    "top_crashes_by_signature_signature_key" btree (signature)
    "top_crashes_by_signature_window_end_idx" btree (window_end DESC)
Foreign-key constraints:
    "osdims_id_fkey" FOREIGN KEY (osdims_id) REFERENCES osdims(id) ON DELETE CASCADE
    "productdims_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
12.14.13 urldims
A dimension table that associates a URL and its domain with a particular id.

For example, given the full url http://www.whatever.com/some/path?foo=bar&goo=car:

• the domain is the host name: www.whatever.com

• the url is everything before the query part: http://www.whatever.com/some/path

Table "urldims"
 Column |          Type          |    Modifiers    | Description
--------+------------------------+-----------------+-------------
 id     | integer                | not null serial | unique id
 domain | character varying(255) | not null        | the hostname
 url    | character varying(255) | not null        | the url up to query

Indexes:
    "urldims_pkey" PRIMARY KEY, btree (id)
    "urldims_url_domain_key" UNIQUE, btree (url, domain)
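The domain/url split described above can be reproduced with the standard library; a minimal sketch (the helper name is mine, not Socorro's):

```python
from urllib.parse import urlsplit

def urldims_parts(full_url: str):
    """Split a full URL into the (domain, url) pair stored in urldims:
    domain is the host name, url is everything before the query part."""
    parts = urlsplit(full_url)
    domain = parts.hostname or ""
    url = "%s://%s%s" % (parts.scheme, parts.netloc, parts.path)
    return domain, url

domain, url = urldims_parts("http://www.whatever.com/some/path?foo=bar&goo=car")
print(domain)  # www.whatever.com
print(url)     # http://www.whatever.com/some/path
```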
12.14.14 top_crashes_by_url
The “fact” table that associates urls with crash statistics:
Table top_crashes_by_url
     Column     |            Type             | Modifiers | Description
----------------+-----------------------------+-----------+-------------
 id             | integer                     | (serial)  |
 count          | integer                     | not null  |
 urldims_id     | integer                     |           |
 productdims_id | integer                     |           |
 osdims_id      | integer                     |           |
 window_end     | timestamp without time zone | not null  |
 window_size    | interval                    | not null  |

Indexes:
    "top_crashes_by_url_pkey" PRIMARY KEY, btree (id)
    "top_crashes_by_url_count_key" btree (count)
    "top_crashes_by_url_osdims_key" btree (osdims_id)
    "top_crashes_by_url_productdims_key" btree (productdims_id)
    "top_crashes_by_url_urldims_key" btree (urldims_id)
    "top_crashes_by_url_window_end_window_size_key" btree (window_end, window_size)
Foreign-key constraints:
    "top_crashes_by_url_osdims_id_fkey" FOREIGN KEY (osdims_id) REFERENCES osdims(id) ON DELETE CASCADE
    "top_crashes_by_url_productdims_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
    "top_crashes_by_url_urldims_id_fkey" FOREIGN KEY (urldims_id) REFERENCES urldims(id) ON DELETE CASCADE
12.14.15 top_crashes_by_url_signature
Associates count of each signature with a row in top_crashes_by_url table:
Table top_crashes_by_url_signature
        Column         |  Type   | Modifiers | Description
-----------------------+---------+-----------+-------------
 top_crashes_by_url_id | integer | not null  |
 signature             | text    | not null  |
 count                 | integer | not null  |

Indexes:
    "top_crashes_by_url_signature_pkey" PRIMARY KEY, btree (top_crashes_by_url_id, signature)
Foreign-key constraints:
    "top_crashes_by_url_signature_fkey" FOREIGN KEY (top_crashes_by_url_id) REFERENCES top_crashes_by_url(id) ON DELETE CASCADE
12.14.16 topcrashurlfactsreports
Associates a job uuid with comments and a row in the topcrashurlfacts table:

Table "topcrashurlfactsreports"
       Column        |          Type          |    Modifiers    | Description
---------------------+------------------------+-----------------+-------------
 id                  | integer                | not null serial | unique id
 uuid                | character varying(50)  | not null        | job uuid string
 comments            | character varying(500) |                 | ?programmer provided?
 topcrashurlfacts_id | integer                |                 | crash statistics for a product, os, url, signature and day

Indexes:
    "topcrashurlfactsreports_pkey" PRIMARY KEY, btree (id)
    "topcrashurlfactsreports_topcrashurlfacts_id_key" btree (topcrashurlfacts_id)
Foreign-key constraints:
    "topcrashurlfactsreports_topcrashurlfacts_id_fkey" FOREIGN KEY (topcrashurlfacts_id) REFERENCES topcrashurlfacts(id) ON DELETE CASCADE
12.14.17 alexa_topsites
Stores a weekly dump of the top 1,000 sites as measured by Alexa (csv):
Table "public.alexa_topsites"
    Column    |            Type             |        Modifiers
--------------+-----------------------------+------------------------
 domain       | text                        | not null
 rank         | integer                     | default 10000
 last_updated | timestamp without time zone | not null default now()

Indexes:
    "alexa_topsites_pkey" PRIMARY KEY, btree (domain)
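A minimal sketch of ingesting such a dump, assuming the CSV rows look like rank,domain (the exact Alexa file layout is an assumption of this sketch):

```python
import csv
import io

def parse_topsites(csv_text: str):
    """Parse 'rank,domain' CSV rows into (domain, rank) tuples,
    ready to upsert into alexa_topsites. The row layout is assumed."""
    rows = []
    for rank, domain in csv.reader(io.StringIO(csv_text)):
        rows.append((domain.strip(), int(rank)))
    return rows

sample = "1,google.com\n2,facebook.com\n"
print(parse_topsites(sample))  # [('google.com', 1), ('facebook.com', 2)]
```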
12.15 Package
The applications that run the server are written in Python. The source code for these applications is collected into a single package.

There is currently no installation script for this package; it must simply be available somewhere on the PYTHONPATH.
12.15.1 Package Layout
• .../scripts : for socorro applications
• .../scripts/config : configuration for socorro applications
• .../socorro : python package root
• .../socorro/collector : modules used by the collector application
• .../socorro/cron : modules used by various applications intended to run by cron
• .../socorro/database : modules associated with the relational database
• .../socorro/deferredcleanup : modules used by the deferred file system cleanup script
• .../socorro/integrationtest : for future use
• .../socorro/lib : common modules used throughout the system
• .../socorro/monitor : modules used by the monitor application
• .../socorro/processor : modules used by the processor application
• .../socorro/unittest : testing framework modules
12.16 Schema
(See bottom of page for inline graphic)
12.17 Tables used primarily when processing Jobs
Reports (Partitioned)
The reports table contains the 'cooked' data received from breakpad and abstracted. Data from this table is further transformed into 'materialized views' (see below). Reports is unchanged from the prior version:

CREATE TABLE reports (
    id serial NOT NULL PRIMARY KEY,
    client_crash_date timestamp with time zone,
    date_processed timestamp without time zone,
    uuid character varying(50) NOT NULL UNIQUE,
    product character varying(30),
    version character varying(16),
    build character varying(30),
    signature character varying(255),
    url character varying(255),
    install_age integer,
    last_crash integer,
    uptime integer,
    cpu_name character varying(100),
    cpu_info character varying(100),
    reason character varying(255),
    address character varying(20),
    os_name character varying(100),
    os_version character varying(100),
    email character varying(100),        -- Now always NULL or empty
    build_date timestamp without time zone,
    user_id character varying(50),       -- Now always NULL or empty
    started_datetime timestamp without time zone,
    completed_datetime timestamp without time zone,
    success boolean,
    truncated boolean,
    processor_notes text,
    user_comments character varying(1024),
    app_notes character varying(1024),
    distributor character varying(20),
    distributor_version character varying(20)
);
Indices are on child/partition tables, not base table:
index: date_processed
index: uuid
index: signature
index: url
index: (product, version)
index: (uuid, date_processed)
index: (signature, date_processed)
Processors
The processors table keeps track of the current state of the processors that pull things out of the file system and into the reports database. Processors is unchanged from the prior version:

CREATE TABLE processors (
    id serial NOT NULL PRIMARY KEY,
    name varchar(255) NOT NULL UNIQUE,
    startdatetime timestamp without time zone NOT NULL,
    lastseendatetime timestamp without time zone
);
Jobs
The jobs table holds data about jobs that are queued for the processors to handle. Jobs is unchanged from the prior version:

CREATE TABLE jobs (
    id serial NOT NULL PRIMARY KEY,
    pathname character varying(1024) NOT NULL,
    uuid varchar(50) NOT NULL UNIQUE,
    owner integer,
    priority integer DEFAULT 0,
    queueddatetime timestamp without time zone,
    starteddatetime timestamp without time zone,
    completeddatetime timestamp without time zone,
    success boolean,
    message text,
    FOREIGN KEY (owner) REFERENCES processors (id) ON DELETE CASCADE
);
index: owner
index: (owner, starteddatetime)
index: (completeddatetime, priority DESC)
Priority Jobs
The priority jobs table is used to mark rows in the jobs table that need to be processed soon. Priority Jobs is unchanged from prior versions:

CREATE TABLE priortyjobs (
    uuid varchar(255) NOT NULL PRIMARY KEY
);
12.18 Tables primarily used during data extraction
Branches
The branches table associates a product and version with the gecko version (called 'branch'):

CREATE TABLE branches (
    product character varying(30) NOT NULL,
    version character varying(16) NOT NULL,
    branch character varying(24) NOT NULL
);
Extensions (Partitioned)
The extensions table associates a report with the extensions on the crashing application. Extensions is unchanged from the prior version. (Not now in use):

CREATE TABLE extensions (
    report_id integer NOT NULL,    -- Foreign key references parallel reports partition(id)
    date_processed timestamp without time zone,
    extension_key integer NOT NULL,
    extension_id character varying(100) NOT NULL,
    extension_version character varying(16),
    FOREIGN KEY (report_id) REFERENCES reports_<partition>(id) ON DELETE CASCADE
);
Index is on child/partition tables, not base table:
index: (report_id, date_processed)
Frames (Partitioned)
The frames table associates a report with the stack frames and their signatures that were seen in the crashing application. Frames is unchanged from the prior version:

CREATE TABLE frames (
    report_id integer NOT NULL,
    date_processed timestamp without time zone,
    frame_num integer NOT NULL,
    signature varchar(255),
    FOREIGN KEY (report_id) REFERENCES reports_<partition>(id) ON DELETE CASCADE
);
Index is on child/partition tables, not base table:
index: (report_id, date_processed)
Plugins
Electrolysis support for out-of-process plugin crashes:

CREATE TABLE plugins (
    id serial NOT NULL PRIMARY KEY,
    filename TEXT NOT NULL,
    name TEXT NOT NULL,
    CONSTRAINT filename_name_key UNIQUE (filename, name)
);
Plugins_Reports (Partitioned)

Records OOPP (out-of-process plugin) details. A report has 0 or 1 entries in this table:

CREATE TABLE plugins_reports (
    report_id INTEGER NOT NULL,
    plugin_id INTEGER NOT NULL,
    date_processed TIMESTAMP WITHOUT TIME ZONE,
    version TEXT NOT NULL
);
Indices are on child/partition tables, not the base table; they are set up via schema.py. Example for plugins_reports_20100125:

PRIMARY KEY (report_id, plugin_id),
CONSTRAINT plugins_reports_20100125_report_id_fkey FOREIGN KEY (report_id) REFERENCES reports_20100125 (id) ON DELETE CASCADE,
CONSTRAINT plugins_reports_20100125_plugin_id_fkey FOREIGN KEY (plugin_id) REFERENCES plugins (id) ON DELETE CASCADE,
CONSTRAINT plugins_reports_20100125_date_check CHECK (('2010-01-25 00:00:00'::TIMESTAMP WITHOUT TIME ZONE <= date_processed) AND (date_processed < '2010-02-01 00:00:00'::TIMESTAMP WITHOUT TIME ZONE))
12.19 Tables primarily used for materialized views
product visibility
Product visibility controls which products are subject to having data aggregated into the various materialized views. Replaces mtbfconfig and tcbyurlconfig:

CREATE TABLE product_visibility (
    productdims_id integer NOT NULL PRIMARY KEY,
    start_date timestamp,            -- set this manually for all mat views
    end_date timestamp,              -- set this manually: used by mat views that care
    ignore boolean default False,    -- force aggregation off for this product id
    FOREIGN KEY (productdims_id) REFERENCES productdims(id)
);
index: end_date
index: start_date
12.20 Dimensions tables
signaturedims
Signature dims was a table associating a signature with an id; it is no longer used. Instead, signatures are stored directly in the places that need them.
productdims
Product dims associates a product, version, and release key. An enum is used for the release key. Product dims has changed from the prior version by dropping the os_name column, which has been promoted into its own osdims table:

CREATE TYPE release_enum AS ENUM ('major', 'milestone', 'development');

CREATE TABLE productdims (
    id serial NOT NULL PRIMARY KEY,
    product TEXT NOT NULL,    -- varchar(30)
    version TEXT NOT NULL,    -- varchar(16)
    release release_enum      -- 'major':x.y.z..., 'milestone':x.ypre, 'development':x.y[ab]z
);
unique index: (product, version)
index: release
osdims
OS dims associates an os name and version. Promoted from earlier versions, where os_name was stored directly in 'facts' tables:

CREATE TABLE osdims (
    id serial NOT NULL PRIMARY KEY,
    os_name CHARACTER VARYING(100) NOT NULL,
    os_version CHARACTER VARYING(100)
);
index: (os_name, os_version)
urldims
URL dims associates a domain and a simplified url. URL dims is unchanged from the prior version:

CREATE TABLE urldims (
    id serial NOT NULL,
    domain character varying(255) NOT NULL,
    url character varying(255) NOT NULL
);
unique index: (url, domain)
12.21 View tables
View tables now have a uniform layout:
• id: The unique id for this row
• aggregated data: As appropriate for the view
• keys: One or more of signature, urldims id, productdims id, osdims id
• window_end: Used to keep track of most recently aggregated row
• window_size: Used redundantly in case aggregation window changes
time before failure
Aggregate the amount of time the app ran from startup to fail, and from prior fail to current fail. Replaces the mtbffacts table:

CREATE TABLE time_before_failure (
    id serial NOT NULL PRIMARY KEY,
    sum_uptime_seconds integer NOT NULL,
    report_count integer NOT NULL,
    productdims_id integer,
    osdims_id integer,
    window_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    window_size INTERVAL NOT NULL,
    FOREIGN KEY (productdims_id) REFERENCES productdims(id),
    FOREIGN KEY (osdims_id) REFERENCES osdims(id)
);
index: (window_end, window_size)
index: productdims_id
index: osdims_id
top crashes by signature
Aggregate the number of crashes per unit of time associated with a particular stack signature. Replaces the topcrashers table:

CREATE TABLE top_crashes_by_signature (
    id serial NOT NULL PRIMARY KEY,
    count integer NOT NULL DEFAULT 0,
    uptime real DEFAULT 0.0,
    signature TEXT,
    productdims_id integer,
    osdims_id integer,
    window_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    window_size INTERVAL NOT NULL,
    FOREIGN KEY (productdims_id) REFERENCES productdims(id),
    FOREIGN KEY (osdims_id) REFERENCES osdims(id)
);
index: productdims_id
index: osdims_id
index: signature
index: (window_end, window_size)
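A sketch of the aggregation this view represents: counting crashes and summing uptime per signature within a single (window_end, window_size) window. This is pure illustration, not the actual Socorro cron code:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate_signatures(reports, window_end, window_size):
    """reports: iterable of (signature, uptime_seconds, date_processed).
    Returns {signature: (count, total_uptime)} for rows whose
    date_processed falls in [window_end - window_size, window_end)."""
    start = window_end - window_size
    acc = defaultdict(lambda: [0, 0.0])
    for signature, uptime, date_processed in reports:
        if start <= date_processed < window_end:
            acc[signature][0] += 1
            acc[signature][1] += uptime
    return {sig: tuple(v) for sig, v in acc.items()}

end = datetime(2010, 1, 25)
rows = [
    ("nsFoo::Bar", 120.0, datetime(2010, 1, 24, 23, 30)),
    ("nsFoo::Bar", 30.0, datetime(2010, 1, 24, 22, 0)),
    ("oldCrash", 5.0, datetime(2010, 1, 23)),  # outside the window
]
print(aggregate_signatures(rows, end, timedelta(hours=12)))
```

Each resulting (signature, count, uptime) triple corresponds to one row of top_crashes_by_signature for that window.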
top crashes by url
Aggregate the number of crashes associated with a particular URL. Replaces the topcrashurlfacts table:

CREATE TABLE top_crashes_by_url (
    id serial NOT NULL PRIMARY KEY,
    count integer NOT NULL,
    urldims_id integer,
    productdims_id integer,
    osdims_id integer,
    window_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    window_size INTERVAL NOT NULL,
    FOREIGN KEY (urldims_id) REFERENCES urldims(id),
    FOREIGN KEY (productdims_id) REFERENCES productdims(id),
    FOREIGN KEY (osdims_id) REFERENCES osdims(id)
);
index: count
index: urldims_id
index: productdims_id
index: osdims_id
index: (window_end, window_size)
top crashes by url signature
Associate top crashes by url with their signature(s). Promoted from the prior topcrashurlfacts, where the signaturedims id was stored directly. Use of this table allows multiple signatures to be associated with the same crashing url:

CREATE TABLE top_crashes_by_url_signature (
    top_crashes_by_url_id integer NOT NULL,  -- foreign key
    signature TEXT NOT NULL,
    count integer NOT NULL,
    FOREIGN KEY (top_crashes_by_url_id) REFERENCES top_crashes_by_url(id)
);
primary key: (top_crashes_by_url_id, signature)
top crash url facts reports
Associate a crash uuid and comment with a particular top crash by url row. This table's schema is unchanged from the prior version, but the topcrashurlfacts_id column is re-purposed to map to the new top_crashes_by_url table:

CREATE TABLE topcrashurlfactsreports (
    id serial NOT NULL PRIMARY KEY,
    uuid character varying(50) NOT NULL,
    comments character varying(500),
    topcrashurlfacts_id integer,
    FOREIGN KEY (topcrashurlfacts_id) REFERENCES top_crashes_by_url(id)
);
index: topcrashurlfacts_id
12.22 Bug tracking
bugs
Periodically extract new and changed items from the bug tracking database. Bugs was recently added:

CREATE TABLE bugs (
    id int NOT NULL PRIMARY KEY,
    status text,
    resolution text,
    short_desc text
);
bug associations
Associate signatures with bug ids. Bug associations was recently added:

CREATE TABLE bug_associations (
    signature text NOT NULL,
    bug_id int NOT NULL,
    FOREIGN KEY (bug_id) REFERENCES bugs(id)
);
primary key: (signature, bug_id)
index: bug_id
Nightly Builds
Stores nightly builds in Postgres:

CREATE TABLE builds (
    product text,
    version text,
    platform text,
    buildid BIGINT,
    changeset text,
    filename text,
    date timestamp without time zone default now(),
    CONSTRAINT builds_key UNIQUE (product, version, platform, buildid)
);
12.23 Meta data
Server status
The server status table keeps track of the current status of the job processors. Server status is unchanged from the prior version:

CREATE TABLE server_status (
    id serial NOT NULL PRIMARY KEY,
    date_recently_completed timestamp without time zone,
    date_oldest_job_queued timestamp without time zone,
    avg_process_sec real,
    avg_wait_sec real,
    waiting_job_count integer NOT NULL,
    processors_count integer NOT NULL,
    date_created timestamp without time zone NOT NULL
);
index: (date_created, id)
12.24 Database Setup
This app is under development. For progress information see: Bugzilla 454438
This is an application that will set up the PostgreSQL database schema for Socorro. It starts with an empty database and creates all the tables, indexes, constraints, stored procedures and triggers needed to run a Socorro instance.
Before this application can be run, however, a regular user must be set up to be used for the day-to-day operations. While it is not recommended that the regular user have the full set of superuser privileges, the regular user must be privileged enough to create tables within the database.
Before the application that sets up the database can be run, the Common Config must be set up. The configuration file for this app itself is outlined at the end of this page.
12.24.1 Running the setupDatabase app
.../scripts/setupDatabase.py
12.24.2 Configuring setupDatabase app
This application relies on its own configuration file as well as the common configuration file Common Config.
Copy the .../scripts/config/setupdatabaseconfig.py.dist file to .../scripts/config/setupdatabase.py and edit the file to make site-specific changes.
logFilePathname
Monitor can log its actions to a set of automatically rotating log files. This is the name and location of the logs:

logFilePathname = cm.Option()
logFilePathname.doc = 'full pathname for the log file'
logFilePathname.default = './monitor.log'
logFileMaximumSize
This is the maximum size in bytes allowed for a log file. Once this number is reached, the logs rotate and a new log is started:

logFileMaximumSize = cm.Option()
logFileMaximumSize.doc = 'maximum size in bytes of the log file'
logFileMaximumSize.default = 1000000
logFileMaximumBackupHistory
The maximum number of log files to keep:

logFileMaximumBackupHistory = cm.Option()
logFileMaximumBackupHistory.doc = 'maximum number of log files to keep'
logFileMaximumBackupHistory.default = 50
logFileLineFormatString
A Python format string that controls the format of individual lines in the logs:
logFileLineFormatString = cm.Option()
logFileLineFormatString.doc = 'python logging system format for log file entries'
logFileLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
logFileErrorLoggingLevel
Logging is done in severity levels: the lower the number, the more verbose the logs:

logFileErrorLoggingLevel = cm.Option()
logFileErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
logFileErrorLoggingLevel.default = 10
stderrLineFormatString
In parallel with creating log files, Monitor can log to stderr. This is a Python format string that controls the format of individual lines sent to stderr:

stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
stderrErrorLoggingLevel
Logging to stderr is done in severity levels independently from the log file severity levels: the lower the number, the more verbose the output to stderr:

stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 40
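Taken together, the logFile* and stderr* options map naturally onto Python's standard logging module. A hedged sketch of wiring them up (Socorro's actual wiring lives in its apps; the function and logger names here are illustrative assumptions):

```python
import logging
import logging.handlers
import sys

def setup_logging(logFilePathname='./monitor.log',
                  logFileMaximumSize=1000000,
                  logFileMaximumBackupHistory=50,
                  logFileLineFormatString='%(asctime)s %(levelname)s - %(message)s',
                  logFileErrorLoggingLevel=10,
                  stderrLineFormatString='%(asctime)s %(levelname)s - %(message)s',
                  stderrErrorLoggingLevel=40):
    logger = logging.getLogger('monitor')
    logger.setLevel(logging.DEBUG)
    # Rotating file handler: rotates at logFileMaximumSize bytes,
    # keeping logFileMaximumBackupHistory old files.
    fh = logging.handlers.RotatingFileHandler(
        logFilePathname,
        maxBytes=logFileMaximumSize,
        backupCount=logFileMaximumBackupHistory)
    fh.setLevel(logFileErrorLoggingLevel)
    fh.setFormatter(logging.Formatter(logFileLineFormatString))
    logger.addHandler(fh)
    # stderr handler with its own, independent severity threshold.
    sh = logging.StreamHandler(sys.stderr)
    sh.setLevel(stderrErrorLoggingLevel)
    sh.setFormatter(logging.Formatter(stderrLineFormatString))
    logger.addHandler(sh)
    return logger
```

With the defaults above, DEBUG-and-up messages go to the rotating file while only ERROR-and-up reach stderr.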
12.25 Common Config
To avoid repetition between the configurations of a half dozen independently running applications, common settings are consolidated in a common configuration file: .../scripts/config/commonconfig.py.dist.

All Socorro applications have these constants available to them. For Socorro applications that are command-line driven, each of these default values can be overridden by a command-line switch of the same name.

To set up this configuration file, just copy the example .../scripts/config/commonconfig.py.dist to .../scripts/config/commonconfig.py.

Edit the file for your local situation:
    import socorro.lib.ConfigurationManager as cm
    import datetime
    import stat

    #---------------------------------------------------------------------------
    # Relational Database Section

    databaseHost = cm.Option()
    databaseHost.doc = 'the hostname of the database servers'
    databaseHost.default = 'localhost'

    databasePort = cm.Option()
    databasePort.doc = 'the port of the database on the host'
    databasePort.default = 5432

    databaseName = cm.Option()
    databaseName.doc = 'the name of the database within the server'
    databaseName.default = ''

    databaseUserName = cm.Option()
    databaseUserName.doc = 'the user name for the database servers'
    databaseUserName.default = ''

    databasePassword = cm.Option()
    databasePassword.doc = 'the password for the database user'
    databasePassword.default = ''

    #---------------------------------------------------------------------------
    # Crash storage system

    jsonFileSuffix = cm.Option()
    jsonFileSuffix.doc = 'the suffix used to identify a json file'
    jsonFileSuffix.default = '.json'

    dumpFileSuffix = cm.Option()
    dumpFileSuffix.doc = 'the suffix used to identify a dump file'
    dumpFileSuffix.default = '.dump'

    #---------------------------------------------------------------------------
    # HBase storage system

    hbaseHost = cm.Option()
    hbaseHost.doc = 'Hostname for hbase hadoop cluster. May be a VIP or load balancer'
    hbaseHost.default = 'localhost'

    hbasePort = cm.Option()
    hbasePort.doc = 'hbase port number'
    hbasePort.default = 9090
    hbaseTimeout = cm.Option()
    hbaseTimeout.doc = 'timeout in milliseconds for an HBase connection'
    hbaseTimeout.default = 5000
    #---------------------------------------------------------------------------
    # misc

    processorCheckInTime = cm.Option()
    processorCheckInTime.doc = 'the time after which a processor is considered dead (hh:mm:ss)'
    processorCheckInTime.default = "00:05:00"
    processorCheckInTime.fromStringConverter = lambda x: str(cm.timeDeltaConverter(x))

    startWindow = cm.Option()
    startWindow.doc = 'The start of the single aggregation window (YYYY-MM-DD [hh:mm:ss])'
    startWindow.fromStringConverter = cm.dateTimeConverter

    deltaWindow = cm.Option()
    deltaWindow.doc = 'The length of the single aggregation window ([dd:]hh:mm:ss)'
    deltaWindow.fromStringConverter = cm.timeDeltaConverter

    defaultDeltaWindow = cm.Option()
    defaultDeltaWindow.doc = 'The length of the single aggregation window ([dd:]hh:mm:ss)'
    defaultDeltaWindow.fromStringConverter = cm.timeDeltaConverter
    # override this default for your particular cron task
    defaultDeltaWindow.default = '00:12:00'

    endWindow = cm.Option()
    endWindow.doc = 'The end of the single aggregation window (YYYY-MM-DD [hh:mm:ss])'
    endWindow.fromStringConverter = cm.dateTimeConverter

    startDate = cm.Option()
    startDate.doc = 'The start of the overall/outer aggregation window (YYYY-MM-DD [hh:mm])'
    startDate.fromStringConverter = cm.dateTimeConverter

    deltaDate = cm.Option()
    deltaDate.doc = 'The length of the overall/outer aggregation window ([dd:]hh:mm:ss)'
    deltaDate.fromStringConverter = cm.timeDeltaConverter

    initialDeltaDate = cm.Option()
    initialDeltaDate.doc = 'The length of the overall/outer aggregation window ([dd:]hh:mm:ss)'
    initialDeltaDate.fromStringConverter = cm.timeDeltaConverter
    # override this default for your particular cron task
    initialDeltaDate.default = '4:00:00:00'

    minutesPerSlot = cm.Option()
    minutesPerSlot.doc = 'how many minutes per leaf directory in the date storage branch'
    minutesPerSlot.default = 1

    endDate = cm.Option()
    endDate.doc = 'The end of the overall/outer aggregation window (YYYY-MM-DD [hh:mm:ss])'
    endDate.fromStringConverter = cm.dateTimeConverter

    debug = cm.Option()
    debug.doc = 'do debug output and routines'
    debug.default = False
    debug.singleCharacter = 'D'
    debug.fromStringConverter = cm.booleanConverter
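The fromStringConverter hooks above are what let a command-line string like '4:00:00:00' become a rich Python value. The following is a hand-rolled sketch of the '[dd:]hh:mm:ss' conversion; it only illustrates the pattern and is not the real cm.timeDeltaConverter:

```python
from datetime import timedelta

def time_delta_converter(value):
    """Parse a '[dd:]hh:mm:ss' string (e.g. '00:05:00' or '4:00:00:00')
    into a datetime.timedelta.  Illustrative stand-in for the converter
    attached to the Option objects above."""
    parts = [int(p) for p in value.split(':')]
    if len(parts) == 3:            # no day component supplied
        parts.insert(0, 0)
    if len(parts) != 4:
        raise ValueError('expected [dd:]hh:mm:ss, got %r' % value)
    days, hours, minutes, seconds = parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

print(time_delta_converter('4:00:00:00'))   # 4 days, 0:00:00
```

This is why initialDeltaDate.default can be given as the string '4:00:00:00' and still behave as a four-day interval inside the application.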
12.26 Populate ElasticSearch
12.26.1 Install ElasticSearch
First you need to install ElasticSearch. The procedure is well described in this tutorial: Setting up elasticsearch. Don’t bother configuring ES if you don’t know you will need it; it generally works just fine out of the box.
Note: ElasticSearch is not yet included in our Vagrant dev VMs but should be sometime soon.
12.26.2 Increase open files limit
ElasticSearch needs to open a lot of files when indexing, often reaching the limits imposed by UNIX systems. To avoid errors when indexing, you will have to increase the limits imposed by your OS.
First see what user is running ElasticSearch. It may be root or vagrant. Use top, for example, and look for an elasticsearch-like process. Then edit /etc/security/limits.conf and add the following at the end:
    root soft nofile 4096
    root hard nofile 10240
Replace root with vagrant (or whatever user is running ES) if needed, save and restart your VM.
You will also need to increase the system-wide file descriptors limit by editing /etc/sysctl.conf and adding at the end:
fs.file-max = 100000
After you have saved and closed the file, run sysctl -p, then cat /proc/sys/fs/file-max to verify it worked. No restart is required here.
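If you want to double-check from inside a running process rather than from the shell, the Python standard-library resource module reports the per-process limit (Unix only; this is just a sanity check, not part of Socorro):

```python
import resource

# Per-process open-file limits as the current process sees them.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('nofile limits: soft=%s hard=%s' % (soft, hard))
# If 'soft' is still at the old value, the user running ElasticSearch has
# not picked up the new limits.conf settings (e.g. needs a fresh login).
```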
Note: I am not sure whether restarting the VM is necessary, or if restarting only ElasticSearch is enough. Don’t hesitate to make this more precise with the result of your experiments.
Source: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
12.26.3 Download the dump
You can get a recent dump for ElasticSearch at http://people.mozilla.org/~agaudebert/socorro/es-dumps/.
You will also need to get the mapping of our Socorro indexes: http://people.mozilla.org/~agaudebert/socorro/es-dumps/mapping.json
12.26.4 Run the script
The script to import crashes into ElasticSearch is not yet merged into our official repository. To get it, you will need to fetch github.com/AdrianGaudebert/socorro and check out branch 696722-script-import-es:
    git remote add AdrianGaudebert https://github.com/AdrianGaudebert/socorro.git
    git fetch AdrianGaudebert
    git branch --track 696722-script-import-es AdrianGaudebert/696722-script-import-es
    git checkout 696722-script-import-es
Before you can run the script, you will have to stop supervisord:
sudo /etc/init.d/supervisor force-stop
The script is called movecrashes.py and is in .../scripts/. It has a few dependencies on Socorro and thus needs to be run from the root of a Socorro directory with $PYTHONPATH = .:thirdparty. Use it as follows:
    python scripts/movecrashes.py import /path/to/dump.tar /path/to/mapping.json
This will simply import all crash reports contained in the dump into ElasticSearch, without cleaning anything first. If you want more data than is available in the dump, you can just run that import again, which will create duplicates.
If you want to clean the old socorro data first, just run rebuild instead of import:
    python scripts/movecrashes.py rebuild /path/to/dump.tar /path/to/mapping.json
Note that this will only delete indexes called socorro_xxxxxx. If you’re using a shared ES instance, or have other indexes you want to keep, there is no risk of them being deleted in this process.
CHAPTER 13
PostgreSQL Database
13.1 PostgreSQL Database Tables by Data Source
Last updated: 2011-01-15
This document breaks down the tables in the Socorro PostgreSQL database by where their data comes from, rather than by what the table contains. This is a prerequisite to populating a brand-new Socorro database or creating synthetic testing workloads.
13.2 Manually Populated Tables
The following tables have no code to populate them automatically. Initial population and any updating need to be done by hand. Generally there’s no UI, either; use queries.
• daily_crash_codes
• os_name_matches
• os_names
• process_types
• product_release_channels
• products
• release_channel_matches
• release_channels
• uptime_levels
• windows_versions
• product_productid_map
• report_partition_info
13.3 Tables Receiving External Data
These tables actually get inserted into by various external utilities. This is most of our “incoming” data.
bugs list of bugs, populated by bugzilla-scraper
extensions populated by processors
plugins_reports populated by processors
raw_adu populated by daily batch job from metrics
releases_raw populated by daily FTP-scraper
reports populated by processors
13.4 Automatically Populated Reference Tables
Lookup lists and dimension tables, populated by cron jobs and/or processors based on the above tables. Most are annotated with the job or process which populates them. Where the populating process is marked with an @, that indicates a job which is due to be phased out.
addresses cron job, part of update_reports_clean based on reports
domains cron job, part of update_reports_clean based on reports
flash_versions cron job, part of update_reports_clean based on reports
os_versions cron job, update_os_versions based on reports@ cron job, update_reports_clean based on reports
plugins populated by processors based on crash data
product_version_builds cron job, update_product_versions, based on releases_raw
product_versions cron job, update_product_versions, based on releases_raw
reasons cron job, update_reports_clean, based on reports
reports_bad cron job, update_reports_clean, based on reports; a future cron job will delete data from this table
signatures cron job, update_signatures, based on reports@ cron job, update_reports_clean, based on reports
13.5 Matviews
Reporting tables, designed to be called directly by the mware/UI/reports. Populated by cron job batch. Where populating functions are marked with a @, they are due to be replaced with new jobs.
bug_associations not sure
daily_crashes daily_crashes based on reports
daily_hangs update_hang_report based on reports
os_signature_counts update_os_signature_counts based on reports
product_adu daily_adu based on raw_adu
product_signature_counts update_product_signature_counts based on reports
reports_clean update_reports_clean based on reports
reports_user_info update_reports_clean based on reports
reports_duplicates find_reports_duplicates based on reports
signature_bugs_rollup not sure
signature_first@ update_signatures based on reports@
signature_products update_signatures based on reports@
signature_products_rollup update_signatures based on reports@
tcbs update_tcbs based on reports
uptime_signature_counts update_uptime_signature_counts based on reports
13.6 Application Management Tables
These tables are used by various parts of the application to do things other than reporting. They are populated/managed by those applications.
• email campaign tables
– email_campaigns
– email_campaigns_contacts
– email_contacts
• processor management tables
– jobs
– priorityjobs
– priority_jobs_*
– processors
– server_status
• UI management tables
– sessions
• monitoring tables
– replication_test
• cronjob and database management
– cronjobs
– report_partition_info
13.7 Deprecated Tables
These tables support functionality which is scheduled to be removed over the next few versions of Socorro. As such, we are ignoring them.
• alexa_topsites
• builds
• frames
• osdims
• priorityjobs_log
• priorityjobs_logging_switch
• product_visibility
• productdims
• productdims_version_sort
• release_build_type_map
• signature_build
• signature_productdims
• top_crashes_by_signature
• top_crashes_by_url
• top_crashes_by_url_signature
• urldims
13.8 PostgreSQL Database Table Descriptions
This document describes the various tables in PostgreSQL by their purpose and essentially what data each contains. This is intended as a reference for Socorro developers and analytics users.
Tables which are in the database but not listed below are probably legacy tables which are slated for removal in future Socorro releases. Certainly if the tables are not described, they should not be used for new features or reports.
13.9 Raw Data Tables
These tables hold “raw” data as it comes in from external sources. As such, these tables are quite large and contain a lot of garbage and data which needs to be conditionally evaluated. This means that you should avoid using these tables for reports and interfaces unless the data you need isn’t available anywhere else – and even then, you should see about getting the data added to a matview or normalized fact table.
13.9.1 reports
The primary “raw data” table, reports contains the most used information about crashes, one row per crash report. The primary key is the UUID field.
The reports table is partitioned by date_processed into weekly partitions, so any query you run against it should include filter criteria (WHERE) on the date_processed column. Examples:
    WHERE date_processed BETWEEN '2012-02-12 11:05:09+07' AND '2012-02-17 11:05:09+07'
    WHERE date_processed >= DATE '2012-02-12' AND date_processed < DATE '2012-02-17'
    WHERE utc_day_is(date_processed, '2012-02-15')
Data in this table comes from the processors.
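A small helper can keep such predicates consistent in application code. This sketch is not part of Socorro — it assumes Monday-based weeks, whereas the real partition boundaries are driven by report_partition_info — but it shows the half-open range style used above:

```python
from datetime import date, timedelta

def week_bounds(day):
    """Return the (start, end) half-open date range of the week
    containing `day`, assuming Monday-based weeks."""
    start = day - timedelta(days=day.weekday())
    return start, start + timedelta(days=7)

start, end = week_bounds(date(2012, 2, 15))
sql = ('SELECT count(*) FROM reports '
       'WHERE date_processed >= %s AND date_processed < %s')
# pass (start, end) as query parameters to your database driver
print(start, end)   # 2012-02-13 2012-02-20
```

Keeping the range half-open (>= start, < end) avoids double-counting rows that fall exactly on a partition boundary.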
13.9.2 extensions
Contains information on add-ons installed in the user’s application. Currently linked to reports via a synthetic report_id (this will be fixed to be UUID in some future release). Data is partitioned by date_processed into weekly partitions, so include a filter on date_processed in every query hitting this table. Has zero to several rows for each crash.
Data in this table comes from the processors.
13.9.3 plugins_reports
Contains information on some, but not all, installed modules implicated in the crash: the “most interesting” modules. Relates to dimension table plugins. Currently linked to reports via a synthetic report_id (this will be fixed to be UUID in some future release). Data is partitioned by date_processed into weekly partitions, so include a filter on date_processed in every query hitting this table. Has zero to several rows for each crash.
Data in this table comes from the processors.
13.9.4 bugs
Contains lists of bugs thought to be related to crash reports, for linking to crashes. Populated by a daily cronjob.
13.9.5 bug_associations
Links bugs from the bugs table to crash signatures. Populated by daily cronjob.
13.9.6 raw_adu
Contains counts of estimated Average Daily Users as calculated by the Metrics department, grouped by product, version, build, os, and UTC date. Populated by a daily cronjob.
13.9.7 releases_raw
Contains raw data about Mozilla releases, including product, version, platform and build information. Populated hourly via FTP-scraping.
13.9.8 reports_duplicates
Contains UUIDs of groups of crash reports thought to be duplicates according to the current automated duplicate-finding algorithm. Populated by hourly cronjob.
13.10 Normalized Fact Tables
13.10.1 reports_clean
Contains cleaned and normalized data from the reports table, including product-version, os, os version, signature, reason, and more. Partitioned by date into weekly partitions, so each query against this table should contain a predicate on date_processed:
    WHERE date_processed BETWEEN '2012-02-12 11:05:09+07' AND '2012-02-17 11:05:09+07'
    WHERE date_processed >= DATE '2012-02-12' AND date_processed < DATE '2012-02-17'
    WHERE utc_day_is(date_processed, '2012-02-15')
Because reports_clean is much smaller than reports and is normalized into unequivocal relationships with dimension tables, it is much easier to use and faster to execute queries against. However, it excludes data in the reports table which doesn’t conform to normalized data, including:
• product versions before the first Rapid Release versions (e.g. Firefox 3.6)
• Camino
• corrupt reports, including ones which indicate a breakpad bug
Populated hourly, 3 hours behind the current time, from data in reports via cronjob. The UUID column is the primary key. There is one row per crash report, although some crash reports are suspected to be duplicates.
Columns:
uuid artificial unique identifier assigned by the collectors to the crash at collection time. Contains the date collected plus a random string.
date_processed timestamp (with time zone) at which the crash was received by the collectors. Also the partition key for partitioning reports_clean. Note that the time will be 7-8 hours off for crashes before February 2012 due to a shift from PST to UTC.
client_crash_date timestamp with time zone at which the user’s crashing machine thought the crash was happening. Often inaccurate due to clock issues; it is primarily supplied as an anchor timestamp for uptime and install_age.
product_version_id foreign key to the product_versions table.
build numeric build identifier as supplied by the client. Might not match any real build in product_version_builds for a variety of reasons.
signature_id foreign key to the signatures dimension table.
install_age time interval between installation and crash, as reported by the client. To get the reported install date, do ( SELECT client_crash_date - install_age ).
uptime time interval between program start and crash, as reported by the client.
reason_id foreign key to the reasons table.
address_id foreign key to the addresses table.
os_name name of the OS of the crashing host, for OSes which match known OSes.
os_version_id foreign key to the os_versions table.
hang_id UUID assigned to the hang pair grouping for hang pairs. May not match anything if the hang pair was broken by sampling or lost crash reports.
flash_version_id foreign key to the flash_versions table
process_type Crashing process type, linked to process_types dimension.
release_channel release channel from which the crashing product was obtained, unless altered by the user (this happens more than you’d think). Note that non-Mozilla builds are usually lumped into the “release” channel.
duplicate_of UUID of the “leader” of the duplicate group if this crash is marked as a possible duplicate. If UUID and duplicate_of are the same, this crash is the “leader”. Selection of leader is arbitrary.
domain_id foreign key to the domains dimension
architecture CPU architecture of the client as reported (e.g. ‘x86’, ‘arm’).
cores number of CPU cores on the client, as reported.
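The uuid/duplicate_of convention described above can be applied in client code like this (a sketch with invented sample UUIDs; real rows would come from a reports_clean query):

```python
def split_duplicate_group(rows):
    """Given (uuid, duplicate_of) pairs for one duplicate group, return
    (leader_uuid, follower_uuids).  Per the convention above, the leader
    is the row whose uuid equals its own duplicate_of."""
    leaders = [u for u, dup in rows if u == dup]
    followers = [u for u, dup in rows if u != dup]
    if len(leaders) != 1:
        raise ValueError('expected exactly one leader, got %r' % leaders)
    return leaders[0], followers

# Invented sample data: 'aaa' is the (arbitrarily chosen) leader.
leader, followers = split_duplicate_group(
    [('aaa', 'aaa'), ('bbb', 'aaa'), ('ccc', 'aaa')])
print(leader, followers)   # aaa ['bbb', 'ccc']
```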
13.10.2 reports_user_info
Contains a handful of “optional” information from the reports table which is either security-sensitive or is not included in all reports and is large. This includes the full URL, user email address, comments, and app_notes. As such, access to this table in production may be restricted.
Partitioned by date into weekly partitions, so each query against this table should contain a predicate on date_processed. Relates to reports_clean via UUID, which is also its primary key.
13.10.3 product_adu
The normalized version of raw_adu, contains summarized estimated counts of users for each product-version since Rapid Release began. Populated by daily cronjob.
13.11 Dimensions
These tables contain lookup lists and taxonomy for the fact tables in Socorro. Generally they are auto-populated based on encountering new values in the raw data, on an hourly basis. A few tables below are manually populated and change extremely seldom, if at all.
Dimensions which are lookup lists of short values join to the fact tables by natural key, although it is not actually necessary to reference them (e.g. os_name, release_channel). Dimension lists which have long values or are taxonomies or hierarchies join to the fact tables using a surrogate key (e.g. product_version_id, reason_id).
Some dimensions which come from raw crash data have a “first_seen” column which displays when that value was first encountered in a crash and added to the dimension table. Since the first_seen columns were added in September 2011, most of these will have the value ‘2011-01-01’, which is not meaningful. Only dates after 2011-09-15 actually indicate a first appearance.
13.11.1 addresses
Contains a list of crash location “addresses”, extracted hourly from the raw data. Surrogate key: address_id.
13.11.2 daily_crash_codes
Reference list for the cryptic single-character codes in the daily_crashes table. Legacy, to be eventually restructured. Natural key: crash_code. Manually populated.
13.11.3 domains
List of HTTP domains extracted from raw reports by applying a truncation regex to the crashing URL. These should contain no personal information. Contains a “first seen” column. Surrogate key: domain_id.
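The truncation idea can be sketched as follows. The regex here is purely illustrative — Socorro’s actual pattern, which also deals with credentials, ports, and unusual schemes, is different:

```python
import re

# Illustrative truncation regex only; not the pattern Socorro ships.
DOMAIN_RE = re.compile(r'^https?://([^/:#?]+)')

def truncate_to_domain(url):
    """Reduce a crashing URL to its bare domain, discarding the path,
    query string, and fragment that might contain personal data."""
    match = DOMAIN_RE.match(url)
    return match.group(1) if match else None

print(truncate_to_domain('http://example.com/private/path?q=1'))   # example.com
```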
13.11.4 flash_versions
List of Adobe Flash version numbers harvested from crashes. Has a “first_seen” column. Surrogate key: flash_version_id.
13.11.5 os_names
Canonical list of OS names used in Socorro. Natural key. Fixed list, manually populated.
13.11.6 os_versions
List of versions for each OS based on data harvested from crashes. Contains some garbage versions because we cannot validate them. Surrogate key: os_version_id.
13.11.7 plugins
List of “interesting modules” harvested from raw crashes, populated by the processors. Surrogate key: ID. Links to plugins_reports.
13.11.8 process_types
Standing list of crashing process types (browser, plugin and hang). Manually input. Natural key.
13.11.9 products
List of supported products, along with the first version on rapid release. Manually maintained. Natural key: product_name.
13.11.10 product_versions
Contains a list of versions for each product, since the beginning of rapid release (i.e. since Firefox 5.0). Version numbers are available expressed several different ways, and there is a sort column for sorting versions. Also contains build_date/sunset_date visibility information and the featured_version flag. “build_type” means the same thing as “release_channel”. Surrogate key: product_version_id.
Version columns include:
version_string The canonical, complete version number for display to users
release_version The version number as provided in crash reports (and usually the same as the one on the FTP server). Can be missing suffixes like “b2” or “esr”.
major_version Just the first two numbers of the version number, e.g. “11.0”
version_sort An alphanumeric string which allows you to sort version numbers in the correct order.
beta_number The sequential beta release number if the product-version is a beta. For “final betas”, this number will be 99.
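One common way to build such a sort key is to zero-pad each numeric chunk so that lexicographic order matches numeric order. The sketch below shows only the idea; Socorro’s real version_sort additionally has to order betas before their final release (see beta_number above), which this naive key does not do:

```python
import re

def naive_version_sort_key(version_string):
    """Zero-pad numeric chunks of a version string to make it sortable."""
    chunks = re.findall(r'\d+|[a-z]+', version_string.lower())
    return ''.join(c.zfill(4) if c.isdigit() else c for c in chunks)

versions = ['9.0', '11.0', '10.0b2']
print(sorted(versions))                                # ['10.0b2', '11.0', '9.0']
print(sorted(versions, key=naive_version_sort_key))    # ['9.0', '10.0b2', '11.0']
```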
13.11.11 product_version_builds
Contains a list of builds for each product-version. Note that platform information is not at all normalized. Natural key: product_version_id, build_id.
13.11.12 product_release_channels
Contains an intersection of products and release channels, mainly in order to store throttle values. Manually populated. Natural key: product_name, release_channel.
13.11.13 reasons
Contains a list of “crash reason” values harvested from raw crashes. Has a “first seen” column. Surrogate key: reason_id.
13.11.14 release_channels
Contains a list of available Release Channels. Manually populated. Natural key. See “note on release channel columns” below.
13.11.15 signatures
List of crash signatures harvested from incoming raw data. Populated by hourly cronjob. Has a first_seen column. Surrogate key: signature_id.
13.11.16 uptime_levels
Reference list of uptime “levels” for use in reports, primarily the Signature Summary. Manually populated.
13.11.17 windows_versions
Reference list of Windows major/minor versions with their accompanying common names for reports. Manually populated.
13.12 Matviews
These data summaries are derived data from the fact tables and/or the raw data tables. They are populated by hourly or daily cronjobs, and are frequently regenerated if historical data needs to be corrected. If these matviews contain the data you need, you should use them first because they are smaller and more efficient than the fact tables or the raw tables.
13.12.1 correlations
Summarizes crashes by product-version, os, reason and signature. Populated by daily cron job. Is the root for the other correlation reports. Correlation reports in the database will not be active/populated until 2.5.2 or later.
13.12.2 correlation_addons
Contains crash-count summaries of addons per correlation. Populated by daily cronjob.
13.12.3 correlation_cores
Contains crash-count summaries of crashes per architecture and number of cores. Populated by daily cronjob.
13.12.4 correlation_modules
Will contain crash-counts for modules per correlation. Will be populated daily by a pull from HBase.
13.12.5 daily_crashes
Stores crash counts per product-version, OS, and day. This is probably the oldest matview, and has unintuitive and historical column names; it will probably be overhauled or replaced. The report_type column defines 5 different sets of counts; see daily_crash_codes above.
We recommend that you use the VIEW daily_crash_ratio instead of daily_crashes, as the structure of daily_crashes is hard to understand and is likely to change in the future.
13.12.6 daily_hangs and hang_report
daily_hangs contains a correlation of hang crash reports with their related hang pair crashes, plus additional summary data. Duplicates contains an array of UUIDs of possible duplicates.
hang_report is a dynamic view which flattens daily_hangs and its related dimension tables.
13.12.7 nightly_builds
Contains summaries of crashes-by-age for Nightly and Aurora releases. Will be populated in Socorro 2.5.1.
13.12.8 product_crash_ratio
Dynamic VIEW which shows crashes, ADU, adjusted crashes, and the crash/100ADU ratio, for each product and version. Recommended for backing graphs and similar.
13.12.9 product_os_crash_ratio
Dynamic VIEW which shows crashes, ADU, adjusted crashes, and the crash/100ADU ratio for each product, OS and version. Recommended for backing graphs and similar.
13.12.10 product_info
Dynamic VIEW which supplies the most essential information about each product version, for both old and new products.
13.12.11 signature_products and signature_products_rollup
Summary of which signatures appear in which product_version_ids, with first appearance dates.
The rollup contains an array-style summary of the signatures with lists of product-versions.
13.12.12 tcbs
Short for “Top Crashes By Signature”, tcbs contains counts of crashes per day, signature, product-version, and columns counting each OS.
13.13 Note On Release Channel Columns
Due to a historical error, the column name for the Release Channel in various tables may be named “release_channel”, “build_type”, or “build_channel”. All three of these column names refer to exactly the same thing. While we regret the confusion, it has not been thought to be worth the refactoring effort to clean it up.
13.14 Application Support Tables
These tables are used by various parts of the application to do things other than reporting. They are populated/managed by those applications. Most are not accessible to the various reporting users, as they do not contain reportable data.
13.14.1 data processing control tables
These tables contain data which supports data processing by the processors and cronjobs.
product_productid_map maps product names based on productIDs, in cases where the product name supplied by Breakpad is not correct (i.e. FennecAndroid).
reports_bad contains the last day of rejected UUIDs for copying from reports to reports_clean. Intended for auditing of the reports_clean code.
os_name_matches contains regexes for matching commonly found OS names in crashes with canonical OS names.
release_channel_matches contains LIKE match strings for matching release channel names commonly found in crashes with canonical channel names.
special_product_platforms contains mapping information for rewriting data from FTP-scraping to have the correct product and platform. Currently used only for Fennec.
transform_rules contains rule data for rewriting crashes by the processors. May be used in the future for other rule-based rewriting by other components.
13.14.2 email campaign tables
These tables support the application which emails crash reporters with follow-ups. As such, access to these tables will be restricted.
• email_campaigns
• email_campaigns_contacts
• email_contacts
13.14.3 processor management tables
These tables are used to coordinate activities of the up-to-120 processors and the monitor.
jobs The current main queue for crashes waiting to be processed.
priorityjobs The queue for user-requested “priority” crash processing.
processors The registration list for currently active processors.
server_status Contains summary statistics on the various processor servers.
13.14.4 UI management tables
sessions contains session information for people logged into the administration interface for Socorro.
13.14.5 monitoring tables
replication_test Contains a timestamp for ganglia to measure the speed of replication.
13.14.6 cronjob and database management
These tables support scheduled tasks which are run in Socorro.
cronjobs contains last-completed and success/failure status for each cronjob which affects the database. Currently does not include all cronjobs.
report_partition_info contains configuration information on how the partitioning cronjob needs to partition the various partitioned database tables.
socorro_db_version contains the Socorro version of the current database. Updated by the upgrade scripts.
socorro_db_version_history contains the history of version upgrades of the current database.
13.15 Creating a New Matview
A materialized view, or “matview”, is the results of a query stored as a table in the PostgreSQL database. Matviews make user interfaces much more responsive by eliminating searches over many GB of sparse data at request time. The majority of the time, new matviews will have the following characteristics:
• they will pull data from reports_clean and/or reports_user_info
• they will be updated once per day and store daily summary data
• they will be updated by a cron job calling a stored procedure
The rest of this guide assumes that all three conditions above are true. For matviews for which one or more conditions are not true, consult the PostgreSQL DBAs for your matview.
13.16 Do I Want a Matview?
Before proceeding to construct a new matview, test the responsiveness of simply running a query over reports_clean and/or reports_user_info. You may find that the query returns fast enough ( < 100ms ) without its own matview. Remember to test the extreme cases: Firefox release version on Windows, or Fennec aurora version.
Also, matviews are really only effective if they are smaller than 1/4 the size of the base data from which they are constructed. Otherwise, it’s generally better to simply look at adding new indexes to the base data. Try populating a couple days of the matview, ad hoc, and checking its size (pg_total_relation_size()) compared to the base table from which it’s drawn. The new signature summaries were a good example of this; the matviews to meet the spec would have been 1/3 the size of reports_clean, so we added a couple new indexes to reports_clean instead.
13.17 Components of a Matview
In order to create a new matview, you will create or modify five or six things:
1. a table to hold the matview data
2. an update function to insert new matview data once per day
3. a backfill function to backfill one day of the matview
4. add a line in the general backfill_matviews function
5. if the matview is to be backfilled from deployment, a script to do this
6. a test that the matview is being populated correctly.
Point (6) is not yet addressed by a test framework for Socorro, so we’re skipping it currently.
For the rest of this doc, please refer to the template matview code sql/templates/general_matview_template.sql in the Socorro source code.
13.18 Creating the Matview Table
The matview table should be the basis for the report or screen you want. It’s important that it be able to cope with all of the different filter and grouping criteria which users are allowed to supply. On the other hand, most of the time it’s not helpful to try to have one matview support several different reports; the matview gets bloated and slow.
In general, each matview will have the following things:
• one or more grouping columns
• a report_date column
• one or more summary data columns
If they are available, all columns should use surrogate keys to lookup lists (i.e. use signature_id, not the full text of the signature). Generally the primary key of the matview will be the combination of all grouping columns plus the report date.
So, as an example, we're going to create a simple matview for summarizing crashes per product and web domain. While it's unlikely that such a matview would be useful in practice (we could just query reports_clean directly), it makes a good example. Here's the model for the table:
table product_domain_counts
    product_version
    domain
    report_date
    report_count
key product_version, domain, report_date
We actually use the custom procedure create_table_if_not_exists() to create this. This function handles idempotence, permissions, and secondary indexes for us, like so:
SELECT create_table_if_not_exists('product_domain_counts', $x$
CREATE TABLE product_domain_counts (
    product_version_id INT NOT NULL,
    domain_id INT NOT NULL,
    report_date DATE NOT NULL,
    report_count INT NOT NULL DEFAULT 0,
    CONSTRAINT product_domain_counts_key PRIMARY KEY ( product_version_id, domain_id, report_date )
);
$x$, 'breakpad_rw', ARRAY['domain_id'] );
See DatabaseAdminFunctions in the docs for more information about the function.
You'll notice that the resulting matview uses the surrogate keys of the corresponding lookup lists rather than the actual values. This is to keep matview sizes down and improve performance. You'll also notice that there are no foreign keys to the various lookup list tables; this is partly a performance optimization, but mostly because, since matviews are populated by stored procedure, validating input is not critical. We also don't expect to need cascading updates or deletes on the lookup lists.
13.18.1 Creating The Update Function
Once you have the table, you'll need to write a function to be called by cron once per day in order to populate the matview with new data.
This function will:
• be named update_{name_of_matview}
• take two parameters, a date and a boolean
• return a boolean, with TRUE meaning success, and raise an ERROR on failure
• check if data it depends on is available
• check if it’s already been run for the day
• pull its data from reports_clean, reports_user_info, and/or other matviews (_not_ reports or other raw data tables)
So, here’s our update function for the product_domains table:
CREATE OR REPLACE FUNCTION update_product_domain_counts (
    updateday DATE, checkdata BOOLEAN default TRUE )
RETURNS BOOLEAN
LANGUAGE plpgsql
SET work_mem = '512MB'
SET temp_buffers = '512MB'
SET client_min_messages = 'ERROR'
AS $f$
BEGIN
-- this function populates a daily matview
-- for crash counts by product and domain
-- depends on reports_clean

-- check if we've been run
IF checkdata THEN
    PERFORM 1 FROM product_domain_counts
    WHERE report_date = updateday
    LIMIT 1;
    IF FOUND THEN
        RAISE EXCEPTION 'product_domain_counts has already been run for %.', updateday;
    END IF;
END IF;

-- check if reports_clean is complete
IF NOT reports_clean_done(updateday) THEN
    IF checkdata THEN
        RAISE EXCEPTION 'Reports_clean has not been updated to the end of %', updateday;
    ELSE
        RETURN TRUE;
    END IF;
END IF;

-- now insert the new records
-- this should be some appropriate query; this simple group by
-- is just provided as an example
INSERT INTO product_domain_counts
    ( product_version_id, domain_id, report_date, report_count )
SELECT product_version_id, domain_id,
    updateday,
    count(*)
FROM reports_clean
WHERE domain_id IS NOT NULL
    AND date_processed >= updateday::timestamptz
    AND date_processed < ( updateday + 1 )::timestamptz
GROUP BY product_version_id, domain_id;

RETURN TRUE;
END; $f$;
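In production this function would be invoked daily by the matview cron job, but it can also be called by hand for a single UTC day (the date below is illustrative):

```sql
SELECT update_product_domain_counts('2012-01-14');
```

Calling it a second time for the same day raises the "already been run" exception unless checkdata is passed as false.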
Note that the update functions could be written in PL/Python if you wish; however, there isn't yet a template for that.
13.18.2 Creating The Backfill Function
The second function which needs to be created is one for backfilling data for specific dates, for when we need to backfill missing or corrected data. This function will also be used to fill in data when we first deploy the matview.
The backfill function will generally be very simple; it just calls a delete for the day's data and then the update function, with the "checkdata" flag disabled:
CREATE OR REPLACE FUNCTION backfill_product_domain_counts (
    updateday DATE )
RETURNS BOOLEAN
LANGUAGE plpgsql AS
$f$
BEGIN

DELETE FROM product_domain_counts WHERE report_date = updateday;
PERFORM update_product_domain_counts(updateday, false);

RETURN TRUE;
END; $f$;
13.18.3 Adding The Function To The Omnibus Backfill
Usually when we backfill data we recreate all matview data for the period affected. This is accomplished by inserting it into the backfill_matviews table:
INSERT INTO backfill_matviews ( matview, function_name, frequency )
VALUES ( 'product_domain_counts', 'backfill_product_domain_counts', 'daily' );
NOTE: the above is not yet active. Until it is, send a request to Josh Berkus to add your new backfill to the omnibus backfill function.
13.18.4 Filling in Initial Data
Generally when creating a new matview, we want to fill in two weeks or so of data. This can be done with either a Python or a PL/pgSQL script. A PL/pgSQL script would be created as a SQL file and look like this:
DO $f$
DECLARE
    thisday DATE := '2012-01-14';
    lastday DATE;
BEGIN

-- set backfill to the last day we have ADU for
SELECT max("date")
INTO lastday
FROM raw_adu;

WHILE thisday <= lastday LOOP

    RAISE INFO 'backfilling %', thisday;

    PERFORM backfill_product_domain_counts(thisday);

    thisday := thisday + 1;

END LOOP;

END; $f$;
This script would then be checked into the set of upgrade scripts for that version of the database.
13.19 Database Admin Function Reference
What follows is a listing of custom functions written for Socorro in the PostgreSQL database which are intended for database administration, particularly scheduled tasks. Many of these functions depend on other, internal functions which are not documented.
All functions below return BOOLEAN, with TRUE meaning completion, and throw an ERROR if they fail, unless otherwise noted.
13.20 MatView Functions
These functions manage the population of the many Materialized Views in Socorro. In general, for each matview there are two functions which maintain it:
update_{matview_name} ( DATE )
fills in one day of the matview for the first time
will error if data is already present, or source data is missing
backfill_{matview_name} ( DATE )
deletes one day of data for the matview and recreates it.
will warn, but not error, if source data is missing
safe for use without downtime
Exceptions to the above are generally for procedures which need to run hourly or more frequently (e.g. update_reports_clean, reports_duplicates). Also, some functions have shortcut names where they don't use the full name of the matview (e.g. update_adu).
Note that the various matviews can take radically different amounts of time to update or backfill ... from a couple of seconds to 10 minutes for one day.
In addition, there are several procedures which are designed to update or backfill multiple matviews for a range of days. These are designed for when there has been some kind of widespread issue in crash processing and a bunch of crashes have been reprocessed and need to be re-aggregated.
These mass-backfill functions generally give a lot of command-line feedback on their progress, and should be run in a screen session, as they may take hours to complete. These functions, as the most generally used, are listed first. If you are doing a mass-backfill, you probably want to limit the backfill to a week at a time in order to prevent it from running too long before committing.
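Backfilling a week at a time, as suggested above, might look like the following for a month of data (the dates are illustrative):

```sql
SELECT backfill_matviews('2011-11-01', '2011-11-07');
SELECT backfill_matviews('2011-11-08', '2011-11-14');
SELECT backfill_matviews('2011-11-15', '2011-11-21');
SELECT backfill_matviews('2011-11-22', '2011-11-28');
```

Each call commits independently, so an interrupted backfill can be resumed at the next week boundary.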
13.20.1 backfill_matviews
Purpose: backfills data for all matviews for a specific range of dates. For use when data is either missing or needs to be retroactively corrected.
Called By: manually by admin as needed
backfill_matviews (
    startdate DATE,
    optional enddate DATE default current_date,
    optional reportsclean BOOLEAN default true
)
SELECT backfill_matviews( '2011-11-01', '2011-11-27', false );
SELECT backfill_matviews( '2011-11-01' );
startdate the first date to backfill
enddate the last date to backfill. defaults to the current UTC date.
reportsclean whether or not to backfill reports_clean as well. Defaults to true. Supplied because the backfill of reports_clean takes a lot of time.
13.20.2 backfill_reports_clean
Purpose: backfill only the reports_clean normalized fact table.
Called By: admin as needed
backfill_reports_clean (
    starttime TIMESTAMPTZ,
    endtime TIMESTAMPTZ
)
SELECT backfill_reports_clean ( '2011-11-17', '2011-11-29 14:00:00' );
starttime timestamp to start backfill
endtime timestamp to halt backfill at
Note: if backfilling less than 1 day, will backfill in 1-hour increments. If backfilling more than one day, will backfill in 6-hour increments. Can take a long time to backfill more than a couple of days.
13.20.3 update_adu, backfill_adu
Purpose: updates or backfills one day of the product_adu table, which is one of the two matviews powering the graphs in Socorro. Note that ADU has no dependencies, so if it is out of date you only need to run this function.
Called By: update function called by the update_matviews cron job.
update_adu (updateday DATE);
backfill_adu (updateday DATE);
SELECT update_adu('2011-11-26');
SELECT backfill_adu('2011-11-26');
updateday DATE of the UTC crash report day to update or backfill
13.20.4 update_products
Purpose: updates the list of product_versions and product_version_builds based on the contents of releases_raw.
Called By: daily cron job
update_products ()
SELECT update_products ( '2011-12-04' );
Notes: takes no parameters as the product update is always cumulative. As of 2.3.5, only looks at product_versionswith build dates in the last 30 days. There is no backfill function because it is always a cumulative update.
13.20.5 update_tcbs, backfill_tcbs
Purpose: updates "tcbs" based on the contents of the reports_clean table
Called By: daily cron job
update_tcbs (
    updateday DATE,
    checkdata BOOLEAN optional default true
)
SELECT update_tcbs ( '2011-11-26' );
backfill_tcbs (
    updateday DATE
)

SELECT backfill_tcbs ( '2011-11-26' );
updateday UTC day to pull data for.
checkdata whether or not to check dependent data and throw an error if it's not found.
Notes: updates only "new"-style versions. Until 2.4, update_tcbs pulled data directly from reports and not reports_clean.
13.20.6 update_daily_crashes, backfill_daily_crashes
Purpose: updates "daily_crashes" based on the contents of the reports_clean table
Called By: daily cron job
update_daily_crashes (
    updateday DATE,
    checkdata BOOLEAN optional default true
)

SELECT update_daily_crashes ( '2011-11-26' );

backfill_daily_crashes (
    updateday DATE
)

SELECT backfill_daily_crashes ( '2011-11-26' );
updateday UTC day to pull data for.
checkdata whether or not to check dependent data and throw an error if it's not found.
Notes: updates only "new"-style versions. Until 2.4, update_daily_crashes pulled data directly from reports and not reports_clean. Probably the slowest of the regular update functions; can take up to 4 minutes to do one day.
13.20.7 update_rank_compare, backfill_rank_compare
Purpose: updates “rank_compare” based on the contents of the reports_clean table
Called By: daily cron job
update_rank_compare (
    updateday DATE optional default yesterday,
    checkdata BOOLEAN optional default true
)

SELECT update_rank_compare ( '2011-11-26' );

backfill_rank_compare (
    updateday DATE optional default yesterday
)

SELECT backfill_rank_compare ( '2011-11-26' );
updateday UTC day to pull data for. Optional; defaults to ( CURRENT_DATE - 1 ).
checkdata whether or not to check dependent data and throw an error if it's not found.
Note: this matview is not historical, but contains only one day of data. As such, running either the update or backfill function replaces all existing data. Since it needs an exclusive lock on the matview, it is possible (though unlikely) for it to fail to obtain the lock and error out.
13.20.8 update_nightly_builds, backfill_nightly_builds
Purpose: updates “nightly_builds” based on the contents of the reports_clean table
Called By: daily cron job
update_nightly_builds (
    updateday DATE optional default yesterday,
    checkdata BOOLEAN optional default true
)

SELECT update_nightly_builds ( '2011-11-26' );

backfill_nightly_builds (
    updateday DATE optional default yesterday
)

SELECT backfill_nightly_builds ( '2011-11-26' );
updateday UTC day to pull data for.
checkdata whether or not to check dependent data and throw an error if it's not found. Optional.
13.21 Schema Management Functions
These functions support partitioning, upgrades, and other management of tables and views.
13.21.1 weekly_report_partitions
Purpose: to create new partitions for the reports table and its child tables every week.
Called By: weekly cron job
weekly_report_partitions (
    optional numweeks integer default 2,
    optional targetdate date default current_date
)

SELECT weekly_report_partitions();
SELECT weekly_report_partitions(3, '2011-11-09');
numweeks number of weeks ahead to create partitions
targetdate date for the starting week, if not today
13.21.2 try_lock_table
Purpose: attempt to get a lock on a table, looping with sleeps until the lock is obtained.
Called by: various functions internally
try_lock_table (
    tabname TEXT,
    mode TEXT optional default 'EXCLUSIVE',
    attempts INT optional default 20
) RETURNS BOOLEAN

IF NOT try_lock_table('rank_compare', 'ACCESS EXCLUSIVE') THEN
    RAISE EXCEPTION 'unable to lock the rank_compare table for update.';
END IF;
tabname the table name to lock
mode the lock mode per PostgreSQL docs. Defaults to ‘EXCLUSIVE’.
attempts the number of attempts to make, with 3 second sleeps between each. optional, defaults to 20.
Returns TRUE for table locked, FALSE for unable to lock.
13.21.3 create_table_if_not_exists
Purpose: creates a new table, skipping if the table is found to already exist.
Called By: upgrade scripts
create_table_if_not_exists (
    tablename TEXT,
    declaration TEXT,
    tableowner TEXT optional default 'breakpad_rw',
    indexes TEXT ARRAY default empty list
)

SELECT create_table_if_not_exists ( 'rank_compare', $q$
create table rank_compare (
    product_version_id int not null,
    signature_id int not null,
    rank_days int not null,
    report_count int,
    total_reports bigint,
    rank_report_count int,
    percent_of_total numeric,
    constraint rank_compare_key primary key ( product_version_id, signature_id, rank_days )
);$q$, 'breakpad_rw',
ARRAY [ 'product_version_id,rank_report_count', 'signature_id' ]);
tablename name of the new table to create
declaration full CREATE TABLE sql statement, plus whatever other SQL statements you only want to run on table creation, such as priming it with a few records and creating the primary key. If running more than one SQL statement, separate them with semicolons.
tableowner the ROLE which owns the table. usually ‘breakpad_rw’. optional.
indexes an array of sets of columns to create regular btree indexes on. use the array declaration as demonstratedabove. default is to create no indexes.
Note: this is the best way to create new tables in migration scripts, since it allows you to rerun the script multiple times without erroring out. However, be aware that it only checks for the existence of the table, not its definition, so if you modify the table definition you'll need to manually drop and recreate it.
13.22 Other Administrative Functions
13.22.1 add_old_release
Purpose: Allows you to add an old release to productdims/product_visibility.
Called By: on demand by Firefox or Camino teams.
add_old_release (
    product_name text,
    new_version text,
    release_type release_enum default 'major',
    release_date DATE DEFAULT current_date,
    is_featured BOOLEAN default FALSE
) RETURNS BOOLEAN

SELECT add_old_release ('Camino', '2.1.1');
SELECT add_old_release ('Camino', '2.1.2pre', 'development', '2012-03-09', true);
Notes: if this leads to more than 4 currently featured versions, the oldest featured version will be "bumped".
13.23 Custom Time-Date Functions
The present Socorro database needs to do a lot of time, date and timezone manipulation. This is partly a natural consequence of the application, and the need to use both DATE and TIMESTAMPTZ values. The greater need is legacy timestamp conversion, however; currently the processors save crash reporting timestamps as TIMESTAMP WITHOUT TIME ZONE in Pacific time, whereas the rest of the database is TIMESTAMP WITH TIME ZONE in UTC. This necessitates a lot of tricky time zone conversions.
The functions below are meant to make it easier to write queries which return correct results based on dates and timestamps.
13.23.1 tstz_between
tstz_between (
    tstz TIMESTAMPTZ,
    bdate DATE,
    fdate DATE
)
RETURNS BOOLEAN

SELECT tstz_between ( '2011-11-25 15:23:11-08', '2011-11-25', '2011-11-26' );
Checks whether a timestamp with time zone is between two UTC dates, inclusive of the entire ending day.
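For intuition, the semantics can be sketched in Python. This is an illustrative re-implementation, not Socorro code; it assumes `tstz` is a timezone-aware datetime:

```python
from datetime import datetime, date, timedelta, timezone

def tstz_between(tstz, bdate, fdate):
    # True if tstz falls in [bdate 00:00 UTC, fdate+1 00:00 UTC),
    # i.e. inclusive of the entire ending UTC day.
    start = datetime(bdate.year, bdate.month, bdate.day, tzinfo=timezone.utc)
    end_day = fdate + timedelta(days=1)
    end = datetime(end_day.year, end_day.month, end_day.day, tzinfo=timezone.utc)
    return start <= tstz < end

# 15:23:11 Pacific (-08) is 23:23:11 UTC, still on Nov 25
ts = datetime(2011, 11, 25, 15, 23, 11, tzinfo=timezone(timedelta(hours=-8)))
print(tstz_between(ts, date(2011, 11, 25), date(2011, 11, 26)))  # True
```

The half-open upper bound is what makes the range "inclusive of the entire ending day" while excluding the first instant of the day after.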
13.23.2 utc_day_is
utc_day_is (
    TIMESTAMPTZ,
    TIMESTAMP or DATE
)
RETURNS BOOLEAN
SELECT utc_day_is ( '2011-11-26 15:23:11-08', '2011-11-28' );
Checks whether the provided timestamp with time zone is within the provided UTC day, expressed as either a timestamp without time zone or a date.
13.23.3 utc_day_near
utc_day_near (
    TIMESTAMPTZ,
    TIMESTAMP or DATE
)
RETURNS BOOLEAN

SELECT utc_day_near ( '2011-11-26 15:23:11-08', '2011-11-28' );
Checks whether the provided timestamp with time zone is within an hour of the provided UTC day, expressed as either a timestamp without time zone or a date. Used for matching when related records may cross over midnight.
13.23.4 week_begins_utc
week_begins_utc (TIMESTAMP or DATE)
RETURNS timestamptz
SELECT week_begins_utc ( '2011-11-25' );
Given a timestamp or date, returns the timestamp with time zone corresponding to the beginning of the week in UTC time. Used for partitioning data by week.
13.23.5 week_ends_utc
week_ends_utc (TIMESTAMP or DATE)
RETURNS timestamptz
SELECT week_ends_utc ( '2011-11-25' );
Given a timestamp or date, returns the timestamp with time zone corresponding to the end of the week in UTC time. Used for partitioning data by week.
13.23.6 week_begins_partition
week_begins_partition (partname TEXT)
RETURNS timestamptz
SELECT week_begins_partition ( 'reports_20111219' );
Given a partition table name, returns a timestamptz of the date and time that weekly partition starts.
13.23.7 week_ends_partition
week_ends_partition (partname TEXT)
RETURNS timestamptz
SELECT week_ends_partition ( 'reports_20111219' );
Given a partition table name, returns a timestamptz of the date and time that weekly partition ends.
13.23.8 week_begins_partition_string
week_begins_partition_string (partname TEXT)
RETURNS text
SELECT week_begins_partition_string ( 'reports_20111219' );
Given a partition table name, returns a string of the date and time that weekly partition starts in the format ‘YYYY-MM-DD HR:MI:SS UTC’.
13.23.9 week_ends_partition_string
week_ends_partition_string (partname TEXT)
RETURNS text
SELECT week_ends_partition_string ( 'reports_20111219' );
Given a partition table name, returns a string of the date and time that weekly partition ends in the format ‘YYYY-MM-DD HR:MI:SS UTC’.
13.24 Database Misc Function Reference
What follows is a listing of custom functions written for Socorro in the PostgreSQL database which are useful for application development, but do not fit in the "Admin" or "Datetime" categories.
13.25 Formatting Functions
13.25.1 build_numeric
build_numeric (
    build TEXT
)
RETURNS NUMERIC

SELECT build_numeric ( '20110811165603' );
Converts a build ID string, as supplied by the processors/breakpad, into a numeric value on which we can do computations and derive a date. Returns NULL if the build string is a non-numeric value and thus corrupted.
13.25.2 build_date
build_date (
    buildid NUMERIC
)
RETURNS DATE
SELECT build_date ( 20110811165603 );
Takes a numeric build_id and returns the date of the build.
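For illustration, the same conversions can be sketched in Python. These are hypothetical helpers mirroring the SQL functions, not part of Socorro; build IDs are 14-digit YYYYMMDDHHMMSS strings as in the examples above:

```python
from datetime import date

def build_numeric(build):
    # Mirror of the SQL build_numeric: numeric value, or None if corrupted.
    return int(build) if build.isdigit() else None

def build_date(buildid):
    # Mirror of the SQL build_date: the leading YYYYMMDD digits as a date.
    s = str(buildid)
    return date(int(s[0:4]), int(s[4:6]), int(s[6:8]))

print(build_date(build_numeric('20110811165603')))  # 2011-08-11
```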
13.26 API Functions
These functions support the middleware, making it easier to look up certain things in the database.
13.26.1 get_product_version_ids
get_product_version_ids (
    product CITEXT,
    versions VARIADIC CITEXT
)

SELECT get_product_version_ids ( 'Firefox', '11.0a1' );
SELECT get_product_version_ids ( 'Firefox', '11.0a1', '11.0a2', '11.0b1' );
Takes a product name and a list of version_strings, and returns an array (list) of surrogate keys (product_version_ids) which can then be used in queries like:
SELECT * FROM reports_clean
WHERE date_processed BETWEEN '2012-03-21' AND '2012-03-28'
AND product_version_id = ANY ( $list );
13.27 Populate PostgreSQL
Socorro supports multiple products, each of which may contain multiple versions.
• A product is a global product name, such as Firefox, Thunderbird, Fennec, etc.
• A version is a revision of a particular product, such as Firefox 3.6.6 or Firefox 3.6.5
• A branch is the indicator for the Gecko platform used in a Mozilla product / version. If your crash reporting project does not have a need for branch support, just enter "1.0" as the branch number for your product / version.
13.27.1 Customize CSV files
Socorro comes with a set of CSV files you can customize and use to bootstrap your database.
Shut down all Socorro services, drop your database (if needed) and load the schema. From inside the Socorro checkout, as the postgres user:
./socorro/external/postgresql/setupdb_app.py --database_name=breakpad_rw
Customize the CSVs; at minimum you probably need to bump the dates and build IDs in: raw_adu.csv, reports.csv, releases_raw.csv
You will probably want to change “WaterWolf” to your own product name and version history, if you are setting thisup for production.
Also, note that the backfill procedure will ignore build IDs over 30 days old.
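Since stale build IDs are skipped, one way to bump them is to generate current ones. Here is a hypothetical Python sketch (not part of the dataload tools) using the 14-digit YYYYMMDDHHMMSS build ID format described under build_numeric:

```python
from datetime import datetime, timedelta, timezone

def fresh_build_id(days_ago=0):
    # A build ID string (YYYYMMDDHHMMSS) for `days_ago` days before now, UTC.
    when = datetime.now(timezone.utc) - timedelta(days=days_ago)
    return when.strftime('%Y%m%d%H%M%S')

print(fresh_build_id())    # recent enough to be picked up by the backfill
print(fresh_build_id(31))  # more than 30 days old -- the backfill would skip it
```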
From inside the Socorro checkout, as the postgres user:
cd tools/dataload
edit *.csv
./import.sh
See PostgreSQL Database Tables by Data Source for a complete explanation of each table.
13.27.2 Run backfill function to populate matviews
Socorro depends upon materialized views which run nightly, to display graphs and show reports such as "Top Crash By Signature".
IMPORTANT NOTE - many reports use the reports_clean_done() stored procedure to check that reports exist for the last UTC hour of the day being processed, as a way to catch problems. If your crash volume is low enough, you may want to modify this function (it is in breakpad_schema.sql referenced above).
Normally this is run for the previous day by cron_daily_matviews.sh, but you can simply run the backfill function to bootstrap the system.
This is normally run by the import.sh, so take a look in there if you need to make adjustments.
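For example, bootstrapping a couple of weeks of matview data might look like this (the dates are illustrative; see backfill_matviews in the MatView Functions section):

```sql
SELECT backfill_matviews('2012-01-01', '2012-01-14');
```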
There also needs to be at least one featured version, which is controlled by setting “featured_version” column to “true”for one or more rows in the product_version table.
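For example, a version could be marked featured with an UPDATE along these lines. The WHERE-clause column names here are assumptions for illustration; adjust them to match your schema and the product data you loaded:

```sql
-- hypothetical example: adapt the filter columns to your data
UPDATE product_versions
   SET featured_version = true
 WHERE product_name = 'WaterWolf'
   AND version_string = '1.0';
```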
Restart memcached as the root user:
/etc/init.d/memcached restart
Now the Socorro UI should work.
You can change settings using the admin UI, which will be at http://crash-stats/admin (or the equivalent hostname for your install).
13.27.3 Load data via snapshot
If you have access to an existing Socorro database snapshot, you can load it like so:
# shut down database users
sudo /etc/init.d/supervisor force-stop
sudo /etc/init.d/apache2 stop

# drop old db and load snapshot
sudo su - postgres
dropdb breakpad
createdb -E 'utf8' -l 'en_US.utf8' -T template0 breakpad
pg_restore -Fc -d breakpad minidb.dump
This may take several hours, depending on your hardware. One way to speed this up would be to:
• If in a VirtualBox environment, add more CPU cores to the VM (via the VirtualBox GUI); the default is 1
• Add "-j n" to the pg_restore command above, where n is the number of CPU cores - 1
CHAPTER 14
How generic app and an example works using configman
14.1 The minimum app
To illustrate, let's look at an example of an app that uses generic_app to leverage configman to run: weeklyReportsPartitions.py
As you can see, it's a subclass of the socorro.app.generic_app.App class, which is a the-least-you-need wrapper for a minimal app. It takes care of logging and executing your main function.
14.2 Connecting and handling transactions
Let’s go back to the weeklyReportsPartitions.py cron script and take a look at what it does.
It only really has one configman option and that's the transaction_executor_class. The default value is TransactionExecutorWithBackoff, which is the class that's going to take care of two things:
1. execute a callable that accepts an opened database connection as first and only parameter
2. committing the transaction if there are no errors and rolling back the transaction if an exception is raised
3. NB: if an OperationalError or InterfaceError exception is raised, TransactionExecutorWithBackoff will log that and retry after a configurable delay
Note that TransactionExecutorWithBackoff is the default transaction_executor_class, but if you override it, for example on the command line, with TransactionExecutor, no exceptions are swallowed and it doesn't retry.
Now, connections are created and closed by the ConnectionContext class. As you might have noticed, the default database_class defined in the TransactionExecutor is socorro.external.postgresql.connection_context.ConnectionContext, as you can see here.
The idea is that any external module (e.g. HBase, PostgreSQL, etc.) can define a ConnectionContext class as per this model. Its job is to create and close connections, and it has to do so in a contextmanager. What that means is that you can do this:
connector = ConnectionContext()
with connector() as connection:  # opens a connection
    do_something(connection)
# closes the connection
And if errors are raised within the do_something function it doesn’t matter. The connection will be closed.
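That guarantee can be seen in a toy sketch of such a connection context (this is not Socorro's actual implementation; the FakeConnection stands in for a real database connection):

```python
from contextlib import contextmanager

class ConnectionContext(object):
    # Toy version of the connection-context idea: hand out a connection
    # inside a context manager and guarantee close() runs, even on error.
    def __init__(self, connection_factory):
        self.connection_factory = connection_factory

    @contextmanager
    def __call__(self):
        connection = self.connection_factory()
        try:
            yield connection
        finally:
            connection.close()

class FakeConnection(object):
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

opened = []
connector = ConnectionContext(FakeConnection)
try:
    with connector() as connection:
        opened.append(connection)
        raise RuntimeError('boom')  # simulate do_something() failing
except RuntimeError:
    pass
print(opened[0].closed)  # True -- closed despite the exception
```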
14.3 What was the point of that?!
For one thing, this app being a configman derived app means that all configuration settings are as flexible as configman is. You can supply different values for any of the options either by the command line (try running --help on the ./weeklyReportsPartitions.py script) and you can control them with various configuration files as per your liking.
The other thing to notice is that when writing another similar cron script, all you need to do is to worry about exactly what to execute and let the framework take care of transactions and opening and closing connections. Each class is supposed to do one job and one job only.
configman uses not only basic options such as database_password but also more complex options such as aggregators. These are basically invariant options that depend on each other and use functions to assemble their values.
CHAPTER 15
Writing documentation
To contribute to the documentation, follow these steps to modify the git repo, build a local copy, and deploy it on ReadTheDocs.org.
15.1 Installing Sphinx
Sphinx is an external tool that compiles these reStructuredText files into HTML. Since it's a Python tool you can install it with easy_install or pip like this:
pip install sphinx
15.2 Making the HTML
Now you can build the docs with this simple command:
cd docs
make html
This should update the relevant HTML files in socorro/docs/_build and you can preview it locally like this (on OS X, for example):
open _build/html/index.html
To modify the index itself, edit index.rst (for instance, you may want to add or remove a document filename, without the .rst extension, from the ".. toctree::" section).
15.3 Making it appear on ReadTheDocs
ReadTheDocs.org is wired to build the documentation nightly from this git repository, but if you want to make documentation changes appear immediately you can use their webhooks to re-create the build and update the documentation right away.
15.4 Or, just send the pull request
If you have a relevant update to the documentation but don't have time to set up your Sphinx and git environment, you can just edit these files in raw mode and send in a pull request.
15.5 Or, just edit the documentation online
The simplest way to edit the documentation is to just edit it inside the GitHub editor. To get started, go to https://github.com/mozilla/socorro and browse in the docs directory to find the file you want to edit.
Then click the “Edit this file” button in the upper right-hand corner and type away.
When you’re done, write a comment underneath and click “Commit Changes”.
If you are unsure about how to edit reStructuredText and don't want to trial-and-error your way through the editing, then one thing you can do is to copy the text into an online reStructuredText editor and see if you get the syntax right. Obviously you'll receive warnings and errors about broken internal references, but at least you'll know if the syntax is correct.