Socorro Documentation
Release 2
Mozilla
June 18, 2014
Contents
1 Overview 3
   1.1 Socorro Server 3
   1.2 Socorro UI 3
   1.3 Data Flow 3

2 Installation 5
   2.1 Socorro VM (built with Vagrant + Puppet) 5
   2.2 Automated Install using Puppet 5
   2.3 Manual Install 5

3 Collector 11
   3.1 Collector Python Configuration 11
   3.2 Common Configuration 11
   3.3 Collector Configuration 11

4 Processor 13
   4.1 Introduction 13

5 Middleware API 15
   5.1 API map 15
   5.2 Bugs 16
   5.3 Crashes Comments 17
   5.4 Crashes Frequency 19
   5.5 Crashes Paireduuid 21
   5.6 Crashes Signatures 22
   5.7 Extensions 23
   5.8 Crash Trends 24
   5.9 Job 25
   5.10 Priorityjobs 26
   5.11 Products 26
   5.12 Products Builds 27
   5.13 Signature URLs 29
   5.14 Search 30
   5.15 List Report 33
   5.16 Versions Info 36
   5.17 Forcing an implementation 37

6 Socorro UI 39
   6.1 Coding Standards 39
   6.2 Adding new reports 39

7 UI Installation 43
   7.1 Installation 43
   7.2 Trouble Shooting 45

8 Server 47
   8.1 The Applications 47

9 crontabber 49
   9.1 crontab runs crontabber 49
   9.2 Dependencies 49
   9.3 Own configurations 50
   9.4 App names versus/or class names 51
   9.5 Manual intervention 51
   9.6 Frequency and execution time 52
   9.7 Timezone and UTC 52
   9.8 Writing cron apps (aka. jobs) 52

10 Throttling 55
   10.1 throttleConditions 55

11 Deployment 57
   11.1 Introduction 57
   11.2 Outage Page 57

12 Development Discussions 59
   12.1 Coding Conventions 59
   12.2 New Developer Guide 59
   12.3 Glossary 70
   12.4 Standalone Development Environment 86
   12.5 Unit Testing 87
   12.6 Crash Repro Filtering Report 88
   12.7 Disk Performance Tests 89
   12.8 Dumping Dump Tables 91
   12.9 JSON Dump Storage 93
   12.10 Processed Dump Storage 96
   12.11 Report Database Design 97
   12.12 Code and Database Update 99
   12.13 Out-of-Date Data Warning 106
   12.14 Database Schema 107
   12.15 Package 113
   12.16 Schema 113
   12.17 Tables used primarily when processing Jobs 113
   12.18 Tables primarily used during data extraction 115
   12.19 Tables primarily used for materialized views 116
   12.20 Dimensions tables 116
   12.21 View tables 117
   12.22 Bug tracking 119
   12.23 Meta data 120
   12.24 Database Setup 120
   12.25 Common Config 121
   12.26 Populate ElasticSearch 124

13 PostgreSQL Database 127
   13.1 PostgreSQL Database Tables by Data Source 127
   13.2 Manually Populated Tables 127
   13.3 Tables Receiving External Data 127
   13.4 Automatically Populated Reference Tables 128
   13.5 Matviews 128
   13.6 Application Management Tables 129
   13.7 Deprecated Tables 129
   13.8 PostgreSQL Database Table Descriptions 130
   13.9 Raw Data Tables 130
   13.10 Normalized Fact Tables 131
   13.11 Dimensions 133
   13.12 Matviews 135
   13.13 Note On Release Channel Columns 137
   13.14 Application Support Tables 137
   13.15 Creating a New Matview 138
   13.16 Do I Want a Matview? 138
   13.17 Components of a Matview 139
   13.18 Creating the Matview Table 139
   13.19 Database Admin Function Reference 142
   13.20 MatView Functions 142
   13.21 Schema Management Functions 146
   13.22 Other Administrative Functions 148
   13.23 Custom Time-Date Functions 148
   13.24 Database Misc Function Reference 150
   13.25 Formatting Functions 150
   13.26 API Functions 151
   13.27 Populate PostgreSQL 151

14 How generic app and an example works using configman 155
   14.1 The minimum app 155
   14.2 Connecting and handling transactions 155
   14.3 What was the point of that?! 156

15 Writing documentation 157
   15.1 Installing Sphinx 157
   15.2 Making the HTML 157
   15.3 Making it appear on ReadTheDocs 157
   15.4 Or, just send the pull request 157
   15.5 Or, just edit the documentation online 158

16 Indices and tables 159
Socorro Documentation, Release 2
The current focus of Socorro development is to make a server which can accept crash reports from Firefox. See http://wiki.mozilla.org/Breakpad for more information.

Socorro mailing list: https://lists.mozilla.org/listinfo/tools-socorro

This documentation is available on GitHub; if you want to, feel free to clone the repo, make some changes in a fork and send us a pull request.
Contents:
CHAPTER 1
Overview
The Socorro Crash Reporting system consists of two pieces, the Socorro Server and the Socorro UI.
1.1 Socorro Server
The Socorro Server is a Python API and a collection of applications and web services that use the API. Together, the applications embody a set of servers that take crash dumps generated by remote clients, process them using the breakpad_stackdump application, and save the results in HBase. Additional processes aggregate and filter data for storage in a relational database.
The server consists of these components:
• Collector
• Hadoop/HBase
• Processor
• SocorroRegistrar
• SocorroWebServices
1.2 Socorro UI
Socorro UI is a Web application to access and analyze the database contents via search and generated reports.
1.3 Data Flow
Crash dumps are accepted by the Collector, a mod_wsgi application running under Apache. The Collector stores the crashes into HBase.

Using Hadoop jobs, the crash dumps in HBase are converted into searchable JSON files by the Processor.

The Processors are also long-running applications that live on Hadoop processing nodes. They accept tasks from map-reduce jobs and employ stackwalk_server to convert crashes into JSON files stored back into HBase, filtering the converted crashes using the throttling rules initially applied by the Collector.

The Socorro UI allows developers to browse the crash information from the relational database. In addition to being able to examine specific individual crash reports, there are trend reports that show which crashes are the most common, as well as the status of bugs about those crashes in Bugzilla.
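The collector end of this flow can be sketched in a few lines of Python. This is an illustration only: the field names, the in-memory store, and the use of a plain UUID for the crash id are assumptions, not Socorro's actual implementation.

```python
import json
import uuid

def accept_crash(form_fields, dump_bytes, storage):
    """Sketch of what a collector does with one submitted crash:
    assign an id, keep the form metadata as JSON, keep the dump as-is."""
    crash_id = str(uuid.uuid4())  # stand-in for Socorro's real id scheme
    storage[crash_id + ".json"] = json.dumps(form_fields)
    storage[crash_id + ".dump"] = dump_bytes
    return crash_id

# In-memory stand-in for the local file system / HBase.
store = {}
crash_id = accept_crash({"ProductName": "Firefox", "Version": "4.0.1"},
                        b"\x00minidump-bytes", store)
```

After this step, a mover process would pick the pair of files up and ship them to long-term storage.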
Next Steps:
Installation
CHAPTER 2
Installation
2.1 Socorro VM (built with Vagrant + Puppet)
You can build a standalone Socorro development VM - see Setup a development environment for more info.
The config files and puppet manifests in ./puppet/ are a useful reference when setting up Socorro for the first time.
2.2 Automated Install using Puppet
It is possible to use Puppet to script an install onto an existing environment. This has been tested in EC2 but should work on any regular Ubuntu Lucid install.
See puppet/bootstrap.sh for an example.
2.3 Manual Install
2.3.1 Requirements
Breakpad client and symbols
Socorro aggregates and reports on Breakpad crashes. Read more about getting started with Breakpad. You will need to produce symbols for your application and make these files available to Socorro.
• Linux (tested on Ubuntu Lucid and RHEL/CentOS 6)
• HBase (Cloudera CDH3)
• PostgreSQL 9.0
• Python 2.6
2.3.2 Ubuntu
1. Add PostgreSQL 9.0 PPA from https://launchpad.net/~pitti/+archive/postgresql
2. Add Cloudera apt source from https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation#CDH3Installation-InstallingCDH3onUbuntuSystems
3. Install dependencies using apt-get
As root:
apt-get install supervisor rsyslog libcurl4-openssl-dev build-essential \
  sun-java6-jdk ant python-software-properties subversion libpq-dev \
  python-virtualenv python-dev libcrypt-ssleay-perl phpunit php5-tidy \
  python-psycopg2 python-simplejson apache2 libapache2-mod-wsgi memcached \
  php5-pgsql php5-curl php5-dev php-pear php5-common php5-cli php5-memcache \
  php5 php5-gd php5-mysql php5-ldap hadoop-hbase hadoop-hbase-master \
  hadoop-hbase-thrift curl liblzo2-dev postgresql-9.0 postgresql-plperl-9.0 \
  postgresql-contrib
2.3.3 RHEL/Centos
Use the “text install” option and choose “minimal” as the install type.
1. Add Cloudera yum repo from https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation#CDH3Installation-InstallingCDH3onRedHatSystems
2. Add PostgreSQL 9.0 yum repo from http://www.postgresql.org/download/linux#yum
3. Install Sun Java JDK version JDK 6u16 - Download appropriate package fromhttp://www.oracle.com/technetwork/java/javase/downloads/index.html
4. Install dependencies using YUM:
As root:
yum install python-psycopg2 simplejson httpd mod_ssl mod_wsgi \
  postgresql-server postgresql-plperl perl-pgsql_perl5 postgresql-contrib \
  subversion make rsync php-pecl-memcache memcached php-pgsql gcc-c++ \
  curl-devel ant python-virtualenv php-phpunit-PHPUnit hadoop-0.20 \
  hadoop-hbase daemonize
5. Disable SELinux
As root: Edit /etc/sysconfig/selinux and set “SELINUX=disabled”
6. Reboot
As root:
shutdown -r now
2.3.4 Download and install Socorro
Determine latest release tag from https://wiki.mozilla.org/Socorro:Releases#Previous_Releases
Clone from github, as the socorro user:
git clone https://github.com/mozilla/socorro
cd socorro
git checkout LATEST_RELEASE_TAG_GOES_HERE
cp scripts/config/commonconfig.py.dist scripts/config/commonconfig.py
Edit scripts/config/commonconfig.py
From inside the Socorro checkout, as the socorro user, change:
databaseName.default = 'breakpad'
databaseUserName.default = 'breakpad_rw'
databasePassword.default = 'aPassword'
If you change the password, make sure to change it in sql/roles.sql as well.
2.3.5 Run unit/functional tests, and generate report
From inside the Socorro checkout, as the socorro user:
make coverage
2.3.6 Set up directories and permissions
As root:
mkdir /etc/socorro
mkdir /var/log/socorro
mkdir -p /data/socorro
useradd socorro
chown socorro:socorro /var/log/socorro
mkdir /home/socorro/primaryCrashStore /home/socorro/fallback
chown apache /home/socorro/primaryCrashStore /home/socorro/fallback
chmod 2775 /home/socorro/primaryCrashStore /home/socorro/fallback
Note - use www-data instead of apache for debian/ubuntu
Compile minidump_stackwalk
From inside the Socorro checkout, as the socorro user:
make minidump_stackwalk
2.3.7 Install socorro
From inside the Socorro checkout, as the socorro user:
make install
By default, this installs files to /data/socorro. You can change this by specifying the PREFIX:
make install PREFIX=/usr/local/socorro
2.3.8 How Socorro Works
There are two main parts to Socorro:
1. a pipeline that collects, processes, and allows real-time searches and results for individual crash reports

This requires both HBase and PostgreSQL, as well as the Collector, Crashmover, Monitor, Processor, Middleware and UI.

Individual crash reports are pulled from long-term storage (HBase) using the /report/index/ page, for example: http://crash-stats/report/index/YOUR_CRASH_ID_GOES_HERE
The search feature is at: http://crash-stats/query
2. a set of batch jobs which compiles aggregate reports and graphs, such as “Top Crashes by Signature”
This requires PostgreSQL, the Middleware and the UI. It is triggered once per day by the “daily_matviews” cron job, covering data processed in the previous UTC day.
Every other page on http://crash-stats is of this type.
2.3.9 Crash Flow
The basic flow of an incoming crash is:
(breakpad client) -> (collector) -> (local file system) -> (newCrashMover.py) -> (hbase)
A single machine will need to run the Monitor service, which watches HBase for incoming crashes and queues them up for the Processor service (which can run on one or more servers). Monitor and Processor use PostgreSQL to coordinate.

Finally, processed jobs are inserted into both HBase and PostgreSQL.
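The monitor/processor hand-off above can be sketched with an in-process queue standing in for the PostgreSQL job table. The names here are illustrative, not Socorro's API:

```python
import queue

job_queue = queue.Queue()  # stands in for the jobs table in PostgreSQL

def monitor(new_crash_ids):
    """Monitor: watch for incoming crashes and queue them for processing."""
    for crash_id in new_crash_ids:
        job_queue.put(crash_id)

def processor():
    """Processor: drain queued jobs, returning the ids it 'processed'."""
    processed = []
    while not job_queue.empty():
        processed.append(job_queue.get())
    return processed

monitor(["crash-1", "crash-2"])
done = processor()
```

In the real system the queue lives in the database, so several processor machines can share it safely.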
2.3.10 Configure Socorro
These pages show how to start the services manually; please also see the next section, “Install startup scripts”:
• Start configuration with Common Config
• On the machine(s) to run collector, setup Collector
• On the machine(s) to run collector, setup Crash Mover
• On the machine to run monitor, setup Monitor
• On same machine that runs monitor, setup Deferred Cleanup
• On the machine(s) to run processor, setup Processor
2.3.11 Install startup scripts
RHEL/CentOS only (Ubuntu TODO - see ./puppet/files/etc_supervisor for supervisord example)
As root:
ln -s /data/socorro/application/scripts/init.d/socorro-{monitor,processor,crashmover} /etc/init.d/
chkconfig socorro-monitor on
chkconfig socorro-processor on
chkconfig socorro-crashmover on
service httpd restart
chkconfig httpd on
service memcached restart
chkconfig memcached on
2.3.12 Install Socorro cron jobs
As root:
ln -s /data/socorro/application/scripts/crons/socorrorc /etc/socorro/
crontab /data/socorro/application/scripts/crons/example.crontab
2.3.13 PostgreSQL Config
RHEL/CentOS - Initialize and enable on startup (not needed for Ubuntu)
As root:
service postgresql initdb
service postgresql start
chkconfig postgresql on
As root:
• edit /var/lib/pgsql/data/pg_hba.conf and change IPv4/IPv6 connection from “ident” to “md5”
• edit /var/lib/pgsql/data/postgresql.conf and:
– uncomment # listen_addresses = 'localhost'

– change TimeZone to 'UTC'

• edit other postgresql.conf parameters per www.postgresql.org community guides
2.3.14 Populate PostgreSQL Database
Refer to Populate PostgreSQL for information about loading the schema and populating the database.
This step is required to get basic information about existing product names and versions into the system.
2.3.15 Configure Apache
As root:
cp config/socorro.conf /etc/httpd/conf.d/socorro.conf
edit /etc/httpd/conf.d/socorro.conf
mkdir /var/log/httpd/{crash-stats,crash-reports,socorro-api}.example.com
chown apache /data/socorro/htdocs/application/logs/
Note - use www-data instead of apache for debian/ubuntu
2.3.16 Enable PHP short_open_tag
As root:
edit /etc/php.ini and make the following changes:
short_open_tag = On
date.timezone = 'America/Los_Angeles'
2.3.17 Configure Kohana (PHP/web UI)
Refer to UI Installation (deprecated as of 2.2, new docs TODO)
2.3.18 Hadoop+HBase install
Configure Hadoop 0.20 + HBase 0.89 Refer to https://ccp.cloudera.com/display/CDHDOC/HBase+Installation
Note - you can start with a standalone setup, but read all of the above for info on a real, distributed setup!
RHEL/CentOS only (not needed for Ubuntu) Install startup scripts
As root:
service hadoop-hbase-master start
chkconfig hadoop-hbase-master on
service hadoop-hbase-thrift start
chkconfig hadoop-hbase-thrift on
2.3.19 Load Hbase schema
FIXME: this skips LZO support; remove the “sed” command if you have it installed.
From inside the Socorro checkout, as the socorro user:
cat analysis/hbase_schema | sed 's/LZO/NONE/g' | hbase shell
2.3.20 System Test
Generate a test crash:
1. Install http://code.google.com/p/crashme/ add-on for Firefox
2. Point your Firefox install at http://crash-reports/submit
See: https://developer.mozilla.org/en/Environment_variables_affecting_crash_reporting
If you already have a crash available and wish to submit it, you can use the standalone submitter tool:
From inside the Socorro checkout, as the socorro user:
virtualenv socorro-virtualenv
. socorro-virtualenv/bin/activate
pip install poster
cp scripts/config/submitterconfig.py.dist scripts/config/submitterconfig.py
export PYTHONPATH=.:thirdparty
python scripts/submitter.py -u http://crash-reports/submit -j ~/Downloads/crash.json -d ~/Downloads/crash.dump
You should get a “CrashID” returned. Check the syslog logs (user.* facility); you should see the returned CrashID being collected.
Attempt to pull up the newly inserted crash: http://crash-stats/report/index/YOUR_CRASH_ID_GOES_HERE
The (syslog “user” facility) logs should show this new crash being inserted for priority processing, and it should be available shortly thereafter.
CHAPTER 3
Collector
Collector is an application that runs under Apache using mod-python. Its task is accepting crash reports from remote clients and saving them in a place and format usable by further applications.

Raw crashes are accepted via HTTP POST. The form data from the POST is then arranged into JSON and saved into the local file system. The collector is responsible for assigning an ooid (Our Own ID) to the crash. It also assigns a Throttle value, which determines whether the crash eventually goes into the relational database.

Should saving to the local file system fail, there is a fallback storage mechanism: a second file system can be configured to take the failed saves. This file system would likely be an NFS-mounted file system.

After a crash is saved, an app called Crash Mover transfers the crashes to HBase.
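The throttle decision can be pictured as a deterministic sampling rule. The sketch below is an assumption for illustration (hashing the crash id and keeping a percentage); Socorro's real throttle rules are configurable and richer than this:

```python
import zlib

def throttle_accept(crash_id, sample_percent=10):
    """Illustrative throttle decision: deterministically keep roughly
    sample_percent of crashes for the relational database, based on a
    CRC32 hash of the crash id. Not Socorro's actual rule set."""
    bucket = zlib.crc32(crash_id.encode("utf-8")) % 100
    return bucket < sample_percent
```

Being deterministic means the same crash id always gets the same answer, which keeps retries and reprocessing consistent.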
3.1 Collector Python Configuration
Like all the Socorro applications, the configuration is actually executable Python code. Two configuration files are relevant for the collector:

• Copy .../scripts/config/commonconfig.py.dist to .../config/commonconfig.py. This configuration file contains constants used by many of the Socorro applications.

• Copy .../scripts/config/collectorconfig.py.dist to .../config/collectorconfig.py
3.2 Common Configuration
There are two constants in .../scripts/config/commonconfig.py of interest to the collector: jsonFileSuffix and dumpFileSuffix. Other constants in this file are ignored.
To setup the common configuration, see Common Config.
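For orientation, the two constants look roughly like this in a commonconfig.py. The Option class below is a minimal stand-in, and the suffix values are typical assumptions; check the .dist file shipped with your release for the real definitions:

```python
class Option:
    """Minimal stand-in for the option objects Socorro config files define."""
    doc = None
    default = None

# The two constants the collector reads from commonconfig.py; the
# values shown are typical defaults, not guaranteed ones.
jsonFileSuffix = Option()
jsonFileSuffix.doc = 'the suffix used to identify a JSON file'
jsonFileSuffix.default = '.json'

dumpFileSuffix = Option()
dumpFileSuffix.doc = 'the suffix used to identify a dump file'
dumpFileSuffix.default = '.dump'
```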
3.3 Collector Configuration
collectorconfig.py has several options to adjust how files are stored:
See sample config code on Github
CHAPTER 4
Processor
4.1 Introduction
Socorro Processor is a multithreaded application that applies JSON/dump pairs to the stackwalk_server application, parses the output, and records the results in HBase. The processor, coupled with stackwalk_server, is computationally intensive. Multiple instances of the processor can be run simultaneously on different machines.
See sample config code on Github
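The multithreaded structure can be sketched as a worker pool mapping a stackwalk step over JSON/dump pairs. The stackwalk function here is a stand-in, not the real stackwalk_server invocation:

```python
from concurrent.futures import ThreadPoolExecutor

def stackwalk(pair):
    """Stand-in for running stackwalk_server on one JSON/dump pair and
    parsing its output into a signature."""
    crash_id, dump = pair
    return crash_id, "signature-for-" + dump

# Each pair would really be a JSON metadata file plus a minidump.
pairs = [("c1", "dumpA"), ("c2", "dumpB")]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(stackwalk, pairs))
```

Because each pair is independent, throughput scales by adding threads or whole processor machines, as the text notes.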
CHAPTER 5
Middleware API
5.1 API map
5.1.1 New-style, documented services
• /bugs/
• /crashes/
– /crashes/comments
– /crashes/frequency
– /crashes/paireduuid
– /crashes/signatures
• /extensions/

• /crashtrends/

• /job/

• /priorityjobs/

• /products/

– /products/builds/

– /products/versions/

• /report/

– /report/list/

• /signatureurls/

• /search/

– /search/crashes/

– /search/signatures/

• /util/

– /util/versions_info/
5.1.2 Old-style, undocumented services
See source code in .../socorro/services/ for more details.
• /adu/byday
• /adu/byday/details
• /bugs/by/signatures
• /crash
• /current/versions
• /emailcampaigns/campaign
• /emailcampaigns/campaigns/page
• /emailcampaigns/create
• /emailcampaigns/subscription
• /emailcampaigns/volume
• /reports/hang
• /schedule/priority/job
• /topcrash/sig/trend/history
• /topcrash/sig/trend/rank
5.2 Bugs
Return a list of associations between signatures and bug ids.
5.2.1 API specifications
HTTP method: POST
URL schema: /bugs/
Full URL: /bugs/
Example: http://socorro-api/bpapi/bugs/ with POST data signatures=mysignature+anothersig+jsCrashSig
5.2.2 Mandatory parameters
signatures (List of strings; default None)
    Signatures of bugs to get.
5.2.3 Optional parameters
None.
5.2.4 Return value
In normal cases, return something like this:
{
    "hits": [
        {
            "id": "789012",
            "signature": "mysignature"
        },
        {
            "id": "405060",
            "signature": "anothersig"
        }
    ],
    "total": 2
}
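A client for this service can be sketched as below. The host name is the placeholder from the examples above, and the HTTP round-trip is stubbed with a canned response in the documented shape, so only the request/response handling is shown:

```python
import json
from urllib.parse import urlencode

def build_bugs_request(signatures, base_url="http://socorro-api/bpapi"):
    """Prepare the POST for the /bugs/ service: target URL plus form body."""
    return base_url + "/bugs/", urlencode({"signatures": "+".join(signatures)})

def parse_bugs_response(body):
    """Map each signature in the documented response shape to its bug ids."""
    mapping = {}
    for hit in json.loads(body)["hits"]:
        mapping.setdefault(hit["signature"], []).append(hit["id"])
    return mapping

url, form_body = build_bugs_request(["mysignature", "anothersig"])
# Canned response standing in for a real server:
canned = json.dumps({"hits": [{"id": "789012", "signature": "mysignature"},
                              {"id": "405060", "signature": "anothersig"}],
                     "total": 2})
bugs = parse_bugs_response(canned)
```

Note that a signature may be associated with several bugs, so the parser collects bug ids into lists.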
5.3 Crashes Comments
Return a list of comments on crash reports, filtered by signatures and other fields.
5.3.1 API specifications
HTTP method: GET
URL schema: /crashes/comments/(parameters)
Full URL: /crashes/comments/signature/(signature)/products/(products)/from/(from_date)/to/(to_date)/versions/(versions)/os/(os_name)/branches/(branches)/reasons/(crash_reason)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/
Example: http://socorro-api/bpapi/crashes/comments/signature/SocketSend/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/
5.3.2 Mandatory parameters
signature (String; default None)
    Signature of crash reports to get.
5.3.3 Optional parameters
products (String or list of strings; default 'Firefox')
    The product we are interested in. (e.g. Firefox, Fennec, Thunderbird...)
from (Date; default Now - 7 days)
    Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
to (Date; default Now)
    Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
versions (String or list of strings; default None)
    Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
os (String or list of strings; default None)
    Restrict to an Operating System. (e.g. Windows, Mac, Linux...) Several versions can be specified, separated by a + symbol.
branches (String or list of strings; default None)
    Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
reasons (String or list of strings; default None)
    Restricts search to crashes caused by this reason.
build_ids (Integer or list of integers; default None)
    Restricts search to crashes that happened on a product with this build ID.
build_from (Integer or list of integers; default None)
    Restricts search to crashes with a build id greater than this.
build_to (Integer or list of integers; default None)
    Restricts search to crashes with a build id lower than this.
report_process (String; default 'any')
    Can be 'any', 'browser' or 'plugin'.
report_type (String; default 'any')
    Can be 'any', 'crash' or 'hang'.
plugin_in (String or list of strings; default 'name')
    Search for a plugin in this field. 'report_process' has to be set to 'plugin'.
plugin_search_mode (String; default 'default')
    How to search for this plugin. report_process has to be set to plugin. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
plugin_terms (String or list of strings; default None)
    Terms to search for. Several terms can be specified, separated by a + symbol. report_process has to be set to plugin.
5.3.4 Return value
In normal cases, return something like this:
{
    "hits": [
        {
            "date_processed": "2011-03-16 06:54:56.385843",
            "uuid": "06a0c9b5-0381-42ce-855a-ccaaa2120116",
            "user_comments": "My firefox is crashing in an awesome way",
            "email": "[email protected]"
        },
        {
            "date_processed": "2011-03-16 06:54:56.385843",
            "uuid": "06a0c9b5-0381-42ce-855a-ccaaa2120116",
            "user_comments": "I <3 Firefox crashes!",
            "email": "[email protected]"
        }
    ],
    "total": 2
}
If no signature is passed as a parameter, return null.
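The URL schema above interleaves parameter names and values as path segments. A small helper to build such URLs might look like this; the function name and the parameter ordering are illustrative, not part of Socorro:

```python
def build_middleware_url(base, service, **params):
    """Build the /name1/value1/name2/value2/ style URLs this middleware
    documents; segment order follows the keyword order given."""
    segments = [service]
    for name, value in params.items():
        segments += [name, str(value)]
    return base + "/" + "/".join(segments) + "/"

url = build_middleware_url("http://socorro-api/bpapi", "crashes/comments",
                           signature="SocketSend", products="Firefox")
```

Real parameter values would need URL-escaping if they can contain slashes or spaces; the documented examples use simple tokens.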
5.4 Crashes Frequency
Return the number and frequency of crashes on each OS.
5.4.1 API specifications
HTTP method: GET
URL schema: /crashes/frequency/(parameters)
Full URL: /crashes/frequency/signature/(signature)/products/(products)/from/(from_date)/to/(to_date)/versions/(versions)/os/(os_name)/branches/(branches)/reasons/(crash_reason)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/
Example: http://socorro-api/bpapi/crashes/frequency/signature/SocketSend/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/
5.4.2 Mandatory parameters
signature (String; default None)
    Signature of crash reports to get.
5.4.3 Optional parameters
- products (String or list of strings; default: 'Firefox'): The product we are interested in (e.g. Firefox, Fennec, Thunderbird...).
- from (Date; default: now - 7 days): Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- to (Date; default: now): Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- versions (String or list of strings; default: None): Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
- os (String or list of strings; default: None): Restrict to an operating system (e.g. Windows, Mac, Linux...). Several can be specified, separated by a + symbol.
- branches (String or list of strings; default: None): Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
- reasons (String or list of strings; default: None): Restricts search to crashes caused by this reason.
- build_ids (Integer or list of integers; default: None): Restricts search to crashes that happened on a product with this build ID.
- build_from (Integer or list of integers; default: None): Restricts search to crashes with a build ID greater than this.
- build_to (Integer or list of integers; default: None): Restricts search to crashes with a build ID lower than this.
- report_process (String; default: 'any'): Can be 'any', 'browser' or 'plugin'.
- report_type (String; default: 'any'): Can be 'any', 'crash' or 'hang'.
- plugin_in (String or list of strings; default: 'name'): Search for a plugin in this field. report_process has to be set to 'plugin'.
- plugin_search_mode (String; default: 'default'): How to search for this plugin. report_process has to be set to 'plugin'. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
- plugin_terms (String or list of strings; default: None): Terms to search for. Several terms can be specified, separated by a + symbol. report_process has to be set to 'plugin'.
5.4.4 Return value
In normal cases, return something like this:
{
    "hits": [
        {
            "count": 167,
            "build_date": "20120129064235",
            "count_mac": 0,
            "frequency_windows": 1,
            "count_windows": 167,
            "frequency": 1,
            "count_linux": 0,
            "total": 167,
            "frequency_linux": 0,
            "frequency_mac": 0
        },
        {
            "count": 1,
            "build_date": "20120129063944",
            "count_mac": 1,
            "frequency_windows": 0,
            "count_windows": 0,
            "frequency": 1,
            "count_linux": 0,
            "total": 1,
            "frequency_linux": 0,
            "frequency_mac": 1
        }
    ],
    "total": 2
}
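A response shaped like the one above is easy to post-process client-side. A small sketch (not part of the API itself) that tallies the per-OS crash counts across all returned builds:

```python
def os_totals(frequency_response):
    # Sum the count_windows / count_mac / count_linux fields over every hit
    # in a /crashes/frequency/ response.
    totals = {"windows": 0, "mac": 0, "linux": 0}
    for hit in frequency_response["hits"]:
        for os_name in totals:
            totals[os_name] += hit.get("count_" + os_name, 0)
    return totals
```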
5.5 Crashes Paireduuid
Return the paired UUID for a given UUID and an optional hang ID.
5.5.1 API specifications
HTTP method: GET
URL schema: /crashes/paireduuid/(optional_parameters)
Full URL: /crashes/paireduuid/uuid/(uuid)/hangid/(hangid)/
Example: http://socorro-api/bpapi/crashes/paireduuid/uuid/e8820616-1462-49b6-9784-e99a32120201/
5.5.2 Mandatory parameters
- uuid (String): Unique identifier of the crash report.
5.5.3 Optional parameters
- hangid (String; default: None): Hang ID of the crash report.
5.5.4 Return value
Return an object like the following:
{
    "hits": [
        {
            "uuid": "e8820616-1462-49b6-9784-e99a32120201"
        }
    ],
    "total": 1
}
Note that if a hangid is passed to the service, it will return at most one result. Omit the hangid to get all paired UUIDs.
5.6 Crashes Signatures
Return top crashers by signatures.
5.6.1 API specifications
HTTP method: GET
URL schema: /crashes/signatures/(optional_parameters)
Full URL: /crashes/signatures/product/(product)/version/(version)/to_from/(to_date)/duration/(number_of_days)/crash_type/(crash_type)/limit/(number_of_results)/os/(operating_system)/
Example: http://socorro-api/bpapi/crashes/signatures/product/Firefox/version/9.0a1/
5.6.2 Mandatory parameters
- product (String): Product for which to get top crashes by signatures.
- version (String): Version of the product for which to get top crashes.
5.6.3 Optional parameters
- crash_type (String; default: all): Type of crashes to get; can be "browser", "plugin", "content" or "all".
- end_date (Date; default: now): Date before which to get top crashes.
- duration (Int; default: one week): Number of hours during which to get crashes.
- os (String; default: None): Limit crashes to only one OS.
- limit (Int; default: 100): Number of results to retrieve.
5.6.4 Return value
Return an object like the following:
{
    "totalPercentage": 0.9999999999999994,
    "end_date": "2011-12-08 00:00:00",
    "start_date": "2011-12-07 17:00:00",
    "crashes": [
        {
            "count": 3,
            "mac_count": 3,
            "changeInRank": 11,
            "currentRank": 0,
            "previousRank": 11,
            "percentOfTotal": 0.142857142857143,
            "win_count": 0,
            "changeInPercentOfTotal": 0.117857142857143,
            "linux_count": 0,
            "hang_count": 0,
            "signature": "objc_msgSend | __CFXNotificationPost",
            "previousPercentOfTotal": 0.025,
            "plugin_count": 0
        }
    ],
    "totalNumberOfCrashes": 1
}
5.7 Extensions
Return a list of extensions associated with a crash’s UUID.
5.7.1 API specifications
HTTP method: GET
URL schema: /extensions/(optional_parameters)
Full URL: /extensions/uuid/(uuid)/date/(crash_date)/
Example: http://socorro-api/bpapi/extensions/uuid/xxxx-xxxx-xxxx/date/2012-02-29T01:23:45+00:00/
5.7.2 Mandatory parameters
- uuid (String; default: None): Unique identifier of the specific crash to get extensions from.
- date (Datetime; default: None): Exact datetime of the crash.
5.7.3 Optional parameters
None
5.7.4 Return value
Return a list of extensions:
{
    "total": 1,
    "hits": [
        {
            "report_id": 1234,
            "date_processed": "2012-02-29T01:23:45+00:00",
            "extension_key": 5678,
            "extension_id": "[email protected]",
            "extension_version": "1.2"
        }
    ]
}
5.8 Crash Trends
Return a list of nightly or aurora crashes that took place between two dates.
5.8.1 API specifications
HTTP method: GET
URL schema: /crashtrends/(optional_parameters)
Full URL: /crashtrends/start_date/(start_date)/end_date/(end_date)/product/(product)/version/(version)
Example: http://socorro-api/bpapi/crashtrends/start_date/2012-03-01/end_date/2012-03-15/product/Firefox/version/13.0a1
5.8.2 Mandatory parameters
- start_date (Datetime; default: None): The earliest date of crashes we wish to evaluate.
- end_date (Datetime; default: None): The latest date of crashes we wish to evaluate.
- product (String; default: None): The product.
- version (String; default: None): The version.
5.8.3 Optional parameters
None
5.8.4 Return value
Return a total of crashes, along with their build date, by build ID:
[
    {
        "build_date": "2012-02-10",
        "version_string": "12.0a2",
        "product_version_id": 856,
        "days_out": 6,
        "report_count": 515,
        "report_date": "2012-02-16",
        "product_name": "Firefox"
    }
]
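The days_out field in each record is the gap between the build date and the report date, which a client can recompute as a sanity check. A minimal sketch:

```python
from datetime import date

def days_out(record):
    # Number of days between a crash-trends record's build_date and
    # report_date, both ISO "YYYY-MM-DD" strings as in the example above.
    build = date.fromisoformat(record["build_date"])
    report = date.fromisoformat(record["report_date"])
    return (report - build).days
```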
5.9 Job
Handle the job queue for crash report processing.
5.9.1 API specifications
HTTP method: GET
URL schema: /job/(parameters)
Full URL: /job/uuid/(uuid)/
Example: http://socorro-api/bpapi/job/uuid/e8820616-1462-49b6-9784-e99a32120201/
5.9.2 Mandatory parameters
- uuid (String; default: None): Unique identifier of the crash report to find.
5.9.3 Optional parameters
None
5.9.4 Return value
With a GET HTTP method, the service will return data in the following form:
{
    "hits": [
        {
            "id": 1,
            "pathname": "",
            "uuid": "e8820616-1462-49b6-9784-e99a32120201",
            "owner": 3,
            "priority": 0,
            "queueddatetime": "2012-02-29T01:23:45+00:00",
            "starteddatetime": "2012-02-29T01:23:45+00:00",
            "completeddatetime": "2012-02-29T01:23:45+00:00",
            "success": true,
            "message": "Hello"
        }
    ],
    "total": 1
}
5.10 Priorityjobs
Handle the priority job queue for crash report processing.
5.10.1 API specifications
HTTP method: GET, POST
URL schema: /priorityjobs/(parameters)
Full GET URL: /priorityjobs/uuid/(uuid)/
GET example: http://socorro-api/bpapi/priorityjobs/uuid/e8820616-1462-49b6-9784-e99a32120201/
POST example: http://socorro-api/bpapi/priorityjobs/, data: uuid=e8820616-1462-49b6-9784-e99a32120201
5.10.2 Mandatory parameters
- uuid (String; default: None): Unique identifier of the crash report to mark.
5.10.3 Optional parameters
None
5.10.4 Return value
With a GET HTTP method, the service will return data in the following form:
{
    "hits": [
        {
            "uuid": "e8820616-1462-49b6-9784-e99a32120201"
        }
    ],
    "total": 1
}
With a POST HTTP method, it will return true if the uuid has been successfully added to the priorityjobs queue, and false if the uuid is already in the queue or if there has been a problem.
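The POST form sends the uuid as ordinary form data. A hedged sketch that builds (but does not send) such a request with the standard library; the host name is hypothetical:

```python
import urllib.request
from urllib.parse import urlencode

def priorityjobs_request(uuid, base="http://socorro-api/bpapi"):
    # Build the POST request that queues a crash report for priority
    # processing. Supplying data= makes urllib issue a POST.
    data = urlencode({"uuid": uuid}).encode("ascii")
    return urllib.request.Request(base + "/priorityjobs/", data=data)
```

Sending it would then be `urllib.request.urlopen(priorityjobs_request(some_uuid))`, whose body is "true" on success and "false" otherwise, per the paragraph above.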
5.11 Products
Return information about product(s) and version(s) depending on the parameters the service is called with.
5.11.1 API specifications
HTTP method: GET
URL schema: /products/(optional_parameters)
Full URL: /products/versions/(versions)
Example: http://socorro-api/bpapi/products/versions/Firefox:9.0a1/
5.11.2 Optional parameters
- versions (String or list of strings; default: None): Several product:version strings can be specified, separated by a + symbol.
5.11.3 Return value
If the service is called with the optional versions parameter, the service returns an object with an array of results labeled as hits and a total:
{
    "hits": [
        {
            "is_featured": boolean,
            "throttle": float,
            "end_date": "string",
            "start_date": "integer",
            "build_type": "string",
            "product": "string",
            "version": "string"
        }
        ...
    ],
    "total": 1
}
If the service is called with no parameters, it returns an object containing a list of products as well as a total, indicating the number of products returned:
{
    "hits": [
        {
            "sort": 1,
            "release_name": "firefox",
            "rapid_release_version": "5.0",
            "product_name": "Firefox"
        },
        ...
    ],
    "total": 6
}
5.12 Products Builds
Query and update information about builds for products.
5.12.1 API specifications
HTTP method: GET, POST
URL schema: /products/builds/(optional_parameters)
Full URL: /products/builds/product/(product)/version/(version)/date_from/(date_from)/
GET example: http://socorro-api/bpapi/products/builds/product/Firefox/version/9.0a1/
POST example: http://socorro-api/bpapi/products/builds/product/Firefox/, data: version=10.0&platform=macosx&build_id=20120416012345&build_type=Beta&beta_number=2&repository=mozilla-central
5.12.2 Mandatory GET parameters
- product (String; default: None): Product for which to get nightly builds.
5.12.3 Optional GET parameters
- version (String; default: None): Version of the product for which to get nightly builds.
- from_date (Date; default: now - 7 days): Date from which to get nightly builds.
5.12.4 GET return value
Return an array of objects:
[
    {
        "product": "string",
        "version": "string",
        "platform": "string",
        "buildid": "integer",
        "build_type": "string",
        "beta_number": "string",
        "repository": "string",
        "date": "string"
    },
    ...
]
5.12.5 Mandatory POST parameters
- product (String; default: None): Product for which to add a build.
- version (String; default: None): Version for the new build, e.g. "10.0".
- platform (String; default: None): Platform for the new build, e.g. "macosx".
- build_id (String; default: None): Build ID for the new build (YYYYMMDD######).
- build_type (String; default: None): Type of build, e.g. "Release", "Beta", "Aurora", etc.
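The documented build_id shape is YYYYMMDD followed by six more digits. A loose client-side check for that shape can be sketched as follows; it is illustrative only and does not verify that the date portion is a real calendar date:

```python
import re

# Fourteen digits total: eight for YYYYMMDD plus six trailing digits.
BUILD_ID_RE = re.compile(r"\A\d{14}\Z")

def is_valid_build_id(build_id):
    # True when build_id matches the documented YYYYMMDD###### shape.
    return BUILD_ID_RE.match(build_id) is not None
```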
5.12.6 Optional POST parameters
- beta_number (String; default: None): Beta number if build_type is "Beta". Mandatory if build_type is "Beta", ignored otherwise.
- repository (String; default: ""): The repository from which this release came.
5.12.7 POST return value
On success, returns a 303 See Other redirect to the newly-added build’s API page at:
/products/builds/product/(product)/version/(version)/
5.13 Signature URLs
Returns a list of URLs for a specific signature, product(s) and version(s) within a given date range, along with the total number of times each URL has been reported for those parameters.
5.13.1 API specifications
HTTP method: GET
URL schema: /signatureurls/(parameters)
Full URL: /signatureurls/signature/(signature)/start_date/(start_date)/end_date/(end_date)/products/(products)/versions/(versions)
Example: http://socorro-api/bpapi/signatureurls/signature/samplesignature/start_date/2012-03-01T00:00:00+00:00/end_date/2012-03-31T00:00:00+00:00/products/Firefox+Fennec/versions/Firefox:4.0.1+Fennec:13.0/
5.13.2 Mandatory parameters
- signature (String; default: None): The signature for which URLs should be found.
- start_date (Date; default: None): Date from which to collect URLs.
- end_date (Date; default: None): Date up to, but not including, for which URLs should be collected.
- products (String; default: None): Product(s) for which to find URLs.
- versions (String; default: None): Version(s) of the above products to find URLs for.
5.13.3 Return value
Returns an object with a list of URLs and the crash count for each, as well as a counter, 'total', for the total number of results in the result set.
{
    "hits": [
        {"url": "about:blank", "crash_count": 1936},
        ...
    ],
    "total": 1
}
5.14 Search
Search for crashes according to a large number of parameters and return a list of crashes or a list of distinct signatures.
5.14.1 API specifications
HTTP method: GET
URL schema: /search/(data_type)/(optional_parameters)
Full URL: /search/(data_type)/for/(terms)/products/(products)/from/(from_date)/to/(to_date)/in/(fields)/versions/(versions)/os/(os_name)/branches/(branches)/search_mode/(search_mode)/reasons/(crash_reasons)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/result_number/(number)/result_offset/(offset)/
Example: http://socorro-api/bpapi/search/crashes/for/libflash.so/in/signature/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/
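Since every parameter is a URL path segment and list values are joined with a + symbol, a client usually builds these URLs programmatically. An illustrative sketch (the host name is hypothetical):

```python
from urllib.parse import quote

def search_url(data_type, terms, base="http://socorro-api/bpapi", **params):
    # Build a /search/ URL. List values are '+'-joined after URL-encoding
    # each element, matching the separator convention used by the service.
    def segment(value):
        if isinstance(value, (list, tuple)):
            return "+".join(quote(str(v), safe="") for v in value)
        return quote(str(value), safe=":")
    url = "%s/search/%s/for/%s/" % (base, data_type, segment(terms))
    for name, value in params.items():
        url += "%s/%s/" % (name, segment(value))
    return url
```

For example, `search_url("crashes", ["libflash.so"], products="Firefox", versions="Firefox:4.0.1")` yields a URL of the same shape as the example above.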
5.14.2 Mandatory parameters
- data_type (String; default: 'signatures'): Type of data we are looking for. Can be 'crashes' or 'signatures'.
5.14.3 Optional parameters
- for (String or list of strings; default: None): Terms we are searching for. Each term must be URL encoded. Several terms can be specified, separated by a + symbol.
- products (String or list of strings; default: 'Firefox'): The product we are interested in (e.g. Firefox, Fennec, Thunderbird...).
- from (Date; default: now - 7 days): Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- to (Date; default: now): Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- in (String or list of strings; default: all): Fields we are searching in. Several fields can be specified, separated by a + symbol. This is NOT implemented for PostgreSQL.
- versions (String or list of strings; default: None): Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
- os (String or list of strings; default: None): Restrict to an operating system (e.g. Windows, Mac, Linux...). Several can be specified, separated by a + symbol.
- branches (String or list of strings; default: None): Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
- search_mode (String; default: 'default'): Set how to search. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
- reasons (String or list of strings; default: None): Restricts search to crashes caused by this reason.
- build_ids (Integer or list of integers; default: None): Restricts search to crashes that happened on a product with this build ID.
- build_from (Integer or list of integers; default: None): Restricts search to crashes with a build ID greater than this.
- build_to (Integer or list of integers; default: None): Restricts search to crashes with a build ID lower than this.
- report_process (String; default: 'any'): Can be 'any', 'browser' or 'plugin'.
- report_type (String; default: 'any'): Can be 'any', 'crash' or 'hang'.
- plugin_in (String or list of strings; default: 'name'): Search for a plugin in this field. report_process has to be set to 'plugin'.
- plugin_search_mode (String; default: 'default'): How to search for this plugin. report_process has to be set to 'plugin'. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
- plugin_terms (String or list of strings; default: None): Terms to search for. Several terms can be specified, separated by a + symbol. report_process has to be set to 'plugin'.
- result_number (Integer; default: 100): Number of results to return.
- result_offset (Integer; default: 0): Offset of the first result to return.
5.14.4 Return value
If data_type is signatures, the return value looks like:

{
    "hits": [
        {
            "count": 1,
            "signature": "arena_dalloc_small | arena_dalloc | free | CloseDir"
        },
        {
            "count": 1,
            "signature": "XPCWrappedNativeScope::TraceJS(JSTracer*, XPCJSRuntime*)",
            "is_solaris": 0,
            "is_linux": 0,
            "numplugin": 0,
            "is_windows": 0,
            "is_mac": 0,
            "numhang": 0
        }
    ],
    "total": 2
}

If data_type is crashes, the return value looks like:

{
    "hits": [
        {
            "client_crash_date": "2011-03-16 13:55:10.0",
            "dump": "...",
            "signature": "arena_dalloc_small | arena_dalloc | free | CloseDir",
            "process_type": null,
            "id": 231224257,
            "hangid": null,
            "version": "4.0b13pre",
            "build": "20110314162350",
            "product": "Firefox",
            "os_name": "Mac OS X",
            "date_processed": "2011-03-16 06:54:56.385843",
            "reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS",
            "address": "0x1d3aff03",
            "...": "..."
        }
    ],
    "total": 1
}
If an error occurred, the API will return something like this:

Well, for the moment it doesn't return anything but an Internal Error HTTP header... We will improve that soon! :)
5.15 List Report
Return a list of crash reports with a specified signature and filtered by a wide range of options.
5.15.1 API specifications
HTTP method: GET
URL schema: /report/list/(parameters)
Full URL: /report/list/signature/(signature)/products/(products)/from/(from_date)/to/(to_date)/versions/(versions)/os/(os_name)/branches/(branches)/reasons/(crash_reason)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/
Example: http://socorro-api/bpapi/report/list/signature/SocketSend/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/
5.15.2 Mandatory parameters
- signature (String; default: None): Signature of crash reports to get.
5.15.3 Optional parameters
- products (String or list of strings; default: 'Firefox'): The product we are interested in (e.g. Firefox, Fennec, Thunderbird...).
- from (Date; default: now - 7 days): Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- to (Date; default: now): Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
- versions (String or list of strings; default: None): Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
- os (String or list of strings; default: None): Restrict to an operating system (e.g. Windows, Mac, Linux...). Several can be specified, separated by a + symbol.
- branches (String or list of strings; default: None): Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
- reasons (String or list of strings; default: None): Restricts search to crashes caused by this reason.
- build_ids (Integer or list of integers; default: None): Restricts search to crashes that happened on a product with this build ID.
- build_from (Integer or list of integers; default: None): Restricts search to crashes with a build ID greater than this.
- build_to (Integer or list of integers; default: None): Restricts search to crashes with a build ID lower than this.
- report_process (String; default: 'any'): Can be 'any', 'browser' or 'plugin'.
- report_type (String; default: 'any'): Can be 'any', 'crash' or 'hang'.
- plugin_in (String or list of strings; default: 'name'): Search for a plugin in this field. report_process has to be set to 'plugin'.
- plugin_search_mode (String; default: 'default'): How to search for this plugin. report_process has to be set to 'plugin'. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
- plugin_terms (String or list of strings; default: None): Terms to search for. Several terms can be specified, separated by a + symbol. report_process has to be set to 'plugin'.
- result_number (Integer; default: 100): Number of results to return.
- result_offset (Integer; default: 0): Offset of the first result to return.
5.15.4 Return value
In normal cases, return something like this:
{
    "hits": [
        {
            "client_crash_date": "2011-03-16 13:55:10.0",
            "dump": "...",
            "signature": "arena_dalloc_small | arena_dalloc | free | CloseDir",
            "process_type": null,
            "id": 231224257,
            "hangid": null,
            "version": "4.0b13pre",
            "build": "20110314162350",
            "product": "Firefox",
            "os_name": "Mac OS X",
            "date_processed": "2011-03-16 06:54:56.385843",
            "reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS",
            "address": "0x1d3aff03",
            "...": "..."
        },
        {
            "client_crash_date": "2011-03-16 11:35:37.0",
            "...": "..."
        }
    ],
    "total": 2
}
If signature is empty or nonexistent, raise a BadRequest error.
If another error occurred, the API will return a 500 Internal Error HTTP header.
5.16 Versions Info
Return information about one or several product:version pairs.
5.16.1 API specifications
HTTP method: GET
URL schema: /util/versions_info/(optional_parameters)
Full URL: /util/versions_info/versions/(versions)/
Example: http://socorro-api/bpapi/util/versions_info/versions/Firefox:9.0a1+Fennec:7.0/
5.16.2 Mandatory parameters
None.
5.16.3 Optional parameters
- versions (String or list of strings; default: None): product:version pairs for which information is requested.
5.16.4 Return value
If the versions parameter is invalid, the return value is None. Otherwise it looks like this:
{
    "product_name:version_string": {
        "product_version_id": integer,
        "version_string": "string",
        "product_name": "string",
        "major_version": "string" or None,
        "release_channel": "string" or None,
        "build_id": [list, of, decimals] or None
    }
}
5.17 Forcing an implementation
For debugging reasons, you can add a parameter to force the API to use a specific implementation module. That module must be inside socorro.external and contain the needed service implementation.
- force_api_impl (String; default: None): Force the service to use a specific module.
For example, if you want to force search to be executed with ElasticSearch, you can add force_api_impl/elasticsearch/ to the middleware call. If socorro.external.elasticsearch exists and contains a search module, it will get loaded and used.
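Conceptually, the override just changes which dotted module path gets imported. This is an illustrative sketch only; the real middleware's loader may differ in detail:

```python
import importlib

def load_service(service_name, force_api_impl=None, base="socorro.external"):
    # Resolve the module implementing a service. With no override, look
    # under the base package; with force_api_impl, look under
    # base.force_api_impl instead, as the docs describe.
    package = base if force_api_impl is None else "%s.%s" % (base, force_api_impl)
    return importlib.import_module("%s.%s" % (package, service_name))
```

For instance, `load_service("search", force_api_impl="elasticsearch")` would attempt to import socorro.external.elasticsearch.search.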
CHAPTER 6
Socorro UI
The Socorro UI is a KohanaPHP application that serves as the frontend website for the crash reporter.
6.1 Coding Standards
Maintaining coding standards will encourage current developers and future developers to implement clean and consistent code throughout the codebase.
The PEAR Coding Standards (http://pear.php.net/manual/en/standards.php) will serve as the basis for the Socorro UI coding standards.
• Always include header documentation for each class and each method.
– When updating a class or method that does not have header documentation, add header documentation before committing.

– Header documentation should be added for all methods within each controller, model, library and helper class.

– @param documentation is required for all parameters.

– Header documentation should be less than 80 characters in width.

• Add inline documentation for complex logic within a method.

• Use 4 character tab indentations for both PHP and JavaScript.

• Method names must inherently describe the functionality within that method.

– Method names must be written in a camel-case format, e.g. getThisThing.

– Method names should follow the verb-noun format, such as getThing, editThing, etc.

• Use carriage returns in if statements containing more than 2 statements and in arrays containing more than 3 array members for readability.

• All important files, such as controllers, models and libraries, must have the Mozilla Public License at the top of the file.
6.2 Adding new reports
Here is an example of a new report which uses a web service to fetch data (JSON via HTTP) and displays the result as an HTML table.
Kohana uses the Model-View-Controller (MVC) pattern: http://en.wikipedia.org/wiki/Model-view-controller
Create a model, view(s) and controller for the new report (substituting "newreport" for something more appropriate):
6.2.1 Configuration (optional)
webapp-php/application/config/new_report.php
<?php defined('SYSPATH') OR die('No direct access allowed.');

// The number of rows to display.
$config['numberofrows'] = 20;

// The number of results to display on the by_version page.
$config['byversion_limit'] = 300;
?>
6.2.2 Model
webapp-php/application/models/newreport.php
See Add a service to the Middleware for details about writing a middleware service for this to use.
<?php
class NewReport_Model extends Model {

    public function getNewReportViaWebService() {
        // this should be pulled from the middleware service
    }
}
?>
6.2.3 View
webapp-php/application/views/newreport/byversion.php
<?php slot::start('head') ?>
<title>New Report for <?php out::H($product) ?> <?php out::H($version) ?></title>
<?php echo html::script(array(
    'js/path/to/scripts/you/need.js'
))?>
<?php echo html::stylesheet(array(
    'css/path/to/css/you/need.css'
), 'screen')?>
<?php slot::end() ?>
<!-- Your custom front end HTML goes here -->
6.2.4 Controller
webapp-php/application/controllers/newreport.php
<?php defined('SYSPATH') or die('No direct script access.');
require_once(Kohana::find_file('libraries', 'somelib', TRUE, 'php'));
class NewReport_Controller extends Controller {
    public function __construct() {
        parent::__construct();
        $this->newreport_model = new NewReport_Model();
    }

    // Public functions map to routes on the controller
    // http://<base-url>/NewReport/index/[product, version, ?'foo'='bar', etc]
    public function index() {
        $resp = $this->newreport_model->getNewReportViaWebService();
        if ($resp) {
            $this->setViewData(array(
                'resp' => $resp,
                'nav_selection' => 'new_report',
                'foo' => $resp->foo,
            ));
        } else {
            header("Data access error", TRUE, 500);
            $this->setViewData(array(
                'resp' => $resp,
                'nav_selection' => 'new_report',
            ));
        }
    }

}
?>
CHAPTER 7
UI Installation
7.1 Installation
Follow these steps to get the Socorro UI up and running.
7.1.1 Apache
Set up Apache with a vhost as you see fit. You will either need AllowOverride to enable .htaccess files or you may paste the .htaccess rules into your vhost.
7.1.2 KohanaPHP Installation
1. Copy the .htaccess file and edit the host path if your webapp is not at the domain root:

   cp htaccess-dist .htaccess
   vim .htaccess
2. Copy application/config/config.php-dist and change the hosting path and domain:

   cp application/config/config.php-dist application/config/config.php
   vim application/config/config.php
For a production install, you may want to set $config['display_errors'] to FALSE.
3. Copy application/config/database.php and edit its database settings:

   cp application/config/database.php-dist application/config/database.php
   vim application/config/database.php
4. Copy application/config/cache.php and update the cache setting to be file-based or memcache-based:

   cp application/config/cache.php-dist application/config/cache.php
   vim application/config/cache.php
5. If you selected memcache-based caching, copy application/config/cache_memcache.php and update the settings accordingly:

   cp application/config/cache_memcache.php-dist application/config/cache_memcache.php
   vim application/config/cache_memcache.php
6. Copy all other config -dist files to their config location:
   cp application/config/application.php-dist application/config/application.php
   cp application/config/webserviceclient.php-dist application/config/webserviceclient.php
   cp application/config/daily.php-dist application/config/daily.php
   cp application/config/products.php-dist application/config/products.php
7. Copy application/config/auth.php and edit it to set up your preferred authentication method, or to disable authentication. Edit $config['driver'] to change your authentication method. Edit $config['proto'] to remove the https requirement if necessary:

   cp application/config/auth.php-dist application/config/auth.php
   vim application/config/auth.php
8. If you are using LDAP, copy application/config/ldap.php and edit its settings:

   cp application/config/ldap.php-dist application/config/ldap.php
   vim application/config/ldap.php
9. Ensure that the application logs and cache directories are writable:

   chmod a+rw application/logs application/cache
7.1.3 Dump Files
Socorro UI needs to access the processed dump files via HTTP. You will need to set up Apache or some other system to ensure that dump files may be accessed at http://example.com/dumps/<UUID>.jsonz. This can be accomplished via mod_rewrite rules, just like in the next section, "Raw Dump Files".
Example config: processeddumps.mod_rewrite.txt
Next, update the $config['crash_dump_local_url'] value in application/config/application.php to point to the proper directory.
7.1.4 Raw Dump Files
When a user is logged in to Socorro UI as an admin, they may view raw crash dump files. These raw crashes can be served up by Apache by adding the following rewrite rules. The values should match the values in the middleware code at scripts/config/commonconfig.py settings. Links to raw dumps are available in the http://example.com/report/index/{uuid} crash report pages.
Example config: webapp-php/docs/rawdumps.mod_rewrite.txt
Next, update the $config['raw_dump_url'] value in application/config/application.php to point to the proper directory.
7.1.5 Web Services
Many parts of Socorro UI rely on web services provided by the Python-based middleware layer.
7.1.6 Middleware
Copy the scripts/config/webapiconfig.py file, edit it accordingly and execute the script to listen on the indicated port:
cp scripts/config/webapiconfig.py-dist scripts/config/webapiconfig.py
vim scripts/config/webapiconfig.py
python scripts/webservices.py 8083
7.1.7 Socorro UI
Copy application/config/webserviceclient.php, edit the file and change $config['socorro_hostname'] to contain the proper hostname and port number. If necessary, update $config['basic_auth']:

cp application/config/webserviceclient.php-dist application/config/webserviceclient.php
vim application/config/webserviceclient.php
7.1.8 Testing Your Setup
There are 2 ways in which you can test your Socorro UI setup.
7.1.9 Search
Visit the website containing the Socorro UI, and click Advanced Search. Perform a search for the product you've added to the site, which you know has crash reports associated with it in the reports table in your database.
7.1.10 Report
Within the search results set you received, click a signature in the results set. Next click the timestamp for a particular signature, which will take you to a page that displays an individual crash report.
7.2 Troubleshooting
7.2.1 println the sql
To see what SQL queries are being executed: Edit ‘webapp-php/system/libraries/Database.php’ line 443 Ko-hana::log(‘debug’, $sql); Do a svn ignore on this file, if you plan on checking in code.
This will show up in the debug log 'application/logs/date.log.php'.
Examine your database and see why you don’t get the expected results.
7.2.2 404?
Is your ‘.htaccess’ properly setup?
7.2.3 /report/pending never goes to /report/index?
If you see a pending screen and didn't expect one, this means that the record in reports and dumps couldn't be joined, so it is waiting for the processor on the backend to populate one or both tables. Investigate with the uuid and look at the reports and dumps tables.
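A minimal way to investigate, assuming the default schema in which both tables are keyed by uuid (verify the table and column names against your own database before running these):

```sql
-- Does the crash exist in the reports table?
SELECT uuid FROM reports WHERE uuid = 'YOUR_CRASH_UUID';
-- And in the dumps table?
SELECT uuid FROM dumps WHERE uuid = 'YOUR_CRASH_UUID';
-- If one query returns a row and the other does not, the processor
-- has not yet populated the missing table.
```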
7.2.4 Config Files
Ensure that the appropriate config files in webapp/application/config have been copied from .php-dist to .php
CHAPTER 8
Server
The Socorro Server is a collection of Python applications and a Python package ([[SocorroPackage]]) that runs the backend of the Socorro system.
8.1 The Applications
Executables for the applications are generally found in the .../scripts directory.
• ../scripts/startCollector.py - Collector
• ../scripts/startDeferredCleanup.py - Deferred Cleanup
• ../scripts/startMonitor.py - Monitor
• ../scripts/startProcessor.py - Processor
• ../scripts/startTopCrashes.py - Top Crashers By Signature
• ../scripts/startBugzilla.py - BugzillaAssociations
• ../scripts/startMtfb.py - MeanTimeBeforeFailure
• ../scripts/startServerStatus.py - server status
• ../scripts/startTopCrashByUrl.py - Top Crashers By URL
CHAPTER 9
crontabber
crontabber is a script that handles all cron job scripting. Unlike traditional UNIX crontab, all execution is done via the ./crontabber.py script, and the configuration of frequency and exact run time is part of the configuration files. The configuration is done using configman and it looks something like this:
# name: jobs
# doc: List of jobs and their frequency separated by `|`
# converter: configman.converters.class_list_converter
jobs=socorro.cron.jobs.foo.FooCronApp|12h
     socorro.cron.jobs.bar.BarCronApp|1d
     socorro.cron.jobs.pgjob.PGCronApp|1d|03:00
9.1 crontab runs crontabber
crontabber can be run at any time. Because the exact execution time is in the configuration, you can't accidentally execute jobs that aren't supposed to execute simply by running crontabber.
However, it can't be run as a daemon. It actually needs to be run by UNIX crontab every, say, 5 minutes. So instead of your crontab being a huge list of jobs at different times, all you need is this:
*/5 * * * * PYTHONPATH="..." socorro/cron/crontabber.py
That's all you need! Obviously the granularity of crontabber is limited by the granularity at which you execute it.
By moving away from UNIX crontab we have better control of the cron apps and their inter-relationships. We can also remove unnecessary boilerplate cruft.
9.2 Dependencies
In crontabber the state of previous runs of the cron apps is remembered (stored internally in a JSON file), which makes it possible to assign dependencies between the cron apps.
This is used to potentially prevent jobs from running, not to automatically run those that depend. For example, if FooCronApp depends on BarCronApp, it just won't run if BarCronApp last resulted in an error or simply hasn't been run when it should have been.
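The gating described above can be sketched roughly like this (an illustrative simplification, not crontabber's actual implementation; the state dictionary shape is made up for the example, while crontabber's real bookkeeping lives in its internal JSON file):

```python
def dependencies_met(app_class, state):
    """Return True if every app this one depends_on last ran
    successfully and on schedule.

    `state` is a hypothetical dict keyed by app_name, e.g.
    {"BarCronApp": {"error": None, "overdue": False}}.
    """
    for dep in getattr(app_class, "depends_on", ()):
        info = state.get(dep)
        if info is None:          # dependency has never run
            return False
        if info.get("error"):     # dependency last ended in an error
            return False
        if info.get("overdue"):   # dependency missed its scheduled run
            return False
    return True


class FooCronApp:
    depends_on = ("BarCronApp",)

print(dependencies_met(FooCronApp, {"BarCronApp": {"error": "NameError"}}))
# -> False (BarCronApp errored, so FooCronApp is skipped)
```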
Overriding dependencies is possible with the --force parameter. For example, suppose you know BarCronApp can now be run; you do that like this:
./crontabber.py --job=BarCronApp --force
Dependencies inside the cron apps are defined by setting a class attribute on the cron app. The attribute is called depends_on and its value can be a string, a tuple or a list. In this example, since BarCronApp depends on FooCronApp, its class would look something like this:
from socorro.cron.crontabber import BaseCronApp

class BarCronApp(BaseCronApp):
    app_name = 'BarCronApp'
    app_description = 'Does some bar things'
    depends_on = ('FooCronApp',)

    def run(self):
        ...
9.3 Own configurations
Each cron app can have its own configuration(s). Obviously they must always have defaults that are good enough; otherwise you can't run crontabber to run all jobs that are due. To make overridable configuration options, add the required_config class attribute. Here's an example:
from configman import Namespace
from socorro.cron.crontabber import BaseCronApp

class FooCronApp(BaseCronApp):
    app_name = 'foo'

    required_config = Namespace()
    required_config.add_option(
        'bugzilla_url',
        default='https://bugs.mozilla.org',
        doc='Base URL for bugzilla'
    )

    def run(self):
        ...
        print self.config.bugzilla_url
        ...
Note: Inside the run() method in that example, the self.config object is a special one. It's basically a reference to the configuration specifically for this class, but it has access to all configuration objects defined in the "root". I.e. you can access things like self.config.logger here too, but other cron apps won't have access to self.config.bugzilla_url since that's unique to this app.
To override cron app specific options on the command line you need to use a special syntax to associate them with this cron app class. Usually, the best hint of how to do this is to use python crontabber.py --help. In this example it would be:
python crontabber.py --job=foo --class-FooCronApp.bugzilla_url=...
9.4 App names versus/or class names
Every cron app in crontabber must have a class attribute called app_name. This value must be unique. If you like, it can be the same as the class it's in. When you list jobs you list the full path to the class, but it's the app_name within the found class that gets remembered.
If you change the app_name, all previously known information about it being run is lost. If you change the name and path of the class, the only other thing you need to change is the configuration that refers to it.
Best practice recommendation is this:
• Name the class like a typical python class, i.e. capitalize and optionally camel case the rest. For example: UpdateADUCronApp
• Optional but good practice is to keep the CronApp suffix on the class name.
• Make the app_name value lower case and replace spaces with -.
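Following those recommendations, a hypothetical cron app would pair its class name and app_name like this (UpdateADUCronApp and 'update-adu' are made-up names for illustration; a real app would subclass BaseCronApp and define run()):

```python
class UpdateADUCronApp:
    app_name = 'update-adu'                    # lower case, spaces -> '-'
    app_description = 'Updates the ADU counts'

# crontabber's state is keyed on app_name, so the class can be renamed
# or moved without losing run history, as long as app_name is unchanged.
assert UpdateADUCronApp.app_name == 'update-adu'
```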
9.5 Manual intervention
First of all, to add a new job all you need to do is add it to the config file that crontabber is reading from. Thanks to being a configman application, it automatically picks up configuration from files called crontabber.ini, crontabber.conf or crontabber.json. To create a new config file, use admin.dump_conf like this:
python socorro/cron/crontabber.py --admin.dump_conf ini
All errors that happen are reported to the standard python logging module. The latest error (type, value and traceback) is also stored in the JSON database. If any of your cron apps have an error you can see it with:
python socorro/cron/crontabber.py --list-jobs
Here’s a sample output:
=== JOB ========================================================================
Class: socorro.cron.jobs.foo.FooCronApp
App name: foo
Frequency: 12h
Last run: 2012-04-05 14:49:56 (1 minute ago)
Next run: 2012-04-06 02:49:56 (in 11 hours, 58 minutes)

=== JOB ========================================================================
Class: socorro.cron.jobs.bar.BarCronApp
App name: bar
Frequency: 1d
Last run: 2012-04-05 14:49:56 (1 minute ago)
Next run: 2012-04-06 14:49:56 (in 23 hours, 58 minutes)
Error!! (1 times)
  File "socorro/cron/crontabber.py", line 316, in run_one
    self._run_job(job_class)
  File "socorro/cron/crontabber.py", line 369, in _run_job
    instance.main()
  File "/Use[snip]orro/socorro/cron/crontabber.py", line 47, in main
    self.run()
  File "/Use[snip]orro/socorro/cron/jobs/bar.py", line 10, in run
    raise NameError('doesnotexist')
It will only keep the latest error, but it will include an error count that tells you how many times it has tried and failed. The error count increments every time any error happens and is reset once no error happens. So only the latest error is kept, and to find out about past errors you have to inspect the log files.

NOTE: If a cron app that is configured to run every 2 days runs into an error, it will try to run again in 2 days.
So, suppose you inspect the error and write a fix. If you’re impatient and don’t want to wait till it’s time to run again,you can start it again like this:
python socorro/cron/crontabber.py --job=my-app-name
# or if you prefer
python socorro/cron/crontabber.py --job=path.to.MyCronAppClass
This will attempt it again and, no matter if it works or errors, it will pick up the frequency from the configuration and update the time it will next run.
9.6 Frequency and execution time
The format for configuring jobs looks like this:
socorro.cron.jobs.bar.BarCronApp|30m
or like this:
socorro.cron.jobs.pgjob.PGCronApp|2d|03:00
Hopefully the format is self-explanatory. The first number is required and it must be a number followed by “y”, “d”,“h” or “m”. (years, days, hours, minutes).
For jobs that have a frequency longer than 24 hours you can specify exactly when they should run. This has to be in the 24-hour format of HH:MM.
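The format above can be illustrated with a small parser (a sketch for illustration only, not crontabber's actual parsing or validation code, which is done via configman):

```python
import re

def parse_job_spec(spec):
    """Split 'class.path|frequency[|HH:MM]' into its parts.

    Illustrative only: real crontabber does its own parsing.
    """
    parts = spec.split('|')
    class_path, frequency = parts[0], parts[1]
    match = re.match(r'^(\d+)([ydhm])$', frequency)
    if not match:
        raise ValueError('frequency must be a number followed by y/d/h/m')
    run_time = parts[2] if len(parts) > 2 else None
    # An exact HH:MM only makes sense for frequencies of a day or longer.
    if run_time is not None and match.group(2) not in ('y', 'd'):
        raise ValueError('exact run time requires a frequency of 1d or more')
    return class_path, int(match.group(1)), match.group(2), run_time

print(parse_job_spec('socorro.cron.jobs.pgjob.PGCronApp|2d|03:00'))
# -> ('socorro.cron.jobs.pgjob.PGCronApp', 2, 'd', '03:00')
```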
If you're ever uncertain whether your recent changes to the configuration file are correct, instead of waiting around you can check it with:
python socorro/cron/crontabber.py --configtest
which will do nothing if all is OK.
9.7 Timezone and UTC
There is no timezone in any of the dates and times in crontabber. Everything is assumed to be local time, i.e. whatever the server it's running on is using.
The reason for this is the ability to specify exactly when something should be run. So if you want something to run at exactly 3AM every day, that's 3AM relative to where the server is located.
9.8 Writing cron apps (aka. jobs)
Because of the configurable nature of crontabber, the actual cron apps can be located anywhere. For example, if an app is related to HBase it could live in socorro/external/hbase/mycronapp.py. However, for the most part it's probably a good idea to write them in socorro/cron/jobs/ and to write one class per file to keep things clear. There are already some "sample apps" in there that do nothing except serve as good examples. With time, we can hopefully delete these as other, real apps can work as examples and inspiration.
The most common apps will execute specific pieces of SQL against the PostgreSQL database. For those, the socorro/cron/jobs/pgjob.py example is good to look at. At the time of writing it looks like this:
from socorro.cron.crontabber import PostgreSQLCronApp

class PGCronApp(PostgreSQLCronApp):
    app_name = 'pg-job'
    app_description = 'Does some foo things'

    def run(self, connection):
        cursor = connection.cursor()
        cursor.execute('select relname from pg_class')
Let's pick that apart a bit... The most important difference is the base class. Unlike the BaseCronApp class, this one executes the run() method with a connection instance as the one and only parameter. That connection will automatically take care of transactions! That means you don't have to run connection.commit(), and if you want the transaction to roll back, all you have to do is raise an error. For example:
def run(self, connection):
    cursor = connection.cursor()
    today = datetime.datetime.today()
    cursor.execute("INSERT INTO jobs (room) VALUES ('bathroom')")
    if today.strftime('%A') in ('Saturday', 'Sunday'):
        raise ValueError("Today is not a good day!")
    else:
        cursor.execute("INSERT INTO jobs (tool) VALUES ('brush')")
Silly but hopefully it’s clear enough.
Raising an error inside a cron app will not stop the other jobs from running, other than those that depend on it.
CHAPTER 10
Throttling
The Collector has the ability to vet crashes as they come into the system. Originally, this system was used to provide a statistical sampling from the incoming stream of crashes. In 1.8, throttling is a way to allow a sampling of crashes to be put into the database.
Throttling, the disposition of a JSON/dump pair, is controlled by the contents of the JSON file. The JSON files are collections of keys and values. Collector can examine these key/value pairs and assign a pass-through probability. For example, we may want to pass 100% of all alpha or beta releases to the database. In production, however, we may want to save only 10%.
For details on how to configure throttling, see the configuration section of Collector. Below is a section about the collector throttling rules.
10.1 throttleConditions
This option tells the collector how to route a given JSON/dump pair to storage for further processing or to deferred storage. It consists of a list of conditions in this form: (JsonFileKey, ConditionFunction, Probability)
• JsonFileKey: the name of a field from the HTTP POST form. The possibilities are: "StartupTime", "Vendor", "InstallTime", "timestamp", "Add-ons", "BuildID", "SecondsSinceLastCrash", "UserID", "ProductName", "URL", "Theme", "Version", "CrashTime"
• ConditionFunction: a function returning a boolean, a regular expression, or a constant used to test the value for the JsonFileKey.
• Probability: an integer between 0 and 100 inclusive. At 100, all JSON files for which the ConditionFunction returns true will be saved in the database. At 0, none of them will be saved. At 25, there is a twenty-five percent probability that a matching JSON file will be written to the database.
There must be at least one entry in the throttleConditions list. The example below shows the default case.
These conditions are applied one at a time to each submitted crash. The first match of a condition function to a value stops the iteration through the list. The probability of that first matched condition will be applied to that crash.
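That first-match-wins behaviour can be sketched like this (illustrative only; the rule shapes follow the description above rather than the collector's actual code):

```python
import random
import re

def throttle(crash, conditions):
    """Return True if the crash should be saved, applying the first
    matching condition's probability. Illustrative sketch only."""
    for key, test, probability in conditions:
        value = crash.get(key) if key is not None else None
        if key is None:
            matched = bool(test)          # catch-all rule like (None, True, 10)
        elif callable(test):
            matched = bool(test(value))
        elif hasattr(test, 'match'):      # compiled regular expression
            matched = test.match(value or '') is not None
        else:
            matched = (value == test)     # plain constant
        if matched:
            return random.randint(1, 100) <= probability
    return False

conditions = [
    ("Version", lambda x: x[-3:] == "pre", 100),
    (None, True, 0),
]
print(throttle({"Version": "10.0pre"}, conditions))  # first rule matches -> True
```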
Keep the list short to avoid bogging down the collector:
throttleConditions = cm.Option()
throttleConditions.default = [
    #("Version", lambda x: x[-3:] == "pre", 25),  # queue 25% of crashes with version ending in "pre"
    #("Add-ons", re.compile('inspector\@mozilla\.org\:1\..*'), 75),  # queue 75% of crashes where the inspector addon is at 1.x
    #("UserID", "d6d2b6b0-c9e0-4646-8627-0b1bdd4a92bb", 100),  # queue all of this user's crashes
    #("SecondsSinceLastCrash", lambda x: 300 >= int(x) >= 0, 100),  # queue all crashes that happened within 5 minutes of another crash
    (None, True, 10)  # queue 10% of what's left
]
CHAPTER 11
Deployment
11.1 Introduction
Below are general deployment instructions for installations of Socorro.
11.2 Outage Page
If the system is to be taken down for maintenance, these steps will show users an outage page during the maintenance period:
• Back up webapp-php/index.php.

• Copy webapp-php/docs/outage.php over webapp-php/index.php; all traffic will then be served this outage message.

• Do the maintenance work.

• Copy the backup over webapp-php/index.php.
add other task instructions here
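The steps above can be sketched as shell commands. This sketch works in a scratch copy under /tmp so it is self-contained; in a real deployment you would run the cp/mv lines against the actual webapp-php paths:

```shell
set -e
# Scratch copy standing in for the real webapp tree (illustrative only)
mkdir -p /tmp/outage-demo/webapp-php/docs
cd /tmp/outage-demo
echo "real site" > webapp-php/index.php
echo "outage page" > webapp-php/docs/outage.php

cp webapp-php/index.php webapp-php/index.php.bak      # back up the front controller
cp webapp-php/docs/outage.php webapp-php/index.php    # serve the outage page to all traffic
# ... perform the maintenance work ...
mv webapp-php/index.php.bak webapp-php/index.php      # restore the real site
cat webapp-php/index.php                              # prints "real site"
```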
CHAPTER 12
Development Discussions
12.1 Coding Conventions
12.1.1 Introduction
The following coding conventions are designed to ensure that the Socorro code is easy to read, hack, test, and deploy.
12.1.2 Style Guide
• Python should follow PEP 8 with 4 space indents
• PHP code follows the PEAR coding standard
• JavaScript is indented by four spaces
• Unit Testing is strongly encouraged
12.1.3 Review
New checkins that are non-trivial should be reviewed by one of the core hackers. The commit message should indicate the reviewer and the issue number if applicable.
12.1.4 Testing
Any features that are only available to admins should be tested to ensure that non-admin users do not have access.
Before checking in changes to the socorro python code, be sure to run the unit tests.
12.2 New Developer Guide
If you are new to Socorro, here are good resources to start hacking:
12.2.1 General architecture of Socorro
If you clone our git repository, you will find the following folders. Here is what each of them contains:
Folder       Description
analysis/    Contains metrics jobs such as mapreduce. Will be moved.
config/      Contains the Apache configuration for the different parts of the Socorro application.
docs/        Documentation of the Socorro project (the one you are reading right now).
scripts/     Scripts for launching the different parts of the Socorro application.
socorro/     Core code of the Socorro project.
sql/         SQL scripts related to our PostgreSQL database. Contains schemas and update queries.
thirdparty/  External libraries used by Socorro.
tools/       External tools used by Socorro.
webapp-php/  Front-end PHP application (also called UI). See Socorro UI.
Socorro submodules
The core code module of Socorro, called socorro, contains a lot of code. Here are descriptions of each submodule in there:
Module           Description
collector        All code related to collectors.
cron             All cron jobs running around Socorro.
database         PostgreSQL related code.
deferredcleanup  Obsolete.
external         APIs related to external resources like databases.
integrationtest  Obsolete.
lib              Different libraries used all over Socorro's code.
middleware       New-style middleware services place.
monitor          All code related to monitors.
othertests       Some other tests?
services         Old-style middleware services place.
storage          HBase related code.
unittest         All our unit tests are here.
webapi           Contains a few tools used by web-based services.
12.2.2 Setup a development environment
The best and easiest way to get started with a complete dev environment is to use Vagrant and our installation script.
Standalone dev environment in your existing environment
If you don't want to do things the easy way, or can't use a virtual machine, you can install everything in your own development environment. All steps are described in Standalone Development Environment.
1. Install VirtualBox from: http://www.virtualbox.org/
2. Install Vagrant from: http://vagrantup.com/
3. Download base box
# NOTE: if you have a 32-bit host, change "lucid64" to "lucid32"
vagrant box add socorro-all http://files.vagrantup.com/lucid64.box
4. Copy base box, boot VM and provision it with puppet:
vagrant up
5. Add to /etc/hosts (on the HOST machine!):
33.33.33.10 crash-stats crash-reports socorro-api
Enjoy your Socorro environment!
• browse UI: http://crash-stats
• submit crashes: http://crash-reports/submit (accepts HTTP POST only; see System Test for information on submitting test crashes)
• query data via middleware API: http://socorro-api/bpapi/adu/byday/p/WaterWolf/v/1.0/rt/any/osx/start/YYYY-MM-DD/end/YYYY-MM-DD (where WaterWolf is a valid product name and YYYY-MM-DD are valid start/end dates)
Apply your changes
Edit files in your git checkout on the host as usual. To actually make changes take effect, you can run:
vagrant provision
This reruns puppet inside the VM to deploy the source to /data/socorro and restarts any necessary services.
How Socorro works
See How Socorro Works and Crash Flow.
Setting up a new database
Note that the existing puppet manifests populate PostgreSQL if the "breakpad" database does not exist. See Populate PostgreSQL for more information on how this process works and how to customize it.
Enabling HBase
Socorro supports HBase as a long-term storage archive for both raw and processed crashes. Since it requires Sun (now Oracle) Java, does not work with OpenJDK, and generally has much higher memory requirements than all the other dependencies, it is not enabled by default.
If you wish to enable it, edit the nodes.pp file:
vi puppet/manifests/nodes/nodes.pp
And remove the comment (‘#’) marker from the socorro-hbase include:
# include socorro-hbase
Re-provision vagrant, and HBase will be installed, started and the default Socorro schema will be loaded:
vagrant provision
NOTE - this will download and install Java from Oracle, which means that you will be bound by the terms of their license agreement - http://www.oracle.com/technetwork/java/javase/terms/license/
Debugging
You can SSH into your VM by running:
vagrant ssh
By default, your socorro git checkout will be shared into the VM via NFS at /home/socorro/dev/socorro
Running "make install" as the socorro user in /home/socorro/dev/socorro will cause Socorro to be installed to /data/socorro/. You will need to restart the apache2 or supervisord services if you modify middleware or backend code, respectively (note that "vagrant provision" as described above does all of this for you).
Logs for the (PHP Kohana) webapp are at:
/data/socorro/htdocs/application/logs/
All other Socorro apps log to syslog, using the user.* facility:
/var/log/user.log
Apache may log important errors too, such as WSGI apps not starting up or problems with the Apache or PHP configs:
/var/log/apache/error.log
Supervisord captures the stderr/stdout of the backend jobs; these are normally the same as syslog, but may log important errors if the daemons cannot be started. You can also find stdout/stderr from cron jobs in this location:
/var/log/socorro/
Loading data from an existing Socorro install
Given a PostgreSQL dump named "minidb.dump", run the following:
vagrant ssh

# shut down database users
sudo /etc/init.d/supervisor force-stop
sudo /etc/init.d/apache2 stop

# drop old db and load snapshot
sudo su - postgres
dropdb breakpad
createdb -E 'utf8' -l 'en_US.utf8' -T template0 breakpad
pg_restore -Fc -d breakpad minidb.dump
This may take several hours, depending on your hardware. One way to speed this up is to add more CPU cores to the VM (via the VirtualBox GUI); the default is 1. Then add "-j n" to the pg_restore command above, where n is the number of CPU cores minus 1.
Pulling crash reports from an existing production install
The Socorro PostgreSQL database only contains a small subset of the information about individual crashes (enough to run aggregate reports). For instance, the full stack is only available in long-term storage (such as HBase).
If you have imported a database from a production instance, you may want to configure the web UI to pull individual crash reports from production via the web service (so URLs such as http://crash-stats/report/index/YOUR_CRASH_ID_GOES_HERE will work).
The /report/index page actually pulls its data from a URL such as: http://crash-stats/dumps/YOUR_CRASH_ID_GOES_HERE.jsonz
You can cause your dev instance to fall back to your production instance by modifying:
webapp-php/application/config/application.php
Change the URL in this config value to point to your desired production instance:
<?php
$config['crash_dump_local_url_fallback'] = 'http://crash-stats/dumps/%1$s.jsonz';
?>
Note that the crash ID must be in both your local database and the remote (production) HBase instance for this to work.
See https://github.com/mozilla/socorro/blob/master/webapp-php/application/config/application.php-dist
(OPTIONAL) Populating Elastic Search
See Populate ElasticSearch.
12.2.3 Add a service to the Middleware
Architecture overview
The middleware is a simple REST API providing JSON data depending on the URL that is called. It is made of a list of services, each one binding a certain URL with parameters. Documentation for each service is available in the Middleware API page.
Those services do not contain any code; they are only interfaces. They use other resources from the external module. That external module is composed of one submodule for each external resource we are using. For example, there is a PostgreSQL submodule, an ElasticSearch submodule and an HBase submodule.
You will also find some common code among external resources in socorro.lib.
Class hierarchy
REST services in Socorro are divided into two separate modules. socorro.middleware is the module that contains the actual services, the classes that will receive HTTP requests and return the right data. However, services do not do any kind of computation; they only find the right implementation class and call it.
Implementations of services are found in socorro.external. They are separated into submodules, one for each external resource that we use. For example, in socorro.external.postgresql you will find everything that is related to data stored in PostgreSQL: mainly SQL queries, but also argument sanitizing and data formatting.
The way it works overall is simple: the service in socorro.middleware defines a URL and parses the arguments when the service is called. That service then finds the right implementation class in socorro.external and calls it with the parameters. The implementation class does what it has to do (SQL query, computation...) and returns a Python dictionary. The service then automatically transforms that dictionary into a JSON string and sends it back via HTTP.
Create the service
First create a new file for your service in socorro/middleware/ and call it nameofservice_service.py. This is a convention for the next version of our config manager. Then create a class inside as follows:
import logging

from socorro.middleware.service import DataAPIService

logger = logging.getLogger("webapi")


class MyService(DataAPIService):

    service_name = "my_service"  # name of the submodule to look for in external
    uri = "/my/service/(.*)"     # URL of the service

    def __init__(self, config):
        super(MyService, self).__init__(config)
        logger.debug('MyService service __init__')

    def get(self, *args):
        # Parse parameters of the URL
        params = self.parse_query_string(args[0])

        # Find the implementation module in external depending on the configuration
        module = self.get_module(params)

        # Instantiate the implementation class
        impl = module.MyService(config=self.context)

        # Call and return the result of the implementation method
        return impl.mymethod(**params)
uri is the URL pattern you want to match. It is a regular expression, and the content of each group ((.*)) will be in args.

service_name will be used to find the corresponding implementation resource. It has to match the filename of the module you need.
If you want to add mandatory parameters, modify the URI and values will be passed in args.
Use external resources
The socorro.external module contains everything related to outer resources like databases. Each submodule has a base class and classes for specific functionalities. If the function you need for your service is not already in there, create a new file and a new class to implement it. To do so, follow this pattern:
from socorro.external.myresource.base import MyResourceBase


class MyModule(MyResourceBase):

    def __init__(self, *args, **kwargs):
        super(MyModule, self).__init__(*args, **kwargs)

    def my_method(self, **kwargs):
        do_stuff()
        return my_json_result
One of the things you will want to do is filter arguments and give them default values. There is a function to do that in socorro.lib.external_common called parse_arguments. The documentation of that function says:
Return a dict of parameters.

Take a list of filters and for each try to get the corresponding
value in arguments or a default value. Then check that value's type.

Example:
    filters = [
        ("param1", "default", ["list", "str"]),
        ("param2", None, "int"),
        ("param3", ["list", "of", 4, "values"], ["list", "str"])
    ]
    arguments = {
        "param1": "value1",
        "unknown": 12345
    }
    =>
    {
        "param1": ["value1"],
        "param2": 0,
        "param3": ["list", "of", "4", "values"]
    }
Here is an example of how to use this:
class Products(PostgreSQLBase):

    def versions_info(self, **kwargs):
        # Parse arguments
        filters = [
            ("product", "Firefox", "str"),
            ("versions", None, ["list", "str"])
        ]
        params = external_common.parse_arguments(filters, kwargs)

        params.product   # "Firefox" by default or a string
        params.versions  # [] by default or a list of strings
Configuration
Finally, add your service to the list of running services in scripts/config/webapiconfig.py.dist as follows:
import socorro.middleware.search_service as search
import socorro.middleware.myservice_service as myservice  # add

servicesList = cm.Option()
servicesList.doc = 'a python list of classes to offer as services'
servicesList.default = [myservice.MyService, search.Search, (...)]  # add
You can also add a config key for the implementation of your service. If you don't, your service will use the default config key (serviceImplementationModule). To add a specific configuration key:
# MyService service config
myserviceImplementationModule = cm.Option()
myserviceImplementationModule.doc = "String, name of the module myservice uses."
myserviceImplementationModule.default = 'socorro.external.elasticsearch'  # for example
Then restart Apache and you should be good to go! If you’re using a Vagrant VM, you can hit the middleware directlyby calling http://socorro-api/bpapi/myservice/params/.
And then?
Once you are done creating your service in the middleware, you might want to use it in the WebApp. If so, have a lookat Socorro UI.
You might also want to document it. We keep track of all existing services' documentation in our Middleware API page. Please add yours!
Writing a PostgreSQL middleware unit test
First create your new test file in the appropriate location as specified above, for example socorro/unittest/external/postgresql/test_myservice.py.
Next you want to import the following:
from socorro.external.postgresql.myservice import MyService
import socorro.unittest.testlib.util as testutil
As this is a PostgreSQL service unit test we also add:
from .unittestbase import PostgreSQLTestCase
Next item to add is your setup_module function; below is a barebones version that would be sufficient for most tests:

#------------------------------------------------------------------------------
def setup_module():
    testutil.nosePrintModule(__file__)
Next is the setUp function, in which you create and populate your dummy table(s):
#==============================================================================
class TestMyService(PostgreSQLTestCase):

    #--------------------------------------------------------------------------
    def setUp(self):
        super(TestMyService, self).setUp()

        cursor = self.connection.cursor()

        # Create table
        cursor.execute("""
            CREATE TABLE product_info
            (
                product_version_id integer not null,
                product_name citext,
                version_string citext
            );
        """)

        # Insert data
        cursor.execute("""
            INSERT INTO product_info VALUES
            (
                1,
                '%s',
                '%s'
            );
        """ % ("Firefox", "8.0"))

        self.connection.commit()
For your test table(s) you can include as many, or as few, columns and rows of data as your tests require. Next we add the tearDown function that will clean up after our tests have run, by dropping the tables we created in the setUp function.
#--------------------------------------------------------------------------
def tearDown(self):
    """Clean up the database: delete tables and functions."""
    cursor = self.connection.cursor()
    cursor.execute("""
        DROP TABLE product_info;
    """)
    self.connection.commit()
    super(TestMyService, self).tearDown()
Next, we write our actual tests against the dummy data we created in setUp. The first step is to create an instance of the class we are going to test:
#--------------------------------------------------------------------------
def test_get(self):
    products = Products(config=self.config)
Next we write our first test, passing the parameters our function expects:
    #......................................................................
    # Test 1: find one exact match for one product and one version
    params = {
        "versions": "Firefox:8.0"
    }
Next we call our function passing the above parameters:
res = products.get_versions(**params)
The above will return a response that we need to test to determine whether it contains what we expect. In order to do this we create our expected response:
        res_expected = {
            "hits": [
                {
                    "product_version_id": 1,
                    "product_name": "Firefox",
                    "version_string": "8.0"
                }
            ],
            "total": 1
        }
And finally we call assertEqual to test whether our response matches the expected response:
self.assertEqual(res, res_expected)
Running a PostgreSQL middleware unit test
If you have not already done so, install nose. From the command line run:
sudo apt-get install python-nose
Once the installation completes, change directory to socorro/unittest/config/ and run the following:
cp commonconfig.py.dist commonconfig.py
Now you can open the file and edit its contents to match your testing environment. If you are running this in a VM via Socorro Vagrant, you can leave the contents of the file as is. Next, cd into socorro/unittest. To run all of the unit tests, run the following:
nosetests
When writing a new test you are most likely interested in running just your own test, instead of all of the unit tests that form part of Socorro. If your test is located in, for example, unittest/external/postgresql/test_myservice.py, then you can run your test as follows:
nosetests socorro.external.postgresql.test_myservice
Ensuring good style
To ensure that the Python code you wrote passes PEP 8, you need to run check.py. To do this, your first step is to install it. From the terminal run:
pip install -e git://github.com/jbalogh/check.git#egg=check
P.S. You may need to run the command above with sudo.
Once installed, run the following:
check.py /path/to/your/file
12.2.4 How to Review a Pull Request
Part of our job as developers is to review and provide feedback on what our colleagues do. The goal of this process is to:
• test that a new feature works as expected
• make sure the code is clean
• make sure the code doesn’t break anything
Here are several steps you can follow when reviewing a pull request. Depending on the size of that pull request, you might want to skip some phases.
Read the code
The first task when reviewing is to read the code and verify that it is coherent and clean. Try to understand the algorithm and its goal, and make sure that it is what was asked for in the related bug. When there is something that you find non-trivial and that is not documented, ask for a doc-string or an inline comment so it becomes easier for others to understand the code.
Pull the code into your local environment
To go on testing, you will need to have the code in your local environment. Let’s say you want to test the branch my-dev-branch of rhelmer’s git repository. Here is one method to get the content of that remote branch into your repo:
git remote add rhelmer https://github.com/rhelmer/socorro.git  # the first time only
git fetch rhelmer my-dev-branch:my-dev-branch
git checkout my-dev-branch
Once you are in that branch, you can actually test the code or run tools on it.
Use a code quality tool
Running a code quality tool is a good and easy way to find coding and styling problems. For Python, we use check.py (check by jbalogh on GitHub). This tool runs pyflakes on a file or a folder, and then checks that PEP 8 is respected.
To install check.py, run the following command:
pip install -e git://github.com/jbalogh/check.git#egg=check
For JavaScript, we suggest that you use JSHint. There are also a lot of tools for PHP; you can choose one you like.
For HTML and CSS files, please use the tools from the W3C: CSS Validator and HTML Validator.
Run the unit tests
Socorro has a growing number of unit tests that are very helpful at verifying nothing breaks. Before approving and merging a pull request, you should run all unit tests to make sure they still pass.
Note that those unit tests will be run when the pull request is merged, but it is easier to fix something before it lands on master than after.
To run the unit tests in a Vagrant VM, do the following:
make test
This installs all the needed dependencies and runs all the tests. You need to have a running PostgreSQL instance for this to work, with a specific config file for the tests in socorro/unittest/config/commonconfig.py.
For further documentation on unit tests, please read Unit Testing.
Test manually
This is not always possible in a local environment, but when it is, you should make sure the new code behaves as expected. Read applychanges-label.
Test before
This is a process to verify that one’s work is good and can go into master with little risk of breaking something. However, the developer is responsible for his or her bug, and this review process doesn’t mean he or she shouldn’t go through all these steps. The reviewer is here to make sure the developer didn’t miss something, but it’s easier to fix something before a review process than after. Please test your code before opening a pull request!
12.3 Glossary
Build: a date encoding used to identify when a client was compiled. (submission metadata)
Crash Report Details Page - A crash stats page displaying all known details of a crash
Crash Dump/Metadata pair - shorthand for the pair of a Raw Crash Dump and its corresponding Raw Crash Metadata

Deferred Job Storage: a file system location where Crash Dump/Metadata pairs are kept without being processed.
Dump File: See Raw Crash Dump, don’t use this term it makes me giggle
Job: a job queue item for a Raw Crash Dump that needs to be processed
JSON Dump Storage: the Python module that implements File System
Materialized view: the tables in the database containing the data used in statistical analysis, including [[MeanTimeBeforeFailure]], Top Crashers By Signature, and Top Crashers By URL. The “Trend Reports” from the Socorro UI display information from these tables.
Minidump: see ‘raw crash dump’
Minidump_stackwalk: an application from the Breakpad project that takes a raw dump file, marries it with symbolsand produces output usable by developers. This application is invoked by Processor.
Monitor: the Socorro application in charge of queuing jobs. See Monitor
OOID: A crash report ID. Originally a 32-bit value, the original legacy system stored it in the database in hexadecimal text form. Each crash is assigned an OOID by the Collector when the crash is received.

Platform: the OS that a client runs on. This term has historically been a point of confusion and it is preferred that the term OS or Client OS be used instead.

Processed Dump Storage: the disk location where the output files of the minidump_stackwalk program are stored. The actual files are stored with a .jsonz extension.
Processor: the Socorro application in charge of applying minidump_stackwalk to queued jobs. See Processor
Raw Crash Dump, Raw Dump: the data sent from a client to Socorro containing the state of the application at the time of failure. It is paired with a Raw Crash Metadata file.

Raw Crash Metadata - the metadata sent from a client to Socorro to describe the Raw Crash. It is saved in JSON format, not to be confused with a Cooked Crash Dump.

Raw JSON file: See Crash Dump Metadata... a file in the JSON format containing metadata about a ‘dump file’. Saved with a ‘.json’ suffix.

Release: a categorization of an application’s product name and version. The categories are: “major”, “milestone”, or “development”. Within the database, an enum called ReleaseEnum represents these categories.
Reporter: another name for the Socorro UI
Skip List: lists of signature regular expressions used in generating a crash’s overall signature in the Processor. See Signature Generation.
Standard Job Storage: a file system location where JSON/dump pairs are kept for processing
Throttling: statistically, we don’t have to save every single crash. This option of the Collector configuration allows us to selectively throw away dumps.
Trend Reports: the pages in the Socorro UI that display the data from the materialized views.
UUID: a universal unique identifier. The term is being deprecated in favor of OOID.
Web head: a machine that runs Collector
12.3.1 Deferred Job Storage
Deferred storage is where the JSON/dump pairs are saved if they’ve been filtered out by Collector throttling. The location of the deferred job storage is determined by the configuration parameter deferredStorageRoot found in the Common Config.
JSON/dump pairs that are saved in deferred storage are not likely to ever be processed further. They are held for a configurable number of days until deleted by Deferred Cleanup.
Occasionally, a developer will request a report via Reporter on a job that was saved in deferred storage. Monitor will look for the job in deferred storage if it cannot find it in standard storage.
For more information on the storage technique, see File System
12.3.2 JSON Dump Storage
What this system offers
Crash data is stored so that it can be quickly located based on a Universally Unique Identifier (uuid) or visited by the date and time when reported.
Directory Structure
The crash files are located in a tree with two branches: the name or “index” branch and the date branch.
• The name branch consists of paths based on the first few pairs of characters of the uuid. The name branch holds the two data files and a relative symbolic link to the date branch directory associated with the particular uuid. Take the uuid 22adfb61-f75b-11dc-b6be-001321b0783d. The “depth” is the number of sub-directories between the name directory and the actual file. By default, to conserve inodes, the depth is two.
– By default, the json file is stored (depth 2) as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.json
– The json file could be stored (depth 4) as %(root)s/name/22/ad/fb/61/22adfb61-f75b-11dc-b6be-001321b0783d.json
– The dump file is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.dump
– The symbolic link is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d and (see below) references (own location)/%(toDateFromName)s/2008/09/30/12/05/webhead01_0/
• The date branch consists of paths based on the year, month, day, hour, minute-segment, webhead host name and a small sequence number. For each uuid, it holds a relative symbolic link referring to the actual name directory holding the data for that uuid. For the uuid above, submitted at 2008-09-30T12:05 from webhead01
– The symbolic link is stored as %(root)s/date/2008/09/30/12/05/webhead01_0/22adfb61-f75b-11dc-b6be-001321b0783d and references (own location)/%(toNameFromDate)s/22/ad/fb/61/
• Note (name layout) In the examples on this page, the name/index branch uses the first 4 characters of the uuid as two character-pairs naming subdirectories. This is a configurable setting called storageDepth in the Collector configuration. To use 8 characters, set storageDepth to 4; to use 6 characters, set it to 3. The default storageDepth is 2 because on our system, with (approximately) 64K leaf directories, the number of files per leaf is reasonable, and the number of inodes required by directory entries is not so large as to cause undue difficulty. A storageDepth of 4 was examined, and was found to crash the file system by requiring too many inodes.
• If the uuids are such that their initial few characters are well spread among all possible values, then the lookup can be very quick. If the first few characters of the uuids are not well distributed, the resulting directories may be very large. If, despite well chosen uuids, the leaf name directories become too large, it would be simple to add another level, reducing the number of files by approximately a factor of 256; however, bear in mind the issue of inodes.
• Note (symbolic links) The symbolic links are relative rather than absolute, to avoid issues that might arise from variously mounted NFS volumes.
• Note (maxDirectoryEntries) If the number of links in a particular webhead subdirectory would exceed maxDirectoryEntries, then a new webhead directory is created by appending a larger _N: .../webhead01_0 first, then .../webhead01_1, etc. For the moment, maxDirectoryEntries is ignored for the name branch.
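The name-branch layout described above can be sketched in a few lines of Python. This is an illustrative helper only, not the actual JsonDumpStorage API; the function name and root path are invented for the example:

```python
import os

def name_branch_path(root, uuid, storage_depth=2):
    """Derive the name-branch directory for a uuid by using the first
    storage_depth two-character pairs of the uuid as subdirectory names."""
    pairs = [uuid[i:i + 2] for i in range(0, storage_depth * 2, 2)]
    return os.path.join(root, "name", *pairs)

# depth 2 (the default): %(root)s/name/22/ad
print(name_branch_path("/crashes", "22adfb61-f75b-11dc-b6be-001321b0783d"))
# depth 4: %(root)s/name/22/ad/fb/61
print(name_branch_path("/crashes", "22adfb61-f75b-11dc-b6be-001321b0783d", 4))
```

The data files for a uuid then live inside the returned directory, named by the full uuid plus the configured suffixes.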
How it’s used
We use the file system storage for incoming dumps caught by Collector. There are two instances of the file systemused for different purposes: standard storage and deferred storage.
Standard Job Storage
This is where json/dump pairs are stored for further processing. The Monitor finds new dumps and queues them for processing. It does this by walking the date branch of the file system using the API function destructiveDateWalk. As it moves through the date branch, it notes every uuid (in the form of a symbolic link) that it encounters. It queues the information from the symbolic link and then deletes the symbolic link. This ensures that it only ever finds new entries. Later, the Processor will read the json/dump pair by doing a direct lookup of the uuid on the name branch.
In the case of priority processing, the target uuid is looked up directly on the name branch. Then the link to the date branch is used to locate and delete the link on the date branch. This ensures that a priority job is not found a second time as a new job by the Monitor.
Deferred Job Storage
This is where jobs go that are deferred by Monitor’s throttling mechanism. If a json/dump pair is needed for priority processing, it can be looked up directly on the name branch. In such a case, just as with priority jobs in standard storage, we destroy the links between the two branches. However, in this case, destroying the links prevents the json/dump pair from being deleted by the deferred cleanup process.
When it comes time to drop old json/dump pairs that are no longer needed within the deferred storage, the system is given a date threshold. It walks the appropriate parts of the date branch older than the threshold. It uses the links to the name branch to blow away the elderly json/dump pairs.
class JsonDumpStorage
socorro.lib.JsonDumpStorage holds data and implements methods for creating and accessing crash files.
public methods
• __init__(self, root=".", maxDirectoryEntries=1024, **kwargs)
Take note of our root directory, maximum allowed date->name links per directory, some relative relations, and whatever else we may need. Much of this (c|sh)ould be read from a config file.
Recognized keyword args:
– dateName. Default = 'date'

– indexName. Default = 'name'

– jsonSuffix. Default = '.json'. If not startswith('.') then '.' is prepended

– dumpSuffix. Default = '.dump'. If not startswith('.') then '.' is prepended

– dumpPermissions. Default = 660

– dirPermissions. Default = 770

– dumpGID. Default = None. If None, then owned by the owner of the running script.
• newEntry (self, uuid, webheadHostName='webhead01', timestamp=DT.datetime.now())
Sets up the name and date storage for the given uuid.
– Creates any directories that it needs along the path to the appropriate storage location (possibly adjusting ownership and mode)
– Creates two relative symbolic links:
* the date branch link pointing to the name directory holding the files;
* the name branch link pointing to the date branch directory holding that link.
– Returns a 2-tuple containing files open for writing: (jsonfile,dumpfile)
• getJson (self, uuid)
Returns an absolute pathname for the json file for a given uuid. Raises OSError if the file is missing
• getDump (self, uuid)
Returns an absolute pathname for the dump file for a given uuid. Raises OSError if the file is missing
• markAsSeen (self,uuid)
Removes the links associated with the two data files for this uuid, thus marking them as seen. Quietlyreturns if the uuid has no associated links.
• destructiveDateWalk (self)
This function is a generator that yields all(see note) uuids found by walking the date branch of thefile system.
Just before yielding a value, it deletes both the links (from date to name and from name to date). After visiting all the uuids in a given date branch, it recursively deletes any empty subdirectories in the date branch. Since the file system may be manipulated in a different thread, if no .json or .dump file is found, the links are left, and we do not yield that uuid. Note: to avoid race conditions, it does not visit the date subdirectory corresponding to the current time.
• remove (self, uuid)
Removes all instances of the uuid from the file system including the json file, the dump file, and the two links if they still exist.
– Ignores missing link, json and dump files: you may call it with bogus data, though of course you should not.
• move (self, uuid, newAbsolutePath)
Moves the json file then the dump file to newAbsolutePath.
– Removes associated symbolic links if they still exist.
– Raises IOError if either the json or dump file for the uuid is not found, and retains any links, but does not roll back the json file if the dump file is not found.
• removeOlderThan (self, timestamp)
– Walks the date branch removing all entries strictly older than the timestamp.
– Removes the corresponding entries in the name branch.
member data
Most of the member data are set in the constructor; a few are constants, and the rest are simple calculations based on the others.
• root: The directory that holds both the date and index(name) subdirectories
• maxDirectoryEntries: The maximum number of links in each webhead directory on the date branch. Default = 1024

• dateName: The name of the date branch subdirectory. Default = 'date'

• indexName: The name of the index branch subdirectory. Default = 'name'

• jsonSuffix: The suffix of the json crash file. Default = '.json'

• dumpSuffix: The suffix of the dump crash file. Default = '.dump'
• dateBranch: The full path to the date branch
• nameBranch: The full path to the index branch
• dumpPermissions: The permissions for the crash files. Default = 660
• dirPermissions: The permissions for the directories holding crash files. Default = 770
• dumpGID: The group ID for the directories and crash files. Default: Owned by the owner of the running script.
• toNameFromDate: The relative path from a leaf of the dateBranch to the nameBranch
• toDateFromName: The relative path from a leaf of the nameBranch to the dateBranch
• minutesPerSlot: How many minutes in each sub-hour slot. Default = 5
• slotRange: A precalculated range of slot edges = range(self.minutesPerSlot, 60, self.minutesPerSlot)
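The sub-hour slot computation implied by minutesPerSlot and slotRange can be illustrated with a tiny helper. This is a sketch for clarity (minute_slot is a hypothetical name, not part of the class), assuming the class rounds a timestamp's minute down to the nearest slot edge:

```python
def minute_slot(minute, minutes_per_slot=5):
    """Round a minute value (0-59) down to the start of its sub-hour slot."""
    return (minute // minutes_per_slot) * minutes_per_slot

# With the default 5-minute slots, a crash at 12:07 lands in the 12:05 slot:
print(minute_slot(7))             # 5
print(list(range(5, 60, 5))[:3])  # slotRange begins [5, 10, 15]
```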
12.3.3 Processed Dump Storage
Processed dumps are stored in two places: the relational database as well as in flat files within a file system. This forking of the storage scheme came from the realization that the infrequently used data within the database ‘dumps’ tables was causing performance problems within PostgreSQL. The ‘dumps’ tables took nearly eighty percent of the total storage, making replication and backup problematic. Since the ‘dumps’ table’s data is used only when a user requests a specific crash dump by uuid, most of the data is rarely, if ever, accessed.
We decided to migrate these dumps into file system storage outside the database. Details can be seen at: Dumping Dump Tables
In the file system, after processing, dumps are stored in a gzip compressed JSON file format. This format echoes a flattening of the ‘reports’, ‘extensions’ and the now deprecated ‘dumps’ tables within the database.
Directory Structure
Just as in the JsonDumpStorage scheme, there are two branches: ‘name’ and ‘date’
Access by Name
Most lookups of processed crash data happen by name. We use a radix storage technique where the first 4 characters of the file name are used for two levels of directory names. A file called aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz would be found in the file system as .../aa/bb/aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz
Access by Date
For the purposes of finding crashes that happened at a specific date and time, a hierarchy of date directories offers quick lookup. The leaves of the date directories contain symbolic links to the locations of crash data.
JSON File Format
example:
{"signature": "nsThread::ProcessNextEvent(int, int*)","uuid": "aabbf9cb-395b-47e8-9600-4f20e2090331","date_processed": "2009-03-31 14:45:09.215601","install_age": 100113,"uptime": 7,"last_crash": 95113,"product": "SomeProduct","version": "3.5.2","build_id": "20090223121634","branch": "1.9.1","os_name": "Mac OS X","os_version": "10.5.6 9G55","cpu_name": "x86","cpu_info": "GenuineIntel family 6 model 15 stepping 6","crash_reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS","crash_address": "0xe9b246","User Comments": "This thing crashed.\nHelp me Kirk.","app_notes": "","success": true,"truncated": false,"processor_notes": "","distributor":"","distributor_version": "","add-ons": [["{ABDE892B-13A8-4d1b-88E6-365A6E755758}", "1.0"], ["{b2e293ee-fd7e-4c71-a714-5f4750d8d7b7}", "2.2.0.9"], ["{972ce4c6-7e08-4474-a285-3208198ce6fd}", "3.5.2"]],"dump":"OS|Mac OS X|10.5.6 9G55\\nCPU|x86|GenuineIntel family 6 model 15 stepping 6|2\\nCrash|EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE|0x1558c095|0\\nModule|firefox-bin||firefox-bin|988FA8BFC789C4C07C32D61867BB42B60|0x00001000|0x00001fff|\\n....."}
The “dump” component is the direct streamed output from the Breakpad “minidump_stackwalk” program. Unfortunately, that project does not give detailed documentation of the format.
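Since a .jsonz file is just gzip-compressed JSON, it can be read back with the Python standard library alone. A minimal sketch (the helper name and demo path are invented for illustration, not part of Socorro):

```python
import gzip
import json
import os
import tempfile

def read_processed_crash(path):
    """Load a gzip-compressed JSON (.jsonz) processed-crash file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)

# Round-trip demo: write a tiny .jsonz, then read it back.
path = os.path.join(tempfile.gettempdir(), "demo.jsonz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump({"signature": "nsThread::ProcessNextEvent(int, int*)"}, f)

print(read_processed_crash(path)["signature"])
```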
12.3.4 Standard Job Storage
Standard storage is where the JSON/dump pairs are saved while they wait for processing. The location of the standard storage is determined by the configuration parameter storageRoot found in the Common Config.
The file system is divided into two parts: date based storage and name based storage. Both branches use a radix sort breakdown to locate files. The original version of Socorro used only the date based storage, but it was found to be too slow to search when under a heavy load.
For a deeper discussion of the storage technique: see File System
12.3.5 Top Crashers By URL
Introduction
The Top Crashers By URL report displays aggregate crash counts by unique urls or by unique domains. From here one can drill down to crash signatures. For crashes with comments, we display the comment in a link to the individual crash. In the future, signatures will be linked to search results, once we support url/domain as a search parameter.
Details
Data Definitions
Urls - This is everything before the query string.

Domains - This is the entire hostname.
Examples:
http://www.example.com/page.html?foo=bar
• url - http://www.example.com/page.html
• domain - www.example.com
chrome://example/content/extension.xul
• url - chrome://example/content/extension.xul
• domain - example
about:config
invalid, no protocol
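The url/domain rules illustrated above can be sketched with urllib.parse. This helper is hypothetical (not part of Socorro), assuming the rules as stated: strip the query string for the url, take the host for the domain, and reject anything without a protocol and host:

```python
from urllib.parse import urlsplit

def url_and_domain(raw_url):
    """Return (url-without-query, domain), or None for invalid URLs."""
    parts = urlsplit(raw_url)
    if not parts.scheme or not parts.netloc:
        return None  # e.g. about:config has no protocol/host to aggregate on
    return parts.scheme + "://" + parts.netloc + parts.path, parts.netloc

print(url_and_domain("http://www.example.com/page.html?foo=bar"))
# ('http://www.example.com/page.html', 'www.example.com')
print(url_and_domain("chrome://example/content/extension.xul"))
# ('chrome://example/content/extension.xul', 'example')
print(url_and_domain("about:config"))
# None
```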
Filtering
For a crash report to be counted it must have the following:
• A url which is not null or empty and which has a protocol
• Aggregates are calculated 1 day at a time for the previous day
• At the level of aggregation, it must have more than 1 record
Crash data viewed from the url perspective is a very long tail of crashes for a single unique url. We cut off this tail, which reduces data storage and processing time by an order of magnitude.
A consequence of this filtering (only good urls + multiple crashes) makes the total crash aggregates much lower than top crashers or raw queries. Keep this in mind when using aggregates: Top Crashers (by OS) is a much better gauge.
Administration
Configuring new products
The Top Crashers By URL report is powered by the tcbyurlconfig and productdims tables.
1. Make sure your product is in the productdims table
(a) If not, insert it. The following sets up a specific version of a specific product for the ALL, Win, and Mac platforms:
INSERT INTO productdims (product, version, os_name, release) VALUES ('Firefox', '3.0.4', 'ALL', 'major');
INSERT INTO productdims (product, version, os_name, release) VALUES ('Firefox', '3.0.4', 'Win', 'major');
INSERT INTO productdims (product, version, os_name, release) VALUES ('Firefox', '3.0.4', 'Mac', 'major');
2. Insert a config entry for the exact product you want to report on. Usually this is os_name = ALL:
INSERT INTO tcbyurlconfig (productdims_id, enabled)
SELECT id, 'Y' FROM productdims WHERE product = 'Firefox' AND version = '3.0.4' AND os_name = 'ALL';
3. wait for results
4. reap the profit.
Suspending Reports
Table tcbyurlconfig has an ‘enabled’ column. Set it to false to stop the cron from updating the reports for a particular product.
Mozilla Specific
Make sure to match up the release type: versions with pre in them are milestone; versions with a or b in them are development.
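The categorization rule above could be expressed as a small helper. This is a sketch only (release_type is a hypothetical function; the real mapping lives in the productdims release column):

```python
def release_type(version):
    """Classify a version string per the Mozilla convention described above:
    'pre' -> milestone, 'a' or 'b' -> development, otherwise major."""
    if "pre" in version:
        return "milestone"
    if "a" in version or "b" in version:
        return "development"
    return "major"

print(release_type("3.0.4"))     # major
print(release_type("3.1b3"))     # development
print(release_type("3.7a1pre"))  # milestone
```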
Operations
This report is populated by a cron Python script which runs at 10:00 PM PST. The run is controlled by configuration data from a table in the database. All products which are enabled in this config table will have their daily report generated.
In the future this will be managed via an admin page, but currently it is managed via SQL.
Development
Details about the database design are in Report Database Design
12.3.6 Top Crashers By Signature
Introduction
Topcrashers By Signature compiles 14 days’ worth of crash reports (organized by signature) for a given version. This report is useful for finding new topcrashes, determining if topcrashes have been filed, and seeing trending of topcrashes over time (for a specific version).
Details
For the ideal topcrashers by signature report, we want to gather the following data:
• crashes by version (e.g., Firefox 3.0.9)
• date a crash occurred (to know if it’s within our window)
• stack signature
• average uptime (since last browser start) averaged over window
• bug numbers related to crash signature
Additionally, we need the ability to either a) go back in time or b) “freeze” the topcrashers by signature report on a specific day. This allows us to compare, say, the last day of a release to the newest release (e.g., Firefox 3.0.8 to Firefox 3.0.9). Without the ability to go back to a specific day of topcrash reports or freeze topcrash reports, we have no easy ability to compare releases (as new crashes come in for old releases, the topcrash list changes substantially).
Ideal Outputs
(to be filled)
See [[SocorroUIInstallation]] for additional details.
Operations
• Need a recalculation every 4 to 6 hours
• Need top 500 signatures, ranked over last 14 days
• Note that this implies for the database that each slice is aggregated from the full window (which slides forward each time)
12.3.7 Signature Generation
Introduction
The Processor creates an overall signature for a crash based on the signatures of the stack frames of the crashing thread. It walks the stack from the frame with the lowest number (the top of the stack), applying rules and accumulating a list of signatures found to be relevant. Once the rules are done, the list of signatures is concatenated into a single string. That single string becomes the crash’s overall signature.
Normalization
Before any frame signatures are considered, they are normalized. This is just a string formatting change. Runs of spaces are compressed to just one space, commas are ensured to always be followed by a space, and integer values are replaced by ‘int’. Signatures that match the signaturesWithLineNumbersRegEx regular expression are combined with their source code line. Frames that have no function information are written as source code/line number pairs. If no source code is available, it tries to find a module/address pair. Failing that, it falls back to just an address.
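The whitespace, comma, and integer normalizations can be sketched with a few regular expressions. This is an illustrative approximation, not the Processor's exact code:

```python
import re

def normalize_frame(name):
    """Approximate the frame-name normalization described above."""
    name = re.sub(r"\s+", " ", name)        # collapse runs of spaces
    name = re.sub(r",(?!\s)", ", ", name)   # ensure a space after each comma
    name = re.sub(r"\b\d+\b", "int", name)  # replace integer literals with 'int'
    return name.strip()

print(normalize_frame("Foo::bar(char*,unsigned  42)"))
# Foo::bar(char*, unsigned int)
```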
The SkipList Rules
The signature is generated by walking through each stack frame considering its ‘name’ (as normalized above). Frames/names are skipped or added to the signature list according to the rules. When a signature list is complete, it is converted to a string by concatenating the frame names with spaces and a vertical bar between each name. For example, objc_msgSend | IdleTimerVector is the signature for a stack that contained (irrelevant frames), “objc_msgSend”, “IdleTimerVector” which matched neither prefix nor irrelevant regular expressions, and possibly other frames which did not become part of the signature.
regular expressions
Each SkipList rule is a regular expression. Typically, it takes the form of an alternation of frame names, but any legal regular expression can be used. Regular expression alternation syntax is a|b|c: match on ‘a’ or ‘b’ or ‘c’. This work is done in Python, so use Python Regular Expression Syntax.
signatureSentinels
A typical rule might be: “_purecall”.
This is the first rule to be applied. The code iterates through the stack frames, throwing away everything it finds until it encounters a match to this regular expression or the end of the stack. If it finds a match, it passes all the frames after the match to the next step. If it finds no match, it passes the whole list of frames to the next step.
irrelevantSignatureRegEx
A typical rule might be: “@0x[0-9a-fA-F]{2,}|@0x[1-9a-fA-F]|RaiseException|CxxThrowException”.
A frame which matches this regular expression will be appended to the signature only if a prefix frame has already been seen (see the next rule).
prefixSignatureRegEx
A typical rule might be “@0x0|strchr|strstr|strlen|PL_strlen|strcmp|wcslen|memcpy|memmove|memcmp|malloc|realloc|objc_msgSend”, though at Mozilla it has grown much longer.
This is the rule that generates compound signatures. A frame that matches this regular expression changes the state of the machine to ‘seen prefix’. In the ‘seen prefix’ state, irrelevant or prefix frames are appended. As soon as a frame is neither, it is appended and the signature list is complete.
Once the signature list is complete, the signature is generated as mentioned above.
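Putting the three rule types together, the walk can be sketched as a small state machine. The rule patterns below are toy stand-ins (the real lists are configuration, and much longer), but the control flow follows the description above:

```python
import re

# Toy rule set for illustration only; not the production configuration.
SENTINEL = re.compile(r"_purecall")
IRRELEVANT = re.compile(r"RaiseException|CxxThrowException")
PREFIX = re.compile(r"strchr|strlen|memcpy|objc_msgSend")

def generate_signature(frames):
    """Skip to the sentinel (if any), then accumulate prefix/irrelevant
    frames until the first frame that is neither, and join with ' | '."""
    for i, frame in enumerate(frames):
        if SENTINEL.search(frame):
            frames = frames[i + 1:]  # keep only frames after the sentinel
            break
    signature = []
    seen_prefix = False
    for frame in frames:
        if IRRELEVANT.search(frame):
            if seen_prefix:          # irrelevant frames count only after a prefix
                signature.append(frame)
            continue
        signature.append(frame)
        if PREFIX.search(frame):
            seen_prefix = True       # stay in 'seen prefix' state
        else:
            break                    # first frame that is neither: we are done
    return " | ".join(signature)

print(generate_signature(["RaiseException", "objc_msgSend", "IdleTimerVector", "main"]))
# objc_msgSend | IdleTimerVector
```

This reproduces the example from the text: the irrelevant frame is skipped, the prefix frame is kept, and the first plain frame ends the signature.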
12.3.8 Crash Mover
The Collector dumps all the crashes that it receives into the local file system. This application is responsible for transferring those crashes into HBase.
Configuration:
import stat
import socorro.lib.ConfigurationManager as cm

#-------------------------------------------------------------------------------
# general
numberOfThreads = cm.Option()
numberOfThreads.doc = 'the number of threads to use'
numberOfThreads.default = 4
#-------------------------------------------------------------------------------
# source storage
sourceStorageClass = cm.Option()
sourceStorageClass.doc = 'the fully qualified name of the source storage class'
sourceStorageClass.default = 'socorro.storage.crashstorage.CrashStorageSystemForLocalFS'
sourceStorageClass.fromStringConverter = cm.classConverter

from config.collectorconfig import localFS
from config.collectorconfig import localFSDumpDirCount
from config.collectorconfig import localFSDumpGID
from config.collectorconfig import localFSDumpPermissions
from config.collectorconfig import localFSDirPermissions
from config.collectorconfig import fallbackFS
from config.collectorconfig import fallbackDumpDirCount
from config.collectorconfig import fallbackDumpGID
from config.collectorconfig import fallbackDumpPermissions
from config.collectorconfig import fallbackDirPermissions

from config.commonconfig import jsonFileSuffix
from config.commonconfig import dumpFileSuffix
#-------------------------------------------------------------------------------
# destination storage
destinationStorageClass = cm.Option()
destinationStorageClass.doc = 'the fully qualified name of the destination storage class'
destinationStorageClass.default = 'socorro.storage.crashstorage.CrashStorageSystemForHBase'
destinationStorageClass.fromStringConverter = cm.classConverter

from config.commonconfig import hbaseHost
from config.commonconfig import hbasePort
from config.commonconfig import hbaseTimeout
#-------------------------------------------------------------------------------
# logging
syslogHost = cm.Option()
syslogHost.doc = 'syslog hostname'
syslogHost.default = 'localhost'

syslogPort = cm.Option()
syslogPort.doc = 'syslog port'
syslogPort.default = 514

syslogFacilityString = cm.Option()
syslogFacilityString.doc = 'syslog facility string ("user", "local0", etc)'
syslogFacilityString.default = 'user'

syslogLineFormatString = cm.Option()
syslogLineFormatString.doc = 'python logging system format for syslog entries'
syslogLineFormatString.default = 'Socorro Storage Mover (pid %(process)d): %(asctime)s %(levelname)s - %(threadName)s - %(message)s'

syslogErrorLoggingLevel = cm.Option()
syslogErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
syslogErrorLoggingLevel.default = 10
stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(threadName)s - %(message)s'

stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 10
12.3.9 Collector
Collector is an application that runs under Apache using mod-python. Its task is accepting crash reports from remote clients and saving them in a place and format usable by further applications.
Raw crashes are accepted via HTTP POST. The form data from the POST is then arranged into a JSON document and saved into the local file system. The collector is responsible for assigning an ooid (Our Own ID) to the crash. It also assigns a throttle value which determines if the crash is eventually to go into the relational database.
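The collector's two assignments can be sketched as follows. This is an illustrative sketch only, not Socorro's actual algorithm: the real ooid is derived from a uuid but encodes storage depth and submission date in its tail, and real throttling applies configurable rules rather than the bare sample rate invented here.

```python
import random
import uuid

def assign_ooid():
    # Illustrative only: the real collector starts from a uuid but
    # embeds the storage depth and submission date in the last bytes.
    return str(uuid.uuid4())

def throttle(sample_rate=0.10):
    # Decide whether this crash will eventually reach the relational
    # database; sample_rate is an invented example value.
    return random.random() < sample_rate
```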
Should the saving to a local file system fail, there is a fallback storage mechanism. A second file system can be configured to take the failed saves. This file system would likely be an NFS mounted file system.
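The fallback behavior amounts to a try/except around the primary save. A minimal sketch, assuming store objects with a save() method (an illustrative interface, not Socorro's actual crashstorage API):

```python
def save_raw_crash(crash, primary, fallback):
    # Try the local file system first; on failure, divert the crash to
    # the second (e.g. NFS mounted) file system.
    try:
        primary.save(crash)
        return 'primary'
    except (IOError, OSError):
        fallback.save(crash)
        return 'fallback'
```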
After a crash is saved, there is an app called Crash Mover that will transfer the crashes to HBase.
Collector Python Configuration
Like all the Socorro applications, the configuration is actually executable Python code. Two configuration files are relevant for collector:
• Copy .../scripts/config/commonconfig.py.dist to .../config/commonconfig.py. This configuration file contains constants used by many of the Socorro applications.
• Copy .../scripts/config/collectorconfig.py.dist to .../config/collectorconfig.py
Common Configuration
There are two constants in ‘.../scripts/config/commonconfig.py’ of interest to collector: jsonFileSuffix, and dumpFileSuffix. Other constants in this file are ignored.
To setup the common configuration, see Common Config.
Collector Configuration
collectorconfig.py has several options to adjust how files are stored:
See sample config code on Github
12.3.10 Reporter
Deprecated.
See UI Installation.
12.3.11 Monitor
Monitor is a multithreaded application with several mandates. Its main job is to find new JSON/dump pairs and queue them for further processing. It looks for new JSON/dump pairs in the file system location designated by the constant storageRoot from the Common Config file. Once it finds a pair, it queues them as a “job” in the database ‘jobs’ table and assigns it to a specific processor. Once queued, the monitor goes on to find other new jobs to queue.

Monitor also locates and queues priority jobs. If a user requests a report via the Reporter and that crash report has not yet been processed, the Reporter puts the requested crash’s UUID into the database’s ‘priorityjobs’ table. Monitor looks in three places for the requested job:

• the processors - if monitor finds the job already assigned to a processor, it raises the priority of that job so the processor will do it quickly

• the storageRoot file system - if the job is found here, it queues it for priority processing immediately rather than waiting for the standard mechanism to eventually find it

• the deferredStorageRoot file system - if the requested crash was filtered out by server side throttling, monitor will find it and queue it immediately from that location.
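The search order above can be sketched as pure logic. The container arguments and return tags are illustrative, not Monitor's real API:

```python
def locate_priority_job(uuid, processor_jobs, standard_storage, deferred_storage):
    # Check the three locations in the order Monitor uses.
    if uuid in processor_jobs:
        return 'processor'   # already queued: just raise its priority
    if uuid in standard_storage:
        return 'standard'    # found in storageRoot: queue immediately
    if uuid in deferred_storage:
        return 'deferred'    # throttled out: queue from deferredStorageRoot
    return None              # not yet received
```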
Monitor is also responsible for keeping the StandardJobStorage file system neat and tidy. It monitors the ‘jobs’ queue in the database. Once it sees that a previously queued job has been completed, it moves the JSON/dump pairs to long term storage or it deletes them (based on a configuration setting). Jobs that fail their further processing stage are also either saved in a “failed” storage area or deleted.

Monitor is a command line application meant to be run continuously as a daemon. It can log its actions to stderr and/or to automatically rotating log files. See the configuration options below beginning with stderr* and logFile* for more information.
The monitor app is found at .../scripts/monitor.py. In order to run monitor, the socorro package must be visible somewhere on the Python path.
Configuration
Monitor, like all the Socorro applications, uses the common configuration for several of its constants. For setup of common configuration, see Common Config.

monitor also has an executable configuration file of its own. A sample file is found at .../scripts/config/monitorconfig.py.dist. Copy this file to .../scripts/config/monitorconfig.py and edit it for site specific settings.
In each case where a site specific value is desired, replace the value for the .default member.
standardLoopDelay
Monitor has to scan the StandardJobStorage looking for jobs. This value represents the delay between scans:

standardLoopDelay = cm.Option()
standardLoopDelay.doc = 'the time between scans for jobs (HHH:MM:SS)'
standardLoopDelay.default = '00:05:00'
standardLoopDelay.fromStringConverter = cm.timeDeltaConverter
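The fromStringConverter turns the 'HHH:MM:SS' string into a time delta. A sketch of what cm.timeDeltaConverter presumably does (the real converter lives in socorro.lib.ConfigurationManager):

```python
from datetime import timedelta

def time_delta_converter(value):
    # 'HHH:MM:SS' -> timedelta; the hours field may exceed two digits.
    hours, minutes, seconds = (int(part) for part in value.split(':'))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
```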
cleanupJobsLoopDelay
Monitor archives or deletes JSON/dump pairs from the StandardJobStorage. This value represents the delay between runs of the archive/delete routines:

cleanupJobsLoopDelay = cm.Option()
cleanupJobsLoopDelay.doc = 'the time between runs of the job clean up routines (HHH:MM:SS)'
cleanupJobsLoopDelay.default = '00:05:00'
cleanupJobsLoopDelay.fromStringConverter = cm.timeDeltaConverter
priorityLoopDelay
The frequency to look for priority jobs:

priorityLoopDelay = cm.Option()
priorityLoopDelay.doc = 'the time between checks for priority jobs (HHH:MM:SS)'
priorityLoopDelay.default = '00:01:00'
priorityLoopDelay.fromStringConverter = cm.timeDeltaConverter
saveSuccessfulMinidumpsTo:
saveSuccessfulMinidumpsTo = cm.Option()
saveSuccessfulMinidumpsTo.doc = 'the location for saving successfully processed dumps (leave blank to delete them instead)'
saveSuccessfulMinidumpsTo.default = '/tmp/socorro-sucessful'
saveFailedMinidumpsTo:
saveFailedMinidumpsTo = cm.Option()
saveFailedMinidumpsTo.doc = 'the location for saving dumps that failed processing (leave blank to delete them instead)'
saveFailedMinidumpsTo.default = '/tmp/socorro-failed'
logFilePathname
Monitor can log its actions to a set of automatically rotating log files. This is the name and location of the logs:

logFilePathname = cm.Option()
logFilePathname.doc = 'full pathname for the log file'
logFilePathname.default = './monitor.log'
logFileMaximumSize
This is the maximum size in bytes allowed for a log file. Once this number is reached, the logs rotate and a new log is started:

logFileMaximumSize = cm.Option()
logFileMaximumSize.doc = 'maximum size in bytes of the log file'
logFileMaximumSize.default = 1000000
logFileMaximumBackupHistory
The maximum number of log files to keep:

logFileMaximumBackupHistory = cm.Option()
logFileMaximumBackupHistory.doc = 'maximum number of log files to keep'
logFileMaximumBackupHistory.default = 50
logFileLineFormatString
A Python format string that controls the format of individual lines in the logs:

logFileLineFormatString = cm.Option()
logFileLineFormatString.doc = 'python logging system format for log file entries'
logFileLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
logFileErrorLoggingLevel
Logging is done in severity levels - the lower the number, the more verbose the logs:

logFileErrorLoggingLevel = cm.Option()
logFileErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
logFileErrorLoggingLevel.default = 10
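These logFile* options map naturally onto Python's standard rotating file handler. A sketch of equivalent stdlib wiring using the defaults above (the function name is ours, not Socorro's):

```python
import logging
import logging.handlers

def build_rotating_logger(pathname='./monitor.log'):
    # Mirror the option defaults: 1 MB files, 50 backups, DEBUG level.
    handler = logging.handlers.RotatingFileHandler(
        pathname,
        maxBytes=1000000,    # logFileMaximumSize
        backupCount=50,      # logFileMaximumBackupHistory
    )
    handler.setFormatter(
        logging.Formatter('%(asctime)s %(levelname)s - %(message)s'))
    logger = logging.getLogger('monitor')
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)   # logFileErrorLoggingLevel = 10
    return logger
```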
stderrLineFormatString
In parallel with creating log files, Monitor can log to stderr. This is a Python format string that controls the format of individual lines sent to stderr:

stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
stderrErrorLoggingLevel
Logging to stderr is done in severity levels independently from the log file severity levels - the lower the number, the more verbose the output to stderr:

stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 40
12.3.12 File System
Socorro uses two similar file system storage schemes in two distinct places within the system. Raw crash dumps from the field use a system called JSON Dump Storage while at the other end, processed dumps use the Processed Dump Storage scheme.
12.3.13 Deferred Cleanup
When the Collector throttles the flow of crash dumps, it saves deferred crashes into Deferred Job Storage. These JSON/dump pairs will live in deferred storage for a configurable number of days. It is the task of the deferred cleanup application to implement the policy to delete old crash dumps.

The deferred cleanup application is a command line app meant to be run as a cron job. It should be set to run once every twenty-four hours.
Configuration
deferredcleanup uses the common configuration to get the constant deferredStorageRoot. For setup of common configuration, see Common Config.

deferredcleanup also has an executable configuration file of its own. A sample file is found at .../scripts/config/deferredcleanupconfig.py.dist. Copy this file to .../scripts/config/deferredcleanupconfig.py and edit it for site specific settings.
In each case where a site specific value is desired, replace the value for the .default member.
maximumDeferredJobAge
This constant specifies how many days deferred jobs are allowed to stay in deferred storage. Job deletion is permanent:

maximumDeferredJobAge = cm.Option()
maximumDeferredJobAge.doc = 'the maximum number of days that deferred jobs stick around'
maximumDeferredJobAge.default = 2
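The policy reduces to an age check against a cutoff. A sketch of the deletion logic, assuming a plain directory tree (the real app walks deferredStorageRoot and also tidies symlinks and empty directories):

```python
import os
import time

def clean_deferred_storage(root, maximum_deferred_job_age=2, dry_run=False):
    # Remove files whose modification time is older than the cutoff.
    cutoff = time.time() - maximum_deferred_job_age * 86400
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                removed.append(path)
                if not dry_run:      # dryRun reports without deleting
                    os.remove(path)
    return removed
```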
dryRun
Used during testing and development, this prevents deferredcleanup from actually deleting things:
dryRun = cm.Option()
dryRun.doc = "don't really delete anything"
dryRun.default = False
dryRun.fromStringConverter = cm.booleanConverter
logFilePathname
Deferredcleanup can log its actions to a set of automatically rotating log files. This is the name and location of the logs:

logFilePathname = cm.Option()
logFilePathname.doc = 'full pathname for the log file'
logFilePathname.default = './processor.log'
logFileMaximumSize
This is the maximum size in bytes allowed for a log file. Once this number is reached, the logs rotate and a new log is started:

logFileMaximumSize = cm.Option()
logFileMaximumSize.doc = 'maximum size in bytes of the log file'
logFileMaximumSize.default = 1000000
logFileMaximumBackupHistory
The maximum number of log files to keep:

logFileMaximumBackupHistory = cm.Option()
logFileMaximumBackupHistory.doc = 'maximum number of log files to keep'
logFileMaximumBackupHistory.default = 50
logFileLineFormatString
A Python format string that controls the format of individual lines in the logs:

logFileLineFormatString = cm.Option()
logFileLineFormatString.doc = 'python logging system format for log file entries'
logFileLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
logFileErrorLoggingLevel
Logging is done in severity levels - the lower the number, the more verbose the logs:

logFileErrorLoggingLevel = cm.Option()
logFileErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
logFileErrorLoggingLevel.default = 20
stderrLineFormatString
In parallel with creating log files, deferredcleanup can log to stderr. This is a Python format string that controls the format of individual lines sent to stderr:

stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
stderrErrorLoggingLevel
Logging to stderr is done in severity levels independently from the log file severity levels - the lower the number, the more verbose the output to stderr:

stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 40
12.4 Standalone Development Environment
You can easily bring up a full Socorro VM, see Setup a development environment for more info.
However, in some cases it can make sense to run components standalone in a development environment, for example if you want to run just one or two components and connect them to an existing Socorro install for debugging.
12.4.1 Setting up
1) clone the repo (http://github.com/mozilla/socorro)
git clone git://github.com/mozilla/socorro.git
cd socorro/
2) set up Python path
export PYTHONPATH=.:thirdparty/
3) create virtualenv and use it (this installs all needed Socorro dependencies)
make virtualenv
. socorro-virtualenv/bin/activate
4) copy default Socorro config (also see Common Config)
pushd scripts/config
for file in *.py.dist; do cp $file `basename $file .dist`; done
edit commonconfig.py (...)
popd
12.4.2 Install and configure UI
1) symlink webapp-php/ to HTDOCS area
mv ~/public_html ~/public_html.old
ln -s ./webapp-php ~/public_html
2) copy default webapp config (also see UI Installation)
cp htaccess-dist .htaccess
pushd webapp-php/application/config/
for file in *.php-dist; do cp $file `basename $file -dist`; done
edit database.php config.php (...)
popd
3) make sure log area is writable to webserver user
chmod o+rwx webapp-php/application/logs
12.4.3 Launch standalone Middleware instance
Edit scripts/config/webapiconfig.py and change wsgiInstallation to False (this allows the middleware to run in standalone mode):
wsgiInstallation.default = False
NOTE - make sure to use an unused port; it should be the same as whatever you configure in webapp-php/application/config/webserviceclient.php
python scripts/webservices.py 9191
This will use whichever database you configured in commonconfig.py.
12.5 Unit Testing
There are (some, and a growing number of) unit tests for the Socorro code.
12.5.1 How to Unit Test
• configure your test environment (see below)
• install nosetests
• cd to socorro/unittests
• chant nosetests and observe the result
– You should expect more than 185 tests (186 as of 2009-03-25)

– You should see exactly two failures (unless you are running as root), with this assertion: AssertionError: You must run this test as root (don't forget root's PYTHONPATH):

ERROR: testCopyFromGid (socorro.unittest.lib.testJsonDumpStorageGid.TestJsonDumpStorageGid)
ERROR: testNewEntryGid (socorro.unittest.lib.testJsonDumpStorageGid.TestJsonDumpStorageGid)
• You may ‘observe’ the result by chanting nosetests > test.out 2>&1 and then examining test.out (or any name you prefer)
• There is a bash shell file: socorro/unittest/red which may be sourced to provide a bash function red that simplifies watching test logfiles in a separate terminal window. In that window, cd to the unittest sub-directory of interest, then source the file: . ../red, then chant red. The effect is to clear the screen, then tail -F the logfile associated with tests in that directory. You may chant red --help to be reminded.

• The red file also provides a function noseErrors which simplifies the examination of nosetests output. Chant noseErrors --help for a brief summary.
12.5.2 How to write Unit Tests
Nose provides some nice tools. Some of the tests require nose and nosetests (or a tool that mimics their behavior). However, it is also quite possible to use Python's unittest. No tutorial here. Instead, take a look at an existing test file and do something usefully similar.
12.5.3 Where to write Unit Tests
To maintain the current test layout, note that for every directory under socorro, there is a same-name directory under socorro/unittest where the test code for the working directory should be placed. In addition, there is unittest/testlib that holds a library of useful testing code as well as some tests for that library.

If you add a unittest subdirectory holding new tests, you must also provide __init__.py (which may be empty), or nosetests will not enter the directory looking for tests.
12.5.4 How to configure your test environment
• You must have a working PostgreSQL installation (see Installation for the version). It need not be locally hosted, though if not, please be careful about username and password for the test user. Also be careful not to step on a working database: The test cleanup code drops tables.

• You must either provide for a PostgreSQL account with name and password that matches the config file or edit the test config file to provide an appropriate test account and password. That file is socorro/unittest/config/commonconfig.py. If you add a new test config file that needs database access, you should import the details from commonconfig, as exemplified in the existing config files.

• You must provide a database appropriate for the test user (default: test). That database must support PLPGSQL. As the owner of the test database, while connected to that database, invoke CREATE LANGUAGE PLPGSQL;

• You must have installed nose and nosetests; nosetests should be on your PATH and the nose code/egg should be on your PYTHONPATH

• You must have installed the psycopg2 Python module

• You must adjust your PYTHONPATH to include the directory holding socorro. E.g. if you have installed socorro at /home/tester/Mozilla/socorro then your PYTHONPATH should look like ...:/home/tester/Mozilla:/home/tester/Mozilla/thirdparty:...
12.6 Crash Repro Filtering Report
12.6.1 Introduction
This page describes a report that assists in analyzing crash data for a stack signature in order to try to reproduce a crash and develop a reproducible test case.
12.6.2 Details
for each release pull a data set of one week's worth of data ranked by signature like:
http://crash-stats.mozilla.com/query/query?do_query=1&product=Firefox&version=Firefox%3A3.0.10&date=&range_value=7&range_unit=days&query_search=signature&query_type=contains&query=
then provide a list like this with several fields of interest for examining the data:
Date Product Version Build OS CPU Reason Address Uptime Comments
but also need to add URLs into the version of this report that is behind auth. “reason” is not so helpful to me at this stage, but others can weigh in on the idea of removing it.
maybe just make it include all these or allow users to pick the fields it shows like bugzilla does?
Signature, Crash Address, UUID, Product, Version, Build, OS, Time, Uptime, Last Crash, URL, User Comments
anyway, get something close to what we have now in “Crash Reports in PR_MD_SEND”
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.0.10&query_search=signature&query_type=contains&query=&date=&range_value=7&range_unit=days&do_query=1&signature=_PR_MD_SEND
next allow the report user to apply filters to build more precise queries from the set of reports. Filters might be from any of the fields, or it would be really cool if we could also filter on other items in the crash report like the full stack trace and/or module list:

filter uptime < 60 seconds
and filter address exactly_matches 0x187d000
and filter url contains mail.google.com
or filter url contains mail.yahoo.com
and filter modulelist does_not_contain "mswsock.dll 5.1.2600.3394"
that last example of module list might be a stretch, but it would be very valuable to check the module list for existence or non-existence of binary components and their version numbers.

from there we would want to see the results and export to CSV, to import things like URL lists into page load testing systems to look for reproducible crashers.
12.7 Disk Performance Tests
12.7.1 Introduction
Any DBMS for a database which is larger than memory can be no faster than disk speed. This document outlines a series of tests for testing disk speed to determine if you have an issue. Written originally by PostgreSQL Experts Inc. for Mozilla.
12.7.2 Running Tests
Note: all of the below require you to have plenty of disk space available. And their figures are only reliable if nothing else is running on the system.
Simplest Test: The DD Test
This test measures the most basic single-threaded disk access: a large sequential write, followed by a large sequential read. It is relevant to database performance because it gives you a maximum speed for sequential scans for large tables. Real table scans are generally about 30% of this maximum.

dd is a Unix command line utility which simply writes to a block device. We use it for this 3-step test. The other thing you need to know for this test is your RAM size.

1. We create a large file which is 2x the size of RAM, and sync it to disk. This makes sure that we get the real sustained write rate, because caching can have little effect. Since there are 125000 blocks per GB (8k block size is used because it's what Postgres uses), if we had 8GB of RAM, we would run the following:
time sh -c "dd if=/dev/zero of=ddfile bs=8k count=1000000 && sync"
dd will report a time and write rate to us, and “time” will report a larger time. The time and rate reported by dd represent the rate without any lag or sync time; divide the data size by the time reported by “time” for the synchronous file writing rate.
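The arithmetic is just bytes over wall-clock seconds. A sketch using the 8 GB example above (1,000,000 blocks of 8 KB) and a hypothetical 110 second “time” figure, since no measured value appears in the text:

```python
def sustained_write_rate(data_bytes, wall_seconds):
    # Use the wall-clock time from `time`, which includes the final
    # sync, rather than dd's own optimistic figure.
    return data_bytes / wall_seconds

# 1,000,000 blocks * 8192 bytes, flushed in a hypothetical 110 s:
rate_mb_per_s = sustained_write_rate(1000000 * 8192, 110.0) / 1e6
```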
2. Next we want to write another large file, this one the size of RAM, in order to flush out the FS cache so that we can read directly from disk later:

dd if=/dev/zero of=ddfile2 bs=8K count=500000

3. Now, we want to read the first file back. Since the FS cache is full from the second file, this should be 100% disk access:
time dd if=ddfile of=/dev/null bs=8k
This time, “time” and dd will be very close together; any difference will be strictly storage lag time.
12.7.3 Bonnie++
Bonnie++ is a more sophisticated set of tests which tests random reads and writes, as well as seeks, and file creation and deletion operations. For a modern system, you want to use the last version, 1.95, downloaded from http://www.coker.com.au/bonnie++/experimental/. This final version of bonnie++ supports concurrency and measures lag time. However, it is not available in package form in most OSes, so you'll have to compile it using g++.

Again, for Mozilla we want to test performance for a database which is larger than RAM, since that's what we have. Therefore, we're going to run a concurrent Bonnie++ test where the total size of the files is about 150% of RAM, forcing the use of disk. We're also going to run 8 threads to simulate concurrent file access. Our command line for a machine with 16GB RAM is:
bonnie++ -d /path/to/storage -c 8 -r 16000 -n 100
The results we get back look something like this:
Version  1.95       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   8     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
tm-breakpad0 32000M   757  99 71323  16 30594   5  2192  99 57555   4 262.5  13
Latency             15462us    6918ms    4933ms   11096us     706ms     241ms
Version  1.95       ------Sequential Create------ --------Random Create--------
tm-breakpad01-maste -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                100 44410  75 +++++ +++ 72407  81 45787  77 +++++ +++ 63167  72
Latency              9957us     477us     533us     649us      93us     552us
So, the interesting parts of this are:
Sequential Output: Block: this is sequential writes like dd does. It’s 70MB/s.
Sequential Input: Block: this is sequential reads from disk. It’s 57MB/s.
Sequential Output: Rewrite: is reading, then writing, a file which has been flushed to disk. This rate will be lower than either of the above, and is at 30MB/s.
Random: Seeks: this is how many individual blocks Bonnie can seek to per second; it’s a fast 262.
Latency: this is the full round-trip lag time for the mentioned operation. On this platform, these times are catastrophically bad; 1/4 second round-trip to return a single random block, and 3/4 seconds to return the start of a large file.

The figures on file creations and deletion are generally less interesting to databases. The +++++ are for runs that were so fast the error margin makes the figures meaningless; for better figures, increase -n.
12.7.4 IOZone
Now, if you don't think Bonnie++ told you enough, you'll want to run Iozone. Iozone is a benchmark mostly known for creating pretty graphs (http://www.iozone.org/) of filesystem performance with different file, batch, and block sizes. However, this kind of comprehensive profiling is completely unnecessary for a DBMS, where we already know the file access pattern, and can take up to 4 days to run. So do not run Iozone in automated (-a) mode!

Instead, run a limited test. This test will still take several hours to run, but will return a more limited set of relevant results. Run this on a 16GB system with 8 cores, from a directory on the storage you want to measure:
iozone -R -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 8 -l 6 -u 6 -r 8k -s 4G -F f1 f2 f3 f4 f5 f6
This runs the following tests: write/read, rewrite/reread, random-read/write, read-backwards, re-write-record, stride-read, random mix. It does these tests using 6 concurrent processes, a block size of 8k (Postgres' block size) for 4G files named f1 to f6. The aggregate size of the files is 24G, so that they won't all fit in memory at once.
In theory, the relevance of these tests to database activity is the following:
write/read: basic sequential writes and reads.
rewrite/reread: writes and reads of frequently accessed tables (in memory)
random-read/write: index access, and writes of individual rows
read-backwards: might be relevant to reverse index scans.
re-write-record: frequently updated row behavior
stride-read: might be relevant to bitmapscan
random mix: general database access average behavior.
The results you get will look like this:
Children see throughput for 6 initial writers = 108042.81 KB/sec
Parent sees throughput for 6 initial writers  =  31770.90 KB/sec
Min throughput per process                    =  13815.83 KB/sec
Max throughput per process                    =  35004.07 KB/sec
Avg throughput per process                    =  18007.13 KB/sec
Min xfer                                      = 1655408.00 KB

And so on through all the tests. These results are pretty self-explanatory, except that I have no idea what the difference between “Children see” and “Parent sees” means. Iozone documentation is next-to-nonexistent.

Note: IOZone appears to have several bugs, and places where its documentation and actual features don't match. Particularly, it appears to have locking issues in concurrent access mode for some writing activity, so that concurrency throughput may be lower than actual.
12.8 Dumping Dump Tables
A work item that came out of the Socorro Postgres work week is to dump the dump tables and store cooked dumps as gzipped files:

• Drop the dumps table

• convert each dumps table row to a compressed file on disk
12.8.1 Bugzilla
https://bugzilla.mozilla.org/show_bug.cgi?id=484032
12.8.2 Library support
‘done’ as of 2009-05-07 in socorro.lib.dmpStorage (coding and testing are done; integration testing is done; ‘go live’ is today)

Socorro UI
/report/index/{uuid}
• Will stop using the dumps table.
• Will start using gzipped files
– Will use the report uuid to locate the dump on a file system
– Will use apache mod-rewrite to serve the actual file. The rewrite rule is based on the uuid, and is ‘simple’: AABBCCDDEEFFGGHHIIJJKKLLM2090308.jsonz => AA/BB/AABBCCDDEEFFGGHHIIJJKKLLM2090308.jsonz
– report/index will include a link to JSON dump
link rel='alternate' type='application/json' href='/reporter/dumps/cdaa07ae-475b-11dd-8dfa-001cc45a2ce4.jsonz'
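The rewrite rule's path computation can be sketched in a few lines (the function name is ours, for illustration):

```python
def jsonz_path(ooid):
    # The first two character pairs of the uuid become directories:
    # AABBCC... -> AA/BB/AABBCC....jsonz
    return '%s/%s/%s.jsonz' % (ooid[0:2], ooid[2:4], ooid)
```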
12.8.3 Dump file format
• Will be gzip compressed JSON encoded cooked dump files
• Partial JSON file
• Full JSONZ file
12.8.4 On Disk Location
application.conf dumpPath. Example for kahn:

$config['dumpPath'] = '/mnt/socorro_dumps/named';
In the dumps directory we will have an .htaccess file:
AddType "application/json; charset=UTF-8" jsonz
AddEncoding gzip jsonz
Webhead will serve these files as:
Content-Type: application/json; charset=utf-8
Content-Encoding: gzip
Note: You'd expect the dump files to be named json.gz, but this is broken in Safari. By setting HTTP headers and naming the file jsonz, an unknown file extension, this works across browsers.
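Producing a .jsonz file is just gzip over JSON, equivalent to the gzip dump.json command used for the test page. A sketch (function names ours):

```python
import gzip
import json

def write_jsonz(path, report):
    # Serialize the cooked dump as JSON, gzip-compressed on disk.
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        json.dump(report, f)

def read_jsonz(path):
    # Decompress and parse; browsers do the same given the headers above.
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        return json.load(f)
```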
12.8.5 Socorro UI
• Existing URL won’t change.
• Second JSON request back to server will load jsonz file
Example:
• http://crash-stats.mozilla.com/report/index/d92ebf79-9858-450d-9868-0fe042090211
• http://crash-stats.mozilla.com/dump/d92ebf79-9858-450d-9868-0fe042090211.jsonz
mod rewrite rules will match /dump/.jsonz and change them to access a file share.
12.8.6 Future Enhancement
A future enhancement, if we find webheads are high CPU, would be to move populating the report/index page to the client side.
12.8.7 Test Page
http://people.mozilla.org/~aking/Socorro/dumpingDump/json-test.html - Uses the browser to decompress a gzip compressed JSON file during an AJAX request, pulls it apart and appends to the page.
Test file made with gzip dump.json
12.9 JSON Dump Storage
12.9.1 What this system offers
Crash data is stored so that it can be quickly located based on a Universally Unique Identifier (uuid) or visited by the date and time when reported.
12.9.2 Directory Structure
The crash files are located in a tree with two branches: the name or “index” branch and the date branch.
• The name branch consists of paths based on the first few pairs of characters of the uuid. The name branch holds the two data files and a relative symbolic link to the date branch directory associated with the particular uuid. Take the uuid 22adfb61-f75b-11dc-b6be-001321b0783d. The “depth” is the number of sub-directories between the name directory and the actual file. By default, to conserve inodes, depth is two.
– By default, the json file is stored (depth 2) as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.json
– The json file could be stored (depth 4) as %(root)s/name/22/ad/fb/61/22adfb61-f75b-11dc-b6be-001321b0783d.json
– The dump file is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.dump
– The symbolic link is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d and (see below) references (own location)/%(toDateFromName)s/2008/09/30/12/05/webhead01_0/
• The date branch consists of paths based on the year, month, day, hour, minute-segment, webhead host name and a small sequence number. For each uuid, it holds a relative symbolic link referring to the actual name directory holding the data for that uuid. For the uuid above, submitted at 2008-09-30T12:05 from webhead01
– The symbolic link is stored as %(root)s/date/2008/09/30/12/05/webhead01_0/22adfb61-f75b-11dc-b6be-001321b0783d and references (own location)/%(toNameFromDate)s/22/ad/fb/61/
• Note (name layout) In the examples on this page, the name/index branch uses the first 4 characters of the uuid as two character-pairs naming subdirectories. This is a configurable setting called storageDepth in the Collector configuration. To use 8 characters, set storageDepth to 4; to use 6 characters, set it to 3. The default storageDepth is 2 because on our system, with (approximately) 64K leaf directories, the number of files per leaf is reasonable, and the number of inodes required by directory entries is not so large as to cause undue difficulty. A storageDepth of 4 was examined, and was found to crash the file system by requiring too many inodes.
• If the uuids are such that their initial few characters are well spread among all possibles, then the lookup can be very quick. If the first few characters of the uuids are not well distributed, the resulting directories may be very large. If, despite well chosen uuids, the leaf name directories become too large, it would be simple to add another level, reducing the number of files by approximately a factor of 256; however, bear in mind the issue of inodes.
• Note (symbolic links) The symbolic links are relative rather than absolute, to avoid issues that might arise from variously mounted nfs volumes.
• Note (maxDirectoryEntries) If the number of links in a particular webhead subdirectory would exceed maxDirectoryEntries, then a new webhead directory is created by appending a larger _N: .../webhead01_0 first, then .../webhead01_1, etc. For the moment, maxDirectoryEntries is ignored for the name branch.
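The name-branch layout described above can be sketched in a few lines of Python. This is an illustrative helper only, not Socorro's actual implementation; the function name and signature are assumptions.

```python
import os

def name_branch_path(root, uuid, storage_depth=2):
    """Build the name-branch directory for a uuid: storage_depth
    two-character pairs taken from the front of the uuid become
    nested subdirectory names."""
    pairs = [uuid[i:i + 2] for i in range(0, storage_depth * 2, 2)]
    return os.path.join(root, "name", *pairs)
```

With the default storageDepth of 2, the example uuid maps to %(root)s/name/22/ad; with storageDepth 4 it maps to %(root)s/name/22/ad/fb/61.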
12.9.3 How it’s used
We use the file system storage for incoming dumps caught by Collector. There are two instances of the file system used for different purposes: standard storage and deferred storage.
12.9.4 Standard Job Storage
This is where json/dump pairs are stored for further processing. The Monitor finds new dumps and queues them for processing. It does this by walking the date branch of the file system using the API function destructiveDateWalk. As it moves through the date branch, it notes every uuid (in the form of a symbolic link) that it encounters. It queues the information from the symbolic link and then deletes the symbolic link. This ensures that it only ever finds new entries. Later, the Processor will read the json/dump pair by doing a direct lookup of the uuid on the name branch.
In the case of priority processing, the target uuid is looked up directly on the name branch. Then the link to the date branch is used to locate and delete the link on the date branch. This ensures that a priority job is not found a second time as a new job by the Monitor.
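The destructive walk that the Monitor relies on can be sketched as follows. This is a simplified illustration (the real destructiveDateWalk also cleans up empty directories and skips the current time slot, as described later); the function name here is a hypothetical stand-in.

```python
import os

def destructive_date_walk(date_root):
    """Yield each uuid found as a symbolic link under the date branch,
    deleting the link so the same entry is never found twice."""
    for dirpath, dirnames, filenames in os.walk(date_root, topdown=False):
        for name in list(dirnames):
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                os.remove(full)  # consume the entry
                yield name       # the link is named after the uuid
```

Note that symbolic links pointing at directories appear in the dirnames list when os.walk is not following links, which is why the uuid links show up there.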
12.9.5 Deferred Job Storage
This is where jobs go that are deferred by Monitor's throttling mechanism. If a json/dump pair is needed for priority processing, it can be looked up directly on the name branch. In such a case, just as with priority jobs in standard storage, we destroy the links between the two branches. However, in this case, destroying the links prevents the json/dump pair from being deleted by the deferred cleanup process.
When it comes time to drop old json/dump pairs that are no longer needed within the deferred storage, the system is given a date threshold. It walks the appropriate parts of the date branch older than the threshold. It uses the links to the name branch to blow away the elderly json/dump pairs.
12.9.6 class JsonDumpStorage
socorro.lib.JsonDumpStorage holds data and implements methods for creating and accessing crash files.
public methods
• __init__(self, root=".", maxDirectoryEntries=1024, **kwargs)
Take note of our root directory, maximum allowed date->name links per directory, some relative relations, and whatever else we may need. Much of this (c|sh)ould be read from a config file.
Recognized keyword args:
– dateName. Default = ‘date’
– indexName. Default = ‘name’
– jsonSuffix. Default = ‘.json’. If not startswith(‘.’) then ‘.’ is prepended
– dumpSuffix. Default = ‘.dump’. If not startswith(‘.’) then ‘.’ is prepended
– dumpPermissions. Default 660
– dirPermissions. Default 770
– dumpGID. Default None. If None, then owned by the owner of the running script.
• newEntry (self, uuid, webheadHostName=’webhead01’, timestamp=DT.datetime.now())
Sets up the name and date storage for the given uuid.
– Creates any directories that it needs along the path to the appropriate storage location (possiblyadjusting ownership and mode)
– Creates two relative symbolic links:
* the date branch link pointing to the name directory holding the files;
* the name branch link pointing to the date branch directory holding that link.
– Returns a 2-tuple containing files open for writing: (jsonfile,dumpfile)
• getJson (self, uuid)
Returns an absolute pathname for the json file for a given uuid. Raises OSError if the file is missing
• getDump (self, uuid)
Returns an absolute pathname for the dump file for a given uuid. Raises OSError if the file is missing
• markAsSeen (self,uuid)
Removes the links associated with the two data files for this uuid, thus marking them as seen. Quietly returns if the uuid has no associated links.
• destructiveDateWalk (self)
This function is a generator that yields all (see note) uuids found by walking the date branch of the file system.

Just before yielding a value, it deletes both links (from date to name and from name to date). After visiting all the uuids in a given date branch, it recursively deletes any empty subdirectories in the date branch. Since the file system may be manipulated in a different thread, if no .json or .dump file is found, the links are left in place and that uuid is not yielded. Note: to avoid race conditions, it does not visit the date subdirectory corresponding to the current time.
• remove (self, uuid)
Removes all instances of the uuid from the file system including the json file, the dump file, and the two links if they still exist.
– Ignores missing link, json and dump files: you may call it with bogus data, though of course you should not
• move (self, uuid, newAbsolutePath)
Moves the json file then the dump file to newAbsolutePath.
– Removes associated symbolic links if they still exist.
– Raises IOError if either the json or dump file for the uuid is not found, and retains any links, but does not roll back the json file if the dump file is not found.
• removeOlderThan (self, timestamp)
– Walks the date branch removing all entries strictly older than the timestamp.
– Removes the corresponding entries in the name branch.
member data
Most of the member data are set in the constructor, a few are constants, and the rest are simple calculations based on the others.
• root: The directory that holds both the date and index(name) subdirectories
• maxDirectoryEntries: The maximum number of links in each webhead directory on the date branch. Default = 1024
• dateName: The name of the date branch subdirectory. Default = ‘date’
• indexName: The name of the index branch subdirectory. Default = ‘name’
• jsonSuffix: the suffix of the json crash file. Default = ‘.json’
• dumpSuffix: the suffix of the dump crash file. Default = ‘.dump’
• dateBranch: The full path to the date branch
• nameBranch: The full path to the index branch
• dumpPermissions: The permissions for the crash files. Default = 660
• dirPermissions: The permissions for the directories holding crash files. Default = 770
• dumpGID: The group ID for the directories and crash files. Default: Owned by the owner of the running script.
• toNameFromDate: The relative path from a leaf of the dateBranch to the nameBranch
• toDateFromName: The relative path from a leaf of the nameBranch to the dateBranch
• minutesPerSlot: How many minutes in each sub-hour slot. Default = 5
• slotRange: A precalculated range of slot edges = range(self.minutesPerSlot, 60, self.minutesPerSlot)
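The minutesPerSlot / slotRange members describe how timestamps are bucketed into sub-hour slot directories; the underlying arithmetic is simply rounding a minute down to its slot edge. A hypothetical helper for illustration:

```python
def minute_slot(minute, minutes_per_slot=5):
    """Round a minute (0-59) down to the start of its sub-hour slot.

    Slot edges correspond to range(minutes_per_slot, 60, minutes_per_slot),
    matching the slotRange member described above."""
    return minute - minute % minutes_per_slot
```

So, for example, a crash submitted at 12:07 lands in the 05 slot directory under the default 5-minute slots.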
12.10 Processed Dump Storage
Processed dumps are stored in two places: the relational database as well as in flat files within a file system. This forking of the storage scheme came from the realization that the infrequently used data within the database ‘dumps’ tables was causing performance problems within PostgreSQL. The ‘dumps’ tables took nearly eighty percent of the total storage, making replication and backup problematic. Since the ‘dumps’ table’s data is used only when a user requests a specific crash dump by uuid, most of the data is rarely, if ever, accessed.
We decided to migrate these dumps into file system storage outside the database. Details can be seen at: Dumping Dump Tables
In the file system, after processing, dumps are stored in a gzip-compressed JSON file format. This format echoes a flattening of the ‘reports’, ‘extensions’ and the now-deprecated ‘dumps’ tables within the database.
12.10.1 Directory Structure
Just as in the JsonDumpStorage scheme, there are two branches: ‘name’ and ‘date’
12.10.2 Access by Name
Most lookups of processed crash data happen by name. We use a radix storage technique where the first 4 characters of the file name are used for two levels of directory names. A file called aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz would be found in the file system as .../aa/bb/aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz
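Assuming the layout described above, the radix lookup is just string slicing. This is an illustrative helper, not the actual Socorro code:

```python
import os

def processed_crash_path(root, uuid):
    """Radix path: the first 4 characters of the file name
    supply two levels of directory names."""
    return os.path.join(root, uuid[0:2], uuid[2:4], uuid + ".jsonz")
```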
12.10.3 Access by Date
For the purposes of finding crashes that happened at a specific date and time, a hierarchy of date directories offers quick lookup. The leaves of the date directories contain symbolic links to the locations of crash data.
12.10.4 JSON File Format
example:
{
  "signature": "nsThread::ProcessNextEvent(int, int*)",
  "uuid": "aabbf9cb-395b-47e8-9600-4f20e2090331",
  "date_processed": "2009-03-31 14:45:09.215601",
  "install_age": 100113,
  "uptime": 7,
  "last_crash": 95113,
  "product": "SomeProduct",
  "version": "3.5.2",
  "build_id": "20090223121634",
  "branch": "1.9.1",
  "os_name": "Mac OS X",
  "os_version": "10.5.6 9G55",
  "cpu_name": "x86",
  "cpu_info": "GenuineIntel family 6 model 15 stepping 6",
  "crash_reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS",
  "crash_address": "0xe9b246",
  "User Comments": "This thing crashed.\nHelp me Kirk.",
  "app_notes": "",
  "success": true,
  "truncated": false,
  "processor_notes": "",
  "distributor": "",
  "distributor_version": "",
  "add-ons": [["{ABDE892B-13A8-4d1b-88E6-365A6E755758}", "1.0"],
              ["{b2e293ee-fd7e-4c71-a714-5f4750d8d7b7}", "2.2.0.9"],
              ["{972ce4c6-7e08-4474-a285-3208198ce6fd}", "3.5.2"]],
  "dump": "OS|Mac OS X|10.5.6 9G55\\nCPU|x86|GenuineIntel family 6 model 15 stepping 6|2\\nCrash|EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE|0x1558c095|0\\nModule|firefox-bin||firefox-bin|988FA8BFC789C4C07C32D61867BB42B60|0x00001000|0x00001fff|\\n....."
}
The “dump” component is the direct streamed output from the Breakpad “minidump_stackwalk” program. Unfortunately, that project does not give detailed documentation of the format.
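Since the .jsonz files are ordinary gzip-compressed JSON, they can be read with nothing but the standard library. A sketch; the function name is an assumption:

```python
import gzip
import json

def read_processed_crash(path):
    """Load a gzip-compressed processed-crash JSON (.jsonz) file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```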
12.11 Report Database Design
12.11.1 Introduction
With the launch of [[MeanTimeBeforeFailure]] and Top Crashers By URL reports, we have added 8 new database tables. They fall into the following categories:
• configuration
– mtbfconfig
– tcbyurlconfig
• facts
– mtbffacts
– topcrashurlfacts
• dimensions
– productdims
– urldims
– signaturedims
• relational
– topcrashurlfactsreports
What relational? Aren’t they all?
12.11.2 Star Schema
Taking inspiration from data warehousing, we implement the datastore with dimensional modeling instead of relational modeling. The pattern uses star schemas. Our implementation is a very lightweight approach, as we don't automatically generate facts for every combination of dimensions. This is not a Pentaho competitor :)
Star schemas are optimized for:
• read only systems
• large amounts of data
• viewed from different levels of granularity
12.11.3 Pattern
The dimensions and facts are the heart of the pattern.
dimensions
Each dimension is a property with various attributes and values at different levels of granularity. Example:
urldims - table would have the columns: id, domain, url
Sample values
1. en-us.www.mozilla.com, ALL
2. http://en-us.www.mozilla.com/en-US/firefox/3.0.5/whatsnew/
3. en-us.www.mozilla.com, http://en-us.www.mozilla.com/en-US/firefox/features/
We see a dimension that describes the property “url”. This is useful for talking about crashes that happen on a specific url. We also see two levels of granularity, a specific URL as well as all urls under a domain.
Dimensions give us ways to slice and dice aggregate crash data, then drill down or rollup this information.
Note: time could be a dimension (and usually is in data warehouses). For MTBF and Top Crash By URL we don't treat it as a first-class dimension, as there are no requirements to roll it up (say, to Q1 crashes, etc.), and having it be a column in the facts table provides better performance.
facts
A given report is powered by a main facts table.
Example:
topcrashurlfacts - table would have the columns: id, count, rank, day, productdims_id, urldims_id, signaturedims_id
A top crashers by url fact has two key elements: an aggregate crash count and the rank relative to other facts. So if we have static values for all dimensions and day, then we can see who has the most crashes.
Reporting
The general pattern of creating a report is: for a series of static dimensions and one or two variable dimensions, display the facts that meet these criteria.
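The fact/dimension join behind such a report can be illustrated with an in-memory SQLite toy. The column set is simplified and the sample values are made up; the real tables live in PostgreSQL:

```python
import sqlite3

# Miniature star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE productdims (id INTEGER PRIMARY KEY, product TEXT, version TEXT);
CREATE TABLE urldims (id INTEGER PRIMARY KEY, domain TEXT, url TEXT);
CREATE TABLE topcrashurlfacts (
    id INTEGER PRIMARY KEY, count INTEGER, rank INTEGER, day TEXT,
    productdims_id INTEGER, urldims_id INTEGER);
INSERT INTO productdims VALUES (1, 'Firefox', '3.0.5');
INSERT INTO urldims VALUES (1, 'en-us.www.mozilla.com', 'ALL');
INSERT INTO topcrashurlfacts VALUES (1, 42, 1, '2009-03-31', 1, 1);
""")

# Static dimensions (product, day) fixed; variable dimension (url) displayed.
rows = conn.execute("""
    SELECT u.domain, f.count, f.rank
      FROM topcrashurlfacts f
      JOIN productdims p ON p.id = f.productdims_id
      JOIN urldims u ON u.id = f.urldims_id
     WHERE p.product = 'Firefox' AND f.day = '2009-03-31'
     ORDER BY f.rank
""").fetchall()
```

The query pattern is the point: filter the fact table through the dimension tables on the static values, then order by rank.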
12.12 Code and Database Update
12.12.1 Socorro Wish List
One of my (griswolf) directives is approximately “make everything work efficiently and the same.” Toward this end, there are several tasks:
Probably most important, we have an inefficient database design, and some inefficient code working with it.
Next, we have a collection of ‘one-off’ code (and database schemas) that could be more easily maintained using a common infrastructure, common coding conventions, common schema layout, and common patterns.
Finally, we have enhancement requests that would become more feasible after such changes: such requests would be more easily handled in a cleaner programming environment; and in a cleaner environment there might be fewer significant bugs, leaving more time to work on enhancements.
Current state: See [[SocorroDatabaseSchema]]
12.12.2 Another Way to do Materialized Views?
The current system is somewhere between ad hoc reporting and a star architecture. The main part of this proposal focuses on converting further toward a star architecture. However, there may be another way: MapReduce techniques, which could possibly be run external to Mozilla (for instance: Amazon Web Services), could be used to mine dump files and create statistical data stored in files or a database. Lars mentioned to me that we now have some statistics folk on board who are interested in this.
12.12.3 Database Design
• There are some legacy tables (reports, topcrasher) that are not normalized. Other tables are partly normalized. Non-normal form has consequences:
– Data is duplicated, causing possible synchronization issues.
* JOSH: duplicated data is normal for materialized views and is not a problem a priori.
– Data is duplicated, increasing size.
* JOSH: I don’t believe that the matview tables are that large, although we will want to look at partitioning them in the future because they will continue to grow.
* FRANK: Lars points out that size-limiting partitions which reference each other must all be partitioned on the same key. This makes partitions a little more interesting.
– SELECT statements on multiple varchar fields, even when indexed, are probably slower than SELECT statements on a single foreign key. (And even if not, maintaining larger index tables has a time and space cost.)
• There are legacy tables that contain deprecated columns, a slight inefficiency.
• In some cases, separable details are conflated, making it difficult to access by a single area of concern. For instance, the table that describes our products has an os_name column, requiring us to pretend we deal with an os named ‘ALL’ in order to examine product data without regard to os.
• According to PostgreSQL consultants, some types are not as efficient as others. Example: TEXT (which we use only a little) is slightly more time-efficient than VARCHAR(n) (which we mostly use)
– JOSH: this is a minor issue, and should only be changed if we’re modifying the fields/tables anyway.
– FRANK: We have already run into a size limitation for signatures, which are now VARCHAR(255). Experiment shows that conversion to TEXT is slow because of index rebuilding, but conversion to VARCHAR(BIGGER_NUMBER) can be done by manipulating typemod (the number of chars in VARCHAR) in the system tables. So a change from VARCHAR to TEXT needs to be scheduled in advance, with an expected ‘long’ turnaround.
• Current indexes were carefully audited during PGExperts week. Schema changes will require careful reevaluation.
12.12.4 Commonality
• Some of the tables that provide statistics (Mean Time Before Failure, for example) use a variant of the “Star” data warehousing pattern, which is well known and understood. Some do not. After discussion we have reached agreement that all should be partly ‘starred’
– osdims and productdims are appropriate dimension tables for each view that cares about operatingsystem or product
– url and signature ‘dimension’ tables are used to filter materialized views:
* the ‘fact’ tables for views will use ids from these filter/dimension tables
* the filter/dimension tables will hold only data that has passed a particular frequency threshold; initial guess at threshold: 3 per week.
• Python code has been written by a variety of people with various skill levels, doing things in a variety of ways. Mostly, this is acceptable, but required changes give us an opportunity.
• We now specify Python version 2.4, which is adequate. It is possible to upgrade to 2.5.x or 2.6.x with both ease and safety. This is an opportunity to do so. No code needs to change for this.
• New features (safely) available in Python 2.5:
– unified try/except/finally: instead of a try/finally block holding a try/except block
– there is a very nice with: syntax useful for block-scoped non-GC’d resources such as open files (like try: with an automatic finally: at block end)
– generators are significantly more powerful, which might have some uses in our code
– and lots more that seems less obviously useful to Socorro
– better exception hierarchy
• New features (safely) available in Python 2.6
– json library ships with Python 2.6
– multiprocessing library parallel to threading library ships with Python 2.6
– Command line option ‘-3’ flags things that will work differently or fail in Python 3 (looking ahead is good)
• We use nosetests, which is not correctly and fully functional in a Python 2.4 environment.
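As an illustration of the 2.5-era features mentioned above, the with: statement replaces the nested try/finally bookkeeping around an open file, and the unified try/except/finally form lets a single try hold both clauses. The helper below is hypothetical, purely for illustration:

```python
def read_first_line(path):
    """Return the first line of a file, or None if it cannot be opened."""
    try:
        with open(path) as f:  # implicit finally: f.close() at block end
            return f.readline()
    except IOError:            # unified try/except wrapping the with block
        return None
```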
12.12.5 Viewable Interface
• We have been gradually providing a more useful view of the crash data. Sometimes this is intrinsically hard, sometimes it is made more difficult by our schema.
• We have requests for:
– Better linkage between crash reports and bugs
– Ability to view by OS and OS version, by signature, by product, by product version (some of this will be easier with a new schema)
– Ability to view historical data, current data, (sliding) windows of data and trends
• Some of the requests seem likely to be too time or space costly. In some cases these might be feasible with a more efficient system.
12.12.6 Consequences of Possible Changes
• (Only) Add new tables (two kinds of changes)
– “replace in place”, for instance add table reports_normal while leaving table reports in place
– “brand new”, for instance add new productdims and osdims tables to serve a new topcrashbysignature table
– Existing views are not impacted (for good or ill)
– Duplication of data (some tables near normal form, some not, etc) becomes worse than it now is
– No immediate need to migrate data: Options
* Maybe provide two views: “Historic” and “Current”
* Maybe write ‘orrible look-both-ways code to access both tables from single view
* Maybe migrate data
– Code that looks at old schema is (mostly?) unchanged
– Code that looks at new schema is opportunity for improved design, etc.
– Can do one thing at a time, with multiple ‘easy’ rollouts (each one is still a rollout, though)
– Long term goal: Stop using old tables and code
• (Only) Drop redundant or deprecated columns in existing tables:
– Existing views are no less useful, Viewer and Controller code will need some maintenance
– Data migration is ‘simple’
* beware that dropped columns may be part of a (foreign) key or index
– Data migration is needed at rollout
– Minimally useful
• Optimize database types, indexes, keys:
– Existing views are not much impacted
* May want to optimize queries in Viewer and Controller code
* May need to guard for field size or type in Controller code
– Details of changes are ‘picky’ and may need some hand holding by consultants, maybe testing.
• Normalize existing tables (while adding new tables as needed):
– Much existing code needs re-write
* With different Model comes a need for different Viewers and Controllers
* Opportunity to clarify old code
* Opportunity to optimize queries
– Data migration is needed at rollout
– Rollout is complex (but need only one for complete conversion)
– JOSH: in general, matview generation should be optimized to be insert-only. In some cases, this will involve having a “current week” partition which gets dropped and recreated until the current week is completed. Updates are generally at least 4x as expensive as inserts.
12.12.7 Rough plan as of 2009 June
• Soon: Materialized views will make use of dimensions and ‘filtered dimensions’ tables
• Later: Normalize the ‘raw’ data to make use of tables describing operating system and product details. Leave signatures and urls raw.
12.12.8 Specific Database Changes
Star Data Warehousing
Existing tables
• (struck out) dimension: signaturedims: associate the base crash signature string with an id. Use signature TEXT directly instead.
• dimension: productdims: associate a product, version, release and os_name with an id
– os_name is neither sufficient for os drill-down (which wants os_version) nor properly part of a product dimension
• dimension: urldims: associate (a large number of) domains and urls, each pair with an id
• config: mtbfconfig: specifies the date-interval during which a given product (productdims) is of interest for MTBF analysis
• config: tcbyurlconfig: specifies whether a particular product (productdims) is now of interest for Top Crash by URL analysis.
• fact: mtbffacts: collects daily summary of average time before failure for each product
• (struck out) report: topcrashurlfactsreports: associates a crash uuid and a comment with a row of topcrashurlfacts. ?Apparently never used?
Needed/Changed tables
Matview changes “Soon”
• config (new): product_visibility: Specifies the date interval during which a product (productdims id) is of interest for any view. ?Replaces mtbfconfig?
• dimension (new): osdims: associate an os name and os version with an id
• dimension (edit): productdims: remove the os_name column (replaced by another dimension osdims above)
• fact (replace): topcrashers: The table now in use to provide the Top Crash by Signature view. Will be replaced by topcrashfacts.
• fact (new): topcrashfacts: collect periodic count of crashes, average uptime before crash, and rank of each signature, grouped by signature, os, and product
– replaces existing topcrashers table which is poorly organized for current needs
• config (new): tcbysignatureconfig: specify which products and operating systems are currently of interest for tcbysigfacts
• fact: (renamed, edit) top_crashes_by_url: collects daily summary of crashes by product, url (productdims, urldims)
• fact: (new): top_crashes_by_url_signature: associates a given row from top_crashes_by_url with one or more signatures
Incoming (raw) changes “Later”
• details (new): osdetails, parallel to osdims, but on the incoming side will be implemented later
• details (new): productdetails, parallel to productdims, but on the incoming side will be implemented later
• reports: Holds details of each analyzed crash report. It is not in normal form, which causes some ongoing difficulty
– columns product, version, build should be replaced by productdetails foreign key later
– column signature LARS: NULL is a legal value here. We’ll have to make sure that we use left outer joins to retrieve the report records.
– columns cpu_name, cpu_info are not currently in use in any other table, but could be a foreign key into cpudims
– columns os_name, os_version should be replaced by osdims foreign key
– columns email, user_id are deprecated and should be dropped
Details
New or significantly changed tables
New product_visibility table (soon, matview):
table product_visibility (
    id serial NOT NULL PRIMARY KEY,
    productdims_id integer not null,
    start_date timestamp,        -- used by MTBF
    end_date timestamp,
    ignore boolean default False -- force aggregation off for this product id
);
New osdims table (soon, matview) NOTE: Data available only if ‘recently frequent’:
table osdims (
    id serial NOT NULL PRIMARY KEY,
    os_name TEXT NOT NULL,
    os_version TEXT,
    constraint osdims_key unique (os_name, os_version)
);
Edited productdims table (soon, matview) NOTE: use case for adding products is under discussion:
CREATE TYPE release_enum AS ENUM (’major’, ’milestone’, ’development’);

table productdims (
    id serial NOT NULL PRIMARY KEY,
    product TEXT NOT NULL,
    version TEXT NOT NULL,
    release release_enum NOT NULL,
    constraint productdims_key unique (product, version)
);
New product_details table (later, raw data) NOTE: All data will be stored (raw data should not lose details):
table product_details (
    id serial NOT NULL PRIMARY KEY,
    product TEXT NOT NULL,        -- /was/ character varying(30)
    version TEXT NOT NULL,        -- /was/ character varying(16)
    release release_enum NOT NULL -- /was/ character varying(50) NOT NULL
);
Edit mtbffacts to use edited productdims and new osdims (soon, matview):
table mtbffacts (
    id serial NOT NULL PRIMARY KEY,
    avg_seconds integer NOT NULL,
    report_count integer NOT NULL,
    window_end timestamp, -- was DATE
    productdims_id integer,
    osdims_id integer,
    constraint mtbffacts_key unique (productdims_id, osdims_id, day)
);
New top_crashes_by_signature table (soon, matview):
table top_crashes_by_signature (
    id serial NOT NULL PRIMARY KEY,
    count integer NOT NULL DEFAULT 0,
    average_uptime real DEFAULT 0.0,
    window_end timestamp without time zone,
    window_size interval,
    productdims_id integer NOT NULL, -- foreign key. NOTE: Filtered by recent frequency
    osdims_id integer NOT NULL,      -- foreign key. NOTE: Filtered by recent frequency
    signature TEXT,
    constraint top_crash_by_signature_key unique (window_end, signature, productdims_id, osdims_id)
);
-- some INDEXes are surely needed --
New/Renamed top_crashes_by_url table (soon, matview):
table top_crashes_by_url (
    id serial NOT NULL,
    count integer NOT NULL,
    window_end timestamp without time zone NOT NULL,
    window_size interval not null,
    productdims_id integer,
    osdims_id integer NOT NULL,
    urldims_id integer,
    constraint top_crashes_by_url_key unique (urldims_id, osdims_id, productdims_id, window_end)
);
New top_crashes_by_url_signature (soon, matview):
table top_crash_by_url_signature (
    top_crashes_by_url_id integer, -- foreign key
    count integer NOT NULL,
    signature TEXT NOT NULL,
    constraint top_crashes_by_url_signature_key unique (top_crashes_by_url_id, signature)
);
New crash_reports table (later, raw view) Replaces reports table:
table crash_reports (
    id serial NOT NULL PRIMARY KEY,
    uuid TEXT NOT NULL,           -- /was/ character varying(50)
    client_crash_date timestamp with time zone,
    install_age integer,
    last_crash integer,
    uptime integer,
    cpu_name TEXT,                -- /was/ character varying(100)
    cpu_info TEXT,                -- /was/ character varying(100)
    reason TEXT,                  -- /was/ character varying(255)
    address TEXT,                 -- /was/ character varying(20)
    build_date timestamp without time zone,
    started_datetime timestamp without time zone,
    completed_datetime timestamp without time zone,
    date_processed timestamp without time zone,
    success boolean,
    truncated boolean,
    processor_notes TEXT,
    user_comments TEXT,           -- /was/ character varying(1024)
    app_notes TEXT,               -- /was/ character varying(1024)
    distributor TEXT,             -- /was/ character varying(20)
    distributor_version TEXT,     -- /was/ character varying(20)
    signature TEXT,
    productdims_id INTEGER,       -- /new/ foreign key NOTE Filtered by recent frequency
    osdims_id INTEGER,            -- /new/ foreign key NOTE Filtered by recent frequency
    urldims_id INTEGER            -- /new/ foreign key NOTE Filtered by recent frequency
    -- /remove - see productdims_id/ product character varying(30),
    -- /remove - see productdims_id/ version character varying(16),
    -- /remove - redundant with build_date/ build character varying(30),
    -- /remove - see urldims_id/ url character varying(255),
    -- /remove - see osdims_id/ os_name character varying(100),
    -- /remove - see osdims_id/ os_version character varying(100),
    -- /remove - deprecated/ email character varying(100),
    -- /remove - deprecated/ user_id character varying(50)
);
-- This is a partitioned table: INDEXes are provided on date-based partitions
Tables with Minor Changes: varchar->text:
table branches (
    product TEXT NOT NULL, -- /was/ character varying(30)
    version TEXT NOT NULL, -- /was/ character varying(16)
    branch TEXT NOT NULL,  -- /was/ character varying(24)
    PRIMARY KEY (product, version)
);
table extensions (
    report_id integer NOT NULL, -- foreign key
    date_processed timestamp without time zone,
    extension_key integer NOT NULL,
    extension_id TEXT NOT NULL, -- /was/ character varying(100)
    extension_version TEXT      -- /was/ character varying(16)
);
table frames (
    report_id integer NOT NULL,
    date_processed timestamp without time zone,
    frame_num INTEGER NOT NULL,
    signature TEXT -- /was/ varchar(255)
);
table priority_jobs (
    uuid TEXT NOT NULL PRIMARY KEY -- /was/ varchar(255)
);
table processors (
    id serial NOT NULL PRIMARY KEY,
    name TEXT NOT NULL UNIQUE, -- /was/ varchar(255)
    startdatetime timestamp without time zone NOT NULL,
    lastseendatetime timestamp without time zone
);
table jobs (
    id serial NOT NULL PRIMARY KEY,
    pathname TEXT NOT NULL,    -- /was/ character varying(1024)
    uuid TEXT NOT NULL UNIQUE, -- /was/ varchar(50)
    owner integer,
    priority integer DEFAULT 0,
    queueddatetime timestamp without time zone,
    starteddatetime timestamp without time zone,
    completeddatetime timestamp without time zone,
    success boolean,
    message TEXT,
    FOREIGN KEY (owner) REFERENCES processors (id)
);
table urldims (
    id serial NOT NULL PRIMARY KEY,
    domain TEXT NOT NULL, -- /was/ character varying(255)
    url TEXT NOT NULL,    -- /was/ character varying(255)
    key url,    -- for drilling by url
    key domain  -- for drilling by domain
);
table topcrashurlfactsreports (
    id serial NOT NULL PRIMARY KEY,
    uuid TEXT NOT NULL, -- /was/ character varying(50)
    comments TEXT,      -- /was/ character varying(500)
    topcrashurlfacts_id integer
);
12.13 Out-of-Date Data Warning
While portions of this doc are still relevant and interesting for current Socorro usage, be aware that it is extremely out of date when compared to the current schema.
12.14 Database Schema
12.14.1 Introduction
Socorro is married to the PostgreSQL database: it makes use of a significant number of PostgreSQL and psycopg2 (Python) features and extensions. Making a database-neutral API has been explored and, for now, is not being pursued.
The tables can be divided into three major categories: crash data, aggregate reporting and process control.
12.14.2 crash data
12.14.3 reports
This table participates in DatabasePartitioning
Holds a lot of data about each crash report:
Table "reports"
       Column        |            Type             |    Modifiers    | Description
---------------------+-----------------------------+-----------------+-------------
 id                  | integer                     | not null serial | unique id
 client_crash_date   | timestamp with time zone    |                 | as reported by client
 date_processed      | timestamp without time zone |                 | when entered into jobs table
 uuid                | character varying(50)       | not null        | unique tag for job
 product             | character varying(30)       |                 | name of product ("Firefox")
 version             | character varying(16)       |                 | version of product ("3.0.6")
 build               | character varying(30)       |                 | build of product ("2009041522")
 signature           | character varying(255)      |                 | signature of 'top' frame of crash
 url                 | character varying(255)      |                 | associated with crash
 install_age         | integer                     |                 | in seconds since installed
 last_crash          | integer                     |                 | in seconds since last crash
 uptime              | integer                     |                 | in seconds since recent start
 cpu_name            | character varying(100)      |                 | as reported by client ("x86")
 cpu_info            | character varying(100)      |                 | as reported by client ("GenuineIntel family 15 model 4 stepping 1")
 reason              | character varying(255)      |                 | as reported by client
 address             | character varying(20)       |                 | memory address
 os_name             | character varying(100)      |                 | name of os ("Windows NT")
 os_version          | character varying(100)      |                 | version of os ("5.1.2600 Service Pack 3")
 email               | character varying(100)      |                 | -- deprecated
 build_date          | timestamp without time zone |                 | product build date (column build has same info, different format)
 user_id             | character varying(50)       |                 | -- deprecated
 started_datetime    | timestamp without time zone |                 | when processor starts processing report
 completed_datetime  | timestamp without time zone |                 | when processor finishes processing report
 success             | boolean                     |                 | whether finish was good
 truncated           | boolean                     |                 | whether some dump data was removed
 processor_notes     | text                        |                 | error messages during monitor processing of report
 user_comments       | character varying(1024)     |                 | if any, by user
 app_notes           | character varying(1024)     |                 | arbitrary, sent by client (exception detail, etc)
 distributor         | character varying(20)       |                 | future use: "Linux distro"
 distributor_version | character varying(20)       |                 | future use: "Linux distro version"

Partitioned Child Table
Indexes:
    "reports_aDate_pkey" PRIMARY KEY, btree (id)
    "reports_aDate_unique_uuid" UNIQUE, btree (uuid)
    "reports_aDate_date_processed_key" btree (date_processed)
    "reports_aDate_product_version_key" btree (product, version)
    "reports_aDate_signature_date_processed_key" btree (signature, date_processed)
    "reports_aDate_signature_key" btree (signature)
    "reports_aDate_url_key" btree (url)
    "reports_aDate_uuid_key" btree (uuid)
Check constraints:
    "reports_aDate_date_check" CHECK ('aDate'::timestamp without time zone <= date_processed AND date_processed < 'aDate+WEEK'::timestamp without time zone)
Inherits: reports
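The aDate placeholders above stand for each weekly partition's start date. As an illustration only (the helper name and exact DDL are assumptions of this sketch, not Socorro code), a weekly child partition in this style could be generated like so:

```python
from datetime import date, timedelta

def weekly_partition_ddl(start: date) -> str:
    """Build DDL for a weekly 'reports' child partition in the style
    shown above. Names and layout are illustrative only."""
    end = start + timedelta(weeks=1)
    name = "reports_%s" % start.strftime("%Y%m%d")
    return (
        "CREATE TABLE %s (\n"
        "    PRIMARY KEY (id),\n"
        "    UNIQUE (uuid),\n"
        "    CHECK (TIMESTAMP '%s' <= date_processed\n"
        "           AND date_processed < TIMESTAMP '%s')\n"
        ") INHERITS (reports);" % (name, start.isoformat(), end.isoformat())
    )

print(weekly_partition_ddl(date(2010, 1, 25)))
```

The generated CHECK constraint mirrors the `aDate <= date_processed < aDate+WEEK` pattern in the partitions documented above.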
12.14.4 dumps
This table is deprecated (dump data is now stored in the file system); see DumpingDumpTables for more information.
12.14.5 branches
This table has been replaced by a view of productdims:
CREATE VIEW branches AS SELECT product,version,branch FROM productdims;
12.14.6 extensions
This table participates in DatabasePartitioning.
Holds data about what extensions are associated with a given report:
Table "extensions"
      Column       |            Type             | Modifiers | Description
-------------------+-----------------------------+-----------+-------------
 report_id         | integer                     | not null  | in child: foreign key reference to child of table 'reports'
 date_processed    | timestamp without time zone |           | set to time when the row is inserted
 extension_key     | integer                     | not null  | the name of this extension
 extension_id      | character varying(100)      | not null  | the id of this extension
 extension_version | character varying(30)       |           | the version of this extension

Partitioned Child Table
Indexes:
    "extensions_aDate_pkey" PRIMARY KEY, btree (report_id)
    "extensions_aDate_report_id_date_key" btree (report_id, date_processed)
Check constraints:
    "extensions_aDate_date_check" CHECK ('aDate'::timestamp without time zone <= date_processed AND date_processed < 'aDate+WEEK'::timestamp without time zone)
Foreign-key constraints:
    "extensions_aDate_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports_aDate(id) ON DELETE CASCADE
Inherits: extensions
12.14.7 frames
This table participates in DatabasePartitioning.
Holds data about the frames in the dump associated with a particular report:
Table "frames"
     Column     |            Type             | Modifiers | Description
----------------+-----------------------------+-----------+-------------
 report_id      | integer                     | not null  | in child: foreign key reference to child of table reports
 date_processed | timestamp without time zone |           | set to time when the row is inserted (?)
 frame_num      | integer                     | not null  | ordinal: one row per stack-frame per report, from 0=top
 signature      | character varying(255)      |           | signature as returned by minidump_stackwalk

Partitioned Child Table
Indexes:
    "frames_aDate_pkey" PRIMARY KEY, btree (report_id, frame_num)
    "frames_aDate_report_id_date_key" btree (report_id, date_processed)
Check constraints:
    "frames_aDate_date_check" CHECK ('aDate'::timestamp without time zone <= date_processed AND date_processed < 'aDate+WEEK'::timestamp without time zone)
Foreign-key constraints:
    "frames_aDate_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports_aDate(id) ON DELETE CASCADE
Inherits: frames
Aggregate Reporting

(Schema diagram: SocorroSchema.Aggregate.20090722.png)
12.14.8 productdims
Dimension table that describes the product, version, gecko version ('branch'), and type of release. Note that the release string is completely determined by the version string: a version like 'X.Y.Z' is 'major', a version with suffix 'pre' is 'development', and a version with 'a' or 'b' (alpha or beta) is 'milestone'. Note: the current version does not conflate OS details (see osdims):

Table productdims
 Column  |     Type     | Modifiers | Description
---------+--------------+-----------+-------------
 id      | integer      | (serial)  |
 product | text         | not null  |
 version | text         | not null  |
 branch  | text         | not null  | gecko version
 release | release_enum |           | 'major', 'milestone', 'development'

Indexes:
    "productdims_pkey1" PRIMARY KEY, btree (id)
    "productdims_product_version_key" UNIQUE, btree (product, version)
    "productdims_release_key" btree (release)
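The version-to-release rule described above is simple enough to sketch directly; this helper is illustrative, not actual Socorro code:

```python
def release_for_version(version: str) -> str:
    """Classify a version string per the rule above: a 'pre' suffix
    means development, an 'a' or 'b' (alpha/beta) means milestone,
    and a plain X.Y.Z version means major."""
    if version.endswith("pre"):
        return "development"
    if "a" in version or "b" in version:
        return "milestone"
    return "major"

print(release_for_version("3.0.6"))   # major
print(release_for_version("3.5b4"))   # milestone
print(release_for_version("3.6pre"))  # development
```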
12.14.9 osdims
Dimension table that describes an operating system name and version. Because there are so many very similar Linux versions, the data saved here is simplified, which allows many different 'detailed version' Linuxen to share the same row in this table:

Table osdims
   Column   |          Type          | Modifiers | Description
------------+------------------------+-----------+-------------
 id         | integer                | (serial)  |
 os_name    | character varying(100) |           |
 os_version | character varying(100) |           |

Indexes:
    "osdims_pkey" PRIMARY KEY, btree (id)
    "osdims_name_version_key" btree (os_name, os_version)
12.14.10 product_visibility
Specifies the date interval during which a given product (productdims_id is the foreign key) is of interest for aggregate analysis. MTBF obeys start_date, but calculates its own end date as 60 days later. Top crash by (url|signature) tables obey both start_date and end_date. Column ignore is a boolean, default false, which allows a product version to be quickly turned off. Note: supersedes mtbfconfig and tcbyurlconfig. (MTBF is not now in use):

Table product_visibility
     Column     |            Type             |   Modifiers   | Description
----------------+-----------------------------+---------------+-------------
 productdims_id | integer                     | not null      |
 start_date     | timestamp without time zone |               |
 end_date       | timestamp without time zone |               |
 ignore         | boolean                     | default false |

Indexes:
    "product_visibility_pkey" PRIMARY KEY, btree (productdims_id)
    "product_visibility_end_date" btree (end_date)
    "product_visibility_start_date" btree (start_date)
Foreign-key constraints:
    "product_visibility_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
12.14.11 time_before_failure
Collects a daily summary of average (mean) time before failure for each product of interest, without regard to specific signature:

Table time_before_failure
       Column       |            Type             | Modifiers | Description
--------------------+-----------------------------+-----------+-------------
 id                 | integer                     | (serial)  |
 sum_uptime_seconds | double precision            | not null  |
 report_count       | integer                     | not null  |
 productdims_id     | integer                     |           |
 osdims_id          | integer                     |           |
 window_end         | timestamp without time zone | not null  |
 window_size        | interval                    | not null  |

Indexes:
    "time_before_failure_pkey" PRIMARY KEY, btree (id)
    "time_before_failure_os_id_key" btree (osdims_id)
    "time_before_failure_product_id_key" btree (productdims_id)
    "time_before_failure_window_end_window_size_key" btree (window_end, window_size)
Foreign-key constraints:
    "time_before_failure_osdims_id_fkey" FOREIGN KEY (osdims_id) REFERENCES osdims(id) ON DELETE CASCADE
    "time_before_failure_productdims_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
12.14.12 top_crashes_by_signature
The “fact” table that associates signatures with crash statistics:
Table top_crashes_by_signature
     Column     |            Type             |     Modifiers      | Description
----------------+-----------------------------+--------------------+-------------
 id             | integer                     | (serial)           |
 count          | integer                     | not null default 0 |
 uptime         | real                        | default 0.0        |
 signature      | text                        |                    |
 productdims_id | integer                     |                    |
 osdims_id      | integer                     |                    |
 window_end     | timestamp without time zone | not null           |
 window_size    | interval                    | not null           |

Indexes:
    "top_crashes_by_signature_pkey" PRIMARY KEY, btree (id)
    "top_crashes_by_signature_osdims_key" btree (osdims_id)
    "top_crashes_by_signature_productdims_key" btree (productdims_id)
    "top_crashes_by_signature_signature_key" btree (signature)
    "top_crashes_by_signature_window_end_idx" btree (window_end DESC)
Foreign-key constraints:
    "osdims_id_fkey" FOREIGN KEY (osdims_id) REFERENCES osdims(id) ON DELETE CASCADE
    "productdims_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
12.14.13 urldims
A dimension table that associates a URL and its domain with a particular id.

For example, given the full url http://www.whatever.com/some/path?foo=bar&goo=car:

• the domain is the host name: www.whatever.com

• the url is everything before the query part: http://www.whatever.com/some/path

Table "urldims"
 Column |          Type          |    Modifiers    | Description
--------+------------------------+-----------------+-------------
 id     | integer                | not null serial | unique id
 domain | character varying(255) | not null        | the hostname
 url    | character varying(255) | not null        | the url up to query

Indexes:
    "urldims_pkey" PRIMARY KEY, btree (id)
    "urldims_url_domain_key" UNIQUE, btree (url, domain)
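The domain/url split described above can be reproduced with the standard library; a minimal sketch (the helper name is mine, not Socorro's):

```python
from urllib.parse import urlsplit

def urldims_parts(full_url: str):
    """Split a full URL into the (domain, url) pair stored in urldims:
    domain is the host name, url is everything before the query part."""
    parts = urlsplit(full_url)
    domain = parts.hostname or ""
    url = "%s://%s%s" % (parts.scheme, parts.netloc, parts.path)
    return domain, url

domain, url = urldims_parts("http://www.whatever.com/some/path?foo=bar&goo=car")
print(domain)  # www.whatever.com
print(url)     # http://www.whatever.com/some/path
```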
12.14.14 top_crashes_by_url
The “fact” table that associates urls with crash statistics:
Table top_crashes_by_url
     Column     |            Type             | Modifiers | Description
----------------+-----------------------------+-----------+-------------
 id             | integer                     | (serial)  |
 count          | integer                     | not null  |
 urldims_id     | integer                     |           |
 productdims_id | integer                     |           |
 osdims_id      | integer                     |           |
 window_end     | timestamp without time zone | not null  |
 window_size    | interval                    | not null  |

Indexes:
    "top_crashes_by_url_pkey" PRIMARY KEY, btree (id)
    "top_crashes_by_url_count_key" btree (count)
    "top_crashes_by_url_osdims_key" btree (osdims_id)
    "top_crashes_by_url_productdims_key" btree (productdims_id)
    "top_crashes_by_url_urldims_key" btree (urldims_id)
    "top_crashes_by_url_window_end_window_size_key" btree (window_end, window_size)
Foreign-key constraints:
    "top_crashes_by_url_osdims_id_fkey" FOREIGN KEY (osdims_id) REFERENCES osdims(id) ON DELETE CASCADE
    "top_crashes_by_url_productdims_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
    "top_crashes_by_url_urldims_id_fkey" FOREIGN KEY (urldims_id) REFERENCES urldims(id) ON DELETE CASCADE
12.14.15 top_crashes_by_url_signature
Associates count of each signature with a row in top_crashes_by_url table:
Table top_crashes_by_url_signature
        Column         |  Type   | Modifiers | Description
-----------------------+---------+-----------+-------------
 top_crashes_by_url_id | integer | not null  |
 signature             | text    | not null  |
 count                 | integer | not null  |

Indexes:
    "top_crashes_by_url_signature_pkey" PRIMARY KEY, btree (top_crashes_by_url_id, signature)
Foreign-key constraints:
    "top_crashes_by_url_signature_fkey" FOREIGN KEY (top_crashes_by_url_id) REFERENCES top_crashes_by_url(id) ON DELETE CASCADE
12.14.16 topcrashurlfactsreports
Associates a job uuid with comments and a row in the topcrashurlfacts table:

Table "topcrashurlfactsreports"
       Column        |          Type          |    Modifiers    | Description
---------------------+------------------------+-----------------+-------------
 id                  | integer                | not null serial | unique id
 uuid                | character varying(50)  | not null        | job uuid string
 comments            | character varying(500) |                 | ?programmer provided?
 topcrashurlfacts_id | integer                |                 | crash statistics for a product, os, url, signature and day

Indexes:
    "topcrashurlfactsreports_pkey" PRIMARY KEY, btree (id)
    "topcrashurlfactsreports_topcrashurlfacts_id_key" btree (topcrashurlfacts_id)
Foreign-key constraints:
    "topcrashurlfactsreports_topcrashurlfacts_id_fkey" FOREIGN KEY (topcrashurlfacts_id) REFERENCES topcrashurlfacts(id) ON DELETE CASCADE
12.14.17 alexa_topsites
Stores a weekly dump of the top 1,000 sites as measured by Alexa (csv):
Table "public.alexa_topsites"
    Column    |            Type             |        Modifiers
--------------+-----------------------------+------------------------
 domain       | text                        | not null
 rank         | integer                     | default 10000
 last_updated | timestamp without time zone | not null default now()

Indexes:
    "alexa_topsites_pkey" PRIMARY KEY, btree (domain)
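A minimal sketch of ingesting such a dump, assuming the CSV rows look like rank,domain (the exact Alexa file layout is an assumption of this sketch):

```python
import csv
import io

def parse_topsites(csv_text: str):
    """Parse 'rank,domain' CSV rows into (domain, rank) tuples,
    ready to upsert into alexa_topsites. The row layout is assumed."""
    rows = []
    for rank, domain in csv.reader(io.StringIO(csv_text)):
        rows.append((domain.strip(), int(rank)))
    return rows

sample = "1,google.com\n2,facebook.com\n"
print(parse_topsites(sample))  # [('google.com', 1), ('facebook.com', 2)]
```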
12.15 Package
The applications that run the server are written in Python. The source code for these applications is collected into a single package.

There is currently no installation script for this package; it must simply be available somewhere on the PYTHONPATH.
12.15.1 Package Layout
• .../scripts : for socorro applications
• .../scripts/config : configuration for socorro applications
• .../socorro : python package root
• .../socorro/collector : modules used by the collector application
• .../socorro/cron : modules used by various applications intended to run by cron
• .../socorro/database : modules associated with the relational database
• .../socorro/deferredcleanup : modules used by the deferred file system cleanup script
• .../socorro/integrationtest : for future use
• .../socorro/lib : common modules used throughout the system
• .../socorro/monitor : modules used by the monitor application
• .../socorro/processor : modules used by the processor application
• .../socorro/unittest : testing framework modules
12.16 Schema
(See bottom of page for inline graphic)
12.17 Tables used primarily when processing Jobs
Reports (Partitioned)
The reports table contains the 'cooked' data received from breakpad and abstracted. Data from this table is further transformed into 'materialized views' (see below). Reports is unchanged from the prior version:

CREATE TABLE reports (
    id serial NOT NULL PRIMARY KEY,
    client_crash_date timestamp with time zone,
    date_processed timestamp without time zone,
    uuid character varying(50) NOT NULL UNIQUE,
    product character varying(30),
    version character varying(16),
    build character varying(30),
    signature character varying(255),
    url character varying(255),
    install_age integer,
    last_crash integer,
    uptime integer,
    cpu_name character varying(100),
    cpu_info character varying(100),
    reason character varying(255),
    address character varying(20),
    os_name character varying(100),
    os_version character varying(100),
    email character varying(100),        -- Now always NULL or empty
    build_date timestamp without time zone,
    user_id character varying(50),       -- Now always NULL or empty
    started_datetime timestamp without time zone,
    completed_datetime timestamp without time zone,
    success boolean,
    truncated boolean,
    processor_notes text,
    user_comments character varying(1024),
    app_notes character varying(1024),
    distributor character varying(20),
    distributor_version character varying(20)
);
Indices are on child/partition tables, not base table:
index: date_processed
index: uuid
index: signature
index: url
index: (product, version)
index: (uuid, date_processed)
index: (signature, date_processed)
Processors
The processors table keeps track of the current state of the processors that pull things out of the file system and into the reports database. Processors is unchanged from the prior version:

CREATE TABLE processors (
    id serial NOT NULL PRIMARY KEY,
    name varchar(255) NOT NULL UNIQUE,
    startdatetime timestamp without time zone NOT NULL,
    lastseendatetime timestamp without time zone
);
Jobs
The jobs table holds data about jobs that are queued for the processors to handle. Jobs is unchanged from the prior version:

CREATE TABLE jobs (
    id serial NOT NULL PRIMARY KEY,
    pathname character varying(1024) NOT NULL,
    uuid varchar(50) NOT NULL UNIQUE,
    owner integer,
    priority integer DEFAULT 0,
    queueddatetime timestamp without time zone,
    starteddatetime timestamp without time zone,
    completeddatetime timestamp without time zone,
    success boolean,
    message text,
    FOREIGN KEY (owner) REFERENCES processors (id) ON DELETE CASCADE
);
index: owner
index: (owner, starteddatetime)
index: (completeddatetime, priority DESC)
Priority Jobs
The priority jobs table is used to mark rows in the jobs table that need to be processed soon. Priority Jobs is unchanged from prior versions:

CREATE TABLE priortyjobs (
    uuid varchar(255) NOT NULL PRIMARY KEY
);
12.18 Tables primarily used during data extraction
Branches
The branches table associates a product and version with the gecko version (called 'branch'):

CREATE TABLE branches (
    product character varying(30) NOT NULL,
    version character varying(16) NOT NULL,
    branch character varying(24) NOT NULL
);
Extensions (Partitioned)
The extensions table associates a report with the extensions on the crashing application. Extensions is unchanged from the prior version. (Not now in use):

CREATE TABLE extensions (
    report_id integer NOT NULL,    -- Foreign key references parallel reports partition(id)
    date_processed timestamp without time zone,
    extension_key integer NOT NULL,
    extension_id character varying(100) NOT NULL,
    extension_version character varying(16),
    FOREIGN KEY (report_id) REFERENCES reports_<partition>(id) ON DELETE CASCADE
);
Index is on child/partition tables, not base table:
index: (report_id, date_processed)
Frames (Partitioned)
The frames table associates a report with the stack frames and their signatures that were seen in the crashing application. Frames is unchanged from the prior version:

CREATE TABLE frames (
    report_id integer NOT NULL,
    date_processed timestamp without time zone,
    frame_num integer NOT NULL,
    signature varchar(255),
    FOREIGN KEY (report_id) REFERENCES reports_<partition>(id) ON DELETE CASCADE
);
Index is on child/partition tables, not base table:
index: (report_id, date_processed)
Plugins
Electrolysis support for out-of-process plugin crashes:

CREATE TABLE plugins (
    id serial NOT NULL PRIMARY KEY,
    filename TEXT NOT NULL,
    name TEXT NOT NULL,
    CONSTRAINT filename_name_key UNIQUE (filename, name)
);
Plugins_Reports (Partitioned)

Records OOPP (out-of-process plugin) details. A report has 0 or 1 entries in this table:

CREATE TABLE plugins_reports (
    report_id INTEGER NOT NULL,
    plugin_id INTEGER NOT NULL,
    date_processed TIMESTAMP WITHOUT TIME ZONE,
    version TEXT NOT NULL
);
Indices are on child/partition tables, not the base table; they are set up via schema.py. Example for plugins_reports_20100125:

PRIMARY KEY (report_id, plugin_id),
CONSTRAINT plugins_reports_20100125_report_id_fkey FOREIGN KEY (report_id) REFERENCES reports_20100125 (id) ON DELETE CASCADE,
CONSTRAINT plugins_reports_20100125_plugin_id_fkey FOREIGN KEY (plugin_id) REFERENCES plugins (id) ON DELETE CASCADE,
CONSTRAINT plugins_reports_20100125_date_check CHECK (('2010-01-25 00:00:00'::TIMESTAMP WITHOUT TIME ZONE <= date_processed) AND (date_processed < '2010-02-01 00:00:00'::TIMESTAMP WITHOUT TIME ZONE))
12.19 Tables primarily used for materialized views
product visibility
Product visibility controls which products are subject to having data aggregated into the various materialized views. Replaces mtbfconfig and tcbyurlconfig:

CREATE TABLE product_visibility (
    productdims_id integer NOT NULL PRIMARY KEY,
    start_date timestamp,            -- set this manually for all mat views
    end_date timestamp,              -- set this manually: used by mat views that care
    ignore boolean default False,    -- force aggregation off for this product id
    FOREIGN KEY (productdims_id) REFERENCES productdims(id)
);
index: end_date
index: start_date
12.20 Dimensions tables
signaturedims
Signature dims was a table associating a signature with an id; it is no longer used. Instead, signatures are stored directly in the places that need them.
productdims
Product dims associates a product, version, and release key. An enum is used for the release key. Product dims has changed from the prior version by dropping the os_name column, which has been promoted into its own osdims table:

CREATE TYPE release_enum AS ENUM ('major', 'milestone', 'development');

CREATE TABLE productdims (
    id serial NOT NULL PRIMARY KEY,
    product TEXT NOT NULL,    -- varchar(30)
    version TEXT NOT NULL,    -- varchar(16)
    release release_enum      -- 'major':x.y.z..., 'milestone':x.ypre, 'development':x.y[ab]z
);
unique index: (product, version)
index: release
osdims
OS dims associates an os name and version. Promoted from earlier versions, where os_name was stored directly in 'facts' tables:

CREATE TABLE osdims (
    id serial NOT NULL PRIMARY KEY,
    os_name CHARACTER VARYING(100) NOT NULL,
    os_version CHARACTER VARYING(100)
);
index: (os_name, os_version)
urldims
URL dims associates a domain and a simplified url. URL dims is unchanged from the prior version:

CREATE TABLE urldims (
    id serial NOT NULL,
    domain character varying(255) NOT NULL,
    url character varying(255) NOT NULL
);
unique index: (url, domain)
12.21 View tables
View tables now have a uniform layout:
• id: The unique id for this row
• aggregated data: As appropriate for the view
• keys: One or more of signature, urldims id, productdims id, osdims id
• window_end: Used to keep track of most recently aggregated row
• window_size: Used redundantly in case aggregation window changes
time before failure
Aggregate the amount of time the app ran from startup to fail, and from prior fail to current fail. Replaces the mtbffacts table:

CREATE TABLE time_before_failure (
    id serial NOT NULL PRIMARY KEY,
    sum_uptime_seconds integer NOT NULL,
    report_count integer NOT NULL,
    productdims_id integer,
    osdims_id integer,
    window_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    window_size INTERVAL NOT NULL,
    FOREIGN KEY (productdims_id) REFERENCES productdims(id),
    FOREIGN KEY (osdims_id) REFERENCES osdims(id)
);
index: (window_end, window_size)
index: productdims_id
index: osdims_id
top crashes by signature
Aggregate the number of crashes per unit of time associated with a particular stack signature. Replaces the topcrashers table:

CREATE TABLE top_crashes_by_signature (
    id serial NOT NULL PRIMARY KEY,
    count integer NOT NULL DEFAULT 0,
    uptime real DEFAULT 0.0,
    signature TEXT,
    productdims_id integer,
    osdims_id integer,
    window_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    window_size INTERVAL NOT NULL,
    FOREIGN KEY (productdims_id) REFERENCES productdims(id),
    FOREIGN KEY (osdims_id) REFERENCES osdims(id)
);
index: productdims_id
index: osdims_id
index: signature
index: (window_end, window_size)
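A sketch of the aggregation this view represents: counting crashes and summing uptime per signature within a single (window_end, window_size) window. This is pure illustration, not the actual Socorro cron code:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate_signatures(reports, window_end, window_size):
    """reports: iterable of (signature, uptime_seconds, date_processed).
    Returns {signature: (count, total_uptime)} for rows whose
    date_processed falls in [window_end - window_size, window_end)."""
    start = window_end - window_size
    acc = defaultdict(lambda: [0, 0.0])
    for signature, uptime, date_processed in reports:
        if start <= date_processed < window_end:
            acc[signature][0] += 1
            acc[signature][1] += uptime
    return {sig: tuple(v) for sig, v in acc.items()}

end = datetime(2010, 1, 25)
rows = [
    ("nsFoo::Bar", 120.0, datetime(2010, 1, 24, 23, 30)),
    ("nsFoo::Bar", 30.0, datetime(2010, 1, 24, 22, 0)),
    ("oldCrash", 5.0, datetime(2010, 1, 23)),  # outside the window
]
print(aggregate_signatures(rows, end, timedelta(hours=12)))
```

Each resulting (signature, count, uptime) triple corresponds to one row of top_crashes_by_signature for that window.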
top crashes by url
Aggregate the number of crashes associated with a particular URL. Replaces the topcrashurlfacts table:

CREATE TABLE top_crashes_by_url (
    id serial NOT NULL PRIMARY KEY,
    count integer NOT NULL,
    urldims_id integer,
    productdims_id integer,
    osdims_id integer,
    window_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    window_size INTERVAL NOT NULL,
    FOREIGN KEY (urldims_id) REFERENCES urldims(id),
    FOREIGN KEY (productdims_id) REFERENCES productdims(id),
    FOREIGN KEY (osdims_id) REFERENCES osdims(id)
);
index: count
index: urldims_id
index: productdims_id
index: osdims_id
index: (window_end, window_size)
top crashes by url signature
Associate top crashes by url with their signature(s). Promoted from the prior topcrashurlfacts, where the signaturedims id was stored directly. Use of this table allows multiple signatures to be associated with the same crashing url:

CREATE TABLE top_crashes_by_url_signature (
    top_crashes_by_url_id integer NOT NULL,  -- foreign key
    signature TEXT NOT NULL,
    count integer NOT NULL,
    FOREIGN KEY (top_crashes_by_url_id) REFERENCES top_crashes_by_url(id)
);
primary key: (top_crashes_by_url_id, signature)
top crash url facts reports
Associate a crash uuid and comment with a particular top crash by url row. This table's schema is unchanged from the prior version, but the topcrashurlfacts_id column is re-purposed to map to the new top_crashes_by_url table:

CREATE TABLE topcrashurlfactsreports (
    id serial NOT NULL PRIMARY KEY,
    uuid character varying(50) NOT NULL,
    comments character varying(500),
    topcrashurlfacts_id integer,
    FOREIGN KEY (topcrashurlfacts_id) REFERENCES top_crashes_by_url(id)
);
index: topcrashurlfacts_id
12.22 Bug tracking
bugs
Periodically extract new and changed items from the bug tracking database. Bugs was recently added:

CREATE TABLE bugs (
    id int NOT NULL PRIMARY KEY,
    status text,
    resolution text,
    short_desc text
);
bug associations
Associate signatures with bug ids. Bug associations was recently added:

CREATE TABLE bug_associations (
    signature text NOT NULL,
    bug_id int NOT NULL,
    FOREIGN KEY (bug_id) REFERENCES bugs(id)
);
primary key: (signature, bug_id)
index: bug_id
Nightly Builds
Stores nightly builds in Postgres:

CREATE TABLE builds (
    product text,
    version text,
    platform text,
    buildid BIGINT,
    changeset text,
    filename text,
    date timestamp without time zone default now(),
    CONSTRAINT builds_key UNIQUE (product, version, platform, buildid)
);
12.23 Meta data
Server status
The server status table keeps track of the current status of the job processors. Server status is unchanged from the prior version:

CREATE TABLE server_status (
    id serial NOT NULL PRIMARY KEY,
    date_recently_completed timestamp without time zone,
    date_oldest_job_queued timestamp without time zone,
    avg_process_sec real,
    avg_wait_sec real,
    waiting_job_count integer NOT NULL,
    processors_count integer NOT NULL,
    date_created timestamp without time zone NOT NULL
);
index: (date_created, id)
12.24 Database Setup
This app is under development. For progress information see: Bugzilla 454438
This is an application that will set up the PostgreSQL database schema for Socorro. It starts with an empty database and creates all the tables, indexes, constraints, stored procedures and triggers needed to run a Socorro instance.
Before this application can be run, however, a regular user must be set up to be used for the day-to-day operations. While it is not recommended that the regular user have the full set of superuser privileges, the regular user must be privileged enough to create tables within the database.
Before the application that sets up the database can be run, the Common Config must be set up. The configuration file for this app itself is outlined at the end of this page.
12.24.1 Running the setupDatabase app
.../scripts/setupDatabase.py
12.24.2 Configuring setupDatabase app
This application relies on its own configuration file as well as the common configuration file Common Config.
Copy the .../scripts/config/setupdatabaseconfig.py.dist file to .../scripts/config/setupdatabase.py and edit the file to make site-specific changes.
logFilePathname
Monitor can log its actions to a set of automatically rotating log files. This is the name and location of the logs:

logFilePathname = cm.Option()
logFilePathname.doc = 'full pathname for the log file'
logFilePathname.default = './monitor.log'
logFileMaximumSize
This is the maximum size in bytes allowed for a log file. Once this number is reached, the logs rotate and a new log is started:

logFileMaximumSize = cm.Option()
logFileMaximumSize.doc = 'maximum size in bytes of the log file'
logFileMaximumSize.default = 1000000
logFileMaximumBackupHistory
The maximum number of log files to keep:

logFileMaximumBackupHistory = cm.Option()
logFileMaximumBackupHistory.doc = 'maximum number of log files to keep'
logFileMaximumBackupHistory.default = 50
logFileLineFormatString
A Python format string that controls the format of individual lines in the logs:
logFileLineFormatString = cm.Option()
logFileLineFormatString.doc = 'python logging system format for log file entries'
logFileLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
logFileErrorLoggingLevel
Logging is done in severity levels: the lower the number, the more verbose the logs:

logFileErrorLoggingLevel = cm.Option()
logFileErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
logFileErrorLoggingLevel.default = 10
stderrLineFormatString
In parallel with creating log files, Monitor can log to stderr. This is a Python format string that controls the format of individual lines sent to stderr:

stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'
stderrErrorLoggingLevel
Logging to stderr is done in severity levels independently from the log file severity levels: the lower the number, the more verbose the output to stderr:

stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 40
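Taken together, the logFile* and stderr* options map naturally onto Python's standard logging module. A hedged sketch of wiring them up (Socorro's actual wiring lives in its apps; the function and logger names here are illustrative assumptions):

```python
import logging
import logging.handlers
import sys

def setup_logging(logFilePathname='./monitor.log',
                  logFileMaximumSize=1000000,
                  logFileMaximumBackupHistory=50,
                  logFileLineFormatString='%(asctime)s %(levelname)s - %(message)s',
                  logFileErrorLoggingLevel=10,
                  stderrLineFormatString='%(asctime)s %(levelname)s - %(message)s',
                  stderrErrorLoggingLevel=40):
    logger = logging.getLogger('monitor')
    logger.setLevel(logging.DEBUG)
    # Rotating file handler: rotates at logFileMaximumSize bytes,
    # keeping logFileMaximumBackupHistory old files.
    fh = logging.handlers.RotatingFileHandler(
        logFilePathname,
        maxBytes=logFileMaximumSize,
        backupCount=logFileMaximumBackupHistory)
    fh.setLevel(logFileErrorLoggingLevel)
    fh.setFormatter(logging.Formatter(logFileLineFormatString))
    logger.addHandler(fh)
    # stderr handler with its own, independent severity threshold.
    sh = logging.StreamHandler(sys.stderr)
    sh.setLevel(stderrErrorLoggingLevel)
    sh.setFormatter(logging.Formatter(stderrLineFormatString))
    logger.addHandler(sh)
    return logger
```

With the defaults above, DEBUG-and-up messages go to the rotating file while only ERROR-and-up reach stderr.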
12.25 Common Config
To avoid repetition between the configurations of a half dozen independently running applications, common settings are consolidated in a common configuration file: .../scripts/config/commonconfig.py.dist.

All Socorro applications have these constants available to them. For Socorro applications that are command-line driven, each of these default values can be overridden by a command-line switch of the same name.

To set up this configuration file, just copy the example .../scripts/config/commonconfig.py.dist to .../scripts/config/commonconfig.py.

Edit the file for your local situation:
    import socorro.lib.ConfigurationManager as cm
    import datetime
    import stat

    #---------------------------------------------------------------------------
    # Relational Database Section

    databaseHost = cm.Option()
    databaseHost.doc = 'the hostname of the database servers'
    databaseHost.default = 'localhost'

    databasePort = cm.Option()
    databasePort.doc = 'the port of the database on the host'
    databasePort.default = 5432

    databaseName = cm.Option()
    databaseName.doc = 'the name of the database within the server'
    databaseName.default = ''

    databaseUserName = cm.Option()
    databaseUserName.doc = 'the user name for the database servers'
    databaseUserName.default = ''

    databasePassword = cm.Option()
    databasePassword.doc = 'the password for the database user'
    databasePassword.default = ''

    #---------------------------------------------------------------------------
    # Crash storage system

    jsonFileSuffix = cm.Option()
    jsonFileSuffix.doc = 'the suffix used to identify a json file'
    jsonFileSuffix.default = '.json'

    dumpFileSuffix = cm.Option()
    dumpFileSuffix.doc = 'the suffix used to identify a dump file'
    dumpFileSuffix.default = '.dump'

    #---------------------------------------------------------------------------
    # HBase storage system

    hbaseHost = cm.Option()
    hbaseHost.doc = 'Hostname for hbase hadoop cluster. May be a VIP or load balancer'
    hbaseHost.default = 'localhost'

    hbasePort = cm.Option()
    hbasePort.doc = 'hbase port number'
    hbasePort.default = 9090
    hbaseTimeout = cm.Option()
    hbaseTimeout.doc = 'timeout in milliseconds for an HBase connection'
    hbaseTimeout.default = 5000
    #---------------------------------------------------------------------------
    # misc

    processorCheckInTime = cm.Option()
    processorCheckInTime.doc = 'the time after which a processor is considered dead (hh:mm:ss)'
    processorCheckInTime.default = "00:05:00"
    processorCheckInTime.fromStringConverter = lambda x: str(cm.timeDeltaConverter(x))

    startWindow = cm.Option()
    startWindow.doc = 'The start of the single aggregation window (YYYY-MM-DD [hh:mm:ss])'
    startWindow.fromStringConverter = cm.dateTimeConverter

    deltaWindow = cm.Option()
    deltaWindow.doc = 'The length of the single aggregation window ([dd:]hh:mm:ss)'
    deltaWindow.fromStringConverter = cm.timeDeltaConverter

    defaultDeltaWindow = cm.Option()
    defaultDeltaWindow.doc = 'The length of the single aggregation window ([dd:]hh:mm:ss)'
    defaultDeltaWindow.fromStringConverter = cm.timeDeltaConverter
    # override this default for your particular cron task
    defaultDeltaWindow.default = '00:12:00'

    endWindow = cm.Option()
    endWindow.doc = 'The end of the single aggregation window (YYYY-MM-DD [hh:mm:ss])'
    endWindow.fromStringConverter = cm.dateTimeConverter

    startDate = cm.Option()
    startDate.doc = 'The start of the overall/outer aggregation window (YYYY-MM-DD [hh:mm])'
    startDate.fromStringConverter = cm.dateTimeConverter

    deltaDate = cm.Option()
    deltaDate.doc = 'The length of the overall/outer aggregation window ([dd:]hh:mm:ss)'
    deltaDate.fromStringConverter = cm.timeDeltaConverter

    initialDeltaDate = cm.Option()
    initialDeltaDate.doc = 'The length of the overall/outer aggregation window ([dd:]hh:mm:ss)'
    initialDeltaDate.fromStringConverter = cm.timeDeltaConverter
    # override this default for your particular cron task
    initialDeltaDate.default = '4:00:00:00'

    minutesPerSlot = cm.Option()
    minutesPerSlot.doc = 'how many minutes per leaf directory in the date storage branch'
    minutesPerSlot.default = 1

    endDate = cm.Option()
    endDate.doc = 'The end of the overall/outer aggregation window (YYYY-MM-DD [hh:mm:ss])'
    endDate.fromStringConverter = cm.dateTimeConverter

    debug = cm.Option()
    debug.doc = 'do debug output and routines'
    debug.default = False
    debug.singleCharacter = 'D'
    debug.fromStringConverter = cm.booleanConverter
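The fromStringConverter hooks above are what let a command-line string like '4:00:00:00' become a rich Python value. The following is a hand-rolled sketch of the '[dd:]hh:mm:ss' conversion; it only illustrates the pattern and is not the real cm.timeDeltaConverter:

```python
from datetime import timedelta

def time_delta_converter(value):
    """Parse a '[dd:]hh:mm:ss' string (e.g. '00:05:00' or '4:00:00:00')
    into a datetime.timedelta.  Illustrative stand-in for the converter
    attached to the Option objects above."""
    parts = [int(p) for p in value.split(':')]
    if len(parts) == 3:            # no day component supplied
        parts.insert(0, 0)
    if len(parts) != 4:
        raise ValueError('expected [dd:]hh:mm:ss, got %r' % value)
    days, hours, minutes, seconds = parts
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

print(time_delta_converter('4:00:00:00'))   # 4 days, 0:00:00
```

This is why initialDeltaDate.default can be given as the string '4:00:00:00' and still behave as a four-day interval inside the application.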
12.26 Populate ElasticSearch
12.26.1 Install ElasticSearch
First you need to install ElasticSearch. The procedure is well described in this tutorial: Setting up elasticsearch. Don’t bother configuring ES if you don’t know you will need it; it generally works just fine out of the box.
Note: ElasticSearch is not yet included in our Vagrant dev VMs but should be sometime soon.
12.26.2 Increase open files limit
ElasticSearch needs to open a lot of files when indexing, often reaching the limits imposed by UNIX systems. To avoid errors when indexing, you will have to increase the limits imposed by your OS.
First see what user is running ElasticSearch. It may be root or vagrant. Use top, for example, and look for an elasticsearch-like process. Then edit /etc/security/limits.conf and add the following at the end:
    root soft nofile 4096
    root hard nofile 10240
Replace root with vagrant (or whatever user is running ES) if needed, save and restart your VM.
You will also need to increase the system-wide file descriptors limit by editing /etc/sysctl.conf and adding at the end:
fs.file-max = 100000
After you have saved and closed the file, run sysctl -p, then cat /proc/sys/fs/file-max to verify it worked. No restart is required here.
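If you want to double-check from inside a running process rather than from the shell, the Python standard-library resource module reports the per-process limit (Unix only; this is just a sanity check, not part of Socorro):

```python
import resource

# Per-process open-file limits as the current process sees them.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('nofile limits: soft=%s hard=%s' % (soft, hard))
# If 'soft' is still at the old value, the user running ElasticSearch has
# not picked up the new limits.conf settings (e.g. needs a fresh login).
```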
Note: I am not sure whether restarting the VM is necessary, or if restarting only ElasticSearch is enough. Don’t hesitate to make this more precise with the result of your experiments.
Source: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
12.26.3 Download the dump
You can get a recent dump for ElasticSearch at http://people.mozilla.org/~agaudebert/socorro/es-dumps/.
You will also need to get the mapping of our Socorro indexes: http://people.mozilla.org/~agaudebert/socorro/es-dumps/mapping.json
12.26.4 Run the script
The script to import crashes into ElasticSearch is not yet merged into our official repository. To get it, you will need to fetch github.com/AdrianGaudebert/socorro and check out branch 696722-script-import-es:
    git remote add AdrianGaudebert https://github.com/AdrianGaudebert/socorro.git
    git fetch AdrianGaudebert
    git branch --track 696722-script-import-es AdrianGaudebert/696722-script-import-es
    git checkout 696722-script-import-es
Before you can run the script, you will have to stop supervisord:
sudo /etc/init.d/supervisor force-stop
The script is called movecrashes.py and is in .../scripts/. It has a few dependencies on Socorro and thus needs to be run from the root of a Socorro directory with $PYTHONPATH = .:thirdparty. Use it as follows:
    python scripts/movecrashes.py import /path/to/dump.tar /path/to/mapping.json
This will simply import all crash reports contained in the dump into ElasticSearch, without cleaning anything first. If you want more data than is available in the dump, you can just run that import again, which will create duplicates.
If you want to clean the old socorro data first, just run rebuild instead of import:
    python scripts/movecrashes.py rebuild /path/to/dump.tar /path/to/mapping.json
Note that this will only delete indexes called socorro_xxxxxx. If you’re using a shared ES instance, or have other indexes you want to keep, there is no risk of them being deleted in this process.
CHAPTER 13
PostgreSQL Database
13.1 PostgreSQL Database Tables by Data Source
Last updated: 2011-01-15
This document breaks down the tables in the Socorro PostgreSQL database by where their data comes from, rather than by what the table contains. This is a prerequisite to populating a brand-new Socorro database or creating synthetic testing workloads.
13.2 Manually Populated Tables
The following tables have no code to populate them automatically. Initial population and any updating need to be done by hand. Generally there’s no UI, either; use queries.
• daily_crash_codes
• os_name_matches
• os_names
• process_types
• product_release_channels
• products
• release_channel_matches
• release_channels
• uptime_levels
• windows_versions
• product_productid_map
• report_partition_info
13.3 Tables Receiving External Data
These tables actually get inserted into by various external utilities. This is most of our “incoming” data.
bugs list of bugs, populated by bugzilla-scraper
extensions populated by processors
plugins_reports populated by processors
raw_adu populated by daily batch job from metrics
releases_raw populated by daily FTP-scraper
reports populated by processors
13.4 Automatically Populated Reference Tables
Lookup lists and dimension tables, populated by cron jobs and/or processors based on the above tables. Most are annotated with the job or process which populates them. Where the populating process is marked with an @, that indicates a job which is due to be phased out.
addresses cron job, part of update_reports_clean based on reports
domains cron job, part of update_reports_clean based on reports
flash_versions cron job, part of update_reports_clean based on reports
os_versions cron job, update_os_versions based on reports@ cron job, update_reports_clean based on reports
plugins populated by processors based on crash data
product_version_builds cron job, update_product_versions, based on releases_raw
product_versions cron job, update_product_versions, based on releases_raw
reasons cron job, update_reports_clean, based on reports
reports_bad cron job, update_reports_clean, based on reports; a future cron job will delete data from this table
signatures cron job, update_signatures, based on reports@ cron job, update_reports_clean, based on reports
13.5 Matviews
Reporting tables, designed to be called directly by the mware/UI/reports. Populated by cron job batch. Where populating functions are marked with a @, they are due to be replaced with new jobs.
bug_associations not sure
daily_crashes daily_crashes based on reports
daily_hangs update_hang_report based on reports
os_signature_counts update_os_signature_counts based on reports
product_adu daily_adu based on raw_adu
product_signature_counts update_product_signature_counts based on reports
reports_clean update_reports_clean based on reports
reports_user_info update_reports_clean based on reports
reports_duplicates find_reports_duplicates based on reports
signature_bugs_rollup not sure
signature_first@ update_signatures based on reports@
signature_products update_signatures based on reports@
signature_products_rollup update_signatures based on reports@
tcbs update_tcbs based on reports
uptime_signature_counts update_uptime_signature_counts based on reports
13.6 Application Management Tables
These tables are used by various parts of the application to do things other than reporting. They are populated/managed by those applications.
• email campaign tables
– email_campaigns
– email_campaigns_contacts
– email_contacts
• processor management tables
– jobs
– priorityjobs
– priority_jobs_*
– processors
– server_status
• UI management tables
– sessions
• monitoring tables
– replication_test
• cronjob and database management
– cronjobs
– report_partition_info
13.7 Deprecated Tables
These tables support functionality which is scheduled to be removed over the next few versions of Socorro. As such, we are ignoring them.
• alexa_topsites
• builds
• frames
• osdims
• priorityjobs_log
• priorityjobs_logging_switch
• product_visibility
• productdims
• productdims_version_sort
• release_build_type_map
• signature_build
• signature_productdims
• top_crashes_by_signature
• top_crashes_by_url
• top_crashes_by_url_signature
• urldims
13.8 PostgreSQL Database Table Descriptions
This document describes the various tables in PostgreSQL by their purpose and essentially what data each contains. This is intended as a reference for Socorro developers and analytics users.
Tables which are in the database but not listed below are probably legacy tables which are slated for removal in future Socorro releases. Certainly if the tables are not described, they should not be used for new features or reports.
13.9 Raw Data Tables
These tables hold “raw” data as it comes in from external sources. As such, these tables are quite large and contain a lot of garbage and data which needs to be conditionally evaluated. This means that you should avoid using these tables for reports and interfaces unless the data you need isn’t available anywhere else – and even then, you should see about getting the data added to a matview or normalized fact table.
13.9.1 reports
The primary “raw data” table, reports contains the most used information about crashes, one row per crash report. The primary key is the UUID field.
The reports table is partitioned by date_processed into weekly partitions, so any query you run against it should include filter criteria (WHERE) on the date_processed column. Examples:
    WHERE date_processed BETWEEN '2012-02-12 11:05:09+07' AND '2012-02-17 11:05:09+07'
    WHERE date_processed >= DATE '2012-02-12' AND date_processed < DATE '2012-02-17'
    WHERE utc_day_is(date_processed, '2012-02-15')
Data in this table comes from the processors.
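A small helper can keep such predicates consistent in application code. This sketch is not part of Socorro — it assumes Monday-based weeks, whereas the real partition boundaries are driven by report_partition_info — but it shows the half-open range style used above:

```python
from datetime import date, timedelta

def week_bounds(day):
    """Return the (start, end) half-open date range of the week
    containing `day`, assuming Monday-based weeks."""
    start = day - timedelta(days=day.weekday())
    return start, start + timedelta(days=7)

start, end = week_bounds(date(2012, 2, 15))
sql = ('SELECT count(*) FROM reports '
       'WHERE date_processed >= %s AND date_processed < %s')
# pass (start, end) as query parameters to your database driver
print(start, end)   # 2012-02-13 2012-02-20
```

Keeping the range half-open (>= start, < end) avoids double-counting rows that fall exactly on a partition boundary.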
13.9.2 extensions
Contains information on add-ons installed in the user’s application. Currently linked to reports via a synthetic report_id (this will be fixed to be UUID in some future release). Data is partitioned by date_processed into weekly partitions, so include a filter on date_processed in every query hitting this table. Has zero to several rows for each crash.
Data in this table comes from the processors.
13.9.3 plugins_reports
Contains information on some, but not all, installed modules implicated in the crash: the “most interesting” modules. Relates to dimension table plugins. Currently linked to reports via a synthetic report_id (this will be fixed to be UUID in some future release). Data is partitioned by date_processed into weekly partitions, so include a filter on date_processed in every query hitting this table. Has zero to several rows for each crash.
Data in this table comes from the processors.
13.9.4 bugs
Contains lists of bugs thought to be related to crash reports, for linking to crashes. Populated by a daily cronjob.
13.9.5 bug_associations
Links bugs from the bugs table to crash signatures. Populated by daily cronjob.
13.9.6 raw_adu
Contains counts of estimated Average Daily Users as calculated by the Metrics department, grouped by product, version, build, os, and UTC date. Populated by a daily cronjob.
13.9.7 releases_raw
Contains raw data about Mozilla releases, including product, version, platform and build information. Populated hourly via FTP-scraping.
13.9.8 reports_duplicates
Contains UUIDs of groups of crash reports thought to be duplicates according to the current automated duplicate-finding algorithm. Populated by hourly cronjob.
13.10 Normalized Fact Tables
13.10.1 reports_clean
Contains cleaned and normalized data from the reports table, including product-version, os, os version, signature, reason, and more. Partitioned by date into weekly partitions, so each query against this table should contain a predicate on date_processed:
    WHERE date_processed BETWEEN '2012-02-12 11:05:09+07' AND '2012-02-17 11:05:09+07'
    WHERE date_processed >= DATE '2012-02-12' AND date_processed < DATE '2012-02-17'
    WHERE utc_day_is(date_processed, '2012-02-15')
Because reports_clean is much smaller than reports and is normalized into unequivocal relationships with dimension tables, it is much easier to use and faster to execute queries against. However, it excludes data in the reports table which doesn’t conform to normalized data, including:
• product versions before the first Rapid Release versions (e.g. Firefox 3.6)
• Camino
• corrupt reports, including ones which indicate a breakpad bug
Populated hourly, 3 hours behind the current time, from data in reports via cronjob. The UUID column is the primary key. There is one row per crash report, although some crash reports are suspected to be duplicates.
Columns:
uuid artificial unique identifier assigned by the collectors to the crash at collection time. Contains the date collected plus a random string.
date_processed timestamp (with time zone) at which the crash was received by the collectors. Also the partition key for partitioning reports_clean. Note that the time will be 7-8 hours off for crashes before February 2012 due to a shift from PST to UTC.
client_crash_date timestamp with time zone at which the user’s crashing machine thought the crash was happening. Often inaccurate due to clock issues; it is primarily supplied as an anchor timestamp for uptime and install_age.
product_version_id foreign key to the product_versions table.
build numeric build identifier as supplied by the client. Might not match any real build in product_version_builds for a variety of reasons.
signature_id foreign key to the signatures dimension table.
install_age time interval between installation and crash, as reported by the client. To get the reported install date, do ( SELECT client_crash_date - install_age ).
uptime time interval between program start and crash, as reported by the client.
reason_id foreign key to the reasons table.
address_id foreign key to the addresses table.
os_name name of the OS of the crashing host, for OSes which match known OSes.
os_version_id foreign key to the os_versions table.
hang_id UUID assigned to the hang pair grouping for hang pairs. May not match anything if the hang pair was broken by sampling or lost crash reports.
flash_version_id foreign key to the flash_versions table
process_type Crashing process type, linked to process_types dimension.
release_channel release channel from which the crashing product was obtained, unless altered by the user (this happens more than you’d think). Note that non-Mozilla builds are usually lumped into the “release” channel.
duplicate_of UUID of the “leader” of the duplicate group if this crash is marked as a possible duplicate. If UUID and duplicate_of are the same, this crash is the “leader”. Selection of leader is arbitrary.
domain_id foreign key to the domains dimension
architecture CPU architecture of the client as reported (e.g. ‘x86’, ‘arm’).
cores number of CPU cores on the client, as reported.
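The uuid/duplicate_of convention described above can be applied in client code like this (a sketch with invented sample UUIDs; real rows would come from a reports_clean query):

```python
def split_duplicate_group(rows):
    """Given (uuid, duplicate_of) pairs for one duplicate group, return
    (leader_uuid, follower_uuids).  Per the convention above, the leader
    is the row whose uuid equals its own duplicate_of."""
    leaders = [u for u, dup in rows if u == dup]
    followers = [u for u, dup in rows if u != dup]
    if len(leaders) != 1:
        raise ValueError('expected exactly one leader, got %r' % leaders)
    return leaders[0], followers

# Invented sample data: 'aaa' is the (arbitrarily chosen) leader.
leader, followers = split_duplicate_group(
    [('aaa', 'aaa'), ('bbb', 'aaa'), ('ccc', 'aaa')])
print(leader, followers)   # aaa ['bbb', 'ccc']
```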
13.10.2 reports_user_info
Contains a handful of “optional” information from the reports table which is either security-sensitive or is not included in all reports and is large. This includes the full URL, user email address, comments, and app_notes. As such, access to this table in production may be restricted.
Partitioned by date into weekly partitions, so each query against this table should contain a predicate on date_processed. Relates to reports_clean via UUID, which is also its primary key.
13.10.3 product_adu
The normalized version of raw_adu, contains summarized estimated counts of users for each product-version since Rapid Release began. Populated by daily cronjob.
13.11 Dimensions
These tables contain lookup lists and taxonomy for the fact tables in Socorro. Generally they are auto-populated based on encountering new values in the raw data, on an hourly basis. A few tables below are manually populated and change extremely seldom, if at all.
Dimensions which are lookup lists of short values join to the fact tables by natural key, although it is not actually necessary to reference them (e.g. os_name, release_channel). Dimension lists which have long values or are taxonomies or hierarchies join to the fact tables using a surrogate key (e.g. product_version_id, reason_id).
Some dimensions which come from raw crash data have a “first_seen” column which displays when that value was first encountered in a crash and added to the dimension table. Since the first_seen columns were added in September 2011, most of these will have the value ‘2011-01-01’, which is not meaningful. Only dates after 2011-09-15 actually indicate a first appearance.
13.11.1 addresses
Contains a list of crash location “addresses”, extracted hourly from the raw data. Surrogate key: address_id.
13.11.2 daily_crash_codes
Reference list for the cryptic single-character codes in the daily_crashes table. Legacy, to be eventually restructured. Natural key: crash_code. Manually populated.
13.11.3 domains
List of HTTP domains extracted from raw reports by applying a truncation regex to the crashing URL. These should contain no personal information. Contains a “first seen” column. Surrogate key: domain_id.
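The truncation idea can be sketched as follows. The regex here is purely illustrative — Socorro’s actual pattern, which also deals with credentials, ports, and unusual schemes, is different:

```python
import re

# Illustrative truncation regex only; not the pattern Socorro ships.
DOMAIN_RE = re.compile(r'^https?://([^/:#?]+)')

def truncate_to_domain(url):
    """Reduce a crashing URL to its bare domain, discarding the path,
    query string, and fragment that might contain personal data."""
    match = DOMAIN_RE.match(url)
    return match.group(1) if match else None

print(truncate_to_domain('http://example.com/private/path?q=1'))   # example.com
```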
13.11.4 flash_versions
List of Adobe Flash version numbers harvested from crashes. Has a “first_seen” column. Surrogate key: flash_version_id.
13.11.5 os_names
Canonical list of OS names used in Socorro. Natural key. Fixed list, manually populated.
13.11.6 os_versions
List of versions for each OS based on data harvested from crashes. Contains some garbage versions because we cannot validate them. Surrogate key: os_version_id.
13.11.7 plugins
List of “interesting modules” harvested from raw crashes, populated by the processors. Surrogate key: ID. Links to plugins_reports.
13.11.8 process_types
Standing list of crashing process types (browser, plugin and hang). Manually input. Natural key.
13.11.9 products
List of supported products, along with the first version on rapid release. Manually maintained. Natural key: product_name.
13.11.10 product_versions
Contains a list of versions for each product, since the beginning of rapid release (i.e. since Firefox 5.0). Version numbers are available expressed several different ways, and there is a sort column for sorting versions. Also contains build_date/sunset_date visibility information and the featured_version flag. “build_type” means the same thing as “release_channel”. Surrogate key: product_version_id.
Version columns include:
version_string The canonical, complete version number for display to users
release_version The version number as provided in crash reports (and usually the same as the one on the FTP server). Can be missing suffixes like “b2” or “esr”.
major_version Just the first two numbers of the version number, e.g. “11.0”
version_sort An alphanumeric string which allows you to sort version numbers in the correct order.
beta_number The sequential beta release number if the product-version is a beta. For “final betas”, this number will be 99.
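One common way to build such a sort key is to zero-pad each numeric chunk so that lexicographic order matches numeric order. The sketch below shows only the idea; Socorro’s real version_sort additionally has to order betas before their final release (see beta_number above), which this naive key does not do:

```python
import re

def naive_version_sort_key(version_string):
    """Zero-pad numeric chunks of a version string to make it sortable."""
    chunks = re.findall(r'\d+|[a-z]+', version_string.lower())
    return ''.join(c.zfill(4) if c.isdigit() else c for c in chunks)

versions = ['9.0', '11.0', '10.0b2']
print(sorted(versions))                                # ['10.0b2', '11.0', '9.0']
print(sorted(versions, key=naive_version_sort_key))    # ['9.0', '10.0b2', '11.0']
```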
13.11.11 product_version_builds
Contains a list of builds for each product-version. Note that platform information is not at all normalized. Natural key: product_version_id, build_id.
13.11.12 product_release_channels
Contains an intersection of products and release channels, mainly in order to store throttle values. Manually populated. Natural key: product_name, release_channel.
13.11.13 reasons
Contains a list of “crash reason” values harvested from raw crashes. Has a “first seen” column. Surrogate key: reason_id.
13.11.14 release_channels
Contains a list of available Release Channels. Manually populated. Natural key. See “note on release channel columns” below.
13.11.15 signatures
List of crash signatures harvested from incoming raw data. Populated by hourly cronjob. Has a first_seen column. Surrogate key: signature_id.
13.11.16 uptime_levels
Reference list of uptime “levels” for use in reports, primarily the Signature Summary. Manually populated.
13.11.17 windows_versions
Reference list of Windows major/minor versions with their accompanying common names for reports. Manually populated.
13.12 Matviews
These data summaries are derived data from the fact tables and/or the raw data tables. They are populated by hourly or daily cronjobs, and are frequently regenerated if historical data needs to be corrected. If these matviews contain the data you need, you should use them first because they are smaller and more efficient than the fact tables or the raw tables.
13.12.1 correlations
Summarizes crashes by product-version, os, reason and signature. Populated by daily cron job. Is the root for the other correlation reports. Correlation reports in the database will not be active/populated until 2.5.2 or later.
13.12.2 correlation_addons
Contains crash-count summaries of addons per correlation. Populated by daily cronjob.
13.12.3 correlation_cores
Contains crash-count summaries of crashes per architecture and number of cores. Populated by daily cronjob.
13.12.4 correlation_modules
Will contain crash-counts for modules per correlation. Will be populated daily by a pull from HBase.
13.12.5 daily_crashes
Stores crash counts per product-version, OS, and day. This is probably the oldest matview, and has unintuitive and historical column names; it will probably be overhauled or replaced. The report_type column defines 5 different sets of counts; see daily_crash_codes above.
We recommend that you use the VIEW daily_crash_ratio instead of daily_crashes, as the structure of daily_crashes is hard to understand and is likely to change in the future.
13.12.6 daily_hangs and hang_report
daily_hangs contains a correlation of hang crash reports with their related hang pair crashes, plus additional summary data. Duplicates contains an array of UUIDs of possible duplicates.
hang_report is a dynamic view which flattens daily_hangs and its related dimension tables.
13.12.7 nightly_builds
Contains summaries of crashes-by-age for Nightly and Aurora releases. Will be populated in Socorro 2.5.1.
13.12.8 product_crash_ratio
Dynamic VIEW which shows crashes, ADU, adjusted crashes, and the crash/100ADU ratio, for each product and version. Recommended for backing graphs and similar.
13.12.9 product_os_crash_ratio
Dynamic VIEW which shows crashes, ADU, adjusted crashes, and the crash/100ADU ratio for each product, OS and version. Recommended for backing graphs and similar.
13.12.10 product_info
Dynamic VIEW which supplies the most essential information about each product version, for both old and new products.
13.12.11 signature_products and signature_products_rollup
Summary of which signatures appear in which product_version_ids, with first appearance dates.
The rollup contains an array-style summary of the signatures with lists of product-versions.
13.12.12 tcbs
Short for “Top Crashes By Signature”, tcbs contains counts of crashes per day, signature, product-version, and columns counting each OS.
13.13 Note On Release Channel Columns
Due to a historical error, the column name for the Release Channel in various tables may be named “release_channel”, “build_type”, or “build_channel”. All three of these column names refer to exactly the same thing. While we regret the confusion, it has not been thought to be worth the refactoring effort to clean it up.
13.14 Application Support Tables
These tables are used by various parts of the application to do things other than reporting. They are populated/managed by those applications. Most are not accessible to the various reporting users, as they do not contain reportable data.
13.14.1 data processing control tables
These tables contain data which supports data processing by the processors and cronjobs.
product_productid_map maps product names based on productIDs, in cases where the product name supplied by Breakpad is not correct (i.e. FennecAndroid).
reports_bad contains the last day of rejected UUIDs for copying from reports to reports_clean. Intended for auditing of the reports_clean code.
os_name_matches contains regexes for matching commonly found OS names in crashes with canonical OS names.
release_channel_matches contains LIKE match strings for matching release channel names commonly found in crashes with canonical channel names.
special_product_platforms contains mapping information for rewriting data from FTP-scraping to have the correct product and platform. Currently used only for Fennec.
transform_rules contains rule data for rewriting crashes by the processors. May be used in the future for other rule-based rewriting by other components.
13.14.2 email campaign tables
These tables support the application which emails crash reporters with follow-ups. As such, access to these tables will be restricted.
• email_campaigns
• email_campaigns_contacts
• email_contacts
13.14.3 processor management tables
These tables are used to coordinate activities of the up-to-120 processors and the monitor.
jobs The current main queue for crashes waiting to be processed.
priorityjobs The queue for user-requested “priority” crash processing.
processors The registration list for currently active processors.
server_status Contains summary statistics on the various processor servers.
13.14.4 UI management tables
sessions contains session information for people logged into the administration interface for Socorro.
13.14.5 monitoring tables
replication_test Contains a timestamp for ganglia to measure the speed of replication.
13.14.6 cronjob and database management
These tables support scheduled tasks which are run in Socorro.
cronjobs contains last-completed and success/failure status for each cronjob which affects the database. Currently does not include all cronjobs.
report_partition_info contains configuration information on how the partitioning cronjob needs to partition the various partitioned database tables.
socorro_db_version contains the Socorro version of the current database. Updated by the upgrade scripts.
socorro_db_version_history contains the history of version upgrades of the current database.
13.15 Creating a New Matview
A materialized view, or “matview”, is the results of a query stored as a table in the PostgreSQL database. Matviews make user interfaces much more responsive by eliminating searches over many GB of sparse data at request time. The majority of the time, new matviews will have the following characteristics:
• they will pull data from reports_clean and/or reports_user_info
• they will be updated once per day and store daily summary data
• they will be updated by a cron job calling a stored procedure
The rest of this guide assumes that all three conditions above are true. For matviews for which one or more conditions are not true, consult the PostgreSQL DBAs for your matview.
13.16 Do I Want a Matview?
Before proceeding to construct a new matview, test the responsiveness of simply running a query over reports_clean and/or reports_user_info. You may find that the query returns fast enough ( < 100ms ) without its own matview. Remember to test the extreme cases: Firefox release version on Windows, or Fennec aurora version.
Also, matviews are really only effective if they are smaller than 1/4 the size of the base data from which they are constructed. Otherwise, it’s generally better to simply look at adding new indexes to the base data. Try populating a couple days of the matview, ad hoc, and checking its size (pg_total_relation_size()) compared to the base table from which it’s drawn. The new signature summaries were a good example of this; the matviews to meet the spec would have been 1/3 the size of reports_clean, so we added a couple new indexes to reports_clean instead.
13.17 Components of a Matview
In order to create a new matview, you will create or modify five or six things:
1. a table to hold the matview data
2. an update function to insert new matview data once per day
3. a backfill function to backfill one day of the matview
4. add a line in the general backfill_matviews function
5. if the matview is to be backfilled from deployment, a script to do this
6. a test that the matview is being populated correctly.
Point (6) is not yet addressed by a test framework for Socorro, so we’re skipping it currently.
For the rest of this doc, please refer to the template matview code sql/templates/general_matview_template.sql in the Socorro source code.
13.18 Creating the Matview Table
The matview table should be the basis for the report or screen you want. It’s important that it be able to cope with all of the different filter and grouping criteria which users are allowed to supply. On the other hand, most of the time it’s not helpful to try to have one matview support several different reports; the matview gets bloated and slow.
In general, each matview will have the following things:
• one or more grouping columns
• a report_date column
• one or more summary data columns
If they are available, all columns should use surrogate keys to lookup lists (i.e. use signature_id, not the full text of the signature). Generally the primary key of the matview will be the combination of all grouping columns plus the report date.
So, as an example, we're going to create a simple matview for summarizing crashes per product and web domain. While it's unlikely that such a matview would be useful in practice (we could just query reports_clean directly), it makes a good example. Here's the model for the table:
table product_domain_counts
    product_version
    domain
    report_date
    report_count
key product_version, domain, report_date
We actually use the custom procedure create_table_if_not_exists() to create this. This function handles idempotence, permissions, and secondary indexes for us, like so:
SELECT create_table_if_not_exists('product_domain_counts', $x$
CREATE TABLE product_domain_counts (
    product_version_id INT NOT NULL,
    domain_id INT NOT NULL,
    report_date DATE NOT NULL,
    report_count INT NOT NULL DEFAULT 0,
    CONSTRAINT product_domain_counts_key PRIMARY KEY ( product_version_id, domain_id, report_date )
);
$x$, 'breakpad_rw', ARRAY['domain_id'] );
See DatabaseAdminFunctions in the docs for more information about the function.
You'll notice that the resulting matview uses the surrogate keys of the corresponding lookup lists rather than the actual values. This is to keep matview sizes down and improve performance. You'll also notice that there are no foreign keys to the various lookup list tables; this is partly a performance optimization, but mostly because, since matviews are populated by stored procedure, validating input is not critical. We also don't expect to need cascading updates or deletes on the lookup lists.
13.18.1 Creating The Update Function
Once you have the table, you'll need to write a function to be called by cron once per day in order to populate the matview with new data.
This function will:
• be named update_{name_of_matview}
• take two parameters, a date and a boolean
• return a boolean, with TRUE meaning success, and raise an ERROR on failure
• check if data it depends on is available
• check if it’s already been run for the day
• pull its data from reports_clean, reports_user_info, and/or other matviews (_not_ reports or other raw data tables)
So, here’s our update function for the product_domains table:
CREATE OR REPLACE FUNCTION update_product_domain_counts (
    updateday DATE, checkdata BOOLEAN default TRUE )
RETURNS BOOLEAN
LANGUAGE plpgsql
SET work_mem = '512MB'
SET temp_buffers = '512MB'
SET client_min_messages = 'ERROR'
AS $f$
BEGIN
-- this function populates a daily matview
-- for crash counts by product and domain
-- depends on reports_clean

-- check if we've been run
IF checkdata THEN
    PERFORM 1 FROM product_domain_counts
    WHERE report_date = updateday
    LIMIT 1;
    IF FOUND THEN
        RAISE EXCEPTION 'product_domain_counts has already been run for %.', updateday;
    END IF;
END IF;

-- check if reports_clean is complete
IF NOT reports_clean_done(updateday) THEN
    IF checkdata THEN
        RAISE EXCEPTION 'Reports_clean has not been updated to the end of %', updateday;
    ELSE
        RETURN TRUE;
    END IF;
END IF;

-- now insert the new records
-- this should be some appropriate query; this simple group by
-- is just provided as an example
INSERT INTO product_domain_counts
    ( product_version_id, domain_id, report_date, report_count )
SELECT product_version_id, domain_id,
    updateday,
    count(*)
FROM reports_clean
WHERE domain_id IS NOT NULL
    AND date_processed >= updateday::timestamptz
    AND date_processed < ( updateday + 1 )::timestamptz
GROUP BY product_version_id, domain_id;

RETURN TRUE;
END; $f$;
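In production this function would be invoked daily by the matview cron job, but it can also be called by hand for a single UTC day (the date below is illustrative):

```sql
SELECT update_product_domain_counts('2012-01-14');
```

Calling it a second time for the same day raises the "already been run" exception unless checkdata is passed as false.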
Note that the update functions could be written in PL/Python if you wish; however, there isn't yet a template for that.
13.18.2 Creating The Backfill Function
The second function which needs to be created is one for backfilling data for specific dates, for when we need to backfill missing or corrected data. This function will also be used to fill in data when we first deploy the matview.
The backfill function will generally be very simple; it just calls a delete for the day's data and then the update function, with the "checkdata" flag disabled:
CREATE OR REPLACE FUNCTION backfill_product_domain_counts (
    updateday DATE )
RETURNS BOOLEAN
LANGUAGE plpgsql AS
$f$
BEGIN

DELETE FROM product_domain_counts WHERE report_date = updateday;
PERFORM update_product_domain_counts(updateday, false);

RETURN TRUE;
END; $f$;
13.18.3 Adding The Function To The Omnibus Backfill
Usually when we backfill data we recreate all matview data for the period affected. This is accomplished by inserting it into the backfill_matviews table:
INSERT INTO backfill_matviews ( matview, function_name, frequency )
VALUES ( 'product_domain_counts', 'backfill_product_domain_counts', 'daily' );
NOTE: the above is not yet active. Until it is, send a request to Josh Berkus to add your new backfill to the omnibus backfill function.
13.18.4 Filling in Initial Data
Generally when creating a new matview, we want to fill in two weeks or so of data. This can be done with either a Python or a PL/pgSQL script. A PL/pgSQL script would be created as a SQL file and look like this:
DO $f$
DECLARE
    thisday DATE := '2012-01-14';
    lastday DATE;
BEGIN

-- set backfill to the last day we have ADU for
SELECT max("date")
INTO lastday
FROM raw_adu;

WHILE thisday <= lastday LOOP

    RAISE INFO 'backfilling %', thisday;

    PERFORM backfill_product_domain_counts(thisday);

    thisday := thisday + 1;

END LOOP;

END; $f$;
This script would then be checked into the set of upgrade scripts for that version of the database.
13.19 Database Admin Function Reference
What follows is a listing of custom functions written for Socorro in the PostgreSQL database which are intended for database administration, particularly scheduled tasks. Many of these functions depend on other, internal functions which are not documented.
All functions below return BOOLEAN, with TRUE meaning completion, and throw an ERROR if they fail, unless otherwise noted.
13.20 MatView Functions
These functions manage the population of the many Materialized Views in Socorro. In general, for each matview there are two functions which maintain it:
update_{matview_name} ( DATE )
fills in one day of the matview for the first time
will error if data is already present, or source data is missing
backfill_{matview_name} ( DATE )
deletes one day of data for the matview and recreates it.
will warn, but not error, if source data is missing
safe for use without downtime
Exceptions to the above are generally for procedures which need to run hourly or more frequently (e.g. update_reports_clean, reports_duplicates). Also, some functions have shortcut names where they don't use the full name of the matview (e.g. update_adu).
Note that the various matviews can take radically different amounts of time to update or backfill ... from a couple of seconds to 10 minutes for one day.
In addition, there are several procedures which are designed to update or backfill multiple matviews for a range of days. These are designed for when there has been some kind of widespread issue in crash processing and a bunch of crashes have been reprocessed and need to be re-aggregated.
These mass-backfill functions generally give a lot of command-line feedback on their progress, and should be run in a screen session, as they may take hours to complete. These functions, as the most generally used, are listed first. If you are doing a mass-backfill, you probably want to limit the backfill to a week at a time in order to prevent it from running too long before committing.
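Backfilling a week at a time, as suggested above, might look like the following for a month of data (the dates are illustrative):

```sql
SELECT backfill_matviews('2011-11-01', '2011-11-07');
SELECT backfill_matviews('2011-11-08', '2011-11-14');
SELECT backfill_matviews('2011-11-15', '2011-11-21');
SELECT backfill_matviews('2011-11-22', '2011-11-28');
```

Each call commits independently, so an interrupted backfill can be resumed at the next week boundary.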
13.20.1 backfill_matviews
Purpose: backfills data for all matviews for a specific range of dates. For use when data is either missing or needs to be retroactively corrected.
Called By: manually by admin as needed
backfill_matviews (
    startdate DATE,
    optional enddate DATE default current_date,
    optional reportsclean BOOLEAN default true
)
SELECT backfill_matviews( '2011-11-01', '2011-11-27', false );
SELECT backfill_matviews( '2011-11-01' );
startdate the first date to backfill
enddate the last date to backfill. defaults to the current UTC date.
reportsclean whether or not to backfill reports_clean as well. Defaults to true. Supplied because the backfill of reports_clean takes a lot of time.
13.20.2 backfill_reports_clean
Purpose: backfill only the reports_clean normalized fact table.
Called By: admin as needed
backfill_reports_clean (
    starttime TIMESTAMPTZ,
    endtime TIMESTAMPTZ
)
SELECT backfill_reports_clean ( '2011-11-17', '2011-11-29 14:00:00' );
starttime timestamp to start backfill
endtime timestamp to halt backfill at
Note: if backfilling less than 1 day, will backfill in 1-hour increments. If backfilling more than one day, will backfill in 6-hour increments. Can take a long time to backfill more than a couple of days.
13.20.3 update_adu, backfill_adu
Purpose: updates or backfills one day of the product_adu table, which is one of the two matviews powering the graphs in Socorro. Note that ADU has no dependencies, so if it is out of date you only need to run this function.
Called By: update function called by the update_matviews cron job.
update_adu (updateday DATE);
backfill_adu (updateday DATE);
SELECT update_adu('2011-11-26');
SELECT backfill_adu('2011-11-26');
updateday DATE of the UTC crash report day to update or backfill
13.20.4 update_products
Purpose: updates the list of product_versions and product_version_builds based on the contents of releases_raw.
Called By: daily cron job
update_products ()
SELECT update_products ( '2011-12-04' );
Notes: takes no parameters as the product update is always cumulative. As of 2.3.5, only looks at product_versionswith build dates in the last 30 days. There is no backfill function because it is always a cumulative update.
13.20.5 update_tcbs, backfill_tcbs
Purpose: updates "tcbs" based on the contents of the reports_clean table
Called By: daily cron job
update_tcbs (
    updateday DATE,
    checkdata BOOLEAN optional default true
)
SELECT update_tcbs ( '2011-11-26' );
backfill_tcbs (
    updateday DATE
)

SELECT backfill_tcbs ( '2011-11-26' );
updateday UTC day to pull data for.
checkdata whether or not to check dependent data and throw an error if it's not found.
Notes: updates only "new"-style versions. Until 2.4, update_tcbs pulled data directly from reports and not reports_clean.
13.20.6 update_daily_crashes, backfill_daily_crashes
Purpose: updates "daily_crashes" based on the contents of the reports_clean table
Called By: daily cron job
update_daily_crashes (
    updateday DATE,
    checkdata BOOLEAN optional default true
)

SELECT update_daily_crashes ( '2011-11-26' );

backfill_daily_crashes (
    updateday DATE
)

SELECT backfill_daily_crashes ( '2011-11-26' );
updateday UTC day to pull data for.
checkdata whether or not to check dependent data and throw an error if it's not found.
Notes: updates only "new"-style versions. Until 2.4, update_daily_crashes pulled data directly from reports and not reports_clean. Probably the slowest of the regular update functions; can take up to 4 minutes to do one day.
13.20.7 update_rank_compare, backfill_rank_compare
Purpose: updates “rank_compare” based on the contents of the reports_clean table
Called By: daily cron job
update_rank_compare (
    updateday DATE optional default yesterday,
    checkdata BOOLEAN optional default true
)

SELECT update_rank_compare ( '2011-11-26' );

backfill_rank_compare (
    updateday DATE optional default yesterday
)

SELECT backfill_rank_compare ( '2011-11-26' );
updateday UTC day to pull data for. Optional; defaults to ( CURRENT_DATE - 1 ).
checkdata whether or not to check dependent data and throw an error if it's not found.
Note: this matview is not historical, but contains only one day of data. As such, running either the update or backfill function replaces all existing data. Since it needs an exclusive lock on the matview, it is possible (though unlikely) for it to fail to obtain the lock and error out.
13.20.8 update_nightly_builds, backfill_nightly_builds
Purpose: updates “nightly_builds” based on the contents of the reports_clean table
Called By: daily cron job
update_nightly_builds (
    updateday DATE optional default yesterday,
    checkdata BOOLEAN optional default true
)

SELECT update_nightly_builds ( '2011-11-26' );

backfill_nightly_builds (
    updateday DATE optional default yesterday
)

SELECT backfill_nightly_builds ( '2011-11-26' );
updateday UTC day to pull data for.
checkdata whether or not to check dependent data and throw an error if it's not found. Optional.
13.21 Schema Management Functions
These functions support partitioning, upgrades, and other management of tables and views.
13.21.1 weekly_report_partitions
Purpose: to create new partitions for the reports table and its child tables every week.
Called By: weekly cron job
weekly_report_partitions (
    optional numweeks integer default 2,
    optional targetdate date default current_date
)

SELECT weekly_report_partitions();
SELECT weekly_report_partitions(3, '2011-11-09');
numweeks number of weeks ahead to create partitions
targetdate date for the starting week, if not today
13.21.2 try_lock_table
Purpose: attempt to get a lock on a table, looping with sleeps until the lock is obtained.
Called by: various functions internally
try_lock_table (
    tabname TEXT,
    mode TEXT optional default 'EXCLUSIVE',
    attempts INT optional default 20
) RETURNS BOOLEAN

IF NOT try_lock_table('rank_compare', 'ACCESS EXCLUSIVE') THEN
    RAISE EXCEPTION 'unable to lock the rank_compare table for update.';
END IF;
tabname the table name to lock
mode the lock mode per PostgreSQL docs. Defaults to ‘EXCLUSIVE’.
attempts the number of attempts to make, with 3 second sleeps between each. optional, defaults to 20.
Returns TRUE for table locked, FALSE for unable to lock.
13.21.3 create_table_if_not_exists
Purpose: creates a new table, skipping if the table is found to already exist.
Called By: upgrade scripts
create_table_if_not_exists (
    tablename TEXT,
    declaration TEXT,
    tableowner TEXT optional default 'breakpad_rw',
    indexes TEXT ARRAY default empty list
)

SELECT create_table_if_not_exists ( 'rank_compare', $q$
create table rank_compare (
    product_version_id int not null,
    signature_id int not null,
    rank_days int not null,
    report_count int,
    total_reports bigint,
    rank_report_count int,
    percent_of_total numeric,
    constraint rank_compare_key primary key ( product_version_id, signature_id, rank_days )
);$q$, 'breakpad_rw',
ARRAY [ 'product_version_id,rank_report_count', 'signature_id' ]);
tablename name of the new table to create
declaration full CREATE TABLE sql statement, plus whatever other SQL statements you only want to run on table creation, such as priming it with a few records and creating the primary key. If running more than one SQL statement, separate them with semicolons.
tableowner the ROLE which owns the table. usually ‘breakpad_rw’. optional.
indexes an array of sets of columns to create regular btree indexes on. use the array declaration as demonstratedabove. default is to create no indexes.
Note: this is the best way to create new tables in migration scripts, since it allows you to rerun the script multiple times without erroring out. However, be aware that it only checks for the existence of the table, not its definition, so if you modify the table definition you'll need to manually drop and recreate it.
13.22 Other Administrative Functions
13.22.1 add_old_release
Purpose: Allows you to add an old release to productdims/product_visibility.
Called By: on demand by Firefox or Camino teams.
add_old_release (
    product_name text,
    new_version text,
    release_type release_enum default 'major',
    release_date DATE DEFAULT current_date,
    is_featured BOOLEAN default FALSE
) RETURNS BOOLEAN

SELECT add_old_release ('Camino', '2.1.1');
SELECT add_old_release ('Camino', '2.1.2pre', 'development', '2012-03-09', true);
Notes: if this leads to more than 4 currently featured versions, the oldest featured version will be "bumped".
13.23 Custom Time-Date Functions
The present Socorro database needs to do a lot of time, date and timezone manipulation. This is partly a natural consequence of the application, and the need to use both DATE and TIMESTAMPTZ values. The greater need is legacy timestamp conversion, however; currently the processors save crash reporting timestamps as TIMESTAMP WITHOUT TIME ZONE in Pacific time, whereas the rest of the database is TIMESTAMP WITH TIME ZONE in UTC. This necessitates a lot of tricky time zone conversions.
The functions below are meant to make it easier to write queries which return correct results based on dates and timestamps.
13.23.1 tstz_between
tstz_between (
    tstz TIMESTAMPTZ,
    bdate DATE,
    fdate DATE
)
RETURNS BOOLEAN

SELECT tstz_between ( '2011-11-25 15:23:11-08', '2011-11-25', '2011-11-26' );
Checks whether a timestamp with time zone is between two UTC dates, inclusive of the entire ending day.
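For intuition, the semantics can be sketched in Python. This is an illustrative re-implementation, not Socorro code; it assumes `tstz` is a timezone-aware datetime:

```python
from datetime import datetime, date, timedelta, timezone

def tstz_between(tstz, bdate, fdate):
    # True if tstz falls in [bdate 00:00 UTC, fdate+1 00:00 UTC),
    # i.e. inclusive of the entire ending UTC day.
    start = datetime(bdate.year, bdate.month, bdate.day, tzinfo=timezone.utc)
    end_day = fdate + timedelta(days=1)
    end = datetime(end_day.year, end_day.month, end_day.day, tzinfo=timezone.utc)
    return start <= tstz < end

# 15:23:11 Pacific (-08) is 23:23:11 UTC, still on Nov 25
ts = datetime(2011, 11, 25, 15, 23, 11, tzinfo=timezone(timedelta(hours=-8)))
print(tstz_between(ts, date(2011, 11, 25), date(2011, 11, 26)))  # True
```

The half-open upper bound is what makes the range "inclusive of the entire ending day" while excluding the first instant of the day after.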
13.23.2 utc_day_is
utc_day_is (
    TIMESTAMPTZ,
    TIMESTAMP or DATE
)
RETURNS BOOLEAN
SELECT utc_day_is ( '2011-11-26 15:23:11-08', '2011-11-28' );
Checks whether the provided timestamp with time zone is within the provided UTC day, expressed as either a timestamp without time zone or a date.
13.23.3 utc_day_near
utc_day_near (
    TIMESTAMPTZ,
    TIMESTAMP or DATE
)
RETURNS BOOLEAN

SELECT utc_day_near ( '2011-11-26 15:23:11-08', '2011-11-28' );
Checks whether the provided timestamp with time zone is within an hour of the provided UTC day, expressed as either a timestamp without time zone or a date. Used for matching when related records may cross over midnight.
13.23.4 week_begins_utc
week_begins_utc (TIMESTAMP or DATE)
RETURNS timestamptz
SELECT week_begins_utc ( '2011-11-25' );
Given a timestamp or date, returns the timestamp with time zone corresponding to the beginning of the week in UTC time. Used for partitioning data by week.
13.23.5 week_ends_utc
week_ends_utc (TIMESTAMP or DATE)
RETURNS timestamptz
SELECT week_ends_utc ( '2011-11-25' );
Given a timestamp or date, returns the timestamp with time zone corresponding to the end of the week in UTC time. Used for partitioning data by week.
13.23.6 week_begins_partition
week_begins_partition (partname TEXT)
RETURNS timestamptz
SELECT week_begins_partition ( 'reports_20111219' );
Given a partition table name, returns a timestamptz of the date and time that weekly partition starts.
13.23.7 week_ends_partition
week_ends_partition (partname TEXT)
RETURNS timestamptz
SELECT week_ends_partition ( 'reports_20111219' );
Given a partition table name, returns a timestamptz of the date and time that weekly partition ends.
13.23.8 week_begins_partition_string
week_begins_partition_string (partname TEXT)
RETURNS text
SELECT week_begins_partition_string ( 'reports_20111219' );
Given a partition table name, returns a string of the date and time that weekly partition starts in the format ‘YYYY-MM-DD HR:MI:SS UTC’.
13.23.9 week_ends_partition_string
week_ends_partition_string (partname TEXT)
RETURNS text
SELECT week_ends_partition_string ( 'reports_20111219' );
Given a partition table name, returns a string of the date and time that weekly partition ends in the format ‘YYYY-MM-DD HR:MI:SS UTC’.
13.24 Database Misc Function Reference
What follows is a listing of custom functions written for Socorro in the PostgreSQL database which are useful for application development, but do not fit in the "Admin" or "Datetime" categories.
13.25 Formatting Functions
13.25.1 build_numeric
build_numeric (
    build TEXT
)
RETURNS NUMERIC

SELECT build_numeric ( '20110811165603' );
Converts a build ID string, as supplied by the processors/breakpad, into a numeric value on which we can do computations and derive a date. Returns NULL if the build string is a non-numeric value and thus corrupted.
13.25.2 build_date
build_date (
    buildid NUMERIC
)
RETURNS DATE
SELECT build_date ( 20110811165603 );
Takes a numeric build_id and returns the date of the build.
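For illustration, the same conversions can be sketched in Python. These are hypothetical helpers mirroring the SQL functions, not part of Socorro; build IDs are 14-digit YYYYMMDDHHMMSS strings as in the examples above:

```python
from datetime import date

def build_numeric(build):
    # Mirror of the SQL build_numeric: numeric value, or None if corrupted.
    return int(build) if build.isdigit() else None

def build_date(buildid):
    # Mirror of the SQL build_date: the leading YYYYMMDD digits as a date.
    s = str(buildid)
    return date(int(s[0:4]), int(s[4:6]), int(s[6:8]))

print(build_date(build_numeric('20110811165603')))  # 2011-08-11
```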
13.26 API Functions
These functions support the middleware, making it easier to look up certain things in the database.
13.26.1 get_product_version_ids
get_product_version_ids (
    product CITEXT,
    versions VARIADIC CITEXT
)

SELECT get_product_version_ids ( 'Firefox', '11.0a1' );
SELECT get_product_version_ids ( 'Firefox', '11.0a1', '11.0a2', '11.0b1' );
Takes a product name and a list of version_strings, and returns an array (list) of surrogate keys (product_version_ids) which can then be used in queries like:
SELECT * FROM reports_clean
WHERE date_processed BETWEEN '2012-03-21' AND '2012-03-28'
AND product_version_id = ANY ( $list );
13.27 Populate PostgreSQL
Socorro supports multiple products, each of which may contain multiple versions.
• A product is a global product name, such as Firefox, Thunderbird, Fennec, etc.
• A version is a revision of a particular product, such as Firefox 3.6.6 or Firefox 3.6.5
• A branch is the indicator for the Gecko platform used in a Mozilla product / version. If your crash reporting project does not have a need for branch support, just enter "1.0" as the branch number for your product / version.
13.27.1 Customize CSV files
Socorro comes with a set of CSV files you can customize and use to bootstrap your database.
Shut down all Socorro services, drop your database (if needed) and load the schema. From inside the Socorro checkout, as the postgres user:
./socorro/external/postgresql/setupdb_app.py --database_name=breakpad_rw
Customize the CSVs; at minimum you probably need to bump the dates and build IDs in: raw_adu.csv, reports.csv, releases_raw.csv
You will probably want to change “WaterWolf” to your own product name and version history, if you are setting thisup for production.
Also, note that the backfill procedure will ignore build IDs over 30 days old.
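Since stale build IDs are skipped, one way to bump them is to generate current ones. Here is a hypothetical Python sketch (not part of the dataload tools) using the 14-digit YYYYMMDDHHMMSS build ID format described under build_numeric:

```python
from datetime import datetime, timedelta, timezone

def fresh_build_id(days_ago=0):
    # A build ID string (YYYYMMDDHHMMSS) for `days_ago` days before now, UTC.
    when = datetime.now(timezone.utc) - timedelta(days=days_ago)
    return when.strftime('%Y%m%d%H%M%S')

print(fresh_build_id())    # recent enough to be picked up by the backfill
print(fresh_build_id(31))  # more than 30 days old -- the backfill would skip it
```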
From inside the Socorro checkout, as the postgres user:
cd tools/dataload
edit *.csv
./import.sh
See PostgreSQL Database Tables by Data Source for a complete explanation of each table.
13.27.2 Run backfill function to populate matviews
Socorro depends upon materialized views which run nightly, to display graphs and show reports such as "Top Crash By Signature".
IMPORTANT NOTE - many reports use the reports_clean_done() stored procedure to check that reports exist for the last UTC hour of the day being processed, as a way to catch problems. If your crash volume is low enough, you may want to modify this function (it is in breakpad_schema.sql referenced above).
Normally this is run for the previous day by cron_daily_matviews.sh, but you can simply run the backfill function to bootstrap the system.
This is normally run by the import.sh, so take a look in there if you need to make adjustments.
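For example, bootstrapping a couple of weeks of matview data might look like this (the dates are illustrative; see backfill_matviews in the MatView Functions section):

```sql
SELECT backfill_matviews('2012-01-01', '2012-01-14');
```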
There also needs to be at least one featured version, which is controlled by setting “featured_version” column to “true”for one or more rows in the product_version table.
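For example, a version could be marked featured with an UPDATE along these lines. The WHERE-clause column names here are assumptions for illustration; adjust them to match your schema and the product data you loaded:

```sql
-- hypothetical example: adapt the filter columns to your data
UPDATE product_versions
   SET featured_version = true
 WHERE product_name = 'WaterWolf'
   AND version_string = '1.0';
```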
Restart memcached as the root user:
/etc/init.d/memcached restart
Now the Socorro UI should work.
You can change settings using the admin UI, which will be at http://crash-stats/admin (or the equivalent hostname for your install).
13.27.3 Load data via snapshot
If you have access to an existing Socorro database snapshot, you can load it like so:
# shut down database users
sudo /etc/init.d/supervisor force-stop
sudo /etc/init.d/apache2 stop

# drop old db and load snapshot
sudo su - postgres
dropdb breakpad
createdb -E 'utf8' -l 'en_US.utf8' -T template0 breakpad
pg_restore -Fc -d breakpad minidb.dump
This may take several hours, depending on your hardware. One way to speed this up would be to:
• If in a VirtualBox environment, add more CPU cores to the VM (via the VirtualBox GUI); the default is 1
• Add "-j n" to the pg_restore command above, where n is the number of CPU cores - 1
CHAPTER 14
How generic app and an example works using configman
14.1 The minimum app
To illustrate, let's look at an example of an app that uses generic_app to leverage configman to run: weeklyReportsPartitions.py
As you can see, it's a subclass of the socorro.app.generic_app.App class, which is a the-least-you-need wrapper for a minimal app. It takes care of logging and executing your main function.
14.2 Connecting and handling transactions
Let’s go back to the weeklyReportsPartitions.py cron script and take a look at what it does.
It only really has one configman option and that's the transaction_executor_class. The default value is TransactionExecutorWithBackoff, which is the class that's going to take care of two things:
1. execute a callable that accepts an opened database connection as first and only parameter
2. committing the transaction if there are no errors and rolling back the transaction if an exception is raised
3. NB: if an OperationalError or InterfaceError exception is raised, TransactionExecutorWithBackoff will log that and retry after a configurable delay
Note that TransactionExecutorWithBackoff is the default transaction_executor_class, but if you override it, for example on the command line, with TransactionExecutor, no exceptions are swallowed and it doesn't retry.
Now, connections are created and closed by the ConnectionContext class. As you might have noticed, the default database_class defined in the TransactionExecutor is socorro.external.postgresql.connection_context.ConnectionContext, as you can see here.
The idea is that any external module (e.g. HBase, PostgreSQL, etc.) can define a ConnectionContext class as per this model. Its job is to create and close connections, and it has to do so in a contextmanager. What that means is that you can do this:
connector = ConnectionContext()
with connector() as connection:  # opens a connection
    do_something(connection)
# closes the connection
And if errors are raised within the do_something function it doesn’t matter. The connection will be closed.
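That guarantee can be seen in a toy sketch of such a connection context (this is not Socorro's actual implementation; the FakeConnection stands in for a real database connection):

```python
from contextlib import contextmanager

class ConnectionContext(object):
    # Toy version of the connection-context idea: hand out a connection
    # inside a context manager and guarantee close() runs, even on error.
    def __init__(self, connection_factory):
        self.connection_factory = connection_factory

    @contextmanager
    def __call__(self):
        connection = self.connection_factory()
        try:
            yield connection
        finally:
            connection.close()

class FakeConnection(object):
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

opened = []
connector = ConnectionContext(FakeConnection)
try:
    with connector() as connection:
        opened.append(connection)
        raise RuntimeError('boom')  # simulate do_something() failing
except RuntimeError:
    pass
print(opened[0].closed)  # True -- closed despite the exception
```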
14.3 What was the point of that?!
For one thing, this app being a configman derived app means that all configuration settings are as flexible as configman is. You can supply different values for any of the options either by the command line (try running --help on the ./weeklyReportsPartitions.py script) and you can control them with various configuration files as per your liking.
The other thing to notice is that when writing another similar cron script, all you need to do is to worry about exactly what to execute and let the framework take care of transactions and opening and closing connections. Each class is supposed to do one job and one job only.
configman uses not only basic options such as database_password but also more complex options such as aggregators. These are basically invariant options that depend on each other and use functions to assemble their values.
CHAPTER 15
Writing documentation
To contribute to the documentation, follow these steps to modify the git repo, build a local copy, and deploy it on ReadTheDocs.org.
15.1 Installing Sphinx
Sphinx is an external tool that compiles these reStructuredText files into HTML. Since it's a Python tool you can install it with easy_install or pip like this:
pip install sphinx
15.2 Making the HTML
Now you can build the docs with this simple command:
cd docs
make html
This should update the relevant HTML files in socorro/docs/_build and you can preview it locally like this (on OS X, for example):
open _build/html/index.html
To modify the index itself, edit index.rst (for instance, you may want to add or remove a document filename, without the .rst extension, from the ".. toctree::" section).
15.3 Making it appear on ReadTheDocs
ReadTheDocs.org is wired to build the documentation nightly from this git repository, but if you want to make documentation changes appear immediately you can use their webhooks to re-create the build and update the documentation right away.
15.4 Or, just send the pull request
If you have a relevant update to the documentation but don't have time to set up your Sphinx and git environment, you can just edit these files in raw mode and send in a pull request.
15.5 Or, just edit the documentation online
The simplest way to edit the documentation is to just edit it inside the GitHub editor. To get started, go to https://github.com/mozilla/socorro and browse in the docs directory to find the file you want to edit.
Then click the “Edit this file” button in the upper right-hand corner and type away.
When you’re done, write a comment underneath and click “Commit Changes”.
If you are unsure about how to edit reStructuredText and don't want to trial-and-error your way through the editing, then one thing you can do is to copy the text into an online reStructuredText editor and see if you get the syntax right. Obviously you'll receive warnings and errors about broken internal references, but at least you'll know if the syntax is correct.