Building a Spark Connector for Ryft, a high-speed hardware compute appliance
Aleksandr Pavlenko, Big Data Software Engineer, [email protected]
What is Apache Spark?
Ryft ONE - hardware for processing Big Data
Ryft Query Language
Query examples:
Exact Search: (RAW_TEXT CONTAINS "Some Text")
Edit Search: (RAW_TEXT CONTAINS FEDS("Some Text", DIST=2, ...))
Date Search: (RECORD.date CONTAINS DATE(MM/DD/YYYY <= "04/05/2015"))
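All three query forms above share the shape `(<field> CONTAINS <expression>)`, so they are easy to assemble programmatically. Below is a minimal sketch, assuming hypothetical helper names (`Rql`, `exact`, `fuzzy`, `dateOnOrBefore`) that are not part of any Ryft library. The additional FEDS options elided on the slide are left out, and embedded quotes are rejected rather than escaped because the slides do not show an escaping rule.

```scala
// Hypothetical helpers for illustration only; not part of the Ryft connector API.
object Rql {
  // Wrap text in double quotes; escaping rules are not shown in the slides,
  // so this sketch simply refuses input that contains a quote.
  private def quoted(text: String): String = {
    require(!text.contains("\""), "embedded quotes are not handled in this sketch")
    "\"" + text + "\""
  }

  // Exact search: (FIELD CONTAINS "text")
  def exact(field: String, text: String): String =
    s"($field CONTAINS ${quoted(text)})"

  // Edit-distance search: (FIELD CONTAINS FEDS("text", DIST=n))
  def fuzzy(field: String, text: String, dist: Int): String =
    s"($field CONTAINS FEDS(${quoted(text)}, DIST=$dist))"

  // Date search: (FIELD CONTAINS DATE(MM/DD/YYYY <= "date"))
  def dateOnOrBefore(field: String, date: String): String =
    s"($field CONTAINS DATE(MM/DD/YYYY <= ${quoted(date)}))"
}
```

For instance, `Rql.exact("RAW_TEXT", "Some Text")` reproduces the exact-search example string from the slide.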
Ryft REST Service
Spark Ryft Connector
Use Cases:
● Financial services
● Customer visibility
● Call center records
● Security and defense
● e-Medical records
● Genomic research
● IoT sensor and devices
● Supply chain logistics
Supercharging Spark with Ryft
*Benchmark comparisons against Apache Spark running on a cluster of AWS EC2 c3.8xlarge "Compute Optimized" 2U servers that require 1100 Watts each.
http://www.ryft.com/products#performance-proof
RDD - Resilient Distributed Dataset
abstract class RDD[T](...) {
  @DeveloperApi
  def compute(split: Partition, context: TaskContext): Iterator[T]
  protected def getPartitions: Array[Partition]
  protected def getPreferredLocations(split: Partition): Seq[String] = Nil
}
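To make the contract concrete, here is a minimal sketch of a custom RDD that implements only the two abstract methods, `getPartitions` and `compute`. The names `RangePartition` and `RangeRDD` are hypothetical and unrelated to the Ryft connector; the example just splits the integers `0 until n` into slices and assumes Spark is on the classpath.

```scala
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition: one slice [start, end) of the integer range.
class RangePartition(val index: Int, val start: Int, val end: Int) extends Partition

// Hypothetical RDD with no parent dependencies (hence Nil).
class RangeRDD(sc: SparkContext, n: Int, slices: Int) extends RDD[Int](sc, Nil) {

  // Called on the driver: describe how the data set is split.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](slices) { i =>
      new RangePartition(i, i * n / slices, (i + 1) * n / slices)
    }

  // Called on an executor: produce the elements of one partition.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
    val p = split.asInstanceOf[RangePartition]
    (p.start until p.end).iterator
  }
}
```

With a local context, `new RangeRDD(sc, 10, 3).collect()` gathers the three slices back into a single array on the driver.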
Ryft RDD
*Typical query: http://ryftone0/search?query=(RAW_TEXT CONTAINS "test")&files=somefile.txt
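The typical query above contains spaces, quotes and parentheses, so it has to be percent-encoded before it goes on the wire. Here is a small sketch of building such a URL using only the JDK; `RyftUrl` and `search` are hypothetical names, and repeating the `files` parameter once per file is an assumption, not something the slide states.

```scala
import java.net.URLEncoder

// Hypothetical helper for illustration; not part of the Ryft connector API.
object RyftUrl {
  // URLEncoder produces form encoding ("+" for spaces); switch to "%20".
  // A literal "+" in the input is already encoded as "%2B", so this is safe.
  private def enc(s: String): String =
    URLEncoder.encode(s, "UTF-8").replace("+", "%20")

  // Build http://<host>/search?query=...&files=... as in the slide's example.
  def search(host: String, query: String, files: Seq[String]): String = {
    val params = s"query=${enc(query)}" +: files.map(f => s"files=${enc(f)}")
    s"http://$host/search?" + params.mkString("&")
  }
}
```

Calling `RyftUrl.search("ryftone0", "(RAW_TEXT CONTAINS \"test\")", Seq("somefile.txt"))` yields the slide's URL in encoded form.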
import com.ryft.spark.connector._
...
val sc = new SparkContext(sparkConf)
val query = RecordQuery(
  recordField("Description") contains IPv4Value(IP === IPv4("192.168.190.151")))
val ryftOptions = RyftQueryOptions("data/*", xml)
val ryftRDD = sc.ryftRDD(Seq(query), ryftOptions)
...
Ryft RDD Example
Data Locality & Partitioning Mechanism
abstract class RDD[T](...) {
  ...
  protected def getPreferredLocations(split: Partition): Seq[String] = Nil
  ...
}
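The slides do not spell out how the connector chooses locations, so the following is only one possible sketch of the hook in use: each partition is bound to one endpoint, and `getPreferredLocations` returns that endpoint's host so the scheduler can try to run the task near it. `EndpointPartition` and `EndpointRDD` are hypothetical names, and `compute` returns a placeholder string instead of performing a real HTTP call.

```scala
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition: one REST endpoint (host plus request URL).
class EndpointPartition(val index: Int, val host: String, val url: String) extends Partition

// Hypothetical RDD: endpoints is a list of (host, url) pairs.
class EndpointRDD(sc: SparkContext, endpoints: Seq[(String, String)])
    extends RDD[String](sc, Nil) {

  // One partition per endpoint.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](endpoints.size) { i =>
      val (host, url) = endpoints(i)
      new EndpointPartition(i, host, url)
    }

  // Locality hint: prefer an executor on the host that serves this partition.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    Seq(split.asInstanceOf[EndpointPartition].host)

  // Placeholder only: a real connector would stream the HTTP response here.
  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val p = split.asInstanceOf[EndpointPartition]
    Iterator(s"would fetch ${p.url}")
  }
}
```

The hint is advisory: if no executor runs on the preferred host, Spark still schedules the task elsewhere.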
Ryft DataFrame Support
Mapping of structured data on Ryft (JSON/XML) to DataFrame
RyftRelation extends BaseRelation with PrunedFilteredScan
val schema = StructType(Seq(
  StructField("Arrest", BooleanType), StructField("Date", TimestampType),
  StructField("Description", StringType), StructField("ID", StringType)
))
sqlContext.read.ryft(schema, xml, "*.crimestat", "temp_table",
  Map("date_format" -> "MM/dd/yyyy hh:mm:ss aa"))
sqlContext.sql("""select Date, ID, Description, Arrest from temp_table
  where Date = '2015-04-15 23:59:00' ORDER BY Date""")
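A `PrunedFilteredScan` relation receives the `WHERE` predicates as `Filter` objects in `buildScan`, which gives a data source the chance to turn them into a native query. Here is a hedged sketch of that translation step for the simplest case, string equality. `RqlPushdown` and `toRql` are hypothetical names, the `RECORD.<field>` form follows the date example earlier in the deck, and nothing here is taken from the actual `RyftRelation` code.

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter}

// Hypothetical translation helper; not the connector's implementation.
object RqlPushdown {
  // Translate a Spark filter into an RQL string where this sketch knows how;
  // return None so the caller can skip push-down for everything else.
  def toRql(filter: Filter): Option[String] = filter match {
    // Only string equality is handled; values with quotes are skipped
    // because the slides do not show an escaping rule.
    case EqualTo(attr, value: String) if !value.contains("\"") =>
      Some(s"""(RECORD.$attr CONTAINS "$value")""")
    case _ => None
  }
}
```

`CONTAINS` matches more than strict equality, which is acceptable here: unless a relation overrides `unhandledFilters`, Spark re-applies the filters to the returned rows, so a pushed-down query only needs to return a superset of the matches.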
Ryft Twitter Demo
Q & A