31
! From SQL to RQL or how to reuse SQL techniques for pa5ern mining JeanMarc Pe+t INSA/Université de Lyon, CNRS Joint work with : Brice Chardin, Emmanuel Coquery, ChrisHne Froidevaux, Marie Paillloux (Agier), Jef Wijsen … and Einoshin Suzuki ! Work par.ally funded by the French Research Agency (ANR), DAG project

2014 slides RQL JMP - perso.liris.cnrs.fr

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 2014 slides RQL JMP - perso.liris.cnrs.fr

!  From  SQL  to  RQL  or  how  to  re-­‐use  SQL  techniques  for  pa5ern  mining  

Jean-­‐Marc  Pe+t  INSA/Université  de  Lyon,  CNRS  

Joint  work  with  :  Brice  Chardin,  Emmanuel  Coquery,  ChrisHne  Froidevaux,  Marie  Paillloux  (Agier),  Jef  Wijsen  …  and  Einoshin  Suzuki  !  

Work  par.ally  funded  by  the  French  Research  Agency  (ANR),  DAG  project  

Page 2: 2014 slides RQL JMP - perso.liris.cnrs.fr

RQL  in  a  nutshell  

Logical  implicaHon  

DB/SQL  

Data  mining  

RQL  

April  15,  2014  Kyushu  University  

2  

RQL:  A  Query  Language  for  discovering    logical  implicaHons  in  DB  

Page 3: 2014 slides RQL JMP - perso.liris.cnrs.fr

About  logical  implications  

!  Logical  consequence  (or  entailment)  !  One  of  the  most  fundamental  concepts  in  logic    !  Premises,  Conclusion  (If  …  then  …):  Reasoning  using  proofs  

and/or  models  !  Examples  

!  ``  If  2=3  then  I  am  the  queen  of  England  ’’  !  right  or  wrong  ?  Why  ?  

!  ``  If  2=2  then  I  am  the  queen  of  England  ’’  !  right  or  wrong  ?  Why  ?  

!  Focus  on  a  specific  class  of  logical  implicaHons  !  Three  properHes  to  be  verified  

!  Reflexivity,  augmentaHon,  transiHvity  (Armstrong  axioms)  

April  15,  2014  Kyushu  University  

3  

Page 4: 2014 slides RQL JMP - perso.liris.cnrs.fr

About  databases  

!  RelaHonal  databases  systems  everywhere  !  !  RDBMS  market  expected  to  double  by  2016  

[MarketResarch.com,  Aug  2012]  

!  Query  opHmizaHon:  Awesome  !  

!  Simple  goals  :    !  Query  the  data  where  they  are  

!  Use/extend  DB  languages  for  pa5ern  mining  problems  (e.g.  DMQL,  MSQL,  …)  

April  15,  2014  Kyushu  University  

4  

Page 5: 2014 slides RQL JMP - perso.liris.cnrs.fr

About  Data  Mining  

!  Focus  on  pa5ern  mining  in  DB  !  pa5erns  =  logical  implica+on,  called  rules  hereaCer  !  DB=  Rela+onal  DBs  !  Quality  measure:  not  considered  

!  Pa5ern  mining  discovery  seen  as  query  processing  

!  Reusing  opHmizaHon  techniques  from  DB  

April  15,  2014  Kyushu  University  

5  

Page 6: 2014 slides RQL JMP - perso.liris.cnrs.fr

RQL:  contributions  

!  Original  ideas  !  Marie  Agier,  Jean-­‐Marc  Pe.t,  Einoshin  Suzuki:  Unifying  

Framework  for  Rule  Seman.cs:  Applica.on  to  Gene  Expression  Data.  Fundam.  Inform.  78(4):  543-­‐559  (2007)  

!  Since  then,  what  we  have  done  ?  !  SafeRL:  a  well-­‐founded  logical  query  language  

derived  from  Tuple  RelaHonal  Calculus  (TRC)  !  not  discussed  in  detail  here  

!  RQL:  Its  pracHcal  counterpart  derived  from  SQL  !  A  rewriHng  technique  to  use  as  much  as  possible  the  

underlying  DBMS  !  A  web  applicaHon  :  h5p://rql.insa-­‐lyon.fr  

April  15,  2014  Kyushu  University  

6  

Page 7: 2014 slides RQL JMP - perso.liris.cnrs.fr

!  Gefng  started  with  RQL  through  examples  

!  SafeRL  and  Query  rewriHng  

!  RQL  Web  ApplicaHon  

!  Conclusions  

April  15,  2014  Kyushu  University  

Outline  

7  

Page 8: 2014 slides RQL JMP - perso.liris.cnrs.fr

April  15,  2014  Kyushu  University  

Motivating  examples  

Rules  between  a5ributes  with  NULL  values  in  EMP ?!

FINDRULES!OVER workdept, mgrno!SCOPE t1 EMP!CONDITION ON $A IS t1.$A IS NULL!

•  Mgrno -> Workdept holds!•  Workdept -> Mgrno does not!

=>      counter-­‐example:  Empno  30  (or  50)  

8  

Page 9: 2014 slides RQL JMP - perso.liris.cnrs.fr

Functional  dependencies  

!  Remember  the  definiHon:  

!  X-­‐>Y  holds  in  r  

 iff    ∀  t1,  t2  ∈  r,      

 if  ∀A∈X, t1[A]=t2[A]  then  ∀B∈Y, t1[B]=t2[B]  

!  With  RQL:  FINDRULES !OVER Lastname, Workdept, Job, Sex, Bonus!SCOPE t1, t2 Emp!CONDITION ON $A IS t1.$A = t2.$A!

April  15,  2014  Kyushu  University  

9  

Page 10: 2014 slides RQL JMP - perso.liris.cnrs.fr

Variant  of  FDs:  Conditional  FDs  

!  FINDRULES !

OVER Lastname, Workdept, Job, Sex, Bonus!

SCOPE t1, t2 (select * from Emp where educlevel > 16)!

CONDITION ON $A IS t1.$A = t2.$A!

sex -> bonus holds!

i.e. « above a certain level of qualification, the gender determines the bonus »!

April  15,  2014  Kyushu  University  

10  

Page 11: 2014 slides RQL JMP - perso.liris.cnrs.fr

Approximative  FDs  

!  FINDRULES !

OVER Educlevel, Sal, Bonus, Comm!

SCOPE t1, t2 EMP!

CONDITION ON $A IS !

!2*abs(t1.$A-t2.$A)/(t1.$A+t2.$A)<0.1!

Sal -> Comm holds!

i.e. « employees earning similar salaries receive similar commissions »!

April  15,  2014  Kyushu  University  

11  

Page 12: 2014 slides RQL JMP - perso.liris.cnrs.fr

Sequential  FDs  

!  FINDRULES !

OVER Educlevel, Sal, Bonus, Comm!

SCOPE t1, t2 EMP!

CONDITION ON $A t1.$A >= t2.$A!

Sal -> Comm and Sal -> Comm hold!

i.e. « higher salary is equivalent to higher commission »!

April  15,  2014  Kyushu  University  

12  

Page 13: 2014 slides RQL JMP - perso.liris.cnrs.fr

Conditional  sequential  FDs  

!  FINDRULES !

OVER Educlevel, Sal, Bonus, Comm!

SCOPE t1, t2 (select * from EMP where Sex = ‘M’)!

CONDITION ON $A t1.$A >= t2.$A!

EducLevel -> Bonus holds!

i.e. « male employees with higher education levels receive higher bonus »!

April  15,  2014  Kyushu  University  

13  

Page 14: 2014 slides RQL JMP - perso.liris.cnrs.fr

Another  kind  of  «  FD  »  

!  FINDRULES !

OVER Educlevel, Sal, Bonus, Comm!

SCOPE t1, t2 EMP!

WHERE t1.empno = t2.mgrno!

CONDITION ON $A t1.$A >= t2.$A!

{} -> Bonus holds!

i.e. « managers always earn a bonus greater than or equal to their employees»!

April  15,  2014  Kyushu  University  

14  

Page 15: 2014 slides RQL JMP - perso.liris.cnrs.fr

Another  example  (1/2)  

!  Example  from  gene  expression  data  

!  Assume  tuples  are  ordered  

!  Idea:  adapHng  FD  for  catching  the  evoluHon  of  a5ributes  between  two  consecuHve  tuples  

X ⇒ Y is satisfied in r iff ∀ ti, ti+1 ∈ r, if ∀g∈X, ti+1[g] – ti[g] ≥ ε1 then ∀g∈Y, ti+1[g] – ti[g] ≥ ε1 ε1  =  1.0  

April  15,  2014  Kyushu  University  

15  

Page 16: 2014 slides RQL JMP - perso.liris.cnrs.fr

Rules  over  genes  

FINDRULES!

OVER g1,g2,g3,g4,g5,g6,g7,g8!

SCOPE t1, t2 GENES!

WHERE t2.time= t1.time+1!

CONDITION ON A IS t2.A - t1.A >= 1.0  

April  15,  2014  Kyushu  University  

X ⇒ Y is satisfied in r iff ∀ ti, ti+1 ∈ r, if ∀g∈X, ti+1[g] – ti[g] ≥ ε1 then ∀g∈Y, ti+1[g] – ti[g] ≥ ε1 ε1  =  1.0  

16  

Page 17: 2014 slides RQL JMP - perso.liris.cnrs.fr

g2 => g4  

-2  

-1  

0  

1  

2  

1   2   3   4   5   6  

Experiences  

Gen

e ex

pres

sion

leve

ls  

g2  g4  

Another  kind  of  rules  (2/2)  

r ⎥= g2 ⇒ g4

If expression level of g2 grows between ti and ti+1,

then expression level of g4 also grows

But r ⎥= g4 ⇒ g2

t2,t3 is a couter-example  April  15,  2014  Kyushu  University  

17  

Page 18: 2014 slides RQL JMP - perso.liris.cnrs.fr

Local  maximum  

!  FINDRULES !

OVER g1,g2,g3,g4,g5,g6,g7,g8!

SCOPE t1, t2, t3 Genes!

WHERE t2.time=t1.time+1 AND t3.time=t2.time+1 !

CONDITION ON $A IS t1.$A < t2.$A AND t3.$A< t2.$A!

=> three tuples variables needed to express a local maximum!

April  15,  2014  Kyushu  University  

18  

t1.A  

t2.A  

t3.A  

Page 19: 2014 slides RQL JMP - perso.liris.cnrs.fr

Synthesis  of  RQL  

!  RQL  :  «  look  &  feel  »  of  SQL  !  Very  simple  and  easy  to  use  by  SQL  analysts  

!  Powerful  query  language  !  Allow  interacHons  with  data  analysts  !  Powerful  tool,  need  some  pracHce  to  get  fluent  with  …  

!  Can  be  used    !  To  generate  the  rules  (if  schema  permits)  !  To  test  whether  or  not  a  given  rule  holds  

!  If  yes,  just  say  «  Yes  »  ☺  !  Otherwise,  find  a  counter-­‐examples  in  the  data  and  

refine  your  query  

April  15,  2014  Kyushu  University  

19  

Page 20: 2014 slides RQL JMP - perso.liris.cnrs.fr

!  MoHvaHng  examples  

!  SafeRL  and  Query  rewriHng  

!  RQL  Web  ApplicaHon  

!  Conclusions  

April  15,  2014  Kyushu  University  

Outline  

20  

Page 21: 2014 slides RQL JMP - perso.liris.cnrs.fr

SafeRL:  a  query  language  for  rules  

April  15,  2014  Kyushu  University  

!  SafeRL:  a  well-­‐founded  logical  query  language  

!  Syntax  +  semanHcs  not  detailed  here:  cf  papers  

!  Every  SafeRL  query  Q    defines  rules  “equivalent  to”  FD  or  implicaHons  

!  Result:  There  is  a  closure  system  C(Q)  associated  to  Q  

Q={X→Y  |∀t1...∀tn(ψ(t1,...,tn)∧(∀A∈X(δ(A,t1,...,tn))→  ∀A∈Y  (δ(A,  t1,  .  .  .  ,  tn))))}  

21  

Page 22: 2014 slides RQL JMP - perso.liris.cnrs.fr

Contribution:  Query  rewriting  

!  A  base  B  of  a  closure  system  C  is  such  that  !  Irreducible(C)  ⊆    B    ⊆  C  

!  Main  result  (cf  LML  2013):    Let  Q  be  a  SafeRL  query  over  a  DB  d  

 THM  :  There  exists  a  SQL  query  Q’  over  d  such  that  Q’  computes  a  base  B  of  C(Q),  the  closure  system  associated  to  Q  

April  15,  2014  Kyushu  University  

22  

Page 23: 2014 slides RQL JMP - perso.liris.cnrs.fr

Base  of  a  query:  the  data-­‐centric  step  

From  the  base  B  of  Q  in  d,  we  can  get:  !  The  closure  of  an  a5ribute  set  !  The  canonical  cover  of  saHsfied  rules  !  The  cover  of  Gotlob&Libkin  of  approximate  rules  

and  we  can  decide  whether  or  not  a  given  rule  is  saHsfied  !  If  not,  a  counter  example  from  d  can  be  provided  

!  Nothing  new  here,  cf  related  works  

April  15,  2014  Kyushu  University  

23  

Page 24: 2014 slides RQL JMP - perso.liris.cnrs.fr

!  MoHvaHng  examples  

!  SafeRL  and  Query  rewriHng  

!  RQL  Web  ApplicaHon  

!  Conclusions  

April  15,  2014  Kyushu  University  

Outline  

24  

Page 25: 2014 slides RQL JMP - perso.liris.cnrs.fr

Architecture  

April  15,  2014  Kyushu  University  

25  

Page 26: 2014 slides RQL JMP - perso.liris.cnrs.fr

RQL  Web  application  

!  Open  to  registered  users  (simple  applicaHon  form)  !  dedicated  Oracle  user,  200Ko  quota  

!  Web  Framework  For  Java  !  Play  Framework  (h5p://www.playframework.com/)  !  DBMS:  Oracle  v11  (+  MySQL)  !  +  specific  development  in  C++,  Java,  C  (Uno’s  code)  

!  Two  modes:  Sample  (predefined  schema)  and  SandBox  (user  schema)    

!  Try  it  out  !        h5p://rql.insa-­‐lyon.fr  

April  15,  2014  Kyushu  University  

26  

Page 27: 2014 slides RQL JMP - perso.liris.cnrs.fr

Snapshot:  sample  mode  

April  15,  2014  Kyushu  University  

27  

Page 28: 2014 slides RQL JMP - perso.liris.cnrs.fr

Counter  example  

April  15,  2014  Kyushu  University  

28  

Page 29: 2014 slides RQL JMP - perso.liris.cnrs.fr

!  MoHvaHng  examples  

!  RQL  Web  ApplicaHon  

!  SafeRL  and  Query  rewriHng  

!  Conclusions  

April  15,  2014  Kyushu  University  

Outline  

29  

Page 30: 2014 slides RQL JMP - perso.liris.cnrs.fr

Conclusion  

!  From  a  logical  query  language  for  rules  to  the  pracHcal  language  RQL  !  Easy  to  use  by  SQL-­‐aware  analysts  !  No  discreHzaHon  !  PromoHng  query  processing  techniques  in  pa5ern  mining  

!  RQL:  a  pracHcal  Web  applicaHon  !  For  teaching  !  For  research  

!  Future  works:  Data  explora+on  with  RQL  through  counter-­‐examples  

April  15,  2014  Kyushu  University  

30  

Page 31: 2014 slides RQL JMP - perso.liris.cnrs.fr

April  15,  2014  Kyushu  University  

Work  parHally  supported  by  the  French  ANR  project  DAG  (ANR  DEFIS  2009)  

Merci  !  Questions  ?  

31  

!  Query  Rewri/ng  for  Rule  Mining  in  Databases.      B.  Chardin,  E  Coquery,  B.  Gouriou,  M.  Pailloux,  J-­‐M  Pe.t.      In  Languages  for  Data  Mining  and  Machine  Learning  (LML)  Workshop@ECML/PKDD  2013,  Bruno  Crémilleux,  Luc  De  Raedt,  Paolo  Frasconi,  Tias  Guns  ed.  Prague.  pp.  1-­‐16.      2013  

!  On  Armstrong-­‐compliant  Logical  Query  Languages.      M.  Agier,  C.  Froidevaux,  J-­‐M  Pe.t,  Y.  Renaud,  J.  Wijsen.      Dans  4th  Interna.onal  Workshop  on  Logic  in  Databases  (LID  2011)  collocated  with  the  2011  EDBT/ICDT  conference,  Sweeden.  pp.  33-­‐40.  ACM  

!  Unifying  Framework  for  Rule  Seman/cs:  Applica/on  to  Gene  Expression  Data.  Marie  Agier,  Jean-­‐Marc  Pe.t,  Einoshin  Suzuki.  Fundam.  Inform.  78(4):  543-­‐559  (2007)