Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
Unleashing Pandas and Python on the Oracle DB
Karin Patenge | @kpatenge | [email protected] | Systems Consultant / Solution Engineer, Business Unit Core & Cloud Technologies | Oracle Deutschland B.V. & Co. KG
DOAG Big Data Days 2018 | Dresden | 20-21 September 2018
Will this end well?
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Motivation
Source: https://de.wikipedia.org/wiki/Pythons
blogs.oracle.com/oraclespatial/spatial-with-python-and-geopandas-made-easy-with-cx_oracle
@kpatenge #BIGDATADAYS
Jumping ahead to the end ...
arthur-e.github.io/Wicket/
Agenda
1. The Oracle Database from an application development perspective
2. Python and Pandas
3. Python and the Oracle Database
4. Summary
Section 1: The Oracle Database from an application development perspective
What do we develop with? Popular programming languages
Sources: https://insights.stackoverflow.com/survey/2018/#technology, https://redmonk.com/sogrady/2018/03/07/language-rankings-1-18/
One database for all major platforms and languages
The Oracle Database for developers
JSON, Ruby, Oracle ADF, Oracle APEX, Oracle RDS
Programming language | Driver
C | OCI
C++ | OCCI
Java | JDBC
.NET | ODP.NET
Node.js | node-oracledb
Python | cx_Oracle
PHP | OCI8, PDO_OCI
R | ROracle
Perl | DBD::Oracle
Ruby | ruby-oci8
… and ODBC, OLE DB, Pro*C, Pro*COBOL, Pro*Fortran, SQLJ
Legend: Oracle provided | Open Source Drivers (Oracle contributions) | Third-party Drivers
Open source drivers from Oracle (Oracle contributions)
Node.js: node-oracledb 2.3
• Pre-built binaries for Node 6, 8 and 10 on Windows (x86), macOS and Linux (x86-64), all 64-bit.
• Apache 2.0 license.
• http://github.com/oracle/node-oracledb, http://oracle.github.io/node-oracledb/
Python: cx_Oracle 7.0 (Sept. 2018)
• Python 2.7 and 3.5+. Oracle 11.2, 12 and 18 client libraries. BSD license.
• http://cx-oracle.sourceforge.net, http://oracle.github.io/python-cx_Oracle/
PHP: OCI8 2.1.8
• Oracle 10.2, 11 and 12 client libraries. More than 2 million downloads.
• http://pecl.php.net/package/oci8
R: ROracle 1.3-1
• Oracle Database Interface driver for R. DBI-compliant. LGPL-2 [2.1|3] license.
• http://cran.r-project.org/web/packages/ROracle
Section 2: Python and Pandas
The Python programming language
• General-purpose, high-level programming language
• Supports several programming paradigms
  – object-oriented, aspect-oriented, functional
• Dynamic typing
  – variables do not have to be declared
• Most often used as a scripting language
• Program structure is expressed through indentation
• Version lines: Python 2.x / Python 3.x
  – 3.x brings substantial improvements
• Foundation: the Read-Eval-Print Loop (REPL)
• Recommended file extension for scripts: .py
• Support for the object type SDO_GEOMETRY and for LOBs (WKT, WKB) via cx_Oracle
• Why Python?
  – Compact core
  – Easy to learn
  – Easy to read: clear, uncluttered syntax with few keywords
  – Easy to use
  – Extensive built-in standard library and numerous packages
• Miscellaneous
  – First released in 1991
  – Python Software Foundation License
  – www.python.org
  – Prompt: ">>>"
  – >>> help() | >>> exit() [quit()]
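Two of the points above, dynamic typing and indentation as program structure, can be seen in a few lines. A minimal sketch in Python 3 syntax; the function name `describe` is purely illustrative:

```python
# Dynamic typing: no declaration needed, and a name can be rebound to another type
value = 42
print(type(value).__name__)   # int
value = "forty-two"
print(type(value).__name__)   # str


def describe(n):
    # The indented lines form the function body; there are no braces or "end" keywords
    if n % 2 == 0:
        return "even"
    return "odd"


print([describe(n) for n in range(4)])  # ['even', 'odd', 'even', 'odd']
```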
Python in practice (1)
• Python comes preinstalled on most Linux distributions
[oracle@localhost ~]$ python
Python 2.7.5 (default, Jul 3 2018, 06:28:28)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28.0.1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()
[oracle@localhost ~]$ python -v
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /usr/lib64/python2.7/site.pyc matches /usr/lib64/python2.7/site.py
…
Python in practice (2): install if necessary
• Additional software components:
  – PyPI (find packages), pip (install packages), Python tools
  – Oracle Database client software
[oracle@localhost Downloads]$ sudo -i
[root@localhost ~]# python get-pip.py
[root@localhost Downloads]# python get-pip.py
Collecting pip
Downloading https://files.pythonhosted.org/packages/5f/25/e52d3f31441505a5f3af41213346e5b6c221c9e086a166f3703d2ddaf940/pip-18.0-py2.py3-none-any.whl (1.3MB)
100% |████████████████████████████████| 1.3MB 647kB/s
Collecting wheel
Downloading https://files.pythonhosted.org/packages/81/30/e935244ca6165187ae8be876b6316ae201b71485538ffac1d718843025a9/wheel-0.31.1-py2.py3-none-any.whl (41kB)
100% |████████████████████████████████| 51kB 2.9MB/s
Installing collected packages: pip, wheel
Successfully installed pip-18.0 wheel-0.31.1
Pandas and GeoPandas
• Pandas is one of the most popular libraries (modules) for Python
  – Requires the NumPy module
• Focus: data manipulation and analysis
• Data structures and functions for manipulating time series and tables:
  – Series
  – DataFrame
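To make the two pandas data structures concrete, here is a minimal sketch: a `Series` indexed by dates (a tiny time series) and a `DataFrame` with named columns, filtered with a boolean mask. The temperature values are made-up sample data for illustration only; the state capitals are real.

```python
import pandas as pd

# A Series: a one-dimensional labelled array, here indexed by dates (made-up values)
temps = pd.Series(
    [14.5, 16.0, 15.2],
    index=pd.to_datetime(["2018-09-20", "2018-09-21", "2018-09-22"]),
    name="temp_c",
)

# A DataFrame: a table of named columns that share one index
states = pd.DataFrame(
    {"name": ["Brandenburg", "Sachsen", "Thueringen"],
     "capital": ["Potsdam", "Dresden", "Erfurt"]}
)

print(temps.mean())                                    # mean of the three values
print(states[states["capital"].str.startswith("D")])   # keeps only the Sachsen row
```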
Pandas and GeoPandas
• GeoPandas is a Python library (module) for manipulating vector data
  – Points/multipoints, lines/multilines, polygons/multipolygons
• Uses the shapely module for geometric operations
• Also depends on other libraries (including matplotlib)
• Analogous data structures:
  – GeoSeries
  – GeoDataFrame
Data structure | Attributes | Methods
GeoSeries | area, bounds, total_bounds, geom_type, is_valid | distance(other), centroid, representative_point(), to_crs(), plot(), geom_almost_equals(other), contains(other), intersects(other)
GeoDataFrame | Analogous, with a 'geometry' column of type GeoSeries | Additionally: methods for reading and writing files and for geocoding
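The `area` attribute in the table is computed by shapely as a planar area in the units of the data's coordinate reference system. For longitude/latitude data such as SRID 8307 / EPSG:4326, that means square degrees, not square metres; reproject with `to_crs()` first if metric areas are needed. The stdlib-only sketch below shows what a planar area computation does for a single polygon ring, using the shoelace formula. It is an illustration of the idea, not GeoPandas' implementation, and `signed_ring_area` is a hypothetical helper name.

```python
def signed_ring_area(coords):
    """Planar signed area of a simple polygon ring (shoelace formula).

    coords: sequence of (x, y) tuples; the ring may repeat its first vertex
    at the end. The result is in squared coordinate units: positive for
    counterclockwise rings, negative for clockwise ones.
    """
    pts = list(coords)
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the closing vertex so no edge is counted twice
    twice_area = 0.0
    # Pair each vertex with its successor, wrapping around to the first one
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    return twice_area / 2.0


square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
print(signed_ring_area(square))        # 4.0 (counterclockwise)
print(signed_ring_area(square[::-1]))  # -4.0 (clockwise)
```

The sign is worth keeping in mind for the Oracle part of this talk: ring orientation matters there as well.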
Demo
> # Start Python
> python
> # or Jupyter Notebook
> jupyter notebook --notebook-dir=~/Python &
Section 3: Python and the Oracle Database
Python and the Oracle Database
Architecture of the cx_Oracle interface
Client libraries are available via:
• a local Oracle Database installation
• Oracle Client
cx_Oracle
• As of July 2018: version 6.4
• Open source (Oracle contribution)
• Supports Python 2.7 and 3.4+
• For Oracle Client 11.2+ and every Oracle DB version it supports
• Support for vector data via object types (SDO_GEOMETRY) and LOBs (WKT, WKB)
  – Binding SDO_GEOMETRY to Python objects
• Loading vector data into the Oracle Database
  – Querying SDO_GEOMETRY from Python
• Using the Python package GeoPandas (http://geopandas.org/)
• Sample code: github.com/oracle/python-cx_Oracle/blob/master/samples/SpatialToGeoPandas.py
Python in practice (3): the Python package cx_Oracle
• Installation and verification:
[oracle@localhost ~]$ sudo pip install cx_Oracle
[sudo] password for oracle:
Collecting cx_Oracle
Downloading https://files.pythonhosted.org/packages/3b/09/6b10675a6db7c7da1b8d23225f0a95b2a45248c56a1e8f711d59809278d3/cx_Oracle-6.4.1-cp27-cp27mu-manylinux1_x86_64.whl (590kB)
100% |████████████████████████████████| 593kB 2.4MB/s
Installing collected packages: cx-Oracle
Successfully installed cx-Oracle-6.4.1
[oracle@localhost ~]$ python
Python 2.7.5 (default, Apr 11 2018, 17:41:36)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28.0.1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> help('modules')
• Viewing the module documentation with pydoc:
[oracle@localhost ~]$ python -m pydoc cx_Oracle
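Instead of scanning the output of `help('modules')`, the installation can also be checked from a script. A minimal stdlib-only sketch, assuming Python 3.8+ (for `importlib.metadata`); `installed_version` is a hypothetical helper name. It returns `None` if the module cannot be imported and `"unknown"` if the module exists but no package metadata is found under that name.

```python
import importlib.util
from importlib import metadata  # available from Python 3.8 on


def installed_version(module_name):
    """Return the installed version of a module, or None if it is missing."""
    if importlib.util.find_spec(module_name) is None:
        return None  # not importable in this interpreter
    try:
        return metadata.version(module_name)
    except metadata.PackageNotFoundError:
        return "unknown"  # importable, but no distribution metadata under this name


print(installed_version("cx_Oracle"))  # a version string, or None if not installed
```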
Python in practice (4): connecting to the Oracle DB
>>> import cx_Oracle
>>> con = cx_Oracle.connect('hr/hr@localhost/orcl')
>>> print(con.version)
12.2.0.1.0
>>> cur = con.cursor()
>>>
>>> cur.execute('select * from employees order by last_name')
<cx_Oracle.Cursor on <cx_Oracle.Connection to hr@localhost/orcl>>
>>> for row in cur:
...     print(row)
...
(174, 'Ellen', 'Abel', 'EABEL', '011.44.1644.429267', datetime.datetime(1996, 5, 11, 0, 0), 'SA_REP', 11000.0, 0.3, 149, 80)
(166, 'Sundar', 'Ande', 'SANDE', '011.44.1346.629268', datetime.datetime(2000, 3, 24, 0, 0), 'SA_REP', 6400.0, 0.1, 147, 80)
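Since cx_Oracle implements the Python DB API 2.0, one common way to get from a query to a pandas DataFrame is: execute with bind variables, take the column names from `cursor.description`, and pass the fetched rows to `pandas.DataFrame`. The helper `query_to_dataframe` below is a hypothetical sketch of that pattern. Because it only uses DB API calls, it is demonstrated here against an in-memory `sqlite3` database standing in for Oracle (sqlite3 also accepts `:name` bind variables). With cx_Oracle you would pass the connection returned by `cx_Oracle.connect(...)` instead; note that Oracle reports unquoted column names in upper case.

```python
import sqlite3

import pandas as pd


def query_to_dataframe(connection, sql, params=None):
    """Run a query on any DB API 2.0 connection and return the result as a DataFrame."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params or {})
        # Each entry of cursor.description starts with the column name
        columns = [col[0] for col in cursor.description]
        return pd.DataFrame(cursor.fetchall(), columns=columns)
    finally:
        cursor.close()


# Stand-in data in SQLite; against Oracle this would be the HR.EMPLOYEES table
con = sqlite3.connect(":memory:")
con.execute("create table employees (employee_id integer, last_name text, salary real)")
con.executemany(
    "insert into employees values (:id, :name, :salary)",
    [{"id": 174, "name": "Abel", "salary": 11000.0},
     {"id": 166, "name": "Ande", "salary": 6400.0}],
)

df = query_to_dataframe(
    con,
    "select * from employees where salary > :min_salary order by last_name",
    {"min_salary": 5000},
)
print(df)
```

Bind variables (`:min_salary`) keep values out of the SQL text, which avoids SQL injection and lets Oracle reuse the parsed statement.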
Python in practice (5.1): working with cx_Oracle and GeoPandas
• Installing the GeoPandas package and the packages it depends on
[oracle@localhost ~]$ sudo yum install python-devel
...
[oracle@localhost ~]$ sudo pip install geopandas
[sudo] password for oracle:
Collecting geopandas
Downloading https://files.pythonhosted.org/packages/24/11/d77c157c16909bd77557d00798b05a5b6615ed60acb5900fbe6a65d35e93/geopandas-0.4.0-py2.py3-none-any.whl (899kB)
100% |████████████████████████████████| 901kB 2.6MB/s
...
Installing collected packages: pyproj, munch, click, click-plugins, cligj, enum34, fiona, geopandas
Successfully installed click-6.7 click-plugins-1.0.3 cligj-0.4.0 enum34-1.1.6 fiona-1.7.13 geopandas-0.4.0 munch-2.3.2 pyproj-1.9.5.1
Python in practice (5.2): working with cx_Oracle and GeoPandas
• Ready-made module for connecting to the Oracle DB: https://github.com/oracle/python-cx_Oracle/blob/master/samples/SampleEnv.py
  – Requires 2 database users: PYTHONDEMO and PYTHONEDITIONS
  – Must be in the same directory as the main program
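SampleEnv.py supplies connect strings such as `SampleEnv.MAIN_CONNECT_STRING`. If you would rather not depend on that sample module, a comparable helper takes only a few lines. The sketch below is hypothetical: the environment variable names (`PYTHON_USER`, `PYTHON_PASSWORD`, `PYTHON_CONNECT_STRING`) and the defaults are assumptions for illustration, not the ones SampleEnv.py uses. It builds the `user/password@host/service` form passed to `cx_Oracle.connect('hr/hr@localhost/orcl')` earlier.

```python
import os


def connect_string(prefix="PYTHON", env=None):
    """Build 'user/password@host/service' from (hypothetical) environment variables.

    Reads <prefix>_USER, <prefix>_PASSWORD and <prefix>_CONNECT_STRING
    (e.g. 'localhost/orcl'); the password has no default on purpose.
    """
    env = os.environ if env is None else env
    user = env.get(prefix + "_USER", "pythondemo")
    password = env.get(prefix + "_PASSWORD")
    dsn = env.get(prefix + "_CONNECT_STRING", "localhost/orcl")
    if not password:
        raise ValueError(prefix + "_PASSWORD is not set")
    return "%s/%s@%s" % (user, password, dsn)


# Usage (assumes cx_Oracle and a reachable database):
#   import cx_Oracle
#   connection = cx_Oracle.connect(connect_string())
```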
Demo
> # Start Python
> python
> # or Jupyter Notebook
> jupyter notebook --notebook-dir=~/Python &
Python in practice (5.3): working with cx_Oracle and GeoPandas
>>> from __future__ import print_function
>>>
>>> import SampleEnv
>>> import cx_Oracle
>>> from shapely.wkb import loads
>>> import geopandas as gpd
...
>>> import matplotlib as mpl
>>> # Create Oracle connection and cursor objects
... connection = cx_Oracle.Connection(SampleEnv.MAIN_CONNECT_STRING)
>>> cursor = connection.cursor()
>>> connection.autocommit = True
>>> def OutputTypeHandler(cursor, name, defaultType, size, precision, scale):
...     if defaultType == cx_Oracle.BLOB:
...         return cursor.var(cx_Oracle.LONG_BINARY, arraysize = cursor.arraysize)
...
>>> connection.outputtypehandler = OutputTypeHandler
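The output type handler makes cx_Oracle fetch BLOB columns as `LONG_BINARY`, i.e. as plain Python `bytes` rather than LOB objects, which is the form `shapely.wkb.loads()` (imported as `loads`) accepts when a query returns WKB, for example via `SDO_UTIL.TO_WKBGEOMETRY`. To show what those bytes contain, here is a stdlib-only decoder for the simplest case, a 2D WKB Point: one byte-order flag, a 32-bit geometry type code, then two 8-byte doubles. `decode_wkb_point` is a hypothetical illustration; use shapely for real geometries.

```python
import struct


def decode_wkb_point(wkb):
    """Decode a 2D WKB Point (21 bytes) into (x, y); raise ValueError otherwise."""
    if len(wkb) != 21:
        raise ValueError("not a 2D WKB point (expected 21 bytes)")
    if wkb[0] not in (0, 1):
        raise ValueError("invalid byte-order flag")
    # 1 = little endian (NDR), 0 = big endian (XDR)
    byte_order = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack(byte_order + "I", wkb[1:5])
    if geom_type != 1:  # 1 = Point in the WKB geometry type codes
        raise ValueError("geometry type %d is not a Point" % geom_type)
    return struct.unpack(byte_order + "dd", wkb[5:21])


# The same point (roughly Dresden, longitude/latitude) encoded in both byte orders
little = struct.pack("<BIdd", 1, 1, 13.74, 51.05)
big = struct.pack(">BIdd", 0, 1, 13.74, 51.05)
print(decode_wkb_point(little), decode_wkb_point(big))  # (13.74, 51.05) (13.74, 51.05)
```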
Python in practice (5.4): working with cx_Oracle and GeoPandas
>>> # Drop and create table
... print("Dropping and creating table...")
Dropping and creating table...
>>> cursor.execute("""
... begin
...     execute immediate 'drop table de_federal_states';
... exception when others then
...     if sqlcode <> -942 then
...         raise;
...     end if;
... end;""")
>>> cursor.execute("""
... create table de_federal_states (
...     name VARCHAR2(30) not null,
...     geometry SDO_GEOMETRY not null
... )""")
Python in practice (5.5): working with cx_Oracle and GeoPandas
>>> # acquire types used for creating SDO_GEOMETRY objects
... typeObj = connection.gettype("MDSYS.SDO_GEOMETRY")
>>> elementInfoTypeObj = connection.gettype("MDSYS.SDO_ELEM_INFO_ARRAY")
>>> ordinateTypeObj = connection.gettype("MDSYS.SDO_ORDINATE_ARRAY")
>>>
>>> # define function for creating an SDO_GEOMETRY object
... def CreateGeometryObj(*ordinates):
...     geometry = typeObj.newobject()
...     geometry.SDO_GTYPE = 2003
...     geometry.SDO_SRID = 8307
...     geometry.SDO_ELEM_INFO = elementInfoTypeObj.newobject()
...     geometry.SDO_ELEM_INFO.extend([1, 1003, 1])
...     geometry.SDO_ORDINATES = ordinateTypeObj.newobject()
...     geometry.SDO_ORDINATES.extend(ordinates)
...     return geometry
...
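In `CreateGeometryObj`, SDO_GTYPE 2003 follows the DLTT format: 2 dimensions, no LRS dimension, geometry type 03 (polygon). SDO_SRID 8307 is WGS 84 longitude/latitude, and SDO_ELEM_INFO (1, 1003, 1) means: starting at ordinate offset 1, an exterior polygon ring (1003) made of straight line segments (interpretation 1). SDO_ORDINATES is a flat list x1, y1, x2, y2, …; Oracle expects the ring to be closed (last vertex equals the first) and an exterior ring to be oriented counterclockwise. The hypothetical helpers below decode a GTYPE and turn such a flat ordinate list into WKT, the text form that `shapely.wkt.loads()` or `SDO_UTIL.FROM_WKTGEOMETRY` can read. They cover only this single-ring 2D case.

```python
def decode_gtype(gtype):
    """Split an SDO_GTYPE in DLTT format into (dimensions, lrs_dimension, geometry_type)."""
    return gtype // 1000, (gtype // 100) % 10, gtype % 100


def ordinates_to_wkt_polygon(ordinates):
    """Convert a flat x1, y1, x2, y2, ... list for one closed 2D ring into WKT."""
    if len(ordinates) % 2:
        raise ValueError("expected an even number of ordinates for 2D data")
    points = list(zip(ordinates[0::2], ordinates[1::2]))
    if len(points) < 4 or points[0] != points[-1]:
        raise ValueError("a polygon ring needs at least 4 points and must be closed")
    coords = ", ".join("{} {}".format(x, y) for x, y in points)
    return "POLYGON (({}))".format(coords)


print(decode_gtype(2003))                                  # (2, 0, 3)
print(ordinates_to_wkt_polygon([0, 0, 2, 0, 2, 2, 0, 0]))  # POLYGON ((0 0, 2 0, 2 2, 0 0))
```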
Python in practice (5.6): working with cx_Oracle and GeoPandas
>>> geometryBrandenburg = CreateGeometryObj(13.8479741700806,53.5154842208988,...
>>> geometrySachsen = CreateGeometryObj(12.8462989319859,51.6830540864208,...
>>> geometryThueringen = CreateGeometryObj(10.80620196783,51.6409960580726,...
>>>
>>> data = [
...     ('Brandenburg', geometryBrandenburg),
...     ('Sachsen', geometrySachsen),
...     ('Thueringen', geometryThueringen)
... ]
>>> cursor.executemany('insert into de_federal_states values (:state, :obj)', data)
>>> cursor.execute("""
... select count(*)
... from de_federal_states""")
<cx_Oracle.Cursor on <cx_Oracle.Connection to pythondemo@localhost/orcl>>
>>>
>>> for row in cursor:
...     print(row)
...
(3,)
Python in practice (5.7): working with cx_Oracle and GeoPandas
>>> ...
Section 5: Further information
We can do #nextGen too!
(A great) deal is possible. You just have to dare to try ☺
Bonus for the truly adventurous:
Oracle Database Multilingual Engine: oracle.com/technetwork/database/multilingual-engine
MLE is an experimental feature for the Oracle Database 12c. MLE enables developers to work efficiently with DB-resident data in modern programming languages and development environments of their choice.
https://www.youtube.com/watch?v=AY_2M3tgaZs
Follow @kpatenge @SpatialHannes @[email protected]