PRESENTATION
Addendum to the Grant Application
Moscow
Innovation project: Cloud platform for development and procurement of semantic services (Semantic PaaS, SPaaS) that makes it possible to extract and process natural-language text information.
Company name: Avicomp Services, LLC
1. Innovation project summary (hereinafter, the Project)
Current market issues the Project is intended to address
The challenge remains how to give meaning to content and how to link relevant content together
The number of Web pages exceeds 50 million (Google)
Avalanche-like growth of documents: in 2002 large enterprises processed up to 18,000 documents per year; in 2003 that number doubled; in 2004 large enterprises handled about 46,000 documents on average; in 2008 the number of corporate documents grew to 80,000; and in 2011 it exceeded 400,000 (Forrester Research)
As of today, the total number of Internet users exceeds 2 bln, and the total amount of data is estimated at over 1,800 exabytes (1 exabyte = 10^18 bytes).
How the Project solves the problem
Avicomp Services has worked in the semantic field for over 10 years. One of the company's key achievements in this area is a powerful linguistic engine, based on in-depth semantic research, that automatically produces “semantic-aware & ready” content on the Internet and enables new semantic services that make unstructured information easy and flexible to use in the following ways:
A set of services that let users enrich the meta-information (semantic data) of their documents published on the Web or kept in corporate archives. Extra meta-information attached to a document improves search accuracy and quality, information categorization and combination.
A set of services that let users apply the extra meta-information to integrate with existing information when performing BI/OLAP analysis.
A set of services that let users publish their own semantic data sets in a semantic archive and link them with existing (e.g. Web) Linked Open Data (LOD) sets.
A set of services to identify and link semantic data sets across different languages.
A set of services that let users create their own applications on top of the established semantic data archive.
These and other services will be available from a single software platform, Semantic PaaS (SPaaS), built on technology with a strong foundation of semantic and morphological rules.
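To make the benefit of attached semantic meta-information concrete, here is a minimal Python sketch, assuming each document already carries a list of entity URIs produced by annotation. The document ids, texts and `http://example.org/` URIs are hypothetical. The point is that a search on the entity finds a document even when its text uses different words, which plain keyword search misses.

```python
from collections import defaultdict

# Hypothetical documents: raw text plus entity URIs attached as semantic metadata.
DOCS = {
    "doc1": {"text": "The mayor of Moscow opened a new metro line.",
             "entities": ["http://example.org/Moscow"]},
    "doc2": {"text": "The Russian capital hosts an IT forum.",
             "entities": ["http://example.org/Moscow"]},
    "doc3": {"text": "Rusnano invests in a nanotech startup.",
             "entities": ["http://example.org/Rusnano"]},
}

def keyword_search(docs, word):
    """Plain keyword search: match the literal word in the text."""
    return sorted(d for d, meta in docs.items() if word.lower() in meta["text"].lower())

def build_entity_index(docs):
    """Invert the semantic annotations: entity URI -> set of document ids."""
    index = defaultdict(set)
    for doc_id, meta in docs.items():
        for uri in meta["entities"]:
            index[uri].add(doc_id)
    return index

def semantic_search(index, uri):
    """Look up documents annotated with the given entity URI."""
    return sorted(index.get(uri, set()))

index = build_entity_index(DOCS)
print(keyword_search(DOCS, "Moscow"))                       # ['doc1']
print(semantic_search(index, "http://example.org/Moscow"))  # ['doc1', 'doc2']
```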
Today's users are unable even to begin analyzing unstructured information on the Internet, let alone make informed decisions based on such analysis. Users get swamped at the information-gathering stage.
2. The current market situation in search
The problem with all search systems
Google and other search systems based on keyword and concept matching produce no results at all
3. Target market
[Chart: Landscape of Semantic Applications Market estimate (volume)]
1. Today the market exceeds $100 bln
2. Impact of semantic technology:
- 20-80% fewer labour hours
- 20-75% lower operating costs
- 30-60% lower inventory levels
- 20-85% lower development costs
Source: TopQuadrant
4. Competition (Extract)
Analogue | Stage (market / development) | Price | Parameter 1 (NLP) | Parameter 2 (RDF store) | Parameter 3 (Apps/Services)
OntoText | Production | License model; from 50 K€ to 250 K€ | Based on GATE | OWL store | Search, sort
OpenCalais | Production | Free and subscription (price not known) | Pure NLP service | No store | Limited set of mash-ups
GATE | Research and API service | Small subscription fee | NLP as open source or via API | No store | No services
Ontoprise | Production | License model and consulting services; from 100 K€ | Text mining only, without information extraction | No RDF store; only RDBMS for indexes | Various specific apps for ontology engineering and modelling
Comparative analysis
Analogue | Functional area | Stage
PowerSet | NLP engine | Bought by Microsoft
FAST | Text mining | Bought by Microsoft
Freebase | RDF knowledge base in the LOD | Bought by Google
5. Target market segments for the product
Potential users of the Project's products (Russian market only, as it will serve as a test bed to fine-tune the business model)
Russian Accounting Chamber
Russian Ministry of Education
RIA News
Moscow City Government
Rusnano
President’s Administration
All of these prospects are currently engaged in discussions about their information handling and processing needs.
Business model
1. B2B
• Government – use SPaaS to build Linked Open Data within governments (licenses & deployment consulting)
• Large enterprises – a tool for extracting knowledge (licenses & deployment consulting)
• Small businesses – a tool for producing semantic content (SaaS)
2. B2C
• Meeting the information search needs of individual users (including mobile applications)
6. Project technology – Semantic PaaS architecture
High-level view of the SPaaS architecture, integrating the ecosystem of complementors and their customers
Application Services comprises modules to manage the RDF life cycle; various interfaces to search, retrieve and store data; and core analytical functions (OLAP for RDF) and predictive modelling based on algorithmic game theory. The stack will also include a set of core modules that serve requests from external applications.
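OLAP for RDF can be pictured as grouping triples along dimensions. The sketch below is a hypothetical illustration, not the platform's analytical module: it counts entity mentions per month over a handful of made-up triples (`ex:mentions` and `ex:month` are invented predicates) and then takes a "slice" along the entity dimension.

```python
from collections import Counter

# Hypothetical triples (subject, predicate, object); the ex: names are invented.
TRIPLES = [
    ("ex:a1", "ex:month", "2012-01"), ("ex:a1", "ex:mentions", "ex:Rusnano"),
    ("ex:a2", "ex:month", "2012-01"), ("ex:a2", "ex:mentions", "ex:Rusnano"),
    ("ex:a2", "ex:mentions", "ex:Moscow"),
    ("ex:a3", "ex:month", "2012-02"), ("ex:a3", "ex:mentions", "ex:Moscow"),
]

def rollup_mentions(triples):
    """Count (entity, month) pairs: a two-dimensional cube built from the graph."""
    month = {s: o for s, p, o in triples if p == "ex:month"}
    cube = Counter()
    for s, p, o in triples:
        if p == "ex:mentions" and s in month:
            cube[(o, month[s])] += 1
    return cube

def slice_by_entity(cube, entity):
    """OLAP-style slice: fix the entity dimension, keep the month dimension."""
    return {m: n for (e, m), n in sorted(cube.items()) if e == entity}

cube = rollup_mentions(TRIPLES)
print(slice_by_entity(cube, "ex:Rusnano"))  # {'2012-01': 2}
print(slice_by_entity(cube, "ex:Moscow"))   # {'2012-01': 1, '2012-02': 1}
```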
Harvesting and Crawling uses a heuristic approach that can integrate various sources (not only RSS feeds), together with a planarization method that automatically extracts plain text from a Web page.
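As a rough stand-in for the planarization step, the following sketch uses Python's standard `html.parser` to keep visible text and drop `<script>`/`<style>` content. It is a deliberate simplification under stated assumptions: it does not implement the heuristic approach described above and does not remove navigation or advertising blocks.

```python
from html.parser import HTMLParser

class PlainTextExtractor(HTMLParser):
    """Collects visible text, skipping <script>, <style> and <noscript> content."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        # Join the chunks and collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())

def planarize(html_page):
    parser = PlainTextExtractor()
    parser.feed(html_page)
    parser.close()  # flush any buffered text
    return parser.text()

page = ("<html><head><style>p{color:red}</style></head><body>"
        "<h1>News</h1><p>Rusnano  invests.</p><script>var x=1;</script></body></html>")
print(planarize(page))  # News Rusnano invests.
```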
NLP Service is based on a multi-agent, multilingual architecture that allows it to scale. The service will further incorporate an ontology- and rule-based approach to information extraction (IE), enriched with statistical methods and with a method that can use existing background knowledge, for example in the Linked Open Data (LOD) cloud or inside Web pages (e.g. RDFa, schema.org or HTML5 metadata).
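The rule-based side of information extraction often starts with a dictionary (gazetteer) lookup against the ontology. This sketch assumes a hypothetical gazetteer keyed by lower-cased token sequences and uses greedy longest match. It ignores morphology, statistical methods and background knowledge, all of which the service described above adds.

```python
import re

# Hypothetical gazetteer: lower-cased token tuple -> (entity id, ontology class).
GAZETTEER = {
    ("moscow",): ("ex:Moscow", "ex:City"),
    ("moscow", "city", "government"): ("ex:MoscowCityGovernment", "ex:Organization"),
    ("rusnano",): ("ex:Rusnano", "ex:Organization"),
}
MAX_LEN = max(len(k) for k in GAZETTEER)

def tokenize(text):
    # \w is Unicode-aware in Python 3, so Cyrillic tokens are matched too.
    return re.findall(r"\w+", text)

def extract_entities(text):
    """Greedy longest-match lookup of token sequences in the gazetteer."""
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in GAZETTEER:
                uri, cls = GAZETTEER[key]
                found.append((" ".join(tokens[i:i + n]), uri, cls))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return found

print(extract_entities("Moscow City Government signed a deal with Rusnano."))
```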
Knowledge Generation Process mainly handles unique object identification and merging, ontology alignment, data authoring and interlinking.
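Unique object identification and merging can be sketched as resolving each extracted name to a canonical identifier and then taking the union of the properties of all records that resolve to the same identifier. The alias table, the `uid:` identifiers and the normalization rule below are hypothetical. A production system would use far richer matching.

```python
from collections import defaultdict

def normalize(name):
    """Crude matching key: lower-case, keep letters/digits/spaces, collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

# Hypothetical alias table (normalized name -> canonical UID),
# e.g. as maintained by a controlled named-entity name server.
ALIASES = {
    "ria novosti": "uid:1",
    "ria news": "uid:1",
    "rusnano": "uid:2",
}

def merge_records(records):
    """Group extracted records by canonical UID and union their property values."""
    merged = defaultdict(lambda: defaultdict(set))
    unresolved = []
    for rec in records:
        uid = ALIASES.get(normalize(rec["name"]))
        if uid is None:
            unresolved.append(rec)  # left for review or for minting a new UID
            continue
        for key, value in rec.items():
            merged[uid][key].add(value)
    entities = {uid: {k: sorted(v) for k, v in props.items()} for uid, props in merged.items()}
    return entities, unresolved

records = [
    {"name": "RIA Novosti", "type": "ex:MediaCompany"},
    {"name": "RIA News", "city": "Moscow"},
    {"name": "Rusnano.", "type": "ex:Organization"},
    {"name": "Unknown Corp"},
]
entities, unresolved = merge_records(records)
print(entities["uid:1"])
print(unresolved)
```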
Scalable RDF store keeps the extracted knowledge as semantic graphs, using the latest technology and methods for handling RDF triples. The store will also include a plain SPARQL interface as well as a layer for intelligent, easy-to-use access (Data Access API). Given the expected growth of digital data, the RDF store architecture will also include other database storage mechanisms to address the “Big Data” problem.
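To show what storing triples and answering a SPARQL-style graph pattern involves, this sketch keeps triples in an SQLite table (Python's standard `sqlite3` module) and translates a two-pattern query into a self-join. It is an illustrative assumption, not the architecture of the SPaaS RDF store: there is no SPARQL parser, and the `ex:` names are invented.

```python
import sqlite3

def open_store():
    """In-memory triple table with indexes on two common access paths."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT, UNIQUE (s, p, o))")
    db.execute("CREATE INDEX idx_po ON triples (p, o)")
    db.execute("CREATE INDEX idx_sp ON triples (s, p)")
    return db

def add(db, triples):
    # Duplicate triples are silently dropped thanks to the UNIQUE constraint.
    db.executemany("INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", triples)

def match(db, s=None, p=None, o=None):
    """Single triple pattern; None plays the role of a SPARQL variable."""
    clauses, params = [], []
    for col, val in (("s", s), ("p", p), ("o", o)):
        if val is not None:
            clauses.append(f"{col} = ?")
            params.append(val)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return db.execute(f"SELECT s, p, o FROM triples{where} ORDER BY s, p, o", params).fetchall()

def docs_mentioning_type(db, cls):
    """Roughly: SELECT DISTINCT ?doc WHERE { ?doc ex:mentions ?e . ?e rdf:type <cls> }"""
    rows = db.execute(
        "SELECT DISTINCT t1.s FROM triples t1 JOIN triples t2 ON t1.o = t2.s "
        "WHERE t1.p = 'ex:mentions' AND t2.p = 'rdf:type' AND t2.o = ? ORDER BY t1.s",
        (cls,),
    ).fetchall()
    return [r[0] for r in rows]

db = open_store()
add(db, [
    ("ex:doc1", "ex:mentions", "ex:Rusnano"),
    ("ex:doc2", "ex:mentions", "ex:Moscow"),
    ("ex:Rusnano", "rdf:type", "ex:Organization"),
    ("ex:Moscow", "rdf:type", "ex:City"),
    ("ex:doc1", "ex:mentions", "ex:Rusnano"),  # duplicate, ignored
])
print(len(match(db)))                                # 4
print(docs_mentioning_type(db, "ex:Organization"))  # ['ex:doc1']
```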
04/12/2023 8
7. Use Case – Linked Government Data (LGD)
Our SPaaS offer for 5-star Linked Open Data:
• Pipeline/workflow to create RDF (LOD)
• Government vocabulary (ontology)
• Scalable RDF (LOD) store
• UID or controlled named-entity name server
Later, adapt LGD to Linked Enterprise Data
Enable applications and an ecosystem for e-Citizens
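The first item of the offer, a pipeline that creates RDF from government data, can be sketched as turning a CSV table into N-Triples with minted URIs. The base URI, vocabulary namespace and column names below are hypothetical; only the `rdf:type` and `rdfs:label` IRIs are the standard W3C ones.

```python
import csv
import io
from urllib.parse import quote

# Standard W3C IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical namespaces for minted identifiers and the government vocabulary.
BASE = "http://example.org/gov/id/"
VOCAB = "http://example.org/gov/vocab#"

def literal(text, lang="en"):
    """N-Triples language-tagged literal with backslash, quote and newline escapes."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'

def rows_to_ntriples(csv_text):
    """Mint one stable URI per row id and emit one N-Triples line per fact."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{quote(row['id'])}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{VOCAB}{quote(row['type'])}> .")
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(row['name'])} .")
    return lines

data = (
    "id,type,name\n"
    "min-edu,Ministry,Ministry of Education\n"
    "acc-ch,Agency,Accounting Chamber\n"
)
for line in rows_to_ntriples(data):
    print(line)
```

Minting the subject URI from a stable row id (rather than from the label) keeps links valid when names change, which matters once other data sets start linking to these resources.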
8. Use Case – Online News
Our SPaaS for online news:
• Pipeline/workflow for tagging and named-entity (NE) extraction
• RDFa/microformat injection into web pages
• Scalable RDF store
• Knowledge engineering
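RDFa injection can be pictured as wrapping each extracted entity mention in markup that carries its identifier and type. This minimal sketch assumes the entities have already been extracted (surface form, entity IRI and class IRI, all hypothetical here) and emits RDFa 1.1 `about`, `typeof` and `property` attributes. It works on plain text and does not handle mentions that cross existing HTML tags.

```python
import html
import re

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def inject_rdfa(text, entities):
    """Escape plain text as HTML and wrap entity mentions in RDFa 1.1 spans.

    entities: list of (surface form, entity IRI, class IRI).
    """
    if not entities:
        return html.escape(text)
    by_surface = {surface: (iri, cls) for surface, iri, cls in entities}
    # Try longer surface forms first so "Moscow City Government" beats "Moscow".
    alternatives = sorted(by_surface, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(s) for s in alternatives) + r")\b")
    out, pos = [], 0
    for m in pattern.finditer(text):
        iri, cls = by_surface[m.group(0)]
        out.append(html.escape(text[pos:m.start()]))
        out.append(
            f'<span about="{html.escape(iri)}" typeof="{html.escape(cls)}" '
            f'property="{RDFS_LABEL}">{html.escape(m.group(0))}</span>'
        )
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)

entities = [
    ("Moscow", "http://example.org/Moscow", "http://example.org/City"),
    ("Moscow City Government", "http://example.org/MoscowCityGovernment",
     "http://example.org/Organization"),
]
print(inject_rdfa("Moscow City Government & Moscow", entities))
```

An RDFa-aware consumer reading each span would obtain an `rdf:type` triple and an `rdfs:label` triple for the entity, so the page itself carries the extracted knowledge.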
[Figure: Architecture (simplified). Components: CMS; external user/app; RESTful API (Semantic Platform); Topic & Entity Extraction (SPaaS); triple store for entities (SPaaS); learning corpora (topics); OntoDix (SPaaS); Topics Manager; Tagging System API; Delivery Server (nodeJS, Fugue, SocketIO, RabbitMQ) + Routes DB; HDB (Mongo); Desktop (Sencha); nginx + Apache; metadata sync]
9. Intellectual property
Existing patents
Patent for an invention № 2242048 «Method of automated processing of text-based information materials». Owner: «Ontos AG (Switzerland)».
Patent for an invention № 2399959 «Method of automated processing of natural-language text by semantic indexing, method of processing a natural-language text collection by semantic indexing, and machine-readable media». Owner: «Ontos AG (Switzerland)».
Computer software certificate of registration № 2006610704 «OntosMiner. Russian version». Owner: «Avicomp Services»
Computer software certificate of registration № 2008613021 «Ontos RDF Store Server. Russian version». Owner: «Avicomp Services»
Computer software certificate of registration № 2009611560 «Ontos SOA Server. Russian version». Owner: «Avicomp Services»
Computer software certificate of registration № 2009611559 «Ontos AS Processing Server. Russian version». Owner: «Avicomp Services»
Computer software certificate of registration № 2009611558 «Ontos AS Delivery Server. Russian version». Owner: «Avicomp Services»
Computer software certificate of registration № 2009611557 «Ontology Dictionary. Russian version». Owner: «Avicomp Services»
10. Project’s Team (1)
Victor Klintsov, Shareholder & General Director
More than 20 years of experience in the IT industry
Chief ideologist and chief architect
Graduated in 1977 from the Moscow Chemical Engineering Institute
Author of numerous papers
Director of Russian W3C office
Took part in projects including: a public LOD resource in the field of science and technology, integrated into the international LOD knowledge space; an analytical system for searching and processing letters sent by citizens to the President of the Russian Federation, using semantic and linguistic information extraction methods; and others.
Brief summary of key team members
Daniel Hladky COO – Chief Operation Officer
More than 20 years in IT, including SAP and iXOS (OpenText)
Responsible for regional development, marketing, sales and operations.
Holds an MBA from Strathclyde University.
Author of numerous papers; invited expert, e.g. for EU FP7, ISWC and the Triplify Challenge
Speaker at conferences such as SemTech, ESTC, I-Semantics
Dr Sören Auer CRO – Chief Research Officer
Researcher and professor since 2003. Coordinator of various EU FPx projects.
Responsible for research and innovation.
Studied mathematics and computer science at the universities of Dresden, Hagen and Yekaterinburg (Russia). PhD from the University of Leipzig.
Leader of the AKSW research group at the University of Leipzig.
Author of numerous papers; invited expert (e.g. for the EU); co-organiser of several workshops; programme chair of I-Semantics 2008, OKCON 2010, ESWC 2010, ICWE 2011 and WWW2012; area editor of the Semantic Web Journal; serves as an expert for industry, the European Commission and the W3C; member of the advisory board of the Open Knowledge Foundation.
11. Project’s Team (2)
Brief summary of key team members
Grigory Drobyazko, CTO – Chief Technology Officer
More than 20 years in IT, including RDBMS and custom development
Responsible for R&D, including architecture design, UI design and software support
Co-author of scientific papers on Semantic Web solutions and on technologies for data extraction and text analysis of information resources for analytical processing
Took part in projects including: a public LOD resource in the field of science and technology, integrated into the international LOD knowledge space; an analytical system for searching and processing letters sent by citizens to the President of the Russian Federation, using semantic and linguistic information extraction methods; and others.
• Analysts - 10 persons
• Linguist-developers - 9 persons
• Programmer-developers - 15 persons
• Programmer-developers of linguistic software - 7 persons
12. The current status
The key steps
• Continuous platform development for more than 10 years
• Built the initial «Alfa» platform of the Semantic PaaS
• The current platform is based on experience gained in several customer and research projects (see the table below)
• Completed proofs of concept for tagging, aggregation and news visualisation
• Experience from law enforcement, media and portals
Past and current financing
• Shareholders supported development
• Execution of research and development activities
Sales proceeds (R&D work)
Customer | 2010 (actual), mln RUB | 2011 (actual), mln RUB | 2012-2013 (plan), mln RUB
Ministry of Education | - | 21.7 | 20+
RIA Novosti | 46.1 | 8.9 | 40+
Others | - | 16.0 | -
Total | 46.1 | 46.6 | 60+
• Develop an NLP module for media
• Develop a portal
• Research and create linguistic rules
• Develop a concept for IKB
• Develop a concept for RDF storage
13. Project’s co-investor
Fund raising plan
Current phase fund raising
Co-investor 1
Ministry of Education of Russian Federation – up to 90 mln RUB
Co-investment – signed contract to perform R&D
Co-investor 2
VEB Innovation Fund – up to 90 mln. RUB
Co-investment – equity/debt financing
Exit for VEB Innovation Fund – sale to a strategic investor or MBO at an agreed rate
Follow-on fund raising
Stage name | Expected grant financing | Expected investment from co-investor | Timing
Core platform development | 90 mln RUB | 90 mln RUB | 2012-2013
Development of semantic services | 20 mln RUB | 60 mln RUB | 2013-2014
Start selling platform and services | 20 mln RUB | 2014-2015
14. Project development plan
2012 2013 2014 2015
Enhance the NLP system (WP1)
Large Scale Data Management (WP2) and the deployment of the solution to the cloud
Access to the system via SQL Lite and SPARQL
Life-cycle management of data and knowledge (WP4 and 5)
Enrichment (WP3)
Have use cases ready for eGov, Oil & Gas (WP6)
Performance optimization and scalability.
Work on Big Data analytics and Predictive Analysis (WP7)
Develop eCitizen Service Applications as showcases.
Cloud Platform optimization
NLP for Asian languages
180 mln RUB.
80 mln RUB.
20 mln RUB.