
WinCC Unified GraphQL Server for Data Integration

With the GraphQL Server in WinCC Unified V18 we can now integrate various (IT) data platforms with simple programs. Those programs can be written in Python, Java, Kotlin, Go, JavaScript, or whatever kind of programming language you prefer.

In my case I have used Kotlin to implement a simple Apache Kafka Consumer, which maps and writes values from my Home-Automation to the WinCC Unified SCADA system.
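
As a rough illustration of the mapping step, here is a minimal Python sketch (the consumer described above is written in Kotlin; Python is used here only for brevity). It turns one home-automation record into the JSON body of a GraphQL request. The topic names, the tag names, the record layout, and the `writeTagValues` mutation with its input type are assumptions for illustration, so check them against the schema your WinCC Unified GraphQL Server actually exposes.

```python
import json
from typing import Optional

# Hypothetical mapping from Kafka topic to WinCC Unified tag name.
TOPIC_TO_TAG = {
    "home/meter/power": "Meter_Power",
    "home/pv/output": "PV_Output",
}

# Assumed mutation text; verify the mutation name and the input type
# against the schema of your own server before using it.
WRITE_MUTATION = """
mutation Write($input: [TagValueInput]!) {
  writeTagValues(input: $input) { name error { code description } }
}
"""


def record_to_request(topic: str, payload: bytes) -> Optional[dict]:
    """Map one Kafka record to a GraphQL request body, or None if the topic is unmapped."""
    tag = TOPIC_TO_TAG.get(topic)
    if tag is None:
        return None
    # Assumed record layout: a JSON object with a "value" field.
    value = json.loads(payload)["value"]
    return {
        "query": WRITE_MUTATION,
        "variables": {"input": [{"name": tag, "value": value}]},
    }


if __name__ == "__main__":
    body = record_to_request("home/meter/power", b'{"value": 1234.5}')
    print(json.dumps(body["variables"]))
```

The resulting dictionary would be sent as JSON in an HTTP POST to the server's GraphQL endpoint, with whatever authentication that server requires.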

I can now use WinCC Unified for visualisation, even if I don’t have any PLC at home. WinCC Unified can be used as a Low- or No-Code platform to create fancy Web-Based visualisations with real-time values from any kind of data source.

WinCC Unified also has great alarming features. Alarm handling can be done in WinCC Unified and alerts could also be published back to the streaming platform with a producer.

I collect my Home-Automation values with a Raspberry Pi, which reads values from power meters and temperature sensors, and data via Bluetooth from my PV converter. The values are published to an MQTT broker. From that MQTT broker I bring the values to Apache Kafka and then to the WinCC Unified system.

With the GraphQL Server of WinCC Unified it would also be straightforward to implement an Apache Kafka Producer, so that values from PLCs can be published to Apache Kafka or any other data streaming platform.

From Apache Kafka I write my values to WinCC Unified and additionally to CrateDB. CrateDB is a great NoSQL database with the power of SQL, and it is highly scalable. It can be used for data analytics, machine learning, Grafana dashboards, and more…

Of course you can also grab the data directly from the MQTT broker and bring it into WinCC Unified via the GraphQL Server without a streaming platform, but a streaming platform has additional benefits, which are not covered in this post…

Size of tables in PostgreSQL vs Apache Cassandra…

PostgreSQL table with ts+key as primary key: ~43GB
PostgreSQL wide column table with ts as primary key: 247GB
Cassandra wide column table with ts as primary key: 4.5GB

Strange that in PostgreSQL a table with far fewer rows (but many more columns) needs so much more space, even though both tables store the same amount of data.

It seems that the Apache Cassandra column store can compress the columns pretty well: roughly a factor of 10 less disk space than the narrow PostgreSQL table!
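
To put the three sizes side by side, the ratios can be worked out in a few lines of Python. This only does arithmetic on the figures quoted above (the 43GB value is approximate); the dictionary keys are names made up for this sketch.

```python
# Table sizes in GB as reported above.
sizes_gb = {
    "postgresql_narrow": 43.0,  # ts+key as primary key (approximate)
    "postgresql_wide": 247.0,   # ts as primary key, wide columns
    "cassandra_wide": 4.5,      # ts as primary key, wide columns
}


def ratio(a: str, b: str) -> float:
    """Return how many times larger table `a` is than table `b`."""
    return sizes_gb[a] / sizes_gb[b]


if __name__ == "__main__":
    print(f"{ratio('postgresql_narrow', 'cassandra_wide'):.1f}")  # 9.6
    print(f"{ratio('postgresql_wide', 'cassandra_wide'):.1f}")    # 54.9
    print(f"{ratio('postgresql_wide', 'postgresql_narrow'):.1f}") # 5.7
```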

The source table in PostgreSQL (TimescaleDB), with a timestamp column, a key column, and 8 data columns, had about 170 million rows.

    instrument character varying(10) NOT NULL,
    ts timestamp(3) without time zone NOT NULL,
    o numeric,
    h numeric,
    l numeric,
    c numeric,
    primary key (instrument, ts)

I needed to flatten the table so that I have just the timestamp as primary key and many columns, where each column is of a composite type. It ends up as a table with about 1.6 million rows and many columns.

    o       float,
    c       float,
    h       float,
    l       float,
    volume  float

CREATE TABLE candles_wide
   ts timestamp(3) without time zone NOT NULL,
   AU200_AUD price,
   AUD_CAD price,
   AUD_CHF price,
   AUD_HKD price,
   AUD_JPY price,
   AUD_NZD price,
   ... 124 columns
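
The flattening itself is a pivot from one row per (instrument, ts) to one row per ts. A minimal Python sketch of that idea follows; it is not the statement that was actually used for the migration, and the sample rows are invented. Only the four columns shown in the narrow table above (o, h, l, c) are carried over.

```python
from collections import defaultdict


def pivot_candles(rows):
    """Pivot narrow rows into one wide row per timestamp.

    rows: iterable of (instrument, ts, o, h, l, c) tuples.
    Returns {ts: {instrument: (o, h, l, c)}}. An instrument without a
    candle at some timestamp is simply absent from that row, which
    corresponds to a NULL column in the wide table.
    """
    wide = defaultdict(dict)
    for instrument, ts, o, h, l, c in rows:
        wide[ts][instrument] = (o, h, l, c)
    return dict(wide)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    narrow = [
        ("AUD_CAD", "2018-01-01 00:00", 1.00, 1.20, 0.90, 1.10),
        ("AUD_CHF", "2018-01-01 00:00", 0.70, 0.80, 0.60, 0.75),
        ("AUD_CAD", "2018-01-01 00:01", 1.10, 1.30, 1.00, 1.20),
    ]
    for ts, columns in sorted(pivot_candles(narrow).items()):
        print(ts, sorted(columns))
```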

Apache Cassandra wide column store table with ts as primary key and many columns.

CREATE TABLE candles (ts timestamp,
   AU200_AUD tuple<float,float,float,float,float>,    
   AUD_CAD tuple<float,float,float,float,float>,  
   AUD_CHF tuple<float,float,float,float,float>,  
   ... 124 tuples



Streaming SQL for Apache Kafka & WinCC OA with Docker…

KSQL makes it easy to read, write, and process streaming data in real-time, at scale, using SQL-like semantics. It offers an easy way to express stream processing transformations as an alternative to writing an application in a programming language such as Java or Python.

With WinCC OA Java we can stream data from WinCC OA to Apache Kafka, use KSQL to produce some insights, and send them back to WinCC OA by using a WinCC OA driver written in Java and connected to Kafka.

Attached you will find a docker-compose.yml to set up KSQL plus the WinCC OA Connector and Driver for testing. Before starting, set the “data” and “event” environment variables in the docker-compose.yml to point to a running WinCC OA project. Then just use “docker-compose up -d” to start everything.

root@docker1:~/docker/builds/winccoa# docker-compose up -d

Creating winccoa_frontend_1 ==> collect data from OA and publish it by ZeroMQ

Creating winccoa_backend-kafka_1 ==> get the data from the Frontend and write it to Kafka

Creating winccoa_driver-kafka_1 ==> OA driver to read data from kafka.

Creating winccoa_zookeeper_1
Creating winccoa_kafka_1
Creating winccoa_schema-registry_1
Creating winccoa_ksql-cli_1

We use Docker to start up the WinCC OA Managers (frontend, backend) and Drivers.

Afterwards you can start KSQL: docker-compose exec ksql-cli ksql-cli local --bootstrap-server kafka:29092

Create a stream on the topic which is sent from WinCC OA to Kafka (currently every value change in WinCC OA is sent to Kafka):

CREATE STREAM Scada_FloatVar (TimeMS BIGINT, Status BIGINT, Value DOUBLE) WITH (kafka_topic='Scada_FloatVar', value_format='JSON');

Create a result table in KSQL which will be read by the WinCC OA Driver. Here we detect whether a datapoint changes more than 5 times in 10 seconds. This is just a simple example to show how KSQL can be used:

CREATE TABLE result WITH (PARTITIONS=1) AS SELECT rowkey AS "Name", count(*) AS "Value" FROM Scada_FloatVar WINDOW TUMBLING (size 10 second) GROUP BY rowkey HAVING count(*) > 5;
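
For readers who have not used windowed aggregations, the logic of the statement above can be sketched in plain Python: group events by key and by 10-second tumbling window, count them, and keep only the groups with more than 5 events. This is a batch illustration under simplifying assumptions (KSQL evaluates the same thing continuously on the stream); the function and parameter names are made up for this sketch.

```python
from collections import Counter


def tumbling_counts(events, size_ms=10_000, threshold=5):
    """Count events per (name, window) and keep the busy ones.

    events: iterable of (name, time_ms) pairs.
    Returns {(name, window_start_ms): count} for every key/window pair
    that contains more than `threshold` events.
    """
    counts = Counter()
    for name, time_ms in events:
        # Tumbling windows are fixed, non-overlapping, epoch-aligned buckets.
        window_start = time_ms - time_ms % size_ms
        counts[(name, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


if __name__ == "__main__":
    events = [("dp1", t) for t in range(0, 6000, 1000)]    # 6 changes in window 0
    events += [("dp2", t) for t in range(0, 10000, 2000)]  # 5 changes in window 0
    print(tumbling_counts(events))  # {('dp1', 0): 6}
```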

In WinCC OA you should put a peripheral address on a datapoint with the example driver (num 4) to get the result back (you will need the panels and scripts from here to use the driver).


Monitoring with Logstash and WinCC OA…



For example, we do this with the Oracle alert log. Very often an Oracle database is used with WinCC OA to store history values, but frequently no one takes care of the Oracle database. At the very least the alert log file should be observed. With Logstash, Apache Kafka and the WinCC OA Apache Kafka Driver we can send alert log messages from the Oracle database(s) to a WinCC OA monitoring system.


Observing WinCC OA Logs with Elasticsearch and Logstash…

With Logstash we can collect the logs of WinCC OA systems and write them to Elasticsearch. Multiple WinCC OA systems can be observed with a central log database…

With Kibana the logs can be easily explored. I now see errors that I hadn’t seen before in my system…

In parallel the log messages are written to Apache Kafka. With Apache Spark we can now observe the log streams and detect anomalies… a very simple observation could be to simply count the number of log messages per time frame…
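
The "count per time frame" idea can be illustrated without Spark. The following Python sketch counts messages per minute and flags minutes whose count is well above the median. The spike rule (a factor times the median) and all names are assumptions chosen for this example, not what runs in the setup described here.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def counts_per_minute(timestamps):
    """Count log messages per minute; timestamps is an iterable of datetime."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def spikes(counts, factor=3.0):
    """Return the minutes whose count exceeds `factor` times the median count."""
    if not counts:
        return []
    limit = factor * median(counts.values())
    return sorted(minute for minute, n in counts.items() if n > limit)


if __name__ == "__main__":
    # Invented timestamps: two quiet minutes and one noisy minute.
    stamps = [datetime(2017, 1, 1, 12, 0, s) for s in (1, 30)]
    stamps += [datetime(2017, 1, 1, 12, 1, s) for s in (5, 40)]
    stamps += [datetime(2017, 1, 1, 12, 2, s) for s in range(10)]
    print(spikes(counts_per_minute(stamps)))
```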


WinCC OA logstash configuration file: winccoa-logstash-conf

WinCC OA RDB-Manager with Oracle vs MongoDB

Keep in mind: this is not a comparison of the databases alone. With Oracle we used the WinCC OA RDB Manager with the OA Query-RDB Direct option, and the RDB Manager has far more functionality than the NoSQL prototype! The other databases were tested with a NoSQL logger prototype written in Java, and the implementations for writing and reading differ, because each database has a different interface: for PostgreSQL we used the PostgreSQL JDBC driver, MongoDB has its own Java API, and InfluxDB uses REST/HTTP. So it is not only the speed of the database itself that is compared; the interfaces to WinCC OA and the implementations of reading are also taken into account.

Oracle and OA RDB-Manager Results:
2016.07.29 09:09:03.302[“start…”]
2016.07.29 09:09:39.628[36.326][33669]
2016.07.29 09:11:22.051[“start…”]
2016.07.29 09:11:36.213[14.159][33669]

MongoDB Results:
2016.07.29 09:10:37.449[“start…”]
2016.07.29 09:10:53.171[15.72][33669]
2016.07.29 09:11:42.932[“start…”]
2016.07.29 09:11:52.918[9.986][33669]

InfluxDB Results:
WCCOAui1:2016.07.29 09:47:33.441[“start…”]
WCCOAui1:2016.07.29 09:47:42.477[9.035][33668]
WCCOAui1:2016.07.29 09:48:12.733[“start…”]
WCCOAui1:2016.07.29 09:48:18.745[6.011][33668]

InfluxDB is faster than MongoDB, even though our InfluxDB is running on a Mac mini (Hyper-V) with the data stored on a shared Synology NAS for home use (DS414 slim): much less power for InfluxDB compared to the four 7.2k disks and the i7 on which the Oracle DB and MongoDB are running.

PostgreSQL Results:
WCCOAui1:2016.07.29 09:56:55.062[“start…”]
WCCOAui1:2016.07.29 09:57:03.475[8.41][33669]
WCCOAui1:2016.07.29 09:57:14.767[“start…”]
WCCOAui1:2016.07.29 09:57:20.196[5.427][33669]

PostgreSQL is running on the same machine and disks as Oracle and MongoDB.
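
To compare the runs more directly, the result lines can be turned into values per second. The sketch below assumes that the first bracket is the elapsed time in seconds (which matches the timestamp differences in the logs above) and that the second bracket is the number of values returned; the second point is an interpretation, not something the logs state.

```python
import re

# Matches the trailing "[seconds][count]" pair of a result line.
RESULT = re.compile(r"\[(\d+(?:\.\d+)?)\]\[(\d+)\]\s*$")


def values_per_second(line):
    """Return count / seconds for a result line, or None for other lines."""
    match = RESULT.search(line)
    if match is None:
        return None
    seconds = float(match.group(1))
    count = int(match.group(2))
    return count / seconds


if __name__ == "__main__":
    lines = [
        "2016.07.29 09:09:39.628[36.326][33669]",           # Oracle, first run
        "2016.07.29 09:10:53.171[15.72][33669]",            # MongoDB, first run
        "WCCOAui1:2016.07.29 09:47:42.477[9.035][33668]",   # InfluxDB, first run
        "WCCOAui1:2016.07.29 09:57:03.475[8.41][33669]",    # PostgreSQL, first run
    ]
    for line in lines:
        print(f"{values_per_second(line):8.0f} values/s  {line}")
```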

Streaming & Complex Event Processing (CEP) & EPL/CQL with WinCC OA…

Complex Event Processing (CEP) and event series analysis are used for detecting situations among events. EsperTech provides the Event Processing Language (EPL) designed for concisely expressing situations and fast execution against both historical and currently-arriving events.

The Esper EPL is quite powerful; details can be found in the Esper documentation. I also found some slides about Esper. The EPL is a CQL (continuous query language): after a statement is created it runs continuously and results are streamed to listeners. In this prototype a listener sends the results back to WinCC OA datapoints.

A WinCC OA API-Frontend-Manager gathers all value changes from WinCC OA and publishes them via ZeroMQ. The WinCC OA CEP Manager, with the open source Esper engine, subscribes to the Frontend-Manager to get the value changes. The advantage is that many subscribers can be connected to the Frontend-Manager without increasing the load on the WinCC OA system (based on the ideas from CERN).

With the WinCC OA CEP Manager we can define EPL / CQL statements in WinCC OA and the result streams are passed back to WinCC OA on datapoints, where the results can be processed further.

Some simple EPL examples:

Calculate 5 minute average values with intermediate results every 1 minute:

select avg(value), min(value), max(value) 
from event(tag='System1:Meter_Input.Watt').win:time(5 min) 
output snapshot at (*/1, *, *, *, *)
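
What the statement above computes can be sketched in Python as a sliding time window that is queried once a minute. This is only an illustration of the windowing idea, not how Esper implements it; the class and method names are made up, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class TimeWindow:
    """Keep (timestamp, value) pairs from the last `length_s` seconds."""

    def __init__(self, length_s=300):
        self.length_s = length_s
        self.items = deque()

    def add(self, ts, value):
        """Add a value and drop everything that has left the window."""
        self.items.append((ts, value))
        self._expire(ts)

    def _expire(self, now):
        # Items are ordered by time, so expired ones are always at the left.
        while self.items and self.items[0][0] <= now - self.length_s:
            self.items.popleft()

    def snapshot(self, now):
        """Return (avg, min, max) of the values in the window, or None if empty."""
        self._expire(now)
        if not self.items:
            return None
        values = [v for _, v in self.items]
        return sum(values) / len(values), min(values), max(values)


if __name__ == "__main__":
    window = TimeWindow(length_s=300)
    for ts, watt in [(0, 10.0), (60, 20.0), (290, 30.0)]:
        window.add(ts, watt)
    print(window.snapshot(299))  # (20.0, 10.0, 30.0)
    print(window.snapshot(300))  # (25.0, 20.0, 30.0)
```

Calling `snapshot` on a one-minute timer corresponds to the intermediate results the `output snapshot` clause produces.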

With pattern matching complex event sequences can be observed with EPL. A simple example is: detect if datapoint B is set after datapoint A (A->B), and its value is higher than the value of A.

select a.value, b.value
from pattern [a=Event(tag='System1:Analog1.Input') -> every b=Event(tag='System1:ExampleDP_Trend1.' and b.value>a.value)]
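
A simplified reading of this pattern, written as a Python generator: wait for the first event of A, then report every later event of B whose value is higher than that A value. Because there is no `every` in front of `a`, only the first A is used. The function name and the short tags in the example are placeholders for this sketch.

```python
def followed_by_higher(events, tag_a, tag_b):
    """Yield (a_value, b_value) for each B after the first A with b > a.

    events: iterable of (tag, value) pairs in arrival order.
    """
    a_value = None
    for tag, value in events:
        if a_value is None:
            # Still waiting for the first A; everything else is ignored.
            if tag == tag_a:
                a_value = value
        elif tag == tag_b and value > a_value:
            yield a_value, value


if __name__ == "__main__":
    stream = [("B", 5), ("A", 10), ("B", 8), ("B", 12), ("A", 1), ("B", 11)]
    print(list(followed_by_higher(stream, "A", "B")))  # [(10, 12), (10, 11)]
```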

Get a notification when a datapoint is changing more than 100 times within 10 seconds:

select tag, count(value) 
from Event.win:time(10 sec) 
group by tag having count(*) > 100

Get a notification when a datapoint changes and there is no following value change within the next 10 seconds. For example: if meters are normally changing every 5 seconds, possible broken meters/interfaces can be detected with EPL:

select a.tag, count(*) from pattern 
[every a=Event -> (timer:interval(10 sec) and not Event(tag=a.tag))] 
group by a.tag
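
The "no follow-up within 10 seconds" check can be sketched in Python over a recorded list of events. This is a batch approximation of the pattern above under simplifying assumptions: events are sorted by time, the boundary case of exactly 10 seconds counts as still alive, and `end_time` is an extra parameter introduced here to decide whether the last event of each tag has been silent long enough.

```python
def silent_tags(events, timeout_s=10.0, end_time=None):
    """Find events that were not followed by the same tag within timeout_s.

    events: list of (timestamp_s, tag) pairs sorted by timestamp.
    Returns a list of (tag, timestamp_of_last_event_before_the_gap).
    """
    last_seen = {}
    gaps = []
    for ts, tag in events:
        previous = last_seen.get(tag)
        if previous is not None and ts - previous > timeout_s:
            gaps.append((tag, previous))
        last_seen[tag] = ts
    if end_time is not None:
        # Tags that have been quiet since their last event.
        for tag, previous in last_seen.items():
            if end_time - previous > timeout_s:
                gaps.append((tag, previous))
    return gaps


if __name__ == "__main__":
    recorded = [(0, "m1"), (0, "m2"), (5, "m1"), (5, "m2"), (10, "m2"), (20, "m1")]
    print(silent_tags(recorded, end_time=30))  # [('m1', 5), ('m2', 10)]
```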

Other examples can be found here.

Attached is a screenshot of a simple panel where EPL statements can be defined and observed.


Other examples for CEP with Esper:

Streaming WinCC OA Events to Apache Kafka and Spark…

We connected an Event Data Logger, written in Java and connected to WinCC OA via JNI, to Apache Kafka, a high-throughput distributed messaging system.

Apache Spark can read data from Apache Kafka streams. Apache Spark is a fast and general engine for large-scale data processing. It combines SQL, streaming, and complex analytics. Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.


Java is everywhere… WinCC OA Java Manager…

From laptops to datacenters, game consoles to scientific supercomputers, cell phones to the Internet, Java is everywhere! Now connected to WinCC OA.

Here you will find an implementation of an API to connect WinCC OA to Java. It is based on the WinCC OA native API and JNI. This version is for WinCC OA 3.14 on Windows. The same works on Linux, but the Linux build is not in the zip; if you want it for Linux, just send me an email.

An example of a dpSet in Java is shown below; more examples are in the zip/source directory. The JClient class is an easy-to-use static class. It should be thread safe, and callback functions are processed in a separate thread, so that the main WinCC OA thread/loop will not be blocked by callback functions.

Based on that, a NoSQL database logger was created; it is able to handle up to 40000-50000 value changes by dpQueryConnect. MQTT was connected to WinCC OA with a few lines of code (see WinCC OA and MQTT).

Download from GitHub:

How to use/install? The zip contains a ReadMe.txt with step-by-step instructions.

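// Create the manager, then set two datapoint values with one dpSet.
JManager m = new JManager();
ret = JClient.dpSet()
 .add("System1:ExampleDP_Trend1.:_original.._value", new FloatVar(Math.random()))
 .add("System1:ExampleDP_SumAlert.:_original.._value", new TextVar("hello world"))
 .await()        // assumption: wait for the answer (the call chain is cut off in the original)
 .getRetCode();  // assumption: read the return code of the answer
Debug.out.log(Level.INFO, "retCode={0}", ret);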