The second episode of the “Cloud stories from Norway” video show is ready to watch! In this 40-minute show we feature Azure cloud best practices, patterns, and tips & tricks used in production in the products and services of well-known Norwegian companies.
You can watch this episode on demand. Just fill in the form and you will immediately receive a link to the video by email.
For this episode, we invited Bent Eikmo (CTO at Oss Norge AS) and Sebastien Didierjean (Data Scientist at Oss Norge AS) to present the Azure architecture decisions they made and successfully implemented to build a hardware + software + cloud solution that helps people and companies get useful information about the power consumption in their apartments and offices. You are welcome to read about how Oss Norge uses the Azure cloud to create a greener energy market in their Customer Story on our blog.
In their short, very focused technical sessions, Bent and Sebastien presented:
- The technical journey of building an IoT device and sending data to the cloud using Azure IoT Hub
- The machine learning infrastructure used in production at scale, which includes Azure Data Explorer and Databricks
- Many more cloud tips & tricks to skill you up on Azure!
During the streaming session, we received lots of technical questions from our attendees. Here are the answers to some of them:
- Q: Did you reduce the amount of data, or only the bandwidth used?
A: We didn’t reduce the amount of data, but the amount of overhead, by packing more data into a single IoT Hub frame.
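The packing idea above can be sketched in plain Python. The payload shape, field names, and the 4 KB budget are illustrative assumptions, not Oss Norge’s actual schema:

```python
import json

# Hypothetical second-level meter readings (unix timestamp, watts);
# the field names and values are invented for illustration.
readings = [{"ts": 1700000000 + i, "w": 1500 + i} for i in range(60)]

def pack_batch(readings, max_bytes=4096):
    """Pack as many readings as fit into one message body.

    Sending one batched payload instead of one message per reading
    amortises the fixed per-message overhead (headers, framing),
    which is the effect described in the answer above.
    """
    batches, current = [], []
    for r in readings:
        candidate = current + [r]
        if len(json.dumps(candidate).encode()) > max_bytes and current:
            batches.append(json.dumps(current))  # flush the full batch
            current = [r]
        else:
            current = candidate
    if current:
        batches.append(json.dumps(current))
    return batches

batches = pack_batch(readings)
print(len(batches), "message(s) instead of", len(readings))
```

In a real device, each batch string would become the body of a single IoT Hub device-to-cloud message instead of sixty separate ones.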
- Q: Is ADX like Google BigQuery?
A: Yes, it is similar. It is more ergonomic and powerful (in our case!).
- Q: Is Azure Data Explorer easy to configure to ingest data from non-Azure sources, e.g. Google Bigtable or BigQuery? Do you have any experience with that?
A: I have limited experience with such compatibility. We use ADX for both storage and data wrangling, with ingestion from IoT Hub.
- Q: What kind of resolution are you working with on the data?
A: We are working at the seconds level.
- Q: Which programming language is being shown here?
A: It is mainly KQL and Python (in the sandbox).
- Q: Why not do all the data processing in Databricks, rather than using an additional framework (ADX) for the initial predictions?
A: Since we use ADX for storage, it is easier to perform the first step of data wrangling/ML directly in ADX. We only use a few hours of Databricks cluster time per day.
- Q: Why are you doing some machine learning in Spark rather than using ADX for everything? To me this sounds like a complication.
A: ADX is limited in terms of ML, and its DevOps capabilities are not sufficient to cover our needs.
- Q: Do you use ADX also for data storage, and how do you handle data retention?
A: Yes, we do. We keep the last 31 days in hot (SSD) storage.
- Q: If I remember correctly, you also have an API that can be consumed? I assume that working with electricity data involves a lot of volume. If so, did the volume of data create any challenges in building the API?
A: The raw data is not exposed through the API; only aggregated data is presented at the moment.
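As a rough illustration of the kind of aggregation that could sit behind such an API — the sample schema, 10-second interval, and hourly buckets are assumptions, not Oss Norge’s actual implementation:

```python
from collections import defaultdict

# Hypothetical power samples: (unix_ts, watts), one every 10 seconds
# over two hours, at a constant 1000 W for easy checking.
readings = [(3600 * h + s, 1000.0) for h in range(2) for s in range(0, 3600, 10)]

def hourly_energy_kwh(readings, interval_s=10):
    """Aggregate instantaneous power samples into per-hour energy (kWh).

    Each sample covers `interval_s` seconds, so its energy contribution
    is watts * interval_s / 3600 Wh; divide by 1000 for kWh.
    """
    buckets = defaultdict(float)
    for ts, watts in readings:
        buckets[ts // 3600] += watts * interval_s / 3600 / 1000
    return dict(buckets)

print(hourly_energy_kwh(readings))
```

An API serving these hourly (or daily) buckets moves orders of magnitude less data than one exposing the raw second-level stream.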
- Q: Could you tell us what, in your experience, are the main advantages of doing the preprocessing in ADX vs. Spark?
A: For a small startup, ADX offers out-of-the-box tools for advanced data wrangling (less coding), and development is faster since the queries are developed directly where the data is stored.
- Q: How much did you reduce costs in total by optimizing (Azure, code, and IoT)?
A: It is hard to say because we are not completely done optimizing, but I would say about 70%.
- Q: How do you store the archive data in Azure?
A: We are not moving data to additional archive storage at the moment.
- Q: Do you have a solution for including the different grid tariffs (nettleie) when presenting the cost/price of the consumed electricity to each user?
A: We take into account the zones where customers live. Support for the exact grid tariffs is under development.
- Q: For how long do you keep the data? Is it deleted afterwards or archived somewhere else?
A: As long as the user is a customer, we keep the data in our system (ADX)!
- Q: Could a machine learning model trained in AzureML be used directly in ADX to achieve the same capability as SparkML?
A: I am a bit unsure about the level of performance (it depends on the cluster configs/costs) but a model trained with AzureML should work with the Python sandbox. Let me know if you manage to couple ADX and AzureML!
- Q: Is the data stored as JSON?
A: ADX handles the storage format internally; it looks JSON-like.
- Q: Do you push code to the devices as well with IoT Hub?
A: Yes, we are using IoT Hub to control the update process.
- Q: Have you considered the lifecycle of the data? For how long is the data planned to be stored?
A: As long as a user remains a customer, we do not impose any storage limit based on the age of the data.
- Q: I understand that customers’ anticipation and awareness might be one of the goals, but what is the future for Oss Norge and its product?
A: As of today, Oss Norge addresses both the private and business markets with products oriented towards energy prediction and awareness. We are working on improving our current products and on optimising energy savings in the business segment with highly competitive solutions.
- Q: Could you provide some thoughts on what types of workloads you would use ADX and Databricks for going forward, in both the data engineering and machine learning areas? ADX seems to be very effective for less complex stuff (simple data transformation, basic built-in ML?), whereas Databricks provides full flexibility (and, perhaps, added scalability). Do you also see it this way, and if so, where would you say that you hit that «complexity threshold» on data engineering and ML complexity where Databricks becomes more effective?
A: So far, ADX is scaling very smoothly, so we are not considering having Databricks take over part of the ETL. The presence of the Python sandbox in ADX makes it flexible enough to cope with our future developments.
- Q: How did Azure help you ensure data quality?
A: I would say that ensuring optimal data quality is one of our primary missions as data scientists. ADX, with its built-in visualisation tools, boosts productivity for all tasks related to data quality.
- Q: Do you delete the data after some time or archive it in some way (e.g. so that you can perform long term analysis later)?
A: All data is stored within ADX at the moment for later analysis.
How to contact Bent Eikmo:
How to contact Sebastien Didierjean:
All episodes of the “Cloud stories from Norway”
Next events from Microsoft Norway?
How to follow all of Microsoft Norway’s and the local tech communities’ events about the Azure cloud for developers – conferences, seminars, workshops, trainings, webinars, etc.? Just follow our technical Twitter account https://twitter.com/MSDevNo or the hashtag #MSDevNo
To stay connected:
Feel free to send your questions about Azure cloud technology and educational events here: maxim.salnikov@microsoft.com
Maxim Salnikov
Developer Engagement Lead at Microsoft Norway