Things You Can Do with the Kangas Library in Data Science

In-depth Analysis of Kangas Library using Python

Pranjal Saxena
Heartbeat


Photo by James Wainscoat on Unsplash

Working with large datasets has always been a challenge for data developers, and it remains so in the current data industry. One of the main issues developers face is how to efficiently handle and process massive volumes of multimedia data, such as images. However, technological advances have led to the development of specialized software, such as Kangas, that is designed to handle these large datasets easily.

Kangas, developed by the team at Comet, is an open source tool that allows data developers to load, sort, group, and visualize millions of images at once without the risk of crashing their notebooks. Data developers no longer have to worry about the limitations of working with large datasets and can focus on analyzing and interpreting the data.

In this article, I will provide a detailed overview of Kangas, including information on how to install it and its advantages over other Python libraries. I will also delve into the features and capabilities of Kangas, giving readers a better understanding of how it can help them work with large datasets more efficiently and effectively.

What is Comet?

Comet is an MLOps platform that offers a suite of tools for machine-learning experimentation and data analysis. It is designed to make it easy to track and monitor experiments and conduct exploratory data analysis (EDA) using popular Python visualization frameworks. The integration of these frameworks allows Comet users to quickly and easily gain insights from their data, making it an essential tool for data scientists and machine-learning engineers.

Introducing Kangas

Kangas is a powerful software application for working with large amounts of multimedia data. Developed by Comet engineers, it is designed to make it easy for data developers to discover, analyze, and display massive datasets. Its simple Python API allows for logging large data tables, and its intuitive visual interface allows for executing complex queries on your data.

One of the key features of Kangas is its web-based UI, which is designed for fast, easy visualization. Using server-side rendering (React Server Components), it can quickly render visualizations while executing queries such as filtering, sorting, grouping, and reordering columns. The result is a seamless, efficient experience when working with large datasets.

In addition to its user-friendly interface, Kangas has several advantages over other Python libraries. Some of these include:

Increased Performance

One of the key benefits of Kangas is how dramatically it improves performance when working with large datasets. With Kangas, users no longer have to wait for data to load or render visualizations one item at a time. Instead, Kangas can load millions of data points at once and group, sort, filter, and visualize them in seconds through the Kangas UI, saving time and making work with large datasets far more efficient.

Storage Capacity

Another advantage of Kangas is how it stores data. Kangas maintains its datasets in a SQL database rather than as in-memory Python objects, which keeps performance steady as data grows and lets users run complex queries against their datasets. This is in contrast to libraries that hold datasets as objects in memory, where performance degrades as the data increases. Kangas's database-backed storage is more scalable and better suited to large amounts of data.
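To see why database-backed storage helps, here is a minimal stdlib-only sketch of the idea using SQLite. This is an illustration of the storage concept, not Kangas's actual schema or API: rows live in the database, and a filter-group-aggregate query runs there, so only the small result set ever reaches Python.

```python
import sqlite3

# Illustration only: a tiny SQLite-backed table, mimicking the idea
# behind Kangas's database storage (not its real schema).
conn = sqlite3.connect(":memory:")  # a real app would use a file on disk
conn.execute("CREATE TABLE datagrid (category TEXT, loss REAL)")

# Insert rows; they live in the database, not as Python objects.
rows = [("dog", 0.9), ("cat", 0.5), ("dog", 0.1), ("duck", 0.7)]
conn.executemany("INSERT INTO datagrid VALUES (?, ?)", rows)

# Filtering, grouping, and aggregating all happen inside the database,
# so memory use stays flat even as the table grows to millions of rows.
result = conn.execute(
    "SELECT category, AVG(loss) FROM datagrid "
    "WHERE loss < 0.95 GROUP BY category ORDER BY category"
).fetchall()
print(result)  # [('cat', 0.5), ('dog', 0.5), ('duck', 0.7)]
```

The same pattern scales to on-disk files, which is why a DataGrid can hold far more data than would fit comfortably in a notebook's memory.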

Flexibility

One of the standout features of Kangas is its flexibility, allowing it to receive data from various sources. These sources include existing DataGrids, CSV files, and Pandas DataFrames, making it easy to integrate Kangas with existing systems and workflows. In addition, Kangas is designed to operate in any environment, whether locally on a laptop or as a stand-alone program, making it a versatile tool that can adapt to different needs and use cases.

Another aspect of Kangas's flexibility is that users can customize and extend the platform. For example, users who need features Kangas does not yet offer can fork the repository on GitHub and add them. This open-source approach lets users tailor the platform to their specific needs and use cases.


Debugging

Another key benefit of Kangas is its built-in sorting, grouping, and filtering features, which make it easy to debug and troubleshoot models and their outputs. These features let users quickly isolate and identify issues in the data, saving time and effort when working with large datasets. The intuitive UI also makes it easy to navigate and explore the data, surfacing patterns, outliers, and other insights that can inform the debugging process.
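To make that debugging workflow concrete, here is a small stdlib-only sketch of the kind of filter-then-group pass the Kangas UI performs for you. The data and threshold are hypothetical, and this is a conceptual illustration rather than Kangas code: we isolate high-loss predictions, then count them per label to see which class is failing.

```python
from collections import Counter

# Hypothetical model outputs as (label, loss) pairs. In Kangas, rows
# like these would live in a DataGrid and be filtered/grouped in the UI.
predictions = [
    ("dog", 0.12), ("cat", 1.45), ("dog", 0.08),
    ("cat", 1.90), ("duck", 0.33), ("cat", 0.21),
]

# Step 1: filter — keep only the rows worth debugging (high loss).
suspicious = [(label, loss) for label, loss in predictions if loss > 1.0]

# Step 2: group — count failures per label to spot the problem class.
failures_per_label = Counter(label for label, _ in suspicious)
print(failures_per_label.most_common())  # [('cat', 2)]
```

Here the grouping immediately shows that the "cat" class accounts for all of the high-loss rows, which is exactly the kind of insight the Kangas UI surfaces interactively.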

Kangas is a valuable tool for data developers working with large datasets. It offers a range of features and capabilities that make it easy to gain insights from the data.

Getting started with Kangas

Installing Kangas

Kangas is installed the way most Python libraries are: with pip.

pip install kangas

Using Kangas library

import kangas as kg

Load an existing DataGrid

dg = kg.read_datagrid("https://github.com/caleb-kaiser/kangas_examples/raw/master/coco-500.datagrid")

After loading the data, we can preview it using dg.head() or dg.tail().

Build a DataGrid from a CSV

dg = kg.read_csv("/path/to/your.csv")

Build a DataGrid from a Pandas DataFrame

import kangas as kg
import pandas as pd

df = pd.DataFrame({"hidden_layer_size": [8, 16, 64], "loss": [0.97, 0.53, 0.12]})
dg = kg.read_dataframe(df)

Construct a DataGrid manually

import random
import datetime

import kangas as kg

dg = kg.DataGrid(name="Example 1", columns=["Category", "Loss", "Fitness", "Timestamp"])

# Append 1,000 rows of random example data
for i in range(1000):
    dg.append([
        random.choice(["dog", "cat", "mouse", "duck"]),
        random.random() - 2.0,
        random.random() * 10,
        datetime.datetime.now(),
    ])

dg.save()

Visualizing the data

After creating a DataGrid with any of these methods, we can use dg.show() to view and explore the data.

Data View

Here, we can filter the data, sort it by different columns, group it by different columns, and much more.

Grouped by category

We can also filter the data based on different conditions. More details on the filter expressions can be found on the Kangas module page.

Conclusion

Kangas is still a relatively new tool; currently, only a handful of beta users are testing it. The development team is actively working on improving and expanding the platform’s capabilities, and the community’s feedback and contributions will shape the project’s future direction.

As an open-source project, Kangas is free to use and contribute to. The development team encourages the community to test and provide feedback on the current version of Kangas and contribute to developing new features and improvements. This can include bug reporting, feature requests, and even contributing code.

I recommend visiting the Kangas repository on GitHub if you’re interested in using Kangas or contributing to the development. You can find more information on how to use Kangas and instructions on how to contribute. You can stay up-to-date on new releases and developments by following or starring the repository.

I believe that Kangas has a lot of potential, and I’m excited to see how the community will shape the platform’s future development. The Kangas team is open to feedback and suggestions, and I’m confident that with the community’s help, Kangas will become an essential tool for data developers working with large datasets.

Before you go…

If you liked this article and want more on Python and data science, consider becoming a Medium member using my referral link: https://pranjalai.medium.com/membership.

If you sign up through my referral link, a portion of the membership fee goes to me, which motivates me to keep writing about Python and data science.

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.

If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletter (Deep Learning Weekly), check out the Comet blog, join us on Slack, and follow Comet on Twitter and LinkedIn for resources, events, and much more that will help you build better ML models, faster.
