2024 Small Data SF

Enhancing The Scalability And Usability Of Visualization Toolkits

Data visualization should allow users to quickly understand their data and intuitively communicate key insights that inform decision-making. However, the process is still not as straightforward, performant, or intuitive as it should be.
With increased data volumes and usage in industry and science, visualization toolkits must support scalable computation for interactive exploration and adapt to audiences with diverse design and technical backgrounds.
This talk will discuss common scalability and usability challenges related to visualization toolkits and propose enhancements to make them easier to use, more flexible, and ultimately more beneficial for systems built on top of them.
Speaker
Junran Yang

PhD Student

University of Washington

Junran is a PhD student at the University of Washington, where she works in the Interactive Data Lab and the UW Database Group. Her research focuses on developing scalable interactive systems that help users effectively understand and communicate data.

0:00 [Music]

0:16 Ideally, everyone should be able to use intuitive data interfaces to make sense of their data and figure out how to clean it, model it, and extract insights from it to make decisions. When we talk about data interfaces here, you can think of both the front end that users interact with and also the

0:42 back end that processes and transforms the data in response. So why are existing systems still not able to support scalable and interactive exploration? On the one hand, we have visualization systems that are very useful and engaging tools for data exploration, but they are not scalable enough because they assume the data is either small enough to fit into

1:14 our browsers or has to be processed somewhere else

1:22 first. On the other hand, we have

1:26 database management systems that can support scalable but not interactive exploration. So naturally we would want to combine the two to integrate scalability with interactivity. However, simply attaching them together won't always work well, because there are many challenges we need to deal with. So for the rest of the talk, I want to talk

1:55 about these challenges first, and for each of them I'll give some examples of current partial solutions. Towards the end of the talk, I'm going to share some ongoing and future work on how to jointly optimize both perceptual and computational performance, and hopefully generalize. The first challenge here is effective design.

2:24 To create good visualizations, we have to consider the task, data type, and domain, and apply good design principles. For every good design, there are a dozen more that are either confusing or misleading. And as your data volume increases, if we keep the same multi-line design here, you see that it

2:52 obviously affects the

2:56 latency to process and render this many lines, but more importantly, it also affects our ability to mentally process the visuals. So perceptual scalability breaks down, and one solution here is to

3:19 find a way to summarize the data meaningfully. Here we use binning.

3:26 In this example, we can see some individual lines at the top, a collective dip around the crash of 2008, and two bands

3:42 which are the $25 and $15 stocks.
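To make the binning idea concrete, here is a minimal Python sketch. It assumes every series (for example, one stock's price history) is sampled at the same time steps, and it counts how many series pass through each (time step, value bin) cell; the resulting grid can be drawn as a heatmap instead of thousands of overlapping lines. The function `bin_line_series`, its parameters, and the sample values are hypothetical and illustrative only. The sketch bins sampled points only: it does not interpolate between time steps or normalize each line's contribution, which a production density-plot technique may do.

```python
from typing import List, Sequence


def bin_line_series(series: Sequence[Sequence[float]], n_value_bins: int,
                    vmin: float, vmax: float) -> List[List[int]]:
    """Count how many series fall into each (time step, value bin) cell.

    Assumes all series share the same time steps. The grid has one row per
    time step and n_value_bins columns, however many series are passed in.
    """
    n_steps = len(series[0])
    width = (vmax - vmin) / n_value_bins
    grid = [[0] * n_value_bins for _ in range(n_steps)]
    for values in series:
        for t, v in enumerate(values):
            if v < vmin or v > vmax:
                continue  # out-of-range values are skipped in this sketch
            # Equal-width bins; a value equal to vmax goes into the last bin.
            b = min(int((v - vmin) / width), n_value_bins - 1)
            grid[t][b] += 1
    return grid


# Illustrative values, not real stock data.
stocks = [[10.0, 12.0, 11.0], [10.5, 12.5, 30.0], [24.0, 25.0, 26.0]]
print(bin_line_series(stocks, n_value_bins=3, vmin=0.0, vmax=30.0))
# [[0, 2, 1], [0, 2, 1], [0, 1, 2]]
```

The grid size depends only on the number of time steps and value bins, not on the number of series, which is what keeps both rendering and reading the chart manageable as the number of lines grows.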

3:49 Besides binning, there are many other useful scalable plotting techniques, such as modeling and sampling. Another challenge is scalable interaction. You could throw in all the layers of components and libraries, but your viewers will still expect the interface and dashboard to be able to

4:17 interact smoothly. So your architecture necessarily becomes more complex, and it's also non-trivial to

4:29 optimize it and combine these system components. To show you an example of common interactions used in dashboards and interfaces, this one shows 1.8 billion stars in the

4:51 galaxy. The largest view here is a

4:55 raster sky map of the Milky Way. Towards the bottom, when we brush the histogram to select the high-parallax stars, the plot of stellar color versus magnitude on the right reveals a pattern.

5:23 This is a demo from a project called Mosaic that I want to highlight.

5:29 Mosaic analyzes interface specifications and maps them into query specifications. It works by modularizing dashboard architecture: rendering, interaction, and data processing are separate components that communicate with each other using a common language, SQL.
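The idea of views communicating through SQL can be illustrated with a small crossfilter sketch. This is not Mosaic's actual API: the class `CrossfilterSelection`, the function `histogram_query`, the `stars` table, and its rows are hypothetical, and an in-memory SQLite database stands in for the back end. A brush on one histogram becomes a parameterized SQL predicate, and every other view re-issues its aggregation query with that predicate applied.

```python
import sqlite3
from typing import Dict, Tuple


class CrossfilterSelection:
    """Hypothetical shared selection: each brushing view contributes one SQL predicate."""

    def __init__(self) -> None:
        self.clauses: Dict[str, Tuple[str, Tuple[float, ...]]] = {}

    def update(self, source: str, column: str, lo: float, hi: float) -> None:
        # A brush over [lo, hi) on `column` becomes a parameterized predicate.
        # Column names are interpolated directly, so they must come from trusted specs.
        self.clauses[source] = (f"{column} >= ? AND {column} < ?", (lo, hi))

    def predicate_for(self, view: str) -> Tuple[str, Tuple[float, ...]]:
        # Crossfilter semantics: a view is filtered by every brush except its own.
        parts, params = [], []
        for source, (sql, p) in self.clauses.items():
            if source != view:
                parts.append(f"({sql})")
                params.extend(p)
        return (" AND ".join(parts) or "1 = 1", tuple(params))


def histogram_query(view: str, column: str, bin_width: float,
                    sel: CrossfilterSelection) -> Tuple[str, Tuple[float, ...]]:
    """Build the binned-count query a histogram view would send to the database."""
    where, params = sel.predicate_for(view)
    sql = (f"SELECT CAST({column} / ? AS INTEGER) AS bucket, COUNT(*) AS n "
           f"FROM stars WHERE {where} GROUP BY bucket ORDER BY bucket")
    # The bin-width placeholder appears before the WHERE placeholders in the SQL text.
    return sql, (bin_width,) + params


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stars (parallax REAL, color REAL)")
db.executemany("INSERT INTO stars VALUES (?, ?)",
               [(0.5, 1.0), (1.5, 2.0), (2.5, 2.2), (3.5, 0.4)])

sel = CrossfilterSelection()
sel.update("parallax_hist", "parallax", 2.0, 4.0)  # brush the high-parallax stars
print(db.execute(*histogram_query("color_hist", "color", 1.0, sel)).fetchall())
# [(0, 1), (2, 1)]
```

Because each view only emits a query and consumes a result table, rendering, interaction, and data processing stay decoupled, and the database does the heavy lifting.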

5:53 So the last challenge is that optimization strategies are design-dependent. Besides Mosaic, there are many prior research projects in this area of scalable visualization. For example, imMens and Falcon use client-side indexing or pre-aggregated data cubes to speed up

6:23 the brushing and linking interactions we saw in the previous demo. Another project, Kyrix,

6:34 uses prefetching based on how users interact with the interface. Users can zoom in or pan over the data,

6:46 and the system prefetches data based on where it predicts the user will go next. So from these challenges and partial solutions, we can see some key principles for improving performance. First, perceptual and interactive scalability should be limited by the chosen

7:14 resolution of the visualized data rather than the volume of the raw data. And

7:22 second, the interface constrains the query scope, so the problem space is more constrained. Third, users exhibit consistent patterns of interaction during their exploratory

7:42 analysis.
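The first principle can be illustrated with a pre-aggregated cube in the spirit of the data-cube approaches mentioned earlier. This is a simplification, not imMens's or Falcon's actual implementation, and the function names, bin counts, and sample rows are hypothetical. Rows are binned once up front, and cumulative counts over the brushed dimension let each brush be answered by subtracting two rows of the cube, so the cost of an interaction depends on the number of bins rather than the number of raw rows.

```python
from typing import List, Sequence, Tuple


def to_bin(v: float, lo: float, hi: float, n: int) -> int:
    """Map v in [lo, hi] to one of n equal-width bins, clamping at the edges."""
    return max(0, min(int((v - lo) / (hi - lo) * n), n - 1))


def build_cumulative_cube(xs: Sequence[float], ys: Sequence[float],
                          x_bins: int, y_bins: int,
                          x_range: Tuple[float, float],
                          y_range: Tuple[float, float]) -> List[List[int]]:
    """Return cum where cum[i][j] counts rows with x-bin < i and y-bin == j.

    This pass touches every raw row; it is meant to run once, before interaction.
    """
    counts = [[0] * y_bins for _ in range(x_bins)]
    for x, y in zip(xs, ys):
        counts[to_bin(x, *x_range, x_bins)][to_bin(y, *y_range, y_bins)] += 1
    cum = [[0] * y_bins]
    for row in counts:  # prefix sums over the x (brushed) dimension
        cum.append([c + r for c, r in zip(cum[-1], row)])
    return cum


def brush(cum: List[List[int]], x_lo_bin: int, x_hi_bin: int) -> List[int]:
    """Y-histogram of rows whose x-bin lies in [x_lo_bin, x_hi_bin).

    Costs O(y_bins) per brush, independent of the number of raw rows.
    """
    return [hi - lo for hi, lo in zip(cum[x_hi_bin], cum[x_lo_bin])]


xs = [0.1, 0.4, 0.6, 0.9]
ys = [0.1, 0.2, 0.6, 0.9]
cube = build_cumulative_cube(xs, ys, 2, 2, (0.0, 1.0), (0.0, 1.0))
print(brush(cube, 1, 2))  # [0, 2]: the two high-x rows, both in the upper y-bin
```

The trade-off is up-front work and memory proportional to the number of bin combinations; that is acceptable when, as the second principle notes, the interface fixes which dimensions can be brushed.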

7:48 Okay. I realize I hadn't finished this slide. For this one, I haven't really talked about the challenge of design dependence: the prior work does

8:03 optimize performance, but these techniques are designed for a fixed interface or fixed interactions, and they can't generalize to a broader design space. So given these lessons learned, we want to ask: can we leverage them while still generalizing to the larger design

8:28 space? As one attempt, one of my

8:33 past projects looked into how we can build optimization directly into declarative visualization languages, because they are the building blocks of many visualization systems.

8:46 There are, of course, many visualization languages, and unlike Mosaic, they might not be as well organized or integrated with SQL. One language in

8:59 particular is Vega, because many current languages are inspired by it. Vega provides building blocks to compose a variety of designs and interactions by declaratively mapping data to visual components. But everything, including data transformation, happens in the dataflow it compiles into, which runs in the JavaScript runtime. So of course we

9:30 have scalability issues. So how can we optimize the Vega dataflow? We break this down into more specific questions. When we visualize larger datasets, can we still use native Vega in JavaScript, and when should we migrate the work to a separate back end? And is there an easier way to do this, so that users don't

9:56 have to be database administrators? In this project, I translate the dataflow nodes into SQL queries, enumerate hundreds of possible plans, and use some really simple machine learning models to rank the plans and pick the best one through pairwise comparison. You might find this very similar to traditional database

10:25 query optimization, which is still a very active research problem in the database community. But these simple tricks work really well for visualization, because the visualization scenario constrains the problem: it limits the types of queries we see, and it lets us use the principles and lessons learned earlier.
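A toy version of this plan-selection idea follows, under heavy assumptions. The `Transform` type is a simplified, hypothetical stand-in for Vega transforms (not Vega's JSON schema); plans are enumerated only by choosing how many leading transforms to push into SQL, while the rest run in the client; and the comparator is a hand-weighted linear model with made-up weights rather than a trained one. It illustrates translating transforms to SQL and choosing among plans by pairwise comparison, not VegaPlus's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Transform:
    kind: str           # hypothetical subset: "filter" or "aggregate"
    arg: str            # SQL predicate (filter) or group-by column (aggregate)
    selectivity: float  # assumed estimate of output rows / input rows


@dataclass(frozen=True)
class Plan:
    split: int                   # transforms[:split] run as SQL, the rest in the client
    features: Tuple[float, ...]  # (rows sent to client, rows processed in client, uses server)


# Hand-picked weights standing in for a trained model (assumption).
WEIGHTS = (1.0, 0.5, 5000.0)


def to_sql(table: str, transforms: Sequence[Transform]) -> str:
    """Compile a prefix of the pipeline into one nested SQL query."""
    sql = f"SELECT * FROM {table}"
    for i, t in enumerate(transforms):
        if t.kind == "filter":
            sql = f"SELECT * FROM ({sql}) AS t{i} WHERE {t.arg}"
        elif t.kind == "aggregate":
            sql = f"SELECT {t.arg}, COUNT(*) AS n FROM ({sql}) AS t{i} GROUP BY {t.arg}"
        else:
            raise ValueError(f"unsupported transform: {t.kind}")
    return sql


def enumerate_plans(n_rows: int, transforms: Sequence[Transform]) -> List[Plan]:
    plans = []
    for k in range(len(transforms) + 1):
        rows = float(n_rows)
        for t in transforms[:k]:  # the server-side part shrinks the data
            rows *= t.selectivity
        sent = rows
        client_rows = 0.0
        for t in transforms[k:]:  # the client processes each remaining step's input
            client_rows += rows
            rows *= t.selectivity
        plans.append(Plan(k, (sent, client_rows, 1.0 if k > 0 else 0.0)))
    return plans


def prefer(a: Plan, b: Plan) -> bool:
    """Pairwise comparator: True if plan a is predicted to be cheaper than plan b."""
    return sum(w * (x - y) for w, x, y in zip(WEIGHTS, a.features, b.features)) < 0


def pick_best(plans: Sequence[Plan]) -> Plan:
    best = plans[0]
    for p in plans[1:]:
        if prefer(p, best):
            best = p
    return best


pipeline = [Transform("filter", "price > 10", 0.1),
            Transform("aggregate", "symbol", 0.0001)]
small = pick_best(enumerate_plans(1_000, pipeline))      # stays in the client
large = pick_best(enumerate_plans(1_000_000, pipeline))  # pushes both steps into SQL
print(small.split, large.split)  # 0 2
print(to_sql("stocks", pipeline[:large.split]))
```

With a linear comparator like this one, pairwise comparison is equivalent to ranking plans by a single score; the pairwise formulation becomes more useful when the comparator is a learned model trained on examples of which plan ran faster.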

10:58 With VegaPlus, users are still

11:02 expected to write their own charts, and we assume they will make effective designs. So how can we help them, so that they don't need to think too much about design? There are

11:21 visualization recommenders; Tableau is one example. They

11:28 can identify effective designs given the input data, and there are also systems like Foresight that can even suggest subsets of the data, for example based on statistical significance, to highlight the parts that may interest users. But simultaneously recommending interactions and adapting to data size is still an open research

11:59 problem. Recommenders are just one useful modality for authoring charts and exploring data. There are many other modalities we could improve to help users make better charts and reason about their data. As a bit of self-advertisement, I've been working on projects to provide users

12:26 with better examples and documentation. I've looked into how we can provide better abstractions and tools for them to compare and update

12:40 design knowledge bases, and I've also looked into performance profiling for these visualization languages. So what are the takeaways for today? First, supporting scalable, interactive data exploration is challenging. Second, many system builders may believe that paying attention to users and interfaces will weaken the performance of their systems. But in my own

13:14 research and the research of others, we debunk this stereotype: paying attention to the human side of the work can actually improve both the performance of techniques and algorithms and the overall experience of the people using these tools. One thing that's not on the slides but that I want to talk about

13:37 is that as academics, we build many tools that maybe no one will ever use,

13:45 and I think it would be great

13:49 if we could test them on real-world problems. At the same time, industry invents many tools that might not consider established guidelines and principles. So I think it's important that we work together to make more usable tools.

14:12 Okay. With that, I conclude my talk,

14:17 and I'm also graduating next year and looking for a job. [Music]
