Hello Public Lab community! I'm Tommy, and I'm working on a pro bono project through Autodesk to help out with improving the user experience on the publiclab.org website. You will start to see some questions popping up here from my teammates and myself to get feedback from you, the people who use the site, on what works and what doesn't, or what can be improved. Thanks in advance for taking a minute to weigh in, and helping us improve the site for everyone!
Another thing that makes tag pages sort of mysterious is that you can get to one by searching for "https://publiclab.org/tag/balloon-mapping", but if you enter "balloon-mapping" in the publiclab search box (https://publiclab.org/search/balloon-mapping) you don't get a tag page, and the search results you get are rather unhelpful. Apparently search does not include tags. There are two things about the user experience at publiclab that seem strange, and one of them is that search doesn't seem to work like I expect it to.
For me, visualizing tags is a way to depict associated tags, e.g. tags that appear together on the same content. For a great example, see the color-coded clusters in @skilfullycurled's visualization above. Clustering tags is important because it visually connects the website's presentation of community activity more closely to what the Public Lab community culturally refers to as "research areas", or perhaps "topics". This is my actual goal with this entire issue.
Here's some background information: on our tags page (https://publiclab.org/tags) we write "We use tags to group research by topic" and encourage people to browse tags (currently only sorted by recent activity). This is an important way that we name, link to, and/or prompt people to find and engage with topics. The Dashboard itself emphasizes recent activity. The Dashboard now features a "recently used tags" bar -- which is an important but partial step toward the goal of seeing "research areas" or "topics".
To move forward, I am not interested in navigating by a graphic tag visualization (so 2007!). However, the clusters of activity provide an important additional way of connecting and navigating to topics. The goal is for the tags page to show which tags are the most interconnected, to communicate the breadth of connected topics in a research area, to let people navigate and connect to a research area, and to let them subscribe appropriately. To achieve that, we do not necessarily need color-coded swooping arrows. Let's think about how to achieve these goals.
We might also consider mirroring publiclab.org/tags at publiclab.org/topics to make the language more accessible.
I left a long response to this with a possible initial solution:
Cool, thanks Liz!
To try for one stab at a narrower feature towards this goal, what if tag pages (floating new name: topic pages...!?!) had a list of "Related topics", something like:
Where "related" means that (acknowledging that there are different ways to measure this, and that we want some "computationally efficient" way) these are the tags which most commonly appear on pages that already have the primary tag. So for the topic onions, we tally the other tags on every page tagged with onions and take the top, say, five most commonly occurring ones.
Small follow-up if the above sounds good -- would it be all right to do this solely for the most recent 20-30 pages? Even if this is just a starting point, that would make this easier to implement without worrying about it causing overall website slowness. There could be more complex ways around this, but this is the easiest way to get started.
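To make the proposal above concrete, here is a rough Python sketch of the tally, limited to the most recent pages. It is only an illustration under assumptions: the function name `related_tags`, the `recent_limit` parameter, and the shape of the input (a newest-first list of page IDs with their tags) are all made up for this example and are not how the site's code is organized.

```python
from collections import Counter


def related_tags(pages, primary_tag, top_n=5, recent_limit=30):
    """Return the tags that most often appear alongside `primary_tag`.

    `pages` is a list of (page_id, tags) tuples ordered newest first
    (a hypothetical input shape). Only the `recent_limit` most recent
    pages that carry `primary_tag` are tallied.
    """
    tally = Counter()
    seen = 0
    for _page_id, tags in pages:
        if primary_tag not in tags:
            continue
        # Count each tag once per page and skip the primary tag itself.
        tally.update(t for t in dict.fromkeys(tags) if t != primary_tag)
        seen += 1
        if seen >= recent_limit:
            break
    return [tag for tag, _count in tally.most_common(top_n)]


# Hypothetical sample data, newest page first.
sample_pages = [
    (4, ["onions", "soil", "gardening"]),
    (3, ["onions", "soil"]),
    (2, ["balloon-mapping", "kites"]),
    (1, ["onions", "gardening", "soil", "compost"]),
]
print(related_tags(sample_pages, "onions", top_n=2))
```

Because the loop stops once it has seen `recent_limit` matching pages, the cost stays bounded no matter how many pages use the tag, which is the point of the "most recent 20-30 pages" idea.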
quick question -- are "primary tags" the most interconnected tags, or would there be a manual step of choosing them?
Glad to see there is a way to get started on this. While I appreciate the computational need for "recent," it is also true that I find it nerve-racking to meet my responsibility to help people find the topics they are interested in when everything is calculated by recently modified. After we get this first step done, I might have a suggestion that we lean on prior work done by @bsugar that shows these clusters, and potentially also include some that are not recently modified.
I think we should distinguish between tags and topics. I think, while we still don't use the word "topic" widely on the site, we have the opportunity to be very selective and concrete with what is deemed a "topic." This might save a headache later.
For tag groupings, I agree with Liz that it would be really great to have a way to group and view that is not related to chronology. I'd love to see primary landing pages that are always at the top, and those can have links to wikis and research notes etc. Perhaps a landing page could have a section on common tags associated with this topic, etc.
Hi @gretchen, do you have any more comments specifically on the interrelatedness of tags? That's what we are discussing here, rather than the adjacent navigation question of "primary landing pages always at the top." Here in this thread, we are working on developing how to -- for the first time -- show clusters of interrelated tags on Public Lab. I would love to hear any thoughts you have on clustering.
Hi, @liz and @gretchen - unfortunately, there is a direct link between site slowness, capacity costs, and database queries such as those required by a non-chronological tag lookup. I spent some time after talking with @liz thinking about creative ways around this, but unfortunately this is one of the main reasons @bsugar runs these types of analysis offline, and that we don't crank through this level of calculation as part of the site's functioning. Sadly, even if a calculation took, say, 10 seconds, that would bring the whole site to its knees. And since there are no limits to the number of tags per page, it's likely that some pages would take 10x or 100x that amount of time, halting other activity on the site while that runs. (If you google "optimize database slowness", you'll get about a million articles telling you NOT to run queries of this kind.)
Not to be apocalyptic! I just want to be clear that I'm not just saying it'd be hard, but that the scope of this is far greater than calculating the most recent values, which is something we can move on immediately.
Barring a database specialist, and a relatively major infrastructure project, this may be a hard limit. I'd be happy to try to plan what initial steps could be on this, but my guess was that you'd be interested in moving forward on this faster than that.
Sorry to drone on, I just wanted to explain why I'm trying so hard to think around this problem! I fully understand why this function is important, I just don't know how possible it is to implement, so I'm offering what we definitely can do.
Ah, and to your question, by "primary tag" I just meant the tag for the tag page you're looking at, so for /tag/onions, just onions. If we implemented this, the system would run automatically on all tag pages.
Hey everyone! For what it's worth I was heavily inspired by this project Tag Overflow, the code for which is here and from an initial version, here.
My version didn't require any ongoing database queries because I just exported the tags table to CSV and ran it from there. The visualization only uses the top 256 tags (I think). I'm not sure if you can have other processes concurrently running (as in workers?) but you could export that single table on a weekly basis and then run a program to calculate the rest. I understand that even a single process can block the rest of the website, but were it possible to run a separate process concurrently, it's not very computationally intensive at all to create the tag co-occurrence graph and export it to a form (JSON, GraphML) for visualization.
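For anyone curious what that offline step could look like, here is a minimal Python sketch that reads an exported tags table and builds a co-occurrence graph as JSON. It is not the code behind the visualization above. The column names `page_id` and `tag_name`, the function name `build_cooccurrence_graph`, and the nodes/links output shape are assumptions chosen for the example.

```python
import csv
import io
import json
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(csv_file, top_n=256):
    """Build a tag co-occurrence graph from an exported tags table.

    Assumes each CSV row has `page_id` and `tag_name` columns
    (hypothetical names, not the site's actual schema).
    """
    tags_by_page = defaultdict(set)
    for row in csv.DictReader(csv_file):
        tags_by_page[row["page_id"]].add(row["tag_name"])

    # Keep only the most-used tags so the graph stays small.
    usage = Counter(tag for tags in tags_by_page.values() for tag in tags)
    top_tags = {tag for tag, _count in usage.most_common(top_n)}

    # Every pair of kept tags on the same page adds 1 to that edge's weight.
    edge_weights = Counter()
    for tags in tags_by_page.values():
        kept = sorted(tags & top_tags)
        edge_weights.update(combinations(kept, 2))

    return {
        "nodes": [{"id": tag, "count": usage[tag]} for tag in sorted(top_tags)],
        "links": [
            {"source": a, "target": b, "weight": w}
            for (a, b), w in sorted(edge_weights.items())
        ],
    }


# Hypothetical export standing in for the real CSV file.
sample_export = io.StringIO(
    "page_id,tag_name\n"
    "1,onions\n1,soil\n"
    "2,onions\n2,soil\n2,compost\n"
    "3,kites\n"
)
graph = build_cooccurrence_graph(sample_export)
print(json.dumps(graph, indent=2))
```

Since this runs on a file export, it never touches the live database, so it could run on a schedule on any machine.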
I also responded over there, but I think that with all the work on the API, code cleanup and outreach, we could do a daily or weekly cached version of such a query, and be OK with 10-15 seconds total compute time per week. The rest would be run locally in the browser.
To be fair, I think that as the site grows, this query will become more and more taxing on the database, and we may need to rethink this in the future. But at present I think we can do an initial implementation of this API call to produce the unique page IDs, grouped by the tags they use, as above.
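As a sketch of the caching idea, the Python below groups unique page IDs by tag and only recomputes the result when the cached copy is older than a week. This is an assumption-heavy illustration, not the site's API: `group_page_ids_by_tag`, `CachedTagIndex`, and `fake_fetch_rows` are hypothetical names, and the real query would replace the fake row source.

```python
import time
from collections import defaultdict

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


def group_page_ids_by_tag(rows):
    """Group unique page IDs by tag; `rows` yields (page_id, tag_name) pairs."""
    grouped = defaultdict(set)
    for page_id, tag_name in rows:
        grouped[tag_name].add(page_id)
    return {tag: sorted(ids) for tag, ids in grouped.items()}


class CachedTagIndex:
    """Recompute the grouping at most once per `max_age` seconds."""

    def __init__(self, fetch_rows, max_age=ONE_WEEK_SECONDS, clock=time.time):
        self._fetch_rows = fetch_rows  # stands in for the expensive query
        self._max_age = max_age
        self._clock = clock
        self._cached = None
        self._computed_at = None

    def get(self):
        now = self._clock()
        if self._cached is None or now - self._computed_at >= self._max_age:
            self._cached = group_page_ids_by_tag(self._fetch_rows())
            self._computed_at = now
        return self._cached


calls = []


def fake_fetch_rows():
    """Hypothetical row source; records each call so cache hits are visible."""
    calls.append(1)
    return [(1, "onions"), (1, "soil"), (2, "onions"), (2, "onions")]


index = CachedTagIndex(fake_fetch_rows)
print(index.get())
index.get()  # served from the cache, so the row source is not called again
print(len(calls))
```

The browser would then work from the cached grouping, so the expensive query runs once per refresh interval rather than once per page view.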
Also, this issue, which will help us count all contributors on a topic, will help generate a broader picture of the topic-specific community:
@cfastie, I have those same questions too re: many tags, one wiki; one tag, many wikis. I'm hoping we can start creating primary wikis that are topic landing pages, which then could link to other wikis about similar topics or subtopics. That might start to solve the "many wikis, one tag" question with regards to how a wiki is selected to be the lead wiki for a "tag page."
Thank you so much for this @julvie, I've started reading your great write-up and am excited to be in touch on this.
Thank you again so much @julvie, I added so many comments, I hope it's not too much! Would you be interested in coming to a 3pm meeting next Tuesday January 30th and hosting a discussion on this with us?
Hi Liz, it's certainly great to have a discussion going. An in-person discussion would be even better but I am based in Singapore. :( I hope you get a lot of insights that could help improve the experience on the site!
I mean, stranger things have happened....:) Thanks so much @julvie!