<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Arjun Chandra's Log</title>
    <description>Arjun Chandra's Log</description>
    <link>http://www.talkingdrums.info/</link>
    <atom:link href="http://www.talkingdrums.info/feed.xml" rel="self" type="application/rss+xml" />
    
      <item>
        <title>Brua IO is on the runway</title>
        <description>&lt;p&gt;A few months ago, I told this story at bedtime to my elder son, then a 3 year old.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/images/brua_collective.png&quot; alt=&quot;&quot; class=&quot;img_left&quot; /&gt;&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;There was once a boy who made all kinds of robots. Some robots did puzzles. Some made drawings. Some made music. One day the boy heard a strange noise. He saw that someone had made robots that were taking all his friends away to far off places. His friends started missing each other. They wanted to play with each other but they just couldn’t. They were just too far away. The boy decided to make a new robot. This new robot built long sturdy bridges for each and every friend of his to come back and start playing together, and they did. The name of this robot was Brua!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The last sentence was finished by my son. He understands what Brua stands for.&lt;/p&gt;


&lt;h2 id=&quot;ai-to-expand-our-collective-potential&quot;&gt;AI to expand our collective potential&lt;/h2&gt;

&lt;p&gt;Think about powerful AIs working alongside humans. Interactions with these ought to be reliable. Interactions with these ought to be safe. Interactions with these ought to be socially responsible.&lt;/p&gt;

&lt;p&gt;Some of you may be familiar with the bicycle-for-the-mind idea notably popularised by Steve Jobs. It is a way of saying that computers amplify the capabilities of the human mind. They do, but this idea needs more colour, specifically in the context of AI, a technology being advanced in our image (for now). AI can certainly be seen as a tool that augments us at an individual level. It could engage us and expand our scope of knowledge and understanding. It could empower us to solve problems hitherto unapproachable. But something about this picture is incomplete. If everyone got so engaged, would we then find ourselves in a world where artificial companions get optimized to keep us occupied, or even entertained, at the expense of empathizing with a fellow human? Do we want humans isolating themselves with numerous AI friends? Could this form of engagement breed social isolation? And aren’t hard problems better solved in the company of other human minds anyway?&lt;/p&gt;

&lt;p&gt;Instead, what about building bridges on which those bicycles could run? What about working to engage humans in deeper conversations? We humans find meaning in doing things to serve one another. We ought to be each other’s companions, driving the adoption of AI alongside endeavours that push the world forward. The hard things we do are not done by humans in isolation; these endeavours are collective by nature. What about building AI products that bring human minds together to have transformative collaborative experiences, and so expand our collective potential? For me at least, this latter picture is more agreeable as a model for AI adoption.&lt;/p&gt;

&lt;h2 id=&quot;recap-2023&quot;&gt;Recap 2023&lt;/h2&gt;

&lt;p&gt;2023 was the year of Brua’s founding, by which I mean setting up the foundational structures across product, team, and early backers.&lt;/p&gt;

&lt;p&gt;Here are some of the highlights:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Q1: &lt;a href=&quot;https://www.talkingdrums.info/engineering/2023/01/12/brua/&quot;&gt;Made demos&lt;/a&gt; that employ LLMs to reliably mediate access to information.&lt;/li&gt;
  &lt;li&gt;Q1: &lt;a href=&quot;https://www.talkingdrums.info/engineering/2023/02/21/brua-cuttingedgeai/&quot;&gt;Promoted the idea&lt;/a&gt; that Norway is the ideal place for founding a deep tech AI company grounded in human values. It is.&lt;/li&gt;
  &lt;li&gt;Q2-Q3: &lt;u&gt;Raised pre-seed from close network&lt;/u&gt;, which doubles as Brua’s critical support system at this very early stage, especially for a first-time founder like me.&lt;/li&gt;
  &lt;li&gt;Q2: &lt;u&gt;Brua IO officially founded&lt;/u&gt; on May 16, 2023.&lt;/li&gt;
  &lt;li&gt;Q2: &lt;u&gt;Got an old colleague who has shaped the frontier of AI tech to join as co-founder&lt;/u&gt;. Built a founding company structure for Brua.&lt;/li&gt;
  &lt;li&gt;Q2-Q4: &lt;u&gt;Effort and toil&lt;/u&gt; to carve out our product.&lt;/li&gt;
  &lt;li&gt;Q4: &lt;u&gt;Product came alive&lt;/u&gt;. Details upon request, entertained on a need-to-know basis.&lt;/li&gt;
  &lt;li&gt;Q4: &lt;u&gt;Onboarded initial set of users&lt;/u&gt;, who are providing active feedback.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We enter 2024 doing a lot of listening, building on feedback, and expanding our user base. &lt;u&gt;We’re on the runway&lt;/u&gt;. We’re busy.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A &lt;u&gt;big thank you&lt;/u&gt; to what has become an ecosystem surrounding Brua, especially the team, our early backers, and early users.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;u&gt;Thank you&lt;/u&gt; also to all who reached out with a show of support, and for the conversations about Brua through 2023. Apologies if some of those threads needed more attention – we will make up for it. It has been a wild time. More generally, we’re happy to resume conversations if you wish to learn more and back our mission.&lt;/em&gt;&lt;/p&gt;

&lt;h2 id=&quot;the-brua-way&quot;&gt;The Brua way&lt;/h2&gt;

&lt;p&gt;Going back to the bedtime story: a few weeks ago I used our product to connect with my elder son (now a 4-year-old) at a curious new level. We’re already very close, but Brua particularly helped me empathize and step inside my son’s fantastical world with greater confidence, and it entertained both our curiosities. Brua transformed our collective experience. I’m now able to play with my toddlers at the frontier of their fantasy world without feeling overwhelmed by it all. That world is teeming with life and complexity, a highly fluid universe with its own laws, species, and customs. I can enter it. I can live, feel, and help shape their truths.&lt;/p&gt;

&lt;p&gt;Now, amongst human endeavours, science ranks highest as the embodiment of the pursuit of truth. Critically, it is a collective endeavour: shaping, refuting, or confirming the frontier of knowledge cannot happen in isolation. This frontier is forever laden with uncertainties and open to change, even tumultuous change, which science embraces. Such change-accepting frontiers are common across all human experiences relying on knowledge and ideas, indeed across all human endeavours that handle the complexities of human thought. Bringing human minds together in pursuit of experiences that push or shape this frontier – augmenting our collective lives, yielding a stronger social fabric by unifying the plurality of ideas, with AI facilitating the celebration of our collective potential – is the Brua way of bringing AI to the world at large.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Brua exists to establish a socially responsible role for AI, where AI gets to work alongside us &lt;u&gt;for us to do wondrous things together&lt;/u&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;With that, I wish you a very happy new year.&lt;/p&gt;
</description>
        <short>Year 2023 as told by a dad of two toddlers, who is on a mission to establish a socially responsible role for powerful AI capabilities.</short>
        <pubDate>Tue, 02 Jan 2024 00:00:00 +0000</pubDate>
        <link>http://www.talkingdrums.info/reflections/2024/01/02/runway/</link>
        <guid isPermaLink="true">http://www.talkingdrums.info/reflections/2024/01/02/runway/</guid>
      </item>
    
      <item>
        <title>Brua the Critical expresses my story, and the story behind brua.io</title>
        <description>&lt;p&gt;I had the pleasure of speaking at the &lt;a href=&quot;https://www.nora.ai/events/cutting-edge-ai-large-language-models.html&quot;&gt;CuttingEdgeAI&lt;/a&gt; event organized by &lt;a href=&quot;https://www.nora.ai/&quot;&gt;NORA&lt;/a&gt; today. I told my story through the lenses of &lt;a href=&quot;https://www.talkingdrums.info/engineering/2023/01/12/brua/&quot;&gt;Brua the Critical&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;brua-the-critical&quot;&gt;Brua the Critical&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://www.talkingdrums.info/engineering/2023/01/12/brua/&quot;&gt;Brua the Critical&lt;/a&gt; is a research chatbot that grounds conversations in reality. Under the hood, it integrates multi-billion-parameter (large) language models (LLMs) from &lt;a href=&quot;https://cohere.ai/&quot;&gt;Cohere&lt;/a&gt; and an expanding suite of reliability capabilities. As I told my story, we examined the techniques for reliable training and deployment of LLMs that AI systems like Brua the Critical utilise. Along the way, we saw &lt;em&gt;the good, the bad and the ugly, and the useful side of LLMs&lt;/em&gt;.&lt;/p&gt;

&lt;h2 id=&quot;but-what-is-the-story&quot;&gt;But what is the story?&lt;/h2&gt;

&lt;p&gt;I was leading the large model training stability program at &lt;a href=&quot;https://www.graphcore.ai/&quot;&gt;Graphcore&lt;/a&gt;, an AI chip and systems company, up until 2022. We delivered software that made life easier for those who want to (pre-)train large models end to end on multi-IPU systems without suffering the numerical collapses that are otherwise rife in the process on all manner of AI hardware. Users could enable this functionality with a single line of code. I am proud of this and the team that made it happen.&lt;/p&gt;

&lt;p&gt;While I am no longer at Graphcore, I carry its ambition with me — in fact I carry it literally in my pocket (a souvenir IPU chip). Now, I built my first robot from scratch at home in India in the year 2000 together with a couple of great engineers in the making. I wrote my first neural net from scratch in 2003 in the UK, where I spent a good number of years before moving to Norway as an academic in 2011. I have explored academia, lived the life of an entrepreneur, and have ridden the deep learning wave in the industry.&lt;/p&gt;

&lt;h2 id=&quot;page-1&quot;&gt;Page 1&lt;/h2&gt;

&lt;p&gt;After roughly 20 years working in the field, hindsight, the present state of affairs, and the prospect of my kids growing up with AI capabilities as first-order cultural concerns have led me to converge on what I want to do next, and to do it in Norway, despite the misguided narrative that one has to move to the Bay Area to make a dent in the universe.&lt;/p&gt;

&lt;p&gt;I consider Norway to be the &lt;em&gt;epitome of human values&lt;/em&gt;. I have also directly worked with people here who have built AI systems that surpass those built by DeepMind, OpenAI, Google and NVIDIA. There is &lt;em&gt;no dearth of talent and vision&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;My &lt;strong&gt;ambition&lt;/strong&gt; is to give the world an alternative path in the ongoing AI arms race, &lt;em&gt;a path with human values&lt;/em&gt;. With &lt;a href=&quot;https://www.brua.io/&quot;&gt;brua.io&lt;/a&gt;, we aim to &lt;em&gt;transform&lt;/em&gt; the way humans interact with AI capabilities. We are here for these capabilities to &lt;em&gt;augment human endeavours reliably and safely&lt;/em&gt;. We are also here to build a &lt;em&gt;deep tech company rooted in Oslo&lt;/em&gt; that expands the frontier of AI along the way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We strive for a culture where humans and AI work alongside each other, and do so with the same trust that we have in a bridge, or brua if you like in Norwegian, as we walk over it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you are interested in furthering this ambition, &lt;a href=&quot;mailto:contact@brua.io&quot;&gt;let’s talk&lt;/a&gt;, and let’s shape the future of AI right here!&lt;/p&gt;

&lt;p&gt;This is my story, and &lt;em&gt;I am on page 1&lt;/em&gt;.&lt;/p&gt;

&lt;div class=&quot;img_container&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/images/brua_io.png&quot; alt=&quot;&quot; /&gt;&lt;br /&gt;&lt;/p&gt;
&lt;/div&gt;
</description>
        <short>Brua the Critical expresses my story. It is also the story behind brua.io. We are embarking on a mission to transform the way humans interact with AI capabilities. We are here for these capabilities to augment human endeavours reliably and safely, and we are rooted in Oslo.</short>
        <pubDate>Tue, 21 Feb 2023 00:00:00 +0000</pubDate>
        <link>http://www.talkingdrums.info/engineering/2023/02/21/brua-cuttingedgeai/</link>
        <guid isPermaLink="true">http://www.talkingdrums.info/engineering/2023/02/21/brua-cuttingedgeai/</guid>
      </item>
    
      <item>
        <title>Brua the Critical says, ask me anything</title>
        <description>&lt;p&gt;The year is 2023. ChatGPT is all the rage. It is wonderfully exciting. It marks a shift in culture, and the beginning of a new era in human-AI/machine symbiosis. It posits large language models (LLMs) as useful tetchnology with ample more room to grow. It sure makes mistakes. And, if you are anything like me, you want to attempt a fix. You can’t really peek behind the scenes, but we are in luck. There are a couple&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; of projects ongoing that aim to bring the tech, or equivalent thereof, behind ChatGPT, into the open. I encourage us all to join such projects.&lt;/p&gt;

&lt;p&gt;This post is &lt;em&gt;not&lt;/em&gt; about that tech, &lt;em&gt;nor&lt;/em&gt; is it about ChatGPT. But it is about &lt;em&gt;attempting a fix&lt;/em&gt;, albeit with &lt;em&gt;humble beginnings&lt;/em&gt;. Yet, the goal is &lt;strong&gt;audacious&lt;/strong&gt;, powered by a flame to &lt;strong&gt;engineer AI systems that humans can rely on&lt;/strong&gt;, much like engineering a bridge. In this instance, a bridge that lets the human get from a query to a useful and reality grounded response.&lt;/p&gt;

&lt;h2 id=&quot;coheres-sandbox&quot;&gt;Cohere’s sandbox&lt;/h2&gt;

&lt;p&gt;About a month ago, &lt;a href=&quot;https://cohere.ai/&quot;&gt;Cohere&lt;/a&gt;, the company bringing LLM capabilities into the hands of developers across the planet, released their &lt;em&gt;sandbox&lt;/em&gt;&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;. Cohere already has a playground&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt; (much like &lt;a href=&quot;https://openai.com/&quot;&gt;OpenAI’s&lt;/a&gt;) that I was drawn to for reasons outside the scope of this post, but in acute alignment with the tastes of my kids, I felt the pull of the sandbox. It is a set of open-source applications built on top of the LLM endpoints that Cohere offers. Amongst these is the grounded Q&amp;amp;A&lt;sup id=&quot;fnref:5&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:5&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;5&lt;/a&gt;&lt;/sup&gt; bot, which showcases the idea of augmenting LLM capabilities with Google search results&lt;sup id=&quot;fnref:6&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:6&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;6&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;LLMs have fantastic abilities to generate content derived from the data they have been trained on (and adapted to via finetuning, prompting, or other more complex model-editing mechanisms), but the real world sits on firm ground with a life of its own that is also in flux. Search results based on consensus are one way of exposing LLMs to reality. It should be noted that they too have limitations. One limitation is that consensus mechanisms drive the relevancy of results &lt;em&gt;relative&lt;/em&gt; to a search term, which may itself not be grounded in truth. Nick Frosst of Cohere &lt;a href=&quot;https://www.youtube.com/watch?v=DpOQpClVgCw&quot;&gt;explains the problem here&lt;/a&gt;. I set out to attempt a fix.&lt;/p&gt;

&lt;h2 id=&quot;brua-the-critical&quot;&gt;Brua the Critical&lt;/h2&gt;

&lt;p&gt;Given a question posed by a user, what if we have a way of checking whether the question itself is grounded in reality? Enter &lt;strong&gt;Brua the Critical&lt;/strong&gt;, a bot with an early instance of this capability – a &lt;strong&gt;question-grounding mechanism&lt;/strong&gt; – expanding the capabilities of the grounded Q&amp;amp;A bot from Cohere’s sandbox. When it comes to Q&amp;amp;A, a fully reliable AI system would have many more capabilities than this. My aim with Brua is to expand the scope of its capabilities one humble step at a time, with feedback from you the reader. The eventual audacious goal is to make Brua the most reliable and safe-to-use Q&amp;amp;A system out there, whether limited to a narrow domain in the real world, or for answering open-ended questions.&lt;/p&gt;

&lt;p&gt;This particular incarnation of Brua &lt;strong&gt;does not take any question for granted!&lt;/strong&gt; Here is an early preview of how Brua responds to user questions that seem to have factual issues. I welcome feedback on other ways Brua could respond to questions not grounded in reality. Also, in what ways would we humans challenge or question a question, e.g. from a linguistic or common-sense perspective? All feedback on this will help make Brua more reliable.&lt;/p&gt;

&lt;div class=&quot;img_container&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/images/brua_the_critical.jpg&quot; alt=&quot;&quot; /&gt;&lt;br /&gt;
Brua as a discord bot attempting to answer questions having factual issues.&lt;/p&gt;
&lt;/div&gt;

&lt;h2 id=&quot;getting-more-reliable-with-compute-and-next-steps&quot;&gt;Getting more reliable with compute, and next steps&lt;/h2&gt;

&lt;p&gt;One of the key ideas in the design of Brua is the nature of its computational workload. The &lt;em&gt;question-grounding mechanism&lt;/em&gt;, whilst increasing the inference-time compute requirements, is designed to be &lt;em&gt;embarrassingly parallel&lt;/em&gt;, thus Brua will get safer and more reliable with compute, with marginal, if any, effect on response times. This is one dimension along which I am extending Brua as we speak.&lt;/p&gt;
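&lt;p&gt;A minimal sketch of the embarrassingly parallel idea; &lt;code&gt;check_grounding&lt;/code&gt; is a hypothetical stand-in for one independent probe (the actual mechanism is deliberately not disclosed in this post), and majority voting is just one possible way of aggregating:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mode

def check_grounding(question, seed):
    # Hypothetical stand-in for one independent grounding probe,
    # e.g. an LLM call judging whether the question is factually well-posed.
    return hash((question, seed)) % 2 == 0

def question_is_grounded(question, n_checks=8):
    # Probes are independent, so they run concurrently: wall-clock time
    # stays roughly that of a single probe, while reliability can grow
    # with n_checks, i.e. with available compute.
    with ThreadPoolExecutor(max_workers=n_checks) as pool:
        votes = list(pool.map(lambda s: check_grounding(question, s),
                              range(n_checks)))
    return mode(votes)  # majority verdict
```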

&lt;p&gt;Another way I’ll be extending Brua is to use a reward model to further tailor the bot’s eventual response to be friendlier and more preferred by humans in a linguistic sense. In its rudimentary form, this latter feature could work by having the user-facing LLM component generate multiple responses, which would be ranked and filtered by reward (assuming monotonicity), favouring the one with the highest reward.&lt;/p&gt;
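&lt;p&gt;That rerank-by-reward step can be sketched as follows; &lt;code&gt;generate_candidates&lt;/code&gt; and &lt;code&gt;reward&lt;/code&gt; are hypothetical stand-ins for the LLM sampling call and the trained reward model:&lt;/p&gt;

```python
import random

def generate_candidates(prompt, n=4):
    # Hypothetical stand-in for sampling n responses from an LLM.
    return [f"{prompt} :: candidate {i}" for i in range(n)]

def reward(response):
    # Hypothetical stand-in for a trained reward model scoring how
    # human-preferred a response is (higher is better).
    return random.random()

def best_response(prompt, n=4):
    # Generate several candidates and keep the one the reward model
    # scores highest, assuming reward is monotone in human preference.
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=reward)
```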

&lt;p&gt;So there we have it. Brua the Critical is looking for improvements and new capabilities, to eventually be known as &lt;strong&gt;Brua the Reliable!&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;img_container&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/images/brua_face.png&quot; alt=&quot;&quot; width=&quot;350&quot; /&gt;&lt;br /&gt;
The face of Brua on Discord, generated using Stable Diffusion v2.1 at &lt;a href=&quot;https://beta.dreamstudio.ai/dream&quot;&gt;DreamStudio&lt;/a&gt; using the prompt “A friendly robot and a human shaking hands, in the style of picasso and dali”.&lt;/p&gt;
&lt;/div&gt;

&lt;h2 id=&quot;lets-chat&quot;&gt;Let’s chat?&lt;/h2&gt;

&lt;p&gt;By now you will have noticed that I’ve left the gory details of the &lt;em&gt;question-grounding mechanism&lt;/em&gt; that Brua uses out of this post. The mechanism is in flux, and at this stage, I’d love to talk more about Brua over a human-human chat with you. Why don’t you ping me &lt;a href=&quot;https://twitter.com/boelger&quot;&gt;@boelger&lt;/a&gt;? Perhaps we could also have a play with Brua, live!&lt;/p&gt;

&lt;h2 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h2&gt;

&lt;p&gt;It is an exciting time to witness and, behold, &lt;em&gt;play at ease&lt;/em&gt; with the ongoing progress in the LLM capability space. I very much appreciate companies like &lt;a href=&quot;https://cohere.ai/&quot;&gt;Cohere&lt;/a&gt; for having made working with LLM capabilities as accessible to me as building and exploring new shapes in our sandbox is for my kids (it is more like a snow-covered box at present – it’s January, and we are in Norway, but you get the idea). Thanks for inspiring this work on making LLM capabilities ever more reliable. I’d very much recommend that people interested in LLM capability exploration engage with the friendly &lt;a href=&quot;https://cohere.ai/&quot;&gt;Cohere&lt;/a&gt; and &lt;a href=&quot;https://cohere.for.ai/&quot;&gt;Cohere for AI&lt;/a&gt; communities.&lt;/p&gt;

&lt;p&gt;Thanks also to my good friend Gordon Klaus for suggesting some edits to this post.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://github.com/lucidrains/PaLM-rlhf-pytorch&quot;&gt;PaLM with RLHF&lt;/a&gt; &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://github.com/LAION-AI/Open-Assistant&quot;&gt;Open Assistant&lt;/a&gt; &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://txt.cohere.ai/introducing-sandbox-coheres-experimental-open-source-initiative/&quot;&gt;Cohere’s sandbox&lt;/a&gt; &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://docs.cohere.ai/docs/playground-overview&quot;&gt;Cohere’s playground&lt;/a&gt; &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:5&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://github.com/cohere-ai/sandbox-grounded-qa&quot;&gt;Grounded Q&amp;amp;A&lt;/a&gt; &lt;a href=&quot;#fnref:5&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:6&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Retrieval-augmented language models are an active area of research. &lt;a href=&quot;#fnref:6&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <short>Grounding an AI system with LLM chops in reality to answer open ended questions.</short>
        <pubDate>Thu, 12 Jan 2023 00:00:00 +0000</pubDate>
        <link>http://www.talkingdrums.info/engineering/2023/01/12/brua/</link>
        <guid isPermaLink="true">http://www.talkingdrums.info/engineering/2023/01/12/brua/</guid>
      </item>
    
      <item>
        <title>AI Salon</title>
        <description>&lt;p&gt;The history of AI is long and varied.&lt;/p&gt;

&lt;h2 id=&quot;deploying-increasingly-capable-ai-systems&quot;&gt;Deploying increasingly capable AI systems&lt;/h2&gt;

&lt;p&gt;Superhuman automata have been in the human psyche since the advent of thought&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;. Over time, and with the promulgation of scientific thinking, humanity started looking at the creation of thinking machines as a goal in and of itself&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;. More recently, we have seen a serious expansion in the capability suite of AI Systems&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;As these systems get deployed, and the associated deployment risks come to the fore, we must question the fundamentals on which our thinking about AI systems rests. As an example, see ideas around &lt;em&gt;assistance games&lt;/em&gt;&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;, where agents are uncertain about their specified goals and elicit human preferences through interaction in order to be of value.&lt;/p&gt;

&lt;p&gt;Personally, I’ve always favoured the community driven, bottom-up grassroots approach to the crystallisation of ideas. We are at a nascent stage when it comes to economically and culturally valuable AI capabilities making leaps out of labs. The time to steer these in a direction that benefits humanity and the planet is now. Doing so from the bottom up and in the open, driven by an ever expanding crew of enlightened humans, is how I’d like to engage with the nuances around this topic.&lt;/p&gt;

&lt;h2 id=&quot;community-to-steer-the-ai-narrative&quot;&gt;Community to steer the AI narrative&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://www.meetup.com/ai-salon/&quot;&gt;AI Salon meetup group&lt;/a&gt; has been created with this in mind. The group will promote discussions around the past (incl. in myths), present, and future of AI, with the aim of exploring ideas that feed into engineering the safe deployment of current and future AI systems.&lt;/p&gt;

&lt;p&gt;Communities, small or large, are key to shaping culture. Let’s re-examine the roots, and help grow a culture around the safe deployment of AI systems that considers all manner of concerns, be they technical, social, legal, or beyond. The open-source movement around seriously capable models that has gripped the creative imagination of the masses, fueled by &lt;a href=&quot;https://stability.ai/&quot;&gt;Stability AI&lt;/a&gt;, as refreshing as it is, makes it even more urgent for the community to develop and drive the techniques and the narrative surrounding safe AI deployment.&lt;/p&gt;

&lt;p&gt;I’m looking forward to having you &lt;a href=&quot;https://www.meetup.com/ai-salon/&quot;&gt;join us&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;img_container&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/images/stable_diffusion_aisalon_out.png&quot; alt=&quot;&quot; width=&quot;500px&quot; /&gt;&lt;br /&gt;
Generated using Stable Diffusion v1.5 at &lt;a href=&quot;https://beta.dreamstudio.ai/dream&quot;&gt;DreamStudio&lt;/a&gt; using the prompt - “Artificial Intelligence safely deployed to augment the human condition and benefit people and the planet”.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Let’s have a conversation, seed collaborations, and help steer the broader narrative around AI towards the trust one feels when walking across a randomly selected bridge in a randomly selected town on this planet we all call Earth.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://press.princeton.edu/books/hardcover/9780691183510/gods-and-robots&quot;&gt;Gods and Robots: Myths, Machines, and Ancient Dreams of Technology&lt;/a&gt;, Adrienne Mayor &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://direct.mit.edu/books/book/4347/Artificial-IntelligenceThe-Very-Idea&quot;&gt;Artificial Intelligence: The Very Idea&lt;/a&gt;, John Haugeland &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2108.07258&quot;&gt;On the Opportunities and Risks of Foundation Models&lt;/a&gt;, Center for Research on Foundation Models (CRFM), Stanford University &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://people.eecs.berkeley.edu/~russell/hc.html&quot;&gt;Human Compatible: AI and the Problem of Control&lt;/a&gt;, Stuart Russell &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <short>A community to steer the AI narrative towards something we could take for granted — that safety is a chief concern when building AI systems and deploying them in the real world.</short>
        <pubDate>Tue, 08 Nov 2022 00:00:00 +0000</pubDate>
        <link>http://www.talkingdrums.info/community/2022/11/08/aisalon/</link>
        <guid isPermaLink="true">http://www.talkingdrums.info/community/2022/11/08/aisalon/</guid>
      </item>
    
      <item>
        <title>So long Graphcore</title>
        <description>&lt;p&gt;Tomorrow is my last day at Graphcore.&lt;/p&gt;

&lt;p&gt;A little over 3 years ago, I officially&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; joined this company to contribute towards the vision of advancing machine intelligence to expand the scope of good that humanity could do. By this, I personally meant helping to address, where it made sense, some of humanity’s greatest challenges head on via advancing machine intelligence in concert with advances in IPU chips and systems, unburdened by trends or by incrementing on the literature, as it were. This included advancing machine intelligence as an endeavour in and of itself, from both the capability and safety&lt;sup id=&quot;fnref:2&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt; perspectives. Whilst personally this remained (and continues to remain) my guiding star, I found myself in an environment which had room for me to grow in ways I hadn’t fully anticipated.&lt;/p&gt;

&lt;h1 id=&quot;shaping-technology-expanding-the-frontier-of-machine-intelligence&quot;&gt;Shaping technology expanding the frontier of machine intelligence&lt;/h1&gt;

&lt;p&gt;Having spent most of my adult life working on machine/artificial intelligence (AI) in some form or another, I expected to drive the conceptual side of things, to come up with novel neural architectures and learning schemes that would get realized in software in tandem, and to unblock paths to unsolved problems across various domains of human undertaking. Little did I know that driving software features in the context of “merely” the state of the art in the field, thus ripe with opportunities to make advancements lower down the software stack, is what I’d actually end up doing. I even enjoyed it!&lt;/p&gt;

&lt;p&gt;It has been a memorable ride. From working out the efficacy of training multi-billion-parameter models under execution schemes yet (at the time) to be realized in software, to (more recently) driving a cross-stack, cross-function team effort to put into production ideas that make training large models in reduced precision numerically stable, in particular Automatic Loss Scaling&lt;sup id=&quot;fnref:3&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:3&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; — I had the privilege to drive, contribute, learn and grow both technically and professionally along the way.&lt;/p&gt;

&lt;div class=&quot;img_container&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/images/als.png&quot; alt=&quot;&quot; width=&quot;500px&quot; /&gt;&lt;br /&gt;
Automatic Loss Scaling from PyTorch down to IPU-PODs to realise stable training of large models.&lt;/p&gt;
&lt;/div&gt;
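&lt;p&gt;For a flavour of the underlying idea, here is a minimal, hypothetical sketch of dynamic loss scaling, the generic technique behind approaches like Automatic Loss Scaling. The class name, constants, and update rule are illustrative assumptions of mine, not Graphcore’s actual implementation or API.&lt;/p&gt;

```python
import math

# Hypothetical sketch of dynamic loss scaling (illustrative names and
# constants). In training, the loss is multiplied by `scale` before
# backpropagation so small gradients survive reduced precision; the
# resulting gradients arrive scaled and are divided back down here.
class DynamicLossScaler:
    def __init__(self, init_scale=2.0 ** 15, factor=2.0, growth_interval=2000):
        self.scale = init_scale                 # current loss scale
        self.factor = factor                    # multiplicative adjustment step
        self.growth_interval = growth_interval  # successful steps before growing
        self.good_steps = 0

    def step(self, grads):
        """Unscale grads; on overflow, skip the update and shrink the scale."""
        if all(math.isfinite(g) for g in grads):
            unscaled = [g / self.scale for g in grads]
            self.good_steps += 1
            if self.good_steps == self.growth_interval:
                self.scale *= self.factor  # stable for a while: try a larger scale
                self.good_steps = 0
            return unscaled                # hand these to the optimiser
        self.scale /= self.factor          # overflow detected: back off
        self.good_steps = 0
        return None                        # caller skips this update
```

&lt;p&gt;Production systems automate choosing and adapting the scale across the whole stack; the point of the sketch is only that gradients computed from a scaled loss are checked for overflow, unscaled, and the scale is adjusted accordingly.&lt;/p&gt;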

&lt;p&gt;Perhaps a few people at Graphcore can say that they did work that would not be possible elsewhere or outside organisations with a large amount of compute. I certainly am one of those who ticks both boxes, having had the luxury to touch knowledge that not many on the planet can, due to sheer compute requirements, and at a technical depth that was only possible at Graphcore. Of course, this was with a view to enabling everyone to, for example, seamlessly navigate the numerical vagaries of large model training, with just a single line of code. The compute was designed and built from the ground up by this magnificent AI systems engineering team in Oslo. It may just be the best AI supercomputing machinery in the world. Kudos to this powerhouse of a team, the &lt;a href=&quot;/work/2019/09/28/gc/&quot;&gt;dream machine&lt;/a&gt; has indeed been &lt;a href=&quot;https://www.graphcore.ai/products/bow-pod256&quot;&gt;built&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Whilst being part of this powerhouse and Graphcore at large, I got to use these machines on a daily basis, with on the order of 256 IPUs unraveling knowledge all at once over and over again. At the same time, working with people from across the software stack allowed for ideas to easily bend to the IPU execution model. Working like this, one felt the shaping of technology that expands the frontier of machine intelligence, or computing in general, happening under one’s fingertips (quite literally). This was a privilege.&lt;/p&gt;

&lt;h1 id=&quot;companio&quot;&gt;Companio&lt;/h1&gt;

&lt;p&gt;I must also take this opportunity to thank my amazing colleagues across locations. Over the pandemic, we worked over Zoom, Slack, and code differentials across Oslo, London, Bristol, Cambridge, and more. It did not matter that some of us never met in person. What a privilege it has been to work alongside the kindest and smartest folk, ever receptive to my non-work constraints of raising small kids (one only just about to start walking). Whilst I still, out of habit, often found myself working late into the night after the kids slept, there were times when it got extremely difficult to balance work and life. The support I got from my immediate (even if physically remote) colleagues was phenomenal. What a reliable and friendly bunch with an immaculate work ethic. Thank you very much.&lt;/p&gt;

&lt;div class=&quot;img_container&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/images/oslo_ski.jpg&quot; alt=&quot;&quot; width=&quot;400px&quot; /&gt;&lt;br /&gt;
When we did meet, outside of the pandemic and childcare, we made sure of maximising the fun that can be had on site.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;As I leave Graphcore, I’d just like to end on a thought that everyone at the company is very much accustomed to – that a company is “a group of people who come together to share bread”&lt;sup id=&quot;fnref:4&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:4&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. We share the highs and the lows together. We work as a unit, driven in our case to create meaning out of machine intelligence. This is a tall order, and even if one or more fall in the process, we all still keep the mission going.&lt;/p&gt;

&lt;p&gt;All the best Graphcore. My job here feels done. I have grown, and I carry many learnings with me to wherever I end up next, especially on how we worked as a team to carve out the frontier of computing. So long.&lt;/p&gt;

&lt;h1 id=&quot;next-step&quot;&gt;Next step&lt;/h1&gt;

&lt;p&gt;As for what I’ll be doing next – time may have the answer. I do not, just yet.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;We were working out collaboration opportunities with the &lt;a href=&quot;https://www.ntnu.edu/ailab&quot;&gt;Norwegian Open AI Lab&lt;/a&gt;, and the AI academic/industry ecosystem (especially my previous workplace, &lt;a href=&quot;https://www.telenor.com/innovation/research/&quot;&gt;Telenor Research&lt;/a&gt;) in Norway for over a year before I officially joined Graphcore. We did co-supervise Masters theses at the AI Lab, e.g. on &lt;a href=&quot;https://ntnuopen.ntnu.no/ntnu-xmlui/handle/11250/2777823?show=full&quot;&gt;parallelisation techniques for large models&lt;/a&gt;, and others. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:2&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;More on the safety part forthcoming in another post soon. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:3&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;See this excellent blog post on &lt;a href=&quot;https://www.graphcore.ai/posts/training-large-models-more-stably-with-automatic-loss-scaling&quot;&gt;Automatic Loss Scaling on the IPU&lt;/a&gt;. &lt;a href=&quot;#fnref:3&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
    &lt;li id=&quot;fn:4&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;&lt;a href=&quot;https://www.pentagram.com/work/graphcore-the-companio/story&quot;&gt;https://www.pentagram.com/work/graphcore-the-companio/story&lt;/a&gt; &lt;a href=&quot;#fnref:4&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <short>Reflecting on my time at Graphcore, having had the privilege to shape technology at the frontier of machine intelligence. Importantly, a note of thanks to my colleagues in Graphcore Oslo for building our dream machine, and to the &quot;companio&quot; across Graphcore at large.</short>
        <pubDate>Sun, 30 Oct 2022 00:00:00 +0000</pubDate>
        <link>http://www.talkingdrums.info/work/2022/10/30/signoffgc/</link>
        <guid isPermaLink="true">http://www.talkingdrums.info/work/2022/10/30/signoffgc/</guid>
      </item>
    
      <item>
        <title>Moving to Graphcore</title>
        <description>&lt;p&gt;I have joined &lt;a href=&quot;https://www.graphcore.ai&quot;&gt;Graphcore’s&lt;/a&gt; Oslo unit.&lt;/p&gt;

&lt;p&gt;Expressed below are my personal views and opinions on what this is about.&lt;/p&gt;

&lt;p&gt;Joining Graphcore is an opportunity to learn and work alongside a remarkable team trying to engineer the future of computing. Graphcore, with its systemic approach to engineering the building blocks of machine intelligence, presents a clear and empowering path for me to express myself. I also hope to do so together with research colleagues and friends (old and new), both in Norway and abroad.&lt;/p&gt;

&lt;p&gt;It is no big secret that we are entering a new era in how we perceive and use computers for problem solving. People lucky enough to play a direct role in it potentially feel, to some extent, similar to what it may have been like during the intellectual ferment of the 1950s. While computing, or any other human capability augmentation idea for that matter, has always been about enabling us humans to approach and tackle hard problems collectively, the prevailing social and technological contexts let us reimagine problem solving and invent new tools serving this purpose. Back in the 50s, not much was in digital electronic form, which is something we take for granted now. It was all being built. The world is indeed different today. There is this ever growing blob of bits that encompasses Earth (for lack of a better way to put it), containing an immense amount of knowledge and garbage, pertaining to all that we have managed to model, simulate, and tool. Computers can now bootstrap from this, help us embrace complexity, enable new discoveries, and assist humanity in ways hitherto not possible. This is a choice, and perhaps an easy one. But to do this effectively and meaningfully entails fundamental changes in the software and the hardware employed to process information. Graphcore is trying to go about this by working on both, putting together the building blocks of machine intelligence to expand the scope of problems humanity can solve&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h1 id=&quot;graphcore-in-oslo&quot;&gt;Graphcore in Oslo&lt;/h1&gt;

&lt;p&gt;While the Bristol unit is designing, from the ground up, the processor for machine intelligence workloads, the Oslo unit focuses on building, again from the ground up, the &lt;a href=&quot;https://www.graphcore.ai/posts/networking-that-supports-the-future-of-computing&quot;&gt;networking&lt;/a&gt; and &lt;a href=&quot;https://www.graphcore.ai/posts/introducing-the-graphcore-rackscale-ipu-pod&quot;&gt;scale out&lt;/a&gt; infrastructure for these processors. This infrastructure is being built to support and discover machine intelligence workloads and algorithms that scale efficiently with compute. Imagine inventing methods the world has yet to see, and which bring orders of magnitude improvements in solving problems being tackled by our current AI enterprise. &lt;em&gt;Far more importantly though&lt;/em&gt;, imagine methods that enable us humans to solve problems far beyond the capabilities allowed for by current hardware. The workload for machine intelligence is different from what the chip industry has &lt;em&gt;evolved&lt;/em&gt; towards since the late 1950s. Whilst we’ve benefitted heavily from repurposing compelling developments that accelerate algebraic operations from companies like NVIDIA, the world is asking for something different. It needs &lt;em&gt;compute machinery that is designed from the ground up for machine intelligence&lt;/em&gt;. This is where Graphcore comes in.&lt;/p&gt;

&lt;div class=&quot;img_container&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/images/gc_oslo.jpg&quot; alt=&quot;&quot; width=&quot;400px&quot; /&gt;&lt;br /&gt;
At the grand opening of &lt;a href=&quot;https://www.google.com/maps/place/Graphcore/@59.9114772,10.7323194,17z/data=!3m1!4b1!4m5!3m4!1s0x46416f74be07ff4b:0xf3640197e6a3422f!8m2!3d59.9114745!4d10.7345134?hl=en-UK&quot;&gt;Graphcore’s Oslo unit&lt;/a&gt;. Holding 23.6 billion transistors in one’s hands can clearly bring a smile on one’s face.&lt;/p&gt;
&lt;/div&gt;

&lt;h1 id=&quot;test-of-time&quot;&gt;Test of time&lt;/h1&gt;

&lt;p&gt;As larger and larger problem domains get tooled to produce ever more complex data, models able to handle this complexity are also evolving, many of them working in parallel – somewhat of a necessity. Deep learning is a testament to this idea. In step, the world is steadily embracing machines that operate under a paradigm different from what made personal and cloud computing stand the test of time. This latter, time-tested workload relies on compute machinery that processes instructions in sequence, keeping a precise account of digital objects, e.g. documents, being processed. The workload expresses the nature of computer programs written for automating and managing scientific, business, engineering, design, art, and recreational endeavours, without much recourse to harnessing the amount and kind of data the world generates today. It also expresses programs that enable humans to symbiotically interact with computers and with each other. The people who championed the building blocks – &lt;a href=&quot;https://www.goodreads.com/book/show/722412.The_Dream_Machine&quot;&gt;the dream machine&lt;/a&gt; – serving this workload, invented the present day a few decades ago. These were the pioneers of computing, spearheaded by the likes of &lt;a href=&quot;https://en.wikipedia.org/wiki/J._C._R._Licklider&quot;&gt;J. C. R. Licklider&lt;/a&gt;, followed by &lt;a href=&quot;https://en.wikipedia.org/wiki/Robert_Taylor_(computer_scientist)&quot;&gt;Bob Taylor&lt;/a&gt; at the storied &lt;a href=&quot;https://en.wikipedia.org/wiki/PARC_(company)&quot;&gt;Palo Alto Research Center (PARC)&lt;/a&gt;, with &lt;a href=&quot;https://en.wikipedia.org/wiki/Robert_Noyce&quot;&gt;Bob Noyce&lt;/a&gt; simultaneously cultivating the spirit of invention at Intel, modestly stamping silicon in the valley. 
Do read &lt;a href=&quot;https://www.goodreads.com/book/show/722412.The_Dream_Machine&quot;&gt;this book&lt;/a&gt;, if and when you get a chance, to learn about &lt;em&gt;serving a dream tailored to expanding human potential through technology&lt;/em&gt;.&lt;/p&gt;

&lt;h1 id=&quot;the-intelligene-workload&quot;&gt;The intelligence workload&lt;/h1&gt;

&lt;p&gt;The intelligence workload, as I personally see it, and this is subject to change, expresses programs that make machines learn from a plethora of experiences, including from interactions with each other, and lets them adapt to changing circumstances. Machines have to complement each others’ strengths and weaknesses whilst dealing with uncertainty and changes in the dynamics of the world they operate in. This requires machines to discover structure contained in experiences, continuously distilling discovered knowledge in models of the world and of themselves, and using these models for decision making/operation in concert with the ecosystem they are situated in. Knowledge structures, indeed those that take inspiration from nature, rely on interconnected self-organising computational processes and abstractions that exchange information and operate in parallel, producing decisions adapted to and/or which adapt with the task at hand. The programs that the workload expresses have far greater degrees of freedom and built-in redundancy than the other, more mature, practice. These programs can express ideas and solutions we humans may not have happened upon when dealing with problems at the frontier of knowledge. They can thus reveal promising uncharted paths to solutions, in turn solving problems or making them approachable, advancing our intuition, and expanding our creative capacities in the process. These programs also enable autonomy in machines, allowing them to adapt to our needs. The world is in need of a place that champions the building blocks for the workload expressed by these kinds of programs. It needs a place that reaches out to the PARC of Bob Taylor and the Intel of Bob Noyce, and &lt;em&gt;takes on the responsibility for bearing the torch that carved computing history&lt;/em&gt;.&lt;/p&gt;

&lt;h1 id=&quot;general-prupose-machinery&quot;&gt;General purpose machinery&lt;/h1&gt;

&lt;p&gt;Graphcore is a good candidate for bearing this torch. It is building &lt;em&gt;the general purpose programmable machinery&lt;/em&gt; for the evolving intelligence workload. Importantly, it is working on &lt;em&gt;the system&lt;/em&gt; as a whole, which includes the processor, the scale-out machinery, and the software tool-chain, which would enable us to create the next generation of machine intelligence for tackling hard unsolved problems with greater resolve. Graphcore is on a &lt;em&gt;mission to accelerate the shift in paradigm that the world is embracing&lt;/em&gt;.&lt;/p&gt;

&lt;h1 id=&quot;graphcore-is-hiring&quot;&gt;Graphcore is hiring&lt;/h1&gt;

&lt;p&gt;If you want to play a meaningful role in engineering the future of computing, and care about expanding the scope of problems humanity can solve whilst at it, do get in touch. Graphcore is &lt;a href=&quot;https://www.graphcore.ai/careers&quot;&gt;hiring&lt;/a&gt;! Check out &lt;a href=&quot;https://www.youtube.com/playlist?list=PL62K66g8gQNVXtMcEinF92OSenLEjFIOH&quot;&gt;these videos&lt;/a&gt; to hear more about working at Graphcore from some of the team.&lt;/p&gt;

&lt;p&gt;Let’s build the &lt;em&gt;next dream machine&lt;/em&gt; together.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;One may wonder why this is interesting. In my humble opinion, to be content, there is not much else for one to do in life than to fully express oneself. As human beings confronted with challenges at the frontier of knowledge, intuition, reason, and action, the spirit of exploration spurs us to learn, and has taken humanity along a trajectory that enables us to shape our world. In the words of &lt;a href=&quot;https://en.wikipedia.org/wiki/Jacob_Bronowski&quot;&gt;Jacob Bronowski&lt;/a&gt;, “__ is a singular creature. __ has a set of gifts which make __ unique among the animals, so that unlike them, __ is not a figure in the landscape - __ is the shaper of the landscape”. Knowing that we can shape is enough to suggest that we won’t be settling for a limited existence. Responsibly expanding what we can shape thus follows. Humanity has indeed created tools – the wheel, the transistor, the microchip, the computer, to name a few. They let us express ourselves to further the scope of knowledge we can acquire and the challenges we can tackle. If you believe this to be the case, you may understand why this may be interesting. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <short>Engineering machine intelligence in Norway to expand the scope of problems humanity can solve.</short>
        <pubDate>Sat, 28 Sep 2019 00:00:00 +0000</pubDate>
        <link>http://www.talkingdrums.info/work/2019/09/28/gc/</link>
        <guid isPermaLink="true">http://www.talkingdrums.info/work/2019/09/28/gc/</guid>
      </item>
    
      <item>
        <title>Self-Organizing Conference on Machine Learning</title>
        <description>&lt;p&gt;&lt;strong&gt;NB.&lt;/strong&gt; Part of this post is imported from my &lt;a href=&quot;https://medium.com/@boelger/self-organizing-conference-on-machine-learning-2017-day-1-61b80ba9b9e8&quot;&gt;Medium post&lt;/a&gt;, which was written on Dec 1, 2017 during SOCML. It did not cover Day 2. I am slowly collecting my scattered writings from across the web over here, so I decided to go down memory lane a little bit and re-tell my SOCML experience in full.&lt;/p&gt;

&lt;p&gt;For those who are new to SOCML, it has been happening &lt;a href=&quot;https://blog.openai.com/report-from-the-self-organizing-conference/&quot;&gt;since 2016&lt;/a&gt;. It is a 2-day event organised by &lt;a href=&quot;http://www.iangoodfellow.com/&quot;&gt;Ian Goodfellow&lt;/a&gt; and co. The &lt;a href=&quot;https://sites.google.com/view/socml/home?authuser=0&quot;&gt;2017 edition&lt;/a&gt; happened just before NeurIPS, and was located &lt;em&gt;more-or-less&lt;/em&gt; on the way, as was the &lt;a href=&quot;https://sites.google.com/view/socml-2018/home&quot;&gt;2018 edition&lt;/a&gt;. This positioning in time and space worked really well in 2017. It was a really friendly event, with attendees from all walks of ML — industry, academia, startups, and students. It also covered varying degrees of ML experience, from leaders in the field to those just breaking into ML. Everyone was available and willing to discuss.&lt;/p&gt;

&lt;p&gt;The conference sessions (hour long group discussions) were crowdsourced from amongst the attendees beforehand. I participated in a few. Below are some of my personal musings around these. They do not necessarily fully reflect the points discussed during each session nor do they accurately cover the opinions of others.&lt;/p&gt;

&lt;h1 id=&quot;day-1&quot;&gt;Day 1&lt;/h1&gt;

&lt;h2 id=&quot;interpretability-morning&quot;&gt;Interpretability (morning)&lt;/h2&gt;

&lt;p&gt;The crux of the matter here is to understand why a model does what it does when asked or potentially decides for itself that it has to do something. In my personal opinion, this is a particularly ill-posed problem.&lt;/p&gt;

&lt;p&gt;Let’s take, for instance, human (unless the reader is not one) brains. Our decisions/actions are driven by feelings, which one can interpret as motivation or goals. These can be either short- or long-sighted. You feel strongly about something, thereby creating a goal for yourself. In parts of the world where you have time to think — I luckily live in Norway, and I do — the goals tend to be long-sighted.&lt;/p&gt;

&lt;p&gt;Every such goal drives one’s behaviour. This should imply that a behaviour can be explained by goals. But, there is a problem here. It is possible to arrive at the same goal state via different paths, each requiring potentially different behaviours driven by various sub-goals. Furthermore, behaviours tend to overlap across different goals.&lt;/p&gt;

&lt;p&gt;So, simply looking at the goals is not enough to understand behaviour, even if the goal gets achieved — disregarding asking the achiever to explain their behaviour in natural language, which potentially adds further variation to this understanding. Perhaps then, one has to probe the brain during the act. I do not think we have a good way of doing this with human brains, thereby making the problem ill-posed. But, we do have a better handle on the machine learning models we build, because we build them, thereby letting us probe them e.g. probing individual artificial neurons for what activates them.&lt;/p&gt;

&lt;p&gt;Following the dance between &lt;strong&gt;goals&lt;/strong&gt; and &lt;strong&gt;catching models in their act&lt;/strong&gt; makes the problem less ill-posed, and brings fair motivation to working on the problem of interpretability.&lt;/p&gt;

&lt;h2 id=&quot;transfer-learningdomain-adaptationfew-shot-learning-early-afternoon&quot;&gt;Transfer learning/Domain adaptation/Few-shot learning (early afternoon)&lt;/h2&gt;

&lt;p&gt;My jet lag was kicking in at this point. But, it turns out that researchers working within these domains need a naming convention. Aside from that, learning reusable skills does not hurt, unless old skills become useless. Furthermore, it also makes sense to continuously learn new skills which either augment old skills or are completely novel, while making sure reusable skills aren’t forgotten. By the way, this is a common sense description, and thus I expect researchers driving the field to stay calm and not worry about what I mean by skill in a mathematical sense.&lt;/p&gt;

&lt;h2 id=&quot;exploration-inrl-late-afternoon&quot;&gt;Exploration in RL (late afternoon)&lt;/h2&gt;

&lt;p&gt;The jet lag was in full bloom by now, but I followed this discussion better than the others, since my day job involves all things RL.&lt;/p&gt;

&lt;p&gt;Clever realisations of common sense ideas like novelty, surprise, curiosity, safety, use of/guidance from old knowledge/skills, etc. have enthused this research. All these and more areas were touched upon.&lt;/p&gt;

&lt;p&gt;The discussion revolved around a couple of very interesting questions posed by the moderator of our discussion group, &lt;a href=&quot;https://www.linkedin.com/in/milancvitkovic/&quot;&gt;Milan Cvitkovic&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;Is reward, or indeed the related ideas of intrinsic/extrinsic motivation, all that we have to drive research on exploration? Shaping the reward signal to guide exploration is indeed a heavily explored arena within RL. See what I did there?&lt;/p&gt;

&lt;p&gt;What could be good ways/metrics for evaluating exploration techniques? How does one technique differ from another? Perhaps visualising policies as they evolve and skills as they get discovered will help. Looking at how the distributions of returns for actions evolve, say compared to those resulting from baseline policies, e.g. from human demonstrations, may also help. The idea here would be that an efficient exploration technique may deliver policies with return distributions similar to baseline distributions faster (compared to a less efficient one), provided human demonstrations are good baselines of course.&lt;/p&gt;

&lt;h1 id=&quot;day-2&quot;&gt;Day 2&lt;/h1&gt;

&lt;h2 id=&quot;ml-in-production-morning&quot;&gt;ML in Production (morning)&lt;/h2&gt;

&lt;p&gt;This session was moderated by &lt;a href=&quot;https://www.oreilly.com/people/5762f-omoju-miller&quot;&gt;Omoju Miller&lt;/a&gt; from Github. Attendees included folk from &lt;a href=&quot;https://www.h2o.ai/&quot;&gt;H2O.ai&lt;/a&gt;, LinkedIn, Adobe, ETH Zurich, etc. &lt;a href=&quot;https://medium.com/@omojumiller/socml-2017-c20a186b8150&quot;&gt;Here is a related post from Omoju&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Bringing research to production was one of the concerns we touched upon. Keeping up with ML research is a challenge, and people have different strategies to tackle this, e.g. reading groups, meetups, &lt;a href=&quot;http://www.gitxiv.com/&quot;&gt;http://www.gitxiv.com/&lt;/a&gt; etc. But keeping up is one thing. An applied researcher also builds on the latest research and puts it in production, when appropriate. However, one often works in a sandbox, using a bunch of tools — for the sake of exploration — that may not fully sync with the production system in place. In this context, ways to transfer work to production could mean employing data engineers to set up the requisite pipelines for using/testing the trained &lt;em&gt;models&lt;/em&gt;. This is costly. There are cheaper alternatives like what &lt;a href=&quot;https://www.h2o.ai/&quot;&gt;H2O.ai&lt;/a&gt; makes available, &lt;a href=&quot;http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html#about-pojo-mojo&quot;&gt;converting models to Java objects&lt;/a&gt;, letting one compile these objects in production (assuming a Java production environment). Note that things don’t really get to this stage if data access issues are overlooked.&lt;/p&gt;

&lt;p&gt;We also looked at the question of whether there ought to be a separation between research and production people/teams. For a company with money, the answer seems affirmative. On the other hand, in small teams, one usually builds and ships whole products, so having specialist roles may be premature. If indeed there is a separation, a strong technical solution for coherence across teams seems like a natural thing to have.&lt;/p&gt;

&lt;p&gt;In general, the groundwork of bringing the latest research to production is still under way across organisations of different sizes. We are in the ad-hoc stage of this endeavour, and best practices are in the works. It did seem reasonable for all applied ML people to aim at making whole products out of their projects/papers, e.g. serve their work through APIs, with an eye out for best practices. This way of working can also be helpful when making the case for uptake of your work in the wider organisation.&lt;/p&gt;

&lt;h2 id=&quot;ml-in-the-sciences-early-afternoon&quot;&gt;ML in the Sciences (early afternoon)&lt;/h2&gt;

&lt;p&gt;Note that before each set of sessions, Ian asked for a show of hands for interest. There was some interest in this session, but when the moment arrived, only two players remained — &lt;a href=&quot;https://www.parc.com/about-parc/our-people/kalai-ramea/&quot;&gt;Kalai Ramea&lt;/a&gt; from PARC (the &lt;em&gt;moderator&lt;/em&gt;) and myself (the &lt;em&gt;notetaker&lt;/em&gt;). This was just as well, as we discovered our mutual interest in agent based modeling (ABM).&lt;/p&gt;

&lt;p&gt;I also got to visit PARC not long after, where we continued discussions on ABM. Just as an aside, I was quite visibly star struck walking the halls of this historic building, thinking about the personal computing and networking pioneers who created our present world in the 70s right there.&lt;/p&gt;

&lt;div class=&quot;img_container&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/images/parc.jpg&quot; alt=&quot;&quot; width=&quot;400px&quot; /&gt;&lt;br /&gt;
Me a little star struck.&lt;/p&gt;
&lt;/div&gt;

&lt;div class=&quot;img_container&quot;&gt;
  &lt;p&gt;&lt;img src=&quot;/images/alto.jpg&quot; alt=&quot;&quot; width=&quot;400px&quot; /&gt;&lt;br /&gt;
The Alto.&lt;/p&gt;
&lt;/div&gt;

&lt;h2 id=&quot;hardware-for-ml-late-afternoon&quot;&gt;Hardware for ML (late afternoon)&lt;/h2&gt;

&lt;p&gt;Hardware has had a tremendous role in deep learning going mainstream. We are at a time when AI researchers and practitioners cannot just keep an eye on new developments in this space, but must actively try to design algorithms that can &lt;em&gt;scale with compute&lt;/em&gt;&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;, whether or not one has access to new hardware. One should be aware of the different architectural possibilities out there — from incumbents like NVIDIA, and to the extent possible, from promising challengers like Graphcore. We are also at a stage where architectures can be influenced by generic algorithmic advances. The co-evolution of AI hardware and algorithms/workloads has only just begun.&lt;/p&gt;

&lt;p&gt;The chief scientist of NVIDIA was present in this discussion, which meant that questions gravitated quite a bit in &lt;a href=&quot;https://blogs.nvidia.com/blog/author/bill-dally/&quot;&gt;Bill’s&lt;/a&gt; direction. A bunch of topics were discussed. One of them was whether the world needed different hardware for training vs. inference. The notion of specialised hardware for inference on the edge and cloud does not seem extremely far fetched, but time will tell. Further specialised ASICs for inference in the low-power regime, e.g. for IoT functions, also seemed reasonable. The feasibility of neuromorphic architectures breaking into the market was met with some skepticism. Technical (CUDA-related) and access frustrations with GPUs could also be felt.&lt;/p&gt;

&lt;h1 id=&quot;furthermore&quot;&gt;Furthermore…&lt;/h1&gt;

&lt;p&gt;Apart from these sessions, there were some excellent posters, many of which involved GANs, and at least one involved wormholes to past/future memories.&lt;/p&gt;

&lt;p&gt;Kalai also had a fantastic poster about &lt;a href=&quot;https://www.parc.com/blog/creating-intricate-art-with-neural-style-transfer/&quot;&gt;transferring intricate styles on to silhouettes&lt;/a&gt; at the conference. Check out &lt;a href=&quot;https://github.com/PARC/intricate-art-neural-transfer&quot;&gt;this git repo&lt;/a&gt; and get your styles transferred!&lt;/p&gt;

&lt;p&gt;Last but not least, I got introduced to a nice tool for turning git repos into interactive Jupyter notebooks, called &lt;a href=&quot;https://mybinder.org/&quot;&gt;binder&lt;/a&gt;. Here is a related &lt;a href=&quot;https://conference.scipy.org/proceedings/scipy2018/project_jupyter.html&quot;&gt;research paper&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;All in all, I had a great time, and I hope to return to SOCML again in the near future.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot; role=&quot;doc-endnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot; role=&quot;doc-endnote&quot;&gt;
      &lt;p&gt;Given a dataset, model, and an epoch budget, throwing more compute resources at training should lead to a decrease in wall time. This decrease has practical limits, due to the communication overhead resulting from the movement of data, model parameters, and gradients, over training iterations. An algorithm, or more specifically a training workload, that can &lt;em&gt;scale with compute&lt;/em&gt; is one which manages to balance compute and communication well enough for communication to cause minimal overhead (e.g. by masking it with computation, compressing gradients, etc.), without any reduction in the performance of the trained model. Maintaining performance has implications for more fundamental changes to the optimisation machinery. These ideas also apply to solving problems with bigger and richer models that require more memory than available on single accelerators. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot; role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
        <short>Round up of SOCML 2017 held at Google, Sunnyvale. Yet another nice memory, but this time covering both days.</short>
        <pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate>
        <link>http://www.talkingdrums.info/conference/2019/01/01/socml2017/</link>
        <guid isPermaLink="true">http://www.talkingdrums.info/conference/2019/01/01/socml2017/</guid>
      </item>
    
      <item>
        <title>The Talking Drums</title>
        <description>&lt;p&gt;For a few days in Autumn, an annual science festival called &lt;em&gt;&lt;a href=&quot;https://www.forskningsdagene.no/&quot;&gt;Forskningsdagene&lt;/a&gt;&lt;/em&gt; has been taking place in Norway since 1995. Every year, the festival has a new theme. Popular science lectures are some of the things that happen. In 2014, the theme was &lt;em&gt;&lt;a href=&quot;https://forskning.no/blogg/real-library/bli-med-i-jungelen-pa-forskningsdagene&quot;&gt;communication&lt;/a&gt;&lt;/em&gt;, and the computer scientist in me got excited. The excitement lead me to doing a lecture on &lt;em&gt;Information Theory&lt;/em&gt; for an audience of &lt;em&gt;high school students&lt;/em&gt; at &lt;a href=&quot;http://litteraturhuset.no/en/home/&quot;&gt;Litteraturhuset&lt;/a&gt;. Also in the audience was my &lt;em&gt;Mother&lt;/em&gt;, which was super cool! The lecture happened on &lt;a href=&quot;https://www.mn.uio.no/om/aktuelt/arrangementer/andre/realfag-pa-litteraturhuset.html&quot;&gt;Sept 19, 2014&lt;/a&gt;. This was the time when I had my full-time entrepreneurial hat on, driving the AI effort at &lt;a href=&quot;https://www.studix.com/&quot;&gt;Studix&lt;/a&gt;, at the time based in &lt;a href=&quot;https://startuplab.no/&quot;&gt;StartupLab&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It was an exciting opportunity to say the least. It challenged me to find the right kind of &lt;em&gt;abstractions&lt;/em&gt; for explaining a few &lt;em&gt;bits&lt;/em&gt; of Information Theory such that it could easily be grasped by anyone. In preparation for the lecture, I happened upon a wonderful book by &lt;em&gt;James Gleick&lt;/em&gt; called &lt;em&gt;&lt;a href=&quot;https://www.amazon.com/Information-History-Theory-Flood/dp/1400096235&quot;&gt;‘The Information: A History, a Theory, a Flood’&lt;/a&gt;&lt;/em&gt;. It laid bare the abstractions I was after. The lecture became an adaptation of some of the ideas expressed in the book.&lt;/p&gt;

&lt;h2 id=&quot;what-was-covered-in-the-lecture&quot;&gt;What was covered in the lecture?&lt;/h2&gt;

&lt;p&gt;It focussed on the need for &lt;em&gt;redundancy&lt;/em&gt; in messages for robust long distance communication. The idea was to show how redundancy (extra drum beats/symbols/bits) in sent messages, whether communicated via &lt;em&gt;drumming&lt;/em&gt; centuries ago in Africa, or via mobile phones today, helps recover the information they carry after it has been corrupted in transit. High school students from two schools in Oslo, &lt;a href=&quot;https://elvebakken.vgs.no/&quot;&gt;Elvebakken&lt;/a&gt; and &lt;a href=&quot;https://oslo-katedral.vgs.no/&quot;&gt;Oslo Katedralskole&lt;/a&gt;, were present. Two of them got to &lt;em&gt;play drums&lt;/em&gt; on the stage in our very own &lt;em&gt;redundant orchestra&lt;/em&gt;, while I got to play the conductor. It was a rewarding experience for me, and I believe some in the audience may also have enjoyed it.&lt;/p&gt;

&lt;h2 id=&quot;the-lecture&quot;&gt;The lecture!&lt;/h2&gt;

&lt;p&gt;Before lecture day, I had also decided to write out the lecture &lt;em&gt;script&lt;/em&gt;, as it were, should the students wish to refer to it again at some point. It was hanging out on my old personal webpage until now, but I am slowly moving my web presence over here. So, below is the whole thing. Enjoy!&lt;/p&gt;

&lt;p&gt;Here are the slides (as a video), following the story as it was told.&lt;/p&gt;

&lt;div class=&quot;resp-container&quot;&gt;
  &lt;iframe class=&quot;resp-iframe&quot; src=&quot;https://www.youtube.com/embed/P3juiiGdUL0&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;div style=&quot;background-color:rgba(0, 0, 0, 0.0470588); padding-top: 5px; padding-right: 25px; padding-bottom: 5px; padding-left: 25px;&quot;&gt;

  &lt;h3 id=&quot;drums-can-talk&quot;&gt;Drums Can Talk&lt;/h3&gt;

  &lt;p&gt;In the heart of Africa centuries ago, they picked up drums. Rhythms ensued, and they all meant something. They were not merely signals for raising alarms. They were not like a chain of bonfires on mountain tops that were lit one after the other to communicate alerts in times of war, as the ancient Greeks are said to have done. They could in fact contain poetry and elaborate messages. Such messages could go far and wide, relayed from village to village, travelling hundreds of miles in a matter of hours. Not a word was spoken, not a word was written; there were only drum beats, and they contained these messages. This was an early form of long distance communication. This was also a long standing dream in the rest of the world at the time: to communicate messages faster than travellers on foot or horseback.&lt;/p&gt;

  &lt;p&gt;Of course, spoken word would normally travel at the speed of the messenger. Indeed, many messengers reached their destinations after the event they were meant to communicate. Julius Caesar often arrived before the messenger sent to announce his arrival. A written message would have the same fate, as the speed of the fastest courier was still the speed of the messenger on horseback, or a pigeon for that matter.&lt;/p&gt;

  &lt;p&gt;Drum beats, on the contrary, would arrive well before the event they were meant to convey. Distant villages could even rap with each other, if you like, solely by beats, building on a chain of thought beat after beat. They surely conveyed jokes in drum beats. Drummers would talk in the medium of drum beats. Travellers to Africa were astonished that a far off village they were visiting for the very first time would already know what they liked for breakfast, amongst other things. So, how did drums let humans talk?&lt;/p&gt;

  &lt;h3 id=&quot;too-many-beats-redundancy-keeps-the-message-alive&quot;&gt;Too Many Beats (Redundancy) Keeps the Message Alive&lt;/h3&gt;

  &lt;p&gt;For a while people thought the roots of drum communication were similar to those which led to the Morse code. But the only similarity between drum beats and the electromagnetic beats of the Morse code is that they sound percussive. At heart, they are entirely different processes. While Morse code is essentially an alphabet, where a single dot means &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;E&lt;/code&gt; and a single dash means &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt; for instance, African languages had no written form — no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;E&lt;/code&gt;s, no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt;s, no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;s, no &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;B&lt;/code&gt;s. These were purely spoken languages. Drum beats could not represent letters, as there weren’t any. They did represent one thing in speech — &lt;em&gt;the tones&lt;/em&gt;.&lt;/p&gt;

  &lt;p&gt;Tonality is key to many languages, both in Africa and Asia: meaning is determined as much by pitch as by the sounds of consonants and vowels. Imagine seeing in black and white and describing the scene — we would lose a lot of valuable details! So is the case when tone is removed from speech in these languages. Most Western languages, however, are &lt;em&gt;more-or-less&lt;/em&gt; tone blind. If we were to drum English messages for instance, it would all sound like a ticking clock, and it would be rather hard to know what exactly is being said. There would be no information in such messages. It would all just sound the same.&lt;/p&gt;

  &lt;p&gt;In 1949, it was all laid bare by John F. Carrington in his book &lt;em&gt;The Talking Drums of Africa&lt;/em&gt;. If one could write an African word or phrase in the Latin alphabet, it would mean different things when spoken in different tone variations. If one were to drum it however, one would get the tone variations right but lose the sounds of vowels and consonants. Thus a drum beat sequence, playing a spoken message tone for tone, would not be able to convey the full message. It would be as if we only saw colours through our eyes and no texture, no depth.&lt;/p&gt;

  &lt;p&gt;Such &lt;em&gt;reduction in dimensionality&lt;/em&gt; came at the cost of losing information. A tonal pattern for some word could in fact be the pattern for hundreds, if not more, of other words. Drum beats would thus have to be padded up with &lt;em&gt;contextual information&lt;/em&gt; — &lt;strong&gt;more drum beats&lt;/strong&gt;. One would not simply drum the tones in the word &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wood&lt;/code&gt;, say &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[--]&lt;/code&gt;, but instead the tones in the phrase &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;the wood gives us fuel&lt;/code&gt;, say &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[_--__--_]&lt;/code&gt;. This would distinguish the word &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wood&lt;/code&gt; from the word &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lion&lt;/code&gt;, which would be drummed as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;the lion makes a funky king&lt;/code&gt;, say &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[_----__-]&lt;/code&gt; for instance, if the word &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lion&lt;/code&gt; had the same tonal contours &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[--]&lt;/code&gt; as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;wood&lt;/code&gt; does. A message saying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bring home some wood today&lt;/code&gt; would not lead to any big surprises at home. The drum messages would have to be &lt;strong&gt;wordy&lt;/strong&gt;.&lt;/p&gt;

  &lt;h3 id=&quot;redundancy-in-language&quot;&gt;Redundancy in Language&lt;/h3&gt;

  &lt;p&gt;Redundancy or wordiness tackles confusion. This is true for all modes of communication, be it spoken, be it through drums, be it digital. A pilot flying an aircraft with a tail ID &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DE-RDS&lt;/code&gt; says &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Delta Echo Romeo Delta Sierra&lt;/code&gt;, or else the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;D&lt;/code&gt;s and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;E&lt;/code&gt;s would sound very similar on the noisy radio channel, even on a cellular phone for that matter. When we send an SMS saying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cn i hv a rd&lt;/code&gt;, we could be talking either about a book or transport. The extra letters in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;read&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ride&lt;/code&gt; provide context to someone who may not be part of the full conversation. There are patterns in all languages, beat patterns, grammar, spelling, word frequencies, which help reduce ambiguity in a conveyed message.&lt;/p&gt;

  &lt;p&gt;If a message is very redundant, much of it can be guessed. There is some certainty about it. It is not totally random. We can do a little experiment to understand redundancy in language.&lt;/p&gt;

  &lt;p&gt;Go to some blog and pick random words from it, noting them down in a sequence.&lt;/p&gt;

  &lt;p&gt;I went here: &lt;a href=&quot;http://www.anettemarie.no&quot;&gt;http://www.anettemarie.no&lt;/a&gt;&lt;/p&gt;

  &lt;p&gt;I got this: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;deg vi det kokes basis frem ute dagen fordi&lt;/code&gt;&lt;/p&gt;

  &lt;p&gt;This sequence looks quite random.&lt;/p&gt;

  &lt;p&gt;Now try another thing. Pick a random word from this blog. Note it down. Now do a search for this word on the same page — there might be many matches. Pick some match and note down the word ‘next’ to this match. Now search for this new word on the page, pick a match and note down the word next to the match again. Repeat this process a few times — search for noted word, pick a match, note down word next to match.&lt;/p&gt;

  &lt;p&gt;I got this: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;alle sammen som kanskje tillatt seg rundt og visper mens jeg&lt;/code&gt;&lt;/p&gt;

  &lt;p&gt;This sequence of words does not seem like the first one. Not so random. We could even make a sentence out of it. This shows how pairs of words usually occur together. Some words occur more often next to a given word than other words. We can often guess neighbouring words. A totally random sequence of words would make it hard for us to guess what’s coming next. In a conversation in a noisy room, if we miss one word but hear the next, it is often possible for us to guess the first. The first word unravels the next and vice versa. Our brains are experts at keeping a mathematical record of these repeating patterns, and such patterns help us deal with background noise.&lt;/p&gt;
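
  &lt;p&gt;The word-chain experiment above can be sketched in a few lines of Python. The sample text and the helper function here are made up for illustration, not taken from the lecture:&lt;/p&gt;

```python
import random

def word_chain(text, start, steps, seed=0):
    """Walk a text by repeatedly jumping to a word that appears
    immediately after the current word somewhere in the text,
    mirroring the 'search, pick a match, note the next word' game."""
    rng = random.Random(seed)
    words = text.split()
    chain = [start]
    for _ in range(steps):
        # All words that appear right after the current word.
        followers = [words[i + 1] for i, w in enumerate(words[:-1])
                     if w == chain[-1]]
        if not followers:
            break
        chain.append(rng.choice(followers))
    return chain

# A made-up snippet standing in for the blog page.
text = ("the drum speaks and the village hears the drum "
        "and the village answers")
print(" ".join(word_chain(text, "the", 5)))
```

  &lt;p&gt;Every run stays on word pairs that actually occur in the text, which is exactly the regularity that makes the second sequence feel less random than the first.&lt;/p&gt;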

  &lt;h3 id=&quot;redundancy-in-digital-communication&quot;&gt;Redundancy in Digital Communication&lt;/h3&gt;

  &lt;p&gt;Claude Shannon played such games in the 1940s, but there was no internet at the time, so no blogs either. He used books, and through playing such games and analysing language mathematically, made digital communication a reality. He invented a measure for information. Now we could measure how many &lt;strong&gt;bits&lt;/strong&gt; of information a message contains, like we could measure how many grams of sugar a pack of sugar contains.&lt;/p&gt;
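
  &lt;p&gt;As a toy illustration (not Shannon’s original derivation), the standard entropy formula can be applied to a message’s symbol frequencies to measure its information content in bits per symbol:&lt;/p&gt;

```python
import math
from collections import Counter

def bits_per_symbol(message):
    """Shannon entropy of the message's symbol distribution,
    in bits per symbol."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

# A repetitive (redundant) message carries less information per
# symbol than a varied one.
print(bits_per_symbol("aaaaaaab"))  # low: mostly predictable
print(bits_per_symbol("abcdefgh"))  # 3.0: every symbol is a surprise
```

  &lt;p&gt;A message where every symbol is equally likely carries the most bits per symbol; the more predictable a message is, the fewer bits it really contains.&lt;/p&gt;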

  &lt;p&gt;Central to the idea of communicating messages is the presence of the ‘medium’ through which the message travels from sender to receiver. Our messages are represented as a stream of bits, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[0]&lt;/code&gt;s and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1]&lt;/code&gt;s, and sent as electrical signals over cables or electromagnetic waves over wifi. Many things can corrupt a message as it is on its way, e.g. cabling problems, network issues. The medium is noisy. Our computers and mobile phones still let us communicate reliably. How do they do it?&lt;/p&gt;

  &lt;p&gt;Yes, they do it by adding extra symbols to our messages, extra &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[0]&lt;/code&gt;s and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1]&lt;/code&gt;s. This was possible to figure out because of Claude Shannon’s work. We could now understand how many bits of information a message contained. Likewise, how many extra bits make a message wordy enough to recover the information in it, if it gets corrupted by noise while it is on its way.&lt;/p&gt;

  &lt;p&gt;When a drummer drums &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;the lion makes a funky king&lt;/code&gt; as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[_----__-]&lt;/code&gt;, as long as the neighbouring village is able to hear most of the message, say &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[_??--__-]&lt;/code&gt;, they would figure out that it’s the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lion&lt;/code&gt; we are talking about, for who else would make a funky king? Similarly, if our digital message were &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[0 1 0]&lt;/code&gt; and we substitute &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[000]&lt;/code&gt; for any &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[0]&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[111]&lt;/code&gt; for any &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1]&lt;/code&gt;, our message would become &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[000 111 000]&lt;/code&gt;. We can design our receiver to detect patterns: at least two &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[0]&lt;/code&gt;s in a sequence of three digits to mean &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[0]&lt;/code&gt; and at least two &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1]&lt;/code&gt;s to mean a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1]&lt;/code&gt;. In so doing, we would recover from errors that corrupt the message to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[100 011 010]&lt;/code&gt;.&lt;/p&gt;
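
  &lt;p&gt;The triple-repetition scheme just described is easy to sketch in Python. This is a toy encoder/decoder for illustration, not what real devices use:&lt;/p&gt;

```python
def encode(bits):
    """Repeat each bit three times: [0, 1, 0] -> [0,0,0, 1,1,1, 0,0,0]."""
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(coded):
    """Majority vote over each group of three repeated bits."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]

message = [0, 1, 0]
sent = encode(message)              # [0,0,0, 1,1,1, 0,0,0]
corrupted = [1,0,0, 0,1,1, 0,1,0]   # one bit flipped per group
assert decode(corrupted) == message # the message survives the noise
```

  &lt;p&gt;One flipped bit per group of three is always recoverable; two flips in the same group would fool the majority vote, which is why stronger codes add redundancy more cleverly.&lt;/p&gt;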

  &lt;p&gt;Padding messages with extra symbols not only removes confusion; it also helps overcome errors introduced by thunderstorms in the case of drummers, or network errors in the case of our WhatsApp messages. Redundancy helped centuries ago with drums, and continues to do so today with computers and mobile phones.&lt;/p&gt;
&lt;/div&gt;
</description>
        <short>Popular science lecture on Information Theory at Litteraturhuset in Autumn 2014. A nice memory.</short>
        <pubDate>Sat, 29 Dec 2018 00:00:00 +0000</pubDate>
        <link>http://www.talkingdrums.info/lecture/2018/12/29/talking-drums/</link>
        <guid isPermaLink="true">http://www.talkingdrums.info/lecture/2018/12/29/talking-drums/</guid>
      </item>
    
      <item>
        <title>Lecture on Deep Reinforcement Learning</title>
        <description>&lt;p&gt;Just like &lt;a href=&quot;https://www.ntnu.edu/web/ailab/dl_tutorial&quot;&gt;last year&lt;/a&gt;, I had the pleasure (indeed a privilege) of delivering another &lt;em&gt;lecture on the current state of deep reinforcement learning (deep RL)&lt;/em&gt; on Nov 15, 2018, at &lt;a href=&quot;(https://www.ntnu.edu/)&quot;&gt;NTNU, Trondheim&lt;/a&gt;. The field is moving very fast, so I get to talk about something new every time, which is fun.&lt;/p&gt;

&lt;p&gt;The lecture was open to the public and streamed live. It was mostly attended by students taking courses offered by &lt;a href=&quot;https://www.ntnu.no/ansatte/keithd&quot;&gt;Prof. Keith Downing&lt;/a&gt;. It was also wonderful to see some of my immediate colleagues from &lt;a href=&quot;https://www.telenor.com/innovation/research/&quot;&gt;Telenor Research&lt;/a&gt; and academic friends from NTNU. Some of the material I covered will likely be stale by the time you read this. Nevertheless, below is the recording, and slides are available &lt;a href=&quot;https://drive.google.com/file/d/1mDaDiMiq1ZTiAsc6An7FeDNO8tWQYjcC/view&quot;&gt;here&lt;/a&gt;. A shout out to &lt;a href=&quot;https://www.brainntnu.no/&quot;&gt;BRAIN NTNU&lt;/a&gt; for taking care of the local arrangements.&lt;/p&gt;

&lt;div class=&quot;resp-container&quot;&gt;
  &lt;iframe class=&quot;resp-iframe&quot; src=&quot;https://www.youtube.com/embed/OTWGrugHRsU&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;h2 id=&quot;what-does-it-cover-and-who-is-it-for&quot;&gt;What does it cover and who is it for?&lt;/h2&gt;

&lt;p&gt;The lecture attempts to draw a broad and intuitive picture of the field as it stands. The historical development of the field is examined, building up to current frontiers. One aim of this lecture is to spread awareness of current advances and some of the key technical challenges towards scaling deep RL methods to real world control/sequential decision making problems. It also aims at inspiring curiosity to tackle these challenges amongst students at university, data scientists, and academic and industry researchers intending to work in the field. A lot is going on to address each one of these challenges. The lecture favours covering enough conceptual ground over being exhaustive.&lt;/p&gt;

&lt;h2 id=&quot;why-another-lecture-on-the-topic&quot;&gt;Why another lecture on the topic?&lt;/h2&gt;

&lt;p&gt;As is perhaps obvious, exposing fresh minds to the frontier and the technical challenges therein is one way to make some of them curious about advancing the field. When these minds get to work, the field invariably matures, and so does its applicability, trickling those advances into industry. Lectures like these are necessary to keep the momentum going. Since research in the field is moving quite rapidly, it also helps to &lt;em&gt;checkpoint&lt;/em&gt; the current state every now and then.&lt;/p&gt;

&lt;h2 id=&quot;keep-calm-and-advance-rl&quot;&gt;Keep calm and advance RL!&lt;/h2&gt;

&lt;p&gt;It is indeed of great help that my work at &lt;a href=&quot;https://www.telenor.com/innovation/research/&quot;&gt;Telenor Research&lt;/a&gt; revolves around applying and advancing RL. Although it takes &lt;em&gt;some&lt;/em&gt; effort to keep up, it is fun to do so, and discussions with colleagues and students with similar interests keeps things under control. It has indeed been a great pleasure for me to get to work with a few in the recent past, in part due to supervisory activities at the &lt;a href=&quot;https://www.ntnu.edu/web/ailab/&quot;&gt;Norwegian Open AI Lab&lt;/a&gt;, and through &lt;a href=&quot;https://www.meetup.com/Reinforcement-Learning-Reading-Group/&quot;&gt;this reading group&lt;/a&gt;. I hope to keep learning from them and the wider community in the foreseeable future!&lt;/p&gt;

</description>
        <short>Lecture at NTNU covering recent advances and challenges in deep reinforcement learning.</short>
        <pubDate>Thu, 15 Nov 2018 00:00:00 +0000</pubDate>
        <link>http://www.talkingdrums.info/lecture/2018/11/15/drl-lecture/</link>
        <guid isPermaLink="true">http://www.talkingdrums.info/lecture/2018/11/15/drl-lecture/</guid>
      </item>
    
      <item>
        <title>Reinforcement Learning Reading Group</title>
        <description>&lt;p&gt;When there seem to be very few people in one’s viscinity working to advance a field one is interested in, it can be a rather overwhelming place to be. But of course, one must not take this as a signal to abandon ship.&lt;/p&gt;

&lt;h2 id=&quot;scant-general-interest&quot;&gt;Scant general interest?&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;reasons&lt;/strong&gt; for the seemingly scant general interest are many.&lt;/p&gt;

&lt;p&gt;One is that the road to creating immediate business value from the advances can be very unclear and costly. This is the case with most visionary ideas, and this is partly what academia and industry research labs are for, or so one would hope. Take for instance some of the pioneering work on training neural networks. It took until 2012 for advances from the early 1980s to go &lt;em&gt;mainstream&lt;/em&gt;, thanks to more compute power, availability of more data, and further innovations by a &lt;em&gt;relentless community&lt;/em&gt; to make training more stable.&lt;/p&gt;

&lt;p&gt;Another is that the resources enabling progress in the field are disproportionally distributed across the worldwide talent pool, thereby creating severe disadvantages for some. This can drive the disadvantaged to focus their energies on something else, or move to places they feel enabled, unless one finds &lt;em&gt;alternative ways of encouraging them&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Yet another reason is that you may be living in Norway. The general feeling of wellbeing over here incentivises playful individual pursuits, &lt;em&gt;a la&lt;/em&gt; &lt;strong&gt;the world is one’s oyster&lt;/strong&gt;. That is a great thing. But, if a hundred tinkerers tinker along a thousand different directions, we get a lot of toys. If they were driven by a vision or even a pressing need, it could just as well be a solution to climate change, or for that matter making reinforcement learning (RL) go wild. &lt;strong&gt;Norway has oodles of talent&lt;/strong&gt;. It is just missing that big pressing need to come together and reshape humanity. Most of it seems driven to take up jobs as consultants. &lt;strong&gt;Steve Jobs&lt;/strong&gt; showered some &lt;em&gt;&lt;a href=&quot;https://youtu.be/Gk-9Fd2mEnI?t=922&quot;&gt;wisdom on consulting&lt;/a&gt;&lt;/em&gt; a while ago. Yet they go. Not for long, I hope! To be fair, the ones who can drive/incentivise talent need to do a better job.&lt;/p&gt;

&lt;h2 id=&quot;rl-at-the-cusp-of-going-wild&quot;&gt;RL at the cusp of going wild&lt;/h2&gt;

&lt;p&gt;The basic RL machinery has been maturing over a few decades. Scalable approaches to RL driven decision making have been advancing by leaps and bounds since 2013. It may, in a sense, seem like a field at the cusp of going wild. What makes RL peculiar in AI’s recent industry upswing is that we are at a stage where both basic pursuits and industry scale application are in fact being worked on at the same time. The basic pursuits will potentially lead to many &lt;strong&gt;orders of magnitude&lt;/strong&gt; improvements in how agents collect and learn from experience. The push from industry to apply RL further incentivises these advances. The tension in the field is very real and exciting.&lt;/p&gt;

&lt;div class=&quot;img_container&quot;&gt;
  &lt;div class=&quot;resp-container&quot;&gt;
    &lt;iframe class=&quot;resp-iframe&quot; src=&quot;https://www.youtube.com/embed/snVL-g4eqOo?t=73&quot; frameborder=&quot;0&quot; allow=&quot;accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture&quot; allowfullscreen=&quot;&quot;&gt;&lt;/iframe&gt;
  &lt;/div&gt;
  &lt;p&gt;To break or not to break out of the lab!&lt;/p&gt;
&lt;/div&gt;

&lt;h2 id=&quot;scant-general-interest-revisited&quot;&gt;Scant general interest… revisited&lt;/h2&gt;

&lt;p&gt;This tension can however incentivise playing the &lt;em&gt;waiting game&lt;/em&gt;. Since enough people in key industry research environments, e.g. Deepmind in collaboration with the wider Google, are trying to make RL go wild, can we not just wait until it happens? Let’s add this to the list of reasons for the seemingly scant interest. But for the sake of argument, even if one did wait, it is no guarantee that RL will go wild for one’s purpose. Sure, general algorithms are what we want, but people are not trying to solve your problem. &lt;em&gt;Problems have peculiarities, some more severe than others, and not all of interest to everyone.&lt;/em&gt; Adaptations and tinkering are inevitable, at least in the short-mid term.&lt;/p&gt;

&lt;h2 id=&quot;reading-group-to-drive-breakthroughs&quot;&gt;Reading group to drive breakthroughs&lt;/h2&gt;

&lt;p&gt;Furthermore, what about those who find a peculiar kind of satisfaction working on the subject — peculiar in the sense of an artist in unison with their process of creating art? They are not here to wait. They want to be part of the community nudging RL out of the lab. They certainly do not abandon ship. They embrace the tension. They care about pursuing the vision, and they are also not daunted by the lack of resources. Shouldn’t they be given a chance to rise to their potential?&lt;/p&gt;

&lt;p&gt;Fortunately, since the field is so open, collaborating with the ones who have resources is of course an option. But to do so, one has to play at the same level. To this end, it certainly helps to have a &lt;strong&gt;local community&lt;/strong&gt; which discusses, critiques, and appreciates ongoing RL research. Learning from each other, without a doubt, can kindle new ideas — &lt;strong&gt;ideas that move the field forward.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;From follow up discussions with students after various lectures on the subject, it became clear that there was a need for such an outlet. Unsurprisingly, people who want to toy with the frontier of RL do exist in Norway. Bringing them together clearly seemed like a good thing to do. Ergo, the &lt;strong&gt;&lt;a href=&quot;https://www.meetup.com/Reinforcement-Learning-Reading-Group/&quot;&gt;Reinforcement Learning Reading Group&lt;/a&gt;&lt;/strong&gt;. We are currently sampling physical spaces in Oslo (the first two meetings happened at &lt;a href=&quot;https://meshnorway.com/&quot;&gt;MESH&lt;/a&gt;). However, attending remotely via &lt;strong&gt;&lt;a href=&quot;https://appear.in/machine-learning&quot;&gt;https://appear.in/machine-learning&lt;/a&gt;&lt;/strong&gt; is indeed an option for those not in Oslo.&lt;/p&gt;

&lt;p&gt;All welcome!&lt;/p&gt;
</description>
        <short>Discussing, critiquing, and appreciating RL research, with the hope of kindling new ideas. Building a community.</short>
        <pubDate>Mon, 17 Sep 2018 00:00:00 +0000</pubDate>
        <link>http://www.talkingdrums.info/research/2018/09/17/rlrg/</link>
        <guid isPermaLink="true">http://www.talkingdrums.info/research/2018/09/17/rlrg/</guid>
      </item>
    
  </channel>
</rss>
