The Humming Box

A technology-powered toy to empower children’s ability to compose and play with music.
Project Info
Hacking Smart Toys for AI Learning class
Taught by Stefania Druga
Foundation -- Start from scratch
Deliverable -- Physical Prototype + Design Paper | CHI PLAY 2019 Conference Interactivity & Play Track
Mar - May 2019 | Part-time
Arduino MKRZero, NeoPixel, Rotary Encoder, GarageBand
C, MIDI, Sketch, AI, Laser Cutter, Acrylic Heater, Drills, Wood Sander
All group members participated in concept development, user research, interaction design, usability tests, and technical troubleshooting.
My Contribution
Arduino prototyping (code structure, NeoPixel animation, card insert detection, rotary encoder stabilization, MIDI notes, keyboard control);
Pitch detection algorithm exploration;
Fabrication (cranking handle);
Icon design.
Teammates' contributions
Tianyi Xie -- Rotary encoder read; Card design; Fabrication (card insert mechanism, cranking mechanism, box body, laser cut); Demo video editing; Visual system design.
Yihan Tang -- NeoPixel connection; Box surface design; Fabrication (NeoPixel holder, box body, laser cut); Demo video retouching.
Traditional music composition training presents a high barrier to entry because of the time, money, and effort it demands. In addition, children's increasing screen time and the monotonous interaction modes of the digital era raised our concern.
We proposed a multimodal, tangible and playful smart toy, the Humming Box, to blend traditional hand-crank music boxes with software-based music manipulation and create new possibilities for children to create and remix music manually.

The evaluation of the latest prototype suggests that multimodal musical creation leads to increased engagement and that the modular design encourages creative expression and cooperative play.

In 3 design sprints, we took a deep dive into children's behavior patterns and iterated accordingly.

Final Demo

The Humming Box has three main components: a wooden box prototype with a USB 2.0 Micro cable, three transparent instrument cards, and keys. It also requires a computer with GarageBand installed.

Key power switch
Cranking for music generation
Inserting instrument card

Set Up (By adults)
  • Connect the Humming Box to GarageBand with a USB 2.0 Micro cable.
  • In GarageBand, set up at least four instruments. Currently, we use Guzheng, Electronic Piano, Guitar, and Funk. The order of the instruments matters.
Play (By children)
  • Use the key to unlock the box by turning it clockwise. A white animated LED ring will light up on top of the box.
  • Keep GarageBand active and start cranking the box. The "Baby Shark" melody will start to play. The cranking speed changes the rhythm of the melody in real time. The LED ring lights up like a rainbow and gradually changes color at the same pace as the cranking.
  • Insert instrument cards into the card slot on top of the box to switch the song's timbre. A different color flashes on the LED ring as the box detects each card.
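
The cranking behavior described in the Play steps can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the prototype's firmware: it assumes the melody advances by one note for every fixed number of encoder steps, so cranking faster plays the notes sooner. `Player`, `player_feed`, `STEPS_PER_NOTE`, and the placeholder `MELODY` array are hypothetical names and values introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder melody (MIDI note numbers, C major scale),
   not the note array used in the actual prototype. */
static const unsigned char MELODY[] = {60, 62, 64, 65, 67, 69, 71, 72};
#define MELODY_LEN (sizeof MELODY / sizeof MELODY[0])

/* Assumed: encoder steps of cranking needed to advance one note. */
#define STEPS_PER_NOTE 4

typedef struct {
    long   pending_steps;  /* forward steps accumulated since the last note */
    size_t next_index;     /* index of the next note to play */
} Player;

/* Feed the encoder steps counted since the previous call.
   Returns the MIDI note to play now, or -1 if the crank has not turned
   far enough yet. Backward cranking (negative steps) is ignored.
   At most one note is returned per call; leftover steps stay pending. */
int player_feed(Player *p, long steps)
{
    assert(p != NULL);
    if (steps > 0) {
        p->pending_steps += steps;
    }
    if (p->pending_steps < STEPS_PER_NOTE) {
        return -1;
    }
    p->pending_steps -= STEPS_PER_NOTE;
    int note = MELODY[p->next_index];
    p->next_index = (p->next_index + 1) % MELODY_LEN;
    return note;
}
```

Tying note advancement to crank rotation rather than to a timer mirrors how a mechanical music box behaves, which is one plausible reason to structure it this way.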


Domain Research

Pros of multimodality:
  • Stimulates children's interest and extends focus time.
  • Cultivates their curiosity.
  • Different sensory modalities effectively guide their selective actions.
Pros of modularity:
  • Frees children's creativity by encapsulating intricate technical details in accessible, functional modules for a younger audience.
  • Constructive interactions with modules benefit child-initiated activities in integrated playful pedagogies.
Pros of tangible toys:
  • Keeps children from wasting time on overwhelming digital products.
  • Helps children form healthy entertainment habits.
  • Benefits children's dexterity development and muscle growth.

By analyzing related existing toys, we narrowed the concept and shaped the value proposition as:

The intersection of multimodality, modularity and tangible interaction.

After rounds of idea provocation and voting, we chose "humming to compose" as the theme.

Children commonly hum to create their own music, but the habit often fades as they grow up. Even though music education has become commonplace, the established rules of the field create a barrier to children expressing themselves creatively through music. These frustrations may affect their interest and confidence in constructive music play.

Cutting-edge explorations in interactive digital composition (e.g. NYU MusEDLab) inspired us to reconsider the basic elements of traditional music education by mapping them in a more natural and accessible way for younger players. Therefore, we framed the major design question as:

"How might we offer children innovative and intuitive physical experiences of composing and playing with music without professional training?"

In this way, children could potentially gain confidence in making broader expressions and creations.
Concept Evaluation
Quick and Dirty Interview
To assess children's interest in music, we quickly invited a group of students to recall their childhood experiences.

As shown in the diagram, the validation from the ITP community supported our focus on the design question.

Minimum Viable “Dummy” Prototype
In order to further probe our initial interaction idea and determine the most appropriate form factor for our design, we made a "Dummy" prototype out of clay and foam core.

For the music functionality, we made a "Wizard of Oz" prototype, a hidden smartphone running the HumOn app, to demonstrate the key features of detecting a hummed melody and modifying its rhythm, pitch, and timbre.

Subject Matter Experts (SME) Interview at FabLearn Conference 2019
We presented the “dummy” prototype at FabLearn Conference 2019, and held a subject matter experts (SME) interview session with more than 20 professional UX designers and researchers.
  • These experts strongly endorsed the concept of multimodal music education.
  • The group's review gave us more insight into children's behavior patterns.
  • They helped us simplify the interaction accordingly while still holding children's attention.

For instance, one of the teachers suggested that children might yell at or talk to the box instead of humming, which proved right in the first round of playtesting. In other words, children may not interact with the toy as expected. Taking children's behavior patterns into consideration is critical to designing more playful and intuitive experiences.



We developed a persona to tailor the design to the target age group's needs, experiences, behaviors, and goals.


Based on the feedback at FabLearn, we decided to pivot the design focus of our MVP (minimum viable product). First of all, we revisited our value proposition

"empower children's ability to compose and play with music"

before conducting divergent thinking sessions, to make sure that we were on the same page and focused on framing core interactions that map well to children's existing conceptual models.

  • We ended up choosing cranking, borrowed from traditional music boxes, as the primary interaction for generating sound.
  • Cards replaced the earlier cubes as modification modules.

iteration, in a loop!

During 3 rounds of prototyping and playtesting, we gathered feedback, reactions, and insights from:

Version I: Key interaction prototype
Once we agreed on the core concept, the priority became putting the key interaction in front of our young target audience and gathering feedback as soon as possible.

Therefore, we rapidly designed and built a wooden prototype box with a cranking handle connected to a rotary encoder, a button as the placeholder for recording, and a bottomless slot. Although the prototype required a laptop connection to output the MIDI signal to GarageBand, children interacted only with the box. In this version, we used paper to represent instrument modules and manually toggled the timbre in GarageBand to mock up the instrument-switching effect.
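
For readers unfamiliar with the MIDI signal mentioned above, the sketch below builds the standard three-byte Note On and Note Off channel messages in C. It is illustrative only: `MidiMessage`, `midi_note_on`, and `midi_note_off` are hypothetical helper names, and how the bytes were actually sent to GarageBand over USB is not described in this write-up, so that part is left out.

```c
#include <assert.h>
#include <stdint.h>

/* One MIDI channel voice message: a status byte plus two data bytes. */
typedef struct {
    uint8_t status;
    uint8_t data1;  /* note number, 0-127 */
    uint8_t data2;  /* velocity, 0-127 */
} MidiMessage;

/* Note On: status 0x90 combined with the channel (0-15). */
MidiMessage midi_note_on(uint8_t channel, uint8_t note, uint8_t velocity)
{
    assert(channel < 16 && note < 128 && velocity < 128);
    MidiMessage m = { (uint8_t)(0x90 | channel), note, velocity };
    return m;
}

/* Note Off: status 0x80 combined with the channel; velocity 0 here. */
MidiMessage midi_note_off(uint8_t channel, uint8_t note)
{
    assert(channel < 16 && note < 128);
    MidiMessage m = { (uint8_t)(0x80 | channel), note, 0 };
    return m;
}
```

For example, `midi_note_on(0, 60, 100)` describes middle C on the first channel at a moderate velocity.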

Designing the mechanism to support reading the crank rotation was our biggest challenge. The anatomy of traditional music boxes acted as our guide.

Achieved Technical Features

Basic rotary encoder read
I2S Microphone Testing
As the first trial of humming-MIDI note conversion
Arcade button connection
MIDI output with GarageBand
MIDI Note Mapping

To be continued

Pitch detection algorithm
AI-powered modification
Rotary encoder stabilization
Automatic MIDI manipulation
Card insertion detection

Playtest I: After-school session in Chinatown
Aiming to gather insights about how children would physically and emotionally interact with cranking, button pressing, and module inserting, we conducted the initial round of playtesting with 14 children (3rd and 4th graders) during an after-school session in Chinatown, New York.

The children were grouped in pairs to interact with the prototype for about 5 minutes, followed by a 2-minute interview session. We first observed their behavior, then asked questions to clarify their choices during the interaction.

Through observation, we found that:
  • Nearly every child enjoyed pressing the button while cranking simultaneously.
Modification Suggested
This observation led us directly to focus on designing the multimodal interaction for music performance with the Humming Box. Since humming detection turned out to be a relatively separate feature that required the bulk of the research effort, we decided to prioritize music performance interaction design over the algorithm study. Preset MIDI notes serve as placeholders within the project's scope.
  • We were surprised that children got very involved in inserting almost everything they could find on the desk into the bottomless slot at once, which later broke the prototype.
Modification Suggested
By inserting objects, they expected the timbre to change and were disappointed when the output stayed the same. We attributed this reaction to children's exploratory nature and to the single sound output not meeting their expectations. The effect of cranking speed on rhythm was also not perceivable enough for most groups.
  • Because the feedback was neither satisfying nor varied, children's focus time in this playtest ranged from only 1 to 2 minutes.
Modification Suggested
Therefore, a precise mapping between cranking speed and rhythm output became critical, and more varied feedback was needed to increase engagement and improve the play experience.
  • Children also tended to push cards all the way to the bottom.
Modification Suggested
Instrument module (card) insertion needed clear feedback to indicate how far children should push the cards into the box.

Version II: Multimodal prototype
With lessons learned from the first playtest, we rebuilt the prototype, replacing the button placeholder and adding a NeoPixel ring as another visual feedback component. The NeoPixel ring was coded to show a spinning light animation that cycles through rainbow colors. We mapped the animation pace to the cranking speed.
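
One way to picture this mapping is the C sketch below. It is a sketch under assumptions, not the project's code: it uses a conventional 0-255 color wheel and shifts the rainbow around the ring by an amount proportional to the encoder steps counted since the last frame, so the animation stands still when the crank does. The ring size `NUM_PIXELS`, the gain `HUE_PER_STEP`, and all function names are hypothetical. Writing the colors to the LEDs is hardware-specific and omitted.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PIXELS   16  /* assumed number of LEDs on the ring */
#define HUE_PER_STEP 4   /* assumed wheel advance per encoder step */

typedef struct { uint8_t r, g, b; } Rgb;

/* Map a wheel position 0-255 to a color: red -> green -> blue -> red. */
Rgb wheel(uint8_t pos)
{
    Rgb c;
    if (pos < 85) {
        c.r = (uint8_t)(255 - pos * 3); c.g = (uint8_t)(pos * 3); c.b = 0;
    } else if (pos < 170) {
        pos -= 85;
        c.r = 0; c.g = (uint8_t)(255 - pos * 3); c.b = (uint8_t)(pos * 3);
    } else {
        pos -= 170;
        c.r = (uint8_t)(pos * 3); c.g = 0; c.b = (uint8_t)(255 - pos * 3);
    }
    return c;
}

/* Advance the rainbow offset by the encoder steps since the last frame.
   The conversion to uint8_t wraps around the wheel (modulo 256). */
uint8_t advance_offset(uint8_t offset, long steps)
{
    return (uint8_t)(offset + steps * HUE_PER_STEP);
}

/* Color of one pixel: the wheel is spread evenly around the ring. */
Rgb pixel_color(uint8_t offset, int index)
{
    assert(index >= 0 && index < NUM_PIXELS);
    return wheel((uint8_t)(offset + index * 256 / NUM_PIXELS));
}
```

Each frame would call `advance_offset` once with the new step count and then `pixel_color` for every LED index.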

"Multimodality" is a call-out for this version of prototype. We mainly wanted to evaluate the impact of integrating multimodal interactions to the box's attractiveness to children.

We designed, laser-cut, and laser-etched three rectangular acrylic cards as instrument switch modules. We improved the card insertion mechanism by attaching switches to a cardholder at the bottom of the slot, which provides firmer support and click feedback. The click indicates that the insertion has been detected and keeps users from pushing the card further. At this stage, however, we were still manually switching the instruments in GarageBand.

Throughout the process, we realized that the system still lacked a clear signal that it was initialized and ready for a new interaction session. Hence, we added a key switch for the "power control" effect. The keys also created a sense of ownership between children and the toy.

As for fabrication at this stage, I mainly worked on the NeoPixel. Inspired by the diffusers commonly used in photography, we placed a piece of semi-transparent paper on top of the LED ring to create a soft, smooth animation effect.

Achieved Technical Features

For light animation, interaction and mapping
Key Switch
For "power control" effect
Card insertion mechanism

To be continued

Pitch detection algorithm
AI-powered modification
Rotary encoder stabilization
Automatic MIDI manipulation
Card insertion detection

Playtest II: "Bring your child to work" day at Tisch
During the "Bring Your Child to Work" Day at Tisch School of the Arts, New York University, we invited about 10 to 15 children (3-6 years old) to play with the new prototype. This playtest consisted mainly of observation.

Overall, children reacted positively to the Humming Box. A girl even came back to play with it for a second time.
  • All the children reacted actively to the light animation of the NeoPixel ring.
  • As in the first playtest, they also showed strong interest when the timbre was toggled.
  • As for the card detection mechanism, the clicking feedback successfully kept children from inserting cards too far.
Takeaways for improvement:
  • Card insertion was not smooth enough, and cards sometimes got stuck.
Modification Suggested
A better card-holding and clicking mechanism to make insertion smoother.
  • Two children were confused about the orientation of the cards on their first tries.
Modification Suggested
Design the cards to be identical on both sides.
  • Most children failed to pick up the melody.
Modification Suggested
Pick a more popular song for the preset MIDI note array.
  • Some younger children (approximately 3 years old) struggled a little bit with the first key.
Modification Suggested
Change the texture of the key or make it bigger so that smaller hands can grab it.

Version III: Final Prototype
Based on observations during the second playtest, we improved the rotary encoder reading algorithm so that it would stabilize the cranking input. Since children found the previous melody less attractive, we changed the MIDI notes to a more popular song, "Baby Shark", in this version. Additionally, to provide appropriate feedback for card insertion detection, we added instant color changes to the NeoPixel ring. Each instrument card triggers a different color wipe. It also sends a switching command to GarageBand by triggering keyboard inputs. In this way, we achieved automatic timbre manipulation. Finally, we reorganized and refactored our code.
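
The write-up does not say how the encoder readings were stabilized. One common approach, shown here purely as an assumed stand-in, is a moving average over the last few step counts, which smooths out jittery readings at the cost of a short lag. `Smoother`, `smoother_update`, and the window size `WINDOW` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW 4  /* assumed number of recent readings to average */

typedef struct {
    long samples[WINDOW];  /* circular buffer of recent step counts */
    int  next;             /* slot that the next reading overwrites */
    int  count;            /* number of valid samples, up to WINDOW */
    long sum;              /* running sum of the valid samples */
} Smoother;

/* Add the step count measured in the latest polling interval and
   return the average over the last WINDOW intervals (integer division). */
long smoother_update(Smoother *s, long delta)
{
    assert(s != NULL);
    if (s->count == WINDOW) {
        s->sum -= s->samples[s->next];  /* drop the oldest reading */
    } else {
        s->count++;
    }
    s->samples[s->next] = delta;
    s->sum += delta;
    s->next = (s->next + 1) % WINDOW;
    return s->sum / s->count;
}
```

A zero-initialized `Smoother` is a valid starting state. A larger `WINDOW` gives steadier output but makes the melody respond more slowly to changes in cranking speed.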

During fabrication, the most challenging part was redesigning and building the card insertion mechanism. By bending the acrylic board, we built a cardholder with calibrated angles and slots for switches. Each card triggers a different switch combination.
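
The idea of identifying a card by its switch combination can be sketched in C as follows. The number of switches and the actual combinations are not given in this write-up, so the sketch assumes two switches, which is enough to tell three cards apart from an empty slot. It also assumes a reading must repeat for a few polls before it is accepted, to ride out switch bounce during insertion. `Card`, `decode_card`, `CardDetector`, `detector_poll`, and `STABLE_READS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed encoding: two switches give four states (empty + three cards). */
typedef enum { CARD_NONE = 0, CARD_A = 1, CARD_B = 2, CARD_C = 3 } Card;

/* Combine the two switch states into a card ID. */
Card decode_card(int switch0_pressed, int switch1_pressed)
{
    int code = (switch0_pressed ? 1 : 0) | (switch1_pressed ? 2 : 0);
    return (Card)code;
}

#define STABLE_READS 3  /* assumed polls a reading must persist */

typedef struct {
    Card candidate;  /* most recent raw reading */
    int  repeats;    /* consecutive polls showing the candidate */
    Card current;    /* last accepted card */
} CardDetector;

/* Poll with the latest decoded reading. Returns 1 when the accepted
   card changes (a moment to flash a color or switch instrument),
   otherwise 0. */
int detector_poll(CardDetector *d, Card reading)
{
    assert(d != NULL);
    if (reading == d->candidate) {
        if (d->repeats < STABLE_READS) {
            d->repeats++;
        }
    } else {
        d->candidate = reading;
        d->repeats = 1;
    }
    if (d->repeats >= STABLE_READS && d->candidate != d->current) {
        d->current = d->candidate;
        return 1;
    }
    return 0;
}
```

Under this assumed encoding, a card that presses only one switch at first and both a moment later is not reported until the reading settles.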

Our observation of children's confusion about card orientation led us to add a unique, symmetrical, encoded curve to the bottom of each card. We used transparent acrylic so that the cards are identical on both sides and let the NeoPixel light show through. As a result, no matter how children insert a card, it is always detected correctly.

Additionally, we wrapped the keys with wool, which gives them a more attractive look and a more delicate texture and also makes them easier for children to grab.

We also stabilized the cranking mechanism and calibrated the rotation angles to ensure a smoother movement.

Achieved Technical Features

Rotary encoder stabilization
Card insertion detection
Automatic MIDI manipulation

Current Technical Structure

To be continued

Pitch detection algorithm
AI-powered modification

Playtest III: Japanese Elementary School and ITP Spring Show
We evaluated the latest prototype at the Brooklyn Nihongo Gakuen Saturday workshop and the ITP Spring Show with 10 boys and 12 girls (3-6 years old).

Children expressed a strong willingness to play with multiple interactions, figure out connections between multimodal feedback, and share their findings with others.
  • The productive interactions and improved melody significantly increased their focus time (around five minutes) and drove their curiosity to the next level.
  • Moreover, some children invited friends to explore new possibilities cooperatively.
    They even divided up responsibilities by assigning a representative to each interaction. The audience was also highly engaged, contributing thoughts and suggestions.
  • When guardians were present, the Humming Box acted as a link for cooperation between younger children and parents.
    Children usually initiated the interaction. Parents would guide children through more challenging tasks such as card insertion and key switching. They would also inspire children to experiment with different instrument cards, which led to a structured play activity that benefits children through an integrated pedagogical approach.
  • During the ITP Spring Show, older children and adults from various professional fields expressed strong interest in the prototype. A 9-year-old boy immediately mastered the basic interactions and came up with an original way to mix the instrument choices.

In the Future

In future iterations of the Humming Box, we aim to integrate multiple AI-powered pitch detection algorithms so that children can sing directly to the box and then play back their own melody. We would also provide a selection of preset melodies as an alternative for children who are not interested in writing melodies but like to explore elements of music composition, such as setting pitch and rhythm or even remixing.

Additional modifications would provide personal features for children, such as customized instrument modules using RFID detection and encouragement to paint the box themselves.

We are also looking into executive functions and uses of the instrument, and we are trying to test with children on the spectrum based on dexterity.

Pitch detection algorithm
AI-powered modification
Customized features
Test with children on the spectrum based on dexterity.
In conclusion...
Children, without experiential knowledge, can be easily affected by the overwhelming technology explosion. Thus, we designers are responsible for shaping the next generation's perceptions and habits of new technologies. We hope to continue exploring the horizon of music composing for younger creators by offering a smoother tangible form of interaction.

How I grew

Prioritizing is always important
Especially in a small team shaping an innovative project, people can become far too attached to certain features. I was personally "married" to pitch detection before the first playtest. It might have seemed like a "golden idea" or a "core feature", but it would have cost us a huge amount of time and effort. Moreover, the importance of the humming feature remained an assumption rather than a tested truth. Therefore, prioritizing performance interaction design over investing in the algorithm turned out to be the right call. Later evaluation showed that children were already engaged with the existing interactions.
