Mini-Feed Webservice Making a Comeback?

The Hackystat Mini-Feed has hit a road block. A Mini-Feed webservice may be the answer.

What I want to do

I had the idea that the mini-feeds should display some aggregated information based on batches of the sensor data entries. This will allow the feed to say things like:

Austen currently has 25 minutes of active time in the past hour. 2 hours total for today.

@ 12:02pm Austen has invoked an ant build.

@ 4:04pm Philip has run the Emma Coverage tool. The foo module currently has 99.9% method level coverage.

@ 2:02am Aaron has committed 25 files.

The Problem

The issue I had tonight was the large number of webservice requests required to get the right data. There are basically two ways to request project data in Hackystat. [1] Request the index of sensor data entries, which provides you with information such as the URL to each entry's representation, a timestamp, sdt, user, etc. [2] Use the URL found in the index to drill down into an entry's representation.

To implement the Mini-Feed, I need to drill down into each entry's representation to get the entry's runtime and data-type-specific fields.

For those of you wondering what the difference between a timestamp and a runtime is: a timestamp is unique to each data entry and acts as the 'primary key' of that entry. A runtime can be shared by multiple entries and provides a way to 'batch' data together. An example of a batch of data would be all of the entries associated with a unit test tool's execution. Each batch would have information about the unit tests' execution, such as whether each test passed or failed.
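Here is a rough sketch of that batching idea in Python. The field names (timestamp, runtime, sdt, result) and the sample values are my own assumptions for illustration, not the actual SensorBase schema.

```python
from collections import defaultdict

# Hypothetical entries: every timestamp is unique, but entries produced by
# the same tool run share a runtime. Field names are assumptions.
entries = [
    {"timestamp": "12:02:01", "runtime": "12:02:00", "sdt": "UnitTest", "result": "pass"},
    {"timestamp": "12:02:02", "runtime": "12:02:00", "sdt": "UnitTest", "result": "fail"},
    {"timestamp": "12:02:03", "runtime": "12:02:00", "sdt": "UnitTest", "result": "pass"},
    {"timestamp": "16:04:10", "runtime": "16:04:00", "sdt": "Coverage", "result": "n/a"},
]


def batch_by_runtime(entries):
    """Group entries that share a sensor data type and a runtime."""
    batches = defaultdict(list)
    for entry in entries:
        batches[(entry["sdt"], entry["runtime"])].append(entry)
    return dict(batches)


batches = batch_by_runtime(entries)
unit_test_batch = batches[("UnitTest", "12:02:00")]
failures = sum(1 for e in unit_test_batch if e["result"] == "fail")
```

With batches like this, one feed item can describe a whole tool run instead of a single entry.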

Drilling down into each data entry is time-consuming due to the large number of data entries. If the index has 5000 entries, I will have to make 5000 separate requests to get the entry-specific information. My tests of performing 3000 or so web requests to my local sensorbase took about 45 seconds to complete. That's just too long. I think the overhead of making all those requests is causing the slowdown. That's my theory anyway...
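To put rough numbers on it, here is a back-of-the-envelope estimate. It assumes the cost grows linearly with the number of requests, which is only my guess based on that one measurement.

```python
# Measured above: roughly 3000 requests took about 45 seconds.
measured_requests = 3000
measured_seconds = 45.0

# Average cost of one request, assuming every request costs about the same.
seconds_per_request = measured_seconds / measured_requests


def estimated_seconds(entry_count):
    """Estimate the drill-down time when each entry needs its own request."""
    return entry_count * seconds_per_request


per_request_ms = seconds_per_request * 1000   # about 15 ms per request
for_5000 = estimated_seconds(5000)            # about 75 seconds
```

So even at around 15 ms per request, a 5000 entry index would take over a minute just to drill down.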


The solution?

So I was thinking, "How come the DailyProjectData services don't have that problem too?" I believe the reason is that there is only one webservice call binding each entry's data to Java objects.

So in order to cut down the large number of requests I will have to create a Mini-Feed service that can talk directly to the SensorBaseClient. I am wondering if caching will get rid of this problem. If so, I probably should hold off on writing a Mini-Feed service.

Hackystat Mini-Feed Part 2

This weekend I did a bit of Ruby hacking to parse out the sensor data information from the Sensorbase webservice calls and found it to be relatively easy. Ruby is so nice ;) As I was hacking I had this feeling that I was placing the logic to parse the data in the wrong place. For my quick prototype, bunching all of the logic together was fine. But it turns out I need a higher level of abstraction in the form of a new service. Philip explains in more detail to Pavel and Dan Port in his email about creating a new service.

In order to separate the view from the mini-feed business logic, I need a new service that can allow clients to retrieve the abstracted mini-feed information.

This new service will allow multiple views, such as the Rails webapp I'm writing, RSS feed viewers (maybe RSS isn't a view, but I want people to have that information available via RSS), Twitter, and other apps, to request the mini-feed information via webservice calls.

Now what type of information can be requested?


I've only gotten through half of the RESTful Web Services book, so I only know how (relatively speaking, since I'm still a REST noob) to design a service allowing clients to GET information. I'm probably going to take a look at the telemetry service or the SensorBaseClient class in order to figure out how to get my service working.


With the new service comes a Mini-Feed REST API that I must design. I have to think of what type of information can be requested by clients. Here are some of my initial thoughts:

- Mini-feed information spanning all projects, kind of like what Twitter does on its public timeline.
- Relevant mini-feed information for a specific project.
- Relevant mini-feed information for a set of users.
- The start and end time of the information can be specified to reduce/expand the grain size.
- Webservice calls to filter the data. (Explanation below)

The ideas I just listed above aren't anything new. You can get all of that information using the sensorbase REST API. I'm thinking that abstracting the webservice calls to a higher level will help filter out the data that isn't useful for mini-feed purposes. Clients lose the flexibility of getting all the data, but will hopefully gain an easier and smaller interface to get the information they need.
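To make those request types a little more concrete, here is a sketch of how a client might build the URIs. Everything in it is hypothetical: the host, the /minifeed path, and the users, startTime, and endTime parameter names are placeholders I made up, not part of any existing Hackystat API.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URI for the not-yet-written Mini-Feed service.
BASE = "http://localhost:9999/minifeed"


def feed_uri(project=None, users=None, start=None, end=None):
    """Build a GET URI for the public feed or for one project's feed."""
    path = BASE if project is None else f"{BASE}/projects/{quote(project, safe='')}"
    params = {}
    if users:
        params["users"] = ",".join(users)   # restrict to a set of users
    if start:
        params["startTime"] = start         # narrow or widen the grain size
    if end:
        params["endTime"] = end
    return path + ("?" + urlencode(params) if params else "")


public_feed = feed_uri()
project_feed = feed_uri(project="hackystat")
user_feed = feed_uri(users=["austen", "aaron"])
```

The point is only that a handful of small URIs could cover the whole list, which is a much smaller surface than the full sensorbase API.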



So what is this filtering I'm talking about? Pavel and Aaron turned me onto Facebook's News Feed preferences, which allows you to customize the information that appears in your News Feed. The cool thing is that it doesn't completely remove all of the information that you don't like. It just shows it less frequently.

The goal would be to get this type of control in my Rails webapp and request the user's customized information using the "Mini-Feed service data filter" REST API.
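One simple way to get "show it less often" instead of "hide it completely" is to keep one out of every N events of a given type. This is just my own sketch of the idea, with made-up event types and preference names. I have no idea how Facebook actually does it.

```python
from collections import defaultdict


def throttle(events, keep_every):
    """Show 1 of every N events per type; types without a preference show all.

    keep_every maps an event type to N. Both the mapping and the event
    dictionaries are hypothetical shapes chosen for this sketch.
    """
    seen = defaultdict(int)
    shown = []
    for event in events:
        n = keep_every.get(event["type"], 1)
        if seen[event["type"]] % n == 0:
            shown.append(event)
        seen[event["type"]] += 1
    return shown


events = [{"type": "Build"}] * 6 + [{"type": "Commit"}] * 2
preferences = {"Build": 3}   # builds are noisy, so show every third one
visible = throttle(events, preferences)
```

Here the six builds shrink to two feed items while both commits still show up.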

How does that sound?




Hackystat Stream of Development

Tonight I started working on the Hackystat version of Atlassian's Stream of Development Consciousness, which will tell mini-stories about what is currently happening in your development shop. Much like Facebook's Mini Feed, we want to provide awareness to people about what you did, when you did it.

The first thing I did was figure out how to request Hackystat data. For that information I headed over to the Hackystat REST API Specification wiki page and took a look at what was available to me. After a bit of head scratching and help from some fellow Hackystat hackers, I finally found the URI that should provide me with some interesting information.

Now that I'm past that hurdle, I can start to think about getting my web application (written in Rails yay!) to parse and display the right information. There are still a few things to think about.

What type of information do we display?
I was talking to Aaron online tonight and he came up with the idea of just displaying the number of events that have taken place. For example, one mini-stream could say, '3 hours total dev time, 50 unit test invocations with 4 failing tests, and 200 builds in the last 2 hours'.
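A summary like that could be built by counting events per type. Here is a small sketch, assuming each event is a dictionary with a type plus a couple of type-specific fields (minutes, result). Those names are placeholders, not real Hackystat sensor data fields.

```python
def summarize(events, window_label):
    """Turn a list of hypothetical events into a one-line mini-stream entry."""
    dev_minutes = sum(e.get("minutes", 0) for e in events if e["type"] == "DevEvent")
    tests = [e for e in events if e["type"] == "UnitTest"]
    failures = sum(1 for e in tests if e.get("result") == "fail")
    builds = sum(1 for e in events if e["type"] == "Build")
    return (
        f"{dev_minutes / 60:g} hours total dev time, "
        f"{len(tests)} unit test invocations with {failures} failing tests, "
        f"and {builds} builds in {window_label}"
    )


sample_events = [
    {"type": "DevEvent", "minutes": 120},
    {"type": "DevEvent", "minutes": 60},
    {"type": "UnitTest", "result": "pass"},
    {"type": "UnitTest", "result": "fail"},
    {"type": "UnitTest", "result": "fail"},
    {"type": "UnitTest", "result": "pass"},
    {"type": "Build"},
    {"type": "Build"},
]
story = summarize(sample_events, "the last 2 hours")
```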

Of course that may be enough information for our little data stream, but it would be interesting to get more context behind the collected data. Perhaps we could start adding comments to the mini-feeds? What if we added some telemetry to the mini-feeds? It would be cool to see how the mini-feeds turn into a large information feed over time with a telemetry chart.

What type of information should be collected/filtered?
Currently commit data from Subversion is collected once a day. I think it would be important to see commit information in the mini-feed. Committing code is a good indicator of progress so it would be nice to get more up-to-date information with respect to commits.

Another metric that we collect once a day is Issue data from JIRA. It would be good if we could collect that data more often so that we could update the mini-feed with JIRA event information.

I think that Filemetric and Code Issue data wouldn't be as useful as UnitTest and Commit information. I don't really know about this one. Maybe someone would want to see what kinds of FindBugs errors you have ;)

Cool New Features
DrillDown information would be cool because if you find an interesting mini-event, you could click on the event link and see what types of unit test failures are happening or what classes were committed.

An RSS feed with the mini-feed information would fit nicely into my Google Desktop app.

Commenting on feeds would also be cool.

Mini-feed comparisons at the grain size of a day? Is your mini-feed activity the same every day of the week?

Infer information from the data in the mini-feed. For example, if Aaron and I commit changes to the same module or are running similar unit tests, can we say that we are working together? 'Aaron and Austen have been working on module mini-feed for 2 hours'. Totally awesome.
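A first cut at the "working together" inference could simply look for users who committed to the same module in the same time window. The sketch below assumes commit records with user and module fields, which are names I picked for illustration.

```python
from collections import defaultdict
from itertools import combinations


def working_together(commits):
    """Return (user_a, user_b, module) for each pair sharing a module."""
    users_by_module = defaultdict(set)
    for commit in commits:
        users_by_module[commit["module"]].add(commit["user"])
    pairs = []
    for module, users in sorted(users_by_module.items()):
        for a, b in combinations(sorted(users), 2):
            pairs.append((a, b, module))
    return pairs


# Hypothetical commits from one time window.
commits = [
    {"user": "Aaron", "module": "mini-feed"},
    {"user": "Austen", "module": "mini-feed"},
    {"user": "Austen", "module": "mini-feed"},
    {"user": "Philip", "module": "sensorbase"},
]
pairs = working_together(commits)
stories = [f"{a} and {b} have been working on module {m}" for a, b, m in pairs]
```

Touching the same module is a pretty weak signal on its own, so a real version would probably want to combine it with unit test or dev time data.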



So fun!

Lists are fun!

I'm a bit late, but I've been reading lists of what people want to accomplish this year and I think it's time to write one of my own. I personally don't believe in New Year's resolutions since I know I won't keep them, but I've done pretty well with blogging twice a week. I'm going to see if I can keep up the technical improvements this year.

  1. Improve my writing skills
    Since August I've tried to keep up this blog with technical topics and personal rants. Not all of them have been my finest work (actually a lot of them were horrible), but after making time for this blog, I think it's a great personal improvement project. You get to share ideas with the people that read your blog, work on your writing skills, and best of all, you can jot down your ideas and thoughts. I often have ideas that sound excellent in my head, but I've found that things aren't so clear once I start to talk about them. I notice that I come up with these great one-sentence ideas for blogs and have trouble writing 3 paragraphs. Writing them down will help my thinking process.
  2. Ruby/Apache Wicket and Hackystat
    This past year I spent time writing a Ping Pong ranking application with help from Aaron. Learning a different language was great and I now can talk about design patterns and technical topics by comparing other languages to Ruby.

    Back when Hackystat 8 was just starting to be rewritten, there was talk of a Ruby interface. We have decided to explore other options using Ruby or Apache Wicket. The current interface was written in GWT by David and Pavel, but I think there was some difficulty developing with GWT. Feel free to talk to them for all the nitty gritty details. Now that we have some new Hackystat developers coming into the lab, we are going to explore a new interface. I want to help out with development and become part of the Ruby community ;D
  3. Practice Puzzles
    One of the themes of this year’s learning will be buffing up not only my technical abilities, but also my thinking power. I will be the first to admit that I am not the swiftest person out there. I have trouble understanding what people are saying and piecing things together. I have a hard time visualizing concepts in my head. That is the main reason I love dry erase markers and white boards.

    I am thinking I need to start giving my brain more practice. I’ve decided that I’m going to massage my brain by working on technical puzzle books. I just bought Puzzles for Programmers and Pros from Amazon and plan to work through it once I finish reading RESTful Web Services. Not only will this help me work on my brain power, it will be great practice for intern interviews and my own interviews. We also have a large Rubik’s cube contingent at work. I plan on buying a cube to practice my visualizing skills at home.

  4. Play with Python
    I don’t really have much to say here. Python is a dynamic language that I’ve had some interest in for a while. I plan on working with a lot of Ruby this year, so I probably won’t have much time for Python. Getting introduced to it this year is definitely doable. At the very least it will satisfy my dynamic language curiosities.

  5. Buff up my outward and critical thinking
    As I mentioned earlier, thinking is not one of my strong points. Aaron often talks about how he looks for interns that have outward thinking. I am not 100% sure what he means by outward thinking, but I assume he means that he wants people to think of problems that need to be solved. People should not only solve their own problems, but think outside of their domain and take a look at other problems to solve. He wants people to think critically about things that might spark some interest from others.

    I totally have a deficiency in this area. I am an inward thinker, meaning I usually think about implementation-level details rather than the big picture. I think critically about code, process, and software development, but I don’t look at why I’m so concerned about it all. I really need to take a look at the problem first before diving into the solution.

    I’m not sure how I’m going to tackle this problem this year. Hopefully people can help me along the way. At the very least, I am aware of the problem, so that is a good start.

  6. Give a speech to a group about something in technology
    I was scheduled to answer questions from Sandy Ho at the HTDC Science and Technology Fair about what Referentia is looking for in, and can give to, its potential employees. Sadly the Q/A session was canceled and we ended up emailing the answers to her. This year I want to give a speech to a group about what companies are looking for in their budding software engineers. It would be nice to give a speech to students at UH and talk about the importance of reading, hacking, internships, and all the good stuff about software development. Right now I feel that UH has too much emphasis on theory and not enough on software development (aside from Philip’s class). It is up to Hawaii's high-tech companies to take some initiative and provide internships that give students the experience necessary for them to succeed. Hopefully I can give a convincing speech to some people and get them motivated.


Phew. 50 weeks to go.

Edit: Add one more to the list:

aaron: okay.
aaron: back to goal.
aaron: want you to find a hackystat research question and hypothesis.
aaron: thats your goal.

Sharing is addicting

Today as I was reading some blogs on my mobile and sharing stuff, I felt that twinge. That twinge that subconsciously tells me, "Hey why are you sharing that? Everyone knows that. People are going to see that you shared that and be like WTF."

That feeling that overcame me was totally awesome. It's that feeling that makes people actually care about the quality of their software. People don't want to let their team down. Similarly, I don't want to let you all down by sharing crappy articles. After all, I know that you are all like me and are too busy coding, reading, thinking, blogging, and all that other stuff (eating, sleeping, face-to-face interaction) to waste time reading boring articles.

Thinking about it more, sharing is kind of like console gaming. You start up your Nintendo Wii, play some Wii Tennis, and dominate. You are a pro. Your rank is as high as it possibly can be. You destroy the computer. Then you go to your company summer party and get pwned by your coworker. (True story)

Sharing items lets you get out of the sandbox. The range of topics you learn about grows out of control. You might just find something that interests you. Sharing has totally infected me. Every time I log in to Google Reader I try to select interesting articles that might spark some interest. Sharing items is like an implicit way of blogging. People look at the items I share on Google Reader and wonder if any of the links are worth clicking. Now I have to make sure that I read blogs often and constantly share new things, or else people might stop reading my shared items. What will they think of me? Sharing is important, but so is the stuff that is being shared.

GeoTIFF Hell

This weekend I spent a lot of time manipulating GeoTIFF images for work. GeoTIFF images have embedded meta-tags that contain coordinate, scale, and other miscellaneous information about the wrapping image. My only problem with GeoTIFFs is that the tag information is hard to access. The metadata isn't human readable so I had to use another application to read it. Man that sucks.

I had a problem where I had different pixel coordinate spaces for the GeoTIFF and the GeoTIFF's canvas. Figuring out a common way to relate the two pixel spaces was a pain because I didn't know what information was available to me. It would have been so much easier if I had been able to take the graphic, look at its metadata in a text editor (Textpad yah!), and see what I could do. What I had to do instead was load the image using my image manipulation API and see what data was available via the debugger in Eclipse. Oh, I also had to read the GeoTIFF specification (yuck). What a pain. Not only did I not know what the metadata meant, I had to do this every time I wanted to see what information was available. Lots of overhead compared to reading the information in Textpad whenever I wanted. Why not just have key-value pairs of human-readable information?

Scale= 10 meters per pixel
Lat=10.10
Lon=-86
etc etc.
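Part of the appeal is that a format like that takes only a few lines to read. Here is a sketch of a parser for the made-up key=value layout above. To be clear, this is not how GeoTIFF actually stores its tags.

```python
def parse_metadata(text):
    """Parse human-readable key=value lines into a dictionary of strings."""
    metadata = {}
    for line in text.splitlines():
        if "=" not in line:
            continue   # skip blank lines and anything that is not a pair
        key, value = line.split("=", 1)
        metadata[key.strip()] = value.strip()
    return metadata


sample = """Scale= 10 meters per pixel
Lat=10.10
Lon=-86
"""
metadata = parse_metadata(sample)
latitude = float(metadata["Lat"])
longitude = float(metadata["Lon"])
```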

Aaron brought up the idea at work about images with context. He seems to enjoy looking at pictures instead of reading (blasphemy!). If there was a reliable way to search for images based on what was happening in the image rather than whatever Google does with their image search, we would be better off. This idea could be spread to all types of images. If all images had a standard text format that could be read, parsed, and manipulated, we might be able to create some interesting applications. Flickr currently allows context to be associated with images by users manually adding tags to their images. Why not embed the information in the image just like GeoTIFFs? Hopefully the metadata would be of the human-readable variety.