Author Archive
Fitting AOP into the paradigm jigsaw
As one of the core contributors to a compiler for a new language, I often find myself needing to explain new concepts and ideas to other developers. Over time, I’ve gradually found an approach that tends to work. First, where does the concept fit on the language feature or paradigm map? Second, what is the problem that it is aimed at, and how are other, perhaps more familiar, solutions weak? And then, with that critical foundation laid, what does the new concept actually do to help?
Recently, I was asked to speak about Aspect Oriented Programming at the .NET Community Day. I’ve known about AOP for years and discussed it with developers before, but this was a nice opportunity for me to spend a while thinking about how best to go about explaining it. So I set about coming up with the AOP answers for my three questions.
There are a bunch of titles that we tend to put before the word “programming”. “Object Oriented Programming”, “Functional Programming”, “Declarative Programming” and “Generic Programming” are just some examples. They’re all paradigms, but the amount they try to answer differs. The first three I listed (OOP, FP and DP) will happily jostle for pride of place in a language, seeking to be its “core paradigm”, and posing a challenge to language designers who see value in all of them. Of course, some languages do decide to just follow a single paradigm: regexes are happily declarative, and Java has admitted little into the language that isn’t squarely inside the OO box. However, most of the popular and expressive general purpose languages out there today embrace several of these “core paradigms”, recognizing that solving every programming task with a single paradigm is like doing every garden task with a lawnmower.
At the same time, there are paradigms that simply seek to deal with a very particular set of issues.
Generic programming is one of them: it deals with the situations where we want to use the same piece of code with different types, in a strongly typed way. This idea actually cross-cuts the “core paradigms”; generics are simply a case of parametric polymorphism, which one finds in functional languages. This is the kind of place that AOP sits. It’s not pitching itself as a successor to, or even a competitor of, any of our familiar core paradigms. Instead, it takes a problem and proposes an approach for solving it.
So what’s the problem? In any complex code base, there are cross-cutting concerns: things that we need to do in many places that are incidental to the core logic. For example, logging, exception handling and reporting are generally things that a piece of code does in addition to the task that it really exists to perform. Of course, we factor as much of the logging and reporting code out as we can, but generally we find ourselves doomed to repeat similar-looking bits of cross-cutting logic in many places. Even if it’s just a call to the logger, or a catch block, it ends up duplicated.
Duplicated code is generally a bad sign. For one, it inhibits future refactoring; if we decide to switch logging library, this is now a change in many places. Since we have mentioned the logging library in many places, we have many pieces of our code that are closely coupled to it, something that goes against our general preference for code that is loosely coupled. We also run into challenges when we want to install new cross-cutting concerns into our software: there are so many places to change!
Aspect Oriented Programming is aimed at these kinds of challenges. It introduces the concept of join points (places in our code where we could add some functionality) and advice (the functionality that we want to incorporate). It provides a mechanism to write the cross-cutting functionality in one place and have it “inserted” into the join points.
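As a rough illustration of the join point and advice idea, here is a minimal sketch that uses a Python decorator as a stand-in for a real AOP framework. The names `logged`, `transfer` and the in-memory `LOG` list are hypothetical and exist only for this example. The logging advice is written once, and each call to a decorated function acts as a join point.

```python
import functools

LOG = []  # hypothetical in-memory log sink, standing in for a real logging library


def logged(func):
    """Advice: record entry, exit and failures around any decorated function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        LOG.append(f"enter {func.__name__}")
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            LOG.append(f"error {func.__name__}: {exc}")
            raise  # the advice observes the failure but does not swallow it
        LOG.append(f"exit {func.__name__}")
        return result
    return wrapper


@logged
def transfer(balance, amount):
    """Core logic only; it contains no logging code of its own."""
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount
```

Switching logging library now means changing `logged` alone. Unlike a full AOP framework, this sketch still requires marking each function explicitly; frameworks typically let you select the join points declaratively instead.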
A common reaction is, “wow, magical”, but if we cast our eyes back over programming history, it’s really not so surprising. Once upon a time there were just assemblers. Repeating regular sequences of instructions was tiring, so macro systems were created. These were “magically” replaced with a sequence of instructions, with some parametrization thrown in to make them more powerful. A little further on, named and callable subroutines allowed for real re-use of code. We raised the abstraction bar: “you don’t have to worry about what this bit of code does, just call it like this”. Then came OO. Here things got really magical: invoking one of these method things leaves you with no idea what bit of code is going to be called! It could come from that familiar bit of code you wrote yesterday, or it could come from a subclass that’s going to be written in 10 years’ time.
Over the years, we’ve evolved ways to manage complexity in software, many of them hanging off increasingly non-linear or loosely coupled flow of control. We’ve coped with the increasing complexity in our paradigms and programming style, because they more than pay for themselves by decreasing complexity in the software we produce using them. AOP’s concept of join points and advice is giving us a little more conceptual overhead in exchange for a more maintainable and less coupled implementation of our cross-cutting concerns.
Looking to the future, I think it’s important to remember that where we are with any paradigm today is almost certainly some way from the ideal. We’ve learned a lot about object oriented programming over the decades that it has existed, and continue to do so. Multiple inheritance has generally been marked down as a bad idea, and the emergence of traits in Smalltalk (also known as roles in Perl 6) has made some, myself included, question whether we’d be better off dropping inheritance altogether in favor of flattening composition.
AOP has only had 15 years of lessons learned so far, and I’ve no doubt we’ve got a long road to walk from here. However, that being the case shouldn’t stop us embracing AOP today. After all, the ongoing journey of the object oriented paradigm isn’t something that stops us from using it. The critical question is, “does this help me deliver higher quality, more maintainable software?” I think there are a bunch of use cases where today’s implementations of AOP help us to do so.
About The Author
Jonathan Worthington is an architect and programming mentor at Edument. He works to help development teams improve their software architecture and work more efficiently, following best practices. He is also one of the core developers of the Rakudo Perl 6 compiler, where his work focuses on the implementation of object oriented language features and the type system. In his free time, he enjoys hiking in the mountains, good beer and curry.
.NET Community Day
.NET Community Day, held in Gothenburg on 10 February and in Stockholm on 11 February, was a great success!
With world-class speakers, we managed to fill the venues with participants hungry for knowledge. We got to learn from speakers such as Mark Seemann, Jonathan Worthington, Hadi Hariri and Dag König about both the old and the new in the .NET world.
On top of that, new courses and training programmes were launched, which our customers could book right on the spot. Examples of these courses are:
Web Security for Developers
Software Architecture
Using ASP.NET MVC in the real world: From start to finish
We thank everyone who took part this time and are already looking forward to the next one!
(To see parts 2, 3, 4 and so on, head over to YouTube, where the rest can be found!)
Product Predictions
Using Product Predictions to increase sales
One of the areas I’ve been working on since joining Edument is product prediction engines. In product prediction, we take a set of historical data that indicates what products a customer was interested in, and use it to suggest products that current customers may be interested in. That may be based on the customer’s own purchase history. Often we also want to use the data to compute a set of similar products to the one that the customer is currently viewing – something that works with new customers who we have no historical sales data about.
Figure: How a predictor engine typically integrates with an existing database-driven website
For example, if a customer is looking at a travel guide to Russia, from the historical sales data we may see that other customers who purchased this travel guide also tended to purchase a Russian phrasebook or a guide book about neighbouring Ukraine. “Gee,” says the customer. “I guess a phrase book would come in handy too, so I know my borsch from my babushki.” And so you get an extra sale.
Data Models
The key thing you need to get started with product prediction is a data set. Generally, the larger the better, although extremely large data sets may lead to slow predictions. This is something you will have to assess in your particular use case. To give you an idea of what’s manageable, though, I’ve been comfortably churning out predictions in under 10 milliseconds for data sets containing millions of past purchases. Granted it took me a little effort to achieve this kind of performance, but it’s within reach.
There are two types of data models: boolean models and ratings models. In a boolean model, either a customer has a connection with a product or they don’t. For example:
- They either purchased the product (and have a connection with it) or they didn’t purchase the product (and thus don’t)
- They either wrote a review of the product or they didn’t (note that it could be argued that even if the user wrote a crappy review of a product, they still had an interest in it, so we just treat the presence of a review as a connection)
- They either viewed the item or they didn’t (if you don’t have loads of historical sales data, you could data-mine web logs and try and get some connection data based upon which products a given site visitor viewed, on the basis that visitors to a site will generally tend to view related products; here, site visitors serve the place of customers in the data set).
An alternative is to have a ratings based data model. For example, we may have a large number of product ratings given by customers, on a scale of 1 to 5, where 1 is suckful and 5 is awesome. With a ratings based model, a customer giving a product a rating of 5 implies a stronger connection than a rating of 1. The prediction process then considers these values. Note that ratings are not the only possible source of data for using such a model; you could also consider:
- How many purchases a customer made of a product, if you have a range of products that often lead to repeat orders. For example, if you’re selling beer online, then a customer may have purchased Hobgoblin 10 times, but (understandably) only purchased Carling once.
- If mining web logs, you could consider number of visits to a product page as being your kind of “rating”
Your choice of boolean or rating model will directly influence the prediction algorithm that you use, since some algorithms only work with one model or the other. Which you opt for largely depends on what type of data you have, though as noted even if you have ratings data you may just like to abstract it down to “did the customer rate the product or not”. A boolean model certainly has the benefit of simplicity, and I’ve had very good results using such a model. If you have resources, then trying multiple algorithms and data models and using A-B testing would, of course, give you a far more scientific answer.
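To make the two data models concrete, here is a minimal sketch. The customer and product names are made up for illustration. It represents a ratings model as nested dictionaries and abstracts it down to a boolean model of “did the customer rate the product or not”.

```python
# Hypothetical ratings data: customer -> {product: rating on a 1 to 5 scale}
ratings = {
    "alice": {"russia_guide": 5, "phrasebook": 4},
    "bob": {"russia_guide": 2},
}


def to_boolean_model(ratings_model):
    """Collapse ratings to connections: keep which products were rated, drop the values."""
    return {customer: set(products) for customer, products in ratings_model.items()}


boolean_model = to_boolean_model(ratings)
```

In the boolean model, Bob’s rating of 2 and Alice’s rating of 5 for the same product count equally as a connection, which is exactly the simplification described above.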
User-based and Item-based Predictors
Once you have a data model, there are two general types of predictors that you can use it with: user-based and item-based.
A user-based predictor works by building a “network” of users with similar interests, and then making recommendations based on that. That is, given our current user, we can work out the other users nearest to them in the network and give recommendations based upon what those other users purchased. This depends on having some historical data for the current user. We generally need to pick two things when working with a user-based predictor: a user similarity algorithm that computes the similarity of two users, and a user neighbourhood algorithm that builds the network of most related users. A side-effect of this is that we also gain a way to compute “users with similar interests”, which for some cases may be useful.
An item-based predictor works based upon item similarity. To work, it also requires you to select an item similarity algorithm, which determines how similar two items are based on the overlap in customers purchasing them (depending on the algorithm, it can actually reason transitively rather than requiring immediate overlap). To produce recommendations for a user, we go from the items they purchased and find the most similar items to all of those. However, the big win from this approach is that we can compute most similar products. This is extremely useful because it gives us a way to make predictions for a product that is currently being viewed.
One additional benefit of an item-based predictor is that it allows for more pre-computation. The reason I could compute most similar products in 10 milliseconds was actually because I could, off-line in a batch job, pre-compute tens of thousands of set intersections that would be used in producing the results rather than computing them on the fly. Thus if you have time constraints or a large data set, item-based predictors are going to scale better.
Improving Predictions
In some cases, you may have extra information that you wish to use to influence product prediction. For example:
- You may want to filter out products that a customer has already added to their basket from the prediction results; they already decided to buy it, so it’s best to show them new things they may also wish to buy.
- If you have some products that are not “family friendly” then you may wish to filter them out. Just because many parents of kids who like Thomas The Tank Engine also happen to have purchased horror movies doesn’t mean they are suitable recommendations.
- You may want to suggest products in a given price range, which will take the customer over some threshold where they get a volume discount, or can use a voucher, or get free shipping.
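As a minimal sketch of an item-based “most similar products” lookup with a filter applied, the code below uses a boolean purchase model and the Jaccard index as the item similarity algorithm. The choice of Jaccard is an assumption made for illustration, as is every name and the sample data; the text does not say which algorithm was actually used.

```python
from collections import defaultdict

# Hypothetical boolean purchase data: customer -> set of purchased products
purchases = {
    "c1": {"russia_guide", "phrasebook"},
    "c2": {"russia_guide", "phrasebook", "ukraine_guide"},
    "c3": {"russia_guide", "ukraine_guide"},
    "c4": {"cookbook"},
}


def invert(purchases):
    """Build product -> set of customers who bought it."""
    buyers = defaultdict(set)
    for customer, products in purchases.items():
        for product in products:
            buyers[product].add(customer)
    return dict(buyers)


def jaccard(a, b):
    """Overlap of two customer sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(product, buyers, exclude=frozenset(), top_n=3):
    """Rank other products by similarity, skipping excluded ones (e.g. basket contents)."""
    target = buyers.get(product, set())
    scores = []
    for other, other_buyers in buyers.items():
        if other == product or other in exclude:
            continue
        score = jaccard(target, other_buyers)
        if score > 0:
            scores.append((other, score))
    # Highest score first; ties broken by name so the order is stable
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:top_n]


buyers = invert(purchases)
```

Calling `most_similar("russia_guide", buyers, exclude={"phrasebook"})` leaves out the phrasebook, as one would for a product already in the basket. In a larger system the pairwise intersections inside `jaccard` are the kind of work that could be pre-computed in an off-line batch job.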
An additional tweak can involve re-scoring the predicted items. Prediction engines not only find products to recommend, but they rank them by giving each one a score. When we re-score, we twiddle this score a bit to bias certain products to being more likely to appear at the top of the predictions list. Since a site will only have space to show a selection of the predictions, this can make the difference as to whether a product appears as predicted or not. Use cases for re-scoring include:
- Finding a way to factor in some ratings data when we have a boolean model. For example, if you primarily want to go with a boolean data model based on sales history, you may also want to use customers’ ratings of a product somehow. You could bias items with high ratings to be more likely to appear in the results than items with lower ratings. This helps you make the most of both data sets.
- Biasing products that are more profitable for you to sell towards being ranked as better predictions.
- Biasing products that are on special offer to being ranked higher (perhaps you have a lot of a certain item in stock and want to try to increase sales of it)
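Re-scoring can be sketched as a small post-processing step over the ranked predictions. The formula below (scaling each score by how far the product’s average rating sits from a neutral 3) and the `weight` parameter are assumptions chosen for illustration, not a recommended tuning.

```python
def rescore(predictions, avg_rating, weight=0.1):
    """Bias prediction scores by average rating, then re-rank.

    predictions: list of (product, score) pairs from the predictor.
    avg_rating: product -> mean customer rating on a 1 to 5 scale.
    Products without ratings are treated as neutral (3.0) and keep their score.
    """
    adjusted = [
        (product, score * (1 + weight * (avg_rating.get(product, 3.0) - 3.0)))
        for product, score in predictions
    ]
    return sorted(adjusted, key=lambda pair: -pair[1])
```

The same shape works for the other use cases: replace `avg_rating` with a profit margin or a special-offer flag and adjust the multiplier accordingly.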
Tracking and A-B Testing
Just introducing a product prediction system is not the end of the story – or at least, should not be. It’s also important to track how many purchases are made as a result of the predictions. This just means tracking click-throughs to product pages made from the product prediction results, and setting a flag when such products are later purchased. This way, you can get a sense of the return on investment that your product prediction engine is delivering.
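A minimal sketch of this kind of tracking follows. The class and method names are hypothetical, and the `variant` label is an assumption that lets the same counters distinguish between predictor configurations; with a single configuration, one fixed label is enough.

```python
from collections import Counter


class PredictionTracker:
    """Counts served predictions, click-throughs and resulting purchases per variant."""

    def __init__(self):
        self.served = Counter()
        self.clicks = Counter()
        self.purchases = Counter()
        self._clicked = {}  # (visitor, product) -> variant that produced the click

    def record_served(self, variant):
        self.served[variant] += 1

    def record_click(self, visitor, product, variant):
        self.clicks[variant] += 1
        self._clicked[(visitor, product)] = variant

    def record_purchase(self, visitor, product):
        """Flag the purchase only if it followed a click on a prediction."""
        variant = self._clicked.pop((visitor, product), None)
        if variant is not None:
            self.purchases[variant] += 1
        return variant

    def conversion_rate(self, variant):
        """Purchases attributed to predictions, per prediction served."""
        served = self.served[variant]
        return self.purchases[variant] / served if served else 0.0
```

A real site would persist these events in its database rather than in memory, but the counts needed to estimate return on investment are the same.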
Taking things a step further, you may want to make your prediction engine configurable, or even run two parallel versions of it. The first time a visitor hits the site, you select which version of the prediction engine their visit will fall under. You then store not just that the purchase was the result of a prediction, but which version or configuration of the predictor led to the purchase. By combining that with data on how many predictions were served by each configuration or version, you can get real world data on how well different approaches are working for you, and then pick the configuration that works best.
Isn’t this expensive?
Getting a basic product prediction solution in place need not be expensive. Naturally, much time can be spent optimizing predictions and trying out different configurations. However, a basic solution can deliver significant ROI. Then, once you’re comfortable that the predictor is delivering value, you can invest in further optimizing it to increase your sales.
Going further
If you’re interested in product prediction solutions, feel free to contact Edument. We’ll be happy to discuss how you can put the data you already have available to good use in making product predictions. We are also experts at integrating with existing systems and data, with specialists in a range of technologies.
Welcome to Edument, Carl Mäsak!
We would like to start by welcoming our new colleague Carl Mäsak to Edument, and we hope he will enjoy his time with us. We thought we would introduce Carl so that both our customers and our colleagues get the chance to know him!
What have you done previously?
I studied for a Master of Science in Engineering in Uppsala. During my studies I worked on the side with programming, mostly for the university. I have broad experience as an open source developer, particularly as a Perl developer. I am one of the more active people in the development of Perl 6. At work it has mostly been Java and Eclipse, but I am also naturally curious about new languages and paradigms.
You have worked with bioinformatics software before?
Yes, my engineering degree programme is called “Bioinformatics”, and in recent years I have been part of a research group developing Bioclipse, an integrated software platform for chemo- and bioinformatics.
What in software development are you most passionate about?
I think the development process itself is the cool part: exploring a problem, and writing something that then runs and solves the problem.
J. S. Bach is said to have remarked that playing the organ is not so hard: you just have to press the right keys in the right order. Programming feels “easy-hard” in exactly that way; it is easy once you know how. And it is exciting to learn new techniques in order to understand it better and get further.
Why did you choose to work for Edument in particular?
I like the values that Edument is built on. Consultancy can take so many different forms; at Edument it is about giving the customer something good in the long term, not just sticking-plaster solutions. There is something appealing about being with the customer all the way, both in on-site development work and in further training and mentorship.
At Edument I will be working on mobile development (Android, MeeGo) as well as teaching and consulting in various languages: Java, C++, Perl…
Do you have any particular highlights in your career?
My “career” so far is mainly to be found in the Perl world. I have helped develop the specification and implementations of Perl 6, I have written a number of programs and modules, my conference talks are well liked, and my blog has thousands of visitors a month. Over the past month I have organized a programming contest on the Internet.
What are your hobbies?
I like dancing, swimming, running, singing in a choir, cooking, baking, playing the piano/organ, composing, writing, and drawing. Though not all at the same time.
Trivia: What was your first computer?
An Amstrad with two 5 1/4 inch floppy drives and no hard disk. MS-DOS 3 something. I made my first “platform game” in Lotus 123, the Excel of its day.
Read more: Carl’s consultant profile at Edument.
Community Connection II – .NET best practices (26 Nov)
The event “Community Connection II – .NET best practices”, held on 26 November, was a success. The schedule changed slightly during the day; nothing drastic, though some of the speakers swapped slots. We got to hear speakers such as Hadi Hariri, Dag König, Jonathan Worthington, Mark Seemann and finally Peter Larsson.
With over 150 participants and positive feedback, we are incredibly pleased and already looking forward to next time, when we hope to see all the participants again, and perhaps some new faces? We hope so!
Perl 6 talk in Oslo
Jonathan Worthington was recently invited to Oslo by the Norwegian Unix User Group (NUUG) to speak about the Perl 6 project.
In the talk he focused on how the language is designed to scale from small one-line throwaway programs, through small tools, and on up to building larger applications. In particular, we looked at some examples of specific language features that are useful in each of these areas. It provides a good overview of some of the most interesting new content in Perl 6.
Fortunately, NUUG made an excellent video recording of the talk, which can be viewed at:
http://www.nuug.no/aktiviteter/20100914-little-tools-large-apps/
Thank you, NUUG, for the invitation, to everyone who came to the talk, and for the very pleasant dinner and drinks afterwards. Oslo is a lovely city, and it was “greit” to hang out with some of its friendly hackers too.
Community Connection 2010 – SQL & .NET
Informator, PASS Scania and Edument invite you to a free full-day event on SQL and .NET in Malmö on 10 September. Don’t miss Jonathan Worthington, Peter Larsson, Thomas Ivarsson and others.
Agenda
07.30-08.30 Breakfast
08.30-08.40 Introduction by the organizers: Informator, PASS, Edument
08.40-09.25 Keynote: Microsoft introduces SQL Server 2008 R2.
09.40-10.25 The most common mistakes developers make in SQL Server
Peter Larsson (MVP), PASS Scania
What are the most common mistakes developers make in SQL Server at the start? Why isn’t it as fast as it used to be? The questions are many, and fortunately there are a few simple answers! We will go through the most common mistakes and how to fix the problems to get better performance and more reliable results.
- Overview
- Calculations on indexed columns
- Time and date calculations
- Incorrect data types
- RBAR (Row-By-Agonizing-Row)
10.40-11.25 .Net Data Access: Too many ways to do it?
Jonathan Worthington
Today’s .Net developers have more choice than ever when it comes to data access. To simply select some data from a database, one could use the SqlConnection, SqlCommand and SqlDataReader objects (with an SQL query or through a stored procedure), LINQ, Entity Framework (POCO or not) – and that’s before we even consider non-Microsoft offerings, such as NHibernate, or the possibility of generating data access layers from templates and database schemas.
No one of these approaches is right for every application; indeed, applications may legitimately make use of more than one approach. In this talk we’ll look through the strengths and pitfalls of a range of the options, their performance both in terms of development time and execution time, and see what tasks they lend themselves to.
11.40-12.40 Lunch & mingling
12.40-13.25 SQL CLR
Matt Whitfield (MVP), Atlantis Interactive
CLR integration within SQL Server allows developers and administrators alike to harness the power of the .NET Framework from directly within their databases. This seminar will cover the basics of implementing all types of CLR objects, including types, aggregates, procedures, triggers and functions, as well as showing some examples of how CLR objects have been put to good use.
SQL CLR Overview
- Overview of CLR in SQL Server
- Explore all types of CLR object
- The differences in CLR permission sets
- Worked examples of problem-solving with the CLR
13.40-14.25 Execution Plan Basics
Peter Larsson (MVP), PASS Scania
As a developer, you may find the execution plan in SQL Server deadly dull, and tedious and… and absolutely wonderful! In this session we show how you can interpret the results in what looks like a strange and complicated map, and how you can draw conclusions from them. We also show how to compare two queries and see which one is more efficient.
14.40-15.25 SQL Azure
15.40-16.25 PowerPivot
Thomas Ivarsson (MVP), PASS Scania
PowerPivot is a new client add-in for Excel 2010 and a SharePoint 2010 Server application that can query extremely large datasets, with millions of rows, and respond within a second.
PowerPivot is based on the older Analysis Services architecture but with a new column-based, in-memory engine called VertiPaq that surpasses all previous limits for Excel clients. In this seminar you will get a general introduction to this new application.
- PowerPivot for beginners
- What is PowerPivot and why is it important
- Data sources supported in PowerPivot
- Import data into PowerPivot
- What can you do with the new DAX-expressions language for PowerPivot calculations?























