Wednesday, January 27, 2010

The Most Important Question (2)

I respond to some requests for elaboration from the same person (who prefers to remain anonymous). His questions are in italics:

1. Is the issue that LMS's are driving instruction rather than professors? I have some interaction with an LMS at Queen's but in general it's been irrelevant to what I'm doing. As more of a constructivist handling an adult audience I'm concerned about things that matter to them.

Professors would probably say that it is. LMSs restrict what a professor can do, particularly if he or she is not tech-savvy. They then have to follow the default model, which is much more constraining than the open classroom.

But the LMS is only an early, and arguably out-of-date, technology (see the Google results from the 'VLE is dead' debate in Britain for a good example of this discourse). From the perspective of the student, it doesn't make a whole lot of difference if the professor tells them what to do or whether the LMS does.

In general, new technologies have had a liberating effect. But from the perspective of the managers, from whom people are being liberated, it has a constraining effect.

From a constructivist perspective, you can find LMSs (such as Moodle) designed from a strictly constructivist perspective. You may argue that it has been more or less successful, but it's clear the intent is there.

From my own perspective, I don't see constructivist methodology to be a whole lot more liberating than traditional instruction. Students still receive a great deal of direction from the instructor. They are not free to pursue an alternative learning methodology. This is especially the case when the students are younger, but still applies in adult learning.

2. When you refer to management, do you mean the content of learning or the mode, pace and timing of its delivery? I assumed the latter but thought it was wiser to check.

I mean all aspects of learning, from the definition and selection of subject, to the application in a practical or learning context, and to the mode, pace and time of delivery.

In this regard, it may be useful to create a taxonomy defining the extent of what constitutes 'management' in this context. Because historically a great deal of management has remained in force (for example, the selection of subject matter) simply by redefining it out of the scope of 'learning management'. But in general, if a decision is made, then in my view it constitutes an aspect of 'management'.

3. I am sympathetic to Freire and also to Gramsci (cultural hegemony) and agree that corporate interests will attempt to penetrate universities even further. Is there any way that ed tech can return control to professors and/or to learners in groups or at least those virtually connected?

I don't really see the benefit of returning control to professors, except insofar as this extends to their own research and learning. This though amounts to a negotiation of working conditions, and is consequently a wider issue.

With respect to control over student learning, my own inclination is to allow professors only a minimum of control, ideally none (though it is again part of the research question regarding how much, and under what conditions, learning can occur absent professorial control).

My own feeling is that it may be more useful to discuss the role of the professor in a learning environment, rather than to talk of returning control. If the discussion is framed as one in which professors and corporate interests vie for control over student learning, then we have by stipulation defined out of the frame the idea that the student might control his or her own learning.

What sort of role might be contemplated? I have in the past suggested that this role amounts to modeling and demonstrating the values, abilities and behaviours desired in the student; the idea is that the student (voluntarily) looks to the professor (or, indeed, any professional, or any other person of higher esteem) as an example worth emulating and following; the student learns by attempting to emulate, in an active and reflective manner, the exemplary model.

Another frame you have set: a choice between giving control to professors, or to learners in groups. This is perhaps a preference for social constructivism showing through. Many people argue that learning is inherently social. I think that if you are attempting to learn social behaviours, then you will want most to practice in a social environment, in order to support authentic learning. And many disciplines, even those we see as overtly scientific, such as chemistry, can be defined not simply as a body of content to be retained but rather as a set of values, skills, beliefs and ways of seeing in the world learned and demonstrated as learned effectively by participation in a community.

But there is nothing inherent in learning itself that says that this should be so, and so there are many types of learning, and aspects of learning, perhaps best undertaken by a person working alone. The models we learn from need not be human. There is, for example, a long and viable history of learning from, and studying, and emulating, nature. Much of my own learning takes place in this way. Other forms of learning even in social contexts may be supported not by interaction, but simply by observation.

Monday, January 25, 2010

The Most Important Question

I was asked, what are the most important questions that need to be resolved during, say, the next five or so years?

There's only one: under what conditions can a learner manage his or her own learning?

I know this may seem like I'm being glib (and it has been that kind of day) but I am being very serious.

First of all, we are rapidly entering a period in which there will be a significant shrinkage in the public resources allocated to education. This is already the case in the developing world, and one of the reasons (not the only reason) it remains the developing world. Meanwhile, while most online learning to date has been directed toward emulating in-class instruction, there is an increasingly pervasive trend in which subjects are defined as competences and learning material is automatically delivered to learners based on these competences. Could such a system be sustainable, and to what degree could it replace more traditional forms of learning? So there is a genuine human need here, to know whether we can sustain this shrinkage in funding, and if so, how.

Secondly, there is globally an ever-increasing onslaught of rich media and other content, including even inside schools, which is intended not to educate or to inform, but to sway or to sell. Against this, especially in web 2.0 circles, there is a school of thought modeled loosely along the lines of Freire's 'Pedagogy of the Oppressed' which suggests that people, working for their own benefit and creating their own association, can take charge of their own learning, and hence, their own understanding of the world. Is it possible for people to effectively mount a counter to propaganda or corporate-based 'content learning' on their own, or is some manner of public intervention required, and on what scale?

-- Stephen

Learning Styles and Work Styles

Transcript of a Twitter exchange this afternoon on learning styles.

Downes: hmm: if there's no learning styles then there would be no working styles, just one best way to accomplish every task (which your boss knows)

nlowell: @Downes ok that has to be covered by one of your fallacies somewhere

opencontent: @Downes - Learning styles may exist - they just don't make any meaningful contribution to learning. E.g.,

Downes @nlowell it's reasoning by analogy, so if there's a fallacy it'd be 'false analogy', which requires a relevant difference between the cases

@Downes I'm having trouble w/relationship of "no learning/working styles" and "only one best way"

nlowell: @Downes Learning Style holds that I should ALWAYS learn best in one style. Content/context are irrelevant. Seems absurd on its face.

@opencontent That doesn't save instruction either - "we cannot store up generalizations and constructs for ultimate assembly into a network"

@opencontent @Downes Learning styles determine the choice of the PLE

Downes: @nlowell I think we can all reject extreme single-point interpretations of learning style theory (and still agree people learn differently)

Learning styles have historically been used to have an instructor 'select' an appropriate learning activity (and just as often get it wrong)

Downes: @DolorsCapdet But as Dolors says, learning styles are the ground and license to allow a learner to select his/her own learning activity

Downes: for if there are no learning styles, then there's only one learning style for each content, no matter who you are, and you should use that

nlowell: @Downes No problem with rejecting single point interpretations, but doesn't that disallow identifying learners as VARK?

Downes: Doing a pretest may or may not help you understand your own style (probably not, with the usual unreliability of tests) but isn't predictive

nlowell: @Downes perhaps it's semantic but i'm failing to see the link between "no learning style" and "only one style for content"

geoffcain: @Downes Do you have any research on the selecting and getting it wrong bit? I only see learning styles used to ensure multimodal delivery.

Downes: @nlowell The point is that taxonomies like VARK are irrelevant; only instructivists try to test and prescribe; & anyway styles are complex

nlowell: @Downes agree re taxonomy but that appears to be the thrust of the application - instructivist, follow the recipe and learning happens.

scottbw: @Downes there is a big gap between accepting individual differences and asserting there are stable characteristics of individuals

Downes: @geoffcain on 'getting it wrong' - some evidence from the link @opencontent sent see also etc etc

Downes: @scottbw I don't need to assert there are stable difference between individuals, only that there ARE differences (these vary, of course)

@Downes "everyone learns differently, lets see what works for us here, in this time/place" vs. "aha, you're a type B7, you need module 28b"

Downes: It's just like working - sometimes I know I'll do better coding, others tweeting, or writing - it's hard to tell which on any given day...

Downes: what's wrong is making me work the same as everyone else, as if there's only one best way to do work having nothing to do with the worker

geoffcain: @Downes "The authors draw negative conclusions about a field they fail adequately to review," Mr. Sternberg says. I keep running into that!

Downes: @scottbw no, it's more like a recommender system; without learning styles you don't need a recommender, just assign module 2b to everyone

Downes: @scottbw or it's like having a music selector system on your iPod; without individual styles you may as well play everyone the same song

geoffcain: @Downes That article that seems to be against learning styles points to some strong research in favor via Sternberg.

georgeroberts: Hmm not if she's a good boss RT @Downes if no learning styles then no working styles, just 1 best way to do every task (which yr boss knows)

It ends there. Just as well; I had a meeting and people were tuning it out anyways. Twitter just isn't the venue for this sort of thing, clearly.

Friday, January 22, 2010

Research Ethics

Blog summary of an NRC-IIT seminar on research ethics held in Fredericton on Tuesday. Speakers were given an opportunity to review the text before it was posted, but not to exercise editorial control.

Will van den Hoonaard

Professor Emeritus, University of New Brunswick 

Interagency Advisory Panel on Research Ethics, Government of Canada

‘Vertical ethics’ is the idea that ethics reviews are being required not only at the project level, but also at the institute level and even by journals and publications. Research Ethics Boards (REBs) may vary greatly in their approach.

The Tri-Council Board of Research Ethics (TCB) is mostly concerned with medical ethics. It is concerned with an ethics “protocol”, which specifies a procedure that is not subject to interpretation. But as social researchers we interpret all the time. Also, protocols are unchanging, but we change our plans all the time. We also don’t know all the benefits and the risks – again, the idea of benefits and risks comes from a medical perspective. And again, what is consent? Can you be asked just once, or must you be asked for each event? Confidentiality, again, is interpreted very differently in social research.

Take Mitch Duneier’s book, Sidewalk, for example. A researcher saw a homeless person selling a copy of his book on the sidewalk, and then worked with homeless people for seven years to find out how they live. All the names and the photos are in the book – except for the name of the police officer who takes the books away. This is good – ethical - research.

Or consider Tim Diamond, who in Making Grey Gold took notes while working at a nursing home. This was covert research. He almost got caught – an executive was going to find out, but was turned back by the smells. The book brought change in the way nursing homes were run. And he treated his subjects with dignity. So here you have two examples that are very different but are still good research.

Deception and covert research – these are very different, but are often brought together. And finally, consider anonymity. This is often hard to practice in social research. The data doesn’t come anonymously – we interview people, we live with people, we record identities in field notes. In a small community of 500 people, if you interview someone, everybody in the community will know. Anonymity is not possible; you have to call on other principles.

The new TCPS has 160 pages (the old was 84 pages). There are differences between the two:

          - qualitative research is covered by only 4 paragraphs in the old draft (three of which are warnings). In the new TCPS, 60 pages deal with it.

          - The old TCPS has 8 basic principles. The new has three: respect for persons, concern for welfare, and justice

          - The old TCPS talks about standards and procedures. The new talks about a “compass” for doing research – knowing about ethics, ‘this is what is best’.

          - The new TCPS talks about ‘relative autonomy’ because no person is completely autonomous. So if you are called upon to respect a person’s autonomy, that person has to consider the effects on family members, the university, etc.

Ethics is a relational thing – it’s about relationships. Dignity is intrinsic to persons. Every person has dignity, whether we acknowledge it or not.

This also involves not inventing the motives of others. You would be amazed how often people rush in to explain the motives of others. Consider a letter carrier. What’s the best route to take, do you think? Shortest, easiest? To know, we should ask the letter carrier. We have so little knowledge, so our natural habit should be to resist inventing motives.

Finally, there are exceptions to the TCPS, and we can talk about that.

Q. Different people have different moral compasses, how do we deal with it in practical life?

A. We cannot have them all operating at the same time, as there may be conflicting principles. We have to determine which principle is more important. There are things to help you – the statements in the TCPS, the literature in your own field, examples of other researchers in your own field.

Q. There are different frameworks for defining what is ethical, what is justice, etc. What is the context for defining ethics in the TCPS?

A. The drafters were trying to find the touchstones of ethics and corner them on eight principles. Those principles were brought down to three principles. The interplay of these principles will depend very much on what you are doing. For example, power is a very important element in research relationships. In social research, there’s more power in the research participant – he or she can refuse to answer questions. But it varies in other disciplines. You have to go back to the principles; these principles were well thought out, and have a basis in the philosophical literature.

(den Hoonaard comments later: A point added while editing this talk:  Ethics involves relationships and the “new” TCPS makes a point about how the social context and the discipline itself will turn the core principles in varying emphases.)

Q. But there is the case where you could justify it for the general good of the people (ie., utilitarian)?

A. Yes. That’s why you have to argue your case, and bring to bear your own knowledge of the particular topic, and what other people have done.

Q. Where should the power lie in ethical decisions, with the researcher, or with the boards?

A. Ideally, the researcher and the board would have a base of consulting about the issue. But that is very variable – some boards are very stringent, others are more open. So you need to create a climate where you discuss the ethics. There are two aspects – education, and ethics review. The focus is always on review, but by far the best approach is to create a learning approach. You will find, it always boils down to some core principles.

(den Hoonaard comments later: in a later discussion with Stephen, we agreed that researchers should own ethics. It is not something that can be delegated to a checklist or to a body.  Ultimately, it is the researcher who must take the moral responsibility for conducting ethical research.)

Q. You mention, you often have to look to your own field for answers and practices. But what if a method is ethically wrong, but is often used in the field. For example, in computer science, we often see ‘Wizard of Oz’ tests, where people are deceived into thinking something works, when it doesn’t.

A. So how do you think the basic three principles would apply?

(den Hoonaard comments later: My later reflection and discussion with this researcher made me realize that much of social research resembles the “Wizard of Oz” tests.  You see, a number of social researchers, including me, start researching a topic that we believe we have selected in advance (and we explain that to the research participants) but in the course of the research we might actually change the focus of our research, sometimes catching ourselves by surprise.  Can one call this a “Wizard of Oz” test in reverse?)

Q. It depends on what you think of as justice.

A. In some sense it sometimes seems like the question is taken away from you. And that goes back to the issue of power. You have to decide. You can’t just fall back on principles. Take them into account – but you should not be alienated from your own research.

Q. One of the problems with ethics in research, partial disclosure is often confused with deception. But the problem is, if you disclosed the whole thing, it would invalidate the research. For example, you might not tell people what you are measuring for. In the consent process, you don’t tell them what you want to find out. Partial disclosure should be completely fine so long as it doesn’t impact their assessment of the risk of their participation.

Now in ‘Wizard of Oz’ experiments, you are actually saying something that is not true. You are saying the computer is doing something, but in reality it’s a person typing into a keypad. So, first, do you need to use deception? The answer is, we don’t know. Do you need to tell the people it’s the computer? Nass and Reeves argue it shouldn’t make any difference, because people treat computers and TVs anthropomorphically. And second, how does this impact the subjects’ risks and benefits? Not a whole lot. But you are lying to them; it’s a bit of an affront to their dignity.

Second speaker

Francis Rolleston

Former  Director, Ethics, Canadian Institutes of Health Research

Chair, National Research Council Ethics Board

My real question is whether the NRC is being optimally served. The NRC’s policies are on the website. “NRC affirms that ethics in research cannot be achieved without excellence in ethics.” The REB is responsible for oversight, and reports to the Secretary General.

We are a moderately active REB, with 153 ongoing files. Reviews can take anywhere from 20 days (for a subcommittee) to 77 days (for a full board review).

(Rolleston comments later: The times stated as taken for review are generally the top end of the ranges that I identified.  The median time (24 days for full Board review, 13 for sub-committee and 2 for Chair review) would be more accurate.  This applies to the second note on my talk and also the long answer to the first question.  The long time of 77 days was an aberration, and involved an initial application that was badly prepared and reviewed over the Christmas period.  If an applicant prepares the application carefully and well (i.e., professionally), review times are at or below the median.  If the application is badly thought through then review times get high.)

The REB is intended to provide independent research support; the idea is that we are part of the research. But we sometimes hear that people think that we are a nuisance and get in the way.

Background: what does the REB do? A project will meet Canadian standards of ethics if it is carried out as described in the documents reviewed and approved.

You get conflicts in ethics when you get values that conflict with each other. How do you determine the results? There are three levels of consent:

          - you, the researcher, NRC, the funders, etc

          - society, through the Research Ethics Board (REB)

          - research subjects, through individual consent

Who is then responsible for questions of research ethics? Ultimately, it’s the researcher. The REB is there to support you, but is not responsible for research ethics.

Does NRC’s REB support research? It scales the review process in relation to ethics issues. There can be delegated review, generic and template applications, a process of consultation, and where applicable, excuse from REB review.

Does the REB support research? We try to deal with a rapid turnaround and respond quickly to questions.

One question has to do with science review – what right does the REB have with respect to the science, when we are not qualified? Well – we have 6 people with PhDs, 5 people with extensive experience in ethics, 3 in law, 5 from outside NRC, and 6 bilingual members. So it’s a fairly widely based REB – but it’s not free from idiosyncrasies.

What does REB review? First, protocol. Second, informed consent, which is basically required for all research. If you are asking for an exception here, the REB will need to understand why, otherwise it will not be easy to approve the way you are doing things. Another concern is the question of personal information – whether data is tied to a person, whether data is private.  Unless there is consent for data to be identifiable, then it must be free of identifying information.

The important questions are, does the REB serve the needs of NRC, and second, is research not being done because of REB. Also, if there is research that you are doing, and you have not involved REB, why not?

Q. (Stephen described how he doesn’t use REB because of three points: REB questions about the science, the turnaround, the results never coming back.)

A. We think we’re pretty fast. For example, 15-77 days seems fast to us. Three weeks seems to be a pretty quick turnaround. It’s not long compared to other REBs, it’s not long compared to what it used to be. (My comment – it used to be 90 days, the REB only met once a month. Now, maybe, around 30 days is more likely).  Surveys on the web – we now have a generic protocol for them. (Comment: not that I’m aware of). We have a standard approach for these. For example, anonymous surveys. Should be turnkey, review by the chair, then it comes back. (Comment: I have an online survey, it wasn’t turnkey). Well if we make it a generic review, then we can have a review by the chair, turnaround time 8 days.

Q. I think a lot of people aren’t really clear about when something becomes human subject research. Eg. I’m developing some software, and it’s going to be used by some people at IAR. They’re using something else and say we’re going to give them something better. So we do a requirements assessment. Is that human research?

A. Well, if we’re asking people to do something they would normally do in the performance of their job, it’s not human subject research.

Q. Well, what if I go to Mitel and observe software engineers keyword searching?

A. That gets closer to human subject research, it involves questions of anonymity and privacy, but it’s the sort of thing that could be part of a generic application.

Q. So basically when we talk to users the question comes up. So what do I do?

A. Come ask us. The risk is a privacy risk (comment: and a coercion risk).  (Will van den Hoonaard: the nature of the institution or corporation would also be a factor – the more likely you go out of your organization, the more likely the REB is going to be involved. There is privacy, risk, and whether some corporations want people to be interviewed by outsiders).

Q. What is REB? What does it stand for? Also, is there a template for user interface testing? Because getting permission for testing for each element of user interface gets a bit much.

A. REB stands for ‘Research Ethics Board’, it has been in operation for 20 years, and it must give permission before NRC allows the research to take place.

If there is a standard approach, this is a great case for a generic application, then it will take you 20 minutes to fill out, and it comes to me, and I turn it around right away.

Monday, January 18, 2010

The ACOA Case

Responding to David W. Campbell, who in turn is responding to criticisms of ACOA.

The article states:

For 20 years in a row, six companies and a university-affiliated corporation have secured money from the Atlantic Canada Opportunities Agency... the seven firms have received a total of $41.1 million in two decades, according to a list obtained by The Canadian Press under access-to-information legislation.... Eighty-two other firms received cheques from ACOA for at least 10 of the last 20 years, for a total of $203 million.

Campbell replies:

The intent of the article is to discredit ACOA and its funding without any balance... The other thing the journalist missed is any reference to the benefits of the investments in these firms.  There may be none but there may be substantial... if those investments in seven firms ($41M in total) had led to 50,000 new jobs and $500 million in new tax revenue, most reasonable people would be highly supportive of the deals.

OK, first, the article is a hatchet job. But second, the criticisms are valid.

There may be mitigating factors, as you suggest, but they are increasingly difficult to find.

You suggest, "if those investments in seven firms ($41M in total) had led to 50,000 new jobs and $500 million in new tax revenue, most reasonable people would be highly supportive of the deals." True, but the evidence seems to be that the return was nothing like this. We're more likely talking 50 jobs than 50,000.

What was the return? Well, we don't know. But when the same 7 companies get new money every year for 20 years, and when an additional 82 companies got money in at least 10 of the 20 years, it is arguable that the money is not helping these companies stand on their own. And it makes us wonder how it could be spent in such a way that it does enable companies to stand on their own.

Arguably, what is *in fact* happening is that the money is acting as a local industry subsidy, which (when combined with other local industry subsidies, such as special tax deals from the local or provincial governments) gives existing Maritime industries an unfair advantage over companies that might relocate here.

Who is going to set up a frozen food industry when they see that Oxford Frozen Foods has millions of dollars in subsidies? A critic might simply see the $12.1 million subsidy as a way for Oxford to monopolize the local blueberry crop. Certainly their web site (which was suddenly taken down, but you can still see internal pages here ) suggests nothing otherwise. No innovation here!

I don't agree with the proponent in the story, that the way to stimulate corporate activity is to lower taxes. For one thing, taxes are already very low - it's like interest rates, once you get to a 0.5 percent rate, the old refrain 'lower the rate' doesn't work any more.

But what we need to be doing is creating incentives for *new* industry, to provide competition for the old ones. Sobey's has actually started building new stores now that Superstore has established a presence. Kent started opening on Sundays once Home Depot moved in across the street. One could only imagine what would happen if we had competition for Irving in oil, transportation and forestry, or McCains in food production!

Or even better, more ACOA money could be used to seed industries in the new economy. We should be working on biotech, alternative energy, information technology, etc. - high paying jobs with low resource impact. This, of course, requires better infrastructure - lower power rates, better telecommunication (especially wireless), efficient transportation - all things the province lacks.

I propose, from time to time, only half in jest, that each person be eligible for only one grant in a lifetime. Say, a million dollars. The same requirements upfront would apply - they would still have to have a business plan, they would still have to have a product. And they would need business support, access to infrastructure, and marketing assistance. But once the money ran out, they could not come back for more each year; it would be someone else's turn.

Clearly, after a company has received five or ten grants, it's no longer about starting something new. Rather, it becomes all about entrenching their position. At that point, the grant money stops helping the economy, and starts hindering it.

Tuesday, January 12, 2010

Content Delivery Networks

Summary of a tutorial workshop from Bruce Maggs (Duke University and Akamai).

Services and Design

The basic service to the first customers was the provision of access to static content to websites. For example, used Akamai to provide static images for the website. Also, a lot of static objects are delivered to web pages – they don’t appear on web pages. For example, software updates for Microsoft, or virus updates for Symantec. Yahoo uses Akamai to run their DNS service. Apple uses Akamai for realtime streaming of Quicktime. And the FBI uses Akamai to deliver all its content, so there’s no way to access the FBI servers through the site.

Today Akamai has roughly 65,000 servers (was 40,000 servers last summer). There are 1450 points of presence (POPs), 950 networks or ISPs in 67 countries. The first round of servers were typically co-located with ISPs (that’s what OLDaily does too). But that’s expensive; once we got enough traffic we were able to co-locate in universities, ISP, etc to lower the bandwidth coming in. “We’ll serve only your users, but we’re not going to pay for bandwidth, electricity, etc.”

From 30,000 domains Akamai serves 1.1 terabytes per second, 6,419 terabytes per day. That’s 274 billion hits per day to 274 million unique client addresses per day.

Static Content

The simplest service example occurs when decides to outsource its image sourcing. To do this, it may change the URL of its images from to This results in a domain lookup process that points to the Akamai servers instead of’s servers. Image [1] (below) displays the domain lookup process that happens. In essence, what happens is that the DNS process tells the system that is actually (or some such thing, depending on how the load is distributed). These are called ‘domain delegation responses’.

You may wonder how we get better performance by including all these steps. The answer is that they rarely perform all 16 steps. The DNS responses are cached, and the ‘time to live’ tells the servers how long the data will be valid. At the early stages, or for much-used domains, the time-to-live may be a few minutes or even seconds. But the more stable, static time-to-live values may be several days.
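The caching behaviour described above can be sketched in a few lines. This is only an illustration of the time-to-live idea, not Akamai's or any resolver's actual code; the class name `TTLCache`, the injectable `clock` and the example name and address are all assumptions made for the sketch.

```python
import time


class TTLCache:
    """Cache DNS-style answers until their time-to-live expires (illustrative)."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so that expiry can be exercised without waiting.
        self._clock = clock
        self._entries = {}  # name -> (answer, expiry time)

    def put(self, name, answer, ttl_seconds):
        """Store an answer that stays valid for ttl_seconds."""
        self._entries[name] = (answer, self._clock() + ttl_seconds)

    def get(self, name):
        """Return the cached answer, or None if it is missing or has expired."""
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: the caller has to repeat the full lookup.
            del self._entries[name]
            return None
        return answer
```

A short time-to-live means the full lookup is repeated often, which lets the mapping change quickly; a long one means most requests never leave the cache.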

So, the system maps the IP address of the client’s name server and the type of content being requested to an Akamai cluster (Akamai has clusters designed for specific types of content, eg., Quicktime video, etc; Akamai also has servers reserved for specific users in a community, eg., at a particular university). Where the content is not served by an ‘Akamai Accelerated Network Partner’ the request is subjected to a more general ‘core point analysis’.

Note that this is based on the name server address, not the client’s address. It assumes that what’s good for the name server is good for the client. Some ISPs were serving all clients from centralized name servers; we’ve asked them not to do this.

What if the user is using a centralized DNS, like Google’s? There’s a system called ‘IP anycast’ – the idea is that while many clients may use the same NS IP address, the reality is that the IP address is in many different locations – my request to Google’s server will go to the closest Google NS, but Akamai will get some information about the location of that particular name server (NS).

Now – how does Akamai pick a specific server within a cluster? Remember, the client is mapped to a cluster based on the client’s name server IP address. You need an algorithm to assign clients to specific servers based on specific type of content requested. Algorithms:
- stable marriage with multi-dimensional hierarchy constraints (for load balancing)
- consistent hashing (for content types)
Let’s look first at consistent hashing.

Again, Akamai puts specific types of content on specific servers. The goals are to spread the content around the servers, to put the same content type on the same server, and to allocate more server space to more popular (types of) content.

Here’s how the hashing works: you have a set U of web objects, each with a serial number, and you have a set of B buckets, where each bucket is a web server. So the function assigns h:U->B. For example, you might have a random allocation function, h(x)=(((a x+b) mod P) mod |B|), where P is prime and greater than |U|, a and b are chosen randomly, and where x is a serial number.
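As a rough sketch of the random allocation function above (the function names and the small numbers used to exercise it are my own, chosen for illustration), here is h(x) together with a helper that counts how many objects change bucket when the number of buckets changes:

```python
def h(x, a, b, prime, num_buckets):
    """The allocation function from the text: ((a*x + b) mod P) mod |B|."""
    return ((a * x + b) % prime) % num_buckets


def fraction_moved(objects, a, b, prime, old_buckets, new_buckets):
    """Fraction of objects whose bucket changes when the bucket count changes."""
    moved = sum(
        1
        for x in objects
        if h(x, a, b, prime, old_buckets) != h(x, a, b, prime, new_buckets)
    )
    return moved / len(objects)
```

With five objects (serial numbers 0 to 4), a=3, b=7 and P=11, going from 5 buckets to 4 moves four of the five objects, even though only one bucket went away.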

But this won’t work, because you have a difficulty changing the number of buckets. For example, a server might crash. In consistent hashing, instead of mapping objects directly to buckets, you map both objects and buckets to the unit circle. Then you assign the object to the next bucket in the unit circle. When a bucket is added or subtracted, the only objects affected are the objects mapped to the specific bucket that has changed.
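A minimal sketch of the unit-circle idea follows. It assumes SHA-256 as the way of placing names on the circle and uses a hypothetical `Ring` class; the talk did not say which hash function or data structure Akamai actually uses.

```python
import bisect
import hashlib


def point(key):
    """Map a name to a position on the unit circle (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class Ring:
    """Objects and buckets share one circle; an object goes to the next bucket."""

    def __init__(self, buckets):
        self._ring = sorted((point(b), b) for b in buckets)
        self._positions = [p for p, _ in self._ring]

    def lookup(self, obj):
        # Find the first bucket at or after the object's position,
        # wrapping around to the start of the circle if necessary.
        i = bisect.bisect_right(self._positions, point(obj))
        return self._ring[i % len(self._ring)][1]
```

If one bucket is removed and the ring rebuilt, only the objects that were on that bucket get a new answer from `lookup`; every other object still finds the same next bucket on the circle.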

One complication occurs with the use of multiple low-level DNS servers that act independently, since each DNS will have a slightly different view of the hash. The consistent hashing algorithm is sufficiently robust to handle this.

Properties of consistent hashing:
- balance – objects are assigned to buckets randomly
- monotonicity – when a bucket is added or removed, the only objects affected are those mapped to the bucket
- load – objects are assigned to the buckets evenly over a set of views
- spread – an object is mapped to a small number of buckets over a spread of views

Now, how it’s really done is that each object has a different view of the unit circle. Then it is assigned to the first open server in the circle. They have different views because, if one goes down, you don’t want to assign all the objects to the same new server. Rather, because each view of the unit circle is unique, each is assigned to a (potentially) different server.
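One way to picture ‘a different view for each object’ is to give every object its own ordering of the servers and take the first one that is up. The sketch below does this by hashing the object and server names together; that particular construction is my assumption for illustration, not something stated in the talk.

```python
import hashlib


def view(obj, servers):
    """Return the servers in an order that is specific to this object."""
    def rank(server):
        return hashlib.sha256(f"{obj}|{server}".encode("utf-8")).hexdigest()
    return sorted(servers, key=rank)


def assign(obj, servers, down=frozenset()):
    """Assign the object to the first server in its own view that is not down."""
    for server in view(obj, servers):
        if server not in down:
            return server
    return None  # every server is down
```

Because each object walks the servers in its own order, the objects displaced by one failed server can fall back to different servers instead of all landing on the same neighbour.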

Now, what about huge events, such as ‘flash crowds’ (a huge crowd where there is no warning) or planned crowds (such as a software release)? The system is designed so that, in either case, we don’t do anything. It’s designed to be resilient.

Real-Time Streaming Content

How you pay for bandwidth: typically, you sign a ’95.5 contract’ – you sign a contract saying that ‘as measured at 95.5 percent of usage I’m going to pay so much per megabyte’. But of course many people have less than 95.5 – and you can play games with it, to get the 95.5 number lower (serve some content from elsewhere, eg.). Some huge events change the calculations – big events were 9-11, Steve Jobs keynotes, the Obama inauguration (which doubled the normal peak traffic for the day).
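If the contract described here works like the common percentile billing arrangement (that is my assumption; the talk gave no formula), the billable figure is the usage level that the customer stays at or below for the stated percentage of samples, so the very top peaks are not charged. A sketch, with a hypothetical function name and using 95 as the default percentile:

```python
import math


def billable_rate(samples, percentile=95):
    """Return the usage level at the given percentile of the samples.

    Samples above this level (the top few percent) are ignored for billing.
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    index = math.ceil(len(ordered) * percentile / 100) - 1
    return ordered[max(index, 0)]
```

This also shows how the ‘games’ work: if the heaviest few percent of samples can be pushed elsewhere, the billable figure does not move at all.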

Streaming media has, in the last few years, become the major source of bits that are delivered. Streaming media is becoming viable, it’s happening, but slowly. This is mostly U.S.-based, but there are other locations – Korea, for example. Some sports events have been huge – world cup soccer, NCAA basketball games. The photos below show the network before and after the inauguration (yellow is slow, and red is dead). Everybody’s performance on the internet was degraded (Akamai’s belief is that its capacity is as big as the internet’s capacity).

There may be redundant data streaming into clusters, to improve fault-tolerance. The actual serving of the stream is typically through proprietary servers, such as the Quicktime server. Up until the proprietary servers, this is a format-agnostic server system. It just sends data – there is very little error-checking or buffering, because it would create too much of a delay at the client end (eg., for stock market conference calls).

A study from 2004 – 7 percent of streams were video, 71 percent were audio, and 22 percent one or the other (too close to tell by bitrate). There were tons of online radio stations, for example. Streams will often (24 percent of the time) create flash crowds, as shows start, for example. Audio streams (in 2004) often ran 24 hours, but none of the video streams did back then. An analysis of users showed that there was a lot of repeat viewership – the number of new users should go down – but it doesn’t go down very fast. There was a lot of experimenting – people would come and go very quickly. In almost all events, 50 percent of the clients show up one day only and never come back. But if you exclude the one-timers, most people watch for a long time, the full extent of the duration.

What makes for a quality stream? Here are criteria that were used: How often does it fail to start? How long does it take to start? How often does it stop and start up again? You could ask about packet loss, but this is a bit misleading – you have to ask about packets that arrive on time to be used (some packets arrive and are therefore not lost but arrive too late to be useful and are thrown away).
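The point about late packets can be made concrete with a small sketch. The function names and the representation (a list of arrival delays in milliseconds, with None for a packet that never arrived) are assumptions for illustration.

```python
def raw_loss(delays_ms):
    """Fraction of packets that never arrived (None means lost)."""
    if not delays_ms:
        return 0.0
    return sum(1 for d in delays_ms if d is None) / len(delays_ms)


def effective_loss(delays_ms, deadline_ms):
    """Fraction of packets that were lost or arrived too late to be played."""
    if not delays_ms:
        return 0.0
    unusable = sum(1 for d in delays_ms if d is None or d > deadline_ms)
    return unusable / len(delays_ms)
```

For example, with delays of 20, 35, None, 250 and 40 ms and a 100 ms playback deadline, the raw loss is one packet in five but the effective loss is two in five.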

FirstPoint DNS

This is the service that is used to manage the traffic into mirrored websites. It’s a DNS service, eg., for Yahoo. We field the request to send people, eg., to the east coast or to the west coast. It directs the browser to the optimal mirror. It may be only two mirrors (which is actually hard to figure out) or as many as 1500. Today, content providers typically want to manage their own servers, and only offload embedded content. This creates a mapping problem: how to direct requests to servers to optimize end-user experience. You want to reduce latency (especially for small objects), reduce loss, and reduce jitter.

So, how do we do it? We could measure ‘closeness’, but this changes all the time. We could measure latency, frequently. But you get a lot of complaints, and there are too many clients to ping. On any given day there are 500,000 distinct name server requests (not individual web servers, higher level requests; eg., a university asks Akamai, where’s Yahoo?). That’s way too many to measure.

There are two major approaches:

First, network topology. This is basically a way of creating a map of the internet. Topologies are relatively static, changing in BGP (border gateway protocol - it’s basically a way for the systems to manage routes, exchanging route information with each other) time.

Second, congestion. This is much more dynamic. You have to measure changes in round-trip time on the order of milliseconds. So we do accurate measurements to intermediary points. You don’t measure all the way to the end, because any differences are really out of anyone’s control – we can only control different routes to ‘proxy points’ (or ‘core points’). The 500,000 name servers reduce to about 90,000 proxy points. That’s still too many, but at least they’re major routers.

Now if we look at these name servers and sort them according to how many requests we get, if we look at 7,000 of them we are getting 95 percent of all requests from them. So we ping these most often. 30,000 proxy points account for 98.8 percent of all requests; these are pinged every 6 minutes. We use latency and packet loss to guide the algorithm. We also use other data in mapping – about 800 BGP feeds, traceroute to 1,000,000 per week, constraint-based geolocation, and other data.
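The ‘sort by request count and see how many you need’ step is easy to sketch. The function name and the toy counts below are made up for illustration; the real figures are the ones quoted above.

```python
def servers_needed(request_counts, target_fraction):
    """How many of the busiest name servers cover target_fraction of requests."""
    total = sum(request_counts)
    covered = 0
    # Walk the name servers from busiest to quietest, accumulating requests.
    for n, count in enumerate(sorted(request_counts, reverse=True), start=1):
        covered += count
        if covered >= target_fraction * total:
            return n
    return len(request_counts)
```

With ten name servers sending 500, 300, 100, 60, 20, 10, 5, 3, 1 and 1 requests, the busiest four already cover 95 percent of the traffic, which is why pinging a small subset often goes such a long way.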


There have been attacks that have shut down servers. Siteshield redirects traffic from affected servers. Eg. Yahoo was down once on the east coast, and didn’t know until we told them.

What we want to do is prevent DDOS attacks – this is where the attackers take over innocent user computers (called ‘zombies’) which then launch the attack on the service. Akamai basically stands in between the content provider and the attacker – this way the provider’s IP address is shielded, and the attacked server can be swapped out. Akamai has a lot of capacity to handle flash crowds, etc., and so can do the load balancing and can resurrect crashed servers very quickly.

Dealing With Failure

Wherever possible try to build in redundancy, decentralization, self-assessment, fail-over at multiple levels, and robust algorithms.

At the OS level, Akamai started with Red Hat Linux and, over time, has built in Linux performance optimizations and eventually created a ‘secure OS’ derived from 2003 Linux that is ‘battle hardened’. Windows was later added for the Windows content, because Windows Media Server runs on no other platform.

To optimize security on the server: disk and disk cache are managed directly. The network kernel is optimized for short transactions. Services are run in user mode as much as possible. And the only way to get into the machine is through ssh.

Akamai relies a lot on GNU (GPL licensed) software. Akamai only runs the code on its own machines. The license says the act of running the program is not restricted – if you want to distribute the program you have to reveal your own code, but they never do that; they never reveal their own code. (Akamai is close to the line in terms of what they are doing with GPL.)

So what kind of failures are there? Hardware failures, network failures, software failures, configuration failure, misperceptions, and attacks. Of these, configuration failures can be the most severe. Hardware and network you expect to fail, and you build around them. Software can be a problem, because it’s difficult to debug, and it is really easy to get feature creep in your system, so that you’re configuring it in real-time.

Hardware can fail for a variety of reasons. The simplest way to respond to it is the buddy system – each server in a cluster is buddied with another. You can also suspend an entire cluster if more than one fails. To recover from a hardware failure, you try to restart, and if it doesn’t work, you replace the server.

Network failures result mostly from congestion and connectivity problems. This was discussed above; it is dealt with basically by pinging the proxy points.

For software, an engineering methodology is adopted. Everything is coded in C, compiled with gcc. There is a reliance on open source code. There are large distributed testing systems, and there is an ‘invisible system’ burned in – that is, you try the system without any customers on it to see whether it causes crashes, etc. (it’s sort of like a shadow system, it uses real data, but doesn’t actually serve customers). The rollout is staged – first you start with the ghost shadow system, then to a few customers, then system-wide. The software is always backward-compatible with the previous version. In principle it’s possible to roll back to a previous version, but in practice it’s not a good idea, because you’d have to completely clean the server.

But – this, which is state of the art, is still not good enough. It’s still fraught with risk. But there isn’t a better answer yet. There are maybe some long term solutions – safer languages, code you can ‘prove’ is correct, etc. You can prove some things, but you face the halting problem, and there are some things that are still too complex.

Perceived failures – a lot of people see Akamai in their firewalls and think it’s attacking them, because they never requested anything from Akamai. Other misperceptions are created from reporting software, customer-side problems, and third-party measurements (eg., Keynote, which was swamping servers with too many tests – and now there are servers set up close to Keynote for speed measurement, and Keynote was told their own servers are overloaded). Or – there was another case where a newspaper’s own internet access was down, and they couldn’t receive Akamai-served content.

Attacks – are usually intended to simply overwhelm a site with too many requests. Hacker-based attacks are usually from individuals. Sometimes there are weird hybrid attacks where volunteers manually click on the website (eg., to attack the World Economic Forum (WEF)). Maggs summarized some attacks – the most interesting was a BGP attack, because when the BGP link is broken, they flush all data and reload the addresses from scratch, which creates a huge increase in traffic.

War Stories

The Packet of Death – a router in Malaysia was taking servers down. Servers negotiate the ‘maximum sized packet’ that can be sent. But someone had configured a router in Malaysia that chopped packets at some arbitrary number – and the Linux server had a bug that failed on exactly that number. But, of course, every time a server went down, a new server was used – eventually sending every server to that router in Malaysia!

Lost in Space – a server started receiving authenticated but improperly formatted packets from an unknown host. They were being discarded because they were improperly formatted. It turns out, the ‘attack’ was coming from an old server that had been discarded and was trying to come back into the network. We probably have servers in our rack somewhere that have been running for 10 years that we don’t know about. (Of course, this does raise the question of where you keep your secret data, like keys, if servers can be ‘lost’).

Steve Can’t See the New Powerbook – a certain ‘Steve’ who is very famous had a problem – his assistant Eddie explained that Steve’s new computer can’t see the pictures. We went through the logs and found no evidence he had tried to access the images. Eddie snuck into Steve’s office to try to access the images. No image appeared, no request was sent. Eddie was ‘not allowed’ to tell what OS and browser were being used (Safari on OS X). It turned out that the Akamai URLs were so long the web designer put in a line feed. Browsers like Internet Explorer compensate for this, but the new browser didn’t.

David is a Night Owl – one person would start doing experiments at 1:00 a.m. pacific time, and would send a message (at about 4:00 eastern) saying that the servers aren’t responding. But he was using the actual IP address. He asks, “why don’t you support half-closed connections”. It will be out in two weeks. What about “transactional TCP”? We will not support transactional TCP – because it starts and finishes the transaction in a single packet. By sending one packet to a server you can blast data and spoof your source address.

The Magg Syndrome – got a call one day saying Akamai was ‘hijacking’ a website. “I became the most hated person on the internet.” It was when people went to they got the Akamai server, and they would get an Akamai error page. It was a major problem, because Gomez was covering the internet boom and was popular. “We were getting 100,000 requests a day for websites that have nothing to do with us.” First – debrand the website (changed to Then track where people were trying to go. The number kept going up and up and up. The problem was – when you issue a DNS request to your local server, you provide the name you want resolved, an identifier for your request, and the port where you want to receive the answer. All the problems were happening on one (the most popular) operating system – the operating system wasn’t checking that the name in the answer was the name in the request.
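The missing check is simple to state in code. The sketch below is purely illustrative: the `Query` and `Answer` types and the `accept` function are hypothetical, and it is not the code of any real resolver or operating system. It shows the three things a client can compare before trusting an answer: the identifier, the port, and the name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    name: str      # the name we asked to have resolved
    query_id: int  # the identifier we attached to the request
    port: int      # the port where we expect the answer


@dataclass(frozen=True)
class Answer:
    name: str
    query_id: int
    port: int
    address: str


def normalise(name):
    """Compare names case-insensitively and ignore a trailing dot."""
    return name.lower().rstrip(".")


def accept(query, answer):
    """Accept an answer only if it matches the outstanding query in every respect."""
    return (
        answer.query_id == query.query_id
        and answer.port == query.port
        and normalise(answer.name) == normalise(query.name)
    )
```

A client that skips the last comparison will happily cache an address for a name it never asked about, which is the behaviour described in the story above.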

What’s Coming

It’s now possible to run WebSphere applications on Akamai servers. Most customers, though, depend on a backend database, which is not provided by Akamai. So there needs to be a way to provide database caching at the server level.

Physical security is becoming more of an issue. This is increasing some costs.

There is also the issue of isolation between customer applications (ie., keeping customer applications separate from each other). Akamai is not providing virtual machines, not providing a general purpose plan (so everyone is running under the same server).

Energy management is also an increasing issue. The servers consume more energy than everything else in the company combined. It would be nice to save energy, but that’s difficult, because people are reluctant to turn a server off when it’s idle – it may take time to restart, it may be missing updates, it creates wear and tear on a server (from temperature changes). But even if Akamai can’t save energy, it can save money on energy or use greener sources of energy. Eg. You can switch to servers that are using off-peak energy, etc., or adapt server load to exploit spot-energy prices in the U.S.

Monday, January 11, 2010

The Great Convergence

Summary of a talk by John Paul Shen, head of the Nokia Research Center, Palo Alto, at the IEEE Consumer Communications and Networking Conference.

There is a collision of two different cultures (open, closed) and four different industries (internet, computers, mobile phone, cellular networks).

Eg. We at Nokia want to play in this space too, we want to be more than just devices. Similarly, companies that are in the computer industry want to sell mobile devices.

In this convergence space, there isn’t an existing dominant incumbent.

Also, you have other industries coming in – automotive, consumer electronics, entertainment, content industries.

The next 5-10 years will be a phase of phenomenal innovation, because you have these industries coming together. It’s going to be a golden age for researchers, because of the opportunity to create impact.

Nokia research is focused on four areas:
- rich context modeling
- new user interface
- high performance mobile platforms
- cognitive-radio (ie., new uses for radio spectra)

Nokia’s research approach:

- first, an organic team formation – researchers with broad backgrounds and a practical focus, people with a broad view of what counts as research (can publish at top conferences, or can sit down with developers); researchers ‘vote with their feet’ to join whatever team they want; people can create new teams, that operate like a startup, and we depend on a charismatic team leader to drive the process

- second, we do end-to-end rapid prototyping (a MacGyver approach) – research moves from a stable phase to a ‘Cambrian phase’ with lots of entropy, and then to a stable phase – we are in a Cambrian phase, which means research must be very quick, to seize opportunities

- third, we thrive in the ‘valley of death’ that lies between pure research and market opportunity – a lot of research dies in this valley, but this is where we live

- fourth, we build strategic partnerships in an open innovation model

The future of computing:

- is not yet completely mobile; we will be purely mobile – it’s mobile like an umbrella is portable, but in the future it will be portable like glasses are portable

- is not yet really personal, others can use our computer, it’s personal like ‘my hat’, but in the future, it will be personal like ‘my denture’

Three possible approaches:

- rich approach – this is the technology approach, most tech companies want to do this, they just add more and more features

- thin approach – you use a thin client, and you come to us for everything – this is a business perspective

- but what you really want is the third approach, called ‘fit’, which is based on users’ needs – you have to focus on what users really want; users are more diverse than ‘PC users’ – you have to look at what people really need and provide them with what they need

You want to maximize your user experience to energy ratio – maximize the user experience, and minimize the energy consumed.

Energy is a huge issue – not just of devices, but also back-end servers. We need to improve energy efficiency by at least 20x, if not 50x. A laptop is 20 watts, the portable is 3 watts, and we want to get it down to 1 watt. Also, a genuinely portable device needs to be always on – it may be doing a lot of things in stand-by mode, but we want it to consume zero power.

We also want smart parallelism (parallelism is the use of many small CPUs instead of one big CPU). We need a flexible energy-efficient parallelism; we want to adapt the parallelism relative to workload demands. We are also moving from SOC (system on chip) to SOS (system on stack) (where a stack is a bunch of chips connected together). These combine with form factors – eg., you might want your phone to be bendable.

In the area of user experience, you want, first of all, ubiquitous interoperability. All your digital devices have to interoperate seamlessly. Nokia would like to see your phone as your primary hub controlling all these devices. And not only devices, but also content. Eg., you have a camera, you upload photos to your PC, you upload to Flickr, and you have images on your phone, etc. – you have to provide seamless access to all your content.

Additionally, your device should be context-aware – it knows a lot about you. And going beyond the purely personal, can we also aggregate and crowdsource this information, to do big social science? Finally, we want ‘delightful and playful interactions’. This (of course) needs to be done in a privacy-preserving way.

Some Nokia projects:

- the automotive environment – can we get Nokia to interoperate with the car? The main problem is that the car will take 7-8 years in development, and then after it’s sold, it will be kept 7-10 years. Development of the phone now is 1-2 years. So we need some sort of seamless interoperability to be able to leverage mobile phone innovation to add functionality to the car.

- community-enhanced traffic - we should never be surprised by a traffic jam, there are people who have already hit it, they should be able to tell me; in the same way, if I can offer up a little bit of my GPS information, I can help them; so we can crowd-source traffic information even on roads that don't have sensors.

Thursday, January 07, 2010


I haven't made desktop wallpaper for ages... but this is a good exception, I think. From the video here, 3:15.

Monday, January 04, 2010

Questioning Pedagogy

I was given a reminder of the promises I made regarding my talk next week, which has to do with the pedagogical underpinning for personal learning.

Probably others have a much greater base of knowledge regarding pedagogy particularly, so I hesitate to offer models for specific situations. I would imagine people in the field have those well in hand.

That said, I think that I approach the subject from a perspective that is very non-standard (and possibly, therefore, non-useful) in the educational literature.

Specifically, although we can speak of objective-oriented or goal-directed learning (and hence of a pedagogy that leads us there) I think that such a view is somewhat misrepresentative of what actually happens in learning, and therefore of what effective learning looks like. Because I do not have a 'content-based' view of learning, I do not view learning (simply) as the acquisition and memory of new knowledge, and therefore do not trade in theories discussing the integrity of messaging or method of construction of that message.

Yes, we can focus in on one item of study, but in the ordinary course of events (and in the ordinary course of a classroom) we in fact acquire information across a broad spectrum, and this information does not accumulate as facts, but rather, stimulates the growth and development of a neural network, such that one's learning is not a set of stored propositions, or even a (propositionally identifiable) skill, but rather, a complex set of neural connections manifest in altered dispositions to behave (respond / think / react) across a wide spectrum of cases, and not just in the particular subject.

So my view on learning, more generally, and without respect to the subject-specific exceptions we call domain learning, is centered around richness and diversity of the learning experience. I am interested in the sorts of experiences that will manifest themselves in useful dispositions (or habits of mind) across a wide spectrum of disciplines, where these dispositions are not taught as content, but rather, acquired as habits, through repeated exercise in increasingly challenging environments.

Thus learning (and pedagogy) as I see it is more about the development or creation of capacities (such as the capacity to learn, capacity to reason, capacity to communicate, etc) where these capacities are (again) not 'subjects' but rather complex developments of neural structures - more like 'mental muscles' than anything else. So (to carry the analogy), yes, you can focus on a certain muscle, or you can focus on a certain sport, but only at the expense of your wider fitness - and a cross-training approach would be more appropriate.

The role of technology is to place learners into these environments. Technology - thought of from a learning perspective - is not a carrier of educational content, nor even a locus of educational activities, but rather, the provision of a space, a facility, where someone can exercise mental capacities through increasingly challenging experiences (where these experiences may (as they typically do) or may not involve other people). Playing skill games, blogging and being challenged, taking part in online debates, organizing people in an MMORPG, these are all not simply increasingly engaging, but increasingly challenging, educational environments.

In my view, content - including educational materials, OERs, communications, blogs, conversations, and the like - is the raw material we use to create these environments, and the stuff we work with in order to exercise and grow our capacities. We become, for example, more mentally agile by analyzing and creating arguments (as opposed to by remembering arguments). It is our work with the argument that develops our capacity, not our acquisition or storage of the argument. (Just so: it is our work with various forms of mathematics that develops our capacity to think formally and abstractly (and not our memory of mathematical formulae)).

Does that make sense? Perhaps there are pedagogical antecedents - again, as I say, others would be more likely than I to know these. What do you think?